# Overview

Welcome to the Fluxions API. Our hosted endpoints cover three product surfaces:

- **Transcription** — `akro-v1`, our listening model: speech-to-text, speaker diarization, and non-speech events (breaths, laughter, hesitations) in one call. Production-ready today.
- **Text-to-Speech** — hosted **VUI** for conversational TTS. *Coming soon — [join the waitlist](/?join=vui).*
- **Realtime Voice** — OpenAI Realtime-compatible WebSocket for end-to-end streaming voice conversations. *Coming soon.*

This page covers the basics that apply across all surfaces: authentication, base URL, and a health check.

## Authentication

All API requests except `GET /health` require authentication with an API key. Include your API key in the `Authorization` header:

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/endpoint" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

headers = {'Authorization': 'YOUR_API_KEY'}
response = requests.get('https://api.fluxions.ai/endpoint', headers=headers)
data = response.json()
```

**JavaScript (.js)**
```javascript
const response = await fetch('https://api.fluxions.ai/endpoint', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const data = await response.json();
```

**Important**: Do not use the "Bearer " prefix. Include the API key directly in the Authorization header.

## Base URL

```
https://api.fluxions.ai
```

## GET /health — Health Check

Check the API status and version information. *No authentication required.*

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/health"
```

**Python (.py)**
```python
import requests

response = requests.get('https://api.fluxions.ai/health')
data = response.json()
print(f"Status: {data['status']}, Model: {data['model']}")
```

**JavaScript (.js)**
```javascript
const response = await fetch('https://api.fluxions.ai/health');
const data = await response.json();
console.log(`Status: ${data.status}, Model: ${data.model}`);
```

### Response

```json
{
  "status": "ok",
  "version": "1.0.0",
  "model": "akro-v1"
}
```



---

# Transcription

Our **akro-v1** model is a comprehensive listening model that performs:

- **Transcription** — Convert speech to text with high accuracy
- **Speaker Diarization** — Identify and separate different speakers ("who said what")
- **Non-Speech Detection** — Capture breathing, laughter, hesitation, and other contextual sounds

This makes it ideal for transcribing meetings, interviews, podcasts, and any audio where understanding the full context matters.

All transcription endpoints require authentication — see [Overview](#overview) for API key setup.

## POST /submit — Submit Transcription

Submit audio for processing and receive a job ID immediately. Poll `/transcriptions/{id}` for results including transcription, speaker diarization, and non-speech events.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `non_speech` | boolean | `false` | Include non-speech sounds |
| `filename` | string | `"audio"` | Name for the uploaded file |
| `cache` | boolean | `true` | Use cached results for identical files |
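
The table does not say how these parameters are transmitted. Since the request body is the raw audio, one plausible reading is that they travel in the query string; the sketch below assumes exactly that, and also assumes booleans are spelled `true`/`false`. `submit_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fluxions.ai"


def submit_url(non_speech: bool = False, filename: str = "audio", cache: bool = True) -> str:
    """Build a /submit URL, ASSUMING parameters are passed as query-string values."""
    query = urlencode({
        # Assumption: booleans are sent as lowercase "true"/"false".
        "non_speech": str(non_speech).lower(),
        "filename": filename,
        "cache": str(cache).lower(),
    })
    return f"{BASE_URL}/submit?{query}"
```

If the API turns out to expect these values elsewhere (for example in headers), only this helper needs to change.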

### Request

Body: raw audio file bytes.

**Bash (.sh)**
```bash
curl -X POST "https://api.fluxions.ai/submit" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @audio.mp3
```

**Python (.py)**
```python
import requests

with open('audio.mp3', 'rb') as f:
    response = requests.post(
        'https://api.fluxions.ai/submit',
        headers={'Authorization': 'YOUR_API_KEY', 'Content-Type': 'audio/mpeg'},
        data=f
    )

job = response.json()
job_id = job['id']
```

**JavaScript (.js)**
```javascript
// audioFile is a File or Blob; its bytes are sent as the raw request body
const response = await fetch('https://api.fluxions.ai/submit', {
  method: 'POST',
  headers: {
    'Authorization': 'YOUR_API_KEY',
    'Content-Type': 'audio/mpeg'
  },
  body: audioFile
});

const job = await response.json();
const jobId = job.id;
```

### Response

```json
{
  "id": 124,
  "status": "submitted",
  "created_at": "2025-10-24T10:35:00.000Z",
  "original_audio_url": "https://...",
  "query_urls": {
    "get": "https://api.fluxions.ai/transcriptions/124",
    "status": "https://api.fluxions.ai/transcriptions/124"
  },
  "cached": false
}
```

### Workflow

1. Submit audio via `/submit` and receive a job ID
2. Poll `/transcriptions/{id}` to check status
3. When `status` is `"completed"`, retrieve full results
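
The polling step can be sketched as a small loop. This is one possible approach, not a prescribed client: `wait_for_transcription`, the 2-second interval, and the 10-minute timeout are all assumptions chosen for illustration. The HTTP call is injected as `fetch_job` so the loop itself stays independent of any particular HTTP library.

```python
import time
from typing import Callable


def wait_for_transcription(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,    # assumed polling interval, in seconds
    timeout: float = 600.0,   # assumed overall deadline, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job() until the job's status is "completed" or "failed"."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish before the timeout")
        sleep(interval)
```

With `requests`, `fetch_job` could be `lambda: requests.get(job['query_urls']['get'], headers=headers).json()`, where `job` is the `/submit` response.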

## GET /transcriptions/{id} — Get Transcription Results

Retrieve the full results for a specific job: transcription, speaker diarization, and non-speech events.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `word_level_timestamps` | boolean | `false` | Include word-level timestamps in segments |

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/transcriptions/124" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

response = requests.get(
    'https://api.fluxions.ai/transcriptions/124',
    headers={'Authorization': 'YOUR_API_KEY'}
)

result = response.json()
if result['status'] == 'completed':
    print(result['text'])
```

**JavaScript (.js)**
```javascript
const response = await fetch(
  'https://api.fluxions.ai/transcriptions/124',
  {
    headers: {'Authorization': 'YOUR_API_KEY'}
  }
);

const result = await response.json();
if (result.status === 'completed') {
  console.log(result.text);
}
```

### Response

```json
{
  "id": 124,
  "status": "completed",
  "created_at": "2025-10-24T10:35:00.000Z",
  "updated_at": "2025-10-24T10:35:20.000Z",
  "filename": "interview.mp3",
  "audio_duration": 300.0,
  "audio_format": "opus",
  "processing_time": 245.5,
  "language": "en",
  "non_speech": false,
  "num_chunks": 11,
  "num_segments": 25,
  "num_speakers": 2,
  "text": "SPEAKER_0: Yeah, let's actually start off exactly, where we initially began.\nSPEAKER_1: Sounds perfect. That makes complete sense to me.\nSPEAKER_0: So I started thinking about what if this is just a construct?",
  "segments": [
    {
      "speaker": "0",
      "text": "Yeah, let's actually start off exactly, where we initially began.",
      "start": 0.86,
      "end": 6.42,
      "segment_idx": 0
    },
    {
      "speaker": "1",
      "text": "Sounds perfect",
      "start": 6.0,
      "end": 7.2,
      "segment_idx": 0
    },
    {
      "speaker": "1",
      "text": "That makes complete sense to me.",
      "start": 7.5,
      "end": 9.8,
      "segment_idx": 1
    }
  ],
  "audio_url": "https://...r2.cloudflarestorage.com/...",
  "cached": true
}
```

### Status Values

- `submitted` — Job has been submitted
- `processing` — Transcription in progress
- `completed` — Transcription finished successfully
- `failed` — Transcription failed (check `error_message`)
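
Client code usually needs to branch on these four values. A minimal sketch follows; `TranscriptionFailed` and `transcript_or_none` are hypothetical names, and reading `error_message` from the top level of the job object is an assumption, since the response examples on this page do not show a failed job.

```python
class TranscriptionFailed(Exception):
    """Raised when a job reports status "failed"."""


def transcript_or_none(job: dict):
    """Return the transcript text once completed, or None while still in progress."""
    status = job.get("status")
    if status == "completed":
        return job["text"]
    if status == "failed":
        # Assumption: error_message is a top-level field of the job object.
        raise TranscriptionFailed(job.get("error_message", "unknown error"))
    if status in ("submitted", "processing"):
        return None
    raise ValueError(f"unexpected status: {status!r}")
```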

## GET /transcriptions — List Transcriptions

List all transcriptions for your account.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `limit` | integer | `50` | Number of results per page (max: 100) |
| `offset` | integer | `0` | Pagination offset |

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/transcriptions?limit=10&offset=0" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

response = requests.get(
    'https://api.fluxions.ai/transcriptions',
    headers={'Authorization': 'YOUR_API_KEY'},
    params={'limit': 10, 'offset': 0}
)

data = response.json()
print(f"Total: {data['total']}, Found: {len(data['transcriptions'])} transcriptions")
for t in data['transcriptions']:
    print(f"  ID {t['id']}: {t['filename']} - {t['status']}")
```

**JavaScript (.js)**
```javascript
const response = await fetch(
  'https://api.fluxions.ai/transcriptions?limit=10&offset=0',
  {
    headers: {'Authorization': 'YOUR_API_KEY'}
  }
);

const data = await response.json();
console.log(`Total: ${data.total}, Found: ${data.transcriptions.length} transcriptions`);
data.transcriptions.forEach(t => {
  console.log(`  ID ${t.id}: ${t.filename} - ${t.status}`);
});
```

### Response

```json
{
  "total": 150,
  "limit": 10,
  "offset": 0,
  "transcriptions": [
    {
      "id": 150,
      "status": "completed",
      "created_at": "2025-10-24T10:40:00.000Z",
      "filename": "interview.mp3",
      "audio_duration": 1800.0,
      "audio_format": "opus",
      "processing_time": 45.2,
      "num_speakers": 2,
      "num_segments": 142,
      "original_audio_url": "https://...",
      "language": "en"
    }
  ]
}
```
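
To walk the whole list, `limit` and `offset` can be combined with the `total` field from the response. The generator below is a sketch under that reading; `iter_transcriptions` and the injected `fetch_page(limit, offset)` callable are hypothetical names, with the actual HTTP request left to the caller.

```python
from typing import Callable, Iterator


def iter_transcriptions(
    fetch_page: Callable[[int, int], dict],
    limit: int = 50,
) -> Iterator[dict]:
    """Yield every transcription, requesting pages of `limit` items until `total` is reached."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["transcriptions"]
        yield from items
        offset += len(items)
        # Stop on an empty page as well, in case `total` changes between requests.
        if not items or offset >= page["total"]:
            return
```

With `requests`, `fetch_page` could wrap `requests.get(url, headers=headers, params={'limit': limit, 'offset': offset}).json()`.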

## Response Format

### Text Field

The `text` field contains the full transcription with speaker labels and optional non-speech events:

- **Speaker Labels**: `SPEAKER_0:`, `SPEAKER_1:`, etc. prefix each speaker's utterances
- **Line Breaks**: Newlines (`\n`) separate different speaker turns
- **Non-speech Events**: When enabled, events like `[breath]`, `[pause]` appear inline

**Example**:
```
SPEAKER_0: Yeah, let's start [breath] where we began.
SPEAKER_1: Sounds good. That makes sense.
SPEAKER_0: So I was thinking about [pause] what if this is a construct?
```
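
Given this layout, the `text` field can be split back into speaker turns. The sketch below assumes every turn starts on its own line with a `SPEAKER_<n>:` prefix, as in the example; `parse_turns` is a hypothetical helper, and treating unlabelled lines as continuations of the previous turn is a defensive assumption.

```python
import re

# Matches a line such as "SPEAKER_0: Yeah, let's start..."
TURN = re.compile(r"^SPEAKER_(\d+):\s*(.*)$")


def parse_turns(text: str) -> list[tuple[str, str]]:
    """Split the `text` field into (speaker_id, utterance) pairs."""
    turns: list[tuple[str, str]] = []
    for line in text.split("\n"):
        match = TURN.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns and line.strip():
            # Assumption: a line without a label continues the previous turn.
            speaker, previous = turns[-1]
            turns[-1] = (speaker, previous + " " + line.strip())
    return turns
```

Speaker IDs are returned as strings so they line up with the `speaker` values in the `segments` array.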

### Segments Array

The `segments` array provides precise timing and speaker information for each utterance:

- **speaker**: Speaker ID as a string (`"0"`, `"1"`, etc.)
- **text**: The spoken text for this segment (without non-speech events)
- **start**: Start time in seconds (decimal precision)
- **end**: End time in seconds (decimal precision)
- **segment_idx**: Sequential index for this segment
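
As an illustration of working with these fields, the sketch below sums `end - start` per speaker to estimate talk time. `talk_time_by_speaker` is a hypothetical helper; note that overlapping segments from different speakers are each counted in full.

```python
from collections import defaultdict


def talk_time_by_speaker(segments: list[dict]) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speaker ID."""
    totals: dict[str, float] = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)
```

Calling `talk_time_by_speaker(result['segments'])` on a completed job returns a dictionary keyed by speaker ID string.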

## Non-Speech Events

When `non_speech=true`, our listening model captures various non-speech sounds and events that provide additional context to the conversation.

### Common Non-Speech Sounds

| Event | Tag | Description | Example Usage |
|-------|-----|-------------|---------------|
| **Breath** | `[breath]` | Audible breathing sounds | `...end of sentence. [breath] Now this is important.` |
| **Laugh** | `[laugh]` or `hahaha` | Laughter - can be written as text or tagged for longer laughs | `Oh wow! hahaha [breath] that's hilarious.` |
| **Hesitation** | `[hesitation]` or `[hesitate]` | Unclear thinking noises or mouth sounds while pausing - not specific words | `Well [hesitation] um I'm not really sure.` |
| **Pause** | `[pause]` | Unnaturally long, noticeable pause (e.g., looking something up) | `Let me just uh... [pause] Let me look this up.` |
| **Environment** | `[env]` | Background noise or environmental sounds | `I was thinking [env] about what you said.` |
| **Tut** | `[tut]` | Tongue click or lip smack sound | `[tut] That's not quite right.` |
| **Sigh** | `[sigh]` | Expressive exhale sound | `[sigh] I suppose you're right.` |
| **Sniff** | `[sniff]` | Nasal inhale or sniffing sound | `[sniff] Something smells good in here.` |
| **Cough** | `[cough]` | Coughing sound | `Sorry, excuse me [cough] as I was saying...` |

### Usage Notes

- Non-speech events are placed inline with the transcribed text
- Events appear at their natural position in the conversation flow
- Word elongation is marked with an ellipsis: `um... so... I think...`
- Emphasis on words uses asterisks: `I *really* think so`
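
Because the tags sit inline, downstream code may want to count them or remove them to get a plain transcript. The following sketch assumes every tag is a lowercase word in square brackets, as in the table above; `count_events` and `strip_events` are hypothetical helpers, and untagged forms such as `hahaha` are left untouched.

```python
import re
from collections import Counter

# Assumption: tags are lowercase words in square brackets, e.g. [breath].
TAG = re.compile(r"\[([a-z_]+)\]")


def count_events(text: str) -> Counter:
    """Count how often each bracketed non-speech tag occurs."""
    return Counter(TAG.findall(text))


def strip_events(text: str) -> str:
    """Remove bracketed tags and tidy the spacing they leave behind."""
    without_tags = TAG.sub("", text)
    collapsed = re.sub(r"[ \t]{2,}", " ", without_tags)
    # Drop a space left in front of punctuation, e.g. "word ." becomes "word."
    return re.sub(r"[ \t]+([.,!?;:])", r"\1", collapsed).strip()
```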



---

# Speech

Hosted **VUI** — conversational text-to-speech and OpenAI Realtime-compatible streaming voice.

**Coming soon.** [Read the launch post](/blog/vui-launch) or [join the waitlist](/?join=vui).


