Overview
Welcome to the Fluxions API. Our hosted endpoints cover three product surfaces:
- Transcription: akro-v1, our listening model. It handles speech-to-text, speaker diarization, and non-speech events (breaths, laughter, hesitations) in one call. Production-ready today.
- Text-to-Speech: hosted VUI for conversational TTS. Coming soon; join the waitlist.
- Realtime Voice: OpenAI Realtime-compatible WebSocket for end-to-end streaming voice conversations. Coming soon.
This page covers the basics that apply across all surfaces: authentication, base URL, and a health check.
Authentication
All API requests require authentication using an API key. Include your API key in the Authorization header:
curl "https://api.fluxions.ai/endpoint" \
  -H "Authorization: YOUR_API_KEY"

import requests

headers = {'Authorization': 'YOUR_API_KEY'}
response = requests.get('https://api.fluxions.ai/endpoint', headers=headers)
data = response.json()

const response = await fetch('https://api.fluxions.ai/endpoint', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const data = await response.json();
Important: Do not use the "Bearer " prefix. Include the API key directly in the Authorization header.
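The rule above is easy to break when a key is copied from another client's configuration. A minimal sketch of a guard follows; the helper name `auth_headers` is hypothetical and is not part of any Fluxions SDK.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers carrying the raw API key (hypothetical helper)."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    # The key goes into the Authorization header as-is, with no "Bearer " prefix.
    if key.lower().startswith("bearer "):
        raise ValueError('Remove the "Bearer " prefix and send the raw API key')
    return {"Authorization": key}
```

The returned dict can be passed as the `headers` argument of `requests.get` or `requests.post`.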
Base URL
https://api.fluxions.ai
GET /health — Health Check
Check the API status and version information. No authentication required.
Request
curl "https://api.fluxions.ai/health"
import requests

response = requests.get('https://api.fluxions.ai/health')
data = response.json()
print(f"Status: {data['status']}, Model: {data['model']}")

const response = await fetch('https://api.fluxions.ai/health');
const data = await response.json();
console.log(`Status: ${data.status}, Model: ${data.model}`);
Response
{
  "status": "ok",
  "version": "1.0.0",
  "model": "akro-v1"
}
Transcription
Our akro-v1 model is a comprehensive listening model that performs:
- Transcription — Convert speech to text with high accuracy
- Speaker Diarization — Identify and separate different speakers ("who said what")
- Non-Speech Detection — Capture breathing, laughter, hesitation, and other contextual sounds
This makes it ideal for transcribing meetings, interviews, podcasts, and any audio where understanding the full context matters.
All transcription endpoints require authentication — see Overview for API key setup.
POST /submit — Submit Transcription
Submit audio for processing and receive a job ID immediately. Poll /transcriptions/{id} for results including transcription, speaker diarization, and non-speech events.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| non_speech | boolean | false | Include non-speech sounds |
| filename | string | "audio" | Name for the uploaded file |
| cache | boolean | true | Use cached results for identical files |
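The table does not say how these parameters are transmitted. Since the request body is the raw audio, one plausible reading is that they travel in the URL query string. The sketch below builds such a URL under that assumption; both the query-string transport and the lowercase `true`/`false` encoding are assumptions, and `build_submit_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fluxions.ai"


def build_submit_url(non_speech: bool = False,
                     filename: str = "audio",
                     cache: bool = True) -> str:
    """Return a /submit URL with the documented parameters attached.

    ASSUMPTION: parameters are sent as query parameters, and booleans are
    serialized as lowercase "true"/"false". Verify against the live API.
    """
    params = {
        "non_speech": str(non_speech).lower(),
        "filename": filename,
        "cache": str(cache).lower(),
    }
    return f"{BASE_URL}/submit?{urlencode(params)}"
```

The defaults mirror the Default column of the table, so calling the helper with no arguments describes the same request as omitting every parameter.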
Request
Body: raw audio file bytes.
curl -X POST "https://api.fluxions.ai/submit" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @audio.mp3

import requests

with open('audio.mp3', 'rb') as f:
    response = requests.post(
        'https://api.fluxions.ai/submit',
        headers={'Authorization': 'YOUR_API_KEY'},
        data=f
    )

job = response.json()
job_id = job['id']

const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('https://api.fluxions.ai/submit', {
  method: 'POST',
  headers: {'Authorization': 'YOUR_API_KEY'},
  body: formData
});
const job = await response.json();
const jobId = job.id;
Response
{
  "id": 124,
  "status": "submitted",
  "created_at": "2025-10-24T10:35:00.000Z",
  "original_audio_url": "https://...",
  "query_urls": {
    "get": "https://api.fluxions.ai/transcriptions/124",
    "status": "https://api.fluxions.ai/transcriptions/124"
  },
  "cached": false
}
Workflow
- Submit audio via /submit and receive a job ID
- Poll /transcriptions/{id} to check status
- When status is "completed", retrieve full results
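The polling step of this workflow could be sketched as follows. The function takes a `fetch_job` callable that returns the parsed JSON of GET /transcriptions/{id}, so the loop stays independent of any HTTP library. The function name, the polling interval, and the timeout are illustrative assumptions; the docs do not specify a recommended polling rate.

```python
import time

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_transcription(fetch_job, interval_s=2.0, timeout_s=600.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is completed or failed, then return its JSON.

    fetch_job: zero-argument callable returning the job as a dict.
    interval_s and timeout_s are assumed values, not documented limits.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError("transcription did not finish before the timeout")
        sleep(interval_s)
```

With `requests`, `fetch_job` could be `lambda: requests.get(url, headers=headers).json()`, where `url` is the job's /transcriptions/{id} address. The caller should still inspect the returned status, since a failed job is returned rather than raised.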
GET /transcriptions/{id} — Get Transcription Results
Retrieve the full results for a specific job: transcription, speaker diarization, and non-speech events.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| word_level_timestamps | boolean | false | Include word-level timestamps in segments |
Request
curl "https://api.fluxions.ai/transcriptions/124" \
  -H "Authorization: YOUR_API_KEY"

import requests

response = requests.get(
    'https://api.fluxions.ai/transcriptions/124',
    headers={'Authorization': 'YOUR_API_KEY'}
)
result = response.json()

if result['status'] == 'completed':
    print(result['text'])

const response = await fetch('https://api.fluxions.ai/transcriptions/124', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const result = await response.json();

if (result.status === 'completed') {
  console.log(result.text);
}
Response
{
  "id": 124,
  "status": "completed",
  "created_at": "2025-10-24T10:35:00.000Z",
  "updated_at": "2025-10-24T10:35:20.000Z",
  "filename": "interview.mp3",
  "audio_duration": 300.0,
  "audio_format": "opus",
  "processing_time": 245.5,
  "language": "en",
  "non_speech": false,
  "num_chunks": 11,
  "num_segments": 25,
  "num_speakers": 2,
  "text": "SPEAKER_0: Yeah, let's actually start off exactly, where we initially began.\nSPEAKER_1: Sounds perfect. That makes complete sense to me.\nSPEAKER_0: So I started thinking about what if this is just a construct?",
  "segments": [
    {
      "speaker": "0",
      "text": "Yeah, let's actually start off exactly, where we initially began.",
      "start": 0.86,
      "end": 6.42,
      "segment_idx": 0
    },
    {
      "speaker": "1",
      "text": "Sounds perfect",
      "start": 6.0,
      "end": 7.2,
      "segment_idx": 0
    },
    {
      "speaker": "1",
      "text": "That makes complete sense to me.",
      "start": 7.5,
      "end": 9.8,
      "segment_idx": 1
    }
  ],
  "audio_url": "https://...r2.cloudflarestorage.com/...",
  "cached": true
}
Status Values
- submitted: Job has been submitted
- processing: Transcription in progress
- completed: Transcription finished successfully
- failed: Transcription failed (check error_message)
GET /transcriptions — List Transcriptions
List all transcriptions for your account.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | integer | 50 | Number of results per page (max: 100) |
| offset | integer | 0 | Pagination offset |
Request
curl "https://api.fluxions.ai/transcriptions?limit=10&offset=0" \
  -H "Authorization: YOUR_API_KEY"

import requests

response = requests.get(
    'https://api.fluxions.ai/transcriptions',
    headers={'Authorization': 'YOUR_API_KEY'},
    params={'limit': 10, 'offset': 0}
)
data = response.json()

print(f"Total: {data['total']}, Found: {len(data['transcriptions'])} transcriptions")
for t in data['transcriptions']:
    print(f" ID {t['id']}: {t['filename']} - {t['status']}")

const response = await fetch('https://api.fluxions.ai/transcriptions?limit=10&offset=0', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const data = await response.json();

console.log(`Total: ${data.total}, Found: ${data.transcriptions.length} transcriptions`);
data.transcriptions.forEach(t => {
  console.log(` ID ${t.id}: ${t.filename} - ${t.status}`);
});
Response
{
  "total": 150,
  "limit": 10,
  "offset": 0,
  "transcriptions": [
    {
      "id": 150,
      "status": "completed",
      "created_at": "2025-10-24T10:40:00.000Z",
      "filename": "interview.mp3",
      "audio_duration": 1800.0,
      "audio_format": "opus",
      "processing_time": 45.2,
      "num_speakers": 2,
      "num_segments": 142,
      "original_audio_url": "https://...",
      "language": "en"
    }
  ]
}
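Because each response reports `total` alongside `limit` and `offset`, walking the full list is a matter of advancing the offset until it reaches the total. The generator below is a sketch of that loop. It takes a hypothetical `fetch_page(limit, offset)` callable that returns the parsed JSON of GET /transcriptions, which keeps the pagination logic separate from the HTTP client.

```python
def iter_transcriptions(fetch_page, page_size=100):
    """Yield every transcription summary, one page at a time.

    fetch_page: callable (limit, offset) -> dict shaped like the
    response above, with "total" and "transcriptions" keys.
    """
    if not 1 <= page_size <= 100:
        # The parameter table documents a maximum page size of 100.
        raise ValueError("page_size must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        items = page["transcriptions"]
        yield from items
        offset += len(items)
        # Stop on an empty page as well, in case "total" changes mid-walk.
        if not items or offset >= page["total"]:
            break
```

With `requests`, `fetch_page` could wrap a call that passes `params={'limit': limit, 'offset': offset}` and returns `response.json()`.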
Response Format
Text Field
The text field contains the full transcription with speaker labels and optional non-speech events:
- Speaker Labels: SPEAKER_0:, SPEAKER_1:, etc. prefix each speaker's utterances
- Line Breaks: Newlines (\n) separate different speaker turns
- Non-speech Events: When enabled, events like [breath], [pause] appear inline
Example:
SPEAKER_0: Yeah, let's start [breath] where we began.
SPEAKER_1: Sounds good. That makes sense.
SPEAKER_0: So I was thinking about [pause] what if this is a construct?
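Given that layout, the text field can be split back into speaker turns with a small parser. The sketch below assumes every turn starts with a `SPEAKER_<n>:` label on its own line, as in the example. Appending unlabeled lines to the previous turn is a defensive assumption, not documented behavior, and `parse_turns` is a hypothetical helper.

```python
import re

# Matches a line such as "SPEAKER_0: Yeah, let's start".
TURN_RE = re.compile(r"^SPEAKER_(\d+):\s*(.*)$")


def parse_turns(text):
    """Split the text field into (speaker_id, utterance) tuples."""
    turns = []
    for raw_line in text.split("\n"):
        line = raw_line.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # ASSUMPTION: an unlabeled line continues the previous turn.
            speaker, previous = turns[-1]
            turns[-1] = (speaker, previous + " " + line)
    return turns
```

Speaker IDs are returned as strings so they compare directly with the `speaker` values in the segments array. Non-speech tags are left in place.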
Segments Array
The segments array provides precise timing and speaker information for each utterance:
- speaker: Speaker ID as a string ("0", "1", etc.)
- text: The spoken text for this segment (without non-speech events)
- start: Start time in seconds (decimal precision)
- end: End time in seconds (decimal precision)
- segment_idx: Sequential index for this segment
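As one example of what these fields allow, the sketch below totals speaking time per speaker by summing `end - start` over the segments. The function name is hypothetical, and rounding to two decimals is an illustrative choice to hide floating-point noise.

```python
from collections import defaultdict


def speaking_time(segments):
    """Return total seconds spoken per speaker ID, rounded to 2 decimals."""
    totals = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return {speaker: round(seconds, 2) for speaker, seconds in totals.items()}
```

Segments from different speakers can overlap in time (in the sample response above, one segment starts at 6.0 while the previous one ends at 6.42), so the per-speaker totals may add up to more than the elapsed audio time.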
Non-Speech Events
When non_speech=true, our listening model captures various non-speech sounds and events that provide additional context to the conversation.
Common Non-Speech Sounds
| Event | Tag | Description | Example Usage |
|---|---|---|---|
| Breath | [breath] | Audible breathing sounds | ...end of sentence. [breath] Now this is important. |
| Laugh | [laugh] or hahaha | Laughter - can be written as text or tagged for longer laughs | Oh wow! hahaha [breath] that's hilarious. |
| Hesitation | [hesitation] or [hesitate] | Unclear thinking noises or mouth sounds while pausing - not specific words | Well [hesitation] um I'm not really sure. |
| Pause | [pause] | Unnaturally long, noticeable pause (e.g., looking something up) | Let me just uh... [pause] Let me look this up. |
| Environment | [env] | Background noise or environmental sounds | I was thinking [env] about what you said. |
| Tut | [tut] | Tongue click or lip smack sound | [tut] That's not quite right. |
| Sigh | [sigh] | Expressive exhale sound | [sigh] I suppose you're right. |
| Sniff | [sniff] | Nasal inhale or sniffing sound | [sniff] Something smells good in here. |
| Cough | [cough] | Coughing sound | Sorry, excuse me [cough] as I was saying... |
Usage Notes
- Non-speech events are placed inline with the transcribed text
- Events appear at their natural position in the conversation flow
- Word elongation is marked with ellipsis: um... so... I think...
- Emphasis on words uses asterisks: I *really* think so
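Since the tags sit inline, downstream code often needs either the list of events or a clean transcript without them. The sketch below does both with a regular expression built from the tags in the table above. The helper names are hypothetical, and any tag not listed in the table would pass through unchanged.

```python
import re

# Tag names copied from the Common Non-Speech Sounds table.
TAG_RE = re.compile(
    r"\[(breath|laugh|hesitation|hesitate|pause|env|tut|sigh|sniff|cough)\]"
)


def extract_events(text):
    """Return the non-speech tag names in order of appearance."""
    return TAG_RE.findall(text)


def strip_events(text):
    """Remove non-speech tags, tidying spaces but keeping line breaks."""
    cleaned_lines = []
    for line in text.split("\n"):
        without_tags = TAG_RE.sub(" ", line)
        cleaned_lines.append(re.sub(r"[ \t]+", " ", without_tags).strip())
    return "\n".join(cleaned_lines)
```

Laughter written as text (hahaha), ellipses, and asterisk emphasis are deliberately left untouched, because they are part of the transcribed words rather than bracketed tags.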
Speech
Hosted VUI — conversational text-to-speech and OpenAI Realtime-compatible streaming voice.
Coming soon. Read the launch post or join the waitlist.