Text to Speech Stream
Convert text to speech and stream the resulting audio back in real-time chunks.
POST
Request
Headers
Authorization: Bearer token with your API key
Content-Type: application/json
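The request can be assembled with only the Python standard library, as in the sketch below. It assumes the two headers are the standard `Authorization` and `Content-Type` headers. This page does not state the endpoint URL, so it is passed in as a parameter rather than guessed. The function name `build_stream_request` is hypothetical.

```python
import json
import urllib.request


def build_stream_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the streaming POST request.

    `url` is the endpoint URL, which this page does not give; supply it yourself.
    """
    headers = {
        # Bearer token carrying your API key
        "Authorization": f"Bearer {api_key}",
        # The request body is sent as JSON
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

To send the request, pass the returned object to `urllib.request.urlopen(req)`. The response can then be read line by line while it is still streaming.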
Body
The text to convert to speech (max 5000 characters)
ID of the voice to use
Enable audio enhancement
Speech speed multiplier (0.5 to 2.0)
Voice stability, 0 to 100 (lower values are more expressive; higher values are more consistent)
Voice similarity, 0 to 100 (how closely the output matches the source voice)
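The documented limits can be checked on the client before a request is sent, as in the sketch below. The field names (`text`, `voice_id`, `enhance`, `speed`, `stability`, `similarity`) are hypothetical placeholders, because this page lists only the descriptions. Replace them with the names from the actual API schema. The ranges come from the descriptions above.

```python
from dataclasses import asdict, dataclass

MAX_TEXT_CHARS = 5000


@dataclass
class StreamRequestBody:
    # Hypothetical field names; substitute the real ones from the API schema.
    text: str        # text to convert (max 5000 characters)
    voice_id: str    # ID of the voice to use
    enhance: bool    # enable audio enhancement
    speed: float     # speech speed multiplier, 0.5 to 2.0
    stability: int   # 0 to 100
    similarity: int  # 0 to 100

    def validate(self) -> None:
        """Raise ValueError if a field falls outside its documented range."""
        # len() counts Unicode code points; the API's own character count may differ.
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        for name in ("stability", "similarity"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100")

    def to_json_dict(self) -> dict:
        """Validate, then return a plain dict ready for json.dumps()."""
        self.validate()
        return asdict(self)
```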
Response
The response is streamed as newline-delimited JSON (NDJSON). Each line contains one JSON object.
Chunk Object
Type of message: “chunk”, “complete”, or “error”
Contains the audio chunk data
Base64-encoded audio chunk
Error message (present only when the message type is “error”)
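The following sketch shows one way a client might consume the NDJSON stream. It parses each line, base64-decodes every "chunk" message, stops at "complete", and raises on "error". Only the three type values come from this page. The key names `type`, `data`, `audio`, and `error`, and the nesting of the audio inside `data`, are assumptions. Adjust them to match the real payload.

```python
import base64
import json
from typing import Iterable, Iterator, Union


class StreamError(RuntimeError):
    """Raised when the stream reports a message of type "error"."""


def iter_audio_chunks(lines: Iterable[Union[bytes, str]]) -> Iterator[bytes]:
    """Yield decoded audio bytes from an NDJSON stream.

    Assumed message shapes (key names are hypothetical):
      {"type": "chunk", "data": {"audio": "<base64>"}}
      {"type": "complete"}
      {"type": "error", "error": "<message>"}
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        message = json.loads(line)
        kind = message.get("type")
        if kind == "chunk":
            yield base64.b64decode(message["data"]["audio"])
        elif kind == "complete":
            return  # generation finished; stop reading
        elif kind == "error":
            raise StreamError(message.get("error", "unknown error"))
        # Unrecognized message types are ignored.
```

An HTTP response object from `urllib.request.urlopen` yields one line per iteration. You can therefore write `for audio in iter_audio_chunks(resp): ...` and hand each chunk to a player or file as soon as it arrives.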
Use Cases
Streaming is ideal for:
- Real-time applications: voice assistants, chatbots
- Long-form content: articles, books, documents
- Low latency: start playing audio before generation completes
- Progressive enhancement: display text while the audio is still being generated
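Each request accepts at most 5000 characters of text. Long-form content such as articles or books would therefore have to be split across several requests. The sketch below does this on the client side. It assumes that splitting at sentence boundaries is acceptable, and it uses a naive regex sentence splitter, not anything the service provides. Sentences longer than the limit are hard-cut. `split_for_tts` is a hypothetical helper name.

```python
import re
from typing import List

MAX_TEXT_CHARS = 5000


def split_for_tts(text: str, limit: int = MAX_TEXT_CHARS) -> List[str]:
    """Split text into segments of at most `limit` characters.

    Prefers sentence boundaries; hard-cuts any sentence longer than `limit`.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-cut a sentence that alone exceeds the limit.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current segment
        else:
            segments.append(current)  # flush and start a new segment
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each segment can then be sent as its own streaming request. Playback of the first segment can begin while later segments are still being generated.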