Voices
Create Voice
Create a custom voice from audio samples
POST
Request
Headers
Bearer token with your API key
multipart/form-data
Form Data
Name for the custom voice
Audio files for voice cloning (minimum 3, maximum 10 files). Requirements:
- Format: MP3, WAV, or FLAC
- Duration: 10-60 seconds per file
- Quality: Clear speech, minimal background noise
- Total duration: At least 2 minutes combined
Description of the voice
Primary language code (e.g., “en”, “es”)
Voice gender: “male”, “female”, or “neutral”
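As a minimal sketch, the text fields above could be assembled and checked before they are sent. The form keys used here (`name`, `description`, `language`, `gender`) are assumptions for illustration, since this page does not show the actual field names, and treating the name as required is also an assumption. The helper only builds a dictionary; sending it as multipart/form-data is left to your HTTP client.

```python
# Allowed values come from the "Voice gender" description above.
ALLOWED_GENDERS = {"male", "female", "neutral"}


def build_voice_form(name, description=None, language=None, gender=None):
    """Return the text fields for the create-voice form.

    The dictionary keys are hypothetical field names, not confirmed by
    the API reference.
    """
    if not name or not name.strip():
        raise ValueError("a voice name is required (assumed)")
    if gender is not None and gender not in ALLOWED_GENDERS:
        raise ValueError(f"gender must be one of {sorted(ALLOWED_GENDERS)}")

    form = {"name": name.strip()}
    # Only include optional fields that were actually provided.
    if description is not None:
        form["description"] = description
    if language is not None:
        form["language"] = language
    if gender is not None:
        form["gender"] = gender
    return form
```

Rejecting an unsupported gender value locally avoids uploading several audio files only to have the request fail validation.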
Response
Unique identifier for the created voice
Voice name
Processing status: “processing”, “ready”, “failed”
ISO 8601 timestamp
Estimated time for voice processing to complete
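The response fields above can be read into a small typed structure. This is a sketch under stated assumptions: the JSON keys (`voice_id`, `name`, `status`, `timestamp`) are hypothetical, because this page lists only the field descriptions, and the estimated-time field is omitted because its format is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime

# Status values listed in the Response section.
KNOWN_STATUSES = {"processing", "ready", "failed"}


@dataclass
class VoiceResponse:
    voice_id: str
    name: str
    status: str
    timestamp: datetime


def parse_voice_response(payload: dict) -> VoiceResponse:
    """Convert a decoded JSON body into a VoiceResponse.

    The key names are assumptions made for illustration.
    """
    status = payload["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so normalize it to an explicit UTC offset first.
    raw = payload["timestamp"].replace("Z", "+00:00")
    return VoiceResponse(
        voice_id=payload["voice_id"],
        name=payload["name"],
        status=status,
        timestamp=datetime.fromisoformat(raw),
    )
```

A voice whose status is "processing" is not yet usable, so callers would typically wait until it reports "ready" or "failed".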
Voice Cloning Best Practices
Audio Quality
- Use high-quality recordings (at least 44.1 kHz sample rate)
- Ensure minimal background noise
- Avoid music or sound effects
- Use a consistent recording environment
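The sample-rate recommendation can be checked locally for WAV files with Python's standard `wave` module. This is a sketch for uncompressed WAV input only; MP3 and FLAC need a third-party decoder and are not covered. The function names are illustrative.

```python
import wave

# Recommended minimum from the Audio Quality guidance above.
MIN_SAMPLE_RATE_HZ = 44_100


def wav_info(source):
    """Return (sample_rate_hz, duration_seconds) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        # Duration is the frame count divided by frames per second.
        return rate, wav.getnframes() / rate


def meets_sample_rate(source):
    """True if the WAV file is recorded at 44.1 kHz or higher."""
    rate, _ = wav_info(source)
    return rate >= MIN_SAMPLE_RATE_HZ
```

The duration returned by `wav_info` can also be compared against the 10-60 second per-file limit.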
Content Guidelines
- Include varied sentence structures
- Cover different emotions and tones
- Include questions, statements, and exclamations
- Avoid repetitive content
Technical Requirements
- Minimum 3 files, maximum 10 files
- Each file: 10-60 seconds
- Total duration: At least 2 minutes
- Supported formats: MP3, WAV, FLAC
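The requirements above can be checked before upload. The sketch below takes a list of (filename, duration in seconds) pairs and returns any violations. It assumes durations have already been measured, and it infers the format from the file extension, which is only a proxy for the actual encoding. The function name is illustrative.

```python
from pathlib import Path

# Limits taken from the Technical Requirements list.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac"}
MIN_FILES, MAX_FILES = 3, 10
MIN_CLIP_SECONDS, MAX_CLIP_SECONDS = 10, 60
MIN_TOTAL_SECONDS = 120


def check_samples(samples):
    """Return a list of problems for [(filename, duration_seconds), ...].

    An empty list means the set satisfies every listed requirement.
    """
    problems = []
    if not MIN_FILES <= len(samples) <= MAX_FILES:
        problems.append(
            f"expected {MIN_FILES}-{MAX_FILES} files, got {len(samples)}"
        )
    for filename, seconds in samples:
        if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{filename}: unsupported format")
        if not MIN_CLIP_SECONDS <= seconds <= MAX_CLIP_SECONDS:
            problems.append(f"{filename}: duration {seconds}s is outside 10-60s")
    total = sum(seconds for _, seconds in samples)
    if total < MIN_TOTAL_SECONDS:
        problems.append(
            f"total duration {total}s is under {MIN_TOTAL_SECONDS}s"
        )
    return problems
```

Note that the limits interact: with the minimum of 3 files, the clips must average at least 40 seconds each to reach the 2-minute total.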
Voice processing typically takes 10-15 minutes. You’ll receive a webhook
notification when your voice is ready to use.