POST /voices
curl -X POST https://api-dev.qcall.ai/tts/voices \
  -H "Authorization: Bearer your-api-key" \
  -F "name=My Custom Voice" \
  -F "description=Professional voice for my brand" \
  -F "language=en" \
  -F "gender=female" \
  -F "files=@sample1.mp3" \
  -F "files=@sample2.mp3" \
  -F "files=@sample3.mp3"
{
  "id": "voice-custom-123",
  "name": "My Custom Voice",
  "description": "Professional voice for my brand",
  "language": "en",
  "gender": "female",
  "status": "processing",
  "is_custom": true,
  "created_at": "2026-01-29T11:30:00Z",
  "estimated_completion": "2026-01-29T11:45:00Z"
}

Request

Headers

Authorization
string
required
Bearer token with your API key
Content-Type
string
required
multipart/form-data (curl sets this header, including the boundary, automatically when -F is used)
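
For clients other than curl, the headers above can be assembled by hand. The following is a minimal sketch using only the Python standard library; the helper names `build_multipart` and `build_request` are hypothetical, and the per-file `application/octet-stream` part type is an assumption, since this page does not say which part content types the API expects.

```python
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], files: list[tuple[str, bytes]]) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    # Plain text fields such as name, description, language, gender.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # Every audio sample is sent under the same repeated field name, "files".
    for filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            # Assumption: a generic binary part type is accepted.
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, fields: dict[str, str],
                  files: list[tuple[str, bytes]]) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body, content_type = build_multipart(fields, files)
    return urllib.request.Request(
        "https://api-dev.qcall.ai/tts/voices",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )

# To send it: urllib.request.urlopen(build_request(...))
```

The boundary has to appear both in the Content-Type header and between the parts, which is why the two are generated together.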

Form Data

name
string
required
Name for the custom voice
files
file[]
required
Audio files for voice cloning (minimum 3, maximum 10). Requirements:
  • Format: MP3, WAV, or FLAC
  • Duration: 10-60 seconds each
  • Quality: Clear speech, minimal background noise
  • Total duration: At least 2 minutes combined
description
string
Description of the voice
language
string
Primary language code (e.g., “en”, “es”)
gender
string
Voice gender: “male”, “female”, or “neutral”
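
The requirements listed for the files field can be checked locally before uploading. This is a minimal sketch, not part of the API: `check_samples` is a hypothetical helper, it assumes you already know each file's duration in seconds, and it judges format by file extension only.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac"}


def check_samples(samples: list[tuple[str, float]]) -> list[str]:
    """Check (filename, duration_seconds) pairs against the documented limits.

    Returns a list of problems; an empty list means the set looks uploadable.
    """
    problems: list[str] = []
    # Between 3 and 10 files per request.
    if not 3 <= len(samples) <= 10:
        problems.append(f"expected 3-10 files, got {len(samples)}")
    for name, seconds in samples:
        # Format is inferred from the extension (an assumption of this sketch).
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        # Each file must run 10-60 seconds.
        if not 10 <= seconds <= 60:
            problems.append(f"{name}: duration {seconds:g}s outside 10-60s")
    # Combined duration must reach 2 minutes.
    total = sum(seconds for _, seconds in samples)
    if total < 120:
        problems.append(f"total duration {total:g}s is under 2 minutes")
    return problems


print(check_samples([("sample1.mp3", 45), ("sample2.mp3", 40), ("sample3.mp3", 50)]))
```

The limits interact: with the minimum of three files, the samples must average at least 40 seconds each to reach the 2-minute total, so three 30-second files pass the per-file check but fail the combined one.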

Response

id
string
Unique identifier for the created voice
name
string
Voice name
status
string
Processing status: “processing”, “ready”, “failed”
created_at
string
ISO 8601 timestamp of when the voice was created
estimated_completion
string
ISO 8601 timestamp at which voice processing is estimated to complete
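
The response fields above can be read as follows. This is a minimal sketch using the example response from this page; `summarize_voice` and `parse_timestamp` are hypothetical helper names, and the wait time is simply the difference between the two timestamps.

```python
import json
from datetime import datetime

KNOWN_STATUSES = {"processing", "ready", "failed"}


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_voice(response_body: str) -> dict:
    """Extract the id, status, and estimated wait from a create-voice response."""
    data = json.loads(response_body)
    status = data["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    wait = parse_timestamp(data["estimated_completion"]) - parse_timestamp(data["created_at"])
    return {
        "id": data["id"],
        "status": status,
        "estimated_wait_seconds": wait.total_seconds(),
    }


sample = """{
  "id": "voice-custom-123",
  "name": "My Custom Voice",
  "status": "processing",
  "created_at": "2026-01-29T11:30:00Z",
  "estimated_completion": "2026-01-29T11:45:00Z"
}"""
print(summarize_voice(sample))
```

For the example response, the two timestamps are 15 minutes apart, so the estimated wait is 900 seconds.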

Voice Cloning Best Practices

  • Use high-quality recordings (sample rate of at least 44.1 kHz)
  • Ensure minimal background noise
  • Avoid music or sound effects
  • Use a consistent recording environment
  • Include varied sentence structures
  • Cover different emotions and tones
  • Include questions, statements, and exclamations
  • Avoid repetitive content
  • Minimum 3 files, maximum 10 files
  • Each file: 10-60 seconds
  • Total duration: At least 2 minutes
  • Supported formats: MP3, WAV, FLAC
Voice processing typically takes 10-15 minutes. You’ll receive a webhook notification when your voice is ready to use.
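
The sample-rate and per-file duration recommendations above can be verified for WAV recordings with Python's standard `wave` module. This is a minimal sketch covering uncompressed PCM WAV only; MP3 and FLAC need a third-party decoder, and `inspect_wav` and `meets_recording_guidelines` are hypothetical helper names.

```python
import wave


def inspect_wav(path: str) -> tuple[int, float]:
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    # Duration is the frame count divided by frames per second.
    return rate, frames / rate


def meets_recording_guidelines(path: str) -> bool:
    """True if the file is at least 44.1 kHz and runs 10-60 seconds."""
    rate, duration = inspect_wav(path)
    return rate >= 44_100 and 10.0 <= duration <= 60.0
```

A check like this only covers the measurable properties; background noise and content variety still need a listening pass.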