Overview
60db’s Speech-to-Text (STT) API converts spoken audio into written text with high accuracy across 50+ languages. The STT engine provides models optimized for specific use cases such as phone calls, meetings, and medical terminology.
Features
Multi-Language
Support for 50+ languages with auto-detection
Speaker Diarization
Identify and separate different speakers
Timestamps
Word-level timestamps for precise alignment
Specialized Models
Models optimized for calls, meetings, and more
Basic Usage
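As a minimal sketch of what a transcription call might look like, the function below builds (but does not send) a multipart upload using only the Python standard library. The endpoint URL, the `file` part name, the Bearer authorization header, and the option names are all placeholders assumed for illustration; take the real values from the API reference.

```python
import urllib.request
import uuid

# Placeholder endpoint: substitute the URL given in the API reference.
API_URL = "https://api.example.com/v1/speech-to-text"


def build_transcription_request(audio_bytes, filename, api_key, options=None):
    """Build (but do not send) a multipart/form-data POST carrying one audio file.

    The "file" part name, the Bearer auth header, and the option names are
    assumptions made for illustration only.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain text form fields, e.g. {"language": "en"}.
    for name, value in (options or {}).items():
        chunks.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio itself, sent as an opaque binary part.
    chunks.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + audio_bytes
        + b"\r\n"
    )
    # Closing boundary marks the end of the multipart body.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=b"".join(chunks),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the upload logic easy to inspect and test without network access.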
Supported Formats
| Format | Max Size | Max Duration | Quality |
|---|---|---|---|
| MP3 | 25MB | 10 min | Good |
| WAV | 25MB | 10 min | Excellent |
| FLAC | 25MB | 10 min | Lossless |
| OGG | 25MB | 10 min | Good |
| M4A | 25MB | 10 min | Good |
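The limits in the table can be checked on the client before uploading. The sketch below is one way to do that; the function name is illustrative, and reading "25MB" as 25,000,000 bytes (the stricter interpretation) is an assumption.

```python
from pathlib import PurePath

# Limits copied from the table above. "25MB" is read as 25 * 1000 * 1000 bytes,
# the stricter interpretation; that reading is an assumption.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}
MAX_BYTES = 25 * 1000 * 1000
MAX_SECONDS = 10 * 60


def upload_problems(filename, size_bytes, duration_seconds=None):
    """Return a list of human-readable problems; an empty list means the file fits."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    # Duration is optional because reading it requires decoding the container.
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append(f"audio is {duration_seconds:.0f} s, limit is {MAX_SECONDS} s")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues with a file in a single pass.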
Language Support
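The two language modes covered in this section, auto-detection and an explicit language, can be modelled as one optional request field. The sketch below assumes a hypothetical `language` field that is simply omitted for auto-detection; the actual parameter name and the accepted codes come from the API reference and the supported-languages list.

```python
import re

# Shape of a simple language tag such as "en" or "pt-BR". The codes the service
# actually accepts are defined by its supported-languages list, not by this regex.
_LANGUAGE_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def language_options(language=None):
    """Return request fields for language handling.

    None means auto-detection, modelled here by omitting the (hypothetical)
    "language" field entirely.
    """
    if language is None:
        return {}
    tag = language.strip()
    if not _LANGUAGE_TAG.match(tag):
        raise ValueError(f"not a language tag: {language!r}")
    return {"language": tag}
```

For example, `language_options()` yields an empty dict for auto-detection, while `language_options("pt-BR")` pins the language.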
Auto-Detection
Let the API automatically detect the language.
Specify Language
For better accuracy, specify the language.
Get Supported Languages
Transcription Models
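One way to keep model selection explicit in client code is a small enumeration of the four models described in this section. The identifier strings and the `model` field name below are placeholders, not confirmed values.

```python
from enum import Enum


class Model(Enum):
    # Values are placeholder identifiers; the real model names may differ.
    GENERAL = "general"
    PHONE_CALL = "phone_call"
    MEETING = "meeting"
    MEDICAL = "medical"


def model_options(model=Model.GENERAL):
    """Return the (hypothetical) "model" request field for the chosen model.

    Accepts either a Model member or its string value and rejects anything else.
    """
    return {"model": Model(model).value}
```

Defaulting to `Model.GENERAL` mirrors the documented default, and an unknown value fails early with a `ValueError` instead of reaching the API.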
Choose the model optimized for your use case.
General (Default)
Best for general-purpose transcription.
Phone Call
Optimized for phone conversations.
Meeting
Best for multi-speaker meetings.
Medical
Specialized for medical terminology.
Advanced Features
Word-Level Timestamps
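The sketch below shows how word-level timing might be consumed once a transcript is returned. It assumes each word arrives as a dict with `word`, `start`, and `end` keys, with times in seconds; that response shape is hypothetical and should be checked against the API reference.

```python
def format_clock(seconds):
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def words_in_window(words, window_start, window_end):
    """Return the words whose time span overlaps [window_start, window_end)."""
    return [
        w["word"]
        for w in words
        if w["start"] < window_end and w["end"] > window_start
    ]
```

A lookup like `words_in_window` is the basic building block for aligning text with a playback position, for example to highlight the words spoken during the current second of audio.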
Get precise timing for each word.
Speaker Diarization
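Diarized output is usually easier to read once consecutive words from the same speaker are merged into turns. The sketch below assumes each word is a dict with `word`, `start`, `end`, and a `speaker` label; these key names are hypothetical.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words from the same speaker into turns."""
    turns = []
    # groupby only merges adjacent items, which is what a turn means here.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append(
            {
                "speaker": speaker,
                "start": group[0]["start"],
                "end": group[-1]["end"],
                "text": " ".join(w["word"] for w in group),
            }
        )
    return turns
```

Because only adjacent words are merged, a speaker who talks, is interrupted, and talks again produces two separate turns, which preserves the order of the conversation.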
Identify different speakers.
Best Practices
Audio Quality
- Use high-quality recordings (16kHz+ sample rate)
- Minimize background noise
- Ensure clear speech
- Avoid audio compression when possible
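For WAV input, the sample-rate recommendation above can be checked from the file header with Python's standard `wave` module. This is a sketch for uncompressed PCM WAV only; other formats need a different decoder, and the function name is illustrative.

```python
import wave

MIN_SAMPLE_RATE = 16_000  # recommended minimum from the list above


def wav_summary(source):
    """Read a WAV header from a path or binary file object and summarize it."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_seconds": frames / rate,
            "meets_recommended_rate": rate >= MIN_SAMPLE_RATE,
        }
```

Only the header is inspected, so the check is cheap even for large recordings. It says nothing about background noise or speech clarity, which still need a listen.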
Accuracy Tips
- Specify the language when known
- Use the appropriate model for your use case
- Provide clean audio without music
- Split very long recordings
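Splitting a long recording can be done locally before upload. The sketch below cuts a PCM WAV file into fixed-length chunks with the standard `wave` module, defaulting to the 10-minute limit from the Supported Formats table. It cuts at fixed times, so a word may be split across two chunks; cutting at silences would need additional analysis that is not shown here.

```python
import io
import wave

MAX_CHUNK_SECONDS = 10 * 60  # per-file duration limit from the Supported Formats table


def split_wav(source, chunk_seconds=MAX_CHUNK_SECONDS):
    """Yield WAV-encoded byte strings, each at most chunk_seconds long."""
    with wave.open(source, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(chunk_seconds * reader.getframerate())
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as writer:
                # Copy channel count, sample width and rate; the frame count in
                # the header is corrected when the writer is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            yield buffer.getvalue()
```

A chunk that respects the duration limit can still exceed the size limit at high sample rates or with multiple channels, so it is worth checking the byte length of each chunk as well.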
Performance
- Keep files under 25MB
- Use an appropriate format (WAV for quality, MP3 for size)
- Process in batches for multiple files
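One reading of batch processing is to run several transcriptions concurrently from the client. The sketch below does that with a thread pool; `transcribe_one` is a hypothetical callable you supply that uploads a single file and returns its result, and the worker count of 4 is arbitrary and should be tuned to whatever rate limits apply to your account.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_batch(paths, transcribe_one, max_workers=4):
    """Run transcribe_one(path) for every path, a few at a time.

    Returns {path: result or exception} so one bad file does not abort the batch.
    """
    paths = list(paths)  # allow generators; the paths are iterated twice

    def safe(path):
        try:
            return transcribe_one(path)
        except Exception as error:  # keep going and report the failure per file
            return error

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = pool.map(safe, paths)
        return dict(zip(paths, results))
```

Threads suit this workload because each call spends most of its time waiting on the network rather than using the CPU.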
Use Cases
Meeting Transcription
Voice Commands
Subtitle Generation
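For subtitle generation, word-level timestamps can be grouped into cues and written in SubRip (SRT) format. The sketch below assumes each word is a dict with `word`, `start`, and `end` keys, with times in seconds; that response shape is hypothetical, and the limit of seven words per cue is an arbitrary choice.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into cues of at most max_words and render them as SRT text."""
    cues = []
    for index in range(0, len(words), max_words):
        group = words[index:index + max_words]
        # Each cue: sequence number, time range, then the text line.
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_time(group[0]['start'])} --> {srt_time(group[-1]['end'])}\n"
            f"{' '.join(w['word'] for w in group)}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Grouping by word count is the simplest policy. Breaking cues at sentence punctuation or at long pauses between words generally reads better, at the cost of a little more logic.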
API Reference
Speech to Text API
View complete API documentation