Overview
60db’s Speech-to-Text (STT) API converts spoken audio into written text with high accuracy across 39 languages, including code-switched Indic+English. It is powered by 60db STT v01, a multi-backend speech recognition stack whose non-hallucinating models don’t invent text on silent or noisy input.
Features
Multi-Language
39 languages with auto-detection and Indic+English code-switching
Speaker Diarization
Opt-in pyannote speaker diarization via diarize: true
Timestamps
Word-level timestamps included automatically
Non-hallucinating
Non-hallucinating backend that emits blank tokens on silence — no phantom text
Basic Usage
- JavaScript
- Python
Supported Formats
| Format | Max Size | Max Duration | Quality |
|---|---|---|---|
| MP3 | 25MB | 10 min | Good |
| WAV | 25MB | 10 min | Excellent |
| FLAC | 25MB | 10 min | Lossless |
| OGG | 25MB | 10 min | Good |
| M4A | 25MB | 10 min | Good |
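The limits in the table can be checked on the client before uploading. The following is a minimal sketch, not part of the 60db API: `check_upload` is a hypothetical helper, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption, since the table does not say whether the unit is decimal or binary.

```python
from pathlib import Path

# Limits copied from the Supported Formats table.
# Assumption: "25MB" means 25 * 1024 * 1024 bytes.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}
MAX_BYTES = 25 * 1024 * 1024
MAX_SECONDS = 10 * 60


def check_upload(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    if duration_seconds > MAX_SECONDS:
        problems.append(f"audio is {duration_seconds:.0f} s; limit is {MAX_SECONDS} s")
    return problems
```

The caller supplies the size and duration (for example from `os.path.getsize` and an audio probing tool of its choice), so the helper itself stays free of audio dependencies.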
Language Support
Auto-Detection
Let the API automatically identify the language. Omit the language field (or pass the string "auto") and the server runs language identification across all 39 supported languages.
Specify Language
For lowest latency, pass a single ISO 639-1 code and the server skips language identification entirely.
Unsupported language codes (ur, ja, ko, zh, th, vi, id, tl, sw, tr, fa, he) and Arabic dialect tags (ar-eg, ar-lv, …) return an unsupported_language error. For non-MSA Arabic audio, pass ar for best-effort MSA transcription.
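The language rules in this section can be applied on the client before a request is sent. This is a sketch under stated assumptions: `language_field` is a hypothetical helper, the request is assumed to carry a field named `language` as described above, and only the codes this page explicitly lists as rejected are checked (the full list of 39 supported languages is not reproduced here).

```python
from typing import Optional

# Codes this page lists as returning unsupported_language.
REJECTED_CODES = {"ur", "ja", "ko", "zh", "th", "vi", "id", "tl", "sw", "tr", "fa", "he"}


def language_field(code: Optional[str]) -> dict:
    """Build the language part of a request payload (hypothetical helper).

    None or "auto" -> field omitted, so the server auto-detects.
    Arabic dialect tags such as "ar-eg" -> "ar" (best-effort MSA, per the docs).
    Codes listed as rejected -> ValueError raised before any network call.
    """
    if code is None:
        return {}
    normalized = code.strip().lower()
    if normalized in ("", "auto"):
        return {}
    if normalized.startswith("ar-"):
        return {"language": "ar"}
    if normalized in REJECTED_CODES:
        raise ValueError(f"unsupported_language: {normalized}")
    return {"language": normalized}
```

Failing early on a known-rejected code avoids spending an upload on a request the server would refuse.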
Get Supported Languages
Advanced Features
Word-Level Timestamps
Word timings are always included in the response; no flag is needed.
Speaker Diarization
Identify different speakers with diarize: true.
For readability, SPEAKER_NN IDs can be shown as “Speaker 1”, “Speaker 2” in order of first appearance.
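Relabeling SPEAKER_NN IDs in order of first appearance takes only a few lines. The sketch below assumes each diarized segment is a dict with a `speaker` key holding the raw ID; that response shape is an assumption for illustration, not taken from the API reference.

```python
def relabel_speakers(segments: list[dict]) -> list[dict]:
    """Replace raw SPEAKER_NN IDs with "Speaker 1", "Speaker 2", ...

    Numbers are assigned in order of first appearance.
    Assumption: each segment is a dict with a "speaker" key.
    """
    labels: dict[str, str] = {}
    relabeled = []
    for segment in segments:
        raw_id = segment["speaker"]
        if raw_id not in labels:
            labels[raw_id] = f"Speaker {len(labels) + 1}"
        # Copy the segment so the caller's data is left untouched.
        relabeled.append({**segment, "speaker": labels[raw_id]})
    return relabeled
```

Note that the numbering follows the transcript order, so SPEAKER_01 becomes "Speaker 1" if that voice is heard first.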
Best Practices
Audio Quality
- Use high-quality recordings (16kHz+ sample rate)
- Minimize background noise
- Ensure clear speech
- Avoid audio compression when possible
Accuracy Tips
- Specify the language when known
- Use an appropriate model for your use case
- Provide clean audio without music
- Split very long recordings
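Splitting a long recording starts with choosing chunk boundaries. This sketch computes time ranges no longer than the 10-minute limit from the Supported Formats table; `chunk_ranges` is a hypothetical helper, and the actual cutting is left to whatever audio tool is in use.

```python
def chunk_ranges(total_seconds: float, max_chunk_seconds: float = 600.0) -> list[tuple[float, float]]:
    """Return consecutive (start, end) ranges covering the recording.

    Each range is at most max_chunk_seconds long (default: 10 minutes).
    """
    if max_chunk_seconds <= 0:
        raise ValueError("max_chunk_seconds must be positive")
    ranges = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges
```

Fixed offsets can cut through a word, so in practice it may be worth nudging each boundary to a nearby pause.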
Performance
- Keep files under 25MB
- Use an appropriate format (WAV for quality, MP3 for size)
- Process multiple files in batches, but stay under the cap of 8 concurrent STT requests per user
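One way to respect the concurrency cap when batching is a bounded worker pool. In this sketch, `transcribe_one` is a hypothetical caller-supplied function that sends a single file to the STT API and returns its result; only the cap of 8 comes from this page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_CONCURRENT_STT = 8  # per-user cap stated in the Performance list


def transcribe_batch(
    paths: Iterable[str],
    transcribe_one: Callable[[str], dict],
) -> list[dict]:
    """Run transcribe_one over paths with at most 8 calls in flight.

    Results are returned in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_STT) as pool:
        return list(pool.map(transcribe_one, paths))
```

The cap is per user, so this bound only holds if no other process is using the same account at the same time; in that case a lower `max_workers` is safer.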
Use Cases
Meeting Transcription
Voice Commands
Subtitle Generation
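Word-level timestamps are enough to build SubRip (.srt) subtitles. The sketch below assumes each word is a dict with `word`, `start`, and `end` keys, with times in seconds; those field names are assumptions for illustration and should be checked against the API reference.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group words into cues of up to max_words and render them as SRT.

    Assumption: each word looks like {"word": str, "start": float, "end": float}.
    """
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = srt_timestamp(group[0]["start"])
        end = srt_timestamp(group[-1]["end"])
        text = " ".join(w["word"] for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Grouping by a fixed word count keeps the example short; breaking cues at pauses or punctuation usually reads better.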
API Reference
Speech to Text API
View complete API documentation