Overview

60db’s Speech-to-Text (STT) API converts spoken audio into written text with high accuracy across 50+ languages. Our STT engine uses state-of-the-art AI models optimized for various use cases.

Features

Multi-Language

Support for 50+ languages with auto-detection

Speaker Diarization

Identify and separate different speakers

Timestamps

Word-level timestamps for precise alignment

Specialized Models

Models optimized for calls, meetings, and more

Basic Usage

import { SixtyDBClient } from '@60db-own/60db-js';

const client = new SixtyDBClient('your-api-key');

const file = document.querySelector('input[type="file"]').files[0];

const result = await client.speechToText(file, {
  language: 'en'
});

console.log('Transcription:', result.text);
console.log('Confidence:', result.confidence);

Supported Formats

Format  Max Size  Max Duration  Quality
MP3     25MB      10 min        Good
WAV     25MB      10 min        Excellent
FLAC    25MB      10 min        Lossless
OGG     25MB      10 min        Good
M4A     25MB      10 min        Good
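Before uploading, you may want to reject unsupported or oversized files client-side rather than waiting for the API to do so. A minimal sketch mirroring the limits in the table above (the helper name is hypothetical, not part of the SDK):

```javascript
// Client-side pre-check against the documented format and size limits.
const MAX_BYTES = 25 * 1024 * 1024; // 25MB
const SUPPORTED = new Set(['mp3', 'wav', 'flac', 'ogg', 'm4a']);

function validateAudioFile(name, sizeBytes) {
  const ext = name.split('.').pop().toLowerCase();
  if (!SUPPORTED.has(ext)) {
    return { ok: false, reason: `Unsupported format: ${ext}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: 'File exceeds the 25MB limit' };
  }
  return { ok: true };
}
```

Note that duration cannot be checked from the file size alone; the 10-minute limit is enforced server-side.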

Language Support

Auto-Detection

Let the API automatically detect the language:
const result = await client.speechToText(file);
// Language will be auto-detected
console.log('Detected language:', result.language);

Specify Language

For better accuracy, specify the language:
const result = await client.speechToText(file, {
  language: 'en'  // ISO 639-1 code
});

Get Supported Languages

const languages = await client.getLanguages();
languages.forEach(lang => {
  console.log(`${lang.name} (${lang.code})`);
});

Transcription Models

Choose the model optimized for your use case:

General (Default)

Best for general-purpose transcription:
const result = await client.speechToText(file, {
  model: 'general'
});

Phone Call

Optimized for phone conversations:
const result = await client.speechToText(file, {
  model: 'phone_call'
});

Meeting

Best for multi-speaker meetings:
const result = await client.speechToText(file, {
  model: 'meeting',
  speaker_labels: true
});

Medical

Specialized for medical terminology:
const result = await client.speechToText(file, {
  model: 'medical'
});

Advanced Features

Word-Level Timestamps

Get precise timing for each word:
const result = await client.speechToText(file, {
  timestamps: true
});

result.words.forEach(word => {
  console.log(`${word.word}: ${word.start}s - ${word.end}s`);
});
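Word-level timestamps also let you look up what was being said at any moment in the audio. A small sketch over the `result.words` shape shown above (the sample data is illustrative):

```javascript
// Return the word whose [start, end] interval contains time t (in seconds),
// or null if t falls in a silence gap.
function wordAtTime(words, t) {
  return words.find(w => t >= w.start && t <= w.end) || null;
}

// Illustrative sample in the same shape as result.words.
const words = [
  { word: 'hello', start: 0.0, end: 0.4 },
  { word: 'world', start: 0.5, end: 0.9 }
];
```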

Speaker Diarization

Identify different speakers:
const result = await client.speechToText(file, {
  speaker_labels: true
});

// Result includes speaker information
result.segments.forEach(segment => {
  console.log(`Speaker ${segment.speaker}: ${segment.text}`);
});
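A common follow-up is collapsing the diarized segments into one transcript per speaker. A minimal sketch over the `result.segments` shape shown above:

```javascript
// Group diarized segments into a single transcript string per speaker.
function transcriptBySpeaker(segments) {
  const bySpeaker = {};
  for (const { speaker, text } of segments) {
    bySpeaker[speaker] = bySpeaker[speaker]
      ? `${bySpeaker[speaker]} ${text}`
      : text;
  }
  return bySpeaker;
}
```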

Best Practices

  • Use high-quality recordings (16kHz+ sample rate)
  • Minimize background noise and music; ensure clear speech
  • Avoid lossy compression when possible (WAV for quality, MP3 for size)
  • Specify the language when known
  • Choose the model that matches your use case
  • Keep files under 25MB and split very long recordings
  • Process multiple files in batches
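For batch processing, it usually pays to cap the number of in-flight requests rather than firing them all at once. A generic sketch (here `fn` would wrap `client.speechToText`; the helper itself is hypothetical, not part of the SDK):

```javascript
// Map fn over items with at most `limit` promises in flight at a time.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim the next index
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```

Usage: `await mapWithConcurrency(files, 3, f => client.speechToText(f))` transcribes all files with at most three concurrent uploads.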

Use Cases

Meeting Transcription

Using the Python SDK (method and option names use snake_case):
with open('meeting.mp3', 'rb') as audio:
    result = client.speech_to_text(
        audio,
        model='meeting',
        speaker_labels=True,
        timestamps=True
    )

    # Generate meeting notes
    for segment in result['segments']:
        print(f"[{segment['start']:.1f}s] Speaker {segment['speaker']}: {segment['text']}")

Voice Commands

async function processVoiceCommand(audioBlob) {
  const result = await client.speechToText(audioBlob, {
    language: 'en',
    model: 'general'
  });
  
  const command = parseCommand(result.text);
  executeCommand(command);
}
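`parseCommand` above is left to the application. One minimal approach is keyword matching on the transcript; a sketch (the command vocabulary here is hypothetical):

```javascript
// Map a transcript to a command object via simple keyword matching.
// Order matters: more specific phrases are checked before generic ones.
function parseCommand(text) {
  const t = text.toLowerCase();
  if (t.includes('volume up')) return { action: 'volume', delta: +1 };
  if (t.includes('volume down')) return { action: 'volume', delta: -1 };
  if (t.includes('pause') || t.includes('stop')) return { action: 'pause' };
  if (t.includes('play')) return { action: 'play' };
  return { action: 'unknown', raw: text };
}
```

In production you would likely normalize punctuation and use the `result.confidence` score to reject low-confidence transcripts before acting on them.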

Subtitle Generation

const result = await client.speechToText(videoAudio, {
  timestamps: true
});

const subtitles = generateSRT(result.words);
saveFile('subtitles.srt', subtitles);
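`generateSRT` is not part of the SDK; one way to implement it is to group the word timestamps into fixed-size cues and format the times as SRT's `HH:MM:SS,mmm`. A sketch under those assumptions:

```javascript
// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSRTTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = String(Math.floor(ms / 3600000)).padStart(2, '0');
  const m = String(Math.floor((ms % 3600000) / 60000)).padStart(2, '0');
  const s = String(Math.floor((ms % 60000) / 1000)).padStart(2, '0');
  const f = String(ms % 1000).padStart(3, '0');
  return `${h}:${m}:${s},${f}`;
}

// Build an SRT document from word-level timestamps, `wordsPerCue` words per cue.
function generateSRT(words, wordsPerCue = 8) {
  const cues = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const chunk = words.slice(i, i + wordsPerCue);
    const start = formatSRTTime(chunk[0].start);
    const end = formatSRTTime(chunk[chunk.length - 1].end);
    const text = chunk.map(w => w.word).join(' ');
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}`);
  }
  return cues.join('\n\n') + '\n';
}
```

A real implementation might instead break cues on pauses between words, or cap cue length by characters rather than word count.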

API Reference

Speech to Text API

View complete API documentation