
WebSocket API

Real-time bidirectional streaming API for Speech-to-Text (STT) and Text-to-Speech (TTS) services.

Overview

The 60db WebSocket API provides real-time streaming capabilities for:
  • Speech-to-Text (STT): Convert audio to text with 99+ language support
  • Text-to-Speech (TTS): Synthesize natural-sounding speech with multiple voices

Base URLs

| Environment | STT URL | TTS URL |
|---|---|---|
| Production | ws://api.60db.ai/ws/stt | ws://api.60db.ai/ws/tts |
| Production (Secure) | wss://api.60db.ai/ws/stt | wss://api.60db.ai/ws/tts |

Authentication

WebSocket connections are authenticated with an API key passed as the apiKey query parameter:
ws://api.60db.ai/ws/stt?apiKey=sk_live_your_api_key_here
Getting Your API Key:
  1. Go to app.60db.ai
  2. Navigate to Settings → Developer → API Keys
  3. Click Create API Key
  4. Copy and store your API key securely
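
Because the key travels in the query string, it can help to build the URL programmatically so the key is percent-encoded and the secure wss:// scheme is always used. The following is a minimal sketch; `buildSocketUrl` is a hypothetical helper for illustration, not part of the 60db API. Only the host, paths, and the `apiKey` parameter name come from this page.

```javascript
// Hypothetical helper: builds an authenticated WebSocket URL.
// Host, paths, and the `apiKey` parameter name are taken from the docs above.
function buildSocketUrl(service, apiKey) {
  if (service !== 'stt' && service !== 'tts') {
    throw new Error(`Unknown service: ${service}`);
  }
  const url = new URL(`wss://api.60db.ai/ws/${service}`);
  url.searchParams.set('apiKey', apiKey); // encodes unsafe characters in the key
  return url.toString();
}

console.log(buildSocketUrl('stt', 'sk_live_your_api_key_here'));
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.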

🎤 STT WebSocket (Speech-to-Text)

What it does: You send audio → You get text back
Use for: Transcribing speech, voice commands, call center analytics, meeting transcription

Quick Start

const WebSocket = require('ws');

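const API_KEY = 'sk_live_your_key';
const ws = new WebSocket(`wss://api.60db.ai/ws/stt?apiKey=${API_KEY}`);

// Placeholder audio: 480 bytes = 60 ms of 8 kHz mu-law silence (an assumption
// for illustration). Replace with chunks of your real audio.
const audioBuffer = Buffer.alloc(480, 0xff);

ws.onopen = () => console.log('✅ Connected');

// The ws library passes a MessageEvent to onmessage; the payload is in event.data
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);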

  if (msg.connection_established) {
    console.log('✅ Authenticated!');

    // Start session
    ws.send(JSON.stringify({
      type: 'start',
      languages: ['en'],
      config: {
        encoding: 'mulaw',
        sample_rate: 8000,
        continuous_mode: true
      }
    }));
  }

  if (msg.type === 'connected') {
    console.log('✅ Ready! Send audio now');

    // Send audio chunks
    const interval = setInterval(() => {
      ws.send(audioBuffer); // Your audio data
    }, 60);

    // Stop after 5 seconds
    setTimeout(() => {
      clearInterval(interval);
      ws.send(JSON.stringify({ type: 'stop' }));
    }, 5000);
  }

  if (msg.type === 'transcription') {
    console.log('📝 Text:', msg.text);
  }

  if (msg.type === 'session_stopped') {
    console.log('✅ Done! Cost:', msg.billing_summary.total_cost);
    ws.close();
  }
};

How STT Works

1. Connect → 2. Authenticate → 3. Start session → 4. Send audio → 5. Get text → 6. Stop
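
For step 4, audio is usually sent in small fixed-duration frames. The sketch below splits a raw audio buffer into frames sized for the Quick Start settings (8 kHz mu-law, one send every 60 ms). `splitIntoFrames` is a hypothetical helper, and the 60 ms frame duration is an assumption taken from the Quick Start timer, not a documented requirement.

```javascript
// Hypothetical helper: split raw audio into fixed-duration frames.
// bytesPerSample is 1 for 8-bit mu-law, 2 for 16-bit linear PCM.
function splitIntoFrames(audio, sampleRate, bytesPerSample, frameMs) {
  const frameBytes = Math.floor((sampleRate * frameMs) / 1000) * bytesPerSample;
  if (frameBytes <= 0) throw new Error('Frame size must be positive');
  const frames = [];
  for (let offset = 0; offset < audio.length; offset += frameBytes) {
    // subarray clamps at the end, so the last frame may be shorter
    frames.push(audio.subarray(offset, offset + frameBytes));
  }
  return frames;
}

// One second of 8 kHz mu-law audio (0xFF encodes silence in G.711 mu-law).
const oneSecond = Buffer.alloc(8000, 0xff);
const frames = splitIntoFrames(oneSecond, 8000, 1, 60);
console.log(frames.length, frames[0].length);
```

Each frame could then be passed to `ws.send(frame)` on a 60 ms timer, followed by the stop message once the frames run out.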

🔊 TTS WebSocket (Text-to-Speech)

What it does: You send text → You get audio back
Use for: Voice assistants, audiobooks, accessibility, chatbots

Quick Start

const WebSocket = require('ws');
const fs = require('fs');

const API_KEY = 'sk_live_your_key';
const ws = new WebSocket(`wss://api.60db.ai/ws/tts?apiKey=${API_KEY}`);
const contextId = 'my-session-' + Date.now();
const audioChunks = [];

ws.onopen = () => console.log('✅ Connected');

// The ws library passes a MessageEvent to onmessage; the payload is in event.data
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  if (msg.connection_established) {
    console.log('✅ Authenticated!');

    // Create context
    ws.send(JSON.stringify({
      create_context: {
        context_id: contextId,
        voice_id: 'fbb75ed2-975a-40c7-9e06-38e30524a9a1',
        audio_config: {
          audio_encoding: 'LINEAR16',
          sample_rate_hertz: 16000
        }
      }
    }));
  }

  if (msg.context_created) {
    console.log('✅ Context created!');

    // Send text
    ws.send(JSON.stringify({
      send_text: {
        context_id: contextId,
        text: 'Hello, this is a test of the text to speech service.'
      }
    }));

    // Flush
    ws.send(JSON.stringify({
      flush_context: { context_id: contextId }
    }));
  }

  if (msg.audio_chunk) {
    const audioData = Buffer.from(msg.audio_chunk.audioContent, 'base64');
    audioChunks.push(audioData);
    console.log('🔊 Audio chunk received');
  }

  if (msg.flush_completed) {
    console.log('✅ All audio received!');

    // Close context
    ws.send(JSON.stringify({
      close_context: { context_id: contextId }
    }));
  }

  if (msg.context_closed) {
    console.log('✅ Done!');

    // Save audio
    const audio = Buffer.concat(audioChunks);
    fs.writeFileSync('output.pcm', audio);
    console.log('💾 Saved output.pcm');

    ws.close();
  }
};

How TTS Works

1. Connect → 2. Authenticate → 3. Create context → 4. Send text → 5. Flush → 6. Get audio → 7. Close
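
The Quick Start saves headerless PCM, which most audio players cannot open directly. A minimal sketch of wrapping that data in a WAV container follows. `pcmToWav` is a hypothetical helper, and it assumes LINEAR16 here means 16-bit signed little-endian mono samples, which this page does not state explicitly.

```javascript
// Hypothetical helper: prepend a 44-byte WAV header to raw 16-bit PCM.
// Assumes 16-bit little-endian samples; mono unless `channels` says otherwise.
function pcmToWav(pcm, sampleRate = 16000, channels = 1) {
  const bytesPerSample = 2;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // size of the fmt chunk
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  header.writeUInt16LE(channels * bytesPerSample, 32); // block align
  header.writeUInt16LE(bytesPerSample * 8, 34); // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40); // size of the sample data
  return Buffer.concat([header, pcm]);
}

// Example with 0.1 s of silence; in practice pass Buffer.concat(audioChunks).
const wav = pcmToWav(Buffer.alloc(3200));
console.log(wav.length);
```

Writing the result with `fs.writeFileSync('output.wav', wav)` should give a file that standard players accept, provided the sample-format assumption holds.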

📋 STT vs TTS Comparison

| Feature | STT (Speech-to-Text) | TTS (Text-to-Speech) |
|---|---|---|
| Input | Audio (binary) | Text (string) |
| Output | Text (string) | Audio (binary) |
| Use Case | Transcribe speech | Generate speech |
| Direction | Audio → Text | Text → Audio |
| First Message | { type: "start", ... } | { create_context: {...} } |
| Session End | { type: "stop" } | { close_context: {...} } |
| Pricing | $0.00000833/second | $0.00002/character |

💡 Key Concepts

STT Messages

// Start session
{ type: "start", languages: ["en"], config: {...} }

// Stop session
{ type: "stop" }

TTS Messages

// Create context
{ create_context: { context_id, voice_id, audio_config } }

// Send text
{ send_text: { context_id, text: "..." } }

// Flush context (audio chunks follow, then flush_completed)
{ flush_context: { context_id } }

// Close context
{ close_context: { context_id } }
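
Server messages in the Quick Start examples are identified in two ways: STT events carry a `type` field, while TTS events (and the initial `connection_established`) are identified by a top-level key. The sketch below normalizes both into one label so a single `switch` can handle them. `messageKind` is a hypothetical helper, and the key list only covers messages shown on this page; the real API may send others.

```javascript
// Keys seen in the Quick Start responses; the real API may send others.
const EVENT_KEYS = [
  'connection_established',
  'context_created',
  'audio_chunk',
  'flush_completed',
  'context_closed',
];

// Hypothetical helper: returns a short label for a parsed server message.
function messageKind(msg) {
  if (msg === null || typeof msg !== 'object') return 'unknown';
  if (typeof msg.type === 'string') return msg.type; // STT-style message
  const key = EVENT_KEYS.find((k) => k in msg); // TTS-style message
  return key || 'unknown';
}

console.log(messageKind({ type: 'transcription', text: 'hello' }));
console.log(messageKind({ audio_chunk: { audioContent: '' } }));
```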

📊 Pricing

| Service | Rate | Minimum |
|---|---|---|
| STT | $0.00000833/second | $0.01 |
| TTS | $0.00002/character | $0.01 |
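
As a rough illustration of the rates above, the sketch below estimates the cost of a session. It assumes the "Minimum" column is a minimum charge applied per session, which this page does not spell out, so treat the result as an estimate and rely on the `billing_summary` returned by the API for actual figures. The function names are hypothetical.

```javascript
const STT_RATE_PER_SECOND = 0.00000833;
const TTS_RATE_PER_CHARACTER = 0.00002;
const MINIMUM_CHARGE = 0.01; // assumption: applied once per session

// Estimated cost in USD for transcribing `seconds` of audio.
function estimateSttCost(seconds) {
  return Math.max(MINIMUM_CHARGE, seconds * STT_RATE_PER_SECOND);
}

// Estimated cost in USD for synthesizing `characters` of text.
function estimateTtsCost(characters) {
  return Math.max(MINIMUM_CHARGE, characters * TTS_RATE_PER_CHARACTER);
}

console.log(estimateSttCost(3600).toFixed(4)); // one hour of audio
console.log(estimateTtsCost(1000).toFixed(4)); // 1,000 characters of text
```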
