Added — Context-gated LLM refinement
Speech-to-Text now accepts an optional context hint that opens a server-side LLM refinement gate. When supplied, the response transcript is polished for proper-noun accuracy, filler removal, punctuation, and script consistency in mixed-language audio. The shape differs per transport:

| Transport | `context` shape | Enables |
|---|---|---|
| REST `POST /stt` (and `/v1/transcribe`) | plain string — free-form paragraph | LLM refinement of response text |
| WebSocket `/v1/stream` (and `/ws/stt`) | structured object `{general, text, terms}` | Two-phase canonical final flow |
REST — `context: string`

Free-form paragraph describing the session (domain, speakers, jargon). Serialized as a multipart form field.
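A minimal stdlib sketch of the REST call. The base URL, the `audio` file-field name, and the hand-rolled multipart encoding are illustrative assumptions (check the API reference for exact field names); only the plain-string `context` form field is taken from this changelog:

```python
import urllib.request
import uuid

def build_stt_request(base_url: str, context: str, audio_bytes: bytes) -> urllib.request.Request:
    """Build a POST /stt request with a free-form `context` form field."""
    boundary = uuid.uuid4().hex
    parts = []
    # `context` — plain string form field (free-form paragraph about the session).
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="context"\r\n\r\n{context}\r\n'.encode()
    )
    # `audio` — the file part (field name is an assumption, not from the changelog).
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; filename="audio.wav"\r\n'
        f"Content-Type: audio/wav\r\n\r\n".encode() + audio_bytes + b"\r\n"
    )
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/stt",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

req = build_stt_request(
    "https://api.example.com",  # illustrative host
    "Cardiology telehealth call; drug names: metoprolol, apixaban.",
    b"\x00\x01",  # stand-in for real audio bytes
)
```

Sending it is then a plain `urllib.request.urlopen(req)` (or the equivalent `requests.post` with `files=` and `data=`).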
WebSocket — structured `{general, text, terms}`
Start message
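The structured context rides on the stream's start message. A hedged sketch of building one — only the `{general, text, terms}` shape is documented here; the `type: "start"` envelope field and the sample values are illustrative:

```python
import json

start_message = {
    "type": "start",  # assumed envelope field; check the WebSocket STT Reference
    "context": {
        # Free-form session description (domain, speakers).
        "general": "Customer-support call for a telecom provider, Hindi/English mix.",
        # Expected or preceding text, if any.
        "text": "Agent greets the caller and verifies the account number.",
        # Domain jargon / proper nouns to bias recognition toward.
        "terms": ["JioFiber", "eSIM", "OTT recharge"],
    },
}

# Serialize for the wire; send as the first message after connecting.
wire = json.dumps(start_message)
```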
Changed — WebSocket two-phase canonical flow
When `context` is supplied, each utterance now produces two transcription events sharing a `sentence_id`:

- First emit — `is_final: true, speech_final: false` — fast dict-corrected text. Use it for low-latency UI paint and voicebot barge-in / NLU.
- Canonical — `is_final: true, speech_final: true` — the definitive answer: either LLM-refined (when `llm_applied: true`) or the original text re-emitted (when the LLM was skipped or failed).
| Field | Type | Description |
|---|---|---|
| `llm_applied` | boolean | `true` if the LLM ran, `false` if skipped or failed. |
| `llm_latency_ms` | number | Round-trip time to the LLM endpoint (for SLA monitoring). |
| `llm_reason` | string | Diagnostic when `llm_applied: false` (`gate_closed`, `error:TimeoutException`, etc.). |
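One way to consume the two-phase flow client-side is to key both events on `sentence_id` and let the canonical overwrite the first emit. A minimal sketch — the field names come from the tables above, but the handler structure is illustrative, not an SDK API:

```python
class TwoPhaseReconciler:
    """Tracks first-emit vs canonical text per sentence_id."""

    def __init__(self):
        self.first_emits = {}  # sentence_id -> fast dict-corrected text
        self.finals = {}       # sentence_id -> canonical text

    def on_transcription(self, event: dict):
        sid = event["sentence_id"]
        if event.get("speech_final"):
            # Canonical: definitive answer, overwrites the first-emit paint.
            self.finals[sid] = event["text"]
        elif event.get("is_final"):
            # First emit: paint immediately; canonical will replace it.
            self.first_emits[sid] = event["text"]

    def display_text(self, sid: str) -> str:
        return self.finals.get(sid, self.first_emits.get(sid, ""))

r = TwoPhaseReconciler()
# Fast first emit (illustrative payloads):
r.on_transcription({"sentence_id": "s1", "is_final": True,
                    "speech_final": False, "text": "buk a tikt to dehli"})
# Canonical, LLM-refined:
r.on_transcription({"sentence_id": "s1", "is_final": True, "speech_final": True,
                    "llm_applied": True, "text": "Book a ticket to Delhi."})
```

Because the canonical event is emitted even when refinement is skipped, `display_text` always converges on exactly one definitive string per utterance.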
Clients that only need the definitive text can simply wait for `speech_final: true` and ignore the rest — one canonical event per utterance regardless of whether refinement is on. See the WebSocket STT Reference for the full table and reconciliation patterns.

Added — Word-preservation guardrails (proxy + UI)
To defend against over-aggressive LLM refinement and fast-speech hallucination rejection:

- Refined word-retention guardrail. If the canonical text drops more than ~60% of the first emit's token count, the proxy rolls back to the first-emit text and marks `llm_applied: false, llm_reason: "dropped_too_many_words"`. Tuned so legitimate polish (filler removal, 10–40% compression) passes through untouched.
- Hallucination-rejected fallback. When the upstream server's word-rate guard emits an empty final (`processing_mode: "hallucination_rejected"`, common on fast-paced Indic/English mixed audio), the proxy upgrades it into a tentative canonical using the cached first-emit / last-interim text plus `tentative: true` and `tentative_reason: "hallucination_rejected"`, so words never silently disappear.
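The retention rule can be sketched as a simple token-ratio check. The ~60% threshold comes from this changelog; the helper name, whitespace tokenization, and return shape are illustrative, not the proxy's internals:

```python
RETENTION_THRESHOLD = 0.40  # keep refinement only if >= ~40% of tokens survive

def apply_retention_guardrail(first_emit: str, refined: str) -> dict:
    """Roll back to the first-emit text if refinement dropped too many words."""
    first_tokens = len(first_emit.split())
    refined_tokens = len(refined.split())
    if first_tokens and refined_tokens / first_tokens < RETENTION_THRESHOLD:
        # Dropped more than ~60% of the words: reject the refinement.
        return {"text": first_emit, "llm_applied": False,
                "llm_reason": "dropped_too_many_words"}
    return {"text": refined, "llm_applied": True}

# Legitimate polish (filler removal, ~33% compression) passes through:
ok = apply_retention_guardrail("so um I want to uh book a cab",
                               "I want to book a cab")
# Over-aggressive refinement is rolled back to the first emit:
rolled = apply_retention_guardrail("please cancel my order number four five six",
                                   "Cancelled.")
```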
Both guardrails are implemented in the `/ws/stt` proxy, so voicebots and web clients inherit them automatically.

Added — Backward-compat refined event handling
Pre-migration upstream builds emit LLM refinement as a separate `refined` event rather than a second transcription. The proxy and web client transparently accept both shapes, so existing integrations keep working through the rollout. If you see `refined` events in the wire trace, upstream workers haven't been restarted onto the two-phase build yet — it's still fully functional; the `refined` event is deprecated but accepted.

Surface summary
- REST — `POST /stt`, `POST /v1/transcribe` — `context: string` form field.
- WebSocket — `/v1/stream`, `/ws/stt` — `context: {general, text, terms}` on start; two-phase canonical flow; `llm_applied` / `llm_latency_ms` / `llm_reason` on canonical.
- JavaScript SDK — `client.speechToText(audio, { context })`.
- Python SDK — `client.speech_to_text(audio_file, context=...)`.
- CLI — `60db stt:transcribe --context "..."`.
- MCP Server — `sixtydb_stt_transcribe` accepts `context`.
- Proxy — word-retention guardrail, hallucination fallback, first-emit caching, legacy `refined` shim.
- Web UI — context input in the Speech-to-Text page and realtime demo; segments rendered with a `tentative` marker when flagged.
Environment knobs (server-side)
| Variable | Default | Purpose |
|---|---|---|
| `STT_LLM_ENABLED` | `true` | Master kill-switch for refinement. |
| `STT_LLM_MODEL` | `60db-tiny` | OpenAI-compatible model identifier. |
| `STT_LLM_TIMEOUT_SEC` | `10.0` | Per-call timeout — on timeout, the canonical falls back to the original text. |
| `STT_LLM_MIN_WORDS` | `4` | Skip refinement for tiny utterances ("Yeah", "Okay"). |
| `STT_WS_HALLUCINATION_WPS` | `8.0` | Word-rate ceiling; finals above it are flagged `hallucination_rejected`. |