Added
- Longer request timeout (REST `POST /stt`). Server-side request timeout raised from 120 s → 600 s; clients should match. The 25 MB / ~1-hour file-size cap remains.
- New `keywords` form field on `POST /stt`. CSV with optional per-term weights, e.g. `Acme:5,XYZ Pharma:8`.
- The same boost runs on the WebSocket path via the existing `context.terms` field. No client change is required to benefit; existing `terms` payloads now produce `boosted` / `original` markers automatically.
- Words replaced by the boost appear in the response with `boosted: true` and `original: "<pre-boost word>"`.
- `return_timestamps=word` adds per-word `start` / `end` to the REST response. `include_confidence=true` adds per-word `confidence` (0-1).
- `min_speakers` / `max_speakers` form fields on `POST /stt`.
- `script_correction=true` for code-mixed Devanagari / Latin audio.
- `snr_db` is now a top-level field on every STT response and a per-message field on WS `transcription` events. Surface as a “good / fair / poor” badge (`>= 15` good, `0–15` fair, `< 0` poor).
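
Two of the items above lend themselves to small client-side helpers. The following is a minimal sketch, not part of any SDK: `snr_badge` maps `snr_db` to the badge thresholds listed above, and `format_keywords` builds the `keywords` CSV value. Both function names are hypothetical, and rejecting terms that contain `,` or `:` is an assumption, since escaping rules for the CSV are not specified here.

```python
from typing import Mapping, Optional


def snr_badge(snr_db: float) -> str:
    """Map snr_db to the badge bands: >= 15 good, 0-15 fair, < 0 poor."""
    if snr_db >= 15:
        return "good"
    if snr_db >= 0:
        return "fair"
    return "poor"


def format_keywords(terms: Mapping[str, Optional[int]]) -> str:
    """Build the `keywords` form value, e.g. 'Acme:5,XYZ Pharma:8'.

    A weight of None emits the bare term. Terms containing ',' or ':'
    are rejected (assumption: the CSV has no documented escaping).
    """
    parts = []
    for term, weight in terms.items():
        if "," in term or ":" in term:
            raise ValueError(f"term cannot contain ',' or ':': {term!r}")
        parts.append(term if weight is None else f"{term}:{weight}")
    return ",".join(parts)
```

For example, `format_keywords({"Acme": 5, "XYZ Pharma": 8})` yields the same string as the example value shown for the `keywords` field.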
Limits
Per-user concurrency cap. Counted across REST + WS combined.

| Service | Default |
|---|---|
| STT | 8 |
| TTS | 5 |

- REST: `429` with `error_code: STT_CONCURRENCY_LIMIT` / `TTS_CONCURRENCY_LIMIT` and `details.limit`.
- WebSocket: error frame followed by close code `1008`.

The cap releases when an in-flight request completes; there is no server-side queueing.
New error codes
| Status | error_code | When | Retry guidance |
|---|---|---|---|
| 429 | STT_CONCURRENCY_LIMIT / TTS_CONCURRENCY_LIMIT | Per-user concurrency cap | Retry after an in-flight request finishes |
| 429 | STT_UPSTREAM_RATE_LIMIT | Upstream rate-limit pass-through | Honor the Retry-After HTTP header |
| 499 | STT_CLIENT_CANCELLED | Client closed the connection mid-flight | Intentional; no retry, no charge |
| 503 | STT_UPSTREAM_UNAVAILABLE | Upstream STT 5xx | Exponential backoff |
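
The retry guidance in the table can be encoded as a small decision function. This is a sketch under stated assumptions: `decide_retry`, `RetryDecision`, and the backoff parameters (`base_delay_s`, `max_delay_s`) are illustrative names and values, not part of the API, and `Retry-After` is only parsed as a number of seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryDecision:
    retry: bool
    delay_s: Optional[float]  # None: no fixed delay (see `reason`)
    reason: str


def decide_retry(status: int, error_code: str, retry_after: Optional[str] = None,
                 attempt: int = 0, base_delay_s: float = 1.0,
                 max_delay_s: float = 60.0) -> RetryDecision:
    """Map (HTTP status, error_code) to the retry guidance from the table."""
    if status == 429 and error_code in ("STT_CONCURRENCY_LIMIT", "TTS_CONCURRENCY_LIMIT"):
        # No server-side queue: retry once one of our own requests completes.
        return RetryDecision(True, None, "wait for an in-flight request to finish")
    if status == 429 and error_code == "STT_UPSTREAM_RATE_LIMIT":
        try:
            delay = float(retry_after) if retry_after is not None else base_delay_s
        except ValueError:
            delay = base_delay_s  # assumption: fall back if the header is an HTTP date
        return RetryDecision(True, delay, "honor Retry-After")
    if status == 499 and error_code == "STT_CLIENT_CANCELLED":
        return RetryDecision(False, None, "client cancelled; no retry, no charge")
    if status == 503 and error_code == "STT_UPSTREAM_UNAVAILABLE":
        delay = min(max_delay_s, base_delay_s * (2 ** attempt))
        return RetryDecision(True, delay, "exponential backoff")
    return RetryDecision(False, None, "not covered by this table")
```

Returning `delay_s=None` for the concurrency case is a deliberate choice: the cap releases on completion of an in-flight request, so a timer is a weaker signal than the client's own request bookkeeping.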

499 is non-standard but used (nginx convention) to signal that the client closed the connection before the response was sent.

New warning codes

| Code | Meaning |
|---|---|
| `low_snr_dropped` | Audio dropped before LID/ASR; SNR below floor. Response text is empty. No credits charged. |
| `llm_refinement_not_in_plan` | The `context` field was provided but the active plan does not include LLM refinement. The field was ignored; transcription proceeded without refinement. |

The `low_snr_dropped` value appears as a `processing_mode` on WS `transcription` events; treat it as “skip this utterance, no useful text”.

Billing changes
- WS double-bill fix. Sessions that previously double-billed when the client sent `stop` are now billed correctly. `billing_summary.total_duration_seconds` will show as ~50 % of historical values for affected sessions.
- `billing_summary.client_estimated_seconds` (new, diagnostic): rough estimate of audio the client sent. Useful only for debugging duration drift; never display as a billed number.
- Low-SNR refunds. REST sessions where upstream dropped audio for low SNR no longer charge the user. WS already handled this.
- Cancellation propagation. Client disconnects mid-request now correctly abort the upstream call. No charge for cancelled requests (`499 STT_CLIENT_CANCELLED`).
- Diarization surcharge documented. When `diarize=true`, requests incur a documented +30 % surcharge on top of the base STT rate to cover GPU diarization (pyannote).
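
The billing rules above can be illustrated with a small estimator. This is a sketch only: the per-minute rate, the function name `estimate_stt_charge`, and the absence of rounding are assumptions, since the base STT rate and billing granularity are not stated here. Only the +30 % diarization surcharge and the zero charge for low-SNR drops and cancellations come from the notes above.

```python
from decimal import Decimal

DIARIZATION_SURCHARGE = Decimal("0.30")  # +30 % on top of the base STT rate


def estimate_stt_charge(duration_seconds, rate_per_minute, *, diarize=False,
                        low_snr_dropped=False, cancelled=False) -> Decimal:
    """Estimate a charge; `rate_per_minute` is a hypothetical base rate."""
    if low_snr_dropped or cancelled:
        # Low-SNR drops and client cancellations are not charged.
        return Decimal("0")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    charge = minutes * Decimal(str(rate_per_minute))
    if diarize:
        charge *= Decimal(1) + DIARIZATION_SURCHARGE
    return charge
```

With a hypothetical rate of 0.50 per minute, a 120-second request comes to 1.00 without diarization and 1.30 with it.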
Plan-tier gate
`context` (LLM refinement) is now a paid feature. On the Free plan the field is silently stripped server-side and the response includes `warning_codes: ["llm_refinement_not_in_plan"]`. Surface an upgrade prompt or hide / disable the input on Free plans for better UX.

Surface summary
- REST: `POST /stt` accepts new form fields (`keywords`, `languages`, `min_speakers`, `max_speakers`, `return_timestamps`, `include_confidence`, `script_correction`, `min_split_sec`); returns new fields (`snr_db`, `language_source: "long_audio_chunked"`, per-word `boosted` / `original` / `confidence`, per-segment `chunk_idx`).
- REST `POST /tts`, `/tts-synthesize`, `/tts-stream`: may now return `429 TTS_CONCURRENCY_LIMIT`.
- WebSocket `/ws/stt`: adds `low_snr_dropped` to `processing_mode`; per-word `boosted` / `original`; message-level `snr_db`; concurrency-limit error frame + close `1008`; corrected `billing_summary` with new `client_estimated_seconds` field.
- WebSocket `/ws/tts`: concurrency-limit error frame (legacy `{error: {message, code, details}}` shape) + close `1008`.
- Web UI: axios timeout raised to 600 s, error-code-aware toasts, `warning_codes` surfaced, WS `1008` distinguished from auth/network failures. The 25 MB upload cap is unchanged.
Added — Context-gated LLM refinement
Speech-to-Text now accepts an optional `context` hint that opens a server-side LLM refinement gate. When supplied, the response transcript is polished for proper-noun accuracy, filler removal, punctuation, and script consistency in mixed-language audio.

The shape differs per transport:

| Transport | `context` shape | Enables |
|---|---|---|
| REST `POST /stt` (and `/v1/transcribe`) | plain string, free-form paragraph | LLM refinement of response text |
| WebSocket `/v1/stream` (and `/ws/stt`) | structured object `{general, text, terms}` | Two-phase canonical final flow |
REST — context: string
Free-form paragraph describing the session (domain, speakers, jargon). Serialized as a multipart form field.
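
A minimal Python sketch of the REST side, using only the standard library. `build_stt_form_fields` and `refinement_was_ignored` are hypothetical helper names; collapsing whitespace in the paragraph is a convenience choice, not a documented requirement. The `context` field name and the `llm_refinement_not_in_plan` warning code are the ones described in these notes.

```python
from typing import Dict, Optional


def build_stt_form_fields(context: Optional[str] = None, **extra: str) -> Dict[str, str]:
    """Assemble the non-file multipart form fields for POST /stt."""
    fields = dict(extra)
    if context is not None:
        cleaned = " ".join(context.split())  # collapse newlines / repeated spaces
        if cleaned:
            fields["context"] = cleaned
    return fields


def refinement_was_ignored(response_json: dict) -> bool:
    """True when the server stripped `context` because the plan lacks refinement."""
    return "llm_refinement_not_in_plan" in response_json.get("warning_codes", [])


# Sending the fields is left to the HTTP client of your choice. With the
# third-party `requests` package it could look like this (the URL, the "file"
# field name, and the auth header are placeholders, not documented values):
#
#   requests.post(url, headers=auth_headers, data=fields,
#                 files={"file": open("call.wav", "rb")}, timeout=600)
```

Checking `refinement_was_ignored` after each response gives the UI a hook for the upgrade prompt on Free plans.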
WebSocket — structured {general, text, terms}
Start message
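
A minimal sketch of a start message carrying the structured context. Only the `context` key and its `general` / `text` / `terms` members come from these notes. The `"type": "start"` envelope, the value types (strings for `general` and `text`, a list of strings for `terms`), and the function name are assumptions made for illustration; check the WebSocket STT Reference for the actual frame.

```python
import json
from typing import Iterable, Optional


def build_start_message(general: Optional[str] = None, text: Optional[str] = None,
                        terms: Optional[Iterable[str]] = None) -> str:
    """Serialize a start frame; omits `context` entirely when nothing is supplied."""
    context = {}
    if general:
        context["general"] = general
    if text:
        context["text"] = text
    if terms:
        context["terms"] = list(terms)

    message = {"type": "start"}  # hypothetical envelope key and value
    if context:
        message["context"] = context
    return json.dumps(message, ensure_ascii=False)
```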
Changed — WebSocket two-phase canonical flow
When `context` is supplied, each utterance now produces two `transcription` events sharing a `sentence_id`:
- First emit (`is_final: true, speech_final: false`): fast dictionary-corrected text. Use for low-latency UI paint and voicebot barge-in / NLU.
- Canonical (`is_final: true, speech_final: true`): definitive answer. Either LLM-refined (when `llm_applied: true`) or the original text re-emitted (when the LLM was skipped or failed).

The canonical event carries the following fields:

| Field | Type | Description |
|---|---|---|
| `llm_applied` | boolean | `true` if the LLM ran, `false` if skipped / failed. |
| `llm_latency_ms` | number | Round-trip to the LLM endpoint (SLA monitoring). |
| `llm_reason` | string | Diagnostic when `llm_applied: false` (`gate_closed`, `error:TimeoutException`, etc.). |
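
The two-phase flow can be reconciled client-side by keying on `sentence_id`: paint the first emit immediately, then overwrite it when the canonical event arrives. The class below is a sketch, not the reference client. `TranscriptReconciler` is a hypothetical name, and it assumes the transcript lives in a `text` field on each event.

```python
from typing import Dict, Optional, Tuple


class TranscriptReconciler:
    """Tracks the best known text per sentence_id across first-emit and canonical events."""

    def __init__(self) -> None:
        # sentence_id -> (text, is_canonical)
        self._by_sentence: Dict[str, Tuple[str, bool]] = {}

    def handle(self, event: dict) -> Optional[str]:
        """Return the text to display for a final event, or None for interims."""
        if not event.get("is_final"):
            return None
        sentence_id = event["sentence_id"]
        text = event.get("text", "")  # assumption: transcript field is named `text`
        canonical = bool(event.get("speech_final"))

        previous = self._by_sentence.get(sentence_id)
        if previous is not None and previous[1] and not canonical:
            # A late first emit must never overwrite the canonical text.
            return previous[0]
        self._by_sentence[sentence_id] = (text, canonical)
        return text

    def current(self, sentence_id: str) -> Optional[Tuple[str, bool]]:
        """Return (text, is_canonical) for a sentence, or None if unseen."""
        return self._by_sentence.get(sentence_id)
```

Because the canonical event is emitted even when refinement is skipped or fails, the same code path works whether or not `llm_applied` is true.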

Clients that only need the definitive text can listen for `speech_final: true` and ignore the rest: there is one canonical event per utterance regardless of whether refinement is on.

See the WebSocket STT Reference for the full table and reconciliation patterns.

Added — Word-preservation guardrails (proxy + UI)
To defend against over-aggressive LLM refinement and fast-speech hallucination rejection:
- Refined word-retention guardrail. If the canonical text drops more than ~60% of the first-emit’s token count, the proxy rolls back to the first-emit text and marks `llm_applied: false, llm_reason: "dropped_too_many_words"`. Tuned so legitimate polish (filler removal, 10–40% compression) passes through untouched.
- Hallucination-rejected fallback. When the upstream server’s word-rate guard emits an empty final (`processing_mode: "hallucination_rejected"`, common on fast-paced Indic/English mixed audio), the proxy upgrades it into a tentative canonical using the cached first-emit / last-interim text plus `tentative: true` and `tentative_reason: "hallucination_rejected"` so words never silently disappear.
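
Both guardrails can be sketched in a few lines. This is an illustration of the rules described above, not the proxy's implementation: whitespace tokenization, the function names, and the `text` field name are assumptions, and the 0.60 threshold mirrors the approximate figure quoted in the first bullet.

```python
def apply_word_retention_guardrail(first_emit: str, canonical: str,
                                   llm_applied: bool,
                                   max_drop_ratio: float = 0.60) -> dict:
    """Roll back to the first-emit text when refinement dropped too many tokens."""
    first_tokens = first_emit.split()  # assumption: whitespace tokenization
    canonical_tokens = canonical.split()
    if not llm_applied or not first_tokens:
        return {"text": canonical, "llm_applied": llm_applied}

    dropped = 1 - len(canonical_tokens) / len(first_tokens)
    if dropped > max_drop_ratio:
        return {"text": first_emit, "llm_applied": False,
                "llm_reason": "dropped_too_many_words"}
    return {"text": canonical, "llm_applied": True}


def upgrade_hallucination_rejected(event: dict, cached_text: str) -> dict:
    """Turn an empty hallucination-rejected final into a tentative canonical."""
    rejected = event.get("processing_mode") == "hallucination_rejected"
    if rejected and not event.get("text") and cached_text:
        return {**event, "text": cached_text, "tentative": True,
                "tentative_reason": "hallucination_rejected"}
    return event
```

A 10-word first emit refined down to 7 words (30% compression) passes through; refined down to 3 words (70% dropped) it is rolled back.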

Both guardrails run in the `/ws/stt` proxy, so voicebots and web clients inherit them automatically.

Added — Backward-compat refined event handling
Pre-migration upstream builds emit LLM refinement as a separate `refined` event rather than a second `transcription`. The proxy and web client transparently accept both shapes, so existing integrations keep working through the rollout.

If you see `refined` events in the wire trace, upstream workers haven’t been restarted onto the two-phase build yet. It is still fully functional; the `refined` event is deprecated but accepted.

Surface summary
- REST (`POST /stt`, `POST /v1/transcribe`): `context: string` form field.
- WebSocket (`/v1/stream`, `/ws/stt`): `context: {general, text, terms}` on `start`; two-phase canonical flow; `llm_applied` / `llm_latency_ms` / `llm_reason` on canonical.
- JavaScript SDK: `client.speechToText(audio, { context })`.
- Python SDK: `client.speech_to_text(audio_file, context=...)`.
- CLI: `60db stt:transcribe --context "..."`.
- MCP Server: `sixtydb_stt_transcribe` accepts `context`.
- Proxy: word-retention guardrail, hallucination fallback, first-emit caching, legacy `refined` shim.
- Web UI: context input in Speech-to-Text page and realtime demo; segments rendered with `tentative` marker when flagged.

Environment knobs (server-side)
| Variable | Default | Purpose |
|---|---|---|
| `STT_LLM_ENABLED` | `true` | Master kill-switch for refinement. |
| `STT_LLM_MODEL` | `60db-tiny` | OpenAI-compatible model identifier. |
| `STT_LLM_TIMEOUT_SEC` | `10.0` | Per-call timeout; on timeout, the canonical falls back to the original text. |
| `STT_LLM_MIN_WORDS` | `4` | Skip refinement for tiny utterances ("Yeah", "Okay"). |
| `STT_WS_HALLUCINATION_WPS` | `8.0` | Word-rate ceiling; finals above it are flagged as `hallucination_rejected`. |
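
To make the knobs concrete, here is a sketch of how a server process might read them and apply the two thresholds. The variable names and defaults come from the table. The settings class, the accepted boolean spellings, whitespace word counting, and treating `STT_LLM_MIN_WORDS` as an inclusive minimum are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SttLlmSettings:
    enabled: bool
    model: str
    timeout_sec: float
    min_words: int
    hallucination_wps: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> SttLlmSettings:
    """Read the knobs from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    truthy = ("1", "true", "yes", "on")  # assumption: accepted boolean spellings
    return SttLlmSettings(
        enabled=env.get("STT_LLM_ENABLED", "true").strip().lower() in truthy,
        model=env.get("STT_LLM_MODEL", "60db-tiny"),
        timeout_sec=float(env.get("STT_LLM_TIMEOUT_SEC", "10.0")),
        min_words=int(env.get("STT_LLM_MIN_WORDS", "4")),
        hallucination_wps=float(env.get("STT_WS_HALLUCINATION_WPS", "8.0")),
    )


def should_refine(settings: SttLlmSettings, text: str) -> bool:
    """Skip refinement when disabled or when the utterance is too short."""
    return settings.enabled and len(text.split()) >= settings.min_words


def exceeds_word_rate(settings: SttLlmSettings, text: str, duration_s: float) -> bool:
    """True when words-per-second is above the hallucination ceiling."""
    if duration_s <= 0:
        return False
    return len(text.split()) / duration_s > settings.hallucination_wps
```

Passing a plain dict as `env` keeps the loader easy to exercise in tests without touching the process environment.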