TL;DR
The server-side backend behind Chrome's Web Speech API (used by Caption.Ninja, Web Captioner, and many free OBS caption tools) is being replaced by on-device SODA models. During the migration, both the legacy and new services are unreliable. The free OBS caption category that depended on this API is collapsing. Reliable alternatives run paid speech recognition, such as Deepgram or AssemblyAI, on the server side.
What was the Web Speech API?
The Web Speech API is a W3C specification (drafted in a W3C Community Group and never finalized as a Recommendation) that exposes speech recognition and synthesis to web applications through JavaScript. Chrome shipped recognition in 2013, backed by free server-side recognition running on Google's production speech-to-text infrastructure.
That free backend subsidized an entire category of free captioning tools. Any developer could write a few lines of JavaScript, construct a recognizer with new SpeechRecognition() (exposed as webkitSpeechRecognition in Chrome), and get high-quality transcription without paying for an API. Web Captioner, Caption.Ninja, Zip Captions, and dozens of DIY captioning scripts were built on it.
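For context, the pattern those tools relied on looks roughly like this. It is a minimal TypeScript sketch, not code from Caption.Ninja or any other tool named here: `RecognizerLike`, `findRecognizerCtor`, and `createCaptionRecognizer` are hypothetical names, the interface declares only the members the sketch touches so it compiles without DOM typings, and the global object is passed in as a parameter so the lookup can be exercised outside a browser. The `continuous`, `interimResults`, and `lang` properties are part of the real API.

```typescript
// Only the members this sketch touches; a real recognizer has many more.
interface RecognizerLike {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  start(): void;
}

type RecognizerCtor = new () => RecognizerLike;

// The spec name is SpeechRecognition; Chrome exposes webkitSpeechRecognition.
// Prefer the unprefixed constructor when a browser provides it.
function findRecognizerCtor(scope: Record<string, unknown>): RecognizerCtor | null {
  const ctor = (scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"]) as
    | RecognizerCtor
    | undefined;
  // Guard at runtime too: the global could be missing or not a constructor.
  return typeof ctor === "function" ? ctor : null;
}

// Configure a recognizer for live captions: keep listening across pauses
// and emit interim (partial) results while the speaker is still talking.
function createCaptionRecognizer(
  scope: Record<string, unknown>,
  lang = "en-US",
): RecognizerLike | null {
  const Ctor = findRecognizerCtor(scope);
  if (Ctor === null) return null; // API not exposed, e.g. Firefox by default
  const rec = new Ctor();
  rec.continuous = true;
  rec.interimResults = true;
  rec.lang = lang;
  return rec;
}
```

In a page, `createCaptionRecognizer(globalThis as unknown as Record<string, unknown>)` would return a configured recognizer (or `null`), after which the tool attaches an `onresult` handler and calls `start()`.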
What changed
Google announced a transition from the legacy server-side Web Speech API to a new on-device SODA (Speech On-Device API) model. Reasons:
- Privacy — keeping audio local instead of sending to Google servers
- Cost — Google was eating the per-minute API cost for billions of free requests
- Performance — faster transcription with on-device models
The transition is incomplete. Multiple issues are documented:
- Chromium bug 40286514: Web Speech API SODA backend rollout issues
- Chromium bug 40948113: Web Speech API recognition does not work properly
- Brave bug 55414: on-device SpeechRecognition silently hangs in the "downloading" state, and no SODA component ever installs
- CVE-2026-7935: a security vulnerability in Chrome's Speech API that allows UI spoofing
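Whatever the root cause, a tool that opts into on-device recognition can at least refuse to wait forever on a component that never installs, as in the Brave report. A minimal, hypothetical TypeScript sketch of that policy follows. It assumes the tool periodically polls some availability status for the on-device model; the four status strings are modeled loosely on the availability states described for newer on-device additions to the API, so the exact names, and how a browser reports them, are assumptions to check against the current spec. `chooseEngine` and `maxPendingPolls` are invented for illustration.

```typescript
// Assumed status values, modeled on availability states described for newer
// on-device additions to the API; check the current spec for the real names.
type OnDeviceStatus = "available" | "downloadable" | "downloading" | "unavailable";

type EngineChoice = "on-device" | "wait" | "server-fallback";

// Hypothetical policy: given the status seen on each poll (oldest first),
// decide whether to use on-device recognition, keep waiting, or fall back
// to a server-side engine because the model never finished installing.
function chooseEngine(history: OnDeviceStatus[], maxPendingPolls: number): EngineChoice {
  if (history.length === 0) return "wait"; // nothing polled yet
  const latest = history[history.length - 1];
  if (latest === "available") return "on-device";
  if (latest === "unavailable") return "server-fallback";

  // Count the trailing run of polls where the model was still not ready.
  let pending = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const s = history[i];
    if (s !== "downloadable" && s !== "downloading") break;
    pending++;
  }
  return pending >= maxPendingPolls ? "server-fallback" : "wait";
}
```

With a poll every few seconds, `maxPendingPolls` sets how long a stalled download is tolerated before captions move to a server-side engine.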
The 60-second continuous-mode timeout
Even when it works, the Web Speech API has a built-in cutoff: Chrome ends a continuous recognition session after about 60 seconds of silence and fires onend without warning. Long-running dictation (live streams, multi-hour Twitch sessions) therefore requires the page to call start() again from the onend handler to keep going.
Most free caption tools handle this with a reconnect loop. When the API is healthy, that works. When the backend is degrading, as it is now, the restarts fail too.
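A reconnect loop of this kind can be sketched as follows. This is a minimal, hypothetical TypeScript sketch, not taken from any tool above: `keepAlive` and its options are invented names, and the clock and scheduler are injected so it runs outside a browser (in a page you would pass `() => Date.now()` and `(fn, ms) => setTimeout(fn, ms)`). It restarts immediately after a normal end, such as the silence cutoff, but treats a session that ends quickly with no results as a failure, backs off exponentially, and gives up after several consecutive failures instead of spinning against a degraded backend. The default thresholds are arbitrary.

```typescript
// Minimal recognizer surface the loop needs (a webkitSpeechRecognition in a page).
interface RestartableRecognizer {
  onresult: ((ev: unknown) => void) | null;
  onend: (() => void) | null;
  start(): void;
}

interface KeepAliveOptions {
  now: () => number;                                   // clock in ms
  schedule: (fn: () => void, delayMs: number) => void; // timer function
  onResult: (ev: unknown) => void;                     // the tool's caption handler
  onGiveUp: () => void;                                // e.g. switch engines, alert streamer
  maxFailures?: number;   // consecutive failed sessions before giving up
  minHealthyMs?: number;  // shorter than this with no result counts as failed
  baseDelayMs?: number;   // first backoff delay after a failure
}

function keepAlive(rec: RestartableRecognizer, opts: KeepAliveOptions): void {
  const maxFailures = opts.maxFailures ?? 5;
  const minHealthyMs = opts.minHealthyMs ?? 2000;
  const baseDelayMs = opts.baseDelayMs ?? 250;
  let failures = 0;
  let gotResult = false;
  let startedAt = 0;

  // Start (or restart) a session and remember when it began.
  const begin = (): void => {
    gotResult = false;
    startedAt = opts.now();
    rec.start();
  };

  rec.onresult = (ev) => {
    gotResult = true;
    opts.onResult(ev);
  };

  rec.onend = () => {
    // A session that ends quickly without producing any text suggests the
    // backend is failing; a long session (e.g. the ~60 s silence cutoff) or
    // one that produced results is treated as a normal end.
    const failed = !gotResult && opts.now() - startedAt < minHealthyMs;
    failures = failed ? failures + 1 : 0;
    if (failures >= maxFailures) {
      opts.onGiveUp();
      return;
    }
    // Restart at once after a normal end; back off exponentially after failures.
    const delay = failures === 0 ? 0 : baseDelayMs * 2 ** (failures - 1);
    opts.schedule(begin, delay);
  };

  begin();
}
```

`onGiveUp` is where a tool could fall back to another engine or tell the streamer that captions have stopped.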
What this means for OBS caption tools
If your OBS caption tool depends on the browser's built-in SpeechRecognition object, you are affected:
- Caption.Ninja: uses the Web Speech API, and its recognition is currently degrading. Its docs recommend Edge, but Edge has the same issues.
- Web Captioner: used the Web Speech API. Shut down on October 31, 2023, partly because of this trajectory.
- Zip Captions: uses the Web Speech API. Same issues.
- DIY browser-source scripts: same exposure.
The architectural alternatives
Tools that do NOT depend on the browser's built-in speech recognition keep working:
- Server-side speech recognition: Deepgram Nova-3, AssemblyAI Universal-2, OpenAI Whisper API, Google Cloud Speech-to-Text. These are billed by audio duration, and they are reliable. StreamTranslate uses Deepgram Nova-3.
- Local speech recognition: whisper.cpp running on the streamer's own machine, ideally GPU-accelerated. Free, but resource-heavy. LocalVocal uses this approach.
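The server-side pattern works differently: a backend process holds the API key, streams audio to the provider over a WebSocket, and forwards transcripts to the OBS browser source. The sketch below covers only the two pure pieces of that, using Deepgram as the example because StreamTranslate uses it: building the live-transcription URL and pulling caption text out of a result message. The endpoint `wss://api.deepgram.com/v1/listen`, the `model`, `language`, `interim_results`, and `smart_format` query parameters, and the `Results` message fields are taken from Deepgram's public documentation and should be checked against its current reference; `deepgramListenUrl`, `parseCaption`, and the interfaces are hypothetical names.

```typescript
// Hypothetical options; the query parameters they map to follow Deepgram's
// live-streaming docs and should be re-checked before use.
interface LiveOptions {
  model: string;           // e.g. "nova-3"
  language: string;        // e.g. "en"
  interimResults: boolean; // stream partial hypotheses before each final result
}

// Build the WebSocket URL for Deepgram's live transcription endpoint.
function deepgramListenUrl(opts: LiveOptions): string {
  const params: Array<[string, string]> = [
    ["model", opts.model],
    ["language", opts.language],
    ["interim_results", String(opts.interimResults)],
    ["smart_format", "true"], // punctuation and formatting in transcripts
  ];
  const query = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return `wss://api.deepgram.com/v1/listen?${query}`;
}

// The subset of a live "Results" message this sketch reads.
interface DeepgramResults {
  type?: string;
  is_final?: boolean;
  channel?: { alternatives?: Array<{ transcript?: string }> };
}

interface CaptionUpdate {
  text: string;
  final: boolean; // false: replace the current line; true: commit it
}

// Convert one raw WebSocket message into a caption update, or return null
// for messages without transcript text (metadata, empty results, bad JSON).
function parseCaption(raw: string): CaptionUpdate | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as DeepgramResults;
  if (msg.type !== "Results") return null;
  const alternatives = msg.channel?.alternatives;
  const first = alternatives && alternatives.length > 0 ? alternatives[0] : undefined;
  const text = (first?.transcript ?? "").trim();
  if (text === "") return null;
  return { text, final: msg.is_final === true };
}
```

On the server, the socket would be opened with an `Authorization: Token <key>` header so the key never reaches the browser; each incoming message goes through `parseCaption`, with interim updates replacing the current caption line and final ones committing it.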
For OBS streamers who need reliable real-time captions and translation today, the practical options are LocalVocal (free, local, GPU-heavy) or a managed paid service like StreamTranslate (cloud-based, no GPU needed, one-time $9.99).
Frequently asked questions
Is the Web Speech API completely gone?
Not yet. The legacy server-side version is being phased out, and the new on-device SODA version is rolling out slowly. During the transition, both are unreliable. Eventually only the on-device version will exist.
Will the on-device version work for OBS caption tools when it stabilizes?
Possibly, but with caveats. On-device speech recognition uses the streamer's own CPU/GPU during the stream, and streamers running a game plus OBS already have tight resource budgets. Quality may also be lower than server-side cloud speech recognition.
What should developers building caption tools use?
Server-side paid APIs (Deepgram, AssemblyAI, OpenAI Whisper) are the most reliable today. Local Whisper is viable for tools that can manage the GPU requirement.
Is Safari/Firefox affected?
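Not by Chrome's migration. Firefox has never enabled SpeechRecognition by default, so Firefox users were locked out of Caption.Ninja and Web Captioner from the start. Safari has exposed the prefixed webkitSpeechRecognition since Safari 14.1, but it is backed by Apple's own speech recognition rather than Google's servers, so Chrome's backend change does not apply to it.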