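Live captioning accuracy is measured by Word Error Rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript. Lower is better. Top cloud STT systems achieve 4-8% WER on clear English audio. Deepgram Nova-2, used by StreamTranslate, scores 6.3% WER on English, outperforming Google Speech-to-Text (7.1%), Whisper v3 Large (8.2%), and Amazon Transcribe (9.4%).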
| Engine | WER (English) | WER (Multilingual) |
|---|---|---|
| Deepgram Nova-2 | 6.3% | 9.8% |
| Google Speech-to-Text | 7.1% | 10.4% |
| Whisper v3 Large | 8.2% | 11.2% |
| Amazon Transcribe | 9.4% | 13.1% |
| Web Speech API (Chrome) | 14.7% | N/A |
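To make the metric concrete, here is a minimal sketch of how a WER figure like those in the table could be computed from a reference transcript and an engine's output. The function name `word_error_rate` and the sample transcripts are illustrative assumptions, not part of any engine's API. Real benchmarks usually also normalize punctuation and numerals before scoring, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; it only lowercases and splits on
    whitespace, with no punctuation or numeral normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sample: one substitution ("ward" -> "war") and one dropped word ("now").
    reference = "push mid and ward the river now"
    hypothesis = "push mid and war the river"
    print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 28.6%
```

Two errors against a seven-word reference give 2/7, or about 28.6%. Because insertions count as errors too, WER can exceed 100% when the output is much longer than the reference.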
Audio quality, background noise, accents, gaming jargon, and profanity all affect WER. Streaming-optimized models like Deepgram Nova-2 handle gaming terms and noisy environments better than general-purpose engines.