WER is the single most important metric for caption quality. Here's what it means, how it's calculated, and why StreamTranslate achieves best-in-class accuracy for live streams.
Word Error Rate (WER) is the standard metric for measuring the accuracy of automatic speech recognition (ASR) systems. It compares the transcription produced by an ASR engine against a reference transcript and counts how many words were wrong. A WER of 5% means roughly 5 errors for every 100 words in the reference: words that were substituted, deleted, or inserted.
The formula is simple: WER = (Substitutions + Deletions + Insertions) / Total Words in Reference. A perfect transcription has a WER of 0%. The lower the WER, the better the caption quality your viewers see on stream.
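As a rough illustration of the formula above, the sketch below computes WER using a word-level edit distance, which finds the smallest combined number of substitutions, deletions, and insertions needed to turn the reference into the ASR output. The function name `word_error_rate` is illustrative, and the lowercasing and whitespace splitting are simplifying assumptions; real scoring tools usually also normalize punctuation and numbers before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, where N is the reference word count."""
    # Assumption: simple normalization (lowercase, split on whitespace).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = minimum edits to turn the reference words processed so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "welcome back to the stream everyone"
    hypothesis = "welcome back to this stream"
    # One substitution (the -> this) and one deletion (everyone): 2 / 6
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Because insertions count as errors but the denominator is the reference length, WER can exceed 100% when the ASR output contains many extra words.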
For live streaming specifically, WER matters more than in post-production transcription because there's no opportunity to correct errors before viewers see them. A high WER creates embarrassing captions that misrepresent what you said, confuse non-native speakers relying on translations, and undermine the accessibility value of captions entirely.
StreamTranslate is built on Deepgram Nova-2, which consistently outperforms competing ASR models on independent benchmarks. Nova-2 was trained specifically on streaming and conversational audio, making it particularly effective in the noisy, variable acoustic environments typical of gaming and live content creation.
Several factors influence WER on your stream: microphone quality, distance from the microphone, background game audio levels, speech rate, and accent. StreamTranslate addresses these by supporting custom vocabulary lists. You can add game titles, character names, community terms, and slang that general ASR models don't recognize, which directly reduces WER for your specific content.
In benchmark testing, Deepgram Nova-2 achieves WER as low as 1-2% on clean audio, and typically 5-8% on noisy gaming audio — significantly better than older ASR systems that might hit 15-20% WER under the same conditions.
Deepgram Nova-2 achieves industry-leading word error rates on clear microphone audio, meaning almost every word is captured correctly.
Add gaming terms, streamer names, and niche vocabulary to further reduce WER for your specific content niche and community.
StreamTranslate's streaming pipeline processes audio chunks continuously, delivering low-latency captions without sacrificing accuracy.
WER is the percentage of words in a transcription that are incorrect, measured against a reference transcript. Lower WER means higher accuracy. A WER of 2% means roughly 98 out of every 100 words are correct.
A WER below 5% is considered very good for live streaming. Deepgram Nova-2, used by StreamTranslate, achieves WER as low as 1-2% on clear speech.
Background game audio, music, and microphone noise increase WER. Using a quality microphone and enabling noise suppression in OBS helps keep WER low.
StreamTranslate uses Deepgram Nova-2, for which published benchmark WER data shows top performance across multiple speech datasets.
Use a dedicated microphone, reduce background noise, speak clearly, and add custom vocabulary for gaming terms to reduce word error rate on your captions.