What is WER (Word Error Rate) in Streaming Captions?

Understanding Word Error Rate

Word Error Rate (WER) is the standard metric for measuring the accuracy of automatic speech recognition (ASR) systems. It compares the transcription produced by an ASR engine against a reference transcript and counts the word-level errors. A WER of 5% means roughly 5 errors for every 100 words in the reference: words that were substituted, deleted, or inserted.

The formula is simple: WER = (Substitutions + Deletions + Insertions) / Total Words in Reference. A perfect transcription has a WER of 0%. The lower the WER, the better the caption quality your viewers see on stream.
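
To make the formula concrete, here is a minimal Python sketch that computes WER with a word-level edit distance. The function name `wer_counts` and the simple lowercase-and-split normalization are assumptions chosen for illustration; they are not taken from StreamTranslate or Deepgram, and real evaluations usually also normalize punctuation and numbers.

```python
def wer_counts(reference: str, hypothesis: str):
    """Return (substitutions, deletions, insertions, wer) at the word level."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_errors, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # words match, no new error
                continue
            t, s, d, n = dp[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare on total_errors first, so this keeps the cheapest path.
            dp[i][j] = min(substitution, deletion, insertion)

    total, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins, total / len(ref)


# Reference has 4 words; the caption substitutes one word and drops another.
subs, dels, ins, wer = wer_counts("a b c d", "a x c")
print(subs, dels, ins, f"{wer:.0%}")
```

Because insertions count as errors but do not add to the reference length, WER can exceed 100% when a transcript contains many extra words.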

For live streaming specifically, WER matters more than in post-production transcription because there's no opportunity to correct errors before viewers see them. A high WER creates embarrassing captions that misrepresent what you said, confuse non-native speakers relying on translations, and undermine the accessibility value of captions entirely.

How StreamTranslate Achieves Low WER

StreamTranslate is built on Deepgram Nova-2, which Deepgram's published benchmarks show outperforming competing ASR models across multiple speech datasets. Nova-2 was trained specifically on streaming and conversational audio, making it particularly effective in the noisy, variable acoustic environments typical of gaming and live content creation.

Several factors influence WER on your stream: microphone quality, distance from the mic, background game audio levels, speech rate, and accent. On top of the acoustic factors, unfamiliar words are a common source of errors. StreamTranslate addresses this by supporting custom vocabulary lists: you can add game titles, character names, community terms, and slang that general ASR models don't recognize, directly reducing WER for your specific content.
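
One way to decide which terms belong in a custom vocabulary list is to compare a hand-corrected transcript of a past stream against the captions that were actually shown, and flag the terms the captions missed. The sketch below is a hypothetical helper, not part of StreamTranslate or Deepgram; the function name `missed_terms` and the sample sentences are made up for illustration.

```python
import re


def missed_terms(reference: str, hypothesis: str, terms: list[str]) -> dict:
    """Map each term to (count_in_reference, count_in_captions)
    for terms that appear less often in the captions than in the reference."""

    def count(text: str, term: str) -> int:
        # Whole-word, case-insensitive match; assumes terms start and end
        # with a letter or digit so the \b word boundaries behave as expected.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        return len(re.findall(pattern, text.lower()))

    report = {}
    for term in terms:
        in_reference = count(reference, term)
        in_captions = count(hypothesis, term)
        if in_captions < in_reference:
            report[term] = (in_reference, in_captions)
    return report


reference = "Welcome back, today we are speedrunning Celeste with the Farewell chapter"
captions = "welcome back today we are speed running so lest with the farewell chapter"
print(missed_terms(reference, captions, ["Celeste", "Farewell", "speedrunning"]))
```

Terms that show up in this report are reasonable candidates for the custom vocabulary list; terms the captions already capture, like "Farewell" in this sample, are left out.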

In benchmark testing, Deepgram Nova-2 achieves WER as low as 1-2% on clean audio, and typically 5-8% on noisy gaming audio. That is significantly better than older ASR systems, which might hit 15-20% WER under the same conditions.
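
To put these percentages in viewer terms, the short sketch below converts a WER into an approximate number of caption errors per minute. The speaking rate of 150 words per minute is an assumption picked for illustration, not a figure from StreamTranslate or Deepgram; adjust it to match your own pace.

```python
def expected_errors_per_minute(wer: float, words_per_minute: float = 150.0) -> float:
    """Approximate caption errors per minute for a given WER (as a fraction)."""
    return wer * words_per_minute


# WER values taken from the ranges quoted above; 150 wpm is an assumed speech rate.
scenarios = [
    ("clean audio, 2% WER", 0.02),
    ("noisy gaming audio, 8% WER", 0.08),
    ("older ASR system, 20% WER", 0.20),
]
for label, wer in scenarios:
    print(f"{label}: about {expected_errors_per_minute(wer):.0f} errors per minute")
```

Under that assumption, the gap between 2% and 20% WER is the difference between roughly 3 and roughly 30 wrong words every minute.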

1-2% WER on Clean Audio

Deepgram Nova-2 achieves industry-leading word error rates on clear microphone audio, meaning almost every word is captured correctly.

Custom Vocabulary

Add gaming terms, streamer names, and niche vocabulary to further reduce WER for your specific content niche and community.

Real-Time Correction

StreamTranslate's streaming pipeline processes audio chunks continuously, delivering low-latency captions without sacrificing accuracy.

Frequently Asked Questions

What is Word Error Rate (WER)?

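WER is the number of word errors in a transcription (substitutions, deletions, and insertions) divided by the number of words in the reference transcript, expressed as a percentage. Lower WER means higher accuracy. A WER of 2% means about 2 errors for every 100 words spoken.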

What is a good WER for live stream captions?

Below 5% WER is considered very good for live streaming. Deepgram Nova-2, used by StreamTranslate, achieves WER as low as 1-2% on clear speech.

How does background noise affect WER?

Background game audio, music, and microphone noise increase WER. Using a quality microphone and enabling noise suppression in OBS helps keep WER low.

Does StreamTranslate publish its WER benchmarks?

StreamTranslate uses Deepgram Nova-2, and Deepgram publishes benchmark WER data for the model showing top performance across multiple speech datasets.

How can I reduce WER on my stream?

Use a dedicated microphone, reduce background noise, speak clearly, and add custom vocabulary for gaming terms to reduce word error rate on your captions.