Bad captions are worse than no captions. Here's how to get the most accurate subtitles possible from StreamTranslate — microphone tips, audio settings, and speech practices.
Caption accuracy is primarily determined by audio quality. The Deepgram Nova-2 STT engine that powers StreamTranslate is highly capable: it handles accents, natural speech patterns, and diverse vocabulary well. But it can't accurately transcribe speech buried in background noise, distorted by bad microphone placement, or obscured by game audio bleeding into the vocal track.
Improving caption accuracy is mostly a matter of improving your audio setup, not changing StreamTranslate's configuration. Here are the factors, from most to least impactful.
The single most impactful change you can make is getting your microphone closer to your mouth. The difference between a microphone at 12 inches and one at 3-4 inches can be the difference between 80% and 96% caption accuracy. The reason is basic acoustic physics: sound intensity falls off with the square of distance. A mic half as far away receives about 4x the sound intensity from your voice (twice the sound pressure, roughly +6 dB), while the background noise reaching it stays about the same.
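To put rough numbers on the distance effect, here is a minimal sketch in Python. It assumes an idealized free-field point source (no room reflections), where sound pressure is proportional to 1/distance and diffuse room noise stays roughly constant at the microphone. The function name `level_gain_db` is hypothetical and used only for illustration; real rooms will deviate from these figures.

```python
import math


def level_gain_db(far_inches: float, near_inches: float) -> float:
    """Estimated gain in direct-voice level (dB) when moving the mic closer.

    Assumption: free-field point source, so pressure ~ 1/distance and the
    level change is 20 * log10(far / near).
    """
    if far_inches <= 0 or near_inches <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far_inches / near_inches)


# Compare a 12-inch starting position with three closer positions.
for far, near in [(12, 6), (12, 4), (12, 3)]:
    print(f"{far} in -> {near} in: +{level_gain_db(far, near):.1f} dB")
# 12 in -> 6 in: +6.0 dB
# 12 in -> 4 in: +9.5 dB
# 12 in -> 3 in: +12.0 dB
```

Under these assumptions, moving from 12 inches to 4 inches adds roughly 9.5 dB of voice level over the same background noise, which is a large improvement in signal-to-noise ratio for a change that costs nothing.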
Optimal position for most USB and condenser microphones: 4-6 inches from your mouth, slightly to the side (to keep plosive sounds like 'p' and 'b' from hitting the capsule directly). For dynamic microphones (such as the Shure SM7B): 2-4 inches, as they require close proximity due to lower sensitivity.
USB Condensers (Blue Yeti, HyperX QuadCast): Good for caption accuracy when positioned correctly. These are side-address microphones, so speak into the front face, not the top.
XLR Condensers + Interface (Rode NT1, Focusrite Scarlett): Excellent audio quality with proper gain staging.
Dynamic Microphones (Shure SM7B, Rode Procaster): Require close proximity but reject background noise very well, which makes them a great fit for gaming environments with loud keyboards and case fans.
Gaming environments are noisy. Mechanical keyboards, CPU fans, case fans, air conditioning, chair movement, and room reverb all add to the noise floor that StreamTranslate has to filter out. Here's how to reduce background noise impact:
In the OBS Audio Mixer, click the gear icon next to your microphone source and choose Filters. Add the "Noise Suppression" filter and select the RNNoise method (the machine-learning-based option) if available, or Speex otherwise. This significantly reduces background noise before audio reaches StreamTranslate.
A noise gate automatically mutes your microphone when you're not speaking, preventing background noise during silent moments from being sent to StreamTranslate. Add the "Noise Gate" filter in OBS and set the close threshold slightly above your background noise floor, with the open threshold a few dB higher so the gate does not flutter open and shut.
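The gate behavior can be sketched in a few lines of Python. This is an illustrative model, not OBS's actual implementation: the function names, the block-based RMS measurement, and the example thresholds (-45 dB close, -35 dB open) are all assumptions, and the attack, hold, and release timing that the OBS filter also offers is left out. The gate opens when the level rises above the open threshold and closes only once it drops below the lower close threshold.

```python
import math


def rms_dbfs(samples):
    """RMS level of a block of float samples (-1.0..1.0), in dB full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def gate_blocks(blocks, close_db, open_db):
    """Return one boolean per block: True where the gate passes audio.

    Hysteresis: a closed gate opens above open_db; an open gate stays
    open until the level falls below close_db.
    """
    if open_db < close_db:
        raise ValueError("open_db must be at or above close_db")
    is_open = False
    states = []
    for block in blocks:
        level = rms_dbfs(block)
        if is_open and level < close_db:
            is_open = False
        elif not is_open and level > open_db:
            is_open = True
        states.append(is_open)
    return states


# Hypothetical audio: room noise, speech, a quiet word ending, room noise.
blocks = [[0.003] * 100, [0.2] * 100, [0.01] * 100, [0.003] * 100]
print(gate_blocks(blocks, close_db=-45.0, open_db=-35.0))
# [False, True, True, False]
```

The third block (about -40 dB) sits between the two thresholds, so the gate stays open and the tail of the word is not cut off. With a single threshold, quiet syllables near that level could be chopped, which hurts transcription more than the noise would have.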
In OBS audio settings, ensure your game audio (desktop audio) is on a separate channel from your microphone. StreamTranslate should only receive your microphone audio, not a mix of game audio and voice.
Deepgram Nova-2 is trained on diverse content and handles most vocabulary well, but game-specific proper nouns (character names, game items, map names) can sometimes be transcribed incorrectly. For example, "Kraken" might come out as "crackin," "Valorant" might be transcribed as two words, and streamer-specific terms might be guessed incorrectly.
The practical approach: game-specific vocabulary errors are usually minor and viewers familiar with the game understand the intended word from context. Focus your accuracy improvement effort on clear speech and good audio setup rather than trying to train the STT engine on specific vocabulary.
You don't need to speak unnaturally or robotically. But a few habits improve accuracy significantly: don't mumble (fully form your words), pause naturally between sentences (Deepgram uses sentence boundaries for segmentation), reduce filler sounds ("um," "uh" are captioned as such — not wrong, just distracting), and avoid talking with food in your mouth or turning away from the microphone.
Your microphone level in OBS should peak around -12 dB to -6 dB when speaking at normal volume. This gives StreamTranslate a strong, clear signal while leaving headroom before clipping. Use the OBS Audio Mixer meters to check your levels before streaming. A microphone that is too quiet gives a poor signal-to-noise ratio; one that is too loud clips, which degrades STT accuracy.
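If you want to check a recording outside OBS, the same target range can be expressed as a small Python sketch. It assumes samples normalized to the range -1.0 to 1.0, so that 0 dB is digital full scale; the function names and the labels it returns are hypothetical, chosen only for illustration.

```python
import math


def peak_dbfs(samples):
    """Peak level of float samples (-1.0..1.0) in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def classify_level(samples, low_db=-12.0, high_db=-6.0):
    """Compare the peak level against the -12 dB to -6 dB target range."""
    level = peak_dbfs(samples)
    if level >= 0.0:
        return "clipping"
    if level > high_db:
        return "too hot"
    if level < low_db:
        return "too quiet"
    return "ok"


print(classify_level([0.05, -0.35, 0.20]))  # peak 0.35 is about -9.1 dB: ok
print(classify_level([0.02, -0.10, 0.08]))  # peak 0.10 is -20 dB: too quiet
print(classify_level([0.50, -0.80, 0.30]))  # peak 0.80 is about -1.9 dB: too hot
```

The OBS Audio Mixer meters remain the simplest way to check levels while live; a script like this is mainly useful for reviewing a test recording after the fact.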
Deepgram Nova-2 is trained on diverse English accents including American, British, Australian, Indian, and others. Accent-related accuracy issues are generally minor with Nova-2 compared to older STT engines. If you have a strong regional accent and notice consistent misrecognition of specific sounds, speaking slightly more deliberately (not unnaturally) often helps more than any technical change.
Get your microphone closer to your mouth — ideally 3-6 inches. This single change has more impact on STT accuracy than any other adjustment. Audio quality is the primary driver of caption accuracy.
Yes. A noise gate prevents background noise during silent moments from being sent to StreamTranslate. Combined with the RNNoise suppression filter, this significantly improves accuracy in noisy gaming environments.
Game-specific proper nouns (character names, map names, item names) are sometimes misrecognized because they're not common English words. This is normal and unavoidable. Focus on overall audio quality for best results.
Aim for microphone peaks between -12 dB and -6 dB in the OBS Audio Mixer. This ensures StreamTranslate receives clear audio without clipping or being too quiet.
Keyboard sounds can bleed into the microphone and reduce accuracy, especially with condenser mics. Use a dynamic microphone (like the Shure SM7B) for better background noise rejection, or add foam padding under your keyboard.
In OBS Audio Settings, set your microphone as a separate input from your desktop audio (game audio). StreamTranslate's browser source should only receive microphone audio — game audio mixed in degrades STT accuracy.