Question 1

What is speech-to-text (STT)?

Accepted Answer

Speech-to-text (STT) is technology that converts spoken audio into written text. It's used in live captioning, voice assistants, dictation software, and real-time translation tools for streamers.

Question 2

What is the difference between STT and ASR?

Accepted Answer

STT (speech-to-text) and ASR (automatic speech recognition) are effectively synonyms for the same technology. STT is the more consumer-facing term; ASR is more common in technical and research contexts.

Question 3

Which STT engine does StreamTranslate use?

Accepted Answer

StreamTranslate uses Deepgram Nova-2, a real-time STT engine with strong accuracy on conversational and streaming audio.

Question 4

How accurate is STT for live streaming?

Accepted Answer

Modern STT engines like Deepgram Nova-2 can exceed 99% accuracy on clean audio. Gaming streams with background audio typically land between 92% and 97%, depending on microphone quality.
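
STT accuracy percentages are commonly derived from word error rate (WER): the word-level edit distance between a reference transcript and the engine's output, divided by the number of reference words, with accuracy taken as 1 minus WER. The sketch below assumes that is the metric behind the figures above. The function name and the sample sentences are illustrative only and are not part of StreamTranslate or Deepgram.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one misheard word out of seven.
    said = "push mid and take the tower now"
    heard = "push mid and take the power now"
    wer = word_error_rate(said, heard)
    print(f"WER: {wer:.3f}, accuracy: {1 - wer:.1%}")
```

Under this definition, a caption line with one wrong word in seven scores about 86% accuracy, which is why short, noisy utterances can pull a stream's average well below the clean-audio figure.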

Question 5

Is STT fast enough for live captions?

Accepted Answer

Yes. Streaming STT engines like Deepgram Nova-2 return transcribed text in 100-200ms, which is fast enough for captions to appear synchronized with speech on live streams.
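
One way to check a latency figure on your own setup is to time each audio chunk from the moment it is sent to the moment its transcript arrives, then summarize the samples. This is a minimal sketch, assuming you have already collected per-chunk latencies in milliseconds. The function name and the sample numbers are hypothetical and are not measurements from any engine.

```python
import math
from statistics import median


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize per-chunk transcription latencies (in milliseconds)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile: the smallest sample that is greater
    # than or equal to 95% of all samples.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "median_ms": median(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical samples: mostly within 100-200 ms, with one slow outlier.
    samples = [120.0, 150.0, 180.0, 140.0, 400.0]
    print(latency_summary(samples))
```

Looking at the 95th percentile alongside the median helps, because occasional slow chunks are what viewers notice as captions falling behind speech.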

What is STT (Speech-to-Text) for Streaming?

Speech-to-Text: Converting Audio to Words in Real Time

Why STT Quality Matters for Your Stream

Deepgram Nova-2

100-200ms Transcription

Custom Vocabulary
