📚 Documentation
Last updated: 2026-02-08

Model Usage Recommendations

TODO (optional new screenshot): Model selection matrix visual (App 2.0). Include a speed vs. quality vs. resource-cost comparison between the Realtime and Whisper model families. Suggested filename: model-selection-matrix-v2-en.png

Scope

This page provides model selection guidance only. It does not replace workflow-specific setup docs.

Use Cases

  • First-time model selection
  • Hardware refresh or downgrade planning
  • Team-level baseline presets for repeatable quality

Steps

  1. Identify workflow type: file, realtime, link, or global realtime.
  2. Evaluate hardware: CPU/GPU, RAM, storage, and platform.
  3. Pick an initial model from the matrix.
  4. Validate with your own sample data.
  5. Tune parameters only after baseline model fit is confirmed.
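Step 4 (validate with your own sample data) can be sketched as a quick word-error-rate check. This is an illustrative Python snippet, not part of the app: `transcribe` stands in for whatever transcription call your workflow uses, the 0.15 threshold is a placeholder, and the difflib-based rate is a rough proxy for a proper edit-distance WER.

```python
import difflib

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Rough WER proxy: share of reference words difflib fails to match."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matched = sum(
        block.size
        for block in difflib.SequenceMatcher(None, ref, hyp).get_matching_blocks()
    )
    return 1.0 - matched / max(len(ref), 1)

def passes_baseline(samples, transcribe, threshold=0.15):
    """samples: list of (audio_path, reference_text) pairs.
    transcribe: any callable mapping an audio path to transcribed text.
    True when the average WER stays at or under the threshold."""
    rates = [word_error_rate(ref, transcribe(path)) for path, ref in samples]
    return sum(rates) / len(rates) <= threshold
```

If the baseline model fails this check on your own audio, move to the next candidate before touching any tuning parameters (Step 5).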
| Workflow | Primary Recommendation | Alternative | Avoid |
| --- | --- | --- | --- |
| File transcription (general) | Small / Medium | Large-v3-Turbo | Large-v3 on low-end CPU |
| File transcription (accuracy-first) | Large-v3 / Large-v3-Turbo | Medium + tuning | High-latency devices without GPU |
| Realtime microphone | Realtime models (Sherpa) | Tiny / Base | Oversized models on weak hardware |
| Realtime app transcription | Realtime models (Sherpa) | Small (with GPU) | Heavy model under high load |
| Link-transcribe workflow | Small / Medium | Large-v3-Turbo | Fixed model without language checks |
| Global realtime (Beta) | Realtime models (Sherpa) | Tiny | Large persistent models |
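For scripting or documentation-as-code purposes, the matrix above can be encoded as a lookup table. A minimal Python sketch; the workflow keys and model identifiers are illustrative, not an official app API:

```python
# Hypothetical encoding of the recommendation matrix above; workflow keys
# and model identifiers are illustrative, not an official app API.
RECOMMENDATIONS = {
    "file_general":    {"primary": ["small", "medium"],            "alternative": "large-v3-turbo"},
    "file_accuracy":   {"primary": ["large-v3", "large-v3-turbo"], "alternative": "medium"},
    "realtime_mic":    {"primary": ["sherpa-realtime"],            "alternative": "tiny"},
    "realtime_app":    {"primary": ["sherpa-realtime"],            "alternative": "small"},
    "link_transcribe": {"primary": ["small", "medium"],            "alternative": "large-v3-turbo"},
    "global_realtime": {"primary": ["sherpa-realtime"],           "alternative": "tiny"},
}

def baseline_model(workflow: str, accuracy_first: bool = False) -> str:
    """Return the first-candidate (baseline) model for a workflow."""
    if workflow == "file":
        workflow = "file_accuracy" if accuracy_first else "file_general"
    return RECOMMENDATIONS[workflow]["primary"][0]
```

For example, `baseline_model("file")` yields the smallest primary candidate, which matches the guidance to start with Small/Medium and only escalate when validation fails.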

Term Explanations

  • Baseline model: first candidate model used for initial real-data validation.
  • Realtime-first: strategy prioritizing low latency and session stability.
  • Accuracy-first: strategy prioritizing transcription fidelity over speed/cost.

Real Scenario: Team-wide Baseline Policy

For mixed hardware fleets, a layered policy is often the most practical:

  1. Default baseline: Small/Medium for broad compatibility.
  2. Quality lane: Large-v3/Turbo for critical deliverables.
  3. Realtime lane: Sherpa family for low-latency sessions.

This reduces quality variance across teammates while keeping throughput predictable.
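One way such a layered policy could be expressed and resolved in code; this is a sketch under assumed lane names and model identifiers (not app settings), and it checks latency first because low latency is a hard constraint rather than a preference:

```python
# Hypothetical team policy; lane names and model identifiers are
# placeholders, not app settings.
POLICY = {
    "default":  {"model": "medium",          "use_when": "general file transcription"},
    "quality":  {"model": "large-v3",        "use_when": "critical deliverables"},
    "realtime": {"model": "sherpa-realtime", "use_when": "low-latency sessions"},
}

def pick_lane(needs_realtime: bool, is_critical: bool) -> str:
    """Resolve a job to a lane: latency is a hard constraint, so it wins first."""
    if needs_realtime:
        return "realtime"
    if is_critical:
        return "quality"
    return "default"

def model_for(needs_realtime: bool, is_critical: bool) -> str:
    return POLICY[pick_lane(needs_realtime, is_critical)]["model"]
```

Checking a policy file like this into version control is what makes the baseline "team-level": everyone resolves the same job type to the same model.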

Common Mistakes

  • Mistake 1: Relying on public benchmarks only.
    Fix: validate with your own production-like samples.
  • Mistake 2: Forcing one model for all workflows.
    Fix: at minimum, split policies by latency-first vs. accuracy-first goals.
  • Mistake 3: Skipping regression after upgrades.
    Fix: re-test core presets after each major version/model change.
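The regression re-test from Mistake 3 can be automated as a comparison of per-preset WER before and after an upgrade. A minimal sketch, assuming you have measured WER on the same sample set both times; the 0.02 tolerance is a placeholder, not an app default:

```python
def regression_report(baseline_wer, new_wer, tolerance=0.02):
    """Return presets whose WER worsened beyond `tolerance` after an upgrade.

    Both inputs map preset name -> WER measured on the same sample set.
    A preset missing from `new_wer` is treated as fully regressed.
    """
    return sorted(
        preset
        for preset, old in baseline_wer.items()
        if new_wer.get(preset, 1.0) - old > tolerance
    )
```

An empty report means the upgrade is safe to roll out; any flagged preset should be re-validated before teammates update.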

FAQ

Q: Why start with Small/Medium?
A: They usually offer the best balance of quality, speed, and hardware cost.

Q: When should I move to Large-v3?
A: When quality requirements are high and hardware budget is sufficient.

Q: Are realtime models suitable for all languages?
A: Coverage varies by model set. Validate with your target language samples.

Limitations

  • Model availability depends on app version and account entitlements.
  • Realtime workflows prioritize latency and stability over raw benchmark quality.
  • Final selection must be validated with your real production audio.
  • Platform: The recommendation logic applies to both Windows and macOS, but the available acceleration backends and practical performance ceilings differ.
  • Release status: Dictation is still in rollout preparation and excluded from public workflow guidance.

Copyright © 2026. Made by AudioNote, All rights reserved.