Model Usage Recommendations
TODO (Optional New Screenshot): Model selection matrix visual (App 2.0). Include a speed vs quality vs resource-cost comparison between the Realtime and Whisper model families. Suggested filename: model-selection-matrix-v2-en.png
Scope
This page provides model selection guidance only. It does not replace workflow-specific setup docs.
Use Cases
- First-time model selection
- Hardware refresh or downgrade planning
- Team-level baseline presets for repeatable quality
Steps
- Identify workflow type: file, realtime, link, or global realtime.
- Evaluate hardware: CPU/GPU, RAM, storage, and platform.
- Pick an initial model from the matrix.
- Validate with your own sample data (a minimal validation sketch follows these steps).
- Tune parameters only after baseline model fit is confirmed.
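A quick way to run the validation step is a small script that transcribes a handful of production-like samples and reports word error rate (WER). The sketch below is illustrative only: it assumes the open-source openai-whisper and jiwer packages, and the sample paths and model names are placeholders for your own data.

```python
# Minimal baseline-validation sketch. Assumes the open-source
# openai-whisper and jiwer packages; all paths are placeholders.
import whisper
from jiwer import wer

# Hypothetical production-like samples: (audio, reference transcript).
SAMPLES = [
    ("samples/meeting_01.wav", "samples/meeting_01.txt"),
    ("samples/call_02.wav", "samples/call_02.txt"),
]

def average_wer(model_name: str) -> float:
    """Transcribe each sample with one model and average the WER."""
    model = whisper.load_model(model_name)
    errors = []
    for audio_path, ref_path in SAMPLES:
        result = model.transcribe(audio_path)
        with open(ref_path, encoding="utf-8") as f:
            reference = f.read()
        errors.append(wer(reference, result["text"]))
    return sum(errors) / len(errors)

if __name__ == "__main__":
    # Compare the recommended baseline candidates on your own data.
    for candidate in ("small", "medium"):
        print(f"{candidate}: WER={average_wer(candidate):.3f}")
```

The lowest WER on your own samples, not a public leaderboard, decides the baseline.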
| Workflow | Primary Recommendation | Alternative | Avoid |
|---|---|---|---|
| File transcription (general) | Small / Medium | Large-v3-Turbo | Large-v3 on low-end CPU |
| File transcription (accuracy-first) | Large-v3 / Large-v3-Turbo | Medium + tuning | High-latency devices without GPU |
| Realtime microphone | Realtime models (Sherpa) | Tiny / Base | Oversized models on weak hardware |
| Realtime app transcription | Realtime models (Sherpa) | Small (with GPU) | Heavy model under high load |
| Link-transcribe workflow | Small / Medium | Large-v3-Turbo | Fixed model without language checks |
| Global realtime (Beta) | Realtime models (Sherpa) | Tiny | Large persistent models |
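If the matrix needs to drive tooling rather than live only on this page, it can be expressed as a plain lookup. The sketch below mirrors the table; the workflow keys and the pick_model helper are illustrative assumptions, not an App 2.0 API.

```python
# Illustrative encoding of the selection matrix above. The workflow
# keys and helper are hypothetical; model names mirror the table.
RECOMMENDATIONS = {
    "file_general":        (["small", "medium"], ["large-v3-turbo"]),
    "file_accuracy_first": (["large-v3", "large-v3-turbo"], ["medium"]),
    "realtime_microphone": (["sherpa-realtime"], ["tiny", "base"]),
    "realtime_app":        (["sherpa-realtime"], ["small"]),
    "link_transcribe":     (["small", "medium"], ["large-v3-turbo"]),
    "global_realtime":     (["sherpa-realtime"], ["tiny"]),
}

def pick_model(workflow: str, prefer_alternative: bool = False) -> str:
    """Return the first primary (or alternative) candidate for a workflow."""
    primary, alternative = RECOMMENDATIONS[workflow]
    return alternative[0] if prefer_alternative else primary[0]

print(pick_model("file_general"))                                   # small
print(pick_model("realtime_microphone", prefer_alternative=True))   # tiny
```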
Term Explanations
- Baseline model: first candidate model used for initial real-data validation.
- Realtime-first: strategy prioritizing low latency and session stability.
- Accuracy-first: strategy prioritizing transcription fidelity over speed/cost.
Real Scenario: Team-wide Baseline Policy
For mixed hardware fleets, a layered policy is often the most practical:
- Default baseline: Small/Medium for broad compatibility.
- Quality lane: Large-v3/Turbo for critical deliverables.
- Realtime lane: Sherpa family for low-latency sessions.
This reduces quality variance across teammates while keeping throughput predictable.
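As a rough illustration of the layered policy, the three lanes could be captured in a shared preset map that routes jobs by intent. Everything below (the TEAM_POLICY schema and the lane_for routing rules) is a hypothetical sketch, not an actual app configuration format.

```python
# Hypothetical team policy: three lanes plus routing rules. This is a
# sketch of the idea, not an actual app configuration format.
TEAM_POLICY = {
    "default":  "small",            # broad compatibility baseline
    "quality":  "large-v3-turbo",   # critical deliverables
    "realtime": "sherpa-realtime",  # low-latency sessions
}

def lane_for(job: dict) -> str:
    """Route a job to a lane; the flags are illustrative examples."""
    if job.get("live"):
        return "realtime"
    if job.get("deliverable"):
        return "quality"
    return "default"

job = {"live": False, "deliverable": True}
print(TEAM_POLICY[lane_for(job)])  # large-v3-turbo
```

Keeping the policy in one shared file means a lane change propagates to every teammate instead of being renegotiated per machine.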
Common Mistakes
- Mistake 1: Relying on public benchmarks only. Fix: validate with your own production-like samples.
- Mistake 2: Forcing one model onto all workflows. Fix: split policies at least along latency-first vs accuracy-first lines.
- Mistake 3: Skipping regression testing after upgrades. Fix: re-test core presets after each major version or model change (a minimal regression check follows this list).
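The regression re-test in Mistake 3 is straightforward to automate: store a baseline WER per core preset and flag any preset that drifts past a tolerance after an upgrade. The baseline file name and tolerance below are assumptions; the current scores would come from re-running a validation script such as the one under Steps.

```python
# Hypothetical regression gate: flag presets whose WER worsened by
# more than an assumed tolerance after an upgrade.
import json

TOLERANCE = 0.02  # assumed acceptable absolute WER drift

def regressed_presets(baseline_file: str, current: dict) -> list:
    """Return presets whose current WER exceeds baseline + TOLERANCE."""
    with open(baseline_file, encoding="utf-8") as f:
        baseline = json.load(f)  # e.g. {"small": 0.11, "medium": 0.09}
    return [
        preset
        for preset, score in current.items()
        if score > baseline.get(preset, float("inf")) + TOLERANCE
    ]

# Current scores would come from re-running your validation script.
bad = regressed_presets("baseline_wer.json", {"small": 0.15, "medium": 0.09})
if bad:
    print("Regression detected in:", bad)
```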
FAQ
Q: Why start with Small/Medium?
A: They usually offer the best balance of quality, speed, and hardware cost.
Q: When should I move to Large-v3?
A: When quality requirements are high and hardware budget is sufficient.
Q: Are realtime models suitable for all languages?
A: Coverage varies by model set. Validate with your target language samples.
Limitations
- Model availability depends on app version and account entitlements.
- Realtime workflows prioritize latency and stability over raw benchmark quality.
- Final selection must be validated with your real production audio.
- Platform: Recommendation logic applies to both Windows and macOS, but the available acceleration backends and practical performance ceilings differ.
- Release status: Dictation is still in rollout preparation and excluded from public workflow guidance.