Concepts

TODO (Screenshot Replacement): Model settings page (App 2.0). Include: model family tabs (Whisper/Realtime), download status, default model selector, and GPU engine entry. Suggested filename: settings-models-v2-en.png
Scope
Audio Note model capabilities are organized into three groups:
- Official Whisper models
- Community models
- Realtime models
Model selection controls recognition quality and speed, but not business workflows such as link download or watch scheduling.
Use Cases
- Low-end devices: realtime models or Whisper Tiny/Base
- Balanced quality and speed: Whisper Small/Medium
- Maximum quality: Whisper Large-v3 / Large-v3-Turbo
- English-only throughput: `.en` and Distil English variants
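The mapping above can be captured in a small helper. This is an illustrative sketch only: the function name, workload keys, and model identifiers are assumptions, not part of any Audio Note API.

```python
# Hypothetical workload-to-model lookup mirroring the use cases above.
# Keys and model names are illustrative, not an app or library API.
SUGGESTED_MODELS = {
    "low_end_device": ["realtime", "tiny", "base"],
    "balanced": ["small", "medium"],
    "max_quality": ["large-v3", "large-v3-turbo"],
    "english_only": ["small.en", "distil-small.en"],
}

def suggest_models(workload: str) -> list[str]:
    """Return candidate models for a workload, best match first."""
    try:
        return SUGGESTED_MODELS[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}")
```

A lookup like this is handy for documenting team defaults in code review rather than in chat threads.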
Steps
- Open `Settings > Transcription`.
- Choose a model family for your workload.
- Download models to a stable local path.
- Run a baseline sample in your real scenario.
- Tune model size or advanced parameters based on quality/latency results.
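For the baseline step, a simple timing harness makes latency comparisons repeatable. The sketch below is engine-agnostic: `transcribe` is any callable you supply (for example, a wrapper around your Whisper binding); nothing here is an Audio Note API.

```python
import time

def benchmark(transcribe, audio_path: str, runs: int = 3) -> float:
    """Average wall-clock latency (seconds) of a transcription callable.

    `transcribe` is any function taking an audio path and returning text;
    plug in whichever engine you are evaluating.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

Run it once per candidate model on the same sample file, then compare the averages alongside output quality.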
Official Whisper models (current built-ins)
- Tiny / Tiny English
- Base / Base English
- Small / Small English
- Medium / Medium English
- Large-v2
- Large-v3
- Large-v3-Turbo
Community models (current built-ins)
- Distil Small English
- Distil Medium English
- Distil Large V2 English
- Distil Large V3
Realtime models (current built-ins)
- Sherpa ncnn: Chinese-English / Chinese / English / French
- Sherpa ONNX: Chinese-English / Chinese / English / French / Russian / Korean / Japanese
Built-in model lists may change by version.
Term Explanations
- Tiny / Base / Small / Medium / Large: model size tiers; larger tiers usually improve quality but increase latency/cost.
- Turbo: an optimized variant balancing speed and quality for larger models.
- `.en` variants: English-optimized models with a better speed/quality tradeoff for English-only workloads.
- Community models: community-trained variants; validate with your own samples before production use.
See also: Model Usage Recommendations.
Real Scenario: From “Works Once” to “Works Every Day”
A common rollout mistake is selecting a large model for everyone on day one, then hitting long wait times and low adoption. A more practical path is:
- Start with `Small`/`Medium` as a team baseline.
- Use `Large-v3`/`Turbo` only for quality-critical outputs.
- Document model presets so teammates can reproduce results.
This keeps daily throughput stable while still preserving a high-accuracy lane for important assets.
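One lightweight way to document presets is as data checked into the team repo. The preset names and parameters below (including `beam_size`) are illustrative assumptions, not settings defined by the app.

```python
# Illustrative team presets: a daily baseline plus an escalation lane
# for quality-critical work. Values are examples, not app defaults.
PRESETS = {
    "daily-baseline": {"model": "small", "beam_size": 1},
    "quality-critical": {"model": "large-v3-turbo", "beam_size": 5},
}

def preset(name: str) -> dict:
    """Return a copy of a named preset so callers cannot mutate it."""
    return dict(PRESETS[name])
```

Keeping presets as reviewable data means a settings change is a diff, not tribal knowledge.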
FAQ
Q: Are realtime models always faster than Whisper?
A: Usually in low-latency scenarios, but actual speed still depends on hardware and runtime settings.
Q: Are community models always more accurate?
A: Not always. They are often optimized for specific languages/domains and must be validated with your data.
Q: Is larger always better?
A: Larger models often improve quality but significantly increase compute and memory cost.
Common Mistakes
- Mistake 1: Choosing by model size only.
Fix: choose by workflow first (realtime vs. offline accuracy), then tune size.
- Mistake 2: Applying one heavy model to every machine.
Fix: keep a baseline + escalation strategy to balance reliability and quality.
- Mistake 3: Skipping real-sample validation.
Fix: run A/B checks with your own audio before changing team defaults.
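A minimal sketch of such an A/B check: a plain word error rate (WER) over whitespace-split tokens. The function name and tokenization are illustrative; for production validation, a maintained metrics library is a better fit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER via Levenshtein edit distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

Score each candidate model's transcript against a hand-corrected reference of your own audio before changing team defaults.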
Limitations
- Status: Stable (non-Beta), with potential model-list updates in future releases.
- Availability depends on app version and subscription entitlements.
- Platform-specific acceleration differs: CUDA/Vulkan on Windows, CoreML on macOS.
- Large models require substantial disk, memory, and often GPU resources.