📚 Documentation
Last updated: 2026-02-08

Model Usage Recommendations

TODO (optional new screenshot): Model selection matrix visual (App 2.0). Include a speed vs. quality vs. resource-cost comparison between the Realtime and Whisper model families. Suggested filename: model-selection-matrix-v2-en.png

Scope

This page provides model selection guidance only. It does not replace workflow-specific setup docs.

Use Cases

  • First-time model selection
  • Hardware refresh or downgrade planning
  • Team-level baseline presets for repeatable quality

Steps

  1. Identify workflow type: file, realtime, link, or global realtime.
  2. Evaluate hardware: CPU/GPU, RAM, storage, and platform.
  3. Pick an initial model from the matrix.
  4. Validate with your own sample data.
  5. Tune parameters only after baseline model fit is confirmed.
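Step 4 (validate with your own sample data) can be sketched as a quick word-error-rate check. This is an illustrative Python snippet, not part of the app: `transcribe` stands in for whatever transcription call your workflow uses, the 0.15 threshold is a placeholder, and the difflib-based rate is a rough proxy for a proper edit-distance WER.

```python
import difflib

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Rough WER proxy: share of reference words difflib fails to match."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matched = sum(
        block.size
        for block in difflib.SequenceMatcher(None, ref, hyp).get_matching_blocks()
    )
    return 1.0 - matched / max(len(ref), 1)

def passes_baseline(samples, transcribe, threshold=0.15):
    """samples: list of (audio_path, reference_text) pairs.
    transcribe: any callable mapping an audio path to transcribed text.
    True when the average WER stays at or under the threshold."""
    rates = [word_error_rate(ref, transcribe(path)) for path, ref in samples]
    return sum(rates) / len(rates) <= threshold
```

If the baseline model fails this check on your own audio, move to the next candidate before touching any tuning parameters (Step 5).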
| Workflow | Primary Recommendation | Alternative | Avoid |
| --- | --- | --- | --- |
| File transcription (general) | Small / Medium | Large-v3-Turbo | Large-v3 on low-end CPU |
| File transcription (accuracy-first) | Large-v3 / Large-v3-Turbo | Medium + tuning | High-latency devices without GPU |
| Realtime microphone | Realtime models (Sherpa) | Tiny / Base | Oversized models on weak hardware |
| Realtime app transcription | Realtime models (Sherpa) | Small (with GPU) | Heavy model under high load |
| Link-transcribe workflow | Small / Medium | Large-v3-Turbo | Fixed model without language checks |
| Global realtime (Beta) | Realtime models (Sherpa) | Tiny | Large persistent models |
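For scripting or documentation-as-code purposes, the matrix above can be encoded as a lookup table. A minimal Python sketch; the workflow keys and model identifiers are illustrative, not an official app API:

```python
# Hypothetical encoding of the recommendation matrix above; workflow keys
# and model identifiers are illustrative, not an official app API.
RECOMMENDATIONS = {
    "file_general":    {"primary": ["small", "medium"],            "alternative": "large-v3-turbo"},
    "file_accuracy":   {"primary": ["large-v3", "large-v3-turbo"], "alternative": "medium"},
    "realtime_mic":    {"primary": ["sherpa-realtime"],            "alternative": "tiny"},
    "realtime_app":    {"primary": ["sherpa-realtime"],            "alternative": "small"},
    "link_transcribe": {"primary": ["small", "medium"],            "alternative": "large-v3-turbo"},
    "global_realtime": {"primary": ["sherpa-realtime"],           "alternative": "tiny"},
}

def baseline_model(workflow: str, accuracy_first: bool = False) -> str:
    """Return the first-candidate (baseline) model for a workflow."""
    if workflow == "file":
        workflow = "file_accuracy" if accuracy_first else "file_general"
    return RECOMMENDATIONS[workflow]["primary"][0]
```

For example, `baseline_model("file")` yields the smallest primary candidate, which matches the guidance to start with Small/Medium and only escalate when validation fails.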

Term Explanations

  • Baseline model: first candidate model used for initial real-data validation.
  • Realtime-first: strategy prioritizing low latency and session stability.
  • Accuracy-first: strategy prioritizing transcription fidelity over speed/cost.

Real Scenario: Team-wide Baseline Policy

For mixed hardware fleets, a layered policy is often the most practical:

  1. Default baseline: Small/Medium for broad compatibility.
  2. Quality lane: Large-v3/Turbo for critical deliverables.
  3. Realtime lane: Sherpa family for low-latency sessions.

This reduces quality variance across teammates while keeping throughput predictable.
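One way such a layered policy could be expressed and resolved in code; this is a sketch under assumed lane names and model identifiers (not app settings), and it checks latency first because low latency is a hard constraint rather than a preference:

```python
# Hypothetical team policy; lane names and model identifiers are
# placeholders, not app settings.
POLICY = {
    "default":  {"model": "medium",          "use_when": "general file transcription"},
    "quality":  {"model": "large-v3",        "use_when": "critical deliverables"},
    "realtime": {"model": "sherpa-realtime", "use_when": "low-latency sessions"},
}

def pick_lane(needs_realtime: bool, is_critical: bool) -> str:
    """Resolve a job to a lane: latency is a hard constraint, so it wins first."""
    if needs_realtime:
        return "realtime"
    if is_critical:
        return "quality"
    return "default"

def model_for(needs_realtime: bool, is_critical: bool) -> str:
    return POLICY[pick_lane(needs_realtime, is_critical)]["model"]
```

Checking a policy file like this into version control is what makes the baseline "team-level": everyone resolves the same job type to the same model.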

Common Mistakes

  • Mistake 1: Relying on public benchmarks only.
    Fix: validate with your own production-like samples.
  • Mistake 2: Forcing one model for all workflows.
    Fix: at minimum, split policies by latency-first vs. accuracy-first goals.
  • Mistake 3: Skipping regression after upgrades.
    Fix: re-test core presets after each major version/model change.
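The regression re-test from Mistake 3 can be automated as a comparison of per-preset WER before and after an upgrade. A minimal sketch, assuming you have measured WER on the same sample set both times; the 0.02 tolerance is a placeholder, not an app default:

```python
def regression_report(baseline_wer, new_wer, tolerance=0.02):
    """Return presets whose WER worsened beyond `tolerance` after an upgrade.

    Both inputs map preset name -> WER measured on the same sample set.
    A preset missing from `new_wer` is treated as fully regressed.
    """
    return sorted(
        preset
        for preset, old in baseline_wer.items()
        if new_wer.get(preset, 1.0) - old > tolerance
    )
```

An empty report means the upgrade is safe to roll out; any flagged preset should be re-validated before teammates update.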

FAQ

Q: Why start with Small/Medium?
A: They usually offer the best balance of quality, speed, and hardware cost.

Q: When should I move to Large-v3?
A: When quality requirements are high and hardware budget is sufficient.

Q: Are realtime models suitable for all languages?
A: Coverage varies by model set. Validate with your target language samples.

Limitations

  • Model availability depends on app version and account entitlements.
  • Realtime workflows prioritize latency and stability over raw benchmark quality.
  • Final selection must be validated with your real production audio.
  • Platform: The recommendation logic applies to both Windows and macOS, but the available acceleration backends and practical performance ceilings differ.
  • Release status: Dictation is still in rollout preparation and excluded from public workflow guidance.

Copyright © 2026. Made by AudioNote, All rights reserved.