Understand how Whisper, realtime models, and transcription scenarios relate inside Audio Note, without mixing user-facing model choices with internal runtime implementation.
[Screenshot: Transcription settings overview]
What This Page Solves
In Audio Note, the most important distinction is not “which button starts transcription.” It is the difference between scenario and model route:
- File Transcription, Realtime Microphone, and Realtime App are workflow scenarios.
- Whisper and realtime models are model routes.
- MLEngine is an internal runtime for non-Whisper models, not a third user-facing model category.
Mixing these layers usually creates two false assumptions:
- “realtime transcription means realtime models only”
- “MLEngine is another model family users must learn separately”
What Whisper and Realtime Models Are Good At
| Route | Better fit | Typical characteristics |
|---|---|---|
| Whisper | file workflows, long-form audio, quality-first review | higher quality ceiling, wider range of model size tiers, can also be viable in realtime on strong GPUs |
| Realtime models | microphone/app live workflows, captions, lower-latency capture | better RTF, lower latency, usually no GPU dependency, friendlier to lower-power devices |
“Realtime models” is Audio Note’s umbrella term for models that are better suited to realtime workflows. In practice, they usually have a better (lower) RTF (Real-Time Factor, the ratio of processing time to audio duration) and lower latency.
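To make RTF concrete, here is a minimal sketch of how it is commonly computed: processing time divided by audio duration, so values below 1 mean the model runs faster than the audio plays. The function names and the 0.8 headroom value are illustrative assumptions, not Audio Note settings or APIs.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = processing time / audio duration (lower is better)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(rtf: float, headroom: float = 0.8) -> bool:
    """Return True when audio is processed comfortably faster than it arrives.

    The 0.8 headroom is an assumed safety margin for illustration only;
    it is not a threshold documented by Audio Note.
    """
    return rtf < headroom


# Example: 60 seconds of audio transcribed in 15 seconds.
rtf = real_time_factor(processing_seconds=15.0, audio_seconds=60.0)
print(rtf, keeps_up_with_live_audio(rtf))
```

An RTF at or above 1 means transcription falls behind live audio, which is why sustained RTF matters more in realtime scenarios than in file workflows.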
Does Realtime Transcription Mean Realtime Models Only
No.
Realtime is a scenario, not a model family. In realtime scenarios you can choose:
- Realtime models: Better defaults for low latency, lower-power devices, and long live sessions.
- Whisper: Still a valid option in realtime workflows when GPU performance is strong enough to sustain good RTF.
That is why Audio Note documents realtime transcription as a workflow, not as a synonym for realtime models.
Recommended Decision Order
Choose in this order:
- decide whether the workflow is file-based or realtime
- decide whether the goal is accuracy or lower latency
- then pick Whisper or realtime models based on device constraints
| Scenario | First choice | When to test the other route |
|---|---|---|
| long-form meetings, courses, archive review | Whisper | lower-end hardware or draft-first workflows |
| microphone captions, spoken drafting | Realtime model | strong GPU and quality-sensitive realtime workflows |
| app audio live capture | Realtime model | when GPU headroom is strong and Whisper performs well in sample tests |
| lower-power devices | Realtime model or lighter Whisper tiers | move to a heavier model only when sample validation proves it is worth it |
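The decision order and table above can be sketched as a small function. This is only an illustration of the reasoning: the `Scenario`, `Route`, and `Device` types and the `first_choice` function are hypothetical names, not part of Audio Note, and the result is a starting point to confirm with sample tests rather than a final answer.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    FILE = "file"
    REALTIME_MICROPHONE = "realtime_microphone"
    REALTIME_APP = "realtime_app"


class Route(Enum):
    WHISPER = "whisper"
    REALTIME_MODEL = "realtime_model"


@dataclass(frozen=True)
class Device:
    """Hypothetical device description used only for this sketch."""
    strong_gpu: bool
    low_power: bool


def first_choice(scenario: Scenario, prefer_accuracy: bool, device: Device) -> Route:
    """Suggest a first route to test, following the documented decision order."""
    # Step 1: is the workflow file-based or realtime?
    if scenario is Scenario.FILE:
        # Steps 2 and 3: Whisper is the default for files; draft-first work
        # on lower-end hardware is the case for testing a realtime model.
        if device.low_power and not prefer_accuracy:
            return Route.REALTIME_MODEL
        return Route.WHISPER

    # Realtime scenarios: realtime models are the default; Whisper is worth
    # testing when quality matters and the GPU has headroom.
    if prefer_accuracy and device.strong_gpu:
        return Route.WHISPER
    return Route.REALTIME_MODEL


laptop = Device(strong_gpu=False, low_power=True)
print(first_choice(Scenario.REALTIME_MICROPHONE, prefer_accuracy=False, device=laptop))
```

Note that the scenario is checked first and the device last, mirroring the order above: hardware narrows the choice but does not define the workflow.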
Why You May Still See MLEngine
User-facing documentation should focus on Whisper and realtime models.
MLEngine appears only as internal implementation context:
- it is the runtime used for non-Whisper inference
- current realtime models are carried by it
- it is not an extra model category users need to choose from
The practical distinction is simple: users choose model routes; the app chooses the runtime implementation.
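One way to picture this split is a mapping from the user-facing route to an internal runtime. The sketch below is an assumption about structure, not Audio Note's actual code: `Route`, `runtime_for`, and the `"whisper-runtime"` label are hypothetical, and only the name MLEngine comes from this page.

```python
from enum import Enum


class Route(Enum):
    """The only two choices exposed to users."""
    WHISPER = "whisper"
    REALTIME_MODEL = "realtime_model"


# Hypothetical label: this page does not name the runtime that serves Whisper.
WHISPER_RUNTIME = "whisper-runtime"
# MLEngine is the internal runtime for non-Whisper inference.
ML_ENGINE = "MLEngine"


def runtime_for(route: Route) -> str:
    """Resolve the internal runtime for a user-selected route."""
    if route is Route.WHISPER:
        return WHISPER_RUNTIME
    return ML_ENGINE


# What a settings screen would list: routes only, never runtimes.
USER_FACING_CHOICES = [route.value for route in Route]

print(USER_FACING_CHOICES)
print(runtime_for(Route.REALTIME_MODEL))
```

Because the runtime is resolved from the route, it never needs to appear as a separate option in the settings users see.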
Common Mistakes
- Treating realtime scenarios as realtime-model-only: Realtime models are often the better default, but Whisper can absolutely be the better answer on strong GPUs.
- Treating MLEngine as a third user-facing model class: It is an implementation detail, not a workflow concept.
- Choosing by model size only: Realtime live capture and long-form review optimize for different goals.
Read Next
- Device- and scenario-based model selection: Model Usage Recommendations
- GPU decisions and tradeoffs: GPU Transcription
- File-first workflows: File Transcription
- Realtime live capture workflows: Realtime Microphone Transcription
- Parameter tuning after route selection: Advanced Parameter Transcription