📚 Documentation

Understand how Whisper, realtime models, and transcription scenarios relate inside Audio Note, without mixing user-facing model choices with internal runtime implementation.

[Screenshot: Transcription settings overview]

What This Page Solves

In Audio Note, the most important distinction is not “which button starts transcription.” It is the difference between scenario and model route:

  • File Transcription, Realtime Microphone, and Realtime App are workflow scenarios.
  • Whisper and realtime models are model routes.
  • MLEngine is an internal runtime for non-Whisper models, not a third user-facing model category.

Mixing these layers usually creates two false assumptions:

  • “realtime transcription means realtime models only”
  • “MLEngine is another model family users must learn separately”

What Whisper and Realtime Models Are Good At

Route | Better fit | Typical characteristics
Whisper | file workflows, long-form audio, quality-first review | higher quality ceiling, richer size tiers, can also be viable in realtime on strong GPUs
Realtime models | microphone/app live workflows, captions, lower-latency capture | better RTF, lower latency, usually no GPU dependency, friendlier to lower-power devices

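“Realtime models” is Audio Note’s umbrella term for models that are better suited to realtime workflows. In practice, they usually have a lower RTF (Real-Time Factor, the ratio of processing time to audio duration, where values below 1 mean transcription runs faster than the audio plays) and lower latency.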
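
To make the RTF idea concrete, here is a minimal sketch of how the ratio could be computed from a sample run. The function names and the 0.8 headroom threshold are illustrative assumptions, not Audio Note settings or APIs.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is better)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(rtf: float, headroom: float = 0.8) -> bool:
    """Treat a route as sustainable for live capture when RTF stays under a margin.

    The 0.8 default is an assumed safety margin for illustration only.
    """
    return rtf < headroom


# Example: a model that needs 12 s of compute for a 60 s sample clip.
sample_rtf = real_time_factor(12.0, 60.0)
print(sample_rtf, keeps_up_with_live_audio(sample_rtf))
```

An RTF at or above 1 means the model falls behind live audio, which is why RTF matters more in realtime scenarios than in file workflows.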

Does Realtime Transcription Mean Realtime Models Only?

No.

Realtime is a scenario, not a model family. In realtime scenarios you can choose:

  • Realtime models: Better defaults for low latency, lower-power devices, and long live sessions.
  • Whisper: Still a valid option in realtime workflows when GPU performance is strong enough to sustain good RTF.

That is why Audio Note documents realtime transcription as a workflow, not as a synonym for realtime models.

Choose in this order:

  1. decide whether the workflow is file-based or realtime
  2. decide whether the goal is accuracy or lower latency
  3. then pick Whisper or realtime models based on device constraints

Scenario | First choice | When to test the other route
long-form meetings, courses, archive review | Whisper | lower-end hardware or draft-first workflows
microphone captions, spoken drafting | Realtime models | strong GPU and quality-sensitive realtime workflows
app audio live capture | Realtime models | when GPU headroom is strong and Whisper performs well in sample tests
lower-power devices | Realtime models or lighter Whisper tiers | only upgrade when sample validation proves it is worth it
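
The selection order and the scenario table can be sketched as a small decision function. This is a simplified illustration under stated assumptions: the `Scenario`, `Route`, `Suggestion`, and `suggest_route` names are hypothetical, and the boolean hardware flags stand in for the sample testing the table recommends.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    FILE = "File Transcription"
    REALTIME_MICROPHONE = "Realtime Microphone"
    REALTIME_APP = "Realtime App"


class Route(Enum):
    WHISPER = "Whisper"
    REALTIME_MODELS = "Realtime models"


@dataclass(frozen=True)
class Suggestion:
    first_choice: Route
    test_other_route: bool  # True when the other route deserves a sample test


def suggest_route(scenario: Scenario, strong_gpu: bool, low_power: bool) -> Suggestion:
    """Pick a starting route from the scenario first, then device constraints."""
    if scenario is Scenario.FILE:
        # File workflows are quality-first; lower-end hardware is the
        # signal to also sample a realtime model.
        return Suggestion(Route.WHISPER, test_other_route=low_power)
    # Realtime scenarios default to realtime models; Whisper is worth
    # sampling only when GPU headroom is strong.
    return Suggestion(
        Route.REALTIME_MODELS,
        test_other_route=strong_gpu and not low_power,
    )


print(suggest_route(Scenario.REALTIME_APP, strong_gpu=True, low_power=False))
```

The function returns only a starting point. The table's advice to validate with sample audio still applies before switching routes.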

Why You May Still See MLEngine

User-facing documentation should focus on Whisper and realtime models.

MLEngine appears only as internal implementation context:

  • it is the runtime used for non-Whisper inference
  • current realtime models run on it
  • it is not an extra model category users need to choose from

The practical distinction is simple: users choose the model route; the app chooses the runtime implementation.
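
The separation between route and runtime can be pictured with a short sketch. The enum and function names here are hypothetical, and the Whisper runtime label is a placeholder because this page does not name it. Only the mapping of non-Whisper routes to MLEngine comes from the text above.

```python
from enum import Enum


class Route(Enum):
    """What the user picks in settings."""
    WHISPER = "Whisper"
    REALTIME_MODELS = "Realtime models"


class Runtime(Enum):
    """What the app picks internally; never shown as a model category."""
    WHISPER_RUNTIME = "Whisper runtime (placeholder name)"
    ML_ENGINE = "MLEngine"


def runtime_for(route: Route) -> Runtime:
    """Resolve the internal runtime from the user-facing route."""
    if route is Route.WHISPER:
        return Runtime.WHISPER_RUNTIME
    # Non-Whisper inference, including current realtime models, uses MLEngine.
    return Runtime.ML_ENGINE


for route in Route:
    print(route.value, "->", runtime_for(route).value)
```

Note that there are two routes and no third `Route` member for MLEngine, which mirrors the point that it is not something users choose.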

Common Mistakes

  • Treating realtime scenarios as realtime-model-only: Realtime models are often the better default, but Whisper can absolutely be the better answer on strong GPUs.
  • Treating MLEngine as a third user-facing model class: It is an implementation detail, not a workflow concept.
  • Choosing by model size only: Realtime live capture and long-form review optimize for different goals.

Copyright © 2026. Made by AudioNote, All rights reserved.