📚 Documentation
Last updated: 2026-02-08

Concepts

TODO (screenshot replacement): Model settings page (App 2.0). Include: model family tabs (Whisper/Realtime), download status, the default model selector, and the GPU engine entry. Suggested filename: settings-models-v2-en.png

Scope

Audio Note model capabilities are organized into three groups:

  • Official Whisper models
  • Community models
  • Realtime models

Model selection controls recognition quality and speed; it does not affect app workflows such as link downloads or watch scheduling.

Use Cases

  • Low-end devices: realtime models or Whisper Tiny/Base
  • Balanced quality and speed: Whisper Small/Medium
  • Maximum quality: Whisper Large-v3 / Large-v3-Turbo
  • English-only throughput: .en and Distil English variants
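The mapping above can be sketched as a small helper. This is an illustrative sketch only, not part of the Audio Note app; the profile names and lowercase model identifiers are assumptions for the example.

```python
def suggest_model(profile: str, english_only: bool = False) -> str:
    """Suggest a model tier for a workload profile (illustrative names)."""
    presets = {
        "low_end": "tiny",          # or a realtime (Sherpa) model
        "balanced": "small",        # Small/Medium territory
        "max_quality": "large-v3",  # or large-v3-turbo
    }
    if profile not in presets:
        raise ValueError(f"unknown profile: {profile}")
    model = presets[profile]
    # .en variants trade multilingual coverage for English-only speed/quality
    if english_only and model.startswith(("tiny", "base", "small", "medium")):
        model += ".en"
    return model
```

For example, `suggest_model("balanced", english_only=True)` returns `"small.en"`, matching the English-only throughput recommendation above.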

Steps

  1. Open Settings > Transcription.
  2. Choose a model family for your workload.
  3. Download models to a stable local path.
  4. Run a baseline sample in your real scenario.
  5. Tune model size or advanced parameters based on quality/latency results.
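For steps 4-5, a useful baseline metric is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. A minimal sketch, assuming you can time your own transcription call and know the clip length (the `transcribe` call in the comment is hypothetical):

```python
import time

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; < 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds

# Example usage with a hypothetical transcribe() call:
#   start = time.perf_counter()
#   transcribe("meeting.wav")  # your engine's call goes here
#   rtf = real_time_factor(time.perf_counter() - start, audio_seconds=600.0)
```

Comparing RTF across model sizes on the same sample makes the quality/latency trade-off in step 5 concrete.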

Official Whisper models (current built-ins)

  • Tiny / Tiny English
  • Base / Base English
  • Small / Small English
  • Medium / Medium English
  • Large-v2
  • Large-v3
  • Large-v3-Turbo

Community models (current built-ins)

  • Distil Small English
  • Distil Medium English
  • Distil Large V2 English
  • Distil Large V3

Realtime models (current built-ins)

  • Sherpa ncnn: Chinese-English / Chinese / English / French
  • Sherpa ONNX: Chinese-English / Chinese / English / French / Russian / Korean / Japanese

Built-in model lists may change between app versions.

Term Explanations

  • Tiny / Base / Small / Medium / Large: model size tiers; larger tiers usually improve quality but increase latency/cost.
  • Turbo: an optimized variant balancing speed and quality for larger models.
  • .en variants: English-optimized models with better speed/quality tradeoff for English-only workloads.
  • Community models: community-trained variants; validate with your own samples before production use.

See also: Model Usage Recommendations.

Real Scenario: From “Works Once” to “Works Every Day”

A common rollout mistake is selecting a large model for everyone on day one, then hitting long wait times and low adoption. A more practical path is:

  1. Start with Small/Medium as a team baseline.
  2. Use Large-v3/Turbo only for quality-critical outputs.
  3. Document model presets so teammates can reproduce results.

This keeps daily throughput stable while still preserving a high-accuracy lane for important assets.
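One lightweight way to document the baseline + escalation presets from step 3 is a small shared config file. A sketch with hypothetical preset names; only the model identifiers mirror the built-in list above:

```python
# Hypothetical team presets (e.g. kept in a shared repo); the preset
# names and notes are illustrative, not an Audio Note feature.
PRESETS = {
    "daily": {
        "model": "small",
        "note": "team baseline: stable throughput for everyday notes",
    },
    "quality": {
        "model": "large-v3-turbo",
        "note": "escalation lane for quality-critical outputs",
    },
}

def preset(name: str) -> dict:
    """Look up a shared preset so teammates reproduce the same settings."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(f"unknown preset {name!r}; use one of {sorted(PRESETS)}")
```

Checking such a file into version control gives teammates one place to look up (and reproduce) the agreed defaults.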

FAQ

Q: Are realtime models always faster than Whisper?
A: Usually, yes, in low-latency scenarios, but actual speed still depends on hardware and runtime settings.

Q: Are community models always more accurate?
A: Not always. They are often optimized for specific languages/domains and must be validated with your data.

Q: Is larger always better?
A: Larger models often improve quality but significantly increase compute and memory cost.

Common Mistakes

  • Mistake 1: Choosing by model size only.
    Fix: choose by workflow first (realtime vs offline accuracy), then tune size.
  • Mistake 2: Applying one heavy model to every machine.
    Fix: keep a baseline + escalation strategy to balance reliability and quality.
  • Mistake 3: Skipping real-sample validation.
    Fix: run A/B checks with your own audio before changing team defaults.
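For the A/B checks above, a simple word error rate (WER) against a reference transcript is enough to compare two models on your own samples. A self-contained sketch of the standard word-level edit distance (an illustrative helper, not an Audio Note feature):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Dynamic programming over the (len(ref)+1) x (len(hyp)+1) grid.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1] / len(ref)
```

Run each candidate model on the same clips, compute WER against a hand-checked transcript, and only then change the team default.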

Limitations

  • Status: Stable (non-Beta), with potential model-list updates in future releases.
  • Availability depends on app version and subscription entitlements.
  • Platform-specific acceleration differs: CUDA/Vulkan on Windows, CoreML on macOS.
  • Large models require substantial disk, memory, and often GPU resources.

Copyright © 2026. Made by AudioNote. All rights reserved.