Realtime microphone transcription with Whisper and realtime models for meetings, spoken drafting, and live note capture.

Audio Note, Realtime microphone transcription, Speech recognition

Realtime Microphone Transcription

[Screenshot: Realtime microphone]

What This Page Solves

Realtime microphone transcription is for “I need live text while I am speaking,” including:

personal meeting capture
spoken drafting and outlining
interviews or lectures where quick live text matters more than final polish

It is not the workflow for producing final-quality text immediately. It is the workflow for capturing information with low friction, then cleaning it up later in Note.

When To Use It

Prefer this workflow when

live feedback matters
you expect to clean up the result after the session
the main source is your microphone input

Do not start here when

the source already exists as a file: use File Transcription
you need app audio rather than microphone audio: use Realtime App Transcription
final accuracy matters more than live refresh: record first, then transcribe

Recommended Workflow

1. Confirm microphone, language, and model before starting.
2. Run a 30-60 second smoke test for latency, segmentation, and input level.
3. During the session, avoid frequent model/parameter switching.
4. After the session, move to Note for names, numbers, terminology, and structure cleanup.
5. Use AI Chat only after the transcript is trustworthy enough for semantic work.
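
The input-level part of the smoke test can be illustrated with a small sketch. It assumes you can get a short buffer of float samples in the range -1.0 to 1.0 from whatever capture tool you use. The function names and thresholds (`quiet_dbfs`, `clip_threshold`) are hypothetical illustration values, not settings from the app.

```python
import math

def rms_dbfs(samples):
    """Return the RMS level in dBFS for float samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)

def level_verdict(samples, quiet_dbfs=-45.0, clip_threshold=0.99):
    """Classify a smoke-test buffer as 'clipping', 'too quiet', or 'ok'.

    Both thresholds are illustrative assumptions, not app defaults.
    """
    if any(abs(s) >= clip_threshold for s in samples):
        return "clipping"
    if rms_dbfs(samples) < quiet_dbfs:
        return "too quiet"
    return "ok"

# Synthetic stand-in for a recording: 1 second of a 440 Hz tone
# at half amplitude, sampled at 16 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
print(round(rms_dbfs(tone), 1), level_verdict(tone))
```

A "too quiet" or "clipping" verdict during the smoke test points at the microphone or its gain rather than at the model.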

Key Choices

1. Whisper or realtime model in live capture

Situation	First choice	Why
no strong GPU, low-latency stability matters most	Realtime model	better fit for live capture and usually no GPU dependency
strong GPU and higher quality target in realtime	Whisper	Whisper can still reach a good real-time factor (RTF) on capable GPUs
lower-power devices	Realtime model	establish stability before pushing quality
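
As a worked illustration of the RTF mentioned in the table, the sketch below assumes the common definition: processing time divided by audio duration, so values below 1.0 mean the model transcribes faster than the audio arrives. The function names and the `headroom` margin are hypothetical and chosen only for this example.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent transcribing / duration of the audio transcribed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds

def keeps_up(rtf, headroom=0.8):
    """True when the model stays ahead of the audio with some margin.

    The 0.8 margin is an illustrative assumption, not a documented limit.
    """
    return rtf < headroom

# Example: a 30-second chunk that took 6 seconds to transcribe.
rtf = real_time_factor(6.0, 30.0)
print(rtf, keeps_up(rtf))
```

Measuring this during the smoke test, on the device you will actually use, is a more reliable basis for choosing a route than the GPU model name alone.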

2. When realtime models are the better default

Choose them first when you want:

lower latency
long-running live capture
lower hardware pressure
realtime results without depending on GPU

In implementation terms, the current realtime models run on the internal MLEngine. In user terms, they are simply the model route better suited to live transcription.
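
The choices in this section can be summarized as a small decision helper. This is only a sketch of the guidance above: the `Device` fields, the function name, and the returned labels are hypothetical and do not correspond to any app setting or API.

```python
from dataclasses import dataclass

@dataclass
class Device:
    has_strong_gpu: bool
    low_power: bool = False

def pick_live_route(device, wants_higher_quality=False):
    """Return 'realtime' or 'whisper' following the guidance in this section."""
    # Lower-power devices and machines without a strong GPU:
    # establish stability first with a realtime model.
    if device.low_power or not device.has_strong_gpu:
        return "realtime"
    # Strong GPU and a higher quality target: Whisper is worth trying.
    if wants_higher_quality:
        return "whisper"
    # Otherwise stay with the lower-latency default.
    return "realtime"

print(pick_live_route(Device(has_strong_gpu=False)))
print(pick_live_route(Device(has_strong_gpu=True), wants_higher_quality=True))
```

The helper deliberately falls back to the realtime route whenever the hardware is in doubt, matching the advice to establish stability before pushing quality.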

Common Mistakes And Troubleshooting

Treating live text as final copy: use Note as the quality gate before sharing anything externally.
Switching models mid-session: lock a stable configuration before the session and optimize afterward.
Blaming the model first: check microphone permissions, routing, input level, and ambient noise before changing model routes.
Assuming Whisper cannot work in realtime: on strong GPUs, Whisper may be the better realtime choice.

Check in this order:

1. OS microphone permission
2. selected input device and input level
3. model/device fit
4. GPU/runtime stability when relevant
5. noise environment and segmentation behavior
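
The check order above can be expressed as an ordered checklist that reports the first item that has not passed. This is a minimal sketch for personal troubleshooting notes. The check names and the function are hypothetical, and the results are filled in by hand rather than detected automatically.

```python
# Checks in the order given above; the names are illustrative labels.
CHECK_ORDER = [
    "os_microphone_permission",
    "input_device_and_level",
    "model_device_fit",
    "gpu_runtime_stability",
    "noise_and_segmentation",
]

def first_failing_check(results):
    """Return the first check that has not passed, or None if all passed.

    `results` maps check names to True (passed) or False (failed).
    A missing entry counts as not yet checked and is reported as failing.
    Mark a check True when it is not relevant, e.g. no GPU in use.
    """
    for name in CHECK_ORDER:
        if not results.get(name, False):
            return name
    return None

# Example: permission is granted but the input level was never verified.
print(first_failing_check({"os_microphone_permission": True}))
```

Working through the list in order keeps model changes as a late step, after the cheaper causes such as permissions and input level have been ruled out.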