Realtime microphone transcription with Whisper and realtime models for meetings, spoken drafting, and live note capture.
Real-time Microphone Transcription
What This Page Solves
Realtime microphone transcription is for the “I need live text while I am speaking” case, including:
- personal meeting capture
- spoken drafting and outlining
- interviews or lectures where quick live text matters more than final polish
It is not the workflow for producing final-quality text immediately. It is the workflow for capturing information with low friction, then cleaning it up later in Note.
When To Use It
Prefer this workflow when
- live feedback matters
- you expect to clean up the result after the session
- the main source is your microphone input
Do not start here when
- the source already exists as a file: use File Transcription
- you need app audio rather than microphone audio: use Realtime App Transcription
- final accuracy matters more than live refresh: record first, then transcribe
Recommended Workflow
- Confirm microphone, language, and model before starting.
- Run a 30-60 second smoke test for latency, segmentation, and input level.
- During the session, avoid frequent model/parameter switching.
- After the session, move to Note for names, numbers, terminology, and structure cleanup.
- Use AI Chat only after the transcript is trustworthy enough for semantic work.
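The input-level part of the smoke test can be approximated outside the app. The following is a minimal sketch, assuming you already have a short buffer of captured samples as floats in the range [-1.0, 1.0]; the function names and the dBFS thresholds are illustrative assumptions, not settings taken from the application.

```python
import math
from typing import Sequence


def level_dbfs(samples: Sequence[float]) -> tuple[float, float]:
    """Return (rms_dbfs, peak_dbfs) for float samples in the range [-1.0, 1.0]."""
    if not samples:
        raise ValueError("no samples captured")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Clamp to a small floor so that pure silence does not hit log10(0).
    floor = 1e-10
    return 20 * math.log10(max(rms, floor)), 20 * math.log10(max(peak, floor))


def classify_level(rms_dbfs: float, peak_dbfs: float) -> str:
    """Rough verdict. Thresholds are assumptions chosen for illustration."""
    if peak_dbfs >= -1.0:
        return "clipping"   # peaks at or near full scale
    if rms_dbfs < -45.0:
        return "too quiet"  # likely wrong device, muted input, or very low gain
    return "ok"
```

A "too quiet" or "clipping" verdict during the 30-60 second test suggests fixing the input device or gain before the session rather than changing the model.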
Key Choices
1. Whisper or realtime model in live capture
| Situation | First choice | Why |
|---|---|---|
| no strong GPU, low-latency stability matters most | Realtime model | better fit for live capture and usually no GPU dependency |
| strong GPU and higher quality target in realtime | Whisper | Whisper can still achieve a good real-time factor (RTF) on capable GPUs |
| lower-power devices | Realtime model | establish stability before pushing quality |
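The table can be read as a small decision rule, and RTF gives a way to check whether a chosen route keeps up. The sketch below assumes the conventional definition of RTF (processing time divided by audio duration, so values below 1.0 are faster than real time). The function names, the 0.8 safety margin, and the fallback to the realtime model when no row matches are assumptions for illustration, not application behavior.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio transcribed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up(rtf: float, margin: float = 0.8) -> bool:
    """True when the route runs faster than real time with some headroom.

    The 0.8 margin is an assumed safety buffer, not a documented threshold.
    """
    return rtf < margin


def pick_route(strong_gpu: bool,
               wants_higher_quality: bool = False,
               low_power_device: bool = False) -> str:
    """Mirror the table: return "whisper" or "realtime"."""
    if low_power_device:
        return "realtime"  # establish stability before pushing quality
    if strong_gpu and wants_higher_quality:
        return "whisper"   # capable GPUs can still reach a good RTF
    # Assumed fallback: the realtime model when no table row clearly applies.
    return "realtime"
```

For example, if a 60-second smoke test takes 30 seconds to transcribe, `real_time_factor(30.0, 60.0)` gives 0.5, which leaves headroom for live use.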
2. When realtime models are the better default
Choose them first when you want:
- lower latency
- long-running live capture
- lower hardware pressure
- realtime results without depending on a GPU
In implementation terms, the current realtime models run on the internal MLEngine. In user terms, they are simply the model route better suited to live transcription.
Common Mistakes And Troubleshooting
- Treating live text as final copy: Use Note as the quality gate before sharing anything externally.
- Switching models mid-session: Lock a stable configuration before the session and optimize afterward.
- Blaming the model first: Check microphone permissions, routing, input level, and ambient noise before changing model routes.
- Assuming Whisper cannot work in realtime: On strong GPUs, Whisper may be the better realtime choice.
Check in this order:
- OS microphone permission
- selected input device and input level
- model/device fit
- GPU/runtime stability when relevant
- noise environment and segmentation behavior
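The ordering matters because cheaper, more likely causes come first. As a sketch of that idea, the helper below walks a list of checks in order and reports the first one that fails. The probe functions here are placeholder stubs; how each check is actually performed depends on your OS and setup and is not specified on this page.

```python
from typing import Callable, Optional

# A check is a (label, probe) pair; the probe returns True when the check passes.
Check = tuple[str, Callable[[], bool]]


def first_failing_check(checks: list[Check]) -> Optional[str]:
    """Run checks in order and return the label of the first failure, or None."""
    for label, probe in checks:
        if not probe():
            return label
    return None


# Placeholder probes (assumptions): replace each lambda with a real test.
checks: list[Check] = [
    ("OS microphone permission", lambda: True),
    ("selected input device and input level", lambda: False),
    ("model/device fit", lambda: True),
    ("GPU/runtime stability when relevant", lambda: True),
    ("noise environment and segmentation behavior", lambda: True),
]

print(first_failing_check(checks))
```

Stopping at the first failure keeps you from changing models or parameters while a simpler problem, such as the wrong input device, is still unresolved.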
Read Next
- App-audio live capture: Realtime App Transcription
- Record-first workflows: Recording
- Model route guidance: Model Usage Recommendations
- Tuning after the baseline is usable: Advanced Parameter Transcription