Realtime microphone transcription with Whisper and realtime models for meetings, spoken drafting, and live note capture.

Real-time Microphone Transcription

What This Page Solves

Realtime microphone transcription covers the "I need live text while I am speaking" case, including:

  • personal meeting capture
  • spoken drafting and outlining
  • interviews or lectures where quick live text matters more than final polish

It is not the workflow for producing final-quality text immediately. It is the workflow for capturing information with low friction, then cleaning it up later in Note.

When To Use It

Prefer this workflow when

  • live feedback matters
  • you expect to clean up the result after the session
  • the main source is your microphone input

Do not start here when

  • you need final-quality text immediately

Recommended Workflow

  1. Confirm microphone, language, and model before starting.
  2. Run a 30-60 second smoke test for latency, segmentation, and input level.
  3. During the session, avoid frequent model/parameter switching.
  4. After the session, move to Note for names, numbers, terminology, and structure cleanup.
  5. Use AI Chat only after the transcript is trustworthy enough for semantic work.
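
The input-level part of the smoke test in step 2 can be approximated in code. This is a minimal Python sketch, not part of the product. It assumes you already have a short buffer of float samples in the range -1.0 to 1.0. The function names and the thresholds (-45 dBFS and -6 dBFS) are illustrative assumptions, not values from this page.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Return the RMS level of float samples in [-1.0, 1.0] as dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    # 10 * log10(mean square) equals 20 * log10(RMS)
    return 10.0 * math.log10(mean_square)


def level_verdict(dbfs: float, too_quiet: float = -45.0, too_hot: float = -6.0) -> str:
    """Classify an input level. Both thresholds are assumed, not documented."""
    if dbfs < too_quiet:
        return "too quiet"
    if dbfs > too_hot:
        return "too hot"
    return "ok"
```

A "too quiet" or "too hot" verdict during the smoke test points at the microphone or its gain setting, so it is worth fixing before the model settings are touched.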

Key Choices

1. Whisper or realtime model in live capture

Situation | First choice | Why
no strong GPU, low-latency stability matters most | Realtime model | better fit for live capture and usually no GPU dependency
strong GPU and higher quality target in realtime | Whisper | Whisper can still achieve good RTF on capable GPUs
lower-power devices | Realtime model | establish stability before pushing quality
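
The table can be read as a small decision rule. The Python sketch below is only an illustration of that rule: the `Session` fields and the `first_choice` function are hypothetical names, and the fallback for a strong GPU without a higher quality target is an assumption, since the table does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical description of a live-capture setup."""
    has_strong_gpu: bool
    low_power_device: bool = False
    quality_over_latency: bool = False


def first_choice(session: Session) -> str:
    """Return the first model route to try, following the table rows."""
    # Rows 1 and 3: no strong GPU, or a lower-power device
    if session.low_power_device or not session.has_strong_gpu:
        return "realtime model"
    # Row 2: strong GPU and a higher quality target
    if session.quality_over_latency:
        return "whisper"
    # Not covered by the table; assumed default
    return "realtime model"
```

For example, `first_choice(Session(has_strong_gpu=True, quality_over_latency=True))` selects Whisper, while a laptop without a strong GPU stays on the realtime model.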

2. When realtime models are the better default

Choose them first when you want:

  • lower latency
  • long-running live capture
  • lower hardware pressure
  • realtime results without depending on GPU

In implementation terms, the current realtime models run on the internal MLEngine. In user terms, they are simply the model route better suited to live transcription.
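
Whether a given route keeps up with live audio is commonly expressed as the real-time factor (RTF): processing time divided by audio duration, where a value below 1.0 means transcription runs faster than the audio arrives. The Python sketch below shows the arithmetic. The function names and the 0.8 headroom threshold are assumptions for illustration, and some tools report the inverse ratio (a speed-up factor) instead.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio transcribed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up(rtf: float, headroom: float = 0.8) -> bool:
    """True if the RTF leaves some margin below 1.0 (threshold is assumed)."""
    return rtf <= headroom
```

As a worked case, transcribing 30 seconds of audio in 6 seconds gives an RTF of 6 / 30 = 0.2, which leaves comfortable margin for live capture. An RTF above 1.0 means the text falls further behind the longer the session runs.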

Common Mistakes And Troubleshooting

  • Treating live text as final copy: Use Note as the quality gate before sharing anything externally.
  • Switching models mid-session: Lock a stable configuration before the session and optimize afterward.
  • Blaming the model first: Check microphone permissions, routing, input level, and ambient noise before changing model routes.
  • Assuming Whisper cannot work in realtime: On strong GPUs, Whisper may be the better realtime choice.

Check in this order:

  1. OS microphone permission
  2. selected input device and input level
  3. model/device fit
  4. GPU/runtime stability when relevant
  5. noise environment and segmentation behavior
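
The check order above can be sketched as a loop that stops at the first failing check, so that cheaper and more likely causes are ruled out before model or GPU settings are changed. This Python sketch is illustrative only: the probe callables are hypothetical stand-ins that you would replace with real checks for your system, and nothing here is an AudioNote API.

```python
from typing import Callable, List, Optional, Tuple

# A check is a label plus a probe that returns True when the check passes.
Check = Tuple[str, Callable[[], bool]]

# Labels mirror the order given above.
CHECK_ORDER = [
    "OS microphone permission",
    "selected input device and input level",
    "model/device fit",
    "GPU/runtime stability",
    "noise environment and segmentation behavior",
]


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run probes in order and return the label of the first failing one."""
    for label, probe in checks:
        if not probe():
            return label
    return None  # every check passed
```

Because the loop stops early, a missing microphone permission is reported before any time is spent on GPU or segmentation diagnostics.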

Copyright © 2026. Made by AudioNote, All rights reserved.