Minimal, targeted tuning for Whisper and realtime models when the baseline already works but accuracy, segmentation, or latency still need work.

Advanced Parameter Transcription

What This Page Solves

Advanced parameters are for targeted correction, not for first-time setup.

Use them when:

  • the model route is already correct, but one class of errors keeps repeating
  • the output is close to usable, but not stable enough yet
  • you need a deliberate tradeoff between accuracy, segmentation, latency, and stability

When To Read This Page

Read it when

  • noisy audio causes hallucinated text
  • names, terms, or abbreviations are unstable
  • segments are too short or too long
  • realtime latency is acceptable but segmentation still feels wrong

Do not start here when

  • you have not chosen between Whisper and realtime models yet
  • your first baseline is not stable
  • you are about to change many parameters at once

When you do tune:

  1. Change one parameter at a time, or two at most.
  2. Re-test on the same sample clip every round.
  3. Start from the symptom, not from the parameter list.
  4. Promote only proven settings into a reusable preset.
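The four rules above can be sketched as a small tuning harness. This is a hypothetical helper, not part of AudioNote: `transcribe` stands in for whatever engine call you actually use, the parameter keys are illustrative, and scoring each round by word error rate (WER) against a hand-checked transcript of the same sample clip is an assumed choice.

```python
from typing import Any, Callable, Dict, Tuple

Params = Dict[str, Any]  # hypothetical parameter names, for illustration only


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def changed_params(baseline: Params, candidate: Params) -> Dict[str, Tuple[Any, Any]]:
    """Report every key whose value differs between the two settings."""
    keys = sorted(baseline.keys() | candidate.keys())
    return {k: (baseline.get(k), candidate.get(k))
            for k in keys if baseline.get(k) != candidate.get(k)}


def run_round(baseline: Params, candidate: Params, reference: str,
              transcribe: Callable[[Params], str], max_changes: int = 2) -> Tuple[float, float]:
    """Score baseline and candidate on the same clip; refuse rounds that change too much."""
    diff = changed_params(baseline, candidate)
    if len(diff) > max_changes:
        raise ValueError(f"{len(diff)} parameters changed; limit is {max_changes}")
    return (word_error_rate(reference, transcribe(baseline)),
            word_error_rate(reference, transcribe(candidate)))


def promote_if_better(presets: Dict[str, Params], name: str, candidate: Params,
                      baseline_wer: float, candidate_wer: float) -> bool:
    """Save the candidate as a preset only if it measurably beat the baseline."""
    if candidate_wer < baseline_wer:
        presets[name] = dict(candidate)
        return True
    return False


REFERENCE = "send the quarterly report to Dana by Friday"


def fake_transcribe(params: Params) -> str:
    # Hypothetical stand-in for a real engine call on the same sample clip.
    if params.get("decoder") == "beam" and params.get("beam_size", 1) >= 5:
        return REFERENCE
    return "send the quarterly report to Donna by Friday"


baseline = {"decoder": "greedy", "beam_size": 1, "no_speech_threshold": 0.6}
candidate = {**baseline, "decoder": "beam", "beam_size": 5}  # two changes this round
base_wer, cand_wer = run_round(baseline, candidate, REFERENCE, fake_transcribe)
presets: Dict[str, Params] = {}
promoted = promote_if_better(presets, "names-and-terms", candidate, base_wer, cand_wer)
```

Keeping the diff check separate from scoring makes the "one or two changes" rule enforceable rather than a habit, and the preset is only written after a measured improvement on the same clip.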

Whisper: Which Parameters Address Which Problems

Hallucination and repeated text

  • no-speech threshold
  • max context
  • temperature
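A minimal sketch of how a no-speech threshold and a temperature schedule work together against hallucination and repetition. It is loosely modeled on the temperature-fallback logic in OpenAI's reference `whisper` Python package, whose `transcribe()` defaults are temperatures 0.0 to 1.0 in 0.2 steps, `compression_ratio_threshold=2.4`, `logprob_threshold=-1.0`, and `no_speech_threshold=0.6`. AudioNote's engine may combine these checks differently; `decode` and `DecodeResult` are hypothetical. Max context (how much earlier text the decoder is conditioned on) attacks repetition loops from another angle and is not modeled here.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class DecodeResult:  # hypothetical shape of one decoding attempt
    text: str
    avg_logprob: float     # mean token log-probability of the decoded text
    no_speech_prob: float  # model's estimate that the window contains no speech


def compression_ratio(text: str) -> float:
    """Repetitive text compresses well, so a high ratio flags repetition loops."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def is_silence(r: DecodeResult, no_speech_threshold: float = 0.6,
               logprob_threshold: float = -1.0) -> bool:
    # Silence only when the no-speech estimate is high AND the text is low-confidence.
    return r.no_speech_prob > no_speech_threshold and r.avg_logprob < logprob_threshold


def looks_bad(r: DecodeResult, compression_ratio_threshold: float = 2.4,
              logprob_threshold: float = -1.0) -> bool:
    return (compression_ratio(r.text) > compression_ratio_threshold
            or r.avg_logprob < logprob_threshold)


def decode_with_fallback(decode: Callable[[float], DecodeResult],
                         temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                         no_speech_threshold: float = 0.6) -> Optional[Tuple[float, DecodeResult]]:
    """Try each temperature in order; return (temperature, result), or None for silence."""
    last: Optional[Tuple[float, DecodeResult]] = None
    for t in temperatures:
        r = decode(t)
        if is_silence(r, no_speech_threshold):
            return None  # drop the window instead of emitting hallucinated text
        last = (t, r)
        if not looks_bad(r):
            return last
    return last  # every temperature looked suspect; keep the final attempt


def fake_decode(temperature: float) -> DecodeResult:
    # Hypothetical decoder: loops at temperature 0, recovers once sampling is allowed.
    if temperature == 0.0:
        return DecodeResult("thank you " * 20, avg_logprob=-0.3, no_speech_prob=0.1)
    return DecodeResult("Please send the report by Friday.", avg_logprob=-0.4, no_speech_prob=0.1)


picked = decode_with_fallback(fake_decode)
silent = decode_with_fallback(
    lambda t: DecodeResult("Thank you.", avg_logprob=-1.6, no_speech_prob=0.9))
```

Raising the no-speech threshold makes the silence gate harder to trigger, so more noisy windows get transcribed; lowering it drops more of them. The temperature schedule only matters once a deterministic (temperature 0) attempt has been rejected.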

Terminology, proper nouns, abbreviations

  • prompt
  • Beam Search or Greedy
  • best-of (with Greedy) or beam size (with Beam Search)
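To see why switching from Greedy to Beam Search can stabilize a term, here is a generic beam search over a toy next-token table. The table and its probabilities are invented for illustration; real Whisper decoding conditions on the audio and on all previous tokens and applies further scoring rules, so this is not AudioNote's decoder.

```python
import math
from typing import Dict, List, Tuple

# Made-up next-token probabilities; a real decoder conditions on audio and all prior tokens.
NEXT: Dict[str, Dict[str, float]] = {
    "<s>": {"cube": 0.55, "Kubernetes": 0.45},
    "cube": {"ernetes": 0.40, "nets": 0.35, "</s>": 0.25},
    "Kubernetes": {"</s>": 0.90, "s": 0.10},
    "ernetes": {"</s>": 1.0},
    "nets": {"</s>": 1.0},
    "s": {"</s>": 1.0},
}

Hyp = Tuple[List[str], float]  # (tokens, summed log-probability)


def greedy(table: Dict[str, Dict[str, float]], max_len: int = 10) -> Hyp:
    seq, logp = ["<s>"], 0.0
    while seq[-1] != "</s>" and len(seq) < max_len:
        tok, p = max(table[seq[-1]].items(), key=lambda kv: kv[1])  # locally best token only
        seq.append(tok)
        logp += math.log(p)
    return seq, logp


def beam_search(table: Dict[str, Dict[str, float]], beam_size: int = 2, max_len: int = 10) -> Hyp:
    beams: List[Hyp] = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates: List[Hyp] = []
        for seq, logp in beams:
            if seq[-1] == "</s>":
                candidates.append((seq, logp))  # finished hypotheses compete unchanged
                continue
            for tok, p in table[seq[-1]].items():
                candidates.append((seq + [tok], logp + math.log(p)))
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
        if all(seq[-1] == "</s>" for seq, _ in beams):
            break
    return max(beams, key=lambda h: h[1])


def words(h: Hyp) -> List[str]:
    return [t for t in h[0] if t not in ("<s>", "</s>")]


greedy_words = words(greedy(NEXT))     # ["cube", "ernetes"]
beam_words = words(beam_search(NEXT))  # ["Kubernetes"]
```

Greedy commits to "cube" because it is locally most likely and ends with a split term (overall probability 0.22); keeping two hypotheses lets "Kubernetes" win (0.405). With `beam_size=1`, beam search collapses back to greedy.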

Only part of the source should be processed

  • segment transcription
  • offset range
  • length limits
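Offset and duration limits reduce to simple sample arithmetic. A minimal sketch, assuming audio already resampled to the 16 kHz mono format Whisper models consume; the function names are hypothetical, and treating a duration of 0 as "to the end of the file" is an assumed convention, not documented AudioNote behavior.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000  # Whisper models take 16 kHz mono input


def ms_to_samples(ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    return ms * sample_rate // 1000


def select_range(samples: Sequence[float], offset_ms: int = 0, duration_ms: int = 0,
                 sample_rate: int = SAMPLE_RATE) -> Sequence[float]:
    """Return [offset, offset + duration); a duration of 0 means 'to the end' (assumed)."""
    start = min(ms_to_samples(offset_ms, sample_rate), len(samples))
    if duration_ms <= 0:
        return samples[start:]
    end = min(start + ms_to_samples(duration_ms, sample_rate), len(samples))
    return samples[start:end]


def select_segments(samples: Sequence[float], ranges_ms: List[Tuple[int, int]],
                    sample_rate: int = SAMPLE_RATE) -> List[Sequence[float]]:
    """Cut several (start_ms, end_ms) ranges, skipping empty or reversed ones."""
    return [select_range(samples, s, e - s, sample_rate) for s, e in ranges_ms if e > s]


audio = list(range(3 * SAMPLE_RATE))  # sample indices stand in for 3 s of audio
clip = select_range(audio, offset_ms=1000, duration_ms=500)
parts = select_segments(audio, [(0, 250), (2000, 2500)])
```

Clamping both ends to the buffer length means an offset past the end yields an empty selection instead of an error.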

Realtime Models: What Tuning Usually Addresses

Realtime-model tuning is usually less about decoding strategy and more about segmentation behavior:

  • when speech should start
  • when a segment should end
  • how much padding should be added
  • how to avoid fragments that are too short or too long

Typical controls include:

  • VAD scene presets
  • minimum speech and silence duration
  • minimum and maximum segment duration
  • pre/post padding and merge gap
  • thread count
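The segmentation controls above can be sketched as a post-processing pass over frame-level VAD decisions. This is a hypothetical illustration: the control names follow the list above, but the defaults, the order of operations, and the hard split at the maximum duration are assumptions rather than AudioNote's documented behavior, and thread count is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


@dataclass
class VadParams:  # hypothetical defaults
    frame_ms: int = 30            # duration covered by one VAD decision
    min_speech_ms: int = 250      # shorter speech bursts are discarded
    min_silence_ms: int = 300     # shorter pauses do not end a segment
    pre_pad_ms: int = 100         # audio kept before detected speech
    post_pad_ms: int = 200        # audio kept after detected speech
    merge_gap_ms: int = 200       # padded segments closer than this are fused
    min_segment_ms: int = 500     # final segments shorter than this are dropped
    max_segment_ms: int = 15_000  # longer segments are hard-split


def speech_runs(flags: Sequence[bool], frame_ms: int) -> List[Segment]:
    runs: List[Segment] = []
    start: Optional[int] = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            runs.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        runs.append((start * frame_ms, len(flags) * frame_ms))
    return runs


def merge_close(segs: List[Segment], gap_ms: int) -> List[Segment]:
    merged: List[Segment] = []
    for s, e in segs:
        if merged and s - merged[-1][1] < gap_ms:  # short gap or overlap: extend previous
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def segment(flags: Sequence[bool], p: VadParams) -> List[Segment]:
    total_ms = len(flags) * p.frame_ms
    segs = merge_close(speech_runs(flags, p.frame_ms), p.min_silence_ms)
    segs = [(s, e) for s, e in segs if e - s >= p.min_speech_ms]
    segs = [(max(0, s - p.pre_pad_ms), min(total_ms, e + p.post_pad_ms)) for s, e in segs]
    segs = merge_close(segs, p.merge_gap_ms)
    out: List[Segment] = []
    for s, e in segs:
        if e - s < p.min_segment_ms:
            continue
        while e - s > p.max_segment_ms:  # hard split; the tail piece may end up short
            out.append((s, s + p.max_segment_ms))
            s += p.max_segment_ms
        out.append((s, e))
    return out


pattern = "00" + "1111" + "0" + "11" + "000000" + "1" + "00000" + "1" * 25 + "00"
params = VadParams(frame_ms=100, min_speech_ms=200, min_silence_ms=300, pre_pad_ms=100,
                   post_pad_ms=100, merge_gap_ms=200, min_segment_ms=300, max_segment_ms=2000)
segments = segment([c == "1" for c in pattern], params)
# [(100, 1000), (2000, 4000), (4000, 4700)]
```

In this run the 100 ms pause is bridged, the isolated 100 ms blip is dropped, and the 2.5 s utterance is split at the 2 s limit. A VAD scene preset could then simply be a named `VadParams` instance, and tuning it means adjusting one field per round.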

Internally, realtime-model inference currently runs on MLEngine. For workflow purposes, you only need to know that these models are better suited to realtime capture.

Common Mistakes And Troubleshooting

  • Expecting advanced parameters to replace model selection: they cannot fix the wrong route choice.
  • Changing many controls in one pass: you lose the ability to explain the result.
  • Copying file-transcription settings directly into realtime workflows: realtime workflows must respect latency and segmentation first.
  • Increasing model size every time you see errors: sometimes the problem is language choice, VAD behavior, or source noise, not model size.

Copyright © 2026. Made by AudioNote, All rights reserved.