📚 Documentation
Last updated: 2026-02-08

Advanced Parameter Transcription

TODO (Screenshot Replacement): Advanced parameter panel (App 2.0) Include: Prompt, Beam Search, no-speech threshold, temperature, max context, and reset/apply controls. Suggested filename: advanced-params-v2-en.png

Scope

Advanced parameters tune Whisper decoding behavior. They do not change business workflows such as link downloading or watch scheduling.

Key controls include:

  • Suppress non-speech tokens
  • Prompt
  • Decoding strategy (Greedy / Beam Search)
  • Maximum context
  • No-speech threshold
  • Length limit
  • Entropy threshold
  • Log probability threshold
  • Temperature

Use Cases

  • Hallucination or repetitive output in noisy audio
  • Domain-heavy vocabulary with unstable recognition
  • Long-form audio with coherence issues
  • High-accuracy review pipelines

Steps

  1. Open advanced parameters in transcription settings or task dialog.
  2. Run a baseline with defaults.
  3. Change only 1–2 parameters per test iteration.
  4. Track parameter/result pairs for reusable team presets.
  5. For realtime tasks, start with realtime presets before copying offline values.

Quick tuning tips

  • Hallucination: increase no-speech threshold, lower temperature, reduce max context
  • Terminology misses: add prompt terms and test Beam Search
  • Incomplete output: carefully lower log probability threshold

Term Explanations

  • Prompt: vocabulary/context hint for recognition, not a generic LLM command.
  • No-speech threshold: controls how aggressively silence/background is filtered.
  • Log probability threshold: confidence cutoff that affects truncation vs noise tolerance.

Real Scenario: Terminology-heavy Technical Recording

  1. Run a default baseline and mark high-error segments.
  2. Add domain terms into Prompt, then tune Beam Search only.
  3. If hallucination persists, adjust no-speech threshold and temperature.
  4. Change only 1–2 parameters per round and log outcomes for rollback.

This controlled iteration model is usually more reliable than bulk-changing all knobs at once.

Common Mistakes

  • Mistake 1: Expecting advanced params to replace model selection.
    Fix: choose the right model tier first, then tune decoding behavior.
  • Mistake 2: Editing too many parameters in one pass.
    Fix: isolate variables to understand real impact.
  • Mistake 3: Copying offline presets directly to realtime workloads.
    Fix: start with realtime-safe presets and tune for latency constraints.

FAQ

Q: Is Beam Search always better than Greedy?
A: Not always. Beam Search often improves stability but costs more latency.

Q: Can Prompt enforce output format like an LLM instruction?
A: No. Whisper prompts are contextual hints, not general instruction control.

Q: Why do results differ across machines with same settings?
A: Hardware, drivers, runtime backends, and resource pressure all influence outcomes.

Limitations

  • Parameter behavior is language- and noise-dependent.
  • Some advanced controls may be restricted by entitlement/version.
  • Extreme values can cause empty output, instability, or severe latency.
  • Platform: Advanced params are available on both Windows/macOS, but backend differences can change same-setting outcomes.
Whisper-Powered Live Transcription: Capture Speech from Mic, Apps & Media Files in Real Time

Contact us

Email
Copyright © 2026. Made by AudioNote, All rights reserved.