📚 Documentation
Last updated: 2026-02-08

File Transcription

TODO (Screenshot Replacement): File transcription parameter dialog (App 2.0) Include: file queue list, model/language selectors, GPU toggle, translation toggle, and batch apply button. Suggested filename: file-transcription-dialog-v2-en.png

Scope

File Transcription handles local media transcription workflows:

  • Import (drag-and-drop / file picker)
  • Model and language selection
  • Queue execution for single or batch tasks
  • Note-page editing and export

It does not handle URL downloading. Use Link Transcription for URL-based input.

Use Cases

  • Meetings, interviews, lectures
  • Bulk processing of podcast/live replay assets
  • Subtitle and text output pipelines

Steps

  1. Click Transcribe Files on Home or drag files into the dropzone.
  2. Choose model, language, GPU option, and translation option.
  3. For multiple files, review batch parameter configuration.
  4. Start transcription and monitor queue status.
  5. Open results in the Note page for editing and export.

Supported file formats

  • Audio: MP3, WAV, M4A, FLAC, AAC
  • Video: MP4, AVI, MOV, MKV, FLV

Actual support can vary by codec/container. The in-app file picker is the source of truth.
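Before queueing, a cheap extension pre-check can catch obvious misses. This is a minimal sketch that mirrors the list above; it is not the app's own validation, and a passing extension still doesn't guarantee a supported codec:

```python
from pathlib import Path

# Extensions from the list above; the in-app picker remains the source of truth.
SUPPORTED = {".mp3", ".wav", ".m4a", ".flac", ".aac",
             ".mp4", ".avi", ".mov", ".mkv", ".flv"}

def is_probably_supported(path: str) -> bool:
    """Cheap extension check; does not inspect the actual codec/container."""
    return Path(path).suffix.lower() in SUPPORTED
```

Files that fail this check can be re-encoded first (see Troubleshooting Order below for the general approach).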

Parameter tips

  • Lightweight tasks: Tiny/Base + auto language
  • Balanced quality: Small/Medium + explicit language
  • Higher quality: Large-v3 or Large-v3-Turbo + GPU
  • For unstable output, tune the advanced transcription parameters
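The tiers above can be captured as named presets. The model names are common Whisper checkpoint names; the preset keys and field names here are illustrative assumptions, not the app's actual configuration schema:

```python
# Illustrative presets for the tiers above. Keys and field names are
# assumptions for the sketch, not the app's real settings.
PRESETS = {
    "lightweight": {"model": "base",           "language": None, "gpu": False},
    "balanced":    {"model": "medium",         "language": "en", "gpu": False},
    "quality":     {"model": "large-v3-turbo", "language": "en", "gpu": True},
}

def pick_preset(name: str) -> dict:
    """Return one shared parameter set, e.g. to apply to a whole batch."""
    return PRESETS[name]
```

A `language` of `None` stands for auto-detection, matching the "auto language" tip above.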

Term Explanations

  • Batch parameters: one shared parameter set applied to multiple files.
  • Translate to English: transcribe source speech and output English text; not a bilingual side-by-side mode.
  • Subtitle export (SRT/VTT): time-coded formats for video players and editing tools.
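As a concrete example of the SRT format: each cue pairs an index with a time range in `HH:MM:SS,mmm` notation. This is a minimal pure-Python formatter for illustration, not the app's exporter:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm notation SRT requires."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one SRT cue: index, time range, then the caption text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

VTT differs mainly in using a dot instead of a comma for milliseconds and starting the file with a `WEBVTT` header.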

Practical Workflow (Less Rework)

  1. Start with 1–2 sample files before launching full batch jobs.
  2. Validate text quality first, then optimize throughput and model size.
  3. Use consistent titles (date/project tags) for easier downstream search.
  4. Test one export sample before processing the entire batch.

Troubleshooting Order

  1. Check task phase first (queue/model/runtime/export).
  2. Check storage/path writeability and free space.
  3. If GPU fails, verify CPU baseline first, then debug drivers/runtime.
  4. Re-encode malformed media when container/codec issues are suspected.
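For step 4, re-encoding to plain mono 16 kHz WAV often sidesteps container/codec issues. The sketch below builds a command from standard ffmpeg flags (`-vn` drops video, `-ac`/`-ar` set channels and sample rate); ffmpeg itself must be installed separately, and this is not something the app runs for you:

```python
def reencode_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that drops video and writes mono
    16 kHz WAV, a safe input for most speech-recognition models."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]

# Run with e.g.:
#   subprocess.run(reencode_cmd("lecture.mkv", "lecture.wav"), check=True)
```

If re-encoding fails too, the source file itself is likely damaged rather than merely in an unusual container.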

Real Scenario (Course Replay Archive)

A common case is processing 10+ lecture replays into searchable notes on a tight deadline.

  1. Build a baseline on one sample file (model/language/export format).
  2. Launch batch only after quality is validated.
  3. Normalize terms in Note first, then generate section summaries with AI Chat.

Common Mistakes and Better Alternatives

  • Mistake: mixing different-language media in one batch
    Better: split batches by language to reduce auto-detection drift.
  • Mistake: changing parameters while a batch is running
    Better: keep one parameter profile per batch, then run A/B in the next pass.
  • Mistake: treating raw export as final output
    Better: run a quick editorial pass in Note before distribution.
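Splitting a mixed batch by language, as suggested above, is straightforward once each file carries a language tag. The tagging itself is manual; the grouping below is just an illustration of the idea:

```python
from collections import defaultdict

def split_by_language(files: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (path, language) pairs into one batch per language,
    so each batch can run with an explicit language setting."""
    batches: dict[str, list[str]] = defaultdict(list)
    for path, lang in files:
        batches[lang].append(path)
    return dict(batches)
```

Running each resulting batch with its language set explicitly avoids the auto-detection drift the mistake above describes.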

FAQ

Q: Is batch transcription available to all plans?
A: Availability depends on account entitlements. Free-tier usage typically starts with single-file transcription.

Q: How can I speed up large batches?
A: Use GPU, set practical concurrency, and avoid unnecessarily large models.

Q: Why does transcription stall?
A: Common causes are missing model files, low disk space, incompatible GPU setup, or malformed media files.

Limitations

  • Advanced settings and some models may require activated subscription features.
  • Large models and high concurrency are hardware intensive.
  • Some uncommon formats may require pre-conversion before import.
  • Platform: Windows and macOS share the same workflow, but GPU backend and permission flows differ.

Copyright © 2026. Made by AudioNote, All rights reserved.