📚 Documentation
Last updated: 2026-02-08

GPU Transcription

TODO (Screenshot Replacement): GPU engine selection page (App 2.0). Include: CUDA/Vulkan/CoreML options, detected hardware info, and fallback guidance. Suggested filename: gpu-engine-settings-v2-en.png

Scope

GPU settings accelerate Whisper-style model inference. Current desktop engines:

  1. CUDA (Windows, NVIDIA)
  2. Vulkan (Windows, multi-vendor)
  3. CoreML (macOS, Apple devices)
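The platform-to-engine mapping above can be sketched as a simple heuristic. This is an illustrative sketch only, not the app's actual selection logic; `pick_engine` and its inputs are hypothetical names:

```python
def pick_engine(os_name: str, gpu_vendor: str) -> str:
    """Return a preferred transcription engine for a platform/GPU combo.

    Mirrors the guidance in this doc: CUDA on Windows + NVIDIA,
    Vulkan on other Windows GPUs, CoreML on macOS, CPU otherwise.
    """
    if os_name == "macos":
        return "coreml"
    if os_name == "windows":
        return "cuda" if gpu_vendor == "nvidia" else "vulkan"
    return "cpu"  # safe fallback when no accelerated path applies
```

A real implementation would also check driver/runtime availability before committing to an engine, as the Steps and Limitations sections note.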

Use Cases

  • Long-form transcription jobs
  • Medium/Large model workloads
  • Throughput-focused batch processing

Steps

  1. Open Settings > Transcription.
  2. Select the GPU engine for your platform/hardware.
  3. Run a CPU-vs-GPU comparison on the same media sample.
  4. Keep GPU as default only after confirming stability.
  5. If failures occur, fall back to CPU and check drivers/runtime dependencies.
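Step 5's fallback behavior can be sketched in code. The `transcribe_gpu` and `transcribe_cpu` callables below are hypothetical stand-ins for the app's internal pipeline, not a real API:

```python
from typing import Callable, Tuple

def transcribe_with_fallback(
    path: str,
    transcribe_gpu: Callable[[str], str],
    transcribe_cpu: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the GPU engine first; fall back to CPU if it fails.

    Returns (transcript, engine_used) so callers can log which path
    actually ran and diagnose driver/runtime issues afterwards.
    """
    try:
        return transcribe_gpu(path), "gpu"
    except RuntimeError:
        # Typical failures: missing runtime libraries, driver faults, OOM.
        return transcribe_cpu(path), "cpu"
```

Keeping the CPU path reachable at all times is what makes the "fall back and check drivers" step safe to follow.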

Benchmark Tips

  1. Compare CPU vs GPU on the same file, model, and language settings.
  2. Measure both elapsed time and failure/retry rate.
  3. Validate short and long audio separately (benefits differ by duration).
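The tips above can be sketched as a small harness that records both elapsed time and failure rate over repeated runs. The `transcribe` callable is an assumed stand-in for whichever engine you are measuring:

```python
import time
from typing import Callable, Dict

def benchmark(transcribe: Callable[[str], str],
              path: str, runs: int = 3) -> Dict[str, float]:
    """Time repeated runs of one engine on one file.

    Reports mean elapsed seconds over successful runs plus the
    failure rate, so speed and reliability are judged together.
    """
    elapsed, failures = [], 0
    for _ in range(runs):
        start = time.perf_counter()
        try:
            transcribe(path)
            elapsed.append(time.perf_counter() - start)
        except Exception:
            failures += 1
    return {
        "mean_seconds": sum(elapsed) / len(elapsed) if elapsed else float("inf"),
        "failure_rate": failures / runs,
    }
```

Run it once per engine on the same file and model settings, then compare the two result dictionaries rather than a single timing number.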

Term Explanations

  • Runtime libraries: backend dependencies required by specific GPU engines.
  • VRAM: GPU memory; low VRAM often causes out-of-memory (OOM) errors or forced fallback to CPU.
  • Fallback path: safe recovery route (usually CPU) when acceleration fails.

Real Scenario: Weekly Batch Meeting Processing

  1. Run one baseline task on CPU and record time/failure behavior.
  2. Run the same task on GPU with identical model/language settings.
  3. Roll out GPU defaults only if speed gains remain consistent and retry cost stays low.

This avoids a false optimization in which peak speed improves while real-workflow reliability degrades.
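That rollout rule can be expressed as a simple check. The thresholds here are illustrative assumptions for the sketch, not product defaults:

```python
def should_adopt_gpu(
    cpu_seconds: float,
    gpu_seconds: float,
    gpu_failure_rate: float,
    min_speedup: float = 1.5,        # assumed: require at least 1.5x speedup
    max_failure_rate: float = 0.05,  # assumed: tolerate at most 5% failures
) -> bool:
    """Make GPU the default only if it is clearly faster AND stays reliable."""
    if gpu_seconds <= 0:
        return False  # guard against bad measurements
    speedup = cpu_seconds / gpu_seconds
    return speedup >= min_speedup and gpu_failure_rate <= max_failure_rate
```

Feeding it the CPU baseline and the GPU run from steps 1 and 2 turns "roll out only if gains are consistent" into an explicit, repeatable decision.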

Common Mistakes

  • Mistake 1: Enabling GPU and immediately scaling full concurrency.
    Fix: start with small batches, then increase load gradually.
  • Mistake 2: Judging compatibility by GPU model alone.
    Fix: include driver/runtime/OS patch status in checks.
  • Mistake 3: Abandoning GPU after one failure.
    Fix: keep CPU fallback for continuity, then diagnose backend issues incrementally.
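The fix for Mistake 1 (ramping load gradually instead of jumping to full concurrency) can be sketched as a doubling ramp. The `run_batch` callable is hypothetical: it runs one batch at the given concurrency and reports the observed failure rate:

```python
from typing import Callable

def ramp_concurrency(
    run_batch: Callable[[int], float],
    max_workers: int = 8,
    max_failure_rate: float = 0.05,
) -> int:
    """Double concurrency while the observed failure rate stays acceptable.

    Returns the last stable concurrency level, or 0 if even a
    single worker exceeded the failure threshold.
    """
    workers, stable = 1, 0
    while workers <= max_workers:
        if run_batch(workers) > max_failure_rate:
            break  # back off: keep the last known-stable level
        stable = workers
        workers *= 2
    return stable
```

Stopping at the last stable level, rather than the highest level attempted, is what keeps a weekly batch job from inheriting a flaky configuration.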

FAQ

Q: CUDA or Vulkan on Windows?
A: Prefer CUDA on NVIDIA GPUs. Use Vulkan for non-NVIDIA hardware or as a compatibility fallback.

Q: Why no CUDA on macOS?
A: CUDA is NVIDIA-specific and not available on macOS; the macOS acceleration path is CoreML/Metal-oriented.

Q: Is GPU always faster?
A: Usually for larger models and longer audio. Gains can be modest on short/light tasks.

Limitations

  • Status: Stable (non-Beta); engine/driver compatibility can still change with OS updates.
  • Engine availability depends on OS, GPU model, drivers, and app entitlements.
  • CUDA may require additional runtime libraries on first use.
  • Low-VRAM devices may fail or throttle under large models.

Copyright © 2026. Made by AudioNote, All rights reserved.