📚 Documentation

Quick start guide for Audio Note desktop, designed to help you set up models, run a first transcription, and validate the full note workflow in about 10 minutes.


Getting Started

Workspace

Screenshot: Home overview

What This Page Solves

First-time users usually do not get blocked by button placement. They get blocked by three decisions:

  • which model to download first
  • whether the first task should be file-based or realtime
  • where to go after transcription for cleanup, AI follow-up, and export

This page is not the full manual. It is the shortest reliable path to a working baseline:

  1. Open Settings > Transcription and confirm the model and cache paths and the default language.
  2. Download one “safe baseline” model set.
  3. Run a real 3 to 5 minute sample through transcription.
  4. Open the result in Note and verify editing, export, and AI follow-up.
  5. Only then tune GPU, batching, realtime, or other advanced parameters.

The goal is to establish a reusable baseline before optimizing for speed or quality.

What To Download First

You do not need the full model library on day one.

| Your goal | First choice | Why |
| --- | --- | --- |
| Validate the file workflow first | Whisper Small / Medium | Balanced quality and speed |
| Get live text without a strong GPU | Realtime model | Better suited to realtime; usually no GPU dependency |
| Run realtime on a strong GPU with a higher quality target | Whisper | Whisper can still achieve a good RTF on capable GPUs |
| Start on a lower-power machine | Lightweight realtime model or Whisper Tiny / Base | Focus on stable output first |

“Realtime models” is Audio Note’s umbrella term for models that are better suited to realtime workflows. They typically have a better RTF (real-time factor), lower latency, and lighter hardware requirements, and usually do not depend on a GPU.
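RTF is commonly defined as processing time divided by audio duration, so lower is faster: a value below 1.0 means the model transcribes faster than the audio plays, which a live session needs in order to keep up. The minimal Python sketch below computes it for a sample you have timed yourself. The function name `real_time_factor` and the 240-second sample processed in 60 seconds are hypothetical illustrations, not an Audio Note API or measured results.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timing: a 4-minute (240 s) sample transcribed in 60 s of wall-clock time.
rtf = real_time_factor(60.0, 240.0)
print(f"RTF = {rtf:.2f}")
# An RTF below 1.0 means transcription runs faster than real time.
print("faster than real time" if rtf < 1.0 else "not faster than real time")
```

If a model's RTF on your machine approaches or exceeds 1.0, it will struggle to keep up in a live session; the table above suggests lighter alternatives to try.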

For deeper selection guidance, continue with Model Usage Recommendations and Concepts.

How To Validate Your First Setup

Validation 1: file workflow

Run one real audio or video sample through File Transcription. This is the fastest way to verify:

  • the model is available and usable
  • cache and storage paths are correct
  • export, note, and AI workflows all connect properly

Validation 2: realtime workflow

If your main use case is meetings, interviews, or spoken drafting, also run one Realtime Microphone session. Check only three things:

  • is the latency tolerable
  • are segmentation and refresh stable
  • does the result flow into Note correctly

Validation 3: post-processing workflow

Do not stop at “text appeared.” Confirm that:

  • you can fix names, numbers, and terminology in Note
  • you can extract summaries or action items in AI Chat
  • the export format matches your downstream workflow

Common First-Time Mistakes

  • Too many models, no clear baseline: start with one file-oriented model and one realtime-oriented model.
  • Jumping straight to the largest model: establish a Small / Medium baseline first, then decide whether Large is worth it.
  • Treating first-pass output as final output: realtime text and first-pass transcripts are best treated as draft material.

Copyright © 2026. Made by AudioNote, All rights reserved.