Quick start guide for Audio Note desktop, designed to help you set up models, run a first transcription, and validate the full note workflow in about 10 minutes.
Getting Started
Home overview screenshot
What This Page Solves
First-time users usually do not get blocked by button placement. They get blocked by three decisions:
- which model to download first
- whether the first task should be file-based or realtime
- where to go after transcription for cleanup, AI follow-up, and export
This page is not the full manual. It is the shortest reliable path to a working baseline.
Recommended First Path
- Open Settings > Transcription and confirm model/cache paths plus default language.
- Download one “safe baseline” model set.
- Run a 3-5 minute real sample through transcription.
- Open the result in Note and verify editing, export, and AI follow-up.
- Only after that, tune GPU, batching, realtime, or advanced parameters.
The goal is to establish a reusable baseline before optimizing for speed or quality.
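If you want to confirm that a candidate sample falls in the 3-5 minute range before running it, a small standalone script can check the duration. This is a minimal sketch, not part of Audio Note: it assumes the sample is an uncompressed WAV file, and the function names `wav_duration_seconds` and `is_good_first_sample` are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_first_sample(duration_s: float,
                         min_s: float = 180.0,
                         max_s: float = 300.0) -> bool:
    """True if the duration falls inside the suggested 3-5 minute window."""
    return min_s <= duration_s <= max_s
```

For example, `is_good_first_sample(wav_duration_seconds("meeting.wav"))` tells you whether a hypothetical `meeting.wav` is a reasonable first test. Other formats (MP3, video) would need a different reader; the app itself accepts them regardless of this check.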
What To Download First
You do not need the full model library on day one.
| Your goal | First choice | Why |
|---|---|---|
| Validate the file workflow first | Whisper Small / Medium | balanced quality and speed |
| Get live text without a strong GPU | Realtime model | better fit for realtime, usually without GPU dependency |
| Run realtime with a strong GPU and higher quality target | Whisper | Whisper can still achieve good RTF on capable GPUs |
| Start on a lower-power machine | Lightweight realtime or Whisper Tiny / Base | focus on stable output first |
“Realtime models” is Audio Note’s umbrella term for models that are better suited to realtime workflows. They typically have a better real-time factor (RTF), lower latency, and friendlier device requirements, and usually do not depend on a GPU.
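RTF is conventionally computed as processing time divided by audio duration, so a value below 1.0 means the model transcribes faster than the audio plays. The sketch below shows the arithmetic with made-up timings; the function name and the numbers are illustrative assumptions, not measurements from Audio Note.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / length of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Illustrative numbers: a 4-minute sample transcribed in 1 minute.
rtf = real_time_factor(processing_seconds=60.0, audio_seconds=240.0)
print(f"RTF = {rtf:.2f}")  # RTF = 0.25
```

For realtime use, the RTF on your own machine needs to stay comfortably below 1.0, otherwise the text falls further behind the speaker the longer the session runs.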
For deeper selection guidance, continue with Model Usage Recommendations and Concepts.
How To Validate Your First Setup
Validation 1: file workflow
Run one real audio or video sample through File Transcription. This is the fastest way to verify:
- the model is available and usable
- cache and storage paths are correct
- export, note, and AI workflows all connect properly
Validation 2: realtime workflow
If your main use case is meetings, interviews, or spoken drafting, also run one Realtime Microphone session. Only check three things:
- is the latency tolerable
- are segmentation and refresh stable
- does the result flow into Note correctly
Validation 3: post-processing workflow
Do not stop at “text appeared.” Confirm that:
- you can fix names, numbers, and terminology in Note
- you can extract summaries or action items in AI Chat
- the export format matches your downstream workflow
Common First-Time Mistakes
- Too many models, no clear baseline: Start with one file-oriented model and one realtime-oriented model.
- Jumping straight to the largest model: Establish a Small / Medium baseline first, then decide whether Large is worth it.
- Treating first-pass output as final output: Realtime text and first-pass transcripts are usually better treated as draft material.
Read Next
- Model boundaries and terminology: Concepts
- Device- and task-based model selection: Model Usage Recommendations
- Your first production workflow: File Transcription
- Meeting and spoken drafting workflow: Realtime Microphone Transcription
- Core setup map: Settings Overview