Free audio transcription — right in your browser
Runs VoxScriber Nano (open-source) in your browser — local AI, up to 10 min per file, basic accuracy (~85%). For professional use, try Premium.
Transcription runs locally in your browser. With your consent, you can share the result with us to help improve the service. Limit: 10 min per file, ~85% accuracy.
Free vs Premium — see the difference
| | Free (browser) | Premium (cloud) |
|---|---|---|
| File limit | 10 min | 10 hours |
| Accuracy | ~85% | >95% |
| Speaker diarization | ❌ | ✅ |
| Word-level timestamps | ❌ | ✅ |
| Video support (MP4/MOV) | ❌ | ✅ |
| Export formats | TXT, SRT, VTT | DOCX, PDF, JSON… |
| Speed (1h of audio) | ~20 min | ~2 min |
| Privacy | 100% local | ☁️ + 🔒 |
Local AI
Transcription runs in your browser. Sharing with our servers is optional (requires consent).
Fast and local
AI processing runs directly in your browser — no waiting in queues.
99 languages
Automatically detects the language of your audio.
No signup needed
Start transcribing instantly, no account required.
How it works
Upload or record audio
Drag an MP3, WAV, M4A, or OGG file, or use your microphone directly.
AI runs on your device
Whisper AI downloads once and stays cached. No wait on your next visit.
Copy or download the text
Results appear in seconds. Download as TXT, SRT or VTT, or copy with one click.
How accurate is browser transcription?
Browser transcription runs OpenAI's Whisper model directly on your device using WebAssembly. We offer three model sizes, and accuracy depends on which one you pick:
- Nano (~40MB) — The default. Around 85% accuracy on clear speech. Best for quick notes, voice messages and drafts. The only model that runs on iOS.
- Mini (~150MB) — Roughly 90% accuracy. A good middle ground if your device has 4GB+ of RAM and you need cleaner output.
- Plus (~500MB) — The most accurate local option, approaching 93% on clear audio. Slower to download and run; best on desktop machines with 8GB+ of RAM.
What lowers accuracy for any local model: background noise, multiple people talking over each other, heavy accents, and low-bitrate recordings such as compressed voice notes. If you need professional accuracy above 95%, word-level timestamps or speaker labels, that requires cloud models — see the comparison above.
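For developers curious how the size guidance above could translate into logic, here is a minimal sketch. It is an illustration only, not the tool's actual selection code: the `DeviceInfo` shape and the `pickLocalModel` function are hypothetical names, and the thresholds are simply the RAM figures quoted in the list.

```typescript
// Hypothetical model identifiers matching the three sizes described above.
type LocalModel = "nano" | "mini" | "plus";

// Hypothetical description of the device; how these values are detected
// is out of scope for this sketch.
interface DeviceInfo {
  ramGb: number;      // approximate device memory in GB
  isIOS: boolean;     // Nano is the only model that runs on iOS
  isDesktop: boolean; // Plus is recommended for desktop machines
}

// Returns the largest model the guidance above would suggest for a device.
function pickLocalModel(device: DeviceInfo): LocalModel {
  if (device.isIOS) return "nano";
  if (device.isDesktop && device.ramGb >= 8) return "plus";
  if (device.ramGb >= 4) return "mini";
  return "nano";
}
```

Under these assumptions, a desktop with 16GB of RAM would get Plus, a 4GB Android phone would get Mini, and any iOS device would stay on Nano.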
Browser vs cloud transcription: which one do you need?
Browser transcription is the right tool when privacy matters most or the audio is short: nothing is uploaded, there is nothing to delete afterwards, and it costs nothing. The trade-off is speed and precision — your CPU processes roughly one hour of audio in twenty minutes, and the local model skips speaker labels and word-level timing.
Cloud transcription is the right tool when you are working: meetings, interviews, lectures, legal recordings. Dedicated GPUs turn an hour of audio into text in about two minutes with over 95% accuracy, label up to 30 different speakers, accept files up to 10 hours long, and export to DOCX, PDF and JSON on top of the subtitle formats.
A practical rule of thumb: if you would be comfortable reading the recording aloud in a cafe, the cloud's speed and accuracy win. If the audio is sensitive — a medical consultation, a confidential meeting, a private voice note — the browser tool keeps everything on your machine and still gives you a usable transcript in minutes. Many of our users combine both: quick private notes in the browser, professional work in the cloud.
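To make the speed trade-off concrete, the rough figures quoted above (about 20 minutes per hour of audio in the browser, about 2 minutes in the cloud) can be turned into a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration: `estimateProcessingMinutes` and both constant tables are hypothetical names, and real timings depend on your device and audio.

```typescript
type Mode = "browser" | "cloud";

// Approximate minutes of processing per hour of audio, from the text above.
const MINUTES_PER_AUDIO_HOUR: Record<Mode, number> = { browser: 20, cloud: 2 };

// File length limits in minutes: 10 min locally, 10 hours in the cloud.
const MAX_AUDIO_MINUTES: Record<Mode, number> = { browser: 10, cloud: 600 };

// Returns an estimated processing time in minutes, or null when the
// file is empty or longer than the limit for the chosen mode.
function estimateProcessingMinutes(mode: Mode, audioMinutes: number): number | null {
  if (audioMinutes <= 0 || audioMinutes > MAX_AUDIO_MINUTES[mode]) return null;
  return (audioMinutes * MINUTES_PER_AUDIO_HOUR[mode]) / 60;
}
```

With these numbers, a 9-minute voice note works out to roughly 3 minutes in the browser, while a one-hour meeting is only accepted by the cloud option and takes roughly 2 minutes there.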
See Premium plans →
Supported audio formats
Upload MP3, WAV, M4A, OGG, OPUS, FLAC or WEBM — anything your browser can decode. Common sources work out of the box: WhatsApp voice notes (OPUS), iPhone voice memos (M4A), Android recorder files, Zoom recordings (M4A/MP4), Telegram voice messages (OGG) and podcast files (MP3). Video containers like MP4 and MOV are decoded for their audio track when the browser supports the codec. If a file fails to load, the usual cause is an unusual codec inside a common container — converting it to MP3 first solves it in almost every case.
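If you are scripting around the tool, a quick first-pass check against the extensions listed above might look like the following. This is only a sketch: `hasListedAudioExtension` is a hypothetical helper, and a matching extension does not guarantee the browser can decode the codec inside the file.

```typescript
// Extensions taken from the list above; video containers are left out
// because their support depends on the browser's codecs.
const LISTED_AUDIO_EXTENSIONS: string[] = ["mp3", "wav", "m4a", "ogg", "opus", "flac", "webm"];

// Returns true when the file name ends in one of the listed extensions.
// This inspects the name only; it does not look at the file contents.
function hasListedAudioExtension(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0 || dot === fileName.length - 1) return false;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return LISTED_AUDIO_EXTENSIONS.indexOf(ext) !== -1;
}
```

A name-based check like this is cheap and catches obvious mismatches early, but the real test is still whether the browser decodes the file.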
Need a different format first? Use our free MP3 / WAV / OGG / AAC audio converter.
Need more? Try Premium
For professional use — speaker diarization, long files, AI analysis and full export formats.
Speaker diarization
Automatically identifies who is speaking in each segment. Perfect for meetings, interviews and podcasts.
Files up to 10 hours
The local model supports up to 10 min. Premium handles files up to 10 hours long.
Summary, sentiment & topics
AI analyzes the content and generates executive summaries, sentiment analysis and topic extraction.
Full export options
Export to SRT, VTT, DOCX, JSON and PDF — ideal for subtitles, documents and automation.
Frequently asked questions
Free transcription in 20 languages
Whisper supports 99 languages with automatic detection, and we maintain a dedicated page for each of the 20 most-requested languages, with notes on how the model handles that specific language. Pick yours below — the transcriber pre-selects the right language for better accuracy.