Local AI Speech-to-Text

Transcribe audio files using local AI entirely in your browser.
100% private. No server uploads.

🔒

Local AI

Audio never leaves your device

🌐

Multilingual

⏱️

Timestamps

Easily track what was said when

🤖

AI Model Download Required

To run speech recognition locally in your browser, this tool downloads an AI model (Whisper, approximately 40–70 MB) on the first run. We recommend using a Wi-Fi connection. No audio data is ever uploaded to a server.

About

This tool runs OpenAI's Whisper model directly inside your browser. Because all processing happens locally, your audio stays on your device. It is well suited to confidential recordings such as business meetings, therapy session notes, and private memos.

How to Use

Drop Audio File

Select your audio or video file.

Local Analysis

The Whisper AI model converts speech to text locally in your browser.
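
Whisper models take 16 kHz mono audio as input, so a file usually has to be downmixed and resampled before the local analysis step. The sketch below shows one way to do that, assuming the file has already been decoded into per-channel Float32Array samples (for example with the Web Audio API's decodeAudioData). The helper names downmixToMono and resampleLinear are hypothetical, and the tool's actual code may work differently.

```typescript
// Hypothetical helpers for illustration; not this tool's actual source.
const WHISPER_SAMPLE_RATE = 16000; // Whisper models expect 16 kHz mono input

// Average all channels into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length;
  }
  return mono;
}

// Resample by linear interpolation. Simple, but it applies no
// anti-aliasing filter, so it is only a rough approximation.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = WHISPER_SAMPLE_RATE
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

For a stereo file decoded at 48 kHz, calling resampleLinear(downmixToMono([left, right]), 48000) would yield samples at the 16 kHz rate the model expects.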

Glossary

Speech Recognition: Technology that converts spoken audio into text. This tool uses the Whisper AI model via Transformers.js, running entirely within your browser.
Whisper: OpenAI's open-source automatic speech recognition (ASR) model. Supports dozens of languages and delivers high accuracy, especially for English and Japanese.
Transformers.js: A JavaScript library by Hugging Face that allows running transformer-based AI models (like Whisper) directly in the browser using WebAssembly, without any server.
Timestamps: Time markers in a transcription indicating when each segment of speech occurred. Useful for navigating recordings and creating meeting minutes.
Interim Results: Provisional recognition text displayed in real time while speaking. Replaced by the final recognition result once the utterance is complete.
Voice Activity Detection (VAD): Technology that automatically detects human speech segments within an audio signal. Allows more efficient transcription by skipping silent sections.
Clipboard: An OS-level feature for temporarily storing text or images. Use the copy button to copy recognized text to the clipboard and paste it into any other application.
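
To illustrate the Voice Activity Detection entry above, the sketch below finds speech segments with a simple energy threshold. It splits the samples into fixed-size frames, computes each frame's RMS level, and merges consecutive loud frames into segments. This is a deliberately basic technique and not necessarily what this tool uses (production VADs are often model-based). The function name, frame length, and threshold are assumptions chosen for the example.

```typescript
// Hypothetical energy-based VAD sketch; not this tool's actual source.
interface SpeechSegment {
  start: number; // seconds
  end: number; // seconds
}

function detectSpeech(
  samples: Float32Array,
  sampleRate: number,
  frameMs: number = 30, // assumed frame length
  rmsThreshold: number = 0.02 // assumed loudness cutoff
): SpeechSegment[] {
  const frameSize = Math.max(1, Math.floor((sampleRate * frameMs) / 1000));
  const segments: SpeechSegment[] = [];
  let segmentStart: number | null = null; // sample index where the open segment began

  for (let offset = 0; offset < samples.length; offset += frameSize) {
    const frameEnd = Math.min(offset + frameSize, samples.length);
    let sumSquares = 0;
    for (let i = offset; i < frameEnd; i++) sumSquares += samples[i] * samples[i];
    const rms = Math.sqrt(sumSquares / (frameEnd - offset));
    const isSpeech = rms >= rmsThreshold;

    if (isSpeech && segmentStart === null) {
      segmentStart = offset; // a new speech segment opens
    } else if (!isSpeech && segmentStart !== null) {
      segments.push({ start: segmentStart / sampleRate, end: offset / sampleRate });
      segmentStart = null; // the segment closes at the first quiet frame
    }
  }
  // Close a segment that runs to the end of the audio.
  if (segmentStart !== null) {
    segments.push({ start: segmentStart / sampleRate, end: samples.length / sampleRate });
  }
  return segments;
}
```

Only the returned segments would then need to be passed to the recognizer, which is how skipping silence can save processing time.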

FAQ

Q.Is my recorded audio sent to a server?: No. Whisper AI runs entirely inside your browser via WebAssembly. Your audio data never leaves your device, and this tool does not store or collect any of your data.
Q.Which browsers are supported?: Google Chrome and Microsoft Edge work best. Firefox and Safari have limited WebAssembly multi-threading support, which may affect model loading performance.
Q.Can I transcribe languages other than Japanese?: Yes. Whisper supports dozens of languages including English, Chinese, Korean, and Spanish. Simply select your language from the language menu.
Q.Why is the first load slow?: The Whisper model files (~40–70 MB, depending on the model size) are downloaded from a CDN on first use. After that, the browser caches them, so subsequent loads are much faster.
Q.How can I improve recognition accuracy?: Record in a quiet environment, speak clearly and close to the microphone, and use an external microphone if possible. Selecting the correct language also significantly improves results.
Q.Can I save the transcription result as a file?: Yes. Use the Download button to save the transcription as a .txt file, or use the Copy button to copy it to the clipboard and paste it into any other app.
Q.Can I use it on a smartphone?: Chrome on Android is supported. iOS Safari has limited WebAssembly multi-threading support, which may restrict model loading and transcription performance.
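
To show how timestamps and the .txt export mentioned above can fit together, here is a minimal sketch that turns timestamped segments into plain text, one line per segment. The Chunk shape (a [start, end] pair in seconds plus the text) is modeled on what Transformers.js speech recognition pipelines return when timestamps are requested. Treat that shape, the function names, and the output format as assumptions, not as this tool's actual code.

```typescript
// Hypothetical export helpers; not this tool's actual source.
interface Chunk {
  timestamp: [number, number | null]; // [start, end] in seconds; end may be missing
  text: string;
}

// Format a number of seconds as HH:MM:SS (fractions are dropped).
function formatTime(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  return pad(hours) + ":" + pad(minutes) + ":" + pad(s % 60);
}

// Build the text content: one "[HH:MM:SS] text" line per chunk.
function toTimestampedText(chunks: Chunk[]): string {
  return chunks
    .map((c) => "[" + formatTime(c.timestamp[0]) + "] " + c.text.trim())
    .join("\n");
}
```

In a browser, the resulting string could be wrapped in a Blob with the type text/plain and offered as a download, or written to the clipboard.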