Voice Dictation on Ubuntu 24.04 (Wayland): A Complete Guide Using Faster-Whisper
One-key toggle (Ctrl+Space). English-only. No cloud. This guide shows exactly how I set up voice typing on Ubuntu 24.04+ so that pressing Ctrl+Space starts listening, and pressing Ctrl+Space again types what you said wherever your cursor is.

Stop typing, start talking. If you've been frustrated by the lack of a reliable voice dictation solution on Linux—especially on Ubuntu with Wayland—this guide is for you.
After extensive research and testing, I've built a solution that actually works. It uses OpenAI's Whisper model (via faster-whisper) for excellent accuracy, works system-wide on Wayland, and is completely offline and free.
The Problem with Voice Dictation on Linux
If you've tried voice dictation on Linux, you've probably encountered these issues:
- nerd-dictation + Vosk: Decent but lower accuracy, doesn't capitalize properly, missing spaces between sentences
- Speech Note: Hardware intensive, not real-time—you speak first, then wait for transcription
- Google Docs Voice Typing: Only works in Chrome browser, not system-wide
- Talon Voice: Steep learning curve, $25/month for the beta with improvements
The bigger problem? Most solutions don't work on Wayland, the default session type on Ubuntu 24.04+. Tools like xdotool, which simulate keyboard input through X11, can't type into native Wayland applications (only into apps running under XWayland).
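Before going further, it's worth confirming which kind of session you're actually in. Here's a minimal Python sketch that reads the XDG_SESSION_TYPE environment variable, which the login manager sets to "wayland" or "x11". The helper name session_type is just for illustration, and the quick shell equivalent is echo $XDG_SESSION_TYPE.

```python
import os


def session_type(env=None) -> str:
    """Return the session type reported by the login manager.

    Reads XDG_SESSION_TYPE ("wayland" or "x11" on a GNOME desktop).
    Returns "unknown" if the variable is missing, e.g. in an SSH shell.
    """
    env = os.environ if env is None else env
    return env.get("XDG_SESSION_TYPE", "unknown").lower()


if __name__ == "__main__":
    kind = session_type()
    if kind == "wayland":
        print("Wayland session: use ydotool, xdotool won't reach native Wayland apps.")
    else:
        print(f"Session type: {kind}")
```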
The Solution
We'll build a custom voice dictation system using:
- faster-whisper: up to 4x faster than OpenAI's reference Whisper implementation at the same accuracy, runs well on CPU
- ydotool: Types text on Wayland (unlike xdotool which only works on X11)
- parecord: Records audio from your microphone
- GNOME keyboard shortcuts: Toggle dictation with a hotkey
The workflow:
- Press Ctrl+Space to start recording
- Speak naturally
- Press Ctrl+Space again to stop
- Your speech is transcribed and typed wherever your cursor is
Works in terminals, browsers, text editors—anywhere you can type.
Prerequisites
- Ubuntu 24.04+ (with Wayland)
- At least 8GB RAM (we'll use the Whisper "small" model)
- A working microphone
- Internet connection (for initial setup only—dictation works offline)
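To check the RAM requirement, you can parse the MemTotal line of /proc/meminfo. Here's a minimal sketch. The 7.5 GiB cutoff is my own loose assumption: a machine sold as "8 GB" reports slightly less than 8 GiB because of firmware and kernel reservations.

```python
import os


def mem_total_gib(meminfo_text: str) -> float:
    """Parse the MemTotal line of /proc/meminfo (reported in kB) and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # e.g. "MemTotal:  16318480 kB"
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        gib = mem_total_gib(f.read())
    # Loose, arbitrary cutoff: "8 GB" machines report a bit under 8 GiB.
    verdict = "meets" if gib >= 7.5 else "is below"
    print(f"Total RAM: {gib:.1f} GiB ({verdict} the suggested 8 GB)")
```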
Step-by-Step Installation
Step 1: Install System Dependencies
sudo apt update && sudo apt install -y portaudio19-dev python3-venv python3-pip git xdotool ydotool pulseaudio-utils
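After installing, you can confirm that the commands the dictation scripts call at runtime are on your PATH. Here's a small sketch using shutil.which. The REQUIRED list is my own selection drawn from the scripts in this guide. notify-send comes from the libnotify-bin package, which is normally preinstalled on Ubuntu desktop.

```python
import shutil

# Commands the scripts in this guide call at runtime (my own selection).
REQUIRED = ["parecord", "ydotool", "notify-send", "gsettings"]


def missing_commands(names):
    """Return the commands from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required commands found.")
```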
Step 2: Set Up ydotool for Wayland
ydotool types by creating a virtual keyboard through /dev/uinput, so your user needs write access to that device. Add your user to the input group:
sudo usermod -aG input $USER
Create udev rules:
sudo tee /etc/udev/rules.d/60-uinput.rules > /dev/null << 'EOF'
KERNEL=="uinput", MODE="0660", GROUP="input"
EOF
Reload udev rules:
sudo udevadm control --reload-rules && sudo udevadm trigger
Important: Log out and log back in for the group change to take effect.
Verify you're in the input group:
groups | grep input
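The group check above doesn't tell you whether the udev rule took effect. This minimal sketch checks both: membership in the input group, and write access to /dev/uinput. The helper names are hypothetical. Like groups, it reflects the current login session, so it keeps reporting False until you log out and back in.

```python
import grp
import os


def in_group(group_name: str) -> bool:
    """True if the current process has `group_name` among its groups."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:  # the group does not exist on this system
        return False
    return gid in os.getgroups() or gid == os.getgid()


def uinput_writable(path: str = "/dev/uinput") -> bool:
    """True if the uinput device node exists and this user can write to it."""
    return os.path.exists(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    print("in 'input' group:", in_group("input"))
    print("/dev/uinput writable:", uinput_writable())
```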
Step 3: Set Up ydotool Daemon (Optional)
Create a systemd user service:
mkdir -p ~/.config/systemd/user && cat > ~/.config/systemd/user/ydotool.service << 'EOF'
[Unit]
Description=ydotool daemon
[Service]
ExecStart=/usr/bin/ydotoold
[Install]
WantedBy=default.target
EOF
Enable and start it:
systemctl --user daemon-reload && systemctl --user enable ydotool && systemctl --user start ydotool
Note: If the daemon fails to start, that's okay—ydotool still works without it (just with a small notice message).
Step 4: Install faster-whisper
Clone the project and set up a Python virtual environment:
cd ~
git clone https://github.com/doctorguile/faster-whisper-dictation.git
cd faster-whisper-dictation
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install faster-whisper pyaudio pynput transitions soundfile sounddevice numpy
deactivate
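To verify the install, run a check with the virtual environment's interpreter, ~/faster-whisper-dictation/venv/bin/python, not the system python3. Only the venv has faster-whisper installed. Here's a minimal sketch using importlib.util.find_spec, which locates a module without importing it. The helper name missing_modules is just for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return top-level module names from `names` that this interpreter can't find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run with ~/faster-whisper-dictation/venv/bin/python so it checks the venv.
    missing = missing_modules(["faster_whisper"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("faster-whisper is installed.")
```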
Step 5: Create the Dictation Scripts
Create the scripts directory:
mkdir -p ~/.local/bin
Create the start script:
cat > ~/.local/bin/dictate-start << 'EOF'
#!/bin/bash
DICTATION_DIR="/home/$USER/faster-whisper-dictation"
VENV="$DICTATION_DIR/venv/bin/python"
AUDIO_FILE="/tmp/dictation_recording.wav"
PID_FILE="/tmp/dictation.pid"
# Check if already recording
if [ -f "$PID_FILE" ]; then
notify-send "Dictation" "Already recording... tap again to stop"
exit 0
fi
notify-send "Dictation" "🎤 Recording... Press hotkey again to stop"
# Start recording with PulseAudio
parecord --channels=1 --rate=16000 --format=s16le "$AUDIO_FILE" &
echo $! > "$PID_FILE"
EOF
chmod +x ~/.local/bin/dictate-start
Create the stop script:
cat > ~/.local/bin/dictate-stop << 'ENDSCRIPT'
#!/bin/bash
DICTATION_DIR="/home/$USER/faster-whisper-dictation"
VENV="$DICTATION_DIR/venv/bin/python"
AUDIO_FILE="/tmp/dictation_recording.wav"
PID_FILE="/tmp/dictation.pid"
if [ ! -f "$PID_FILE" ]; then
notify-send "Dictation" "Not recording"
exit 0
fi
kill $(cat "$PID_FILE") 2>/dev/null
rm -f "$PID_FILE"
sleep 0.3
notify-send "Dictation" "⏳ Transcribing..."
TEXT=$($VENV << 'PYTHON'
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cpu", compute_type="int8")
segments, _ = model.transcribe("/tmp/dictation_recording.wav", beam_size=5, language="en")
print(" ".join([seg.text.strip() for seg in segments]))
PYTHON
)
if [ -n "$TEXT" ]; then
sleep 0.2
ydotool type -- "$TEXT"
notify-send "Dictation" "✅ Done"
else
notify-send "Dictation" "❌ No speech detected"
fi
rm -f "$AUDIO_FILE"
ENDSCRIPT
chmod +x ~/.local/bin/dictate-stop
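The Python heredoc inside dictate-stop is easier to read as a standalone module. The sketch below keeps the same model settings ("small", CPU, int8, beam_size=5) and pins language="en", since this setup is English-only. Two parts are my own additions rather than the original script: vad_filter=True, a faster-whisper option that drops silent stretches before decoding, and the helper names transcribe_file and join_segments. Note that every call loads the model again, just as each run of dictate-stop does.

```python
def join_segments(segments) -> str:
    """Join segment texts with single spaces, skipping empty segments."""
    return " ".join(s.text.strip() for s in segments if s.text.strip())


def transcribe_file(path: str, model_size: str = "small") -> str:
    """Transcribe an English recording with faster-whisper on the CPU."""
    # Imported here so join_segments can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # language="en" skips language detection; vad_filter trims silence (my addition).
    segments, _info = model.transcribe(path, beam_size=5, language="en", vad_filter=True)
    return join_segments(segments)
```

With the venv's Python, transcribe_file("/tmp/dictation_recording.wav") returns the text that dictate-stop would pass to ydotool.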
Create the toggle script:
cat > ~/.local/bin/dictate-toggle << 'ENDSCRIPT'
#!/bin/bash
PID_FILE="/tmp/dictation.pid"
if [ -f "$PID_FILE" ]; then
/home/$USER/.local/bin/dictate-stop
else
/home/$USER/.local/bin/dictate-start
fi
ENDSCRIPT
chmod +x ~/.local/bin/dictate-toggle
Important: Replace $USER with your actual username in the scripts, or run this to fix them:
sed -i "s/\$USER/$USER/g" ~/.local/bin/dictate-start ~/.local/bin/dictate-stop ~/.local/bin/dictate-toggle
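The three scripts keep all of their state in /tmp/dictation.pid. If the PID file exists, a recording is running; otherwise the hotkey starts one. Here's a minimal Python sketch of that same PID-file toggle. toggle_recording is a hypothetical helper, and it stops the recorder with SIGTERM, the same signal the shell kill sends by default. One difference from the shell version is deliberate: a stale PID file, left behind after the recorder died, is cleaned up instead of raising an error.

```python
import os
import signal
import subprocess


def toggle_recording(pid_file: str, record_cmd: list) -> str:
    """Start `record_cmd` if no PID file exists, otherwise stop the saved PID.

    Returns "started" or "stopped".
    """
    if os.path.exists(pid_file):
        with open(pid_file) as f:
            pid = int(f.read().strip())
        os.remove(pid_file)
        try:
            os.kill(pid, signal.SIGTERM)  # same default signal as shell `kill`
        except ProcessLookupError:
            pass  # recorder already exited; the PID file was stale
        return "stopped"
    proc = subprocess.Popen(record_cmd)  # runs in the background like `cmd &`
    with open(pid_file, "w") as f:
        f.write(str(proc.pid))
    return "started"
```

For dictation you would pass the parecord command from dictate-start, e.g. toggle_recording("/tmp/dictation.pid", ["parecord", "--channels=1", "--rate=16000", "--format=s16le", "/tmp/dictation_recording.wav"]).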
Step 6: Test Manually
Open a terminal and run:
~/.local/bin/dictate-start
Speak something for 5 seconds, then run:
~/.local/bin/dictate-stop
You should see your speech transcribed and typed in the terminal.
Step 7: Disable IBus Ctrl+Space Shortcut
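On many setups, IBus binds Ctrl+Space to switching input methods, so we need to free it first. Note that the first command clears all of your XKB options, not just this shortcut. If you have customized them, save the current value first with gsettings get org.gnome.desktop.input-sources xkb-options: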
gsettings set org.gnome.desktop.input-sources xkb-options "[]"
gsettings set org.freedesktop.ibus.general.hotkey triggers "[]"
Step 8: Set Up Keyboard Shortcut
Set up Ctrl+Space as the toggle hotkey:
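# WARNING: this REPLACES the whole list. It assumes custom0 is your only existing
# custom shortcut (e.g. Flameshot) and that custom1 is free. Check your list first with:
# gsettings get org.gnome.settings-daemon.plugins.media-keys custom-keybindings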
gsettings set org.gnome.settings-daemon.plugins.media-keys custom-keybindings "['/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/', '/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/']"
# Configure the dictation shortcut
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/ name 'Dictation Toggle'
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/ command '/home/YOUR_USERNAME/.local/bin/dictate-toggle'
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/ binding '<Ctrl>space'
Replace YOUR_USERNAME with your actual username.
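If you already have several custom shortcuts, hard-coding custom0 and custom1 can drop or overwrite some of them. A safer approach is to read the current list, append the first unused customN slot, and write the list back. Here's a minimal sketch of the list handling. The helper names are hypothetical, and it assumes gsettings get prints either "@as []" (empty) or a bracketed list of quoted paths.

```python
import ast

BASE = "/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/"


def parse_gvariant_strv(text: str) -> list:
    """Parse `gsettings get` output for a string array: "@as []" or "['a', 'b']"."""
    text = text.strip()
    if text.startswith("@as "):
        text = text[len("@as "):]
    return list(ast.literal_eval(text))


def add_custom_slot(existing: list) -> tuple:
    """Return (new_path, new_list), using the first customN slot not in `existing`."""
    n = 0
    while f"{BASE}custom{n}/" in existing:
        n += 1
    new_path = f"{BASE}custom{n}/"
    return new_path, existing + [new_path]
```

Feed parse_gvariant_strv the output of gsettings get org.gnome.settings-daemon.plugins.media-keys custom-keybindings. Then pass str(new_list) to the matching gsettings set, and use new_path in place of the custom1 path in the three name/command/binding commands.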
Usage
- Click into any text field (terminal, browser, text editor, etc.)
- Press Ctrl+Space — you'll see a notification "🎤 Recording..."
- Speak naturally
- Press Ctrl+Space again — you'll see "⏳ Transcribing..."
- Your text appears where your cursor was
Troubleshooting
"ydotool: notice: ydotoold backend unavailable"
This is just a notice, not an error. ydotool still works, just with a small delay.
Transcription is slow
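Each press of the hotkey starts a fresh Python process that loads the model from disk. The first transcription after a reboot is slower because the model files aren't cached in memory yet, and the very first run ever also downloads the model. Subsequent transcriptions are faster. You can also try the "base" model instead of "small" for faster (but less accurate) results:
Change WhisperModel("small" to WhisperModel("base" in ~/.local/bin/dictate-stop, for example with: sed -i 's/WhisperModel("small"/WhisperModel("base"/' ~/.local/bin/dictate-stop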
No speech detected
- Check that your microphone is working: run parecord --channels=1 /tmp/test.wav, speak, stop it with Ctrl+C, then play it back with aplay /tmp/test.wav
- Speak louder or closer to the microphone
- Try recording for longer (at least 3 seconds)
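To tell a dead microphone from a quiet one, you can measure a test recording such as /tmp/test.wav. dictate-stop deletes its own recording when it finishes, so record a separate test file as described above. Here's a minimal standard-library sketch that reports duration and RMS level for a 16-bit WAV. wav_stats is a hypothetical helper. What counts as "too quiet" depends on your microphone, so I haven't hard-coded a threshold, but an RMS near 0 means almost nothing was captured.

```python
import array
import math
import sys
import wave


def wav_stats(path: str) -> tuple:
    """Return (duration_seconds, rms) for a 16-bit PCM WAV file.

    RMS is on the int16 sample scale (0 to 32767); near 0 means near silence.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        n_frames = w.getnframes()
        duration = n_frames / w.getframerate()
        frames = w.readframes(n_frames)
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        return duration, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms
```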
Hotkey doesn't work
- Make sure you replaced YOUR_USERNAME with your actual username
- Check whether IBus is still capturing Ctrl+Space; if so, run the disable commands in Step 7 again
- Try logging out and back in after changing the shortcut
Ctrl+Space still switches input method
If Ctrl+Space is still being captured by IBus, try:
ibus write-cache
ibus restart
Or open Settings → Keyboard → Input Sources and remove any extra input methods.
Model Options
You can change the Whisper model based on your needs:
| Model | Size | RAM Needed | Accuracy | Speed |
|---|---|---|---|---|
| tiny | 75 MB | ~1 GB | Lower | Fastest |
| base | 142 MB | ~1 GB | Good | Fast |
| small | 466 MB | ~2 GB | Better | Medium |
| medium | 1.5 GB | ~5 GB | Great | Slower |
| large-v3 | 3 GB | ~10 GB | Best | Slowest |
For CPU-only systems, I recommend "small" as the best balance of accuracy and speed.
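If you'd rather let your RAM decide, here's a small sketch that picks the largest model whose "RAM Needed" figure from the table fits. The helper name and the 2 GB headroom for the rest of your desktop are my own assumptions. The table's figures likely overstate what faster-whisper needs with int8 on CPU, so treat the result as conservative. The sketch also ignores speed, which is why "small" is still my recommendation on CPU.

```python
# "RAM Needed" figures (GB) from the table above, smallest model first.
MODEL_RAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_that_fits(ram_gb: float, headroom_gb: float = 2.0) -> str:
    """Return the largest model whose RAM need plus `headroom_gb` fits in `ram_gb`.

    Falls back to "tiny" when nothing fits.
    """
    choice = "tiny"
    for name, need in MODEL_RAM_GB:
        if need + headroom_gb <= ram_gb:
            choice = name
    return choice
```

For example, largest_model_that_fits(8) returns "medium", and largest_model_that_fits(4) returns "small".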
Why This Solution?
- Offline & Private: All processing happens locally—no data sent to the cloud
- Free & Open Source: No subscriptions, no API costs
- Works on Wayland: Unlike most other solutions
- System-wide: Works in any application, including terminal
- Accurate: Uses OpenAI's Whisper model, one of the best speech recognition models available
Credits
- faster-whisper - Fast Whisper implementation
- OpenAI Whisper - The original speech recognition model
- ydotool - Wayland-compatible input simulation