
Voice Dictation on Ubuntu 24.04 (Wayland): A Complete Guide Using Faster-Whisper

One-key toggle (Ctrl+Space). English-only. No cloud. This guide shows exactly how I set up voice typing on Ubuntu 24+ so that pressing Ctrl+Space starts listening, and pressing Ctrl+Space again transcribes what you said and types it wherever your cursor is.


Stop typing, start talking. If you've been frustrated by the lack of a reliable voice dictation solution on Linux—especially on Ubuntu with Wayland—this guide is for you.

After extensive research and testing, I've built a solution that actually works. It uses OpenAI's Whisper model (via faster-whisper) for excellent accuracy, works system-wide on Wayland, and is completely offline and free.


The Problem with Voice Dictation on Linux

If you've tried voice dictation on Linux, you've probably encountered these issues:

  • nerd-dictation + Vosk: Workable, but accuracy is lower, capitalization is unreliable, and spaces between sentences are often missing
  • Speech Note: Hardware intensive, not real-time—you speak first, then wait for transcription
  • Google Docs Voice Typing: Only works in Chrome browser, not system-wide
  • Talon Voice: Steep learning curve, and the improved beta requires a $25/month subscription

The bigger problem? Most solutions don't work on Wayland, which is now the default display server on Ubuntu 24.04+. Tools like xdotool that simulate keyboard input simply don't work on Wayland.


The Solution

We'll build a custom voice dictation system using:

  • faster-whisper: Up to 4x faster than OpenAI's reference Whisper implementation, runs on CPU, excellent accuracy
  • ydotool: Types text on Wayland (unlike xdotool which only works on X11)
  • parecord: Records audio from your microphone
  • GNOME keyboard shortcuts: Toggle dictation with a hotkey

The workflow:

  1. Press Ctrl+Space to start recording
  2. Speak naturally
  3. Press Ctrl+Space again to stop
  4. Your speech is transcribed and typed wherever your cursor is

Works in terminals, browsers, text editors—anywhere you can type.
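
Before diving in, here is a rough sketch of what the finished setup does under the hood, condensed into three manual commands. Treat it as an illustration of the flow rather than the final scripts; it assumes the tools from Step 1 and the venv from Step 4 are already in place:

# 1. Record from the default microphone (press Ctrl+C when done speaking)
parecord --channels=1 --rate=16000 --format=s16le /tmp/demo.wav

# 2. Transcribe the recording with faster-whisper
TEXT=$(~/faster-whisper-dictation/venv/bin/python -c "
from faster_whisper import WhisperModel
segments, _ = WhisperModel('small', device='cpu', compute_type='int8').transcribe('/tmp/demo.wav')
print(' '.join(s.text.strip() for s in segments))
")

# 3. Type the result into whatever window has focus
ydotool type -- "$TEXT"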


Prerequisites

  • Ubuntu 24.04+ (with Wayland)
  • At least 8GB RAM (we'll use the Whisper "small" model)
  • A working microphone
  • Internet connection (for initial setup only—dictation works offline)

Step-by-Step Installation

Step 1: Install System Dependencies

sudo apt update && sudo apt install -y portaudio19-dev python3-venv python3-pip git xdotool ydotool pulseaudio-utils

Step 2: Set Up ydotool for Wayland

ydotool needs access to the input subsystem. Add your user to the input group:

sudo usermod -aG input $USER

Create udev rules:

sudo tee /etc/udev/rules.d/60-uinput.rules > /dev/null << 'EOF'
KERNEL=="uinput", MODE="0660", GROUP="input"
EOF

Reload udev rules:

sudo udevadm control --reload-rules && sudo udevadm trigger

Important: Log out and log back in for the group change to take effect.

Verify you're in the input group:

groups | grep input
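
You can also confirm that the uinput device picked up the new rule; the group should read input (device numbers will vary):

ls -l /dev/uinput
# expected: crw-rw---- 1 root input ... /dev/uinput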

Step 3: Set Up ydotool Daemon (Optional)

Create a systemd user service:

mkdir -p ~/.config/systemd/user && cat > ~/.config/systemd/user/ydotool.service << 'EOF'
[Unit]
Description=ydotool daemon

[Service]
ExecStart=/usr/bin/ydotoold

[Install]
WantedBy=default.target
EOF

Enable and start it:

systemctl --user daemon-reload && systemctl --user enable ydotool && systemctl --user start ydotool

Note: If the daemon fails to start, that's okay—ydotool still works without it (just with a small notice message).
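
Either way, it's worth a quick sanity check that ydotool can actually inject keystrokes. Focus a terminal or text editor, then run the following (the sleep gives you two seconds to click into the target window):

sleep 2 && ydotool type -- "hello from ydotool"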

Step 4: Install faster-whisper

Clone the project and set up a Python virtual environment:

cd ~
git clone https://github.com/doctorguile/faster-whisper-dictation.git
cd faster-whisper-dictation
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install faster-whisper pyaudio pynput transitions soundfile sounddevice numpy
deactivate
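
Optionally, trigger the model download now so your first dictation isn't stuck waiting on the network. A quick check inside the venv (the "small" model is roughly a 466 MB download on first use):

source venv/bin/activate
python -c "from faster_whisper import WhisperModel; WhisperModel('small', device='cpu', compute_type='int8'); print('model ready')"
deactivate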

Step 5: Create the Dictation Scripts

Create the scripts directory:

mkdir -p ~/.local/bin

Create the start script:

cat > ~/.local/bin/dictate-start << 'EOF'
#!/bin/bash
DICTATION_DIR="/home/$USER/faster-whisper-dictation"
VENV="$DICTATION_DIR/venv/bin/python"
AUDIO_FILE="/tmp/dictation_recording.wav"
PID_FILE="/tmp/dictation.pid"

# Check if already recording
if [ -f "$PID_FILE" ]; then
    notify-send "Dictation" "Already recording... tap again to stop"
    exit 0
fi

notify-send "Dictation" "🎤 Recording... Press hotkey again to stop"

# Start recording with PulseAudio
parecord --channels=1 --rate=16000 --format=s16le "$AUDIO_FILE" &
echo $! > "$PID_FILE"
EOF
chmod +x ~/.local/bin/dictate-start

Create the stop script:

cat > ~/.local/bin/dictate-stop << 'ENDSCRIPT'
#!/bin/bash
DICTATION_DIR="/home/$USER/faster-whisper-dictation"
VENV="$DICTATION_DIR/venv/bin/python"
AUDIO_FILE="/tmp/dictation_recording.wav"
PID_FILE="/tmp/dictation.pid"

if [ ! -f "$PID_FILE" ]; then
    notify-send "Dictation" "Not recording"
    exit 0
fi

kill $(cat "$PID_FILE") 2>/dev/null
rm -f "$PID_FILE"
sleep 0.3

notify-send "Dictation" "⏳ Transcribing..."

TEXT=$($VENV << 'PYTHON'
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cpu", compute_type="int8")
segments, _ = model.transcribe("/tmp/dictation_recording.wav", beam_size=5)
print(" ".join([seg.text.strip() for seg in segments]))
PYTHON
)

if [ -n "$TEXT" ]; then
    sleep 0.2
    ydotool type -- "$TEXT"
    notify-send "Dictation" "✅ Done"
else
    notify-send "Dictation" "❌ No speech detected"
fi

rm -f "$AUDIO_FILE"
ENDSCRIPT
chmod +x ~/.local/bin/dictate-stop

Create the toggle script:

cat > ~/.local/bin/dictate-toggle << 'ENDSCRIPT'
#!/bin/bash
PID_FILE="/tmp/dictation.pid"

if [ -f "$PID_FILE" ]; then
    /home/$USER/.local/bin/dictate-stop
else
    /home/$USER/.local/bin/dictate-start
fi
ENDSCRIPT
chmod +x ~/.local/bin/dictate-toggle

Important: Replace $USER with your actual username in the scripts, or run this to fix them:

sed -i "s/\$USER/$USER/g" ~/.local/bin/dictate-start ~/.local/bin/dictate-stop ~/.local/bin/dictate-toggle
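
To confirm the substitution worked, check that no literal $USER placeholder remains in the scripts:

grep -n '\$USER' ~/.local/bin/dictate-start ~/.local/bin/dictate-stop ~/.local/bin/dictate-toggle || echo "All scripts updated"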

Step 6: Test Manually

Open a terminal and run:

~/.local/bin/dictate-start

Speak something for 5 seconds, then run:

~/.local/bin/dictate-stop

You should see your speech transcribed and typed in the terminal.

Step 7: Disable IBus Ctrl+Space Shortcut

By default, GNOME/IBus uses Ctrl+Space for switching input methods. We need to disable this first:

gsettings set org.gnome.desktop.input-sources xkb-options "[]"
gsettings set org.freedesktop.ibus.general.hotkey triggers "[]"
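
You can read both settings back to confirm the shortcut is now free; each should print an empty array (shown by gsettings as @as []):

gsettings get org.gnome.desktop.input-sources xkb-options
gsettings get org.freedesktop.ibus.general.hotkey triggers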

Step 8: Set Up Keyboard Shortcut

Set up Ctrl+Space as the toggle hotkey:

# Set the custom keybindings list (assumes custom0 is an existing shortcut such as Flameshot; custom1 is the new dictation entry)
gsettings set org.gnome.settings-daemon.plugins.media-keys custom-keybindings "['/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/', '/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/']"

# Configure the dictation shortcut
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/ name 'Dictation Toggle'
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/ command '/home/YOUR_USERNAME/.local/bin/dictate-toggle'
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/ binding '<Ctrl>space'

Replace YOUR_USERNAME with your actual username.
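
Note that the list-setting command above assumes exactly one existing entry (custom0). If your setup differs, check the current list first and adjust the command accordingly, then verify the new binding took effect:

# See which custom keybindings already exist
gsettings get org.gnome.settings-daemon.plugins.media-keys custom-keybindings

# Verify the dictation binding
gsettings get org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom1/ binding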


Usage

  1. Click into any text field (terminal, browser, text editor, etc.)
  2. Press Ctrl+Space — you'll see a notification "🎤 Recording..."
  3. Speak naturally
  4. Press Ctrl+Space again — you'll see "⏳ Transcribing..."
  5. Your text appears where your cursor was

Troubleshooting

"ydotool: notice: ydotoold backend unavailable"

This is just a notice, not an error. ydotool still works, just with a small delay.

Transcription is slow

The first transcription after a reboot will be slower (model loading). Subsequent transcriptions are faster. You can also try using the "base" model instead of "small" for faster (but less accurate) results:

Change WhisperModel("small", ...) to WhisperModel("base", ...) in ~/.local/bin/dictate-stop.
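
If you prefer not to edit the file by hand, a one-line substitution does the same thing (assuming the script is unchanged from Step 5):

sed -i 's/WhisperModel("small"/WhisperModel("base"/' ~/.local/bin/dictate-stop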

No speech detected

  • Check your microphone is working: parecord --channels=1 /tmp/test.wav then aplay /tmp/test.wav (if the recording is silent, see the default-source check below)
  • Speak louder or closer to the microphone
  • Try recording for longer (at least 3 seconds)
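
If the test recording comes out silent, PulseAudio/PipeWire may have picked the wrong default input. List the available sources and switch the default if needed (source names depend on your hardware; <source-name> is a placeholder):

pactl list short sources
pactl get-default-source
pactl set-default-source <source-name>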

Hotkey doesn't work

  • Make sure you replaced YOUR_USERNAME with your actual username
  • Check if IBus is still capturing Ctrl+Space — run the disable commands in Step 7 again
  • Try logging out and back in after changing the shortcut

Ctrl+Space still switches input method

If Ctrl+Space is still being captured by IBus, try:

ibus write-cache
ibus restart

Or open Settings → Keyboard → Input Sources and remove any extra input methods.


Model Options

You can change the Whisper model based on your needs:

Model      Size      RAM Needed   Accuracy   Speed
tiny       75 MB     ~1 GB        Lower      Fastest
base       142 MB    ~1 GB        Good       Fast
small      466 MB    ~2 GB        Better     Medium
medium     1.5 GB    ~5 GB        Great      Slower
large-v3   3 GB      ~10 GB       Best       Slowest

For CPU-only systems, I recommend "small" as the best balance of accuracy and speed.
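
If you want numbers for your own hardware, a rough comparison against a saved test recording might look like this (assumes a sample at /tmp/test.wav, e.g. recorded with the parecord command from the troubleshooting section):

for m in tiny base small; do
  echo "== $m =="
  time ~/faster-whisper-dictation/venv/bin/python -c "
from faster_whisper import WhisperModel
segments, _ = WhisperModel('$m', device='cpu', compute_type='int8').transcribe('/tmp/test.wav', beam_size=5)
print(' '.join(s.text.strip() for s in segments))
"
done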


Why This Solution?

  • Offline & Private: All processing happens locally—no data sent to the cloud
  • Free & Open Source: No subscriptions, no API costs
  • Works on Wayland: Unlike most other solutions
  • System-wide: Works in any application, including terminal
  • Accurate: Uses OpenAI's Whisper model, one of the best speech recognition models available

Credits

This setup builds on the faster-whisper-dictation project (github.com/doctorguile/faster-whisper-dictation) and the faster-whisper library.
