Para-speak: Fast Local Speech-to-Text CLI

When I started working full-time with Claude Code, I found myself wanting to speak to it quite frequently. But I couldn’t find the right tooling that would fit my workflow:

Press a shortcut to start recording
Talk
Press another shortcut to stop recording
Press enter to send

The first prototype came together in a day with Node.js. The reliability and speed of the transcription were amazing. That pushed me to turn it into something real, so here I’m introducing a Rust-based local CLI for speech-to-text.

Thanks to NVIDIA’s Parakeet model, Para-speak works amazingly well for AI-assisted coding, and I’m open-sourcing the CLI tool!

Plug and Play

Para-speak is in its early stages and available on macOS only. Many decisions are still being made, and it will mature over time.

For now, running the program requires a one-time setup to initialize the Python environment and download the Parakeet model.

# Set up environment and download model (first time only)
cargo run -p verify-cli

All behavior is configurable through environment variables.

By default, Para-speak uses the following shortcuts:

Start recording: ControlLeft + ControlLeft (double tap)
Stop recording: ControlLeft
Cancel recording: Escape + Escape (double tap)
Pause/resume: No default shortcut

Running the CLI

# Note: On first run, macOS will prompt for Accessibility permissions (for shortcuts)
# and Microphone access (for recording)
./para-speak

# Run in debug mode
./para-speak -d

Para-speak only listens for keyboard shortcuts; it doesn't consume them yet, so the key presses still reach the focused application.

Architecture

Para-speak is built in Rust, handling the majority of functionality—audio capture, keyboard shortcuts, system integration, and the CLI interface.

Python is used specifically for ML inference with the Parakeet MLX model through PyO3 bindings.

The Rust implementation focuses on speed and efficiency. Every part of the audio pipeline and system interaction is optimized for minimal latency. Feedback on the Rust code is very welcome, as it’s one of my first complete Rust projects.

When idle, Para-speak uses minimal resources—around 10MB of RAM on a MacBook M1 Pro.

Cross-platform support

Para-speak is designed to be cross-platform, but it is currently available on macOS only.

Shortcut System & Extensibility

The shortcut system offers different ways to trigger actions:

Single keys: F1, Escape, ControlLeft
Combinations: CmdLeft+Shift+Y, Ctrl+Alt+A
Double-taps: double(ControlLeft, 300) (300 is the delay between taps in ms)
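
To make the double-tap form concrete, here is a minimal sketch of how such a detector could work. It assumes the number in double(Key, ms) is the maximum gap allowed between two taps; the DoubleTap type and its methods are hypothetical names for illustration and not Para-speak's actual code.

```rust
use std::time::{Duration, Instant};

/// Detects two presses of the same key within `window`.
/// `DoubleTap` is a hypothetical name for this sketch, not Para-speak's type.
pub struct DoubleTap {
    window: Duration,
    last_press: Option<Instant>,
}

impl DoubleTap {
    /// `window_ms` is assumed to be the maximum gap between the two taps.
    pub fn new(window_ms: u64) -> Self {
        Self {
            window: Duration::from_millis(window_ms),
            last_press: None,
        }
    }

    /// Feed one key-press timestamp; returns true when it completes a double tap.
    pub fn on_press(&mut self, now: Instant) -> bool {
        match self.last_press {
            Some(prev) if now.duration_since(prev) <= self.window => {
                // Second tap arrived in time: fire and reset so a third tap starts over.
                self.last_press = None;
                true
            }
            _ => {
                // First tap, or the previous tap was too long ago.
                self.last_press = Some(now);
                false
            }
        }
    }
}
```

Passing the timestamp in, rather than calling Instant::now() inside the detector, keeps the logic easy to test with synthetic timings.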

Any of these forms can be assigned to any action (start, stop, pause, or cancel), and several alternatives can be listed for one action, separated by ;. The system is optimized to minimize resource usage: when idle, it only listens for the start recording shortcut. Once recording begins, the other shortcuts become active. For sequences and combinations, Para-speak only listens for the first key, activating full detection only when needed.
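
As an illustration of the syntax described above, the following sketch parses a shortcut value into its alternatives. The Shortcut enum and the parse_shortcuts function are hypothetical names, and the error handling (for example, rejecting an empty alternative) is an assumption of this sketch, not documented Para-speak behavior.

```rust
/// One way to trigger an action. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Shortcut {
    /// A single key such as `Escape`.
    Key(String),
    /// Keys pressed together, such as `CommandLeft+ShiftLeft+KeyY`.
    Combo(Vec<String>),
    /// The same key tapped twice, with the delay given in milliseconds.
    DoubleTap { key: String, window_ms: u64 },
}

/// Parse one alternative, e.g. `double(ControlLeft, 300)`.
fn parse_one(spec: &str) -> Result<Shortcut, String> {
    let spec = spec.trim();
    if spec.is_empty() {
        return Err("empty shortcut".to_string());
    }
    if let Some(inner) = spec.strip_prefix("double(").and_then(|s| s.strip_suffix(')')) {
        let (key, ms) = inner
            .split_once(',')
            .ok_or_else(|| format!("expected `double(Key, ms)`, got `{spec}`"))?;
        let key = key.trim();
        if key.is_empty() {
            return Err(format!("missing key name in `{spec}`"));
        }
        let window_ms = ms
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("bad delay in `{spec}`: {e}"))?;
        return Ok(Shortcut::DoubleTap { key: key.to_string(), window_ms });
    }
    // Anything else is a single key or a `+`-joined combination.
    let keys: Vec<String> = spec.split('+').map(|k| k.trim().to_string()).collect();
    if keys.iter().any(|k| k.is_empty()) {
        return Err(format!("empty key name in `{spec}`"));
    }
    if keys.len() == 1 {
        Ok(Shortcut::Key(keys.into_iter().next().unwrap()))
    } else {
        Ok(Shortcut::Combo(keys))
    }
}

/// Parse a full value: alternatives separated by `;`.
pub fn parse_shortcuts(value: &str) -> Result<Vec<Shortcut>, String> {
    value.split(';').map(parse_one).collect()
}
```

With this sketch, parse_shortcuts("double(ControlLeft, 300); CommandLeft+ShiftLeft+KeyY") yields two alternatives: a DoubleTap on ControlLeft and a three-key Combo.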

Para-speak uses a controller system that makes it easy to extend functionality. Controllers can be enabled through environment variables and get notified of recording events to execute custom actions.

The Spotify controller is one example: it adjusts music volume during recording. The same pattern can be used to build any type of asynchronous integration, or to trigger any automation one might need after the recording is transcribed.
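
The controller pattern can be sketched roughly as follows. This is a simplified, synchronous illustration under my own assumptions: the RecordingEvent enum, the Controller trait, and VolumeController are hypothetical names, and the volume is a plain field rather than a call to a real music player.

```rust
/// Recording lifecycle events a controller may react to.
/// All names here are hypothetical; Para-speak's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RecordingEvent {
    Started,
    Paused,
    Resumed,
    Stopped,
    Cancelled,
}

/// Anything that wants to be notified about recording events.
pub trait Controller {
    fn on_event(&mut self, event: RecordingEvent);
}

/// Lowers a volume while recording and restores it afterwards.
/// The volume is just a field here; a real controller would talk to the player.
pub struct VolumeController {
    pub volume: u8,
    recording_volume: u8,
    saved: Option<u8>,
}

impl VolumeController {
    pub fn new(current: u8, recording_volume: u8) -> Self {
        Self { volume: current, recording_volume, saved: None }
    }
}

impl Controller for VolumeController {
    fn on_event(&mut self, event: RecordingEvent) {
        match event {
            RecordingEvent::Started => {
                // Remember the original level only once per recording.
                if self.saved.is_none() {
                    self.saved = Some(self.volume);
                    self.volume = self.recording_volume;
                }
            }
            RecordingEvent::Stopped | RecordingEvent::Cancelled => {
                // Restore the level saved when recording started, if any.
                if let Some(v) = self.saved.take() {
                    self.volume = v;
                }
            }
            // Pausing and resuming leave the volume untouched in this sketch.
            RecordingEvent::Paused | RecordingEvent::Resumed => {}
        }
    }
}

/// Fan one event out to every enabled controller.
pub fn notify(controllers: &mut [Box<dyn Controller>], event: RecordingEvent) {
    for c in controllers.iter_mut() {
        c.on_event(event);
    }
}
```

A trait object per controller keeps the recorder unaware of what each integration does; adding a new one would only mean implementing on_event and registering it in the list.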

Configuration

Para-speak uses environment variables for all configuration. Create a .env.local file in the root of the project directory:

# Keyboard shortcuts
PARA_START_KEYS="double(ControlLeft, 300); CommandLeft+ShiftLeft+KeyY"
PARA_STOP_KEYS="ControlLeft; CommandLeft+ShiftLeft+KeyY"
PARA_CANCEL_KEYS="double(Escape, 300)"
PARA_PAUSE_KEYS="CommandLeft+Alt+Shift+KeyU"

# Core functionality
PARA_PASTE=true                          # Auto-paste transcribed text at cursor

# Spotify integration
PARA_SPOTIFY_RECORDING_VOLUME=30         # Set Spotify to specific volume (0-100)
PARA_SPOTIFY_REDUCE_BY=50                # OR reduce volume by amount (0-100)

# Transcription behavior
PARA_TRANSCRIBE_ON_PAUSE=true            # Experimental: transcribe when pausing (not just on stop)

# Advanced
PARA_SHORTCUT_RESOLUTION_DELAY_MS=50     # Delay for resolving shortcut conflicts
PARA_MEMORY_MONITOR=true                 # Enable memory usage reporting

# Debugging
PARA_DEBUG=true                          # Enable debug mode with verbose output
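
For readers curious how variables like these might be consumed, here is a small sketch that reads a few of them into a typed struct. The Config struct, the fallback values for unset variables, and the accepted boolean spellings are all assumptions of this sketch; only the variable names come from the listing above. It reads plain process environment variables and does not load .env.local itself.

```rust
use std::collections::HashMap;

/// A subset of the settings listed above. `Config` is a hypothetical struct.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub start_keys: String,
    pub paste: bool,
    pub spotify_recording_volume: Option<u8>,
    pub shortcut_resolution_delay_ms: Option<u64>,
}

fn lookup<'a>(vars: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    vars.get(key).map(String::as_str)
}

/// Assumption: "true" and "1" mean enabled; anything else means disabled.
fn parse_bool(raw: &str) -> bool {
    matches!(raw.trim().to_ascii_lowercase().as_str(), "true" | "1")
}

impl Config {
    /// Build a config from key/value pairs; taking a map keeps the sketch testable.
    pub fn from_vars(vars: &HashMap<String, String>) -> Result<Self, String> {
        // Fallback mirrors the documented default (double tap on ControlLeft);
        // the 300 ms value is an assumption.
        let start_keys = lookup(vars, "PARA_START_KEYS")
            .unwrap_or("double(ControlLeft, 300)")
            .to_string();
        // Assumption: pasting is off when the variable is unset.
        let paste = lookup(vars, "PARA_PASTE").map(parse_bool).unwrap_or(false);
        let spotify_recording_volume = match lookup(vars, "PARA_SPOTIFY_RECORDING_VOLUME") {
            None => None,
            Some(raw) => {
                let v: u8 = raw
                    .trim()
                    .parse()
                    .map_err(|e| format!("PARA_SPOTIFY_RECORDING_VOLUME: {e}"))?;
                if v > 100 {
                    return Err(format!("PARA_SPOTIFY_RECORDING_VOLUME must be 0-100, got {v}"));
                }
                Some(v)
            }
        };
        let shortcut_resolution_delay_ms = match lookup(vars, "PARA_SHORTCUT_RESOLUTION_DELAY_MS") {
            None => None,
            Some(raw) => Some(
                raw.trim()
                    .parse::<u64>()
                    .map_err(|e| format!("PARA_SHORTCUT_RESOLUTION_DELAY_MS: {e}"))?,
            ),
        };
        Ok(Self { start_keys, paste, spotify_recording_volume, shortcut_resolution_delay_ms })
    }

    /// Read from the process environment (panics on non-Unicode variables).
    pub fn from_env() -> Result<Self, String> {
        let vars: HashMap<String, String> = std::env::vars().collect();
        Self::from_vars(&vars)
    }
}
```

Validating the values once at startup means a typo such as a volume of 130 is reported immediately instead of surfacing in the middle of a recording.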

Check the README for detailed documentation.