Elvin Engineering

Para-speak: Fast Local Speech-to-Text CLI

September 10, 2025

When I started working full-time with Claude Code, I found myself wanting to speak to it quite frequently. But I couldn’t find tooling that fit my workflow.

The first prototype came together in a day with Node.js. The reliability and speed of the transcription were amazing. This pushed me to turn it into something real, and here I’m introducing a Rust-based local CLI for speech-to-text.

Thanks to NVIDIA’s Parakeet model, Para-speak works remarkably well for AI-assisted coding, and I’m open-sourcing the CLI tool!

OpenAI’s Whisper app delivers good accuracy, but it feels slow and requires manual steps to get text where you need it.

Other desktop applications I tried came with cluttered UIs and didn’t provide the flexibility I was looking for - some wouldn’t even let me try them without jumping through hoops.

While trying other products, I discovered CtrlSpeak, an open-source project that was implementing something close to what I was looking for. This inspired me to try building my own solution with NVIDIA’s Parakeet model.

Plug and Play

Para-speak is in its early stages and available on macOS only. Many decisions are still being made, and it will mature over time.

In Ukrainian, “Pora” means “It’s time.”

Since the vowel sounds “o” and “a” are so close, I often pronounce it as “para”.

The name of the project is meant to capture this idea - it’s time to speak.

For now, running the program requires a one-time setup to initialize the Python environment and download the Parakeet model.

# Set up environment and download model (first time only)
cargo run -p verify-cli

All behavior is configurable through environment variables.

By default, use the following shortcuts:

Make sure double Control does not conflict with the macOS dictation shortcut at Keyboard > Dictation > Shortcut.

Running the CLI

# Note: On first run, macOS will prompt for Accessibility permissions (for shortcuts)
# and Microphone access (for recording)
./para-speak

# Run in debug mode
./para-speak -d

Architecture

Para-speak is built in Rust, handling the majority of functionality—audio capture, keyboard shortcuts, system integration, and the CLI interface.

Python is used specifically for ML inference with the Parakeet MLX model through PyO3 bindings.

The Rust implementation focuses on speed and efficiency. Every part of the audio pipeline and system interaction is optimized for minimal latency. Feedback on the Rust code is very welcome, as this is one of my first complete Rust projects.

When idle, Para-speak uses minimal resources—around 10MB of RAM on a MacBook M1 Pro.

Shortcut System & Extensibility

The shortcut system offers different ways to trigger actions:

Any combination, separated by ;, can be used for any shortcut: start, stop, pause, or cancel. The system is optimized to minimize resource usage: when idle, it only listens for the start recording shortcut. Once recording begins, the other shortcuts become active. For sequences and combinations, Para-speak only listens for the first key, activating full detection only when needed.
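To make the shortcut syntax concrete, here is a minimal sketch of how a value such as PARA_START_KEYS could be parsed. This is not Para-speak's actual parser: the `Trigger` enum and the `parse_trigger` and `parse_shortcuts` functions are hypothetical names, and the sketch assumes the second argument of `double(...)` is a time window in milliseconds.

```rust
/// One way to trigger an action. Hypothetical type, not Para-speak's real one.
#[derive(Debug, PartialEq)]
enum Trigger {
    /// A single key, e.g. `ControlLeft`.
    Key(String),
    /// Keys pressed together, e.g. `CommandLeft+ShiftLeft+KeyY`.
    Combo(Vec<String>),
    /// The same key pressed twice, e.g. `double(Escape, 300)`.
    /// Assumption: the number is a window in milliseconds.
    Double { key: String, window_ms: u64 },
}

/// Parse one alternative, i.e. the text between two `;` separators.
fn parse_trigger(spec: &str) -> Result<Trigger, String> {
    let spec = spec.trim();
    if spec.is_empty() {
        return Err("empty trigger".to_string());
    }
    // `double(Key, ms)` form.
    if let Some(inner) = spec.strip_prefix("double(").and_then(|s| s.strip_suffix(')')) {
        let (key, ms) = inner
            .split_once(',')
            .ok_or_else(|| format!("expected `double(Key, ms)`, got `{spec}`"))?;
        let window_ms = ms
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("bad interval in `{spec}`: {e}"))?;
        return Ok(Trigger::Double { key: key.trim().to_string(), window_ms });
    }
    // `KeyA+KeyB+...` form; a single name is treated as a plain key.
    let mut keys: Vec<String> = spec.split('+').map(|k| k.trim().to_string()).collect();
    if keys.iter().any(|k| k.is_empty()) {
        return Err(format!("empty key name in `{spec}`"));
    }
    if keys.len() == 1 {
        Ok(Trigger::Key(keys.remove(0)))
    } else {
        Ok(Trigger::Combo(keys))
    }
}

/// Parse a full value: alternatives separated by `;`.
fn parse_shortcuts(value: &str) -> Result<Vec<Trigger>, String> {
    value.split(';').map(parse_trigger).collect()
}

fn main() {
    let value = "double(ControlLeft, 300); CommandLeft+ShiftLeft+KeyY";
    match parse_shortcuts(value) {
        Ok(triggers) => println!("{triggers:?}"),
        Err(err) => eprintln!("invalid shortcut: {err}"),
    }
}
```

Parsing into an enum up front means a listener only has to look at the first key of each trigger while idle, which matches the lazy detection described above.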

Para-speak uses a controller system that makes it easy to extend functionality. Controllers can be enabled through environment variables and get notified of recording events to execute custom actions.

The Spotify controller is one example - it adjusts music volume during recording. The same pattern can be used to build any type of asynchronous integration, or to trigger any automation one might need after the recording is transcribed.
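The controller pattern could look roughly like the sketch below. Everything here is an assumption for illustration: the `RecordingEvent` enum, the `Controller` trait, and `VolumeController` are hypothetical names, the sketch is synchronous for simplicity, and it only tracks a volume number instead of talking to Spotify. It also assumes that "reduce by" subtracts absolute volume points and that the original volume is restored when recording ends.

```rust
/// Recording lifecycle events. Hypothetical names, not Para-speak's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RecordingEvent {
    Started,
    Paused,
    Stopped,
    Cancelled,
}

/// Anything that wants to react to recording events.
trait Controller {
    fn on_event(&mut self, event: RecordingEvent);
}

/// Stand-in for a music volume controller. It only tracks the value it
/// would set; a real one would call out to the player.
struct VolumeController {
    current: u8,
    saved: Option<u8>,
    /// Mirrors PARA_SPOTIFY_RECORDING_VOLUME (fixed target volume).
    recording_volume: Option<u8>,
    /// Mirrors PARA_SPOTIFY_REDUCE_BY (assumed to be absolute points).
    reduce_by: u8,
}

impl VolumeController {
    fn new(current: u8, recording_volume: Option<u8>, reduce_by: u8) -> Self {
        Self { current, saved: None, recording_volume, reduce_by }
    }

    /// Volume to use while recording, given the volume before recording.
    fn target(&self, original: u8) -> u8 {
        match self.recording_volume {
            Some(v) => v.min(100),
            None => original.saturating_sub(self.reduce_by),
        }
    }
}

impl Controller for VolumeController {
    fn on_event(&mut self, event: RecordingEvent) {
        match event {
            RecordingEvent::Started => {
                // Remember the original volume once, then lower it.
                if self.saved.is_none() {
                    self.saved = Some(self.current);
                    self.current = self.target(self.current);
                }
            }
            RecordingEvent::Stopped | RecordingEvent::Cancelled => {
                // Restore the original volume, if one was saved.
                if let Some(v) = self.saved.take() {
                    self.current = v;
                }
            }
            RecordingEvent::Paused => {}
        }
    }
}

/// Forward one event to every enabled controller.
fn notify_all(controllers: &mut [Box<dyn Controller>], event: RecordingEvent) {
    for controller in controllers.iter_mut() {
        controller.on_event(event);
    }
}

fn main() {
    let mut controllers: Vec<Box<dyn Controller>> =
        vec![Box::new(VolumeController::new(80, None, 50))];
    for event in [
        RecordingEvent::Started,
        RecordingEvent::Paused,
        RecordingEvent::Stopped,
        RecordingEvent::Cancelled,
    ] {
        notify_all(&mut controllers, event);
        println!("dispatched {event:?}");
    }
}
```

Because controllers only see events, adding a new integration under this design means implementing one trait method rather than touching the recording code.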

Configuration

Para-speak uses environment variables for all configuration. Create a .env.local file in the root of the project directory:

# Keyboard shortcuts
PARA_START_KEYS="double(ControlLeft, 300); CommandLeft+ShiftLeft+KeyY"
PARA_STOP_KEYS="ControlLeft; CommandLeft+ShiftLeft+KeyY"
PARA_CANCEL_KEYS="double(Escape, 300)"
PARA_PAUSE_KEYS="CommandLeft+Alt+Shift+KeyU"

# Core functionality
PARA_PASTE=true                          # Auto-paste transcribed text at cursor

# Spotify integration
PARA_SPOTIFY_RECORDING_VOLUME=30         # Set Spotify to specific volume (0-100)
PARA_SPOTIFY_REDUCE_BY=50                # OR reduce volume by amount (0-100)

# Transcription behavior
PARA_TRANSCRIBE_ON_PAUSE=true            # Experimental: transcribe when pausing (not just on stop)

# Advanced
PARA_SHORTCUT_RESOLUTION_DELAY_MS=50     # Delay for resolving shortcut conflicts
PARA_MEMORY_MONITOR=true                 # Enable memory usage reporting

# Debugging
PARA_DEBUG=true                          # Enable debug mode with verbose output
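
As an illustration of how such variables might be turned into typed settings, here is a minimal sketch using only the Rust standard library. The `Config` struct, `parse_bool`, and `load_config` are hypothetical, and the fallback values (false, 50 ms) are assumptions, not Para-speak's documented defaults. It also assumes the variables from .env.local have already been loaded into the process environment.

```rust
use std::env;

/// A small subset of the settings above. Hypothetical struct.
#[derive(Debug, PartialEq)]
struct Config {
    paste: bool,
    debug: bool,
    shortcut_resolution_delay_ms: u64,
    spotify_recording_volume: Option<u8>,
}

/// Interpret "true"/"1" and "false"/"0"; anything else falls back to the default.
fn parse_bool(raw: Option<String>, default: bool) -> bool {
    match raw.as_deref().map(str::trim) {
        Some("true") | Some("1") => true,
        Some("false") | Some("0") => false,
        _ => default,
    }
}

/// Build a Config from a lookup function, so the source of the values
/// (process environment, a test map) can be swapped out.
fn load_config(get: impl Fn(&str) -> Option<String>) -> Config {
    Config {
        // Assumed defaults: the source does not state them.
        paste: parse_bool(get("PARA_PASTE"), false),
        debug: parse_bool(get("PARA_DEBUG"), false),
        shortcut_resolution_delay_ms: get("PARA_SHORTCUT_RESOLUTION_DELAY_MS")
            .and_then(|v| v.trim().parse::<u64>().ok())
            .unwrap_or(50),
        // Values outside the documented 0-100 range are ignored.
        spotify_recording_volume: get("PARA_SPOTIFY_RECORDING_VOLUME")
            .and_then(|v| v.trim().parse::<u8>().ok())
            .filter(|v| *v <= 100),
    }
}

fn main() {
    // Read from the real process environment.
    let config = load_config(|name| env::var(name).ok());
    println!("{config:?}");
}
```

Passing the lookup as a closure keeps the parsing logic testable without mutating the process environment.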

Check the README for detailed documentation.