speech-to-cli

Voice interface for AI coding assistants, powered by Azure Speech Services. Talk to your CLI agent and hear it respond.

Python 3.8+ · Linux · GPLv3 · MCP

Works With

Claude Code via MCP
Copilot CLI via MCP
Gemini CLI via extension

MCP Tools

listen

Record from microphone and transcribe to text via Azure STT or local Whisper.

speak

Convert text to natural speech using Azure HD voices with streaming playback.

talk

Speak text then immediately listen for a reply — full-duplex TTS + STT in one call.

converse

Continuous voice loop — listen, let the AI respond, listen again.

multi_speak

Multiple voices in one call — parallel TTS requests, sequential playback.

configure

View or change runtime settings: voice, quality, timeouts, chimes, and more.
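The multi_speak description above (parallel TTS requests, sequential playback) can be illustrated with a small sketch. This is not the project's implementation: `fake_synthesize` is a hypothetical stand-in for a real TTS request, and the `multi_speak` function below is an assumed shape chosen only to show the pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_synthesize(voice, text):
    # Hypothetical stand-in for a TTS request; returns bytes instead of audio.
    return f"[{voice}] {text}".encode("utf-8")


def multi_speak(segments, synthesize=fake_synthesize, play=None):
    """Synthesize all (voice, text) segments concurrently, then play in order.

    If no `play` callback is given, the clips are collected and returned.
    """
    played = []
    play = play or played.append
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        # Submit every request up front so the slow network calls overlap.
        futures = [pool.submit(synthesize, voice, text) for voice, text in segments]
        # Walking the futures in submission order keeps playback sequential,
        # even if a later request happens to finish first.
        for future in futures:
            play(future.result())
    return played
```

Total latency in this arrangement is roughly that of the slowest single request plus playback time, rather than the sum of all requests.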

Features

Azure HD Voices — DragonHD voices with natural intonation and expressiveness
Live Terminal UI — VU meters, progress bars, and real-time subtitles
Low Latency — ~275ms from end of speech to first audio byte
Voice Activity Detection — Energy-gated VAD auto-calibrates to your environment
Full-Duplex Talk — Overlaps TTS and STT with headphones for instant handoff
No SDK Required — Plain REST and WebSocket API calls, minimal dependencies
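As a rough illustration of the energy-gated, auto-calibrating VAD mentioned in the feature list, the sketch below measures ambient noise over the first few frames and then flags frames whose energy exceeds a multiple of that level. The class name `EnergyVAD`, the 16-bit mono PCM frame format, and the `calibration_frames`, `margin`, and `floor` values are all assumptions for the example, not the project's actual parameters.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EnergyVAD:
    def __init__(self, calibration_frames=10, margin=3.0, floor=100.0):
        # All three defaults are illustrative assumptions.
        self.calibration_frames = calibration_frames
        self.margin = margin
        self.floor = floor
        self._levels = []
        self.threshold = None  # set once calibration finishes

    def is_speech(self, frame: bytes) -> bool:
        level = rms(frame)
        if self.threshold is None:
            # Still calibrating: treat these frames as ambient noise.
            self._levels.append(level)
            if len(self._levels) >= self.calibration_frames:
                ambient = sum(self._levels) / len(self._levels)
                self.threshold = max(self.floor, ambient * self.margin)
            return False
        # Energy gate: only frames well above ambient count as speech.
        return level > self.threshold
```

A recorder built on this would typically stop after a run of consecutive non-speech frames; the length of that run is a separate tuning choice.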

Quick Start

# Clone and install
git clone https://github.com/jphein/speech-to-cli.git
cd speech-to-cli && ./install.sh

# Set your Azure Speech key
export AZURE_SPEECH_KEY="your-key-here"
export AZURE_SPEECH_REGION="westus2"

# Add to Claude Code
claude mcp add azure-speech -- python3 /path/to/mcp_speech.py

# Or run standalone
python3 speech.py # mic to clipboard
python3 tts.py "Hello world" # text to speech
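To show what "plain REST, no SDK" can look like with the environment variables set above, here is a minimal sketch that builds a request for the Azure Speech text-to-speech REST endpoint using only the standard library. The helper name `build_tts_request`, the default voice name, the output format, and the User-Agent string are assumptions for illustration; the project's own scripts may construct requests differently, and HD voice availability varies by region.

```python
import os
import urllib.request
from xml.sax.saxutils import escape


def build_tts_request(text, voice="en-US-Ava:DragonHDLatestNeural"):
    """Build (but do not send) a TTS request from AZURE_SPEECH_* variables."""
    key = os.environ["AZURE_SPEECH_KEY"]
    region = os.environ.get("AZURE_SPEECH_REGION", "westus2")
    # The service expects SSML; escape the text so "&" or "<" stay valid XML.
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    return urllib.request.Request(
        f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            # Assumed output format for the example: 24 kHz mono WAV.
            "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
            "User-Agent": "speech-to-cli-example",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` and writing the response body to a `.wav` file would complete the round trip; that step is left out here so the sketch runs without network access or a valid key.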