speech-to-cli

Voice interface for AI coding assistants. Talk to your CLI agent and hear it respond. Powered by Azure Speech Services.

Python 3.8+ | Linux | GPLv3 | MCP Protocol

Works With

Claude Code via MCP

Copilot CLI via MCP

Gemini CLI via extension

MCP Tools

listen

Record from microphone and transcribe to text via Azure STT or local Whisper.

speak

Convert text to natural speech using Azure HD voices with streaming playback.

talk

Speak text then immediately listen for a reply — full-duplex TTS + STT in one call.

converse

Continuous voice loop — listen, let the AI respond, listen again.

multi_speak

Multiple voices in one call — parallel TTS requests, sequential playback.

configure

View or change runtime settings: voice, quality, timeouts, chimes, and more.
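
The multi_speak description above (parallel TTS requests, sequential playback) can be illustrated with a small sketch. This is not the project's implementation: `synthesize` and `play` are hypothetical stand-ins for a real TTS request and an audio player, and the voice names are placeholders. It only shows one way to start every request at once while still playing the results in their original order.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize(voice: str, text: str) -> bytes:
    """Hypothetical stand-in for a TTS request; returns fake audio bytes."""
    return f"[{voice}] {text}".encode("utf-8")


def play(audio: bytes, played: list) -> None:
    """Hypothetical stand-in for audio playback; records what was played."""
    played.append(audio)


def multi_speak(segments):
    """Synthesize all segments concurrently, then play them in input order."""
    played = []
    if not segments:
        return played
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        # Submit every request up front so they run in parallel.
        futures = [pool.submit(synthesize, voice, text) for voice, text in segments]
        # Walking the futures in submission order keeps playback sequential,
        # even if a later request finishes first.
        for future in futures:
            play(future.result(), played)
    return played


segments = [("voice-a", "Build passed."), ("voice-b", "Deploying now.")]
result = multi_speak(segments)
```

Because the first clip can start playing as soon as its own request completes, the later requests keep running in the background while it plays.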

Features

Azure HD Voices — DragonHD voices with natural intonation and expressiveness

Live Terminal UI — VU meters, progress bars, and real-time subtitles

Low Latency — ~275ms from end of speech to first audio byte

Voice Activity Detection — Energy-gated VAD auto-calibrates to your environment

Full-Duplex Talk — Overlaps TTS and STT with headphones for instant handoff

No SDK Required — Plain REST and WebSocket API calls, minimal dependencies
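
As a rough illustration of the energy-gated, auto-calibrating voice activity detection listed above, here is a minimal sketch. It assumes frames of integer PCM samples, and the `margin`, `floor`, and `hangover` values are made up for the example; the project's actual algorithm and parameters may differ.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def calibrate(noise_frames, margin=3.0, floor=50.0):
    """Derive a speech threshold from frames of ambient noise.

    `margin` and `floor` are illustrative values, not the project's settings.
    """
    energies = [rms(f) for f in noise_frames]
    ambient = sum(energies) / len(energies) if energies else 0.0
    return max(ambient * margin, floor)


def detect_speech(frames, threshold, hangover=2):
    """Return one boolean per frame: True while speech is considered active.

    The gate opens when energy exceeds the threshold and stays open for
    `hangover` quiet frames so short pauses do not cut a word off.
    """
    flags, quiet = [], hangover + 1  # start with the gate closed
    for frame in frames:
        if rms(frame) > threshold:
            quiet = 0
        else:
            quiet += 1
        flags.append(quiet <= hangover)
    return flags


noise = [[10, -12, 9, -11]] * 5        # a few frames of room noise
threshold = calibrate(noise)
stream = [
    [10, -10, 10, -10],                # silence
    [900, -850, 920, -880],            # speech
    [12, -9, 11, -10],                 # quiet, within the hangover window
    [11, -10, 9, -12],                 # quiet, still within the window
    [10, -11, 10, -9],                 # quiet, gate closes
]
flags = detect_speech(stream, threshold)
```

Calibrating against a short sample of ambient noise lets the same gate work in a quiet office and next to a loud fan, since the threshold scales with the measured background level.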

Quick Start

      # Clone and install
      git clone https://github.com/jphein/speech-to-cli.git
      cd speech-to-cli && ./install.sh

      # Set your Azure Speech key
      export AZURE_SPEECH_KEY="your-key-here"
      export AZURE_SPEECH_REGION="westus2"

      # Add to Claude Code
      claude mcp add azure-speech -- python3 /path/to/mcp_speech.py

      # Or run standalone
      python3 speech.py  # mic to clipboard
      python3 tts.py "Hello world"  # text to speech
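
To show how the two environment variables above could feed a plain REST call with no SDK, here is a hedged sketch that builds (but does not send) a request to the Azure text-to-speech REST endpoint using only the standard library. The function name `build_tts_request`, the default voice, the output format, and the User-Agent string are assumptions chosen for illustration; the project's own scripts may construct their requests differently.

```python
import os
import urllib.request
from xml.sax.saxutils import escape


def build_tts_request(text, voice="en-US-JennyNeural"):
    """Build (but do not send) a POST to the Azure text-to-speech REST endpoint.

    The voice and output format are placeholder choices for this example.
    """
    key = os.environ["AZURE_SPEECH_KEY"]
    region = os.environ.get("AZURE_SPEECH_REGION", "westus2")
    # The request body is SSML; escape the text so characters like "<" stay valid XML.
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    return urllib.request.Request(
        url=f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "audio-16khz-32kbitrate-mono-mp3",
            "User-Agent": "speech-to-cli-example",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it, which requires a valid key and network access; the response body is the encoded audio.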