Schakel Voice
Orchestrator middleware for home automation and local voice assistance. Schakel sits between a voice satellite and Home Assistant, routing spoken commands through STT, an intent classifier, and specialized agents (home automation, music, general conversation) before responding via TTS.
Architecture
flowchart TB
subgraph Satellite["Voice Satellite"]
mic[Microphone]
speaker[Speaker]
end
subgraph Schakel["Schakel Middleware"]
ws["WebSocket\n/ws/audio"]
ww[Wake Word\nDetection]
stt["STT\n(Whisper)"]
router{{"Intent Router\n(Classifier LLM)"}}
tts["TTS Engine"]
ws --> ww --> stt --> router
tts --> ws
end
subgraph Agents["Specialized Agents"]
domotica["Domotica Agent\n(HA Translator)"]
musica["Musica Agent\n(Spotify)"]
general["General Agent\n(Conversational)"]
end
subgraph External["External Services"]
ha["Home Assistant\nAPI"]
spotify["Spotify\nAPI"]
llm_local["Local LLM\n(Ollama)"]
llm_cloud["Cloud LLM\n(OpenAI-compatible)"]
end
mic -- "audio bytes" --> ws
ws -- "audio bytes" --> speaker
router -- "DOMOTICA" --> domotica
router -- "MUSICA" --> musica
router -- "GENERAL" --> general
domotica -- "JSON action" --> ha
musica --> spotify
general --> llm_local
general --> llm_cloud
domotica -- "confirmation" --> tts
musica -- "response" --> tts
general -- "response" --> tts
How It Works
Audio flows through a two-state WebSocket pipeline:
- LISTENING -- streams audio through wake word detection (openwakeword). All audio is discarded until the wake word is heard.
- RECORDING -- buffers audio until the 3-second recording limit is reached, then runs the full pipeline: Whisper STT -> intent classification -> agent dispatch -> Piper TTS -> audio response back to the satellite. The connection then returns to LISTENING.
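The two states above can be sketched as a small state machine. This is a minimal illustration, not Schakel's actual implementation: the `AudioSession` class, the `detect_wake_word` and `run_pipeline` callables, and the audio format (16 kHz, 16-bit mono PCM) are all assumptions made for the example. Only the two state names and the 3-second limit come from the description above.

```python
from enum import Enum, auto
from typing import Callable, Optional

# Assumed audio format (not stated in the docs): 16 kHz, 16-bit mono PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
RECORD_SECONDS = 3  # recording limit described above
RECORD_LIMIT = SAMPLE_RATE * BYTES_PER_SAMPLE * RECORD_SECONDS


class State(Enum):
    LISTENING = auto()
    RECORDING = auto()


class AudioSession:
    """Hypothetical per-connection state machine for incoming audio chunks."""

    def __init__(
        self,
        detect_wake_word: Callable[[bytes], bool],
        run_pipeline: Callable[[bytes], bytes],
    ) -> None:
        self.state = State.LISTENING
        self.buffer = bytearray()
        self._detect = detect_wake_word  # stand-in for the wake word model
        self._run = run_pipeline         # stand-in for STT -> router -> agent -> TTS

    def feed(self, chunk: bytes) -> Optional[bytes]:
        """Consume one chunk; return response audio once a recording completes."""
        if self.state is State.LISTENING:
            # Audio is discarded here; only the detection result matters.
            if self._detect(chunk):
                self.state = State.RECORDING
                self.buffer.clear()
            return None

        # RECORDING: accumulate until the byte budget for 3 seconds is met.
        self.buffer.extend(chunk)
        if len(self.buffer) < RECORD_LIMIT:
            return None

        audio = bytes(self.buffer[:RECORD_LIMIT])
        self.buffer.clear()
        self.state = State.LISTENING
        return self._run(audio)
```

In a WebSocket handler, each received binary message would be passed to `feed`, and any non-`None` return value would be sent back to the satellite as the spoken response.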
The intent router classifies each utterance into one of three categories:
| Intent | Description | Example |
|---|---|---|
| DOMOTICA | Home automation device control | "enciende la luz del salon" (turn on the living room light) |
| MUSICA | Spotify playback commands | "pon Despacito" (play Despacito) |
| GENERAL | Questions and conversation | "que tiempo hace manana" (what will the weather be like tomorrow) |
Each intent is handled by a specialized agent that produces a structured response, which is then synthesized to audio and sent back to the satellite.
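The routing step can be pictured as a lookup from the classifier's label to an agent callable. The sketch below is illustrative only: `parse_intent`, `dispatch`, and the agent signature are hypothetical names, and falling back to GENERAL when the classifier returns an unrecognized label is an assumption, not documented behavior.

```python
from enum import Enum
from typing import Callable, Dict


class Intent(str, Enum):
    DOMOTICA = "DOMOTICA"
    MUSICA = "MUSICA"
    GENERAL = "GENERAL"


# Hypothetical agent signature: transcribed text in, response text out.
Agent = Callable[[str], str]


def parse_intent(raw_label: str) -> Intent:
    """Normalize the classifier output to one of the three intents."""
    try:
        return Intent(raw_label.strip().upper())
    except ValueError:
        # Assumption: unknown labels are treated as general conversation.
        return Intent.GENERAL


def dispatch(
    text: str,
    classify: Callable[[str], str],
    agents: Dict[Intent, Agent],
) -> str:
    """Classify the utterance and hand it to the matching agent."""
    intent = parse_intent(classify(text))
    return agents[intent](text)
```

The returned string would then be passed to the TTS engine. Keeping the label parsing separate from the dispatch makes it easy to test how the router behaves when the classifier LLM returns extra whitespace, lowercase text, or an unexpected label.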
Quick Links
- Getting Started -- prerequisites and setup
- Configuration -- config.yaml walkthrough
- M5Stack Atom Echo -- flash your voice satellite
- Agent Architecture -- how the agents work
- Docker Deployment -- run with containers