Architecture

Schakel is a middleware orchestrator that bridges a voice satellite to multiple backend services. It receives raw audio, processes it through a pipeline, and returns synthesized speech.

System Overview

flowchart TB
    subgraph Satellite["Voice Satellite"]
        mic[Microphone]
        speaker[Speaker]
    end

    subgraph Schakel["Schakel Middleware"]
        ws["WebSocket\n/ws/audio"]
        ww[Wake Word\nDetection]
        stt["STT\n(Whisper)"]
        router{{"Intent Router\n(Classifier LLM)"}}
        tts["TTS Engine"]

        ws --> ww --> stt --> router
        tts --> ws
    end

    subgraph Agents["Specialized Agents"]
        domotica["Domotica Agent\n(HA Translator)"]
        musica["Musica Agent\n(Spotify)"]
        general["General Agent\n(Conversational)"]
    end

    subgraph External["External Services"]
        ha["Home Assistant\nAPI"]
        spotify["Spotify\nAPI"]
        llm_local["Local LLM\n(Ollama)"]
        llm_cloud["Cloud LLM\n(OpenAI-compatible)"]
    end

    mic -- "audio bytes" --> ws
    ws -- "audio bytes" --> speaker

    router -- "DOMOTICA" --> domotica
    router -- "MUSICA" --> musica
    router -- "GENERAL" --> general

    domotica -- "JSON action" --> ha
    musica --> spotify
    general --> llm_local
    general --> llm_cloud

    domotica -- "confirmation" --> tts
    musica -- "response" --> tts
    general -- "response" --> tts
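
The routing edges in the diagram can be sketched as a small dispatch table. This is an illustrative sketch, not the project's router: the three intent labels come from the diagram, while the handler functions, `parse_intent`, `dispatch`, and the choice to fall back to GENERAL on an unknown label are all assumptions made for the example.

```python
from enum import Enum
from typing import Callable, Dict


class Intent(str, Enum):
    """The three labels the classifier LLM is expected to emit."""
    DOMOTICA = "DOMOTICA"
    MUSICA = "MUSICA"
    GENERAL = "GENERAL"


def parse_intent(label: str) -> Intent:
    """Normalize the raw classifier label.

    Falling back to GENERAL for unknown labels is an assumption of this
    sketch, not documented behavior.
    """
    try:
        return Intent(label.strip().upper())
    except ValueError:
        return Intent.GENERAL


# Hypothetical stand-ins for the real agents; each returns the text for TTS.
def domotica_agent(text: str) -> str:
    return f"[domotica] {text}"


def musica_agent(text: str) -> str:
    return f"[musica] {text}"


def general_agent(text: str) -> str:
    return f"[general] {text}"


AGENTS: Dict[Intent, Callable[[str], str]] = {
    Intent.DOMOTICA: domotica_agent,
    Intent.MUSICA: musica_agent,
    Intent.GENERAL: general_agent,
}


def dispatch(label: str, text: str) -> str:
    """Route the transcribed text to the agent selected by the label."""
    return AGENTS[parse_intent(label)](text)
```

Keeping the mapping in a dictionary means a new agent only needs a new enum member and one table entry.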

Audio Pipeline

Audio flows through a two-state WebSocket pipeline:

LISTENING State

The satellite streams audio chunks continuously over the WebSocket. The wake word detector (openwakeword) processes each chunk looking for the configured trigger word (default: "alexa"). All audio is discarded until the wake word is detected.

RECORDING State

Once the wake word is detected, the pipeline switches to recording mode and buffers 3 seconds of audio (48,000 samples, or 96,000 bytes, at 16 kHz, 16-bit mono). When the buffer is full, the complete pipeline runs:

1. STT -- faster-whisper transcribes the audio buffer to text
2. Intent Classification -- the router LLM classifies the text as DOMOTICA, MUSICA, or GENERAL
3. Agent Dispatch -- the classified intent is routed to the appropriate agent
4. Execution -- the agent processes the request (HA service call, Spotify command, or LLM conversation)
5. TTS -- Piper synthesizes the response text to 16 kHz PCM audio
6. Response -- the audio bytes are sent back through the WebSocket to the satellite speaker

The pipeline then resets to LISTENING and waits for the next wake word.
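
The two states described above can be sketched as a small state machine that is fed one audio chunk at a time. This is a minimal sketch under stated assumptions: `detect_wake_word` and `run_pipeline` are hypothetical callables standing in for the wake word detector and the STT-to-TTS chain, the chunk that triggers the wake word is discarded, and any bytes beyond the recording window are dropped. The buffer size is computed from the sample rate, sample width, and duration rather than hard-coded.

```python
from enum import Enum, auto
from typing import Callable, Optional

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono
RECORD_SECONDS = 3
RECORD_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * RECORD_SECONDS


class State(Enum):
    LISTENING = auto()
    RECORDING = auto()


class AudioPipeline:
    """Two-state pipeline: discard audio until the wake word, then record."""

    def __init__(
        self,
        detect_wake_word: Callable[[bytes], bool],  # hypothetical detector
        run_pipeline: Callable[[bytes], bytes],     # hypothetical STT..TTS chain
        record_bytes: int = RECORD_BYTES,
    ) -> None:
        self.state = State.LISTENING
        self._detect = detect_wake_word
        self._run = run_pipeline
        self._record_bytes = record_bytes
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> Optional[bytes]:
        """Consume one chunk; return response audio once a recording completes."""
        if self.state is State.LISTENING:
            # Audio is only inspected for the wake word, never stored.
            if self._detect(chunk):
                self.state = State.RECORDING
                self._buffer.clear()
            return None

        # RECORDING: accumulate until the window is full.
        self._buffer.extend(chunk)
        if len(self._buffer) < self._record_bytes:
            return None

        audio = bytes(self._buffer[: self._record_bytes])
        self._buffer.clear()
        self.state = State.LISTENING  # reset before the next wake word
        return self._run(audio)
```

A WebSocket handler would call `feed()` for every incoming message and send any non-`None` result back to the satellite.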

Module Layout

| Module | Path | Responsibility |
| --- | --- | --- |
| Entry point | `app/main.py` | FastAPI app, WebSocket handler, service initialization |
| Configuration | `app/core/config.py` | YAML loading and Pydantic validation |
| Logging | `app/core/logger.py` | Logging setup |
| Schemas | `app/schemas/models.py` | Pydantic models: `Intent`, `HAAction`, `MusicAction`, `RouterResponse` |
| Wake word | `app/services/audio/wakeword.py` | openwakeword integration |
| STT | `app/services/audio/stt.py` | faster-whisper transcription |
| TTS | `app/services/audio/tts.py` | Piper synthesis with resampling to 16 kHz |
| Intent Router | `app/services/llm/router.py` | Intent classification and agent dispatch |
| Local LLM | `app/services/llm/local.py` | Ollama async client |
| Cloud LLM | `app/services/llm/cloud.py` | OpenAI/Anthropic/Mistral async client |
| HA Client | `app/services/home_assistant/client.py` | Entity discovery and service calls |
| Spotify Client | `app/services/music/spotify.py` | Spotipy async wrapper |

Key Design Decisions

All agents operate in Spanish. System prompts and agent output are written in Spanish because Schakel is a Spanish-language voice assistant.

Structured JSON output for action agents. The domotica and musica agents return structured JSON that is parsed and executed programmatically. Only the confirmation field reaches TTS. This ensures reliable service calls without depending on LLM text parsing.
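
To illustrate the idea, the sketch below parses an agent's JSON reply into a typed action and hands only the confirmation text to TTS. It uses a standard-library dataclass to stay self-contained, whereas the project's own schemas are Pydantic models. The field names `domain`, `service`, and `entity_id` are assumptions modeled on how Home Assistant service calls are usually addressed; only `confirmation` is named in this document.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HAAction:
    """Hypothetical shape of the domotica agent's structured reply."""
    domain: str
    service: str
    entity_id: str
    confirmation: str


def parse_ha_action(raw: str) -> HAAction:
    """Parse the agent's JSON reply; raise ValueError if it is malformed."""
    try:
        data = json.loads(raw)
        return HAAction(
            domain=str(data["domain"]),
            service=str(data["service"]),
            entity_id=str(data["entity_id"]),
            confirmation=str(data["confirmation"]),
        )
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ValueError(f"invalid action JSON: {exc}") from exc


def text_for_tts(action: HAAction) -> str:
    """Only the confirmation field is spoken; the rest drives the service call."""
    return action.confirmation
```

Because a malformed reply raises before any service call is attempted, the caller can report the failure instead of acting on partial data.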

Confirmation from real results. For the musica agent, the TTS confirmation comes from the actual Spotify search result (real track/artist names), not from the LLM's guess. This prevents the assistant from announcing a song name that doesn't match what's actually playing.
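
A minimal sketch of this approach, assuming the search result is a Spotify track object with a `name` and a list of `artists` that each carry a `name`. The function name and the Spanish wording are hypothetical and chosen only for illustration.

```python
from typing import Any, Mapping, Optional


def build_confirmation(track: Optional[Mapping[str, Any]]) -> str:
    """Build the spoken confirmation from the track that was actually found."""
    if not track:
        # Nothing matched the search, so say so rather than echo the LLM's guess.
        return "No he encontrado esa cancion."

    artists = ", ".join(artist["name"] for artist in track.get("artists", []))
    if artists:
        return f"Reproduciendo {track['name']} de {artists}."
    return f"Reproduciendo {track['name']}."
```

The LLM's output is used only to form the search query; the names that are spoken come from the search result.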

Graceful degradation. If Spotify is not configured, the music agent responds politely instead of crashing. If the LLM call fails, the general agent returns a fallback message. If a Home Assistant service call fails, the domotica agent reports the error.
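
The fallback behavior can be sketched as follows. This is an illustration rather than the project's code: `llm_call` is a hypothetical async callable, the message constants are invented wording, and catching every `Exception` is a simplification that a real implementation would likely narrow.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical wording; the real messages are not given in this document.
FALLBACK_MESSAGE = "Lo siento, ahora mismo no puedo responder."
SPOTIFY_UNAVAILABLE = "Spotify no esta configurado."


async def general_agent(
    text: str,
    llm_call: Callable[[str], Awaitable[str]],
) -> str:
    """Return the LLM reply, or a fixed fallback if the call fails."""
    try:
        return await llm_call(text)
    except Exception:
        # Broad catch keeps the voice loop alive whatever the backend raised.
        return FALLBACK_MESSAGE


def music_agent(text: str, spotify_client: Optional[object] = None) -> str:
    """Answer politely when no Spotify client was configured."""
    if spotify_client is None:
        return SPOTIFY_UNAVAILABLE
    # Placeholder: a real agent would search and start playback here.
    return f"Buscando: {text}"
```

In both cases the agent still returns a string, so the TTS step always has something to say.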

For details on each agent's behavioral contract, see Agents.