Schakel Voice

Orchestrator middleware for home automation and local voice assistance. Schakel sits between a voice satellite and Home Assistant, routing spoken commands through STT, an intent classifier, and specialized agents (home automation, music, general conversation) before responding via TTS.

Architecture

flowchart TB
    subgraph Satellite["Voice Satellite"]
        mic[Microphone]
        speaker[Speaker]
    end

    subgraph Schakel["Schakel Middleware"]
        ws["WebSocket\n/ws/audio"]
        ww[Wake Word\nDetection]
        stt["STT\n(Whisper)"]
        router{{"Intent Router\n(Classifier LLM)"}}
        tts["TTS Engine"]

        ws --> ww --> stt --> router
        tts --> ws
    end

    subgraph Agents["Specialized Agents"]
        domotica["Domotica Agent\n(HA Translator)"]
        musica["Musica Agent\n(Spotify)"]
        general["General Agent\n(Conversational)"]
    end

    subgraph External["External Services"]
        ha["Home Assistant\nAPI"]
        spotify["Spotify\nAPI"]
        llm_local["Local LLM\n(Ollama)"]
        llm_cloud["Cloud LLM\n(OpenAI-compatible)"]
    end

    mic -- "audio bytes" --> ws
    ws -- "audio bytes" --> speaker

    router -- "DOMOTICA" --> domotica
    router -- "MUSICA" --> musica
    router -- "GENERAL" --> general

    domotica -- "JSON action" --> ha
    musica --> spotify
    general --> llm_local
    general --> llm_cloud

    domotica -- "confirmation" --> tts
    musica -- "response" --> tts
    general -- "response" --> tts

How It Works

Audio flows through a two-state WebSocket pipeline:

  1. LISTENING -- streams incoming audio through wake word detection (openWakeWord). All audio is discarded until the wake word is heard.
  2. RECORDING -- buffers audio until the 3-second recording limit is reached, then runs the full pipeline: Whisper STT -> intent classification -> agent dispatch -> Piper TTS -> audio response back to the satellite. The session then returns to LISTENING.
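
The two states above can be sketched as a small per-connection state machine. This is a minimal illustration, not Schakel's actual code: the `AudioSession` class, the `detect_wake_word` and `run_pipeline` callables, and the 16 kHz / 16-bit mono audio format are all assumptions made for the example. Only the two state names and the 3-second limit come from the description above.

```python
from enum import Enum, auto
from typing import Callable, Optional

# Assumed audio format (not stated in this README): 16 kHz, 16-bit mono PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
RECORD_SECONDS = 3  # recording limit taken from the description above
RECORD_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * RECORD_SECONDS


class State(Enum):
    LISTENING = auto()
    RECORDING = auto()


class AudioSession:
    """Hypothetical state machine for one WebSocket connection."""

    def __init__(
        self,
        detect_wake_word: Callable[[bytes], bool],
        run_pipeline: Callable[[bytes], bytes],
    ) -> None:
        self.state = State.LISTENING
        self.buffer = bytearray()
        self._detect = detect_wake_word  # stand-in for the wake word model
        self._run = run_pipeline         # stand-in for STT -> router -> agent -> TTS

    def feed(self, chunk: bytes) -> Optional[bytes]:
        """Consume one audio chunk; return response audio once an utterance is complete."""
        if self.state is State.LISTENING:
            if self._detect(chunk):
                self.state = State.RECORDING
                self.buffer.clear()
            return None  # audio is discarded while listening

        # RECORDING: accumulate audio until the limit is reached.
        self.buffer.extend(chunk)
        if len(self.buffer) < RECORD_BYTES:
            return None

        utterance = bytes(self.buffer[:RECORD_BYTES])
        self.buffer.clear()
        self.state = State.LISTENING
        return self._run(utterance)
```

A WebSocket handler would call `feed()` for every binary message it receives and send any non-`None` result back to the satellite. Keeping the state machine free of I/O makes it easy to test with stub callables.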

The intent router classifies each utterance into one of three categories:

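| Intent   | Description                    | Example                                                        |
|----------|--------------------------------|----------------------------------------------------------------|
| DOMOTICA | Home automation device control | "enciende la luz del salon" (turn on the living room light)    |
| MUSICA   | Spotify playback commands      | "pon Despacito" (play Despacito)                               |
| GENERAL  | Questions and conversation     | "que tiempo hace manana" (what will the weather be tomorrow)   |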

Each intent is handled by a specialized agent that produces a structured response, which is then synthesized to audio and sent back to the satellite.
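
The routing step described above might look roughly like the following. This is a sketch under stated assumptions: `parse_intent`, `dispatch`, `AgentResponse`, and the `classify` / `agents` callables are hypothetical names introduced for illustration, and falling back to GENERAL on an unrecognized label is an assumed policy, not something this README specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Intent(Enum):
    DOMOTICA = "DOMOTICA"
    MUSICA = "MUSICA"
    GENERAL = "GENERAL"


@dataclass
class AgentResponse:
    """Hypothetical structured response handed to the TTS stage."""
    intent: Intent
    text: str


def parse_intent(label: str) -> Intent:
    """Map raw classifier output to an Intent.

    Assumption: anything that is not a known label falls back to GENERAL.
    """
    try:
        return Intent(label.strip().upper())
    except ValueError:
        return Intent.GENERAL


def dispatch(
    utterance: str,
    classify: Callable[[str], str],
    agents: Dict[Intent, Callable[[str], str]],
) -> AgentResponse:
    """Classify the utterance, then hand it to the matching agent."""
    intent = parse_intent(classify(utterance))
    return AgentResponse(intent=intent, text=agents[intent](utterance))
```

Normalizing the classifier output before the lookup guards against stray whitespace or casing differences in the LLM's reply, which is a common failure mode when a language model is used as a classifier.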