# Engine High-Level Architecture

This document describes the runtime architecture of `engine` for realtime voice/text assistant interactions.

## Goals

- Low-latency duplex interaction (the user can speak while the assistant is responding)
- Clear separation between transport, orchestration, and model/service integrations
- Backend-optional runtime (works with or without an external backend)
- Protocol-first interoperability through strict WS v1 control messages

## Top-Level Components

```mermaid
flowchart LR
    C[Client\nWeb / Mobile / Device] <-- WS v1 + PCM --> A[FastAPI App\napp/main.py]
    A --> S[Session\ncore/session.py]
    S --> D[Duplex Pipeline\ncore/duplex_pipeline.py]
    D --> P[Processors\nVAD / EOU / Tracks]
    D --> R[Workflow Runner\ncore/workflow_runner.py]
    D --> E[Event Bus + Models\ncore/events.py + models/*]
    R --> SV[Service Layer\nservices/asr.py\nservices/llm.py\nservices/tts.py]
    R --> TE[Tool Executor\ncore/tool_executor.py]
    S --> HB[History Bridge\ncore/history_bridge.py]
    S --> BA[Control Plane Port\ncore/ports/control_plane.py]
    BA --> AD[Adapters\napp/backend_adapters.py]
    AD --> B[(External Backend API\noptional)]
    SV --> M[(ASR/LLM/TTS Providers)]
```

## Request Lifecycle (Simplified)

1. The client connects to `/ws?assistant_id=` and sends `session.start`.
2. The app creates a `Session` with the resolved assistant config (from the backend or from local YAML).
3. Binary PCM frames enter the duplex pipeline.
4. `VAD`/`EOU` processors detect speech segments and trigger ASR finalization.
5. ASR text is routed to the workflow runner and LLM generation.
6. Optional tool calls are executed, either server-side or client-side (the client returns the result).
7. LLM output streams as text deltas; TTS produces audio chunks for playback.
8. The session emits structured events (`transcript.*`, `assistant.*`, `output.audio.*`, `error`).
9. The history bridge persists conversation data asynchronously.
10. On `session.stop` (or disconnect), the session finalizes and drains pending writes.
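
Steps 1, 3, and 10 imply a small ordering contract on the socket: nothing is accepted before `session.start`, and nothing after `session.stop`. The following is a minimal sketch of that contract as a state machine with a per-session `seq` counter for outbound events. It is illustrative only: `SessionGate`, `ProtocolError`, the `type`/`seq`/`data` envelope fields, the 16-bit PCM length check, and the event name `transcript.final` are assumptions, not the schema defined in `engine/docs/ws_v1_schema.md`.

```python
from __future__ import annotations

from enum import Enum


class ProtocolError(Exception):
    """Raised when a frame violates the (assumed) WS v1 ordering rules."""


class SessionState(Enum):
    AWAITING_START = "awaiting_start"
    ACTIVE = "active"
    STOPPED = "stopped"


class SessionGate:
    """Hypothetical per-connection gate: session.start -> audio/control -> session.stop."""

    def __init__(self) -> None:
        self.state = SessionState.AWAITING_START
        self.seq = 0  # monotonically increasing sequence number for outbound events

    def on_text(self, message: dict) -> None:
        """Validate a decoded JSON control message (assumed to carry a 'type' field)."""
        msg_type = message.get("type")
        if not isinstance(msg_type, str):
            raise ProtocolError("control message is missing a string 'type'")
        if self.state is SessionState.AWAITING_START:
            if msg_type != "session.start":
                raise ProtocolError(f"expected session.start, got {msg_type}")
            self.state = SessionState.ACTIVE
        elif self.state is SessionState.ACTIVE:
            if msg_type == "session.start":
                raise ProtocolError("duplicate session.start")
            if msg_type == "session.stop":
                self.state = SessionState.STOPPED
        else:
            raise ProtocolError(f"{msg_type} received after session.stop")

    def on_binary(self, pcm: bytes) -> None:
        """Accept a PCM frame only while the session is active."""
        if self.state is not SessionState.ACTIVE:
            raise ProtocolError("PCM frame outside an active session")
        if len(pcm) % 2:  # assumes 16-bit samples, so frames must have even length
            raise ProtocolError("odd-length frame for 16-bit PCM")

    def next_event(self, event_type: str, **data) -> dict:
        """Wrap an outbound event in an envelope carrying the next seq value."""
        self.seq += 1
        return {"type": event_type, "seq": self.seq, "data": data}


gate = SessionGate()
gate.on_text({"type": "session.start"})
gate.on_binary(b"\x00\x00" * 160)  # 160 silent 16-bit samples
event = gate.next_event("transcript.final", text="hello")
gate.on_text({"type": "session.stop"})
print(event)  # {'type': 'transcript.final', 'seq': 1, 'data': {'text': 'hello'}}
```

Rejecting early, at the gate, keeps malformed traffic from ever reaching the pipeline, which is the intent behind the "protocol strictness" principle.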
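
Steps 9 and 10 keep persistence off the realtime path while still flushing on finalize. A minimal sketch of that idea, assuming an asyncio-based writer: `enqueue` never awaits, a background worker retries failed writes with exponential backoff, and `drain` waits a bounded time before stopping the worker. `HistoryBridge`, its parameters, and the drop-after-retries policy are hypothetical; they are not the API of `core/history_bridge.py`.

```python
from __future__ import annotations

import asyncio
import contextlib
from typing import Awaitable, Callable

WriteFn = Callable[[dict], Awaitable[None]]


class HistoryBridge:
    """Hypothetical fail-soft history writer decoupled from the realtime path."""

    def __init__(self, write: WriteFn, max_retries: int = 3,
                 base_delay: float = 0.05, maxsize: int = 1000) -> None:
        self._write = write
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self._worker: asyncio.Task | None = None
        self.dropped = 0  # records shed on a full queue or after exhausting retries

    def start(self) -> None:
        """Spawn the background worker; must be called inside a running event loop."""
        self._worker = asyncio.create_task(self._run())

    def enqueue(self, record: dict) -> None:
        """Never awaits, so the caller (e.g. the pipeline) is never blocked."""
        try:
            self._queue.put_nowait(record)
        except asyncio.QueueFull:
            self.dropped += 1

    async def _write_with_retry(self, record: dict) -> None:
        for attempt in range(self._max_retries + 1):
            try:
                await self._write(record)
                return
            except Exception:
                if attempt == self._max_retries:
                    self.dropped += 1  # give up on this record; the session keeps running
                    return
                await asyncio.sleep(self._base_delay * 2 ** attempt)  # exponential backoff

    async def _run(self) -> None:
        while True:
            record = await self._queue.get()
            try:
                await self._write_with_retry(record)
            finally:
                self._queue.task_done()

    async def drain(self, timeout: float = 2.0) -> None:
        """Best-effort flush on session finalize, bounded by `timeout` seconds."""
        with contextlib.suppress(asyncio.TimeoutError):
            await asyncio.wait_for(self._queue.join(), timeout)
        if self._worker is not None:
            self._worker.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self._worker
```

The bounded queue and the retry cap are the two knobs that trade history completeness for realtime safety: a backend outage costs dropped records, never a stalled audio loop.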
## Layering and Responsibilities

### 1) Transport / API Layer

- Entry point: `app/main.py`
- Responsibilities:
  - WebSocket lifecycle management
  - WS v1 message validation and ordering guarantees
  - Session creation and teardown
  - Conversion of raw WS frames into internal events

### 2) Session + Orchestration Layer

- Core: `core/session.py`, `core/duplex_pipeline.py`, `core/conversation.py`
- Responsibilities:
  - Per-session state machine
  - Turn boundaries and interruption/cancel handling
  - Event sequencing (`seq`) and envelope consistency
  - Bridging input/output tracks (`audio_in`, `audio_out`, `control`)

### 3) Processing Layer

- Modules: `processors/vad.py`, `processors/eou.py`, `processors/tracks.py`
- Responsibilities:
  - Speech activity detection
  - End-of-utterance decisioning
  - Track-oriented routing and timing-sensitive pre/post-processing

### 4) Workflow + Tooling Layer

- Modules: `core/workflow_runner.py`, `core/tool_executor.py`
- Responsibilities:
  - Assistant workflow execution
  - Tool call planning/execution and timeout handling
  - Tool result normalization into protocol events

### 5) Service Integration Layer

- Modules: `services/*`
- Responsibilities:
  - Abstracting ASR/LLM/TTS provider differences
  - Streaming token/audio adaptation
  - Provider-specific adapters (OpenAI-compatible, DashScope, SiliconFlow, etc.)

### 6) Backend Integration Layer (Optional)

- Port: `core/ports/control_plane.py`
- Adapters: `app/backend_adapters.py`
- Responsibilities:
  - Fetching assistant runtime config
  - Persisting call/session metadata and history
  - Supporting `BACKEND_MODE=auto|http|disabled`

### 7) Persistence / Reliability Layer

- Module: `core/history_bridge.py`
- Responsibilities:
  - Non-blocking, queue-based history writes
  - Retry with backoff on backend failures
  - Best-effort drain on session finalize

## Key Design Principles

- Dependency inversion for the backend: session and pipeline code depend on port interfaces, not concrete clients.
- Streaming-first: text and audio are emitted incrementally to minimize perceived latency.
- Fail-soft behavior: backend and history failures should not block the realtime interaction path.
- Protocol strictness: WS v1 rejects malformed or out-of-order control traffic early.
- Explicit event model: all client-observable state changes are represented as typed events.

## Configuration Boundaries

- Runtime environment settings live in `app/config.py`.
- Assistant-specific behavior is loaded by `assistant_id`:
  - backend mode: from the backend API
  - engine-only mode: from local `engine/config/agents/.yaml`
- Client-provided `metadata.overrides` and `dynamicVariables` can alter runtime behavior within protocol constraints.

## Related Docs

- WS protocol: `engine/docs/ws_v1_schema.md`
- Backend integration details: `engine/docs/backend_integration.md`
- Duplex interaction diagram: `engine/docs/duplex_interaction.svg`
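
The backend-optional design (the control-plane port in layer 6 and the assistant config resolution under Configuration Boundaries) can be sketched with a `typing.Protocol` port and two adapters. The class names, the `fetch_assistant_config` method, and especially the reading of `auto` as "use HTTP only when a backend URL is configured" are assumptions for illustration; the real port lives in `core/ports/control_plane.py` and the adapters in `app/backend_adapters.py`. The local adapter reads an in-memory mapping as a stand-in for the per-assistant YAML files.

```python
from __future__ import annotations

from typing import Optional, Protocol


class ControlPlanePort(Protocol):
    """Hypothetical port: session/pipeline code depends only on this interface."""

    def fetch_assistant_config(self, assistant_id: str) -> dict: ...


class HttpControlPlane:
    """Stand-in for an HTTP adapter; a real one would call the external backend API."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def fetch_assistant_config(self, assistant_id: str) -> dict:
        # No network I/O in this sketch; just report where the config would come from.
        return {"assistant_id": assistant_id, "source": "http"}


class LocalConfigControlPlane:
    """Engine-only adapter backed by an in-memory mapping (stands in for local YAML)."""

    def __init__(self, configs: dict[str, dict]) -> None:
        self._configs = configs

    def fetch_assistant_config(self, assistant_id: str) -> dict:
        if assistant_id not in self._configs:
            raise LookupError(f"no local config for assistant {assistant_id!r}")
        return {**self._configs[assistant_id], "source": "local"}


def resolve_control_plane(mode: str, backend_url: Optional[str],
                          local_configs: dict[str, dict]) -> ControlPlanePort:
    """Pick an adapter for a BACKEND_MODE value (auto | http | disabled)."""
    if mode == "disabled":
        return LocalConfigControlPlane(local_configs)
    if mode == "http":
        if not backend_url:
            raise ValueError("BACKEND_MODE=http requires a backend URL")
        return HttpControlPlane(backend_url)
    if mode == "auto":
        # Assumption: 'auto' prefers the backend only when a URL is configured.
        if backend_url:
            return HttpControlPlane(backend_url)
        return LocalConfigControlPlane(local_configs)
    raise ValueError(f"unknown BACKEND_MODE: {mode!r}")


port = resolve_control_plane("auto", None, {"demo": {"greeting": "Hi!"}})
print(port.fetch_assistant_config("demo"))  # {'greeting': 'Hi!', 'source': 'local'}
```

Because the session only ever sees `ControlPlanePort`, switching between backend and engine-only mode is a wiring decision made once at startup rather than a branch inside the realtime path.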