Engine High-Level Architecture
This document describes the runtime architecture of `engine` for realtime voice and text assistant interactions.
Goals
- Low-latency duplex interaction (the user can speak while the assistant is responding)
- Clear separation between transport, orchestration, and model/service integrations
- Backend-optional runtime (works with or without an external backend)
- Protocol-first interoperability through strict WS v1 control messages
Top-Level Components
flowchart LR
C[Client\nWeb / Mobile / Device] <-- WS v1 + PCM --> A[FastAPI App\napp/main.py]
A --> S[Session\ncore/session.py]
S --> D[Duplex Pipeline\ncore/duplex_pipeline.py]
D --> P[Processors\nVAD / EOU / Tracks]
D --> R[Workflow Runner\ncore/workflow_runner.py]
D --> E[Event Bus + Models\ncore/events.py + models/*]
R --> SV[Service Layer\nservices/asr.py\nservices/llm.py\nservices/tts.py]
R --> TE[Tool Executor\ncore/tool_executor.py]
S --> HB[History Bridge\ncore/history_bridge.py]
S --> BA[Control Plane Port\ncore/ports/control_plane.py]
BA --> AD[Adapters\napp/backend_adapters.py]
AD --> B[(External Backend API\noptional)]
SV --> M[(ASR/LLM/TTS Providers)]
Request Lifecycle (Simplified)
- Client connects to `/ws?assistant_id=<id>` and sends `session.start`.
- App creates a `Session` with the resolved assistant config (from the backend or local YAML).
- Binary PCM frames enter the duplex pipeline.
- VAD/EOU processors detect speech segments and trigger ASR finalization.
- ASR text is routed into the workflow and LLM generation.
- Optional tool calls are executed, either on the server or on the client (which then returns the result).
- LLM output streams as text deltas; TTS produces audio chunks for playback.
- Session emits structured events (`transcript.*`, `assistant.*`, `output.audio.*`, `error`).
- History bridge persists conversation data asynchronously.
- On `session.stop` (or disconnect), the session finalizes and drains pending writes.
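On the client side, the event families the session emits can be dispatched by type. The following is a minimal, hypothetical router over plain dicts. It assumes each event carries a string `type` field. Concrete names such as `transcript.final` or `output.audio.chunk` are illustrative placeholders, not names taken from the WS v1 schema.

```python
import fnmatch
from typing import Any, Callable, Dict, List, Tuple

# Assumption: every server event is a dict with a string "type" field.
Event = Dict[str, Any]
Handler = Callable[[Event], None]


class EventRouter:
    """Dispatch events to handlers whose glob pattern matches event["type"]."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Handler]] = []

    def on(self, pattern: str, handler: Handler) -> None:
        # Patterns use fnmatch syntax, e.g. "transcript.*" or "error".
        self._routes.append((pattern, handler))

    def dispatch(self, event: Event) -> int:
        """Call every matching handler in registration order; return the match count."""
        matched = 0
        for pattern, handler in self._routes:
            if fnmatch.fnmatchcase(event["type"], pattern):
                handler(event)
                matched += 1
        return matched
```

A client would typically register handlers such as `router.on("output.audio.*", play_chunk)` and `router.on("error", show_error)`. Unmatched events return `0`, which makes unknown types easy to log.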
Layering and Responsibilities
1) Transport / API Layer
- Entry point: `app/main.py`
- Responsibilities:
  - WebSocket lifecycle management
  - WS v1 message validation and order guarantees
  - Session creation and teardown
  - Converting raw WS frames into internal events
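The ordering guarantee can be pictured as a small per-connection state check. This sketch makes several assumptions. Control messages are JSON objects with a `type` field. Only `session.start` and `session.stop` come from this document; `input.text` in the tests is a hypothetical message type. The authoritative rules live in `engine/docs/ws_v1_schema.md`.

```python
class ProtocolError(Exception):
    """Raised when a control message violates WS v1 ordering (sketch)."""


class ControlOrderValidator:
    """Per-connection ordering check: session.start first, nothing after session.stop."""

    def __init__(self) -> None:
        self.started = False
        self.stopped = False

    def check(self, message: dict) -> None:
        msg_type = message.get("type")  # assumption: the type lives in a "type" field
        if not isinstance(msg_type, str):
            raise ProtocolError("control message is missing a string 'type'")
        if self.stopped:
            raise ProtocolError(f"{msg_type!r} received after session.stop")
        if msg_type == "session.start":
            if self.started:
                raise ProtocolError("duplicate session.start")
            self.started = True
        elif not self.started:
            raise ProtocolError(f"{msg_type!r} received before session.start")
        elif msg_type == "session.stop":
            self.stopped = True
```

In a design like this, the transport layer would convert a `ProtocolError` into an `error` event before the message ever reaches the session. That keeps malformed traffic out of the orchestration layer.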
2) Session + Orchestration Layer
- Core: `core/session.py`, `core/duplex_pipeline.py`, `core/conversation.py`
- Responsibilities:
  - Per-session state machine
  - Turn boundaries and interruption/cancel handling
  - Event sequencing (`seq`) and envelope consistency
  - Bridging input/output tracks (`audio_in`, `audio_out`, `control`)
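A minimal sketch of per-session sequencing follows. It assumes an envelope with `type`, `seq`, `session_id`, `ts_ms`, and `data` fields. These field names are hypothetical; the WS v1 schema defines the actual envelope. The point is that a single counter per session stamps every outgoing event, so clients can detect gaps or reordering.

```python
from __future__ import annotations

import itertools
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class EventEnvelope:
    # Hypothetical envelope fields; the WS v1 schema defines the real ones.
    type: str
    seq: int
    session_id: str
    ts_ms: int
    data: dict[str, Any] = field(default_factory=dict)


class EventSequencer:
    """Stamps every outgoing event of one session with a strictly increasing seq."""

    def __init__(self, session_id: str) -> None:
        self._session_id = session_id
        # One counter per session; in this sketch it is only advanced from the
        # session's own event loop, so no extra locking is used.
        self._counter = itertools.count(1)

    def wrap(self, event_type: str, data: dict[str, Any] | None = None) -> EventEnvelope:
        return EventEnvelope(
            type=event_type,
            seq=next(self._counter),
            session_id=self._session_id,
            ts_ms=int(time.time() * 1000),
            data=dict(data or {}),
        )
```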
3) Processing Layer
- Modules: `processors/vad.py`, `processors/eou.py`, `processors/tracks.py`
- Responsibilities:
  - Speech activity detection
  - End-of-utterance decisioning
  - Track-oriented routing and timing-sensitive pre/post processing
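To make the VAD/EOU split concrete, here is a deliberately simple stand-in: an energy-threshold VAD with a silence hangover, which declares end of utterance after N consecutive quiet frames. It is not the algorithm in `processors/vad.py` or `processors/eou.py`. The threshold, the frame count, and the little-endian signed 16-bit mono PCM format are all assumptions.

```python
import array
import math
import sys
from typing import Optional


def frame_rms(pcm: bytes) -> float:
    """RMS level of one frame of little-endian signed 16-bit mono PCM (assumed format)."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; the wire format is assumed little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class EnergyEouDetector:
    """Energy-threshold VAD with a silence hangover; a stand-in, not the project's processors."""

    def __init__(self, threshold: float = 500.0, silence_frames_for_eou: int = 25) -> None:
        self.threshold = threshold                            # RMS level treated as speech (assumed)
        self.silence_frames_for_eou = silence_frames_for_eou  # quiet frames that end a turn (assumed)
        self.in_speech = False
        self.silent_run = 0

    def push(self, pcm: bytes) -> Optional[str]:
        """Feed one frame; return "speech_start", "end_of_utterance", or None."""
        if frame_rms(pcm) >= self.threshold:
            self.silent_run = 0
            if not self.in_speech:
                self.in_speech = True
                return "speech_start"
            return None
        if self.in_speech:
            self.silent_run += 1
            if self.silent_run >= self.silence_frames_for_eou:
                self.in_speech = False
                self.silent_run = 0
                return "end_of_utterance"
        return None
```

With 20 ms frames, `silence_frames_for_eou=25` corresponds to 500 ms of trailing silence. The hangover length trades latency against premature cut-offs: a shorter hangover ends turns sooner but risks splitting an utterance at a mid-sentence pause.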
4) Workflow + Tooling Layer
- Modules: `core/workflow_runner.py`, `core/tool_executor.py`
- Responsibilities:
  - Assistant workflow execution
  - Tool call planning/execution and timeout handling
  - Tool result normalization into protocol events
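Here is a sketch of server-side tool execution with a timeout, using `asyncio.wait_for`. The registry, the result dict shape (`call_id`, `name`, `status`, `result`/`error`), and the default timeout are hypothetical. The sketch shows that every outcome (success, unknown tool, exception, or timeout) is normalized into one shape, which can then be wrapped in a protocol event.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ToolFn = Callable[..., Awaitable[Any]]


async def run_tool(
    call_id: str,
    name: str,
    args: Dict[str, Any],
    registry: Dict[str, ToolFn],
    timeout_s: float = 5.0,
) -> Dict[str, Any]:
    """Run one tool call and normalize every outcome into a single result shape."""
    base = {"call_id": call_id, "name": name}
    tool = registry.get(name)
    if tool is None:
        return {**base, "status": "error", "error": f"unknown tool {name!r}"}
    try:
        # wait_for cancels the tool coroutine if it exceeds the timeout.
        result = await asyncio.wait_for(tool(**args), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {**base, "status": "timeout", "error": f"timed out after {timeout_s}s"}
    except Exception as exc:  # tool bugs become error results, not session crashes
        return {**base, "status": "error", "error": str(exc)}
    return {**base, "status": "ok", "result": result}
```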
5) Service Integration Layer
- Modules: `services/*`
- Responsibilities:
  - Abstracting ASR/LLM/TTS provider differences
  - Streaming token/audio adaptation
  - Provider-specific adapters (OpenAI-compatible, DashScope, SiliconFlow, etc.)
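The provider abstraction can be sketched with `typing.Protocol`: the pipeline depends only on a streaming interface, and each provider adapter implements it. The interfaces below (`StreamingLLM.stream`, `StreamingTTS.synthesize`) and the naive sentence splitter are illustrative assumptions, not the actual `services/*` signatures. They show one form of streaming adaptation: buffering LLM text deltas into sentence-sized pieces so that TTS can start before the full reply exists.

```python
import asyncio
from typing import AsyncIterator, List, Protocol


class StreamingLLM(Protocol):
    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class StreamingTTS(Protocol):
    def synthesize(self, text: str) -> AsyncIterator[bytes]: ...


SENTENCE_END = (".", "!", "?")


async def sentences(deltas: AsyncIterator[str]) -> AsyncIterator[str]:
    """Buffer text deltas and yield one sentence at a time (naive boundary rule)."""
    buf = ""
    async for delta in deltas:
        buf += delta
        if buf.rstrip().endswith(SENTENCE_END):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush a trailing fragment with no terminator


async def speak(llm: StreamingLLM, tts: StreamingTTS, prompt: str) -> AsyncIterator[bytes]:
    """Start synthesizing each sentence as soon as the LLM has produced it."""
    async for sentence in sentences(llm.stream(prompt)):
        async for chunk in tts.synthesize(sentence):
            yield chunk


class FakeLLM:
    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for piece in ["Hello", " there.", " How", " are", " you?"]:
            yield piece


class FakeTTS:
    async def synthesize(self, text: str) -> AsyncIterator[bytes]:
        yield text.encode()  # stand-in for real audio bytes


async def main() -> List[bytes]:
    return [chunk async for chunk in speak(FakeLLM(), FakeTTS(), "hi")]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The splitter flushes on any trailing `.`, `!`, or `?`, so it would also split on an abbreviation or decimal that happens to end a delta. A real adapter would need a more careful boundary rule.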
6) Backend Integration Layer (Optional)
- Port: `core/ports/control_plane.py`
- Adapters: `app/backend_adapters.py`
- Responsibilities:
  - Fetching assistant runtime config
  - Persisting call/session metadata and history
  - Supporting `BACKEND_MODE=auto|http|disabled`
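The following sketch shows one way the port/adapter split and the three `BACKEND_MODE` values might fit together. Several parts are assumptions for illustration: the method names on the port, the adapter classes, and the `auto` rule (use HTTP when a backend URL is configured, otherwise run disabled). `core/ports/control_plane.py` and `app/backend_adapters.py` define the real interfaces. The HTTP adapter is left as a stub because this document does not describe the backend API.

```python
from __future__ import annotations

from typing import Any, Optional, Protocol


class ControlPlanePort(Protocol):
    """What the session depends on; method names here are hypothetical."""

    async def fetch_assistant_config(self, assistant_id: str) -> Optional[dict[str, Any]]: ...

    async def save_history(self, session_id: str, items: list[dict[str, Any]]) -> None: ...


class DisabledControlPlane:
    """Engine-only mode: no backend calls, config comes from local YAML."""

    async def fetch_assistant_config(self, assistant_id: str) -> Optional[dict[str, Any]]:
        return None

    async def save_history(self, session_id: str, items: list[dict[str, Any]]) -> None:
        return None  # history is silently discarded


class HttpControlPlane:
    """HTTP adapter stub; the real endpoints are not described in this document."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    async def fetch_assistant_config(self, assistant_id: str) -> Optional[dict[str, Any]]:
        raise NotImplementedError("call the backend API here")

    async def save_history(self, session_id: str, items: list[dict[str, Any]]) -> None:
        raise NotImplementedError("call the backend API here")


def select_control_plane(mode: str, backend_url: Optional[str]) -> ControlPlanePort:
    """Map BACKEND_MODE to an adapter; the 'auto' rule is an assumption."""
    mode = mode.strip().lower()
    if mode == "disabled":
        return DisabledControlPlane()
    if mode == "http":
        if not backend_url:
            raise ValueError("BACKEND_MODE=http requires a backend URL")
        return HttpControlPlane(backend_url)
    if mode == "auto":
        return HttpControlPlane(backend_url) if backend_url else DisabledControlPlane()
    raise ValueError(f"unknown BACKEND_MODE {mode!r}")
```

Because the session sees only `ControlPlanePort`, swapping adapters leaves `core/session.py` untouched. This is the dependency-inversion principle listed under Key Design Principles.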
7) Persistence / Reliability Layer
- Module: `core/history_bridge.py`
- Responsibilities:
  - Non-blocking queue-based history writes
  - Retry with backoff on backend failures
  - Best-effort drain on session finalize
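Here is a minimal sketch of a queue-based history writer with retry, exponential backoff, and a bounded drain, built on `asyncio.Queue`. The writer callback, queue size, retry count, and delays are hypothetical parameters. Two fail-soft choices are visible. First, `enqueue` never awaits, so a full queue drops the item rather than stalling audio. Second, `drain` gives up after a timeout instead of holding the session open. The sketch assumes the bridge is created inside a running event loop.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

Writer = Callable[[Dict[str, Any]], Awaitable[None]]


class HistoryBridge:
    """Queue-based, fail-soft history writer (sketch; parameters are hypothetical)."""

    def __init__(self, write: Writer, max_retries: int = 3,
                 base_delay_s: float = 0.1, maxsize: int = 1000) -> None:
        self._write = write
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._queue: "asyncio.Queue[Dict[str, Any]]" = asyncio.Queue(maxsize=maxsize)
        self._worker: Optional["asyncio.Task[None]"] = None
        self.dropped = 0  # items lost to a full queue or exhausted retries

    def start(self) -> None:
        self._worker = asyncio.create_task(self._run())

    def enqueue(self, item: Dict[str, Any]) -> None:
        """Never awaits, so the realtime path is not blocked."""
        try:
            self._queue.put_nowait(item)
        except asyncio.QueueFull:
            self.dropped += 1

    async def _write_with_retry(self, item: Dict[str, Any]) -> None:
        for attempt in range(self._max_retries + 1):
            try:
                await self._write(item)
                return
            except Exception:
                if attempt == self._max_retries:
                    self.dropped += 1
                    return
                # Exponential backoff: base, 2*base, 4*base, ...
                await asyncio.sleep(self._base_delay_s * 2 ** attempt)

    async def _run(self) -> None:
        while True:
            item = await self._queue.get()
            try:
                await self._write_with_retry(item)
            finally:
                self._queue.task_done()

    async def drain(self, timeout_s: float = 2.0) -> bool:
        """Best-effort flush on session finalize; returns False if the timeout hit."""
        try:
            await asyncio.wait_for(self._queue.join(), timeout=timeout_s)
            drained = True
        except asyncio.TimeoutError:
            drained = False
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass
        return drained
```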
Key Design Principles
- Dependency inversion for the backend: the session and pipeline depend on port interfaces, not on concrete clients.
- Streaming-first: text/audio are emitted incrementally to minimize perceived latency.
- Fail-soft behavior: backend/history failures should not block realtime interaction paths.
- Protocol strictness: WS v1 rejects malformed/out-of-order control traffic early.
- Explicit event model: all client-observable state changes are represented as typed events.
Configuration Boundaries
- Runtime environment settings live in `app/config.py`.
- Assistant-specific behavior is loaded by `assistant_id`:
  - backend mode: from the backend API
  - engine-only mode: from local `engine/config/agents/<assistant_id>.yaml`
- Client-provided `metadata.overrides` and `dynamicVariables` can alter runtime behavior within protocol constraints.
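Config resolution can be sketched in two steps. First, pick the base config from the backend (backend mode) or from the local YAML path (engine-only mode). Then deep-merge client overrides, restricted to an allowlist. Several parts are assumptions standing in for the real protocol constraints: the loader callbacks, the allowlist, and the `assistant_id` character check. `dynamicVariables` handling is omitted. YAML parsing is injected so the sketch needs only the standard library.

```python
from __future__ import annotations

import copy
from pathlib import Path
from typing import Any, Callable, Dict, Optional

Config = Dict[str, Any]
AGENTS_DIR = Path("engine/config/agents")


def deep_merge(base: Config, overrides: Config) -> Config:
    """Return a new dict with overrides applied recursively; inputs are not mutated."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_assistant_config(
    assistant_id: str,
    fetch_backend: Optional[Callable[[str], Config]],
    load_yaml: Callable[[Path], Config],
    overrides: Optional[Config] = None,
    allowed_override_keys: frozenset = frozenset({"llm", "tts", "asr"}),  # hypothetical allowlist
) -> Config:
    # Hypothetical guard: keep assistant_id from escaping the agents directory.
    if not assistant_id.replace("-", "").replace("_", "").isalnum():
        raise ValueError(f"invalid assistant_id {assistant_id!r}")
    if fetch_backend is not None:  # backend mode
        base = fetch_backend(assistant_id)
    else:  # engine-only mode
        base = load_yaml(AGENTS_DIR / f"{assistant_id}.yaml")
    # Only allowlisted top-level keys may be overridden by the client (assumption).
    permitted = {k: v for k, v in (overrides or {}).items() if k in allowed_override_keys}
    return deep_merge(base, permitted)
```

With PyYAML installed, `load_yaml` could be `lambda p: yaml.safe_load(p.read_text())`. Injecting the loader also keeps the resolver testable without touching the filesystem.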
Related Docs
- WS protocol: `engine/docs/ws_v1_schema.md`
- Backend integration details: `engine/docs/backend_integration.md`
- Duplex interaction diagram: `engine/docs/duplex_interaction.svg`