
Xin Wang 4e2450e800 Refactor backend integration and service architecture

- Removed the backend client compatibility wrapper and associated methods to streamline backend integration.
- Updated session management to utilize control plane gateways and runtime configuration providers.
- Adjusted TTS service implementations to remove the EdgeTTS service and simplify service dependencies.
- Enhanced documentation to reflect changes in backend integration and service architecture.
- Updated configuration files to remove deprecated TTS provider options and clarify available settings.

2026-03-06 09:00:43 +08:00


Engine High-Level Architecture

This document describes the runtime architecture of the engine for realtime voice/text assistant interactions.

Goals

- Low-latency duplex interaction (the user can keep speaking while the assistant responds)
- Clear separation between transport, orchestration, and model/service integrations
- Backend-optional runtime (works with or without an external backend)
- Protocol-first interoperability through strict WS v1 control messages

Top-Level Components

flowchart LR
  C[Client\nWeb / Mobile / Device] <-- WS v1 + PCM --> A[FastAPI App\napp/main.py]
  A --> S[Session\ncore/session.py]
  S --> D[Duplex Pipeline\ncore/duplex_pipeline.py]

  D --> P[Processors\nVAD / EOU / Tracks]
  D --> R[Workflow Runner\ncore/workflow_runner.py]
  D --> E[Event Bus + Models\ncore/events.py + models/*]

  R --> SV[Service Layer\nservices/asr.py\nservices/llm.py\nservices/tts.py]
  R --> TE[Tool Executor\ncore/tool_executor.py]

  S --> HB[History Bridge\ncore/history_bridge.py]
  S --> BA[Control Plane Port\ncore/ports/control_plane.py]
  BA --> AD[Adapters\napp/backend_adapters.py]

  AD --> B[(External Backend API\noptional)]
  SV --> M[(ASR/LLM/TTS Providers)]

Request Lifecycle (Simplified)

1. The client connects to /ws?assistant_id=<id> and sends session.start.
2. The app creates a Session with the resolved assistant config (from the backend or local YAML).
3. Binary PCM frames enter the duplex pipeline.
4. VAD/EOU processors detect speech segments and trigger ASR finalization.
5. ASR text is routed into the workflow and LLM generation.
6. Optional tool calls are executed, either server-side or on the client, which returns the result.
7. LLM output streams as text deltas; TTS produces audio chunks for playback.
8. The session emits structured events (transcript.*, assistant.*, output.audio.*, error).
9. The history bridge persists conversation data asynchronously.
10. On session.stop (or disconnect), the session finalizes and drains pending writes.
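As a rough illustration of the first lifecycle step, the sketch below builds the connection URL and a session.start message. The /ws?assistant_id=<id> path and the session.start type come from this document; the JSON envelope with a single "type" field and the helper names are assumptions, and the authoritative message shape is defined in the WS v1 schema.

```python
import json
from urllib.parse import urlencode


def build_ws_url(base: str, assistant_id: str) -> str:
    """Return the WebSocket URL for an assistant (hypothetical helper)."""
    return f"{base}/ws?{urlencode({'assistant_id': assistant_id})}"


def session_start_message() -> str:
    """Serialize a minimal session.start control message.

    Assumption: only the "type" field is shown here; the real WS v1
    message most likely carries additional fields.
    """
    return json.dumps({"type": "session.start"})
```

A client would open the socket at build_ws_url(...), send session_start_message() as a text frame, and then stream binary PCM frames.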

Layering and Responsibilities

1) Transport / API Layer

Entry point: app/main.py
Responsibilities:
- WebSocket lifecycle management
- WS v1 message validation and order guarantees
- Session creation and teardown
- Converting raw WS frames into internal events
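
The order guarantees above could be enforced with a small per-connection guard. This is a minimal sketch, not the engine's actual validator: the class names and the exact rules (session.start first and only once, nothing accepted after session.stop) are assumptions chosen for illustration.

```python
class ProtocolError(Exception):
    """Raised when a control message arrives in a state that forbids it."""


class ControlOrderGuard:
    """Tracks a tiny state machine: new -> started -> stopped (hypothetical)."""

    def __init__(self) -> None:
        self.state = "new"

    def accept(self, msg_type: str) -> None:
        """Validate one control message type, raising on out-of-order traffic."""
        if self.state == "stopped":
            raise ProtocolError(f"{msg_type} received after session.stop")
        if msg_type == "session.start":
            if self.state != "new":
                raise ProtocolError("duplicate session.start")
            self.state = "started"
        elif self.state == "new":
            # Any other message before session.start is rejected early.
            raise ProtocolError(f"{msg_type} received before session.start")
        elif msg_type == "session.stop":
            self.state = "stopped"
```

Rejecting at this layer keeps malformed sequences from ever reaching the session state machine.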

2) Session + Orchestration Layer

Core: core/session.py, core/duplex_pipeline.py, core/conversation.py
Responsibilities:
- Per-session state machine
- Turn boundaries and interruption/cancel handling
- Event sequencing (seq) and envelope consistency
- Bridging input/output tracks (audio_in, audio_out, control)
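
One way to picture event sequencing and envelope consistency is a per-session factory that stamps every outgoing event with a monotonically increasing seq. The field names other than seq, the class names, and the concrete event type "transcript.final" are assumptions for illustration only.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical envelope shared by all events of one session."""
    seq: int
    type: str
    session_id: str
    ts_ms: int
    data: Dict[str, Any] = field(default_factory=dict)


class EventSequencer:
    """Creates events whose seq starts at 1 and increases by 1 per event."""

    def __init__(self, session_id: str) -> None:
        self._session_id = session_id
        self._counter = itertools.count(1)

    def make(self, type: str, data: Optional[Dict[str, Any]] = None) -> Event:
        return Event(
            seq=next(self._counter),
            type=type,
            session_id=self._session_id,
            ts_ms=int(time.time() * 1000),
            data=data or {},
        )
```

Because a single sequencer owns the counter, clients can detect gaps or reordering by checking seq alone.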

3) Processing Layer

Modules: processors/vad.py, processors/eou.py, processors/tracks.py
Responsibilities:
- Speech activity detection
- End-of-utterance decisioning
- Track-oriented routing and timing-sensitive pre/post processing
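
To make the VAD and end-of-utterance roles concrete, here is a deliberately simple energy-based sketch. It is not the engine's algorithm: it assumes 16-bit signed PCM in native byte order, a fixed RMS threshold, and an utterance that ends after a run of silent frames. All names and default values are hypothetical.

```python
import array
import math


def frame_rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit PCM frame (even byte length)."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SilenceEou:
    """Signals end of utterance after `silence_frames` quiet frames follow speech."""

    def __init__(self, rms_threshold: float = 500.0, silence_frames: int = 3) -> None:
        self.rms_threshold = rms_threshold
        self.silence_frames = silence_frames
        self._in_speech = False
        self._silent_run = 0

    def push(self, pcm: bytes) -> bool:
        """Feed one frame; return True if this frame closes an utterance."""
        if frame_rms(pcm) >= self.rms_threshold:
            self._in_speech = True
            self._silent_run = 0
            return False
        if not self._in_speech:
            return False  # leading silence, nothing to close
        self._silent_run += 1
        if self._silent_run >= self.silence_frames:
            self._in_speech = False
            self._silent_run = 0
            return True
        return False
```

In a pipeline like the one described here, a True result would be the point at which ASR finalization is triggered.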

4) Workflow + Tooling Layer

Modules: core/workflow_runner.py, core/tool_executor.py
Responsibilities:
- Assistant workflow execution
- Tool call planning/execution and timeout handling
- Tool result normalization into protocol events
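
Timeout handling and result normalization can be sketched with asyncio.wait_for. The function name, the result dictionary shape, and the status values are assumptions; the actual protocol events are defined by the WS v1 schema.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def run_tool(
    name: str,
    fn: Callable[..., Awaitable[Any]],
    args: Dict[str, Any],
    timeout_s: float = 5.0,
) -> Dict[str, Any]:
    """Run one async tool and normalize the outcome (hypothetical shape)."""
    try:
        result = await asyncio.wait_for(fn(**args), timeout=timeout_s)
        return {"tool": name, "status": "ok", "result": result}
    except asyncio.TimeoutError:
        return {"tool": name, "status": "timeout", "error": f"no result within {timeout_s}s"}
    except Exception as exc:
        # Tool failures become data, so the turn can continue.
        return {"tool": name, "status": "error", "error": str(exc)}
```

Returning a uniform dictionary for success, timeout, and failure means the caller can convert every outcome into an event without special cases.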

5) Service Integration Layer

Modules: services/*
Responsibilities:
- Abstracting ASR/LLM/TTS provider differences
- Streaming token/audio adaptation
- Provider-specific adapters (OpenAI-compatible, DashScope, SiliconFlow, etc.)
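
The provider abstraction might look like a structural interface that every adapter satisfies. The sketch below assumes an LLM interface with a single streaming method; LLMService, EchoLLM, and collect are hypothetical names, and EchoLLM is a stand-in rather than a real provider.

```python
import asyncio
from typing import AsyncIterator, List, Protocol


class LLMService(Protocol):
    """Assumed provider-neutral interface: yield text deltas for a prompt."""

    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class EchoLLM:
    """Stand-in provider that streams the prompt back word by word."""

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for word in prompt.split():
            yield word + " "


async def collect(service: LLMService, prompt: str) -> str:
    """Consume a stream of deltas; callers depend only on the interface."""
    parts: List[str] = []
    async for delta in service.stream(prompt):
        parts.append(delta)
    return "".join(parts)
```

A real adapter would translate each provider's streaming format into the same sequence of deltas, so the orchestration code never branches on the provider.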

6) Backend Integration Layer (Optional)

Port: core/ports/control_plane.py
Adapters: app/backend_adapters.py
Responsibilities:
- Fetching assistant runtime config
- Persisting call/session metadata and history
- Supporting BACKEND_MODE=auto|http|disabled
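
The port/adapter split and the BACKEND_MODE switch can be sketched as follows. The method name fetch_assistant_config, the adapter classes, and the meaning of auto (use HTTP when a backend URL is configured, otherwise run without a backend) are assumptions, not confirmed behavior.

```python
from typing import Optional, Protocol


class ControlPlanePort(Protocol):
    """Assumed port interface that the session layer depends on."""

    def fetch_assistant_config(self, assistant_id: str) -> Optional[dict]: ...


class NullControlPlane:
    """Adapter for engine-only operation: the backend has nothing to offer."""

    def fetch_assistant_config(self, assistant_id: str) -> Optional[dict]:
        return None


class HttpControlPlane:
    """Adapter placeholder; the HTTP request itself is omitted in this sketch."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def fetch_assistant_config(self, assistant_id: str) -> Optional[dict]:
        raise NotImplementedError("HTTP call omitted in this sketch")


def select_adapter(mode: str, backend_url: Optional[str]) -> ControlPlanePort:
    """Pick an adapter for BACKEND_MODE=auto|http|disabled (assumed semantics)."""
    if mode == "disabled":
        return NullControlPlane()
    if mode == "http":
        if not backend_url:
            raise ValueError("BACKEND_MODE=http requires a backend URL")
        return HttpControlPlane(backend_url)
    if mode == "auto":
        return HttpControlPlane(backend_url) if backend_url else NullControlPlane()
    raise ValueError(f"unknown BACKEND_MODE: {mode!r}")
```

Because callers only see ControlPlanePort, switching modes never touches session or pipeline code.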

7) Persistence / Reliability Layer

Module: core/history_bridge.py
Responsibilities:
- Non-blocking queue-based history writes
- Retry with backoff on backend failures
- Best-effort drain on session finalize
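
The three responsibilities above map naturally onto an asyncio.Queue with a background worker. This is a minimal sketch under assumptions: the class shape, retry counts, and delays are hypothetical, and a record is dropped once retries are exhausted.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class HistoryBridge:
    """Queue-based writer: enqueue never blocks, failures never propagate."""

    def __init__(
        self,
        write: Callable[[Any], Awaitable[None]],
        max_retries: int = 3,
        base_delay_s: float = 0.05,
        maxsize: int = 100,
    ) -> None:
        self._write = write
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Start the background worker; call from inside a running event loop."""
        self._task = asyncio.create_task(self._worker())

    def enqueue(self, record: Any) -> bool:
        """Non-blocking put; returns False (record dropped) when the queue is full."""
        try:
            self._queue.put_nowait(record)
            return True
        except asyncio.QueueFull:
            return False

    async def _worker(self) -> None:
        while True:
            record = await self._queue.get()
            try:
                await self._write_with_retry(record)
            finally:
                self._queue.task_done()

    async def _write_with_retry(self, record: Any) -> None:
        for attempt in range(self._max_retries + 1):
            try:
                await self._write(record)
                return
            except Exception:
                if attempt == self._max_retries:
                    return  # give up quietly: fail-soft
                # Exponential backoff: base, 2*base, 4*base, ...
                await asyncio.sleep(self._base_delay_s * 2 ** attempt)

    async def drain(self, timeout_s: float = 1.0) -> bool:
        """Best-effort drain: wait up to timeout_s, then stop the worker."""
        try:
            await asyncio.wait_for(self._queue.join(), timeout=timeout_s)
            drained = True
        except asyncio.TimeoutError:
            drained = False
        if self._task is not None:
            self._task.cancel()
        return drained
```

The bounded timeout in drain is what keeps session teardown from hanging when the backend is unreachable.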

Key Design Principles

- Dependency inversion for backend: session/pipeline depend on port interfaces, not concrete clients.
- Streaming-first: text/audio are emitted incrementally to minimize perceived latency.
- Fail-soft behavior: backend/history failures should not block realtime interaction paths.
- Protocol strictness: WS v1 rejects malformed/out-of-order control traffic early.
- Explicit event model: all client-observable state changes are represented as typed events.

Configuration Boundaries

Runtime environment settings live in app/config.py.
Assistant-specific behavior is loaded by assistant_id:
- backend mode: from backend API
- engine-only mode: local engine/config/agents/<assistant_id>.yaml
Client-provided metadata.overrides and dynamicVariables can alter runtime behavior within protocol constraints.
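
The configuration boundaries above can be illustrated with two small helpers. The local path pattern comes from this document; the allow-list approach to "protocol constraints", the key names in ALLOWED_OVERRIDES, and both function names are assumptions for illustration.

```python
from pathlib import PurePosixPath
from typing import Any, Dict

# Hypothetical allow-list: which assistant settings a client may override.
ALLOWED_OVERRIDES = {"voice", "language"}


def local_config_path(assistant_id: str) -> PurePosixPath:
    """Path of the engine-only YAML config for an assistant."""
    return PurePosixPath("engine/config/agents") / f"{assistant_id}.yaml"


def apply_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with only allow-listed override keys applied."""
    merged = dict(base)
    for key, value in overrides.items():
        if key in ALLOWED_OVERRIDES:
            merged[key] = value
    return merged
```

Copying the base config rather than mutating it keeps one client's overrides from leaking into other sessions that share the same assistant.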

Related Docs

- WS protocol: engine/docs/ws_v1_schema.md
- Backend integration details: engine/docs/backend_integration.md
- Duplex interaction diagram: engine/docs/duplex_interaction.svg
