Refactor backend integration and service architecture

- Removed the backend client compatibility wrapper and associated methods to streamline backend integration. - Updated session management to utilize control plane gateways and runtime configuration providers. - Adjusted TTS service implementations to remove the EdgeTTS service and simplify service dependencies. - Enhanced documentation to reflect changes in backend integration and service architecture. - Updated configuration files to remove deprecated TTS provider options and clarify available settings.
2026-03-06 09:00:43 +08:00
parent 6b589a1b7c
commit 4e2450e800
22 changed files with 632 additions and 452 deletions
--- a/engine/docs/backend_integration.md
+++ b/engine/docs/backend_integration.md
@@ -27,9 +27,8 @@ Assistant config source behavior:

 ## Architecture

- Ports: `core/ports/backend.py`
+- Ports: `core/ports/control_plane.py`
 - Adapters: `app/backend_adapters.py`
- Compatibility wrappers: `app/backend_client.py`

 `Session` and `DuplexPipeline` receive backend capabilities via injected adapter
 methods instead of hard-coding backend client imports.
--- a/engine/docs/extension_ports.md
+++ b/engine/docs/extension_ports.md
@@ -0,0 +1,47 @@
+# Engine Extension Ports (Draft)
+
+This document defines the draft port set used to keep core runtime extensible.
+
+## Port Modules
+
+- `core/ports/control_plane.py`
+  - `AssistantRuntimeConfigProvider`
+  - `ConversationHistoryStore`
+  - `KnowledgeRetriever`
+  - `ToolCatalog`
+  - `ControlPlaneGateway`
+- `core/ports/llm.py`
+  - `LLMServiceSpec`
+  - `LLMPort`
+  - optional extensions: `LLMCancellable`, `LLMRuntimeConfigurable`
+- `core/ports/tts.py`
+  - `TTSServiceSpec`
+  - `TTSPort`
+- `core/ports/asr.py`
+  - `ASRServiceSpec`
+  - `ASRPort`
+  - optional extensions: `ASRInterimControl`, `ASRBufferControl`
+- `core/ports/service_factory.py`
+  - `RealtimeServiceFactory`
+
+## Adapter Layer
+
+- `app/service_factory.py` provides `DefaultRealtimeServiceFactory`.
+- It maps resolved provider specs to concrete adapters.
+- Core orchestration (`core/duplex_pipeline.py`) depends on the factory port/specs, not concrete provider classes.
+
+## Provider Behavior (Current)
+
+- LLM:
+  - supported providers: `openai`, `openai_compatible`, `openai-compatible`, `siliconflow`
+  - fallback: `MockLLMService`
+- TTS:
+  - supported providers: `dashscope`, `openai_compatible`, `openai-compatible`, `siliconflow`
+  - fallback: `MockTTSService`
+- ASR:
+  - supported providers: `openai_compatible`, `openai-compatible`, `siliconflow`
+  - fallback: `BufferedASRService`
+
+## Notes
+
+- This is a draft contract set; follow-up work can add explicit capability negotiation and contract-version fields.
--- a/engine/docs/high_level_architecture.md
+++ b/engine/docs/high_level_architecture.md
@@ -0,0 +1,129 @@
+# Engine High-Level Architecture
+
+This document describes the runtime architecture of `engine` for realtime voice/text assistant interactions.
+
+## Goals
+
+- Low-latency duplex interaction (user speaks while assistant can respond)
+- Clear separation between transport, orchestration, and model/service integrations
+- Backend-optional runtime (works with or without external backend)
+- Protocol-first interoperability through strict WS v1 control messages
+
+## Top-Level Components
+
+```mermaid
+flowchart LR
+  C[Client\nWeb / Mobile / Device] <-- WS v1 + PCM --> A[FastAPI App\napp/main.py]
+  A --> S[Session\ncore/session.py]
+  S --> D[Duplex Pipeline\ncore/duplex_pipeline.py]
+
+  D --> P[Processors\nVAD / EOU / Tracks]
+  D --> R[Workflow Runner\ncore/workflow_runner.py]
+  D --> E[Event Bus + Models\ncore/events.py + models/*]
+
+  R --> SV[Service Layer\nservices/asr.py\nservices/llm.py\nservices/tts.py]
+  R --> TE[Tool Executor\ncore/tool_executor.py]
+
+  S --> HB[History Bridge\ncore/history_bridge.py]
+  S --> BA[Control Plane Port\ncore/ports/control_plane.py]
+  BA --> AD[Adapters\napp/backend_adapters.py]
+
+  AD --> B[(External Backend API\noptional)]
+  SV --> M[(ASR/LLM/TTS Providers)]
+```
+
+## Request Lifecycle (Simplified)
+
+1. Client connects to `/ws?assistant_id=<id>` and sends `session.start`.
+2. App creates a `Session` with resolved assistant config (backend or local YAML).
+3. Binary PCM frames enter the duplex pipeline.
+4. `VAD`/`EOU` processors detect speech segments and trigger ASR finalization.
+5. ASR text is routed into workflow + LLM generation.
+6. Optional tool calls are executed (server-side or client-side result return).
+7. LLM output streams as text deltas; TTS produces audio chunks for playback.
+8. Session emits structured events (`transcript.*`, `assistant.*`, `output.audio.*`, `error`).
+9. History bridge persists conversation data asynchronously.
+10. On `session.stop` (or disconnect), session finalizes and drains pending writes.
+
+## Layering and Responsibilities
+
+### 1) Transport / API Layer
+
+- Entry point: `app/main.py`
+- Responsibilities:
+  - WebSocket lifecycle management
+  - WS v1 message validation and order guarantees
+  - Session creation and teardown
+  - Converting raw WS frames into internal events
+
+### 2) Session + Orchestration Layer
+
+- Core: `core/session.py`, `core/duplex_pipeline.py`, `core/conversation.py`
+- Responsibilities:
+  - Per-session state machine
+  - Turn boundaries and interruption/cancel handling
+  - Event sequencing (`seq`) and envelope consistency
+  - Bridging input/output tracks (`audio_in`, `audio_out`, `control`)
+
+### 3) Processing Layer
+
+- Modules: `processors/vad.py`, `processors/eou.py`, `processors/tracks.py`
+- Responsibilities:
+  - Speech activity detection
+  - End-of-utterance decisioning
+  - Track-oriented routing and timing-sensitive pre/post processing
+
+### 4) Workflow + Tooling Layer
+
+- Modules: `core/workflow_runner.py`, `core/tool_executor.py`
+- Responsibilities:
+  - Assistant workflow execution
+  - Tool call planning/execution and timeout handling
+  - Tool result normalization into protocol events
+
+### 5) Service Integration Layer
+
+- Modules: `services/*`
+- Responsibilities:
+  - Abstracting ASR/LLM/TTS provider differences
+  - Streaming token/audio adaptation
+  - Provider-specific adapters (OpenAI-compatible, DashScope, SiliconFlow, etc.)
+
+### 6) Backend Integration Layer (Optional)
+
+- Port: `core/ports/control_plane.py`
+- Adapters: `app/backend_adapters.py`
+- Responsibilities:
+  - Fetching assistant runtime config
+  - Persisting call/session metadata and history
+  - Supporting `BACKEND_MODE=auto|http|disabled`
+
+### 7) Persistence / Reliability Layer
+
+- Module: `core/history_bridge.py`
+- Responsibilities:
+  - Non-blocking queue-based history writes
+  - Retry with backoff on backend failures
+  - Best-effort drain on session finalize
+
+## Key Design Principles
+
+- Dependency inversion for backend: session/pipeline depend on port interfaces, not concrete clients.
+- Streaming-first: text/audio are emitted incrementally to minimize perceived latency.
+- Fail-soft behavior: backend/history failures should not block realtime interaction paths.
+- Protocol strictness: WS v1 rejects malformed/out-of-order control traffic early.
+- Explicit event model: all client-observable state changes are represented as typed events.
+
+## Configuration Boundaries
+
+- Runtime environment settings live in `app/config.py`.
+- Assistant-specific behavior is loaded by `assistant_id`:
+  - backend mode: from backend API
+  - engine-only mode: local `engine/config/agents/<assistant_id>.yaml`
+- Client-provided `metadata.overrides` and `dynamicVariables` can alter runtime behavior within protocol constraints.
+
+## Related Docs
+
+- WS protocol: `engine/docs/ws_v1_schema.md`
+- Backend integration details: `engine/docs/backend_integration.md`
+- Duplex interaction diagram: `engine/docs/duplex_interaction.svg`