# System Architecture

This document describes the system architecture of Realtime Agent Studio (RAS) in detail.

---

## Overall Architecture

RAS uses a microservice architecture with a separate frontend and backend. It consists of three core services: the web frontend, the API service, and the real-time interaction engine.

```mermaid
flowchart TB
    subgraph Client["Clients"]
        Browser[Web Browser]
        Mobile[Mobile App]
        ThirdParty[Third-Party Systems]
    end

    subgraph Frontend["Frontend Service"]
        WebApp[React Admin Console]
    end

    subgraph Backend["Backend Services"]
        API[API Service<br/>FastAPI]
        Engine[Real-Time Interaction Engine<br/>WebSocket]
    end

    subgraph Storage["Data Storage"]
        DB[(SQLite/PostgreSQL)]
        FileStore[File Storage]
    end

    subgraph External["External Services"]
        LLM[LLM Service]
        ASR[ASR Service]
        TTS[TTS Service]
    end

    Browser --> WebApp
    Mobile -->|WebSocket| Engine
    ThirdParty -->|REST API| API
    WebApp -->|REST API| API
    WebApp -->|WebSocket| Engine
    API <--> DB
    API <--> FileStore
    Engine <--> API
    Engine --> LLM
    Engine --> ASR
    Engine --> TTS
```

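---

## Core Components

### 1. Web Frontend (React)

The admin console, which provides a visual interface for configuration and monitoring.

| Module | Description |
|--------|-------------|
| Assistant management | Create, configure, and test assistants |
| Resource library | Manage LLM, ASR, and TTS models |
| Knowledge base | Upload and manage RAG documents |
| History | Query and replay session logs |
| Dashboard | Real-time statistics |
| Debug console | Real-time WebSocket testing |
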
### 2. API Service (FastAPI)

The RESTful API backend, which handles all management operations.

```mermaid
flowchart LR
    subgraph API["API Service"]
        Router[Routing Layer]
        Service[Business Logic Layer]
        Model[Data Model Layer]
    end

    Client[Client] --> Router
    Router --> Service
    Service --> Model
    Model --> DB[(Database)]
```

**Main responsibilities:**

- Assistant CRUD operations
- Model resource management
- Knowledge base management
- Session record storage
- Authentication and authorization

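The three layers in the diagram can be sketched with the standard library alone. This is a minimal illustration, not the RAS implementation: the `assistants` table, the `Assistant` fields, and the names `AssistantRepository`, `AssistantService`, and `handle_create_assistant` are all assumptions, and plain functions stand in for FastAPI routes and SQLAlchemy models.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass
class Assistant:
    """Data model layer: one row of the hypothetical `assistants` table."""
    id: int
    name: str


class AssistantRepository:
    """Data model layer: the only code that talks to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS assistants "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def insert(self, name: str) -> Assistant:
        cur = self.conn.execute("INSERT INTO assistants (name) VALUES (?)", (name,))
        self.conn.commit()
        return Assistant(id=cur.lastrowid, name=name)


class AssistantService:
    """Business logic layer: validation and rules, no HTTP or SQL details."""

    def __init__(self, repo: AssistantRepository) -> None:
        self.repo = repo

    def create(self, name: str) -> Assistant:
        name = name.strip()
        if not name:
            raise ValueError("assistant name must not be empty")
        return self.repo.insert(name)


def handle_create_assistant(service: AssistantService, payload: dict) -> tuple[int, dict]:
    """Routing layer: map a request payload to a service call and a status code."""
    try:
        assistant = service.create(payload.get("name", ""))
    except ValueError as exc:
        return 400, {"error": str(exc)}
    return 201, {"id": assistant.id, "name": assistant.name}
```

Keeping validation in the service layer means the same rule applies whether the call arrives over REST or from an internal caller such as the engine.
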
### 3. Real-Time Interaction Engine (Engine)

The core component, which handles real-time audio and video conversations.

```mermaid
flowchart TB
    subgraph Engine["Real-Time Interaction Engine"]
        WS[WebSocket Handler]
        SM[Session Manager]

        subgraph Pipeline["Pipeline Engine"]
            VAD[VAD]
            ASR[Speech Recognition]
            LLM[Large Language Model]
            TTS[Speech Synthesis]
        end

        subgraph Multimodal["Multimodal Engine"]
            RT[Realtime Model<br/>GPT-4o / Gemini]
        end
    end

    Client[Client] -->|Audio stream| WS
    WS --> SM
    SM --> Pipeline
    SM --> Multimodal
    Pipeline -->|Text/audio| WS
    Multimodal -->|Text/audio| WS
```

---

## Engine Architecture

### Pipeline Full-Duplex Engine

The traditional approach, which splits voice interaction into three independent stages: speech recognition (ASR), response generation (LLM), and speech synthesis (TTS).

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant ASR as Speech Recognition
    participant LLM as Large Language Model
    participant TTS as Speech Synthesis

    C->>E: Audio stream (PCM)
    E->>ASR: Speech to text
    ASR-->>E: Transcript
    E->>LLM: Generate reply
    LLM-->>E: Reply text (streaming)
    E->>TTS: Text to speech
    TTS-->>E: Audio stream
    E->>C: Play audio
```

**Characteristics:**

- Flexible choice of provider for each stage
- Each stage can be optimized independently
- Latency of roughly 500-1500 ms

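The three-stage turn above can be sketched with `asyncio`. The stage functions `transcribe`, `generate_reply`, and `synthesize` are hypothetical stubs standing in for real ASR, LLM, and TTS providers. The point of the sketch is the shape of the data flow: the LLM reply is streamed, and each sentence is synthesized as soon as it arrives rather than after the full reply.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator


async def transcribe(audio: bytes) -> str:
    """Stub ASR stage: pretends the audio bytes decode directly to text."""
    return audio.decode("utf-8")


async def generate_reply(text: str) -> AsyncIterator[str]:
    """Stub LLM stage: streams the reply one sentence at a time."""
    for sentence in (f"You said: {text}.", "Anything else?"):
        await asyncio.sleep(0)  # yield control, as a network call would
        yield sentence


async def synthesize(sentence: str) -> bytes:
    """Stub TTS stage: returns fake audio bytes for one sentence."""
    return sentence.encode("utf-8")


async def run_turn(audio: bytes) -> AsyncIterator[bytes]:
    """Run one ASR -> LLM -> TTS turn, yielding audio per sentence."""
    text = await transcribe(audio)
    async for sentence in generate_reply(text):
        yield await synthesize(sentence)


async def collect_turn(audio: bytes) -> list[bytes]:
    """Helper for demos and tests: gather every audio chunk of one turn."""
    return [chunk async for chunk in run_turn(audio)]
```

Because each stage sits behind its own function, a provider can be swapped without touching the others, which is the flexibility the list above describes.
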
### Native Multimodal Engine

Uses an end-to-end multimodal model (such as GPT-4o Realtime):

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant RT as Realtime Model

    C->>E: Audio stream
    E->>RT: Audio input
    RT-->>E: Audio output (streaming)
    E->>C: Play audio
```

**Characteristics:**

- Lower latency (< 300 ms)
- More natural voice interaction
- Depends on a specific model provider

---

## Data Flow

### WebSocket Session Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant API as API Service
    participant DB as Database

    C->>E: Connect ws://.../ws?assistant_id=xxx
    E->>API: Fetch assistant configuration
    API->>DB: Query assistant
    DB-->>API: Assistant data
    API-->>E: Configuration

    C->>E: session.start
    E-->>C: session.started
    E-->>C: config.resolved

    loop Conversation loop
        C->>E: Audio frame (binary)
        E-->>C: input.speech_started
        E-->>C: transcript.delta
        E-->>C: transcript.final
        E-->>C: assistant.response.delta
        E-->>C: output.audio.start
        E-->>C: Audio frame (binary)
        E-->>C: output.audio.end
    end

    C->>E: session.stop
    E->>API: Save session record
    API->>DB: Store
    E-->>C: session.stopped
```

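The control-message part of this flow can be modeled as a small state machine. The sketch below is an illustration only: the event names come from the diagram, but the JSON shape (`{"type": ...}`), the `config` field, the `error` event, and the `SessionProtocol` class are assumptions. The WebSocket protocol reference remains the authority on the actual message format.

```python
import json


class SessionProtocol:
    """Tracks session state and returns the control events to send back."""

    def __init__(self, assistant_config: dict) -> None:
        self.config = assistant_config  # fetched from the API service on connect
        self.state = "connected"

    def handle(self, raw: str) -> list:
        """Process one text message from the client."""
        kind = json.loads(raw).get("type")
        if kind == "session.start" and self.state == "connected":
            self.state = "active"
            return [
                {"type": "session.started"},
                {"type": "config.resolved", "config": self.config},
            ]
        if kind == "session.stop" and self.state == "active":
            self.state = "stopped"
            return [{"type": "session.stopped"}]
        # Hypothetical error event for out-of-order or unknown messages.
        return [{"type": "error", "message": f"unexpected {kind!r} in state {self.state!r}"}]
```

Separating the state machine from the socket handling keeps the ordering rules testable without opening a connection.
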
### Smart Interruption Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant TTS as TTS Service

    Note over E: Playing TTS audio
    E->>C: Audio frames...

    C->>E: User speaks (detected by VAD)
    E->>E: Trigger interruption
    E->>TTS: Stop synthesis
    E-->>C: output.audio.interrupted

    Note over E: Process the new user input
    E-->>C: input.speech_started
```

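One way to realize this interruption is task cancellation in `asyncio`: playback runs as a task and is cancelled as soon as VAD reports speech. The sketch below assumes a hypothetical `speech_detected` event set by the VAD stage and uses a plain list in place of the WebSocket connection. It is not taken from the RAS engine code.

```python
from __future__ import annotations

import asyncio


async def play_tts(frames: list[bytes], sent: list) -> None:
    """Stream synthesized frames to the client until finished or cancelled."""
    for frame in frames:
        sent.append(frame)
        await asyncio.sleep(0.01)  # pacing; cancellation lands at this await


async def run_with_barge_in(frames: list[bytes], speech_detected: asyncio.Event) -> list:
    """Play audio, but stop and notify the client if the user starts speaking."""
    sent: list = []
    playback = asyncio.create_task(play_tts(frames, sent))
    vad = asyncio.create_task(speech_detected.wait())
    done, _ = await asyncio.wait({playback, vad}, return_when=asyncio.FIRST_COMPLETED)
    if vad in done and not playback.done():
        playback.cancel()  # stop synthesis and playback
        try:
            await playback
        except asyncio.CancelledError:
            pass
        sent.append({"type": "output.audio.interrupted"})
        sent.append({"type": "input.speech_started"})
    else:
        vad.cancel()  # playback finished without interruption
    return sent


async def demo() -> list:
    """Simulate a user speaking 30 ms into a one-second reply."""
    speech = asyncio.Event()
    frames = [bytes([i]) for i in range(100)]
    asyncio.get_running_loop().call_later(0.03, speech.set)
    return await run_with_barge_in(frames, speech)
```

Running `asyncio.run(demo())` returns only the first few frames followed by the two interruption events, since the remaining frames are never sent.
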
---

## Deployment Architecture

### Development Environment

```mermaid
flowchart LR
    subgraph Local["Local Development"]
        Web[npm run dev<br/>:3000]
        API[uvicorn<br/>:8080]
        Engine[python main.py<br/>:8000]
        DB[(SQLite)]
    end

    Web --> API
    Web --> Engine
    API --> DB
    Engine --> API
```

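## Technology Choices

| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend framework** | React 18 | Mature ecosystem, component-based development |
| **State management** | Zustand | Lightweight, TypeScript-friendly |
| **UI styling** | Tailwind CSS | Utility-first CSS, fast development |
| **Backend framework** | FastAPI | High performance, automatic API documentation |
| **WebSocket** | websockets | Asynchronous WebSocket library for Python |
| **ORM** | SQLAlchemy | Full-featured, supports multiple databases |
| **Database** | SQLite/PostgreSQL | Simple for development / reliable for production |
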
---

## Extensibility Design

### Model Adapter Pattern

```mermaid
classDiagram
    class ModelAdapter {
        <<interface>>
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    class OpenAIAdapter {
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    class AzureAdapter {
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    class LocalAdapter {
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    ModelAdapter <|-- OpenAIAdapter
    ModelAdapter <|-- AzureAdapter
    ModelAdapter <|-- LocalAdapter
```

The adapter pattern makes it straightforward to integrate new model providers: a new provider only needs to implement the `ModelAdapter` interface.

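In Python, the interface from the class diagram could look roughly like the following. `ModelAdapter`, `generate`, and `stream` mirror the diagram. `EchoAdapter`, the `ADAPTERS` registry, and `create_adapter` are hypothetical additions for illustration, with `EchoAdapter` standing in for a real provider so that the sketch runs without network access.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator


class ModelAdapter(ABC):
    """Common interface every model provider must implement."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...

    @abstractmethod
    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class EchoAdapter(ModelAdapter):
    """Hypothetical stand-in for a local model; it only echoes the prompt."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for word in self.generate(prompt).split():
            yield word  # one chunk per word, as a streaming API would


# Registry keyed by provider name; adding a provider is one new entry.
ADAPTERS: dict[str, type[ModelAdapter]] = {"echo": EchoAdapter}


def create_adapter(provider: str) -> ModelAdapter:
    """Look up and instantiate the adapter for a provider name."""
    try:
        return ADAPTERS[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


async def collect(adapter: ModelAdapter, prompt: str) -> list[str]:
    """Helper for demos and tests: gather all streamed chunks."""
    return [chunk async for chunk in adapter.stream(prompt)]
```

Callers depend only on `ModelAdapter`, so the engine code stays unchanged when a provider is added or replaced.
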
---

## Related Documents

- [WebSocket Protocol](../api-reference/websocket.md) - Detailed protocol specification
- [Deployment Overview](../deployment/index.md) - Docker deployment
- [Core Concepts](../concepts/index.md) - Explains assistants, pipelines, and related concepts