AI-VideoAssistant/docs/content/overview/architecture.md
Xin Wang 4c05131536 Update documentation and configuration for Realtime Agent Studio
- Revised mkdocs.yml to reflect the new site name and description, enhancing clarity for users.
- Added a changelog.md to document important changes and updates for the project.
- Introduced a roadmap.md to outline development plans and progress for future releases.
- Expanded index.md with a comprehensive overview of the platform, including core features and installation instructions.
- Enhanced concepts documentation with detailed explanations of assistants, engines, and their configurations.
- Updated configuration documentation to provide clear guidance on environment setup and service configurations.
- Added extra JavaScript for improved user experience in the documentation site.
2026-03-02 23:35:22 +08:00
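# System Architecture

This document describes the system architecture of Realtime Agent Studio (RAS) in detail.

---

## Overall Architecture

RAS uses a microservice architecture with a separate frontend and backend. It is built around three core services: the web frontend, the API service, and the realtime interaction engine.
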
```mermaid
flowchart TB
subgraph Client["Clients"]
Browser[Web Browser]
Mobile[Mobile App]
ThirdParty[Third-Party System]
end
subgraph Frontend["Frontend Service"]
WebApp[React Management Console]
end
subgraph Backend["Backend Services"]
API[API Service<br/>FastAPI]
Engine[Realtime Interaction Engine<br/>WebSocket]
end
subgraph Storage["Data Storage"]
DB[(SQLite/PostgreSQL)]
FileStore[File Storage]
end
subgraph External["External Services"]
LLM[LLM Service]
ASR[ASR Service]
TTS[TTS Service]
end
Browser --> WebApp
Mobile -->|WebSocket| Engine
ThirdParty -->|REST API| API
WebApp -->|REST API| API
WebApp -->|WebSocket| Engine
API <--> DB
API <--> FileStore
Engine <--> API
Engine --> LLM
Engine --> ASR
Engine --> TTS
```
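
---

## Core Components

### 1. Web Frontend (React)

The management console, which provides a visual interface for configuration and monitoring.

| Module | Description |
|---------|------|
| Assistant management | Create, configure, and test assistants |
| Resource library | Manage LLM/ASR/TTS models |
| Knowledge base | Upload and manage RAG documents |
| History | Query and replay session logs |
| Dashboard | Real-time statistics |
| Debug console | Real-time WebSocket testing |

### 2. API Service (FastAPI)

A RESTful API backend that handles all management operations.
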
```mermaid
flowchart LR
subgraph API["API Service"]
Router[Routing Layer]
Service[Business Logic Layer]
Model[Data Model Layer]
end
Client[Client] --> Router
Router --> Service
Service --> Model
Model --> DB[(Database)]
```
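
**Main responsibilities:**

- Assistant CRUD operations
- Model resource management
- Knowledge base management
- Session record storage
- Authentication and authorization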
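
The routing, business logic, and data model layers can be illustrated with a small dependency-free sketch. It deliberately avoids FastAPI and SQLAlchemy so that it runs with the standard library alone; the `/assistants` paths, the table schema, and every class and function name here are assumptions made for illustration, not the project's actual code.

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Assistant:
    id: int
    name: str


class AssistantModel:
    """Data model layer: the only code that talks to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS assistants "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def insert(self, name: str) -> Assistant:
        cur = self.conn.execute("INSERT INTO assistants (name) VALUES (?)", (name,))
        self.conn.commit()
        return Assistant(id=cur.lastrowid, name=name)

    def get(self, assistant_id: int) -> Optional[Assistant]:
        row = self.conn.execute(
            "SELECT id, name FROM assistants WHERE id = ?", (assistant_id,)
        ).fetchone()
        return Assistant(*row) if row else None


class AssistantService:
    """Business logic layer: validation and rules, no SQL and no HTTP."""

    def __init__(self, model: AssistantModel) -> None:
        self.model = model

    def create(self, name: str) -> Assistant:
        name = name.strip()
        if not name:
            raise ValueError("assistant name must not be empty")
        return self.model.insert(name)

    def get(self, assistant_id: int) -> Optional[Assistant]:
        return self.model.get(assistant_id)


def handle_request(service: AssistantService, method: str, path: str, body=None):
    """Routing layer: map (method, path) to a service call and a status code."""
    if method == "POST" and path == "/assistants":
        try:
            created = service.create((body or {}).get("name", ""))
        except ValueError as exc:
            return 400, {"error": str(exc)}
        return 201, asdict(created)
    if method == "GET" and path.startswith("/assistants/"):
        tail = path.rsplit("/", 1)[-1]
        found = service.get(int(tail)) if tail.isdigit() else None
        if found is not None:
            return 200, asdict(found)
        return 404, {"error": "assistant not found"}
    return 404, {"error": "unknown route"}
```

Each layer only calls the one directly beneath it, so the storage backend or the HTTP framework can be swapped without touching the validation rules.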

### 3. Realtime Interaction Engine (Engine)

The core component, which handles real-time audio and video conversations.

```mermaid
flowchart TB
subgraph Engine["Realtime Interaction Engine"]
WS[WebSocket Handler]
SM[Session Manager]
subgraph Pipeline["Pipeline Engine"]
VAD[Voice Activity Detection]
ASR[Speech Recognition]
LLM[Large Language Model]
TTS[Speech Synthesis]
end
subgraph Multimodal["Multimodal Engine"]
RT[Realtime Model<br/>GPT-4o / Gemini]
end
end
Client[Client] -->|Audio stream| WS
WS --> SM
SM --> Pipeline
SM --> Multimodal
Pipeline -->|Text/audio| WS
Multimodal -->|Text/audio| WS
```
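
As a rough illustration of how a session manager might hand each session to one of the two engines, the sketch below keeps a registry keyed by an engine type string. The `"pipeline"` and `"multimodal"` keys, the `handle_audio` method, and all class names are hypothetical; the real engine interface is not shown in this document.

```python
from typing import Dict, Protocol


class Engine(Protocol):
    """Assumed minimal interface shared by both engine kinds."""

    def handle_audio(self, frame: bytes) -> str: ...


class PipelineEngine:
    def handle_audio(self, frame: bytes) -> str:
        # A real implementation would run VAD -> ASR -> LLM -> TTS here.
        return f"pipeline handled {len(frame)} bytes"


class MultimodalEngine:
    def handle_audio(self, frame: bytes) -> str:
        # A real implementation would forward audio to a realtime model.
        return f"multimodal handled {len(frame)} bytes"


ENGINES = {"pipeline": PipelineEngine, "multimodal": MultimodalEngine}


class SessionManager:
    """Owns one engine instance per active session."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Engine] = {}

    def open(self, session_id: str, engine_type: str) -> None:
        if engine_type not in ENGINES:
            raise ValueError(f"unknown engine type: {engine_type}")
        self._sessions[session_id] = ENGINES[engine_type]()

    def route(self, session_id: str, frame: bytes) -> str:
        # Audio arriving from the WebSocket handler goes to the session's engine.
        return self._sessions[session_id].handle_audio(frame)

    def close(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

Keeping the engine choice behind one lookup means the WebSocket handler does not need to know which engine a session uses.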
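
---

## Engine Architecture

### Pipeline Full-Duplex Engine

The traditional approach, which splits voice interaction into three independent stages: speech recognition (ASR), reply generation (LLM), and speech synthesis (TTS).
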
```mermaid
sequenceDiagram
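participant C as Client
participant E as Engine
participant ASR as Speech Recognition
participant LLM as Large Language Model
participant TTS as Speech Synthesis
C->>E: Audio stream (PCM)
E->>ASR: Speech to text
ASR-->>E: Transcript
E->>LLM: Generate reply
LLM-->>E: Reply text (streamed)
E->>TTS: Text to speech
TTS-->>E: Audio stream
E->>C: Play audio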
```

**Characteristics:**

- Each stage can use a different provider
- Each stage can be optimized independently
- Latency is roughly 500-1500 ms
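
The three stages can be sketched with `asyncio` as follows. The ASR, LLM, and TTS functions are stubs invented for this example (no real provider is called), and flushing text to TTS at sentence boundaries is one common way to cut perceived latency, assumed here rather than taken from the RAS implementation.

```python
import asyncio
from typing import AsyncIterator, List


async def recognize(audio: bytes) -> str:
    """Stub ASR: pretend the audio decodes directly to text."""
    await asyncio.sleep(0)
    return audio.decode("utf-8")


async def generate(prompt: str) -> AsyncIterator[str]:
    """Stub LLM: stream a canned reply one word at a time."""
    for word in f"You said: {prompt}. Anything else?".split(" "):
        await asyncio.sleep(0)
        yield word + " "


async def synthesize(sentence: str) -> bytes:
    """Stub TTS: return the sentence bytes in place of audio."""
    await asyncio.sleep(0)
    return sentence.encode("utf-8")


async def run_turn(audio: bytes) -> List[bytes]:
    """Run one ASR -> LLM -> TTS turn, synthesizing sentence by sentence."""
    transcript = await recognize(audio)
    chunks: List[bytes] = []
    buffer = ""
    async for token in generate(transcript):
        buffer += token
        # Flush as soon as a sentence is complete instead of waiting
        # for the whole reply, so playback can start earlier.
        if buffer.rstrip().endswith((".", "?", "!")):
            chunks.append(await synthesize(buffer.strip()))
            buffer = ""
    if buffer.strip():
        chunks.append(await synthesize(buffer.strip()))
    return chunks
```

Calling `asyncio.run(run_turn(b"hello"))` yields one audio chunk per sentence of the stubbed reply.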

### Native Multimodal Engine

Uses an end-to-end multimodal model (such as GPT-4o Realtime):

```mermaid
sequenceDiagram
participant C as Client
participant E as Engine
participant RT as Realtime Model
C->>E: Audio stream
E->>RT: Audio input
RT-->>E: Audio output (streamed)
E->>C: Play audio
```

**Characteristics:**

- Lower latency (< 300 ms)
- More natural voice interaction
- Depends on a specific model provider

---

## Data Flow

### WebSocket Session Flow

```mermaid
sequenceDiagram
participant C as Client
participant E as Engine
participant API as API Service
participant DB as Database
C->>E: Connect ws://.../ws?assistant_id=xxx
E->>API: Fetch assistant config
API->>DB: Query assistant
DB-->>API: Assistant data
API-->>E: Configuration
C->>E: session.start
E-->>C: session.started
E-->>C: config.resolved
loop Conversation loop
C->>E: Audio frame (binary)
E-->>C: input.speech_started
E-->>C: transcript.delta
E-->>C: transcript.final
E-->>C: assistant.response.delta
E-->>C: output.audio.start
E-->>C: Audio frame (binary)
E-->>C: output.audio.end
end
C->>E: session.stop
E->>API: Save session record
API->>DB: Store
E-->>C: session.stopped
```
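
The session lifecycle in the diagram can be read as a small state machine. The sketch below tracks only the control events named in the diagram (`session.start`, `session.started`, `config.resolved`, `session.stop`, `session.stopped`) and rejects audio outside an active session. The class name, the state names, and the use of bare event-name strings are assumptions; the actual message payloads are defined in the WebSocket protocol reference.

```python
from enum import Enum
from typing import List


class State(Enum):
    CONNECTED = "connected"  # socket open, session not started yet
    ACTIVE = "active"        # between session.start and session.stop
    CLOSED = "closed"        # session.stop has been processed


class SessionProtocol:
    """Tracks one session and returns the events the engine should send."""

    def __init__(self) -> None:
        self.state = State.CONNECTED
        self.frames_received = 0

    def on_event(self, name: str) -> List[str]:
        if name == "session.start" and self.state is State.CONNECTED:
            self.state = State.ACTIVE
            return ["session.started", "config.resolved"]
        if name == "session.stop" and self.state is State.ACTIVE:
            self.state = State.CLOSED
            return ["session.stopped"]
        raise ValueError(f"unexpected {name!r} in state {self.state.value}")

    def on_audio(self, frame: bytes) -> None:
        # Binary frames are only valid inside the conversation loop.
        if self.state is not State.ACTIVE:
            raise ValueError("audio frame received outside an active session")
        self.frames_received += 1
```

Modelling the order explicitly makes out-of-sequence messages, such as audio sent before `session.start`, fail loudly instead of being silently dropped.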

### Smart Interruption Flow

```mermaid
sequenceDiagram
participant C as Client
participant E as Engine
participant TTS as TTS Service
Note over E: Playing TTS audio
E->>C: Audio frames...
C->>E: User speaks (detected by VAD)
E->>E: Trigger interruption
E->>TTS: Stop synthesis
E-->>C: output.audio.interrupted
Note over E: Process new user input
E-->>C: input.speech_started
```
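
One way to implement this kind of interruption in an `asyncio` engine is to run TTS playback as a task and cancel it when speech is detected. The sketch below simulates that: playback is a loop that "sends" frames, and the VAD trigger is replaced by a short sleep. The function names and timings are assumptions for the demo; only the event names `output.audio.interrupted`, `output.audio.end`, and `input.speech_started` come from the diagrams above.

```python
import asyncio
from typing import List


async def play_tts(frame_count: int, sent: List[str]) -> None:
    """Simulated playback: send one frame, then yield to the event loop."""
    for index in range(frame_count):
        sent.append(f"audio:{index}")
        await asyncio.sleep(0.01)
    sent.append("output.audio.end")  # only reached if never interrupted


async def barge_in_demo() -> List[str]:
    sent: List[str] = []
    playback = asyncio.create_task(play_tts(100, sent))

    # Stand-in for VAD reporting that the user started speaking mid-playback.
    await asyncio.sleep(0.025)

    # Stop synthesis and playback, then wait for the task to unwind.
    playback.cancel()
    try:
        await playback
    except asyncio.CancelledError:
        pass

    sent.append("output.audio.interrupted")
    sent.append("input.speech_started")
    return sent
```

Because cancellation is delivered at the next `await`, playback stops between frames, which keeps the interruption latency bounded by the duration of a single frame.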

---

## Deployment Architecture

### Development Environment

```mermaid
flowchart LR
subgraph Local["Local Development"]
Web[npm run dev<br/>:3000]
API[uvicorn<br/>:8080]
Engine[python main.py<br/>:8000]
DB[(SQLite)]
end
Web --> API
Web --> Engine
API --> DB
Engine --> API
```

### Production Environment

```mermaid
flowchart TB
subgraph Internet["Internet"]
User[User]
end
subgraph LoadBalancer["Load Balancer"]
Nginx[Nginx / Traefik]
end
subgraph Docker["Docker Cluster"]
Web1[Web Container]
Web2[Web Container]
API1[API Container]
API2[API Container]
Engine1[Engine Container]
Engine2[Engine Container]
end
subgraph Storage["Persistent Storage"]
PG[(PostgreSQL)]
Redis[(Redis)]
S3[Object Storage]
end
User --> Nginx
Nginx --> Web1
Nginx --> Web2
Nginx --> API1
Nginx --> API2
Nginx --> Engine1
Nginx --> Engine2
API1 --> PG
API2 --> PG
API1 --> Redis
Engine1 --> Redis
```
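
---

## Technology Choices

| Component | Technology | Rationale |
|------|------|---------|
| **Frontend framework** | React 18 | Mature ecosystem, component-based development |
| **State management** | Zustand | Lightweight, TypeScript-friendly |
| **UI components** | Tailwind CSS | Utility-first CSS, fast development |
| **Backend framework** | FastAPI | High performance, automatic API documentation |
| **WebSocket** | websockets | Asynchronous WebSocket library for Python |
| **ORM** | SQLAlchemy | Full-featured, supports multiple databases |
| **Database** | SQLite/PostgreSQL | Simple for development / reliable in production |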

---

## Extensibility Design

### Model Adapter Pattern

```mermaid
classDiagram
class ModelAdapter {
<<interface>>
+generate(prompt) string
+stream(prompt) AsyncIterator
}
class OpenAIAdapter {
+generate(prompt) string
+stream(prompt) AsyncIterator
}
class AzureAdapter {
+generate(prompt) string
+stream(prompt) AsyncIterator
}
class LocalAdapter {
+generate(prompt) string
+stream(prompt) AsyncIterator
}
ModelAdapter <|-- OpenAIAdapter
ModelAdapter <|-- AzureAdapter
ModelAdapter <|-- LocalAdapter
```

The adapter pattern makes it straightforward to integrate new model providers.
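
A minimal Python rendering of the interface in the class diagram might look like the sketch below. The method names `generate` and `stream` come from the diagram; making `generate` a coroutine, the `EchoAdapter` stand-in, and the `REGISTRY` lookup are assumptions added so the example runs without any provider SDK.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator, Dict, Type


class ModelAdapter(ABC):
    """Common interface every provider adapter implements."""

    @abstractmethod
    async def generate(self, prompt: str) -> str:
        """Return the complete reply for a prompt."""

    @abstractmethod
    def stream(self, prompt: str) -> AsyncIterator[str]:
        """Yield the reply incrementally."""


class EchoAdapter(ModelAdapter):
    """Hypothetical adapter used here in place of a real provider."""

    async def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for word in (await self.generate(prompt)).split():
            yield word


# Adding a provider means writing one subclass and registering it here.
REGISTRY: Dict[str, Type[ModelAdapter]] = {"echo": EchoAdapter}


def create_adapter(provider: str) -> ModelAdapter:
    if provider not in REGISTRY:
        raise ValueError(f"unknown provider: {provider}")
    return REGISTRY[provider]()
```

Callers depend only on `ModelAdapter`, so switching providers is a configuration change rather than a code change in the engine.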
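
---

## Related Documentation

- [WebSocket Protocol](../api-reference/websocket.md) - Detailed protocol specification
- [Deployment Guide](../deployment/index.md) - Production deployment
- [Core Concepts](../concepts/index.md) - Explanations of assistants, pipelines, and related concepts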