Update documentation and configuration for Realtime Agent Studio
- Revised mkdocs.yml to reflect the new site name and description, enhancing clarity for users.
- Added a changelog.md to document important changes and updates for the project.
- Introduced a roadmap.md to outline development plans and progress for future releases.
- Expanded index.md with a comprehensive overview of the platform, including core features and installation instructions.
- Enhanced concepts documentation with detailed explanations of assistants, engines, and their configurations.
- Updated configuration documentation to provide clear guidance on environment setup and service configurations.
- Added extra JavaScript for improved user experience in the documentation site.
356
docs/content/overview/architecture.md
Normal file
@@ -0,0 +1,356 @@
# System Architecture

This document describes the system architecture of Realtime Agent Studio (RAS) in detail.

---

## Overall Architecture

RAS uses a microservice architecture with separate frontend and backend, built around three core services: the web frontend, the API service, and the real-time interaction engine.

```mermaid
flowchart TB
    subgraph Client["Client"]
        Browser[Web Browser]
        Mobile[Mobile App]
        ThirdParty[Third-party System]
    end

    subgraph Frontend["Frontend Service"]
        WebApp[React Management Console]
    end

    subgraph Backend["Backend Services"]
        API[API Service<br/>FastAPI]
        Engine[Real-time Interaction Engine<br/>WebSocket]
    end

    subgraph Storage["Data Storage"]
        DB[(SQLite/PostgreSQL)]
        FileStore[File Storage]
    end

    subgraph External["External Services"]
        LLM[LLM Service]
        ASR[ASR Service]
        TTS[TTS Service]
    end

    Browser --> WebApp
    Mobile -->|WebSocket| Engine
    ThirdParty -->|REST API| API
    WebApp -->|REST API| API
    WebApp -->|WebSocket| Engine
    API <--> DB
    API <--> FileStore
    Engine <--> API
    Engine --> LLM
    Engine --> ASR
    Engine --> TTS
```

---

## Core Components

### 1. Web Frontend (React)

The management console, which provides a visual interface for configuration and monitoring.

| Module | Description |
|--------|-------------|
| Assistant management | Create, configure, and test assistants |
| Resource library | Manage LLM/ASR/TTS models |
| Knowledge base | Upload and manage RAG documents |
| History | Query and replay session logs |
| Dashboard | Real-time statistics |
| Debug console | Real-time WebSocket testing |

### 2. API Service (FastAPI)

The RESTful API backend, which handles all management operations.

```mermaid
flowchart LR
    subgraph API["API Service"]
        Router[Routing Layer]
        Service[Business Logic Layer]
        Model[Data Model Layer]
    end

    Client[Client] --> Router
    Router --> Service
    Service --> Model
    Model --> DB[(Database)]
```
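
The layering in the diagram above can be sketched in plain Python. This is a minimal illustration, not the RAS implementation: `Assistant`, `AssistantModel`, `AssistantService`, `handle_create_assistant`, and the `assistants` table are hypothetical names, and the standard-library `sqlite3` module stands in for FastAPI and SQLAlchemy so the example runs without dependencies.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Assistant:
    id: int
    name: str
    prompt: str


class AssistantModel:
    """Data model layer: the only code that talks to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS assistants ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, prompt TEXT NOT NULL)"
        )

    def insert(self, name: str, prompt: str) -> Assistant:
        cur = self.conn.execute(
            "INSERT INTO assistants (name, prompt) VALUES (?, ?)", (name, prompt)
        )
        self.conn.commit()
        return Assistant(cur.lastrowid, name, prompt)

    def get(self, assistant_id: int) -> Optional[Assistant]:
        row = self.conn.execute(
            "SELECT id, name, prompt FROM assistants WHERE id = ?", (assistant_id,)
        ).fetchone()
        return Assistant(*row) if row else None


class AssistantService:
    """Business logic layer: validation and rules, no SQL and no HTTP."""

    def __init__(self, model: AssistantModel) -> None:
        self.model = model

    def create(self, name: str, prompt: str) -> Assistant:
        if not name.strip():
            raise ValueError("assistant name must not be empty")
        return self.model.insert(name.strip(), prompt)


def handle_create_assistant(service: AssistantService, body: dict) -> Tuple[int, dict]:
    """Routing layer: maps a request body to a service call and a status code."""
    try:
        assistant = service.create(body.get("name", ""), body.get("prompt", ""))
    except ValueError as exc:
        return 400, {"error": str(exc)}
    return 201, {"id": assistant.id, "name": assistant.name}
```

Keeping SQL in the model layer and validation in the service layer means the routing function only translates between request data and status codes, which is the separation the diagram describes.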

**Main responsibilities:**

- Assistant CRUD operations
- Model resource management
- Knowledge base management
- Session record storage
- Authentication and authorization

### 3. Real-time Interaction Engine (Engine)

The core component, which handles real-time audio and video conversations.

```mermaid
flowchart TB
    subgraph Engine["Real-time Interaction Engine"]
        WS[WebSocket Handler]
        SM[Session Manager]

        subgraph Pipeline["Pipeline Engine"]
            VAD[Voice Activity Detection]
            ASR[Speech Recognition]
            LLM[Large Language Model]
            TTS[Speech Synthesis]
        end

        subgraph Multimodal["Multimodal Engine"]
            RT[Realtime Model<br/>GPT-4o / Gemini]
        end
    end

    Client[Client] -->|Audio stream| WS
    WS --> SM
    SM --> Pipeline
    SM --> Multimodal
    Pipeline -->|Text/Audio| WS
    Multimodal -->|Text/Audio| WS
```
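
One way to read the diagram above is that the session manager routes each session to one of the two engines. The sketch below illustrates that routing under stated assumptions: the `engine_type` field, the class names, and the `handle_audio` method are hypothetical and are not taken from the RAS codebase, and both engines are placeholders that only report what they received.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Engine(Protocol):
    def handle_audio(self, frame: bytes) -> str: ...


class PipelineEngine:
    """Placeholder for the VAD -> ASR -> LLM -> TTS chain."""

    def handle_audio(self, frame: bytes) -> str:
        return f"pipeline processed {len(frame)} bytes"


class MultimodalEngine:
    """Placeholder for a single end-to-end realtime model."""

    def handle_audio(self, frame: bytes) -> str:
        return f"multimodal processed {len(frame)} bytes"


@dataclass
class SessionManager:
    """Maps each session to the engine selected by its assistant configuration."""

    sessions: Dict[str, Engine] = field(default_factory=dict)

    def start(self, session_id: str, engine_type: str) -> None:
        # "engine_type" is a hypothetical configuration field.
        if engine_type == "pipeline":
            self.sessions[session_id] = PipelineEngine()
        elif engine_type == "multimodal":
            self.sessions[session_id] = MultimodalEngine()
        else:
            raise ValueError(f"unknown engine type: {engine_type}")

    def on_audio(self, session_id: str, frame: bytes) -> str:
        # The WebSocket handler would forward each binary frame here.
        return self.sessions[session_id].handle_audio(frame)

    def stop(self, session_id: str) -> None:
        self.sessions.pop(session_id, None)
```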

---

## Engine Architecture

### Pipeline Full-Duplex Engine

The traditional approach, which splits voice interaction into three independent stages (speech recognition, language model, and speech synthesis):

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant ASR as Speech Recognition
    participant LLM as Large Language Model
    participant TTS as Speech Synthesis

    C->>E: Audio stream (PCM)
    E->>ASR: Speech to text
    ASR-->>E: Transcript
    E->>LLM: Generate reply
    LLM-->>E: Reply text (streaming)
    E->>TTS: Text to speech
    TTS-->>E: Audio stream
    E->>C: Play audio
```
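
The sequence above can be sketched with `asyncio`. Everything here is a stand-in: `fake_asr`, `fake_llm`, and `fake_tts` are hypothetical placeholders for real vendor calls, and flushing text to TTS at sentence boundaries is a common latency technique assumed for illustration, not something the source confirms RAS does.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_asr(frames: List[bytes]) -> str:
    """Placeholder ASR: pretends every request transcribes to a fixed sentence."""
    await asyncio.sleep(0)
    return "what is the weather"


async def fake_llm(prompt: str) -> AsyncIterator[str]:
    """Placeholder LLM: streams the reply one token at a time."""
    for token in ["It", " is", " sunny", ".", " Enjoy", " it", "."]:
        await asyncio.sleep(0)
        yield token


async def fake_tts(text: str) -> bytes:
    """Placeholder TTS: returns the text as bytes instead of audio."""
    await asyncio.sleep(0)
    return text.encode("utf-8")


async def run_turn(frames: List[bytes]) -> List[bytes]:
    """One turn: ASR, then streamed LLM output, synthesized sentence by sentence."""
    transcript = await fake_asr(frames)
    audio_chunks: List[bytes] = []
    sentence = ""
    async for token in fake_llm(transcript):
        sentence += token
        # Send a sentence to TTS as soon as it is complete instead of
        # waiting for the whole reply.
        if token.endswith((".", "!", "?")):
            audio_chunks.append(await fake_tts(sentence.strip()))
            sentence = ""
    if sentence.strip():
        audio_chunks.append(await fake_tts(sentence.strip()))
    return audio_chunks
```

Because each stage is an independent coroutine, any one of them can be replaced with a different vendor without touching the others.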

**Characteristics:**

- Flexible choice of vendor for each stage
- Each stage can be optimized independently
- Latency of roughly 500-1500 ms

### Native Multimodal Engine

Uses an end-to-end multimodal model (such as GPT-4o Realtime):

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant RT as Realtime Model

    C->>E: Audio stream
    E->>RT: Audio input
    RT-->>E: Audio output (streaming)
    E->>C: Play audio
```

**Characteristics:**

- Lower latency (< 300 ms)
- More natural voice interaction
- Depends on a specific model vendor

---

## Data Flow

### WebSocket Session Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant API as API Service
    participant DB as Database

    C->>E: Connect ws://.../ws?assistant_id=xxx
    E->>API: Fetch assistant configuration
    API->>DB: Query assistant
    DB-->>API: Assistant data
    API-->>E: Configuration

    C->>E: session.start
    E-->>C: session.started
    E-->>C: config.resolved

    loop Conversation loop
        C->>E: Audio frame (binary)
        E-->>C: input.speech_started
        E-->>C: transcript.delta
        E-->>C: transcript.final
        E-->>C: assistant.response.delta
        E-->>C: output.audio.start
        E-->>C: Audio frame (binary)
        E-->>C: output.audio.end
    end

    C->>E: session.stop
    E->>API: Save session record
    API->>DB: Store
    E-->>C: session.stopped
```
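
A client can follow the flow above with a small state tracker. The event names come from the diagram, but the message shape is an assumption: this sketch supposes each text message is a JSON object with a `type` field and, where relevant, a `text` field. The `SessionTracker` class is hypothetical; the WebSocket protocol reference defines the real payloads.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SessionTracker:
    """Tracks client-side state from the server events shown in the diagram."""

    state: str = "connected"
    transcript: str = ""
    reply: str = ""
    audio: bytearray = field(default_factory=bytearray)

    def on_text(self, raw: str) -> None:
        event = json.loads(raw)  # assumed shape: {"type": ..., "text": ...}
        kind = event["type"]
        if kind == "session.started":
            self.state = "active"
        elif kind == "transcript.final":
            self.transcript = event["text"]
        elif kind == "assistant.response.delta":
            self.reply += event["text"]
        elif kind == "output.audio.start":
            self.state = "playing"
        elif kind == "output.audio.end":
            self.state = "active"
        elif kind == "session.stopped":
            self.state = "closed"
        # Other events (config.resolved, transcript.delta, ...) are ignored here.

    def on_binary(self, frame: bytes) -> None:
        # Buffer binary frames only between output.audio.start and output.audio.end.
        if self.state == "playing":
            self.audio.extend(frame)
```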

### Smart Interruption Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant E as Engine
    participant TTS as TTS Service

    Note over E: Playing TTS audio
    E->>C: Audio frames...

    C->>E: User speaks (detected by VAD)
    E->>E: Trigger interruption
    E->>TTS: Stop synthesis
    E-->>C: output.audio.interrupted

    Note over E: Process new user input
    E-->>C: input.speech_started
```
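
The interruption above maps naturally onto task cancellation in `asyncio`. The following is a minimal sketch under assumptions: `play_tts` and `speak_with_barge_in` are hypothetical names, an `asyncio.Event` stands in for the VAD signal, and the returned list only records which events the engine would send to the client.

```python
import asyncio
from typing import List


async def play_tts(frames: List[bytes], sent: List[bytes]) -> None:
    """Sends synthesized frames one by one; cancellation stops it mid-stream."""
    for frame in frames:
        sent.append(frame)
        await asyncio.sleep(0.01)  # stands in for real-time pacing


async def speak_with_barge_in(
    frames: List[bytes], speech_detected: asyncio.Event
) -> List[str]:
    """Plays TTS audio until it finishes or the VAD reports user speech."""
    events: List[str] = []
    sent: List[bytes] = []
    playback = asyncio.create_task(play_tts(frames, sent))
    vad = asyncio.create_task(speech_detected.wait())
    done, _ = await asyncio.wait(
        {playback, vad}, return_when=asyncio.FIRST_COMPLETED
    )
    if vad in done and not playback.done():
        playback.cancel()  # stop synthesis and playback immediately
        try:
            await playback
        except asyncio.CancelledError:
            pass
        events += ["output.audio.interrupted", "input.speech_started"]
    else:
        vad.cancel()  # playback finished without interruption
        events.append("output.audio.end")
    return events
```

Racing the playback task against the VAD signal keeps the interruption path short: the engine does not have to poll between frames, because cancellation takes effect at the next `await` inside `play_tts`.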

---

## Deployment Architecture

### Development Environment

```mermaid
flowchart LR
    subgraph Local["Local Development"]
        Web[npm run dev<br/>:3000]
        API[uvicorn<br/>:8080]
        Engine[python main.py<br/>:8000]
        DB[(SQLite)]
    end

    Web --> API
    Web --> Engine
    API --> DB
    Engine --> API
```

### Production Environment

```mermaid
flowchart TB
    subgraph Internet["Internet"]
        User[User]
    end

    subgraph LoadBalancer["Load Balancer"]
        Nginx[Nginx / Traefik]
    end

    subgraph Docker["Docker Cluster"]
        Web1[Web Container]
        Web2[Web Container]
        API1[API Container]
        API2[API Container]
        Engine1[Engine Container]
        Engine2[Engine Container]
    end

    subgraph Storage["Persistent Storage"]
        PG[(PostgreSQL)]
        Redis[(Redis)]
        S3[Object Storage]
    end

    User --> Nginx
    Nginx --> Web1
    Nginx --> Web2
    Nginx --> API1
    Nginx --> API2
    Nginx --> Engine1
    Nginx --> Engine2
    API1 --> PG
    API2 --> PG
    API1 --> Redis
    Engine1 --> Redis
```

---

## Technology Choices

| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend framework** | React 18 | Mature ecosystem, component-based development |
| **State management** | Zustand | Lightweight, TypeScript-friendly |
| **UI components** | Tailwind CSS | Utility-first CSS, rapid development |
| **Backend framework** | FastAPI | High performance, automatic API documentation |
| **WebSocket** | websockets | Asynchronous WebSocket library for Python |
| **ORM** | SQLAlchemy | Full-featured, supports multiple databases |
| **Database** | SQLite/PostgreSQL | Simple for development / reliable for production |

---

## Extensibility Design

### Model Adapter Pattern

```mermaid
classDiagram
    class ModelAdapter {
        <<interface>>
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    class OpenAIAdapter {
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    class AzureAdapter {
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    class LocalAdapter {
        +generate(prompt) string
        +stream(prompt) AsyncIterator
    }

    ModelAdapter <|-- OpenAIAdapter
    ModelAdapter <|-- AzureAdapter
    ModelAdapter <|-- LocalAdapter
```
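
The class diagram above could be expressed in Python with `typing.Protocol`. This is a sketch, not the project's code: `EchoAdapter`, the `ADAPTERS` registry, `register`, and `collect` are hypothetical, and the adapter echoes its input so the example runs without any vendor SDK or network access.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Protocol


class ModelAdapter(Protocol):
    """The interface from the diagram: one-shot and streaming generation."""

    def generate(self, prompt: str) -> str: ...

    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class EchoAdapter:
    """Hypothetical local adapter that echoes the prompt; no network calls."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # Yield the reply word by word to imitate token streaming.
        for word in self.generate(prompt).split():
            yield word


ADAPTERS: Dict[str, ModelAdapter] = {}


def register(vendor: str, adapter: ModelAdapter) -> None:
    """Adding a vendor means registering one more adapter; callers do not change."""
    ADAPTERS[vendor] = adapter


async def collect(vendor: str, prompt: str) -> List[str]:
    """Consumes a streamed reply from whichever adapter is registered."""
    adapter = ADAPTERS[vendor]
    return [chunk async for chunk in adapter.stream(prompt)]
```

Callers such as `collect` depend only on the `ModelAdapter` interface, so a new vendor can be supported by writing one class and registering it.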
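
Because every vendor sits behind the same `ModelAdapter` interface, adding a new model vendor only requires implementing a new adapter.

---

## Related Documentation

- [WebSocket Protocol](../api-reference/websocket.md) - Detailed protocol specification
- [Deployment Guide](../deployment/index.md) - Production deployment
- [Core Concepts](../concepts/index.md) - Explanations of assistants, pipelines, and other concepts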
148
docs/content/overview/index.md
Normal file
@@ -0,0 +1,148 @@
# Product Overview

Learn about the core features and design philosophy of Realtime Agent Studio.

---

## What Is RAS?

Realtime Agent Studio (RAS) is an **open-source platform for building real-time interactive agents**. It lets developers quickly build and deploy AI assistants with voice conversation capabilities.

### Core Value

| Value proposition | Description |
|-------------------|-------------|
| **Low-code configuration** | Configure assistants through a visual interface, without writing complex code |
| **Real-time interaction** | Millisecond-level responses with barge-in support, for a natural conversation experience |
| **Open and flexible** | Supports multiple model vendors, so you can choose the best fit |
| **Private deployment** | Fully self-hosted and under your control; data never leaves your environment |

---

## Feature Modules

```mermaid
mindmap
  root((RAS))
    Assistant Management
      Creation and Configuration
      Prompt Editing
      Model Selection
      Tool Calling
    Resource Library
      LLM Models
      ASR Models
      TTS Voices
    Knowledge Base
      Document Upload
      Vector Retrieval
      RAG Question Answering
    Monitoring and Analytics
      Session Replay
      Statistics
      Automated Testing
    Deployment and Integration
      WebSocket API
      REST API
      SDK
```
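
### Assistant Management

Create and configure conversational assistants:

- **System prompt** - Defines the assistant's role and behavior
- **Model configuration** - Select the LLM, ASR, and TTS models
- **Tool calling** - Configure webhooks and client-side tools
- **Opening message** - Set how the first turn of the conversation begins

### Resource Library

Manage all model resources in one place:

- **Speech recognition (ASR)** - Manage ASR models from multiple vendors
- **Large language models (LLM)** - OpenAI, Azure, and local models
- **Speech synthesis (TTS)** - Voice resources with multiple timbres

### Knowledge Base

Give assistants domain knowledge:

- **Document upload** - Supports PDF, Word, Markdown, and other formats
- **Vector indexing** - Automatic chunking and embedding
- **RAG retrieval** - Semantic knowledge retrieval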
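
The chunk, index, and retrieve steps listed above can be illustrated with a toy example. It is only a sketch of the general idea: all four functions are hypothetical, and `embed` uses plain word counts as a stand-in for an embedding model, so it matches on shared words rather than on meaning as a real semantic index would.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, max_words: int = 8) -> List[str]:
    """Splits a document into fixed-size word windows (real chunkers are smarter)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. An embedding model would go here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 1) -> List[Tuple[float, str]]:
    """Returns the top_k chunks ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

In a RAG setup, the retrieved chunks would then be added to the LLM prompt as context for the answer.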

### Monitoring and Analytics

Comprehensive data analysis capabilities:

- **Session replay** - Full-trace logs and audio playback
- **Real-time dashboard** - Concurrency, latency, and error-rate statistics
- **Automated testing** - Batch testing and quality evaluation

---

## Comparison with Other Solutions

| Feature | RAS | Vapi | Retell | ElevenLabs |
|---------|-----|------|--------|------------|
| **Open source** | :white_check_mark: | :x: | :x: | :x: |
| **Private deployment** | :white_check_mark: | :x: | :x: | :x: |
| **Pipeline engine** | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: |
| **Multimodal models** | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: |
| **Custom ASR/TTS** | :white_check_mark: | Limited | Limited | :x: |
| **Knowledge base** | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: |
| **Workflow editor** | In development | :white_check_mark: | :x: | :x: |
| **Pricing** | Free | Pay-as-you-go | Pay-as-you-go | Pay-as-you-go |

---

## Use Cases

<div class="grid cards" markdown>

- :telephone_receiver: **Customer Service**

    ---

    Answers calls automatically 24/7, handles common inquiries, and hands complex issues off to human agents

- :hospital: **Medical Consultation**

    ---

    Pre-consultation information gathering, health advice, and medication reminders

- :school: **Education and Training**

    ---

    Speaking practice, knowledge Q&A, and personalized tutoring

- :handshake: **Sales Assistant**

    ---

    Product introductions, needs discovery, and appointment scheduling

- :headphones: **Voice Assistant**

    ---

    Smart home control, schedule management, and information lookup

- :robot: **Virtual Humans**

    ---

    Digital human livestreams, virtual hosts, and interactive exhibits

</div>

---

## Next Steps

- [Quick Start](../quickstart/index.md) - Create your first assistant in 5 minutes
- [System Architecture](architecture.md) - A deeper look at the technical implementation
- [Core Concepts](../concepts/index.md) - Learn the key concepts