Add Mermaid diagram support and update architecture documentation
- Included a new JavaScript file for Mermaid configuration to ensure consistent diagram sizing across the documentation.
- Enhanced the architecture documentation to reflect the updated pipeline engine structure, including the VAD, ASR, TD, LLM, and TTS components.
- Updated various sections to clarify how external services and tools are integrated into the architecture.
- Improved the styling of Mermaid diagrams for visual consistency and usability.
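The commit message does not include the contents of the new Mermaid configuration file. Below is a minimal sketch of what such a file could contain. It assumes that Mermaid is loaded as a page-level global and that "consistent diagram sizing" means letting every diagram scale to the width of its container. The `mermaidConfig` name and the option choices are assumptions; this is not the file from this commit.

```javascript
// Hypothetical shared Mermaid configuration. The real file added by this
// commit is not shown, so these option choices are assumptions.
const mermaidConfig = {
  // Render `.mermaid` blocks automatically when the page loads.
  startOnLoad: true,
  // Let diagrams scale to their container's width instead of a fixed pixel
  // width, so flowcharts and sequence diagrams size consistently.
  flowchart: { useMaxWidth: true },
  sequence: { useMaxWidth: true },
};

// Only initialize when the Mermaid library is actually present on the page,
// so the script does nothing (rather than throwing) where Mermaid is absent.
if (typeof mermaid !== "undefined") {
  mermaid.initialize(mermaidConfig);
}
```

`startOnLoad`, `flowchart.useMaxWidth`, and `sequence.useMaxWidth` are standard Mermaid configuration keys. Setting them in one shared file keeps every page using the same values.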
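The commit message describes the updated pipeline engine as a chain of VAD, ASR, TD, LLM, and TTS components, with the LLM able to call tools. The sketch below illustrates that flow for a single user turn. It makes several assumptions:

- Every stage is a hypothetical synchronous stub passed in as a function.
- `runTurn`, `demoStages`, and `demoTools` are illustrative names, not this project's API.
- The tool-call loop is capped at a fixed number of rounds.

```javascript
// Hypothetical sketch of one turn through the pipeline engine:
// VAD -> ASR -> TD -> LLM (with optional tool calls) -> TTS.
// Each stage is a plain function, so any provider (OpenAI, SiliconFlow,
// DashScope, a local model) could be swapped in independently.
function runTurn(audioFrames, stages, tools, maxToolCalls = 3) {
  // VAD: keep only the frames classified as speech.
  const speech = audioFrames.filter((frame) => stages.vad(frame));
  if (speech.length === 0) return null; // silence: nothing to answer

  // ASR: transcribe the speech frames into text.
  const transcript = stages.asr(speech);

  // TD: only continue once the turn detector says the user has finished.
  if (!stages.turnDetector(transcript)) return null; // keep listening

  // LLM: generate a reply, resolving tool calls until plain text comes back.
  const messages = [{ role: "user", content: transcript }];
  let reply = stages.llm(messages);
  for (let i = 0; reply.toolCall && i < maxToolCalls; i++) {
    const { name, args } = reply.toolCall;
    const tool = tools[name];
    const result = tool ? tool(args) : `unknown tool: ${name}`;
    messages.push({ role: "tool", name, content: result });
    reply = stages.llm(messages);
  }
  if (reply.toolCall) throw new Error("tool-call limit reached");

  // TTS: synthesize the final reply text into audio.
  return { transcript, text: reply.text, audio: stages.tts(reply.text) };
}

// Demo stages: every implementation here is a stand-in, not a real service.
const demoStages = {
  vad: (frame) => frame.energy > 0.1,
  asr: (frames) => frames.map((f) => f.word).join(" "),
  turnDetector: (text) => text.endsWith("?"),
  llm: (messages) => {
    const last = messages[messages.length - 1];
    // First ask for a tool; once a tool result arrives, answer with it.
    if (last.role === "user") return { toolCall: { name: "getTime", args: {} } };
    return { text: `It is ${last.content}.` };
  },
  tts: (text) => `<audio:${text}>`,
};

// Hypothetical tool registry (Webhook, client, or built-in tools in the docs).
const demoTools = { getTime: () => "10:00" };

const frames = [
  { energy: 0.0, word: "" },
  { energy: 0.5, word: "what" },
  { energy: 0.6, word: "time?" },
];
const result = runTurn(frames, demoStages, demoTools);
console.log(result);
```

With the demo stages, `result.text` is `"It is 10:00."`. Passing only silent frames, or a transcript the turn detector does not accept, makes `runTurn` return `null`.

Keeping each stage as a separate function mirrors the documented goal of choosing and tuning each stage's provider independently. A real engine would stream audio and text asynchronously rather than handle a whole turn at once.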
@@ -31,9 +31,16 @@ flowchart TB
end

subgraph External["External Services"]
LLM[LLM Service]
ASR[ASR Service]
TTS[TTS Service]
OpenAI[OpenAI]
SiliconFlow[SiliconFlow]
DashScope[DashScope]
LocalModel[Local Model]
end

subgraph Tools["Tools"]
Webhook[Webhook]
ClientTool[Client Tools]
Builtin[Built-in Tools]
end

Browser --> WebApp
@@ -44,9 +51,8 @@ flowchart TB
API <--> DB
API <--> FileStore
Engine <--> API
Engine --> LLM
Engine --> ASR
Engine --> TTS
Engine --> External
Engine --> Tools
```

---
@@ -60,7 +66,7 @@ flowchart TB
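| Module | Description |
|---------|------|
| Assistant Management | Create, configure, and test AI assistants |
| Resource Library | LLM/ASR/TTS model management |
| Resource Library | Management of LLM, ASR, TTS, VAD, and other models |
| Knowledge Base | RAG document upload and management |
| History | Session log search and playback |
| Dashboard | Real-time statistics |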
@@ -103,45 +109,74 @@ flowchart TB
SM[Session Manager]

subgraph Pipeline["Pipeline Engine"]
VAD[VAD Detection]
ASR[Speech Recognition]
LLM[Large Language Model]
TTS[Speech Synthesis]
VAD[Voice Activity Detection VAD]
ASR[Speech Recognition ASR]
TD[Turn Detection TD]
LLM[Large Language Model LLM]
TTS[Speech Synthesis TTS]
end

subgraph Multimodal["Multimodal Engine"]
RT[Realtime Model<br/>GPT-4o / Gemini]
subgraph Realtime["Realtime Engine Connections"]
RTOpenAI[OpenAI Realtime]
RTGemini[Gemini Live]
RTDoubao[Doubao Realtime]
end

subgraph Tools["Tools"]
Webhook[Webhook]
ClientTool[Client Tools]
Builtin[Built-in Tools]
end
end

Client[Client] -->|Audio stream| WS
WS --> SM
SM --> Pipeline
SM --> Multimodal
SM --> Realtime
Pipeline --> LLM
LLM --> Tools
Realtime --> Tools
Pipeline -->|Text/audio| WS
Multimodal -->|Text/audio| WS
Realtime -->|Text/audio| WS
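```

### External Services and Tools

| Category | Description | Options |
|------|------|--------|
| **External services** | Cloud or local services that the pipeline engine's stages depend on | OpenAI, SiliconFlow, DashScope, local models |
| **Realtime engines** | Backends that the realtime interaction engine can connect to | OpenAI Realtime, Gemini Live, Doubao realtime interaction engine |
| **Tools** | Callable by both the pipeline engine's LLM and the realtime interaction engine | Webhook, client tools, built-in tools |

---

## Engine Architecture

### Pipeline Full-Duplex Engine

The traditional approach, which splits voice interaction into three independent stages: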
The pipeline engine consists of **voice activity detection (VAD)**, **speech recognition (ASR)**, **turn detection (TD)**, a **large language model (LLM)**, and **speech synthesis (TTS)**. Each stage can be backed by an external service: **OpenAI**, **SiliconFlow**, **DashScope**, or a **local model**. The LLM can call **tools** (Webhook, client tools, built-in tools).

```mermaid
sequenceDiagram
participant C as Client
participant E as Engine
participant VAD as VAD
participant ASR as Speech Recognition
participant TD as Turn Detection
participant LLM as Large Language Model
participant TTS as Speech Synthesis
participant Tools as Tools

C->>E: Audio stream (PCM)
E->>VAD: Detect voice activity
VAD-->>E: Valid speech segments
E->>ASR: Speech to text
ASR-->>E: Transcript
E->>TD: Detect turn boundary
TD-->>E: Input ready for the LLM
E->>LLM: Generate reply
LLM->>Tools: Optional tool call
Tools-->>LLM: Tool result
LLM-->>E: Reply text (streaming)
E->>TTS: Text to speech
TTS-->>E: Audio stream
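@@ -150,10 +185,15 @@ sequenceDiagram

**Characteristics:**

- Flexible choice of vendor for each stage
- Each stage can be optimized independently
- Flexible choice of vendor for each stage (OpenAI, SiliconFlow, DashScope, local models)
- VAD, ASR, TD, LLM, and TTS can each be optimized independently
- The LLM works together with tools (Webhook, client tools, built-in tools)
- Latency of roughly 500-1500 ms

### Realtime Interaction Engine

The realtime interaction engine connects to external **realtime model backends**, including **OpenAI Realtime**, **Gemini Live**, and the **Doubao realtime interaction engine**. Like the pipeline engine's LLM, it can also call **tools** (Webhook, client tools, built-in tools).

### Native Multimodal Engine

Uses an end-to-end multimodal model (e.g., GPT-4o Realtime):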
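In the updated architecture diagram, the session manager (`SM`) routes each session to one of three targets: the pipeline engine, the multimodal engine, or a realtime engine connection (OpenAI Realtime, Gemini Live, Doubao). A hypothetical sketch of that dispatch step follows. The `selectEngine` function, the `engine`/`backend`/`model` configuration fields, and the backend identifier strings are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the session manager's engine selection.
// Config field names and backend IDs below are illustrative assumptions.
const REALTIME_BACKENDS = new Set(["openai-realtime", "gemini-live", "doubao-realtime"]);

function selectEngine(assistantConfig) {
  switch (assistantConfig.engine) {
    case "pipeline":
      // Cascaded engine: each stage can use its own provider.
      return { kind: "pipeline", stages: ["VAD", "ASR", "TD", "LLM", "TTS"] };
    case "realtime": {
      // Connect the session to one of the supported realtime backends.
      const backend = assistantConfig.backend;
      if (!REALTIME_BACKENDS.has(backend)) {
        throw new Error(`unsupported realtime backend: ${backend}`);
      }
      return { kind: "realtime", backend };
    }
    case "multimodal":
      // A single end-to-end model handles both audio input and output.
      return { kind: "multimodal", model: assistantConfig.model };
    default:
      throw new Error(`unknown engine: ${assistantConfig.engine}`);
  }
}

const pipelineSession = selectEngine({ engine: "pipeline" });
const realtimeSession = selectEngine({ engine: "realtime", backend: "gemini-live" });
console.log(pipelineSession.kind, realtimeSession.backend);
```

Rejecting unknown backends up front keeps a misconfigured assistant from opening a session that can never connect.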