Update documentation and configuration for Realtime Agent Studio
- Revised mkdocs.yml to reflect the new site name and description, enhancing clarity for users.
- Added a changelog.md to document important changes and updates for the project.
- Introduced a roadmap.md to outline development plans and progress for future releases.
- Expanded index.md with a comprehensive overview of the platform, including core features and installation instructions.
- Enhanced concepts documentation with detailed explanations of assistants, engines, and their configurations.
- Updated configuration documentation to provide clear guidance on environment setup and service configurations.
- Added extra JavaScript for improved user experience in the documentation site.
253
docs/content/concepts/assistants.md
Normal file
@@ -0,0 +1,253 @@

# Assistant Concepts in Depth

A closer look at the design philosophy and configuration details of the assistant (Assistant).

---

## What Is an Assistant?

An **assistant** is the core entity in RAS. It represents a conversational AI agent with a specific role, capabilities, and behavior. Each assistant is configured independently and can serve a different business scenario.

### Components of an Assistant

```mermaid
flowchart TB
    subgraph Assistant["Assistant"]
        Identity[Identity]
        Models[Model configuration]
        Capabilities[Capability extensions]
        Behavior[Behavior control]
    end

    subgraph Identity
        Name[Name]
        Prompt[System prompt]
        Language[Language]
    end

    subgraph Models
        LLM[LLM model]
        ASR[ASR model]
        TTS[TTS voice]
    end

    subgraph Capabilities
        Tools[Tool calling]
        KB[Knowledge base]
    end

    subgraph Behavior
        Greeting[Greeting]
        Interruption[Interruption settings]
        Output[Output mode]
    end
```

---
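
## Identity

### System Prompt

The system prompt is the assistant's most important setting. It defines:

| Element | Description | Example |
|------|------|------|
| **Role** | The identity the assistant takes on | "You are a professional medical consultant" |
| **Capabilities** | What the assistant can do | "You can answer health questions, but you cannot write prescriptions" |
| **Restrictions** | What the assistant must not do | "Do not discuss political topics" |
| **Style** | Tone and format of replies | "Stay friendly and professional; keep answers concise" |

### Prompt Template

```markdown
## Role
You are "小智" (Xiaozhi), the intelligent customer service assistant for {{company}}.

## Tasks
- Answer users' questions about products and services
- Help with order lookups and after-sales issues
- Collect user feedback

## Restrictions
- Do not discuss competitors' products
- Do not promise discounts beyond your authority
- For complex issues, direct the user to a human agent

## Style
- Friendly, warm tone
- Concise answers, 2-3 sentences each
- Use conversational filler words where appropriate so the dialogue sounds natural
```

---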
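
## Model Configuration

### LLM Model

The large language model is the assistant's "brain": it interprets user intent and generates replies.

| Parameter | Description | Suggested value |
|------|------|--------|
| **Temperature** | Randomness of replies; higher values give more varied output | 0.7 (conversation) / 0.3 (Q&A) |
| **Max tokens** | Upper limit on the length of a single reply | 256-512 |
| **Context length** | Number of conversation turns remembered | 10-20 turns |

### ASR Model

The speech recognition model converts the user's speech to text.

| Setting | Description |
|------|------|
| **Language** | Recognition language, e.g. Chinese or English |
| **Hotwords** | Improve recognition of specific vocabulary |
| **Punctuation** | Whether to add punctuation automatically |

### TTS Voice

Speech synthesis converts the assistant's replies into spoken audio.

| Setting | Description |
|------|------|
| **Voice** | Which voice persona to use |
| **Speed** | Speaking rate, 0.5-2.0 |
| **Pitch** | How high or low the voice sounds |

---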

## Capability Extensions

### Tool Calling

Tools let the assistant perform external operations:

```mermaid
flowchart LR
    User[User] -->|"Look up order"| Assistant[Assistant]
    Assistant -->|Call tool| API[Order API]
    API -->|Return data| Assistant
    Assistant -->|Reply| User
```

**Example tool definition:**

```json
{
  "name": "get_order_status",
  "description": "Look up the status of a user's order",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "Order number"
      }
    },
    "required": ["order_id"]
  }
}
```
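As a rough illustration of how such a definition might be used at call time, the sketch below checks LLM-supplied arguments against the `parameters` schema before a tool is dispatched. `validate_arguments` and `JSON_TYPES` are hypothetical names, not part of any RAS API, and the check covers only the `required` list and basic `type` matching rather than full JSON Schema.

```python
import json

# A tool definition mirroring the example above, parsed from JSON.
TOOL = json.loads("""
{
  "name": "get_order_status",
  "description": "Look up the status of a user's order",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {"type": "string", "description": "Order number"}
    },
    "required": ["order_id"]
  }
}
""")

# Hypothetical mapping from JSON Schema type names to Python types.
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool, "object": dict}


def validate_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    schema = tool["parameters"]
    problems = []
    # Every name listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be declared and have the declared type.
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
        elif not isinstance(value, JSON_TYPES[spec["type"]]):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems
```

Calling `validate_arguments(TOOL, {"order_id": "A1001"})` returns an empty list, while passing an empty dict reports the missing `order_id`.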

### Knowledge Base Association

Let the assistant answer questions based on private documents:

```mermaid
flowchart LR
    Question[User question] --> Search[Knowledge retrieval]
    Search --> KB[(Knowledge base)]
    KB --> Context[Relevant content]
    Context --> LLM[LLM]
    LLM --> Answer[Answer]
```

---

## Behavior Control

### Greeting Settings

| Mode | Description |
|------|------|
| **Assistant speaks first** | The assistant greets the user as soon as the connection is established |
| **User speaks first** | Wait for the user to start talking |
| **Silent** | No automatic greeting |

### Interruption Settings

| Option | Description |
|------|------|
| **Allow interruption** | The user can cut in at any time |
| **Disallow interruption** | Input is accepted only after the assistant finishes speaking |
| **Sensitivity** | How readily an interruption is triggered |

### Output Mode

| Mode | Description |
|------|------|
| **Voice** | TTS audio output |
| **Text** | Text-only output |
| **Mixed** | Audio and text output together |

---

## Assistant Version Management

### Draft and Published

```mermaid
gitGraph
    commit id: "Create assistant"
    commit id: "Configure prompt"
    commit id: "Add tools"
    branch published
    checkout published
    commit id: "Publish v1"
    checkout main
    commit id: "Edit prompt"
    commit id: "Tune parameters"
    checkout published
    merge main id: "Publish v2"
```

- **Draft**: can be edited at any time; for testing only
- **Published**: live; used in production

### Configuration Import and Export

Assistant configurations can be imported and exported as JSON, which helps with:

- Backup and restore
- Migration between environments
- Sharing templates within a team

---
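
## Best Practices

### 1. Prompt Engineering

- **Define the role**: state the assistant's identity clearly
- **Set boundaries**: spell out what it can and cannot do
- **Control length**: keep replies short in voice scenarios

### 2. Model Selection

- **Balance cost and quality**: the most powerful model is not always necessary
- **Test different providers**: find the combination that best fits the scenario
- **Consider latency**: voice interaction is latency-sensitive

### 3. Tool Design

- **Single responsibility**: each tool does one thing
- **Clear descriptions**: help the LLM understand when to call the tool
- **Error handling**: degrade gracefully when a tool fails

---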

## Related Documentation

- [Assistant Configuration](../assistants/configuration.md) - configuration UI in detail
- [Prompt Guide](../assistants/prompts.md) - writing high-quality prompts
- [Tool Integration](../customization/tools.md) - tool configuration details

323
docs/content/concepts/engines.md
Normal file
@@ -0,0 +1,323 @@
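
# Engine Architecture in Depth

A closer look at the two engine architectures in RAS: the pipeline engine and the multimodal engine.

---

## Engine Overview

The engine is the core of RAS and handles real-time voice interaction. Two architectures are available, depending on requirements:

| Architecture | Characteristics | Best for |
|------|------|---------|
| **Pipeline** | Flexible, customizable, controllable cost | Most scenarios |
| **Multimodal** | Low latency, natural, simple | Premium-experience scenarios |

---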

## Pipeline Engine

### Architecture

The pipeline engine splits voice interaction into three independent stages:

```mermaid
flowchart LR
    subgraph Input["Input processing"]
        Audio[User audio] --> VAD[VAD detection]
        VAD --> ASR[Speech recognition]
        ASR --> Text[Transcript]
    end

    subgraph Process["Semantic processing"]
        Text --> LLM[Large language model]
        LLM --> Response[Reply text]
    end

    subgraph Output["Output generation"]
        Response --> TTS[Speech synthesis]
        TTS --> OutputAudio[Assistant audio]
    end
```

### Data Flow in Detail

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant ASR as ASR service
    participant LLM as LLM service
    participant TTS as TTS service

    U->>E: Audio frames (PCM 16kHz)

    Note over E: VAD detects speech activity
    E->>E: Accumulate audio buffer

    Note over E: End of utterance detected (EOU)
    E->>ASR: Send audio
    ASR-->>E: Transcript (streaming)
    E-->>U: transcript.delta
    E-->>U: transcript.final

    E->>LLM: Send conversation history + user input
    LLM-->>E: Reply text (streaming)
    E-->>U: assistant.response.delta

    loop Streaming synthesis
        E->>TTS: Text fragment
        TTS-->>E: Audio fragment
        E-->>U: Audio frames
    end

    E-->>U: assistant.response.final
```
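The streaming-synthesis loop in the diagram depends on cutting the LLM's text stream into fragments that TTS can start on early. The source does not say how RAS segments text, so the following is only a minimal sketch that assumes sentence-final punctuation is the cut point; `split_for_tts` and `SENTENCE_END` are hypothetical names.

```python
from typing import Iterable, Iterator

# Assumed sentence terminators (Chinese and Western). A real splitter would
# also need to handle decimals such as "3.5" and abbreviations.
SENTENCE_END = "。!?.!?"


def split_for_tts(deltas: Iterable[str]) -> Iterator[str]:
    """Yield a text fragment as soon as a sentence terminator arrives."""
    buffer = ""
    for delta in deltas:
        for ch in delta:
            buffer += ch
            if ch in SENTENCE_END:
                fragment = buffer.strip()
                if fragment:
                    yield fragment
                buffer = ""
    # Flush whatever is left when the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Because the function is a generator, the first sentence can be handed to TTS while the LLM is still producing the rest of the reply.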

### Latency Analysis

The pipeline engine's latency is the sum of its stages:

| Stage | Typical latency | How to optimize |
|------|---------|---------|
| VAD/EOU | 200-500ms | Tune sensitivity |
| ASR | 100-300ms | Choose a fast model |
| LLM TTFT | 200-500ms | Choose a low-latency model |
| TTS | 100-200ms | Streaming synthesis |
| **Total** | **600-1500ms** | - |

### Streaming Optimization

Streaming is used to reduce perceived latency:

```mermaid
gantt
    title Non-streaming vs. streaming
    dateFormat X
    axisFormat %s

    section Non-streaming
    ASR done :a1, 0, 300ms
    LLM done :a2, after a1, 800ms
    TTS done :a3, after a2, 500ms
    Playback :a4, after a3, 500ms

    section Streaming
    ASR :b1, 0, 300ms
    LLM starts :b2, after b1, 200ms
    TTS starts :b3, after b2, 100ms
    Generate while playing :b4, after b3, 600ms
```
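The arithmetic behind the latency table and the Gantt chart can be checked with a few lines of Python. This is only a worked calculation using the figures quoted above; the names `STAGES` and `total_latency` are illustrative, and the numbers are the document's typical values rather than measurements.

```python
# Per-stage latency ranges in milliseconds, taken from the latency table.
STAGES = {
    "VAD/EOU": (200, 500),
    "ASR": (100, 300),
    "LLM TTFT": (200, 500),
    "TTS": (100, 200),
}


def total_latency(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum best-case and worst-case latency across sequential stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


# Durations from the Gantt chart: time until the user first hears audio.
non_streaming_first_audio = 300 + 800 + 500  # ASR + full LLM reply + full TTS
streaming_first_audio = 300 + 200 + 100      # ASR + LLM start + TTS start

print(total_latency(STAGES))                              # (600, 1500)
print(non_streaming_first_audio, streaming_first_audio)   # 1600 600
```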

---

## Multimodal Engine

### Architecture

The multimodal engine uses an end-to-end model that takes audio in and produces audio out directly:

```mermaid
flowchart LR
    subgraph Client["Client"]
        Mic[Microphone] --> AudioIn[Audio input]
        AudioOut[Audio output] --> Speaker[Speaker]
    end

    subgraph Engine["Engine"]
        AudioIn --> RT[Realtime Model]
        RT --> AudioOut
    end

    subgraph Model["Multimodal models"]
        RT --> GPT4o[GPT-4o Realtime]
        RT --> Gemini[Gemini Live]
        RT --> Step[Step Audio]
    end
```

### Data Flow in Detail

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant RT as Realtime Model

    U->>E: Audio frames
    E->>RT: Forward audio

    Note over RT: End-to-end processing

    RT-->>E: Audio response (streaming)
    E-->>U: Play audio

    Note over U,RT: Full duplex supported<br/>User can interrupt at any time
```

### Supported Models

| Model | Provider | Characteristics |
|------|--------|------|
| **GPT-4o Realtime** | OpenAI | Most natural voice, very low latency |
| **Gemini Live** | Google | Strong multimodal capabilities |
| **Step Audio** | StepFun (阶跃星辰) | Available in mainland China, optimized for Chinese |

### Latency Comparison

```mermaid
xychart-beta
    title "End-to-end latency comparison"
    x-axis ["Pipeline (default)", "Pipeline (optimized)", "Multimodal"]
    y-axis "Latency (ms)" 0 --> 1500
    bar [1200, 700, 300]
```

---

## Intelligent Barge-in Mechanism

Both engines support intelligent barge-in, but they implement it differently.

### Barge-in in the Pipeline Engine

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant TTS as TTS

    Note over E,TTS: TTS is synthesizing and playing
    E->>U: Audio frames...

    U->>E: User speaks (VAD detected)
    E->>E: Decide whether this is a valid interruption

    alt Valid interruption
        E->>TTS: Stop synthesis
        E->>E: Clear audio buffer
        E-->>U: output.audio.interrupted
        Note over E: Process new input
    else Noise / false trigger
        Note over E: Continue playback
    end
```
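The `alt`/`else` branch in the diagram can be sketched as a small handler. The source does not describe how RAS decides that an interruption is valid, so the sketch assumes a simple minimum-speech-duration threshold; `PlaybackState`, `handle_vad_speech`, and `min_speech_ms` are all hypothetical. Only the `output.audio.interrupted` event name comes from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class PlaybackState:
    """Minimal stand-in for the engine's output side."""
    tts_active: bool = True
    audio_buffer: list[bytes] = field(default_factory=list)
    events: list[str] = field(default_factory=list)


def handle_vad_speech(state: PlaybackState, speech_ms: int, min_speech_ms: int = 300) -> bool:
    """Apply the alt/else branch; return True if playback was interrupted."""
    # Assumption: speech shorter than the threshold is treated as noise,
    # so playback simply continues.
    if not state.tts_active or speech_ms < min_speech_ms:
        return False
    state.tts_active = False      # stop synthesis
    state.audio_buffer.clear()    # drop audio that has not been played yet
    state.events.append("output.audio.interrupted")
    return True
```

A threshold like this trades responsiveness for fewer false triggers; the semantic and hybrid detection modes mentioned elsewhere in these docs would replace the duration check with a content-based one.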

### Barge-in in the Multimodal Engine

Multimodal models support full duplex natively, so barge-in is handled inside the model:

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant RT as Realtime Model

    Note over RT: Model is producing output
    RT-->>E: Audio stream...
    E-->>U: Play

    U->>E: User speaks
    E->>RT: Forward user audio

    Note over RT: Model detects the interruption<br/>and stops output automatically

    RT-->>E: New response
    E-->>U: Play new response
```

---

## Engine Selection Guide

### Decision Flow

```mermaid
flowchart TD
    Start[Choose an engine] --> Q1{Latency requirement?}

    Q1 -->|< 500ms| Q2{Sufficient budget?}
    Q1 -->|> 500ms acceptable| Pipeline[Pipeline engine]

    Q2 -->|Yes| Q3{Model available?}
    Q2 -->|No| Pipeline

    Q3 -->|GPT-4o/Gemini available| Multimodal[Multimodal engine]
    Q3 -->|Restricted in mainland China| Q4{Step Audio?}

    Q4 -->|Available| Multimodal
    Q4 -->|Unavailable| Pipeline
```
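The decision flow translates directly into a small function. This is a sketch, not a RAS API: `choose_engine` and the model labels in `REALTIME_MODELS` are hypothetical identifiers, and treating a budget of exactly 500 ms as "pipeline" is an assumption, since the flowchart leaves that boundary unspecified.

```python
# Hypothetical labels for the three realtime models named in the flowchart.
REALTIME_MODELS = {"gpt-4o-realtime", "gemini-live", "step-audio"}


def choose_engine(latency_budget_ms: int, budget_ok: bool, available_models: set[str]) -> str:
    """Walk the decision flow and return "pipeline" or "multimodal"."""
    # Q1: a relaxed latency requirement goes straight to the pipeline engine.
    if latency_budget_ms >= 500:
        return "pipeline"
    # Q2: without sufficient budget, fall back to the pipeline engine.
    if not budget_ok:
        return "pipeline"
    # Q3/Q4: any reachable realtime model (including Step Audio) qualifies.
    if available_models & REALTIME_MODELS:
        return "multimodal"
    return "pipeline"
```

For example, `choose_engine(300, True, {"step-audio"})` returns `"multimodal"`, which matches the Q4 branch for restricted environments.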

### Recommendations by Scenario

| Scenario | Recommended engine | Reason |
|------|---------|------|
| **Enterprise customer service** | Pipeline | Controllable cost, customizable ASR |
| **Premium virtual humans** | Multimodal | Most natural interaction |
| **Phone bots** | Pipeline | Can integrate with telecom ASR |
| **Voice assistants** | Multimodal | Low latency, natural conversation |
| **Speaking practice** | Pipeline | Needs precise ASR scoring |

### Hybrid Approach

Different engines can also be used for different user tiers:

```mermaid
flowchart LR
    User[User request] --> Router{Routing decision}

    Router -->|VIP users| Multimodal[Multimodal engine]
    Router -->|Regular users| Pipeline[Pipeline engine]

    Multimodal --> Response[Response]
    Pipeline --> Response
```
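A router like the one in the diagram could be as small as the sketch below. `route_engine` is a hypothetical helper, and the fallback to the pipeline engine when the multimodal engine is unavailable is an assumption that the diagram does not show.

```python
def route_engine(user_tier: str, multimodal_available: bool = True) -> str:
    """Return the engine name for a request, following the routing diagram."""
    # Assumption: fall back to the pipeline engine if the multimodal one is down.
    if user_tier.strip().lower() == "vip" and multimodal_available:
        return "multimodal"
    return "pipeline"
```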

---

## Configuration Examples

### Pipeline Engine Configuration

```json
{
  "engine": "pipeline",
  "asr": {
    "provider": "openai-compatible",
    "model": "FunAudioLLM/SenseVoiceSmall",
    "language": "zh"
  },
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.7
  },
  "tts": {
    "provider": "openai-compatible",
    "model": "FunAudioLLM/CosyVoice2-0.5B",
    "voice": "anna"
  }
}
```

### Multimodal Engine Configuration

```json
{
  "engine": "multimodal",
  "model": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview",
    "voice": "alloy"
  }
}
```

---

## Related Documentation

- [System Architecture](../overview/architecture.md) - overall architecture design
- [WebSocket Protocol](../api-reference/websocket.md) - protocol details
- [Deployment Guide](../deployment/index.md) - engine deployment configuration

284
docs/content/concepts/index.md
Normal file
@@ -0,0 +1,284 @@

# Core Concepts

This chapter introduces the core concepts in Realtime Agent Studio to help you understand and use the platform.

---

## Concept Overview

```mermaid
flowchart TB
    subgraph Platform["RAS platform"]
        Assistant[Assistant]

        subgraph Resources["Resource library"]
            LLM[LLM model]
            ASR[ASR model]
            TTS[TTS voice]
            KB[Knowledge base]
        end

        subgraph Engine["Interaction engine"]
            Pipeline[Pipeline engine]
            Multimodal[Multimodal engine]
        end

        Session[Session]
    end

    Assistant --> LLM
    Assistant --> ASR
    Assistant --> TTS
    Assistant --> KB
    Assistant --> Engine
    Engine --> Session
```

---

## Assistant

An **assistant** is the core entity in RAS and represents a conversational AI agent.

### Assistant Configuration

Each assistant includes the following settings:

| Setting | Description |
|-------|------|
| **Name** | Display name of the assistant |
| **System prompt** | Defines the assistant's role, behavior, and restrictions |
| **LLM model** | The large language model used to generate replies |
| **ASR model** | The model used for speech recognition |
| **TTS voice** | The voice used for speech synthesis |
| **Tools** | External tools the assistant can call |
| **Knowledge base** | Associated knowledge bases (used for RAG) |

### Assistant Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Draft: Create
    Draft --> Draft: Edit configuration
    Draft --> Published: Publish
    Published --> Draft: Unpublish
    Published --> Published: Update configuration
    Published --> [*]: Delete
```

---

## Session

A **session** represents one complete conversation, from the moment the user connects until they disconnect.

### Session States

```mermaid
stateDiagram-v2
    [*] --> Connecting: WebSocket connection
    Connecting --> Started: session.started
    Started --> Active: In conversation
    Active --> Active: Multi-turn conversation
    Active --> Stopped: session.stop
    Stopped --> [*]: Connection closed
```
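The state diagram can be expressed as a transition table, which makes it easy to reject events that arrive in the wrong state. This is a sketch under stated assumptions: the `Session` class, the lowercase state names, and the `"conversation"` event (standing in for any dialogue turn) are hypothetical. Only `session.started` and `session.stop` appear in the diagram.

```python
# Allowed transitions, copied from the state diagram.
TRANSITIONS = {
    ("connecting", "session.started"): "started",
    ("started", "conversation"): "active",
    ("active", "conversation"): "active",
    ("active", "session.stop"): "stopped",
}


class Session:
    def __init__(self) -> None:
        # Entered once the WebSocket connection is opened.
        self.state = "connecting"

    def on_event(self, event: str) -> str:
        """Advance the state, rejecting events the diagram does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the transitions in one table means a new state or event only needs a new entry, not a new branch in the handler.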

### Session Data

Each session record contains:

- **Basic information** - ID, duration, timestamps
- **Audio data** - audio recordings of the user and the assistant
- **Transcripts** - ASR recognition results
- **LLM interactions** - inputs, outputs, and tool calls
- **Metadata** - channel, source, custom variables

---

## Pipeline Engine vs. Multimodal Engine

RAS supports two engine architectures, each suited to different scenarios.

### Pipeline Engine

Splits voice interaction into three independent stages:

```
User speech → [ASR] → text → [LLM] → reply → [TTS] → assistant speech
```

**Advantages:**

- Free choice of provider for each stage
- Each stage can be optimized independently
- Controllable cost

**Disadvantages:**

- Higher latency (stage latencies add up)
- Multiple services must be coordinated

### Multimodal Engine

Uses an end-to-end model to process audio directly:

```
User speech → [Realtime Model] → assistant speech
```

**Advantages:**

- Lower latency
- More natural speech
- Simple architecture

**Disadvantages:**

- Tied to specific providers
- Higher cost
- Limited customizability

### Choosing an Engine

| Scenario | Recommended engine |
|------|---------|
| Cost-sensitive | Pipeline |
| Latency-sensitive | Multimodal |
| Needs a specific ASR/TTS | Pipeline |
| Most natural experience | Multimodal |

---

## Intelligent Barge-in

**Intelligent barge-in** means the user can cut in at any time while the assistant is speaking. The system can:

1. Detect that the user has started speaking
2. Stop TTS playback immediately
3. Process the user's new input

### Barge-in Detection Methods

| Method | Description |
|------|------|
| **VAD** | Voice Activity Detection; interrupts as soon as voice activity is detected |
| **Semantic** | Decides from the speech content whether the interruption is meaningful |
| **Hybrid** | Combines VAD and semantic detection to reduce false triggers |

### Barge-in Flow

```mermaid
sequenceDiagram
    participant User as User
    participant Engine as Engine
    participant TTS as TTS

    Note over Engine,TTS: Assistant is playing a reply
    Engine->>User: Audio stream...
    User->>Engine: Starts speaking (VAD triggered)
    Engine->>Engine: Barge-in decision
    Engine->>TTS: Stop synthesis
    Engine->>User: output.audio.interrupted
    Note over Engine: Process new input
```

---

## Tool Calling

Assistants can extend their capabilities with **tools** that access external systems or perform specific operations.

### Tool Types

| Type | Description | Examples |
|------|------|------|
| **Webhook** | Calls an external HTTP API | Order lookup, scheduling appointments |
| **Client** | Operations executed by the client | Opening a page, showing a form |
| **Built-in** | Tools provided by the platform | Code execution, calculator |

### Tool Call Flow

```mermaid
sequenceDiagram
    participant User as User
    participant LLM as LLM
    participant Tool as Tool

    User->>LLM: "Please check my order status"
    LLM->>LLM: Decides to call a tool
    LLM->>Tool: get_order_status(order_id)
    Tool-->>LLM: {status: "shipped"}
    LLM->>User: "Your order has shipped"
```
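The middle of this flow, where a tool call requested by the LLM is executed and its result handed back, might look like the sketch below. The call format (`{"name": ..., "arguments": ...}`), the `TOOLS` registry, and `run_tool_call` are assumptions made for illustration; `get_order_status` here is a stand-in that returns canned data rather than calling a real webhook.

```python
import json


def get_order_status(order_id: str) -> dict:
    """Stand-in for a real webhook call; returns canned data."""
    return {"order_id": order_id, "status": "shipped"}


# Registry of callable tools, keyed by the name the LLM uses.
TOOLS = {"get_order_status": get_order_status}


def run_tool_call(call: dict) -> str:
    """Execute one tool call and return a JSON string to hand back to the LLM."""
    func = TOOLS.get(call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    try:
        result = func(**call["arguments"])
    except Exception as exc:
        # Degrade gracefully: report the failure instead of crashing the turn.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```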

---

## Knowledge Base

A **knowledge base** lets the assistant answer questions from private documents, implementing RAG (retrieval-augmented generation).

### How It Works

```mermaid
flowchart LR
    subgraph Indexing["Indexing phase"]
        Doc[Document] --> Chunk[Chunking]
        Chunk --> Embed[Embedding]
        Embed --> Store[(Vector database)]
    end

    subgraph Query["Query phase"]
        Q[User question] --> QEmbed[Question embedding]
        QEmbed --> Search[Similarity search]
        Store --> Search
        Search --> Context[Relevant context]
        Context --> LLM[LLM generates answer]
    end
```
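The two phases in the diagram can be demonstrated end to end with a toy example. This is not how RAS implements retrieval: the bag-of-words "embedding" below is a deliberate simplification that stands in for a real embedding model, an in-memory list stands in for the vector database, and all function names are hypothetical.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 7) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(store: list[tuple[str, Counter]], question: str, top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Indexing phase: document -> chunks -> vectors -> store.
document = (
    "Refunds are processed within five business days. "
    "Shipping is free above fifty dollars total."
)
store = [(c, embed(c)) for c in chunk(document)]
```

In the query phase, `search(store, "how long are refunds processed")` returns the refunds chunk, which would then be placed in the LLM's context alongside the question.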

### Supported Document Formats

- PDF
- Word (.docx)
- Markdown
- Plain text
- HTML

---

## Dynamic Variables

**Dynamic variables** let you inject context into the assistant at runtime.

### Usage

Use `{{variable}}` placeholders in the system prompt:

```
You are the customer service assistant for {{company_name}}.
The current user is {{customer_name}}, whose membership tier is {{tier}}.
```

Pass the values via `dynamicVariables` when connecting:

```json
{
  "type": "session.start",
  "metadata": {
    "dynamicVariables": {
      "company_name": "ABC Company",
      "customer_name": "Zhang San",
      "tier": "VIP"
    }
  }
}
```
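How the placeholders get filled in is not specified in this document, so the following is only a sketch of one plausible substitution step. `render_prompt` is a hypothetical helper, and leaving unknown placeholders untouched (rather than raising an error or substituting an empty string) is an assumption.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} with its value; leave unknown placeholders untouched."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


template = (
    "You are the customer service assistant for {{company_name}}. "
    "The current user is {{customer_name}}, whose membership tier is {{tier}}."
)
session_start = {
    "type": "session.start",
    "metadata": {
        "dynamicVariables": {
            "company_name": "ABC Company",
            "customer_name": "Zhang San",
            "tier": "VIP",
        }
    },
}
prompt = render_prompt(template, session_start["metadata"]["dynamicVariables"])
print(prompt)
```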

---

## Next Steps

- [Quick Start](../quickstart/index.md) - create your first assistant
- [Assistant Configuration](../assistants/configuration.md) - detailed configuration reference
- [WebSocket Protocol](../api-reference/websocket.md) - API details
