Update documentation for Realtime Agent Studio with enhanced content and structure
- Revised site name and description for clarity and detail.
- Updated navigation structure to better reflect the organization of content.
- Improved changelog entries for better readability and consistency.
- Migrated assistant configuration and prompt guidelines to new documentation paths.
- Enhanced core concepts section to clarify the roles and capabilities of assistants and engines.
- Streamlined workflow documentation to provide clearer guidance on configuration and usage.
137
docs/content/concepts/pipeline-engine.md
Normal file
@@ -0,0 +1,137 @@
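# Pipeline Engine

The Pipeline engine splits a real-time conversation into distinct stages. It suits scenarios that need fine-grained control, swappable external providers, and complex business orchestration.

---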

## Processing chain

```mermaid
flowchart LR
    subgraph Input["Input processing"]
        Audio["User audio"] --> VAD["Voice activity detection (VAD)"]
        VAD --> ASR["Speech recognition (ASR)"]
        ASR --> TD["Turn detection (TD)"]
    end

    subgraph Reasoning["Semantic processing"]
        TD --> LLM["Large language model (LLM)"]
        LLM --> Tools["Tools"]
        LLM --> Text["Reply text"]
    end

    subgraph Output["Output generation"]
        Text --> TTS["Speech synthesis (TTS)"]
        TTS --> AudioOut["Assistant audio"]
    end
```

The key value of a pipeline is not the number of stages, but that each stage can be chosen, optimized, and observed on its own.
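
The idea that each stage can be swapped and observed independently can be sketched in a few lines. This is a minimal illustration, not the engine's implementation: the `Pipeline` class, the `Stage` signature, and the `fake_*` functions are all hypothetical stand-ins for real ASR, LLM, and TTS providers.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stage signature: each stage maps one payload to the next.
Stage = Callable[[object], object]


@dataclass
class Pipeline:
    """Ordered, named stages; each one can be replaced and timed on its own."""

    stages: List[Tuple[str, Stage]] = field(default_factory=list)
    timings_ms: Dict[str, float] = field(default_factory=dict)

    def replace(self, name: str, stage: Stage) -> None:
        # Swap a single stage without touching the rest of the chain.
        self.stages = [(n, stage if n == name else s) for n, s in self.stages]

    def run(self, payload: object) -> object:
        for name, stage in self.stages:
            start = time.perf_counter()
            payload = stage(payload)
            # Record per-stage latency so a slow stage is visible by itself.
            self.timings_ms[name] = (time.perf_counter() - start) * 1000.0
        return payload


# Stand-in stages; real ones would call ASR / LLM / TTS providers.
def fake_asr(audio: object) -> object:
    return "hello"


def fake_llm(text: object) -> object:
    return f"You said: {text}"


def fake_tts(text: object) -> object:
    return str(text).encode("utf-8")


pipeline = Pipeline(stages=[("asr", fake_asr), ("llm", fake_llm), ("tts", fake_tts)])
audio_out = pipeline.run(b"\x00\x01")
print(audio_out, sorted(pipeline.timings_ms))
```

Because stages are addressed by name, replacing the `"llm"` entry leaves the ASR and TTS stages and their timings untouched, which is the property the paragraph above describes.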
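
## When to use it

- You need to integrate a specific ASR or TTS provider
- You need reliable access to knowledge bases, tools, and workflows
- You need to trace a problem to a specific stage rather than seeing only an overall failure
- You need to optimize individual stages separately for latency, cost, and quality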

## Data flow

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant ASR as ASR service
    participant LLM as LLM service
    participant TTS as TTS service

    U->>E: Audio frames (PCM)
    E->>E: VAD / turn detection
    E->>ASR: Send recognizable audio
    ASR-->>E: transcript.delta / transcript.final
    E->>LLM: Send conversation history and current input
    LLM-->>E: assistant.response.delta
    E->>TTS: Text fragments
    TTS-->>E: Audio fragments
    E-->>U: Audio stream and events
```
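
The step where streamed LLM deltas become text fragments for TTS can be sketched as follows. This is one possible approach under stated assumptions, not the engine's actual logic: the function name `chunk_for_tts` and the punctuation-based flushing rule are hypothetical, and a real splitter would also need to handle cases such as decimals and abbreviations.

```python
from typing import Iterable, Iterator

# Assumed flush points: common Chinese and English sentence-ending marks.
SENTENCE_END = ("。", "!", "?", ".", "!", "?")


def chunk_for_tts(deltas: Iterable[str]) -> Iterator[str]:
    """Group streamed text deltas into sentence-sized fragments."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        # Flush as soon as the buffer ends with sentence punctuation,
        # so synthesis can start before the full reply is available.
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


fragments = list(chunk_for_tts(["Hel", "lo.", " How can", " I help", "?", " One mo"]))
print(fragments)
```

Flushing per sentence rather than per delta trades a little latency for more natural prosody, since the synthesizer sees complete phrases.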
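
## Where latency comes from

| Stage | Typical impact | Common optimizations |
|-------|----------------|----------------------|
| **VAD / EoU (end of utterance)** | How long after the user stops speaking a reply is triggered | Tune the silence threshold and the minimum speech duration |
| **ASR** | Transcription speed and accuracy | Choose a suitable model, hotwords, and language settings |
| **LLM** | Time to first token | Choose a low-latency model, trim the context |
| **TTS** | Speed of turning text into audio | Choose streaming TTS, shorten each reply |

Total pipeline latency is usually not a single-point problem but the sum across the chain, so it is best tuned stage by stage.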
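
As a worked example of "sum across the chain", the snippet below adds up per-stage latencies and finds the largest contributor. The numbers are made-up placeholders, not measurements of any provider, and the stage names are hypothetical. It also treats stages as strictly sequential, which overestimates the total when stages stream into each other.

```python
# Illustrative per-stage latencies in milliseconds (placeholders only).
stage_latency_ms = {
    "eou_wait": 500,
    "asr_final": 200,
    "llm_first_token": 400,
    "tts_first_audio": 300,
}

# Sequential approximation: time to first audio is the sum of all stages.
total_ms = sum(stage_latency_ms.values())

# The stage with the largest share is the first candidate for tuning.
slowest = max(stage_latency_ms, key=stage_latency_ms.get)
shares = {name: ms / total_ms for name, ms in stage_latency_ms.items()}

print(total_ms, slowest, round(shares[slowest], 2))
```

With these placeholder values no single stage dominates, which is why per-stage measurement matters more than optimizing one component in isolation.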
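
## Why EoU (end of utterance) matters

The pipeline must decide when to formally hand the current turn's input to the LLM. This decision is usually made by **EoU** detection.

- Low threshold: faster responses, but a pause is more easily mistaken for the end of the utterance
- High threshold: more stable, but the first response is slower

Think of it as one of the pipeline parameters that most directly shapes the perceived rhythm of the conversation.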
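
The threshold trade-off can be seen in a small silence-based detector. This is a simplified sketch under assumptions: the 20 ms frame size, the `detect_eou` function, and the per-frame speech flags are hypothetical, and a production detector may use a model rather than a plain silence timer.

```python
from typing import List, Optional

FRAME_MS = 20  # assumed frame duration


def detect_eou(is_speech: List[bool], silence_threshold_ms: int) -> Optional[int]:
    """Return the frame index at which the turn is declared finished, or None."""
    heard_speech = False
    silence_ms = 0
    for index, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += FRAME_MS
            if silence_ms >= silence_threshold_ms:
                return index
    return None


# 200 ms of speech, a 300 ms mid-sentence pause, 200 ms of speech, then silence.
frames = [True] * 10 + [False] * 15 + [True] * 10 + [False] * 40

early = detect_eou(frames, silence_threshold_ms=200)  # fires inside the pause
late = detect_eou(frames, silence_threshold_ms=600)   # waits for the real end
print(early, late)
```

The 200 ms threshold ends the turn at frame 19, in the middle of the pause, while the 600 ms threshold waits until frame 64, after the user has actually finished. That is the speed versus stability trade-off described above.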
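
## How tools, knowledge bases, and workflows plug in

The pipeline is particularly well suited to inserting business capabilities into a conversation:

- **Knowledge base**: supplies domain facts before the LLM generates a reply
- **Tools**: call system capabilities when external information or actions are needed
- **Workflows**: decide which node to move to next in multi-step, multi-branch processes

This is also why the pipeline is more common in enterprise customer service, process assistant, and knowledge Q&A scenarios.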
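
The insertion points for a knowledge base and tools can be sketched as a single turn handler. Everything here is a hypothetical stand-in rather than a Realtime Agent Studio API: `retrieve`, `handle_turn`, the scripted `fake_llm`, and the sample data exist only to show where each capability hooks in. Workflows are left out to keep the sketch short.

```python
from typing import Any, Callable, Dict, List

# All names below are hypothetical stand-ins, not Realtime Agent Studio APIs.
LLM = Callable[[str, List[str]], Dict[str, Any]]


def retrieve(query: str, knowledge_base: Dict[str, str]) -> List[str]:
    """Naive keyword lookup standing in for a real knowledge-base search."""
    lowered = query.lower()
    return [fact for keyword, fact in knowledge_base.items() if keyword in lowered]


def handle_turn(
    user_text: str,
    knowledge_base: Dict[str, str],
    tools: Dict[str, Callable[..., str]],
    llm: LLM,
) -> str:
    # 1. Knowledge base: add domain facts before the LLM generates.
    facts = retrieve(user_text, knowledge_base)
    decision = llm(user_text, facts)
    # 2. Tools: run an external action only when the LLM asks for one.
    tool_name = decision.get("tool")
    if tool_name:
        result = tools[tool_name](**decision.get("arguments", {}))
        return f"{decision['reply']} {result}"
    return decision["reply"]


def fake_llm(user_text: str, facts: List[str]) -> Dict[str, Any]:
    """Scripted stand-in for a model call."""
    if "order" in user_text.lower():
        return {
            "reply": "Your order status is:",
            "tool": "order_status",
            "arguments": {"order_id": "A1"},
        }
    return {"reply": " ".join(facts) or "I am not sure."}


kb = {"refund": "Refunds take 7 days."}
tools = {"order_status": lambda order_id: "shipped"}

print(handle_turn("What is the refund policy?", kb, tools, fake_llm))
print(handle_turn("Where is my order?", kb, tools, fake_llm))
```

Keeping retrieval and tool dispatch outside the model call is what lets each one be logged, tested, and replaced separately.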

## Smart interruption

In a pipeline, interruption is usually handled jointly by VAD detection and the TTS stop logic:

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant TTS as TTS

    Note over E,TTS: Reply is playing
    E->>U: Audio stream...
    U->>E: User starts speaking
    E->>E: Decide whether to trigger an interruption
    E->>TTS: Stop synthesis / playback
    E-->>U: output.audio.interrupted
```

Compared with an end-to-end realtime model, this approach makes it easier to explain why an interruption happened and at which stage a problem occurred.
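
The "decide whether to trigger an interruption" step can be sketched as a small state holder. This is an assumption-laden illustration: the `BargeInController` class, the 20 ms frame size, and the 200 ms minimum speech duration are hypothetical; only the `output.audio.interrupted` event name comes from the diagram above.

```python
from dataclasses import dataclass, field
from typing import List

FRAME_MS = 20  # assumed frame duration


@dataclass
class BargeInController:
    """Stops playback once the user has spoken for long enough."""

    min_speech_ms: int = 200  # hypothetical threshold
    playing: bool = False
    speech_ms: int = 0
    events: List[str] = field(default_factory=list)

    def start_playback(self) -> None:
        self.playing = True
        self.speech_ms = 0

    def on_frame(self, is_speech: bool) -> None:
        if not self.playing:
            return
        # Require sustained speech so a cough or short noise does not
        # cut off the reply; any silent frame resets the counter.
        self.speech_ms = self.speech_ms + FRAME_MS if is_speech else 0
        if self.speech_ms >= self.min_speech_ms:
            self.playing = False  # stand-in for stopping TTS synthesis and playback
            self.events.append("output.audio.interrupted")


controller = BargeInController()
controller.start_playback()
# 100 ms of noise, one silent frame, then 200 ms of real speech.
for frame in [True] * 5 + [False] + [True] * 10:
    controller.on_frame(frame)
print(controller.events)
```

Because the decision lives in one place with explicit state, a log of `speech_ms` at the moment of interruption is enough to explain why playback stopped.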

## Configuration example

```json
{
  "engine": "pipeline",
  "asr": {
    "provider": "openai-compatible",
    "model": "FunAudioLLM/SenseVoiceSmall",
    "language": "zh"
  },
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.7
  },
  "tts": {
    "provider": "openai-compatible",
    "model": "FunAudioLLM/CosyVoice2-0.5B",
    "voice": "anna"
  }
}
```

## Related documentation

- [Engine architecture](engines.md) - Back to the selection guide
- [Realtime engine](realtime-engine.md) - Compare with the end-to-end realtime model path
- [Tools](../customization/tools.md) - Design tools that the LLM can call safely
- [Knowledge base](../customization/knowledge-base.md) - Add domain knowledge to conversations