# Realtime Engine

The Realtime engine connects directly to an end-to-end realtime model, and suits scenarios that put low latency and a natural voice experience first.

---

## Runtime path

```mermaid
flowchart LR
    Input[Audio / video / text input] --> RT[Realtime Model]
    RT --> Output[Audio / text output]
    RT --> Tools[Tools]
```

Unlike Pipeline, the Realtime engine does not expose ASR, turn detection, the LLM, and TTS as independent stages; instead it relies on the realtime model to handle the exchange as a whole.

## Common backends

| Backend | Strengths |
|---------|-----------|
| **OpenAI Realtime** | Natural voice interaction, low latency |
| **Gemini Live** | Strong multimodal capabilities |
| **Doubao Realtime** | Better suited to mainland China deployments and Chinese-language scenarios |

## When is it a good fit

- Highly natural experiences such as voice assistants, practice partners, and virtual characters
- Entry points with strict requirements on first-response latency and continuous barge-in
- Teams that want to avoid assembling a multi-stage pipeline and connect directly to an end-to-end model

## Data flow

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant RT as Realtime Model

    U->>E: Audio / video / text input
    E->>RT: Forward the realtime stream
    RT-->>E: Streamed text / audio output
    E-->>U: Play or render the result
```
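
The forwarding loop in the sequence diagram above can be sketched with a pair of queues. This is a minimal illustration, not a real SDK: `fake_realtime_model`, `engine_loop`, and the `None` sentinel for session close are all hypothetical names and conventions.

```python
import asyncio

# Stand-in for a realtime model connection; a real backend would stream
# audio/text over a network session rather than echo a string.
async def fake_realtime_model(chunk: str) -> str:
    await asyncio.sleep(0)  # pretend network round trip
    return f"response-to:{chunk}"

async def engine_loop(user_input: asyncio.Queue, user_output: asyncio.Queue):
    """Relay user input chunks to the model and stream replies back."""
    while True:
        chunk = await user_input.get()
        if chunk is None:            # sentinel: session closed
            break
        reply = await fake_realtime_model(chunk)  # E ->> RT: forward
        await user_output.put(reply)              # E -->> U: play/render

async def demo() -> str:
    inp, out = asyncio.Queue(), asyncio.Queue()
    await inp.put("hello")
    await inp.put(None)
    await engine_loop(inp, out)
    return await out.get()

print(asyncio.run(demo()))  # -> response-to:hello
```

The point of the sketch is the shape of the loop: the engine stays a thin relay, and all understanding and generation happens inside the realtime model.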

## Advantages of Realtime

- **Lower latency**: a shorter path, so the interaction feels more natural to users
- **Smoother full duplex**: when the user barges in, the model can more easily handle the interruption internally
- **More direct multimodality**: suited to mixed audio, video, and text input and output

## Trade-offs of Realtime

- More dependent on the capability boundaries of the realtime model provider
- Hard to swap out ASR / TTS / turn detection independently
- Cost and observability usually cannot be broken down stage by stage the way they can with Pipeline

## Smart interruption

Realtime models usually support full duplex and barge-in natively:

```mermaid
sequenceDiagram
    participant U as User
    participant E as Engine
    participant RT as Realtime Model

    Note over RT: Model is producing output
    RT-->>E: Audio stream...
    E-->>U: Playback
    U->>E: User starts speaking
    E->>RT: Forward the new input
    Note over RT: Model handles the interruption internally and switches its reply
    RT-->>E: New response
    E-->>U: Play the new response
```

This approach feels more natural, but you typically only observe the model's overall behavior, not the details of each intermediate stage.
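
On the client or engine side, the barge-in above can be approximated with task cancellation: while one response is still "playing", new user input cancels it and a new reply starts. All names here are illustrative; a real realtime model handles the switch internally rather than through client-side cancellation.

```python
import asyncio

async def play_response(text: str, log: list):
    """Simulate streamed playback, one character per tick."""
    try:
        for ch in text:
            log.append(ch)
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        log.append("<interrupted>")  # playback was cut off mid-stream
        raise

async def session(log: list):
    # Model starts answering...
    current = asyncio.create_task(play_response("first answer", log))
    await asyncio.sleep(0.03)        # ...user starts speaking mid-playback
    current.cancel()                 # barge-in: drop the old response
    try:
        await current
    except asyncio.CancelledError:
        pass
    await play_response("new", log)  # model switches to a new reply

log: list = []
asyncio.run(session(log))
print("<interrupted>" in log)  # -> True
```

The cancellation point is the key design choice: the old response must stop the moment new input arrives, or the user hears the engine talking over them.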

## Configuration example

```json
{
  "engine": "multimodal",
  "model": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview",
    "voice": "alloy"
  }
}
```
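
A minimal sketch of reading the config above with only the standard library. It assumes nothing beyond the keys shown in the example; the validation rules are illustrative, not the engine's actual loader.

```python
import json

# The JSON example from this page, inlined for a self-contained sketch.
CONFIG = """
{
  "engine": "multimodal",
  "model": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview",
    "voice": "alloy"
  }
}
"""

def load_engine_config(raw: str) -> dict:
    """Parse the config and check the fields this page uses."""
    cfg = json.loads(raw)
    if cfg.get("engine") != "multimodal":
        raise ValueError("expected a realtime (multimodal) engine config")
    model = cfg.get("model", {})
    for key in ("provider", "model"):
        if key not in model:
            raise ValueError(f"model config is missing '{key}'")
    return cfg

cfg = load_engine_config(CONFIG)
print(cfg["model"]["provider"])  # -> openai
```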

## Related docs

- [Engine architecture](engines.md) - back to the guide for choosing between the two engine types
- [Pipeline engine](pipeline-engine.md) - see the stage-by-stage, controllable runtime path
- [WebSocket protocol](../api-reference/websocket.md) - how clients establish a session with the engine