AI-VideoAssistant/docs/content/concepts/realtime-engine.md
Xin Wang b300b469dc Update documentation for Realtime Agent Studio with enhanced content and structure
- Revised site name and description for clarity and detail.
- Updated navigation structure to better reflect the organization of content.
- Improved changelog entries for better readability and consistency.
- Migrated assistant configuration and prompt guidelines to new documentation paths.
- Enhanced core concepts section to clarify the roles and capabilities of assistants and engines.
- Streamlined workflow documentation to provide clearer guidance on configuration and usage.
2026-03-09 05:38:43 +08:00

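# Realtime Engine
The Realtime engine connects directly to an end-to-end realtime model. It suits scenarios that put low latency and a natural voice experience first.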
---
## Runtime Path
```mermaid
flowchart LR
Input[Audio / video / text input] --> RT[Realtime Model]
RT --> Output[Audio / text output]
RT --> Tools[Tools]
```
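Unlike the Pipeline engine, the Realtime engine does not expose ASR, turn detection, LLM, and TTS as separate stages. Instead, it relies on the realtime model to handle them as a whole.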
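The structural difference can be sketched in a few lines of Python. This is only an illustration: `PipelineEngine`, `RealtimeEngine`, and their fields are hypothetical names, not classes from this project.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# All names here are hypothetical and only illustrate the structural difference.


@dataclass
class PipelineEngine:
    """Each stage is a separate component that can be swapped independently."""

    asr: Callable[[bytes], str]           # audio -> transcript
    turn_detector: Callable[[str], bool]  # transcript -> has the user finished?
    llm: Callable[[str], str]             # transcript -> reply text
    tts: Callable[[str], bytes]           # reply text -> audio

    def respond(self, audio: bytes) -> Optional[bytes]:
        text = self.asr(audio)
        if not self.turn_detector(text):
            return None  # turn not finished yet, keep listening
        return self.tts(self.llm(text))


@dataclass
class RealtimeEngine:
    """A single end-to-end model handles every stage internally."""

    model: Callable[[bytes], bytes]  # audio in -> audio out

    def respond(self, audio: bytes) -> bytes:
        return self.model(audio)
```

In the pipeline shape there are four places to plug in a replacement or attach metrics; in the realtime shape there is one.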
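## Common Backends
| Backend | Characteristics |
|------|------|
| **OpenAI Realtime** | Natural voice interaction, low latency |
| **Gemini Live** | Strong multimodal capabilities |
| **Doubao real-time interaction** | Better suited to deployments in mainland China and Chinese-language scenarios |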
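## When to Use It
- Voice assistants, practice partners, virtual characters, and other experiences that demand a high degree of naturalness
- Entry points with strict requirements on time to first response and on handling repeated interruptions
- Teams that want to reduce the complexity of assembling a pipeline and plug directly into an end-to-end model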
## Data Flow
```mermaid
sequenceDiagram
participant U as User
participant E as Engine
participant RT as Realtime Model
U->>E: Audio / video / text input
E->>RT: Forward realtime stream
RT-->>E: Streamed text / audio output
E-->>U: Play or render result
```
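The relay role of the engine in the diagram above can be sketched with `asyncio`. This is a minimal sketch under stated assumptions: `EchoRealtimeModel`, `relay`, `mic`, and `run_demo` are hypothetical names, the model is a local stand-in rather than a real backend, and the loop handles one input chunk at a time for clarity rather than sending and receiving concurrently as a full-duplex engine would.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable, List


class EchoRealtimeModel:
    """Hypothetical stand-in for a realtime backend.

    It streams each input chunk back in two halves to mimic streamed output.
    """

    async def stream(self, chunk: bytes) -> AsyncIterator[bytes]:
        half = len(chunk) // 2
        for part in (chunk[:half], chunk[half:]):
            await asyncio.sleep(0)  # yield control, as a network read would
            yield part


async def relay(
    user_input: AsyncIterator[bytes],
    model: EchoRealtimeModel,
    play: Callable[[bytes], None],
) -> None:
    async for chunk in user_input:             # User ->> Engine
        async for out in model.stream(chunk):  # Engine ->> Model, Model -->> Engine
            play(out)                          # Engine -->> User


async def mic(chunks: Iterable[bytes]) -> AsyncIterator[bytes]:
    """Hypothetical input source that yields pre-recorded chunks."""
    for chunk in chunks:
        yield chunk


def run_demo(chunks: List[bytes]) -> List[bytes]:
    played: List[bytes] = []
    asyncio.run(relay(mic(chunks), EchoRealtimeModel(), played.append))
    return played
```

Calling `run_demo([b"abcd", b"ef"])` collects the chunks in the order the engine would hand them to playback.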
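## Advantages of Realtime
- **Lower latency**: the path is shorter, so the interaction feels more natural to users
- **Smoother full duplex**: when the user cuts in, the model can more readily handle the interruption internally
- **More direct multimodality**: well suited to scenarios that mix audio, video, and text input and output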
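## Trade-offs of Realtime
- More dependent on the capability limits of the realtime model vendor
- Hard to swap out ASR, TTS, or turn detection independently
- Cost and observability usually cannot be broken down stage by stage the way they can with Pipeline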
## Smart Interruption
Realtime models usually support full duplex and interruption natively:
```mermaid
sequenceDiagram
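participant U as User
participant E as Engine
participant RT as Realtime Model
Note over RT: Model is producing output
RT-->>E: Audio stream...
E-->>U: Playback
U->>E: User starts speaking
E->>RT: Forward new input
Note over RT: Model handles the interruption internally and switches replies
RT-->>E: New response
E-->>U: Play new response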
```
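This approach feels more natural, but you can usually observe only the model's overall behavior, not the details of each intermediate stage.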
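Even when the model resolves the interruption itself, the engine may still hold audio from the old reply that has not been played yet. The sketch below shows one way an engine could drop that stale audio. It is an assumption for illustration, not this project's implementation: `PlaybackQueue`, its methods, and the integer `response_id` are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class PlaybackQueue:
    """Hypothetical engine-side buffer for audio waiting to be played."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, bytes]] = deque()
        self._active_response = 0

    def on_model_audio(self, response_id: int, chunk: bytes) -> None:
        # Ignore late chunks that belong to a reply that was interrupted.
        if response_id >= self._active_response:
            self._active_response = response_id
            self._pending.append((response_id, chunk))

    def on_user_speech_started(self) -> None:
        # Interruption: discard unplayed audio and expect a newer reply.
        self._pending.clear()
        self._active_response += 1

    def drain(self) -> List[bytes]:
        """Return everything ready for playback and empty the buffer."""
        out = [chunk for _, chunk in self._pending]
        self._pending.clear()
        return out
```

The design choice here is to tag every chunk with the reply it belongs to, so chunks that arrive after the user cut in can be recognized and skipped.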
## Configuration Example
```json
{
  "engine": "multimodal",
  "model": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview",
    "voice": "alloy"
  }
}
```
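A configuration like the one above could be checked before a session starts. The following is a minimal sketch: `RealtimeModelConfig` and `parse_engine_config` are hypothetical names, and treating `provider`, `model`, and `voice` as required strings is an assumption, since the example does not say which fields are optional.

```python
import json
from dataclasses import dataclass

# Assumption: all three model fields are required strings.
REQUIRED_MODEL_KEYS = ("provider", "model", "voice")


@dataclass(frozen=True)
class RealtimeModelConfig:
    provider: str
    model: str
    voice: str


def parse_engine_config(raw: str) -> RealtimeModelConfig:
    """Parse the JSON text and fail early on a malformed configuration."""
    data = json.loads(raw)
    if data.get("engine") != "multimodal":
        raise ValueError('expected "engine" to be "multimodal"')
    model = data.get("model")
    if not isinstance(model, dict):
        raise ValueError('"model" must be an object')
    missing = [k for k in REQUIRED_MODEL_KEYS if not isinstance(model.get(k), str)]
    if missing:
        raise ValueError("missing or non-string model fields: " + ", ".join(missing))
    return RealtimeModelConfig(**{k: model[k] for k in REQUIRED_MODEL_KEYS})


EXAMPLE = """
{
  "engine": "multimodal",
  "model": {
    "provider": "openai",
    "model": "gpt-4o-realtime-preview",
    "voice": "alloy"
  }
}
"""

config = parse_engine_config(EXAMPLE)
```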
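## Related Documentation
- [Engine Architecture](engines.md) - Back to the guide for choosing between the two engine types
- [Pipeline Engine](pipeline-engine.md) - See the segmented, controllable runtime path
- [WebSocket Protocol](../api-reference/websocket.md) - Learn how clients establish a session with the engine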