
<p align="center">
<img src="images/logo.png" alt="Realtime Agent Studio" width="400">
</p>
<p align="center">
<strong>An open-source workbench for building real-time interactive audio/video agents</strong>
</p>
<p align="center">
<img src="https://img.shields.io/badge/version-0.1.0-blue" alt="Version">
<img src="https://img.shields.io/badge/license-MIT-green" alt="License">
<img src="https://img.shields.io/badge/python-3.10+-blue" alt="Python">
<img src="https://img.shields.io/badge/node-18+-green" alt="Node">
</p>
<p align="center">
<a href="quickstart/index.md">快速开始</a> ·
<a href="api-reference/index.md">API 文档</a> ·
<a href="getting-started/index.md">安装部署</a> ·
<a href="roadmap.md">路线图</a>
</p>
---
## What is Realtime Agent Studio
Realtime Agent Studio (RAS) is a workbench for building real-time interactive audio/video agents with large language models at its core. It supports two architectures, a pipeline-based full-duplex interaction engine and native multimodal models, and covers the full lifecycle of a realtime agent: configuration, testing, publishing, and monitoring.
You can think of RAS as an open-source alternative to [Vapi](https://vapi.ai), [Retell](https://retellai.com), and [ElevenLabs Agents](https://elevenlabs.io).
---
## Core Features
<div class="grid cards" markdown>
- :zap: **Low-Latency Realtime Engine**
---
Full-duplex pipeline architecture with streaming VAD/ASR/TD/LLM/TTS processing, smart barge-in interruption, and end-to-end latency under 500 ms
- :brain: **Multimodal Model Support**
---
Direct connections to native multimodal models such as GPT-4o Realtime, Gemini Live, and Step Audio
- :wrench: **Visual Configuration**
---
No-code, WYSIWYG configuration of assistants, prompts, tool calls, and knowledge-base bindings
- :electric_plug: **Open APIs**
---
Standard WebSocket protocol, RESTful management APIs, and Webhook callbacks
- :shield: **Self-Hosted Deployment**
---
One-command Docker deployment, full data ownership, and support for local models
- :chart_with_upwards_trend: **End-to-End Observability**
---
Full session replay, realtime dashboards, and automated testing and evaluation
</div>
---
## System Architecture
Platform architecture layers:
```mermaid
flowchart TB
%% ================= ACCESS =================
subgraph Access["Access Layer"]
direction TB
API[API]
SDK[SDK]
Browser[Browser UI]
Embed[Web Embed]
end
%% ================= REALTIME ENGINE =================
subgraph Runtime["Realtime Interaction Engine"]
direction LR
%% -------- Duplex Engine --------
subgraph Duplex["Duplex Interaction Engine"]
direction LR
subgraph Pipeline["Pipeline Engine"]
direction LR
VAD[VAD]
ASR[ASR]
TD[Turn Detection]
LLM[LLM]
TTS[TTS]
end
subgraph Multi["Realtime Engine"]
MM[Realtime Model]
end
end
%% -------- Capabilities --------
subgraph Capability["Agent Capabilities"]
subgraph Tools["Tool System"]
Webhook[Webhook]
ClientTool[Client Tools]
Builtin[Builtin Tools]
end
subgraph KB["Knowledge System"]
Docs[Documents]
Vector[(Vector Index)]
Retrieval[Retrieval]
end
end
end
%% ================= PLATFORM =================
subgraph Platform["Platform Services"]
direction TB
Backend[Backend Service]
Frontend[Frontend Console]
DB[(Database)]
end
%% ================= CONNECTIONS =================
Access --> Runtime
Runtime <--> Backend
Backend <--> DB
Backend <--> Frontend
LLM --> Tools
MM --> Tools
LLM <--> KB
MM <--> KB
```
Conversation flow of the pipeline interaction engine:
```mermaid
flowchart LR
User((User Speech))
Audio[Audio Stream]
VAD[VAD\nVoice Activity Detection]
ASR[ASR\nSpeech Recognition]
TD[Turn Detection]
LLM[LLM\nReasoning]
Tools[Tools / APIs]
TTS[TTS\nSpeech Synthesis]
AudioOut[Audio Stream Out]
User --> Audio
Audio --> VAD
VAD --> ASR
ASR --> TD
TD --> LLM
LLM --> Tools
Tools --> LLM
LLM --> TTS
TTS --> AudioOut
AudioOut --> User
```
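As a rough illustration of the flow above, the pipeline stages can be modeled as chained async generators, each streaming results to the next. This is a minimal sketch with stub components, not the engine's actual interfaces; the `<eou>` marker below stands in for turn detection, and strings stand in for audio frames.

```python
import asyncio

# Each stage is an async generator: VAD -> ASR -> Turn Detection -> LLM -> TTS.
# All components are illustrative stubs.

async def vad(audio_frames):
    # Drop "silent" frames (empty strings stand in for silence).
    async for frame in audio_frames:
        if frame:
            yield frame

async def asr(speech_frames):
    # Pretend transcription: pass text through unchanged.
    async for frame in speech_frames:
        yield frame

async def turn_detection(transcripts):
    # Buffer partial transcripts; emit a full utterance at end-of-utterance.
    buffer = []
    async for text in transcripts:
        if text == "<eou>":
            yield " ".join(buffer)
            buffer = []
        else:
            buffer.append(text)

async def llm(utterances):
    async for utterance in utterances:
        yield f"reply to: {utterance}"

async def tts(replies):
    async for reply in replies:
        yield f"<audio:{reply}>"

async def run_pipeline(frames):
    async def source():
        for frame in frames:
            yield frame
    out = []
    async for chunk in tts(llm(turn_detection(asr(vad(source()))))):
        out.append(chunk)
    return out

result = asyncio.run(run_pipeline(["hello", "", "world", "<eou>"]))
print(result)  # ['<audio:reply to: hello world>']
```

Because every stage consumes a stream, downstream stages can start as soon as upstream output is available, which is what keeps end-to-end latency low in a real engine.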
Conversation flow based on a realtime interaction model:
```mermaid
flowchart LR
User((User))
Input[Audio / Video / Text]
MM[Multimodal Model]
Tools[Tools / APIs]
KB[Knowledge Base]
Output[Audio / Video / Text]
User --> Input
Input --> MM
MM --> Tools
Tools --> MM
MM --> KB
KB --> MM
MM --> Output
Output --> User
```
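The model-to-tools round trip in the diagram above can be sketched as a simple loop: the model either returns a final answer or requests a tool call, whose result is fed back before the next model step. The model, tool, and message shapes below are stand-ins, not the platform's actual API.

```python
# Illustrative stub: returns a tool call first, a final answer once a
# tool result is present in the conversation.
def fake_model(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "text": "It is 21 C in Beijing."}
    return {"type": "tool_call", "name": "get_weather", "args": {"city": "Beijing"}}

# Hypothetical tool registry for the sketch.
TOOLS = {"get_weather": lambda city: f"{city}: 21 C"}

def run_turn(user_input):
    messages = [{"role": "user", "content": user_input}]
    while True:
        out = fake_model(messages)
        if out["type"] == "final":
            return out["text"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[out["name"]](**out["args"])
        messages.append({"role": "tool", "content": result})

print(run_turn("What's the weather?"))  # It is 21 C in Beijing.
```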
---
## Tech Stack
| Layer | Technology |
|------|------|
| **Frontend** | React 18, TypeScript, Tailwind CSS, Zustand |
| **Backend** | FastAPI (Python 3.10+) |
| **Engine** | Python, WebSocket, asyncio |
| **Database** | SQLite |
| **Knowledge Base** | Chroma |
| **Deployment** | Docker |
---
## Quick Navigation
<div class="grid cards" markdown>
- :rocket: **[Quick Start](quickstart/index.md)**
---
Create your first AI assistant in 5 minutes
- :book: **[Core Concepts](concepts/index.md)**
---
Learn the core concepts: assistants, pipelines, multimodality
- :wrench: **[Installation & Deployment](getting-started/index.md)**
---
Environment setup, local development, and Docker/production deployment
- :robot: **[Assistant Management](assistants/index.md)**
---
Create and configure conversational AI assistants
- :gear: **[Customization](customization/knowledge-base.md)**
---
Knowledge bases, tools, voice, and workflows
- :bar_chart: **[Analytics](analysis/dashboard.md)**
---
Dashboards, session history, testing and evaluation
- :electric_plug: **[API Reference](api-reference/index.md)**
---
WebSocket protocol and REST API documentation
</div>
---
## Try It Out
### Start with Docker
```bash
git clone https://github.com/your-org/AI-VideoAssistant.git
cd AI-VideoAssistant/docker
docker compose up -d
# for development
# docker compose --profile dev up -d
```
Open `http://localhost:3000` to access the console.
### WebSocket Connection Example
```javascript
const ws = new WebSocket('ws://localhost:8000/ws?assistant_id=YOUR_ID');
ws.onopen = () => {
ws.send(JSON.stringify({
type: 'session.start',
audio: { encoding: 'pcm_s16le', sample_rate_hz: 16000, channels: 1 }
}));
};
```
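On the receiving side, server messages can be dispatched by their `type` field. The Python sketch below shows the idea with pure functions so it works with any WebSocket client; the event names (`response.audio`, `error`) and payload fields are illustrative assumptions, not the documented protocol.

```python
import json

def handle_event(raw, on_audio, on_error):
    """Dispatch one raw WebSocket message by its 'type' field.

    Event names and payload fields here are assumptions for illustration.
    """
    event = json.loads(raw)
    kind = event.get("type", "")
    if kind == "response.audio":
        on_audio(event["data"])  # audio chunk payload (assumed base64)
    elif kind == "error":
        on_error(event.get("message", "unknown error"))
    return kind

# Usage: collect audio chunks from a simulated server message.
received = []
kind = handle_event(
    json.dumps({"type": "response.audio", "data": "UklGRg=="}),
    on_audio=received.append,
    on_error=received.append,
)
print(kind, received)  # response.audio ['UklGRg==']
```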
---
## License
This project is open-sourced under the [MIT License](https://github.com/your-org/AI-VideoAssistant/blob/main/LICENSE).