<p align="center">
  <img src="images/logo.png" alt="Realtime Agent Studio" width="400">
</p>

<p align="center">
  <strong>An open-source workbench for building real-time interactive audio and video agents</strong>
</p>

<p align="center">
  <img src="https://img.shields.io/badge/version-0.1.0-blue" alt="Version">
  <img src="https://img.shields.io/badge/license-MIT-green" alt="License">
  <img src="https://img.shields.io/badge/python-3.10+-blue" alt="Python">
  <img src="https://img.shields.io/badge/node-18+-green" alt="Node">
</p>

<p align="center">
  <a href="quickstart/index.md">Quick Start</a> ·
  <a href="api-reference/index.md">API Docs</a> ·
  <a href="getting-started/index.md">Installation & Deployment</a> ·
  <a href="roadmap.md">Roadmap</a>
</p>

---

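## What is Realtime Agent Studio?

Realtime Agent Studio (RAS) is an LLM-centered workbench for building real-time interactive audio and video agents. It supports two architectures, a pipeline-based full-duplex interaction engine and native multimodal models, and covers the full agent lifecycle: configuration, testing, release, and monitoring.

Think of RAS as an **open-source alternative** to [Vapi](https://vapi.ai), [Retell](https://retellai.com), and [ElevenLabs Agents](https://elevenlabs.io).

---
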
## Core Features

<div class="grid cards" markdown>

-   :zap: **Low-Latency Real-Time Engine**

    ---

    Pipeline-based full-duplex architecture that chains VAD, ASR, TD (turn detection), LLM, and TTS, with smart barge-in and end-to-end latency under 500 ms

-   :brain: **Multimodal Model Support**

    ---

    Direct connection to native multimodal models such as GPT-4o Realtime, Gemini Live, and Step Audio

-   :wrench: **Visual Configuration**

    ---

    No-code, WYSIWYG configuration of assistants, prompts, tool calls, and knowledge-base links

-   :electric_plug: **Open API**

    ---

    Standard WebSocket protocol, RESTful management API, and webhook callbacks

-   :shield: **Self-Hosted Deployment**

    ---

    One-command Docker deployment, full control over your data, and support for local models

-   :chart_with_upwards_trend: **End-to-End Monitoring**

    ---

    Full session replay, real-time dashboards, and automated testing and evaluation

</div>

---

## System Architecture

Platform architecture layers:

```mermaid
flowchart TB

    %% ================= ACCESS =================
    subgraph Access["Access Layer"]
        direction TB
        API[API]
        SDK[SDK]
        Browser[Browser UI]
        Embed[Web Embed]
    end

    %% ================= REALTIME ENGINE =================
    subgraph Runtime["Realtime Interaction Engine"]

        direction LR

        %% -------- Duplex Engine --------
        subgraph Duplex["Duplex Interaction Engine"]
            direction LR

            subgraph Pipeline["Pipeline Engine"]
                direction LR
                VAD[VAD]
                ASR[ASR]
                TD[Turn Detection]
                LLM[LLM]
                TTS[TTS]
            end

            subgraph Multi["Realtime Engine"]
                MM[Realtime Model]
            end

        end

        %% -------- Capabilities --------
        subgraph Capability["Agent Capabilities"]

            subgraph Tools["Tool System"]
                Webhook[Webhook]
                ClientTool[Client Tools]
                Builtin[Builtin Tools]
            end

            subgraph KB["Knowledge System"]
                Docs[Documents]
                Vector[(Vector Index)]
                Retrieval[Retrieval]
            end

        end

    end

    %% ================= PLATFORM =================
    subgraph Platform["Platform Services"]
        direction TB
        Backend[Backend Service]
        Frontend[Frontend Console]
        DB[(Database)]
    end

    %% ================= CONNECTIONS =================

    Access --> Runtime

    Runtime <--> Backend
    Backend <--> DB
    Backend <--> Frontend

    LLM --> Tools
    MM --> Tools

    LLM <--> KB
    MM <--> KB
```

Dialogue flow of the pipeline-based interaction engine:

```mermaid
flowchart LR

    User((User Speech))
    Audio[Audio Stream]

    VAD[VAD\nVoice Activity Detection]
    ASR[ASR\nSpeech Recognition]

    TD[Turn Detection]

    LLM[LLM\nReasoning]

    Tools[Tools / APIs]

    TTS[TTS\nSpeech Synthesis]

    AudioOut[Audio Stream Out]

    User --> Audio
    Audio --> VAD
    VAD --> ASR
    ASR --> TD
    TD --> LLM

    LLM --> Tools
    Tools --> LLM

    LLM --> TTS
    TTS --> AudioOut
    AudioOut --> User
```
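The stage order in the diagram above can be sketched as a single function that handles one user turn. This is a minimal illustration, not the RAS engine: `runPipelineTurn`, the `stages` object, its method names, and `MAX_TOOL_ROUNDS` are all hypothetical, and every stage is a caller-supplied stub. A real engine would presumably stream audio through the stages concurrently rather than run them one after another.

```javascript
// Sketch only: every stage below is a caller-supplied stub, not a RAS API.
const MAX_TOOL_ROUNDS = 3; // assumed safety cap on LLM <-> tool round trips

function runPipelineTurn(stages, audioFrames) {
  // VAD: keep only the frames the detector flags as speech.
  const speechFrames = audioFrames.filter((frame) => stages.vad(frame));
  if (speechFrames.length === 0) return null;

  // ASR: turn the speech frames into text.
  const transcript = stages.asr(speechFrames);

  // Turn Detection: return early if the user has not finished speaking.
  if (!stages.turnDetection(transcript)) return null;

  // LLM <-> Tools loop: feed tool results back until the model answers in text.
  const toolResults = [];
  let reply = stages.llm(transcript, toolResults);
  for (let round = 0; reply.toolCall && round < MAX_TOOL_ROUNDS; round += 1) {
    toolResults.push(stages.callTool(reply.toolCall));
    reply = stages.llm(transcript, toolResults);
  }

  // TTS: synthesize the final text for the outgoing audio stream.
  return stages.tts(reply.text ?? '');
}

// Toy stubs that stand in for real VAD, ASR, LLM, tool, and TTS services.
const stubStages = {
  vad: (frame) => frame.energy > 0.1,
  asr: (frames) => frames.map((f) => f.word).join(' '),
  turnDetection: (text) => text.endsWith('?'),
  llm: (text, toolResults) =>
    toolResults.length === 0
      ? { toolCall: { name: 'clock' } }
      : { text: `It is ${toolResults[0]}.` },
  callTool: (call) => (call.name === 'clock' ? '12:00' : 'unknown'),
  tts: (text) => ({ audioFor: text }),
};

const frames = [
  { energy: 0.0, word: '' },
  { energy: 0.8, word: 'what' },
  { energy: 0.7, word: 'time?' },
];
const result = runPipelineTurn(stubStages, frames);
console.log(result.audioFor);
```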

Dialogue flow based on a realtime multimodal model:

```mermaid
flowchart LR

    User((User))

    Input[Audio / Video / Text]

    MM[Multimodal Model]

    Tools[Tools / APIs]
    KB[Knowledge Base]

    Output[Audio / Video / Text]

    User --> Input
    Input --> MM

    MM --> Tools
    Tools --> MM

    MM --> KB
    KB --> MM

    MM --> Output
    Output --> User
```

---

## Tech Stack

| Layer | Technology |
|------|------|
| **Frontend** | React 18, TypeScript, Tailwind CSS, Zustand |
| **Backend** | FastAPI (Python 3.10+) |
| **Engine** | Python, WebSocket, asyncio |
| **Database** | SQLite |
| **Knowledge base** | Chroma |
| **Deployment** | Docker |

---

## Quick Navigation

<div class="grid cards" markdown>

-   :rocket: **[Quick Start](quickstart/index.md)**

    ---

    Create your first AI assistant in 5 minutes

-   :book: **[Core Concepts](concepts/index.md)**

    ---

    Learn the core concepts: assistants, pipelines, and multimodal models

-   :wrench: **[Installation & Deployment](getting-started/index.md)**

    ---

    Environment setup, local development, and Docker/production deployment

-   :robot: **[Assistant Management](assistants/index.md)**

    ---

    Create and configure conversational assistants

-   :gear: **[Customization](customization/knowledge-base.md)**

    ---

    Knowledge bases, tools, voice, and workflows

-   :bar_chart: **[Analytics](analysis/dashboard.md)**

    ---

    Dashboards, history, and test evaluation

-   :electric_plug: **[API Reference](api-reference/index.md)**

    ---

    WebSocket protocol and REST API documentation

</div>

---

## Try It Out

### Start with Docker

```bash
# Clone the repository and enter its docker directory
git clone https://github.com/your-org/AI-VideoAssistant.git
cd AI-VideoAssistant/docker

# Start the services in the background
docker compose up -d

# For development, use the dev profile instead:
# docker compose --profile dev up -d
```

Open `http://localhost:3000` to use the console.

### WebSocket Connection Example

```javascript
// Replace YOUR_ID with your assistant's ID.
const ws = new WebSocket('ws://localhost:8000/ws?assistant_id=YOUR_ID');

ws.onopen = () => {
  // Start a session and declare the audio format: 16-bit PCM, 16 kHz, mono.
  ws.send(JSON.stringify({
    type: 'session.start',
    audio: { encoding: 'pcm_s16le', sample_rate_hz: 16000, channels: 1 }
  }));
};
```
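The session above declares `pcm_s16le` audio, while browser capture through the Web Audio API typically yields 32-bit float samples in the range [-1, 1]. The conversion can be sketched as follows. `floatToPcmS16le` is a hypothetical helper, not part of RAS, and how audio frames are sent over the socket after `session.start` (raw binary or wrapped in JSON) is not covered here, so check the API reference before sending.

```javascript
// Assumption: input samples are floats in [-1, 1], as produced by the Web Audio API.
function floatToPcmS16le(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per 16-bit sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i += 1) {
    // Clamp out-of-range values so they cannot wrap around when stored as int16.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return buffer;
}

// Four samples: silence, full positive, full negative, and an out-of-range value.
const pcm = floatToPcmS16le(new Float32Array([0, 1, -1, 2]));
const pcmBytes = Array.from(new Uint8Array(pcm));
console.log(pcmBytes);
```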

---

## License

This project is open source under the [MIT License](https://github.com/your-org/AI-VideoAssistant/blob/main/LICENSE).