Files

Xin Wang 809b634420 Enhance AssistantConfig and pipeline for FastGPT integration

- Add new fields in AssistantConfig for FastGPT connection details, including `fastgpt_api_url`, `fastgpt_api_key`, and `fastgpt_app_id`.
- Update the pipeline to utilize the new FastGPT configuration, ensuring proper integration with external services.
- Introduce type handling for different assistant types, including support for realtime modes and external brain management.
- Refactor frontend components to include hints for FastGPT configuration inputs, improving user guidance during setup.

2026-06-16 16:55:51 +08:00

Implement StepFun Realtime service and enhance AssistantConfig

2026-06-14 23:41:40 +08:00

routes

Add workflow editor and node types support in frontend and backend

2026-06-15 10:12:41 +08:00

services

Enhance AssistantConfig and pipeline for FastGPT integration

2026-06-16 16:55:51 +08:00

.env.example

fix frontend voice preview fallback

2026-06-10 12:36:18 +08:00

.gitignore

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

.python-version

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

app.py

Add workflow editor and node types support in frontend and backend

2026-06-15 10:12:41 +08:00

config.py

Refactor backend to support interface-definition driven model resources

2026-06-14 19:36:12 +08:00

Dockerfile

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

models.py

Enhance AssistantConfig and pipeline for FastGPT integration

2026-06-16 16:55:51 +08:00

README.md

Implement StepFun Realtime service and enhance AssistantConfig

2026-06-14 23:41:40 +08:00

requirements.txt

Enhance AssistantConfig and pipeline for FastGPT integration

2026-06-16 16:55:51 +08:00

schemas.py

Enhance AssistantConfig and pipeline for FastGPT integration

2026-06-16 16:55:51 +08:00

README.md

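AI Video Assistant: Backend Engine

A self-hosted backend modeled on dograh, with pipecat as the voice engine. The goal is to grow it step by step into a dograh-like platform while supporting both WebRTC and WebSocket (WS) audio output.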

Dual-Output Architecture (Core)

pipecat decouples the pipeline from the output method: the same STT→LLM→TTS pipeline can be attached to different transports.

                                       ┌─────────────────────────────────┐
  Browser ──────WebRTC──► /ws/voice ───┤                                 │
                                       │  run_pipeline(transport, cfg):  │
  Custom/telephony ─WS──► /ws/stream ──┤  input→STT→LLM→TTS→output       │
                                       └─────────────────────────────────┘

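WebRTC (/ws/voice): for browsers; low latency, with NAT traversal. Uses SmallWebRTCTransport.
WS (/ws/stream): raw audio stream for server-side, telephony, or custom clients; no ICE/TURN. Uses FastAPIWebsocketTransport.

Adding a third output (for example, Twilio telephony) means adding one more build_xxx_transport plus a serializer in services/pipecat/transports.py; the pipeline itself does not change by a single line.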
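The decoupling described above can be sketched without pipecat at all. The following is a minimal illustration, not the repository's code: FakeTransport, TRANSPORT_BUILDERS, and build_stages are hypothetical names, and the stage list is a stand-in for a real pipeline object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FakeTransport:
    """Stand-in for a real transport; only its name matters in this sketch."""
    name: str

    def input(self) -> str:
        return f"{self.name}:in"

    def output(self) -> str:
        return f"{self.name}:out"


# Hypothetical registry: one builder per output method.
TRANSPORT_BUILDERS: Dict[str, Callable[[], FakeTransport]] = {
    "webrtc": lambda: FakeTransport("webrtc"),
    "ws": lambda: FakeTransport("ws"),
}


def build_stages(transport: FakeTransport) -> List[str]:
    """Assemble the stage order; nothing here depends on the transport kind."""
    return [transport.input(), "stt", "llm", "tts", transport.output()]


# Adding a third output touches only the registry, never build_stages.
TRANSPORT_BUILDERS["twilio"] = lambda: FakeTransport("twilio")
```

Because build_stages only calls input() and output(), any object that offers those two methods can be plugged in, which is the property the architecture relies on.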

Directory Structure (mirrors dograh's `api/` so it can grow)

ai-video-backend/
├── app.py                       # FastAPI entry point; mounts routes + CORS
├── config.py                    # reads .env; environment-variable fallback for model interfaces
├── models.py                    # AssistantConfig (mirrors the frontend AssistantForm)
├── routes/                      # one file per endpoint group (mirrors dograh routes/)
│   ├── health.py
│   ├── voice_webrtc.py          # WebRTC signaling
│   └── voice_ws.py              # raw audio stream over WS
├── services/
│   └── pipecat/                 # engine (mirrors dograh services/pipecat/)
│       ├── service_factory.py   # builds STT/LLM/TTS (dispatched by interface_type)
│       ├── transports.py        # transport factory (add output methods here)
│       └── pipeline.py          # pipeline assembly and execution (transport-agnostic)
├── Dockerfile
├── requirements.txt
└── .env.example

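# To be added as the platform grows (mirroring dograh):
#   db/        SQLAlchemy models + sessions (assistants/conversations/vectors)
#   schemas/   pydantic request/response models
#   tasks/     background tasks (transcription, reports), backed by redis
#   services/storage.py  recording storage, backed by minio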

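Domestic (China-based) Stack (all OpenAI-compatible; switching stacks only requires editing .env)

Type	Default	Integration
LLM	DeepSeek	Direct cloud connection; only an API key is needed
STT	SenseVoice / FunASR	Local OpenAI-compatible transcription service
TTS	CosyVoice	Local OpenAI-compatible TTS service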
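Since every service in the table speaks the OpenAI-compatible protocol, one loader can in principle serve all three. The sketch below assumes a naming scheme of the form LLM_BASE_URL / LLM_API_KEY / LLM_MODEL; those variable names, the OpenAICompatEndpoint class, and load_endpoint are illustrative assumptions and are not taken from .env.example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OpenAICompatEndpoint:
    base_url: str
    api_key: str
    model: str


def load_endpoint(kind: str, env: Mapping[str, str] = os.environ) -> OpenAICompatEndpoint:
    """Read one endpoint from the environment; kind is "llm", "stt" or "tts".

    The <KIND>_BASE_URL / <KIND>_API_KEY / <KIND>_MODEL names are assumed here.
    Missing variables fall back to empty strings rather than raising.
    """
    prefix = kind.upper()
    return OpenAICompatEndpoint(
        base_url=env.get(f"{prefix}_BASE_URL", ""),
        api_key=env.get(f"{prefix}_API_KEY", ""),
        model=env.get(f"{prefix}_MODEL", ""),
    )
```

Under this scheme, swapping one provider for another means changing three variables and no code.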

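iFlytek (Xfyun) ASR / TTS / SuperTTS

iFlytek credentials are stored directly in the corresponding ModelResource.secrets, and interface parameters are stored in ModelResource.values:

Standard speech recognition: interface_type=xfyun-asr
Standard speech synthesis: interface_type=xfyun-tts
Ultra-realistic speech synthesis: interface_type=xfyun-super-tts
values.apiUrl holds the iFlytek WebSocket URL; optional parameters such as voice and speaking rate also go in values
secrets holds appId, apiKey, and apiSecret as separate entries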
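As a rough illustration of the split between values and secrets, the sketch below models a resource and checks that the fields named above are present. The ModelResource dataclass here is a simplified stand-in for the real model, and missing_xfyun_fields is a hypothetical helper; optional keys such as voice or speed are not validated because their exact names are not given in this README.

```python
from dataclasses import dataclass, field
from typing import Dict, List

XFYUN_TYPES = {"xfyun-asr", "xfyun-tts", "xfyun-super-tts"}
REQUIRED_SECRETS = ("appId", "apiKey", "apiSecret")


@dataclass
class ModelResource:
    """Simplified stand-in: non-sensitive settings in values, credentials in secrets."""
    interface_type: str
    values: Dict[str, object] = field(default_factory=dict)
    secrets: Dict[str, str] = field(default_factory=dict)


def missing_xfyun_fields(resource: ModelResource) -> List[str]:
    """Return the dotted names of required fields that are absent or empty."""
    if resource.interface_type not in XFYUN_TYPES:
        raise ValueError(f"not an xfyun interface_type: {resource.interface_type!r}")
    missing: List[str] = []
    if not resource.values.get("apiUrl"):
        missing.append("values.apiUrl")
    for key in REQUIRED_SECRETS:
        if not resource.secrets.get(key):
            missing.append(f"secrets.{key}")
    return missing
```

For example, a resource created with a URL but only an appId would report secrets.apiKey and secrets.apiSecret as missing.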

Interface-Definition-Driven Model Registry

LLM, ASR, TTS, Embedding, and Realtime all use the same two-layer structure:

assistant_model_bindings -> model_resources -> interface_definitions

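interface_definitions: define the concrete integration protocol, capabilities, and dynamic form fields.
model_resources: each resource carries its own values/secrets; vendor accounts are not shared across resources.
assistant_model_bindings: an assistant selects a model resource per capability.

interface_type names the concrete protocol, for example xfyun-asr, xfyun-tts, or xfyun-super-tts. The backend selects the service implementation strictly by this value and never guesses from the model ID or URL.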
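The chain from binding to resource to definition can be pictured with three small record types. This is a sketch under assumptions: the field names (capability, resource_id, form_fields) and the resolve helper are invented for illustration, and in-memory dicts stand in for database tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InterfaceDefinition:
    interface_type: str                 # concrete protocol, e.g. "xfyun-tts"
    capability: str                     # e.g. "tts"
    form_fields: Tuple[str, ...] = ()   # drives the dynamic form


@dataclass(frozen=True)
class ModelResource:
    id: int
    interface_type: str
    values: Dict[str, str] = field(default_factory=dict)
    secrets: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class AssistantModelBinding:
    assistant_id: int
    capability: str
    resource_id: int


def resolve(
    assistant_id: int,
    capability: str,
    bindings: List[AssistantModelBinding],
    resources: Dict[int, ModelResource],
    definitions: Dict[str, InterfaceDefinition],
) -> Tuple[ModelResource, InterfaceDefinition]:
    """Follow binding -> resource -> definition for one assistant capability."""
    binding = next(
        (b for b in bindings
         if b.assistant_id == assistant_id and b.capability == capability),
        None,
    )
    if binding is None:
        raise LookupError(f"assistant {assistant_id} has no {capability} binding")
    resource = resources[binding.resource_id]
    return resource, definitions[resource.interface_type]
```

Keeping credentials on the resource rather than on a shared vendor account means two resources of the same interface_type can use entirely different keys.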
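Strict selection by interface_type amounts to an exact-match lookup with no fallback. The sketch below shows one way to write that; the register decorator, the factory functions, and the string return values are hypothetical placeholders for real service classes.

```python
from typing import Callable, Dict

ServiceFactory = Callable[[dict, dict], str]

# Exact-match table: interface_type -> factory. No pattern matching, no defaults.
_FACTORIES: Dict[str, ServiceFactory] = {}


def register(interface_type: str) -> Callable[[ServiceFactory], ServiceFactory]:
    def decorator(fn: ServiceFactory) -> ServiceFactory:
        _FACTORIES[interface_type] = fn
        return fn
    return decorator


@register("xfyun-asr")
def _build_xfyun_asr(values: dict, secrets: dict) -> str:
    return "XfyunASR"  # placeholder for a real service instance


@register("xfyun-tts")
def _build_xfyun_tts(values: dict, secrets: dict) -> str:
    return "XfyunTTS"  # placeholder for a real service instance


def build_service(interface_type: str, values: dict, secrets: dict) -> str:
    """Look up the factory by exact interface_type; unknown types are an error."""
    try:
        factory = _FACTORIES[interface_type]
    except KeyError:
        raise ValueError(f"unsupported interface_type: {interface_type!r}") from None
    return factory(values, secrets)
```

An unknown interface_type fails loudly even when values.apiUrl looks like a recognizable vendor URL, which is the point of not guessing.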

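API:

/api/interface-definitions: the frontend reads the field definitions and generates the Dialog dynamically.
/api/model-resources: unified CRUD for model resources; sensitive fields are masked individually.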
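Per-field masking could look like the sketch below. The masking rule (keep the last four characters, star the rest) is an assumption chosen for illustration; the README does not specify how the backend actually masks values, and mask_secret / mask_secrets are hypothetical names.

```python
from typing import Dict


def mask_secret(value: str, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with asterisks.

    Values no longer than `keep` are masked entirely so nothing leaks.
    """
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


def mask_secrets(secrets: Dict[str, str]) -> Dict[str, str]:
    """Mask each secret independently, preserving the keys."""
    return {name: mask_secret(value) for name, value in secrets.items()}
```

Masking each entry separately lets the frontend show which secrets are set without ever receiving a full value.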

Running Locally (with uv, Python 3.12)

cd ai-video/backend
uv venv                       # uses 3.12 per .python-version

# Stage A: verify storage/CRUD only (no pipecat install; takes seconds)
uv pip install fastapi "uvicorn[standard]" sqlalchemy asyncpg greenlet python-dotenv pydantic loguru
# Stage B: install the full set when working on voice (includes pipecat; requires 3.10+)
# uv pip install -r requirements.txt

cp .env.example .env          # the CRUD stage only needs DATABASE_URL; fill in model keys for voice
# Start Postgres: run docker compose up -d postgres under ai-video/
uv run --with-requirements requirements.txt uvicorn app:app --reload --port 8000

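pipecat-related code is imported lazily, so in Stage A the app starts and serves /api/* and /health without pipecat installed; the full dependencies are needed only when a client actually connects to /ws/voice or /ws/stream.

Interactive API docs: after startup, open http://localhost:8000/docs (handy for exercising CRUD by hand and settling schemas).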
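The lazy-import pattern can be sketched as follows. This is an illustration of the general technique rather than the repository's code: load_voice_engine and health are hypothetical helpers, and the {"status": "ok"} payload is an assumed response shape.

```python
import importlib
from types import ModuleType


def health() -> dict:
    """Cheap endpoint body: touches no heavy dependency."""
    return {"status": "ok"}


def load_voice_engine(module_name: str = "pipecat") -> ModuleType:
    """Import the voice engine only when a voice route is actually hit.

    Because the import happens inside the function, merely importing this
    module (at app startup) does not require the engine to be installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is not installed; install requirements.txt "
            "before connecting to the voice endpoints"
        ) from exc
```

A voice route would call load_voice_engine() at the top of its handler, so a missing dependency shows up as a clear error on that route only, while the CRUD and health routes keep working.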

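Docker (ai-video/docker-compose.yaml): the main debugging path

The api service mounts the source code and runs with --reload, and the frontend uses npm dev with HMR, so code changes take effect immediately.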

cd ai-video
docker compose up                 # foreground: starts pg + api (:8000) + ui (:3030), logs printed directly
docker compose up -d              # background; view logs with docker compose logs -f api
docker compose down               # stop everything

# Optional: object storage / background tasks
docker compose --profile data up  # + rustfs (S3) / redis
# Optional: public deployment (WebRTC requires TURN)
docker compose --profile remote up -d

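The first up builds the api image (installs the full requirements.txt, including pipecat, so it is slow). After that, Python code changes are hot-reloaded via --reload with no rebuild; run docker compose build api only when requirements.txt changes.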

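Pending Integration / TODO

Integration-test the Pipecat 1.3.0 voice pipeline against each OpenAI-compatible service
Bring up local OpenAI-compatible services for SenseVoice / CosyVoice
Realtime mode (StepFun StepAudio Realtime)
Connect the frontend DebugVoicePanel to /ws/voice (see dograh's useWebSocketRTC.tsx)
After adding a DB: persist assistant configuration (currently passed inline with each request)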
