Files

Xin Wang c64b7dcf99 Enhance credential management and testing functionality

- Introduce new fields for voice, speed, and language in the AssistantConfig and ProviderCredential models to support TTS and ASR configurations.
- Update the database schema and seeding script to accommodate the new fields, ensuring backward compatibility.
- Implement credential testing endpoints and logic to validate OpenAI-compatible credentials, enhancing user experience and reliability.
- Modify frontend components to include new fields in the credential forms and improve connection testing feedback.
- Refactor related services and API interactions to support the new credential testing feature.

2026-06-09 14:42:25 +08:00

Enhance credential management and testing functionality

2026-06-09 14:42:25 +08:00

routes

Enhance credential management and testing functionality

2026-06-09 14:42:25 +08:00

services

Enhance credential management and testing functionality

2026-06-09 14:42:25 +08:00

.env.example

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

.gitignore

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

.python-version

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

app.py

Implement knowledge base management and enhance assistant configuration

2026-06-09 08:31:39 +08:00

config.py

Enhance AI Video Assistant platform with new Makefile for development commands, update CORS origins for local access, and implement API client for credential management. Add seed data for model credentials and refactor ComponentsModelsPage to utilize API for dynamic data loading. Update Next.js configuration for Turbopack compatibility.

2026-06-08 22:39:45 +08:00

Dockerfile

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

models.py

Enhance credential management and testing functionality

2026-06-09 14:42:25 +08:00

README.md

Initial commit: AI Video Assistant fullstack platform.

2026-06-08 13:51:28 +08:00

requirements.txt

Enhance credential management and testing functionality

2026-06-09 14:42:25 +08:00

schemas.py

Enhance credential management and testing functionality

2026-06-09 14:42:25 +08:00

README.md

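AI Video Assistant: Backend Engine

A self-built backend modeled on dograh, with pipecat as the voice engine. The goal is to grow gradually into a dograh-like platform while supporting two audio outputs: WebRTC and WebSocket (WS).

Dual-output architecture (core)

pipecat decouples the pipeline from the output method: the same STT→LLM→TTS pipeline can be attached to different transports.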

                                       ┌─────────────────────────────────┐
  Browser ───────WebRTC──► /ws/voice ──┤                                 │
                                       │  run_pipeline(transport, cfg):  │
  Custom/telephony ──WS──► /ws/stream ─┤  input→STT→LLM→TTS→output       │
                                       └─────────────────────────────────┘

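- WebRTC (/ws/voice): for browsers; low latency, with NAT traversal. Uses SmallWebRTCTransport.
- WS (/ws/stream): raw audio stream for server-side, telephony, or custom clients; no ICE/TURN. Uses FastAPIWebsocketTransport.

Adding a third output (for example, Twilio phone calls) means adding one more build_xxx_transport plus a serializer in services/pipecat/transports.py; the pipeline itself does not change by a single line.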
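The decoupling described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `Transport` protocol, `ListTransport`, the toy `stt`/`llm`/`tts` stages, and the `TRANSPORT_BUILDERS` registry are hypothetical stand-ins, not the pipecat API and not this repository's actual `run_pipeline`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional, Protocol


class Transport(Protocol):
    """Hypothetical minimal transport interface (not the pipecat API)."""

    async def receive(self) -> Optional[str]: ...
    async def send(self, item: str) -> None: ...


@dataclass
class ListTransport:
    """In-memory stand-in for a WebRTC or WS transport."""

    incoming: list
    outgoing: list = field(default_factory=list)

    async def receive(self) -> Optional[str]:
        # Return the next queued frame, or None when the input is exhausted.
        return self.incoming.pop(0) if self.incoming else None

    async def send(self, item: str) -> None:
        self.outgoing.append(item)


Stage = Callable[[str], Awaitable[str]]


# Toy stages that only tag the data so the flow is visible.
async def stt(audio: str) -> str:
    return f"text({audio})"


async def llm(text: str) -> str:
    return f"reply({text})"


async def tts(text: str) -> str:
    return f"audio({text})"


async def run_pipeline(transport: Transport, stages: list) -> None:
    """Transport-agnostic loop: input -> STT -> LLM -> TTS -> output."""
    while (item := await transport.receive()) is not None:
        for stage in stages:
            item = await stage(item)
        await transport.send(item)


# Adding an output method means adding one builder here; run_pipeline is untouched.
TRANSPORT_BUILDERS = {
    "webrtc": lambda frames: ListTransport(list(frames)),
    "ws": lambda frames: ListTransport(list(frames)),
}


def run_demo(kind: str, frames: list) -> list:
    transport = TRANSPORT_BUILDERS[kind](frames)
    asyncio.run(run_pipeline(transport, [stt, llm, tts]))
    return transport.outgoing
```

In this sketch, registering a third entry such as `TRANSPORT_BUILDERS["phone"]` is the only change needed to serve a new output, which mirrors the intent of adding a `build_xxx_transport` function.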

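Directory layout (mirrors dograh's `api/`, so it can grow)

ai-video-backend/
├── app.py                       # FastAPI entry point: mounts routes + CORS
├── config.py                    # Reads .env; the place where every provider is wired in
├── models.py                    # AssistantConfig (mirrors the frontend AssistantForm)
├── routes/                      # One file per endpoint group (mirrors dograh routes/)
│   ├── health.py
│   ├── voice_webrtc.py          # WebRTC signaling
│   └── voice_ws.py              # WS raw audio stream
├── services/
│   └── pipecat/                 # Engine (mirrors dograh services/pipecat/)
│       ├── service_factory.py   # Builds STT/LLM/TTS (add providers here)
│       ├── transports.py        # Transport factory (add output methods here)
│       └── pipeline.py          # Pipeline assembly and execution (transport-agnostic)
├── Dockerfile
├── requirements.txt
└── .env.example

# To be added as the platform grows (mirrors dograh):
#   db/        SQLAlchemy models + sessions (assistants/conversations/vectors)
#   schemas/   pydantic request/response models
#   tasks/     Background tasks (transcription, reports), backed by redis
#   services/storage.py  Recording storage, backed by minio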

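Chinese-vendor model stack (everything goes through OpenAI-compatible APIs; switching stacks only requires editing .env)

Type	Default	Integration
LLM	DeepSeek	Direct cloud connection; only an API key is needed
STT	SenseVoice / FunASR	Local OpenAI-compatible transcription service
TTS	CosyVoice	Local OpenAI-compatible TTS service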
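As a rough sketch of what "only edit .env" can look like in code, the helper below reads one OpenAI-compatible endpoint per service type from environment variables. The variable names (`LLM_BASE_URL`, `LLM_API_KEY`, `LLM_MODEL`, and the `STT_`/`TTS_` equivalents), the `ProviderEndpoint` class, and the default URLs and model names in the usage lines are assumptions for illustration; the repository's actual config.py and .env.example may use different names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderEndpoint:
    """One OpenAI-compatible endpoint (hypothetical shape)."""

    base_url: str
    api_key: str
    model: str


def load_endpoint(
    prefix: str,
    env: Optional[Mapping[str, str]] = None,
    *,
    default_base_url: str,
    default_model: str,
) -> ProviderEndpoint:
    """Read <PREFIX>_BASE_URL / _API_KEY / _MODEL, falling back to defaults."""
    env = os.environ if env is None else env
    return ProviderEndpoint(
        base_url=env.get(f"{prefix}_BASE_URL", default_base_url),
        api_key=env.get(f"{prefix}_API_KEY", ""),
        model=env.get(f"{prefix}_MODEL", default_model),
    )


# Usage: the defaults below are placeholders, not values taken from this repo.
llm = load_endpoint(
    "LLM",
    default_base_url="https://api.deepseek.com",
    default_model="deepseek-chat",
)
stt = load_endpoint(
    "STT",
    default_base_url="http://localhost:8001/v1",  # assumed local service address
    default_model="sensevoice",
)
```

Because every service is addressed by a base URL, key, and model name, swapping a provider changes only the environment values and not the calling code.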

Running locally (with uv, Python 3.12)

cd ai-video/backend
uv venv                       # Uses 3.12, as pinned in .python-version

# Stage A: verify storage/CRUD only (pipecat not installed; takes seconds)
uv pip install fastapi "uvicorn[standard]" sqlalchemy asyncpg greenlet python-dotenv pydantic loguru
# Stage B: install everything when you start on voice (includes pipecat; needs Python 3.10+)
# uv pip install -r requirements.txt

cp .env.example .env          # The CRUD stage only needs DATABASE_URL; fill in model keys for voice later
# Start Postgres: from ai-video/, run docker compose up -d postgres
.venv/bin/uvicorn app:app --reload --port 8000

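pipecat-related code is imported lazily, so in stage A the app starts and serves /api/* and /health without pipecat installed; the full dependency set is needed only when a client actually connects to /ws/voice or /ws/stream.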
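The lazy-import pattern can be sketched as follows. The `require`, `handle_crud`, and `handle_voice` names are hypothetical and are not taken from this repository; the point is only that the heavy import runs inside the handler, on first use, rather than at module import time.

```python
import importlib
from types import ModuleType


def require(module_name: str, hint: str) -> ModuleType:
    """Import an optional heavy dependency only when it is actually needed."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Turn the bare import failure into an actionable message.
        raise RuntimeError(f"{module_name} is not installed; {hint}") from exc


def handle_crud() -> dict:
    # CRUD paths never touch the voice engine, so they work in stage A.
    return {"status": "ok"}


def handle_voice() -> ModuleType:
    # The import happens here, when a voice connection arrives,
    # so merely starting the app does not require pipecat.
    return require("pipecat", "run `uv pip install -r requirements.txt` first")
```

With this shape, a stage A install can serve `handle_crud()` normally, while `handle_voice()` fails with a clear message until the full requirements are installed.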

Interactive API docs: after startup, open http://localhost:8000/docs (useful for manually exercising the CRUD endpoints and settling the schema).

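Docker (ai-video/docker-compose.yaml): the main debugging path

The api service mounts the source code and runs with --reload, and the frontend uses npm dev with HMR, so code changes take effect immediately.

cd ai-video
docker compose up                 # Foreground: starts pg + api (:8000) + ui (:3000), logs go to the terminal
docker compose up -d              # Background; follow logs with docker compose logs -f api
docker compose down               # Stop everything

# Optional: object storage / background tasks
docker compose --profile data up  # + rustfs (S3) / redis
# Optional: public deployment (WebRTC needs TURN)
docker compose --profile remote up -d

The first up builds the api image (it installs the full requirements.txt, including pipecat, which is slow). After that, Python changes are hot-reloaded via --reload with no rebuild; run docker compose build api only when requirements.txt changes.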

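Pending integration / TODO

- Get everything running after pip install, and check the service/transport constructor arguments against the installed pipecat version (see the comments in the code)
- Bring up local OpenAI-compatible services for SenseVoice / CosyVoice
- realtime mode (currently only the cascaded pipeline is supported)
- Hook the frontend DebugVoicePanel up to /ws/voice (adapt dograh's useWebSocketRTC.tsx)
- Once the DB is added: persist assistant configuration (currently sent inline with each request)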
