From 0821d73e7c0cae5b99bab2b9dad9e10f2eb30544 Mon Sep 17 00:00:00 2001
From: Xin Wang <wangx184@chinatelecom.cn>
Date: Sat, 28 Feb 2026 14:37:58 +0800
Subject: [PATCH] Add API reference documentation for WebSocket communication.
 Update mkdocs.yml to include new API reference section.

---
 docs/content/api-reference.md | 306 ++++++++++++++++++++++++++++++++++
 docs/mkdocs.yml               |   2 +
 2 files changed, 308 insertions(+)
 create mode 100644 docs/content/api-reference.md
diff --git a/docs/content/api-reference.md b/docs/content/api-reference.md
new file mode 100644
index 0000000..a791ea6
--- /dev/null
+++ b/docs/content/api-reference.md
@@ -0,0 +1,306 @@
+# API Reference
+
+This section documents the AI Video Assistant APIs, including the WebSocket real-time communication protocol and the backend REST API.
+
+---
+
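+## WebSocket Real-Time Communication
+
+The WebSocket endpoint provides bidirectional, real-time voice conversation, with streaming audio input and output as well as text messages.
+
+### Endpoint URL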
+
+```
+ws://<host>/ws
+```
+
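+### Transport Rules
+
+- **Text frames**: JSON control messages
+- **Binary frames**: PCM audio data (`pcm_s16le`, 16 kHz, mono)
+- The length of each binary frame must be an integer multiple of 640 bytes (20 ms of audio = 640 bytes)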
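+
+The 640-byte figure follows from the audio format: 16000 samples/s × 0.02 s × 2 bytes per sample × 1 channel. A minimal sketch of that arithmetic and of cutting a capture buffer into whole frames is shown here; the helper name `split_into_frames` is hypothetical and not part of the API.
+
+```python
+SAMPLE_RATE_HZ = 16000   # from the transport rules
+BYTES_PER_SAMPLE = 2     # pcm_s16le is signed 16-bit little-endian
+CHANNELS = 1             # mono
+FRAME_MS = 20
+
+# 16000 * 20 / 1000 = 320 samples per frame, 2 bytes each
+FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE * CHANNELS
+
+
+def split_into_frames(pcm: bytes) -> tuple[list[bytes], bytes]:
+    """Split raw PCM into whole 20 ms frames plus the leftover tail (hypothetical helper)."""
+    usable = len(pcm) - len(pcm) % FRAME_BYTES
+    frames = [pcm[i:i + FRAME_BYTES] for i in range(0, usable, FRAME_BYTES)]
+    return frames, pcm[usable:]
+```
+
+A client would typically keep the leftover tail and prepend it to the next capture buffer, so that every binary frame it sends stays a multiple of 640 bytes.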
+
+---
+
+### Message Flow
+
+```
+Client -> hello
+Server <- hello.ack
+Client -> session.start
+Server <- session.started
+Server <- config.resolved
+Client -> (binary pcm frames...)
+Server <- input.speech_started / transcript.delta / transcript.final
+Server <- assistant.response.delta / assistant.response.final
+Server <- output.audio.start
+Server <- (binary pcm frames...)
+Server <- output.audio.end
+Client -> session.stop
+Server <- session.stopped
+```
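+
+The ordering above can be mirrored on the client as a small state tracker. This is only a sketch of the order implied by the flow; the state names, the `OrderTracker` class, and the assumption that `input.text`, `response.cancel`, and `tool_call.results` are valid only inside an active session are illustrative and not confirmed by the server implementation.
+
+```python
+# Client message types allowed in each (hypothetical) connection state.
+ALLOWED = {
+    "connected": {"hello"},
+    "greeted": {"session.start"},
+    "active": {"input.text", "response.cancel", "tool_call.results", "session.stop"},
+    "stopped": set(),
+}
+# Messages that move the connection to a new state.
+NEXT_STATE = {"hello": "greeted", "session.start": "active", "session.stop": "stopped"}
+
+
+class OrderTracker:
+    def __init__(self) -> None:
+        self.state = "connected"
+
+    def check(self, msg_type: str) -> bool:
+        """Return True if msg_type may be sent now, advancing the state if so."""
+        if msg_type not in ALLOWED[self.state]:
+            return False  # assumption: the server would reject this with `protocol.order`
+        self.state = NEXT_STATE.get(msg_type, self.state)
+        return True
+```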
+
+---
+
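+### Client -> Server Messages
+
+#### 1. Handshake: `hello`
+
+The first message the client sends after connecting. It negotiates the protocol version and carries authentication.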
+
+```json
+{
+  "type": "hello",
+  "version": "v1",
+  "auth": {
+    "apiKey": "optional-api-key",
+    "jwt": "optional-jwt"
+  }
+}
+```
+
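+| Field | Type | Required | Description |
+|---|---|---|---|
+| `type` | string | Yes | Always `"hello"` |
+| `version` | string | Yes | Protocol version; always `"v1"` |
+| `auth` | object | No | Authentication credentials |
+
+**Authentication rules**:
+- If `WS_API_KEY` is configured, a matching `apiKey` must be provided
+- If `WS_REQUIRE_AUTH=true`, at least one of `apiKey` or `jwt` must be provided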
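+
+The two rules can be read as the following check. This is a sketch under assumptions, not the server's code: the function name `check_auth` is hypothetical, the mapping of each rule to the `auth.invalid_api_key` and `auth.required` error codes is a guess, and JWT verification is omitted entirely.
+
+```python
+from typing import Optional
+
+
+def check_auth(auth: Optional[dict], ws_api_key: Optional[str], require_auth: bool) -> Optional[str]:
+    """Return an assumed error code, or None if the `hello` message passes both rules."""
+    auth = auth or {}
+    # Rule 1: when WS_API_KEY is configured, apiKey must match it.
+    if ws_api_key and auth.get("apiKey") != ws_api_key:
+        return "auth.invalid_api_key"
+    # Rule 2: when WS_REQUIRE_AUTH=true, apiKey or jwt must be present.
+    if require_auth and not (auth.get("apiKey") or auth.get("jwt")):
+        return "auth.required"
+    return None
+```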
+
+---
+
+#### 2. Session Start: `session.start`
+
+The second message, sent after a successful handshake. It starts the conversation session.
+
+```json
+{
+  "type": "session.start",
+  "audio": {
+    "encoding": "pcm_s16le",
+    "sample_rate_hz": 16000,
+    "channels": 1
+  },
+  "metadata": {
+    "appId": "assistant_123",
+    "channel": "web",
+    "configVersionId": "cfg_20260217_01",
+    "systemPrompt": "You are a concise assistant.",
+    "greeting": "Hello, how can I help you?",
+    "output": {
+      "mode": "audio"
+    },
+    "dynamicVariables": {
+      "customer_name": "Alice",
+      "plan_tier": "Pro"
+    }
+  }
+}
+```
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `type` | string | Yes | Always `"session.start"` |
+| `audio` | object | No | Audio format descriptor |
+| `audio.encoding` | string | No | Always `"pcm_s16le"` |
+| `audio.sample_rate_hz` | number | No | Always `16000` |
+| `audio.channels` | number | No | Always `1` |
+| `metadata` | object | No | Runtime configuration |
+
+**Fields supported in `metadata`**:
+- `appId` / `app_id` - Application ID
+- `channel` - Channel identifier
+- `configVersionId` / `config_version_id` - Configuration version
+- `systemPrompt` - System prompt
+- `greeting` - Opening greeting
+- `output.mode` - Output mode (`audio` / `text`)
+- `dynamicVariables` - Dynamic variables (`{{variable}}` placeholders are supported)
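+
+To illustrate what `{{variable}}` substitution might look like, here is a minimal sketch. It is not the server's implementation: which fields are templated, how whitespace inside the braces is treated, and what happens to unknown names are all assumptions, and `render` is a hypothetical helper.
+
+```python
+import re
+
+# Matches {{name}} with optional spaces inside the braces (an assumption).
+_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
+
+
+def render(template: str, variables: dict[str, str]) -> str:
+    """Replace {{name}} with variables[name]; unknown names are left untouched."""
+    return _PLACEHOLDER.sub(
+        lambda m: str(variables.get(m.group(1), m.group(0))), template
+    )
+```
+
+With the `dynamicVariables` from the example above, `render("Hello {{customer_name}}", {"customer_name": "Alice", "plan_tier": "Pro"})` yields `Hello Alice`.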
+
+---
+
+#### 3. Text Input: `input.text`
+
+Sends text input, bypassing ASR and triggering an LLM reply directly.
+
+```json
+{
+  "type": "input.text",
+  "text": "What can you do?"
+}
+```
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `type` | string | Yes | Always `"input.text"` |
+| `text` | string | Yes | User text content |
+
+---
+
+#### 4. Response Cancel: `response.cancel`
+
+Requests that the current response be interrupted.
+
+```json
+{
+  "type": "response.cancel",
+  "graceful": false
+}
+```
+
+| Field | Type | Required | Default | Description |
+|---|---|---|---|---|
+| `type` | string | Yes | - | Always `"response.cancel"` |
+| `graceful` | boolean | No | `false` | `false` interrupts immediately |
+
+---
+
+#### 5. Tool Call Results: `tool_call.results`
+
+Returns the results of tools executed on the client.
+
+```json
+{
+  "type": "tool_call.results",
+  "results": [
+    {
+      "tool_call_id": "call_abc123",
+      "name": "weather",
+      "output": { "temp_c": 21, "condition": "sunny" },
+      "status": { "code": 200, "message": "ok" }
+    }
+  ]
+}
+```
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `type` | string | Yes | Always `"tool_call.results"` |
+| `results` | array | No | List of tool results |
+| `results[].tool_call_id` | string | Yes | Tool call ID |
+| `results[].name` | string | Yes | Tool name |
+| `results[].output` | any | No | Tool output |
+| `results[].status` | object | Yes | Execution status |
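+
+A client that receives an `assistant.tool_call` event could assemble its reply roughly as follows. This is a sketch: `build_tool_results` is a hypothetical helper, it assumes the event's `tool_call_id` and `tool_name` are available in a dict, and the failure status `{"code": 500, "message": "error"}` is an assumed convention since only the success form appears in this document.
+
+```python
+import json
+from typing import Any
+
+
+def build_tool_results(call: dict, output: Any, ok: bool = True) -> str:
+    """Build a `tool_call.results` text frame for a single tool call."""
+    # Only the 200/"ok" status is documented; the failure form is an assumption.
+    status = {"code": 200, "message": "ok"} if ok else {"code": 500, "message": "error"}
+    message = {
+        "type": "tool_call.results",
+        "results": [
+            {
+                "tool_call_id": call["tool_call_id"],
+                "name": call["tool_name"],  # the event uses `tool_name`, the result uses `name`
+                "output": output,
+                "status": status,
+            }
+        ],
+    }
+    return json.dumps(message, ensure_ascii=False)
+```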
+
+---
+
+#### 6. Session Stop: `session.stop`
+
+Ends the conversation session.
+
+```json
+{
+  "type": "session.stop",
+  "reason": "client_disconnect"
+}
+```
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `type` | string | Yes | Always `"session.stop"` |
+| `reason` | string | No | Reason for ending the session |
+
+---
+
+#### 7. Binary Audio
+
+Binary PCM audio can be sent continuously after `session.started`.
+
+- **Format**: `pcm_s16le`
+- **Sample rate**: 16000 Hz
+- **Channels**: 1 (mono)
+- **Frame length**: 20 ms = 640 bytes
+
+---
+
+### Server -> Client Events
+
+#### Event Envelope
+
+Every JSON event carries the same envelope fields:
+
+```json
+{
+  "type": "event.name",
+  "timestamp": 1730000000000,
+  "sessionId": "sess_xxx",
+  "seq": 42,
+  "source": "asr",
+  "trackId": "audio_in",
+  "data": {}
+}
+```
+
+| Field | Type | Description |
+|---|---|---|
+| `type` | string | Event type |
+| `timestamp` | number | Event timestamp (milliseconds) |
+| `sessionId` | string | Session ID |
+| `seq` | number | Incrementing sequence number |
+| `source` | string | Event source (`asr`/`llm`/`tts`/`tool`) |
+| `trackId` | string | Event track (`audio_in`/`audio_out`/`control`) |
+| `data` | object | Event payload |
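+
+A client might decode the envelope into a typed record and use `seq` to detect out-of-order events. The sketch below makes two assumptions: `seq` is strictly increasing within a session, and `source`, `trackId`, and `data` may be absent on some events. The names `Envelope` and `parse_event` are hypothetical.
+
+```python
+import json
+from dataclasses import dataclass, field, fields
+
+
+@dataclass
+class Envelope:
+    type: str
+    timestamp: int
+    sessionId: str
+    seq: int
+    source: str = ""      # assumed optional
+    trackId: str = ""     # assumed optional
+    data: dict = field(default_factory=dict)
+
+
+def parse_event(raw: str, last_seq: int) -> Envelope:
+    """Decode one text frame and reject events whose seq does not advance."""
+    obj = json.loads(raw)
+    # Ignore unknown keys so that added envelope fields do not break parsing.
+    env = Envelope(**{f.name: obj[f.name] for f in fields(Envelope) if f.name in obj})
+    if env.seq <= last_seq:
+        raise ValueError(f"out-of-order event: seq {env.seq} after {last_seq}")
+    return env
+```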
+
+---
+
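+#### Session Control Events
+
+| Event | Description |
+|---|---|
+| `hello.ack` | Handshake succeeded |
+| `session.started` | Session started successfully |
+| `config.resolved` | Snapshot of the final server-side configuration |
+| `heartbeat` | Keepalive heartbeat (50-second interval by default) |
+| `session.stopped` | Session end acknowledged |
+| `error` | Unified error event |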
+
+---
+
+#### ASR Recognition Events
+
+| Event | Fields | Description |
+|---|---|---|
+| `input.speech_started` | `probability` | Start of speech detected |
+| `input.speech_stopped` | `probability` | End of speech detected |
+| `transcript.delta` | `text` | Incremental ASR transcript |
+| `transcript.final` | `text` | Final ASR transcript |
+
+---
+
+#### LLM/TTS Output Events
+
+| Event | Fields | Description |
+|---|---|---|
+| `assistant.response.delta` | `text` | Incremental assistant text output |
+| `assistant.response.final` | `text` | Complete assistant text output |
+| `assistant.tool_call` | `tool_call_id`, `tool_name`, `arguments` | Tool call notification |
+| `assistant.tool_result` | `tool_call_id`, `ok`, `result` | Tool execution result |
+| `output.audio.start` | - | TTS audio started |
+| `output.audio.end` | - | TTS audio ended |
+| `response.interrupted` | - | Response was interrupted |
+| `metrics.ttfb` | `latencyMs` | Latency to the first audio packet |
+
+---
+
+### Error Codes
+
+| Error code | Description |
+|---|---|
+| `protocol.invalid_json` | Malformed JSON |
+| `protocol.invalid_message` | Invalid message format |
+| `protocol.order` | Message sent out of order |
+| `protocol.version_unsupported` | Unsupported protocol version |
+| `auth.invalid_api_key` | Invalid API key |
+| `auth.required` | Authentication required |
+| `audio.invalid_pcm` | Invalid PCM data |
+| `audio.frame_size_mismatch` | Audio frame size mismatch |
+| `server.internal` | Internal server error |
+
+---
+
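+### Heartbeat and Timeouts
+
+- **Heartbeat interval**: 50 seconds by default (`heartbeat_interval_sec`)
+- **Inactivity timeout**: 60 seconds by default (`inactivity_timeout_sec`)
+- Clients should keep sending audio or lightweight messages to avoid being treated as idle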
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 00115aa..4197f78 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -19,3 +19,5 @@ nav:
       - 自动化测试: "features/autotest.md"
       - 语音合成: "features/voices.md"
   - 部署指南: "deployment.md"
+  - API Reference:
+      - WebSocket: "api-reference.md"