Enhance WebSocket session management by requiring assistant_id as a query parameter for connection. Update API reference documentation to reflect changes in message flow and metadata validation rules, including the introduction of whitelists for allowed metadata fields and restrictions on sensitive keys. Refactor client examples to align with the new session initiation process.

This commit is contained in:
Xin Wang
2026-03-01 14:10:38 +08:00
parent b4fa664d73
commit 6a46ec69f4
14 changed files with 725 additions and 424 deletions

View File

@@ -5,7 +5,7 @@ This document defines the public WebSocket protocol for the `/ws` endpoint.
Validation policy:
- WS v1 JSON control messages are validated strictly.
- Unknown top-level fields are rejected for all defined client message types.
- `hello.version` is fixed to `"v1"`.
- `assistant_id` query parameter is required on `/ws`.
## Transport
@@ -17,29 +17,16 @@ Validation policy:
Required message order:
1. Client sends `hello`.
2. Server replies `hello.ack`.
3. Client sends `session.start`.
4. Server replies `session.started`.
5. Client may stream binary audio and/or send `input.text`.
6. Client sends `session.stop` (or closes socket).
1. Client connects to `/ws?assistant_id=<id>`.
2. Client sends `session.start`.
3. Server replies `session.started`.
4. Client may stream binary audio and/or send `input.text`.
5. Client sends `session.stop` (or closes socket).
If order is violated, server emits `error` with `code = "protocol.order"`.
## Client -> Server Messages
### `hello`
```json
{
"type": "hello",
"version": "v1"
}
```
Rules:
- `version` must be `v1`.
### `session.start`
```json
@@ -51,15 +38,18 @@ Rules:
"channels": 1
},
"metadata": {
"appId": "assistant_123",
"channel": "web",
"configVersionId": "cfg_20260217_01",
"client": "web-debug",
"output": {
"mode": "audio"
"source": "web-debug",
"history": {
"userId": 1
},
"overrides": {
"output": {
"mode": "audio"
},
"systemPrompt": "You are concise.",
"greeting": "Hi, how can I help?"
},
"systemPrompt": "You are concise.",
"greeting": "Hi, how can I help?",
"dynamicVariables": {
"customer_name": "Alice",
"plan_tier": "Pro"
@@ -69,9 +59,13 @@ Rules:
```
Rules:
- Client-side `metadata.services` is ignored.
- Service config (including secrets) is resolved server-side (env/backend).
- Client should pass stable IDs (`appId`, `channel`, `configVersionId`) plus small runtime overrides (e.g. `output`, `bargeIn`, greeting/prompt style hints).
- Assistant config is resolved strictly by URL query `assistant_id`.
- `metadata` top-level keys allowed: `overrides`, `dynamicVariables`, `channel`, `source`, `history`, `workflow` (`workflow` is ignored).
- `metadata.overrides` whitelist: `systemPrompt`, `greeting`, `firstTurnMode`, `generatedOpenerEnabled`, `output`, `bargeIn`, `knowledgeBaseId`, `knowledge`, `tools`, `openerAudio`.
- `metadata.services` is rejected with `protocol.invalid_override`.
- `metadata.workflow` is ignored in this MVP protocol version.
- Top-level IDs are forbidden in payload (`assistantId`, `appId`, `app_id`, `configVersionId`, `config_version_id`).
- Secret-like keys are forbidden in metadata (`apiKey`, `token`, `secret`, `password`, `authorization`).
- `metadata.dynamicVariables` is optional and must be an object of string key/value pairs.
- Key pattern: `^[a-zA-Z_][a-zA-Z0-9_]{0,63}$`
- Max entries: 30
@@ -85,7 +79,7 @@ Rules:
- Invalid `dynamicVariables` payload rejects `session.start` with `protocol.dynamic_variables_invalid`.
Text-only mode:
- Set `metadata.output.mode = "text"`.
- Set `metadata.overrides.output.mode = "text"`.
- In this mode server still sends `assistant.response.delta/final`, but will not emit audio frames or `output.audio.start/end`.
### `input.text`
@@ -158,8 +152,6 @@ Envelope notes:
Common events:
- `hello.ack`
- Fields: `sessionId`, `version`
- `session.started`
- Fields: `sessionId`, `trackId`, `tracks`, `audio`
- `config.resolved`
@@ -204,7 +196,7 @@ Common events:
Track IDs (MVP fixed values):
- `audio_in`: ASR/VAD input-side events (`input.*`, `transcript.*`)
- `audio_out`: assistant output-side events (`assistant.*`, `output.audio.*`, `response.interrupted`, `metrics.ttfb`)
- `control`: session/control events (`session.*`, `hello.*`, `error`, `config.resolved`)
- `control`: session/control events (`session.*`, `error`, `config.resolved`)
Correlation IDs (`event.data`):
- `turn_id`: one user-assistant interaction turn.