Add output.audio.played message handling and update documentation

- Introduced `output.audio.played` message type for client acknowledgment of audio playback completion.
- Updated `DuplexPipeline` to track client playback state and handle playback completion events.
- Enhanced session handling to route `output.audio.played` messages to the pipeline.
- Revised API documentation to include details about the new message type and its fields.
- Updated schema documentation to reflect the addition of `output.audio.played` in the message flow.
2026-03-04 10:01:34 +08:00
parent 80fff09b76
commit 7d4af18815
8 changed files with 275 additions and 19 deletions
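
Illustrative sketch (not part of this commit): one way a client could build the `output.audio.played` acknowledgment, and one way a server could track which TTS segments are still unacknowledged. The field names come from the schema doc changed in this commit. Everything else is an assumption: `build_played_ack`, `to_wire`, and `PlaybackTracker` are hypothetical names, `played_at_ms` is treated as wall-clock epoch milliseconds (the sample value suggests this, but the docs do not state it), and the tracker is not the actual `DuplexPipeline` logic.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def build_played_ack(
    tts_id: str,
    response_id: str,
    turn_id: str,
    played_ms: int,
    played_at_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an output.audio.played payload using the documented field names."""
    if played_at_ms is None:
        # Assumption: epoch milliseconds at the moment the speaker drained.
        played_at_ms = int(time.time() * 1000)
    return {
        "type": "output.audio.played",
        "tts_id": tts_id,
        "response_id": response_id,
        "turn_id": turn_id,
        "played_at_ms": played_at_ms,
        "played_ms": played_ms,
    }


def to_wire(message: Dict[str, Any]) -> str:
    """Serialize a message as a JSON text frame."""
    return json.dumps(message)


@dataclass
class PlaybackTracker:
    """Hypothetical server-side bookkeeping for unacknowledged TTS segments."""

    # tts_id values whose audio was fully sent but not yet confirmed as played.
    pending: Dict[str, bool] = field(default_factory=dict)

    def on_audio_end(self, tts_id: str) -> None:
        # output.audio.end: server finished sending; speaker drain is unknown.
        self.pending[tts_id] = True

    def on_played(self, message: Dict[str, Any]) -> bool:
        """Apply a client ACK; return True only if it matched a pending segment."""
        if message.get("type") != "output.audio.played":
            return False
        tts_id = message.get("tts_id")
        if tts_id not in self.pending:
            return False  # unknown or duplicate ACK is ignored
        del self.pending[tts_id]
        return True

    def client_is_playing(self) -> bool:
        # True while any sent segment still awaits an ACK.
        return bool(self.pending)
```

In this sketch a duplicate or unknown ACK is ignored rather than reported as an error; the commit does not say which behavior the server implements.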
--- a/engine/docs/ws_v1_schema.md
+++ b/engine/docs/ws_v1_schema.md
@@ -20,7 +20,7 @@ Required message order:
 1. Client connects to `/ws?assistant_id=<id>`.
 2. Client sends `session.start`.
 3. Server replies `session.started`.
-4. Client may stream binary audio and/or send `input.text`.
+4. Client may stream binary audio and/or send `input.text`, `response.cancel`, `output.audio.played`, `tool_call.results`.
 5. Client sends `session.stop` (or closes socket).

 If order is violated, server emits `error` with `code = "protocol.order"`.
@@ -100,6 +100,22 @@ Text-only mode:
 }
 ```

+### `output.audio.played`
+
+Client playback ACK after assistant audio is actually drained on local speakers
+(including jitter buffer / playback queue).
+
+```json
+{
+  "type": "output.audio.played",
+  "tts_id": "tts_001",
+  "response_id": "resp_001",
+  "turn_id": "turn_001",
+  "played_at_ms": 1730000018450,
+  "played_ms": 2520
+}
+```
+
 ### `session.stop`

 ```json
@@ -223,6 +239,8 @@ Framing rules:

 TTS boundary events:
 - `output.audio.start` and `output.audio.end` mark assistant playback boundaries.
+- `output.audio.end` means server-side audio send completed (not guaranteed speaker drain).
+- For speaker-drain confirmation, client should send `output.audio.played`.

 ## Event Throttling