Add output.audio.played message handling and update documentation

- Introduced `output.audio.played` message type for client acknowledgment of audio playback completion.
- Updated `DuplexPipeline` to track client playback state and handle playback completion events.
- Enhanced session handling to route `output.audio.played` messages to the pipeline.
- Revised API documentation to include details about the new message type and its fields.
- Updated schema documentation to reflect the addition of `output.audio.played` in the message flow.
Xin Wang
2026-03-04 10:01:34 +08:00
parent 80fff09b76
commit 7d4af18815
8 changed files with 275 additions and 19 deletions

@@ -20,7 +20,7 @@ Required message order:
1. Client connects to `/ws?assistant_id=<id>`.
2. Client sends `session.start`.
3. Server replies `session.started`.
4. Client may stream binary audio and/or send `input.text`.
4. Client may stream binary audio and/or send `input.text`, `response.cancel`, `output.audio.played`, or `tool_call.results`.
5. Client sends `session.stop` (or closes socket).
If order is violated, server emits `error` with `code = "protocol.order"`.
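The ordering rule above can be sketched as a small state machine. This is only an illustration, not the server's actual code: `OrderChecker` and `IN_SESSION_TYPES` are hypothetical names, the flat `{"type": "error", "code": ...}` envelope is an assumption about the error shape, and binary audio frames are left out for brevity.

```python
from typing import Optional

# Message types a client may send after session.start, per step 4 above.
IN_SESSION_TYPES = {
    "input.text",
    "response.cancel",
    "output.audio.played",
    "tool_call.results",
}


class OrderChecker:
    """Hypothetical helper that flags out-of-order client messages."""

    def __init__(self) -> None:
        # State moves connected -> started -> stopped.
        self.state = "connected"

    def check(self, msg_type: str) -> Optional[dict]:
        """Return None if the message is allowed, else an error event."""
        if self.state == "connected" and msg_type == "session.start":
            self.state = "started"
            return None
        if self.state == "started" and msg_type in IN_SESSION_TYPES:
            return None
        if self.state == "started" and msg_type == "session.stop":
            self.state = "stopped"
            return None
        # Assumed envelope: only the error type and code come from the doc.
        return {"type": "error", "code": "protocol.order"}
```

A caller would run `check()` on each incoming JSON message's `type` before routing it, and send back the returned error event when it is not `None`.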
@@ -100,6 +100,22 @@ Text-only mode:
}
```
### `output.audio.played`
Client playback acknowledgment, sent once the assistant audio has fully drained from the local speakers
(including any jitter buffer or playback queue).
```json
{
"type": "output.audio.played",
"tts_id": "tts_001",
"response_id": "resp_001",
"turn_id": "turn_001",
"played_at_ms": 1730000018450,
"played_ms": 2520
}
```
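As a rough client-side sketch, the message could be assembled as below. `build_audio_played` is a hypothetical helper, not part of the documented API. It assumes `played_at_ms` is a Unix epoch timestamp in milliseconds and `played_ms` is the played duration in milliseconds; the example values suggest this, but the source does not state it.

```python
import json
import time
from typing import Optional


def build_audio_played(
    tts_id: str,
    response_id: str,
    turn_id: str,
    played_ms: int,
    played_at_ms: Optional[int] = None,
) -> str:
    """Serialize an output.audio.played message (hypothetical helper)."""
    if played_at_ms is None:
        # Assumption: played_at_ms is wall-clock time in epoch milliseconds.
        played_at_ms = int(time.time() * 1000)
    return json.dumps(
        {
            "type": "output.audio.played",
            "tts_id": tts_id,
            "response_id": response_id,
            "turn_id": turn_id,
            "played_at_ms": played_at_ms,
            # Assumption: played_ms is the duration of audio actually played.
            "played_ms": played_ms,
        }
    )
```

The client would send the returned string as a text frame only after its playback queue reports that the last buffered sample has been played, not when the final audio chunk arrives.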
### `session.stop`
```json
@@ -223,6 +239,8 @@ Framing rules:
TTS boundary events:
- `output.audio.start` and `output.audio.end` mark assistant playback boundaries.
- `output.audio.end` means server-side audio send completed (not guaranteed speaker drain).
- For speaker-drain confirmation, client should send `output.audio.played`.
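The distinction between "sent" and "played" can be illustrated with a minimal tracker. This is a sketch under stated assumptions, not the actual `DuplexPipeline` logic: `PlaybackTracker` and its method names are hypothetical, and it assumes the ACK is matched to a TTS segment by `tts_id` alone.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PlaybackTracker:
    """Hypothetical tracker for segments sent but not yet confirmed played."""

    pending: Set[str] = field(default_factory=set)

    def on_audio_end(self, tts_id: str) -> None:
        # Server finished sending; the client may still be playing buffered audio.
        self.pending.add(tts_id)

    def on_audio_played(self, msg: dict) -> bool:
        """Handle an output.audio.played message; True if it matched a segment."""
        tts_id = msg.get("tts_id")
        if tts_id in self.pending:
            self.pending.discard(tts_id)
            return True
        return False  # Unknown or duplicate ACK; ignored in this sketch.

    def client_still_playing(self) -> bool:
        return bool(self.pending)
```

With such a tracker, the server could treat the assistant as still speaking between `output.audio.end` and the matching `output.audio.played`, for example when deciding whether incoming user speech counts as an interruption.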
## Event Throttling