Add output.audio.played message handling and update documentation

- Introduced `output.audio.played` message type for client acknowledgment of audio playback completion.
- Updated `DuplexPipeline` to track client playback state and handle playback completion events.
- Enhanced session handling to route `output.audio.played` messages to the pipeline.
- Revised API documentation to include details about the new message type and its fields.
- Updated schema documentation to reflect the addition of `output.audio.played` in the message flow.
Xin Wang
2026-03-04 10:01:34 +08:00
parent 80fff09b76
commit 7d4af18815
8 changed files with 275 additions and 19 deletions

@@ -30,6 +30,7 @@ Server <- assistant.response.delta / assistant.response.final
Server <- output.audio.start
Server <- (binary pcm frames...)
Server <- output.audio.end
Client -> output.audio.played (optional)
Client -> session.stop
Server <- session.stopped
```
@@ -143,7 +144,33 @@ Server <- session.stopped
---
### 4. Tool Call Results: `tool_call.results`
### 4. Output Audio Played: `output.audio.played`
The client acknowledges that the audio has finished playing locally (including the local jitter buffer / playback queue).
```json
{
"type": "output.audio.played",
"tts_id": "tts_001",
"response_id": "resp_001",
"turn_id": "turn_001",
"played_at_ms": 1730000018450,
"played_ms": 2520
}
```
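| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"output.audio.played"` |
| `tts_id` | string | Yes | ID of the TTS segment that finished playing |
| `response_id` | string | No | ID of the owning response (recommended to echo back) |
| `turn_id` | string | No | ID of the owning turn (recommended to echo back) |
| `played_at_ms` | number | No | Client-local timestamp at which playback finished (milliseconds) |
| `played_ms` | number | No | Duration of this playback (milliseconds) |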
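
As a rough illustration of the fields above, a client could assemble the acknowledgment along these lines. The helper name `build_audio_played` is hypothetical and not part of this commit; the sketch assumes the message is sent as JSON text and that unset optional fields are simply omitted.

```python
import json
from typing import Optional


def build_audio_played(
    tts_id: str,
    response_id: Optional[str] = None,
    turn_id: Optional[str] = None,
    played_at_ms: Optional[int] = None,
    played_ms: Optional[int] = None,
) -> str:
    """Serialize an output.audio.played acknowledgment (hypothetical helper)."""
    if not tts_id:
        # tts_id is the only required field besides type.
        raise ValueError("tts_id is required")

    message = {"type": "output.audio.played", "tts_id": tts_id}
    optional = {
        "response_id": response_id,
        "turn_id": turn_id,
        "played_at_ms": played_at_ms,
        "played_ms": played_ms,
    }
    # Include only the optional fields the caller actually provided.
    message.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(message)
```

A client would call this once its local playback queue for the segment has drained, then send the returned string over the existing session connection.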
---
### 5. Tool Call Results: `tool_call.results`
Returns the results of tools executed by the client.
@@ -174,7 +201,7 @@ Server <- session.stopped
---
### 5. Session Stop: `session.stop`
### 6. Session Stop: `session.stop`
Ends the conversation session.
@@ -192,7 +219,7 @@ Server <- session.stopped
---
### 6. Binary Audio
### 7. Binary Audio
After `session.started`, binary PCM audio can be sent continuously.
@@ -707,6 +734,8 @@ Marks the end of TTS audio playback.
| `data.tts_id` | string | TTS playback segment ID |
| `data.turn_id` | string | Current conversation turn ID |
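**Note:** `output.audio.end` means the server has finished sending the audio; it does not mean the client's speaker has finished playing it. If a "truly played" signal is needed, the client should send `output.audio.played`.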
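
The distinction between "sent" and "played" can be sketched as a small per-segment state tracker. This is not the `DuplexPipeline` code from this commit; `PlaybackTracker` and its methods are hypothetical names, and ignoring acknowledgments for unknown segments is an assumption.

```python
from typing import Dict


class PlaybackTracker:
    """Tracks whether each TTS segment was only sent or also played (illustrative)."""

    def __init__(self) -> None:
        # Maps tts_id to "sent" or "played".
        self._state: Dict[str, str] = {}

    def on_audio_end(self, tts_id: str) -> None:
        # Server side: output.audio.end was emitted, so sending is complete.
        self._state[tts_id] = "sent"

    def on_audio_played(self, message: dict) -> bool:
        # Client acknowledgment: accept it only for a segment we actually sent.
        tts_id = message.get("tts_id")
        if message.get("type") != "output.audio.played" or not tts_id:
            return False
        if tts_id not in self._state:
            return False
        self._state[tts_id] = "played"
        return True

    def is_played(self, tts_id: str) -> bool:
        return self._state.get(tts_id) == "played"
```

Until `on_audio_played` returns `True` for a segment, a consumer of this tracker would treat the audio as possibly still queued on the client.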
---
#### `response.interrupted`