diff --git a/changelog/4380.fixed.2.md b/changelog/4380.fixed.2.md
new file mode 100644
index 000000000..df4b4e07d
--- /dev/null
+++ b/changelog/4380.fixed.2.md
@@ -0,0 +1 @@
+- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
diff --git a/changelog/4380.fixed.3.md b/changelog/4380.fixed.3.md
new file mode 100644
index 000000000..824129ff0
--- /dev/null
+++ b/changelog/4380.fixed.3.md
@@ -0,0 +1 @@
+- Fixed Cartesia word timestamps leaking SSML tag text (e.g. ``, ``, ``) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
diff --git a/changelog/4380.fixed.4.md b/changelog/4380.fixed.4.md
new file mode 100644
index 000000000..afd634b27
--- /dev/null
+++ b/changelog/4380.fixed.4.md
@@ -0,0 +1 @@
+- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `4111 1111 1111 1111`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.
diff --git a/changelog/4380.fixed.md b/changelog/4380.fixed.md
new file mode 100644
index 000000000..ee61e723a
--- /dev/null
+++ b/changelog/4380.fixed.md
@@ -0,0 +1 @@
+- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
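
Reviewer note on the equal-PTS fix: the sketch below shows one common way to get insertion-order output for items that share a timestamp, by adding a monotonically increasing sequence number as a tie-breaker in a heap. It is only an illustration of the behavior the changelog describes, not the code in `BaseOutputTransport`; the `StableTimedQueue` class and `drain` helper are hypothetical names introduced here.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class StableTimedQueue:
    """Hypothetical helper: orders items by PTS, ties broken by insertion order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        # Monotonic counter; its value is unique per entry, so two entries
        # never compare equal on (pts, seq) and the payload is never compared.
        self._counter = itertools.count()

    def put(self, pts: int, frame: Any) -> None:
        heapq.heappush(self._heap, (pts, next(self._counter), frame))

    def get(self) -> Any:
        _pts, _seq, frame = heapq.heappop(self._heap)
        return frame

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: StableTimedQueue) -> List[Any]:
    """Pop every item in priority order and return the payloads as a list."""
    out = []
    while len(queue):
        out.append(queue.get())
    return out


if __name__ == "__main__":
    q = StableTimedQueue()
    q.put(100, "audio")
    q.put(100, "text")
    q.put(50, "early")
    q.put(100, "end")
    print(drain(q))
```

Without the sequence number, a heap keyed on PTS alone gives no ordering guarantee among equal keys, which is the kind of reordering the first changelog entry refers to.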