Two issues were causing TTSSpeakFrame(append_to_context=True) greetings
to silently lose their trailing words and never fire
on_assistant_turn_stopped:

- LLMAssistantPushAggregationFrame was emitted without a PTS, so the
  transport routed it through the audio (sync) queue while word-level
  TTSTextFrames travel through the clock queue. The aggregation could
  reach the assistant aggregator before the final words, leaving them
  orphaned in the buffer. Stamp the frame with `_word_last_pts + 1`
  when there are word timestamps so it can't overtake them.

- The aggregator's LLMAssistantPushAggregationFrame handler called
  push_aggregation() directly, bypassing
  _trigger_assistant_turn_stopped. For TTS-only flows there is no
  LLMFullResponseStartFrame, so the turn start timestamp was never set
  and on_assistant_turn_stopped never fired. Open a turn (if needed)
  and trigger stopped from the handler.

Fixes #4264.
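
The ordering problem in the first item can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the transport's actual code: `Frame`, `deliver`, and the rule that every frame without a PTS is delivered ahead of the clock queue are all hypothetical stand-ins. It shows why stamping the aggregation frame with the last word PTS plus one keeps it behind the words.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    # Hypothetical stand-in for a pipeline frame, not a real library class.
    name: str
    pts: Optional[int] = None  # presentation timestamp; None means "no PTS"


def deliver(frames: List[Frame]) -> List[str]:
    """Toy transport: frames without a PTS go through a sync queue and are
    assumed to arrive first; frames with a PTS leave a clock queue in PTS order."""
    sync_queue = [f for f in frames if f.pts is None]
    # The index breaks ties so frames with equal PTS keep insertion order.
    clock_queue = [(f.pts, i, f) for i, f in enumerate(frames) if f.pts is not None]
    heapq.heapify(clock_queue)

    delivered = [f.name for f in sync_queue]
    while clock_queue:
        _, _, frame = heapq.heappop(clock_queue)
        delivered.append(frame.name)
    return delivered


words = [Frame("Hello", 100), Frame("there", 200), Frame("friend", 300)]
word_last_pts = max(f.pts for f in words)

# Without a PTS the aggregation frame overtakes the word frames.
unstamped = deliver(words + [Frame("push_aggregation")])

# Stamped with word_last_pts + 1 it is released only after the last word.
stamped = deliver(words + [Frame("push_aggregation", word_last_pts + 1)])

print(unstamped)
print(stamped)
```

In this model the unstamped run delivers the aggregation frame before any word, while the stamped run delivers it last. The real failure was narrower (only the final words were overtaken), but the ordering guarantee is the same.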