pipecat

Author	SHA1	Message	Date
Mark Backman	f1a3ee97de	fix: surface TTSSpeakFrame greetings in on_assistant_turn_stopped Two issues were causing TTSSpeakFrame(append_to_context=True) greetings to silently lose their trailing words and never fire on_assistant_turn_stopped: - LLMAssistantPushAggregationFrame was emitted without a PTS, so the transport routed it through the audio (sync) queue while word-level TTSTextFrames travel through the clock queue. The aggregation could reach the assistant aggregator before the final words, leaving them orphaned in the buffer. Stamp the frame with `_word_last_pts + 1` when there are word timestamps so it can't overtake them. - The aggregator's LLMAssistantPushAggregationFrame handler called push_aggregation() directly, bypassing _trigger_assistant_turn_stopped. For TTS-only flows there is no LLMFullResponseStartFrame, so the turn start timestamp was never set and on_assistant_turn_stopped never fired. Open a turn (if needed) and trigger stopped from the handler. Fixes #4264.	2026-05-04 10:41:22 -04:00
Ian Lee	b435ddfa44	feat(tts): add includes_inter_frame_spaces flag to word-timestamp API Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and punctuation are already embedded in the token text. When downstream consumers join these tokens with an extra space they produce "hello , world" instead of "hello, world". Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to `add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through `_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`. Defaults to `False` — no behaviour change for existing services. `InworldTTSService` passes `includes_inter_frame_spaces=True` and stops pre-processing tokens in `_calculate_word_times`, returning them verbatim. Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket delivery paths: verbatim text preservation, PTS ordering, text-before-audio ordering, and the Inworld punctuation-token scenario. Made-with: Cursor	2026-04-18 12:03:32 -07:00
Aleix Conchillo Flaqué	b3bb6fdaa5	Modernize Python typing across the codebase Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311): - Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type` with their built-in equivalents (`list`, `dict`, `tuple`, etc.) - Replace `typing.Optional[X]` with `X | None` - Replace `typing.Union[X, Y]` with `X | Y` - Move `Mapping`, `Sequence`, `Callable`, `Awaitable`, `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`, `AsyncGenerator` imports from `typing` to `collections.abc` - Remove now-unused `typing` imports - Add `from __future__ import annotations` to 5 files that use forward-reference strings in `X | "Y"` annotations	2026-04-16 09:28:23 -07:00
Mark Backman	5d71de8aad	Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame Route LLMFullResponseEndFrame through the serialization queue instead of pushing it directly downstream when push_text_frames is enabled. This ensures the frame is emitted only after the audio context is fully drained, preserving correct ordering relative to TTSTextFrames. Previously, the final sentence TTSTextFrame would arrive at the LLMAssistantAggregator after LLMFullResponseEndFrame, causing it to be dropped from the conversation context (especially with RTVI text input where no subsequent interruption would flush the orphaned text).	2026-03-24 15:09:42 -04:00
filipi87	5fd98e1391	Fixing TTS frame order.	2026-03-19 09:43:40 -03:00
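
The ordering problem described in f1a3ee97de (a frame without a PTS overtaking word-level frames that wait in the clock queue) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `delivery_order` and the frame names are hypothetical, frames without a PTS are treated as delivered immediately, and none of this is pipecat's actual transport code.

```python
import heapq
from itertools import count


def delivery_order(frames):
    """Return frame names in delivery order for a toy two-queue transport.

    `frames` is a list of (name, pts) pairs. In this simplified model,
    frames with pts=None bypass the clock queue and are delivered first;
    the remaining frames are delivered in ascending pts order.
    """
    immediate = [name for name, pts in frames if pts is None]
    tie = count()  # tie-breaker so equal pts values keep insertion order
    heap = [(pts, next(tie), name) for name, pts in frames if pts is not None]
    heapq.heapify(heap)
    timed = [heapq.heappop(heap)[2] for _ in range(len(heap))]
    return immediate + timed


words = [("hello", 100), ("world", 200)]

# Without a PTS, the aggregation frame overtakes the word frames.
unstamped = delivery_order(words + [("push_aggregation", None)])

# Stamped with last word PTS + 1, it can no longer overtake them.
last_word_pts = max(pts for _, pts in words)
stamped = delivery_order(words + [("push_aggregation", last_word_pts + 1)])

print(unstamped)
print(stamped)
```

In the unstamped case the aggregation frame arrives before the final words, which is how they ended up orphaned in the buffer; stamping it one tick past the last word keeps it behind them.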

5 Commits
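
The "hello , world" symptom described in b435ddfa44 can be reproduced with a minimal sketch. The `WordFrame` class and `join_frames` helper below are hypothetical stand-ins written for illustration; only the flag name `includes_inter_frame_spaces` comes from the commit message, and the real downstream consumers may join text differently.

```python
from dataclasses import dataclass


@dataclass
class WordFrame:
    """Hypothetical stand-in for a word-level text frame (not pipecat's TTSTextFrame)."""

    text: str
    includes_inter_frame_spaces: bool = False


def join_frames(frames):
    """Join frame texts, adding a space only before frames that lack embedded spacing."""
    parts = []
    for frame in frames:
        if parts and not frame.includes_inter_frame_spaces:
            parts.append(" ")
        parts.append(frame.text)
    return "".join(parts)


# Verbatim provider tokens joined as if they were plain words.
plain = [WordFrame("hello"), WordFrame(","), WordFrame("world")]

# The same tokens flagged as already carrying their own spacing.
verbatim = [
    WordFrame("hello", True),
    WordFrame(",", True),
    WordFrame(" world", True),
]

print(join_frames(plain))     # hello , world
print(join_frames(verbatim))  # hello, world
```

Because the flag defaults to `False` in this sketch as well, frames from services that emit plain words are still joined with a single space.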