pipecat

Author	SHA1	Message	Date
Paul Kompfner	2c65713c99	refactor: explicit kind=='final' check in async-tool routing (Grok) Mirrors the same change applied to AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: replaces the implicit "final happens last" pattern in _process_completed_function_calls with an explicit `if async_payload.kind == "final":` block, plus a trailing defensive `continue` so async-tool messages with an unrecognized kind don't fall through to the regular tool-result handling block.	2026-05-08 15:45:05 -04:00
Paul Kompfner	b14a03d01f	fix: extend cancel_on_interruption=False regression fix to remaining realtime services Applies the same async-tool message routing introduced for AWSNovaSonicLLMService and OpenAIRealtimeLLMService to additional realtime LLM services where the flag's intent ("keep talking while the tool runs") is achievable: - GrokRealtimeLLMService (xAI Realtime — also benefits the deprecated Grok alias since it re-exports the xAI module) - AzureRealtimeLLMService picks up the fix transitively by inheriting from OpenAIRealtimeLLMService — no code change needed. GrokRealtimeLLMService's _process_completed_function_calls now matches the canonical pattern: skip LLMSpecificMessage, detect async-tool messages via parse_message and route them — started skipped silently, intermediate logged as an error and surfaced via push_error, final delivered through the same channel as a synchronous result. UltravoxRealtimeLLMService instead gets a one-time warning when async-tool messages appear in the context. The Ultravox API freezes the conversation during tool execution (https://docs.ultravox.ai/tools/async-tools#custom-tool-timeouts), so the flag's "keep talking while the tool runs" intent isn't achievable there — applying the same code pattern would mislead users into expecting a UX Ultravox can't deliver. Surfacing a clear warning is the right behavior until Ultravox grows true async tool support. Adds async-tool example files for Grok and Azure modeled on the existing Nova Sonic / OpenAI Realtime ones (10s simulated network delay, weather tool registered with cancel_on_interruption=False). Two services remain excluded: - GeminiLiveLLMService — the async-tool path needs deeper investigation. - InworldRealtimeLLMService — appears to have a pre-existing problem with even simple synchronous tool calling on its Realtime API (the request reaches the server fine, but response generation fails with a generic server_error).	2026-05-08 15:43:53 -04:00
Paul Kompfner	ad0f0a1294	refactor: explicit kind=='final' check in async-tool routing Replaces the implicit "final happens last" pattern in _process_completed_function_calls with an explicit `if async_payload.kind == "final":` block in both AWSNovaSonicLLMService and OpenAIRealtimeLLMService. Adds a trailing defensive `continue` so async-tool messages with an unrecognized kind don't fall through to the regular tool-result handling block — clearer at the call site, and safer against future additions to AsyncToolMessageKind.	2026-05-08 15:43:37 -04:00
Paul Kompfner	72d0fb418a	fix: restore cancel_on_interruption=False support in AWS Nova Sonic and OpenAI Realtime Before the new async-tool mechanism landed, AWSNovaSonicLLMService and OpenAIRealtimeLLMService honored cancel_on_interruption=False by simply not cancelling in-flight function calls on interruption — the eventual result then flowed through the same channel as any synchronous tool result. The new mechanism (which appends started/intermediate/final messages to the LLM context as the underlying task progresses) broke that path: the realtime services didn't know how to interpret those messages, and the eventual result was never delivered to the provider. Restore the flag's behavior by teaching both services to detect async-tool messages in the context and route them appropriately: - started → skipped silently. The provider already issued the tool call and natively awaits a result; nothing to send for the started marker. - final → delivered via the formal tool-result channel. Same path as a synchronous tool result, just delayed. Streamed intermediate results (FunctionCallResultProperties(is_final=False)) are not supported on these realtime services. An intermediate result is logged as an error and surfaced via push_error, then dropped. Use a non-realtime LLM service if a tool needs to stream intermediate results. (Docstrings on register_function, register_direct_function, and FunctionCallResultProperties.is_final updated to call this out.) A new shared module pipecat.processors.aggregators.async_tool_messages is the single source of truth for the on-the-wire payload shape: the aggregator uses its build_*_message functions when injecting messages, and the realtime services use parse_message when scanning the context. Adds two example files exercising a network-delayed weather tool with each service. The plain realtime-aws-nova-sonic.py example is also reverted to a synchronous tool call now that the async variant lives in its own file. Similar fixes for other realtime services are forthcoming.	2026-05-08 09:33:06 -04:00
Aleix Conchillo Flaqué	94a94ee28c	Merge pull request #4405 from pipecat-ai/aleix/user-turn-inference-event Split user-turn-stop into inference-triggered and finalized events	2026-05-07 17:51:57 -07:00
Mark Backman	c46ede8335	Use Sphinx .. deprecated:: directive for deprecated aggregator params Aligns deprecation docstrings on LLMUserAggregatorParams and LLMAssistantAggregatorParams with CONTRIBUTING.md conventions: present-tense parameter descriptions plus a `.. deprecated:: 1.2.0` directive noting replacement and 2.0.0 removal. Also adds a runtime DeprecationWarning for `user_turn_completion_config`, which previously had no warning despite being deprecated.	2026-05-07 17:49:00 -07:00
Mark Backman	457a68ce64	Correct docstrings and comments regarding incomplete_long_timeout duration, 10 sec	2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué	b78cecf7b2	Rename UserTurnCompletedFrame to UserTurnInferenceCompletedFrame The old name overlapped semantically with `UserStoppedSpeakingFrame`: both could be read as "the user's turn is done." They're at different layers — `UserStoppedSpeakingFrame` is the acoustic stop signal, while this frame is the post-judgment "inference about the turn is now complete (turn is semantically final)" signal emitted by the LLM mixin (on ✓), an end-of-turn classifier, or a custom producer. The new name pairs naturally with the existing `on_user_turn_inference_triggered` event vocabulary and removes the ambiguity with `UserStoppedSpeakingFrame`.	2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué	952dddca8b	Replace llm_completion_user_turn_stop_strategies() with FilterIncompleteUserTurnStrategies Wrap the detector chain with `deferred(...)` and append the LLM completion gate via a `UserTurnStrategies` specialization rather than a free-standing helper, mirroring the existing `ExternalUserTurnStrategies` pattern. The class lives next to other strategy containers in `pipecat.turns.user_turn_strategies`, so users discover it where they're already configuring `user_turn_strategies`. The deprecated `filter_incomplete_user_turns` flag now rewires through `FilterIncompleteUserTurnStrategies` under the hood, keeping the migration path identical to before. `deferred(...)` stays public as the explicit escape hatch for non-default compositions.	2026-05-07 17:47:39 -07:00
Aleix Conchillo Flaqué	e3e90d38aa	Preserve full user transcript across multiple inferences in one turn When a stop-strategy chain splits inference-triggered from finalization (e.g. `LLMTurnCompletionUserTurnStopStrategy` gating a deferred detector), more than one inference can fire inside a single user turn — each adds the new transcription segment to the context. Previously each inference overwrote `_pending_user_turn_aggregation`, so the eventual `on_user_turn_stopped` event surfaced only the segment from the last inference, dropping anything the user said before it. Concatenate each segment into `_full_user_turn_aggregation` instead of overwriting, and combine that running buffer with any post-final-inference segment when emitting the public event.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	d1c8162b0c	Route turn-completion markers through LLMMarkerFrame Add an `LLMMarkerFrame(DataFrame)` for sideband LLM markers that need to be persisted to context but should not flow through the standard text path (TTS, transcript). The frame carries an `append_to_context_immediately` flag so the assistant aggregator can either commit the marker as a stand-alone message (○ / ◐) or merge it with the upcoming aggregation as a prefix on the response (✓). `UserTurnCompletionLLMServiceMixin` now emits `LLMMarkerFrame` instead of pushing the marker as `LLMTextFrame(skip_tts=True)`, which fixes the case where an incomplete-turn marker (○ / ◐) was aggregated by the assistant aggregator but never committed to the context because the assistant turn lifecycle didn't run to completion (no spoken response, no `LLMFullResponseEndFrame`-driven `push_aggregation`). The frame is intentionally generic so other components — STT services with built-in turn signals, end-of-turn classifiers, custom annotations — can use the same mechanism to inject sideband signals into the assistant context.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	1fa0310ea8	Add changelog for #4405	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	2281cd8359	Extract ExternalUserTurnCompletionStopStrategy as a reusable base `LLMTurnCompletionUserTurnStopStrategy` previously bundled two concerns: pushing `LLMUpdateSettingsFrame` on `StartFrame`, and finalizing the turn on `UserTurnCompletedFrame`. The latter is producer-agnostic — any component that emits `UserTurnCompletedFrame` (STT with built-in turn detection, dedicated end-of-turn classifiers, custom code) can drive finalization the same way. Move the frame-handling half into a new `ExternalUserTurnCompletionStopStrategy`. The LLM-specific subclass now only adds the settings-frame push and inherits finalization. Mirrors the existing `ExternalUserTurnStopStrategy` naming pattern.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	480eca42f5	Split user-turn-stop into inference-triggered and finalized events Fixes a real bug: with `filter_incomplete_user_turns` enabled, the smart-turn detector's tentative stop was firing `on_user_turn_stopped` before the LLM had a chance to veto it. Observers, transcript appenders and UI indicators received an early — and sometimes duplicated — signal. Decomposes the single stop concern into two events: - `on_user_turn_inference_triggered` fires when a stop strategy has enough signal to start LLM inference. The aggregator pushes the context here, kicking off the LLM call. - `on_user_turn_stopped` fires only when the user turn is semantically final. Built-in strategies fire both events at the same call site, preserving today's behavior for the common case. Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates finalization on a `UserTurnCompletedFrame` (a fieldless system frame emitted by any component judging turn completeness — currently the `UserTurnCompletionLLMServiceMixin` on `✓`). Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin wrapper that forwards an inner strategy's events except `on_user_turn_stopped`. Use this to install a stop strategy as an inference trigger only, leaving finalization to a peer (e.g. the LLM completion strategy). Adds `llm_completion_user_turn_stop_strategies()` for the common case: UserTurnStrategies( stop=llm_completion_user_turn_stop_strategies(), ) Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`. The aggregator emits a `DeprecationWarning`, wraps existing stop strategies with `deferred(...)`, and appends `LLMTurnCompletionUserTurnStopStrategy` automatically.	2026-05-07 17:46:09 -07:00
Mark Backman	1073510574	Merge pull request #4407 from pipecat-ai/mb/ui-agent-wire-format feat(rtvi): add UI Agent Protocol as first-class RTVI message types	2026-05-07 20:03:41 -04:00
Mark Backman	47c05f3f30	Simplify changelog entry	2026-05-07 16:58:08 -07:00
Mark Backman	24904b89f5	Merge pull request #4443 from Anrahya/fix-gemini-tts-voice-names fix: correct Gemini TTS voice names	2026-05-07 19:41:30 -04:00
orphis	c78977e4c7	chore: remove Gemini TTS voice name test	2026-05-08 05:03:15 +05:30
Mark Backman	f78b5f9240	Merge pull request #4446 from inworld-ai/ian/inworld-pcm [inworld] default to using PCM encoding	2026-05-07 19:25:57 -04:00
Ian Lee	406f8b730b	[inworld] default to using PCM encoding * server returns audio bytes without headers	2026-05-07 16:05:34 -07:00
Mark Backman	7a2cec2e45	Merge pull request #4426 from marcelodiaz558/feature/elevenlabs_stt_keyterms Add ElevenLabs STT keyterms support	2026-05-07 18:44:09 -04:00
Marcelo Díaz	edfcd6948b	Add ElevenLabs STT keyterms support	2026-05-07 21:00:26 +00:00
kompfner	991ee9e0e6	Merge pull request #4404 from pipecat-ai/pk/mitigate-calls-to-missing-tools Mitigate tool-call-related hallucination	2026-05-07 15:05:13 -04:00
Mark Backman	a696729343	Merge pull request #4439 from pipecat-ai/mb/fix-deprecation-video-out-bitrate	2026-05-07 14:42:26 -04:00
orphis	ba705e9501	chore: add changelog for Gemini TTS voice fix	2026-05-08 00:11:19 +05:30
orphis	98c370457b	fix: correct Gemini TTS voice names	2026-05-08 00:09:56 +05:30
Filipi da Silva Fuchter	6189e920e1	Merge pull request #4433 from pipecat-ai/filipi/refactoring_elevenlabs Refactoring ElevenLabs to send close_context as soon as the turn context is complete.	2026-05-07 13:10:36 -03:00
Filipi da Silva Fuchter	73625a273a	Merge pull request #4440 from pipecat-ai/filipi/daily_send_message_issue Fixing a race condition when cleaning up the daily transport.	2026-05-07 13:09:53 -03:00
filipi87	f91a55c97c	Changelog entry for the fix.	2026-05-07 11:32:48 -03:00
filipi87	5f256e241c	Fixing a race condition when cleaning up the daily transport.	2026-05-07 11:29:57 -03:00
Mark Backman	954f63dc7b	Document deprecation docstring convention in CLAUDE.md. Adds an explicit Code Style bullet for the `.. deprecated::` Sphinx directive (forbidding inline `[DEPRECATED]` tags) and extends the Docstring Example with a Pydantic params class showing the directive inside a `Parameters:` block — the context CONTRIBUTING.md's existing example didn't cover.	2026-05-07 10:03:43 -04:00
Mark Backman	6cc66a3df1	Update video_out_bitrate deprecation to use sphinx directive. Replaces the inline `[DEPRECATED]` tag with a `.. deprecated:: 1.1.0` directive per CONTRIBUTING.md docstring conventions, so the deprecation shows up properly in the rendered docs.	2026-05-07 09:57:21 -04:00
filipi87	a445399337	Fixing a bug in the ElevenLabs TTS refactor where alignment state was reset too early mid-turn.	2026-05-07 10:10:54 -03:00
filipi87	5ed2057599	Merge branch 'main' into filipi/refactoring_elevenlabs	2026-05-07 09:32:53 -03:00
Filipi da Silva Fuchter	cacde00e26	Merge pull request #4435 from pipecat-ai/filipi/uninterruptible_frame Refactoring TTSService to preserve uninterruptible frames.	2026-05-07 08:46:42 -03:00
Filipi da Silva Fuchter	b1b598f65e	Merge pull request #4434 from pipecat-ai/filipi/fix_interruption_regression Fix interruption blocked by slow non-uninterruptible frame in queue	2026-05-07 08:46:10 -03:00
filipi87	c48ee93892	Adding changelog entry for the fix.	2026-05-06 16:30:22 -03:00
filipi87	cf22dac171	Refactoring TTSService to preserve uninterruptible frames.	2026-05-06 16:26:45 -03:00
filipi87	36f6e22aee	Adding changelog for the interruption fix.	2026-05-06 15:39:27 -03:00
filipi87	921a7a46cb	Fix interruption blocked by slow non-uninterruptible frame in queue When a non-uninterruptible frame was being processed slowly and an uninterruptible frame was waiting in the queue, _start_interruption skipped task cancellation. This caused interruptions to stall until the slow frame finished, even though it had no reason to block them. The fix: only skip cancellation when the current frame is uninterruptible. Uninterruptible frames already in the queue are preserved regardless, because __create_process_task calls __reset_process_queue internally, which always retains them. Fixes: https://github.com/pipecat-ai/pipecat/issues/4412	2026-05-06 15:35:43 -03:00
filipi87	fda18a9afa	Adding changelog for the elevenlabs improvement.	2026-05-06 14:58:18 -03:00
filipi87	d146a7f8e0	Refactoring ElevenLabs to send close_context as soon as the turn context is complete.	2026-05-06 14:55:49 -03:00
Filipi da Silva Fuchter	90f0f7cd27	Merge pull request #4431 from pipecat-ai/filipi/tts_deadlock Fixing TTSService deadlock.	2026-05-06 14:52:04 -03:00
Mark Backman	37376b3506	Merge pull request #4429 from pipecat-ai/mb/update-grok-default-llm-model fix(xai): update default Grok model to grok-4.20-non-reasoning	2026-05-06 13:41:05 -04:00
Mark Backman	729418c2b7	Merge pull request #4428 from pipecat-ai/mb/deprecate-resampy chore(audio): deprecate ResampyResampler	2026-05-06 13:40:51 -04:00
filipi87	4512038a17	Creating a changelog entry for the fix.	2026-05-06 13:36:20 -03:00
filipi87	a23baf9de6	Fixing TTSService deadlock.	2026-05-06 13:32:26 -03:00
Mark Backman	d18fe7c39c	feat(rtvi): type UI accessibility snapshots	2026-05-06 11:29:19 -04:00
Mark Backman	41124dc494	refactor(rtvi): clarify UI message names	2026-05-06 11:08:25 -04:00
Filipi da Silva Fuchter	95db08646c	Merge pull request #4430 from pipecat-ai/filipi/flux_audio Implementing dynamic watchdog timeout for Deepgram Flux STT	2026-05-06 11:40:06 -03:00
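
The async-tool routing described in commits 72d0fb418a, ad0f0a1294, b14a03d01f and 2c65713c99 can be illustrated with a small standalone sketch. It is not the pipecat implementation: the payload class, the message dictionary layout, this local parse_message stand-in, and route_messages are all hypothetical names chosen for illustration. Only the routing rules come from the commit messages (started is skipped, intermediate is reported as an error and dropped, final is delivered like a synchronous result, and an unrecognized kind hits a defensive continue).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AsyncToolPayload:
    """Hypothetical payload shape; the real one lives in pipecat's shared module."""
    kind: str
    tool_call_id: str
    result: Any = None


def parse_message(message: dict) -> Optional[AsyncToolPayload]:
    """Hypothetical stand-in: return a payload for async-tool messages, else None."""
    raw = message.get("async_tool")
    if raw is None:
        return None
    return AsyncToolPayload(raw["kind"], raw["tool_call_id"], raw.get("result"))


def route_messages(messages: list) -> tuple:
    """Return (delivered tool results, reported errors) for a list of context messages."""
    delivered, errors = [], []
    for message in messages:
        async_payload = parse_message(message)
        if async_payload is not None:
            if async_payload.kind == "started":
                # The provider already awaits a result; nothing to send.
                continue
            if async_payload.kind == "intermediate":
                # Streaming is unsupported here: report it, then drop it.
                errors.append(f"intermediate result for {async_payload.tool_call_id}")
                continue
            if async_payload.kind == "final":
                # Same channel as a synchronous tool result, just delayed.
                delivered.append((async_payload.tool_call_id, async_payload.result))
                continue
            # Defensive: an unrecognized kind must not fall through to the
            # regular tool-result handling below.
            continue
        if message.get("role") == "tool":
            delivered.append((message["tool_call_id"], message["content"]))
    return delivered, errors
```

The explicit check on "final" plus the trailing continue means that adding a new kind later cannot silently change what reaches the regular tool-result branch.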

9364 Commits