pipecat

Author	SHA1	Message	Date
Aleix Conchillo Flaqué	784667bad2	Use inherited create_task/cancel_task in PipelineTask PipelineTask owns its TaskManager but is itself a BaseObject, so it inherits create_task/cancel_task. Replace the explicit self._task_manager.create_task(coro, f"{self}::name") call sites with self.create_task(coro, "name") for consistency with other BaseObject subclasses.	2026-05-08 15:03:44 -07:00
Aleix Conchillo Flaqué	33db71ec32	Call super().setup() in PipelineTask to honor BaseObject contract PipelineTask owns its TaskManager (still constructed in __init__ since TaskObserver needs it eagerly). Adding the explicit `await super().setup(self._task_manager)` in `_setup()` formalizes the BaseObject lifecycle so any future wiring added to BaseObject.setup is picked up automatically.	2026-05-08 15:03:44 -07:00
Aleix Conchillo Flaqué	dc035df0aa	Use inherited create_task/cancel_task in PipelineTask PipelineTask owns its TaskManager but is itself a BaseObject, so it inherits create_task/cancel_task. Replace the explicit self._task_manager.create_task(coro, f"{self}::name") call sites with self.create_task(coro, "name") for consistency with other BaseObject subclasses.	2026-05-08 15:03:44 -07:00
Aleix Conchillo Flaqué	df1b071a13	Move create_task and cancel_task from FrameProcessor to BaseObject Lift the task manager wiring (`_task_manager`, `task_manager` property, `create_task`, `cancel_task`, and `setup(task_manager)`) up to `BaseObject`. Owners propagate the task manager to their child `BaseObject`s via `await child.setup(task_manager)`, matching the existing convention. Removes duplicated `_task_manager` / `task_manager` property / setup implementations from `FrameProcessor`, `FrameProcessorMetrics`, `UserIdleController`, `UserTurnController`, `BaseUserTurnStartStrategy`, and `BaseUserTurnStopStrategy`.	2026-05-08 15:03:44 -07:00
kompfner	95bcebe774	Merge pull request #4448 from pipecat-ai/pk/gemini-live-async-tool-support feat: support cancel_on_interruption=False on Gemini Live (Gemini 2.x)	2026-05-08 16:57:32 -04:00
Paul Kompfner	5509377344	fix(gemini-live-vertex): disable NON_BLOCKING tools GeminiLiveVertexLLMService overrides _supports_non_blocking_tools to return False — Vertex AI's Gemini Live endpoint doesn't yet accept the NON_BLOCKING behavior field on function declarations or the scheduling field on FunctionResponse, and sending either breaks tool calling. Effect: function declarations sent to Vertex no longer carry NON_BLOCKING; FunctionResponses no longer carry scheduling: WHEN_IDLE. Users registering a function with cancel_on_interruption=False against Vertex get the same one-time logger.error + push_error the base class surfaces on Gemini 3.x.	2026-05-08 16:54:15 -04:00
Paul Kompfner	e21180b962	refactor(gemini-live): use inherited LLMService._function_is_async The same registry-lookup helper was hoisted to LLMService in #4447, so drop the local duplicate. Behavior unchanged.	2026-05-08 16:42:54 -04:00
Paul Kompfner	53922819ed	refactor: explicit kind=='final' check in async-tool routing (Gemini Live) Mirrors the same change applied to AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441 / GrokRealtimeLLMService in #4447: replaces the implicit "final happens last" pattern in _process_completed_function_calls with an explicit `if async_payload.kind == "final":` block, plus a trailing defensive `continue` so async-tool messages with an unrecognized kind don't fall through to the regular tool-result handling block.	2026-05-08 16:42:54 -04:00
Paul Kompfner	6faeffb884	chore: add changelog entry for cancel_on_interruption=False on Gemini Live	2026-05-08 16:42:54 -04:00
Paul Kompfner	9086a46900	feat(gemini-live): support cancel_on_interruption=False on supported models Honors cancel_on_interruption=False on Gemini Live for models that support Gemini's NON_BLOCKING tool mechanism (Gemini 2.x at the time of writing). Function declarations registered via register_function(..., cancel_on_interruption=False) are sent with behavior: NON_BLOCKING so the conversation continues while the tool runs; the matching FunctionResponse carries scheduling: WHEN_IDLE so the result lands at a graceful pause rather than mid-sentence. Synchronous (default) tools stay BLOCKING — applying NON_BLOCKING uniformly produced filler responses like "let me look that up for you" on regular calls, since the model knew it would have an opportunity to keep talking while waiting. A new _supports_non_blocking_tools property gates the flow. On models that don't support it (currently Gemini 3.x), the service falls back to plain blocking behavior and surfaces a one-time error + ErrorFrame the moment async-tool messages first appear in the context, explaining that the flag's intent is not achievable. Caveat (Gemini 2.5): an intermittent server-side 1008 "Operation is not implemented" error can fire when realtime input arrives during a pending tool call. We auto-reconnect, but the user may need to repeat what they were saying. The proposed mitigation (https://discuss.ai.google.dev/t/gemini-live-api-websocket-error-1008-operation-is-not-implemented-or-supported-or-enabled/114644/56) of gating realtime input during pending tool calls is fundamentally incompatible with NON_BLOCKING tool calling, so we don't apply it.	2026-05-08 16:42:54 -04:00
Paul Kompfner	1a4a6f4edf	refactor(gemini-live): bring tool-result handling in line with the canonical realtime pattern Lays groundwork for cancel_on_interruption=False support on Gemini Live by restructuring _process_completed_function_calls to match the shape used by AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: a single-pass forward iteration over raw context messages that detects async-tool messages via async_tool_messages.parse_message and routes them — started skipped silently, intermediate logged-as-error and surfaced via push_error, final delivered via the formal FunctionResponse channel. Replaces the prior two-pass structure that went through the adapter for sync results — the service now uses a lightweight self._tool_call_id_to_name map (populated when the model issues tool calls) for the name lookup the adapter used to provide. Extracts a new GeminiLLMAdapter.to_function_response_dict static method for the dict-coercion logic that wraps non-dict tool returns as {value: <result>} for Gemini's FunctionResponse.response field; the adapter's existing inline copy in _from_standard_message uses it too. Example consolidation: - Folds realtime-gemini-live-function-calling.py into the base realtime-gemini-live.py example so the base exercises function calling out of the box (matching realtime-openai.py and realtime-aws-nova-sonic.py). - Renames realtime-gemini-live-vertex-function-calling.py to realtime-gemini-live-vertex.py, mirroring the consolidation. - Adds realtime-gemini-live-async-tool.py. - Updates scripts/evals/run-release-evals.py for the renames. This commit alone doesn't make cancel_on_interruption=False fully work on Gemini Live — additional investigation is pending. This is foundational work to be built on.	2026-05-08 16:42:54 -04:00
kompfner	ff80cde44e	Merge pull request #4447 from pipecat-ai/pk/realtime-async-tool-support-followup fix: extend cancel_on_interruption=False regression fix to remaining realtime services	2026-05-08 16:40:32 -04:00
Paul Kompfner	fb74f7714c	refactor(ultravox): name async-tool result strings after the kinds they serve Renames _ASYNC_TOOL_PLACEHOLDER_RESULT to _ASYNC_TOOL_STARTED_RESULT to match the kind names from async_tool_messages, and lifts the inline "[Async tool result for tool_call_id=...] {result}" into a sibling _ASYNC_TOOL_FINAL_RESULT_TEMPLATE constant for the same reason.	2026-05-08 16:35:14 -04:00
Paul Kompfner	4864eddbc7	feat(ultravox): support cancel_on_interruption=False via placeholder + final-as-text Replaces the prior "log a warning and skip" approach with actual handling of async-tool messages on Ultravox. The catch with Ultravox is that its API freezes the conversation between client_tool_invocation and the matching client_tool_result — there's no "keep talking while the tool runs" channel like NON_BLOCKING on Gemini or function_call_output-without-blocking on OpenAI Realtime. So: - When the model invokes an async-registered function (cancel_on_interruption=False), the service immediately ships a placeholder client_tool_result that tells the model "the actual result isn't ready yet; a follow-up will arrive shortly; keep the conversation going". This unfreezes the conversation. The placeholder is sent from _handle_tool_invocation, since the started async-tool message doesn't reach the context-frame path until later. - When the real tool finishes, the final async-tool message lands in the context. _handle_context now forward-iterates and routes async-tool messages: started is a no-op (placeholder already sent), intermediate is logged-as-error and dropped (matching the other realtime services), and final is injected as user-side text via user_text_message with bracketed framing — the only mechanism Ultravox offers for adding non-tool input mid-conversation. Hoists the registry-lookup helper to LLMService as _function_is_async(name) so future services can use the same pattern without re-implementing it. Adds an async-tool example file for Ultravox modeled on the existing ones for the other realtime services.	2026-05-08 16:20:40 -04:00
kompfner	d831930bd0	Merge pull request #4441 from pipecat-ai/pk/realtime-async-tool-support fix: restore cancel_on_interruption=False support in AWS Nova Sonic and OpenAI Realtime	2026-05-08 15:53:20 -04:00
Paul Kompfner	2c65713c99	refactor: explicit kind=='final' check in async-tool routing (Grok) Mirrors the same change applied to AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: replaces the implicit "final happens last" pattern in _process_completed_function_calls with an explicit `if async_payload.kind == "final":` block, plus a trailing defensive `continue` so async-tool messages with an unrecognized kind don't fall through to the regular tool-result handling block.	2026-05-08 15:45:05 -04:00
Paul Kompfner	b14a03d01f	fix: extend cancel_on_interruption=False regression fix to remaining realtime services Applies the same async-tool message routing introduced for AWSNovaSonicLLMService and OpenAIRealtimeLLMService to additional realtime LLM services where the flag's intent ("keep talking while the tool runs") is achievable: - GrokRealtimeLLMService (xAI Realtime — also benefits the deprecated Grok alias since it re-exports the xAI module) - AzureRealtimeLLMService picks up the fix transitively by inheriting from OpenAIRealtimeLLMService — no code change needed. GrokRealtimeLLMService's _process_completed_function_calls now matches the canonical pattern: skip LLMSpecificMessage, detect async-tool messages via parse_message and route them — started skipped silently, intermediate logged as an error and surfaced via push_error, final delivered through the same channel as a synchronous result. UltravoxRealtimeLLMService instead gets a one-time warning when async-tool messages appear in the context. The Ultravox API freezes the conversation during tool execution (https://docs.ultravox.ai/tools/async-tools#custom-tool-timeouts), so the flag's "keep talking while the tool runs" intent isn't achievable there — applying the same code pattern would mislead users into expecting a UX Ultravox can't deliver. Surfacing a clear warning is the right behavior until Ultravox grows true async tool support. Adds async-tool example files for Grok and Azure modeled on the existing Nova Sonic / OpenAI Realtime ones (10s simulated network delay, weather tool registered with cancel_on_interruption=False). Two services remain excluded: - GeminiLiveLLMService — the async-tool path needs deeper investigation. - InworldRealtimeLLMService — appears to have a pre-existing problem with even simple synchronous tool calling on its Realtime API (the request reaches the server fine, but response generation fails with a generic server_error).	2026-05-08 15:43:53 -04:00
Paul Kompfner	ad0f0a1294	refactor: explicit kind=='final' check in async-tool routing Replaces the implicit "final happens last" pattern in _process_completed_function_calls with an explicit `if async_payload.kind == "final":` block in both AWSNovaSonicLLMService and OpenAIRealtimeLLMService. Adds a trailing defensive `continue` so async-tool messages with an unrecognized kind don't fall through to the regular tool-result handling block — clearer at the call site, and safer against future additions to AsyncToolMessageKind.	2026-05-08 15:43:37 -04:00
Paul Kompfner	72d0fb418a	fix: restore cancel_on_interruption=False support in AWS Nova Sonic and OpenAI Realtime Before the new async-tool mechanism landed, AWSNovaSonicLLMService and OpenAIRealtimeLLMService honored cancel_on_interruption=False by simply not cancelling in-flight function calls on interruption — the eventual result then flowed through the same channel as any synchronous tool result. The new mechanism (which appends started/intermediate/final messages to the LLM context as the underlying task progresses) broke that path: the realtime services didn't know how to interpret those messages, and the eventual result was never delivered to the provider. Restore the flag's behavior by teaching both services to detect async-tool messages in the context and route them appropriately: - started → skipped silently. The provider already issued the tool call and natively awaits a result; nothing to send for the started marker. - final → delivered via the formal tool-result channel. Same path as a synchronous tool result, just delayed. Streamed intermediate results (FunctionCallResultProperties(is_final= False)) are not supported on these realtime services. An intermediate result is logged as an error and surfaced via push_error, then dropped. Use a non-realtime LLM service if a tool needs to stream intermediate results. (Docstrings on register_function, register_direct_function, and FunctionCallResultProperties.is_final updated to call this out.) A new shared module pipecat.processors.aggregators.async_tool_messages is the single source of truth for the on-the-wire payload shape: the aggregator uses its build_*_message functions when injecting messages, and the realtime services use parse_message when scanning the context. Adds two example files exercising a network-delayed weather tool with each service. The plain realtime-aws-nova-sonic.py example is also reverted to a synchronous tool call now that the async variant lives in its own file. Similar fixes for other realtime services are forthcoming.	2026-05-08 09:33:06 -04:00
filipi87	c9f0172e9f	Example supporting plain websocket.	2026-05-08 09:46:18 -03:00
filipi87	2638885c62	Adding support for the plain websocket transport.	2026-05-08 09:37:07 -03:00
Aleix Conchillo Flaqué	94a94ee28c	Merge pull request #4405 from pipecat-ai/aleix/user-turn-inference-event Split user-turn-stop into inference-triggered and finalized events	2026-05-07 17:51:57 -07:00
Mark Backman	c46ede8335	Use Sphinx .. deprecated:: directive for deprecated aggregator params Aligns deprecation docstrings on LLMUserAggregatorParams and LLMAssistantAggregatorParams with CONTRIBUTING.md conventions: present-tense parameter descriptions plus a `.. deprecated:: 1.2.0` directive noting replacement and 2.0.0 removal. Also adds a runtime DeprecationWarning for `user_turn_completion_config`, which previously had no warning despite being deprecated.	2026-05-07 17:49:00 -07:00
Mark Backman	457a68ce64	Correct docstrings and comments regarding incomplete_long_timeout duration, 10 sec	2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué	b78cecf7b2	Rename UserTurnCompletedFrame to UserTurnInferenceCompletedFrame The old name overlapped semantically with `UserStoppedSpeakingFrame`: both could be read as "the user's turn is done." They're at different layers — `UserStoppedSpeakingFrame` is the acoustic stop signal, while this frame is the post-judgment "inference about the turn is now complete (turn is semantically final)" signal emitted by the LLM mixin (on ✓), an end-of-turn classifier, or a custom producer. The new name pairs naturally with the existing `on_user_turn_inference_triggered` event vocabulary and removes the ambiguity with `UserStoppedSpeakingFrame`.	2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué	952dddca8b	Replace llm_completion_user_turn_stop_strategies() with FilterIncompleteUserTurnStrategies Wrap the detector chain with `deferred(...)` and append the LLM completion gate via a `UserTurnStrategies` specialization rather than a free-standing helper, mirroring the existing `ExternalUserTurnStrategies` pattern. The class lives next to other strategy containers in `pipecat.turns.user_turn_strategies`, so users discover it where they're already configuring `user_turn_strategies`. The deprecated `filter_incomplete_user_turns` flag now rewires through `FilterIncompleteUserTurnStrategies` under the hood, keeping the migration path identical to before. `deferred(...)` stays public as the explicit escape hatch for non-default compositions.	2026-05-07 17:47:39 -07:00
Aleix Conchillo Flaqué	e3e90d38aa	Preserve full user transcript across multiple inferences in one turn When a stop-strategy chain splits inference-triggered from finalization (e.g. `LLMTurnCompletionUserTurnStopStrategy` gating a deferred detector), more than one inference can fire inside a single user turn — each adds the new transcription segment to the context. Previously each inference overwrote `_pending_user_turn_aggregation`, so the eventual `on_user_turn_stopped` event surfaced only the segment from the last inference, dropping anything the user said before it. Concatenate each segment into `_full_user_turn_aggregation` instead of overwriting, and combine that running buffer with any post-final-inference segment when emitting the public event.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	d1c8162b0c	Route turn-completion markers through LLMMarkerFrame Add an `LLMMarkerFrame(DataFrame)` for sideband LLM markers that need to be persisted to context but should not flow through the standard text path (TTS, transcript). The frame carries an `append_to_context_immediately` flag so the assistant aggregator can either commit the marker as a stand-alone message (○ / ◐) or merge it with the upcoming aggregation as a prefix on the response (✓). `UserTurnCompletionLLMServiceMixin` now emits `LLMMarkerFrame` instead of pushing the marker as `LLMTextFrame(skip_tts=True)`, which fixes the case where an incomplete-turn marker (○ / ◐) was aggregated by the assistant aggregator but never committed to the context because the assistant turn lifecycle didn't run to completion (no spoken response, no `LLMFullResponseEndFrame`-driven `push_aggregation`). The frame is intentionally generic so other components — STT services with built-in turn signals, end-of-turn classifiers, custom annotations — can use the same mechanism to inject sideband signals into the assistant context.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	1fa0310ea8	Add changelog for #4405	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	2281cd8359	Extract ExternalUserTurnCompletionStopStrategy as a reusable base `LLMTurnCompletionUserTurnStopStrategy` previously bundled two concerns: pushing `LLMUpdateSettingsFrame` on `StartFrame`, and finalizing the turn on `UserTurnCompletedFrame`. The latter is producer-agnostic — any component that emits `UserTurnCompletedFrame` (STT with built-in turn detection, dedicated end-of-turn classifiers, custom code) can drive finalization the same way. Move the frame-handling half into a new `ExternalUserTurnCompletionStopStrategy`. The LLM-specific subclass now only adds the settings-frame push and inherits finalization. Mirrors the existing `ExternalUserTurnStopStrategy` naming pattern.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	480eca42f5	Split user-turn-stop into inference-triggered and finalized events Fixes a real bug: with `filter_incomplete_user_turns` enabled, the smart-turn detector's tentative stop was firing `on_user_turn_stopped` before the LLM had a chance to veto it. Observers, transcript appenders and UI indicators received an early — and sometimes duplicated — signal. Decomposes the single stop concern into two events: - `on_user_turn_inference_triggered` fires when a stop strategy has enough signal to start LLM inference. The aggregator pushes the context here, kicking off the LLM call. - `on_user_turn_stopped` fires only when the user turn is semantically final. Built-in strategies fire both events at the same call site, preserving today's behavior for the common case. Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates finalization on a `UserTurnCompletedFrame` (a fieldless system frame emitted by any component judging turn completeness — currently the `UserTurnCompletionLLMServiceMixin` on `✓`). Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin wrapper that forwards an inner strategy's events except `on_user_turn_stopped`. Use this to install a stop strategy as an inference trigger only, leaving finalization to a peer (e.g. the LLM completion strategy). Adds `llm_completion_user_turn_stop_strategies()` for the common case: UserTurnStrategies( stop=llm_completion_user_turn_stop_strategies(), ) Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`. The aggregator emits a `DeprecationWarning`, wraps existing stop strategies with `deferred(...)`, and appends `LLMTurnCompletionUserTurnStopStrategy` automatically.	2026-05-07 17:46:09 -07:00
Mark Backman	1073510574	Merge pull request #4407 from pipecat-ai/mb/ui-agent-wire-format feat(rtvi): add UI Agent Protocol as first-class RTVI message types	2026-05-07 20:03:41 -04:00
Mark Backman	47c05f3f30	Simplify changelog entry	2026-05-07 16:58:08 -07:00
Mark Backman	24904b89f5	Merge pull request #4443 from Anrahya/fix-gemini-tts-voice-names fix: correct Gemini TTS voice names	2026-05-07 19:41:30 -04:00
orphis	c78977e4c7	chore: remove Gemini TTS voice name test	2026-05-08 05:03:15 +05:30
Mark Backman	f78b5f9240	Merge pull request #4446 from inworld-ai/ian/inworld-pcm [inworld] default to using PCM encoding	2026-05-07 19:25:57 -04:00
Ian Lee	406f8b730b	[inworld] default to using PCM encoding * server returns audio bytes without headers	2026-05-07 16:05:34 -07:00
Mark Backman	7a2cec2e45	Merge pull request #4426 from marcelodiaz558/feature/elevenlabs_stt_keyterms Add ElevenLabs STT keyterms support	2026-05-07 18:44:09 -04:00
Marcelo Díaz	edfcd6948b	Add ElevenLabs STT keyterms support	2026-05-07 21:00:26 +00:00
kompfner	991ee9e0e6	Merge pull request #4404 from pipecat-ai/pk/mitigate-calls-to-missing-tools Mitigate tool-call-related hallucination	2026-05-07 15:05:13 -04:00
filipi87	cb426cbb14	Fixing format.	2026-05-07 16:04:43 -03:00
filipi87	d39beff817	Fixing format.	2026-05-07 16:01:54 -03:00
filipi87	1eade184f1	Creating a status endpoint to return the available transports.	2026-05-07 15:53:15 -03:00
Mark Backman	a696729343	Merge pull request #4439 from pipecat-ai/mb/fix-deprecation-video-out-bitrate	2026-05-07 14:42:26 -04:00
orphis	ba705e9501	chore: add changelog for Gemini TTS voice fix	2026-05-08 00:11:19 +05:30
orphis	98c370457b	fix: correct Gemini TTS voice names	2026-05-08 00:09:56 +05:30
filipi87	3fa193b983	Unified start route to make all transports available.	2026-05-07 15:34:32 -03:00
Filipi da Silva Fuchter	6189e920e1	Merge pull request #4433 from pipecat-ai/filipi/refactoring_elevenlabs Refactoring ElevenLabs to send close_context as soon as the turn context is complete.	2026-05-07 13:10:36 -03:00
Filipi da Silva Fuchter	73625a273a	Merge pull request #4440 from pipecat-ai/filipi/daily_send_message_issue Fixing a race condition when cleaning up the daily transport.	2026-05-07 13:09:53 -03:00
filipi87	f91a55c97c	Changelog entry for the fix.	2026-05-07 11:32:48 -03:00
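
Note on the async-tool routing described in commits 72d0fb418a and ad0f0a1294: the messages describe a single forward pass over context messages in which "started" is skipped, "intermediate" is reported as an error and dropped, "final" is delivered like a synchronous result, and an unrecognized kind hits a trailing defensive `continue`. The following is a minimal, self-contained sketch of that control flow only. It does not use pipecat's real classes: `AsyncToolPayload`, this `parse_message`, `route_messages`, and the `"async_tool"` dictionary key are hypothetical stand-ins invented for illustration, and the actual payload shape may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AsyncToolPayload:
    """Hypothetical stand-in for a parsed async-tool message."""

    kind: str  # "started", "intermediate", "final", or a kind added later
    tool_call_id: str
    result: Any = None


def parse_message(message: dict) -> Optional[AsyncToolPayload]:
    """Hypothetical parser: returns a payload for async-tool messages, else None."""
    data = message.get("async_tool")  # assumed key, for illustration only
    if data is None:
        return None
    return AsyncToolPayload(data["kind"], data["tool_call_id"], data.get("result"))


def route_messages(messages: list[dict]) -> tuple[list, list]:
    """Single forward pass; returns (delivered results, reported errors)."""
    delivered: list = []
    errors: list = []
    for message in messages:
        payload = parse_message(message)
        if payload is not None:
            if payload.kind == "started":
                # The provider already awaits a result; nothing to send.
                continue
            if payload.kind == "intermediate":
                # Streaming is unsupported here: report the error and drop it.
                errors.append(f"intermediate result for {payload.tool_call_id}")
                continue
            if payload.kind == "final":
                # Same channel as a synchronous result, just delayed.
                delivered.append((payload.tool_call_id, payload.result))
                continue
            # Defensive: unknown kinds must not fall through to the block below.
            continue
        if message.get("role") == "tool":
            # Regular synchronous tool result.
            delivered.append((message["tool_call_id"], message["content"]))
    return delivered, errors
```

The explicit `kind == "final"` check plus the trailing `continue` is what keeps a future, unknown kind from being mistaken for a regular tool result.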

9537 Commits
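
Note on `deferred(strategy)` from commit 480eca42f5: the message describes a thin wrapper that forwards an inner stop strategy's events except `on_user_turn_stopped`, so the inner strategy acts as an inference trigger only. A rough sketch of that idea follows, assuming a simplified listener-based event model. `SketchStrategy` and `deferred_sketch` are hypothetical names for illustration and are not pipecat's implementation; only the two event names come from the commit message.

```python
from typing import Callable, Optional


class SketchStrategy:
    """Hypothetical stop strategy that reports events to a single listener."""

    def __init__(self, listener: Optional[Callable[[str], None]] = None):
        self.listener = listener

    def emit(self, event: str) -> None:
        # Report an event by name if anyone is listening.
        if self.listener is not None:
            self.listener(event)


def deferred_sketch(
    inner: SketchStrategy, listener: Callable[[str], None]
) -> SketchStrategy:
    """Forward every event from `inner` except the finalization event."""

    def forward(event: str) -> None:
        if event == "on_user_turn_stopped":
            # Finalization is left to a peer strategy, so swallow it here.
            return
        listener(event)

    inner.listener = forward
    return inner


# Usage: the wrapped detector can trigger inference but cannot end the turn.
seen: list[str] = []
detector = deferred_sketch(SketchStrategy(), seen.append)
detector.emit("on_user_turn_inference_triggered")
detector.emit("on_user_turn_stopped")
```

After the two `emit` calls, `seen` holds only the inference-triggered event, which matches the split the commit describes between starting LLM inference and declaring the turn final.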