pipecat

Author	SHA1	Message	Date
Antoni Silvestre	18368d047e	Linting and changes to adapt to v1.0	2026-05-18 14:40:56 +02:00
asilvestre	e3abb4b6d7	apply suggestions in PR	2026-05-18 14:40:56 +02:00
asilvestre	c61672194d	Vonage Video Connector Transport	2026-05-18 14:40:49 +02:00
Mark Backman	73278d3309	Use majority language for Soniox transcripts	2026-05-14 15:18:43 -04:00
Mark Backman	49bda11ae8	Merge pull request #4482 from pipecat-ai/mb/soniox-stt-token-language Propagate Soniox token language	2026-05-13 16:28:56 -04:00
Mark Backman	078af6969a	Merge pull request #4473 from timofey-TK/inworld-tts-v2 Add support for Inworld TTS v2 fields	2026-05-13 15:32:16 -04:00
Mark Backman	82f0896d6a	Propagate Soniox token language	2026-05-13 15:23:22 -04:00
Mark Backman	08680732f6	Merge pull request #4475 from pipecat-ai/mb/cartesia-korean-fix Fix Cartesia CJK timestamp spacing	2026-05-13 13:20:42 -04:00
Mark Backman	064b68aa01	Fix Cartesia CJK timestamp spacing	2026-05-13 13:13:40 -04:00
Mark Backman	5fef239b68	Merge pull request #4450 from pipecat-ai/mb/gpt-realtime-whisper Default OpenAI Realtime transcription to gpt-realtime-whisper	2026-05-13 09:48:33 -04:00
Timofey	39e7f9e354	Fix Inworld TTS v2 request fields	2026-05-13 11:17:31 +03:00
Mark Backman	644030584f	Centralize OpenAI audio constants	2026-05-12 17:48:53 -04:00
Mark Backman	abd28e2ac1	Update OpenAI realtime transcription default	2026-05-12 15:20:57 -04:00
Paul Kompfner	a52bdef32b	Add reasoning support to OpenAIRealtimeLLMService for gpt-realtime-2	2026-05-12 13:55:19 -04:00
Paul Kompfner	007fa3a3a8	Handle gpt-realtime-2 multi-output-item audio responses A single Realtime API response can now contain more than one audio item (observed with gpt-realtime-2), and the first item's audio.done can arrive after deltas from the second have started arriving. Deltas still arrive strictly in playback order across items, so we keep forwarding them as received — matching OpenAI's reference implementation. Adjusted OpenAIRealtimeLLMService so a multi-item response is treated as one continuous TTS turn: - _handle_evt_audio_delta: on item switch, advance the tracked item in place (reset total_size) without emitting another TTSStartedFrame. Truncation now always targets the latest item. - _handle_evt_audio_done: debug-trace only; no longer pushes TTSStoppedFrame. - _handle_evt_response_done: pushes a single TTSStoppedFrame per turn, bookending the audio with the Started pushed on the first delta. Added tests covering single-item, overlapping multi-item, non-overlapping multi-item, and interrupt-during-multi-item (last-item-wins truncation).	2026-05-12 10:34:50 -04:00
Paul Kompfner	72d0fb418a	fix: restore cancel_on_interruption=False support in AWS Nova Sonic and OpenAI Realtime Before the new async-tool mechanism landed, AWSNovaSonicLLMService and OpenAIRealtimeLLMService honored cancel_on_interruption=False by simply not cancelling in-flight function calls on interruption; the eventual result then flowed through the same channel as any synchronous tool result. The new mechanism (which appends started/intermediate/final messages to the LLM context as the underlying task progresses) broke that path: the realtime services didn't know how to interpret those messages, and the eventual result was never delivered to the provider. Restore the flag's behavior by teaching both services to detect async-tool messages in the context and route them appropriately: - started → skipped silently. The provider already issued the tool call and natively awaits a result; nothing to send for the started marker. - final → delivered via the formal tool-result channel. Same path as a synchronous tool result, just delayed. Streamed intermediate results (FunctionCallResultProperties(is_final=False)) are not supported on these realtime services. An intermediate result is logged as an error and surfaced via push_error, then dropped. Use a non-realtime LLM service if a tool needs to stream intermediate results. (Docstrings on register_function, register_direct_function, and FunctionCallResultProperties.is_final updated to call this out.) A new shared module pipecat.processors.aggregators.async_tool_messages is the single source of truth for the on-the-wire payload shape: the aggregator uses its build_*_message functions when injecting messages, and the realtime services use parse_message when scanning the context. Adds two example files exercising a network-delayed weather tool with each service. The plain realtime-aws-nova-sonic.py example is also reverted to a synchronous tool call now that the async variant lives in its own file. Similar fixes for other realtime services are forthcoming.	2026-05-08 09:33:06 -04:00
Aleix Conchillo Flaqué	b78cecf7b2	Rename UserTurnCompletedFrame to UserTurnInferenceCompletedFrame The old name overlapped semantically with `UserStoppedSpeakingFrame`: both could be read as "the user's turn is done." They're at different layers — `UserStoppedSpeakingFrame` is the acoustic stop signal, while this frame is the post-judgment "inference about the turn is now complete (turn is semantically final)" signal emitted by the LLM mixin (on ✓), an end-of-turn classifier, or a custom producer. The new name pairs naturally with the existing `on_user_turn_inference_triggered` event vocabulary and removes the ambiguity with `UserStoppedSpeakingFrame`.	2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué	952dddca8b	Replace llm_completion_user_turn_stop_strategies() with FilterIncompleteUserTurnStrategies Wrap the detector chain with `deferred(...)` and append the LLM completion gate via a `UserTurnStrategies` specialization rather than a free-standing helper, mirroring the existing `ExternalUserTurnStrategies` pattern. The class lives next to other strategy containers in `pipecat.turns.user_turn_strategies`, so users discover it where they're already configuring `user_turn_strategies`. The deprecated `filter_incomplete_user_turns` flag now rewires through `FilterIncompleteUserTurnStrategies` under the hood, keeping the migration path identical to before. `deferred(...)` stays public as the explicit escape hatch for non-default compositions.	2026-05-07 17:47:39 -07:00
Aleix Conchillo Flaqué	e3e90d38aa	Preserve full user transcript across multiple inferences in one turn When a stop-strategy chain splits inference-triggered from finalization (e.g. `LLMTurnCompletionUserTurnStopStrategy` gating a deferred detector), more than one inference can fire inside a single user turn, and each adds the new transcription segment to the context. Previously each inference overwrote `_pending_user_turn_aggregation`, so the eventual `on_user_turn_stopped` event surfaced only the segment from the last inference, dropping anything the user said before it. Concatenate each segment into `_full_user_turn_aggregation` instead of overwriting, and combine that running buffer with any post-final-inference segment when emitting the public event.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	d1c8162b0c	Route turn-completion markers through LLMMarkerFrame Add an `LLMMarkerFrame(DataFrame)` for sideband LLM markers that need to be persisted to context but should not flow through the standard text path (TTS, transcript). The frame carries an `append_to_context_immediately` flag so the assistant aggregator can either commit the marker as a stand-alone message (○ / ◐) or merge it with the upcoming aggregation as a prefix on the response (✓). `UserTurnCompletionLLMServiceMixin` now emits `LLMMarkerFrame` instead of pushing the marker as `LLMTextFrame(skip_tts=True)`, which fixes the case where an incomplete-turn marker (○ / ◐) was aggregated by the assistant aggregator but never committed to the context because the assistant turn lifecycle didn't run to completion (no spoken response, no `LLMFullResponseEndFrame`-driven `push_aggregation`). The frame is intentionally generic so other components — STT services with built-in turn signals, end-of-turn classifiers, custom annotations — can use the same mechanism to inject sideband signals into the assistant context.	2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué	480eca42f5	Split user-turn-stop into inference-triggered and finalized events Fixes a real bug: with `filter_incomplete_user_turns` enabled, the smart-turn detector's tentative stop was firing `on_user_turn_stopped` before the LLM had a chance to veto it. Observers, transcript appenders and UI indicators received an early — and sometimes duplicated — signal. Decomposes the single stop concern into two events: - `on_user_turn_inference_triggered` fires when a stop strategy has enough signal to start LLM inference. The aggregator pushes the context here, kicking off the LLM call. - `on_user_turn_stopped` fires only when the user turn is semantically final. Built-in strategies fire both events at the same call site, preserving today's behavior for the common case. Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates finalization on a `UserTurnCompletedFrame` (a fieldless system frame emitted by any component judging turn completeness — currently the `UserTurnCompletionLLMServiceMixin` on `✓`). Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin wrapper that forwards an inner strategy's events except `on_user_turn_stopped`. Use this to install a stop strategy as an inference trigger only, leaving finalization to a peer (e.g. the LLM completion strategy). Adds `llm_completion_user_turn_stop_strategies()` for the common case: UserTurnStrategies( stop=llm_completion_user_turn_stop_strategies(), ) Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`. The aggregator emits a `DeprecationWarning`, wraps existing stop strategies with `deferred(...)`, and appends `LLMTurnCompletionUserTurnStopStrategy` automatically.	2026-05-07 17:46:09 -07:00
Mark Backman	1073510574	Merge pull request #4407 from pipecat-ai/mb/ui-agent-wire-format feat(rtvi): add UI Agent Protocol as first-class RTVI message types	2026-05-07 20:03:41 -04:00
Mark Backman	7a2cec2e45	Merge pull request #4426 from marcelodiaz558/feature/elevenlabs_stt_keyterms Add ElevenLabs STT keyterms support	2026-05-07 18:44:09 -04:00
Marcelo Díaz	edfcd6948b	Add ElevenLabs STT keyterms support	2026-05-07 21:00:26 +00:00
kompfner	991ee9e0e6	Merge pull request #4404 from pipecat-ai/pk/mitigate-calls-to-missing-tools Mitigate tool-call-related hallucination	2026-05-07 15:05:13 -04:00
filipi87	cf22dac171	Refactoring TTSService to preserve uninterruptible frames.	2026-05-06 16:26:45 -03:00
Filipi da Silva Fuchter	90f0f7cd27	Merge pull request #4431 from pipecat-ai/filipi/tts_deadlock Fixing TTSService deadlock.	2026-05-06 14:52:04 -03:00
filipi87	a23baf9de6	Fixing TTSService deadlock.	2026-05-06 13:32:26 -03:00
Mark Backman	d18fe7c39c	feat(rtvi): type UI accessibility snapshots	2026-05-06 11:29:19 -04:00
Mark Backman	41124dc494	refactor(rtvi): clarify UI message names	2026-05-06 11:08:25 -04:00
Mark Backman	8c3521f2e4	chore(audio): deprecate ResampyResampler in favor of SOXR resamplers Emits a DeprecationWarning on instantiation. ResampyResampler will be removed in Pipecat 2.0 along with the default resampy and numba dependencies.	2026-05-06 09:40:13 -04:00
Mark Backman	61a81ed87b	fix(elevenlabs): use alignment by default, normalizedAlignment only with pronunciation dicts PR #4344 unconditionally switched to normalizedAlignment to fix garbled words with pronunciation dictionaries (#4316). But normalizedAlignment returns the post-normalized form of what was spoken - including romanization of non-Latin scripts (Chinese rendered as pinyin), which ends up in the LLM context and degrades subsequent turns. Gate the switch on pronunciation_dictionary_locators being configured. Adds a _select_alignment helper with preferred-with-fallback (both fields are nullable per the API schema), used by both the WebSocket and HTTP services. Tests cover dictionary mode, default mode, fallback when preferred is missing or null, and HTTP field-name variants.	2026-05-05 14:49:41 -04:00
Paul Kompfner	e06e0c0282	Mitigate tool-call-related hallucination When tools change mid-conversation, LLMs can produce a few different flavors of tool-call-related hallucination: calling tools that have been removed, avoiding tools that have been re-added, or hallucinating output (made-up answers or tool-call-shaped non-tool-calls) when tools are unavailable. This change introduces an opt-in ``add_tool_change_messages`` flag on the LLM aggregators (preferred entry point: ``LLMContextAggregatorPair(..., add_tool_change_messages=True)``) that appends a developer-role message to the context whenever ``LLMSetToolsFrame`` changes the set of advertised standard tools. Helps the LLM stay coherent across tool changes by spelling out exactly what just became available or unavailable. Both aggregators participate; whichever handles the frame first wins, and the other (if any) sees an empty diff against the shared context and stays silent. This is order-independent regardless of whether the frame flows downstream or upstream. Also tightens the existing missing-handler path (introduced in #4301): - Reworded the terminal tool result to a neutral "The function ``X`` is not currently available." (overridable via ``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``). Previously read "Error: function 'X' is not registered." - Logs at the call site now distinguish developer error (tool advertised but no handler registered → ``logger.error``) from hallucination (tool not advertised → ``logger.warning``). Includes a manual validation harness (``examples/features/features-add-tool-change-messages.py``) that exercises the new ``add_tool_change_messages`` mitigation by flipping tool availability on a turn counter so its effect can be observed end-to-end with the flag on vs. off.	2026-05-05 13:02:43 -04:00
Mark Backman	fa31a2fd63	Merge pull request #4416 from pipecat-ai/mb/pr-4333-aws-credentials-review feat(aws): add shared credential resolver with boto3 chain fallback	2026-05-04 21:48:33 -04:00
Mark Backman	83190d38e9	Merge pull request #4414 from pipecat-ai/mb/fix-ttsspeakframe-assistant-turn-stopped	2026-05-04 18:12:33 -04:00
Mark Backman	7519c26ac5	Merge pull request #4417 from pipecat-ai/mb/resolve-runner-filepath	2026-05-04 18:09:34 -04:00
Mark Backman	89f10dd9a1	test: drop webrtc-dependent test, remove webrtc extra from CI	2026-05-04 16:42:05 -04:00
Mark Backman	e780f759d0	fix: validate download path containment in runner Resolve and contain the user-supplied filename before serving it from the runner's /files endpoint. Also raise a 404 (instead of returning None) when the downloads folder is unset, and use the resolved basename for Content-Disposition.	2026-05-04 16:20:27 -04:00
Daniel Wirjo	35153de28e	feat(aws): add shared credential resolver with boto3 chain fallback AWS Transcribe STT previously only supported credentials via explicit parameters or environment variables. Services running with IAM roles (EKS pod roles, IRSA, ECS task roles, EC2 instance profiles) or SSO couldn't use Transcribe without exporting static credentials. Changes: - Add resolve_credentials() to utils.py providing a standard fallback chain: explicit params → environment variables → boto3 credential provider chain (instance profiles, IRSA, pod roles, SSO, etc.) - Add AWSCredentials dataclass for type-safe credential passing - Update AWSTranscribeSTTService to use resolve_credentials() instead of manual os.getenv() calls - The boto3 fallback is only attempted when both access key and secret key are unresolved, avoiding replacement of explicitly provided creds - boto3 is imported lazily inside the function to avoid hard dependency for services that don't need the fallback chain - Add 7 unit tests covering the credential resolution chain The Bedrock LLM and Polly TTS services already support the full credential chain via aioboto3.Session() and are not modified. Related to #4197	2026-05-04 15:40:06 -04:00
Mark Backman	90e6b51acd	Fix ElevenLabs alignment chunk spacing	2026-05-04 15:15:37 -04:00
Mark Backman	f1a3ee97de	fix: surface TTSSpeakFrame greetings in on_assistant_turn_stopped Two issues were causing TTSSpeakFrame(append_to_context=True) greetings to silently lose their trailing words and never fire on_assistant_turn_stopped: - LLMAssistantPushAggregationFrame was emitted without a PTS, so the transport routed it through the audio (sync) queue while word-level TTSTextFrames travel through the clock queue. The aggregation could reach the assistant aggregator before the final words, leaving them orphaned in the buffer. Stamp the frame with `_word_last_pts + 1` when there are word timestamps so it can't overtake them. - The aggregator's LLMAssistantPushAggregationFrame handler called push_aggregation() directly, bypassing _trigger_assistant_turn_stopped. For TTS-only flows there is no LLMFullResponseStartFrame, so the turn start timestamp was never set and on_assistant_turn_stopped never fired. Open a turn (if needed) and trigger stopped from the handler. Fixes #4264.	2026-05-04 10:41:22 -04:00
Mark Backman	43abca0b06	feat(rtvi): add UI Agent Protocol as first-class RTVI message types The UI Agent Protocol lets server-side AI agents observe and drive a GUI app on the client side through structured RTVI messages. Five new top-level RTVI types in kebab-case, in line with the rest of the protocol: ui-event client → server (named event with payload) ui-command server → client (named command with payload) ui-snapshot client → server (accessibility tree of the page) ui-cancel-task client → server (cancel an in-flight task group) ui-task server → client (task lifecycle envelope) Each ships paired ``Data`` / ``Message`` pydantic models in ``rtvi.models``, following the existing RTVI envelope convention (``BotReady`` / ``BotReadyData``, ``Error`` / ``ErrorData``, etc.). Built-in command payload models (``Toast``, ``Navigate``, ``ScrollTo``, ``Highlight``, ``Focus``, ``Click``, ``SetInputValue``, ``SelectText``) ship alongside; matching default React handlers live in ``@pipecat-ai/client-react``. Bumps the RTVI ``PROTOCOL_VERSION`` from ``1.2.0`` to ``1.3.0``. Purely additive: only new top-level message types are introduced; no existing wire shapes are changed. The major-version compatibility check on ``client-ready`` still passes for older 1.x clients, so old clients continue to connect without warning; they simply will not exercise the new types. The ``RTVIProcessor`` registers a new ``on_ui_message`` event handler that fires for inbound ``ui-event`` / ``ui-snapshot`` / ``ui-cancel-task`` with the parsed Message envelope, mirroring how ``on_client_message`` works for ``client-message``. Five new pipeline frames let pipeline observers and processors see UI traffic the same way they see other RTVI messages, mirroring the frame-and-event pattern used by ``client-message``: RTVIUICommandFrame(command_name, payload) Pushed by downstream code (e.g. ``pipecat-ai-subagents``'s bridge) to send a UI command to the client. Wrapped by the observer into a ``UICommandMessage`` envelope. RTVIUITaskFrame(data: UITaskData) Same shape but for ``ui-task``; wrapped into ``UITaskMessage``. ``UITaskData`` is a discriminated union of the four lifecycle kinds (group_started / task_update / task_completed / group_completed). RTVIUIEventFrame(msg_id, event_name, payload) RTVIUISnapshotFrame(msg_id, tree) RTVIUICancelTaskFrame(msg_id, task_id, reason) Pushed by ``RTVIProcessor._handle_message`` whenever the matching inbound message arrives, alongside firing ``on_ui_message``. Pipeline observers and processors can match on the frame; subscribers like the subagents bridge keep using the event handler. The data layer is the canonical authority for the wire format: higher-level frameworks like ``pipecat-ai-subagents`` build the agent abstractions on top, and single-LLM Pipecat apps can target the same wire format directly via custom tools that emit these typed messages.	2026-05-02 12:09:01 -04:00
Paul Kompfner	c4f5f1ebbb	test, refactor: follow-ups to LLMService generic refactor Two follow-ups now that LLMService is generic over its adapter: - Add an explicit backward-compat test verifying that an LLMService subclass with no generic parameter (the third-party-provider pattern) instantiates and returns a usable adapter. The existing MockLLMService (declared without brackets) already exercised this implicitly, but it's worth a named assertion. - Drop the now-redundant `params: SomeLLMInvocationParams = ...` variable annotations on `adapter.get_llm_invocation_params()` results. Since `get_llm_adapter()` now returns the precise adapter type, and `BaseLLMAdapter` is generic in its invocation-params type, the call already infers the right TypedDict.	2026-05-01 09:36:14 -04:00
kompfner	6d66bbceeb	Merge pull request #4395 from pipecat-ai/pk/app-resources-api-updates Broaden tool_resources to app_resources	2026-04-30 21:19:05 -04:00
Paul Kompfner	1b5c4cfa2a	feat: broaden tool_resources to app_resources Broaden `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Involves 3 changes: - A rename: `tool_resources` -> `app_resources` - A new property on `PipelineTask`: `app_resources` - A new property on `FrameProcessor`: `pipeline_task` Usage in tool handler: async def get_weather(params: FunctionCallParams): resources = cast(MyAppResources, params.app_resources) ... Usage in custom `FrameProcessor`: class MyProcessor(FrameProcessor): async def process_frame(self, frame, direction): await super().process_frame(frame, direction) if self.pipeline_task is not None: resources = cast(MyAppResources, self.pipeline_task.app_resources) ... The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.	2026-04-30 16:16:17 -04:00
Mark Backman	351105a975	test(krisp): scope importlib.metadata.version mock to imports only The four krisp test files installed a process-wide mock of importlib.metadata.version with `patch(...).start()` at module level and never called .stop(). Once any of these files was collected, the mock leaked across the rest of the test session, returning '0.0.0-dev' for every version check. This corrupted unrelated tests that triggered transformers' import-time dependency check (e.g. lazy imports of LocalSmartTurnAnalyzerV3) — transformers saw tqdm=='0.0.0-dev' and refused to load. Wrap the pipecat imports in `with patch(...)` so the mock is active during import (where pipecat's krisp version check needs it) and torn down before any tests run.	2026-04-30 14:16:54 -04:00
kompfner	ce1311f6ba	Merge pull request #4301 from bnovik0v/fix-4300-missing-tool-lifecycle Fail missing tool calls cleanly	2026-04-27 11:54:43 -04:00
borislav	8869e25142	fix: compare bound method by equality, not identity Bound methods are created fresh on each attribute access, so 'self._missing_function_call_handler is self._missing_function_call_handler' is always False. Using 'is' meant the placeholder branch never fired and both warnings logged when a function was missing at queue time. Switch to == so equality compares the underlying function and instance. Strengthen the missing-at-queue-time test to assert the second warning does not fire.	2026-04-27 17:34:31 +02:00
borislav	822392b0d4	fix: re-resolve registry item at execution time Address review feedback: a function may be unregistered between when run_function_calls queues it and when _run_function_call executes it. Restore the live lookup, falling back to the missing-function handler when the entry is gone, so the call still terminates with a normal tool result. Factor the missing-handler item construction into a helper since it's now built in two places.	2026-04-27 17:22:30 +02:00
kompfner	bc29bdb95e	Merge pull request #4371 from Stoic-Angel/feat-global-context Add a global context for tool calls: tool_resources	2026-04-27 10:55:03 -04:00
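
The identity-versus-equality pitfall described in commit 8869e25142 can be reproduced in plain Python. This is a minimal sketch, not Pipecat's code: the `Service` class and its `missing_handler` method are hypothetical stand-ins for an LLM service and its missing-function handler.

```python
class Service:
    """Hypothetical stand-in for a service with a fallback handler."""

    def missing_handler(self):
        return "not available"


svc = Service()

# Each attribute access builds a fresh bound-method object, so an
# identity check between two accesses fails even on the same instance.
same_identity = svc.missing_handler is svc.missing_handler

# Equality compares the underlying function and the bound instance.
same_equality = svc.missing_handler == svc.missing_handler

# A bound method of a different instance compares unequal.
other_equality = svc.missing_handler == Service().missing_handler
```

Here `same_identity` is `False` while `same_equality` is `True`, which is why a branch guarded by `is` never fires for a stored bound method.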

525 Commits
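
The "resolve and contain" check mentioned in commit e780f759d0 might look roughly like the sketch below. It uses only the standard library and requires Python 3.9 or later; the function name `resolve_download` and its signature are assumptions made for illustration and are not taken from the Pipecat runner.

```python
from pathlib import Path
from typing import Optional


def resolve_download(downloads_dir: Path, filename: str) -> Optional[Path]:
    """Return the resolved path if it stays inside downloads_dir, else None.

    Hypothetical helper; a real endpoint would map None to a 404 response.
    """
    base = downloads_dir.resolve()
    # Resolving collapses ".." segments and follows symlinks before the check.
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        return None
    return candidate


downloads = Path("/tmp/downloads")
inside = resolve_download(downloads, "report.txt")
escaped = resolve_download(downloads, "../etc/passwd")
absolute = resolve_download(downloads, "/etc/passwd")
```

Using the resolved path's `name` for a `Content-Disposition` header, as the commit describes, avoids echoing back whatever the caller supplied.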