pipecat

Author	SHA1	Message	Date
Mark Backman	28f9203401	Code review fixes	2026-05-21 11:45:17 -04:00
joycech333	77cc314a08	feat: add Inception LLM service with Mercury-2 support Adds InceptionLLMService, an OpenAI-compatible service for Inception's Mercury-2 diffusion-based reasoning model. Supports reasoning_effort (instant/low/medium/high) and realtime mode for reduced TTFT.	2026-05-21 11:23:23 -04:00
mihafabcic-soniox	86a5710801	Add max_endpoint_delay_ms and clean up Soniox STT settings (#4521)	2026-05-20 17:54:48 -04:00
asilvestre	bc769eaa82	Changing the example to use OpenAI	2026-05-18 14:40:56 +02:00
asilvestre	dd38fbc735	add documentation entry	2026-05-18 14:40:56 +02:00
asilvestre	c61672194d	Vonage Video Connector Transport	2026-05-18 14:40:49 +02:00
Aleix Conchillo Flaqué	b6ecce754b	Merge pull request #4501 from pipecat-ai/aleix/fix-filter-incomplete-tool-calls Fix filter-incomplete + function-calling deadlock	2026-05-15 15:11:45 -07:00
Aleix Conchillo Flaqué	63064860ef	Move OpenAITTSService instructions into Settings in the example Mirrors the deprecation in ``OpenAITTSService.__init__``: ``instructions`` is now a Settings field. The constructor still accepts it for backward compatibility but the canonical path is through ``Settings``.	2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué	f5158d51e7	Add filter-incomplete + function-calling turn-management example A copy of ``turn-management-filter-incomplete-turns.py`` extended with a ``get_weather(location)`` direct function. Exercises the path where the LLM responds to a complete user turn by calling a tool — used to reproduce (and now verify the fix for) the ``_user_speaking`` gating bug between filter-incomplete and function calls.	2026-05-15 14:54:51 -07:00
Mark Backman	5403aa56e4	Remove Gradium endpoint overrides from voice example Drop the explicit US-region URLs so the example picks up the new region-neutral defaults in GradiumSTTService and GradiumTTSService.	2026-05-15 15:17:12 -04:00
Mark Backman	0e0d76d020	Update Gradium endpoints to region-neutral URLs Drop the EU-region default from the STT/TTS WebSocket URLs in favor of the generic api.gradium.ai endpoint, and remove the explicit overrides from the examples so they pick up the new defaults.	2026-05-15 15:02:05 -04:00
Aleix Conchillo Flaqué	22650b1b56	Move QwenLLMService model into Settings in the qwen example Mirrors the deprecation in ``QwenLLMService.__init__``: ``model`` should be passed via ``settings=QwenLLMService.Settings(model=...)`` instead of as a direct constructor arg.	2026-05-14 13:22:07 -07:00
Mark Backman	49bda11ae8	Merge pull request #4482 from pipecat-ai/mb/soniox-stt-token-language Propagate Soniox token language	2026-05-13 16:28:56 -04:00
Mark Backman	82f0896d6a	Propagate Soniox token language	2026-05-13 15:23:22 -04:00
kompfner	7e4cd23de4	Merge pull request #4474 from pipecat-ai/pk/inworld-realtime-tools Extend cancel_on_interruption=False to Inworld Realtime (best-effort + warning)	2026-05-13 15:12:34 -04:00
Mark Backman	5fef239b68	Merge pull request #4450 from pipecat-ai/mb/gpt-realtime-whisper Default OpenAI Realtime transcription to gpt-realtime-whisper	2026-05-13 09:48:33 -04:00
Filipi da Silva Fuchter	9148e307cc	Merge pull request #4464 from pipecat-ai/filipi/nvidia_sagemaker NVidia sagemaker - TTS and STT services	2026-05-13 07:53:26 -03:00
Filipi da Silva Fuchter	703d23b658	Update examples/voice/voice-nvidia-sagemaker.py Co-authored-by: Mark Backman <mark@daily.co>	2026-05-13 06:36:57 -04:00
Filipi da Silva Fuchter	227ba288da	Update examples/voice/voice-nvidia-sagemaker.py Co-authored-by: Mark Backman <mark@daily.co>	2026-05-13 06:36:45 -04:00
filipi87	bea9e4b3ba	New example voice-nvidia-sagemaker.py	2026-05-12 17:44:11 -03:00
Paul Kompfner	58333b2705	Extend cancel_on_interruption=False to InworldRealtimeLLMService (best-effort) Same async-tool routing approach as #4441: detect async-tool messages in the LLM context, deliver the final result via the formal tool-result channel. Caveat: as of this writing, Inworld Realtime doesn't appear to handle the resulting delayed tool result reliably, so the routing is best-effort and the service emits a one-time warning when async-tool messages are seen. Streamed intermediate results remain unsupported. Also adds function calling to the realtime-inworld.py example, and softens the Inworld mention in the #4447 changelog now that the exclusion is being closed.	2026-05-12 16:03:34 -04:00
Mark Backman	abd28e2ac1	Update OpenAI realtime transcription default	2026-05-12 15:20:57 -04:00
Paul Kompfner	a52bdef32b	Add reasoning support to OpenAIRealtimeLLMService for gpt-realtime-2	2026-05-12 13:55:19 -04:00
Paul Kompfner	1a4a6f4edf	refactor(gemini-live): bring tool-result handling in line with the canonical realtime pattern Lays groundwork for cancel_on_interruption=False support on Gemini Live by restructuring _process_completed_function_calls to match the shape used by AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: a single-pass forward iteration over raw context messages that detects async-tool messages via async_tool_messages.parse_message and routes them — started skipped silently, intermediate logged-as-error and surfaced via push_error, final delivered via the formal FunctionResponse channel. Replaces the prior two-pass structure that went through the adapter for sync results — the service now uses a lightweight self._tool_call_id_to_name map (populated when the model issues tool calls) for the name lookup the adapter used to provide. Extracts a new GeminiLLMAdapter.to_function_response_dict static method for the dict-coercion logic that wraps non-dict tool returns as {value: <result>} for Gemini's FunctionResponse.response field; the adapter's existing inline copy in _from_standard_message uses it too. Example consolidation: - Folds realtime-gemini-live-function-calling.py into the base realtime-gemini-live.py example so the base exercises function calling out of the box (matching realtime-openai.py and realtime-aws-nova-sonic.py). - Renames realtime-gemini-live-vertex-function-calling.py to realtime-gemini-live-vertex.py, mirroring the consolidation. - Adds realtime-gemini-live-async-tool.py. - Updates scripts/evals/run-release-evals.py for the renames. This commit alone doesn't make cancel_on_interruption=False fully work on Gemini Live — additional investigation is pending. This is foundational work to be built on.	2026-05-08 16:42:54 -04:00
Paul Kompfner	4864eddbc7	feat(ultravox): support cancel_on_interruption=False via placeholder + final-as-text Replaces the prior "log a warning and skip" approach with actual handling of async-tool messages on Ultravox. The catch with Ultravox is that its API freezes the conversation between client_tool_invocation and the matching client_tool_result — there's no "keep talking while the tool runs" channel like NON_BLOCKING on Gemini or function_call_output-without-blocking on OpenAI Realtime. So: - When the model invokes an async-registered function (cancel_on_interruption=False), the service immediately ships a placeholder client_tool_result that tells the model "the actual result isn't ready yet; a follow-up will arrive shortly; keep the conversation going". This unfreezes the conversation. The placeholder is sent from _handle_tool_invocation, since the started async-tool message doesn't reach the context-frame path until later. - When the real tool finishes, the final async-tool message lands in the context. _handle_context now forward-iterates and routes async-tool messages: started is a no-op (placeholder already sent), intermediate is logged-as-error and dropped (matching the other realtime services), and final is injected as user-side text via user_text_message with bracketed framing — the only mechanism Ultravox offers for adding non-tool input mid-conversation. Hoists the registry-lookup helper to LLMService as _function_is_async(name) so future services can use the same pattern without re-implementing it. Adds an async-tool example file for Ultravox modeled on the existing ones for the other realtime services.	2026-05-08 16:20:40 -04:00
Paul Kompfner	b14a03d01f	fix: extend cancel_on_interruption=False regression fix to remaining realtime services Applies the same async-tool message routing introduced for AWSNovaSonicLLMService and OpenAIRealtimeLLMService to additional realtime LLM services where the flag's intent ("keep talking while the tool runs") is achievable: - GrokRealtimeLLMService (xAI Realtime — also benefits the deprecated Grok alias since it re-exports the xAI module) - AzureRealtimeLLMService picks up the fix transitively by inheriting from OpenAIRealtimeLLMService — no code change needed. GrokRealtimeLLMService's _process_completed_function_calls now matches the canonical pattern: skip LLMSpecificMessage, detect async-tool messages via parse_message and route them — started skipped silently, intermediate logged as an error and surfaced via push_error, final delivered through the same channel as a synchronous result. UltravoxRealtimeLLMService instead gets a one-time warning when async-tool messages appear in the context. The Ultravox API freezes the conversation during tool execution (https://docs.ultravox.ai/tools/async-tools#custom-tool-timeouts), so the flag's "keep talking while the tool runs" intent isn't achievable there — applying the same code pattern would mislead users into expecting a UX Ultravox can't deliver. Surfacing a clear warning is the right behavior until Ultravox grows true async tool support. Adds async-tool example files for Grok and Azure modeled on the existing Nova Sonic / OpenAI Realtime ones (10s simulated network delay, weather tool registered with cancel_on_interruption=False). Two services remain excluded: - GeminiLiveLLMService — the async-tool path needs deeper investigation. - InworldRealtimeLLMService — appears to have a pre-existing problem with even simple synchronous tool calling on its Realtime API (the request reaches the server fine, but response generation fails with a generic server_error).	2026-05-08 15:43:53 -04:00
Paul Kompfner	72d0fb418a	fix: restore cancel_on_interruption=False support in AWS Nova Sonic and OpenAI Realtime Before the new async-tool mechanism landed, AWSNovaSonicLLMService and OpenAIRealtimeLLMService honored cancel_on_interruption=False by simply not cancelling in-flight function calls on interruption — the eventual result then flowed through the same channel as any synchronous tool result. The new mechanism (which appends started/intermediate/final messages to the LLM context as the underlying task progresses) broke that path: the realtime services didn't know how to interpret those messages, and the eventual result was never delivered to the provider. Restore the flag's behavior by teaching both services to detect async-tool messages in the context and route them appropriately: - started → skipped silently. The provider already issued the tool call and natively awaits a result; nothing to send for the started marker. - final → delivered via the formal tool-result channel. Same path as a synchronous tool result, just delayed. Streamed intermediate results (FunctionCallResultProperties(is_final=False)) are not supported on these realtime services. An intermediate result is logged as an error and surfaced via push_error, then dropped. Use a non-realtime LLM service if a tool needs to stream intermediate results. (Docstrings on register_function, register_direct_function, and FunctionCallResultProperties.is_final updated to call this out.) A new shared module pipecat.processors.aggregators.async_tool_messages is the single source of truth for the on-the-wire payload shape: the aggregator uses its build_*_message functions when injecting messages, and the realtime services use parse_message when scanning the context. Adds two example files exercising a network-delayed weather tool with each service. The plain realtime-aws-nova-sonic.py example is also reverted to a synchronous tool call now that the async variant lives in its own file. Similar fixes for other realtime services are forthcoming.	2026-05-08 09:33:06 -04:00
Mark Backman	457a68ce64	Correct docstrings and comments regarding incomplete_long_timeout duration, 10 sec	2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué	952dddca8b	Replace llm_completion_user_turn_stop_strategies() with FilterIncompleteUserTurnStrategies Wrap the detector chain with `deferred(...)` and append the LLM completion gate via a `UserTurnStrategies` specialization rather than a free-standing helper, mirroring the existing `ExternalUserTurnStrategies` pattern. The class lives next to other strategy containers in `pipecat.turns.user_turn_strategies`, so users discover it where they're already configuring `user_turn_strategies`. The deprecated `filter_incomplete_user_turns` flag now rewires through `FilterIncompleteUserTurnStrategies` under the hood, keeping the migration path identical to before. `deferred(...)` stays public as the explicit escape hatch for non-default compositions.	2026-05-07 17:47:39 -07:00
Aleix Conchillo Flaqué	480eca42f5	Split user-turn-stop into inference-triggered and finalized events Fixes a real bug: with `filter_incomplete_user_turns` enabled, the smart-turn detector's tentative stop was firing `on_user_turn_stopped` before the LLM had a chance to veto it. Observers, transcript appenders and UI indicators received an early — and sometimes duplicated — signal. Decomposes the single stop concern into two events: - `on_user_turn_inference_triggered` fires when a stop strategy has enough signal to start LLM inference. The aggregator pushes the context here, kicking off the LLM call. - `on_user_turn_stopped` fires only when the user turn is semantically final. Built-in strategies fire both events at the same call site, preserving today's behavior for the common case. Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates finalization on a `UserTurnCompletedFrame` (a fieldless system frame emitted by any component judging turn completeness — currently the `UserTurnCompletionLLMServiceMixin` on `✓`). Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin wrapper that forwards an inner strategy's events except `on_user_turn_stopped`. Use this to install a stop strategy as an inference trigger only, leaving finalization to a peer (e.g. the LLM completion strategy). Adds `llm_completion_user_turn_stop_strategies()` for the common case: UserTurnStrategies( stop=llm_completion_user_turn_stop_strategies(), ) Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`. The aggregator emits a `DeprecationWarning`, wraps existing stop strategies with `deferred(...)`, and appends `LLMTurnCompletionUserTurnStopStrategy` automatically.	2026-05-07 17:46:09 -07:00
Paul Kompfner	2616076bec	Add deterministic dev-error demo example ``examples/function-calling/function-calling-missing-handler.py`` demonstrates the missing-handler path by deliberately advertising a tool to the LLM without registering its handler — what happens when a developer forgets to call ``register_function``. Exercises the new ``logger.error`` severity end-to-end without needing to coax the LLM into hallucinating.	2026-05-05 13:08:00 -04:00
Paul Kompfner	e06e0c0282	Mitigate tool-call-related hallucination When tools change mid-conversation, LLMs can produce a few different flavors of tool-call-related hallucination: calling tools that have been removed, avoiding tools that have been re-added, or hallucinating output (made-up answers or tool-call-shaped non-tool-calls) when tools are unavailable. This change introduces an opt-in ``add_tool_change_messages`` flag on the LLM aggregators (preferred entry point: ``LLMContextAggregatorPair( ..., add_tool_change_messages=True)``) that appends a developer-role message to the context whenever ``LLMSetToolsFrame`` changes the set of advertised standard tools. Helps the LLM stay coherent across tool changes by spelling out exactly what just became available or unavailable. Both aggregators participate; whichever handles the frame first wins, and the other (if any) sees an empty diff against the shared context and stays silent — order-independent regardless of whether the frame flows downstream or upstream. Also tightens the existing missing-handler path (introduced in #4301): - Reworded the terminal tool result to a neutral "The function ``X`` is not currently available." (overridable via ``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``). Previously read "Error: function 'X' is not registered." - Logs at the call site now distinguish developer error (tool advertised but no handler registered → ``logger.error``) from hallucination (tool not advertised → ``logger.warning``). Includes a manual validation harness (``examples/features/features-add-tool-change-messages.py``) that exercises the new ``add_tool_change_messages`` mitigation by flipping tool availability on a turn counter so its effect can be observed end-to-end with the flag on vs. off.	2026-05-05 13:02:43 -04:00
Paul Kompfner	1b5c4cfa2a	feat: broaden tool_resources to app_resources Broaden `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Involves 3 changes: - A rename: `tool_resources` -> `app_resources` - A new property on `PipelineTask`: `app_resources` - A new property on `FrameProcessor`: `pipeline_task` Usage in tool handler: async def get_weather(params: FunctionCallParams): resources = cast(MyAppResources, params.app_resources) ... Usage in custom `FrameProcessor`: class MyProcessor(FrameProcessor): async def process_frame(self, frame, direction): await super().process_frame(frame, direction) if self.pipeline_task is not None: resources = cast(MyAppResources, self.pipeline_task.app_resources) ... The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.	2026-04-30 16:16:17 -04:00
Mark Backman	58a038ddb2	Add Soniox real-time TTS service Introduce SonioxTTSService, a WebSocket TTS provider that streams text and receives audio over a persistent connection, multiplexing up to 5 concurrent streams per socket via Soniox's `stream_id`. Also updates the README service table and the Soniox voice example to use the new TTS end-to-end.	2026-04-27 16:04:02 -04:00
Paul Kompfner	124863175a	Add example demonstrating usage of `tool_resources`	2026-04-27 11:20:53 -04:00
kompfner	86effc4d10	Merge pull request #4015 from prettyprettyprettygood/feat/nova-sonic-session-continuation feat(nova-sonic): add proactive session continuation for conversation…	2026-04-27 09:36:48 -04:00
Gökmen Görgen	3bbfc42854	remove adaptive audio enhancement example and support for runtime enhancement level updates in `AICFilter`.	2026-04-25 10:05:47 +02:00
Gökmen Görgen	3b2127f912	rename environment variables and references from `AICOUSTICS` to `AIC`.	2026-04-25 09:51:23 +02:00
Gökmen Görgen	ea12b10742	rename `mcp-aic-adaptive.py` to `mcp-aicoustics-adaptive.py`.	2026-04-25 09:51:23 +02:00
Gökmen Görgen	a2fbed86cf	add adaptive audio enhancement example and support for runtime enhancement level updates in `AICFilter`.	2026-04-25 09:51:23 +02:00
Gökmen Görgen	f75f361629	bump `aic-sdk` to 2.2.0 and update `AICFilter` with `model_id` and `enhancement_level` changes.	2026-04-25 09:51:23 +02:00
Osman Ipek	f1b16a672a	feat(nova-sonic): add proactive session continuation for conversations >8min Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds transparent session continuation that rotates sessions in the background before the limit is reached, preserving conversation context with no user-perceptible interruption. Implementation follows the AWS reference architecture: - Monitor loop detects when session age exceeds threshold - On assistant AUDIO contentStart: start buffering user audio, create next session (sessionStart + promptStart + system instruction) - Track SPECULATIVE/FINAL text counts as completion signal - On completion signal: send conversation history + audioInputStart + buffered audio to next session, then promote immediately - Close old session in background (non-blocking) - Dead session detection: recreate next session if idle >30s Key design decisions: - Session continuation enabled by default (fundamental for long conversations) - Conversation history tracked in real-time via _sc_conversation_history (independent of pipeline context aggregator which updates asynchronously) - Completion signal check in _handle_content_end_event (after history update) to ensure latest text is included in handoff - Rolling audio buffer (default 3s) captures user audio during transition - transition_threshold_seconds capped at 420s (7min) for safety margin - Unified event methods (_send_text_event, _send_client_event, etc.) accept optional stream/prompt_name params, eliminating duplicate SC methods Also adds: - SessionContinuationParams config (enabled, threshold, buffer, timeout)	2026-04-24 14:55:55 -07:00
Mark Backman	d8f5c0be71	Add XAITTSService for xAI streaming WebSocket TTS Adds XAITTSService in the existing xai/tts.py module, alongside the existing XAIHttpTTSService. Connects to xAI's streaming endpoint at wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta chunks down on the same connection so audio starts flowing before the full utterance is synthesized. Extends InterruptibleTTSService since xAI's protocol is strictly sequential per connection and exposes neither a cancel verb nor a context ID — the only way to stop an in-flight utterance is to tear down the WebSocket, which is exactly what InterruptibleTTSService does on interruption when the bot is speaking. Voice, language, codec, and sample_rate are passed as query-string params at connect time; runtime setting changes reconnect the socket. Defaults to raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream. Splits the existing example into voice-xai.py (WebSocket) and voice-xai-http.py (batch HTTP) so each variant has its own entry point. Promotes the xai extra to depend on pipecat-ai[websockets-base] since the new service imports the websockets library.	2026-04-21 15:48:26 -04:00
Mark Backman	58a17c7b1b	Include examples in type checking Remove `examples/` from the `pyrightconfig.json` ignore list and fix the resulting type errors across all example files. Common fixes: - Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the return type is `str` rather than `str \| None`, and misconfiguration fails fast. - Narrow `LLMContextMessage` union members with `isinstance(..., dict)` before dict-style access. - `assert isinstance(params.llm, ...)` before calling service-specific methods that aren't on the base `LLMService`. - Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`) before use.	2026-04-21 15:43:31 -04:00
Mark Backman	b838bd906b	Add changelog for #4340	2026-04-21 13:45:34 -04:00
Mark Backman	c091232f2f	Add xAI streaming STT service New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket (`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing, language, multichannel, and diarization settings. - `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and `language_to_xai_stt_language` helper. - `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default. - `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`. - `README.md`: link to xAI STT in the services table. - `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so the xAI voice example is fully xAI. - `examples/transcription/transcription-xai.py`: new transcription-only example using the new service.	2026-04-21 13:45:34 -04:00
Paul Kompfner	81571beb1b	Use ExternalUserTurnStrategies, as expected, in a Deepgram Flux example	2026-04-21 10:51:59 -04:00
Mark Backman	42a6fc703c	Address review feedback - Fall back to Language.EN in _primary_detected_language when model is flux-general-en, preserving prior behavior on the default model. - Standardize example on DeepgramFluxSTTService.Settings and drop the now-redundant DeepgramFluxSTTSettings import. - Narrow the changed-behavior changelog to reflect that flux-general-en frames still carry Language.EN.	2026-04-17 15:38:14 -04:00
Mark Backman	6bb4e8295f	Add multilingual support for Deepgram Flux STT Enables the flux-general-multi model with one or more language_hints. Hints are sent as repeatable URL params at connect time and via a Configure control message when updated mid-stream (detect-then-lock). TranscriptionFrame.language now reflects the language Flux detected for each turn via the TurnInfo `languages` field.	2026-04-17 10:30:45 -04:00
Garegin Harutyunyan	4c19f5584c	VIVA SDK TT v3 support (#4252) * VIVA SDK TT v3 support * Format fix. * Renamed the API naming, removed '3' from the name. * Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support. * Typo fix in voice-krisp-viva example to use KrispVivaFilter class * style fix. * test run error fixes. * some test related changes. * Fixed tests * Style fixes.	2026-04-17 07:53:41 -04:00
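
Commit 58a17c7b1b in the log above replaces `os.getenv("X")` with `os.environ["X"]` for required API keys, so the value is typed as `str` and a missing key fails fast. The sketch below illustrates that difference using only the standard library. The variable name `PIPECAT_DEMO_API_KEY` and the helpers `read_optional` and `read_required` are hypothetical, chosen for illustration; they are not taken from the pipecat examples.

```python
import os
from typing import Optional

# Hypothetical variable name, used only for this illustration.
DEMO_VAR = "PIPECAT_DEMO_API_KEY"


def read_optional(name: str) -> Optional[str]:
    # os.getenv returns None when the variable is unset, so a type checker
    # sees Optional[str] and a misconfiguration can go unnoticed until later.
    return os.getenv(name)


def read_required(name: str) -> str:
    # os.environ[...] raises KeyError when the variable is unset, so the
    # return type is plain str and the misconfiguration surfaces immediately.
    return os.environ[name]


# Case 1: the variable is not set.
os.environ.pop(DEMO_VAR, None)
assert read_optional(DEMO_VAR) is None
try:
    read_required(DEMO_VAR)
    missing_detected = False
except KeyError:
    missing_detected = True

# Case 2: the variable is set; both helpers return the same string.
os.environ[DEMO_VAR] = "sk-demo"
optional_value = read_optional(DEMO_VAR)
required_value = read_required(DEMO_VAR)
```

With the required form, a script that depends on the key stops at startup with a `KeyError` naming the missing variable, rather than passing `None` into a service constructor.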


1918 Commits