Adds InceptionLLMService, an OpenAI-compatible service for Inception's
Mercury-2 diffusion-based reasoning model. Supports reasoning_effort
(instant/low/medium/high) and realtime mode for reduced TTFT.
Mirrors the deprecation in ``OpenAITTSService.__init__``: ``instructions``
is now a Settings field. The constructor still accepts it for backward
compatibility but the canonical path is through ``Settings``.
A copy of ``turn-management-filter-incomplete-turns.py`` extended with
a ``get_weather(location)`` direct function. Exercises the path where
the LLM responds to a complete user turn by calling a tool — used to
reproduce (and now verify the fix for) the ``_user_speaking`` gating
bug between filter-incomplete and function calls.
Drop the EU-region default from the STT/TTS WebSocket URLs in favor of
the generic api.gradium.ai endpoint, and remove the explicit overrides
from the examples so they pick up the new defaults.
Mirrors the deprecation in ``QwenLLMService.__init__``: ``model`` should
be passed via ``settings=QwenLLMService.Settings(model=...)`` instead of
as a direct constructor arg.
Same async-tool routing approach as #4441: detect async-tool messages in
the LLM context, deliver the final result via the formal tool-result
channel.
Caveat: as of this writing, Inworld Realtime doesn't appear to handle
the resulting delayed tool result reliably, so the routing is
best-effort and the service emits a one-time warning when async-tool
messages are seen. Streamed intermediate results remain unsupported.
Also adds function calling to the realtime-inworld.py example, and
softens the Inworld mention in the #4447 changelog now that the
exclusion is being closed.
Lays groundwork for cancel_on_interruption=False support on Gemini Live by
restructuring _process_completed_function_calls to match the shape used by
AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: a single-pass
forward iteration over raw context messages that detects async-tool
messages via async_tool_messages.parse_message and routes them — started
skipped silently, intermediate logged-as-error and surfaced via push_error,
final delivered via the formal FunctionResponse channel.
Replaces the prior two-pass structure that went through the adapter for
sync results — the service now uses a lightweight self._tool_call_id_to_name
map (populated when the model issues tool calls) for the name lookup the
adapter used to provide. Extracts a new GeminiLLMAdapter.to_function_response_dict
static method for the dict-coercion logic that wraps non-dict tool returns
as {value: <result>} for Gemini's FunctionResponse.response field; the
adapter's existing inline copy in _from_standard_message uses it too.
Example consolidation:
- Folds realtime-gemini-live-function-calling.py into the base
realtime-gemini-live.py example so the base exercises function calling
out of the box (matching realtime-openai.py and realtime-aws-nova-sonic.py).
- Renames realtime-gemini-live-vertex-function-calling.py to
realtime-gemini-live-vertex.py, mirroring the consolidation.
- Adds realtime-gemini-live-async-tool.py.
- Updates scripts/evals/run-release-evals.py for the renames.
This commit alone doesn't make cancel_on_interruption=False fully work on
Gemini Live — additional investigation is pending. This is foundational
work to be built on.
Replaces the prior "log a warning and skip" approach with actual handling
of async-tool messages on Ultravox.
The catch with Ultravox is that its API freezes the conversation between
client_tool_invocation and the matching client_tool_result — there's no
"keep talking while the tool runs" channel like NON_BLOCKING on Gemini
or function_call_output-without-blocking on OpenAI Realtime. So:
- When the model invokes an async-registered function (cancel_on_inter
ruption=False), the service immediately ships a placeholder
client_tool_result that tells the model "the actual result isn't
ready yet; a follow-up will arrive shortly; keep the conversation
going". This unfreezes the conversation. The placeholder is sent
from _handle_tool_invocation, since the started async-tool message
doesn't reach the context-frame path until later.
- When the real tool finishes, the final async-tool message lands in
the context. _handle_context now forward-iterates and routes
async-tool messages: started is a no-op (placeholder already sent),
intermediate is logged-as-error and dropped (matching the other
realtime services), and final is injected as user-side text via
user_text_message with bracketed framing — the only mechanism
Ultravox offers for adding non-tool input mid-conversation.
Hoists the registry-lookup helper to LLMService as
_function_is_async(name) so future services can use the same pattern
without re-implementing it.
Adds an async-tool example file for Ultravox modeled on the existing
ones for the other realtime services.
Applies the same async-tool message routing introduced for AWSNovaSonicLLMService
and OpenAIRealtimeLLMService to additional realtime LLM services where the
flag's intent ("keep talking while the tool runs") is achievable:
- GrokRealtimeLLMService (xAI Realtime — also benefits the deprecated Grok
alias since it re-exports the xAI module)
- AzureRealtimeLLMService picks up the fix transitively by inheriting from
OpenAIRealtimeLLMService — no code change needed.
GrokRealtimeLLMService's _process_completed_function_calls now matches
the canonical pattern: skip LLMSpecificMessage, detect async-tool messages
via parse_message and route them — started skipped silently, intermediate
logged as an error and surfaced via push_error, final delivered through
the same channel as a synchronous result.
UltravoxRealtimeLLMService instead gets a one-time warning when async-tool
messages appear in the context. The Ultravox API freezes the conversation
during tool execution
(https://docs.ultravox.ai/tools/async-tools#custom-tool-timeouts), so the
flag's "keep talking while the tool runs" intent isn't achievable there —
applying the same code pattern would mislead users into expecting a UX
Ultravox can't deliver. Surfacing a clear warning is the right behavior
until Ultravox grows true async tool support.
Adds async-tool example files for Grok and Azure modeled on the existing
Nova Sonic / OpenAI Realtime ones (10s simulated network delay, weather
tool registered with cancel_on_interruption=False).
Two services remain excluded:
- GeminiLiveLLMService — the async-tool path needs deeper investigation.
- InworldRealtimeLLMService — appears to have a pre-existing problem
with even simple synchronous tool calling on its Realtime API (the
request reaches the server fine, but response generation fails with a
generic server_error).
Before the new async-tool mechanism landed, AWSNovaSonicLLMService and
OpenAIRealtimeLLMService honored cancel_on_interruption=False by simply
not cancelling in-flight function calls on interruption — the eventual
result then flowed through the same channel as any synchronous tool
result. The new mechanism (which appends started/intermediate/final
messages to the LLM context as the underlying task progresses) broke
that path: the realtime services didn't know how to interpret those
messages, and the eventual result was never delivered to the provider.
Restore the flag's behavior by teaching both services to detect
async-tool messages in the context and route them appropriately:
- started → skipped silently. The provider already issued the tool call
and natively awaits a result; nothing to send for the started marker.
- final → delivered via the formal tool-result channel. Same path as a
synchronous tool result, just delayed.
Streamed intermediate results (FunctionCallResultProperties(is_final=
False)) are not supported on these realtime services. An intermediate
result is logged as an error and surfaced via push_error, then dropped.
Use a non-realtime LLM service if a tool needs to stream intermediate
results. (Docstrings on register_function, register_direct_function, and
FunctionCallResultProperties.is_final updated to call this out.)
A new shared module pipecat.processors.aggregators.async_tool_messages
is the single source of truth for the on-the-wire payload shape: the
aggregator uses its build_*_message functions when injecting messages,
and the realtime services use parse_message when scanning the context.
Adds two example files exercising a network-delayed weather tool with
each service. The plain realtime-aws-nova-sonic.py example is also
reverted to a synchronous tool call now that the async variant lives in
its own file.
Similar fixes for other realtime services are forthcoming.
Wrap the detector chain with `deferred(...)` and append the LLM
completion gate via a `UserTurnStrategies` specialization rather than
a free-standing helper, mirroring the existing
`ExternalUserTurnStrategies` pattern. The class lives next to other
strategy containers in `pipecat.turns.user_turn_strategies`, so users
discover it where they're already configuring `user_turn_strategies`.
The deprecated `filter_incomplete_user_turns` flag now rewires
through `FilterIncompleteUserTurnStrategies` under the hood, keeping
the migration path identical to before. `deferred(...)` stays public
as the explicit escape hatch for non-default compositions.
Fixes a real bug: with `filter_incomplete_user_turns` enabled, the
smart-turn detector's tentative stop was firing `on_user_turn_stopped`
before the LLM had a chance to veto it. Observers, transcript
appenders and UI indicators received an early — and sometimes
duplicated — signal.
Decomposes the single stop concern into two events:
- `on_user_turn_inference_triggered` fires when a stop strategy has
enough signal to start LLM inference. The aggregator pushes the
context here, kicking off the LLM call.
- `on_user_turn_stopped` fires only when the user turn is semantically
final. Built-in strategies fire both events at the same call site,
preserving today's behavior for the common case.
Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates
finalization on a `UserTurnCompletedFrame` (a fieldless system frame
emitted by any component judging turn completeness — currently the
`UserTurnCompletionLLMServiceMixin` on `✓`).
Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin
wrapper that forwards an inner strategy's events except
`on_user_turn_stopped`. Use this to install a stop strategy as an
inference trigger only, leaving finalization to a peer (e.g. the LLM
completion strategy).
Adds `llm_completion_user_turn_stop_strategies()` for the common
case:
UserTurnStrategies(
stop=llm_completion_user_turn_stop_strategies(),
)
Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`.
The aggregator emits a `DeprecationWarning`, wraps existing stop
strategies with `deferred(...)`, and appends
`LLMTurnCompletionUserTurnStopStrategy` automatically.
``examples/function-calling/function-calling-missing-handler.py``
demonstrates the missing-handler path by deliberately advertising a
tool to the LLM without registering its handler — what happens when a
developer forgets to call ``register_function``. Exercises the new
``logger.error`` severity end-to-end without needing to coax the LLM
into hallucinating.
When tools change mid-conversation, LLMs can produce a few different
flavors of tool-call-related hallucination: calling tools that have
been removed, avoiding tools that have been re-added, or hallucinating
output (made-up answers or tool-call-shaped non-tool-calls) when tools
are unavailable.
This change introduces an opt-in ``add_tool_change_messages`` flag on
the LLM aggregators (preferred entry point: ``LLMContextAggregatorPair(
..., add_tool_change_messages=True)``) that appends a developer-role
message to the context whenever ``LLMSetToolsFrame`` changes the set
of advertised standard tools. Helps the LLM stay coherent across tool
changes by spelling out exactly what just became available or
unavailable. Both aggregators participate; whichever handles the
frame first wins, and the other (if any) sees an empty diff against
the shared context and stays silent — order-independent regardless of
whether the frame flows downstream or upstream.
Also tightens the existing missing-handler path (introduced in #4301):
- Reworded the terminal tool result to a neutral "The function
``X`` is not currently available." (overridable via
``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``). Previously
read "Error: function 'X' is not registered."
- Logs at the call site now distinguish developer error (tool
advertised but no handler registered → ``logger.error``) from
hallucination (tool not advertised → ``logger.warning``).
Includes a manual validation harness
(``examples/features/features-add-tool-change-messages.py``) that
exercises the new ``add_tool_change_messages`` mitigation by flipping
tool availability on a turn counter so its effect can be observed
end-to-end with the flag on vs. off.
Broaden `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s.
Involves 3 changes:
- A rename: `tool_resources` -> `app_resources`
- A new property on `PipelineTask`: `app_resources`
- A new property on `FrameProcessor`: `pipeline_task`
Usage in tool handler:
async def get_weather(params: FunctionCallParams):
resources = cast(MyAppResources, params.app_resources)
...
Usage in custom `FrameProcessor`:
class MyProcessor(FrameProcessor):
async def process_frame(self, frame, direction):
await super().process_frame(frame, direction)
if self.pipeline_task is not None:
resources = cast(MyAppResources, self.pipeline_task.app_resources)
...
The previous `tool_resources` aliases (on `PipelineTask`,
`FunctionCallParams`, and `FrameProcessorSetup`) keep working but are
deprecated as of 1.2.0 and emit `DeprecationWarning`s.
Introduce SonioxTTSService, a WebSocket TTS provider that streams text and
receives audio over a persistent connection, multiplexing up to 5 concurrent
streams per socket via Soniox's `stream_id`. Also updates the README service
table and the Soniox voice example to use the new TTS end-to-end.
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.
Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s
Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
(independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
optional stream/prompt_name params, eliminating duplicate SC methods
Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.
Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.
Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.
Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:
- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
return type is `str` rather than `str | None`, and misconfiguration
fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
before use.
New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket
(`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates
with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams
raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing,
language, multichannel, and diarization settings.
- `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and
`language_to_xai_stt_language` helper.
- `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default.
- `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`.
- `README.md`: link to xAI STT in the services table.
- `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so
the xAI voice example is fully xAI.
- `examples/transcription/transcription-xai.py`: new transcription-only
example using the new service.
- Fall back to Language.EN in _primary_detected_language when model is
flux-general-en, preserving prior behavior on the default model.
- Standardize example on DeepgramFluxSTTService.Settings and drop the
now-redundant DeepgramFluxSTTSettings import.
- Narrow the changed-behavior changelog to reflect that flux-general-en
frames still carry Language.EN.
Enables the flux-general-multi model with one or more language_hints.
Hints are sent as repeatable URL params at connect time and via a
Configure control message when updated mid-stream (detect-then-lock).
TranscriptionFrame.language now reflects the language Flux detected
for each turn via the TurnInfo `languages` field.
* VIVA SDK TT v3 support
* Format fix.
* Renamed the API naming, removed '3' from the name.
* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.
* Typo fix in voice-krisp-viva example to use KrispVivaFilter class
* style fix.
* test run error fixes.
* some test related changes.
* Fixed tests
* Stule fixes.