Wrap the detector chain with `deferred(...)` and append the LLM
completion gate via a `UserTurnStrategies` specialization rather than
a free-standing helper, mirroring the existing
`ExternalUserTurnStrategies` pattern. The class lives next to other
strategy containers in `pipecat.turns.user_turn_strategies`, so users
discover it where they're already configuring `user_turn_strategies`.
The deprecated `filter_incomplete_user_turns` flag now rewires
through `FilterIncompleteUserTurnStrategies` under the hood, keeping
the migration path identical to before. `deferred(...)` stays public
as the explicit escape hatch for non-default compositions.
Fixes a real bug: with `filter_incomplete_user_turns` enabled, the
smart-turn detector's tentative stop was firing `on_user_turn_stopped`
before the LLM had a chance to veto it. Observers, transcript
appenders and UI indicators received an early — and sometimes
duplicated — signal.
Decomposes the single stop concern into two events:
- `on_user_turn_inference_triggered` fires when a stop strategy has
enough signal to start LLM inference. The aggregator pushes the
context here, kicking off the LLM call.
- `on_user_turn_stopped` fires only when the user turn is semantically
final. Built-in strategies fire both events at the same call site,
preserving today's behavior for the common case.
Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates
finalization on a `UserTurnCompletedFrame` (a fieldless system frame
emitted by any component judging turn completeness — currently the
`UserTurnCompletionLLMServiceMixin` on `✓`).
Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin
wrapper that forwards an inner strategy's events except
`on_user_turn_stopped`. Use this to install a stop strategy as an
inference trigger only, leaving finalization to a peer (e.g. the LLM
completion strategy).
Adds `llm_completion_user_turn_stop_strategies()` for the common
case:
UserTurnStrategies(
stop=llm_completion_user_turn_stop_strategies(),
)
Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`.
The aggregator emits a `DeprecationWarning`, wraps existing stop
strategies with `deferred(...)`, and appends
`LLMTurnCompletionUserTurnStopStrategy` automatically.
``examples/function-calling/function-calling-missing-handler.py``
demonstrates the missing-handler path by deliberately advertising a
tool to the LLM without registering its handler — what happens when a
developer forgets to call ``register_function``. Exercises the new
``logger.error`` severity end-to-end without needing to coax the LLM
into hallucinating.
When tools change mid-conversation, LLMs can produce a few different
flavors of tool-call-related hallucination: calling tools that have
been removed, avoiding tools that have been re-added, or hallucinating
output (made-up answers or tool-call-shaped non-tool-calls) when tools
are unavailable.
This change introduces an opt-in ``add_tool_change_messages`` flag on
the LLM aggregators (preferred entry point: ``LLMContextAggregatorPair(
..., add_tool_change_messages=True)``) that appends a developer-role
message to the context whenever ``LLMSetToolsFrame`` changes the set
of advertised standard tools. Helps the LLM stay coherent across tool
changes by spelling out exactly what just became available or
unavailable. Both aggregators participate; whichever handles the
frame first wins, and the other (if any) sees an empty diff against
the shared context and stays silent — order-independent regardless of
whether the frame flows downstream or upstream.
Also tightens the existing missing-handler path (introduced in #4301):
- Reworded the terminal tool result to a neutral "The function
``X`` is not currently available." (overridable via
``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``). Previously
read "Error: function 'X' is not registered."
- Logs at the call site now distinguish developer error (tool
advertised but no handler registered → ``logger.error``) from
hallucination (tool not advertised → ``logger.warning``).
Includes a manual validation harness
(``examples/features/features-add-tool-change-messages.py``) that
exercises the new ``add_tool_change_messages`` mitigation by flipping
tool availability on a turn counter so its effect can be observed
end-to-end with the flag on vs. off.
Broaden `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s.
Involves 3 changes:
- A rename: `tool_resources` -> `app_resources`
- A new property on `PipelineTask`: `app_resources`
- A new property on `FrameProcessor`: `pipeline_task`
Usage in tool handler:
async def get_weather(params: FunctionCallParams):
resources = cast(MyAppResources, params.app_resources)
...
Usage in custom `FrameProcessor`:
class MyProcessor(FrameProcessor):
async def process_frame(self, frame, direction):
await super().process_frame(frame, direction)
if self.pipeline_task is not None:
resources = cast(MyAppResources, self.pipeline_task.app_resources)
...
The previous `tool_resources` aliases (on `PipelineTask`,
`FunctionCallParams`, and `FrameProcessorSetup`) keep working but are
deprecated as of 1.2.0 and emit `DeprecationWarning`s.
Introduce SonioxTTSService, a WebSocket TTS provider that streams text and
receives audio over a persistent connection, multiplexing up to 5 concurrent
streams per socket via Soniox's `stream_id`. Also updates the README service
table and the Soniox voice example to use the new TTS end-to-end.
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.
Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s
Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
(independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
optional stream/prompt_name params, eliminating duplicate SC methods
Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.
Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.
Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.
Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:
- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
return type is `str` rather than `str | None`, and misconfiguration
fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
before use.
New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket
(`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates
with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams
raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing,
language, multichannel, and diarization settings.
- `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and
`language_to_xai_stt_language` helper.
- `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default.
- `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`.
- `README.md`: link to xAI STT in the services table.
- `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so
the xAI voice example is fully xAI.
- `examples/transcription/transcription-xai.py`: new transcription-only
example using the new service.
- Fall back to Language.EN in _primary_detected_language when model is
flux-general-en, preserving prior behavior on the default model.
- Standardize example on DeepgramFluxSTTService.Settings and drop the
now-redundant DeepgramFluxSTTSettings import.
- Narrow the changed-behavior changelog to reflect that flux-general-en
frames still carry Language.EN.
Enables the flux-general-multi model with one or more language_hints.
Hints are sent as repeatable URL params at connect time and via a
Configure control message when updated mid-stream (detect-then-lock).
TranscriptionFrame.language now reflects the language Flux detected
for each turn via the TurnInfo `languages` field.
* VIVA SDK TT v3 support
* Format fix.
* Renamed the API naming, removed '3' from the name.
* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.
* Typo fix in voice-krisp-viva example to use KrispVivaFilter class
* style fix.
* test run error fixes.
* some test related changes.
* Fixed tests
* Stule fixes.
* Add Inworld Realtime LLM service
Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.
New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py
Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled
* Prefer init-provided system instruction in Inworld Realtime
Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.
* Update changelog entry with PR number
* Fix changelog format to use bullet point
* Polish PR: default model, example cleanup, changelog update
- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog
* Address PR review feedback for Inworld Realtime
- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
_ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths
* Simplify example, add provider tracking, remove local VAD path
- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends
* Default TTS model to inworld-tts-1.5-max
* Remove dead shimmed tools code, set STT/VAD defaults
- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
Remove the deprecation proxy infrastructure that allowed old-style flat
imports (e.g. `from pipecat.services.openai import OpenAILLMService`).
Users must now import from specific submodules
(`from pipecat.services.openai.llm import OpenAILLMService`), which is
already the established pattern across all internal code and 179+ examples.
- Strip 32 proxy `__init__.py` files to empty
- Strip 3 non-proxy files with bare star imports (minimax, sambanova, sarvam)
- Strip google/gemini_live `__init__.py` re-exports
- Remove DeprecatedModuleProxy class and helpers from services/__init__.py
- Remove ruff per-file ignore for services/__init__.py
- Fix 2 examples using old-style imports
Napoleon's Attributes section creates class-level attribute docs that
duplicate the __init__ parameter docs when napoleon_include_init_with_doc
is enabled. Using Parameters avoids the duplication.