Rename *-local-vad.py example variants to *-locally-driven-turns.py

The "-local-vad" suffix was ambiguous now that local VAD has two meanings in the realtime context: supplementary user-turn frames broadcast alongside server-driven turns (commented-out opt-in in the base examples), vs. local turn detection driving the conversation end-to-end (server-side turn detection disabled, what these variant files actually demonstrate). The new "-locally-driven-turns" suffix matches the latter intent unambiguously.

Renames:

  realtime-openai-local-vad.py → realtime-openai-locally-driven-turns.py
  realtime-gemini-live-local-vad.py → realtime-gemini-live-locally-driven-turns.py
  realtime-grok-local-vad.py → realtime-grok-locally-driven-turns.py
  realtime-inworld-local-vad.py → realtime-inworld-locally-driven-turns.py

Plus the matching changelog fragments. Service docstrings and base examples that referenced the old filenames now point at the new ones.

Show commented-out local-VAD opt-in in no-turn-frames examples
2026-05-21 15:26:27 -04:00 · 2026-05-21 15:13:52 -04:00 · 2026-05-21 14:14:13 -04:00 · 2026-05-21 13:00:34 -04:00 · 2026-05-21 12:37:04 -04:00 · 2026-05-21 12:19:24 -04:00
284 changed files with 25206 additions and 3128 deletions
--- a/.agents/skills/changelog
+++ b/.agents/skills/changelog
@@ -0,0 +1 @@
+../../.claude/skills/changelog
--- a/.agents/skills/cleanup
+++ b/.agents/skills/cleanup
@@ -0,0 +1 @@
+../../.claude/skills/cleanup
--- a/.agents/skills/code-review
+++ b/.agents/skills/code-review
@@ -0,0 +1 @@
+../../.claude/skills/code-review
--- a/.agents/skills/docstring
+++ b/.agents/skills/docstring
@@ -0,0 +1 @@
+../../.claude/skills/docstring
--- a/.agents/skills/pr-description
+++ b/.agents/skills/pr-description
@@ -0,0 +1 @@
+../../.claude/skills/pr-description
--- a/.agents/skills/pr-submit
+++ b/.agents/skills/pr-submit
@@ -0,0 +1 @@
+../../.claude/skills/pr-submit
--- a/.agents/skills/update-docs
+++ b/.agents/skills/update-docs
@@ -0,0 +1 @@
+../../.claude/skills/update-docs
--- a/.claude/skills/cleanup/SKILL.md
+++ b/.claude/skills/cleanup/SKILL.md
@@ -1,3 +1,8 @@
+---
+name: cleanup
+description: Review, refactor, document, and validate code changes in the current branch
+---
+
 # Code Cleanup Skill

 The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
--- a/.github/workflows/coverage.yaml
+++ b/.github/workflows/coverage.yaml
@@ -42,6 +42,7 @@ jobs:
            --extra langchain \
            --extra livekit \
            --extra piper \
+            --extra runner \
            --extra sagemaker \
            --extra tracing \
            --extra websocket
--- a/.github/workflows/format.yaml
+++ b/.github/workflows/format.yaml
@@ -32,7 +32,9 @@ jobs:
        run: uv python install 3.12

      - name: Install development dependencies
-        run: uv sync --group dev --extra daily --extra tracing
+        # `--all-extras` (matching the dev setup in README.md) so pyright can
+        # resolve types from various optional dependencies.
+        run: uv sync --group dev --all-extras --no-extra gstreamer --no-extra local

      - name: Ruff formatter
        id: ruff-format
--- a/.github/workflows/tests.yaml
+++ b/.github/workflows/tests.yaml
@@ -46,6 +46,7 @@ jobs:
            --extra langchain \
            --extra livekit \
            --extra piper \
+            --extra runner \
            --extra sagemaker \
            --extra tracing \
            --extra websocket
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,174 @@
+# AGENTS.md
+
+This file provides guidance to AI coding agents when working with code in this repository.
+
+## Project Overview
+
+Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
+
+## Common Commands
+
+```bash
+# Setup development environment
+uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
+
+# Install pre-commit hooks
+uv run pre-commit install
+
+# Run all tests
+uv run pytest
+
+# Run a single test file
+uv run pytest tests/test_name.py
+
+# Run a specific test
+uv run pytest tests/test_name.py::test_function_name
+
+# Preview changelog
+uv run towncrier build --draft --version Unreleased
+
+# Lint and format check
+uv run ruff check
+uv run ruff format --check
+
+# Update dependencies (after editing pyproject.toml)
+uv lock && uv sync
+```
+
+## Architecture
+
+### Frame-Based Pipeline Processing
+
+All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
+
+```
+[Processor1] → [Processor2] → ... → [ProcessorN]
+```
+
+**Key components:**
+
+- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
+
+- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
+
+- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
+
+- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
+
+- **Transports** (`src/pipecat/transports/`): Frame processors that form the external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
+
+- **Pipeline Task** (`src/pipecat/pipeline/task.py`): Runs and manages a pipeline. A pipeline task sends the first frame, `StartFrame`, into the pipeline so that processors know they can start processing and pushing frames. Internally, it wraps the user-defined pipeline with two additional processors: a source processor before it and a sink processor after it. These are used for error handling, pipeline-task-level events, heartbeat monitoring, etc.
+
+- **Pipeline Runner** (`src/pipecat/pipeline/runner.py`): High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
+
+- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
+
+- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
+
+- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
+
+- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
+
+### Important Patterns
+
+- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
+
+- **Turn Management**: Turn management is done through `LLMUserAggregator` and
+  `LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
+
+- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
+
+- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`), but other processors can trigger them as well, in which case the user turn start strategies don't need to trigger them. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
+
+- **Uninterruptible Frames**: Frames that are not removed from internal queues even when an interruption occurs, for example `EndFrame` and `StopFrame`.
+
+- **Events**: Most classes in Pipecat ultimately derive from `BaseObject`, which supports events. Event handlers run in the background in an async task (default) or synchronously (`sync=True`) when immediate action is needed. Synchronous event handlers must execute quickly.
+
+- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
+
+- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
+
+### Key Directories
+
+| Directory                  | Purpose                                            |
+| -------------------------- | -------------------------------------------------- |
+| `src/pipecat/frames/`      | Frame definitions (100+ types)                     |
+| `src/pipecat/processors/`  | FrameProcessor base + aggregators, filters, audio  |
+| `src/pipecat/pipeline/`    | Pipeline orchestration                             |
+| `src/pipecat/services/`    | AI service integrations (60+ providers)            |
+| `src/pipecat/transports/`  | Transport layer (Daily, LiveKit, WebSocket, Local) |
+| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols        |
+| `src/pipecat/observers/`   | Pipeline observers for monitoring frame flow       |
+| `src/pipecat/audio/`       | VAD, filters, mixers, turn detection, DTMF         |
+| `src/pipecat/turns/`       | User turn management                               |
+
+## Code Style
+
+- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
+- **Deprecations**: Use the `.. deprecated:: <version>` Sphinx directive in docstrings (never inline tags like `[DEPRECATED]`), and pair it with a runtime `warnings.warn(..., DeprecationWarning)` at the call site. See `CONTRIBUTING.md` for full conventions.
+- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
+- **Type hints**: Required for complex async code.
+- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
+  - `@dataclass`: Frame types, context aggregator pairs, internal data containers
+  - `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
+
+### Docstring Example
+
+```python
+class MyService(LLMService):
+    """Description of what the service does.
+
+    More detailed description.
+
+    Event handlers available:
+
+    - on_connected: Called when we are connected
+
+    Example::
+
+        @service.event_handler("on_connected")
+        async def on_connected(service, frame):
+            ...
+    """
+
+    def __init__(self, param1: str, **kwargs):
+        """Initialize the service.
+
+        Args:
+            param1: Description of param1.
+            **kwargs: Additional arguments passed to parent.
+        """
+        super().__init__(**kwargs)
+
+
+# Pydantic params class with a deprecated field
+class MyParams(BaseModel):
+    """Configuration parameters for MyService.
+
+    Parameters:
+        new_setting: Replacement for ``old_setting``.
+        old_setting: Legacy setting, no longer used.
+
+            .. deprecated:: 1.2.0
+                Use ``new_setting`` instead. Will be removed in 2.0.0.
+    """
+
+    new_setting: str = "default"
+    old_setting: str | None = None
+```
+
+## Service Implementation
+
+When adding a new service:
+
+1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
+2. Implement required abstract methods
+3. Handle necessary frames
+4. By default, push every frame onward in the direction it was traveling
+5. Push `ErrorFrame` on failures
+6. Add metrics tracking via `MetricsData` if relevant
+7. Follow the pattern of existing services in `src/pipecat/services/`
+
+## Testing
+
+Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
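
Note on the AGENTS.md hunk above: the Interruptions bullet says that a processor which stops an `InterruptionFrame` without pushing it must call `frame.complete()`, otherwise callers waiting on the frame stall. The following is a minimal stdlib-only sketch of why, assuming the frame simply wraps an `asyncio.Event`. `ToyInterruptionFrame`, `swallowing_processor`, and `push_and_wait` are hypothetical names used for illustration only; they are not Pipecat's classes or API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyInterruptionFrame:
    """Hypothetical stand-in for an interruption frame (not Pipecat's class)."""

    event: asyncio.Event = field(default_factory=asyncio.Event)

    def complete(self) -> None:
        # Release anyone waiting for the interruption to be handled.
        self.event.set()


async def swallowing_processor(frame: ToyInterruptionFrame, call_complete: bool) -> None:
    """Consume the frame without pushing it downstream."""
    if call_complete:
        frame.complete()
    # Otherwise the frame is dropped and its event is never set.


async def push_and_wait(call_complete: bool, timeout: float = 0.1) -> bool:
    """Return True if the waiter was released, False if it stalled until the timeout."""
    frame = ToyInterruptionFrame()
    await swallowing_processor(frame, call_complete)
    try:
        await asyncio.wait_for(frame.event.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


released = asyncio.run(push_and_wait(call_complete=True))
stalled = not asyncio.run(push_and_wait(call_complete=False))
```

In this toy model the waiter returns immediately when the swallowing processor calls `complete()`, and only gets out via the timeout when it does not. A real caller without a timeout would wait indefinitely.
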
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,515 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 <!-- towncrier release notes start -->

+## [1.2.1] - 2026-05-15
+
+### Changed
+
+- Changed the default WebSocket endpoints for `GradiumSTTService` and
+  `GradiumTTSService` to the region-neutral
+  `wss://api.gradium.ai/api/speech/asr` and
+  `wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
+  traffic to the nearest endpoint. Override the url to pin to a specific
+  region.
+  (PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))
+
+### Fixed
+
+- Fixed bot hangs when `filter_incomplete_user_turns` was enabled and the LLM
+  responded by calling a tool. The user turn never finalized, so the assistant
+  aggregator gated the tool-result context push and the LLM continuation never
+  ran. Tool calls now finalize the turn the moment they start, before the
+  function dispatches.
+  (PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))
+
+## [1.2.0] - 2026-05-14
+
+### Added
+
+- Added a `session_id` field to `RunnerArguments` so bots can log or trace a
+  per-session identifier in local development the same way they can in Pipecat
+  Cloud. The development runner now mints a UUID at every construction site,
+  and paths that already returned a `sessionId` to the caller (Daily `/start`,
+  dial-in webhook) share that same UUID with the runner args instead of
+  generating two. The SmallWebRTC `/api/offer` endpoint also accepts an
+  optional `session_id` query parameter so the `/sessions/{session_id}/...`
+  proxy can thread it through.
+  (PR [#4385](https://github.com/pipecat-ai/pipecat/pull/4385))
+
+- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService`
+  for controlling Cartesia's server-side text buffering. When unset, Pipecat
+  picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE`
+  mode (custom buffering — avoids stacking client-side aggregation on top of
+  Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode
+  (Cartesia's managed buffering applies). Pass an explicit value (0–5000ms) to
+  override.
+  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
+
+- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and
+  `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
+  Improvement Program. When set, the value is forwarded to Deepgram as a query
+  parameter on the speak request. Defaults to `None`, which preserves the
+  existing behavior. See https://dpgr.am/deepgram-mip for pricing implications
+  before enabling.
+  (PR [#4400](https://github.com/pipecat-ai/pipecat/pull/4400))
+
+- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set
+  via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that
+  appends a developer-role message to the context whenever `LLMSetToolsFrame`
+  changes the set of advertised standard tools. Helps the LLM stay coherent
+  across mid-conversation tool changes, mitigating several flavors of
+  tool-call-related hallucination: calling tools that have been removed,
+  avoiding tools that have been re-added, and hallucinating output (made-up
+  answers or tool-call-shaped non-tool-calls) when tools are unavailable.
+  (PR [#4404](https://github.com/pipecat-ai/pipecat/pull/4404))
+
+- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in
+  `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the
+  inference-triggered event and suppresses `on_user_turn_stopped`, leaving
+  finalization to another strategy in the chain such as
+  `LLMTurnCompletionUserTurnStopStrategy`.
+  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
+
+- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop` —
+  a generic stop strategy that finalizes the user turn whenever a
+  `UserTurnInferenceCompletedFrame` arrives, regardless of which component
+  produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base;
+  future producers (Flux, custom end-of-turn classifiers, etc.) can use the
+  base directly or subclass it to add producer-specific setup.
+  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
+
+- Added `on_user_turn_inference_triggered`, a new event on the user turn
+  controller, processor, aggregator and stop strategies that fires when a
+  strategy has enough signal to start LLM inference. By default it fires
+  together with `on_user_turn_stopped`; a gating strategy can fire only the
+  inference-triggered event and defer finalization to a peer.
+  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
+
+- Added `FilterIncompleteUserTurnStrategies` in
+  `pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization
+  that wraps the detector chain with `deferred(...)` and appends
+  `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case:
+  `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass
+  `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
+  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
+
+- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`.
+  When installed, the strategy gates `on_user_turn_stopped` on a
+  `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by
+  any component that can judge turn completeness — e.g. the
+  `UserTurnCompletionLLMServiceMixin` on `✓`). A `finalization_timeout`
+  provides a safety net if no completion frame ever arrives.
+  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
+
+- Added first-class RTVI support for the UI Agent Protocol:
+    - Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server
+  messages, plus `ui-command` and `ui-task` server-to-client messages, with
+  paired `*Data` / `*Message` pydantic models.
+    - Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`,
+  `Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching
+  default handlers live in `@pipecat-ai/client-react`.
+    - Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`,
+  and `ui-cancel-task` messages.
+    - Adds five UI pipeline frames, mirroring the `client-message`
+  frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` /
+  `RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` /
+  `UITaskMessage` envelopes, while the processor pushes inbound
+  `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame`
+  alongside `on_ui_message`.
+    - Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.
+  (PR [#4407](https://github.com/pipecat-ai/pipecat/pull/4407))
+
+- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore
+  processor now resolve credentials via the standard boto3 provider chain (EC2
+  instance profiles, EKS pod roles / IRSA, ECS task roles, SSO,
+  `~/.aws/credentials`) when explicit credentials and `AWS_*` environment
+  variables are absent. Services running with IAM roles no longer need to
+  export static credentials.
+  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))
+
+- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can
+  bias transcription for both file-based and realtime transcription.
+  (PR [#4426](https://github.com/pipecat-ai/pipecat/pull/4426))
+
+- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and
+  `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum
+  silence duration before the watchdog sends a silence packet to prevent
+  dangling turns. The actual threshold is `max(chunk_duration * 2,
+  watchdog_min_timeout)`, so it also adapts automatically to the audio chunk
+  size in use.
+  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))
+
+- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on
+  models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini
+  2.x); the conversation now continues while the tool runs. On models that
+  don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time
+  warning explaining the limitation. (Note: an intermittent 1008 error can
+  occasionally fire on Gemini 2.5 during long-running tool calls; we
+  auto-reconnect.)
+  (PR [#4448](https://github.com/pipecat-ai/pipecat/pull/4448))
+
+- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition
+  using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint.
+  Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is
+  VAD-aware, and automatically reconnects on error.
+  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))
+
+- Added NVIDIA Magpie TTS services via AWS SageMaker:
+  `NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM
+  back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream
+  with full interruption support via `InterruptibleTTSService`).
+  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))
+
+- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`,
+  for use with reasoning-capable Realtime models such as `gpt-realtime-2`.
+  (PR [#4470](https://github.com/pipecat-ai/pipecat/pull/4470))
+
+- Inworld TTS updates:
+    - Added `delivery_mode` setting (`STABLE`/`BALANCED`/`CREATIVE`) to
+  `InworldTTSService` and `InworldHttpTTSService`, enabling the
+  stability-vs-creativity tradeoff in `inworld-tts-2`.
+    - Added language support to `InworldTTSService` and
+  `InworldHttpTTSService`. The `language` setting is now forwarded to the API,
+  and a new `language_to_inworld_language()` helper normalizes Pipecat
+  `Language` enums to Inworld's BCP-47 locale tags.
+  (PR [#4473](https://github.com/pipecat-ai/pipecat/pull/4473))
+
+### Changed
+
+- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the
+  generally available `tts-rt-v1`.
+  (PR [#4386](https://github.com/pipecat-ai/pipecat/pull/4386))
+
+- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16`
+  to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the
+  `use_normalized_timestamps` and `max_buffer_delay_ms` fields.
+  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
+
+- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead
+  of the deprecated `use_original_timestamps` field. Word timestamps now
+  reflect what was actually spoken (post text-normalization and
+  pronunciation-dictionary substitution), matching the convention Pipecat uses
+  for ElevenLabs. This is a behavior change for `sonic-3` users, who were
+  previously receiving timestamps tied to the input transcript.
+  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
+
+- Broadened `tool_resources` to `app_resources` for easy access not just in
+  tool handlers but in other places like custom `FrameProcessor`s. Three
+  changes: a rename (`tool_resources` → `app_resources`), a new `app_resources`
+  property on `PipelineTask`, and a new `pipeline_task` property on
+  `FrameProcessor`. Tool handlers now read `params.app_resources`; custom
+  processors read `self.pipeline_task.app_resources`. The previous
+  `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and
+  `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit
+  `DeprecationWarning`s.
+  (PR [#4395](https://github.com/pipecat-ai/pipecat/pull/4395))
+
+- Lowered the per-message log in
+  `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App
+  messages can be high-frequency and were noisy at debug level; set the loguru
+  level to `TRACE` to see them again.
+  (PR [#4397](https://github.com/pipecat-ai/pipecat/pull/4397))
+
+- Changed the default model for `GrokRealtimeLLMService` to
+  `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The
+  previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is
+  being removed.
+  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))
+
+- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to
+  `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`,
+  `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing
+  users can pin the prior model explicitly via the `model`/`tts_model`
+  argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid
+  model IDs.
+  (PR [#4422](https://github.com/pipecat-ai/pipecat/pull/4422))
+
+- Changed the default model for `GrokLLMService` from `grok-3` to
+  `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
+  (PR [#4429](https://github.com/pipecat-ai/pipecat/pull/4429))
+
+- `DeepgramFluxSTT` watchdog silence threshold is now dynamic:
+  `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms.
+  This prevents false silence injections when large audio chunks are sent at
+  lower frequency.
+  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))
+
+- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the
+  turn is complete (on `on_turn_context_completed`) rather than waiting until
+  all audio has finished playing back. The `isFinal` message from ElevenLabs is
+  now used to signal `TTSStoppedFrame` and clean up the audio context,
+  improving turn transition timing.
+  (PR [#4433](https://github.com/pipecat-ai/pipecat/pull/4433))
+
+- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio
+  encoding by default, which returns audio bytes without headers.
+  (PR [#4446](https://github.com/pipecat-ai/pipecat/pull/4446))
+
+- Moved `create_task`, `cancel_task`, the `task_manager` property, and
+  `setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom
+  `BaseObject` subclasses (turn strategies, controllers, etc.) now inherit
+  these methods directly instead of reimplementing the task manager wiring.
+  Owners propagate the task manager to their child `BaseObject`s via `await
+  child.setup(task_manager)`.
+  (PR [#4449](https://github.com/pipecat-ai/pipecat/pull/4449))
+
+- Changed the default OpenAI Realtime input audio transcription model from
+  `gpt-4o-transcribe` to `gpt-realtime-whisper` for both
+  `OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does
+  not accept the `prompt` parameter; if a prompt is supplied alongside
+  `gpt-realtime-whisper`, it is dropped automatically and a warning is logged.
+  To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or
+  `"gpt-4o-mini-transcribe"`).
+  (PR [#4450](https://github.com/pipecat-ai/pipecat/pull/4450))
+
+- Updated the default model for `CartesiaTTSService` and
+  `CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.
+  (PR [#4462](https://github.com/pipecat-ai/pipecat/pull/4462))
+
+- Changed the default model for `OpenAIRealtimeLLMService` from
+  `gpt-realtime-1.5` to `gpt-realtime-2`.
+  (PR [#4472](https://github.com/pipecat-ai/pipecat/pull/4472))
+
+### Deprecated
+
+- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use
+  `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add
+  `LLMTurnCompletionUserTurnStopStrategy` to a custom
+  `user_turn_strategies.stop`) instead. Setting the legacy flag still works for
+  one release: the aggregator emits a `DeprecationWarning` and rewires the
+  strategies as if you had passed `FilterIncompleteUserTurnStrategies`
+  directly.
+  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
+
+- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the
+  `create_file_resampler()` / `create_stream_resampler()` factories).
+  Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class
+  will be removed in Pipecat 2.0 along with the default `resampy` and `numba`
+  dependencies.
+  (PR [#4428](https://github.com/pipecat-ai/pipecat/pull/4428))
+
+### Fixed
+
+- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as
+  `ErrorFrame`s. The latest API emits a `flush_done` per transcript when
+  server-side buffering is disabled; Pipecat now consumes them silently since
+  each turn already has its own `context_id`.
+  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
+
+- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`,
+  `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance
+  (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both
+  the class and an instance.
+  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
+
+- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200
+  response — one with the API's error text and a second, less informative
+  "Unknown error" frame from the outer exception handler. It now pushes a
+  single frame that includes the HTTP status code and returns cleanly.
+  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
+
+- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally
+  for user turn stop strategies. It is now only imported when
+  `default_user_turn_stop_strategies()` is called. This improves startup time
+  and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning
+  when the default stop strategies are not used.
+  (PR [#4393](https://github.com/pipecat-ai/pipecat/pull/4393))
+
+- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was
+  stored in `Settings` but never sent to xAI, so every session silently fell
+  back to xAI's server-side default. The model is now passed via the `?model=`
+  query parameter on the WebSocket URL as xAI's Voice Agent API requires.
+  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))
+
+- Fixed `on_user_turn_stopped` firing prematurely when
+  `filter_incomplete_user_turns` was enabled. The event now fires only after
+  the LLM confirms the user turn is complete (`✓`); previously the smart-turn
+  detector's tentative stop was bubbling up before the LLM had a chance to veto
+  it, causing observers, transcript appenders and UI indicators to receive an
+  early — and sometimes duplicated — signal.
+  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
+
+- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting
+  across two assistant messages in the LLM context and not surfacing in
+  `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted
+  at the end of a TTS context now carries a PTS just past the last word so it
+  can't overtake clock-queued `TTSTextFrame`s in the transport's output, and
+  `LLMAssistantAggregator` now triggers
+  `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the
+  frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting
+  transcripts).
+  (PR [#4414](https://github.com/pipecat-ai/pipecat/pull/4414))
+
+- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged
+  words (e.g. `bookLook`) when using Flash models. Flash often splits sentences
+  mid-stream into alignment chunks that begin with a real inter-word space, but
+  the previous fix unconditionally stripped that space from every chunk.
+  Leading spaces are now stripped only on the first alignment chunk of an
+  utterance, so subsequent chunks correctly flush partial words across
+  boundaries.
+  (PR [#4415](https://github.com/pipecat-ai/pipecat/pull/4415))
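
The first-chunk-only stripping rule above can be illustrated with a simplified sketch. It treats each alignment chunk as a plain string, whereas the real services work with character-level alignment data; `join_alignment_chunks` is a hypothetical name.

```python
from typing import List


def join_alignment_chunks(chunks: List[str]) -> str:
    """Join alignment chunks, stripping leading spaces only on the first one."""
    parts = []
    for index, chunk in enumerate(chunks):
        if index == 0:
            # The utterance should not start with a space.
            parts.append(chunk.lstrip(" "))
        else:
            # Later chunks keep their leading space: it separates real words.
            parts.append(chunk)
    return "".join(parts)


print(join_alignment_chunks([" Read the book", " Look at this"]))
```

Stripping the leading space from every chunk instead would yield `Read the bookLook at this`, which is the merged-word symptom the entry describes.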
+
+- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor
+  erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
+  was set in the environment. The half-populated kwargs are no longer forwarded
+  to aioboto3; partial env-var configurations now fall through to the boto3
+  credential chain like fully-unset configurations do.
+  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))
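
One way to express the "forward both keys or neither" rule described above is sketched below. The helper `aws_credential_kwargs` is hypothetical; the keyword names match the `aws_access_key_id` / `aws_secret_access_key` arguments that boto3-style sessions accept.

```python
import os
from typing import Dict, Mapping, Optional


def aws_credential_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return explicit credential kwargs only when both halves are present."""
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        return {
            "aws_access_key_id": access_key,
            "aws_secret_access_key": secret_key,
        }
    # Partial or missing configuration: pass nothing and let the default
    # credential chain (profiles, instance roles, and so on) decide.
    return {}
```

Returning an empty dict for the half-configured case makes it behave the same as the fully-unset case.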
+
+- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing
+  romanized/normalized text to the LLM context. With non-Latin input (e.g.,
+  Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao
+  !` instead of `你好！`), which then degraded subsequent LLM turns. The services
+  now consume `alignment` by default and only switch to `normalizedAlignment` /
+  `normalized_alignment` when `pronunciation_dictionary_locators` is configured
+  (where `alignment` has overlapping restarts that produce duplicated/garbled
+  words, per #4316). Both fields are read with preferred-with-fallback
+  semantics since each is nullable per the API schema.
+  (PR [#4424](https://github.com/pipecat-ai/pipecat/pull/4424))
+
+- Fixed a deadlock in `TTSService` that could permanently stall pipeline
+  processing when all three conditions occurred together:
+  `pause_frame_processing=True`, an interruption arrived before any TTS audio
+  was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`,
+  `FunctionCallResultFrame`) was in the processing queue at that moment. The
+  process task would block on `__process_event.wait()` indefinitely because
+  `BotStoppedSpeakingFrame` never arrived (no audio was played) and the
+  interruption handler did not resume processing. Affects services using
+  `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and
+  ResembleAI.
+  (PR [#4431](https://github.com/pipecat-ai/pipecat/pull/4431))
+
+- Fixed interruptions being delayed when a slow interruptible frame was being
+  processed while an uninterruptible frame was waiting in the queue. The bot
+  would stall until the slow frame finished instead of cancelling it
+  immediately on interruption.
+  (PR [#4434](https://github.com/pipecat-ai/pipecat/pull/4434))
+
+- Fixed `TTSService` dropping uninterruptible frames (e.g.
+  `FunctionCallResultFrame`) from its internal serialization queue when an
+  interruption occurs. Previously, the queue was recreated on every
+  interruption, silently discarding any queued frames. The queue is now reset
+  instead of recreated, preserving uninterruptible frames so they are always
+  delivered downstream.
+  (PR [#4435](https://github.com/pipecat-ai/pipecat/pull/4435))
+
+- Fixed a race condition in the Daily transport that caused `AttributeError:
+  'NoneType' object has no attribute 'send_app_message'` when tearing down a
+  pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the
+  same `DailyTransportClient` and both call `cleanup()`, which was releasing
+  the underlying `CallClient` on the first call — leaving the second caller
+  with a `None` client.
+  (PR [#4440](https://github.com/pipecat-ai/pipecat/pull/4440))
+
+- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService`
+  and `OpenAIRealtimeLLMService`. These services previously honored the flag by
+  simply not cancelling in-flight function calls on interruption; the
+  introduction of the new async-tool mechanism (which threads
+  started/intermediate/final messages through the LLM context) broke that path
+  because the realtime services didn't know how to interpret those messages.
+  Note that new-style streamed intermediate results
+  (`FunctionCallResultProperties(is_final=False)`) are not supported on these
+  realtime services. Similar fixes for other impacted realtime services are
+  forthcoming.
+  (PR [#4441](https://github.com/pipecat-ai/pipecat/pull/4441))
+
+- Fixed two misspelled Gemini TTS voice names in
+  `GeminiTTSService.AVAILABLE_VOICES`.
+  (PR [#4443](https://github.com/pipecat-ai/pipecat/pull/4443))
+
+- Extended the `cancel_on_interruption=False` regression fix to
+  `GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and
+  `UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in
+  #4441 (each service detects async-tool messages in the LLM context and routes
+  the final result to its formal tool-result channel; Azure inherits
+  transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different
+  approach because its API freezes the conversation between
+  `client_tool_invocation` and the matching `client_tool_result` — for
+  async-registered functions it now ships a placeholder `client_tool_result`
+  immediately when the function is invoked (to unfreeze the conversation), then
+  injects the real result as user-side text once the tool finishes. Streamed
+  intermediate results (`FunctionCallResultProperties(is_final=False)`) are
+  still not supported on any of these realtime services. `GeminiLiveLLMService`
+  and `InworldRealtimeLLMService` are excluded for now: Gemini Live's
+  async-tool path needs deeper investigation, and Inworld tool calling needs to
+  be sorted out first.
+  (PR [#4447](https://github.com/pipecat-ai/pipecat/pull/4447))
+
+- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses
+  (observed with `gpt-realtime-2`). A single response can now contain more than
+  one audio item, and the first item's `audio.done` may arrive after the second
+  item's deltas have started. Deltas still arrive strictly in playback order,
+  so we continue to forward them as received (matching OpenAI's reference
+  implementation). The fix removes spurious warnings, ensures truncation always
+  targets the latest audio item, and emits a single bracketing
+  `TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the Stopped is
+  now pushed on `response.done`).
+  (PR [#4465](https://github.com/pipecat-ai/pipecat/pull/4465))
+
+- Fixed missing `output` attribute on LLM OpenTelemetry spans when the LLM call
+  is interrupted mid-stream.
+  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
+
+- Fixed incorrect `metrics.ttfb` on STT OpenTelemetry spans, and parented them
+  to the current turn span.
+  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
+
+- Fixed incorrect `metrics.ttfb` on TTS OpenTelemetry spans for streaming
+  services.
+  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
+
+- Extended the `cancel_on_interruption=False` regression fix to
+  `InworldRealtimeLLMService`. Uses the same approach as in #4441 (the service
+  detects async-tool messages in the LLM context and routes the final result to
+  its formal tool-result channel). Note: as of this writing, Inworld Realtime
+  doesn't appear to handle the resulting delayed tool result reliably — the
+  routing is best-effort and the service surfaces a one-time warning when
+  async-tool messages are seen. Streamed intermediate results
+  (`FunctionCallResultProperties(is_final=False)`) are still not supported on
+  this realtime service. (Inworld was excluded from #4447 pending resolution of
+  an unrelated tool-calling issue, which turned out to be an account-level
+  matter.)
+  (PR [#4474](https://github.com/pipecat-ai/pipecat/pull/4474))
+
+- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules,
+  preserving word boundaries and per-word timestamp alignment during downstream
+  aggregation.
+  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
+
+- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve
+  provider text spacing, avoiding artificial spaces when timestamp groups are
+  reassembled downstream.
+  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
+
+- Fixed `SonioxSTTService` final transcription frames missing detected language
+  metadata when Soniox returns token-level language annotations.
+  (PR [#4482](https://github.com/pipecat-ai/pipecat/pull/4482))
+
+- Fixed Soniox final transcription language detection to use the most common
+  recognized token language, avoiding mislabeling an utterance when the last
+  token is tagged with a different language.
+  (PR [#4495](https://github.com/pipecat-ai/pipecat/pull/4495))
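
Picking the most common token language, rather than the last token's language, can be sketched with `collections.Counter`. The function name `dominant_language` and the plain list of language codes are assumptions made for illustration; the actual Soniox token structure is not shown in this changelog.

```python
from collections import Counter
from typing import Optional, Sequence


def dominant_language(token_languages: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent language tag, ignoring untagged tokens."""
    counts = Counter(lang for lang in token_languages if lang)
    if not counts:
        return None
    # most_common(1) yields a one-element list of (language, count).
    return counts.most_common(1)[0][0]


# The final token is tagged "es", but the utterance is mostly "en".
print(dominant_language(["en", "en", "en", "es"]))
```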
+
+- Fixed dropped audio in streaming TTS services whose wire protocol doesn't
+  echo `context_id` back on incoming audio (Sarvam, Smallest, Soniox, Inworld,
+  and others). Previously, audio that arrived between contexts or at the very
+  start of a turn was tagged with `context_id=None` and silently dropped with
+  an "unable to append audio to context: no context ID provided" debug log.
+  `TTSService.get_active_audio_context_id()` now falls back to the
+  synthesis-side `_turn_context_id` when the playback cursor isn't set yet.
+  (PR [#4497](https://github.com/pipecat-ai/pipecat/pull/4497))
+
+### Security
+
+- Fixed a path traversal issue in the development runner's
+  `/files/{filename:path}` download endpoint. Previously, when the runner was
+  started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could
+  escape the configured folder because `%2F`-encoded separators bypassed
+  Starlette's path normalisation. The endpoint now resolves the joined path and
+  rejects any filename that escapes the allowed base with a 403, and also
+  returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
+  (PR [#4417](https://github.com/pipecat-ai/pipecat/pull/4417))
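
A minimal sketch of the resolve-then-check pattern described above, using only `pathlib`. The function name `resolve_download` is hypothetical, and the mapping of a `None` result to a 403 response is left to the caller; this is not the runner's actual code.

```python
from pathlib import Path
from typing import Optional


def resolve_download(base_dir: str, filename: str) -> Optional[Path]:
    """Return the resolved path if it stays inside base_dir, else None."""
    base = Path(base_dir).resolve()
    # Resolving after the join collapses any ".." segments, including ones
    # that arrived percent-encoded and were decoded by the web framework.
    candidate = (base / filename).resolve()
    if base not in candidate.parents:
        return None  # escapes the allowed folder; caller would answer 403
    return candidate


print(resolve_download("/srv/files", "../../etc/passwd"))
```

Checking the resolved path against the resolved base, instead of inspecting the raw string for `..`, also covers absolute filenames such as `/etc/passwd`.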
+
 ## [1.1.0] - 2026-04-27

 ### Added
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,157 +1 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Project Overview
-
-Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
-
-## Common Commands
-
-```bash
-# Setup development environment
-uv sync --group dev --all-extras --no-extra gstreamer
-
-# Install pre-commit hooks
-uv run pre-commit install
-
-# Run all tests
-uv run pytest
-
-# Run a single test file
-uv run pytest tests/test_name.py
-
-# Run a specific test
-uv run pytest tests/test_name.py::test_function_name
-
-# Preview changelog
-uv run towncrier build --draft --version Unreleased
-
-# Lint and format check
-uv run ruff check
-uv run ruff format --check
-
-# Update dependencies (after editing pyproject.toml)
-uv lock && uv sync
-```
-
-## Architecture
-
-### Frame-Based Pipeline Processing
-
-All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
-
-```
-[Processor1] → [Processor2] → ... → [ProcessorN]
-```
-
-**Key components:**
-
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
-
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
-
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
-
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
-
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
-
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
-
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
-
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
-
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
-
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
-
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
-
-### Important Patterns
-
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
-
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
-  `LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
-
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
-
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
-
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
-
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
-
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
-
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
-
-### Key Directories
-
-| Directory                  | Purpose                                            |
-| -------------------------- | -------------------------------------------------- |
-| `src/pipecat/frames/`      | Frame definitions (100+ types)                     |
-| `src/pipecat/processors/`  | FrameProcessor base + aggregators, filters, audio  |
-| `src/pipecat/pipeline/`    | Pipeline orchestration                             |
-| `src/pipecat/services/`    | AI service integrations (60+ providers)            |
-| `src/pipecat/transports/`  | Transport layer (Daily, LiveKit, WebSocket, Local) |
-| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols        |
-| `src/pipecat/observers/`   | Pipeline observers for monitoring frame flow       |
-| `src/pipecat/audio/`       | VAD, filters, mixers, turn detection, DTMF         |
-| `src/pipecat/turns/`       | User turn management                               |
-
-## Code Style
-
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
-  - `@dataclass`: Frame types, context aggregator pairs, internal data containers
-  - `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
-
-### Docstring Example
-
-```python
-class MyService(LLMService):
-    """Description of what the service does.
-
-    More detailed description.
-
-    Event handlers available:
-
-    - on_connected: Called when we are connected
-
-    Example::
-
-        @service.event_handler("on_connected")
-        async def on_connected(service, frame):
-            ...
-    """
-
-    def __init__(self, param1: str, **kwargs):
-        """Initialize the service.
-
-        Args:
-            param1: Description of param1.
-            **kwargs: Additional arguments passed to parent.
-        """
-        super().__init__(**kwargs)
-```
-
-## Service Implementation
-
-When adding a new service:
-
-1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
-2. Implement required abstract methods
-3. Handle necessary frames
-4. By default, all frames should be pushed in the direction they came
-5. Push `ErrorFrame` on failures
-6. Add metrics tracking via `MetricsData` if relevant
-7. Follow the pattern of existing services in `src/pipecat/services/`
-
-## Testing
-
-Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
+@AGENTS.md
--- a/README.md
+++ b/README.md
@@ -89,20 +89,20 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout

 ## 🧩 Available services

-| Category            | Services |
-| ------------------- | -------- |
-| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA Riva](https://docs.pipecat.ai/api-reference/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                |
-| LLMs                | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together)                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                |
-| Text-to-Speech      | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/api-reference/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), 
[Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
-| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                |
-| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                |
-| Serializers         | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                |
-| Video               | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                |
-| Memory              | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                |
-| Vision & Image      | [fal](https://docs.pipecat.ai/api-reference/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/api-reference/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/api-reference/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                |
-| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/api-reference/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/api-reference/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/api-reference/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/api-reference/server/utilities/audio/rnnoise-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                |
-| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/api-reference/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/api-reference/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                |
-| Community           | [Browse community integrations →](https://docs.pipecat.ai/api-reference/server/services/community-integrations)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                |
+| Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                             |
+| ------------------- | -------- |
+| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                             |
+| LLMs                | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together)                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                             |
+| Text-to-Speech      | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
+| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                             |
+| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
+| Serializers         | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
+| Video               | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
+| Memory              | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |
+| Vision & Image      | [fal](https://docs.pipecat.ai/api-reference/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/api-reference/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/api-reference/server/services/vision/moondream) |
+| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/api-reference/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/api-reference/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/api-reference/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/api-reference/server/utilities/audio/rnnoise-filter) |
+| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/api-reference/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/api-reference/server/services/analytics/sentry) |
+| Community           | [Browse community integrations →](https://docs.pipecat.ai/api-reference/server/services/community-integrations) |

 📚 [View full services documentation →](https://docs.pipecat.ai/api-reference/server/services/supported-services)

--- a/changelog/+inworld-manual-mode.fixed.md
+++ b/changelog/+inworld-manual-mode.fixed.md
@@ -0,0 +1 @@
+- Fixed `InworldRealtimeLLMService` not supporting manual-mode turn detection (`session_properties.audio.input.turn_detection=None`). Previously `_handle_user_stopped_speaking` and `_handle_interruption` assumed Inworld's server-side VAD handled commit/cancel/response.create automatically and were no-ops on the client side. In manual mode the server does none of this, so local-VAD-driven turns stalled: the bot never responded after the user stopped speaking, and interruptions didn't cancel the in-flight response. The service now sends an explicit `InputAudioBufferCommitEvent` + `ResponseCreateEvent` on user-stopped-speaking and `InputAudioBufferClearEvent` + `ResponseCancelEvent` on interruption, gated on a new `_is_manual_turn_detection()` check (mirroring the pattern in `OpenAIRealtimeLLMService`).
--- a/changelog/+nova-sonic-server-interruption.fixed.md
+++ b/changelog/+nova-sonic-server-interruption.fixed.md
@@ -0,0 +1 @@
+- Fixed AWS Nova Sonic not surfacing server-side interruption. When the user interrupted the bot mid-response, the `INTERRUPTED` stop reason was acknowledged internally but no `InterruptionFrame` was emitted, so `BaseOutputTransport` kept draining its audio buffer and the bot kept talking past the interruption. Nova Sonic now broadcasts `InterruptionFrame` on both `INTERRUPTED` paths (text-stage and audio-stage). This was previously masked by enabling local VAD on the user aggregator, which generated `UserStartedSpeakingFrame` and triggered the aggregator-side interruption path; the fix makes the behavior correct without local VAD as a workaround.
--- a/changelog/+realtime-examples-migrated.changed.md
+++ b/changelog/+realtime-examples-migrated.changed.md
@@ -0,0 +1 @@
+- Migrated all realtime LLM service examples (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, Gemini Live Vertex, AWS Nova Sonic, Ultravox) — base examples, `persistent-context-*`, `update-settings/llm/*`, and the Gemini Live MCP example — to use `LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())`. Where examples previously wired `SileroVADAnalyzer` into `LLMUserAggregatorParams` as a workaround for missing turn frames, the local VAD has been removed; the realtime service mode + the Phase 1.5 interruption fixes for Nova Sonic and Ultravox make this safe. Transcript-logging event handlers have moved from `on_user_turn_stopped` / `on_assistant_turn_stopped` to the new `on_user_message_added` / `on_assistant_message_added` events, which carry the finalized message text. Examples for services without server-side user-turn frames (Gemini Live, AWS Nova Sonic, Ultravox) include a Tier 1 comment block explaining what doesn't activate without those frames and how to add local VAD if needed; the corresponding service docstrings have the same warning.
--- a/changelog/+realtime-grok-locally-driven-turns-example.added.md
+++ b/changelog/+realtime-grok-locally-driven-turns-example.added.md
@@ -0,0 +1 @@
+- Added `examples/realtime/realtime-grok-locally-driven-turns.py`, a variant of the base Grok Realtime example that disables Grok's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime locally-driven-turns variant. Server-emitted turn frames are preferred when available.
--- a/changelog/+realtime-inworld-locally-driven-turns-example.added.md
+++ b/changelog/+realtime-inworld-locally-driven-turns-example.added.md
@@ -0,0 +1 @@
+- Added `examples/realtime/realtime-inworld-locally-driven-turns.py`, a variant of the base Inworld Realtime example that disables Inworld's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime and Grok Realtime locally-driven-turns variants. Server-emitted turn frames are preferred when available.
--- a/changelog/+realtime-no-user-turn-frames-log.added.md
+++ b/changelog/+realtime-no-user-turn-frames-log.added.md
@@ -0,0 +1 @@
+- Added a startup INFO log on realtime LLM services that don't emit `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` (Gemini Live, AWS Nova Sonic, Ultravox). The log spells out which downstream processors depend on those frames (RTVI client speech events, `TurnTrackingObserver`, `AudioBufferProcessor` turn recording, `UserIdleController`, user mute strategies, voicemail detector) and how to opt into local VAD when needed.
--- a/changelog/+realtime-openai-locally-driven-turns-example.added.md
+++ b/changelog/+realtime-openai-locally-driven-turns-example.added.md
@@ -0,0 +1 @@
+- Added `examples/realtime/realtime-openai-locally-driven-turns.py`, a variant of the base OpenAI Realtime example that disables OpenAI's server-side turn detection (`turn_detection=False`) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Use this variant if you need a turn analyzer like `LocalSmartTurnV3` to decide when the user is done speaking, or if you need `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` to fire from the same source as `InterruptionFrame`. Server-emitted turn frames are preferred when available.
--- a/changelog/+realtime-service-metadata-frame.added.md
+++ b/changelog/+realtime-service-metadata-frame.added.md
@@ -0,0 +1 @@
+- Added `RealtimeServiceMetadataFrame`, broadcast at pipeline start by realtime LLM services (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox). The context aggregator pair listens for it and, when `realtime_service_mode` isn't configured, logs a one-time INFO recommendation pointing users at the option and the `on_user_turn_stopped` timing change it implies.
--- a/changelog/+realtime-service-mode-config.added.md
+++ b/changelog/+realtime-service-mode-config.added.md
@@ -0,0 +1 @@
+- Added `RealtimeServiceModeConfig` and a new `realtime_service_mode` kwarg on `LLMContextAggregatorPair`, opting the pair into realtime (speech-to-speech) LLM behavior. When set, user messages are written to context when the assistant response starts rather than on user-turn-end frames — so context stays correct even when the realtime service emits no turn frames at all — and, by default, turn-end strategies stop waiting for transcripts before signalling end-of-turn, keeping transcript latency off the critical path in local-VAD-driven realtime pipelines. Both behaviors are individually controllable via the `context_writes_await_turns` and `turns_await_transcripts` fields. Cascade (non-realtime) behavior is unchanged when the kwarg is omitted.
--- a/changelog/+realtime-service-mode-events.added.md
+++ b/changelog/+realtime-service-mode-events.added.md
@@ -0,0 +1 @@
+- Added `on_user_message_added` and `on_assistant_message_added` event handlers on `LLMUserAggregator` and `LLMAssistantAggregator`. Each fires when its respective message is flushed to context and carries the finalized content. In cascade mode they coincide with `on_user_turn_stopped` / `on_assistant_turn_stopped`; in realtime mode (where turn-stop fires before the message is finalized) they're the canonical way to subscribe to "context just updated, here's the text."
--- a/changelog/+ultravox-server-interruption.fixed.md
+++ b/changelog/+ultravox-server-interruption.fixed.md
@@ -0,0 +1 @@
+- Fixed Ultravox Realtime not surfacing server-side interruption. The server sends a `playback_clear_buffer` message when the user interrupts the bot mid-speech, instructing clients to drop buffered output audio; this was previously unhandled, so `BaseOutputTransport` kept playing the buffered audio and the bot kept talking past the interruption. Ultravox now broadcasts `InterruptionFrame` on `playback_clear_buffer`. This was previously masked by enabling local VAD on the user aggregator, which generated `UserStartedSpeakingFrame` and triggered the aggregator-side interruption path; the fix makes the behavior correct without local VAD as a workaround.
--- a/changelog/+user-turn-stopped-message-content-optional.changed.md
+++ b/changelog/+user-turn-stopped-message-content-optional.changed.md
@@ -0,0 +1 @@
+- `UserTurnStoppedMessage.content` is now typed `str | None`. In realtime mode (`RealtimeServiceModeConfig(context_writes_await_turns=False)`) the user message isn't finalized at turn-stop time, so `content` is `None`; subscribers wanting the finalized text should use the new `on_user_message_added` event. Cascade behavior is unchanged.
--- a/changelog/+wait-for-transcript-stop-strategies.changed.md
+++ b/changelog/+wait-for-transcript-stop-strategies.changed.md
@@ -0,0 +1 @@
+- `SpeechTimeoutUserTurnStopStrategy` and `TurnAnalyzerUserTurnStopStrategy` now accept a `wait_for_transcript: bool = True` kwarg. When set to `False`, the strategy signals end-of-turn as soon as VAD / the turn analyzer reports end-of-speech rather than waiting for a transcript — useful when local turn detection is the intended driver of a realtime conversation. `LLMContextAggregatorPair` flips this for you when `realtime_service_mode` is configured with the default `turns_await_transcripts=False`.
--- a/changelog/4052.added.md
+++ b/changelog/4052.added.md
@@ -0,0 +1 @@
+- Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.
--- a/changelog/4306.fixed.md
+++ b/changelog/4306.fixed.md
@@ -0,0 +1 @@
+- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
--- a/changelog/4380.fixed.2.md
+++ b/changelog/4380.fixed.2.md
@@ -0,0 +1 @@
+- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
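A common way to get insertion-order behavior for equal timestamps is to pair each PTS with a monotonically increasing sequence number, so ties fall back to arrival order. The sketch below uses only the standard library; `StablePtsQueue` and its methods are hypothetical names for illustration and are not taken from `BaseOutputTransport`, whose actual implementation may differ.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class StablePtsQueue:
    """Hypothetical priority queue ordered by PTS, ties broken by insertion order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        # The counter hands out 0, 1, 2, ... in the order frames are queued.
        self._counter = itertools.count()

    def put(self, pts: int, frame: Any) -> None:
        # The sequence number sits between the PTS and the frame, so two
        # entries with the same PTS compare on arrival order and the frame
        # objects themselves are never compared.
        heapq.heappush(self._heap, (pts, next(self._counter), frame))

    def get(self) -> Any:
        _pts, _seq, frame = heapq.heappop(self._heap)
        return frame

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: StablePtsQueue) -> List[Any]:
    """Pop every frame in presentation order."""
    frames = []
    while len(queue):
        frames.append(queue.get())
    return frames
```

Without the sequence number, a plain `(pts, frame)` heap would either compare the frames themselves on a tie or return them in an unspecified order.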
--- a/changelog/4380.fixed.3.md
+++ b/changelog/4380.fixed.3.md
@@ -0,0 +1 @@
+- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
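As a rough illustration of the tag-stripping step, the following sketch removes angle-bracketed tags with a regular expression before the text is split into words. The `strip_tags` helper and its pattern are assumptions made for this example; they are not the code used in the Cartesia service, which may handle tags differently.

```python
import re

# Hypothetical pattern: an opening "<", an optional "/", a tag name that
# starts with a letter, then anything up to the closing ">". Requiring a
# letter keeps plain comparisons such as "1 < 2" untouched.
_TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def strip_tags(text: str) -> str:
    """Return text with SSML-style tags removed and tag contents kept."""
    return _TAG_RE.sub("", text)


def words(text: str) -> list:
    """Split tag-free text into whitespace-separated words."""
    return strip_tags(text).split()
```

For instance, `words('<spell>ABC</spell> is <break time="1s"/>ready')` yields the three words `ABC`, `is`, and `ready`, with no tag text attached to any of them.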
--- a/changelog/4380.fixed.4.md
+++ b/changelog/4380.fixed.4.md
@@ -0,0 +1 @@
+- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.
--- a/changelog/4380.fixed.md
+++ b/changelog/4380.fixed.md
@@ -0,0 +1 @@
+- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
--- a/changelog/4442.added.2.md
+++ b/changelog/4442.added.2.md
@@ -0,0 +1 @@
+- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).
--- a/changelog/4442.added.md
+++ b/changelog/4442.added.md
@@ -0,0 +1 @@
+- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.
--- a/changelog/4442.changed.md
+++ b/changelog/4442.changed.md
@@ -0,0 +1 @@
+- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.
--- a/changelog/4507.fixed.md
+++ b/changelog/4507.fixed.md
@@ -0,0 +1 @@
+- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.
--- a/changelog/4522.changed.md
+++ b/changelog/4522.changed.md
@@ -0,0 +1 @@
+- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and xAI STT so turn stop timing uses measured values instead of the conservative fallback.
--- a/changelog/4524.changed.md
+++ b/changelog/4524.changed.md
@@ -0,0 +1 @@
+- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.
--- a/changelog/4524.fixed.md
+++ b/changelog/4524.fixed.md
@@ -0,0 +1 @@
+- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.
--- a/changelog/4527.fixed.md
+++ b/changelog/4527.fixed.md
@@ -0,0 +1 @@
+- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.
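The ordering rule described here can be pictured as a small gate that records which contexts have already had their init message sent and lets the keepalive target only those. This is a minimal sketch under that assumption; `KeepaliveGate` and its method names are hypothetical and do not come from `ElevenLabsTTSService`.

```python
from typing import Optional, Set


class KeepaliveGate:
    """Hypothetical tracker of contexts whose init message was already sent."""

    def __init__(self) -> None:
        self._initialized: Set[str] = set()

    def mark_initialized(self, context_id: str) -> None:
        # Call this right after the context's first (init) message is sent.
        self._initialized.add(context_id)

    def forget(self, context_id: str) -> None:
        # Call this when the context is closed or the connection resets.
        self._initialized.discard(context_id)

    def keepalive_target(self, current_context_id: Optional[str]) -> Optional[str]:
        # Return the context a keepalive may target, or None when the current
        # context has not been initialized yet (or there is no context).
        if current_context_id in self._initialized:
            return current_context_id
        return None
```

A keepalive loop built on this would skip the tick whenever `keepalive_target` returns `None`, so the init message is always the first thing the server sees for a context.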
--- a/changelog/_template.md.j2
+++ b/changelog/_template.md.j2
@@ -5,7 +5,7 @@

 {% for text, values in sections[section][category].items() %}
 {{ text }}
-(PR {{ values|join(', ') }})
+  (PR {{ values|join(', ') }})

 {% endfor %}
 {% endfor %}
--- a/env.example
+++ b/env.example
@@ -132,6 +132,10 @@ NOVITA_API_KEY=...

 # NVIDIA
 NVIDIA_API_KEY=...
+# For a full example of how to deploy to SageMaker, see:
+# https://github.com/pipecat-ai/pipecat-examples/tree/main/nvidia_sagemaker_example/deployment/aws-sagemaker-nvidia
+SAGEMAKER_ASR_ENDPOINT_NAME=...
+SAGEMAKER_MAGPIE_ENDPOINT_NAME=...

 # OpenAI
 OPENAI_API_KEY=...
@@ -207,6 +211,11 @@ TWILIO_AUTH_TOKEN=...
 # Ultravox Realtime
 ULTRAVOX_API_KEY=...

+# Vonage
+VONAGE_APPLICATION_ID=...
+VONAGE_SESSION_ID=...
+VONAGE_TOKEN=...
+
 # WhatsApp
 WHATSAPP_TOKEN=...
 WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
--- a/examples/features/features-add-tool-change-messages.py
+++ b/examples/features/features-add-tool-change-messages.py
@@ -0,0 +1,232 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Manual validation harness for the ``add_tool_change_messages`` feature.
+
+When tools change mid-conversation, LLMs can produce a few different
+flavors of tool-call-related hallucination:
+
+- **Forward hallucination** — calling a tool that has been removed.
+- **Negative hallucination** — refusing to call a tool that has been
+  re-added (because recent context is full of "I can't" responses).
+- **Hallucinated output when tools are unavailable** — making up an
+  answer rather than declining gracefully, or producing JSON that
+  *looks* like a tool call but is actually just an assistant text
+  response.
+
+The ``add_tool_change_messages`` feature mitigates these by appending a
+developer-role message to the conversation whenever ``LLMSetToolsFrame``
+changes the set of advertised tools, so the LLM stays in sync with what's
+actually available.
+
+This harness exercises all of those flavors by flipping the advertised
+tool set on a turn counter:
+
+    Phase 0 (turns 1–4):   weather tool ACTIVE — confirm baseline.
+    Phase 1 (turns 5–8):   tool REMOVED — keep asking for weather.
+    Phase 2 (turn 9+):     tool RE-ADDED — does the LLM call it again?
+
+Set ``ADD_TOOL_CHANGE_MESSAGES=0`` to disable the mitigation and see the
+unmitigated behavior. The default is ON so a fresh run shows the feature
+working.
+
+Defaults to Llama 3.1 8B Instruct via a locally running Ollama instance,
+anecdotally one of the more hallucination-prone of the easily accessible
+models. Pull the model once with ``ollama pull llama3.1:8b`` and make
+sure ``ollama serve`` is running. Swap the LLM service to validate other
+providers.
+
+Run with::
+
+    uv run examples/features/features-add-tool-change-messages.py
+    ADD_TOOL_CHANGE_MESSAGES=0 uv run examples/features/features-add-tool-change-messages.py
+"""
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import NOT_GIVEN, LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.ollama.llm import OLLamaLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+# Default ON so a fresh run shows the feature working. Set to "0" to A/B
+# against the unmitigated behavior.
+ADD_TOOL_CHANGE_MESSAGES = os.environ.get("ADD_TOOL_CHANGE_MESSAGES", "1") == "1"
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    await params.result_callback({"conditions": "nice", "temperature": "75"})
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+weather_tools = ToolsSchema(standard_tools=[weather_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
+    "twilio": lambda: FastAPIWebsocketParams(audio_in_enabled=True, audio_out_enabled=True),
+    "webrtc": lambda: TransportParams(audio_in_enabled=True, audio_out_enabled=True),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(
+        f"Starting add_tool_change_messages demo bot "
+        f"(ADD_TOOL_CHANGE_MESSAGES={ADD_TOOL_CHANGE_MESSAGES})"
+    )
+
+    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
+
+    tts = CartesiaTTSService(
+        api_key=os.environ["CARTESIA_API_KEY"],
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    llm = OLLamaLLMService(
+        settings=OLLamaLLMService.Settings(
+            # Llama 3.1 8B Instruct is anecdotally one of the more
+            # hallucination-prone of the easily accessible models — exactly
+            # what we want for this validation harness. Pull it with
+            # ``ollama pull llama3.1:8b`` and make sure ``ollama serve``
+            # is running.
+            model="llama3.1:8b",
+            system_instruction=(
+                "You are a helpful assistant in a voice conversation. Your responses "
+                "will be spoken aloud, so avoid emojis, bullet points, or other "
+                "formatting that can't be spoken. Respond briefly and naturally. "
+                "If the user asks for the current weather, use the `get_current_weather` "
+                "function if it's available. IMPORTANT: if you do not have access to the function, "
+                "say something along the lines of 'Sorry, I can't check the weather right now.'."
+            ),
+        ),
+    )
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+
+    context = LLMContext(tools=weather_tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        add_tool_change_messages=ADD_TOOL_CHANGE_MESSAGES,
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            stt,
+            user_aggregator,
+            llm,
+            tts,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    # Phase controller: roughly 4 turns per phase.
+    user_turn_count = 0
+    REMOVE_AT_TURN = 5  # tool gone for turn N onward
+    READD_AT_TURN = 9  # tool back for turn N onward
+
+    @user_aggregator.event_handler("on_user_turn_stopped")
+    async def on_user_turn_stopped(aggregator, strategy, message):
+        nonlocal user_turn_count
+        user_turn_count += 1
+        logger.info(f"=== User turn {user_turn_count} complete ===")
+
+        if user_turn_count == REMOVE_AT_TURN - 1:
+            logger.info(
+                "=== Phase 1: weather tool REMOVED. Keep asking about the weather "
+                "to exercise hallucination scenarios. ==="
+            )
+            await task.queue_frame(LLMSetToolsFrame(tools=NOT_GIVEN))
+        elif user_turn_count == READD_AT_TURN - 1:
+            logger.info(
+                "=== Phase 2: weather tool RE-ADDED. Ask for the weather again — "
+                "does the LLM call it, or keep refusing? (THIS IS THE TEST.) ==="
+            )
+            await task.queue_frame(LLMSetToolsFrame(tools=weather_tools))
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        logger.info(
+            "=== Phase 0: weather tool ACTIVE. Ask for the weather a few times "
+            "to confirm it's working. ==="
+        )
+        context.add_message(
+            {
+                "role": "developer",
+                "content": (
+                    "Please introduce yourself briefly to the user, then invite them "
+                    "to ask about the weather."
+                ),
+            }
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
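
Review note: the phase controller above queues each tool change when turn N-1 completes, so that the change applies from turn N onward. The sketch below replays that schedule in plain Python to check it against the phase table in the docstring. It is a simplified model, not Pipecat code: `action_after_turn` and `tool_active_during` are hypothetical helpers, and the model assumes the queued `LLMSetToolsFrame` takes effect before the next user turn starts.

```python
from typing import Optional

REMOVE_AT_TURN = 5  # tool gone for turn N onward
READD_AT_TURN = 9  # tool back for turn N onward


def action_after_turn(completed_turn: int) -> Optional[str]:
    """Return the tool change queued once `completed_turn` has finished."""
    if completed_turn == REMOVE_AT_TURN - 1:
        return "remove"
    if completed_turn == READD_AT_TURN - 1:
        return "re-add"
    return None


def tool_active_during(turn: int) -> bool:
    """Replay every earlier turn to see whether the tool is advertised for `turn`."""
    active = True  # Phase 0 starts with the tool advertised.
    for completed in range(1, turn):
        action = action_after_turn(completed)
        if action == "remove":
            active = False
        elif action == "re-add":
            active = True
    return active


# Map each of the first ten turns to whether the weather tool is advertised.
schedule = {turn: tool_active_during(turn) for turn in range(1, 11)}
```

Under these assumptions the tool is advertised for turns 1 to 4, absent for turns 5 to 8, and advertised again from turn 9, which matches the Phase 0/1/2 description.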
--- a/examples/function-calling/function-calling-tool-resources.py
+++ b/examples/function-calling/function-calling-tool-resources.py
@@ -4,23 +4,33 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-"""Example demonstrating ``PipelineTask(tool_resources=...)``.
+"""Example demonstrating ``PipelineTask(app_resources=...)``.

-``tool_resources`` is an application-defined bag of anything you want every
-tool handler in a session to share by reference: database handles, HTTP
-clients, feature flags, per-user state, observability clients, in-memory
-caches — whatever fits your app. Pipecat passes it through untouched as
-``FunctionCallParams.tool_resources``.
+``app_resources`` is an application-defined bag of anything your
+application code may want to share across a session: database handles,
+HTTP clients, feature flags, per-user state, observability clients,
+in-memory caches — whatever fits your app. Pipecat passes it through
+untouched and exposes it as ``task.app_resources``, so any code with a
+handle on the task can read or mutate it.

-This example uses a small ``ToolCallLogger`` as a stand-in for that "shared
-thing". A real app might just as easily pass a Postgres pool, a Redis
-client, a Stripe SDK instance, or any combination thereof. The mechanics
-shown here — construct once, hand to the task, read it from each handler,
-inspect it after the session — are the same regardless of what you put in.
+Two of the convenience aliases are exercised below:

-We bundle resources in a typed ``SessionResources`` dataclass and cast back
-to it at the top of each handler. Pipecat doesn't care what type you pass
-(a plain dict works too), but a typed container gives you autocomplete and
+- Tool handlers read it from ``FunctionCallParams.app_resources``.
+- Custom ``FrameProcessor`` subclasses read it from
+  ``self.pipeline_task.app_resources``.
+
+This example uses two small loggers as stand-ins for that "shared thing":
+``ToolCallLogger`` (written from tool handlers) and
+``TranscriptionLogger`` (written from a custom ``FrameProcessor`` that
+sits in the pipeline). A real app might just as easily pass a Postgres
+pool, a Redis client, a Stripe SDK instance, or any combination thereof.
+The mechanics shown here — construct once, hand to the task, read it
+from each site, inspect it after the session — are the same regardless
+of what you put in.
+
+We bundle resources in a typed ``AppResources`` dataclass and cast back
+to it at each read site. Pipecat doesn't care what type you pass (a
+plain dict works too), but a typed container gives you autocomplete and
 refactor safety instead of dict-by-string-key lookups.
 """

@@ -28,7 +38,7 @@ import json
 import os
 from collections.abc import Mapping
 from dataclasses import dataclass
-from datetime import UTC, datetime, timezone
+from datetime import UTC, datetime
 from typing import Any, cast

 from dotenv import load_dotenv
@@ -37,7 +47,7 @@ from loguru import logger
 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
+from pipecat.frames.frames import Frame, LLMRunFrame, TranscriptionFrame, TTSSpeakFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -46,6 +56,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
 )
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -86,30 +97,80 @@ class ToolCallLogger:
        return json.dumps(self._calls, indent=2)


+class TranscriptionLogger:
+    """Records final user transcriptions — written from a custom FrameProcessor."""
+
+    def __init__(self):
+        """Initialize the logger with an empty list of recorded transcriptions."""
+        self._entries: list[dict[str, Any]] = []
+
+    def log_transcription(self, text: str) -> None:
+        """Record a transcription.
+
+        Args:
+            text: The transcribed user utterance.
+        """
+        entry = {
+            "timestamp": datetime.now(UTC).isoformat(),
+            "text": text,
+        }
+        self._entries.append(entry)
+        logger.info(f"[TranscriptionLogger] {text!r}")
+
+    def dump(self) -> str:
+        """Return all recorded transcriptions as a JSON string."""
+        return json.dumps(self._entries, indent=2)
+
+
@dataclass
-class SessionResources:
-    """Typed container for everything the tool handlers in this session share.
+class AppResources:
+    """Typed container for everything the app shares across this session.

    Add fields here as the app grows (e.g. ``db: AsyncConnection``,
-    ``http: httpx.AsyncClient``). Handlers ``cast()`` ``params.tool_resources``
-    to this type to get autocomplete and refactor safety.
+    ``http: httpx.AsyncClient``). Read sites ``cast()`` to this type to
+    get autocomplete and refactor safety:
+
+    - In tools: ``cast(AppResources, params.app_resources)``.
+    - In custom processors: ``cast(AppResources, self.pipeline_task.app_resources)``.
    """

    tool_call_logger: ToolCallLogger
+    transcription_logger: TranscriptionLogger


 async def fetch_weather_from_api(params: FunctionCallParams):
-    resources = cast(SessionResources, params.tool_resources)
+    resources = cast(AppResources, params.app_resources)
    resources.tool_call_logger.log_tool_call(params.function_name, params.arguments)
    await params.result_callback({"conditions": "nice", "temperature": "75"})


 async def fetch_restaurant_recommendation(params: FunctionCallParams):
-    resources = cast(SessionResources, params.tool_resources)
+    resources = cast(AppResources, params.app_resources)
    resources.tool_call_logger.log_tool_call(params.function_name, params.arguments)
    await params.result_callback({"name": "The Golden Dragon"})


+class TranscriptionLoggingProcessor(FrameProcessor):
+    """Logs each final user transcription into the shared app resources.
+
+    Demonstrates the second read site for ``app_resources``: any custom
+    ``FrameProcessor`` can reach the same bag every tool handler sees by
+    going through ``self.pipeline_task.app_resources``. ``pipeline_task``
+    is ``None`` until the task sets the processor up, so we guard against
+    that case.
+    """
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        """Forward all frames; log final user transcriptions on the way through."""
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, TranscriptionFrame) and self.pipeline_task is not None:
+            resources = cast(AppResources, self.pipeline_task.app_resources)
+            resources.transcription_logger.log_transcription(frame.text)
+
+        await self.push_frame(frame, direction)
+
+
 # We use lambdas to defer transport parameter creation until the transport
 # type is selected at runtime.
 transport_params = {
@@ -203,6 +264,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        [
            transport.input(),
            stt,
+            TranscriptionLoggingProcessor(),
            user_aggregator,
            llm,
            tts,
@@ -211,10 +273,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ]
    )

-    # Keep a local handle so we can read collected state after the session
+    # Keep local handles so we can read collected state after the session
    # ends; Pipecat never copies or clears the object.
    tool_call_logger = ToolCallLogger()
-    resources = SessionResources(tool_call_logger=tool_call_logger)
+    transcription_logger = TranscriptionLogger()
+    resources = AppResources(
+        tool_call_logger=tool_call_logger,
+        transcription_logger=transcription_logger,
+    )

    task = PipelineTask(
        pipeline,
@@ -223,7 +289,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-        tool_resources=resources,
+        app_resources=resources,
    )

    @transport.event_handler("on_client_connected")
@@ -246,6 +312,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    # The session has ended; read whatever state the handlers built up.
    logger.info(f"Tool calls logged during session:\n{tool_call_logger.dump()}")
+    logger.info(f"Transcriptions logged during session:\n{transcription_logger.dump()}")


 async def bot(runner_args: RunnerArguments):
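
Review note: the "construct once, hand to the task, cast at each read site" pattern described in this file can be shown without any Pipecat imports. In the sketch below, `CallLog`, `Resources`, and `FakeTask` are hypothetical stand-ins for `ToolCallLogger`, `AppResources`, and `PipelineTask`; the only behavior assumed of the task is that it stores the object untouched, as the docstring states.

```python
from dataclasses import dataclass, field
from typing import Any, cast


@dataclass
class CallLog:
    """Hypothetical stand-in for ToolCallLogger."""

    calls: list = field(default_factory=list)


@dataclass
class Resources:
    """Hypothetical stand-in for the typed AppResources container."""

    call_log: CallLog


class FakeTask:
    """Hypothetical stand-in for a task that keeps the bag by reference."""

    def __init__(self, app_resources: Any = None):
        self.app_resources = app_resources


def handle_tool_call(task: FakeTask, name: str) -> None:
    # cast() does nothing at runtime; it only tells the type checker
    # which concrete type the opaque bag holds.
    resources = cast(Resources, task.app_resources)
    resources.call_log.calls.append(name)


# Keep a local handle so the collected state can be read afterwards.
call_log = CallLog()
task = FakeTask(app_resources=Resources(call_log=call_log))
handle_tool_call(task, "get_current_weather")
handle_tool_call(task, "get_current_weather")
```

Because the bag is shared by reference, the local `call_log` handle sees every entry written through `task.app_resources`. Note that `cast()` performs no runtime check, so passing a different type fails only when an attribute is accessed.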
--- a/examples/function-calling/function-calling-missing-handler.py
+++ b/examples/function-calling/function-calling-missing-handler.py
@@ -0,0 +1,187 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Manual demonstration of the missing-handler (developer-error) recovery path.
+
+When a tool is advertised to the LLM via ``tools``/``LLMContext`` but
+the developer forgets to call ``llm.register_function(...)`` to wire up
+its handler, the LLM happily emits a tool call and then... nothing
+happens on the Pipecat side, leaving the conversation stuck.
+
+Pipecat's recovery path (``LLMService._missing_function_call_handler``)
+catches this case:
+
+- Logs a ``logger.error`` distinguishing **developer error** (tool advertised
+  but no handler registered) from a hallucination (tool not advertised),
+  pointing at the missing ``register_function`` call.
+- Returns a neutral terminal tool result
+  (``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``: "The function
+  `X` is not currently available.") so the call still terminates with a
+  normal tool result instead of leaving the conversation stuck.
+
+This example is **deliberately broken**: the weather schema is in
+``tools`` but ``register_function`` is *not* called. Ask the bot about
+the weather and observe:
+
+1. The LLM emits a tool call for ``get_current_weather``.
+2. ``logger.error`` fires with "advertised … but has no registered handler
+   — did you forget to call register_function()?"
+3. The terminal tool result is fed back to the LLM.
+4. The LLM responds in voice based on that result (typically something
+   like "the weather function isn't available right now").
+
+Uses the OpenAI LLM service with defaults. Swap to another provider to
+validate this behavior elsewhere.
+"""
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+weather_tools = ToolsSchema(standard_tools=[weather_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
+    "twilio": lambda: FastAPIWebsocketParams(audio_in_enabled=True, audio_out_enabled=True),
+    "webrtc": lambda: TransportParams(audio_in_enabled=True, audio_out_enabled=True),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting missing-handler demo bot (no handler is registered on purpose)")
+
+    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
+
+    tts = CartesiaTTSService(
+        api_key=os.environ["CARTESIA_API_KEY"],
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    llm = OpenAILLMService(
+        api_key=os.environ["OPENAI_API_KEY"],
+        settings=OpenAILLMService.Settings(
+            system_instruction=(
+                "You are a helpful assistant in a voice conversation. Your responses "
+                "will be spoken aloud, so avoid emojis, bullet points, or other "
+                "formatting that can't be spoken. Respond briefly and naturally. "
+                "Always use the get_current_weather function to answer questions "
+                "about the current weather."
+            ),
+        ),
+    )
+
+    # *** DELIBERATELY OMITTED ***
+    # The whole point of this example is to demonstrate the missing-handler
+    # recovery path. Re-add this line to wire the tool up correctly:
+    #
+    # llm.register_function("get_current_weather", fetch_weather_from_api)
+
+    context = LLMContext(tools=weather_tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            stt,
+            user_aggregator,
+            llm,
+            tts,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        logger.info(
+            "=== Ask for the weather. Watch for a logger.error about the missing "
+            "handler, and listen for the LLM's response based on the recovery "
+            "message. ==="
+        )
+        context.add_message(
+            {
+                "role": "developer",
+                "content": (
+                    "Please introduce yourself briefly to the user, then invite "
+                    "them to ask about the weather."
+                ),
+            }
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
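
Review note: the recovery path this example demonstrates can be approximated in a few lines. The sketch below is not the body of `LLMService._missing_function_call_handler`; `dispatch`, `MISSING_MESSAGE`, and the warning branch for unadvertised tools are assumptions made for illustration. Only the message wording is taken from the docstring above.

```python
import logging
from typing import Any, Callable, Dict, Set

logger = logging.getLogger("missing_handler_sketch")

# Wording copied from the docstring; the constant name is hypothetical.
MISSING_MESSAGE = "The function `{name}` is not currently available."


def dispatch(
    name: str,
    args: Dict[str, Any],
    advertised: Set[str],
    handlers: Dict[str, Callable[..., Any]],
) -> Any:
    """Run the handler for `name`, or return a neutral terminal result."""
    handler = handlers.get(name)
    if handler is not None:
        return handler(**args)
    if name in advertised:
        # Developer error: the schema was advertised but never wired up.
        logger.error("Tool %r is advertised but has no registered handler", name)
    else:
        # Assumed branch: the model asked for a tool it was never offered.
        logger.warning("Tool %r was never advertised", name)
    # Either way the call terminates with a normal tool result.
    return MISSING_MESSAGE.format(name=name)


advertised = {"get_current_weather"}
handlers: Dict[str, Callable[..., Any]] = {}
result = dispatch(
    "get_current_weather", {"location": "San Francisco, CA"}, advertised, handlers
)
```

Returning a result in both failure branches is the design point: the model always receives a terminal tool result, so the conversation can continue instead of waiting on a call that never completes.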
--- a/examples/function-calling/function-calling-openai-async.py
+++ b/examples/function-calling/function-calling-openai-async.py
@@ -29,7 +29,7 @@ from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.llm_service import FunctionCallParams
 from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.openai.stt import OpenAISTTService
+from pipecat.services.openai.stt import OpenAIRealtimeSTTService
 from pipecat.services.openai.tts import OpenAITTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -69,13 +69,7 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = OpenAISTTService(
-        api_key=os.environ["OPENAI_API_KEY"],
-        settings=OpenAISTTService.Settings(
-            model="gpt-4o-transcribe",
-            prompt="Expect words related weather, such as temperature and conditions. And restaurant names.",
-        ),
-    )
+    stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])

    tts = OpenAITTSService(
        api_key=os.environ["OPENAI_API_KEY"],
--- a/examples/function-calling/function-calling-openai.py
+++ b/examples/function-calling/function-calling-openai.py
@@ -25,7 +25,7 @@ from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.llm_service import FunctionCallParams
 from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.openai.stt import OpenAISTTService
+from pipecat.services.openai.stt import OpenAIRealtimeSTTService
 from pipecat.services.openai.tts import OpenAITTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -63,20 +63,14 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = OpenAISTTService(
-        api_key=os.environ["OPENAI_API_KEY"],
-        settings=OpenAISTTService.Settings(
-            model="gpt-4o-transcribe",
-            prompt="Expect words related weather, such as temperature and conditions. And restaurant names.",
-        ),
-    )
+    stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])

    tts = OpenAITTSService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAITTSService.Settings(
+            instructions="Please speak clearly and at a moderate pace.",
            voice="ballad",
        ),
-        instructions="Please speak clearly and at a moderate pace.",
    )

    llm = OpenAILLMService(
--- a/examples/function-calling/function-calling-qwen.py
+++ b/examples/function-calling/function-calling-qwen.py
@@ -71,8 +71,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    llm = QwenLLMService(
        api_key=os.environ["QWEN_API_KEY"],
-        model="qwen2.5-72b-instruct",
        settings=QwenLLMService.Settings(
+            model="qwen2.5-72b-instruct",
            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
        ),
    )
--- a/examples/mcp/mcp-streamable-http-gemini-live.py
+++ b/examples/mcp/mcp-streamable-http-gemini-live.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger
 from mcp.client.session_group import StreamableHttpParameters

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -19,7 +18,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -84,7 +83,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        context = LLMContext([{"role": "user", "content": "Please introduce yourself."}])
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
-            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+            realtime_service_mode=RealtimeServiceModeConfig(),
        )

        pipeline = Pipeline(
--- a/examples/persistent-context/persistent-context-aws-nova-sonic.py
+++ b/examples/persistent-context/persistent-context-aws-nova-sonic.py
@@ -15,7 +15,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -23,7 +22,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -241,7 +240,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/persistent-context/persistent-context-grok-realtime.py
+++ b/examples/persistent-context/persistent-context-grok-realtime.py
@@ -33,6 +33,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -203,7 +204,10 @@ Remember, your responses should be short - just one or two sentences usually."""
    llm.register_function("load_conversation", load_conversation)

    context = LLMContext([{"role": "developer", "content": "Say hello!"}], tools)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/persistent-context/persistent-context-openai-realtime.py
+++ b/examples/persistent-context/persistent-context-openai-realtime.py
@@ -15,7 +15,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -23,7 +22,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -217,7 +216,7 @@ Remember, your responses should be short. Just one or two sentences, usually."""
    context = LLMContext([{"role": "developer", "content": "Say hello!"}], tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-aws-nova-sonic-async-tool.py
+++ b/examples/realtime/realtime-aws-nova-sonic-async-tool.py
@@ -0,0 +1,183 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example: async function call with the AWS Nova Sonic LLM service.
+
+The ``get_current_weather`` tool is registered with
+``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
+While the call is in flight the conversation continues; the result arrives
+later via the async-tool mechanism and is forwarded to Nova Sonic via the
+formal toolResult channel so the model can integrate it naturally into its
+next turn.
+"""
+
+import asyncio
+import os
+import random
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.aws.nova_sonic.llm import AWSNovaSonicLLMService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    # Simulate a long-running API call so we can demonstrate that the
+    # conversation continues while the tool is in flight.
+    await asyncio.sleep(10)
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "location": params.arguments["location"],
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    system_instruction = (
+        "You are a friendly assistant. The user and you will engage in a spoken "
+        "dialog exchanging the transcripts of a natural real-time conversation. "
+        "Keep your responses short, generally two or three sentences for chatty "
+        "scenarios. When the user asks for the weather, call get_current_weather. "
+        "While you wait for the result, keep chatting with the user. When the "
+        "result arrives, share it with the user naturally."
+    )
+
+    llm = AWSNovaSonicLLMService(
+        secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
+        access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
+        region=os.environ["AWS_REGION"],
+        session_token=os.getenv("AWS_SESSION_TOKEN"),
+        settings=AWSNovaSonicLLMService.Settings(
+            voice="tiffany",
+            system_instruction=system_instruction,
+        ),
+    )
+
+    llm.register_function(
+        "get_current_weather",
+        fetch_weather_from_api,
+        cancel_on_interruption=False,
+    )
+
+    context = LLMContext(tools=tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
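The async-tool behaviour described in the docstring above (the tool runs in the background while the conversation keeps going, and its result is delivered later) can be illustrated with plain `asyncio`. This is a minimal sketch under stated assumptions, not Pipecat's implementation: `slow_tool`, `on_result`, and `demo` are hypothetical names, and the "conversation" is just a list of strings.

```python
import asyncio


async def slow_tool(delay: float, result_callback) -> None:
    # Hypothetical stand-in for a slow API call (the example sleeps for 10s).
    await asyncio.sleep(delay)
    await result_callback({"conditions": "nice", "temperature": 72})


async def demo(delay: float = 0.05) -> list[str]:
    events: list[str] = []

    async def on_result(result: dict) -> None:
        # Hypothetical callback: record the tool result when it finally arrives.
        events.append(f"tool result: {result['temperature']}")

    # Start the tool without awaiting it, so the conversation is not blocked.
    tool_task = asyncio.create_task(slow_tool(delay, on_result))

    events.append("assistant: let me check the weather")
    await asyncio.sleep(0)  # yield to the event loop; the tool is still sleeping
    events.append("user: thanks, how is your day?")
    events.append("assistant: going well!")

    # Wait for the tool to finish; its result lands after the chat lines.
    await tool_task
    return events


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```

In the real example the equivalent of "not awaiting the tool" is presumably what `cancel_on_interruption=False` enables for the registered function; the sketch only shows the ordering of events, not how the result is forwarded to the model.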
--- a/examples/realtime/realtime-aws-nova-sonic.py
+++ b/examples/realtime/realtime-aws-nova-sonic.py
@@ -15,7 +15,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -24,7 +23,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -46,11 +45,6 @@ async def fetch_weather_from_api(params: FunctionCallParams):
        if params.arguments["format"] == "fahrenheit"
        else random.randint(15, 30)
    )
-    # Simulate a long network delay.
-    # You can continue chatting while waiting for this to complete.
-    # With Nova 2 Sonic (the default model), the assistant will respond
-    # appropriately once the function call is complete.
-    await asyncio.sleep(5)
    await params.result_callback(
        {
            "conditions": "nice",
@@ -150,15 +144,34 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Register function for function calls
    # you can either register a single function for all function calls, or specific functions
    # llm.register_function(None, fetch_weather_from_api)
-    llm.register_function(
-        "get_current_weather", fetch_weather_from_api, cancel_on_interruption=False
-    )
+    llm.register_function("get_current_weather", fetch_weather_from_api)

    # Set up context and context management.
+    #
+    # AWS Nova Sonic drives the conversation server-side and does not emit
+    # UserStartedSpeakingFrame / UserStoppedSpeakingFrame. Context
+    # aggregation still works with realtime_service_mode, but pipeline
+    # processors that depend on those frames (RTVI client speech events,
+    # TurnTrackingObserver, AudioBufferProcessor turn recording,
+    # UserIdleController, user mute strategies, voicemail detector) won't
+    # activate. The Pipecat Prebuilt UI is one such consumer — without
+    # these frames it can't group user transcripts into discrete turns
+    # visually.
+    #
+    # If you need those frames, uncomment the imports just below and the
+    # `user_params=` argument further down. Note: local turn detection
+    # may not match Nova Sonic's actual server-side turn decisions, so
+    # the two can desynchronize in subtle ways.
+    #
+    # from pipecat.audio.vad.silero import SileroVADAnalyzer
+    # from pipecat.processors.aggregators.llm_response_universal import (
+    #     LLMUserAggregatorParams,
+    # )
    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        # user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    # Build the pipeline
@@ -202,14 +215,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Nova Sonic doesn't emit user-turn frames, so on_user_turn_stopped
+    # would never fire. The *_message_added events fire when messages are
+    # written to context and carry the finalized content; use those for
+    # transcript logging.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-azure-async-tool.py
+++ b/examples/realtime/realtime-azure-async-tool.py
@@ -0,0 +1,194 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example: async function call with the Azure Realtime LLM service.
+
+The ``get_current_weather`` tool is registered with
+``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
+While the call is in flight the conversation continues; the result arrives
+later via the async-tool mechanism and is forwarded to Azure Realtime as a
+``function_call_output`` so the model can integrate it naturally into its
+next turn.
+"""
+
+import asyncio
+import os
+import random
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.azure.realtime.llm import AzureRealtimeLLMService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai.realtime.events import (
+    AudioConfiguration,
+    AudioInput,
+    InputAudioTranscription,
+    SessionProperties,
+)
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    # Simulate a long-running API call so we can demonstrate that the
+    # conversation continues while the tool is in flight.
+    await asyncio.sleep(10)
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "location": params.arguments["location"],
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])
+
+
+system_instruction = (
+    "You are a friendly assistant. The user and you will engage in a spoken "
+    "dialog exchanging the transcripts of a natural real-time conversation. "
+    "Keep your responses short, generally two or three sentences for chatty "
+    "scenarios. When the user asks for the weather, call get_current_weather. "
+    "While you wait for the result, keep chatting with the user. When the "
+    "result arrives, share it with the user naturally."
+)
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    llm = AzureRealtimeLLMService(
+        api_key=os.environ["AZURE_REALTIME_API_KEY"],
+        base_url=os.environ["AZURE_REALTIME_BASE_URL"],
+        settings=AzureRealtimeLLMService.Settings(
+            system_instruction=system_instruction,
+            session_properties=SessionProperties(
+                audio=AudioConfiguration(
+                    input=AudioInput(
+                        transcription=InputAudioTranscription(model="whisper-1"),
+                    )
+                ),
+            ),
+        ),
+    )
+
+    llm.register_function(
+        "get_current_weather",
+        fetch_weather_from_api,
+        cancel_on_interruption=False,
+    )
+
+    context = LLMContext(tools=tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-azure.py
+++ b/examples/realtime/realtime-azure.py
@@ -13,7 +13,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -21,7 +20,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -174,7 +173,7 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-gemini-live-function-calling.py
+++ b/examples/realtime/realtime-gemini-live-function-calling.py
@@ -4,21 +4,34 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+"""Example: async function call with the Gemini Live LLM service.

+The ``get_current_weather`` tool is registered with
+``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
+While the call is in flight the conversation continues; the result arrives
+later via the async-tool mechanism and is forwarded to Gemini Live as a
+FunctionResponse so the model can integrate it naturally into its next turn.
+"""
+
+import asyncio
 import os
+import random
 from datetime import datetime

 from dotenv import load_dotenv
 from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -31,33 +44,55 @@ load_dotenv(override=True)


 async def fetch_weather_from_api(params: FunctionCallParams):
-    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    # Simulate a long-running API call so we can demonstrate that the
+    # conversation continues while the tool is in flight.
+    await asyncio.sleep(10)
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
    await params.result_callback(
        {
            "conditions": "nice",
            "temperature": temperature,
+            "location": params.arguments["location"],
            "format": params.arguments["format"],
            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
        }
    )


-async def fetch_restaurant_recommendation(params: FunctionCallParams):
-    await params.result_callback({"name": "The Golden Dragon"})
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])


-system_instruction = """
-You are a helpful assistant who can answer questions and use tools.
-
-You have three tools available to you:
-1. get_current_weather: Use this tool to get the current weather in a specific location.
-2. get_restaurant_recommendation: Use this tool to get a restaurant recommendation in a specific location.
-3. google_search: Use this tool to search the web for information.
-"""
+system_instruction = (
+    "You are a friendly assistant. The user and you will engage in a spoken "
+    "dialog exchanging the transcripts of a natural real-time conversation. "
+    "Keep your responses short, generally two or three sentences for chatty "
+    "scenarios. When the user asks for the weather, call get_current_weather. "
+    "While you wait for the result, keep chatting with the user. When the "
+    "result arrives, share it with the user naturally."
+)


-# We use lambdas to defer transport parameter creation until the transport
-# type is selected at runtime.
 transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
@@ -77,42 +112,6 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    weather_function = FunctionSchema(
-        name="get_current_weather",
-        description="Get the current weather",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-            "format": {
-                "type": "string",
-                "enum": ["celsius", "fahrenheit"],
-                "description": "The temperature unit to use. Infer this from the user's location.",
-            },
-        },
-        required=["location", "format"],
-    )
-    restaurant_function = FunctionSchema(
-        name="get_restaurant_recommendation",
-        description="Get a restaurant recommendation",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-        },
-        required=["location"],
-    )
-    search_tool = {"google_search": {}}
-    # KNOWN ISSUE: If using GeminiVertexLiveLLMService, it appears
-    # you cannot use the "google_search" tool alongside other tools.
-    # See https://github.com/googleapis/python-genai/issues/941.
-    tools = ToolsSchema(
-        standard_tools=[weather_function, restaurant_function],
-        custom_tools={AdapterType.GEMINI: [search_tool]},
-    )
-
    llm = GeminiLiveLLMService(
        api_key=os.environ["GOOGLE_API_KEY"],
        settings=GeminiLiveLLMService.Settings(
@@ -121,16 +120,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        tools=tools,
    )

-    llm.register_function("get_current_weather", fetch_weather_from_api)
-    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+    llm.register_function(
+        "get_current_weather",
+        fetch_weather_from_api,
+        cancel_on_interruption=False,
+    )

-    # You can provide the system instructions and tools in the context rather
-    # than as arguments to GeminiLiveLLMService, but note that doing so will
-    # trigger a (fast) reconnection when the GeminiLiveLLMService first
-    # receives the context (i.e. when we send the LLMRunFrame below).
    context = LLMContext()
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
@@ -154,7 +155,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
-        # Kick off the conversation.
        context.add_message(
            {"role": "developer", "content": "Please introduce yourself to the user."}
        )
@@ -166,7 +166,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
    await runner.run(task)


--- a/examples/realtime/realtime-gemini-live-files-api.py
+++ b/examples/realtime/realtime-gemini-live-files-api.py
@@ -15,7 +15,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -158,7 +161,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        )

    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    # Build the pipeline
    pipeline = Pipeline(
--- a/examples/realtime/realtime-gemini-live-google-search.py
+++ b/examples/realtime/realtime-gemini-live-google-search.py
@@ -15,7 +15,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -84,7 +87,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ],
    )
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-graceful-end.py
+++ b/examples/realtime/realtime-gemini-live-graceful-end.py
@@ -17,7 +17,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -148,7 +151,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        [{"role": "developer", "content": "Say hello."}],
    )
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-grounding-metadata.py
+++ b/examples/realtime/realtime-gemini-live-grounding-metadata.py
@@ -9,7 +9,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -115,7 +118,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Set up conversation context and management
    context = LLMContext()
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-locally-driven-turns.py
+++ b/examples/realtime/realtime-gemini-live-locally-driven-turns.py
@@ -4,6 +4,29 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+"""Gemini Live with locally-driven turn detection.
+
+By default Gemini Live drives the conversation with its own server-side VAD
+(see `realtime-gemini-live.py`). That setup doesn't surface
+``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``, so pipeline
+processors that depend on those frames (RTVI client speech events,
+``TurnTrackingObserver``, ``AudioBufferProcessor`` turn recording,
+``UserIdleController``, user mute strategies, voicemail detector) don't
+activate.
+
+This variant disables Gemini Live's server-side VAD
+(``GeminiVADParams(disabled=True)``) and instead drives turn boundaries
+locally with ``SileroVADAnalyzer`` wired into the user aggregator. Use this
+variant if you need those downstream processors, or if you want a turn
+analyzer like ``LocalSmartTurnV3`` to decide when the user is done speaking.
+
+Caveat: locally-generated turn boundaries are a heuristic and may not match
+the provider's actual server-side turn decisions, which is what really
+drives the conversation. The two can drift apart in subtle, hard-to-debug
+ways, especially around interruptions and overlapping speech. Prefer
+server-emitted turn frames (i.e. the base `realtime-gemini-live.py` example)
+unless you have a specific reason to drive turn detection locally.
+"""

 import os

@@ -20,6 +43,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -72,6 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    )
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
        ),
@@ -107,14 +132,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # The *_message_added events fire when messages are written to context
+    # and carry the finalized content. In realtime mode the turn-stopped
+    # events fire before the message text is finalized.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-gemini-live-vertex-function-calling.py
+++ b/examples/realtime/realtime-gemini-live-vertex-function-calling.py
@@ -18,7 +18,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.vertex.llm import GeminiLiveVertexLLMService
@@ -124,7 +127,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    context = LLMContext([{"role": "developer", "content": "Say hello."}])
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-video.py
+++ b/examples/realtime/realtime-gemini-live-video.py
@@ -16,7 +16,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
    create_transport,
@@ -64,7 +67,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ],
    )
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live.py
+++ b/examples/realtime/realtime-gemini-live.py
@@ -6,10 +6,13 @@


 import os
+from datetime import datetime

 from dotenv import load_dotenv
 from loguru import logger

+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -18,11 +21,13 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
+from pipecat.services.llm_service import FunctionCallParams
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -30,6 +35,32 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
 load_dotenv(override=True)


+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+system_instruction = """
+You are a helpful assistant who can answer questions and use tools.
+
+You have three tools available to you:
+1. get_current_weather: Use this tool to get the current weather in a specific location.
+2. get_restaurant_recommendation: Use this tool to get a restaurant recommendation in a specific location.
+3. google_search: Use this tool to search the web for information.
+"""
+
+
 # We use lambdas to defer transport parameter creation until the transport
 # type is selected at runtime.
 transport_params = {
@@ -51,25 +82,82 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

+    weather_function = FunctionSchema(
+        name="get_current_weather",
+        description="Get the current weather",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+            "format": {
+                "type": "string",
+                "enum": ["celsius", "fahrenheit"],
+                "description": "The temperature unit to use. Infer this from the user's location.",
+            },
+        },
+        required=["location", "format"],
+    )
+    restaurant_function = FunctionSchema(
+        name="get_restaurant_recommendation",
+        description="Get a restaurant recommendation",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+        },
+        required=["location"],
+    )
+    search_tool = {"google_search": {}}
+    # KNOWN ISSUE: If using GeminiLiveVertexLLMService, it appears
+    # you cannot use the "google_search" tool alongside other tools.
+    # See https://github.com/googleapis/python-genai/issues/941.
+    tools = ToolsSchema(
+        standard_tools=[weather_function, restaurant_function],
+        custom_tools={AdapterType.GEMINI: [search_tool]},
+    )
+
    llm = GeminiLiveLLMService(
        api_key=os.environ["GOOGLE_API_KEY"],
        settings=GeminiLiveLLMService.Settings(
+            system_instruction=system_instruction,
            voice="Aoede",  # Puck, Charon, Kore, Fenrir, Aoede
-            # system_instruction="Talk like a pirate."
        ),
-        # inference_on_context_initialization=False,
+        tools=tools,
    )

-    context = LLMContext(
-        [
-            {
-                "role": "user",
-                "content": "Say hello. Then ask if I want to hear a joke.",
-            },
-        ],
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+
+    context = LLMContext()
+    # Gemini Live drives the conversation server-side and does not emit
+    # UserStartedSpeakingFrame / UserStoppedSpeakingFrame. Context
+    # aggregation still works with realtime_service_mode, but pipeline
+    # processors that depend on those frames (RTVI client speech events,
+    # TurnTrackingObserver, AudioBufferProcessor turn recording,
+    # UserIdleController, user mute strategies, voicemail detector) won't
+    # activate. The Pipecat Prebuilt UI is one such consumer — without
+    # these frames it can't group user transcripts into discrete turns
+    # visually.
+    #
+    # If you need those frames, uncomment the two imports below and the
+    # `user_params=` argument to LLMContextAggregatorPair. Note: local turn
+    # detection may not match Gemini Live's actual server-side turn
+    # decisions and can desynchronize in subtle ways.
+    #
+    # For local VAD driving the conversation (server VAD disabled), see
+    # `realtime-gemini-live-locally-driven-turns.py` instead.
+    #
+    # from pipecat.audio.vad.silero import SileroVADAnalyzer
+    # from pipecat.processors.aggregators.llm_response_universal import (
+    #     LLMUserAggregatorParams,
+    # )
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        # user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )
-    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)

    pipeline = Pipeline(
        [
@@ -94,6 +182,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
@@ -101,14 +192,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Gemini Live doesn't emit user-turn frames so on_user_turn_stopped
+    # would never fire. The *_message_added events fire when messages are
+    # written to context and carry the finalized content; use those for
+    # transcript logging regardless of whether the service emits turn
+    # frames.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-grok-async-tool.py
+++ b/examples/realtime/realtime-grok-async-tool.py
@@ -0,0 +1,185 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example: async function call with the Grok Realtime LLM service.
+
+The ``get_current_weather`` tool is registered with
+``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
+While the call is in flight the conversation continues; the result arrives
+later via the async-tool mechanism and is forwarded to Grok Realtime as a
+``function_call_output`` so the model can integrate it naturally into its
+next turn.
+"""
+
+import asyncio
+import os
+import random
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.xai.realtime.events import SessionProperties
+from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    # Simulate a long-running API call so we can demonstrate that the
+    # conversation continues while the tool is in flight.
+    await asyncio.sleep(10)
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "location": params.arguments["location"],
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])
+
+
+system_instruction = (
+    "You are a friendly assistant. The user and you will engage in a spoken "
+    "dialog exchanging the transcripts of a natural real-time conversation. "
+    "Keep your responses short, generally two or three sentences for chatty "
+    "scenarios. When the user asks for the weather, call get_current_weather. "
+    "While you wait for the result, keep chatting with the user. When the "
+    "result arrives, share it with the user naturally."
+)
+
+
+# Note: Grok has built-in server-side VAD, so we don't need local VAD.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting bot")
+
+    llm = GrokRealtimeLLMService(
+        api_key=os.environ["XAI_API_KEY"],
+        settings=GrokRealtimeLLMService.Settings(
+            system_instruction=system_instruction,
+            session_properties=SessionProperties(
+                voice="Ara",
+            ),
+        ),
+    )
+
+    llm.register_function(
+        "get_current_weather",
+        fetch_weather_from_api,
+        cancel_on_interruption=False,
+    )
+
+    context = LLMContext(tools=tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-grok-locally-driven-turns.py
+++ b/examples/realtime/realtime-grok-locally-driven-turns.py
@@ -0,0 +1,262 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Grok Realtime with locally-driven turn detection.
+
+By default Grok Realtime drives the conversation with its own server-side
+VAD (see `realtime-grok.py`). This variant disables server-side turn
+detection (``turn_detection=None``, the "manual" mode in Grok's session
+properties) and instead drives turn boundaries locally with
+``SileroVADAnalyzer`` wired into the user aggregator. Use this variant if
+you want a turn analyzer like ``LocalSmartTurnV3`` to decide when the user
+is done speaking, or if you need ``UserStartedSpeakingFrame`` /
+``UserStoppedSpeakingFrame`` to fire from the same source as
+``InterruptionFrame``.
+
+Caveat: locally-generated turn boundaries are a heuristic and may not match
+the provider's actual server-side turn decisions. Prefer server-emitted
+turn frames (i.e. the base `realtime-grok.py` example) unless you have a
+specific reason to drive turn detection locally.
+"""
+
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.observers.loggers.transcription_log_observer import (
+    TranscriptionLogObserver,
+)
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    AssistantTurnStoppedMessage,
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
+    UserTurnStoppedMessage,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.xai.realtime.events import SessionProperties
+from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    """Handle weather function calls."""
+    temperature = 75 if params.arguments.get("format") == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments.get("format", "celsius"),
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def get_current_time(params: FunctionCallParams):
+    """Handle time function calls."""
+    await params.result_callback(
+        {
+            "time": datetime.now().strftime("%H:%M:%S"),
+            "date": datetime.now().strftime("%Y-%m-%d"),
+            "timezone": "local",
+        }
+    )
+
+
+async def get_restaurant_recommendation(params: FunctionCallParams):
+    """Handle restaurant recommendation function calls."""
+    location = params.arguments.get("location", "unknown")
+    await params.result_callback(
+        {
+            "name": "The Golden Dragon",
+            "cuisine": "Chinese",
+            "location": location,
+            "rating": 4.5,
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather for a location",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use.",
+        },
+    },
+    required=["location", "format"],
+)
+
+time_function = FunctionSchema(
+    name="get_current_time",
+    description="Get the current time and date",
+    properties={},
+    required=[],
+)
+
+restaurant_function = FunctionSchema(
+    name="get_restaurant_recommendation",
+    description="Get a restaurant recommendation for a location",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+    },
+    required=["location"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function, time_function, restaurant_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting Grok Voice Agent bot")
+
+    session_properties = SessionProperties(
+        voice="Ara",
+        # Disable Grok's server-side turn detection (manual mode). This
+        # example drives turn boundaries locally via the SileroVADAnalyzer
+        # wired into the user aggregator below.
+        turn_detection=None,
+    )
+
+    llm = GrokRealtimeLLMService(
+        api_key=os.environ["XAI_API_KEY"],
+        settings=GrokRealtimeLLMService.Settings(
+            system_instruction="""You are a helpful and friendly AI assistant powered by Grok.
+
+    You have access to several tools:
+    - Weather information
+    - Current time
+    - Restaurant recommendations
+    - Web search (built-in)
+    - X/Twitter search (built-in)
+
+    Your voice and personality should be warm and engaging. Keep your responses
+    concise and conversational since this is a voice interaction.
+
+    If the user asks about current events or news, use web search.
+    If they ask about what people are saying on social media, use X search.
+
+    Always be helpful and proactive in offering assistance.""",
+            session_properties=session_properties,
+        ),
+    )
+
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_current_time", get_current_time)
+    llm.register_function("get_restaurant_recommendation", get_restaurant_recommendation)
+
+    context = LLMContext(
+        [{"role": "developer", "content": "Say hello and introduce yourself!"}],
+        tools,
+    )
+
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        # Drive turn detection locally via SileroVAD wired into the user
+        # aggregator. realtime_service_mode keeps context-write semantics
+        # correct and (by default) drops the transcript wait on turn-end so
+        # local VAD can drive turn boundaries on the latency-critical path.
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[TranscriptionLogObserver()],
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}user: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}assistant: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-grok.py
+++ b/examples/realtime/realtime-grok.py
@@ -33,9 +33,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-
-# Note: Grok has built-in server-side VAD, so we don't need local VAD
-# from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.observers.loggers.transcription_log_observer import (
    TranscriptionLogObserver,
@@ -47,6 +44,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -212,7 +210,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        tools,
    )

-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    # Build the pipeline
    # Note: In realtime mode, transcription comes from Grok (upstream),
@@ -248,15 +249,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info("Client disconnected")
        await task.cancel()

-    # Log transcript updates
-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Log transcript updates. In realtime mode the turn-stopped events
+    # fire before the message text is finalized (UserTurnStoppedMessage
+    # content is None), so subscribe to the *_message_added events
+    # instead — they fire when the message is written to context and
+    # carry the finalized content.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
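Note on the handler change above: the ordering the new comment describes (turn-stopped fires first with no content, message-added fires later with the finalized text) can be pictured with a toy model. This is a plain-asyncio sketch and not Pipecat code. `ToyAggregator`, `ToyMessage`, the `on` method, and the event names `turn_stopped` / `message_added` are hypothetical stand-ins invented for illustration; the real aggregators may be implemented quite differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


@dataclass
class ToyMessage:
    """Hypothetical stand-in for a turn message; not a Pipecat class."""

    role: str
    content: Optional[str] = None


Handler = Callable[[ToyMessage], Awaitable[None]]


class ToyAggregator:
    """Hypothetical aggregator that emits two events per user turn."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        self._handlers.setdefault(event, []).append(handler)

    async def _emit(self, event: str, message: ToyMessage) -> None:
        for handler in self._handlers.get(event, []):
            await handler(message)

    async def finish_turn(self, transcript: str) -> None:
        # The turn boundary is known first; the transcript has not arrived yet.
        await self._emit("turn_stopped", ToyMessage(role="user"))
        # Simulate the transcript arriving later from the service.
        await asyncio.sleep(0)
        # Only now is the message complete and "written to context".
        await self._emit("message_added", ToyMessage(role="user", content=transcript))


async def demo() -> List[Tuple[str, Optional[str]]]:
    seen: List[Tuple[str, Optional[str]]] = []
    aggregator = ToyAggregator()

    async def on_turn_stopped(message: ToyMessage) -> None:
        seen.append(("turn_stopped", message.content))

    async def on_message_added(message: ToyMessage) -> None:
        seen.append(("message_added", message.content))

    aggregator.on("turn_stopped", on_turn_stopped)
    aggregator.on("message_added", on_message_added)
    await aggregator.finish_turn("hello there")
    return seen


if __name__ == "__main__":
    for event, content in asyncio.run(demo()):
        print(event, content)
```

In this toy model a transcript logger attached to `turn_stopped` would only ever see `None`, which is the reason the examples move their logging to the message-added handlers.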
--- a/examples/realtime/realtime-inworld-locally-driven-turns.py
+++ b/examples/realtime/realtime-inworld-locally-driven-turns.py
@@ -0,0 +1,235 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Inworld Realtime with locally-driven turn detection.
+
+By default Inworld Realtime drives the conversation with its own
+server-side semantic VAD (see ``realtime-inworld.py``). This variant
+disables server-side turn detection (``turn_detection=None``, the
+"manual" mode in Inworld's session properties) and instead drives turn
+boundaries locally with ``SileroVADAnalyzer`` wired into the user
+aggregator. Use this variant if you want a turn analyzer like
+``LocalSmartTurnV3`` to decide when the user is done speaking, or if you
+need ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame`` to fire
+from the same source as ``InterruptionFrame``.
+
+Caveat: locally-generated turn boundaries are a heuristic and may not
+match the provider's actual server-side turn decisions. Prefer
+server-emitted turn frames (i.e. the base ``realtime-inworld.py`` example)
+unless you have a specific reason to drive turn detection locally.
+"""
+
+import os
+import random
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.observers.loggers.transcription_log_observer import (
+    TranscriptionLogObserver,
+)
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    AssistantTurnStoppedMessage,
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
+    UserTurnStoppedMessage,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.inworld.realtime.events import (
+    AudioConfiguration,
+    AudioInput,
+    AudioOutput,
+    InputTranscription,
+    PCMAudioFormat,
+    SessionProperties,
+)
+from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting Inworld Realtime bot (locally-driven turns)")
+
+    model = "openai/gpt-4.1-mini"
+    voice = "Sarah"
+    tts_model = "inworld-tts-2"
+    stt_model = "assemblyai/u3-rt-pro"
+
+    # Setting session_properties here replaces Inworld's defaults wholesale,
+    # so we provide a complete SessionProperties — with turn_detection=None
+    # (manual mode) so local VAD drives turn boundaries instead.
+    session_properties = SessionProperties(
+        model=model,
+        output_modalities=["audio", "text"],
+        audio=AudioConfiguration(
+            input=AudioInput(
+                format=PCMAudioFormat(rate=24000),
+                transcription=InputTranscription(model=stt_model),
+                turn_detection=None,
+            ),
+            output=AudioOutput(
+                format=PCMAudioFormat(rate=24000),
+                model=tts_model,
+                voice=voice,
+            ),
+        ),
+    )
+
+    llm = InworldRealtimeLLMService(
+        api_key=os.environ["INWORLD_API_KEY"],
+        settings=InworldRealtimeLLMService.Settings(
+            system_instruction="""You are a helpful and friendly AI assistant powered by Inworld.
+
+Your voice and personality should be warm and engaging. Keep your responses
+concise and conversational since this is a voice interaction.
+
+Always be helpful and proactive in offering assistance.""",
+            session_properties=session_properties,
+        ),
+    )
+
+    # Note: function calling requires a paid Inworld account and a
+    # function-calling-capable model
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+
+    context = LLMContext(
+        [{"role": "developer", "content": "Say hello and introduce yourself!"}],
+        tools,
+    )
+
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        # Drive turn detection locally via SileroVAD wired into the user
+        # aggregator. realtime_service_mode keeps context-write semantics
+        # correct and (by default) drops the transcript wait on turn-end so
+        # local VAD can drive turn boundaries on the latency critical path.
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[TranscriptionLogObserver()],
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        logger.info(f"Transcript: {timestamp}user: {message.content}")
+
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        logger.info(f"Transcript: {timestamp}assistant: {message.content}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-inworld.py
+++ b/examples/realtime/realtime-inworld.py
@@ -28,10 +28,14 @@ Usage:
 """

 import os
+import random
+from datetime import datetime

 from dotenv import load_dotenv
 from loguru import logger

+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.observers.loggers.transcription_log_observer import (
    TranscriptionLogObserver,
@@ -43,11 +47,13 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService
+from pipecat.services.llm_service import FunctionCallParams
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -55,6 +61,43 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
 load_dotenv(override=True)


+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "location": params.arguments["location"],
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])
+
+
 # --- Transport Configuration ---

 # No local VAD needed — Inworld's server-side semantic VAD handles turn detection.
@@ -85,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # See: https://docs.inworld.ai/router/introduction
    llm = InworldRealtimeLLMService(
        api_key=os.environ["INWORLD_API_KEY"],
-        llm_model="xai/grok-4-1-fast-non-reasoning",
+        llm_model="openai/gpt-4.1-mini",
        voice="Sarah",
        settings=InworldRealtimeLLMService.Settings(
            system_instruction="""You are a helpful and friendly AI assistant powered by Inworld.
@@ -97,12 +140,20 @@ Always be helpful and proactive in offering assistance.""",
        ),
    )

-    # Create context with initial message
+    # Note: function calling requires a paid Inworld account and a
+    # function-calling-capable model
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+
+    # Create context with initial message + tools
    context = LLMContext(
        [{"role": "developer", "content": "Say hello and introduce yourself!"}],
+        tools,
    )

-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    # Build the pipeline
    pipeline = Pipeline(
@@ -135,13 +186,16 @@ Always be helpful and proactive in offering assistance.""",
        logger.info("Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # In realtime mode the turn-stopped events fire before the message
+    # text is finalized; subscribe to the *_message_added events for the
+    # finalized content.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        logger.info(f"Transcript: {timestamp}user: {message.content}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        logger.info(f"Transcript: {timestamp}assistant: {message.content}")

--- a/examples/realtime/realtime-openai-async-tool.py
+++ b/examples/realtime/realtime-openai-async-tool.py
@@ -0,0 +1,197 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example: async function call with the OpenAI Realtime LLM service.
+
+The ``get_current_weather`` tool is registered with
+``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
+While the call is in flight the conversation continues; the result arrives
+later via the async-tool mechanism and is forwarded to OpenAI Realtime as a
+``function_call_output`` so the model can integrate it naturally into its
+next turn.
+"""
+
+import asyncio
+import os
+import random
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai.realtime.events import (
+    AudioConfiguration,
+    AudioInput,
+    InputAudioNoiseReduction,
+    InputAudioTranscription,
+    SemanticTurnDetection,
+    SessionProperties,
+)
+from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    # Simulate a long-running API call so we can demonstrate that the
+    # conversation continues while the tool is in flight.
+    await asyncio.sleep(10)
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "location": params.arguments["location"],
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])
+
+
+system_instruction = (
+    "You are a friendly assistant. The user and you will engage in a spoken "
+    "dialog exchanging the transcripts of a natural real-time conversation. "
+    "Keep your responses short, generally two or three sentences for chatty "
+    "scenarios. When the user asks for the weather, call get_current_weather. "
+    "While you wait for the result, keep chatting with the user. When the "
+    "result arrives, share it with the user naturally."
+)
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting bot")
+
+    llm = OpenAIRealtimeLLMService(
+        api_key=os.environ["OPENAI_API_KEY"],
+        settings=OpenAIRealtimeLLMService.Settings(
+            system_instruction=system_instruction,
+            session_properties=SessionProperties(
+                audio=AudioConfiguration(
+                    input=AudioInput(
+                        transcription=InputAudioTranscription(),
+                        turn_detection=SemanticTurnDetection(),
+                        noise_reduction=InputAudioNoiseReduction(type="near_field"),
+                    )
+                ),
+            ),
+        ),
+    )
+
+    llm.register_function(
+        "get_current_weather",
+        fetch_weather_from_api,
+        cancel_on_interruption=False,
+    )
+
+    context = LLMContext(tools=tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
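Note on the async-tool example above: its docstring describes a tool call that stays in flight while the conversation continues, with the result delivered later. The shape of that flow can be sketched with plain asyncio. This is an illustration only and not how Pipecat or OpenAI Realtime implement it; `slow_tool`, `converse`, the callback signature, and the log strings are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


async def slow_tool(
    result_callback: Callable[[dict], Awaitable[None]], delay: float
) -> None:
    """Hypothetical slow tool: waits, then reports its result via a callback."""
    await asyncio.sleep(delay)
    await result_callback({"conditions": "nice"})


async def converse(delay: float = 0.05) -> List[str]:
    log: List[str] = []

    async def on_result(result: dict) -> None:
        log.append(f"tool result: {result['conditions']}")

    # Start the tool without awaiting it, so the conversation is not blocked.
    tool_task = asyncio.create_task(slow_tool(on_result, delay))

    # The conversation keeps going while the tool is in flight.
    log.append("assistant: let me check on that")
    await asyncio.sleep(0)
    log.append("user: thanks, take your time")

    # The result arrives later; only then can the assistant report it.
    await tool_task
    log.append("assistant: the weather looks nice")
    return log


if __name__ == "__main__":
    for line in asyncio.run(converse()):
        print(line)
```

The key design point mirrored here is that the tool is scheduled as a background task rather than awaited inline, which is roughly what registering with `cancel_on_interruption=False` is meant to allow in the example above.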
--- a/examples/realtime/realtime-openai-live-video.py
+++ b/examples/realtime/realtime-openai-live-video.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
 from pipecat.pipeline.pipeline import Pipeline
@@ -19,7 +18,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
@@ -106,7 +105,7 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-openai-locally-driven-turns.py
+++ b/examples/realtime/realtime-openai-locally-driven-turns.py
@@ -0,0 +1,267 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""OpenAI Realtime with locally-driven turn detection.
+
+By default OpenAI Realtime drives the conversation with its own server-side
+VAD (see ``realtime-openai.py``). This variant disables server-side turn
+detection (``turn_detection=False``) and instead drives turn boundaries
+locally with ``SileroVADAnalyzer`` wired into the user aggregator. This is
+the path to take if you want a turn analyzer like ``LocalSmartTurnV3`` to
+decide when the user is done speaking, or if you need ``UserStartedSpeakingFrame``
+/ ``UserStoppedSpeakingFrame`` to fire from the same source as
+``InterruptionFrame``.
+
+Caveat: locally-generated turn boundaries are a heuristic and may not match
+the provider's actual server-side turn decisions. With OpenAI Realtime,
+server-side turn detection is generally what the service expects to drive
+the conversation, and disabling it puts the responsibility on you. Prefer
+server-emitted turn frames (i.e. the base ``realtime-openai.py`` example)
+unless you have a specific reason to drive turn detection locally.
+"""
+
+import asyncio
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame
+from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    AssistantTurnStoppedMessage,
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
+    UserTurnStoppedMessage,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai.realtime.events import (
+    AudioConfiguration,
+    AudioInput,
+    InputAudioNoiseReduction,
+    InputAudioTranscription,
+    SessionProperties,
+)
+from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def get_news(params: FunctionCallParams):
+    await params.result_callback(
+        {
+            "news": [
+                "Massive UFO currently hovering above New York City",
+                "Stock markets reach all-time highs",
+                "Living dinosaur species discovered in the Amazon rainforest",
+            ],
+        }
+    )
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+get_news_function = FunctionSchema(
+    name="get_news",
+    description="Get the current news.",
+    properties={},
+    required=[],
+)
+
+restaurant_function = FunctionSchema(
+    name="get_restaurant_recommendation",
+    description="Get a restaurant recommendation",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+    },
+    required=["location"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting bot")
+
+    llm = OpenAIRealtimeLLMService(
+        api_key=os.environ["OPENAI_API_KEY"],
+        settings=OpenAIRealtimeLLMService.Settings(
+            system_instruction="""You are a helpful and friendly AI.
+
+Act like a human, but remember that you aren't a human and that you can't do human
+things in the real world. Your voice and personality should be warm and engaging, with a lively and
+playful tone.
+
+If interacting in a non-English language, start by using the standard accent or dialect familiar to
+the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
+even if you're asked about them.
+
+You are participating in a voice conversation. Keep your responses concise, short, and to the point
+unless specifically asked to elaborate on a topic.
+
+Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
+            session_properties=SessionProperties(
+                audio=AudioConfiguration(
+                    input=AudioInput(
+                        transcription=InputAudioTranscription(),
+                        # Disable OpenAI's server-side turn detection — this
+                        # example drives turn boundaries locally via the
+                        # SileroVADAnalyzer wired into the user aggregator
+                        # below.
+                        turn_detection=False,
+                        noise_reduction=InputAudioNoiseReduction(type="near_field"),
+                    )
+                ),
+            ),
+        ),
+    )
+
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+    llm.register_function("get_news", get_news)
+
+    context = LLMContext(
+        [{"role": "developer", "content": "Say hello!"}],
+        tools,
+    )
+
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        # Drive turn detection locally via SileroVAD wired into the user
+        # aggregator. realtime_service_mode keeps context-write semantics
+        # correct and (by default) drops the transcript wait on turn-end so
+        # local VAD can drive turn boundaries on the latency critical path.
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[TranscriptionLogObserver()],
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        await task.queue_frames([LLMRunFrame()])
+
+        await asyncio.sleep(15)
+        new_tools = ToolsSchema(
+            standard_tools=[weather_function, restaurant_function, get_news_function]
+        )
+        await task.queue_frames([LLMSetToolsFrame(tools=new_tools)])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}user: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}assistant: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-openai-text.py
+++ b/examples/realtime/realtime-openai-text.py
@@ -13,7 +13,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -21,7 +20,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -177,7 +176,7 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-openai.py
+++ b/examples/realtime/realtime-openai.py
@@ -14,7 +14,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame
 from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
 from pipecat.pipeline.pipeline import Pipeline
@@ -24,7 +23,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -187,7 +186,13 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        # OpenAI Realtime drives the conversation server-side and emits its
+        # own UserStarted/StoppedSpeakingFrame from server VAD events, so
+        # local VAD on the aggregator is unnecessary. realtime_service_mode
+        # decouples context writes from turn frames and from the transcript
+        # wait at turn-end. See `realtime-openai-locally-driven-turns.py` for
+        # the variant that disables server VAD and drives turn detection locally.
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -232,20 +237,38 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
        #     [LLMUpdateSettingsFrame(settings=SessionProperties(tools=new_tools).model_dump())]
        # )

+        # Reasoning effort can be changed at runtime too. Only
+        # reasoning-capable Realtime models (e.g. gpt-realtime-2) support this.
+        # await task.queue_frames(
+        #     [
+        #         LLMUpdateSettingsFrame(
+        #             delta=OpenAIRealtimeLLMService.Settings(
+        #                 session_properties=SessionProperties(
+        #                     reasoning=Reasoning(effort="xhigh"),
+        #                 ),
+        #             )
+        #         )
+        #     ]
+        # )
+
    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info(f"Client disconnected")
        await task.cancel()

-    # Log transcript updates
-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Log transcript updates. In realtime mode the turn-stopped events
+    # fire before the message text is finalized (UserTurnStoppedMessage
+    # content is None), so subscribe to the *_message_added events
+    # instead — they fire when the message is written to context and
+    # carry the finalized content.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
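Note on the handler change above: the comment in this hunk says that in realtime mode the turn-stopped events fire before the message text is finalized, while the *_message_added events fire once the message is written to context. The ordering can be illustrated with a toy model. This is only a sketch under assumptions: `ToyAggregator`, `Message`, and the event names `turn_stopped` / `message_added` are hypothetical stand-ins and do not reflect the Pipecat aggregator's internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    role: str
    content: Optional[str]


class ToyAggregator:
    """Hypothetical stand-in for illustration; not the Pipecat aggregator."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Message], None]]] = {}
        self.context: list[Message] = []

    def on(self, event: str, handler: Callable[[Message], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def _emit(self, event: str, message: Message) -> None:
        for handler in self._handlers.get(event, []):
            handler(message)

    def turn_stopped(self) -> None:
        # The turn boundary arrives first; the transcript is not known yet.
        self._emit("turn_stopped", Message("user", None))

    def transcript_finalized(self, text: str) -> None:
        # The message is written to context, then subscribers are notified.
        message = Message("user", text)
        self.context.append(message)
        self._emit("message_added", message)


log = []
agg = ToyAggregator()
agg.on("turn_stopped", lambda m: log.append(("turn_stopped", m.content)))
agg.on("message_added", lambda m: log.append(("message_added", m.content)))

agg.turn_stopped()
agg.transcript_finalized("What's the weather?")
```

In this toy model a transcript logger attached to `turn_stopped` only ever sees `None`, which is the reason the example switches its logging handlers to the message-added events.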
--- a/examples/realtime/realtime-ultravox-async-tool.py
+++ b/examples/realtime/realtime-ultravox-async-tool.py
@@ -0,0 +1,178 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example: async function call with the Ultravox Realtime LLM service.
+
+The ``get_current_weather`` tool is registered with
+``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
+
+Ultravox's API freezes the conversation between ``client_tool_invocation``
+and the matching ``client_tool_result``, so the service ships a placeholder
+``client_tool_result`` immediately when an async-registered function is
+invoked (to unfreeze the conversation). When the real tool finishes, the
+actual result is injected as user-side text so the model picks it up.
+"""
+
+import asyncio
+import datetime
+import os
+import random
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.ultravox.llm import OneShotInputParams, UltravoxRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    # Simulate a long-running API call so we can demonstrate that the
+    # conversation continues while the tool is in flight.
+    await asyncio.sleep(10)
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "location": params.arguments["location"],
+            "format": params.arguments["format"],
+            "timestamp": datetime.datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+
+system_prompt = (
+    "You are a friendly assistant. The user and you will engage in a spoken "
+    "dialog exchanging the transcripts of a natural real-time conversation. "
+    "Keep your responses short, generally two or three sentences for chatty "
+    "scenarios. When the user asks for the weather, call get_current_weather. "
+    "While you wait for the result, keep chatting with the user. When the "
+    "result arrives, share it with the user naturally."
+)
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    llm = UltravoxRealtimeLLMService(
+        params=OneShotInputParams(
+            api_key=os.environ["ULTRAVOX_API_KEY"],
+            system_prompt=system_prompt,
+            temperature=0.3,
+            max_duration=datetime.timedelta(minutes=3),
+        ),
+        one_shot_selected_tools=ToolsSchema(standard_tools=[weather_function]),
+    )
+
+    llm.register_function(
+        "get_current_weather",
+        fetch_weather_from_api,
+        cancel_on_interruption=False,
+    )
+
+    context = LLMContext([])
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
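Note on the async tool flow described in the docstring of this file: a placeholder result is sent immediately to unfreeze the conversation, and the real result is injected later as user-side text. The sketch below shows that pattern with plain asyncio. Everything in it is an assumption made for illustration: `FakeSession`, `send_tool_result`, `send_user_text`, and `handle_async_tool` are hypothetical names, not the Ultravox API or the Pipecat service implementation.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a realtime session; not the Ultravox API."""

    def __init__(self):
        self.sent = []  # messages "sent" to the model, in order

    async def send_tool_result(self, invocation_id, result):
        self.sent.append(("tool_result", invocation_id, result))

    async def send_user_text(self, text):
        self.sent.append(("user_text", text))


async def handle_async_tool(session, invocation_id, tool, *args):
    # Reply right away so the conversation is not frozen while the tool runs.
    await session.send_tool_result(invocation_id, "in progress")

    async def finish():
        result = await tool(*args)
        # Deliver the real result later, as user-side text.
        await session.send_user_text(f"Tool result for {invocation_id}: {result}")

    return asyncio.create_task(finish())


async def slow_weather(location):
    await asyncio.sleep(0.05)  # stands in for the 10s API call in the example
    return {"location": location, "conditions": "nice"}


async def demo():
    session = FakeSession()
    task = await handle_async_tool(session, "call-1", slow_weather, "Paris")
    placeholder_only = list(session.sent)  # snapshot before the tool finishes
    await task
    return placeholder_only, session.sent
```

Running `asyncio.run(demo())` returns the snapshot taken right after the placeholder was sent, plus the full message list once the background task has delivered the real result.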
--- a/examples/realtime/realtime-ultravox-text.py
+++ b/examples/realtime/realtime-ultravox-text.py
@@ -12,8 +12,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -21,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -32,8 +30,6 @@ from pipecat.services.ultravox.llm import OneShotInputParams, UltravoxRealtimeLL
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
-from pipecat.turns.user_turn_strategies import UserTurnStrategies

 # Load environment variables
 load_dotenv(override=True)
@@ -188,17 +184,9 @@ There is also a secret menu that changes daily. If the user asks about it, use t

    context = LLMContext([])

-    # Necessary to complete the function call lifecycle in Pipecat and
-    # to produce user and assistant turn stopped events.
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(
-            user_turn_strategies=UserTurnStrategies(
-                stop=[SpeechTimeoutUserTurnStopStrategy()],
-            ),
-            # Set the VAD analyzer to emulate timing of the model.
-            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
-        ),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    # Build the pipeline
@@ -234,14 +222,16 @@ There is also a secret menu that changes daily. If the user asks about it, use t
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Ultravox doesn't emit user-turn frames; subscribe to the
+    # *_message_added events for the finalized message text.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-ultravox.py
+++ b/examples/realtime/realtime-ultravox.py
@@ -12,7 +12,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -30,8 +29,6 @@ from pipecat.services.ultravox.llm import OneShotInputParams, UltravoxRealtimeLL
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
-from pipecat.turns.user_turn_strategies import UserTurnStrategies

 # Load environment variables
 load_dotenv(override=True)
@@ -178,18 +175,29 @@ There is also a secret menu that changes daily. If the user asks about it, use t

    context = LLMContext([])

-    # Necessary to complete the function call lifecycle in Pipecat and
-    # to produce user and assistant turn stopped events.
+    # Ultravox drives the conversation server-side and does not emit
+    # UserStartedSpeakingFrame / UserStoppedSpeakingFrame. Context
+    # aggregation still works with realtime_service_mode, but pipeline
+    # processors that depend on those frames (RTVI client speech events,
+    # TurnTrackingObserver, AudioBufferProcessor turn recording,
+    # UserIdleController, user mute strategies, voicemail detector) won't
+    # activate. The Pipecat Prebuilt UI is one such consumer — without
+    # these frames it can't group user transcripts into discrete turns
+    # visually.
+    #
+    # If you need those frames, uncomment the SileroVADAnalyzer import
+    # above and the `user_params=` argument below. Note: local turn
+    # detection may not match Ultravox's actual server-side turn
+    # decisions and can desynchronize in subtle ways.
+    #
+    # from pipecat.audio.vad.silero import SileroVADAnalyzer
+    # from pipecat.processors.aggregators.llm_response_universal import (
+    #     LLMUserAggregatorParams,
+    # )
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(
-            user_turn_strategies=UserTurnStrategies(
-                stop=[SpeechTimeoutUserTurnStopStrategy()],
-            ),
-            # Set the VAD analyzer to create reliable TTFB measurements and
-            # user stop events.
-            vad_analyzer=SileroVADAnalyzer(),
-        ),
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        # user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    # Build the pipeline
@@ -224,14 +232,18 @@ There is also a secret menu that changes daily. If the user asks about it, use t
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Ultravox doesn't emit user-turn frames, so on_user_turn_stopped
+    # would never fire. The *_message_added events fire when messages are
+    # written to context and carry the finalized content; use those for
+    # transcript logging.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/transcription/transcription-gradium.py
+++ b/examples/transcription/transcription-gradium.py
@@ -51,7 +51,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = GradiumSTTService(
        api_key=os.environ["GRADIUM_API_KEY"],
-        api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
        settings=GradiumSTTService.Settings(
            language=Language.EN,
            delay_in_frames=8,
--- a/examples/transcription/transcription-openai.py
+++ b/examples/transcription/transcription-openai.py
@@ -49,13 +49,7 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = OpenAIRealtimeSTTService(
-        api_key=os.environ["OPENAI_API_KEY"],
-        settings=OpenAIRealtimeSTTService.Settings(
-            model="gpt-4o-transcribe",
-            prompt="Expect words related to dogs, such as breed names.",
-        ),
-    )
+    stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])

    tl = TranscriptionLogger()
    vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())
--- a/examples/transports/transports-vonage.py
+++ b/examples/transports/transports-vonage.py
@@ -0,0 +1,134 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example of using OpenAI Realtime voice LLM service with Vonage Video Connector transport."""
+
+import asyncio
+import os
+import sys
+from collections.abc import Callable
+from typing import Any
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.vonage import configure
+from pipecat.services.openai.realtime.events import (
+    AudioConfiguration,
+    AudioInput,
+    InputAudioNoiseReduction,
+    InputAudioTranscription,
+    SemanticTurnDetection,
+    SessionProperties,
+)
+from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
+from pipecat.transports.vonage.video_connector import (
+    VonageVideoConnectorTransport,
+    VonageVideoConnectorTransportParams,
+)
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main() -> None:
+    """Main entry point for the OpenAI Realtime Vonage Video Connector example."""
+    (application_id, session_id, token) = await configure()
+
+    transport = VonageVideoConnectorTransport(
+        application_id,
+        session_id,
+        token,
+        VonageVideoConnectorTransportParams(
+            audio_in_enabled=True,
+            audio_out_enabled=True,
+            publisher_name="Bot",
+        ),
+    )
+
+    llm = OpenAIRealtimeLLMService(
+        api_key=os.environ["OPENAI_API_KEY"],
+        settings=OpenAIRealtimeLLMService.Settings(
+            system_instruction="""You are a helpful and friendly AI.
+
+Act like a human, but remember that you aren't a human and that you can't do human
+things in the real world. Your voice and personality should be warm and engaging, with a lively and
+playful tone.
+
+If interacting in a non-English language, start by using the standard accent or dialect familiar to
+the user. Talk quickly.
+
+You are participating in a voice conversation. Keep your responses concise, short, and to the point
+unless specifically asked to elaborate on a topic.
+
+Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
+            session_properties=SessionProperties(
+                audio=AudioConfiguration(
+                    input=AudioInput(
+                        transcription=InputAudioTranscription(),
+                        turn_detection=SemanticTurnDetection(),
+                        noise_reduction=InputAudioNoiseReduction(type="near_field"),
+                    )
+                ),
+            ),
+        ),
+    )
+
+    context = LLMContext(
+        [{"role": "developer", "content": "Say hello!"}],
+    )
+
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        observers=[TranscriptionLogObserver()],
+    )
+
+    event_handler: Callable[[str], Callable[[Any], Any]] = transport.event_handler
+
+    @event_handler("on_client_connected")
+    async def on_client_connected(transport: VonageVideoConnectorTransport, client: object) -> None:
+        logger.info("Client connected")
+        await task.queue_frames([LLMRunFrame()])
+
+    runner = PipelineRunner()
+
+    await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/turn-management/turn-management-filter-incomplete-turns-function-calling.py
+++ b/examples/turn-management/turn-management-filter-incomplete-turns-function-calling.py
@@ -0,0 +1,201 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example 22: Filter Incomplete Turns
+
+Demonstrates LLM-based turn completion detection to suppress bot responses when
+the user was cut off mid-thought. The LLM outputs one of three markers:
+- ✓ (complete): User finished their thought, respond normally
+- ○ (incomplete short): User was cut off, wait ~5s then prompt
+- ◐ (incomplete long): User needs time to think, wait ~10s then prompt
+
+When incomplete is detected, the bot's response is suppressed. After the timeout
+expires, the LLM is automatically prompted to re-engage the user.
+"""
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    AssistantTurnStoppedMessage,
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+    UserTurnStoppedMessage,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_turn_strategies import FilterIncompleteUserTurnStrategies
+
+load_dotenv(override=True)
+
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def get_weather(params: FunctionCallParams, location: str):
+    """Return the current weather for a location.
+
+    A stub that always reports the same conditions — replace with a real
+    weather API in production.
+
+    Args:
+        location (str): The city and state or country, e.g. "Paris, France".
+    """
+    await params.result_callback(
+        {
+            "location": location,
+            "temperature_celsius": 22,
+            "conditions": "partly cloudy",
+        }
+    )
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
+
+    llm = OpenAILLMService(
+        api_key=os.environ["OPENAI_API_KEY"],
+        settings=OpenAILLMService.Settings(
+            system_instruction=(
+                "You are a helpful assistant in a voice conversation. Your "
+                "responses will be spoken aloud, so avoid emojis, bullet "
+                "points, or other formatting that can't be spoken. Respond to "
+                "what the user said in a creative, helpful, and brief way. "
+                "If the user asks about the weather, call the get_weather "
+                "tool and speak the result back naturally."
+            ),
+        ),
+    )
+    llm.register_direct_function(get_weather)
+
+    tts = CartesiaTTSService(
+        api_key=os.environ["CARTESIA_API_KEY"],
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    context = LLMContext(tools=ToolsSchema(standard_tools=[get_weather]))
+    # `FilterIncompleteUserTurnStrategies` pairs the default detector
+    # chain with `LLMTurnCompletionUserTurnStopStrategy`: detectors
+    # trigger LLM inference but the public `on_user_turn_stopped` event
+    # fires only when the LLM confirms ✓. The LLM marks each response
+    # with one of:
+    # ✓ = complete (respond normally)
+    # ○ = incomplete short (wait 5s, then prompt)
+    # ◐ = incomplete long (wait 10s, then prompt)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(
+            vad_analyzer=SileroVADAnalyzer(),
+            user_turn_strategies=FilterIncompleteUserTurnStrategies(
+                # Optional: customize turn completion behavior
+                # config=UserTurnCompletionConfig(
+                #     incomplete_short_timeout=5.0,
+                #     incomplete_long_timeout=10.0,
+                #     incomplete_short_prompt="Custom prompt...",
+                #     incomplete_long_prompt="Custom prompt...",
+                #     instructions="Custom turn completion instructions...",
+                # ),
+            ),
+        ),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,
+            user_aggregator,  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            assistant_aggregator,  # Assistant spoken responses
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    @user_aggregator.event_handler("on_user_turn_stopped")
+    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}user: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
+    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}assistant: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/turn-management/turn-management-filter-incomplete-turns.py
+++ b/examples/turn-management/turn-management-filter-incomplete-turns.py
@@ -10,7 +10,7 @@ Demonstrates LLM-based turn completion detection to suppress bot responses when
 the user was cut off mid-thought. The LLM outputs one of three markers:
 - ✓ (complete): User finished their thought, respond normally
 - ○ (incomplete short): User was cut off, wait ~5s then prompt
- ◐ (incomplete long): User needs time to think, wait ~15s then prompt
+- ◐ (incomplete long): User needs time to think, wait ~10s then prompt

 When incomplete is detected, the bot's response is suppressed. After the timeout
 expires, the LLM is automatically prompted to re-engage the user.
@@ -41,6 +41,7 @@ from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_turn_strategies import FilterIncompleteUserTurnStrategies

 load_dotenv(override=True)

@@ -83,23 +84,28 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    )

    context = LLMContext()
+    # `FilterIncompleteUserTurnStrategies` pairs the default detector
+    # chain with `LLMTurnCompletionUserTurnStopStrategy`: detectors
+    # trigger LLM inference but the public `on_user_turn_stopped` event
+    # fires only when the LLM confirms ✓. The LLM marks each response
+    # with one of:
+    # ✓ = complete (respond normally)
+    # ○ = incomplete short (wait 5s, then prompt)
+    # ◐ = incomplete long (wait 10s, then prompt)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
-            # Enable turn completion filtering - the LLM will output:
-            # ✓ = complete (respond normally)
-            # ○ = incomplete short (wait 5s, then prompt)
-            # ◐ = incomplete long (wait 15s, then prompt)
-            filter_incomplete_user_turns=True,
-            # Optional: customize turn completion behavior
-            # turn_completion_config=TurnCompletionConfig(
-            #     incomplete_short_timeout=5.0,
-            #     incomplete_long_timeout=15.0,
-            #     incomplete_short_prompt="Custom prompt...",
-            #     incomplete_long_prompt="Custom prompt...",
-            #     instructions="Custom turn completion instructions...",
-            # ),
+            user_turn_strategies=FilterIncompleteUserTurnStrategies(
+                # Optional: customize turn completion behavior
+                # config=UserTurnCompletionConfig(
+                #     incomplete_short_timeout=5.0,
+                #     incomplete_long_timeout=10.0,
+                #     incomplete_short_prompt="Custom prompt...",
+                #     incomplete_long_prompt="Custom prompt...",
+                #     instructions="Custom turn completion instructions...",
+                # ),
+            ),
        ),
    )

--- a/examples/update-settings/llm/llm-aws-nova-sonic.py
+++ b/examples/update-settings/llm/llm-aws-nova-sonic.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -18,7 +17,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -60,7 +59,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/update-settings/llm/llm-azure-realtime.py
+++ b/examples/update-settings/llm/llm-azure-realtime.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -66,7 +65,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -88,8 +87,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/llm/llm-gemini-live-vertex.py
+++ b/examples/update-settings/llm/llm-gemini-live-vertex.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -18,7 +17,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -60,7 +59,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/update-settings/llm/llm-gemini-live.py
+++ b/examples/update-settings/llm/llm-gemini-live.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -18,7 +17,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -58,7 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/update-settings/llm/llm-grok-realtime.py
+++ b/examples/update-settings/llm/llm-grok-realtime.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -63,7 +62,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -85,8 +84,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/llm/llm-openai-realtime.py
+++ b/examples/update-settings/llm/llm-openai-realtime.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -63,7 +62,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -85,8 +84,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/llm/llm-ultravox-realtime.py
+++ b/examples/update-settings/llm/llm-ultravox-realtime.py
@@ -13,7 +13,6 @@ from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -22,7 +21,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -74,7 +73,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -96,8 +95,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/stt/stt-gradium.py
+++ b/examples/update-settings/stt/stt-gradium.py
@@ -50,10 +50,7 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = GradiumSTTService(
-        api_key=os.environ["GRADIUM_API_KEY"],
-        api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
-    )
+    stt = GradiumSTTService(api_key=os.environ["GRADIUM_API_KEY"])

    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
--- a/examples/update-settings/tts/tts-gradium.py
+++ b/examples/update-settings/tts/tts-gradium.py
@@ -55,7 +55,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    tts = GradiumTTSService(
        api_key=os.environ["GRADIUM_API_KEY"],
        settings=GradiumTTSService.Settings(voice="YTpq7expH9539ERJ"),
-        url="wss://us.api.gradium.ai/api/speech/tts",
    )

    llm = OpenAILLMService(
--- a/examples/voice/voice-gradium.py
+++ b/examples/voice/voice-gradium.py
@@ -54,7 +54,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = GradiumSTTService(
        api_key=os.environ["GRADIUM_API_KEY"],
-        api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
        settings=GradiumSTTService.Settings(
            language=Language.EN,
        ),
@@ -62,7 +61,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = GradiumTTSService(
        api_key=os.environ["GRADIUM_API_KEY"],
-        url="wss://us.api.gradium.ai/api/speech/tts",
        settings=GradiumTTSService.Settings(
            voice="YTpq7expH9539ERJ",
        ),
--- a/examples/voice/voice-nvidia-sagemaker.py
+++ b/examples/voice/voice-nvidia-sagemaker.py
@@ -0,0 +1,129 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+# For a full example of how to deploy to SageMaker, see:
+# https://github.com/pipecat-ai/pipecat-examples/tree/main/nvidia_sagemaker_example/deployment/aws-sagemaker-nvidia
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.nvidia.llm import NvidiaLLMService
+from pipecat.services.nvidia.sagemaker.stt import NvidiaSageMakerSTTService
+from pipecat.services.nvidia.sagemaker.tts import NvidiaSageMakerTTSService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    stt = NvidiaSageMakerSTTService(
+        endpoint_name=os.environ["SAGEMAKER_ASR_ENDPOINT_NAME"],
+        region=os.getenv("AWS_REGION", "us-west-2"),
+    )
+
+    llm = NvidiaLLMService(
+        api_key=os.environ["NVIDIA_API_KEY"],
+        settings=NvidiaLLMService.Settings(
+            model="meta/llama-3.3-70b-instruct",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )
+
+    tts = NvidiaSageMakerTTSService(
+        endpoint_name=os.environ["SAGEMAKER_MAGPIE_ENDPOINT_NAME"],
+        region=os.getenv("AWS_REGION", "us-west-2"),
+    )
+
+    context = LLMContext()
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            user_aggregator,  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            assistant_aggregator,  # Assistant spoken responses
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/voice/voice-openai.py
+++ b/examples/voice/voice-openai.py
@@ -25,7 +25,6 @@ from pipecat.runner.utils import create_transport
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.openai.stt import OpenAIRealtimeSTTService
 from pipecat.services.openai.tts import OpenAITTSService
-from pipecat.transcriptions.language import Language
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -53,14 +52,7 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = OpenAIRealtimeSTTService(
-        api_key=os.environ["OPENAI_API_KEY"],
-        settings=OpenAIRealtimeSTTService.Settings(
-            model="gpt-4o-transcribe",
-            prompt="Expect words related to dogs, such as breed names.",
-            language=Language.EN,
-        ),
-    )
+    stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])

    tts = OpenAITTSService(
        api_key=os.environ["OPENAI_API_KEY"],
@@ -72,7 +64,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAILLMService.Settings(
-            system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
        ),
    )

--- a/examples/voice/voice-soniox.py
+++ b/examples/voice/voice-soniox.py
@@ -58,6 +58,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            # Add strict mode to enforce the language hints
            language_hints=[Language.EN],
            language_hints_strict=True,
+            enable_language_identification=True,
        ),
    )

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -103,7 +103,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
 qwen = []
 resembleai = [ "pipecat-ai[websockets-base]" ]
 rime = [ "pipecat-ai[websockets-base]" ]
-runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-small-webrtc-prebuilt>=2.5.0"]
+runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.0"]
 sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
 sambanova = []
 sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]
@@ -119,6 +119,7 @@ tavus = [ "pipecat-ai[daily]" ]
 together = []
 tracing = [ "opentelemetry-sdk>=1.33.0,<2", "opentelemetry-api>=1.33.0,<2", "opentelemetry-instrumentation>=0.54b0,<1" ]
 ultravox = [ "pipecat-ai[websockets-base]" ]
+vonage-video-connector = [ "vonage-video-connector~=0.2.3b0; python_full_version>='3.13' and python_full_version<'3.14' and platform_system=='Linux'" ]
 webrtc = [ "aiortc>=1.14.0,<2", "opencv-python>=4.11.0.86,<5" ]
 websocket = [ "pipecat-ai[websockets-base]", "fastapi>=0.115.6,<1" ]
 websockets-base = [ "websockets>=13.1,<16.0" ]
--- a/pyrightconfig.json
+++ b/pyrightconfig.json
@@ -6,116 +6,54 @@
  "exclude": ["**/*_pb2.py", "**/__pycache__"],
  "ignore": [
    "tests",
-    "src/pipecat/adapters/services/anthropic_adapter.py",
-    "src/pipecat/adapters/services/aws_nova_sonic_adapter.py",
-    "src/pipecat/adapters/services/bedrock_adapter.py",
-    "src/pipecat/adapters/services/gemini_adapter.py",
-    "src/pipecat/adapters/services/grok_realtime_adapter.py",
-    "src/pipecat/adapters/services/inworld_realtime_adapter.py",
-    "src/pipecat/adapters/services/open_ai_adapter.py",
-    "src/pipecat/adapters/services/open_ai_realtime_adapter.py",
-    "src/pipecat/adapters/services/open_ai_responses_adapter.py",
-    "src/pipecat/adapters/services/perplexity_adapter.py",
-    "src/pipecat/audio/dtmf/utils.py",
    "src/pipecat/audio/filters/aic_filter.py",
    "src/pipecat/audio/filters/krisp_viva_filter.py",
-    "src/pipecat/audio/filters/rnnoise_filter.py",
    "src/pipecat/audio/turn/smart_turn/local_smart_turn_v2.py",
    "src/pipecat/audio/turn/smart_turn/local_smart_turn_v3.py",
    "src/pipecat/audio/vad/silero.py",
-    "src/pipecat/processors/aggregators/llm_context.py",
    "src/pipecat/processors/aggregators/llm_response_universal.py",
    "src/pipecat/processors/frame_processor.py",
-    "src/pipecat/processors/frameworks/langchain.py",
    "src/pipecat/processors/frameworks/rtvi/observer.py",
-    "src/pipecat/processors/frameworks/rtvi/processor.py",
-    "src/pipecat/processors/frameworks/strands_agents.py",
    "src/pipecat/services/anthropic/llm.py",
-    "src/pipecat/services/assemblyai/stt.py",
-    "src/pipecat/services/aws/agent_core.py",
    "src/pipecat/services/aws/llm.py",
    "src/pipecat/services/aws/nova_sonic/llm.py",
    "src/pipecat/services/aws/sagemaker/bidi_client.py",
-    "src/pipecat/services/aws/stt.py",
-    "src/pipecat/services/aws/tts.py",
-    "src/pipecat/services/aws/utils.py",
-    "src/pipecat/services/azure/stt.py",
    "src/pipecat/services/azure/tts.py",
-    "src/pipecat/services/cartesia/stt.py",
-    "src/pipecat/services/cartesia/tts.py",
-    "src/pipecat/services/deepgram/flux/base.py",
    "src/pipecat/services/deepgram/flux/sagemaker/stt.py",
-    "src/pipecat/services/deepgram/flux/stt.py",
    "src/pipecat/services/deepgram/sagemaker/stt.py",
    "src/pipecat/services/deepgram/sagemaker/tts.py",
-    "src/pipecat/services/deepgram/tts.py",
-    "src/pipecat/services/elevenlabs/stt.py",
-    "src/pipecat/services/elevenlabs/tts.py",
-    "src/pipecat/services/fish/tts.py",
-    "src/pipecat/services/gladia/stt.py",
    "src/pipecat/services/google/gemini_live/llm.py",
-    "src/pipecat/services/google/gemini_live/vertex/llm.py",
-    "src/pipecat/services/google/image.py",
    "src/pipecat/services/google/llm.py",
    "src/pipecat/services/google/stt.py",
    "src/pipecat/services/google/tts.py",
-    "src/pipecat/services/gradium/stt.py",
-    "src/pipecat/services/groq/tts.py",
-    "src/pipecat/services/heygen/api_interactive_avatar.py",
-    "src/pipecat/services/heygen/base_api.py",
    "src/pipecat/services/heygen/client.py",
    "src/pipecat/services/heygen/video.py",
-    "src/pipecat/services/hume/tts.py",
    "src/pipecat/services/inworld/realtime/llm.py",
-    "src/pipecat/services/inworld/tts.py",
-    "src/pipecat/services/kokoro/tts.py",
    "src/pipecat/services/llm_service.py",
-    "src/pipecat/services/lmnt/tts.py",
    "src/pipecat/services/mem0/memory.py",
-    "src/pipecat/services/mistral/stt.py",
    "src/pipecat/services/mistral/tts.py",
-    "src/pipecat/services/moondream/vision.py",
-    "src/pipecat/services/neuphonic/tts.py",
    "src/pipecat/services/nvidia/stt.py",
    "src/pipecat/services/nvidia/tts.py",
    "src/pipecat/services/openai/base_llm.py",
-    "src/pipecat/services/openai/image.py",
-    "src/pipecat/services/openai/llm.py",
    "src/pipecat/services/openai/realtime/llm.py",
-    "src/pipecat/services/openai/responses/llm.py",
-    "src/pipecat/services/openai/stt.py",
-    "src/pipecat/services/openai/tts.py",
-    "src/pipecat/services/openrouter/llm.py",
-    "src/pipecat/services/piper/tts.py",
-    "src/pipecat/services/resembleai/tts.py",
    "src/pipecat/services/rime/tts.py",
    "src/pipecat/services/sambanova/llm.py",
    "src/pipecat/services/sarvam/stt.py",
-    "src/pipecat/services/sarvam/tts.py",
    "src/pipecat/services/simli/video.py",
-    "src/pipecat/services/smallest/tts.py",
-    "src/pipecat/services/soniox/stt.py",
    "src/pipecat/services/speechmatics/stt.py",
    "src/pipecat/services/stt_service.py",
    "src/pipecat/services/tavus/video.py",
    "src/pipecat/services/tts_service.py",
    "src/pipecat/services/ultravox/llm.py",
-    "src/pipecat/services/websocket_service.py",
    "src/pipecat/services/whisper/stt.py",
    "src/pipecat/services/xai/realtime/llm.py",
-    "src/pipecat/services/xtts/tts.py",
-    "src/pipecat/transports/base_output.py",
    "src/pipecat/transports/daily/transport.py",
-    "src/pipecat/transports/heygen/transport.py",
    "src/pipecat/transports/lemonslice/transport.py",
    "src/pipecat/transports/livekit/transport.py",
    "src/pipecat/transports/smallwebrtc/connection.py",
-    "src/pipecat/transports/smallwebrtc/request_handler.py",
    "src/pipecat/transports/smallwebrtc/transport.py",
    "src/pipecat/transports/tavus/transport.py",
-    "src/pipecat/transports/websocket/client.py",
-    "src/pipecat/transports/websocket/server.py",
-    "src/pipecat/transports/whatsapp/client.py"
+    "src/pipecat/transports/websocket/server.py"
  ],
  "reportMissingImports": false
 }
				`@@ -0,0 +1 @@`
				- Fixed `InworldRealtimeLLMService` not supporting manual-mode turn detection (`session_properties.audio.input.turn_detection=None`). Previously `_handle_user_stopped_speaking` and `_handle_interruption` assumed Inworld's server-side VAD handled commit/cancel/response.create automatically and were no-ops on the client side. In manual mode the server doesn't, so local-VAD-driven turns stalled: the bot never responded after the user stopped speaking, and interruptions didn't cancel the in-flight response. Wire the explicit `InputAudioBufferCommitEvent` + `ResponseCreateEvent` on user-stopped-speaking and `InputAudioBufferClearEvent` + `ResponseCancelEvent` on interruption, gated on a new `_is_manual_turn_detection()` check (mirroring the pattern in `OpenAIRealtimeLLMService`).
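The gating described in this entry can be sketched roughly as follows. This is a minimal illustration, not the pipecat implementation: `SketchRealtimeService`, its `sent` list, and the `{"type": "server_vad"}` value are hypothetical stand-ins, and the event names are recorded as plain strings taken from the entry above rather than constructed as real event objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchRealtimeService:
    """Hypothetical stand-in for a realtime LLM service (not the pipecat class)."""

    # None models manual mode: the server performs no turn detection itself.
    turn_detection: Optional[dict] = None
    # Names of the client events this sketch would have sent, in order.
    sent: list = field(default_factory=list)

    def _is_manual_turn_detection(self) -> bool:
        return self.turn_detection is None

    def handle_user_stopped_speaking(self) -> None:
        # With server-side turn detection the server commits and responds
        # on its own, so the client stays silent.
        if not self._is_manual_turn_detection():
            return
        self.sent.append("InputAudioBufferCommitEvent")
        self.sent.append("ResponseCreateEvent")

    def handle_interruption(self) -> None:
        # Same gate: only manual mode needs the explicit clear and cancel.
        if not self._is_manual_turn_detection():
            return
        self.sent.append("InputAudioBufferClearEvent")
        self.sent.append("ResponseCancelEvent")


manual = SketchRealtimeService(turn_detection=None)
manual.handle_user_stopped_speaking()
manual.handle_interruption()

# Assumed shape for a server-driven configuration; any non-None value works here.
server_vad = SketchRealtimeService(turn_detection={"type": "server_vad"})
server_vad.handle_user_stopped_speaking()
server_vad.handle_interruption()
```

In this sketch the manual-mode instance records all four events while the server-driven instance records none, which is the asymmetry the fix relies on.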
				`@@ -0,0 +1 @@`
				- Fixed AWS Nova Sonic not surfacing server-side interruption. When the user interrupted the bot mid-response, the `INTERRUPTED` stop reason was acknowledged internally but no `InterruptionFrame` was emitted, so `BaseOutputTransport` kept draining its audio buffer and the bot kept talking past the interruption. Nova Sonic now broadcasts `InterruptionFrame` on both `INTERRUPTED` paths (text-stage and audio-stage). This was previously masked by enabling local VAD on the user aggregator, which generated `UserStartedSpeakingFrame` and triggered the aggregator-side interruption path; the fix makes the behavior correct without local VAD as a workaround.
				`@@ -0,0 +1 @@`
				- Migrated all realtime LLM service examples (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, Gemini Live Vertex, AWS Nova Sonic, Ultravox) — base examples, `persistent-context-`, `update-settings/llm/`, and the Gemini Live MCP example — to use `LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())`. Where examples previously wired `SileroVADAnalyzer` into `LLMUserAggregatorParams` as a workaround for missing turn frames, the local VAD has been removed; the realtime service mode + the Phase 1.5 interruption fixes for Nova Sonic and Ultravox make this safe. Transcript-logging event handlers have moved from `on_user_turn_stopped` / `on_assistant_turn_stopped` to the new `on_user_message_added` / `on_assistant_message_added` events, which carry the finalized message text. Examples for services without server-side user-turn frames (Gemini Live, AWS Nova Sonic, Ultravox) include a Tier 1 comment block explaining what doesn't activate without those frames and how to add local VAD if needed; the corresponding service docstrings have the same warning.
				`@@ -0,0 +1 @@`
				- Added `examples/realtime/realtime-grok-locally-driven-turns.py`, a variant of the base Grok Realtime example that disables Grok's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime locally-driven-turns variant. Server-emitted turn frames are preferred when available.
				`@@ -0,0 +1 @@`
				- Added `examples/realtime/realtime-inworld-locally-driven-turns.py`, a variant of the base Inworld Realtime example that disables Inworld's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime and Grok Realtime locally-driven-turns variants. Server-emitted turn frames are preferred when available.
				`@@ -0,0 +1 @@`
				- Added a startup INFO log on realtime LLM services that don't emit `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` (Gemini Live, AWS Nova Sonic, Ultravox). The log spells out which downstream processors depend on those frames (RTVI client speech events, `TurnTrackingObserver`, `AudioBufferProcessor` turn recording, `UserIdleController`, user mute strategies, voicemail detector) and how to opt into local VAD when needed.
				`@@ -0,0 +1 @@`
				- Added `examples/realtime/realtime-openai-locally-driven-turns.py`, a variant of the base OpenAI Realtime example that disables OpenAI's server-side turn detection (`turn_detection=False`) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Use this variant if you need a turn analyzer like `LocalSmartTurnV3` to decide when the user is done speaking, or if you need `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` to fire from the same source as `InterruptionFrame`. Server-emitted turn frames are preferred when available.
				`@@ -0,0 +1 @@`
				- Added `RealtimeServiceMetadataFrame`, broadcast at pipeline start by realtime LLM services (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox). The context aggregator pair listens for it and, when `realtime_service_mode` isn't configured, logs a one-time INFO recommendation pointing users at the option and the `on_user_turn_stopped` timing change it implies.
				`@@ -0,0 +1 @@`
				- Added `RealtimeServiceModeConfig` and a new `realtime_service_mode` kwarg on `LLMContextAggregatorPair`, opting the pair into realtime (speech-to-speech) LLM behavior. When set, user messages are written to context when the assistant response starts rather than on user-turn-end frames — so context stays correct even when the realtime service emits no turn frames at all — and, by default, turn-end strategies stop waiting for transcripts before signalling end-of-turn, keeping transcript latency off the critical path in local-VAD-driven realtime pipelines. Both behaviors are individually controllable via the `context_writes_await_turns` and `turns_await_transcripts` fields. Cascade (non-realtime) behavior is unchanged when the kwarg is omitted.
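To make the context-write timing in this entry concrete, here is a toy model of the two behaviors. It does not import or reproduce the library: `SketchModeConfig`, `SketchUserAggregator`, and the method names are hypothetical. The field name `context_writes_await_turns` is taken from the changelog, but its default value and exact semantics here (True meaning "wait for user-turn-end frames") are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchModeConfig:
    """Hypothetical stand-in for a realtime mode config."""

    # Assumed semantics: True means context writes wait for user-turn-end frames.
    context_writes_await_turns: bool = False


@dataclass
class SketchUserAggregator:
    """Toy aggregator; mode=None models cascade (non-realtime) behavior."""

    mode: Optional[SketchModeConfig] = None
    context: List[Dict[str, str]] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)

    def _flush(self) -> None:
        if self.pending:
            self.context.append({"role": "user", "content": " ".join(self.pending)})
            self.pending.clear()

    def on_transcript(self, text: str) -> None:
        self.pending.append(text)

    def on_user_turn_stopped(self) -> None:
        # Cascade mode, or realtime mode explicitly told to await turn frames.
        if self.mode is None or self.mode.context_writes_await_turns:
            self._flush()

    def on_assistant_response_started(self) -> None:
        # Realtime mode: write the user message when the bot starts answering.
        if self.mode is not None and not self.mode.context_writes_await_turns:
            self._flush()


# Realtime mode with a service that never emits turn frames.
realtime = SketchUserAggregator(mode=SketchModeConfig())
realtime.on_transcript("hello there")
realtime.on_assistant_response_started()

# Cascade mode: the write happens only on the user-turn-end frame.
cascade = SketchUserAggregator()
cascade.on_transcript("hello there")
cascade.on_assistant_response_started()
context_before_turn_end = list(cascade.context)
cascade.on_user_turn_stopped()
```

In the realtime case the user message reaches `context` even though `on_user_turn_stopped` is never called, which is the "context stays correct with no turn frames" property the entry describes.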
				`@@ -0,0 +1 @@`
				- Added `on_user_message_added` and `on_assistant_message_added` event handlers on `LLMUserAggregator` and `LLMAssistantAggregator`. Each fires when its respective message is flushed to context and carries the finalized content. In cascade mode they coincide with `on_user_turn_stopped` / `on_assistant_turn_stopped`; in realtime mode (where turn-stop fires before the message is finalized) they're the canonical way to subscribe to "context just updated, here's the text."