Rename *-local-vad.py example variants to *-locally-driven-turns.py

The "-local-vad" suffix was ambiguous now that local VAD has two meanings in the realtime context:

- supplementary user-turn frames broadcast alongside server-driven turns (the commented-out opt-in in the base examples), vs.
- local turn detection driving the conversation end-to-end with server-side turn detection disabled, which is what these variant files actually demonstrate.

The new "-locally-driven-turns" suffix matches the latter intent unambiguously.

Renames:

- realtime-openai-local-vad.py → realtime-openai-locally-driven-turns.py
- realtime-gemini-live-local-vad.py → realtime-gemini-live-locally-driven-turns.py
- realtime-grok-local-vad.py → realtime-grok-locally-driven-turns.py
- realtime-inworld-local-vad.py → realtime-inworld-locally-driven-turns.py

The matching changelog fragments are renamed too. Service docstrings and base examples that referenced the old filenames now point at the new ones.
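The renames amount to a fixed old-to-new filename table. As a minimal sketch, assuming the references in docstrings and base examples were updated by plain string substitution (the commit does not say how it was done; `RENAMES` and `rewrite_references` are hypothetical names, not part of Pipecat), the rewrite could look like this:

```python
import re

# Old -> new example filenames, as listed in the commit message.
RENAMES = {
    "realtime-openai-local-vad.py": "realtime-openai-locally-driven-turns.py",
    "realtime-gemini-live-local-vad.py": "realtime-gemini-live-locally-driven-turns.py",
    "realtime-grok-local-vad.py": "realtime-grok-locally-driven-turns.py",
    "realtime-inworld-local-vad.py": "realtime-inworld-locally-driven-turns.py",
}

# One alternation over the escaped old names. None of the old names is a
# substring of another, so the order of the alternatives does not matter.
_PATTERN = re.compile("|".join(re.escape(old) for old in RENAMES))


def rewrite_references(text: str) -> str:
    """Hypothetical helper: replace every old example filename with its new name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(0)], text)


if __name__ == "__main__":
    doc = "See examples/realtime/realtime-grok-local-vad.py for manual mode."
    print(rewrite_references(doc))
```

Because no new filename contains an old one, running the helper a second time leaves already-updated text unchanged.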
Show commented-out local-VAD opt-in in no-turn-frames examples
2026-05-21 15:26:27 -04:00 · 2026-05-21 15:13:52 -04:00 · 2026-05-21 14:14:13 -04:00 · 2026-05-21 13:00:34 -04:00 · 2026-05-21 12:37:04 -04:00 · 2026-05-21 12:19:24 -04:00
62 changed files with 2226 additions and 206 deletions
--- a/changelog/+inworld-manual-mode.fixed.md
+++ b/changelog/+inworld-manual-mode.fixed.md
@@ -0,0 +1 @@
+- Fixed `InworldRealtimeLLMService` not supporting manual-mode turn detection (`session_properties.audio.input.turn_detection=None`). Previously, `_handle_user_stopped_speaking` and `_handle_interruption` were no-ops on the client side, on the assumption that Inworld's server-side VAD handled commit/cancel/response.create automatically. In manual mode the server does none of this, so local-VAD-driven turns stalled: the bot never responded after the user stopped speaking, and interruptions didn't cancel the in-flight response. The service now sends `InputAudioBufferCommitEvent` + `ResponseCreateEvent` when the user stops speaking and `InputAudioBufferClearEvent` + `ResponseCancelEvent` on interruption, gated on a new `_is_manual_turn_detection()` check (mirroring the pattern in `OpenAIRealtimeLLMService`).
--- a/changelog/+nova-sonic-server-interruption.fixed.md
+++ b/changelog/+nova-sonic-server-interruption.fixed.md
@@ -0,0 +1 @@
+- Fixed AWS Nova Sonic not surfacing server-side interruption. When the user interrupted the bot mid-response, the `INTERRUPTED` stop reason was acknowledged internally but no `InterruptionFrame` was emitted, so `BaseOutputTransport` kept draining its audio buffer and the bot kept talking past the interruption. Nova Sonic now broadcasts `InterruptionFrame` on both `INTERRUPTED` paths (text-stage and audio-stage). The bug was previously masked in examples that enabled local VAD on the user aggregator, which generated `UserStartedSpeakingFrame` and triggered the aggregator-side interruption path; with this fix, interruption works correctly without local VAD as a workaround.
--- a/changelog/+realtime-examples-migrated.changed.md
+++ b/changelog/+realtime-examples-migrated.changed.md
@@ -0,0 +1 @@
+- Migrated all realtime LLM service examples (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, Gemini Live Vertex, AWS Nova Sonic, Ultravox), including the base examples, `persistent-context-*`, `update-settings/llm/*`, and the Gemini Live MCP example, to use `LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())`. Where examples previously wired `SileroVADAnalyzer` into `LLMUserAggregatorParams` as a workaround for missing turn frames, the local VAD has been removed; realtime service mode, together with the server-side interruption fixes for Nova Sonic and Ultravox, makes this safe. Transcript-logging event handlers have moved from `on_user_turn_stopped` / `on_assistant_turn_stopped` to the new `on_user_message_added` / `on_assistant_message_added` events, which carry the finalized message text. Examples for services without server-side user-turn frames (Gemini Live, AWS Nova Sonic, Ultravox) include a comment block explaining what doesn't activate without those frames and how to add local VAD if needed; the corresponding service docstrings carry the same warning.
--- a/changelog/+realtime-grok-locally-driven-turns-example.added.md
+++ b/changelog/+realtime-grok-locally-driven-turns-example.added.md
@@ -0,0 +1 @@
+- Added `examples/realtime/realtime-grok-locally-driven-turns.py`, a variant of the base Grok Realtime example that disables Grok's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime locally-driven-turns variant. Server-emitted turn frames are preferred when available.
--- a/changelog/+realtime-inworld-locally-driven-turns-example.added.md
+++ b/changelog/+realtime-inworld-locally-driven-turns-example.added.md
@@ -0,0 +1 @@
+- Added `examples/realtime/realtime-inworld-locally-driven-turns.py`, a variant of the base Inworld Realtime example that disables Inworld's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime and Grok Realtime locally-driven-turns variants. Server-emitted turn frames are preferred when available.
--- a/changelog/+realtime-no-user-turn-frames-log.added.md
+++ b/changelog/+realtime-no-user-turn-frames-log.added.md
@@ -0,0 +1 @@
+- Added a startup INFO log on realtime LLM services that don't emit `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` (Gemini Live, AWS Nova Sonic, Ultravox). The log spells out which downstream processors depend on those frames (RTVI client speech events, `TurnTrackingObserver`, `AudioBufferProcessor` turn recording, `UserIdleController`, user mute strategies, voicemail detector) and how to opt into local VAD when needed.
--- a/changelog/+realtime-openai-locally-driven-turns-example.added.md
+++ b/changelog/+realtime-openai-locally-driven-turns-example.added.md
@@ -0,0 +1 @@
+- Added `examples/realtime/realtime-openai-locally-driven-turns.py`, a variant of the base OpenAI Realtime example that disables OpenAI's server-side turn detection (`turn_detection=False`) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Use this variant if you need a turn analyzer like `LocalSmartTurnV3` to decide when the user is done speaking, or if you need `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` to fire from the same source as `InterruptionFrame`. Server-emitted turn frames are preferred when available.
--- a/changelog/+realtime-service-metadata-frame.added.md
+++ b/changelog/+realtime-service-metadata-frame.added.md
@@ -0,0 +1 @@
+- Added `RealtimeServiceMetadataFrame`, broadcast at pipeline start by realtime LLM services (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox). The context aggregator pair listens for it and, when `realtime_service_mode` isn't configured, logs a one-time INFO recommendation pointing users at the option and the `on_user_turn_stopped` timing change it implies.
--- a/changelog/+realtime-service-mode-config.added.md
+++ b/changelog/+realtime-service-mode-config.added.md
@@ -0,0 +1 @@
+- Added `RealtimeServiceModeConfig` and a new `realtime_service_mode` kwarg on `LLMContextAggregatorPair` that opts the pair into realtime (speech-to-speech) LLM behavior. When it is set, user messages are written to context when the assistant response starts rather than on user-turn-end frames, so context stays correct even when the realtime service emits no turn frames at all. By default, turn-end strategies also stop waiting for transcripts before signalling end-of-turn, which keeps transcript latency off the critical path in local-VAD-driven realtime pipelines. Both behaviors are individually controllable via the `context_writes_await_turns` and `turns_await_transcripts` fields. Cascade (non-realtime) behavior is unchanged when the kwarg is omitted.
--- a/changelog/+realtime-service-mode-events.added.md
+++ b/changelog/+realtime-service-mode-events.added.md
@@ -0,0 +1 @@
+- Added `on_user_message_added` and `on_assistant_message_added` event handlers on `LLMUserAggregator` and `LLMAssistantAggregator`. Each fires when its respective message is flushed to context and carries the finalized content. In cascade mode they coincide with `on_user_turn_stopped` / `on_assistant_turn_stopped`; in realtime mode (where turn-stop fires before the message is finalized) they're the canonical way to subscribe to "context just updated, here's the text."
--- a/changelog/+ultravox-server-interruption.fixed.md
+++ b/changelog/+ultravox-server-interruption.fixed.md
@@ -0,0 +1 @@
+- Fixed Ultravox Realtime not surfacing server-side interruption. The server sends a `playback_clear_buffer` message when the user interrupts the bot mid-speech, instructing clients to drop buffered output audio; this message was previously unhandled, so `BaseOutputTransport` kept playing the buffered audio and the bot kept talking past the interruption. Ultravox now broadcasts `InterruptionFrame` on `playback_clear_buffer`. The bug was previously masked in examples that enabled local VAD on the user aggregator, which generated `UserStartedSpeakingFrame` and triggered the aggregator-side interruption path; with this fix, interruption works correctly without local VAD as a workaround.
--- a/changelog/+user-turn-stopped-message-content-optional.changed.md
+++ b/changelog/+user-turn-stopped-message-content-optional.changed.md
@@ -0,0 +1 @@
+- `UserTurnStoppedMessage.content` is now typed `str | None`. In realtime mode (`RealtimeServiceModeConfig(context_writes_await_turns=False)`) the user message isn't finalized at turn-stop time, so `content` is `None`; subscribers wanting the finalized text should use the new `on_user_message_added` event. Cascade behavior is unchanged.
--- a/changelog/+wait-for-transcript-stop-strategies.changed.md
+++ b/changelog/+wait-for-transcript-stop-strategies.changed.md
@@ -0,0 +1 @@
+- `SpeechTimeoutUserTurnStopStrategy` and `TurnAnalyzerUserTurnStopStrategy` now accept a `wait_for_transcript: bool = True` kwarg. When set to `False`, the strategy signals end-of-turn as soon as VAD / the turn analyzer reports end-of-speech rather than waiting for a transcript — useful when local turn detection is the intended driver of a realtime conversation. `LLMContextAggregatorPair` flips this for you when `realtime_service_mode` is configured with the default `turns_await_transcripts=False`.
--- a/examples/mcp/mcp-streamable-http-gemini-live.py
+++ b/examples/mcp/mcp-streamable-http-gemini-live.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger
 from mcp.client.session_group import StreamableHttpParameters

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -19,7 +18,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -84,7 +83,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        context = LLMContext([{"role": "user", "content": "Please introduce yourself."}])
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
-            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+            realtime_service_mode=RealtimeServiceModeConfig(),
        )

        pipeline = Pipeline(
--- a/examples/persistent-context/persistent-context-aws-nova-sonic.py
+++ b/examples/persistent-context/persistent-context-aws-nova-sonic.py
@@ -15,7 +15,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -23,7 +22,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -241,7 +240,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/persistent-context/persistent-context-grok-realtime.py
+++ b/examples/persistent-context/persistent-context-grok-realtime.py
@@ -33,6 +33,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -203,7 +204,10 @@ Remember, your responses should be short - just one or two sentences usually."""
    llm.register_function("load_conversation", load_conversation)

    context = LLMContext([{"role": "developer", "content": "Say hello!"}], tools)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/persistent-context/persistent-context-openai-realtime.py
+++ b/examples/persistent-context/persistent-context-openai-realtime.py
@@ -15,7 +15,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -23,7 +22,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -217,7 +216,7 @@ Remember, your responses should be short. Just one or two sentences, usually."""
    context = LLMContext([{"role": "developer", "content": "Say hello!"}], tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-aws-nova-sonic-async-tool.py
+++ b/examples/realtime/realtime-aws-nova-sonic-async-tool.py
@@ -24,7 +24,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -32,7 +31,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -133,7 +132,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-aws-nova-sonic.py
+++ b/examples/realtime/realtime-aws-nova-sonic.py
@@ -15,7 +15,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -24,7 +23,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -148,10 +147,31 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm.register_function("get_current_weather", fetch_weather_from_api)

    # Set up context and context management.
+    #
+    # AWS Nova Sonic drives the conversation server-side and does not emit
+    # UserStartedSpeakingFrame / UserStoppedSpeakingFrame. Context
+    # aggregation still works with realtime_service_mode, but pipeline
+    # processors that depend on those frames (RTVI client speech events,
+    # TurnTrackingObserver, AudioBufferProcessor turn recording,
+    # UserIdleController, user mute strategies, voicemail detector) won't
+    # activate. The Pipecat Prebuilt UI is one such consumer — without
+    # these frames it can't group user transcripts into discrete turns
+    # visually.
+    #
+    # If you need those frames, uncomment the imports just below and the
+    # `user_params=` argument in the LLMContextAggregatorPair call. Note:
+    # local turn detection may not match Nova Sonic's actual server-side
+    # turn decisions and can desynchronize in subtle ways.
+    #
+    # from pipecat.audio.vad.silero import SileroVADAnalyzer
+    # from pipecat.processors.aggregators.llm_response_universal import (
+    #     LLMUserAggregatorParams,
+    # )
    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        # user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    # Build the pipeline
@@ -195,14 +215,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Nova Sonic doesn't emit user-turn frames, so on_user_turn_stopped
+    # would never fire. The *_message_added events fire when messages are
+    # written to context and carry the finalized content; use those for
+    # transcript logging.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-azure-async-tool.py
+++ b/examples/realtime/realtime-azure-async-tool.py
@@ -24,7 +24,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -32,7 +31,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -144,7 +143,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-azure.py
+++ b/examples/realtime/realtime-azure.py
@@ -13,7 +13,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -21,7 +20,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -174,7 +173,7 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-gemini-live-async-tool.py
+++ b/examples/realtime/realtime-gemini-live-async-tool.py
@@ -28,7 +28,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -125,7 +128,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    context = LLMContext()
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-files-api.py
+++ b/examples/realtime/realtime-gemini-live-files-api.py
@@ -15,7 +15,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -158,7 +161,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        )

    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    # Build the pipeline
    pipeline = Pipeline(
--- a/examples/realtime/realtime-gemini-live-google-search.py
+++ b/examples/realtime/realtime-gemini-live-google-search.py
@@ -15,7 +15,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -84,7 +87,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ],
    )
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-graceful-end.py
+++ b/examples/realtime/realtime-gemini-live-graceful-end.py
@@ -17,7 +17,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -148,7 +151,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        [{"role": "developer", "content": "Say hello."}],
    )
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-grounding-metadata.py
+++ b/examples/realtime/realtime-gemini-live-grounding-metadata.py
@@ -9,7 +9,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -115,7 +118,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Set up conversation context and management
    context = LLMContext()
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-locally-driven-turns.py
+++ b/examples/realtime/realtime-gemini-live-locally-driven-turns.py
@@ -4,6 +4,29 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+"""Gemini Live with locally-driven turn detection.
+
+By default Gemini Live drives the conversation with its own server-side VAD
+(see `realtime-gemini-live.py`). That setup doesn't surface
+``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``, so pipeline
+processors that depend on those frames (RTVI client speech events,
+``TurnTrackingObserver``, ``AudioBufferProcessor`` turn recording,
+``UserIdleController``, user mute strategies, voicemail detector) don't
+activate.
+
+This variant disables Gemini Live's server-side VAD
+(``GeminiVADParams(disabled=True)``) and instead drives turn boundaries
+locally with ``SileroVADAnalyzer`` wired into the user aggregator. Use this
+variant if you need those downstream processors, or if you want a turn
+analyzer like ``LocalSmartTurnV3`` to decide when the user is done speaking.
+
+Caveat: locally-generated turn boundaries are a heuristic and may not match
+the provider's actual server-side turn decisions, which is what really
+drives the conversation. The two can drift apart in subtle, hard-to-debug
+ways, especially around interruptions and overlapping speech. Prefer
+server-emitted turn frames (i.e. the base `realtime-gemini-live.py` example)
+unless you have a specific reason to drive turn detection locally.
+"""

 import os

@@ -20,6 +43,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -72,6 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    )
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
        ),
@@ -107,14 +132,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # In realtime mode the turn-stopped events fire before the message text
+    # is finalized, so log transcripts from the *_message_added events, which
+    # fire when messages are written to context and carry the finalized content.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-gemini-live-vertex.py
+++ b/examples/realtime/realtime-gemini-live-vertex.py
@@ -18,7 +18,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.vertex.llm import GeminiLiveVertexLLMService
@@ -124,7 +127,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    context = LLMContext([{"role": "developer", "content": "Say hello."}])
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live-video.py
+++ b/examples/realtime/realtime-gemini-live-video.py
@@ -16,7 +16,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
    create_transport,
@@ -64,7 +67,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ],
    )
    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-gemini-live.py
+++ b/examples/realtime/realtime-gemini-live.py
@@ -21,6 +21,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -130,8 +131,33 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)

    context = LLMContext()
-    # Server-side VAD is enabled by default; no local VAD is added.
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    # Gemini Live drives the conversation server-side and does not emit
+    # UserStartedSpeakingFrame / UserStoppedSpeakingFrame. Context
+    # aggregation still works with realtime_service_mode, but pipeline
+    # processors that depend on those frames (RTVI client speech events,
+    # TurnTrackingObserver, AudioBufferProcessor turn recording,
+    # UserIdleController, user mute strategies, voicemail detector) won't
+    # activate. The Pipecat Prebuilt UI is one such consumer — without
+    # these frames it can't group user transcripts into discrete turns
+    # visually.
+    #
+    # If you need those frames, uncomment the two imports at the end of
+    # this comment and the `user_params=` argument below. Note: local turn
+    # detection may not match Gemini Live's actual server-side turn
+    # decisions and can desynchronize in subtle ways.
+    #
+    # For local VAD driving the conversation (server VAD disabled), see
+    # `realtime-gemini-live-locally-driven-turns.py` instead.
+    #
+    # from pipecat.audio.vad.silero import SileroVADAnalyzer
+    # from pipecat.processors.aggregators.llm_response_universal import (
+    #     LLMUserAggregatorParams,
+    # )
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        # user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )

    pipeline = Pipeline(
        [
@@ -166,14 +192,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Gemini Live doesn't emit user-turn frames, so on_user_turn_stopped
+    # won't fire (unless the local-VAD opt-in above is enabled). The
+    # *_message_added events fire when messages are written to context and
+    # carry the finalized content; use those for transcript logging
+    # regardless of whether turn frames are emitted.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-grok-async-tool.py
+++ b/examples/realtime/realtime-grok-async-tool.py
@@ -29,7 +29,10 @@ from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.llm_service import FunctionCallParams
@@ -129,7 +132,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    )

    context = LLMContext(tools=tools)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    pipeline = Pipeline(
        [
--- a/examples/realtime/realtime-grok-locally-driven-turns.py
+++ b/examples/realtime/realtime-grok-locally-driven-turns.py
@@ -0,0 +1,262 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Grok Realtime with locally-driven turn detection.
+
+By default Grok Realtime drives the conversation with its own server-side
+VAD (see `realtime-grok.py`). This variant disables server-side turn
+detection (``turn_detection=None``, the "manual" mode in Grok's session
+properties) and instead drives turn boundaries locally with
+``SileroVADAnalyzer`` wired into the user aggregator. Use this variant if
+you want a turn analyzer like ``LocalSmartTurnV3`` to decide when the user
+is done speaking, or if you need ``UserStartedSpeakingFrame`` /
+``UserStoppedSpeakingFrame`` to fire from the same source as
+``InterruptionFrame``.
+
+Caveat: locally generated turn boundaries are a heuristic and may not match
+the turn decisions Grok's own server-side VAD would have made. Prefer
+server-emitted turn frames (i.e. the base `realtime-grok.py` example) unless
+you have a specific reason to drive turn detection locally.
+"""
+
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.observers.loggers.transcription_log_observer import (
+    TranscriptionLogObserver,
+)
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    AssistantTurnStoppedMessage,
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
+    UserTurnStoppedMessage,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.xai.realtime.events import SessionProperties
+from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    """Handle weather function calls."""
+    temperature = 75 if params.arguments.get("format") == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments.get("format", "celsius"),
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def get_current_time(params: FunctionCallParams):
+    """Handle time function calls."""
+    await params.result_callback(
+        {
+            "time": datetime.now().strftime("%H:%M:%S"),
+            "date": datetime.now().strftime("%Y-%m-%d"),
+            "timezone": "local",
+        }
+    )
+
+
+async def get_restaurant_recommendation(params: FunctionCallParams):
+    """Handle restaurant recommendation function calls."""
+    location = params.arguments.get("location", "unknown")
+    await params.result_callback(
+        {
+            "name": "The Golden Dragon",
+            "cuisine": "Chinese",
+            "location": location,
+            "rating": 4.5,
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather for a location",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use.",
+        },
+    },
+    required=["location", "format"],
+)
+
+time_function = FunctionSchema(
+    name="get_current_time",
+    description="Get the current time and date",
+    properties={},
+    required=[],
+)
+
+restaurant_function = FunctionSchema(
+    name="get_restaurant_recommendation",
+    description="Get a restaurant recommendation for a location",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+    },
+    required=["location"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function, time_function, restaurant_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting Grok Voice Agent bot")
+
+    session_properties = SessionProperties(
+        voice="Ara",
+        # Disable Grok's server-side turn detection (manual mode). This
+        # example drives turn boundaries locally via the SileroVADAnalyzer
+        # wired into the user aggregator below.
+        turn_detection=None,
+    )
+
+    llm = GrokRealtimeLLMService(
+        api_key=os.environ["XAI_API_KEY"],
+        settings=GrokRealtimeLLMService.Settings(
+            system_instruction="""You are a helpful and friendly AI assistant powered by Grok.
+
+    You have access to several tools:
+    - Weather information
+    - Current time
+    - Restaurant recommendations
+    - Web search (built-in)
+    - X/Twitter search (built-in)
+
+    Your voice and personality should be warm and engaging. Keep your responses
+    concise and conversational since this is a voice interaction.
+
+    If the user asks about current events or news, use web search.
+    If they ask about what people are saying on social media, use X search.
+
+    Always be helpful and proactive in offering assistance.""",
+            session_properties=session_properties,
+        ),
+    )
+
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_current_time", get_current_time)
+    llm.register_function("get_restaurant_recommendation", get_restaurant_recommendation)
+
+    context = LLMContext(
+        [{"role": "developer", "content": "Say hello and introduce yourself!"}],
+        tools,
+    )
+
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        # Drive turn detection locally via SileroVAD wired into the user
+        # aggregator. realtime_service_mode keeps context-write semantics
+        # correct and (by default) doesn't wait for the transcript at turn
+        # end, so local VAD alone drives turns on the latency-critical path.
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[TranscriptionLogObserver()],
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}user: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}assistant: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-grok.py
+++ b/examples/realtime/realtime-grok.py
@@ -33,9 +33,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-
-# Note: Grok has built-in server-side VAD, so we don't need local VAD
-# from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.observers.loggers.transcription_log_observer import (
    TranscriptionLogObserver,
@@ -47,6 +44,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -212,7 +210,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        tools,
    )

-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    # Build the pipeline
    # Note: In realtime mode, transcription comes from Grok (upstream),
@@ -248,15 +249,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info("Client disconnected")
        await task.cancel()

-    # Log transcript updates
-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Log transcript updates. In realtime mode the turn-stopped events
+    # fire before the message text is finalized (UserTurnStoppedMessage
+    # content is None), so subscribe to the *_message_added events
+    # instead — they fire when the message is written to context and
+    # carry the finalized content.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-inworld-locally-driven-turns.py
+++ b/examples/realtime/realtime-inworld-locally-driven-turns.py
@@ -0,0 +1,235 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Inworld Realtime with locally-driven turn detection.
+
+By default Inworld Realtime drives the conversation with its own
+server-side semantic VAD (see `realtime-inworld.py`). This variant
+disables server-side turn detection (``turn_detection=None``, the
+"manual" mode in Inworld's session properties) and instead drives turn
+boundaries locally with ``SileroVADAnalyzer`` wired into the user
+aggregator. Use this variant if you want a turn analyzer like
+``LocalSmartTurnV3`` to decide when the user is done speaking, or if you
+need ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame`` to fire
+from the same source as ``InterruptionFrame``.
+
+Caveat: locally generated turn boundaries are a heuristic and may not
+match the decisions Inworld's server-side semantic VAD would have made.
+Prefer server-emitted turn frames (i.e. the base `realtime-inworld.py`
+example) unless you have a specific reason to drive turn detection locally.
+"""
+
+import os
+import random
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.observers.loggers.transcription_log_observer import (
+    TranscriptionLogObserver,
+)
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    AssistantTurnStoppedMessage,
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
+    UserTurnStoppedMessage,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.inworld.realtime.events import (
+    AudioConfiguration,
+    AudioInput,
+    AudioOutput,
+    InputTranscription,
+    PCMAudioFormat,
+    SessionProperties,
+)
+from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = (
+        random.randint(60, 85)
+        if params.arguments["format"] == "fahrenheit"
+        else random.randint(15, 30)
+    )
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use.",
+        },
+    },
+    required=["location", "format"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting Inworld Realtime bot (locally-driven turns)")
+
+    model = "openai/gpt-4.1-mini"
+    voice = "Sarah"
+    tts_model = "inworld-tts-2"
+    stt_model = "assemblyai/u3-rt-pro"
+
+    # Setting session_properties here replaces Inworld's defaults wholesale,
+    # so we provide a complete SessionProperties, with turn_detection=None
+    # (manual mode) so that local VAD drives turn boundaries instead.
+    session_properties = SessionProperties(
+        model=model,
+        output_modalities=["audio", "text"],
+        audio=AudioConfiguration(
+            input=AudioInput(
+                format=PCMAudioFormat(rate=24000),
+                transcription=InputTranscription(model=stt_model),
+                turn_detection=None,
+            ),
+            output=AudioOutput(
+                format=PCMAudioFormat(rate=24000),
+                model=tts_model,
+                voice=voice,
+            ),
+        ),
+    )
+
+    llm = InworldRealtimeLLMService(
+        api_key=os.environ["INWORLD_API_KEY"],
+        settings=InworldRealtimeLLMService.Settings(
+            system_instruction="""You are a helpful and friendly AI assistant powered by Inworld.
+
+Your voice and personality should be warm and engaging. Keep your responses
+concise and conversational since this is a voice interaction.
+
+Always be helpful and proactive in offering assistance.""",
+            session_properties=session_properties,
+        ),
+    )
+
+    # Note: function calling requires a paid Inworld account and a
+    # function-calling-capable model.
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+
+    context = LLMContext(
+        [{"role": "developer", "content": "Say hello and introduce yourself!"}],
+        tools,
+    )
+
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        # Drive turn detection locally via SileroVAD wired into the user
+        # aggregator. realtime_service_mode keeps context-write semantics
+        # correct and (by default) doesn't wait for the transcript at turn
+        # end, so local VAD alone drives turns on the latency-critical path.
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[TranscriptionLogObserver()],
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        logger.info(f"Transcript: {timestamp}user: {message.content}")
+
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        logger.info(f"Transcript: {timestamp}assistant: {message.content}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-inworld.py
+++ b/examples/realtime/realtime-inworld.py
@@ -47,6 +47,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -149,7 +150,10 @@ Always be helpful and proactive in offering assistance.""",
        tools,
    )

-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        realtime_service_mode=RealtimeServiceModeConfig(),
+    )

    # Build the pipeline
    pipeline = Pipeline(
@@ -182,13 +186,16 @@ Always be helpful and proactive in offering assistance.""",
        logger.info("Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # In realtime mode the turn-stopped events fire before the message
+    # text is finalized; subscribe to the *_message_added events for the
+    # finalized content.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        logger.info(f"Transcript: {timestamp}user: {message.content}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        logger.info(f"Transcript: {timestamp}assistant: {message.content}")

--- a/examples/realtime/realtime-openai-async-tool.py
+++ b/examples/realtime/realtime-openai-async-tool.py
@@ -24,7 +24,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -32,7 +31,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -147,7 +146,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-openai-live-video.py
+++ b/examples/realtime/realtime-openai-live-video.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
 from pipecat.pipeline.pipeline import Pipeline
@@ -19,7 +18,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
@@ -106,7 +105,7 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-openai-locally-driven-turns.py
+++ b/examples/realtime/realtime-openai-locally-driven-turns.py
@@ -0,0 +1,267 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""OpenAI Realtime with locally-driven turn detection.
+
+By default, OpenAI Realtime drives the conversation with its own server-side
+VAD (see `realtime-openai.py`). This variant disables server-side turn
+detection (``turn_detection=False``) and instead drives turn boundaries
+locally with ``SileroVADAnalyzer`` wired into the user aggregator. Take this
+path if you want a turn analyzer such as ``LocalSmartTurnV3`` to decide
+when the user is done speaking, or if you need ``UserStartedSpeakingFrame``
+and ``UserStoppedSpeakingFrame`` to come from the same source as
+``InterruptionFrame``.
+
+Caveat: locally generated turn boundaries are a heuristic and may not match
+the provider's actual server-side turn decisions. With OpenAI Realtime,
+server-side turn detection is generally what the service expects to drive
+the conversation, and disabling it puts the responsibility on you. Prefer
+server-emitted turn frames (i.e. the base `realtime-openai.py` example)
+unless you have a specific reason to drive turn detection locally.
+"""
+
+import asyncio
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame
+from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    AssistantTurnStoppedMessage,
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
+    UserTurnStoppedMessage,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai.realtime.events import (
+    AudioConfiguration,
+    AudioInput,
+    InputAudioNoiseReduction,
+    InputAudioTranscription,
+    SessionProperties,
+)
+from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def get_news(params: FunctionCallParams):
+    await params.result_callback(
+        {
+            "news": [
+                "Massive UFO currently hovering above New York City",
+                "Stock markets reach all-time highs",
+                "Living dinosaur species discovered in the Amazon rainforest",
+            ],
+        }
+    )
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the user's location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+get_news_function = FunctionSchema(
+    name="get_news",
+    description="Get the current news.",
+    properties={},
+    required=[],
+)
+
+restaurant_function = FunctionSchema(
+    name="get_restaurant_recommendation",
+    description="Get a restaurant recommendation",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+    },
+    required=["location"],
+)
+
+tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
+
+
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting bot")
+
+    llm = OpenAIRealtimeLLMService(
+        api_key=os.environ["OPENAI_API_KEY"],
+        settings=OpenAIRealtimeLLMService.Settings(
+            system_instruction="""You are a helpful and friendly AI.
+
+Act like a human, but remember that you aren't a human and that you can't do human
+things in the real world. Your voice and personality should be warm and engaging, with a lively and
+playful tone.
+
+If interacting in a non-English language, start by using the standard accent or dialect familiar to
+the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
+even if you're asked about them.
+
+You are participating in a voice conversation. Keep your responses concise, short, and to the point
+unless specifically asked to elaborate on a topic.
+
+Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
+            session_properties=SessionProperties(
+                audio=AudioConfiguration(
+                    input=AudioInput(
+                        transcription=InputAudioTranscription(),
+                        # Disable OpenAI's server-side turn detection — this
+                        # example drives turn boundaries locally via the
+                        # SileroVADAnalyzer wired into the user aggregator
+                        # below.
+                        turn_detection=False,
+                        noise_reduction=InputAudioNoiseReduction(type="near_field"),
+                    )
+                ),
+            ),
+        ),
+    )
+
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+    llm.register_function("get_news", get_news)
+
+    context = LLMContext(
+        [{"role": "developer", "content": "Say hello!"}],
+        tools,
+    )
+
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        # Drive turn detection locally via SileroVAD wired into the user
+        # aggregator. realtime_service_mode keeps context-write semantics
+        # correct and, by default, doesn't wait for a transcript at turn end,
+        # so turn boundaries from local VAD aren't delayed by transcription.
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            user_aggregator,
+            llm,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[TranscriptionLogObserver()],
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        await task.queue_frames([LLMRunFrame()])
+
+        await asyncio.sleep(15)
+        new_tools = ToolsSchema(
+            standard_tools=[weather_function, restaurant_function, get_news_function]
+        )
+        await task.queue_frames([LLMSetToolsFrame(tools=new_tools)])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}user: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
+        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
+        line = f"{timestamp}assistant: {message.content}"
+        logger.info(f"Transcript: {line}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/realtime/realtime-openai-text.py
+++ b/examples/realtime/realtime-openai-text.py
@@ -13,7 +13,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -21,7 +20,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -177,7 +176,7 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-openai.py
+++ b/examples/realtime/realtime-openai.py
@@ -14,7 +14,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame
 from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
 from pipecat.pipeline.pipeline import Pipeline
@@ -24,7 +23,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -187,7 +186,13 @@ Remember, your responses should be short. Just one or two sentences, usually. Re

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        # OpenAI Realtime drives the conversation server-side and emits its
+        # own UserStarted/StoppedSpeakingFrame from server VAD events, so
+        # local VAD on the aggregator is unnecessary. realtime_service_mode
+        # decouples context writes from turn frames and from waiting on the
+        # transcript at turn end. See `realtime-openai-locally-driven-turns.py`
+        # for the variant that disables server VAD and drives turns locally.
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -251,15 +256,19 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
        logger.info(f"Client disconnected")
        await task.cancel()

-    # Log transcript updates
-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Log transcript updates. In realtime mode the turn-stopped events
+    # fire before the message text is finalized (UserTurnStoppedMessage
+    # content is None), so subscribe to the *_message_added events
+    # instead. They fire when the message is written to context and
+    # carry the finalized content.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-ultravox-async-tool.py
+++ b/examples/realtime/realtime-ultravox-async-tool.py
@@ -26,14 +26,13 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -42,8 +41,6 @@ from pipecat.services.ultravox.llm import OneShotInputParams, UltravoxRealtimeLL
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
-from pipecat.turns.user_turn_strategies import UserTurnStrategies

 load_dotenv(override=True)

@@ -134,12 +131,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext([])
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(
-            user_turn_strategies=UserTurnStrategies(
-                stop=[SpeechTimeoutUserTurnStopStrategy()],
-            ),
-            vad_analyzer=SileroVADAnalyzer(),
-        ),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/realtime/realtime-ultravox-text.py
+++ b/examples/realtime/realtime-ultravox-text.py
@@ -12,8 +12,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -21,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -32,8 +30,6 @@ from pipecat.services.ultravox.llm import OneShotInputParams, UltravoxRealtimeLL
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
-from pipecat.turns.user_turn_strategies import UserTurnStrategies

 # Load environment variables
 load_dotenv(override=True)
@@ -188,17 +184,9 @@ There is also a secret menu that changes daily. If the user asks about it, use t

    context = LLMContext([])

-    # Necessary to complete the function call lifecycle in Pipecat and
-    # to produce user and assistant turn stopped events.
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(
-            user_turn_strategies=UserTurnStrategies(
-                stop=[SpeechTimeoutUserTurnStopStrategy()],
-            ),
-            # Set the VAD analyzer to emulate timing of the model.
-            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
-        ),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    # Build the pipeline
@@ -234,14 +222,16 @@ There is also a secret menu that changes daily. If the user asks about it, use t
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Ultravox doesn't emit user-turn frames; subscribe to the
+    # *_message_added events for the finalized message text.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/realtime/realtime-ultravox.py
+++ b/examples/realtime/realtime-ultravox.py
@@ -12,7 +12,6 @@ from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
    UserTurnStoppedMessage,
 )
 from pipecat.runner.types import RunnerArguments
@@ -30,8 +29,6 @@ from pipecat.services.ultravox.llm import OneShotInputParams, UltravoxRealtimeLL
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
-from pipecat.turns.user_turn_strategies import UserTurnStrategies

 # Load environment variables
 load_dotenv(override=True)
@@ -178,18 +175,29 @@ There is also a secret menu that changes daily. If the user asks about it, use t

    context = LLMContext([])

-    # Necessary to complete the function call lifecycle in Pipecat and
-    # to produce user and assistant turn stopped events.
+    # Ultravox drives the conversation server-side and does not emit
+    # UserStartedSpeakingFrame / UserStoppedSpeakingFrame. Context
+    # aggregation still works with realtime_service_mode, but pipeline
+    # processors that depend on those frames (RTVI client speech events,
+    # TurnTrackingObserver, AudioBufferProcessor turn recording,
+    # UserIdleController, user mute strategies, voicemail detector) won't
+    # activate. The Pipecat Prebuilt UI is one such consumer: without
+    # these frames it can't visually group user transcripts into
+    # discrete turns.
+    #
+    # If you need those frames, uncomment the two imports below and the
+    # `user_params=` argument passed to LLMContextAggregatorPair. Note: local turn
+    # detection may not match Ultravox's actual server-side turn
+    # decisions and can desynchronize in subtle ways.
+    #
+    # from pipecat.audio.vad.silero import SileroVADAnalyzer
+    # from pipecat.processors.aggregators.llm_response_universal import (
+    #     LLMUserAggregatorParams,
+    # )
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(
-            user_turn_strategies=UserTurnStrategies(
-                stop=[SpeechTimeoutUserTurnStopStrategy()],
-            ),
-            # Set the VAD analyzer to create reliable TTFB measurements and
-            # user stop events.
-            vad_analyzer=SileroVADAnalyzer(),
-        ),
+        realtime_service_mode=RealtimeServiceModeConfig(),
+        # user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    # Build the pipeline
@@ -224,14 +232,18 @@ There is also a secret menu that changes daily. If the user asks about it, use t
        logger.info(f"Client disconnected")
        await task.cancel()

-    @user_aggregator.event_handler("on_user_turn_stopped")
-    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
+    # Ultravox doesn't emit user-turn frames, so on_user_turn_stopped
+    # would never fire. The *_message_added events fire when messages are
+    # written to context and carry the finalized content; use those for
+    # transcript logging.
+    @user_aggregator.event_handler("on_user_message_added")
+    async def on_user_message_added(aggregator, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/llm/llm-aws-nova-sonic.py
+++ b/examples/update-settings/llm/llm-aws-nova-sonic.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -18,7 +17,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -60,7 +59,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/update-settings/llm/llm-azure-realtime.py
+++ b/examples/update-settings/llm/llm-azure-realtime.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -66,7 +65,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -88,8 +87,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/llm/llm-gemini-live-vertex.py
+++ b/examples/update-settings/llm/llm-gemini-live-vertex.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -18,7 +17,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -60,7 +59,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/update-settings/llm/llm-gemini-live.py
+++ b/examples/update-settings/llm/llm-gemini-live.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -18,7 +17,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -58,7 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
--- a/examples/update-settings/llm/llm-grok-realtime.py
+++ b/examples/update-settings/llm/llm-grok-realtime.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -63,7 +62,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -85,8 +84,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/llm/llm-openai-realtime.py
+++ b/examples/update-settings/llm/llm-openai-realtime.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -20,7 +19,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -63,7 +62,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -85,8 +84,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/examples/update-settings/llm/llm-ultravox-realtime.py
+++ b/examples/update-settings/llm/llm-ultravox-realtime.py
@@ -13,7 +13,6 @@ from loguru import logger

 from pipecat.adapters.base_llm_adapter import LLMContextMessage
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame, LLMUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -22,7 +21,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -74,7 +73,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        realtime_service_mode=RealtimeServiceModeConfig(),
    )

    pipeline = Pipeline(
@@ -96,8 +95,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

-    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
-    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
+    @assistant_aggregator.event_handler("on_assistant_message_added")
+    async def on_assistant_message_added(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")
--- a/src/pipecat/frames/frames.py
+++ b/src/pipecat/frames/frames.py
@@ -1439,6 +1439,27 @@ class STTMetadataFrame(ServiceMetadataFrame):
    ttfs_p99_latency: float


+@dataclass
+class RealtimeServiceMetadataFrame(ServiceMetadataFrame):
+    """Metadata announcing a realtime (speech-to-speech) LLM service.
+
+    Broadcast by realtime LLM services at pipeline start so downstream
+    processors — notably ``LLMContextAggregatorPair`` — can detect that
+    a realtime service is in the pipeline. The aggregator uses this to
+    surface a one-time recommendation to opt in to
+    ``RealtimeServiceModeConfig`` when it hasn't been configured.
+
+    Parameters:
+        emits_user_turn_frames: Whether this service emits
+            ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``
+            from server-side turn signals. False for services with no
+            server-side turn signals (e.g. Gemini Live, AWS Nova Sonic,
+            Ultravox).
+    """
+
+    emits_user_turn_frames: bool = True
+
+
@dataclass
 class ServiceSwitcherRequestMetadataFrame(ControlFrame):
    """Request a service to re-emit its metadata frames.
--- a/src/pipecat/processors/aggregators/llm_response_universal.py
+++ b/src/pipecat/processors/aggregators/llm_response_universal.py
@@ -55,6 +55,7 @@ from pipecat.frames.frames import (
    LLMThoughtEndFrame,
    LLMThoughtStartFrame,
    LLMThoughtTextFrame,
+    RealtimeServiceMetadataFrame,
    StartFrame,
    TextFrame,
    TranscriptionFrame,
@@ -83,7 +84,11 @@ from pipecat.processors.aggregators.llm_context_summarizer import (
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.turns.user_idle_controller import UserIdleController
 from pipecat.turns.user_mute import BaseUserMuteStrategy
-from pipecat.turns.user_start import BaseUserTurnStartStrategy, UserTurnStartedParams
+from pipecat.turns.user_start import (
+    BaseUserTurnStartStrategy,
+    TranscriptionUserTurnStartStrategy,
+    UserTurnStartedParams,
+)
 from pipecat.turns.user_stop import BaseUserTurnStopStrategy, UserTurnStoppedParams
 from pipecat.turns.user_turn_completion_mixin import UserTurnCompletionConfig
 from pipecat.turns.user_turn_controller import UserTurnController
@@ -258,6 +263,55 @@ class LLMAssistantAggregatorParams:
            self.context_summarization_config = None


+@dataclass
+class RealtimeServiceModeConfig:
+    """Configure an ``LLMContextAggregatorPair`` for use with a realtime LLM service.
+
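+    Both fields default to False, the recommended realtime behavior: context
+    writes don't wait on user-turn frames, and turn-end doesn't wait on
+    transcripts. Override individual fields to dial back to cascade-style
+    behavior selectively; note that ``context_writes_await_turns=True`` is
+    only valid together with ``turns_await_transcripts=True`` (see
+    ``__post_init__``).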
+
+    Parameters:
+        context_writes_await_turns: When False (default), context writes are
+            triggered by the content stream itself (transcripts and assistant
+            text frames), making writes independent of turn-frame availability
+            and timing. When True, user messages are written to context on
+            user-turn-end frames (cascade behavior).
+        turns_await_transcripts: When False (default), turn-end fires as soon
+            as VAD signals end of speech, and transcripts no longer start a
+            user turn (``TranscriptionUserTurnStartStrategy`` is dropped).
+            This avoids latency on the critical path when local turn
+            detection drives a realtime conversation. When True, turn-end
+            strategies wait for transcripts to arrive before signalling
+            end-of-turn.
+
+    Note:
+        Local VAD (via ``LLMUserAggregatorParams.vad_analyzer``) is intended
+        for use with realtime services that either don't emit
+        ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``
+        themselves (Gemini Live, AWS Nova Sonic, Ultravox) or have their
+        server-side turn detection disabled (e.g. OpenAI Realtime with
+        ``turn_detection=False``). Wiring local VAD on top of a service
+        whose server-side turn detection is also active produces duplicate
+        user-turn frames from both sources — the service broadcasts them,
+        and the aggregator's local-VAD-driven strategies broadcast them
+        again. Pick one source.
+    """
+
+    context_writes_await_turns: bool = False
+    turns_await_transcripts: bool = False
+
+    def __post_init__(self):
+        """Validate the field combination."""
+        if not self.turns_await_transcripts and self.context_writes_await_turns:
+            raise ValueError(
+                "Invalid combination: turns fire early (without transcripts) "
+                "but context writes wait on those turn frames — context would "
+                "be written with incomplete user messages. Either set "
+                "turns_await_transcripts=True (preserve transcript-aware "
+                "turn-end timing) or context_writes_await_turns=False "
+                "(decouple writes from turn frames)."
+            )
+
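The `__post_init__` check rules out exactly one of the four flag combinations. The sketch below is a standalone stand-in, not pipecat code: the class name `RealtimeModeConfigSketch` and the helper `valid_combinations` are hypothetical, and only the two field names and the validation condition are taken from the diff above. It enumerates which combinations survive construction.

```python
from dataclasses import dataclass


@dataclass
class RealtimeModeConfigSketch:
    """Hypothetical stand-in mirroring RealtimeServiceModeConfig's flags and check."""

    context_writes_await_turns: bool = False
    turns_await_transcripts: bool = False

    def __post_init__(self):
        # Turns that fire before transcripts, combined with context writes that
        # wait on those turns, would commit incomplete user messages.
        if not self.turns_await_transcripts and self.context_writes_await_turns:
            raise ValueError(
                "context_writes_await_turns=True requires turns_await_transcripts=True"
            )


def valid_combinations() -> list[tuple[bool, bool]]:
    """List the (context_writes_await_turns, turns_await_transcripts) pairs that validate."""
    ok = []
    for writes in (False, True):
        for transcripts in (False, True):
            try:
                RealtimeModeConfigSketch(writes, transcripts)
            except ValueError:
                continue
            ok.append((writes, transcripts))
    return ok


print(valid_combinations())
# [(False, False), (False, True), (True, True)]
```

`(False, False)` is the default realtime mode, and `(True, True)` matches the cascade behavior the aggregators fall back to when no config is passed.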
+
@dataclass
 class UserTurnStoppedMessage:
    """A user turn stopped message containing a user transcript update.
@@ -266,13 +320,18 @@ class UserTurnStoppedMessage:
    the aggregated transcript that is then used in the context.

    Parameters:
-        content: The message content/text.
+        content: The message content/text. ``None`` in realtime mode
+            (``RealtimeServiceModeConfig(context_writes_await_turns=False)``)
+            when fired from a user-turn-stop frame, since the user message
+            hasn't been finalized at that point. Subscribers that need the
+            finalized text should listen to ``on_user_message_added``
+            instead.
        timestamp: When the user turn started.
        user_id: Optional identifier for the user.

    """

-    content: str
+    content: str | None
    timestamp: str
    user_id: str | None = None

@@ -567,6 +626,9 @@ class LLMUserAggregator(LLMContextAggregator):
        context: LLMContext,
        *,
        params: LLMUserAggregatorParams | None = None,
+        _realtime_service_mode: RealtimeServiceModeConfig | None = None,
+        _paired_half: "LLMAssistantAggregator | None" = None,
+        _pair_lock: asyncio.Lock | None = None,
        **kwargs,
    ):
        """Initialize the user context aggregator.
@@ -574,6 +636,14 @@ class LLMUserAggregator(LLMContextAggregator):
        Args:
            context: The LLM context for conversation storage.
            params: Configuration parameters for aggregation behavior.
+            _realtime_service_mode: Pair-internal. Realtime-mode
+                configuration propagated from
+                ``LLMContextAggregatorPair``. Not intended for direct use —
+                construct the aggregators via the pair.
+            _paired_half: Pair-internal. Back-reference to the paired
+                assistant aggregator for cross-half coordination.
+            _pair_lock: Pair-internal. Shared asyncio lock serializing
+                cross-half flushes.
            **kwargs: Additional arguments.
        """
        params = params or LLMUserAggregatorParams()
@@ -590,9 +660,23 @@ class LLMUserAggregator(LLMContextAggregator):
        self._register_event_handler("on_user_turn_stop_timeout")
        self._register_event_handler("on_user_turn_idle")
        self._register_event_handler("on_user_turn_inference_triggered")
+        self._register_event_handler("on_user_message_added")
        self._register_event_handler("on_user_mute_started")
        self._register_event_handler("on_user_mute_stopped")

+        # Realtime-mode wiring. Defaults (no config) preserve cascade
+        # behavior: context writes happen on turn frames, turns wait
+        # for transcripts.
+        self._realtime_service_mode = _realtime_service_mode
+        self._paired_half = _paired_half
+        self._pair_lock = _pair_lock
+        if _realtime_service_mode is not None:
+            self._context_writes_await_turns = _realtime_service_mode.context_writes_await_turns
+            self._turns_await_transcripts = _realtime_service_mode.turns_await_transcripts
+        else:
+            self._context_writes_await_turns = True
+            self._turns_await_transcripts = True
+
        user_turn_strategies = self._params.user_turn_strategies or UserTurnStrategies()

        # Deprecated path: translate filter_incomplete_user_turns into
@@ -606,8 +690,19 @@ class LLMUserAggregator(LLMContextAggregator):
            )
            self._params.user_turn_strategies = user_turn_strategies

+        # Realtime mutation: when turns shouldn't wait for transcripts,
+        # drop the transcription-based start strategy and flip the
+        # wait_for_transcript flag on stop strategies that expose it. The
+        # set of strategies that support it intentionally stays narrow —
+        # the flag was reintroduced specifically for this realtime path.
+        if not self._turns_await_transcripts:
+            self._apply_realtime_strategy_mutations(user_turn_strategies)
+
        self._user_is_muted = False
        self._user_turn_start_timestamp = ""
+        # Tracks whether the one-time realtime-mode recommendation (logged
+        # on RealtimeServiceMetadataFrame) has already fired for this
+        # session. See _handle_realtime_service_metadata.
+        self._realtime_recommendation_logged = False
        # Full transcript across the user turn. Each
        # `_on_user_turn_inference_triggered` push captures only the
        # new segment since the previous push (push_aggregation resets
@@ -717,6 +812,9 @@ class LLMUserAggregator(LLMContextAggregator):
            await self.push_frame(frame, direction)
        elif isinstance(frame, LLMSetToolChoiceFrame):
            self.set_tool_choice(frame.tool_choice)
+        elif isinstance(frame, RealtimeServiceMetadataFrame):
+            await self._handle_realtime_service_metadata(frame)
+            await self.push_frame(frame, direction)
        else:
            await self.push_frame(frame, direction)

@@ -734,9 +832,16 @@ class LLMUserAggregator(LLMContextAggregator):
        self._context.add_message({"role": self.role, "content": aggregation})
        await self.push_context_frame()

+        message = UserTurnStoppedMessage(
+            content=aggregation, timestamp=self._user_turn_start_timestamp
+        )
+        await self._call_event_handler("on_user_message_added", message)
+
        return aggregation

    async def _start(self, frame: StartFrame):
+        self._validate_realtime_pairing()
+
        if self._vad_controller:
            await self._vad_controller.setup(self.task_manager)

@@ -748,13 +853,138 @@ class LLMUserAggregator(LLMContextAggregator):
            await s.setup(self.task_manager)

    async def _stop(self, frame: EndFrame):
-        await self._maybe_emit_user_turn_stopped(on_session_end=True)
+        if not self._context_writes_await_turns:
+            # Realtime: flush trailing user content directly. The
+            # on_user_turn_stopped event already fired (if turn frames
+            # were emitted), so don't re-fire it from session end.
+            await self.push_aggregation()
+        else:
+            await self._maybe_emit_user_turn_stopped(on_session_end=True)
        await self._cleanup()

    async def _cancel(self, frame: CancelFrame):
-        await self._maybe_emit_user_turn_stopped(on_session_end=True)
+        if not self._context_writes_await_turns:
+            await self.push_aggregation()
+        else:
+            await self._maybe_emit_user_turn_stopped(on_session_end=True)
        await self._cleanup()

+    def _validate_realtime_pairing(self):
+        """Validate the realtime-mode wiring set by ``LLMContextAggregatorPair``.
+
+        Realtime mode requires both halves to be paired through the
+        ``LLMContextAggregatorPair`` so cross-half flushes can find each
+        other. Direct construction of a half with the private realtime
+        kwargs is not supported.
+        """
+        if not self._context_writes_await_turns:
+            if self._paired_half is None:
+                raise RuntimeError(
+                    f"{self}: realtime_service_mode is configured but this user "
+                    "aggregator has no paired assistant aggregator. Construct "
+                    "the pair via LLMContextAggregatorPair("
+                    "context, realtime_service_mode=RealtimeServiceModeConfig())."
+                )
+        if self._paired_half is not None:
+            if (
+                self._context_writes_await_turns != self._paired_half._context_writes_await_turns
+                or self._turns_await_transcripts != self._paired_half._turns_await_transcripts
+            ):
+                raise RuntimeError(
+                    f"{self}: realtime-mode config mismatch between user and "
+                    "assistant halves. Use LLMContextAggregatorPair to construct "
+                    "the pair so both halves share the same configuration."
+                )
+
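The pairing validation enforces two rules: a half in realtime write mode must have a partner, and paired halves must agree on both flags. A minimal sketch of those rules, assuming simplified stand-ins: `HalfSketch` and `make_pair` are hypothetical names, not the aggregator classes or `LLMContextAggregatorPair` itself.

```python
from typing import Optional


class HalfSketch:
    """Hypothetical stand-in for one aggregator half and its realtime flags."""

    def __init__(
        self,
        context_writes_await_turns: bool,
        turns_await_transcripts: bool,
        paired_half: Optional["HalfSketch"] = None,
    ):
        self.context_writes_await_turns = context_writes_await_turns
        self.turns_await_transcripts = turns_await_transcripts
        self.paired_half = paired_half

    def validate_pairing(self) -> None:
        # Realtime context writes hand flushes to the other half, so it must exist.
        if not self.context_writes_await_turns and self.paired_half is None:
            raise RuntimeError("realtime mode requires a paired half")
        # When paired, both halves must agree on both flags.
        other = self.paired_half
        if other is not None and (
            self.context_writes_await_turns != other.context_writes_await_turns
            or self.turns_await_transcripts != other.turns_await_transcripts
        ):
            raise RuntimeError("realtime-mode config mismatch between halves")


def make_pair(context_writes_await_turns: bool, turns_await_transcripts: bool):
    """Build both halves from one config and cross-link them, as a pair constructor would."""
    user = HalfSketch(context_writes_await_turns, turns_await_transcripts)
    assistant = HalfSketch(
        context_writes_await_turns, turns_await_transcripts, paired_half=user
    )
    user.paired_half = assistant
    return user, assistant


user, assistant = make_pair(False, False)
user.validate_pairing()  # passes: paired and in agreement
assistant.validate_pairing()  # passes
```

Building both halves from one config object is what makes the mismatch branch unreachable in normal use; it only trips when halves are constructed by hand.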
+    def _apply_realtime_strategy_mutations(self, user_turn_strategies: UserTurnStrategies) -> None:
+        """Mutate turn strategies for the realtime ``turns_await_transcripts=False`` path.
+
+        Drops ``TranscriptionUserTurnStartStrategy`` from the start strategies
+        (transcripts shouldn't start a turn when the realtime service drives
+        the conversation) and flips ``wait_for_transcript=False`` on stop
+        strategies that expose the flag, so end-of-turn fires as soon as VAD /
+        the turn analyzer reports end-of-speech.
+        """
+        custom_strategies = self._params.user_turn_strategies is not None
+
+        start_strategies = user_turn_strategies.start or []
+        dropped: list[str] = []
+        kept_start: list[BaseUserTurnStartStrategy] = []
+        for s in start_strategies:
+            if isinstance(s, TranscriptionUserTurnStartStrategy):
+                dropped.append(s.__class__.__name__)
+            else:
+                kept_start.append(s)
+        user_turn_strategies.start = kept_start
+
+        flipped: list[str] = []
+        for s in user_turn_strategies.stop or []:
+            if hasattr(s, "wait_for_transcript"):
+                try:
+                    s.wait_for_transcript = False
+                    flipped.append(s.__class__.__name__)
+                except AttributeError:
+                    # Strategy exposes the property but no setter — skip.
+                    pass
+
+        if not dropped and not flipped:
+            return
+
+        msg = (
+            f"{self}: realtime_service_mode(turns_await_transcripts=False) — "
+            f"dropped {dropped or 'no'} start strategy(ies); set "
+            f"wait_for_transcript=False on {flipped or 'no'} stop strategy(ies)."
+        )
+        if custom_strategies:
+            logger.warning(msg)
+        else:
+            logger.debug(msg)
+
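The mutation pass does two independent things: it filters transcription-driven start strategies out, and it flips `wait_for_transcript` only where the attribute is actually writable. The sketch below reproduces that control flow with hypothetical stand-in strategy classes (none of these names exist in pipecat); the `ReadOnlyStopSketch` case shows why the `try`/`except AttributeError` is needed, since assigning to a property without a setter raises `AttributeError`.

```python
class TranscriptionStartSketch:
    """Hypothetical stand-in for TranscriptionUserTurnStartStrategy."""


class VADStartSketch:
    """Hypothetical start strategy that does not depend on transcripts."""


class WaitingStopSketch:
    """Hypothetical stop strategy with a writable wait_for_transcript flag."""

    def __init__(self):
        self.wait_for_transcript = True


class ReadOnlyStopSketch:
    """Hypothetical stop strategy exposing wait_for_transcript without a setter."""

    @property
    def wait_for_transcript(self):
        return True


def apply_realtime_mutations(start, stop):
    """Drop transcription-based start strategies; flip wait_for_transcript where settable.

    Returns (kept_start, dropped_names, flipped_names).
    """
    kept_start, dropped = [], []
    for s in start:
        if isinstance(s, TranscriptionStartSketch):
            dropped.append(type(s).__name__)
        else:
            kept_start.append(s)

    flipped = []
    for s in stop:
        if hasattr(s, "wait_for_transcript"):
            try:
                s.wait_for_transcript = False
                flipped.append(type(s).__name__)
            except AttributeError:
                # Property without a setter: leave this strategy unchanged.
                pass
    return kept_start, dropped, flipped


stop = [WaitingStopSketch(), ReadOnlyStopSketch()]
kept, dropped, flipped = apply_realtime_mutations(
    [TranscriptionStartSketch(), VADStartSketch()], stop
)
print(dropped, flipped)
# ['TranscriptionStartSketch'] ['WaitingStopSketch']
```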
+    async def _handle_realtime_service_metadata(self, frame: RealtimeServiceMetadataFrame):
+        """Handle a ``RealtimeServiceMetadataFrame`` broadcast by a realtime LLM service.
+
+        When ``realtime_service_mode`` isn't configured, log a one-time INFO
+        recommendation pointing the user at the option and warning about the
+        timing change on ``on_user_turn_stopped``. When it is configured, log
+        a confirming debug message. Fires at most once per session.
+        """
+        if self._realtime_recommendation_logged:
+            return
+        self._realtime_recommendation_logged = True
+
+        if self._realtime_service_mode is None:
+            logger.info(
+                f"{self}: detected realtime service `{frame.service_name}` in the "
+                "pipeline. For correct context-write semantics with realtime "
+                "services, consider passing "
+                "realtime_service_mode=RealtimeServiceModeConfig() to "
+                "LLMContextAggregatorPair. Note: this changes when user messages "
+                "are written to context — they're written when the assistant "
+                "response starts rather than when the user-turn-end frame fires. "
+                "Subscribe to `on_user_message_added` instead of "
+                "`on_user_turn_stopped` if you need post-write semantics."
+            )
+        else:
+            logger.debug(
+                f"{self}: detected realtime service `{frame.service_name}`; "
+                "realtime_service_mode is configured."
+            )
+
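Because metadata frames may be re-emitted (for example in response to a `ServiceSwitcherRequestMetadataFrame`), the recommendation is guarded by a per-session flag. A minimal sketch of that once-only behavior, assuming stand-ins: `RealtimeMetadataSketch` and `RecommendationOnce` are hypothetical, `service_name` is assumed to come from the base metadata frame, and the log text is abbreviated.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("realtime_sketch")


@dataclass
class RealtimeMetadataSketch:
    """Hypothetical stand-in for RealtimeServiceMetadataFrame."""

    service_name: str
    emits_user_turn_frames: bool = True


class RecommendationOnce:
    """Log at most one realtime-mode message per session."""

    def __init__(self, realtime_mode_configured: bool):
        self.realtime_mode_configured = realtime_mode_configured
        self.logged = False
        self.messages: list[str] = []

    def handle(self, frame: RealtimeMetadataSketch) -> None:
        # Later announcements in the same session are ignored.
        if self.logged:
            return
        self.logged = True
        if not self.realtime_mode_configured:
            msg = f"detected realtime service `{frame.service_name}`; consider realtime_service_mode"
            logger.info(msg)
        else:
            msg = f"detected realtime service `{frame.service_name}`; realtime_service_mode is configured"
            logger.debug(msg)
        self.messages.append(msg)


r = RecommendationOnce(realtime_mode_configured=False)
r.handle(RealtimeMetadataSketch("openai-realtime"))
r.handle(RealtimeMetadataSketch("openai-realtime"))  # re-emitted: no second log
print(len(r.messages))
# 1
```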
+    async def _realtime_handoff_flush(self) -> None:
+        """Flush pending user aggregation to context.
+
+        Called by the paired assistant half from
+        ``_realtime_handle_llm_start`` (i.e. on ``LLMFullResponseStartFrame``)
+        to commit the in-flight user message before the assistant starts
+        its own turn. No-op when there's no pending content.
+        """
+        if not self._aggregation:
+            return
+        # push_aggregation writes the message to context, pushes
+        # LLMContextFrame, and emits on_user_message_added.
+        await self.push_aggregation()
+        self._user_turn_start_timestamp = ""
+
    async def _cleanup(self):
        if self._vad_controller:
            await self._vad_controller.cleanup()
@@ -826,6 +1056,10 @@ class LLMUserAggregator(LLMContextAggregator):
            await self.push_context_frame()

    async def _handle_transcription(self, frame: TranscriptionFrame):
+        if not self._context_writes_await_turns:
+            await self._realtime_handle_transcription(frame)
+            return
+
        text = frame.text

        # Make sure we really have some text.
@@ -839,6 +1073,30 @@ class LLMUserAggregator(LLMContextAggregator):
            )
        )

+    async def _realtime_handle_transcription(self, frame: TranscriptionFrame):
+        """Realtime variant: signal the paired assistant half to flush, then append.
+
+        The first new user transcript after an assistant turn ends is what
+        commits the assistant's pending message to context. The flush is
+        idempotent (a no-op when nothing is pending), so it's safe to call
+        on every chunk.
+        """
+        if not frame.text.strip():
+            return
+
+        if self._paired_half is not None and self._pair_lock is not None:
+            async with self._pair_lock:
+                await self._paired_half._realtime_handoff_flush()
+
+        if not self._user_turn_start_timestamp:
+            self._user_turn_start_timestamp = time_now_iso8601()
+
+        self._aggregation.append(
+            TextPartForConcatenation(
+                frame.text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
+            )
+        )
+
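Taken together, the two `_realtime_handoff_flush` methods form a cross-half handoff: the user message is committed when the assistant response starts, and the assistant message is parked at response end and committed when the next non-empty user transcript arrives. The sketch below models only that steady-state ordering, under stated simplifications: `PairSketch` is hypothetical, parts are joined with plain spaces rather than `TextPartForConcatenation`, and the session-end flush paths are left out.

```python
import asyncio
from typing import Optional


class PairSketch:
    """Hypothetical model of the realtime user/assistant context handoff."""

    def __init__(self):
        self.context: list[dict] = []
        self.user_parts: list[str] = []
        self.assistant_parts: list[str] = []
        self.pending_assistant: Optional[dict] = None
        self.lock = asyncio.Lock()  # serializes cross-half flushes

    async def user_transcript(self, text: str) -> None:
        if not text.strip():
            return  # whitespace-only chunks neither flush nor aggregate
        async with self.lock:
            # Commit the parked assistant message; a no-op if nothing is pending.
            if self.pending_assistant is not None:
                self.context.append(self.pending_assistant)
                self.pending_assistant = None
        self.user_parts.append(text)

    async def assistant_response_start(self) -> None:
        async with self.lock:
            # Commit the in-flight user message before the assistant turn begins.
            if self.user_parts:
                self.context.append({"role": "user", "content": " ".join(self.user_parts)})
                self.user_parts = []

    def assistant_text(self, text: str) -> None:
        self.assistant_parts.append(text)

    def assistant_response_end(self) -> None:
        # Park rather than write: the next user transcript performs the commit.
        if self.assistant_parts:
            self.pending_assistant = {
                "role": "assistant",
                "content": " ".join(self.assistant_parts),
            }
            self.assistant_parts = []


async def demo() -> list[dict]:
    p = PairSketch()
    await p.user_transcript("hello")
    await p.user_transcript("there")
    await p.assistant_response_start()  # writes the user message
    p.assistant_text("hi!")
    p.assistant_response_end()  # parks the assistant message
    await p.user_transcript("bye")  # writes the parked assistant message
    return p.context


print(asyncio.run(demo()))
# [{'role': 'user', 'content': 'hello there'}, {'role': 'assistant', 'content': 'hi!'}]
```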
    async def _queued_broadcast_frame(self, frame_cls: type[Frame], **kwargs):
        """Broadcasts a frame upstream and queues it for internal processing.

@@ -903,6 +1161,17 @@ class LLMUserAggregator(LLMContextAggregator):
        controller: UserTurnController,
        strategy: BaseUserTurnStopStrategy,
    ):
+        if not self._context_writes_await_turns:
+            # Realtime: turn frames are supplemental — they don't drive
+            # context writes. Fire the event without pushing aggregation;
+            # the trailing-write path commits the user message instead.
+            logger.debug(
+                f"{self}: User turn inference triggered (strategy: {strategy}) "
+                "[realtime: event-only, no context push]"
+            )
+            await self._call_event_handler("on_user_turn_inference_triggered", strategy)
+            return
+
        logger.debug(f"{self}: User turn inference triggered (strategy: {strategy})")

        # Push aggregation now: this writes the user message segment to
@@ -935,6 +1204,17 @@ class LLMUserAggregator(LLMContextAggregator):

        await self._user_idle_controller.process_frame(UserStoppedSpeakingFrame())

+        if not self._context_writes_await_turns:
+            # Realtime: turn frames are supplemental. The user message
+            # isn't finalized at turn-stop time — content is None.
+            # Subscribers wanting the finalized text use
+            # on_user_message_added instead.
+            message = UserTurnStoppedMessage(
+                content=None, timestamp=self._user_turn_start_timestamp
+            )
+            await self._call_event_handler("on_user_turn_stopped", strategy, message)
+            return
+
        await self._maybe_emit_user_turn_stopped(strategy)

    async def _on_reset_aggregation(
@@ -1030,6 +1310,9 @@ class LLMAssistantAggregator(LLMContextAggregator):
        context: LLMContext,
        *,
        params: LLMAssistantAggregatorParams | None = None,
+        _realtime_service_mode: RealtimeServiceModeConfig | None = None,
+        _paired_half: "LLMUserAggregator | None" = None,
+        _pair_lock: asyncio.Lock | None = None,
        **kwargs,
    ):
        """Initialize the assistant context aggregator.
@@ -1037,6 +1320,14 @@ class LLMAssistantAggregator(LLMContextAggregator):
        Args:
            context: The OpenAI LLM context for conversation storage.
            params: Configuration parameters for aggregation behavior.
+            _realtime_service_mode: Pair-internal. Realtime-mode
+                configuration propagated from
+                ``LLMContextAggregatorPair``. Not intended for direct use —
+                construct the aggregators via the pair.
+            _paired_half: Pair-internal. Back-reference to the paired
+                user aggregator for cross-half coordination.
+            _pair_lock: Pair-internal. Shared asyncio lock serializing
+                cross-half flushes.
            **kwargs: Additional arguments.
        """
        params = params or LLMAssistantAggregatorParams()
@@ -1048,6 +1339,24 @@ class LLMAssistantAggregator(LLMContextAggregator):
        )
        self._params = params

+        # Realtime-mode wiring. Defaults (no config) preserve cascade
+        # behavior: write to context on LLMFullResponseEndFrame.
+        self._realtime_service_mode = _realtime_service_mode
+        self._paired_half = _paired_half
+        self._pair_lock = _pair_lock
+        if _realtime_service_mode is not None:
+            self._context_writes_await_turns = _realtime_service_mode.context_writes_await_turns
+            self._turns_await_transcripts = _realtime_service_mode.turns_await_transcripts
+        else:
+            self._context_writes_await_turns = True
+            self._turns_await_transcripts = True
+
+        # Realtime mode only. Holds the assistant turn's content between
+        # LLMFullResponseEndFrame (the moment we mark it ready to flush)
+        # and the next user transcript (the moment we actually write it
+        # to context).
+        self._pending_assistant_message_to_flush: dict | None = None
+
        self._function_calls_in_progress: dict[str, FunctionCallInProgressFrame | None] = {}
        self._function_calls_image_results: dict[str, UserImageRawFrame] = {}
        self._context_updated_tasks: set[asyncio.Task] = set()
@@ -1084,6 +1393,7 @@ class LLMAssistantAggregator(LLMContextAggregator):

        self._register_event_handler("on_assistant_turn_started")
        self._register_event_handler("on_assistant_turn_stopped")
+        self._register_event_handler("on_assistant_message_added")
        self._register_event_handler("on_assistant_thought")
        self._register_event_handler("on_summary_applied")

@@ -1184,6 +1494,10 @@ class LLMAssistantAggregator(LLMContextAggregator):
            if self._push_context_on_bot_stopped_speaking and not self._user_speaking:
                logger.debug(f"{self}: Bot stopped speaking — pushing deferred context frame!")
                await self.push_context_frame(FrameDirection.UPSTREAM)
+        elif isinstance(frame, RealtimeServiceMetadataFrame):
+            # The user half logs the realtime-mode recommendation; the
+            # assistant half just passes the frame through.
+            await self.push_frame(frame, direction)
        else:
            await self.push_frame(frame, direction)

@@ -1192,9 +1506,37 @@ class LLMAssistantAggregator(LLMContextAggregator):
            await self._summarizer.process_frame(frame)

    async def _start(self, frame: StartFrame):
+        self._validate_realtime_pairing()
        if self._summarizer:
            await self._summarizer.setup(self.task_manager)

+    def _validate_realtime_pairing(self):
+        """Validate the realtime-mode wiring set by ``LLMContextAggregatorPair``.
+
+        Realtime mode requires both halves to be paired through the
+        ``LLMContextAggregatorPair`` so cross-half flushes can find each
+        other. Direct construction of a half with the private realtime
+        kwargs is not supported.
+        """
+        if not self._context_writes_await_turns:
+            if self._paired_half is None:
+                raise RuntimeError(
+                    f"{self}: realtime_service_mode is configured but this assistant "
+                    "aggregator has no paired user aggregator. Construct the pair "
+                    "via LLMContextAggregatorPair("
+                    "context, realtime_service_mode=RealtimeServiceModeConfig())."
+                )
+        if self._paired_half is not None:
+            if (
+                self._context_writes_await_turns != self._paired_half._context_writes_await_turns
+                or self._turns_await_transcripts != self._paired_half._turns_await_transcripts
+            ):
+                raise RuntimeError(
+                    f"{self}: realtime-mode config mismatch between user and "
+                    "assistant halves. Use LLMContextAggregatorPair to construct "
+                    "the pair so both halves share the same configuration."
+                )
+
    async def push_aggregation(self) -> str:
        """Push the current assistant aggregation with timestamp."""
        if not self._aggregation:
@@ -1247,6 +1589,12 @@ class LLMAssistantAggregator(LLMContextAggregator):

    async def _handle_end_or_cancel(self, frame: Frame):
        await self._trigger_assistant_turn_stopped(interrupted=isinstance(frame, CancelFrame))
+        if not self._context_writes_await_turns:
+            # Flush any pending assistant content parked by
+            # _realtime_trigger_assistant_turn_stopped (i.e. the bot
+            # finished its last reply but no follow-up user transcript
+            # arrived before the session ended).
+            await self._realtime_handoff_flush()
        if self._summarizer:
            await self._summarizer.cleanup()

@@ -1349,26 +1697,7 @@ class LLMAssistantAggregator(LLMContextAggregator):
                    run_llm = True

        if run_llm and not self._user_speaking:
-            if self.has_queued_frame(FunctionCallResultFrame):
-                # Another FunctionCallResultFrame is already queued. Defer the context push
-                # to bundle all results into a single LLM call instead of triggering one
-                # inference pass per result. The context will be pushed once the last
-                # function call in the queue is processed.
-                logger.debug(
-                    f"{self}: More FunctionCallResultFrames queued — deferring context frame push."
-                )
-            elif self._bot_speaking:
-                # Defer the context frame push until the bot finishes speaking. If multiple
-                # function call results arrive while the bot is speaking, they all accumulate
-                # in the context and a single push is performed once speaking stops, preventing
-                # the LLM from running multiple times and producing duplicated responses.
-                # This should be an edge case, since it would require a FunctionCallResultFrame
-                # being queued between an LLM response start and end frame.
-                logger.debug(f"{self}: Bot is speaking — deferring context frame push.")
-                self._push_context_on_bot_stopped_speaking = True
-            else:
-                logger.debug(f"{self}: Pushing context frame!")
-                await self.push_context_frame(FrameDirection.UPSTREAM)
+            await self._maybe_push_context_after_function_result()

        # Call the `on_context_updated` callback once the function call result
        # is added to the context. Also, run this in a separate task to make
@@ -1379,6 +1708,42 @@ class LLMAssistantAggregator(LLMContextAggregator):
            self._context_updated_tasks.add(task)
            task.add_done_callback(self._context_updated_task_finished)

+    async def _maybe_push_context_after_function_result(self) -> None:
+        """Decide whether to push a context frame after a function-call result.
+
+        Behavior depends on the mode. In cascade mode, LLM inference is
+        re-run by pushing an ``LLMContextFrame`` upstream, deferring the
+        push while more results are queued or the bot is still speaking
+        to avoid duplicate inference passes. Realtime services consume
+        function results directly via ``FunctionCallResultFrame``, so the
+        context-driven re-inference cycle is unnecessary.
+        """
+        if not self._context_writes_await_turns:
+            # Realtime: the realtime service has the result via
+            # FunctionCallResultFrame. No context push needed.
+            return
+
+        if self.has_queued_frame(FunctionCallResultFrame):
+            # Another FunctionCallResultFrame is already queued. Defer the context push
+            # to bundle all results into a single LLM call instead of triggering one
+            # inference pass per result. The context will be pushed once the last
+            # function call in the queue is processed.
+            logger.debug(
+                f"{self}: More FunctionCallResultFrames queued — deferring context frame push."
+            )
+        elif self._bot_speaking:
+            # Defer the context frame push until the bot finishes speaking. If multiple
+            # function call results arrive while the bot is speaking, they all accumulate
+            # in the context and a single push is performed once speaking stops, preventing
+            # the LLM from running multiple times and producing duplicated responses.
+            # This should be an edge case, since it would require a FunctionCallResultFrame
+            # being queued between an LLM response start and end frame.
+            logger.debug(f"{self}: Bot is speaking — deferring context frame push.")
+            self._push_context_on_bot_stopped_speaking = True
+        else:
+            logger.debug(f"{self}: Pushing context frame!")
+            await self.push_context_frame(FrameDirection.UPSTREAM)
+
    async def _handle_function_call_intermediate_result(
        self, frame: FunctionCallResultFrame, in_progress_frame: FunctionCallInProgressFrame
    ):
@@ -1469,6 +1834,20 @@ class LLMAssistantAggregator(LLMContextAggregator):
            )

    async def _handle_llm_start(self, _: LLMFullResponseStartFrame):
+        if not self._context_writes_await_turns:
+            await self._realtime_handle_llm_start()
+            return
+        await self._trigger_assistant_turn_started()
+
+    async def _realtime_handle_llm_start(self):
+        """Realtime: flush the paired user half before starting the assistant turn.
+
+        The first content frame of an assistant turn is the trigger to
+        commit any in-flight user transcript to context.
+        """
+        if self._paired_half is not None and self._pair_lock is not None:
+            async with self._pair_lock:
+                await self._paired_half._realtime_handoff_flush()
        await self._trigger_assistant_turn_started()

    async def _handle_llm_end(self, _: LLMFullResponseEndFrame):
@@ -1606,6 +1985,10 @@ class LLMAssistantAggregator(LLMContextAggregator):
        await self._call_event_handler("on_assistant_turn_started")

    async def _trigger_assistant_turn_stopped(self, *, interrupted: bool = False):
+        if not self._context_writes_await_turns:
+            await self._realtime_trigger_assistant_turn_stopped(interrupted=interrupted)
+            return
+
        if not self._assistant_turn_start_timestamp:
            return

@@ -1620,9 +2003,86 @@ class LLMAssistantAggregator(LLMContextAggregator):
            timestamp=self._assistant_turn_start_timestamp,
        )
        await self._call_event_handler("on_assistant_turn_stopped", message)
+        if aggregation:
+            await self._call_event_handler("on_assistant_message_added", message)

        self._assistant_turn_start_timestamp = ""

+    async def _realtime_trigger_assistant_turn_stopped(self, *, interrupted: bool):
+        """Realtime variant: defer the context write or flush on interruption.
+
+        A normal end-of-turn (``interrupted=False``, from
+        ``LLMFullResponseEndFrame``) parks the message text in a pending
+        slot; it isn't written to context until the next user transcript
+        arrives or the session ends. An interruption (``interrupted=True``)
+        commits immediately, matching the existing
+        ``AssistantTurnStoppedMessage.interrupted`` semantics.
+        """
+        if not self._assistant_turn_start_timestamp:
+            return
+
+        timestamp = self._assistant_turn_start_timestamp
+        self._assistant_turn_start_timestamp = ""
+
+        if interrupted:
+            aggregation = await self.push_aggregation()
+            if aggregation:
+                aggregation = self._maybe_strip_turn_completion_markers(aggregation)
+            message = AssistantTurnStoppedMessage(
+                content=aggregation, interrupted=True, timestamp=timestamp
+            )
+            await self._call_event_handler("on_assistant_turn_stopped", message)
+            if aggregation:
+                await self._call_event_handler("on_assistant_message_added", message)
+            return
+
+        # Normal end. Park the message for trailing write.
+        raw_aggregation = self.aggregation_string()
+        if raw_aggregation:
+            self._pending_assistant_message_to_flush = {
+                "raw": raw_aggregation,
+                "timestamp": timestamp,
+            }
+        await self.reset()
+        stripped = (
+            self._maybe_strip_turn_completion_markers(raw_aggregation) if raw_aggregation else ""
+        )
+        message = AssistantTurnStoppedMessage(
+            content=stripped, interrupted=False, timestamp=timestamp
+        )
+        await self._call_event_handler("on_assistant_turn_stopped", message)
+
+    async def _realtime_handoff_flush(self) -> None:
+        """Flush pending assistant aggregation to context.
+
+        Called by the paired user half from
+        ``_realtime_handle_transcription`` when a new transcript arrives,
+        committing the assistant's deferred message before the user's
+        new turn, and from ``_handle_end_or_cancel`` when the session
+        ends. No-op when nothing is pending.
+        """
+        if self._pending_assistant_message_to_flush is None:
+            return
+        pending = self._pending_assistant_message_to_flush
+        self._pending_assistant_message_to_flush = None
+
+        raw = pending["raw"]
+        timestamp = pending["timestamp"]
+
+        # Mirror push_aggregation: write the raw aggregation (with any
+        # turn-completion markers intact) to context, emit LLMContextFrame
+        # and the timestamp frame. Markers are stripped only from the
+        # event-carried text.
+        self._context.add_message({"role": "assistant", "content": raw})
+        await self.push_context_frame()
+        timestamp_frame = LLMContextAssistantTimestampFrame(timestamp=time_now_iso8601())
+        await self.push_frame(timestamp_frame)
+
+        stripped = self._maybe_strip_turn_completion_markers(raw)
+        message = AssistantTurnStoppedMessage(
+            content=stripped, interrupted=False, timestamp=timestamp
+        )
+        await self._call_event_handler("on_assistant_message_added", message)
+
    def _maybe_strip_turn_completion_markers(self, text: str) -> str:
        """Strip turn completion markers from assistant transcript.

@@ -1685,6 +2145,7 @@ class LLMContextAggregatorPair:
        user_params: LLMUserAggregatorParams | None = None,
        assistant_params: LLMAssistantAggregatorParams | None = None,
        add_tool_change_messages: bool | None = None,
+        realtime_service_mode: RealtimeServiceModeConfig | None = None,
    ):
        """Initialize the LLM context aggregator pair.

@@ -1702,14 +2163,38 @@ class LLMContextAggregatorPair:
                announcement is added exactly once (the second aggregator's
                diff is empty by the time it sees the frame). Leave as
                ``None`` to respect per-params settings.
+            realtime_service_mode: When provided, configures the pair for
+                use with a realtime (speech-to-speech) LLM service.
+                Context writes become trailing — driven by the content
+                stream itself (transcripts, ``LLMFullResponseStartFrame``)
+                rather than turn frames — and, by default, turn-end
+                strategies stop waiting for transcripts. Both halves share
+                this configuration via a private channel; mismatched
+                halves are rejected at ``StartFrame``. Defaults to
+                ``None``, which preserves cascade behavior.
        """
        user_params = user_params or LLMUserAggregatorParams()
        assistant_params = assistant_params or LLMAssistantAggregatorParams()
        if add_tool_change_messages is not None:
            user_params.add_tool_change_messages = add_tool_change_messages
            assistant_params.add_tool_change_messages = add_tool_change_messages
-        self._user = LLMUserAggregator(context, params=user_params)
-        self._assistant = LLMAssistantAggregator(context, params=assistant_params)
+
+        pair_lock = asyncio.Lock() if realtime_service_mode is not None else None
+        self._user = LLMUserAggregator(
+            context,
+            params=user_params,
+            _realtime_service_mode=realtime_service_mode,
+            _pair_lock=pair_lock,
+        )
+        self._assistant = LLMAssistantAggregator(
+            context,
+            params=assistant_params,
+            _realtime_service_mode=realtime_service_mode,
+            _pair_lock=pair_lock,
+        )
+        # Wire the cross-half back-references after both halves exist.
+        self._user._paired_half = self._assistant
+        self._assistant._paired_half = self._user

    def user(self) -> LLMUserAggregator:
        """Get the user context aggregator.
--- a/src/pipecat/services/aws/nova_sonic/llm.py
+++ b/src/pipecat/services/aws/nova_sonic/llm.py
@@ -56,7 +56,7 @@ from pipecat.services.aws.nova_sonic.session_continuation import (
    SessionContinuationHelper,
    SessionContinuationParams,
 )
-from pipecat.services.llm_service import LLMService
+from pipecat.services.llm_service import LLMService, RealtimeServiceInfo
 from pipecat.services.settings import NOT_GIVEN, LLMSettings, _NotGiven, assert_given
 from pipecat.utils.time import time_now_iso8601

@@ -241,6 +241,17 @@ class AWSNovaSonicLLMService(LLMService[AWSNovaSonicLLMAdapter]):

    Provides bidirectional audio streaming, real-time transcription, text generation,
    and function calling capabilities using AWS Nova Sonic model.
+
+    Does NOT emit ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``,
+    so pipeline processors that depend on those frames — RTVI client
+    speech events, ``TurnTrackingObserver``, ``AudioBufferProcessor`` turn
+    recording, ``UserIdleController``, user mute strategies, voicemail
+    detector — won't activate with the default server-VAD-only setup. Pair
+    with ``LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())``
+    so context writes stay correct without those frames. To produce the turn frames
+    locally, wire ``vad_analyzer=SileroVADAnalyzer()`` (or similar) into
+    ``LLMUserAggregatorParams``; locally-generated turn boundaries are a
+    heuristic and may not match Nova Sonic's server-side turn decisions.
    """

    Settings = AWSNovaSonicLLMSettings
@@ -249,6 +260,10 @@ class AWSNovaSonicLLMService(LLMService[AWSNovaSonicLLMAdapter]):
    # Override the default adapter to use the AWSNovaSonicLLMAdapter one
    adapter_class = AWSNovaSonicLLMAdapter

+    # Realtime (speech-to-speech) service. Does NOT emit
+    # UserStarted/StoppedSpeakingFrame from server-side turn signals.
+    _realtime_service_info = RealtimeServiceInfo(emits_user_turn_frames=False)
+
    def __init__(
        self,
        *,
@@ -1428,9 +1443,15 @@ class AWSNovaSonicLLMService(LLMService[AWSNovaSonicLLMAdapter]):
                        if self._sc.on_content_end_assistant_final_text(content.text_content):
                            self.create_task(self._run_sc_handoff(), name="sc_handoff")
                else:
+                    # FINAL TEXT INTERRUPTED is the canonical barge-in
+                    # signal. The AUDIO branch has usually closed the
+                    # response already (on barge-in, AUDIO contentEnd
+                    # arrives first, with END_TURN), but the output
+                    # transport's audio buffer is still draining, so
+                    # broadcast unconditionally to clear it.
+                    await self.broadcast_interruption()
                    if self._assistant_is_responding:
-                        # TEXT INTERRUPTED before audio started means no AUDIO
-                        # contentEnd will arrive — end the response here.
+                        # No AUDIO contentEnd will arrive — close here.
                        self._assistant_is_responding = False
                        await self._report_assistant_response_ended()
                    # Session continuation: TEXT INTERRUPTED is a completion
@@ -1443,6 +1464,18 @@ class AWSNovaSonicLLMService(LLMService[AWSNovaSonicLLMAdapter]):
                if stop_reason in ("END_TURN", "INTERRUPTED"):
                    # END_TURN: normal completion. INTERRUPTED: user interrupted
                    # mid-audio. Both mean no more audio for this turn.
+                    if stop_reason == "INTERRUPTED":
+                        # Emit InterruptionFrame upstream so the assistant
+                        # aggregator marks the message interrupted=True, and
+                        # downstream so BaseOutputTransport clears the audio
+                        # buffer (without this the bot keeps talking past the
+                        # interruption while the buffer drains, since Nova
+                        # Sonic doesn't surface server-side interruption any
+                        # other way). Must fire before
+                        # _report_assistant_response_ended so the aggregator
+                        # handles InterruptionFrame before LLMFullResponseEndFrame
+                        # closes the turn.
+                        await self.broadcast_interruption()
                    self._assistant_is_responding = False
                    await self._report_assistant_response_ended()
        elif content.role == Role.USER:
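The realtime-mode write ordering introduced in the `LLMAssistantAggregator` hunks above (park the assistant text on a normal end-of-turn, flush it when the next user transcript arrives or the session ends, commit immediately on interruption) can be modeled in isolation. The following is a minimal, self-contained sketch, not Pipecat's API: `TrailingContextWriter` and its methods are hypothetical names, and the context is a plain list of role/content dicts rather than an `LLMContext`.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class TrailingContextWriter:
    """Toy model of realtime-mode context writes (hypothetical, not Pipecat's API)."""

    messages: list[dict] = field(default_factory=list)
    # Shared lock, loosely mirroring the asyncio.Lock the pair creates in realtime mode.
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    pending_assistant: str | None = None

    async def on_assistant_end(self, text: str, *, interrupted: bool) -> None:
        if interrupted:
            # Interrupted turn: write to context immediately.
            self.messages.append({"role": "assistant", "content": text})
        else:
            # Normal end of turn: park the text for a trailing write.
            self.pending_assistant = text

    async def flush_assistant(self) -> None:
        # No-op when nothing is parked.
        if self.pending_assistant is None:
            return
        text, self.pending_assistant = self.pending_assistant, None
        self.messages.append({"role": "assistant", "content": text})

    async def on_user_transcript(self, text: str) -> None:
        # Commit the parked assistant reply before the new user message.
        async with self.lock:
            await self.flush_assistant()
            self.messages.append({"role": "user", "content": text})

    async def on_session_end(self) -> None:
        # Session ended with a reply still parked: flush it.
        await self.flush_assistant()


async def main() -> list[dict]:
    writer = TrailingContextWriter()
    await writer.on_user_transcript("Hi")
    await writer.on_assistant_end("Hello! How can I help?", interrupted=False)
    # The reply is parked, so only the user message is in context so far.
    assert [m["role"] for m in writer.messages] == ["user"]
    await writer.on_user_transcript("What's the weather?")
    await writer.on_assistant_end("It's sunny and", interrupted=True)
    await writer.on_assistant_end("Anything else?", interrupted=False)
    await writer.on_session_end()
    return writer.messages


if __name__ == "__main__":
    for message in asyncio.run(main()):
        print(message)
```

Running it prints five messages in conversation order; the first assistant reply reaches the list only after the second user transcript arrives, and the last one only at session end.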
--- a/src/pipecat/services/google/gemini_live/llm.py
+++ b/src/pipecat/services/google/gemini_live/llm.py
@@ -62,7 +62,7 @@ from pipecat.processors.aggregators.llm_context import LLMContext, LLMSpecificMe
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.services.google.frames import LLMSearchOrigin, LLMSearchResponseFrame, LLMSearchResult
 from pipecat.services.google.utils import update_google_client_http_options
-from pipecat.services.llm_service import FunctionCallFromLLM, LLMService
+from pipecat.services.llm_service import FunctionCallFromLLM, LLMService, RealtimeServiceInfo
 from pipecat.services.settings import NOT_GIVEN, LLMSettings, _NotGiven, assert_given
 from pipecat.transcriptions.language import Language, resolve_language
 from pipecat.utils.string import match_endofsentence
@@ -361,6 +361,18 @@ class GeminiLiveLLMService(LLMService[GeminiLLMAdapter]):
    This service enables real-time conversations with Gemini, supporting both
    text and audio modalities. It handles voice transcription, streaming audio
    responses, and tool usage.
+
+    Does NOT emit ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``
+    (the API exposes an ``interrupted`` event but no turn-start/-end), so
+    pipeline processors that depend on those frames — RTVI client speech
+    events, ``TurnTrackingObserver``, ``AudioBufferProcessor`` turn
+    recording, ``UserIdleController``, user mute strategies, voicemail
+    detector — won't activate with the default server-VAD-only setup. Pair
+    with ``LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())``
+    so context writes stay correct without those frames. To produce the turn frames
+    locally, see ``examples/realtime/realtime-gemini-live-locally-driven-turns.py``;
+    note that locally-generated turn boundaries are a heuristic and may
+    not match Gemini Live's server-side turn decisions.
    """

    Settings = GeminiLiveLLMSettings
@@ -369,6 +381,11 @@ class GeminiLiveLLMService(LLMService[GeminiLLMAdapter]):
    # Overriding the default adapter to use the Gemini one.
    adapter_class = GeminiLLMAdapter

+    # Realtime (speech-to-speech) service. Does NOT emit
+    # UserStarted/StoppedSpeakingFrame from server-side turn signals —
+    # the API exposes an `interrupted` event but no turn-start/-end.
+    _realtime_service_info = RealtimeServiceInfo(emits_user_turn_frames=False)
+
    @property
    def _is_gemini_3(self) -> bool:
        """Check if the current model is a Gemini 3.x model."""
--- a/src/pipecat/services/inworld/realtime/llm.py
+++ b/src/pipecat/services/inworld/realtime/llm.py
@@ -51,7 +51,7 @@ from pipecat.metrics.metrics import LLMTokenUsage
 from pipecat.processors.aggregators import async_tool_messages
 from pipecat.processors.aggregators.llm_context import LLMContext, LLMSpecificMessage
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.llm_service import FunctionCallFromLLM, LLMService
+from pipecat.services.llm_service import FunctionCallFromLLM, LLMService, RealtimeServiceInfo
 from pipecat.services.settings import (
    NOT_GIVEN,
    LLMSettings,
@@ -201,6 +201,16 @@ class InworldRealtimeLLMService(LLMService[InworldRealtimeLLMAdapter]):
    Supports function calling, conversation management, and real-time
    transcription.

+    Emits ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame`` from
+    Inworld's server-side VAD events. Pair with
+    ``LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())``
+    so context writes are decoupled from those frames. If you wire local
+    VAD (``LLMUserAggregatorParams.vad_analyzer``) on top of this
+    service, disable Inworld's server-side turn detection first via
+    ``turn_detection=None`` (manual mode); otherwise both sources
+    broadcast duplicate user-turn frames. See
+    ``examples/realtime/realtime-inworld-locally-driven-turns.py``.
+
    Example::

        llm = InworldRealtimeLLMService(
@@ -245,6 +255,10 @@ class InworldRealtimeLLMService(LLMService[InworldRealtimeLLMAdapter]):

    adapter_class = InworldRealtimeLLMAdapter

+    # Realtime (speech-to-speech) service. Emits UserStarted/Stopped
+    # speaking frames from server-side VAD events.
+    _realtime_service_info = RealtimeServiceInfo(emits_user_turn_frames=True)
+
    # Target ~60ms audio chunks when sending to Inworld (16-bit mono).
    _AUDIO_CHUNK_TARGET_MS = 60

@@ -417,12 +431,25 @@ class InworldRealtimeLLMService(LLMService[InworldRealtimeLLMAdapter]):
            return rate
        return getattr(self, "_output_sample_rate", 24000)

+    def _is_manual_turn_detection(self) -> bool:
+        """Whether server-side turn detection is disabled (manual mode)."""
+        session_properties = assert_given(self._settings.session_properties)
+        return bool(
+            session_properties.audio
+            and session_properties.audio.input
+            and session_properties.audio.input.turn_detection is None
+        )
+
    async def _handle_interruption(self):
        """Handle user interruption of assistant speech.

-        Inworld's server-side VAD handles response cancellation and buffer
-        cleanup automatically, so we only need to clean up local state.
+        Server-side VAD handles response cancellation and buffer cleanup
+        automatically; in manual mode the client must send the cancel
+        and clear events explicitly.
        """
+        if self._is_manual_turn_detection():
+            await self.send_client_event(events.InputAudioBufferClearEvent())
+            await self.send_client_event(events.ResponseCancelEvent())
        await self._truncate_current_audio_response()
        await self.stop_all_metrics()

@@ -437,10 +464,16 @@ class InworldRealtimeLLMService(LLMService[InworldRealtimeLLMAdapter]):
    async def _handle_user_stopped_speaking(self, frame):
        """Handle user stopped speaking event.

-        Inworld's server-side VAD handles commit and response creation,
-        so this is a no-op. Metrics are started in _handle_evt_speech_stopped.
+        Server-side VAD handles commit and response creation
+        automatically; in manual mode the client must send them
+        explicitly. Metrics are started in _handle_evt_speech_stopped
+        in the server-VAD path.
        """
-        pass
+        if self._is_manual_turn_detection():
+            await self.start_ttfb_metrics()
+            await self.start_processing_metrics()
+            await self.send_client_event(events.InputAudioBufferCommitEvent())
+            await self.send_client_event(events.ResponseCreateEvent())

    async def _handle_bot_stopped_speaking(self):
        """Handle bot stopped speaking event."""
--- a/src/pipecat/services/llm_service.py
+++ b/src/pipecat/services/llm_service.py
@@ -16,6 +16,7 @@ from collections.abc import Awaitable, Callable, Mapping, Sequence
 from dataclasses import dataclass
 from typing import (
    Any,
+    ClassVar,
    Generic,
    Protocol,
    cast,
@@ -48,6 +49,7 @@ from pipecat.frames.frames import (
    LLMFullResponseStartFrame,
    LLMTextFrame,
    LLMUpdateSettingsFrame,
+    RealtimeServiceMetadataFrame,
    StartFrame,
 )
 from pipecat.processors.aggregators.llm_context import (
@@ -97,6 +99,31 @@ class FunctionCallResultCallback(Protocol):
        ...


+@dataclass(frozen=True)
+class RealtimeServiceInfo:
+    """Per-service metadata for realtime (speech-to-speech) LLM services.
+
+    Realtime LLM subclasses set ``LLMService._realtime_service_info`` to a
+    populated instance; the presence of a non-None value is what marks a
+    service as realtime. Non-realtime services keep the default ``None``.
+
+    Carries the configuration ``LLMService`` and
+    ``LLMContextAggregatorPair`` need to wire up realtime behavior:
+    auto-broadcasting ``RealtimeServiceMetadataFrame`` at start, the
+    startup INFO log for services with no server-side turn signals, and
+    the aggregator's one-time recommendation log.
+
+    Parameters:
+        emits_user_turn_frames: Whether the service emits
+            ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``
+            from server-side turn signals. False for services with no
+            server-side turn signals (e.g. Gemini Live, AWS Nova Sonic,
+            Ultravox).
+    """
+
+    emits_user_turn_frames: bool = True
+
+
@dataclass
 class FunctionCallParams:
    """Parameters for a function call.
@@ -244,6 +271,15 @@ class LLMService(UserTurnCompletionLLMServiceMixin, AIService, Generic[TAdapter]
    # However, subclasses should override this with a more specific adapter when necessary.
    adapter_class: type[BaseLLMAdapter] = OpenAILLMAdapter

+    # Marker + per-service config for realtime (speech-to-speech) LLM
+    # services. Realtime subclasses override this with a populated
+    # ``RealtimeServiceInfo`` instance — the presence of a non-None value
+    # is what marks the service as realtime. Non-realtime services keep
+    # the default ``None`` and the realtime-specific machinery
+    # (auto-broadcast of ``RealtimeServiceMetadataFrame``, startup INFO
+    # log for services without server-side turn signals) stays inert.
+    _realtime_service_info: ClassVar[RealtimeServiceInfo | None] = None
+
    # Returned to the LLM as the tool result when an unavailable function is
    # called. Deliberately neutral about future availability so the LLM can
    # pick the function up again if it returns (e.g. via the
@@ -363,6 +399,21 @@ class LLMService(UserTurnCompletionLLMServiceMixin, AIService, Generic[TAdapter]
            await self._create_sequential_runner_task()
        if self._enable_async_tool_cancellation and self._has_async_tools():
            self._setup_async_tool_cancellation()
+        if (
+            self._realtime_service_info is not None
+            and not self._realtime_service_info.emits_user_turn_frames
+        ):
+            logger.info(
+                f"{self} does not emit UserStartedSpeakingFrame/"
+                "UserStoppedSpeakingFrame. Pipeline processors that depend on "
+                "these frames (RTVI client speech events, TurnTrackingObserver, "
+                "AudioBufferProcessor turn recording, UserIdleController, user "
+                "mute strategies, voicemail detector) will not activate. To "
+                "produce them locally, add `vad_analyzer=` to "
+                "LLMUserAggregatorParams. Note: local turn detection may not "
+                "match the provider's actual server-side turn decisions and "
+                "can desynchronize in subtle ways."
+            )

    async def stop(self, frame: EndFrame):
        """Stop the LLM service.
@@ -495,6 +546,23 @@ class LLMService(UserTurnCompletionLLMServiceMixin, AIService, Generic[TAdapter]

        await super().push_frame(frame, direction)

+        # Broadcast realtime-service metadata immediately after the
+        # StartFrame propagates downstream, mirroring the order STT
+        # services use for STTMetadataFrame. The aggregator (upstream)
+        # already received its own StartFrame and is ready to process
+        # the broadcast; downstream processors see StartFrame then the
+        # metadata in their queues.
+        if (
+            self._realtime_service_info is not None
+            and isinstance(frame, StartFrame)
+            and direction == FrameDirection.DOWNSTREAM
+        ):
+            await self.broadcast_frame(
+                RealtimeServiceMetadataFrame,
+                service_name=self.name,
+                emits_user_turn_frames=self._realtime_service_info.emits_user_turn_frames,
+            )
+
    async def _push_llm_text(self, text: str):
        """Push LLM text, using turn completion detection if enabled.

--- a/src/pipecat/services/openai/realtime/llm.py
+++ b/src/pipecat/services/openai/realtime/llm.py
@@ -51,7 +51,7 @@ from pipecat.metrics.metrics import LLMTokenUsage
 from pipecat.processors.aggregators import async_tool_messages
 from pipecat.processors.aggregators.llm_context import LLMContext, LLMSpecificMessage
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.llm_service import FunctionCallFromLLM, LLMService
+from pipecat.services.llm_service import FunctionCallFromLLM, LLMService, RealtimeServiceInfo
 from pipecat.services.openai._constants import OPENAI_REALTIME_WHISPER_MODEL, OPENAI_SAMPLE_RATE
 from pipecat.services.settings import (
    NOT_GIVEN,
@@ -204,6 +204,21 @@ class OpenAIRealtimeLLMService(LLMService[OpenAIRealtimeLLMAdapter]):
    Implements the OpenAI Realtime API with WebSocket communication for low-latency
    bidirectional audio and text interactions. Supports function calling, conversation
    management, and real-time transcription.
+
+    Emits ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame`` from
+    OpenAI's server-side VAD events, so pipeline processors that depend on
+    those frames (RTVI client speech events, ``TurnTrackingObserver``,
+    ``AudioBufferProcessor`` turn recording, ``UserIdleController``, user
+    mute strategies, voicemail detector) work out of the box. Pair with
+    ``LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())``
+    so context writes are decoupled from those frames; see the
+    ``examples/realtime/realtime-openai.py`` example.
+
+    If you wire local VAD (``LLMUserAggregatorParams.vad_analyzer``) on
+    top of this service, disable OpenAI's server-side turn detection
+    first (``turn_detection=False``); otherwise both sources broadcast
+    duplicate user-turn frames. See
+    ``examples/realtime/realtime-openai-locally-driven-turns.py``.
    """

    Settings = OpenAIRealtimeLLMSettings
@@ -212,6 +227,10 @@ class OpenAIRealtimeLLMService(LLMService[OpenAIRealtimeLLMAdapter]):
    # Overriding the default adapter to use the OpenAIRealtimeLLMAdapter one.
    adapter_class = OpenAIRealtimeLLMAdapter

+    # Realtime (speech-to-speech) service. Emits UserStarted/Stopped
+    # speaking frames from server-side VAD events.
+    _realtime_service_info = RealtimeServiceInfo(emits_user_turn_frames=True)
+
    def __init__(
        self,
        *,
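The manual-mode gating added to `InworldRealtimeLLMService` in this diff reduces to a small decision: with server-side turn detection enabled, the client sends nothing extra; with `turn_detection=None`, the client must commit the buffer and request a response when the user stops speaking, and clear the buffer and cancel the response on interruption. The sketch below models that decision with hypothetical, flattened dataclasses (`SessionProperties`, `AudioInput`) instead of the service's real settings types, and assumes OpenAI-Realtime-style event type strings as stand-ins for the `events.*Event` classes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AudioInput:
    # Hypothetical stand-in: None means server-side turn detection is off.
    turn_detection: dict | None


@dataclass
class SessionProperties:
    # Hypothetical, flattened stand-in for session_properties.audio.input.
    audio_input: AudioInput | None = None


def is_manual_turn_detection(props: SessionProperties) -> bool:
    # Manual only when an input config exists and turn_detection is None.
    return props.audio_input is not None and props.audio_input.turn_detection is None


def events_on_user_stopped(props: SessionProperties) -> list[str]:
    if not is_manual_turn_detection(props):
        return []  # Server VAD commits and creates the response itself.
    return ["input_audio_buffer.commit", "response.create"]


def events_on_interruption(props: SessionProperties) -> list[str]:
    if not is_manual_turn_detection(props):
        return []  # Server VAD clears the buffer and cancels the response itself.
    return ["input_audio_buffer.clear", "response.cancel"]


if __name__ == "__main__":
    manual = SessionProperties(AudioInput(turn_detection=None))
    print(events_on_user_stopped(manual))
    print(events_on_interruption(manual))
```

As in the diff, a missing input-audio configuration is treated as server-driven rather than manual, so only an explicit `turn_detection=None` switches the client into sending these events.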
--- a/src/pipecat/services/ultravox/llm.py
+++ b/src/pipecat/services/ultravox/llm.py
@@ -48,7 +48,7 @@ from pipecat.frames.frames import (
 from pipecat.processors.aggregators import async_tool_messages
 from pipecat.processors.aggregators.llm_context import LLMContext, LLMSpecificMessage
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.llm_service import FunctionCallFromLLM, LLMService
+from pipecat.services.llm_service import FunctionCallFromLLM, LLMService, RealtimeServiceInfo
 from pipecat.services.settings import NOT_GIVEN, LLMSettings, _NotGiven, assert_given
 from pipecat.utils.time import time_now_iso8601

@@ -174,11 +174,26 @@ class UltravoxRealtimeLLMService(LLMService):

    Note: Ultravox is an audio-native model, so voice transcriptions are not used
    by the model and may not always align with its understanding of user input.
+
+    Does NOT emit ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame``,
+    so pipeline processors that depend on those frames (RTVI client
+    speech events, ``TurnTrackingObserver``, ``AudioBufferProcessor`` turn
+    recording, ``UserIdleController``, user mute strategies, the voicemail
+    detector) won't activate with the default server-VAD-only setup. Pair
+    with ``LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())``
+    so context writes stay correct without those frames. To produce the turn frames
+    locally, wire ``vad_analyzer=SileroVADAnalyzer()`` (or similar) into
+    ``LLMUserAggregatorParams``; locally-generated turn boundaries are a
+    heuristic and may not match Ultravox's server-side turn decisions.
    """

    Settings = UltravoxRealtimeLLMSettings
    _settings: Settings

+    # Realtime (speech-to-speech) service. Does NOT emit
+    # UserStarted/StoppedSpeakingFrame from server-side turn signals.
+    _realtime_service_info = RealtimeServiceInfo(emits_user_turn_frames=False)
+
    def __init__(
        self,
        *,
@@ -600,6 +615,18 @@ class UltravoxRealtimeLLMService(LLMService):
                    case "state":
                        if self._bot_responding and data.get("state") != "speaking":
                            await self._handle_response_end()
+                    case "playback_clear_buffer":
+                        # Server signals that the user interrupted the bot
+                        # mid-speech and any buffered output audio should be
+                        # dropped. Broadcast InterruptionFrame so the assistant
+                        # aggregator records the message with interrupted=True
+                        # (upstream) and BaseOutputTransport clears its audio
+                        # buffer (downstream). The subsequent "state" message
+                        # transitioning off "speaking" is what closes the
+                        # response via _handle_response_end; firing the
+                        # interruption first ensures the aggregator handles
+                        # InterruptionFrame before LLMFullResponseEndFrame.
+                        await self.broadcast_interruption()
                    case "client_tool_invocation":
                        await self._handle_tool_invocation(
                            data.get("toolName"), data.get("invocationId"), data.get("parameters")
--- a/src/pipecat/services/xai/realtime/llm.py
+++ b/src/pipecat/services/xai/realtime/llm.py
@@ -50,7 +50,7 @@ from pipecat.metrics.metrics import LLMTokenUsage
 from pipecat.processors.aggregators import async_tool_messages
 from pipecat.processors.aggregators.llm_context import LLMContext, LLMSpecificMessage
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.llm_service import FunctionCallFromLLM, LLMService
+from pipecat.services.llm_service import FunctionCallFromLLM, LLMService, RealtimeServiceInfo
 from pipecat.services.settings import (
    NOT_GIVEN,
    LLMSettings,
@@ -195,6 +195,16 @@ class GrokRealtimeLLMService(LLMService[GrokRealtimeLLMAdapter]):
        - Built-in tools (web_search, x_search, file_search)
        - Custom function calling
        - Server-side VAD (Voice Activity Detection)
+
+    Emits ``UserStartedSpeakingFrame`` / ``UserStoppedSpeakingFrame`` from
+    Grok's server-side VAD events. Pair with
+    ``LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())``
+    so context writes are decoupled from those frames. If you wire local
+    VAD (``LLMUserAggregatorParams.vad_analyzer``) on top of this
+    service, disable Grok's server-side turn detection first via
+    ``turn_detection=None`` (manual mode); otherwise both sources
+    broadcast duplicate user-turn frames. See
+    ``examples/realtime/realtime-grok-locally-driven-turns.py``.
    """

    Settings = GrokRealtimeLLMSettings
@@ -203,6 +213,10 @@ class GrokRealtimeLLMService(LLMService[GrokRealtimeLLMAdapter]):
    # Use the Grok-specific adapter
    adapter_class = GrokRealtimeLLMAdapter

+    # Realtime (speech-to-speech) service. Emits UserStarted/Stopped
+    # speaking frames from server-side VAD events.
+    _realtime_service_info = RealtimeServiceInfo(emits_user_turn_frames=True)
+
    def __init__(
        self,
        *,
--- a/src/pipecat/turns/user_stop/speech_timeout_user_turn_stop_strategy.py
+++ b/src/pipecat/turns/user_stop/speech_timeout_user_turn_stop_strategy.py
@@ -45,16 +45,35 @@ class SpeechTimeoutUserTurnStopStrategy(BaseUserTurnStopStrategy):
    transcript — so the stt wait is marked done immediately.
    """

-    def __init__(self, *, user_speech_timeout: float = 0.6, **kwargs):
+    def __init__(
+        self,
+        *,
+        user_speech_timeout: float = 0.6,
+        wait_for_transcript: bool = True,
+        **kwargs,
+    ):
        """Initialize the speech timeout-based user turn stop strategy.

        Args:
            user_speech_timeout: Time to wait for the user to potentially
                say more after they pause speaking. Defaults to 0.6 seconds.
+            wait_for_transcript: Whether to require at least one transcript
+                before triggering end-of-turn. When True (default), turn-end
+                fires only after the user-speech timer expires *and* at least
+                one transcript has been received. When False, the strategy
+                signals turn-end as soon as VAD reports end of speech and the
+                user-speech timer has elapsed — independent of transcripts.
+                Set this to False when local turn detection is the intended
+                driver of the conversation (e.g. with a realtime LLM service
+                consuming audio directly), so transcripts are off the latency
+                critical path. ``LLMContextAggregatorPair`` flips this for
+                you when ``realtime_service_mode`` is configured with
+                ``turns_await_transcripts=False``.
            **kwargs: Additional keyword arguments.
        """
        super().__init__(**kwargs)
        self._user_speech_timeout = user_speech_timeout
+        self._wait_for_transcript = wait_for_transcript
        self._stt_timeout: float = 0.0  # STT P99 latency from STTMetadataFrame
        self._stop_secs: float = 0.0  # VAD stop_secs from VADUserStoppedSpeakingFrame
        self._stop_secs_warned: bool = False
@@ -69,6 +88,15 @@ class SpeechTimeoutUserTurnStopStrategy(BaseUserTurnStopStrategy):
        self._user_speech_wait_done: bool = False
        self._stt_wait_done: bool = False

+    @property
+    def wait_for_transcript(self) -> bool:
+        """Whether transcripts gate end-of-turn signalling."""
+        return self._wait_for_transcript
+
+    @wait_for_transcript.setter
+    def wait_for_transcript(self, value: bool) -> None:
+        self._wait_for_transcript = value
+
    async def reset(self):
        """Reset the strategy to its initial state."""
        await super().reset()
@@ -252,10 +280,14 @@ class SpeechTimeoutUserTurnStopStrategy(BaseUserTurnStopStrategy):

        Both timers must be done (stt is marked done immediately on the
        fallback path and when finalization short-circuits the safety net),
-        the user must not be currently speaking, and at least one transcript
-        must have been received.
+        the user must not be currently speaking, and — when
+        ``wait_for_transcript`` is True — at least one transcript must
+        have been received.
        """
-        if self._vad_user_speaking or not self._text:
+        if self._vad_user_speaking:
+            return
+
+        if self._wait_for_transcript and not self._text:
            return

        if self._user_speech_wait_done and self._stt_wait_done:
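The gating in the hunk above can be restated as a pure function. `should_signal_turn_end` is a hypothetical helper, not part of Pipecat. It models only the checks shown here, not the timers that set `user_speech_wait_done` and `stt_wait_done`.

```python
def should_signal_turn_end(
    *,
    vad_user_speaking: bool,
    text: str,
    wait_for_transcript: bool,
    user_speech_wait_done: bool,
    stt_wait_done: bool,
) -> bool:
    """Pure-function mirror of the trigger checks in the hunk above."""
    # Never end the turn while VAD still reports the user as speaking.
    if vad_user_speaking:
        return False
    # Transcripts only gate the decision when wait_for_transcript is True.
    if wait_for_transcript and not text:
        return False
    # Both the user-speech timer and the STT wait must have completed.
    return user_speech_wait_done and stt_wait_done


timers_done = dict(vad_user_speaking=False, user_speech_wait_done=True, stt_wait_done=True)
print(should_signal_turn_end(text="", wait_for_transcript=True, **timers_done))  # False
print(should_signal_turn_end(text="", wait_for_transcript=False, **timers_done))  # True
```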
--- a/src/pipecat/turns/user_stop/turn_analyzer_user_turn_stop_strategy.py
+++ b/src/pipecat/turns/user_stop/turn_analyzer_user_turn_stop_strategy.py
@@ -44,15 +44,35 @@ class TurnAnalyzerUserTurnStopStrategy(BaseUserTurnStopStrategy):
    as a fallback.
    """

-    def __init__(self, *, turn_analyzer: BaseTurnAnalyzer, **kwargs):
+    def __init__(
+        self,
+        *,
+        turn_analyzer: BaseTurnAnalyzer,
+        wait_for_transcript: bool = True,
+        **kwargs,
+    ):
        """Initialize the user turn stop strategy.

        Args:
            turn_analyzer: The turn detection analyzer instance to detect end of user turn.
+            wait_for_transcript: Whether to require a transcript before
+                triggering end-of-turn. When True (default), turn-end fires
+                only after the turn analyzer reports COMPLETE *and* either a
+                finalized transcript arrives or the STT safety-net timeout
+                elapses with text in hand. When False, the strategy signals
+                turn-end as soon as the turn analyzer reports COMPLETE —
+                independent of transcripts. Set this to False when local
+                turn detection is the intended driver of the conversation
+                (e.g. with a realtime LLM service consuming audio directly),
+                so transcripts are off the latency critical path.
+                ``LLMContextAggregatorPair`` flips this for you when
+                ``realtime_service_mode`` is configured with
+                ``turns_await_transcripts=False``.
            **kwargs: Additional keyword arguments.
        """
        super().__init__(**kwargs)
        self._turn_analyzer = turn_analyzer
+        self._wait_for_transcript = wait_for_transcript
        self._stt_timeout: float = 0.0  # STT P99 latency from STTMetadataFrame
        self._stop_secs: float = 0.0  # VAD stop_secs from VADUserStoppedSpeakingFrame

@@ -66,6 +86,15 @@ class TurnAnalyzerUserTurnStopStrategy(BaseUserTurnStopStrategy):
        self._timeout_task: asyncio.Task | None = None
        self._timeout_expired: bool = False

+    @property
+    def wait_for_transcript(self) -> bool:
+        """Whether transcripts gate end-of-turn signalling."""
+        return self._wait_for_transcript
+
+    @wait_for_transcript.setter
+    def wait_for_transcript(self, value: bool) -> None:
+        self._wait_for_transcript = value
+
    async def reset(self):
        """Reset the strategy to its initial state."""
        await super().reset()
@@ -256,11 +285,25 @@ class TurnAnalyzerUserTurnStopStrategy(BaseUserTurnStopStrategy):
        """Trigger user turn stopped if conditions are met.

        Conditions:
-        - We have transcription text
        - Turn analyzer indicates turn is complete
-        - Either the timeout has elapsed OR we have a finalized transcript
+        - When ``wait_for_transcript`` is True (default): we have
+          transcription text *and* either the safety-net timeout has
+          elapsed or a finalized transcript arrived.
+        - When ``wait_for_transcript`` is False: fire as soon as the turn
+          analyzer reports COMPLETE — independent of transcripts.
        """
-        if not self._text or not self._turn_complete:
+        if not self._turn_complete:
+            return
+
+        if not self._wait_for_transcript:
+            # Turn-end is driven by the analyzer; transcripts are bookkeeping.
+            if self._timeout_task:
+                await self.task_manager.cancel_task(self._timeout_task)
+                self._timeout_task = None
+            await self.trigger_user_turn_stopped()
+            return
+
+        if not self._text:
            return

        # For finalized transcripts, trigger immediately
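The "Conditions" list in the docstring above can be sketched as a decision function. `analyzer_should_stop_turn` is hypothetical. The rest of the real method after "For finalized transcripts, trigger immediately" is not shown in this diff, so the transcript-gated branch follows the docstring rather than the code.

```python
def analyzer_should_stop_turn(
    *,
    turn_complete: bool,
    wait_for_transcript: bool,
    text: str,
    finalized: bool,
    timeout_expired: bool,
) -> bool:
    # Nothing happens until the turn analyzer reports COMPLETE.
    if not turn_complete:
        return False
    # Analyzer-driven mode: transcripts are bookkeeping only.
    if not wait_for_transcript:
        return True
    # Transcript-gated mode: need text, plus a finalized transcript or an expired safety net.
    if not text:
        return False
    return finalized or timeout_expired


print(analyzer_should_stop_turn(
    turn_complete=True, wait_for_transcript=False, text="", finalized=False, timeout_expired=False
))  # True
```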
--- a/tests/test_context_aggregators_universal.py
+++ b/tests/test_context_aggregators_universal.py
@@ -4,6 +4,7 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import json
 import unittest

@@ -33,6 +34,7 @@ from pipecat.frames.frames import (
    LLMThoughtEndFrame,
    LLMThoughtStartFrame,
    LLMThoughtTextFrame,
+    RealtimeServiceMetadataFrame,
    SpeechControlParamsFrame,
    StartFrame,
    TextFrame,
@@ -55,6 +57,8 @@ from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregator,
    LLMUserAggregatorParams,
+    RealtimeServiceModeConfig,
+    UserTurnStoppedMessage,
 )
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.tests.utils import SleepFrame, run_test
@@ -63,6 +67,10 @@ from pipecat.turns.user_mute import (
    FunctionCallUserMuteStrategy,
    MuteUntilFirstBotCompleteUserMuteStrategy,
 )
+from pipecat.turns.user_start import (
+    TranscriptionUserTurnStartStrategy,
+    VADUserTurnStartStrategy,
+)
 from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
 from pipecat.turns.user_turn_strategies import (
    FilterIncompleteUserTurnStrategies,
@@ -1651,5 +1659,314 @@ class TestToolChangeMessages(unittest.IsolatedAsyncioTestCase):
        self.assertFalse(pair.assistant()._add_tool_change_messages)


+class TestRealtimeServiceModeConfig(unittest.TestCase):
+    def test_default_fields_are_realtime(self):
+        cfg = RealtimeServiceModeConfig()
+        self.assertFalse(cfg.context_writes_await_turns)
+        self.assertFalse(cfg.turns_await_transcripts)
+
+    def test_keep_transcripts_keep_writes_on_turn(self):
+        cfg = RealtimeServiceModeConfig(
+            turns_await_transcripts=True, context_writes_await_turns=True
+        )
+        self.assertTrue(cfg.context_writes_await_turns)
+        self.assertTrue(cfg.turns_await_transcripts)
+
+    def test_keep_transcripts_trailing_writes(self):
+        # Third valid combination: turns wait on transcripts but context
+        # writes trail. This is the explicit fine-grained case, for when
+        # downstream consumers of user-turn frames want transcripts.
+        cfg = RealtimeServiceModeConfig(turns_await_transcripts=True)
+        self.assertFalse(cfg.context_writes_await_turns)
+        self.assertTrue(cfg.turns_await_transcripts)
+
+    def test_invalid_combination_rejected(self):
+        # turns fire early but context writes wait → incomplete messages.
+        with self.assertRaises(ValueError):
+            RealtimeServiceModeConfig(
+                turns_await_transcripts=False, context_writes_await_turns=True
+            )
+
+
+class TestRealtimeServiceModeAggregator(unittest.IsolatedAsyncioTestCase):
+    """End-to-end tests for the trailing-write realtime mode."""
+
+    def _build_pair(
+        self,
+        *,
+        realtime_service_mode: RealtimeServiceModeConfig | None = None,
+        user_params: LLMUserAggregatorParams | None = None,
+    ) -> tuple[LLMContext, LLMContextAggregatorPair]:
+        context = LLMContext()
+        pair = LLMContextAggregatorPair(
+            context,
+            user_params=user_params,
+            realtime_service_mode=realtime_service_mode,
+        )
+        return context, pair
+
+    async def test_pair_propagates_realtime_mode_to_halves(self):
+        _, pair = self._build_pair(realtime_service_mode=RealtimeServiceModeConfig())
+        # The pair wires shared state into both halves.
+        self.assertIs(pair.user()._paired_half, pair.assistant())
+        self.assertIs(pair.assistant()._paired_half, pair.user())
+        self.assertIs(pair.user()._pair_lock, pair.assistant()._pair_lock)
+        self.assertFalse(pair.user()._context_writes_await_turns)
+        self.assertFalse(pair.user()._turns_await_transcripts)
+        self.assertFalse(pair.assistant()._context_writes_await_turns)
+        self.assertFalse(pair.assistant()._turns_await_transcripts)
+
+    async def test_pair_omits_realtime_wiring_when_unset(self):
+        _, pair = self._build_pair()
+        # Backreferences are still created (harmless), but no shared lock
+        # is allocated when the realtime config is absent.
+        self.assertIsNone(pair.user()._pair_lock)
+        self.assertIsNone(pair.assistant()._pair_lock)
+        self.assertTrue(pair.user()._context_writes_await_turns)
+        self.assertTrue(pair.assistant()._context_writes_await_turns)
+
+    async def test_realtime_strategy_mutations_with_defaults(self):
+        _, pair = self._build_pair(realtime_service_mode=RealtimeServiceModeConfig())
+        # The mutated strategies live on the UserTurnController owned by
+        # the user aggregator.
+        strategies = pair.user()._user_turn_controller._user_turn_strategies
+        # TranscriptionUserTurnStartStrategy is dropped.
+        for s in strategies.start:
+            self.assertNotIsInstance(s, TranscriptionUserTurnStartStrategy)
+        # VAD start strategy is preserved.
+        self.assertTrue(any(isinstance(s, VADUserTurnStartStrategy) for s in strategies.start))
+        # Stop strategies that expose wait_for_transcript have it flipped.
+        for s in strategies.stop:
+            if hasattr(s, "wait_for_transcript"):
+                self.assertFalse(s.wait_for_transcript)
+
+    async def test_realtime_strategy_mutations_skipped_when_turns_await_transcripts(self):
+        _, pair = self._build_pair(
+            realtime_service_mode=RealtimeServiceModeConfig(turns_await_transcripts=True),
+        )
+        strategies = pair.user()._user_turn_controller._user_turn_strategies
+        # When turns still wait for transcripts, the transcript start
+        # strategy stays in the chain.
+        self.assertTrue(
+            any(isinstance(s, TranscriptionUserTurnStartStrategy) for s in strategies.start)
+        )
+
+    async def test_trailing_write_user_then_assistant_then_user(self):
+        _, pair = self._build_pair(realtime_service_mode=RealtimeServiceModeConfig())
+        user, assistant = pair
+
+        user_msg_added: list[UserTurnStoppedMessage] = []
+        assistant_msg_added: list[AssistantTurnStoppedMessage] = []
+
+        @user.event_handler("on_user_message_added")
+        async def _on_um(_a, msg):
+            user_msg_added.append(msg)
+
+        @assistant.event_handler("on_assistant_message_added")
+        async def _on_am(_a, msg):
+            assistant_msg_added.append(msg)
+
+        context = user.context
+
+        # Sequence: user transcript, assistant response starts (flushes
+        # user), assistant response ends (parks pending), new user
+        # transcript (flushes assistant), then EndFrame flushes the new
+        # user message.
+        frames_to_send = [
+            TranscriptionFrame(text="Hello!", user_id="", timestamp="now"),
+            SleepFrame(),
+            LLMFullResponseStartFrame(),
+            LLMTextFrame("Hi "),
+            LLMTextFrame("there!"),
+            LLMFullResponseEndFrame(),
+            SleepFrame(),
+            TranscriptionFrame(text="How are you?", user_id="", timestamp="now"),
+            SleepFrame(),
+        ]
+        await run_test(
+            Pipeline([user, assistant]),
+            frames_to_send=frames_to_send,
+        )
+
+        # Context should contain: user("Hello!"), assistant("Hi there!"),
+        # user("How are you?").
+        messages = context.get_messages()
+        roles_contents = [(m["role"], m["content"]) for m in messages]
+        self.assertEqual(
+            roles_contents,
+            [
+                ("user", "Hello!"),
+                ("assistant", "Hi there!"),
+                ("user", "How are you?"),
+            ],
+        )
+        self.assertEqual([m.content for m in user_msg_added], ["Hello!", "How are you?"])
+        self.assertEqual([m.content for m in assistant_msg_added], ["Hi there!"])
+        for msg in assistant_msg_added:
+            self.assertFalse(msg.interrupted)
+
+    async def test_interruption_writes_assistant_immediately(self):
+        _, pair = self._build_pair(realtime_service_mode=RealtimeServiceModeConfig())
+        user, assistant = pair
+
+        assistant_messages: list[AssistantTurnStoppedMessage] = []
+
+        @assistant.event_handler("on_assistant_message_added")
+        async def _on_am(_a, msg):
+            assistant_messages.append(msg)
+
+        context = user.context
+
+        frames_to_send = [
+            TranscriptionFrame(text="Hi!", user_id="", timestamp="now"),
+            LLMFullResponseStartFrame(),
+            LLMTextFrame("Hello "),
+            SleepFrame(),
+            InterruptionFrame(),
+        ]
+        await run_test(
+            Pipeline([user, assistant]),
+            frames_to_send=frames_to_send,
+        )
+
+        roles_contents = [(m["role"], m["content"]) for m in context.get_messages()]
+        # User message written when assistant started; assistant message
+        # written immediately on interruption with interrupted=True.
+        self.assertEqual(roles_contents, [("user", "Hi!"), ("assistant", "Hello")])
+        self.assertEqual(len(assistant_messages), 1)
+        self.assertTrue(assistant_messages[0].interrupted)
+
+    async def test_user_turn_stopped_in_realtime_mode_has_none_content(self):
+        # When VAD turn frames fire in realtime mode, the user-turn-stop
+        # message carries content=None — the message isn't finalized yet.
+        _, pair = self._build_pair(
+            realtime_service_mode=RealtimeServiceModeConfig(),
+            user_params=LLMUserAggregatorParams(
+                user_turn_strategies=UserTurnStrategies(
+                    stop=[
+                        SpeechTimeoutUserTurnStopStrategy(
+                            user_speech_timeout=TRANSCRIPTION_TIMEOUT,
+                        )
+                    ],
+                ),
+                user_turn_stop_timeout=USER_TURN_STOP_TIMEOUT,
+            ),
+        )
+        user, assistant = pair
+
+        stop_messages: list[UserTurnStoppedMessage] = []
+
+        @user.event_handler("on_user_turn_stopped")
+        async def _on_stop(_a, _s, msg):
+            stop_messages.append(msg)
+
+        frames_to_send = [
+            VADUserStartedSpeakingFrame(),
+            TranscriptionFrame(text="hey", user_id="", timestamp="now"),
+            VADUserStoppedSpeakingFrame(),
+            SleepFrame(sleep=TRANSCRIPTION_TIMEOUT + 0.05),
+        ]
+        await run_test(
+            Pipeline([user, assistant]),
+            frames_to_send=frames_to_send,
+        )
+        self.assertEqual(len(stop_messages), 1)
+        self.assertIsNone(stop_messages[0].content)
+
+    async def test_realtime_metadata_recommendation_log_when_unconfigured(self):
+        # Cascade pair receiving a RealtimeServiceMetadataFrame logs the
+        # one-time recommendation. The user half records the fact via
+        # _realtime_recommendation_logged.
+        _, pair = self._build_pair()
+        user = pair.user()
+
+        frames_to_send = [
+            RealtimeServiceMetadataFrame(
+                service_name="FakeRealtimeLLM", emits_user_turn_frames=False
+            ),
+        ]
+        await run_test(Pipeline([pair.user(), pair.assistant()]), frames_to_send=frames_to_send)
+        self.assertTrue(user._realtime_recommendation_logged)
+
+    async def test_realtime_metadata_no_log_when_configured(self):
+        # When realtime mode is opted in, the metadata frame is consumed
+        # without firing the recommendation log (we still flag the
+        # one-shot bookkeeping).
+        _, pair = self._build_pair(realtime_service_mode=RealtimeServiceModeConfig())
+        user = pair.user()
+
+        frames_to_send = [
+            RealtimeServiceMetadataFrame(
+                service_name="FakeRealtimeLLM", emits_user_turn_frames=False
+            ),
+        ]
+        await run_test(Pipeline([pair.user(), pair.assistant()]), frames_to_send=frames_to_send)
+        self.assertTrue(user._realtime_recommendation_logged)
+
+    async def test_realtime_mode_requires_paired_half(self):
+        # Direct construction of a half with realtime mode set but no
+        # paired_half raises at StartFrame validation. We call the
+        # validation directly so the error isn't swallowed by the
+        # pipeline's exception handler.
+        context = LLMContext()
+        cfg = RealtimeServiceModeConfig()
+        user = LLMUserAggregator(context, _realtime_service_mode=cfg)
+        with self.assertRaises(RuntimeError):
+            user._validate_realtime_pairing()
+        assistant = LLMAssistantAggregator(context, _realtime_service_mode=cfg)
+        with self.assertRaises(RuntimeError):
+            assistant._validate_realtime_pairing()
+
+    async def test_realtime_mode_rejects_mismatched_halves(self):
+        # If user code constructs the two halves with mismatched configs,
+        # StartFrame validation catches it.
+        context = LLMContext()
+        lock = asyncio.Lock()
+        user = LLMUserAggregator(
+            context,
+            _realtime_service_mode=RealtimeServiceModeConfig(),
+            _pair_lock=lock,
+        )
+        assistant = LLMAssistantAggregator(
+            context,
+            _realtime_service_mode=RealtimeServiceModeConfig(turns_await_transcripts=True),
+            _pair_lock=lock,
+        )
+        user._paired_half = assistant
+        assistant._paired_half = user
+        with self.assertRaises(RuntimeError):
+            user._validate_realtime_pairing()
+
+    async def test_function_call_no_context_push_in_realtime_mode(self):
+        # Realtime services consume function results directly via
+        # FunctionCallResultFrame, so the aggregator should not push
+        # LLMContextFrame upstream after a function call result.
+        _, pair = self._build_pair(realtime_service_mode=RealtimeServiceModeConfig())
+        assistant = pair.assistant()
+        frames_to_send = [
+            FunctionCallInProgressFrame(
+                function_name="get_weather",
+                tool_call_id="1",
+                arguments={"location": "Los Angeles"},
+                cancel_on_interruption=True,
+            ),
+            SleepFrame(),
+            FunctionCallResultFrame(
+                function_name="get_weather",
+                tool_call_id="1",
+                arguments={"location": "Los Angeles"},
+                result={"conditions": "Sunny"},
+            ),
+            SleepFrame(),
+        ]
+        _, up_frames = await run_test(
+            assistant,
+            frames_to_send=frames_to_send,
+        )
+        # No LLMContextFrame should have been pushed upstream in
+        # realtime mode (cascade would push one to re-run inference).
+        self.assertFalse(any(isinstance(f, LLMContextFrame) for f in up_frames))
+
+
 if __name__ == "__main__":
    unittest.main()
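The trailing-write sequence exercised by `test_trailing_write_user_then_assistant_then_user` and `test_interruption_writes_assistant_immediately` can be modeled as a small state machine. This is a hypothetical sketch inferred from the test comments, not the aggregator implementation. It ignores the shared pair lock, the event handlers, and the `interrupted` flag, and it assumes consecutive user transcripts are joined with a space.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrailingWriteSketch:
    """Hypothetical model of realtime-mode trailing context writes."""

    messages: list[tuple[str, str]] = field(default_factory=list)
    pending_user: list[str] = field(default_factory=list)
    pending_assistant: str | None = None
    response_parts: list[str] = field(default_factory=list)
    in_response: bool = False

    def _flush_user(self) -> None:
        if self.pending_user:
            self.messages.append(("user", " ".join(self.pending_user)))
            self.pending_user.clear()

    def _flush_assistant(self) -> None:
        if self.pending_assistant is not None:
            self.messages.append(("assistant", self.pending_assistant))
            self.pending_assistant = None

    def transcription(self, text: str) -> None:
        # A new user utterance writes the previously parked assistant reply.
        self._flush_assistant()
        self.pending_user.append(text)

    def response_start(self) -> None:
        # The bot starting to respond writes the user's message.
        self._flush_user()
        self.in_response = True
        self.response_parts.clear()

    def llm_text(self, text: str) -> None:
        self.response_parts.append(text)

    def response_end(self) -> None:
        # Park the reply; it is written on the next user transcript or at end.
        self.in_response = False
        self.pending_assistant = "".join(self.response_parts).strip()

    def interruption(self) -> None:
        # An interrupted reply is written immediately.
        if self.in_response:
            self.in_response = False
            self.messages.append(("assistant", "".join(self.response_parts).strip()))

    def end(self) -> None:
        # EndFrame flushes whatever is still parked.
        self._flush_assistant()
        self._flush_user()


w = TrailingWriteSketch()
w.transcription("Hello!")
w.response_start()
w.llm_text("Hi ")
w.llm_text("there!")
w.response_end()
w.transcription("How are you?")
w.end()
print(w.messages)
# [('user', 'Hello!'), ('assistant', 'Hi there!'), ('user', 'How are you?')]
```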
Author	SHA1	Message	Date
Paul Kompfner	ef46156c1b	Rename *-local-vad.py example variants to *-locally-driven-turns.py The "-local-vad" suffix was ambiguous now that local VAD has two meanings in the realtime context: supplementary user-turn frames broadcast alongside server-driven turns (commented-out opt-in in the base examples), vs. local turn detection driving the conversation end-to-end (server-side turn detection disabled, what these variant files actually demonstrate). The new "-locally-driven-turns" suffix matches the latter intent unambiguously. Renames: realtime-openai-local-vad.py → realtime-openai-locally-driven-turns.py; realtime-gemini-live-local-vad.py → realtime-gemini-live-locally-driven-turns.py; realtime-grok-local-vad.py → realtime-grok-locally-driven-turns.py; realtime-inworld-local-vad.py → realtime-inworld-locally-driven-turns.py. Plus the matching changelog fragments. Service docstrings and base examples that referenced the old filenames now point at the new ones.	2026-05-21 15:26:27 -04:00
Paul Kompfner	86f9ad0c07	Show commented-out local-VAD opt-in in no-turn-frames examples For services that don't emit UserStarted/StoppedSpeakingFrame (Nova Sonic, Gemini Live, Ultravox), the absence of those frames means downstream consumers, including the Pipecat Prebuilt UI, can't group user transcripts into discrete turns. The Tier 1 comment block already called this out, but the fix required users to know to add the SileroVADAnalyzer import + LLMUserAggregatorParams kwarg themselves. Make it a copy-paste: include the relevant imports and `user_params=` argument as commented-out code, with a comment explaining that they're not strictly necessary for context aggregation but enable RTVI / turn-dependent processors when needed. Mirror the wording used in the LLMService startup log. Also fix line wrapping in the llm_service.py startup log for the no-turn-frames case (manual edit to that message left the last line over-length).	2026-05-21 15:13:52 -04:00
Paul Kompfner	cb9fe04e0b	Wire Inworld manual-mode turn detection + add local-VAD example Inworld Realtime's session properties accept turn_detection=None to put the service into manual mode (matching OpenAI Realtime's turn_detection=False), but the Pipecat integration hardcoded _handle_user_stopped_speaking and _handle_interruption to assume server-side VAD: both were no-ops on the client side because Inworld's server normally handles commit/cancel/response.create automatically. In manual mode the server doesn't, so local-VAD-driven turns stalled — the bot never responded after the user stopped speaking, and interruptions left the in-flight response running. Mirror the OpenAI Realtime pattern: on user-stopped-speaking in manual mode, send InputAudioBufferCommitEvent + ResponseCreateEvent; on interruption in manual mode, send InputAudioBufferClearEvent + ResponseCancelEvent. Gate both on a new _is_manual_turn_detection() helper. Add examples/realtime/realtime-inworld-local-vad.py, the matching *-local-vad.py variant for parity with the OpenAI Realtime and Grok Realtime variants, and point the Inworld service docstring at it.	2026-05-21 14:14:13 -04:00
Paul Kompfner	58027484b2	Add realtime-grok-local-vad.py example Grok Realtime supports manual mode (turn_detection=None) which disables its server-side VAD and lets local VAD drive turn boundaries — same pattern as OpenAI Realtime's turn_detection=False. Add the matching *-local-vad.py variant for parity, and point the Grok service docstring at it.	2026-05-21 13:00:34 -04:00
Paul Kompfner	3b668dc937	Broadcast Nova Sonic interruption on FINAL TEXT contentEnd unconditionally The TEXT INTERRUPTED branch gated broadcast_interruption() on _assistant_is_responding, but Nova Sonic's mid-audio barge-in sequence fires AUDIO contentEnd with stopReason=END_TURN first (per AWS docs), which already flips _assistant_is_responding=False. By the time FINAL TEXT contentEnd with stopReason=INTERRUPTED arrives — the actual interruption notification — the guard skipped the broadcast and the output transport's buffered audio kept playing. Always broadcast on TEXT INTERRUPTED; keep the guard around _report_assistant_response_ended() so we don't double-close the response when AUDIO contentEnd already did it.	2026-05-21 12:37:04 -04:00
Paul Kompfner	be218e1941	Document the local-VAD-plus-server-VAD duplicate-frames caveat Realtime services that emit their own UserStartedSpeakingFrame / UserStoppedSpeakingFrame (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime) also call broadcast_interruption() from server VAD events. Wiring local VAD on top — without first disabling the service's server-side turn detection — causes the aggregator's VAD-driven strategies to broadcast the same frames again, producing duplicates downstream (TurnTrackingObserver, RTVI, AudioBufferProcessor would see doubled events). This is pre-existing behavior on main, not introduced by this PR. But the realtime_service_mode "with local VAD" example invites the question, so call out the intended pattern explicitly. Update three places: - RealtimeServiceModeConfig docstring: a Note section explaining that local VAD is intended for services without server-emitted turn frames OR services with server-side turn detection disabled, not for "both VADs on". - OpenAI Realtime, Inworld, Grok/xAI service docstrings: a one-line note that wiring local VAD requires disabling server-side turn detection first (with a pointer to the *-local-vad.py example for OpenAI Realtime). No code change — the duplicate behavior is documented as not-recommended rather than auto-suppressed. Auto-suppression via RealtimeServiceMetadataFrame.emits_user_turn_frames was considered but rejected for surprise-factor (users adding local VAD probably expect their VAD-driven frames to fire).	2026-05-21 12:19:24 -04:00
Paul Kompfner	92ced43300	Add Phase 2 changelog fragments for example migration	2026-05-21 11:25:29 -04:00
Paul Kompfner	bff741a647	Migrate realtime examples to RealtimeServiceModeConfig Pass realtime_service_mode=RealtimeServiceModeConfig() through every realtime LLM service example (base, async-tool, video, text-output, persistent-context, update-settings, MCP) so context aggregation uses the new realtime-mode semantics instead of relying on local VAD as a workaround. Where examples previously wired SileroVADAnalyzer into LLMUserAggregatorParams to coax turn frames out of services that don't emit them server-side (AWS Nova Sonic, Ultravox, Gemini Live), the local VAD is now removed. realtime_service_mode keeps context writes correct without it, and the Phase 1.5 server-side InterruptionFrame fixes for Nova Sonic and Ultravox keep the bot from talking past the user when they barge in. Transcript-logging event handlers move from on_user_turn_stopped / on_assistant_turn_stopped to on_user_message_added / on_assistant_message_added, which carry the finalized text in realtime mode (the turn-stopped events fire before the message is finalized, so their `content` is None in that mode). For services that don't emit user-turn frames (Gemini Live, AWS Nova Sonic, Ultravox) the example now carries a Tier 1 comment block that spells out which downstream processors won't activate, how to add local VAD if needed, and the caveat that locally-generated turn boundaries are a heuristic that may diverge from server-side ground truth. Adds examples/realtime/realtime-openai-local-vad.py, a new variant of the OpenAI Realtime example that disables OpenAI's server-side turn detection and drives turn boundaries locally — useful when you want a turn analyzer like LocalSmartTurnV3 to decide when the user is done speaking. Server-emitted turn frames are still preferred when available. The Gemini Live local-VAD variant already existed; it's been updated in place rather than rewritten.	2026-05-21 11:25:29 -04:00
Paul Kompfner	20d9bf4af6	Document user-turn-frame behavior in realtime service docstrings Each realtime LLM service docstring now states whether the service emits UserStartedSpeakingFrame / UserStoppedSpeakingFrame from server-side turn signals, and what that implies for the rest of the pipeline. For the services that don't (Gemini Live, AWS Nova Sonic, Ultravox), the docstring spells out which downstream processors won't activate (RTVI client speech events, TurnTrackingObserver, AudioBufferProcessor turn recording, UserIdleController, user mute strategies, voicemail detector), points at realtime_service_mode for correct context-write semantics, and notes the option of wiring local VAD plus the caveat that locally- generated turn boundaries are a heuristic that may not match the provider's server-side turn decisions. For the services that do (OpenAI Realtime, Inworld, Grok/xAI Realtime), the docstring confirms turn frames are emitted from server VAD and points at realtime_service_mode.	2026-05-21 11:25:29 -04:00
Paul Kompfner	a00211627f	Surface server-side interruption from Nova Sonic and Ultravox BaseOutputTransport only clears buffered audio mid-playback on InterruptionFrame. Realtime services stream audio downstream as fast as they produce it, and playback necessarily trails the buffer — so when the user interrupts, the bot keeps talking past the interruption unless the service surfaces the interruption to the pipeline. Two realtime services were missing this signal: - AWS Nova Sonic acknowledged the INTERRUPTED stop reason internally (closing its own response state) but never broadcast InterruptionFrame. - Ultravox's playback_clear_buffer message — the server's explicit "drop buffered output audio" signal for interruptions — was not handled at all. In both cases the latent bug was masked by enabling local VAD on the user aggregator, which produced UserStartedSpeakingFrame and triggered the aggregator-side interruption path. The realtime context aggregator work makes local VAD optional, so the underlying gap needs fixing first. Wire broadcast_interruption() into both services on the server-side interruption signal, firing before the response-end signal so the assistant aggregator marks the message interrupted=True before LLMFullResponseEndFrame closes the turn.	2026-05-21 11:25:29 -04:00
Paul Kompfner	11d7fcf174	Add changelog fragments for realtime service mode Fragments use the +<name> prefix so they show up under "Unreleased" without a PR-number suffix; rename to <PR#>.<type>.md before merge.	2026-05-21 11:25:29 -04:00
Paul Kompfner	1fe8cf5289	Add RealtimeServiceModeConfig to LLMContextAggregatorPair Decouple context management from turn frames and transcripts when a realtime LLM service drives the conversation. Three problems with today's behavior: - Some realtime services (Gemini Live, AWS Nova Sonic, Ultravox) emit no UserStarted/StoppedSpeakingFrame at all, so the aggregator — which writes user messages on those frames — doesn't write to context correctly without them. - The workaround (local VAD on the aggregator) generates turn boundaries that don't match the provider's server-side ground truth, and the per-service "do I need it?" rule is hard to keep straight. - When local turn detection is the intended driver, turn-end strategies still wait for transcripts on the latency critical path. Add a realtime_service_mode: RealtimeServiceModeConfig | None = None kwarg on LLMContextAggregatorPair. When set, the pair switches both halves to trailing context writes: user messages are flushed on the first assistant content frame, assistant messages on the next user transcript, both halves on EndFrame. Turn-end strategies stop waiting for transcripts by default. Two fine-grained boolean fields (context_writes_await_turns, turns_await_transcripts) let callers dial back to cascade-style behavior selectively; their invalid combination is rejected in __post_init__. The bifurcation is dispatch-only: seven branch points across the two halves, each at method entry, each delegating to a mode-pure private method. Cross-half coordination uses an asyncio.Lock and a back-reference shared by both halves; the assistant signals user.flush() on LLMFullResponseStartFrame, and the user signals assistant.flush() on the first new transcript after the assistant turn. The mechanism reuses the existing push_aggregation() — no parallel write path. Two new events fire when messages are flushed to context: on_user_message_added and on_assistant_message_added. In cascade mode they coincide with the existing turn-stopped events; in realtime mode (where the turn-stopped event fires before the message is finalized) they're the canonical way to subscribe to "context just updated, here's the text." UserTurnStoppedMessage.content is now typed str | None to reflect that realtime mode fires the event with None. When a RealtimeServiceMetadataFrame arrives and realtime_service_mode is None, the aggregator logs a one-time INFO recommendation pointing users at the option.	2026-05-21 11:25:29 -04:00
Paul Kompfner	3247fd1188	Mark realtime LLM services with RealtimeServiceInfo + emit metadata at start Realtime (speech-to-speech) LLM services need to advertise themselves to the rest of the pipeline so downstream components can adapt. Add a new RealtimeServiceMetadataFrame subtype of ServiceMetadataFrame, following the STTMetadataFrame precedent. LLMService gains a single ClassVar, _realtime_service_info, typed RealtimeServiceInfo | None and defaulting to None. The presence of a populated instance is what marks a service as realtime, and the RealtimeServiceInfo dataclass carries the per-service knobs the rest of the pipeline needs — currently just emits_user_turn_frames. Keeping it all under one optional ClassVar avoids stranding realtime-only knobs on the generic LLMService surface; non-realtime services keep the default None and the realtime-specific machinery stays inert. When _realtime_service_info is set, the base service auto-broadcasts RealtimeServiceMetadataFrame right after StartFrame propagates downstream (same ordering as STT). When emits_user_turn_frames is False, a one-time INFO log at start explains which pipeline processors depend on those frames (RTVI client speech events, TurnTrackingObserver, AudioBufferProcessor turn recording, UserIdleController, user mute strategies, voicemail detector) and how to add local VAD if needed. Set the ClassVar on the seven realtime services: OpenAI Realtime, Azure Realtime (via inheritance), Inworld, Grok/xAI Realtime all emit user-turn frames; Gemini Live (and Gemini Live Vertex via inheritance), AWS Nova Sonic, Ultravox do not. In a follow-up commit, LLMContextAggregatorPair will consume RealtimeServiceMetadataFrame to surface a one-time recommendation when realtime_service_mode is not configured.	2026-05-20 15:08:40 -04:00
Paul Kompfner	9f0a60b995	Add wait_for_transcript flag on user-turn stop strategies SpeechTimeoutUserTurnStopStrategy and TurnAnalyzerUserTurnStopStrategy both gate end-of-turn on a transcript arriving. That's the right default for cascade STT/LLM/TTS pipelines, but it puts transcripts on the latency critical path in pipelines where local turn detection is the intended driver of end-of-turn — typically realtime LLM services consuming audio directly. Closed PR #4480 explored this same fix in isolation. Add wait_for_transcript: bool = True to both strategies. False makes the strategy signal end-of-turn as soon as VAD / the turn analyzer reports end-of-speech, independent of transcripts. The default preserves existing behavior. LLMContextAggregatorPair will flip this in realtime mode in a follow-up commit.	2026-05-20 14:07:58 -04:00
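The `wait_for_transcript` flag described in the last commit changes a single gate: whether end-of-speech alone can end the turn. The sketch below is a toy model of that gate, not Pipecat's `SpeechTimeoutUserTurnStopStrategy` or `TurnAnalyzerUserTurnStopStrategy`; the class `ToyTurnStopStrategy` and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTurnStopStrategy:
    """Toy model of the wait_for_transcript flag (hypothetical, not Pipecat's classes).

    wait_for_transcript=True: end-of-turn needs end-of-speech AND a transcript.
    wait_for_transcript=False: end-of-speech alone ends the turn.
    """

    wait_for_transcript: bool = True
    _speech_ended: bool = field(default=False, init=False)
    _have_transcript: bool = field(default=False, init=False)

    def on_end_of_speech(self) -> bool:
        """VAD / turn analyzer reports end-of-speech; True means 'end the turn now'."""
        self._speech_ended = True
        return self._should_end_turn()

    def on_transcript(self, text: str) -> bool:
        """A non-empty transcript arrives; True means 'end the turn now'."""
        if text.strip():
            self._have_transcript = True
        return self._should_end_turn()

    def _should_end_turn(self) -> bool:
        if not self._speech_ended:
            return False
        # Only the default (cascade) behavior puts the transcript on the critical path.
        return self._have_transcript or not self.wait_for_transcript


# Cascade default: end-of-speech alone is not enough.
cascade = ToyTurnStopStrategy()
print(cascade.on_end_of_speech())      # False
print(cascade.on_transcript("hello"))  # True

# Locally driven realtime turns: end-of-speech ends the turn immediately.
realtime = ToyTurnStopStrategy(wait_for_transcript=False)
print(realtime.on_end_of_speech())     # True
```

Keeping `True` as the default mirrors the commit's point that existing cascade pipelines see no behavior change.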
				`@@ -0,0 +1 @@`
				- Fixed `InworldRealtimeLLMService` not supporting manual-mode turn detection (`session_properties.audio.input.turn_detection=None`). Previously `_handle_user_stopped_speaking` and `_handle_interruption` assumed Inworld's server-side VAD handled commit/cancel/response.create automatically and were no-ops on the client side. In manual mode the server doesn't, so local-VAD-driven turns stalled: the bot never responded after the user stopped speaking, and interruptions didn't cancel the in-flight response. Wire the explicit `InputAudioBufferCommitEvent` + `ResponseCreateEvent` on user-stopped-speaking and `InputAudioBufferClearEvent` + `ResponseCancelEvent` on interruption, gated on a new `_is_manual_turn_detection()` check (mirroring the pattern in `OpenAIRealtimeLLMService`).
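The fix above amounts to a manual-mode gate in front of two handlers. Here is a minimal sketch, assuming `turn_detection is None` is the manual-mode test as the fragment states. `ToyRealtimeSession` is hypothetical, and event class names are recorded as strings instead of being sent over a real connection.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyRealtimeSession:
    """Toy model of manual-mode gating; not the InworldRealtimeLLMService code.

    turn_detection stands in for session_properties.audio.input.turn_detection.
    None means manual mode: the client must commit/respond and clear/cancel
    itself. Any other value means server-side VAD handles those steps.
    """

    turn_detection: Optional[Any] = None
    sent: list = field(default_factory=list)  # names of events "sent" to the server

    def _is_manual_turn_detection(self) -> bool:
        return self.turn_detection is None

    def handle_user_stopped_speaking(self) -> None:
        if not self._is_manual_turn_detection():
            return  # Server VAD commits the buffer and creates the response.
        self.sent += ["InputAudioBufferCommitEvent", "ResponseCreateEvent"]

    def handle_interruption(self) -> None:
        if not self._is_manual_turn_detection():
            return  # Server VAD handles cancellation.
        self.sent += ["InputAudioBufferClearEvent", "ResponseCancelEvent"]
```

With server VAD enabled, both handlers stay no-ops, matching the pre-fix behavior for that mode. Only manual mode emits the explicit events.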
				`@@ -0,0 +1 @@`
				- Fixed AWS Nova Sonic not surfacing server-side interruption. When the user interrupted the bot mid-response, the `INTERRUPTED` stop reason was acknowledged internally but no `InterruptionFrame` was emitted, so `BaseOutputTransport` kept draining its audio buffer and the bot kept talking past the interruption. Nova Sonic now broadcasts `InterruptionFrame` on both `INTERRUPTED` paths (text-stage and audio-stage). This was previously masked by enabling local VAD on the user aggregator, which generated `UserStartedSpeakingFrame` and triggered the aggregator-side interruption path; the fix makes the behavior correct without local VAD as a workaround.
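The interplay between the two `contentEnd` paths (commit 3b668dc937) can be modeled as follows. `InterruptionFrame` is broadcast unconditionally on TEXT `INTERRUPTED`, and only the response-end report is guarded. This is a toy sketch with simplified string arguments and a hypothetical `ToyNovaSonicHandler` class. The real handler distinguishes more cases, such as only counting FINAL TEXT content.

```python
class ToyNovaSonicHandler:
    """Toy model of Nova Sonic contentEnd handling (hypothetical, not the service code)."""

    def __init__(self) -> None:
        self.assistant_is_responding = True
        self.pushed: list[str] = []

    def on_content_end(self, content_type: str, stop_reason: str) -> None:
        if content_type == "AUDIO" and stop_reason == "END_TURN":
            self._report_assistant_response_ended()
        elif content_type == "TEXT" and stop_reason == "INTERRUPTED":
            # Broadcast unconditionally: during a mid-audio barge-in, AUDIO
            # END_TURN may already have cleared assistant_is_responding.
            self.pushed.append("InterruptionFrame")
            # Guarded, so the response is never closed twice.
            self._report_assistant_response_ended()

    def _report_assistant_response_ended(self) -> None:
        if not self.assistant_is_responding:
            return
        self.assistant_is_responding = False
        self.pushed.append("LLMFullResponseEndFrame")
```

In the barge-in sequence (AUDIO `END_TURN`, then TEXT `INTERRUPTED`), the interruption still reaches the output transport. When TEXT `INTERRUPTED` arrives first, the interruption precedes the response end, so the assistant aggregator can mark the message interrupted.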
				`@@ -0,0 +1 @@`
				- Migrated all realtime LLM service examples (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, Gemini Live Vertex, AWS Nova Sonic, Ultravox), including base examples, `persistent-context-`, `update-settings/llm/`, and the Gemini Live MCP example, to use `LLMContextAggregatorPair(..., realtime_service_mode=RealtimeServiceModeConfig())`. Where examples previously wired `SileroVADAnalyzer` into `LLMUserAggregatorParams` as a workaround for missing turn frames, the local VAD has been removed; realtime service mode, together with the Phase 1.5 interruption fixes for Nova Sonic and Ultravox, makes this safe. Transcript-logging event handlers have moved from `on_user_turn_stopped` / `on_assistant_turn_stopped` to the new `on_user_message_added` / `on_assistant_message_added` events, which carry the finalized message text. Examples for services without server-side user-turn frames (Gemini Live, AWS Nova Sonic, Ultravox) include a Tier 1 comment block explaining what doesn't activate without those frames and how to add local VAD if needed; the corresponding service docstrings carry the same warning.
				`@@ -0,0 +1 @@`
				- Added `examples/realtime/realtime-grok-locally-driven-turns.py`, a variant of the base Grok Realtime example that disables Grok's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime locally-driven-turns variant. Server-emitted turn frames are preferred when available.
				`@@ -0,0 +1 @@`
				- Added `examples/realtime/realtime-inworld-locally-driven-turns.py`, a variant of the base Inworld Realtime example that disables Inworld's server-side turn detection (`turn_detection=None`, manual mode) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Mirrors the OpenAI Realtime and Grok Realtime locally-driven-turns variants. Server-emitted turn frames are preferred when available.
				`@@ -0,0 +1 @@`
				- Added a startup INFO log on realtime LLM services that don't emit `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` (Gemini Live, AWS Nova Sonic, Ultravox). The log spells out which downstream processors depend on those frames (RTVI client speech events, `TurnTrackingObserver`, `AudioBufferProcessor` turn recording, `UserIdleController`, user mute strategies, voicemail detector) and how to opt into local VAD when needed.
				`@@ -0,0 +1 @@`
				- Added `examples/realtime/realtime-openai-locally-driven-turns.py`, a variant of the base OpenAI Realtime example that disables OpenAI's server-side turn detection (`turn_detection=False`) and instead drives turn boundaries locally with `SileroVADAnalyzer` wired into the user aggregator. Use this variant if you need a turn analyzer like `LocalSmartTurnV3` to decide when the user is done speaking, or if you need `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame` to fire from the same source as `InterruptionFrame`. Server-emitted turn frames are preferred when available.
				`@@ -0,0 +1 @@`
				- Added `RealtimeServiceMetadataFrame`, broadcast at pipeline start by realtime LLM services (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox). The context aggregator pair listens for it and, when `realtime_service_mode` isn't configured, logs a one-time INFO recommendation pointing users at the option and the `on_user_turn_stopped` timing change it implies.
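The marker pattern behind this frame is a single optional ClassVar on the base service, as described in commit 3247fd1188. The sketch below models it under stated assumptions. All `Toy*` classes are hypothetical stand-ins, and "broadcasting" is reduced to appending to a list.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass(frozen=True)
class ToyRealtimeServiceInfo:
    """Per-service realtime knobs; the commit notes list only this one so far."""

    emits_user_turn_frames: bool


class ToyLLMService:
    # None marks a non-realtime service; the realtime machinery stays inert.
    _realtime_service_info: ClassVar[Optional[ToyRealtimeServiceInfo]] = None

    def __init__(self) -> None:
        self.pushed: list = []
        self.logs: list[str] = []
        self._logged_turn_frame_note = False

    def start(self) -> None:
        self.pushed.append("StartFrame")
        info = self._realtime_service_info
        if info is None:
            return
        # Metadata goes out right after StartFrame (same ordering as STT).
        self.pushed.append(("RealtimeServiceMetadataFrame", info))
        if not info.emits_user_turn_frames and not self._logged_turn_frame_note:
            self._logged_turn_frame_note = True  # one-time INFO log
            self.logs.append("INFO: no user-turn frames; add local VAD if needed")


class ToyCascadeLLM(ToyLLMService):
    pass


class ToyOpenAIRealtime(ToyLLMService):
    _realtime_service_info = ToyRealtimeServiceInfo(emits_user_turn_frames=True)


class ToyAzureRealtime(ToyOpenAIRealtime):
    pass  # Inherits the marker, as Azure Realtime does via inheritance.


class ToyGeminiLive(ToyLLMService):
    _realtime_service_info = ToyRealtimeServiceInfo(emits_user_turn_frames=False)
```

Because the marker is a class attribute, subclasses such as Azure Realtime or Gemini Live Vertex pick it up without extra code.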
				`@@ -0,0 +1 @@`
				- Added `RealtimeServiceModeConfig` and a new `realtime_service_mode` kwarg on `LLMContextAggregatorPair`, opting the pair into realtime (speech-to-speech) LLM behavior. When set, user messages are written to context when the assistant response starts rather than on user-turn-end frames, so context stays correct even when the realtime service emits no turn frames at all. By default, turn-end strategies also stop waiting for transcripts before signalling end-of-turn, which keeps transcript latency off the critical path in local-VAD-driven realtime pipelines. Both behaviors are individually controllable via the `context_writes_await_turns` and `turns_await_transcripts` fields. Cascade (non-realtime) behavior is unchanged when the kwarg is omitted.
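The trailing-write ordering that realtime mode uses by default can be modeled in a few lines. The user message is flushed when the assistant response starts, the assistant message on the next user transcript, and both on EndFrame (see commit 1fe8cf5289). This is a hypothetical `ToyTrailingContext` sketch, not `LLMContextAggregatorPair`. It omits the shared `asyncio.Lock`, frames, and turn strategies, and it records event names as stand-ins for `on_user_message_added` / `on_assistant_message_added`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrailingContext:
    """Toy model of realtime-mode trailing context writes (hypothetical)."""

    context: list = field(default_factory=list)
    added_events: list = field(default_factory=list)  # stand-in for on_*_message_added
    _user_parts: list = field(default_factory=list, init=False)
    _assistant_parts: list = field(default_factory=list, init=False)

    def _flush(self, role: str, parts: list) -> None:
        if not parts:
            return
        message = {"role": role, "content": " ".join(parts)}
        parts.clear()
        self.context.append(message)
        self.added_events.append(f"on_{role}_message_added")

    def on_user_transcript(self, text: str) -> None:
        # The first user transcript after an assistant turn finalizes that turn.
        self._flush("assistant", self._assistant_parts)
        self._user_parts.append(text)

    def on_assistant_response_start(self) -> None:
        # Assistant output is starting, so the user's message is complete.
        self._flush("user", self._user_parts)

    def on_assistant_text(self, text: str) -> None:
        self._assistant_parts.append(text)

    def on_end(self) -> None:
        # EndFrame: flush whatever is still pending on both halves.
        self._flush("user", self._user_parts)
        self._flush("assistant", self._assistant_parts)
```

No write in this model depends on a user-turn frame, which is why context stays correct for services that never emit them.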