docs: add a demo showing how to track usage

2025-10-16 13:45:42 +08:00
138 changed files with 3109 additions and 7226 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,438 +9,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added

- Refactored pipeline architecture by introducing a new `PipelineNode`
-  abstraction.  Frame processors are now standalone async iterators, and
-  `PipelineNode` is responsible for routing frames upstream or downstream. This
-  decouples frame processors from direct linking, simplifies processor reuse,
-  and provides a clearer separation between processing logic and pipeline
-  wiring. This is an internal, transparent improvement and does not require any
-  changes to existing frame processor code.
-
- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why
-  the pipeline is being ended.
-
- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to
-  indicate why the pipeline is being canceled. This can also be specified when
-  you cancel a task with `PipelineTask.cancel(reason="your cancellation
-  reason")`.
-
-### Fixed
-
- `GeminiLiveLLMService` now properly supports context-provided system
-  instruction and tools
-
-## [0.0.92] - 2025-10-31 🎃 "The Haunted Edition" 👻
-
-### Added
-
- Added support for Sarvam Speech-to-Text service (`SarvamSTTService`) with
-  streaming WebSocket support for `saarika` (STT) and `saaras` (STT-translate)
-  models.
-
- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction
-  in latency when compared to the `DeepgramTTSService`.
-
- Added support for `speaking_rate` input parameter in `GoogleHttpTTSService`.
-
- Added `enable_speaker_diarization` and `enable_language_identification` to
-  `SonioxSTTService`.
-
- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated
-  examples 07a\* to use the new TTS service.
-
- Added support for including images or audio to LLM context messages using
-  `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()`
-  (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For
-  example, when creating `LLMMessagesAppendFrame`:
-
-  ```python
-  message = LLMContext.create_image_message(image=..., size= ...)
-  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
-  ```
-
- New event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`,
-  `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.
-
- Added `generation_config` parameter support to `CartesiaTTSService` and
-  `CartesiaHttpTTSService` for Cartesia Sonic-3 models. Includes a new
-  `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5),
-  and `emotion` (60+ options) parameters for fine-grained speech generation
-  control.
-
- Expanded support for universal `LLMContext` to `OpenAIRealtimeLLMService`.
-  As a reminder, the context-setup pattern when using `LLMContext` is:
-
-  ```python
-  context = LLMContext(messages, tools)
-  context_aggregator = LLMContextAggregatorPair(context)
-  ```
-
-  (Note that even though `OpenAIRealtimeLLMService` now supports the universal
-  `LLMContext`, it is not meant to be swapped out for another LLM service at
-  runtime with `LLMSwitcher`.)
-
-  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream
-  from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`,
-  say, you'll want to adjust accordingly:
-
-  ```python
-  pipeline = Pipeline(
-    [
-      transport.input(),
-      context_aggregator.user(),
-
-      # BEFORE
-      llm,
-      transcript.user(),
-
-      # AFTER
-      transcript.user(),
-      llm,
-
-      transport.output(),
-      transcript.assistant(),
-      context_aggregator.assistant(),
-    ]
-  )
-  ```
-
-  Also worth noting: whether or not you use the new context-setup pattern with
-  `OpenAIRealtimeLLMService`, some types have changed under the hood:
-
-  ```python
-  ## BEFORE:
-
-  # Context aggregator type
-  context_aggregator: OpenAIContextAggregatorPair
-
-  # Context frame type
-  frame: OpenAILLMContextFrame
-
-  # Context type
-  context: OpenAIRealtimeLLMContext
-  # or
-  context: OpenAILLMContext
-
-  ## AFTER:
-
-  # Context aggregator type
-  context_aggregator: LLMContextAggregatorPair
-
-  # Context frame type
-  frame: LLMContextFrame
-
-  # Context type
-  context: LLMContext
-  ```
-
-  Also note that `RealtimeMessagesUpdateFrame` and
-  `RealtimeFunctionCallResultFrame` have been deprecated, since they're no
-  longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more
-  like other LLM services in Pipecat, relying on updates to its context, pushed
-  by context aggregators, to update its internal state. Listen for
-  `LLMContextFrame`s for context updates.
-
-  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService`
-  when it's configured with `output_modalities=['audio']`. If you need
-  to process its output, listen for `TTSTextFrame`s instead.
-
- Expanded support for universal `LLMContext` to `GeminiLiveLLMService`.
-  As a reminder, the context-setup pattern when using `LLMContext` is:
-
-  ```python
-  context = LLMContext(messages, tools)
-  context_aggregator = LLMContextAggregatorPair(context)
-  ```
-
-  (Note that even though `GeminiLiveLLMService` now supports the universal
-  `LLMContext`, it is not meant to be swapped out for another LLM service at
-  runtime with `LLMSwitcher`.)
-
-  Worth noting: whether or not you use the new context-setup pattern with
-  `GeminiLiveLLMService`, some types have changed under the hood:
-
-  ```python
-  ## BEFORE:
-
-  # Context aggregator type
-  context_aggregator: GeminiLiveContextAggregatorPair
-
-  # Context frame type
-  frame: OpenAILLMContextFrame
-
-  # Context type
-  context: GeminiLiveLLMContext
-  # or
-  context: OpenAILLMContext
-
-  ## AFTER:
-
-  # Context aggregator type
-  context_aggregator: LLMContextAggregatorPair
-
-  # Context frame type
-  frame: LLMContextFrame
-
-  # Context type
-  context: LLMContext
-  ```
-
-  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService`
-  when it's configured with `modalities=GeminiModalities.AUDIO`. If you need
-  to process its output, listen for `TTSTextFrame`s instead.
-
-### Changed
-
- The development runner's `/start` endpoint now supports passing
-  `dailyRoomProperties` and `dailyMeetingTokenProperties` in the request body
-  when `createDailyRoom` is true. Properties are validated against the
-  `DailyRoomProperties` and `DailyMeetingTokenProperties` types respectively
-  and passed to Daily's room and token creation APIs.
-
- `UserImageRawFrame` has new fields `append_to_context` and `text`. The
-  `append_to_context` field indicates if this image and text should be added to
-  the LLM context (by the LLM assistant aggregator). The `text` field, if set,
-  might also guide the LLM or the vision service on how to analyze the image.
-
- `UserImageRequestFrame` has new fields `append_to_context` and `text`. Both fields
-  will be used to set the same fields on the captured `UserImageRawFrame`.
-
- `UserImageRequestFrame` no longer requires a function call name and ID.
-
- Updated `MoondreamService` to process `UserImageRawFrame`.
-
- `VisionService` expects `UserImageRawFrame` in order to analyze images.
-
- `DailyTransport` triggers `on_error` event if transcription can't be started
-  or stopped.
-
- `DailyTransport` updates: `start_dialout()` now returns two values:
-  `session_id` and `error`. `start_recording()` now returns two values:
-  `stream_id` and `error`.
-
- Updated `daily-python` to 0.21.0.
-
- `SimliVideoService` now accepts `api_key` and `face_id` parameters directly,
-  with optional `params` for `max_session_length` and `max_idle_time`
-  configuration, aligning with other Pipecat service patterns.
-
- Updated the default model to `sonic-3` for `CartesiaTTSService` and
-  `CartesiaHttpTTSService`.
-
- `FunctionFilter` now has a `filter_system_frames` arg, which controls whether
-  or not SystemFrames are filtered.
-
- Upgraded `aws_sdk_bedrock_runtime` to v0.1.1 to resolve potential CPU issues
-  when running `AWSNovaSonicLLMService`.
-
-### Deprecated
-
- The `expect_stripped_words` parameter of `LLMAssistantAggregatorParams` is
-  ignored when used with the newer `LLMAssistantAggregator`, which now handles
-  word spacing automatically.
-
- `LLMService.request_image_frame()` is deprecated; push a
-  `UserImageRequestFrame` instead.
-
- `UserResponseAggregator` is deprecated and will be removed in a future version.
-
- The `send_transcription_frames` argument to `OpenAIRealtimeLLMService` is
-  deprecated. Transcription frames are now always sent. They go upstream, to be
-  handled by the user context aggregator. See "Added" section for details.
-
- Types in `pipecat.services.openai.realtime.context` and
-  `pipecat.services.openai.realtime.frames` are deprecated, as they're no
-  longer used by `OpenAIRealtimeLLMService`. See "Added" section for details.
-
- `SimliVideoService` `simli_config` parameter is deprecated. Use `api_key` and
-  `face_id` parameters instead.
-
-### Removed
-
- Removed `enable_non_final_tokens` and `max_non_final_tokens_duration_ms` from
-  `SonioxSTTService`.
-
- Removed the `aiohttp_session` arg from `SarvamTTSService` as it's no longer
-  used.
-
-### Fixed
-
- Fixed a `PipelineTask` issue that was causing an idle timeout for frames that
-  were being generated but not reaching the end of the pipeline. Since the exact
-  point when frames are discarded is unknown, we now monitor pipeline frames
-  using an observer. If the observer detects frames are being generated, it will
-  prevent the pipeline from being considered idle.
-
- Fixed an issue in `HumeTTSService` that was only using Octave 2, which does
-  not support the `description` field. Now, if a description is provided, it
-  switches to Octave 1.
-
- Fixed an issue where `DailyTransport` would time out prematurely on join and on
-  leave.
-
- Fixed an issue in the runner where starting a DailyTransport room via
-  `/start` didn't support using the `DAILY_SAMPLE_ROOM_URL` env var.
-
- Fixed an issue in `ServiceSwitcher` where the `STTService`s would result in
-  all STT services producing `TranscriptionFrame`s.
-
-### Other
-
- Updated all vision 12-series foundational examples to load images from a file.
-
- Added 14-series video examples for different services. These new examples
-  request an image from the user camera through a function call.
-
-## [0.0.91] - 2025-10-21
-
-### Added
-
- It is now possible to start a bot from the `/start` endpoint when using the
-  runner's Daily transport. This follows the Pipecat Cloud format with
-  `createDailyRoom` and `body` fields in the POST request body.
-
- Added an ellipsis character (`…`) to the end-of-sentence detection in the
-  string utils.
-
- Expanded support for universal `LLMContext` to `AWSNovaSonicLLMService`.
-  As a reminder, the context-setup pattern when using `LLMContext` is:
-
-  ```python
-  context = LLMContext(messages, tools)
-  context_aggregator = LLMContextAggregatorPair(context)
-  ```
-
-  (Note that even though `AWSNovaSonicLLMService` now supports the universal
-  `LLMContext`, it is not meant to be swapped out for another LLM service at
-  runtime with `LLMSwitcher`.)
-
-  Worth noting: whether or not you use the new context-setup pattern with
-  `AWSNovaSonicLLMService`, some types have changed under the hood:
-
-  ```python
-  ## BEFORE:
-
-  # Context aggregator type
-  context_aggregator: AWSNovaSonicContextAggregatorPair
-
-  # Context frame type
-  frame: OpenAILLMContextFrame
-
-  # Context type
-  context: AWSNovaSonicLLMContext
-  # or
-  context: OpenAILLMContext
-
-  ## AFTER:
-
-  # Context aggregator type
-  context_aggregator: LLMContextAggregatorPair
-
-  # Context frame type
-  frame: LLMContextFrame
-
-  # Context type
-  context: LLMContext
-  ```
-
- Added support for `bulbul:v3` model in `SarvamTTSService` and
-  `SarvamHttpTTSService`.
-
- Added `keyterms_prompt` parameter to `AssemblyAIConnectionParams`.
-
- Added `speech_model` parameter to `AssemblyAIConnectionParams` to access the
-  multilingual model.
-
- Added support for trickle ICE to the `SmallWebRTCTransport`.
-
- Added support for updating `OpenAITTSService` settings (`instructions` and
-  `speed`) at runtime via `TTSUpdateSettingsFrame`.
-
- Added `--whatsapp` flag to runner to better surface WhatsApp transport logs.
-
- Added `on_connected` and `on_disconnected` events to TTS and STT
-  websocket-based services.
-
- Added an `aggregate_sentences` arg in `ElevenLabsHttpTTSService`, where the
-  default value is True.
-
- Added a `room_properties` arg to the Daily runner's `configure()` method,
-  allowing `DailyRoomProperties` to be provided.
-
 - The runner `--folder` argument now supports downloading files from
  subdirectories.

-### Changed
-
- `RunnerArguments` now include the `body` field, so there's no need to add it
-  to subclasses. Also, all `RunnerArguments` fields are now keyword-only.
-
- `CartesiaSTTService` now inherits from `WebsocketSTTService`.
-
- Package upgrades:
-
-  - `daily-python` upgraded to 0.20.0.
-  - `openai` upgraded to support up to 2.x.x.
-  - `openpipe` upgraded to support up to 5.x.x.
-
- `SpeechmaticsSTTService` updated dependencies for `speechmatics-rt>=0.5.0`.
-
-### Deprecated
-
- The `send_transcription_frames` argument to `AWSNovaSonicLLMService` is
-  deprecated. Transcription frames are now always sent. They go upstream, to be
-  handled by the user context aggregator. See "Added" section for details.
-
- Types in `pipecat.services.aws.nova_sonic.context` are deprecated, as they're
-  no longer used by `AWSNovaSonicLLMService`. See "Added" section for
-  details.
-
 ### Fixed

- Fixed an issue where the `RTVIProcessor` was sending duplicate
-  `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` messages.
-
- Fixed an issue in `AWSBedrockLLMService` where both `temperature` and `top_p`
-  were always sent together, causing conflicts with models like Claude Sonnet 4.5
-  that don't allow both parameters simultaneously. The service now only includes
-  inference parameters that are explicitly set, and `InputParams` defaults have
-  been changed to `None` to rely on AWS Bedrock's built-in model defaults.
-
- Fixed an issue in `RivaSegmentedSTTService` where a runtime error occurred due
-  to a mismatch in the `_handle_transcription` method's signature.
-
- Fixed multiple pipeline task cancellation issues. `asyncio.CancelledError` is
-  now handled properly in `PipelineTask` making it possible to cancel an asyncio
-  task that is executing a `PipelineRunner` cleanly. Also,
-  `PipelineTask.cancel()` does not block anymore waiting for the `CancelFrame`
-  to reach the end of the pipeline (going back to the behavior in < 0.0.83).
-
- Fixed an issue in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` where
-  the Flash models would split words, resulting in a space being inserted
-  between words.
-
- Fixed an issue where audio filters' `stop()` would not be called when using
-  `CancelFrame`.
-
- Fixed an issue in `ElevenLabsHttpTTSService`, where
-  `apply_text_normalization` was incorrectly set as a query parameter. It's now
-  being added as a request parameter.
-
 - Fixed an issue where `RimeHttpTTSService` and `PiperTTSService` could generate
  audio frames that were not correctly 16-bit aligned, potentially leading to
  internal errors or static audio.

- Fixed an issue in `SpeechmaticsSTTService` where `AdditionalVocabEntry` items
-  needed to have `sounds_like` for the session to start.
-
-### Other
-
- Added foundational example `47-sentry-metrics.py`, demonstrating how to use the
-  `SentryMetrics` processor.
-
- Added foundational example `14x-function-calling-openpipe.py`.
-
 ## [0.0.90] - 2025-10-10

 ### Added
@@ -1432,8 +1009,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added

- Added `SonioxSTTService` using Soniox's STT websocket API.
-
 - Added `enable_emulated_vad_interruptions` to `LLMUserAggregatorParams`.
  When user speech is emulated (e.g. when a transcription is received but
  VAD doesn't detect speech), this parameter controls whether the emulated
--- a/README.md
+++ b/README.md
@@ -44,10 +44,6 @@ Looking to build structured conversations? Check out [Pipecat Flows](https://git

 Want to build beautiful and engaging experiences? Check out the [Voice UI Kit](https://github.com/pipecat-ai/voice-ui-kit), a collection of components, hooks and templates for building voice AI applications quickly.

-### 🛠️ Create and deploy projects
-
-Create a new project in under a minute with the [Pipecat CLI](https://github.com/pipecat-ai/pipecat-cli). Then use the CLI to monitor and deploy your agent to production.
-
 ### 🔍 Debugging

 Looking for help debugging your pipeline and processors? Check out [Whisker](https://github.com/pipecat-ai/whisker), a real-time Pipecat debugger.
@@ -67,24 +63,24 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
    <a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/storytelling-chatbot/image.png" width="400" /></a>
    <br/>
    <a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/translation-chatbot/image.png" width="400" /></a>&nbsp;
-    <a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/12-describe-video.py"><img src="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/assets/moondream.png" width="400" /></a>
+    <a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/moondream-chatbot/image.png" width="400" /></a>
 </p>

 ## 🧩 Available services

-| Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
-| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper)                                                                                                                                                                                                                                                        |
-| LLMs                | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together)                                                                                                                                                                                                                              |
-| Text-to-Speech      | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
-| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-| Serializers         | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
-| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-| Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
-| Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
-| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper)                                                                                                                    |
+| LLMs                | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together)                                                                                         |
+| Text-to-Speech      | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
+| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| Serializers         | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+| Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |

 📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

--- a/env.example
+++ b/env.example
@@ -4,9 +4,6 @@ AICOUSTICS_LICENSE_KEY=...
 # Anthropic
 ANTHROPIC_API_KEY=...

-# Assembly AI
-ASSEMBLYAI_API_KEY=...
-
 # Async
 ASYNCAI_API_KEY=...
 ASYNCAI_VOICE_ID=...
@@ -24,19 +21,12 @@ AZURE_CHATGPT_API_KEY=...
 AZURE_CHATGPT_ENDPOINT=https://...
 AZURE_CHATGPT_MODEL=...

-AZURE_REALTIME_API_KEY=...
-AZURE_REALTIME_BASE_URL=...
-
 AZURE_DALLE_API_KEY=...
 AZURE_DALLE_ENDPOINT=https://...
 AZURE_DALLE_MODEL=...

 # Cartesia
 CARTESIA_API_KEY=...
-CARTESIA_VOICE_ID=...
-
-# Cerebras
-CEREBRAS_API_KEY=...

 # Daily
 DAILY_API_KEY=...
@@ -45,75 +35,42 @@ DAILY_SAMPLE_ROOM_URL=https://...
 # Deepgram
 DEEPGRAM_API_KEY=...

-# DeepSeek
-DEEPSEEK_API_KEY=...
-
 # ElevenLabs
 ELEVENLABS_API_KEY=...
 ELEVENLABS_VOICE_ID=...

+# Neuphonic
+NEUPHONIC_API_KEY=...
+
 # Fal
 FAL_KEY=...

 # Fireworks
 FIREWORKS_API_KEY=...

-# Fish Audio
-FISH_API_KEY=...
-
 # Gladia
 GLADIA_API_KEY=...
 GLADIA_REGION=...

 # Google
 GOOGLE_API_KEY=...
-GOOGLE_VERTEX_TEST_CREDENTIALS=...
 GOOGLE_CLOUD_PROJECT_ID=...
-GOOGLE_CLOUD_LOCATION=...
 GOOGLE_TEST_CREDENTIALS=...
-
-# Grok
-GROK_API_KEY=...
-
-# Groq
-GROQ_API_KEY=...
-
-# Heygen
-HEYGEN_API_KEY=...
+GOOGLE_VERTEX_TEST_CREDENTIALS=...

 # Hume
 HUME_API_KEY=...
-HUME_VOICE_ID=...
-
-# Inworld
-INWORLD_API_KEY=...
-
-# Krisp
-KRISP_MODEL_PATH=...
-
-# Krisp Viva
-KRISP_VIVA_MODEL_PATH=...
-
-# LiveKit
-LIVEKIT_API_KEY=...
-LIVEKIT_API_SECRET=...

 # LMNT
 LMNT_API_KEY=...
 LMNT_VOICE_ID=...

-# MiniMax
-MINIMAX_API_KEY=...
-MINIMAX_GROUP_ID=...
+# Perplexity
+PERPLEXITY_API_KEY=...

-# Mistral
-MISTRAL_API_KEY=...
-
-# Neuphonic
-NEUPHONIC_API_KEY=...
-
-# NVIDIA
-NVIDIA_API_KEY=...
+# PlayHT
+PLAYHT_USER_ID=...
+PLAYHT_API_KEY=...

 # OpenAI
 OPENAI_API_KEY=...
@@ -121,73 +78,92 @@ OPENAI_API_KEY=...
 # OpenPipe
 OPENPIPE_API_KEY=...

-# OpenRouter
-OPENROUTER_API_KEY=...
-
-# Perplexity
-PERPLEXITY_API_KEY=...
-
-# Picovoice Koala
-KOALA_ACCESS_KEY=...
-
-# Piper
-PIPER_BASE_URL=...
-
-# PlayHT
-PLAYHT_USER_ID=...
-PLAYHT_API_KEY=...
-
-# Plivo
-PLIVO_AUTH_ID=...
-PLIVO_AUTH_TOKEN=...
-
-# Qwen
-QWEN_API_KEY=...
-
-# Rime
-RIME_API_KEY=...
-RIME_VOICE_ID=...
-
-# SambaNova
-SAMBANOVA_API_KEY=...
-
-# Sarvam AI
-SARVAM_API_KEY=...
-
-# Sentry
-SENTRY_DSN=...
+# Tavus
+TAVUS_API_KEY=...
+TAVUS_REPLICA_ID=...
+TAVUS_PERSONA_ID=...

 # Simli
 SIMLI_API_KEY=...
 SIMLI_FACE_ID=...

-# Smart turn
-LOCAL_SMART_TURN_MODEL_PATH=...
-FAL_SMART_TURN_API_KEY=...
+# Krisp
+KRISP_MODEL_PATH=...

-# Soniox
-SONIOX_API_KEY=...
+# Krisp Viva
+KRISP_VIVA_MODEL_PATH=...

-# Speechmatics
-SPEECHMATICS_API_KEY=...
+# DeepSeek
+DEEPSEEK_API_KEY=...

-# Tavus
-TAVUS_API_KEY=...
-TAVUS_REPLICA_ID=...
+# Groq
+GROQ_API_KEY=...

-# Telnyx
-TELNYX_API_KEY=...
-TELNYX_ACCOUNT_SID=...
+# Grok
+GROK_API_KEY=...
+
+# Inworld
+INWORLD_API_KEY=...

 # Together.ai
 TOGETHER_API_KEY=...

+# Cerebras
+CEREBRAS_API_KEY=...
+
+# Fish Audio
+FISH_API_KEY=...
+
+# Assembly AI
+ASSEMBLYAI_API_KEY=...
+
+# OpenRouter
+OPENROUTER_API_KEY=...
+
+# Piper
+PIPER_BASE_URL=...
+
+# Smart turn
+LOCAL_SMART_TURN_MODEL_PATH=...
+FAL_SMART_TURN_API_KEY=...
+
 # Twilio
 TWILIO_ACCOUNT_SID=...
 TWILIO_AUTH_TOKEN=...

+# MiniMax
+MINIMAX_API_KEY=...
+MINIMAX_GROUP_ID=...
+
+# Sarvam AI
+SARVAM_API_KEY=...
+
+# Soniox
+SONIOX_API_KEY=
+
+# Speechmatics
+SPEECHMATICS_API_KEY=...
+
+# SambaNova
+SAMBANOVA_API_KEY=...
+
+# Sentry
+SENTRY_DSN=...
+
+# Heygen
+HEYGEN_API_KEY=...
+
+# Mistral
+MISTRAL_API_KEY=...
+
+# NVIDIA
+NVIDIA_API_KEY=...
+
+# Qwen
+QWEN_API_KEY=...
+
 # WhatsApp
-WHATSAPP_TOKEN=...
-WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
-WHATSAPP_PHONE_NUMBER_ID=...
-WHATSAPP_APP_SECRET=...
+WHATSAPP_TOKEN=
+WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=
+WHATSAPP_PHONE_NUMBER_ID=
+WHATSAPP_APP_SECRET=
--- a/examples/foundational/07-interruptible.py
+++ b/examples/foundational/07-interruptible.py
@@ -21,8 +21,8 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.stt import CartesiaSTTService
 from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -58,7 +58,7 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
--- a/examples/foundational/07a-interruptible-speechmatics-vad.py
+++ b/examples/foundational/07a-interruptible-speechmatics-vad.py
@@ -6,7 +6,6 @@

 import os

-import aiohttp
 from dotenv import load_dotenv
 from loguru import logger

@@ -21,10 +20,10 @@ from pipecat.processors.aggregators.llm_response import (
 from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
 from pipecat.services.openai.base_llm import BaseOpenAILLMService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
-from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
 from pipecat.transcriptions.language import Language
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -52,127 +51,121 @@ transport_params = {


 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    """Speechmatics STT and TTS Service Example
+    """Speechmatics STT Service Example

-    This example demonstrates using Speechmatics Speech-to-Text and Text-to-Speech services
-    with speaker diarization and intelligent speaker management. Key features:
+    This example demonstrates using Speechmatics Speech-to-Text service with speaker diarization and intelligent speaker management. Key features:

-    1. Speaker Diarization (STT)
+    1. Speaker Diarization
       - Automatically identifies and distinguishes between different speakers
       - First speaker is identified as 'S1', others get subsequent IDs
       - Uses `enable_diarization` parameter to manage speaker detection

-    2. Smart Speaker Control (STT)
+    2. Smart Speaker Control
       - `focus_speakers` parameter lets you target specific speakers (e.g. ["S1"])
       - Other speakers will be wrapped in PASSIVE tags
       - Only processes speech from focused speakers
       - Words from all speakers are wrapped with XML tags for clear speaker identification
       - Other speakers' speech only sent when focused speaker is active

-    3. Voice Activity Detection (STT)
+    3. Voice Activity Detection
       - Built-in VAD using `enable_vad` parameter
       - Remove `vad_analyzer` from `transport` config to use module's VAD
       - Emits speaker started/stopped events

-    4. Text-to-Speech (TTS)
-       - Low latency streaming audio synthesis
-       - Multiple voice options available including `sarah`, `theo`, and `megan`
-
-    5. Configuration Options
+    4. Configuration Options
       - `operating_point` parameter defaults to `ENHANCED` for optimal accuracy
       - Configurable `end_of_utterance_silence_trigger` (default 0.5s)
       - Customizable speaker formatting
       - Additional diarization settings available

-    For detailed information:
-    - STT: https://docs.speechmatics.com/rt-api-ref
-    - TTS: https://docs.speechmatics.com/text-to-speech/quickstart
+    For detailed information about operating points and configuration:
+    https://docs.speechmatics.com/rt-api-ref
    """

    logger.info(f"Starting bot")
-    async with aiohttp.ClientSession() as session:
-        stt = SpeechmaticsSTTService(
-            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            params=SpeechmaticsSTTService.InputParams(
-                language=Language.EN,
-                enable_vad=True,
-                enable_diarization=True,
-                focus_speakers=["S1"],
-                end_of_utterance_silence_trigger=0.5,
-                speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
-                speaker_passive_format="<PASSIVE><{speaker_id}>{text}</{speaker_id}></PASSIVE>",
+
+    stt = SpeechmaticsSTTService(
+        api_key=os.getenv("SPEECHMATICS_API_KEY"),
+        params=SpeechmaticsSTTService.InputParams(
+            language=Language.EN,
+            enable_vad=True,
+            enable_diarization=True,
+            focus_speakers=["S1"],
+            end_of_utterance_silence_trigger=0.5,
+            speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
+            speaker_passive_format="<PASSIVE><{speaker_id}>{text}</{speaker_id}></PASSIVE>",
+        ),
+    )
+
+    tts = ElevenLabsTTSService(
+        api_key=os.getenv("ELEVENLABS_API_KEY"),
+        voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        model="eleven_turbo_v2_5",
+    )
+
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        params=BaseOpenAILLMService.InputParams(temperature=0.75),
+    )
+
+    messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are a helpful British assistant called Alfred. "
+                "Your goal is to demonstrate your capabilities in a succinct way. "
+                "Your output will be converted to audio so don't include special characters in your answers. "
+                "Always include punctuation in your responses. "
+                "Give very short replies - do not give longer replies unless strictly necessary. "
+                "Respond to what the user said in a concise, funny, creative and helpful way. "
+                "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
+                "Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
            ),
-        )
+        },
+    ]

-        tts = SpeechmaticsTTSService(
-            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            voice_id="sarah",
-            aiohttp_session=session,
-        )
+    context = LLMContext(messages)
+    context_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
+    )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            params=BaseOpenAILLMService.InputParams(temperature=0.75),
-        )
-
-        messages = [
-            {
-                "role": "system",
-                "content": (
-                    "You are a helpful British assistant called Sarah. "
-                    "Your goal is to demonstrate your capabilities in a succinct way. "
-                    "Your output will be converted to audio so don't include special characters in your answers. "
-                    "Always include punctuation in your responses. "
-                    "Give very short replies - do not give longer replies unless strictly necessary. "
-                    "Respond to what the user said in a concise, funny, creative and helpful way. "
-                    "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
-                    "Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
-                ),
-            },
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,
+            context_aggregator.user(),  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            context_aggregator.assistant(),  # Assistant spoken responses
        ]
+    )

-        context = LLMContext(messages)
-        context_aggregator = LLMContextAggregatorPair(
-            context,
-            user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
-        )
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )

-        pipeline = Pipeline(
-            [
-                transport.input(),  # Transport user input
-                stt,
-                context_aggregator.user(),  # User responses
-                llm,  # LLM
-                tts,  # TTS
-                transport.output(),  # Transport bot output
-                context_aggregator.assistant(),  # Assistant spoken responses
-            ]
-        )
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        messages.append({"role": "system", "content": "Say a short hello to the user."})
+        await task.queue_frames([LLMRunFrame()])

-        task = PipelineTask(
-            pipeline,
-            params=PipelineParams(
-                enable_metrics=True,
-                enable_usage_metrics=True,
-            ),
-            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-        )
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()

-        @transport.event_handler("on_client_connected")
-        async def on_client_connected(transport, client):
-            logger.info(f"Client connected")
-            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Say a short hello to the user."})
-            await task.queue_frames([LLMRunFrame()])
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

-        @transport.event_handler("on_client_disconnected")
-        async def on_client_disconnected(transport, client):
-            logger.info(f"Client disconnected")
-            await task.cancel()
-
-        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-        await runner.run(task)
+    await runner.run(task)


 async def bot(runner_args: RunnerArguments):
--- a/examples/foundational/07a-interruptible-speechmatics.py
+++ b/examples/foundational/07a-interruptible-speechmatics.py
@@ -6,7 +6,6 @@

 import os

-import aiohttp
 from dotenv import load_dotenv
 from loguru import logger

@@ -25,10 +24,10 @@ from pipecat.processors.aggregators.llm_response import (
 from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
 from pipecat.services.openai.base_llm import BaseOpenAILLMService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
-from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
 from pipecat.transcriptions.language import Language
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -62,106 +61,100 @@ transport_params = {


 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    """Run example using Speechmatics STT and TTS.
+    """Run example using Speechmatics STT.

-    This example demonstrates a complete Speechmatics integration with both Speech-to-Text
-    and Text-to-Speech services:
+    This example enables diarization in the STT service, so the words spoken by each speaker
+    are wrapped in XML tags for the LLM to process. Note the matching instructions in the
+    system context for the LLM. This improves the conversation experience by letting the LLM
+    understand who is speaking in a multi-party call.

-    STT Features:
-    - Diarization to identify and distinguish between different speakers
-    - Words spoken by each speaker are wrapped with XML tags for LLM processing
-    - System context instructions help the LLM understand multi-party conversations
-    - ENHANCED operating point by default for optimal accuracy
+    By default, this example uses our ENHANCED operating point, which is optimized for
+    high accuracy. To use a different one, set the `operating_point` parameter to another
+    value.

-    TTS Features:
-    - Low latency streaming audio synthesis
-    - Multiple voice options available including `sarah`, `theo`, and `megan`
-
-    For more information:
-    - STT: https://docs.speechmatics.com/rt-api-ref
-    - TTS: https://docs.speechmatics.com/text-to-speech/quickstart
+    For more information on operating points, see the Speechmatics documentation:
+    https://docs.speechmatics.com/rt-api-ref
    """
    logger.info(f"Starting bot")

-    async with aiohttp.ClientSession() as session:
-        stt = SpeechmaticsSTTService(
-            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            params=SpeechmaticsSTTService.InputParams(
-                language=Language.EN,
-                enable_diarization=True,
-                end_of_utterance_silence_trigger=0.5,
-                speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
+    stt = SpeechmaticsSTTService(
+        api_key=os.getenv("SPEECHMATICS_API_KEY"),
+        params=SpeechmaticsSTTService.InputParams(
+            language=Language.EN,
+            enable_diarization=True,
+            end_of_utterance_silence_trigger=0.5,
+            speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
+        ),
+    )
+
+    tts = ElevenLabsTTSService(
+        api_key=os.getenv("ELEVENLABS_API_KEY"),
+        voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        model="eleven_turbo_v2_5",
+    )
+
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        params=BaseOpenAILLMService.InputParams(temperature=0.75),
+    )
+
+    messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are a helpful British assistant called Alfred. "
+                "Your goal is to demonstrate your capabilities in a succinct way. "
+                "Your output will be converted to audio so don't include special characters in your answers. "
+                "Always include punctuation in your responses. "
+                "Give very short replies - do not give longer replies unless strictly necessary. "
+                "Respond to what the user said in a concise, funny, creative and helpful way. "
+                "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
            ),
-        )
+        },
+    ]

-        tts = SpeechmaticsTTSService(
-            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            voice_id="sarah",
-            aiohttp_session=session,
-        )
+    context = LLMContext(messages)
+    context_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
+    )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            params=BaseOpenAILLMService.InputParams(temperature=0.75),
-        )
-
-        messages = [
-            {
-                "role": "system",
-                "content": (
-                    "You are a helpful British assistant called Sarah. "
-                    "Your goal is to demonstrate your capabilities in a succinct way. "
-                    "Your output will be converted to audio so don't include special characters in your answers. "
-                    "Always include punctuation in your responses. "
-                    "Give very short replies - do not give longer replies unless strictly necessary. "
-                    "Respond to what the user said in a concise, funny, creative and helpful way. "
-                    "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
-                ),
-            },
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            context_aggregator.user(),  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            context_aggregator.assistant(),  # Assistant spoken responses
        ]
+    )

-        context = LLMContext(messages)
-        context_aggregator = LLMContextAggregatorPair(
-            context,
-            user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
-        )
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )

-        pipeline = Pipeline(
-            [
-                transport.input(),  # Transport user input
-                stt,  # STT
-                context_aggregator.user(),  # User responses
-                llm,  # LLM
-                tts,  # TTS
-                transport.output(),  # Transport bot output
-                context_aggregator.assistant(),  # Assistant spoken responses
-            ]
-        )
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        messages.append({"role": "system", "content": "Say a short hello to the user."})
+        await task.queue_frames([LLMRunFrame()])

-        task = PipelineTask(
-            pipeline,
-            params=PipelineParams(
-                enable_metrics=True,
-                enable_usage_metrics=True,
-            ),
-            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-        )
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()

-        @transport.event_handler("on_client_connected")
-        async def on_client_connected(transport, client):
-            logger.info(f"Client connected")
-            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Say a short hello to the user."})
-            await task.queue_frames([LLMRunFrame()])
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

-        @transport.event_handler("on_client_disconnected")
-        async def on_client_disconnected(transport, client):
-            logger.info(f"Client disconnected")
-            await task.cancel()
-
-        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-        await runner.run(task)
+    await runner.run(task)


 async def bot(runner_args: RunnerArguments):
--- a/examples/foundational/07c-interruptible-deepgram-flux.py
+++ b/examples/foundational/07c-interruptible-deepgram-flux.py
@@ -101,10 +101,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client disconnected")
        await task.cancel()

-    @stt.event_handler("on_update")
-    async def on_deepgram_flux_update(stt, transcript):
-        logger.debug(f"On deeggram flux update: {transcript}")
-
    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    await runner.run(task)
--- a/examples/foundational/07c-interruptible-deepgram-http.py
+++ b/examples/foundational/07c-interruptible-deepgram-http.py
@@ -1,132 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-import aiohttp
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.deepgram.tts import DeepgramHttpTTSService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-
-load_dotenv(override=True)
-
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    async with aiohttp.ClientSession() as session:
-        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-        tts = DeepgramHttpTTSService(
-            api_key=os.getenv("DEEPGRAM_API_KEY"),
-            voice="aura-2-andromeda-en",
-            aiohttp_session=session,
-        )
-
-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
-        context_aggregator = LLMContextAggregatorPair(context)
-
-        pipeline = Pipeline(
-            [
-                transport.input(),  # Transport user input
-                stt,  # STT
-                context_aggregator.user(),  # User responses
-                llm,  # LLM
-                tts,  # TTS
-                transport.output(),  # Transport bot output
-                context_aggregator.assistant(),  # Assistant spoken responses
-            ]
-        )
-
-        task = PipelineTask(
-            pipeline,
-            params=PipelineParams(
-                enable_metrics=True,
-                enable_usage_metrics=True,
-            ),
-            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-        )
-
-        @transport.event_handler("on_client_connected")
-        async def on_client_connected(transport, client):
-            logger.info(f"Client connected")
-            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMRunFrame()])
-
-        @transport.event_handler("on_client_disconnected")
-        async def on_client_disconnected(transport, client):
-            logger.info(f"Client disconnected")
-            await task.cancel()
-
-        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-        await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/07m-interruptible-aws.py
+++ b/examples/foundational/07m-interruptible-aws.py
@@ -67,8 +67,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    llm = AWSBedrockLLMService(
        aws_region="us-west-2",
-        model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+        model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
+        params=AWSBedrockLLMService.InputParams(temperature=0.8, latency="optimized"),
    )

    messages = [
--- a/examples/foundational/07z-interruptible-sarvam-http.py
+++ b/examples/foundational/07z-interruptible-sarvam-http.py
@@ -22,8 +22,8 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.sarvam.stt import SarvamSTTService
 from pipecat.services.sarvam.tts import SarvamHttpTTSService
 from pipecat.transcriptions.language import Language
 from pipecat.transports.base_transport import BaseTransport, TransportParams
@@ -63,10 +63,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    # Create an HTTP session
    async with aiohttp.ClientSession() as session:
-        stt = SarvamSTTService(
-            api_key=os.getenv("SARVAM_API_KEY"),
-            model="saarika:v2.5",
-        )
+        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

        tts = SarvamHttpTTSService(
            api_key=os.getenv("SARVAM_API_KEY"),
--- a/examples/foundational/07z-interruptible-sarvam.py
+++ b/examples/foundational/07z-interruptible-sarvam.py
@@ -24,8 +24,8 @@ from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.sarvam.stt import SarvamSTTService
 from pipecat.services.sarvam.tts import SarvamTTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -62,10 +62,7 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = SarvamSTTService(
-        api_key=os.getenv("SARVAM_API_KEY"),
-        model="saarika:v2.5",
-    )
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
--- a/examples/foundational/08-bots-arguing.py
+++ b/examples/foundational/08-bots-arguing.py
@@ -0,0 +1,147 @@
+import asyncio
+import logging
+import os
+from typing import Tuple
+
+import aiohttp
+from dotenv import load_dotenv
+
+from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMContextFrame, TextFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.processors.aggregators import SentenceAggregator
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.runner.daily import configure
+from pipecat.services.azure import AzureLLMService, AzureTTSService
+from pipecat.services.elevenlabs import ElevenLabsTTSService
+from pipecat.services.fal import FalImageGenService
+from pipecat.transports.daily.transport import DailyTransport
+
+load_dotenv(override=True)
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("pipecat")
+logger.setLevel(logging.DEBUG)
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Respond bot",
+            duration_minutes=10,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+            camera_enabled=True,
+            camera_width=1024,
+            camera_height=1024,
+        )
+
+        llm = AzureLLMService(
+            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
+            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+        )
+        tts1 = AzureTTSService(
+            api_key=os.getenv("AZURE_SPEECH_API_KEY"),
+            region=os.getenv("AZURE_SPEECH_REGION"),
+        )
+        tts2 = ElevenLabsTTSService(
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id="jBpfuIE2acCO8z3wKNLl",
+        )
+        dalle = FalImageGenService(
+            params=FalImageGenService.InputParams(image_size="1024x1024"),
+            aiohttp_session=session,
+            key=os.getenv("FAL_KEY"),
+        )
+
+        bot1_messages = [
+            {
+                "role": "system",
+                "content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long.",
+            },
+        ]
+        bot2_messages = [
+            {
+                "role": "system",
+                "content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich.",
+            },
+        ]
+
+        async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
+            """Stream text from the LLM and convert it to speech with the TTS service as it
+            is received, returning the full message text and the concatenated audio.
+            """
+            source_queue = asyncio.Queue()
+            sink_queue = asyncio.Queue()
+            sentence_aggregator = SentenceAggregator()
+            pipeline = Pipeline([llm, sentence_aggregator, tts1], source_queue, sink_queue)
+
+            await source_queue.put(LLMContextFrame(LLMContext(messages)))
+            await source_queue.put(EndFrame())
+            await pipeline.run_pipeline()
+
+            message = ""
+            all_audio = bytearray()
+            while sink_queue.qsize():
+                frame = sink_queue.get_nowait()
+                if isinstance(frame, TextFrame):
+                    message += frame.text
+                elif isinstance(frame, AudioFrame):
+                    all_audio.extend(frame.audio)
+
+            return (message, all_audio)
+
+        async def get_bot1_statement():
+            message, audio = await get_text_and_audio(bot1_messages)
+
+            bot1_messages.append({"role": "assistant", "content": message})
+            bot2_messages.append({"role": "user", "content": message})
+
+            return audio
+
+        async def get_bot2_statement():
+            message, audio = await get_text_and_audio(bot2_messages)
+
+            bot2_messages.append({"role": "assistant", "content": message})
+            bot1_messages.append({"role": "user", "content": message})
+
+            return audio
+
+        async def argue():
+            for i in range(100):
+                print(f"In iteration {i}")
+
+                bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
+
+                (audio1, image_data1) = await asyncio.gather(
+                    get_bot1_statement(), dalle.run_image_gen(bot1_description)
+                )
+                await transport.send_queue.put(
+                    [
+                        ImageFrame(image_data1[1], image_data1[2]),
+                        AudioFrame(audio1),
+                    ]
+                )
+
+                bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
+
+                (audio2, image_data2) = await asyncio.gather(
+                    get_bot2_statement(), dalle.run_image_gen(bot2_description)
+                )
+                await transport.send_queue.put(
+                    [
+                        ImageFrame(image_data2[1], image_data2[2]),
+                        AudioFrame(audio2),
+                    ]
+                )
+
+        await asyncio.gather(transport.run(), argue())
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/12-describe-image-openai.py
+++ b/examples/foundational/12-describe-image-openai.py
@@ -1,141 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-from PIL import Image
-
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-
-load_dotenv(override=True)
-
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-
-        if not runner_args.body:
-            script_dir = os.path.dirname(__file__)
-            runner_args.body = {
-                "image_path": os.path.join(script_dir, "assets", "cat.jpg"),
-                "question": "Describe this image",
-            }
-
-        image_path = runner_args.body["image_path"]
-        question = runner_args.body["question"]
-
-        # Kick off the conversation.
-        image = Image.open(image_path)
-        message = LLMContext.create_image_message(
-            image=image.tobytes(),
-            format="RGB",
-            size=image.size,
-            text=question,
-        )
-        messages.append(message)
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/12-describe-video.py
+++ b/examples/foundational/12-describe-video.py
@@ -0,0 +1,180 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import os
+from typing import Optional
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
+from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.user_response import UserResponseAggregator
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import (
+    create_transport,
+    get_transport_client_id,
+    maybe_capture_participant_camera,
+)
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.moondream.vision import MoondreamService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+
+load_dotenv(override=True)
+
+
+class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""
+
+    def __init__(self, participant_id: Optional[str] = None):
+        super().__init__()
+        self._participant_id = participant_id
+
+    def set_participant_id(self, participant_id: str):
+        self._participant_id = participant_id
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if self._participant_id and isinstance(frame, TextFrame):
+            await self.push_frame(
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
+            )
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        video_in_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        video_in_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting bot")
+
+    user_response = UserResponseAggregator()
+
+    # Initialize the image requester without setting the participant ID yet
+    image_requester = UserImageRequester()
+
+    image_processor = UserImageProcessor()
+
+    # If the descriptions look wrong, try MoondreamService(use_cpu=True)
+    moondream = MoondreamService()
+
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            stt,
+            user_response,
+            image_requester,
+            image_processor,
+            moondream,
+            tts,
+            transport.output(),
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected: {client}")
+
+        await maybe_capture_participant_camera(transport, client)
+
+        # Set the participant ID in the image requester
+        client_id = get_transport_client_id(transport, client)
+        image_requester.set_participant_id(client_id)
+
+        # Welcome message
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
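
The new `12-describe-video.py` example above chains two custom processors: `UserImageRequester` turns transcribed text into an upstream image request, and `UserImageProcessor` turns the returned image into a context frame for the vision service. The following is a minimal, dependency-free sketch of that hand-off, not the Pipecat implementation. Every class and function in it (`ImageRequest`, `ImageFrame`, `ContextFrame`, `FakeTransport`, `describe_flow`) is a hypothetical stand-in invented for illustration, and the transport is reduced to a single method that returns a canned image.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the frame types used in the example above.
@dataclass
class ImageRequest:
    participant_id: str
    context: str


@dataclass
class ImageFrame:
    image: bytes
    request: Optional[ImageRequest] = None


@dataclass
class ContextFrame:
    text: str
    image: bytes


class FakeTransport:
    """Answers an image request with a canned camera frame (illustrative only)."""

    def handle_request(self, request: ImageRequest) -> ImageFrame:
        return ImageFrame(image=b"raw-pixels", request=request)


def describe_flow(text: str, participant_id: Optional[str]) -> Optional[ContextFrame]:
    """Mirror the requester -> transport -> processor hand-off for one utterance."""
    # Requester step: with no participant ID there is nobody to ask for an image.
    # (In the example the text frame passes through unchanged; this sketch
    # simply returns None.)
    if participant_id is None:
        return None
    request = ImageRequest(participant_id, context=text)

    # Transport step: the request goes upstream and an image comes back downstream.
    frame = FakeTransport().handle_request(request)

    # Processor step: only images that carry the original question become context.
    if frame.request and frame.request.context:
        return ContextFrame(text=frame.request.context, image=frame.image)
    return None


if __name__ == "__main__":
    print(describe_flow("What do you see?", "user-1"))
```

The point of the sketch is that the user's question travels with the image request, so the processor can pair the returned image with the text that triggered it. An image that arrives without a request context is dropped rather than forwarded.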
--- a/examples/foundational/14d-function-calling-gemini-flash-video.py
+++ b/examples/foundational/14d-function-calling-gemini-flash-video.py
@@ -5,23 +5,29 @@
 #

 import os
+from typing import Optional

 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame, UserImageRequestFrame
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.processors.frame_processor import FrameDirection
+from pipecat.processors.aggregators.user_response import UserResponseAggregator
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
    create_transport,
@@ -31,37 +37,53 @@ from pipecat.runner.utils import (
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.google.llm import GoogleLLMService
-from pipecat.services.llm_service import FunctionCallParams
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams

 load_dotenv(override=True)


-async def fetch_user_image(params: FunctionCallParams):
-    """Fetch the user image and push it to the LLM.
+class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""

-    When called, this function pushes a UserImageRequestFrame upstream to the
-    transport. As a result, the transport will request the user image and push a
-    UserImageRawFrame downstream which will be added to the context by the LLM
-    assistant aggregator.
-    """
-    user_id = params.arguments["user_id"]
-    question = params.arguments["question"]
-    logger.debug(f"Requesting image with user_id={user_id}, question={question}")
+    def __init__(self, participant_id: Optional[str] = None):
+        super().__init__()
+        self._participant_id = participant_id

-    # Request a user image frame and indicate that it should be added to the
-    # context.
-    await params.llm.push_frame(
-        UserImageRequestFrame(user_id=user_id, text=question, append_to_context=True),
-        FrameDirection.UPSTREAM,
-    )
+    def set_participant_id(self, participant_id: str):
+        self._participant_id = participant_id

-    await params.result_callback(None)
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)

-    # Instead of None, it's possible to also provide a tool call answer to
-    # tell the LLM that we are grabbing the image to analyze.
-    # await params.result_callback({"result": "Image is being captured."})
+        if self._participant_id and isinstance(frame, TextFrame):
+            await self.push_frame(
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
+            )
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -88,53 +110,33 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

+    user_response = UserResponseAggregator()
+
+    # Initialize the image requester without setting the participant ID yet
+    image_requester = UserImageRequester()
+
+    image_processor = UserImageProcessor()
+
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

+    # Google Gemini model for vision analysis
+    google = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))
+
    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    # Google Gemini model for vision analysis
-    llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
-    llm.register_function("fetch_user_image", fetch_user_image)
-
-    fetch_image_function = FunctionSchema(
-        name="fetch_user_image",
-        description="Called when the user requests a description of their camera feed",
-        properties={
-            "user_id": {
-                "type": "string",
-                "description": "The ID of the user to grab the image from",
-            },
-            "question": {
-                "type": "string",
-                "description": "The question that the user is asking about the image",
-            },
-        },
-        required=["user_id", "question"],
-    )
-    tools = ToolsSchema(standard_tools=[fetch_image_function])
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
-        },
-    ]
-
-    context = LLMContext(messages, tools)
-    context_aggregator = LLMContextAggregatorPair(context)
-
    pipeline = Pipeline(
        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            transport.input(),
+            stt,
+            user_response,
+            image_requester,
+            image_processor,
+            google,
+            tts,
+            transport.output(),
        ]
    )

@@ -155,15 +157,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        # Set the participant ID in the image requester
        client_id = get_transport_client_id(transport, client)
+        image_requester.set_participant_id(client_id)

-        # Kick off the conversation.
-        messages.append(
-            {
-                "role": "system",
-                "content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",
-            }
-        )
-        await task.queue_frames([LLMRunFrame()])
+        # Welcome message
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/12b-describe-image-aws.py
+++ b/examples/foundational/12b-describe-image-aws.py
@@ -1,148 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-from PIL import Image
-
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.aws.llm import AWSBedrockLLMService
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-
-load_dotenv(override=True)
-
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = AWSBedrockLLMService(
-        aws_region="us-west-2",
-        model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
-        # Note: usually, prefer providing latency="optimized" param.
-        # Here we can't because AWS Bedrock doesn't support it for Claude 3.7,
-        # which we need for image input.
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
-    )
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-
-        if not runner_args.body:
-            script_dir = os.path.dirname(__file__)
-            runner_args.body = {
-                "image_path": os.path.join(script_dir, "assets", "cat.jpg"),
-                "question": "Describe this image",
-            }
-
-        image_path = runner_args.body["image_path"]
-        question = runner_args.body["question"]
-
-        # Kick off the conversation.
-        image = Image.open(image_path)
-        message = LLMContext.create_image_message(
-            image=image.tobytes(),
-            format="RGB",
-            size=image.size,
-            text=question,
-        )
-        messages.append(message)
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/12b-describe-video-gpt-4o.py
+++ b/examples/foundational/12b-describe-video-gpt-4o.py
@@ -4,9 +4,8 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-import io
 import os
-import re
+from typing import Optional

 from dotenv import load_dotenv
 from loguru import logger
@@ -17,17 +16,24 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.frames.frames import (
    Frame,
-    LLMRunFrame,
-    MetricsFrame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
 )
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.user_response import UserResponseAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
+from pipecat.runner.utils import (
+    create_transport,
+    get_transport_client_id,
+    maybe_capture_participant_camera,
+)
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.openai.llm import OpenAILLMService
@@ -37,41 +43,46 @@ from pipecat.transports.daily.transport import DailyParams
 load_dotenv(override=True)


-def format_metrics(metrics, indent=0):
-    lines = []
-    tab = "\t" * indent
+class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""

-    for metric in metrics:
-        lines.append(tab + type(metric).__name__)
-        for field, value in vars(metric).items():
-            if hasattr(value, "__dict__") and not isinstance(
-                value, (str, int, float, bool, type(None))
-            ):
-                lines.append(f"{tab}\t{field}={type(value).__name__}")
-                for k, v in vars(value).items():
-                    lines.append(f"{tab}\t\t{k}={repr(v)}")
-            else:
-                lines.append(f"{tab}\t{field}={repr(value)}")
+    def __init__(self, participant_id: Optional[str] = None):
+        super().__init__()
+        self._participant_id = participant_id

-    return "\n".join(lines)
-
-
-class MetricsFrameLogger(FrameProcessor):
-    """MetricsFrameLogger formats and logs all MetericsFrames"""
-
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
+    def set_participant_id(self, participant_id: str):
+        self._participant_id = participant_id

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

-        if isinstance(frame, MetricsFrame):
-            logger.info(f"{frame.name}\n    {format_metrics(frame.data)}")
+        if self._participant_id and isinstance(frame, TextFrame):
+            await self.push_frame(
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
+            )
+        else:
            await self.push_frame(frame, direction)

-        # ALWAYS push all frames
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
        else:
-            # SUPER IMPORTANT: always push every frame!
            await self.push_frame(frame, direction)


@@ -82,13 +93,14 @@ transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
+        video_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        video_out_enabled=True,
+        video_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
@@ -98,37 +110,33 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

+    user_response = UserResponseAggregator()
+
+    # Initialize the image requester without setting the participant ID yet
+    image_requester = UserImageRequester()
+
+    image_processor = UserImageProcessor()
+
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

+    # OpenAI GPT-4o for vision analysis
+    openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+
    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    metrics_frame_processor = MetricsFrameLogger()
-
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
-            context_aggregator.user(),
-            llm,
+            user_response,
+            image_requester,
+            image_processor,
+            openai,
            tts,
            transport.output(),
-            context_aggregator.assistant(),
-            metrics_frame_processor,  # pretty print metrics frames
        ]
    )

@@ -144,9 +152,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected: {client}")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
+
+        await maybe_capture_participant_camera(transport, client)
+
+        # Set the participant ID in the image requester
+        client_id = get_transport_client_id(transport, client)
+        image_requester.set_participant_id(client_id)
+
+        # Welcome message
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/12c-describe-image-gemini-flash.py
+++ b/examples/foundational/12c-describe-image-gemini-flash.py
@@ -1,141 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-from PIL import Image
-
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.google.llm import GoogleLLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-
-load_dotenv(override=True)
-
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-
-        if not runner_args.body:
-            script_dir = os.path.dirname(__file__)
-            runner_args.body = {
-                "image_path": os.path.join(script_dir, "assets", "cat.jpg"),
-                "question": "Describe this image",
-            }
-
-        image_path = runner_args.body["image_path"]
-        question = runner_args.body["question"]
-
-        # Kick off the conversation.
-        image = Image.open(image_path)
-        message = LLMContext.create_image_message(
-            image=image.tobytes(),
-            format="RGB",
-            size=image.size,
-            text=question,
-        )
-        messages.append(message)
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/12c-describe-video-anthropic.py
+++ b/examples/foundational/12c-describe-video-anthropic.py
@@ -4,25 +4,36 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-
 import os
+from typing import Optional

 from dotenv import load_dotenv
 from loguru import logger
-from PIL import Image

 from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.user_response import UserResponseAggregator
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
+from pipecat.runner.utils import (
+    create_transport,
+    get_transport_client_id,
+    maybe_capture_participant_camera,
+)
 from pipecat.services.anthropic.llm import AnthropicLLMService
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -32,6 +43,49 @@ from pipecat.transports.daily.transport import DailyParams
 load_dotenv(override=True)


+class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""
+
+    def __init__(self, participant_id: Optional[str] = None):
+        super().__init__()
+        self._participant_id = participant_id
+
+    def set_participant_id(self, participant_id: str):
+        self._participant_id = participant_id
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if self._participant_id and isinstance(frame, TextFrame):
+            await self.push_frame(
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
+            )
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)
+
+
 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
 # selected.
@@ -39,12 +93,14 @@ transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
+        video_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
+        video_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
@@ -54,34 +110,33 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

+    user_response = UserResponseAggregator()
+
+    # Initialize the image requester without setting the participant ID yet
+    image_requester = UserImageRequester()
+
+    image_processor = UserImageProcessor()
+
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

+    # Anthropic for vision analysis
+    anthropic = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
+
    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
-
    pipeline = Pipeline(
        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            transport.input(),
+            stt,
+            user_response,
+            image_requester,
+            image_processor,
+            anthropic,
+            tts,
+            transport.output(),
        ]
    )

@@ -96,28 +151,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
+        logger.info(f"Client connected: {client}")

-        if not runner_args.body:
-            script_dir = os.path.dirname(__file__)
-            runner_args.body = {
-                "image_path": os.path.join(script_dir, "assets", "cat.jpg"),
-                "question": "Describe this image",
-            }
+        await maybe_capture_participant_camera(transport, client)

-        image_path = runner_args.body["image_path"]
-        question = runner_args.body["question"]
+        # Set the participant ID in the image requester
+        client_id = get_transport_client_id(transport, client)
+        image_requester.set_participant_id(client_id)

-        # Kick off the conversation.
-        image = Image.open(image_path)
-        message = LLMContext.create_image_message(
-            image=image.tobytes(),
-            format="RGB",
-            size=image.size,
-            text=question,
-        )
-        messages.append(message)
-        await task.queue_frames([LLMRunFrame()])
+        # Welcome message
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/12d-describe-image-moondream.py
+++ b/examples/foundational/12d-describe-image-moondream.py
@@ -1,122 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-from PIL import Image
-
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import UserImageRawFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.moondream.vision import MoondreamService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-
-load_dotenv(override=True)
-
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    vision = MoondreamService()
-
-    pipeline = Pipeline(
-        [
-            vision,  # Vision
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-
-        if not runner_args.body:
-            script_dir = os.path.dirname(__file__)
-            runner_args.body = {
-                "image_path": os.path.join(script_dir, "assets", "cat.jpg"),
-                "question": "Describe this image",
-            }
-
-        image_path = runner_args.body["image_path"]
-        question = runner_args.body["question"]
-
-        # Describe the image.
-        image = Image.open(image_path)
-        await task.queue_frames(
-            [
-                UserImageRawFrame(
-                    image=image.tobytes(),
-                    format="RGB",
-                    size=image.size,
-                    text=question,
-                )
-            ]
-        )
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/14d-function-calling-aws-video.py
+++ b/examples/foundational/14d-function-calling-aws-video.py
@@ -5,23 +5,29 @@
 #

 import os
+from typing import Optional

 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame, UserImageRequestFrame
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.processors.frame_processor import FrameDirection
+from pipecat.processors.aggregators.user_response import UserResponseAggregator
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
    create_transport,
@@ -31,37 +37,54 @@ from pipecat.runner.utils import (
 from pipecat.services.aws.llm import AWSBedrockLLMService
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.llm_service import FunctionCallParams
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams

 load_dotenv(override=True)


-async def fetch_user_image(params: FunctionCallParams):
-    """Fetch the user image and push it to the LLM.
+class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""

-    When called, this function pushes a UserImageRequestFrame upstream to the
-    transport. As a result, the transport will request the user image and push a
-    UserImageRawFrame downstream which will be added to the context by the LLM
-    assistant aggregator.
-    """
-    user_id = params.arguments["user_id"]
-    question = params.arguments["question"]
-    logger.debug(f"Requesting image with user_id={user_id}, question={question}")
+    def __init__(self, participant_id: Optional[str] = None):
+        super().__init__()
+        self._participant_id = participant_id

-    # Request a user image frame and indicate that it should be added to the
-    # context.
-    await params.llm.push_frame(
-        UserImageRequestFrame(user_id=user_id, text=question, append_to_context=True),
-        FrameDirection.UPSTREAM,
-    )
+    def set_participant_id(self, participant_id: str):
+        self._participant_id = participant_id

-    await params.result_callback(None)
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)

-    # Instead of None, it's possible to also provide a tool call answer to
-    # tell the LLM that we are grabbing the image to analyze.
-    # await params.result_callback({"result": "Image is being captured."})
+        if self._participant_id and isinstance(frame, TextFrame):
+            await self.push_frame(
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
+            )
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                # Note: AWS Bedrock does not yet support the universal LLMContext
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -88,15 +111,17 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

+    user_response = UserResponseAggregator()
+
+    # Initialize the image requester without setting the participant ID yet
+    image_requester = UserImageRequester()
+
+    image_processor = UserImageProcessor()
+
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
    # AWS for vision analysis
-    llm = AWSBedrockLLMService(
+    aws = AWSBedrockLLMService(
        aws_region="us-west-2",
        model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        # Note: usually, prefer providing latency="optimized" param.
@@ -104,44 +129,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        # which we need for image input.
        params=AWSBedrockLLMService.InputParams(temperature=0.8),
    )
-    llm.register_function("fetch_user_image", fetch_user_image)

-    fetch_image_function = FunctionSchema(
-        name="fetch_user_image",
-        description="Called when the user requests a description of their camera feed",
-        properties={
-            "user_id": {
-                "type": "string",
-                "description": "The ID of the user to grab the image from",
-            },
-            "question": {
-                "type": "string",
-                "description": "The question that the user is asking about the image",
-            },
-        },
-        required=["user_id", "question"],
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )
-    tools = ToolsSchema(standard_tools=[fetch_image_function])
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
-        },
-    ]
-
-    context = LLMContext(messages, tools)
-    context_aggregator = LLMContextAggregatorPair(context)

    pipeline = Pipeline(
        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            transport.input(),
+            stt,
+            user_response,
+            image_requester,
+            image_processor,
+            aws,
+            tts,
+            transport.output(),
        ]
    )

@@ -162,15 +165,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        # Set the participant ID in the image requester
        client_id = get_transport_client_id(transport, client)
+        image_requester.set_participant_id(client_id)

-        # Kick off the conversation.
-        messages.append(
-            {
-                "role": "system",
-                "content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",
-            }
-        )
-        await task.queue_frames([LLMRunFrame()])
+        # Welcome message
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/13f-cartesia-transcription.py
+++ b/examples/foundational/13f-cartesia-transcription.py
@@ -48,7 +48,10 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
+    stt = CartesiaSTTService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        base_url=os.getenv("CARTESIA_BASE_URL"),
+    )

    tl = TranscriptionLogger()

--- a/examples/foundational/14b-function-calling-anthropic-video.py
+++ b/examples/foundational/14b-function-calling-anthropic-video.py
@@ -4,6 +4,8 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+
+import asyncio
 import os

 from dotenv import load_dotenv
@@ -15,13 +17,12 @@ from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame, UserImageRequestFrame
+from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.processors.frame_processor import FrameDirection
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
    create_transport,
@@ -38,30 +39,34 @@ from pipecat.transports.daily.transport import DailyParams
 load_dotenv(override=True)


-async def fetch_user_image(params: FunctionCallParams):
-    """Fetch the user image and push it to the LLM.
+# Global variable to store the client ID
+client_id = ""

-    When called, this function pushes a UserImageRequestFrame upstream to the
-    transport. As a result, the transport will request the user image and push a
-    UserImageRawFrame downstream which will be added to the context by the LLM
-    assistant aggregator.
-    """
-    user_id = params.arguments["user_id"]
+
+async def get_weather(params: FunctionCallParams):
+    location = params.arguments["location"]
+    await params.result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
+
+
+async def get_image(params: FunctionCallParams):
    question = params.arguments["question"]
-    logger.debug(f"Requesting image with user_id={user_id}, question={question}")
+    logger.debug(f"Requesting image with user_id={client_id}, question={question}")

-    # Request a user image frame and indicate that it should be added to the
-    # context.
-    await params.llm.push_frame(
-        UserImageRequestFrame(user_id=user_id, text=question, append_to_context=True),
-        FrameDirection.UPSTREAM,
+    # Request the image frame
+    await params.llm.request_image_frame(
+        user_id=client_id,
+        function_name=params.function_name,
+        tool_call_id=params.tool_call_id,
+        text_content=question,
    )

-    await params.result_callback(None)
+    # Wait a short time for the frame to be processed
+    await asyncio.sleep(0.5)

-    # Instead of None, it's possible to also provide a tool call answer to
-    # tell the LLM that we are grabbing the image to analyze.
-    # await params.result_callback({"result": "Image is being captured."})
+    # Return a result to complete the function call
+    await params.result_callback(
+        f"I've captured an image from your camera and I'm analyzing what you asked about: {question}"
+    )


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -95,32 +100,70 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    # Anthropic for vision analysis
-    llm = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
-    llm.register_function("fetch_user_image", fetch_user_image)
+    llm = AnthropicLLMService(
+        api_key=os.getenv("ANTHROPIC_API_KEY"),
+        model="claude-3-7-sonnet-latest",
+        params=AnthropicLLMService.InputParams(enable_prompt_caching=True),
+    )
+    llm.register_function("get_weather", get_weather)
+    llm.register_function("get_image", get_image)

-    fetch_image_function = FunctionSchema(
-        name="fetch_user_image",
-        description="Called when the user requests a description of their camera feed",
+    weather_function = FunctionSchema(
+        name="get_weather",
+        description="Get the current weather",
        properties={
-            "user_id": {
+            "location": {
                "type": "string",
-                "description": "The ID of the user to grab the image from",
-            },
-            "question": {
-                "type": "string",
-                "description": "The question that the user is asking about the image",
+                "description": "The city and state, e.g. San Francisco, CA",
            },
        },
-        required=["user_id", "question"],
+        required=["location"],
    )
-    tools = ToolsSchema(standard_tools=[fetch_image_function])
+    get_image_function = FunctionSchema(
+        name="get_image",
+        description="Get an image from the video stream.",
+        properties={
+            "question": {
+                "type": "string",
+                "description": "The question that the user is asking about the image.",
+            }
+        },
+        required=["question"],
+    )
+    tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
+
+    system_prompt = """\
+You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
+
+Your response will be turned into speech so use only simple words and punctuation.
+
+You have access to two tools: get_weather and get_image.
+
+You can respond to questions about the weather using the get_weather tool.
+
+You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
+indicate you should use the get_image tool are:
+- What do you see?
+- What's in the video?
+- Can you describe the video?
+- Tell me about what you see.
+- Tell me something interesting about what you see.
+- What's happening in the video?
+
+If you need to use a tool, simply use the tool. Do not tell the user the tool you are using. Be brief and concise.
+    """

    messages = [
        {
            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
+            "content": [
+                {
+                    "type": "text",
+                    "text": system_prompt,
+                }
+            ],
        },
+        {"role": "user", "content": "Start the conversation by introducing yourself."},
    ]

    context = LLMContext(messages, tools)
@@ -130,11 +173,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        [
            transport.input(),  # Transport user input
            stt,  # STT
-            context_aggregator.user(),  # User responses
+            context_aggregator.user(),  # Aggregate user transcriptions into the context
            llm,  # LLM
            tts,  # TTS
            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses and tool context
        ]
    )

@@ -153,16 +196,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        await maybe_capture_participant_camera(transport, client)

-        # Set the participant ID in the image requester
+        global client_id
        client_id = get_transport_client_id(transport, client)

        # Kick off the conversation.
-        messages.append(
-            {
-                "role": "system",
-                "content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",
-            }
-        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/14d-function-calling-moondream-video.py
+++ b/examples/foundational/14d-function-calling-moondream-video.py
@@ -1,190 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame, UserImageRequestFrame
-from pipecat.pipeline.parallel_pipeline import ParallelPipeline
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.processors.frame_processor import FrameDirection
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import (
-    create_transport,
-    get_transport_client_id,
-    maybe_capture_participant_camera,
-)
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.llm_service import FunctionCallParams
-from pipecat.services.moondream.vision import MoondreamService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-
-load_dotenv(override=True)
-
-
-async def fetch_user_image(params: FunctionCallParams):
-    """Fetch the user image.
-
-    When called, this function pushes a UserImageRequestFrame upstream to the
-    transport. As a result, the transport will request the user image and push a
-    UserImageRawFrame downstream.
-    """
-    user_id = params.arguments["user_id"]
-    question = params.arguments["question"]
-    logger.debug(f"Requesting image with user_id={user_id}, question={question}")
-
-    # Request a user image frame. In this case, we don't want the requested
-    # image to be added to the context because we will process it with
-    # Moondream.
-    await params.llm.push_frame(
-        UserImageRequestFrame(user_id=user_id, text=question, append_to_context=False),
-        FrameDirection.UPSTREAM,
-    )
-
-    await params.result_callback(None)
-
-    # Instead of None, it's possible to also provide a tool call answer to
-    # tell the LLM that we are grabbing the image to analyze.
-    # await params.result_callback({"result": "Image is being captured."})
-
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        video_in_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        video_in_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-    llm.register_function("fetch_user_image", fetch_user_image)
-
-    fetch_image_function = FunctionSchema(
-        name="fetch_user_image",
-        description="Called when the user requests a description of their camera feed",
-        properties={
-            "user_id": {
-                "type": "string",
-                "description": "The ID of the user to grab the image from",
-            },
-            "question": {
-                "type": "string",
-                "description": "The question that the user is asking about the image",
-            },
-        },
-        required=["user_id", "question"],
-    )
-    tools = ToolsSchema(standard_tools=[fetch_image_function])
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
-        },
-    ]
-
-    context = LLMContext(messages, tools)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    # If you run into weird description, try with use_cpu=True
-    moondream = MoondreamService()
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            ParallelPipeline(
-                [llm],  # LLM
-                [moondream],
-            ),
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected: {client}")
-
-        await maybe_capture_participant_camera(transport, client)
-
-        # Set the participant ID in the image requester
-        client_id = get_transport_client_id(transport, client)
-
-        # Kick off the conversation.
-        messages.append(
-            {
-                "role": "system",
-                "content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",
-            }
-        )
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/14d-function-calling-openai-video.py
+++ b/examples/foundational/14d-function-calling-openai-video.py
@@ -1,186 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame, UserImageRequestFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.processors.frame_processor import FrameDirection
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import (
-    create_transport,
-    get_transport_client_id,
-    maybe_capture_participant_camera,
-)
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.llm_service import FunctionCallParams
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-
-load_dotenv(override=True)
-
-
-async def fetch_user_image(params: FunctionCallParams):
-    """Fetch the user image and push it to the LLM.
-
-    When called, this function pushes a UserImageRequestFrame upstream to the
-    transport. As a result, the transport will request the user image and push a
-    UserImageRawFrame downstream which will be added to the context by the LLM
-    assistant aggregator.
-    """
-    user_id = params.arguments["user_id"]
-    question = params.arguments["question"]
-    logger.debug(f"Requesting image with user_id={user_id}, question={question}")
-
-    # Request a user image frame and indicate that it should be added to the
-    # context.
-    await params.llm.push_frame(
-        UserImageRequestFrame(user_id=user_id, text=question, append_to_context=True),
-        FrameDirection.UPSTREAM,
-    )
-
-    await params.result_callback(None)
-
-    # Instead of None, it's possible to also provide a tool call answer to
-    # tell the LLM that we are grabbing the image to analyze.
-    # await params.result_callback({"result": "Image is being captured."})
-
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        video_in_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        video_in_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-    llm.register_function("fetch_user_image", fetch_user_image)
-
-    fetch_image_function = FunctionSchema(
-        name="fetch_user_image",
-        description="Called when the user requests a description of their camera feed",
-        properties={
-            "user_id": {
-                "type": "string",
-                "description": "The ID of the user to grab the image from",
-            },
-            "question": {
-                "type": "string",
-                "description": "The question that the user is asking about the image",
-            },
-        },
-        required=["user_id", "question"],
-    )
-    tools = ToolsSchema(standard_tools=[fetch_image_function])
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
-        },
-    ]
-
-    context = LLMContext(messages, tools)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-
-        await maybe_capture_participant_camera(transport, client)
-
-        client_id = get_transport_client_id(transport, client)
-
-        # Kick off the conversation.
-        messages.append(
-            {
-                "role": "system",
-                "content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",
-            }
-        )
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/14x-function-calling-openpipe.py
+++ b/examples/foundational/14x-function-calling-openpipe.py
@@ -4,8 +4,9 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+
+import asyncio
 import os
-import time

 from dotenv import load_dotenv
 from loguru import logger
@@ -16,31 +17,56 @@ from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
+from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
+from pipecat.runner.utils import (
+    create_transport,
+    get_transport_client_id,
+    maybe_capture_participant_camera,
+)
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.llm_service import FunctionCallParams
-from pipecat.services.openpipe.llm import OpenPipeLLMService
+from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
-from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams

 load_dotenv(override=True)


-async def fetch_weather_from_api(params: FunctionCallParams):
-    await params.result_callback({"conditions": "nice", "temperature": "75"})
+# Global variable to store the client ID
+client_id = ""


-async def fetch_restaurant_recommendation(params: FunctionCallParams):
-    await params.result_callback({"name": "The Golden Dragon"})
+async def get_weather(params: FunctionCallParams):
+    location = params.arguments["location"]
+    await params.result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
+
+
+async def get_image(params: FunctionCallParams):
+    question = params.arguments["question"]
+    logger.debug(f"Requesting image with user_id={client_id}, question={question}")
+
+    # Request the image frame
+    await params.llm.request_image_frame(
+        user_id=client_id,
+        function_name=params.function_name,
+        tool_call_id=params.tool_call_id,
+        text_content=question,
+    )
+
+    # Wait a short time for the frame to be processed
+    await asyncio.sleep(0.5)
+
+    # Return a result to complete the function call
+    await params.result_callback(
+        f"I've captured an image from your camera and I'm analyzing what you asked about: {question}"
+    )


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -50,18 +76,14 @@ transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
+        video_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
+        video_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
@@ -78,24 +100,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    timestamp = int(time.time())
-    llm = OpenPipeLLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
-        tags={"conversation_id": f"pipecat-{timestamp}"},
-    )
-
-    # You can also register a function_name of None to get all functions
-    # sent to the same callback with an additional function_name parameter.
-    llm.register_function("get_current_weather", fetch_weather_from_api)
-    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
-
-    @llm.event_handler("on_function_calls_started")
-    async def on_function_calls_started(service, function_calls):
-        await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm.register_function("get_weather", get_weather)
+    llm.register_function("get_image", get_image)

    weather_function = FunctionSchema(
-        name="get_current_weather",
+        name="get_weather",
        description="Get the current weather",
        properties={
            "location": {
@@ -108,26 +118,41 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
                "description": "The temperature unit to use. Infer this from the user's location.",
            },
        },
-        required=["location", "format"],
-    )
-    restaurant_function = FunctionSchema(
-        name="get_restaurant_recommendation",
-        description="Get a restaurant recommendation",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-        },
        required=["location"],
    )
-    tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+    get_image_function = FunctionSchema(
+        name="get_image",
+        description="Get an image from the video stream.",
+        properties={
+            "question": {
+                "type": "string",
+                "description": "The question that the user is asking about the image.",
+            }
        },
+        required=["question"],
+    )
+    tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
+
+    system_prompt = """\
+You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
+
+Your response will be turned into speech so use only simple words and punctuation.
+
+You have access to two tools: get_weather and get_image.
+
+You can respond to questions about the weather using the get_weather tool.
+
+You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
+indicate you should use the get_image tool are:
+- What do you see?
+- What's in the video?
+- Can you describe the video?
+- Tell me about what you see.
+- Tell me something interesting about what you see.
+- What's happening in the video?
+"""
+    messages = [
+        {"role": "system", "content": system_prompt},
    ]

    context = LLMContext(messages, tools)
@@ -157,6 +182,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
+
+        await maybe_capture_participant_camera(transport, client)
+
+        global client_id
+        client_id = get_transport_client_id(transport, client)
+
        # Kick off the conversation.
        await task.queue_frames([LLMRunFrame()])

--- a/examples/foundational/14r-function-calling-aws.py
+++ b/examples/foundational/14r-function-calling-aws.py
@@ -79,8 +79,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    llm = AWSBedrockLLMService(
        aws_region="us-west-2",
-        model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+        model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
+        params=AWSBedrockLLMService.InputParams(temperature=0.8, latency="optimized"),
    )

    # You can also register a function_name of None to get all functions
--- a/examples/foundational/18-openai-realtime-usage.py
+++ b/examples/foundational/18-openai-realtime-usage.py
@@ -0,0 +1,156 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Example: Print OpenAI Realtime API Token Usage Statistics
+
+This example demonstrates how to access and print token usage statistics
+from the OpenAI Realtime API, including detailed breakdowns of input/output
+tokens, cached tokens, and audio/text token usage.
+"""
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+# We store functions so objects don't get instantiated until the desired
+# transport gets selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    """Main function demonstrating usage statistics tracking."""
+    logger.info("Starting bot")
+
+    # Initialize the OpenAI Realtime service
+    llm = OpenAIRealtimeLLMService(
+        api_key=os.getenv("OPENAI_API_KEY") or "",
+        model="gpt-4o-realtime-preview-2024-12-17",
+    )
+
+    # To access usage statistics, we wrap the service's response-done handler.
+    # Note: `_handle_evt_response_done` is private, so this may break in future releases.
+    original_handler = llm._handle_evt_response_done
+
+    async def custom_response_done_handler(evt):
+        """Custom handler that prints usage stats before calling original handler."""
+        # Print usage statistics if available
+        if evt.response.usage:
+            usage = evt.response.usage
+
+            logger.info("\n" + "=" * 50)
+            logger.info("📊 TOKEN USAGE STATISTICS")
+            logger.info("=" * 50)
+            logger.info(f"Total tokens: {usage.total_tokens}")
+            logger.info(f"Input tokens: {usage.input_tokens}")
+            logger.info(f"Output tokens: {usage.output_tokens}")
+
+            # Input token details
+            if usage.input_token_details:
+                logger.info("\n📥 Input token breakdown:")
+                logger.info(f"  • Cached tokens: {usage.input_token_details.cached_tokens}")
+                logger.info(f"  • Text tokens: {usage.input_token_details.text_tokens}")
+                logger.info(f"  • Audio tokens: {usage.input_token_details.audio_tokens}")
+
+                # Cached token details if available
+                if usage.input_token_details.cached_tokens_details:
+                    logger.info(
+                        f"  • Cached text tokens: {usage.input_token_details.cached_tokens_details.text_tokens}"
+                    )
+                    logger.info(
+                        f"  • Cached audio tokens: {usage.input_token_details.cached_tokens_details.audio_tokens}"
+                    )
+
+            # Output token details
+            if usage.output_token_details:
+                logger.info("\n📤 Output token breakdown:")
+                logger.info(f"  • Text tokens: {usage.output_token_details.text_tokens}")
+                logger.info(f"  • Audio tokens: {usage.output_token_details.audio_tokens}")
+
+            logger.info("=" * 50 + "\n")
+
+        # Call the original handler to maintain normal functionality
+        await original_handler(evt)
+
+    # Replace the handler with our custom one
+    llm._handle_evt_response_done = custom_response_done_handler
+
+    # Create pipeline
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            llm,
+            transport.output(),
+        ]
+    )
+
+    # Create task
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            allow_interruptions=True,
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        logger.info("🎤 Speak into your microphone to interact with the assistant")
+        logger.info("📊 Usage statistics will be printed after each response")
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/19-openai-realtime.py
+++ b/examples/foundational/19-openai-realtime.py
@@ -5,7 +5,6 @@
 #


-import asyncio
 import os
 from datetime import datetime

@@ -15,14 +14,12 @@ from loguru import logger
 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame, TranscriptionMessage
+from pipecat.frames.frames import LLMRunFrame, TranscriptionMessage
 from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.transcript_processor import TranscriptProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -55,18 +52,6 @@ async def fetch_weather_from_api(params: FunctionCallParams):
    )


-async def get_news(params: FunctionCallParams):
-    await params.result_callback(
-        {
-            "news": [
-                "Massive UFO currently hovering above New York City",
-                "Stock markets reach all-time highs",
-                "Living dinosaur species discovered in the Amazon rainforest",
-            ],
-        }
-    )
-
-
 async def fetch_restaurant_recommendation(params: FunctionCallParams):
    await params.result_callback({"name": "The Golden Dragon"})

@@ -88,13 +73,6 @@ weather_function = FunctionSchema(
    required=["location", "format"],
 )

-get_news_function = FunctionSchema(
-    name="get_news",
-    description="Get the current news.",
-    properties={},
-    required=[],
-)
-
 restaurant_function = FunctionSchema(
    name="get_restaurant_recommendation",
    description="Get a restaurant recommendation",
@@ -162,6 +140,10 @@ even if you're asked about them.
 You are participating in a voice conversation. Keep your responses concise, short, and to the point
 unless specifically asked to elaborate on a topic.

+You have access to the following tools:
+- get_current_weather: Get the current weather for a given location.
+- get_restaurant_recommendation: Get a restaurant recommendation for a given location.
+
 Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
    )

@@ -175,26 +157,25 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
    # llm.register_function(None, fetch_weather_from_api)
    llm.register_function("get_current_weather", fetch_weather_from_api)
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
-    llm.register_function("get_news", get_news)

    transcript = TranscriptProcessor()

    # Create a standard OpenAI LLM context object using the normal messages format. The
    # OpenAIRealtimeLLMService will convert this internally to messages that the
    # openai WebSocket API can understand.
-    context = LLMContext(
+    context = OpenAILLMContext(
        [{"role": "user", "content": "Say hello!"}],
        tools,
    )

-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
            context_aggregator.user(),
-            transcript.user(),  # LLM pushes TranscriptionFrames upstream
            llm,  # LLM
+            transcript.user(),  # Placed after the LLM, as the LLM pushes TranscriptionFrames downstream
            transport.output(),  # Transport bot output
            transcript.assistant(),  # After the transcript output, to time with the audio output
            context_aggregator.assistant(),
@@ -217,13 +198,6 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
        # Kick off the conversation.
        await task.queue_frames([LLMRunFrame()])

-        # Add a new tool at runtime after a delay.
-        await asyncio.sleep(15)
-        new_tools = ToolsSchema(
-            standard_tools=[weather_function, restaurant_function, get_news_function]
-        )
-        await task.queue_frames([LLMSetToolsFrame(tools=new_tools)])
-
    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info(f"Client disconnected")
--- a/examples/foundational/19a-azure-realtime.py
+++ b/examples/foundational/19a-azure-realtime.py
@@ -18,9 +18,7 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.azure.realtime.llm import AzureRealtimeLLMService
@@ -157,10 +155,10 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
    llm.register_function("get_current_weather", fetch_weather_from_api)
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)

-    # Create a standard LLM context object using the normal messages format. The
+    # Create a standard OpenAI LLM context object using the normal messages format. The
    # OpenAIRealtimeBetaLLMService will convert this internally to messages that the
    # openai WebSocket API can understand.
-    context = LLMContext(
+    context = OpenAILLMContext(
        [{"role": "user", "content": "Say hello!"}],
        # [{"role": "user", "content": [{"type": "text", "text": "Say hello!"}]}],
        #     [
@@ -175,7 +173,7 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
        tools,
    )

-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/19b-openai-realtime-text.py
+++ b/examples/foundational/19b-openai-realtime-text.py
@@ -18,8 +18,7 @@ from pipecat.frames.frames import LLMRunFrame, TranscriptionMessage
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.transcript_processor import TranscriptProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -170,20 +169,20 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
    # Create a standard OpenAI LLM context object using the normal messages format. The
    # OpenAIRealtimeLLMService will convert this internally to messages that the
    # openai WebSocket API can understand.
-    context = LLMContext(
+    context = OpenAILLMContext(
        [{"role": "user", "content": "Say hello!"}],
        tools,
    )

-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
            context_aggregator.user(),
-            transcript.user(),  # LLM pushes TranscriptionFrames upstream
            llm,  # LLM
            tts,  # TTS
+            transcript.user(),  # After the LLM, which pushes TranscriptionFrames downstream
            transport.output(),  # Transport bot output
            transcript.assistant(),  # After the transcript output, to time with the audio output
            context_aggregator.assistant(),
--- a/examples/foundational/20b-persistent-context-openai-realtime.py
+++ b/examples/foundational/20b-persistent-context-openai-realtime.py
@@ -13,15 +13,14 @@ from datetime import datetime
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import (
+    OpenAILLMContext,
+)
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -70,11 +69,11 @@ async def save_conversation(params: FunctionCallParams):
    timestamp = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
    filename = f"{BASE_FILENAME}{timestamp}.json"
    logger.debug(
-        f"writing conversation to {filename}\n{json.dumps(params.context.get_messages(), indent=4)}"
+        f"writing conversation to {filename}\n{json.dumps(params.context.messages, indent=4)}"
    )
    try:
        with open(filename, "w") as file:
-            messages = params.context.get_messages()
+            messages = params.context.get_messages_for_persistent_storage()
            # remove the last message, which is the instruction we just gave to save the conversation
            messages.pop()
            json.dump(messages, file, indent=2)
@@ -91,10 +90,6 @@ async def load_conversation(params: FunctionCallParams):
            with open(filename, "r") as file:
                params.context.set_messages(json.load(file))
                await params.llm.reset_conversation()
-                # NOTE: we manually create a response here rather than relying
-                # on the function callback to trigger one since we've reset the
-                # conversation so the remote service doesn't know about the
-                # in-progress tool call.
                await params.llm._create_response()
        except Exception as e:
            await params.result_callback({"success": False, "error": str(e)})
@@ -102,12 +97,14 @@ async def load_conversation(params: FunctionCallParams):
    asyncio.create_task(_reset())


-tools = ToolsSchema(
-    standard_tools=[
-        FunctionSchema(
-            name="get_current_weather",
-            description="Get the current weather",
-            properties={
+tools = [
+    {
+        "type": "function",
+        "name": "get_current_weather",
+        "description": "Get the current weather",
+        "parameters": {
+            "type": "object",
+            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
@@ -118,33 +115,45 @@ tools = ToolsSchema(
                    "description": "The temperature unit to use. Infer this from the users location.",
                },
            },
-            required=["location", "format"],
-        ),
-        FunctionSchema(
-            name="save_conversation",
-            description="Save the current conversatione. Use this function to persist the current conversation to external storage.",
-            properties={},
-            required=[],
-        ),
-        FunctionSchema(
-            name="get_saved_conversation_filenames",
-            description="Get a list of saved conversation histories. Returns a list of filenames. Each filename includes a date and timestamp. Each file is conversation history that can be loaded into this session.",
-            properties={},
-            required=[],
-        ),
-        FunctionSchema(
-            name="load_conversation",
-            description="Load a conversation history. Use this function to load a conversation history into the current session.",
-            properties={
+            "required": ["location", "format"],
+        },
+    },
+    {
+        "type": "function",
+        "name": "save_conversation",
+        "description": "Save the current conversation. Use this function to persist the current conversation to external storage.",
+        "parameters": {
+            "type": "object",
+            "properties": {},
+            "required": [],
+        },
+    },
+    {
+        "type": "function",
+        "name": "get_saved_conversation_filenames",
+        "description": "Get a list of saved conversation histories. Returns a list of filenames. Each filename includes a date and timestamp. Each file is conversation history that can be loaded into this session.",
+        "parameters": {
+            "type": "object",
+            "properties": {},
+            "required": [],
+        },
+    },
+    {
+        "type": "function",
+        "name": "load_conversation",
+        "description": "Load a conversation history. Use this function to load a conversation history into the current session.",
+        "parameters": {
+            "type": "object",
+            "properties": {
                "filename": {
                    "type": "string",
                    "description": "The filename of the conversation history to load.",
                }
            },
-            required=["filename"],
-        ),
-    ]
-)
+            "required": ["filename"],
+        },
+    },
+]


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -215,8 +224,8 @@ Remember, your responses should be short. Just one or two sentences, usually."""
    llm.register_function("get_saved_conversation_filenames", get_saved_conversation_filenames)
    llm.register_function("load_conversation", load_conversation)

-    context = LLMContext([{"role": "user", "content": "Say hello!"}], tools)
-    context_aggregator = LLMContextAggregatorPair(context)
+    context = OpenAILLMContext([], tools)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/20c-persistent-context-anthropic.py
+++ b/examples/foundational/20c-persistent-context-anthropic.py
@@ -72,6 +72,7 @@ async def save_conversation(params: FunctionCallParams):
    )
    try:
        with open(filename, "w") as file:
+            # todo: extract 'system' into the first message in the list
            messages = params.context.get_messages()
            # remove the last message, which is the instruction we just gave to save the conversation
            messages.pop()
--- a/examples/foundational/20d-persistent-context-gemini.py
+++ b/examples/foundational/20d-persistent-context-gemini.py
@@ -90,6 +90,7 @@ async def save_conversation(params: FunctionCallParams):
    )
    try:
        with open(filename, "w") as file:
+            # todo: extract 'system' into the first message in the list
            messages = params.context.get_messages()
            # remove the last message (the instruction to save the context)
            messages.pop()
--- a/examples/foundational/20e-persistent-context-aws-nova-sonic.py
+++ b/examples/foundational/20e-persistent-context-aws-nova-sonic.py
@@ -20,8 +20,6 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -77,7 +75,7 @@ async def save_conversation(params: FunctionCallParams):
    filename = f"{BASE_FILENAME}{timestamp}.json"
    try:
        with open(filename, "w") as file:
-            messages = params.context.get_messages()
+            messages = params.context.get_messages_for_persistent_storage()
            # remove the last few messages. in reverse order, they are:
            # - the in progress save tool call
            # - the invocation of the save tool call
@@ -225,13 +223,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm.register_function("get_saved_conversation_filenames", get_saved_conversation_filenames)
    llm.register_function("load_conversation", load_conversation)

-    context = LLMContext(
+    context = OpenAILLMContext(
        messages=[
            {"role": "system", "content": f"{system_instruction}"},
        ],
        tools=tools,
    )
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/26a-gemini-live-transcription.py
+++ b/examples/foundational/26a-gemini-live-transcription.py
@@ -16,9 +16,7 @@ from pipecat.frames.frames import LLMRunFrame, TranscriptionMessage
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.transcript_processor import TranscriptProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -74,7 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        # inference_on_context_initialization=False,
    )

-    context = LLMContext(
+    context = OpenAILLMContext(
        [
            {
                "role": "user",
@@ -92,7 +90,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            #     },
        ],
    )
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    transcript = TranscriptProcessor()

--- a/examples/foundational/26b-gemini-live-function-calling.py
+++ b/examples/foundational/26b-gemini-live-function-calling.py
@@ -19,9 +19,7 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -141,18 +139,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm.register_function("get_current_weather", fetch_weather_from_api)
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)

-    # You can provide the system instructions and tools in the context rather
-    # than as arguments to GeminiLiveLLMService, but note that doing so will
-    # trigger a (fast) reconnection when the GeminiLiveLLMService first
-    # receives the context (i.e. when we send the LLMRunFrame below).
-    context = LLMContext(
-        [
-            # {"role": "system", "content": system_instruction},
-            {"role": "user", "content": "Say hello."},
-        ],
-        # tools,
+    context = OpenAILLMContext(
+        [{"role": "user", "content": "Say hello."}],
    )
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/26c-gemini-live-video.py
+++ b/examples/foundational/26c-gemini-live-video.py
@@ -17,9 +17,7 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
    create_transport,
@@ -67,7 +65,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        # inference_on_context_initialization=False,
    )

-    context = LLMContext(
+    context = OpenAILLMContext(
        [
            {
                "role": "user",
@@ -75,7 +73,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            },
        ],
    )
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/26d-gemini-live-text.py
+++ b/examples/foundational/26d-gemini-live-text.py
@@ -16,8 +16,7 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -110,8 +109,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    # Set up conversation context and management
    # The context_aggregator will automatically collect conversation context
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
+    context = OpenAILLMContext(messages)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/26e-gemini-live-google-search.py
+++ b/examples/foundational/26e-gemini-live-google-search.py
@@ -16,9 +16,7 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -92,7 +90,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        tools=tools,
    )

-    context = LLMContext(
+    context = OpenAILLMContext(
        [
            {
                "role": "user",
@@ -100,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            }
        ],
    )
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/26f-gemini-live-files-api.py
+++ b/examples/foundational/26f-gemini-live-files-api.py
@@ -16,9 +16,7 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -131,7 +129,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        mime_type = "text/plain"

        # Create context with file reference
-        context = LLMContext(
+        context = OpenAILLMContext(
            [
                {
                    "role": "user",
@@ -154,7 +152,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    except Exception as e:
        logger.error(f"Error uploading file: {e}")
        # Continue with a basic context if file upload fails
-        context = LLMContext(
+        context = OpenAILLMContext(
            [
                {
                    "role": "user",
@@ -164,7 +162,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        )

    # Create context aggregator
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    # Build the pipeline
    pipeline = Pipeline(
--- a/examples/foundational/26g-gemini-live-groundingMetadata.py
+++ b/examples/foundational/26g-gemini-live-groundingMetadata.py
@@ -10,9 +10,7 @@ from pipecat.frames.frames import Frame, LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -126,8 +124,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    ]

    # Set up conversation context and management
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
+    context = OpenAILLMContext(messages)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/26h-gemini-live-vertex-function-calling.py
+++ b/examples/foundational/26h-gemini-live-vertex-function-calling.py
@@ -9,21 +9,21 @@ import os
 from datetime import datetime

 from dotenv import load_dotenv
+from google.genai.types import HttpOptions
 from loguru import logger

 from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
 from pipecat.services.google.gemini_live.llm_vertex import GeminiLiveVertexLLMService
 from pipecat.services.llm_service import FunctionCallParams
 from pipecat.transports.base_transport import BaseTransport, TransportParams
@@ -139,8 +139,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm.register_function("get_current_weather", fetch_weather_from_api)
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)

-    context = LLMContext([{"role": "user", "content": "Say hello."}])
-    context_aggregator = LLMContextAggregatorPair(context)
+    context = OpenAILLMContext(
+        [{"role": "user", "content": "Say hello."}],
+    )
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/26i-gemini-live-graceful-end.py
+++ b/examples/foundational/26i-gemini-live-graceful-end.py
@@ -18,9 +18,7 @@ from pipecat.frames.frames import EndTaskFrame, LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
@@ -64,7 +62,7 @@ You have three tools available to you:

 After you've responded to the user three times, do two things, in order:
 1. Politely let them know that that's all the time you have today and say goodbye.
-2. *WITHOUT WAITING FOR THE USER TO RESPOND*, call the end_conversation tool to gracefully end the conversation.
+2. Call the end_conversation tool to gracefully end the conversation.
 """


@@ -154,10 +152,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
    llm.register_function("end_conversation", end_conversation)

-    context = LLMContext(
+    context = OpenAILLMContext(
        [{"role": "user", "content": "Say hello."}],
    )
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/27-simli-layer.py
+++ b/examples/foundational/27-simli-layer.py
@@ -9,6 +9,7 @@ import os

 from dotenv import load_dotenv
 from loguru import logger
+from simli import SimliConfig

 from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
@@ -65,12 +66,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",
+        voice_id="a167e0f3-df7e-4d52-a9c3-f949145efdab",
    )

    simli_ai = SimliVideoService(
-        api_key=os.getenv("SIMLI_API_KEY"),
-        face_id="cace3ef7-a4c4-425d-a8cf-a5358eb0c427",
+        SimliConfig(os.getenv("SIMLI_API_KEY"), os.getenv("SIMLI_FACE_ID")),
    )

    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini")
--- a/examples/foundational/40-aws-nova-sonic.py
+++ b/examples/foundational/40-aws-nova-sonic.py
@@ -18,8 +18,7 @@ from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.aws.nova_sonic.llm import AWSNovaSonicLLMService
@@ -120,7 +119,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm.register_function("get_current_weather", fetch_weather_from_api)

    # Set up context and context management.
-    context = LLMContext(
+    # AWSNovaSonicLLMService adapts OpenAI LLM context objects in the standard message format
+    # to what Nova Sonic expects.
+    context = OpenAILLMContext(
        messages=[
            {"role": "system", "content": f"{system_instruction}"},
            {
@@ -130,7 +131,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ],
        tools=tools,
    )
-    context_aggregator = LLMContextAggregatorPair(context)
+    context_aggregator = llm.create_context_aggregator(context)

    # Build the pipeline
    pipeline = Pipeline(
--- a/examples/foundational/46-video-processing.py
+++ b/examples/foundational/46-video-processing.py
@@ -15,9 +15,7 @@ from pipecat.frames.frames import Frame, InputImageRawFrame, LLMRunFrame, Output
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.processors.frameworks.rtvi import RTVIObserver, RTVIProcessor
 from pipecat.runner.types import RunnerArguments
@@ -110,8 +108,8 @@ async def run_bot(pipecat_transport):
        }
    ]

-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
+    context = OpenAILLMContext(messages)
+    context_aggregator = llm.create_context_aggregator(context)

    # RTVI events for Pipecat client UI
    rtvi = RTVIProcessor()
--- a/examples/foundational/47-sentry-metrics.py
+++ b/examples/foundational/47-sentry-metrics.py
@@ -1,142 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import os
-
-import sentry_sdk
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.processors.metrics.sentry import SentryMetrics
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-
-load_dotenv(override=True)
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    # Initialize Sentry
-    sentry_sdk.init(
-        dsn=os.getenv("SENTRY_DSN"),
-        traces_sample_rate=1.0,
-    )
-
-    stt = DeepgramSTTService(
-        api_key=os.getenv("DEEPGRAM_API_KEY"),
-        metrics=SentryMetrics(),
-    )
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-        metrics=SentryMetrics(),
-    )
-
-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        metrics=SentryMetrics(),
-    )
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/48-service-switcher.py
+++ b/examples/foundational/48-service-switcher.py
@@ -1,153 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
-from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame, ManuallySwitchServiceFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.service_switcher import ServiceSwitcher, ServiceSwitcherStrategyManual
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.stt import CartesiaSTTService
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.deepgram.tts import DeepgramTTSService
-from pipecat.services.google.llm import GoogleLLMService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-
-load_dotenv(override=True)
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    stt_cartesia = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
-    stt_deepgram = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-    stt_switcher = ServiceSwitcher(
-        services=[stt_cartesia, stt_deepgram], strategy_type=ServiceSwitcherStrategyManual
-    )
-
-    tts_cartesia = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",
-    )
-    tts_deepgram = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-    tts_switcher = ServiceSwitcher(
-        services=[tts_cartesia, tts_deepgram], strategy_type=ServiceSwitcherStrategyManual
-    )
-
-    llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-    llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
-    llm_switcher = ServiceSwitcher(
-        services=[llm_openai, llm_google], strategy_type=ServiceSwitcherStrategyManual
-    )
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt_switcher,
-            context_aggregator.user(),  # User responses
-            llm_switcher,  # LLM
-            tts_switcher,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
-        await asyncio.sleep(15)
-        print(f"Switching to {stt_deepgram}")
-        await task.queue_frames([ManuallySwitchServiceFrame(service=stt_deepgram)])
-        await asyncio.sleep(15)
-        print(f"Switching to {llm_google}")
-        await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])
-        await asyncio.sleep(15)
-        print(f"Switching to {tts_deepgram}")
-        await task.queue_frames([ManuallySwitchServiceFrame(service=tts_deepgram)])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/assets/cat.jpg
+++ b/examples/foundational/assets/cat.jpg
--- a/examples/foundational/assets/moondream.png
+++ b/examples/foundational/assets/moondream.png
--- a/examples/quickstart/README.md
+++ b/examples/quickstart/README.md
@@ -73,13 +73,13 @@ Transform your local bot into a production-ready service. Pipecat Cloud handles

 1. [Sign up for Pipecat Cloud](https://pipecat.daily.co/sign-up).

-2. Install the Pipecat CLI:
+2. Install the Pipecat Cloud CLI:

   ```bash
-   uv tool install pipecat-ai-cli
+   uv add pipecatcloud
   ```

-> 💡 Tip: You can run the `pipecat` CLI using the `pc` alias.
+> 💡 Tip: You can run the `pipecatcloud` CLI using the `pcc` alias.

 3. Set up Docker for building your bot image:

@@ -113,22 +113,12 @@ secret_set = "quickstart-secrets"

 > 💡 Tip: [Set up `image_credentials`](https://docs.pipecat.ai/deployment/pipecat-cloud/fundamentals/secrets#image-pull-secrets) in your TOML file for authenticated image pulls

-### Log in to Pipecat Cloud
-
-To start using the CLI, authenticate to Pipecat Cloud:
-
-```bash
-pipecat cloud auth login
-```
-
-You'll be presented with a link that you can click to authenticate your client.
-
 ### Configure secrets

 Upload your API keys to Pipecat Cloud's secure storage:

 ```bash
-pipecat cloud secrets set quickstart-secrets --file .env
+uv run pcc secrets set quickstart-secrets --file .env
 ```

 This creates a secret set called `quickstart-secrets` (matching your TOML file) and uploads all your API keys from `.env`.
@@ -138,13 +128,13 @@ This creates a secret set called `quickstart-secrets` (matching your TOML file)
 Build your Docker image and push to Docker Hub:

 ```bash
-pipecat cloud docker build-push
+uv run pcc docker build-push
 ```

 Deploy to Pipecat Cloud:

 ```bash
-pipecat cloud deploy
+uv run pcc deploy
 ```

 ### Connect to your agent
--- a/examples/quickstart/pcc-deploy.toml
+++ b/examples/quickstart/pcc-deploy.toml
@@ -1,11 +1,6 @@
 agent_name = "quickstart"
 image = "your_username/quickstart:0.1"
 secret_set = "quickstart-secrets"
-agent_profile = "agent-1x"
-
-# RECOMMENDED: Set an image pull secret:
-# https://docs.pipecat.ai/deployment/pipecat-cloud/fundamentals/secrets#image-pull-secrets
-# image_credentials = "your_image_pull_secret"

 [scaling]
 	min_agents = 1
--- a/examples/quickstart/pyproject.toml
+++ b/examples/quickstart/pyproject.toml
@@ -4,14 +4,13 @@ version = "0.1.0"
 description = "Quickstart example for building voice AI bots with Pipecat"
 requires-python = ">=3.10"
 dependencies = [
-    "pipecat-ai[webrtc,daily,silero,deepgram,openai,cartesia,local-smart-turn-v3,runner]",
-    "pipecat-ai-cli"
+    "pipecat-ai[webrtc,daily,silero,deepgram,openai,cartesia,local-smart-turn-v3,runner]>=0.0.86",
+    "pipecatcloud>=0.2.4"
 ]

 [dependency-groups]
 dev = [
-    "pyright>=1.1.404,<2",
-    "ruff>=0.12.11,<1",
+    "ruff~=0.12.1",
 ]

 [tool.ruff]
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -34,7 +34,7 @@ dependencies = [
    "pyloudnorm~=0.1.1",
    "resampy~=0.4.3",
    "soxr~=0.5.0",
-    "openai>=1.74.0,<3",
+    "openai>=1.74.0,<=1.99.1",
    # Pinning numba to resolve package dependencies
    "numba==0.61.2",
    "wait_for2>=0.4.1; python_version<'3.12'",
@@ -50,12 +50,12 @@ anthropic = [ "anthropic~=0.49.0" ]
 assemblyai = [ "pipecat-ai[websockets-base]" ]
 asyncai = [ "pipecat-ai[websockets-base]" ]
 aws = [ "aioboto3~=15.0.0", "pipecat-ai[websockets-base]" ]
-aws-nova-sonic = [ "aws_sdk_bedrock_runtime~=0.1.1; python_version>='3.12'" ]
+aws-nova-sonic = [ "aws_sdk_bedrock_runtime~=0.1.0; python_version>='3.12'" ]
 azure = [ "azure-cognitiveservices-speech~=1.42.0"]
 cartesia = [ "cartesia~=2.0.3", "pipecat-ai[websockets-base]" ]
 cerebras = []
 deepseek = []
-daily = [ "daily-python~=0.21.0" ]
+daily = [ "daily-python~=0.19.9" ]
 deepgram = [ "deepgram-sdk~=4.7.0" ]
 elevenlabs = [ "pipecat-ai[websockets-base]" ]
 fal = [ "fal-client~=0.5.9" ]
@@ -84,7 +84,7 @@ nim = []
 neuphonic = [ "pipecat-ai[websockets-base]" ]
 noisereduce = [ "noisereduce~=3.0.3" ]
 openai = [ "pipecat-ai[websockets-base]" ]
-openpipe = [ "openpipe>=4.50.0,<6" ]
+openpipe = [ "openpipe~=4.50.0" ]
 openrouter = []
 perplexity = []
 playht = [ "pipecat-ai[websockets-base]" ]
@@ -93,7 +93,7 @@ rime = [ "pipecat-ai[websockets-base]" ]
 riva = [ "nvidia-riva-client~=2.21.1" ]
 runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<0.117.0", "pipecat-ai-small-webrtc-prebuilt>=1.0.0"]
 sambanova = []
-sarvam = [ "sarvamai==0.1.21", "pipecat-ai[websockets-base]" ]
+sarvam = [ "pipecat-ai[websockets-base]" ]
 sentry = [ "sentry-sdk>=2.28.0,<3" ]
 local-smart-turn = [ "coremltools>=8.0", "transformers", "torch>=2.5.0,<3", "torchaudio>=2.5.0,<3" ]
 local-smart-turn-v3 = [ "transformers", "onnxruntime>=1.20.1,<2" ]
@@ -102,7 +102,7 @@ silero = [ "onnxruntime>=1.20.1,<2" ]
 simli = [ "simli-ai~=0.1.10"]
 soniox = [ "pipecat-ai[websockets-base]" ]
 soundfile = [ "soundfile~=0.13.0" ]
-speechmatics = [ "speechmatics-rt>=0.5.0" ]
+speechmatics = [ "speechmatics-rt>=0.4.0" ]
 strands = [ "strands-agents>=1.9.1,<2" ]
 tavus=[]
 together = []
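Reviewer note: several pins in this file move between explicit ranges and the compatible-release operator (`ruff~=0.12.1`, `openpipe~=4.50.0`, `daily-python~=0.19.9`). The sketch below illustrates what `~=` admits for plain numeric versions, on the assumption that the PEP 440 rule `~=X.Y.Z` means `>=X.Y.Z, ==X.Y.*`. The helper names `parse_version` and `compatible_release` are hypothetical and not part of any packaging tool; pre-release, post-release, and local version segments are deliberately ignored.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a plain dotted numeric version such as '0.12.1' into integers."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Return True if `candidate` satisfies `~=spec` (numeric versions only).

    Assumption: `~=X.Y.Z` is treated as `>=X.Y.Z` combined with `==X.Y.*`.
    """
    base = parse_version(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    cand = parse_version(candidate)
    # Pad the candidate with zeros so '4.50' compares like '4.50.0'.
    cand = cand + (0,) * (len(base) - len(cand))
    prefix = base[:-1]
    return cand >= base and cand[: len(prefix)] == prefix


if __name__ == "__main__":
    # Versions taken from the pins in this diff, checked against the new specifiers.
    print(compatible_release("0.12.11", "0.12.1"))
    print(compatible_release("0.13.0", "0.12.1"))
    print(compatible_release("5.0.0", "4.50.0"))
```

Under this reading, `openpipe~=4.50.0` is noticeably narrower than the previous `>=4.50.0,<6`, since only `4.50.*` releases qualify.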
--- a/scripts/evals/eval.py
+++ b/scripts/evals/eval.py
@@ -10,10 +10,9 @@ import os
 import re
 import time
 import wave
-from dataclasses import dataclass
 from datetime import datetime
 from pathlib import Path
-from typing import Any, List, Optional, Tuple
+from typing import List, Optional, Tuple

 import aiofiles
 from deepgram import LiveOptions
@@ -54,14 +53,6 @@ EVAL_TIMEOUT_SECS = 120
 EvalPrompt = str | Tuple[str, ImageFile]


-@dataclass
-class EvalConfig:
-    prompt: EvalPrompt
-    eval: str
-    eval_speaks_first: bool = False
-    runner_args_body: Optional[Any] = None
-
-
 class EvalRunner:
    def __init__(
        self,
@@ -102,7 +93,9 @@ class EvalRunner:
    async def run_eval(
        self,
        example_file: str,
-        eval_config: EvalConfig,
+        prompt: EvalPrompt,
+        eval: str,
+        user_speaks_first: bool = False,
    ):
        if not re.match(self._pattern, example_file):
            return
@@ -119,8 +112,10 @@ class EvalRunner:

        try:
            tasks = [
-                asyncio.create_task(run_example_pipeline(script_path, eval_config)),
-                asyncio.create_task(run_eval_pipeline(self, example_file, eval_config)),
+                asyncio.create_task(run_example_pipeline(script_path)),
+                asyncio.create_task(
+                    run_eval_pipeline(self, example_file, prompt, eval, user_speaks_first)
+                ),
            ]
            _, pending = await asyncio.wait(tasks, timeout=EVAL_TIMEOUT_SECS)
            if pending:
@@ -182,7 +177,7 @@ class EvalRunner:
        return os.path.join(self._recordings_dir, f"{base_name}.wav")


-async def run_example_pipeline(script_path: Path, eval_config: EvalConfig):
+async def run_example_pipeline(script_path: Path):
    room_url = os.getenv("DAILY_SAMPLE_ROOM_URL")

    module = load_module_from_path(script_path)
@@ -201,7 +196,6 @@ async def run_example_pipeline(script_path: Path, eval_config: EvalConfig):

    runner_args = RunnerArguments()
    runner_args.pipeline_idle_timeout_secs = PIPELINE_IDLE_TIMEOUT_SECS
-    runner_args.body = eval_config.runner_args_body

    await module.run_bot(transport, runner_args)

@@ -209,7 +203,9 @@ async def run_example_pipeline(script_path: Path, eval_config: EvalConfig):
 async def run_eval_pipeline(
    eval_runner: EvalRunner,
    example_file: str,
-    eval_config: EvalConfig,
+    prompt: EvalPrompt,
+    eval: str,
+    user_speaks_first: bool = False,
 ):
    logger.info(f"Starting eval bot")

@@ -266,16 +262,17 @@ async def run_eval_pipeline(
    # Load example prompt depending on image.
    example_prompt = ""
    example_image: Optional[ImageFile] = None
-    if isinstance(eval_config.prompt, str):
-        example_prompt = eval_config.prompt
-    elif isinstance(eval_config.prompt, tuple):
-        example_prompt, example_image = eval_config.prompt
+    if isinstance(prompt, str):
+        example_prompt = prompt
+    elif isinstance(prompt, tuple):
+        example_prompt, example_image = prompt

+    eval_prompt = f"The answer is correct if it matches: {eval}."
    common_system_prompt = (
        "The user might say things other than the answer and that's allowed. "
-        f"You should only call the eval function when the user: {eval_config.eval}"
+        f"You should only call the eval function with your assessment when the user actually answers the question. {eval_prompt}"
    )
-    if eval_config.eval_speaks_first:
+    if user_speaks_first:
        system_prompt = f"You are an LLM eval, be extremely brief. You will start the conversation by saying: '{example_prompt}'. {common_system_prompt}"
    else:
        system_prompt = f"You are an LLM eval, be extremely brief. Your goal is to first ask one question: {example_prompt}. {common_system_prompt}"
@@ -333,9 +330,9 @@ async def run_eval_pipeline(

        # Default behavior is for the bot to speak first
        # If the eval bot speaks first, we append the prompt to the messages
-        if eval_config.eval_speaks_first:
+        if user_speaks_first:
            messages.append(
-                {"role": "user", "content": f"Start by saying this exactly: '{eval_config.prompt}'"}
+                {"role": "user", "content": f"Start by saying this exactly: '{prompt}'"}
            )
            await task.queue_frames([LLMRunFrame()])

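Reviewer note: after this change, `run_eval_pipeline` assembles the eval bot's system prompt from three plain arguments instead of an `EvalConfig`. The sketch below isolates that string assembly so the resulting prompt can be inspected without running a pipeline. The function name `build_system_prompt` and the parameter name `eval_criteria` (used here to avoid shadowing the `eval` builtin) are hypothetical; the string templates follow the post-change lines of this diff, with "extremely" spelled out.

```python
from typing import Any, Tuple, Union

# Stand-in for the script's EvalPrompt alias: a string, or a (string, image) pair.
EvalPrompt = Union[str, Tuple[str, Any]]


def build_system_prompt(
    prompt: EvalPrompt,
    eval_criteria: str,
    user_speaks_first: bool = False,
) -> str:
    """Assemble the eval bot's system prompt (illustrative sketch only)."""
    # A tuple prompt carries an image as its second element; only the text is used here.
    example_prompt = prompt if isinstance(prompt, str) else prompt[0]

    eval_prompt = f"The answer is correct if it matches: {eval_criteria}."
    common_system_prompt = (
        "The user might say things other than the answer and that's allowed. "
        "You should only call the eval function with your assessment when the "
        f"user actually answers the question. {eval_prompt}"
    )
    if user_speaks_first:
        return (
            "You are an LLM eval, be extremely brief. You will start the "
            f"conversation by saying: '{example_prompt}'. {common_system_prompt}"
        )
    return (
        "You are an LLM eval, be extremely brief. Your goal is to first ask "
        f"one question: {example_prompt}. {common_system_prompt}"
    )


if __name__ == "__main__":
    print(build_system_prompt("A simple math addition.", "Correct math addition."))
```

Because the template appends its own period after the criteria, criteria strings that already end in a period (as the constants in `run-release-evals.py` do) produce a doubled period in the final prompt. That is likely harmless for the model but easy to tidy up.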
--- a/scripts/evals/run-release-evals.py
+++ b/scripts/evals/run-release-evals.py
@@ -11,7 +11,7 @@ from datetime import datetime, timezone
 from pathlib import Path

 from dotenv import load_dotenv
-from eval import EvalConfig, EvalRunner
+from eval import EvalRunner
 from loguru import logger
 from PIL import Image
 from utils import check_env_variables
@@ -24,184 +24,188 @@ ASSETS_DIR = SCRIPT_DIR / "assets"

 FOUNDATIONAL_DIR = SCRIPT_DIR.parent.parent / "examples" / "foundational"

-EVAL_SIMPLE_MATH = EvalConfig(
-    prompt="A simple math addition.",
-    eval="The user answers the math addition correctly.",
+# Speaking order constants
+USER_SPEAKS_FIRST = True
+BOT_SPEAKS_FIRST = False
+
+# Math
+PROMPT_SIMPLE_MATH = "A simple math addition."
+EVAL_SIMPLE_MATH = "Correct math addition."
+
+# Weather
+PROMPT_WEATHER = "What's the weather in San Francisco?"
+EVAL_WEATHER = (
+    "Something specific about the current weather in San Francisco, including the degrees."
 )

-EVAL_WEATHER = EvalConfig(
-    prompt="What's the weather in San Francisco?",
-    eval="The user says something specific about the current weather in San Francisco, including the degrees.",
-)
+# Online search
+PROMPT_ONLINE_SEARCH = "What's the date right now in London?"
+EVAL_ONLINE_SEARCH = f"Today is {datetime.now(timezone.utc).strftime('%B %d, %Y')}."

-EVAL_ONLINE_SEARCH = EvalConfig(
-    prompt="What's the date right now in London?",
-    eval=f"The user says today is {datetime.now(timezone.utc).strftime('%B %d, %Y')} in London.",
-)
+# Switch language
+PROMPT_SWITCH_LANGUAGE = "Say something in Spanish."
+EVAL_SWITCH_LANGUAGE = "The user is now talking in Spanish."

-EVAL_SWITCH_LANGUAGE = EvalConfig(
-    prompt="Say something in Spanish.",
-    eval="The user talks in Spanish.",
-)
-
-EVAL_VISION_CAMERA = EvalConfig(
-    prompt=("Briefly describe what you see.", Image.open(ASSETS_DIR / "cat.jpg")),
-    eval="The user provides a cat description.",
-)
-
-
-def EVAL_VISION_IMAGE(*, eval_speaks_first: bool = False):
-    return EvalConfig(
-        prompt="Briefly describe this image.",
-        eval="The user provides a cat description.",
-        eval_speaks_first=eval_speaks_first,
-        runner_args_body={
-            "image_path": ASSETS_DIR / "cat.jpg",
-            "question": "Briefly describe this image.",
-        },
-    )
-
-
-EVAL_VOICEMAIL = EvalConfig(
-    prompt="Please leave a message.",
-    eval="The user leaves a voicemail message.",
-    eval_speaks_first=True,
-)
-
-EVAL_CONVERSATION = EvalConfig(
-    prompt="Hello, this is Mark.",
-    eval="The user replies with a greeting.",
-    eval_speaks_first=True,
-)
+# Vision
+PROMPT_VISION = ("What do you see?", Image.open(ASSETS_DIR / "cat.jpg"))
+EVAL_VISION = "A cat description."

+# Voicemail
+PROMPT_VOICEMAIL = "Please leave a message after the beep."
+EVAL_VOICEMAIL = "Assess the conversation and determine if it is a voicemail."
+PROMPT_CONVERSATION = "Hello, this is Mark."
+EVAL_CONVERSATION = "A start of a conversation, not a voicemail."

 TESTS_07 = [
    # 07 series
-    ("07-interruptible.py", EVAL_SIMPLE_MATH),
-    ("07-interruptible-cartesia-http.py", EVAL_SIMPLE_MATH),
-    ("07a-interruptible-speechmatics.py", EVAL_SIMPLE_MATH),
-    ("07aa-interruptible-soniox.py", EVAL_SIMPLE_MATH),
-    ("07ab-interruptible-inworld-http.py", EVAL_SIMPLE_MATH),
-    ("07ac-interruptible-asyncai.py", EVAL_SIMPLE_MATH),
-    ("07ac-interruptible-asyncai-http.py", EVAL_SIMPLE_MATH),
-    ("07b-interruptible-langchain.py", EVAL_SIMPLE_MATH),
-    ("07c-interruptible-deepgram.py", EVAL_SIMPLE_MATH),
-    ("07c-interruptible-deepgram-flux.py", EVAL_SIMPLE_MATH),
-    ("07c-interruptible-deepgram-http.py", EVAL_SIMPLE_MATH),
-    ("07d-interruptible-elevenlabs.py", EVAL_SIMPLE_MATH),
-    ("07d-interruptible-elevenlabs-http.py", EVAL_SIMPLE_MATH),
-    ("07f-interruptible-azure.py", EVAL_SIMPLE_MATH),
-    ("07g-interruptible-openai.py", EVAL_SIMPLE_MATH),
-    ("07h-interruptible-openpipe.py", EVAL_SIMPLE_MATH),
-    ("07j-interruptible-gladia.py", EVAL_SIMPLE_MATH),
-    ("07k-interruptible-lmnt.py", EVAL_SIMPLE_MATH),
-    ("07l-interruptible-groq.py", EVAL_SIMPLE_MATH),
-    ("07m-interruptible-aws.py", EVAL_SIMPLE_MATH),
-    ("07m-interruptible-aws-strands.py", EVAL_WEATHER),
-    ("07n-interruptible-gemini.py", EVAL_SIMPLE_MATH),
-    ("07n-interruptible-google.py", EVAL_SIMPLE_MATH),
-    ("07o-interruptible-assemblyai.py", EVAL_SIMPLE_MATH),
-    ("07q-interruptible-rime.py", EVAL_SIMPLE_MATH),
-    ("07q-interruptible-rime-http.py", EVAL_SIMPLE_MATH),
-    ("07r-interruptible-riva-nim.py", EVAL_SIMPLE_MATH),
-    ("07s-interruptible-google-audio-in.py", EVAL_SIMPLE_MATH),
-    ("07t-interruptible-fish.py", EVAL_SIMPLE_MATH),
-    ("07v-interruptible-neuphonic.py", EVAL_SIMPLE_MATH),
-    ("07v-interruptible-neuphonic-http.py", EVAL_SIMPLE_MATH),
-    ("07w-interruptible-fal.py", EVAL_SIMPLE_MATH),
-    ("07y-interruptible-minimax.py", EVAL_SIMPLE_MATH),
-    ("07z-interruptible-sarvam.py", EVAL_SIMPLE_MATH),
-    ("07ae-interruptible-hume.py", EVAL_SIMPLE_MATH),
+    ("07-interruptible.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07-interruptible-cartesia-http.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07a-interruptible-speechmatics.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07aa-interruptible-soniox.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07ab-interruptible-inworld-http.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07ac-interruptible-asyncai.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07ac-interruptible-asyncai-http.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07b-interruptible-langchain.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07c-interruptible-deepgram.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07c-interruptible-deepgram-flux.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07d-interruptible-elevenlabs.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    (
+        "07d-interruptible-elevenlabs-http.py",
+        PROMPT_SIMPLE_MATH,
+        EVAL_SIMPLE_MATH,
+        BOT_SPEAKS_FIRST,
+    ),
+    ("07f-interruptible-azure.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07g-interruptible-openai.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07h-interruptible-openpipe.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07j-interruptible-gladia.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07k-interruptible-lmnt.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07l-interruptible-groq.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07m-interruptible-aws.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07m-interruptible-aws-strands.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("07n-interruptible-gemini.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07n-interruptible-google.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07o-interruptible-assemblyai.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07q-interruptible-rime.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07q-interruptible-rime-http.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07r-interruptible-riva-nim.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    (
+        "07s-interruptible-google-audio-in.py",
+        PROMPT_SIMPLE_MATH,
+        EVAL_SIMPLE_MATH,
+        BOT_SPEAKS_FIRST,
+    ),
+    ("07t-interruptible-fish.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07v-interruptible-neuphonic.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07v-interruptible-neuphonic-http.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07w-interruptible-fal.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07y-interruptible-minimax.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07z-interruptible-sarvam.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    ("07ae-interruptible-hume.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
    # Needs a local XTTS docker instance running.
-    # ("07i-interruptible-xtts.py", EVAL_SIMPLE_MATH),
+    # ("07i-interruptible-xtts.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
    # Needs a Krisp license.
-    # ("07p-interruptible-krisp.py", EVAL_SIMPLE_MATH),
+    # ("07p-interruptible-krisp.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
    # Needs GPU resources.
-    # ("07u-interruptible-ultravox.py", EVAL_SIMPLE_MATH),
+    # ("07u-interruptible-ultravox.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
 ]

 TESTS_12 = [
-    ("12-describe-image-openai.py", EVAL_VISION_IMAGE(eval_speaks_first=True)),
-    ("12a-describe-image-anthropic.py", EVAL_VISION_IMAGE(eval_speaks_first=True)),
-    ("12b-describe-image-aws.py", EVAL_VISION_IMAGE(eval_speaks_first=True)),
-    ("12c-describe-image-gemini-flash.py", EVAL_VISION_IMAGE(eval_speaks_first=True)),
-    ("12d-describe-image-moondream.py", EVAL_VISION_IMAGE()),
+    ("12-describe-video.py", PROMPT_VISION, EVAL_VISION, BOT_SPEAKS_FIRST),
+    ("12a-describe-video-gemini-flash.py", PROMPT_VISION, EVAL_VISION, BOT_SPEAKS_FIRST),
+    ("12b-describe-video-gpt-4o.py", PROMPT_VISION, EVAL_VISION, BOT_SPEAKS_FIRST),
+    ("12c-describe-video-anthropic.py", PROMPT_VISION, EVAL_VISION, BOT_SPEAKS_FIRST),
 ]

 TESTS_14 = [
-    ("14-function-calling.py", EVAL_WEATHER),
-    ("14a-function-calling-anthropic.py", EVAL_WEATHER),
-    ("14e-function-calling-google.py", EVAL_WEATHER),
-    ("14f-function-calling-groq.py", EVAL_WEATHER),
-    ("14g-function-calling-grok.py", EVAL_WEATHER),
-    ("14h-function-calling-azure.py", EVAL_WEATHER),
-    ("14i-function-calling-fireworks.py", EVAL_WEATHER),
-    ("14j-function-calling-nim.py", EVAL_WEATHER),
-    ("14k-function-calling-cerebras.py", EVAL_WEATHER),
-    ("14m-function-calling-openrouter.py", EVAL_WEATHER),
-    ("14n-function-calling-perplexity.py", EVAL_WEATHER),
-    ("14p-function-calling-gemini-vertex-ai.py", EVAL_WEATHER),
-    ("14q-function-calling-qwen.py", EVAL_WEATHER),
-    ("14r-function-calling-aws.py", EVAL_WEATHER),
-    ("14v-function-calling-openai.py", EVAL_WEATHER),
-    ("14w-function-calling-mistral.py", EVAL_WEATHER),
-    ("14x-function-calling-openpipe.py", EVAL_WEATHER),
-    # Video
-    ("14d-function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
-    ("14d-function-calling-aws-video.py", EVAL_VISION_CAMERA),
-    ("14d-function-calling-gemini-flash-video.py", EVAL_VISION_CAMERA),
-    ("14d-function-calling-moondream-video.py", EVAL_VISION_CAMERA),
-    ("14d-function-calling-openai-video.py", EVAL_VISION_CAMERA),
+    ("14-function-calling.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14a-function-calling-anthropic.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14b-function-calling-anthropic-video.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14d-function-calling-video.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14e-function-calling-google.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14f-function-calling-groq.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14g-function-calling-grok.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14h-function-calling-azure.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14i-function-calling-fireworks.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14j-function-calling-nim.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14k-function-calling-cerebras.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14m-function-calling-openrouter.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14n-function-calling-perplexity.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14p-function-calling-gemini-vertex-ai.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14q-function-calling-qwen.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14r-function-calling-aws.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14v-function-calling-openai.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("14w-function-calling-mistral.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
    # Currently not working.
-    # ("14c-function-calling-together.py", EVAL_WEATHER),
-    # ("14l-function-calling-deepseek.py", EVAL_WEATHER),
-    # ("14o-function-calling-gemini-openai-format.py", EVAL_WEATHER),
+    # ("14c-function-calling-together.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    # ("14l-function-calling-deepseek.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    # ("14o-function-calling-gemini-openai-format.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
 ]

 TESTS_15 = [
-    ("15a-switch-languages.py", EVAL_SWITCH_LANGUAGE),
+    ("15a-switch-languages.py", PROMPT_SWITCH_LANGUAGE, EVAL_SWITCH_LANGUAGE, BOT_SPEAKS_FIRST),
 ]

 TESTS_19 = [
-    ("19-openai-realtime.py", EVAL_WEATHER),
-    ("19-openai-realtime-beta.py", EVAL_WEATHER),
+    ("19-openai-realtime.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("19-openai-realtime-beta.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
    # OpenAI Realtime not released on Azure yet
-    # ("19a-azure-realtime.py", EVAL_WEATHER),
-    ("19a-azure-realtime-beta.py", EVAL_WEATHER),
-    ("19b-openai-realtime-text.py", EVAL_WEATHER),
-    ("19b-openai-realtime-beta-text.py", EVAL_WEATHER),
+    # ("19a-azure-realtime.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("19a-azure-realtime-beta.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("19b-openai-realtime-text.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
+    ("19b-openai-realtime-beta-text.py", PROMPT_WEATHER, EVAL_WEATHER, BOT_SPEAKS_FIRST),
 ]

 TESTS_21 = [
-    ("21a-tavus-video-service.py", EVAL_SIMPLE_MATH),
+    ("21a-tavus-video-service.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
 ]

 TESTS_26 = [
-    ("26-gemini-live.py", EVAL_SIMPLE_MATH),
-    ("26a-gemini-live-transcription.py", EVAL_SIMPLE_MATH),
-    ("26b-gemini-live-function-calling.py", EVAL_WEATHER),
-    ("26c-gemini-live-video.py", EVAL_SIMPLE_MATH),
-    ("26e-gemini-live-google-search.py", EVAL_ONLINE_SEARCH),
-    ("26h-gemini-live-vertex-function-calling.py", EVAL_WEATHER),
+    ("26-gemini-multimodal-live.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    (
+        "26a-gemini-live-transcription.py",
+        PROMPT_SIMPLE_MATH,
+        EVAL_SIMPLE_MATH,
+        BOT_SPEAKS_FIRST,
+    ),
+    (
+        "26b-gemini-live-function-calling.py",
+        PROMPT_WEATHER,
+        EVAL_WEATHER,
+        BOT_SPEAKS_FIRST,
+    ),
+    ("26c-gemini-live-video.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    (
+        "26e-gemini-multimodal-google-search.py",
+        PROMPT_ONLINE_SEARCH,
+        EVAL_ONLINE_SEARCH,
+        BOT_SPEAKS_FIRST,
+    ),
    # Currently not working.
-    # ("26d-gemini-live-text.py", EVAL_SIMPLE_MATH),
+    # ("26d-gemini-live-text.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
+    (
+        "26h-gemini-live-vertex-function-calling.py",
+        PROMPT_WEATHER,
+        EVAL_WEATHER,
+        BOT_SPEAKS_FIRST,
+    ),
 ]

 TESTS_27 = [
-    ("27-simli-layer.py", EVAL_SIMPLE_MATH),
+    ("27-simli-layer.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
 ]

 TESTS_40 = [
-    ("40-aws-nova-sonic.py", EVAL_SIMPLE_MATH),
+    ("40-aws-nova-sonic.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
 ]

 TESTS_43 = [
-    ("43a-heygen-video-service.py", EVAL_SIMPLE_MATH),
+    ("43a-heygen-video-service.py", PROMPT_SIMPLE_MATH, EVAL_SIMPLE_MATH, BOT_SPEAKS_FIRST),
 ]

 TESTS_44 = [
-    ("44-voicemail-detection.py", EVAL_VOICEMAIL),
-    ("44-voicemail-detection.py", EVAL_CONVERSATION),
+    ("44-voicemail-detection.py", PROMPT_VOICEMAIL, EVAL_VOICEMAIL, USER_SPEAKS_FIRST),
+    ("44-voicemail-detection.py", PROMPT_CONVERSATION, EVAL_CONVERSATION, USER_SPEAKS_FIRST),
 ]

 TESTS = [
@@ -239,9 +243,9 @@ async def main(args: argparse.Namespace):

    # Parse test config: (test, prompt, eval, user_speaks_first)
    for test_config in TESTS:
-        test, eval_config = test_config
+        test, prompt, eval, user_speaks_first = test_config

-        await runner.run_eval(test, eval_config)
+        await runner.run_eval(test, prompt, eval, user_speaks_first)

    runner.print_results()
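
Note on the hunk above: a minimal sketch of the four-field test config that the loop unpacks. The `EvalCase` name, the boolean values of `BOT_SPEAKS_FIRST` / `USER_SPEAKS_FIRST`, and the sample prompt and eval strings are assumptions made for illustration; the patch itself uses plain tuples and defines its constants elsewhere. A `NamedTuple` unpacks the same way and avoids a local variable named `eval`, which shadows the Python builtin.

```python
from typing import NamedTuple


class EvalCase(NamedTuple):
    """Hypothetical typed stand-in for the (test, prompt, eval, user_speaks_first) tuples."""

    test: str
    prompt: str
    eval_config: str
    user_speaks_first: bool


# Assumed values: the real constants are defined elsewhere in the patched file.
BOT_SPEAKS_FIRST = False
USER_SPEAKS_FIRST = True

CASES = [
    EvalCase("14-function-calling.py", "weather prompt", "weather eval", BOT_SPEAKS_FIRST),
    EvalCase("44-voicemail-detection.py", "voicemail prompt", "voicemail eval", USER_SPEAKS_FIRST),
]


def describe(case: EvalCase) -> str:
    # A NamedTuple still unpacks positionally, like the plain tuples in TESTS.
    test, prompt, eval_config, user_speaks_first = case
    speaker = "user" if user_speaks_first else "bot"
    return f"{test}: {speaker} speaks first"
```

Existing call sites that index or unpack the tuples positionally would keep working with this shape.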

--- a/src/pipecat/adapters/schemas/tools_schema.py
+++ b/src/pipecat/adapters/schemas/tools_schema.py
@@ -22,12 +22,9 @@ class AdapterType(Enum):

    Parameters:
        GEMINI: Google Gemini adapter - currently the only service supporting custom tools.
-        SHIM: Backward compatibility shim for creating ToolsSchemas from lists of tools in
-              any format, used by LLMContext.from_openai_context.
    """

    GEMINI = "gemini"  # that is the only service where we are able to add custom tools for now
-    SHIM = "shim"  # for use as backward compatibility shim for creating ToolsSchemas from list of tools in any format


 class ToolsSchema:
--- a/src/pipecat/adapters/services/anthropic_adapter.py
+++ b/src/pipecat/adapters/services/anthropic_adapter.py
@@ -110,7 +110,7 @@ class AnthropicLLMAdapter(BaseLLMAdapter[AnthropicLLMInvocationParams]):
        system = NOT_GIVEN
        messages = []

-        # First, map messages using self._from_universal_context_message(m)
+        # first, map messages using self._from_universal_context_message(m)
        try:
            messages = [self._from_universal_context_message(m) for m in universal_context_messages]
        except Exception as e:
@@ -245,25 +245,13 @@ class AnthropicLLMAdapter(BaseLLMAdapter[AnthropicLLMInvocationParams]):
                    item["text"] = "(empty)"
                # handle image_url -> image conversion
                if item["type"] == "image_url":
-                    if item["image_url"]["url"].startswith("data:"):
-                        item["type"] = "image"
-                        item["source"] = {
-                            "type": "base64",
-                            "media_type": "image/jpeg",
-                            "data": item["image_url"]["url"].split(",")[1],
-                        }
-                        del item["image_url"]
-                    elif item["image_url"]["url"].startswith("http"):
-                        item["type"] = "image"
-                        item["source"] = {
-                            "type": "url",
-                            "url": item["image_url"]["url"],
-                        }
-                        del item["image_url"]
-                    else:
-                        url = item["image_url"]["url"]
-                        logger.warning(f"Unsupported 'image_url': {url}")
-
+                    item["type"] = "image"
+                    item["source"] = {
+                        "type": "base64",
+                        "media_type": "image/jpeg",
+                        "data": item["image_url"]["url"].split(",")[1],
+                    }
+                    del item["image_url"]
            # In the case where there's a single image in the list (like what
            # would result from a UserImageRawFrame), ensure that the image
            # comes before text, as recommended by Anthropic docs
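
Note on the hunk above: the simplified branch assumes every `image_url` is a base64 `data:` URL. For a URL with no comma, `split(",")[1]` raises `IndexError`. Below is a minimal sketch of a guarded conversion that mirrors the branches this hunk removes. The helper name `image_url_to_anthropic` is hypothetical, and the output shapes are copied from the diff rather than checked against Anthropic's documentation.

```python
from typing import Any, Dict, Optional


def image_url_to_anthropic(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert an OpenAI-style image_url item to an Anthropic-style image item.

    Hypothetical helper; returns None for URLs it cannot handle.
    """
    url = item["image_url"]["url"]
    if url.startswith("data:"):
        # data:<media type>;base64,<payload> -> keep only the payload.
        _, _, payload = url.partition(",")
        if not payload:
            return None
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": "image/jpeg", "data": payload},
        }
    if url.startswith("http"):
        return {"type": "image", "source": {"type": "url", "url": url}}
    return None
```

Returning `None` leaves it to the caller to decide whether to log a warning or drop the item, as the removed `else` branch did.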
--- a/src/pipecat/adapters/services/aws_nova_sonic_adapter.py
+++ b/src/pipecat/adapters/services/aws_nova_sonic_adapter.py
@@ -6,47 +6,13 @@

 """AWS Nova Sonic LLM adapter for Pipecat."""

-import copy
 import json
-from dataclasses import dataclass
-from enum import Enum
-from typing import Any, Dict, List, Optional, TypedDict
-
-from loguru import logger
+from typing import Any, Dict, List, TypedDict

 from pipecat.adapters.base_llm_adapter import BaseLLMAdapter
 from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
-from pipecat.processors.aggregators.llm_context import LLMContext, LLMContextMessage
-
-
-class Role(Enum):
-    """Roles supported in AWS Nova Sonic conversations.
-
-    Parameters:
-        SYSTEM: System-level messages (not used in conversation history).
-        USER: Messages sent by the user.
-        ASSISTANT: Messages sent by the assistant.
-        TOOL: Messages sent by tools (not used in conversation history).
-    """
-
-    SYSTEM = "SYSTEM"
-    USER = "USER"
-    ASSISTANT = "ASSISTANT"
-    TOOL = "TOOL"
-
-
-@dataclass
-class AWSNovaSonicConversationHistoryMessage:
-    """A single message in AWS Nova Sonic conversation history.
-
-    Parameters:
-        role: The role of the message sender (USER or ASSISTANT only).
-        text: The text content of the message.
-    """
-
-    role: Role  # only USER and ASSISTANT
-    text: str
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.processors.aggregators.llm_context import LLMContext


 class AWSNovaSonicLLMInvocationParams(TypedDict):
@@ -55,9 +21,7 @@ class AWSNovaSonicLLMInvocationParams(TypedDict):
    This is a placeholder until support for universal LLMContext machinery is added for AWS Nova Sonic.
    """

-    system_instruction: Optional[str]
-    messages: List[AWSNovaSonicConversationHistoryMessage]
-    tools: List[Dict[str, Any]]
+    pass


 class AWSNovaSonicLLMAdapter(BaseLLMAdapter[AWSNovaSonicLLMInvocationParams]):
@@ -70,7 +34,7 @@ class AWSNovaSonicLLMAdapter(BaseLLMAdapter[AWSNovaSonicLLMInvocationParams]):
    @property
    def id_for_llm_specific_messages(self) -> str:
        """Get the identifier used in LLMSpecificMessage instances for AWS Nova Sonic."""
-        return "aws-nova-sonic"
+        raise NotImplementedError("Universal LLMContext is not yet supported for AWS Nova Sonic.")

    def get_llm_invocation_params(self, context: LLMContext) -> AWSNovaSonicLLMInvocationParams:
        """Get AWS Nova Sonic-specific LLM invocation parameters from a universal LLM context.
@@ -83,13 +47,7 @@ class AWSNovaSonicLLMAdapter(BaseLLMAdapter[AWSNovaSonicLLMInvocationParams]):
        Returns:
            Dictionary of parameters for invoking AWS Nova Sonic's LLM API.
        """
-        messages = self._from_universal_context_messages(self.get_messages(context))
-        return {
-            "system_instruction": messages.system_instruction,
-            "messages": messages.messages,
-            # NOTE: LLMContext's tools are guaranteed to be a ToolsSchema (or NOT_GIVEN)
-            "tools": self.from_standard_tools(context.tools) or [],
-        }
+        raise NotImplementedError("Universal LLMContext is not yet supported for AWS Nova Sonic.")

    def get_messages_for_logging(self, context) -> List[Dict[str, Any]]:
        """Get messages from a universal LLM context in a format ready for logging about AWS Nova Sonic.
@@ -104,75 +62,7 @@ class AWSNovaSonicLLMAdapter(BaseLLMAdapter[AWSNovaSonicLLMInvocationParams]):
        Returns:
            List of messages in a format ready for logging about AWS Nova Sonic.
        """
-        return self._from_universal_context_messages(self.get_messages(context)).messages
-
-    @dataclass
-    class ConvertedMessages:
-        """Container for Google-formatted messages converted from universal context."""
-
-        messages: List[AWSNovaSonicConversationHistoryMessage]
-        system_instruction: Optional[str] = None
-
-    def _from_universal_context_messages(
-        self, universal_context_messages: List[LLMContextMessage]
-    ) -> ConvertedMessages:
-        system_instruction = None
-        messages = []
-
-        # Bail if there are no messages
-        if not universal_context_messages:
-            return self.ConvertedMessages()
-
-        universal_context_messages = copy.deepcopy(universal_context_messages)
-
-        # If we have a "system" message as our first message, let's pull that out into "instruction"
-        if universal_context_messages[0].get("role") == "system":
-            system = universal_context_messages.pop(0)
-            content = system.get("content")
-            if isinstance(content, str):
-                system_instruction = content
-            elif isinstance(content, list):
-                system_instruction = content[0].get("text")
-            if system_instruction:
-                self._system_instruction = system_instruction
-
-        # Process remaining messages to fill out conversation history.
-        # Nova Sonic supports "user" and "assistant" messages in history.
-        for universal_context_message in universal_context_messages:
-            message = self._from_universal_context_message(universal_context_message)
-            if message:
-                messages.append(message)
-
-        return self.ConvertedMessages(messages=messages, system_instruction=system_instruction)
-
-    def _from_universal_context_message(self, message) -> AWSNovaSonicConversationHistoryMessage:
-        """Convert standard message format to Nova Sonic format.
-
-        Args:
-            message: Standard message dictionary to convert.
-
-        Returns:
-            Nova Sonic conversation history message, or None if not convertible.
-        """
-        role = message.get("role")
-        if message.get("role") == "user" or message.get("role") == "assistant":
-            content = message.get("content")
-            if isinstance(message.get("content"), list):
-                content = ""
-                for c in message.get("content"):
-                    if c.get("type") == "text":
-                        content += " " + c.get("text")
-                    else:
-                        logger.error(
-                            f"Unhandled content type in context message: {c.get('type')} - {message}"
-                        )
-            # There won't be content if this is an assistant tool call entry.
-            # We're ignoring those since they can't be loaded into AWS Nova Sonic conversation
-            # history
-            if content:
-                return AWSNovaSonicConversationHistoryMessage(role=Role[role.upper()], text=content)
-        # NOTE: we're ignoring messages with role "tool" since they can't be loaded into AWS Nova
-        # Sonic conversation history
+        raise NotImplementedError("Universal LLMContext is not yet supported for AWS Nova Sonic.")

    @staticmethod
    def _to_aws_nova_sonic_function_format(function: FunctionSchema) -> Dict[str, Any]:
@@ -210,18 +100,4 @@ class AWSNovaSonicLLMAdapter(BaseLLMAdapter[AWSNovaSonicLLMInvocationParams]):
            List of dictionaries in AWS Nova Sonic function format.
        """
        functions_schema = tools_schema.standard_tools
-        standard_tools = [
-            self._to_aws_nova_sonic_function_format(func) for func in functions_schema
-        ]
-
-        # For backward compatibility, AWS Nova Sonic can still be used with
-        # tools in dict format, even though it always uses `LLMContext` under
-        # the hood (via `LLMContext.from_openai_context()`).
-        # To support this behavior, we use "shimmed" custom tools here.
-        # (We maintain this backward compatibility because users aren't
-        # *knowingly* opting into the new `LLMContext`.)
-        shimmed_tools = []
-        if tools_schema.custom_tools:
-            shimmed_tools = tools_schema.custom_tools.get(AdapterType.SHIM, [])
-
-        return standard_tools + shimmed_tools
+        return [self._to_aws_nova_sonic_function_format(func) for func in functions_schema]
--- a/src/pipecat/adapters/services/bedrock_adapter.py
+++ b/src/pipecat/adapters/services/bedrock_adapter.py
@@ -107,7 +107,7 @@ class AWSBedrockLLMAdapter(BaseLLMAdapter[AWSBedrockLLMInvocationParams]):
        system = None
        messages = []

-        # First, map messages using self._from_universal_context_message(m)
+        # first, map messages using self._from_universal_context_message(m)
        try:
            messages = [self._from_universal_context_message(m) for m in universal_context_messages]
        except Exception as e:
@@ -256,22 +256,15 @@ class AWSBedrockLLMAdapter(BaseLLMAdapter[AWSBedrockLLMInvocationParams]):
                    new_content.append({"text": text_content})
                # handle image_url -> image conversion
                if item["type"] == "image_url":
-                    if item["image_url"]["url"].startswith("data:"):
-                        new_item = {
-                            "image": {
-                                "format": "jpeg",
-                                "source": {
-                                    "bytes": base64.b64decode(
-                                        item["image_url"]["url"].split(",")[1]
-                                    )
-                                },
-                            }
+                    new_item = {
+                        "image": {
+                            "format": "jpeg",
+                            "source": {
+                                "bytes": base64.b64decode(item["image_url"]["url"].split(",")[1])
+                            },
                        }
-                        new_content.append(new_item)
-                    else:
-                        url = item["image_url"]["url"]
-                        logger.warning(f"Unsupported 'image_url': {url}")
-
+                    }
+                    new_content.append(new_item)
            # In the case where there's a single image in the list (like what
            # would result from a UserImageRawFrame), ensure that the image
            # comes before text
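
Note on the hunk above: the Bedrock branch decodes the payload of a data URL and always labels it `jpeg`. Below is a minimal sketch that derives the format from the media type in the URL header instead. The helper name `data_url_to_bedrock_image` is hypothetical, and mapping the media subtype directly onto the `format` field is an assumption; the dictionary shape is copied from the diff.

```python
import base64
from typing import Any, Dict


def data_url_to_bedrock_image(url: str) -> Dict[str, Any]:
    """Hypothetical helper: build a Bedrock-style image item from a base64 data URL."""
    if not url.startswith("data:"):
        raise ValueError("expected a base64 data: URL")
    header, sep, payload = url.partition(",")
    if not sep:
        raise ValueError("data URL has no payload")
    # header looks like "data:image/png;base64"; fall back to jpeg as the diff does.
    media_type = header[len("data:"):].split(";")[0]
    image_format = media_type.split("/")[1] if "/" in media_type else "jpeg"
    return {"image": {"format": image_format, "source": {"bytes": base64.b64decode(payload)}}}
```

Raising `ValueError` here is one option; logging a warning and skipping the item, as the removed code did, is another.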
--- a/src/pipecat/adapters/services/gemini_adapter.py
+++ b/src/pipecat/adapters/services/gemini_adapter.py
@@ -8,8 +8,8 @@

 import base64
 import json
-from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional, Tuple, TypedDict
+from dataclasses import dataclass
+from typing import Any, Dict, List, Optional, TypedDict

 from loguru import logger
 from openai import NotGiven
@@ -24,7 +24,13 @@ from pipecat.processors.aggregators.llm_context import (
 )

 try:
-    from google.genai.types import Blob, Content, FileData, FunctionCall, FunctionResponse, Part
+    from google.genai.types import (
+        Blob,
+        Content,
+        FunctionCall,
+        FunctionResponse,
+        Part,
+    )
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
@@ -127,28 +133,6 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
        messages: List[Content]
        system_instruction: Optional[str] = None

-    @dataclass
-    class MessageConversionResult:
-        """Result of converting a single universal context message to Google format.
-
-        Either content (a Google Content object) or a system instruction string
-        is guaranteed to be set.
-
-        Also returns a tool call ID to name mapping for any tool calls
-        discovered in the message.
-        """
-
-        content: Optional[Content] = None
-        system_instruction: Optional[str] = None
-        tool_call_id_to_name_mapping: Dict[str, str] = field(default_factory=dict)
-
-    @dataclass
-    class MessageConversionParams:
-        """Parameters for converting a single universal context message to Google format."""
-
-        already_have_system_instruction: bool
-        tool_call_id_to_name_mapping: Dict[str, str]
-
    def _from_universal_context_messages(
        self, universal_context_messages: List[LLMContextMessage]
    ) -> ConvertedMessages:
@@ -172,26 +156,24 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
        """
        system_instruction = None
        messages = []
-        tool_call_id_to_name_mapping = {}

        # Process each message, preserving Google-formatted messages and converting others
        for message in universal_context_messages:
-            result = self._from_universal_context_message(
-                message,
-                params=self.MessageConversionParams(
-                    already_have_system_instruction=bool(system_instruction),
-                    tool_call_id_to_name_mapping=tool_call_id_to_name_mapping,
-                ),
-            )
-            # Each result is either a Content or a system instruction
-            if result.content:
-                messages.append(result.content)
-            elif result.system_instruction:
-                system_instruction = result.system_instruction
+            if isinstance(message, LLMSpecificMessage):
+                # Assume that LLMSpecificMessage wraps a message in Google format
+                messages.append(message.message)
+                continue

-            # Merge tool call ID to name mapping
-            if result.tool_call_id_to_name_mapping:
-                tool_call_id_to_name_mapping.update(result.tool_call_id_to_name_mapping)
+            # Convert standard format to Google format
+            converted = self._from_standard_message(
+                message, already_have_system_instruction=bool(system_instruction)
+            )
+            if isinstance(converted, Content):
+                # Regular (non-system) message
+                messages.append(converted)
+            else:
+                # System instruction
+                system_instruction = converted

        # Check if we only have function-related messages (no regular text)
        has_regular_messages = any(
@@ -211,16 +193,9 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):

        return self.ConvertedMessages(messages=messages, system_instruction=system_instruction)

-    def _from_universal_context_message(
-        self, message: LLMContextMessage, *, params: MessageConversionParams
-    ) -> MessageConversionResult:
-        if isinstance(message, LLMSpecificMessage):
-            return self.MessageConversionResult(content=message.message)
-        return self._from_standard_message(message, params=params)
-
    def _from_standard_message(
-        self, message: LLMStandardMessage, *, params: MessageConversionParams
-    ) -> MessageConversionResult:
+        self, message: LLMStandardMessage, already_have_system_instruction: bool
+    ) -> Content | str:
        """Convert standard universal context message to Google Content object.

        Handles conversion of text, images, and function calls to Google's
@@ -230,11 +205,10 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
        Args:
            message: Message in standard universal context format.
            already_have_system_instruction: Whether we already have a system instruction
-            params: Parameters for conversion.

        Returns:
-            MessageConversionResult containing either a Content object or a
-            system instruction string.
+            Content object with role and parts, or a plain string for system
+            messages.

        Examples:
            Standard text message::
@@ -268,49 +242,38 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
            Converts to Google Content with::

                Content(
-                    role="user",
+                    role="model",
                    parts=[Part(function_call=FunctionCall(name="search", args={"query": "test"}))]
                )
        """
        role = message["role"]
        content = message.get("content", [])
-
        if role == "system":
-            if params.already_have_system_instruction:
+            if already_have_system_instruction:
                role = "user"  # Convert system message to user role if we already have a system instruction
            else:
-                system_instruction: str = None
+                # System instructions are returned as plain text
                if isinstance(content, str):
-                    system_instruction = content
+                    return content
                elif isinstance(content, list):
                    # If content is a list, we assume it's a list of text parts, per the standard
-                    system_instruction = " ".join(
-                        part["text"] for part in content if part.get("type") == "text"
-                    )
-                if system_instruction:
-                    return self.MessageConversionResult(system_instruction=system_instruction)
+                    return " ".join(part["text"] for part in content if part.get("type") == "text")
        elif role == "assistant":
            role = "model"

        parts = []
-        tool_call_id_to_name_mapping = {}
-
        if message.get("tool_calls"):
            for tc in message["tool_calls"]:
-                id = tc["id"]
-                name = tc["function"]["name"]
-                tool_call_id_to_name_mapping[id] = name
                parts.append(
                    Part(
                        function_call=FunctionCall(
-                            id=id,
-                            name=name,
+                            name=tc["function"]["name"],
                            args=json.loads(tc["function"]["arguments"]),
                        )
                    )
                )
        elif role == "tool":
-            role = "user"
+            role = "model"
            try:
                response = json.loads(message["content"])
                if isinstance(response, dict):
@@ -321,18 +284,10 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
                # Response might not be JSON-deserializable.
                # This occurs with a UserImageFrame, for example, where we get a plain "COMPLETED" string.
                response_dict = {"value": message["content"]}
-
-            # Get function name from mapping using tool_call_id, or fallback
-            tool_call_id = message.get("tool_call_id")
-            function_name = "tool_call_result"  # Default fallback
-            if tool_call_id and tool_call_id in params.tool_call_id_to_name_mapping:
-                function_name = params.tool_call_id_to_name_mapping[tool_call_id]
-
            parts.append(
                Part(
                    function_response=FunctionResponse(
-                        id=tool_call_id,
-                        name=function_name,
+                        name="tool_call_result",  # seems to work to hard-code the same name every time
                        response=response_dict,
                    )
                )
@@ -343,7 +298,7 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
            for c in content:
                if c["type"] == "text":
                    parts.append(Part(text=c["text"]))
-                elif c["type"] == "image_url" and c["image_url"]["url"].startswith("data:"):
+                elif c["type"] == "image_url":
                    parts.append(
                        Part(
                            inline_data=Blob(
@@ -352,25 +307,9 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
                            )
                        )
                    )
-                elif c["type"] == "image_url":
-                    url = c["image_url"]["url"]
-                    logger.warning(f"Unsupported 'image_url': {url}")
                elif c["type"] == "input_audio":
                    input_audio = c["input_audio"]
                    audio_bytes = base64.b64decode(input_audio["data"])
                    parts.append(Part(inline_data=Blob(mime_type="audio/wav", data=audio_bytes)))
-                elif c["type"] == "file_data":
-                    file_data = c["file_data"]
-                    parts.append(
-                        Part(
-                            file_data=FileData(
-                                mime_type=file_data.get("mime_type"),
-                                file_uri=file_data.get("file_uri"),
-                            )
-                        )
-                    )

-        return self.MessageConversionResult(
-            content=Content(role=role, parts=parts),
-            tool_call_id_to_name_mapping=tool_call_id_to_name_mapping,
-        )
+        return Content(role=role, parts=parts)
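
Note on the hunk above: it replaces the per-call name lookup with a hard-coded `tool_call_result` name. Below is a minimal sketch of the lookup being removed, using plain dicts in place of the `google.genai` types so that it runs without that dependency. The function name `resolve_function_names` is hypothetical; the message shape follows the standard tool-call format used in the diff.

```python
from typing import Any, Dict, List


def resolve_function_names(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the function name to report for each "tool" message, in order."""
    id_to_name: Dict[str, str] = {}
    names: List[str] = []
    for message in messages:
        # Assistant messages declare tool calls; remember each id -> name pair.
        for tc in message.get("tool_calls") or []:
            id_to_name[tc["id"]] = tc["function"]["name"]
        if message.get("role") == "tool":
            # Fall back to the fixed name when the id was never declared.
            names.append(id_to_name.get(message.get("tool_call_id"), "tool_call_result"))
    return names
```

The mapping only works when the assistant message that declares a call appears before the matching tool message, which is the order the removed code also relied on.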
--- a/src/pipecat/adapters/services/open_ai_realtime_adapter.py
+++ b/src/pipecat/adapters/services/open_ai_realtime_adapter.py
@@ -6,18 +6,12 @@

 """OpenAI Realtime LLM adapter for Pipecat."""

-import copy
-import json
-from dataclasses import dataclass
-from typing import Any, Dict, List, Optional, TypedDict
-
-from loguru import logger
+from typing import Any, Dict, List, TypedDict

 from pipecat.adapters.base_llm_adapter import BaseLLMAdapter
 from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
-from pipecat.processors.aggregators.llm_context import LLMContext, LLMContextMessage
-from pipecat.services.openai.realtime import events
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.processors.aggregators.llm_context import LLMContext


 class OpenAIRealtimeLLMInvocationParams(TypedDict):
@@ -26,9 +20,7 @@ class OpenAIRealtimeLLMInvocationParams(TypedDict):
    This is a placeholder until support for universal LLMContext machinery is added for OpenAI Realtime.
    """

-    system_instruction: Optional[str]
-    messages: List[events.ConversationItem]
-    tools: List[Dict[str, Any]]
+    pass


 class OpenAIRealtimeLLMAdapter(BaseLLMAdapter):
@@ -41,7 +33,7 @@ class OpenAIRealtimeLLMAdapter(BaseLLMAdapter):
    @property
    def id_for_llm_specific_messages(self) -> str:
        """Get the identifier used in LLMSpecificMessage instances for OpenAI Realtime."""
-        return "openai-realtime"
+        raise NotImplementedError("Universal LLMContext is not yet supported for OpenAI Realtime.")

    def get_llm_invocation_params(self, context: LLMContext) -> OpenAIRealtimeLLMInvocationParams:
        """Get OpenAI Realtime-specific LLM invocation parameters from a universal LLM context.
@@ -54,13 +46,7 @@ class OpenAIRealtimeLLMAdapter(BaseLLMAdapter):
        Returns:
            Dictionary of parameters for invoking OpenAI Realtime's API.
        """
-        messages = self._from_universal_context_messages(self.get_messages(context))
-        return {
-            "system_instruction": messages.system_instruction,
-            "messages": messages.messages,
-            # NOTE: LLMContext's tools are guaranteed to be a ToolsSchema (or NOT_GIVEN)
-            "tools": self.from_standard_tools(context.tools) or [],
-        }
+        raise NotImplementedError("Universal LLMContext is not yet supported for OpenAI Realtime.")

    def get_messages_for_logging(self, context) -> List[Dict[str, Any]]:
        """Get messages from a universal LLM context in a format ready for logging about OpenAI Realtime.
@@ -75,124 +61,7 @@ class OpenAIRealtimeLLMAdapter(BaseLLMAdapter):
        Returns:
            List of messages in a format ready for logging about OpenAI Realtime.
        """
-        # NOTE: this is the same as in OpenAIAdapter, as that's what it was
-        # prior to a refactor. Worth noting that for OpenAI Realtime
-        # specifically, not everything handled here is necessarily supported
-        # (or supported yet).
-        msgs = []
-        for message in self.get_messages(context):
-            msg = copy.deepcopy(message)
-            if "content" in msg:
-                if isinstance(msg["content"], list):
-                    for item in msg["content"]:
-                        if item["type"] == "image_url":
-                            if item["image_url"]["url"].startswith("data:image/"):
-                                item["image_url"]["url"] = "data:image/..."
-                        if item["type"] == "input_audio":
-                            item["input_audio"]["data"] = "..."
-            if "mime_type" in msg and msg["mime_type"].startswith("image/"):
-                msg["data"] = "..."
-            msgs.append(msg)
-        return msgs
-
-    @dataclass
-    class ConvertedMessages:
-        """Container for OpenAI-formatted messages converted from universal context."""
-
-        messages: List[events.ConversationItem]
-        system_instruction: Optional[str] = None
-
-    def _from_universal_context_messages(
-        self, universal_context_messages: List[LLMContextMessage]
-    ) -> ConvertedMessages:
-        # We can't load a long conversation history into the openai realtime api yet. (The API/model
-        # forgets that it can do audio, if you do a series of `conversation.item.create` calls.) So
-        # our general strategy until this is fixed is just to put everything into a first "user"
-        # message as a single input.
-
-        if not universal_context_messages:
-            return self.ConvertedMessages(messages=[])
-
-        messages = copy.deepcopy(universal_context_messages)
-        system_instruction = None
-
-        # If we have a "system" message as our first message, let's pull that out into session
-        # "instructions"
-        if messages[0].get("role") == "system":
-            system = messages.pop(0)
-            content = system.get("content")
-            if isinstance(content, str):
-                system_instruction = content
-            elif isinstance(content, list):
-                system_instruction = content[0].get("text")
-            if not messages:
-                return self.ConvertedMessages(messages=[], system_instruction=system_instruction)
-
-        # If we have just a single "user" item, we can just send it normally
-        if len(messages) == 1 and messages[0].get("role") == "user":
-            return self.ConvertedMessages(
-                messages=[self._from_universal_context_message(messages[0])],
-                system_instruction=system_instruction,
-            )
-
-        # Otherwise, let's pack everything into a single "user" message with a bit of
-        # explanation for the LLM
-        intro_text = """
-        This is a previously saved conversation. Please treat this conversation history as a
-        starting point for the current conversation."""
-
-        trailing_text = """
-        This is the end of the previously saved conversation. Please continue the conversation
-        from here. If the last message is a user instruction or question, act on that instruction
-        or answer the question. If the last message is an assistant response, simple say that you
-        are ready to continue the conversation."""
-
-        return self.ConvertedMessages(
-            messages=[
-                {
-                    "role": "user",
-                    "type": "message",
-                    "content": [
-                        {
-                            "type": "input_text",
-                            "text": "\n\n".join(
-                                [intro_text, json.dumps(messages, indent=2), trailing_text]
-                            ),
-                        }
-                    ],
-                }
-            ],
-            system_instruction=system_instruction,
-        )
-
-    def _from_universal_context_message(
-        self, message: LLMContextMessage
-    ) -> events.ConversationItem:
-        if message.get("role") == "user":
-            content = message.get("content")
-            if isinstance(message.get("content"), list):
-                content = ""
-                for c in message.get("content"):
-                    if c.get("type") == "text":
-                        content += " " + c.get("text")
-                    else:
-                        logger.error(
-                            f"Unhandled content type in context message: {c.get('type')} - {message}"
-                        )
-            return events.ConversationItem(
-                role="user",
-                type="message",
-                content=[events.ItemContent(type="input_text", text=content)],
-            )
-        if message.get("role") == "assistant" and message.get("tool_calls"):
-            tc = message.get("tool_calls")[0]
-            return events.ConversationItem(
-                type="function_call",
-                call_id=tc["id"],
-                name=tc["function"]["name"],
-                arguments=tc["function"]["arguments"],
-            )
-        logger.error(f"Unhandled message type in _from_universal_context_message: {message}")
+        raise NotImplementedError("Universal LLMContext is not yet supported for OpenAI Realtime.")

    @staticmethod
    def _to_openai_realtime_function_format(function: FunctionSchema) -> Dict[str, Any]:
@@ -225,18 +94,4 @@ class OpenAIRealtimeLLMAdapter(BaseLLMAdapter):
            List of function definitions in OpenAI Realtime format.
        """
        functions_schema = tools_schema.standard_tools
-        standard_tools = [
-            self._to_openai_realtime_function_format(func) for func in functions_schema
-        ]
-
-        # For backward compatibility, OpenAI Realtime can still be used with
-        # tools in dict format, even though it always uses `LLMContext` under
-        # the hood (via `LLMContext.from_openai_context()`).
-        # To support this behavior, we use "shimmed" custom tools here.
-        # (We maintain this backward compatibility because users aren't
-        # *knowingly* opting into the new `LLMContext`.)
-        shimmed_tools = []
-        if tools_schema.custom_tools:
-            shimmed_tools = tools_schema.custom_tools.get(AdapterType.SHIM, [])
-
-        return standard_tools + shimmed_tools
+        return [self._to_openai_realtime_function_format(func) for func in functions_schema]
--- a/src/pipecat/frames/frames.py
+++ b/src/pipecat/frames/frames.py
@@ -773,15 +773,9 @@ class CancelFrame(SystemFrame):

    Indicates that a pipeline needs to stop right away without
    processing remaining queued frames.
-
-    Parameters:
-        reason: Optional reason for pushing a cancel frame.
    """

-    reason: Optional[str] = None
-
-    def __str__(self):
-        return f"{self.name}(reason: {self.reason})"
+    pass


@dataclass
@@ -1207,23 +1201,26 @@ class TransportMessageUrgentFrame(OutputTransportMessageUrgentFrame):
 class UserImageRequestFrame(SystemFrame):
    """Frame requesting an image from a specific user.

-    A frame to request an image from the given user. The request might come with
-    a text that can be later used to describe the requested image.
+    A frame to request an image from the given user. The frame might be
+    generated by a function call, in which case the corresponding fields will be
+    properly set.

    Parameters:
        user_id: Identifier of the user to request image from.
-        text: An optional text associated to the image request.
-        append_to_context: Whether the requested image should be appended to the LLM context.
+        context: Optional context for the image request.
+        function_name: Name of function that generated this request (if any).
+        tool_call_id: Tool call ID if generated by function call.
        video_source: Specific video source to capture from.
    """

    user_id: str
-    text: Optional[str] = None
-    append_to_context: Optional[bool] = None
+    context: Optional[Any] = None
+    function_name: Optional[str] = None
+    tool_call_id: Optional[str] = None
    video_source: Optional[str] = None

    def __str__(self):
-        return f"{self.name}(user: {self.user_id}, text: {self.text}, append_to_context: {self.append_to_context}, {self.video_source})"
+        return f"{self.name}(user: {self.user_id}, video_source: {self.video_source}, function: {self.function_name}, request: {self.tool_call_id})"


@dataclass
@@ -1297,17 +1294,15 @@ class UserImageRawFrame(InputImageRawFrame):

    Parameters:
        user_id: Identifier of the user who provided this image.
-        text: An optional text associated to this image.
-        append_to_context: Whether the requested image should be appended to the LLM context.
+        request: The original image request frame if this is a response.
    """

    user_id: str = ""
-    text: Optional[str] = None
-    append_to_context: Optional[bool] = None
+    request: Optional[UserImageRequestFrame] = None

    def __str__(self):
        pts = format_pts(self.pts)
-        return f"{self.name}(pts: {pts}, user: {self.user_id}, source: {self.transport_source}, size: {self.size}, format: {self.format}, text: {self.text}, append_to_context: {self.append_to_context})"
+        return f"{self.name}(pts: {pts}, user: {self.user_id}, source: {self.transport_source}, size: {self.size}, format: {self.format}, request: {self.request})"


@dataclass
@@ -1372,15 +1367,9 @@ class EndTaskFrame(TaskFrame):
    This is used to notify the pipeline task that the pipeline should be
    closed nicely (flushing all the queued frames) by pushing an EndFrame
    downstream. This frame should be pushed upstream.
-
-    Parameters:
-        reason: Optional reason for pushing an end frame.
    """

-    reason: Optional[str] = None
-
-    def __str__(self):
-        return f"{self.name}(reason: {self.reason})"
+    pass


@dataclass
@@ -1390,15 +1379,9 @@ class CancelTaskFrame(TaskFrame):
    This is used to notify the pipeline task that the pipeline should be
    stopped immediately by pushing a CancelFrame downstream. This frame
    should be pushed upstream.
-
-    Parameters:
-        reason: Optional reason for pushing a cancel frame.
    """

-    reason: Optional[str] = None
-
-    def __str__(self):
-        return f"{self.name}(reason: {self.reason})"
+    pass


@dataclass
@@ -1469,15 +1452,9 @@ class EndFrame(ControlFrame):
    sending frames to its output channel(s) and close all its threads. Note,
    that this is a control frame, which means it will be received in the order it
    was sent.
-
-    Parameters:
-        reason: Optional reason for pushing an end frame.
    """

-    reason: Optional[str] = None
-
-    def __str__(self):
-        return f"{self.name}(reason: {self.reason})"
+    pass


@dataclass
--- a/src/pipecat/pipeline/llm_switcher.py
+++ b/src/pipecat/pipeline/llm_switcher.py
@@ -14,41 +14,20 @@ from pipecat.services.llm_service import LLMService


 class LLMSwitcher(ServiceSwitcher[StrategyType]):
-    """A pipeline that switches between different LLMs at runtime.
-
-    Example::
-
-        llm_switcher = LLMSwitcher(
-            llms=[openai_llm, anthropic_llm],
-            strategy_type=ServiceSwitcherStrategyManual
-        )
-    """
+    """A pipeline that switches between different LLMs at runtime."""

    def __init__(self, llms: List[LLMService], strategy_type: Type[StrategyType]):
-        """Initialize the service switcher with a list of LLMs and a switching strategy.
-
-        Args:
-            llms: List of LLM services to switch between.
-            strategy_type: The strategy class to use for switching between LLMs.
-        """
+        """Initialize the service switcher with a list of LLMs and a switching strategy."""
        super().__init__(llms, strategy_type)

    @property
    def llms(self) -> List[LLMService]:
-        """Get the list of LLMs managed by this switcher.
-
-        Returns:
-            List of LLM services managed by this switcher.
-        """
+        """Get the list of LLMs managed by this switcher."""
        return self.services

    @property
    def active_llm(self) -> Optional[LLMService]:
-        """Get the currently active LLM.
-
-        Returns:
-            The currently active LLM service, or None if no LLM is active.
-        """
+        """Get the currently active LLM, if any."""
        return self.strategy.active_service

    async def run_inference(self, context: LLMContext) -> Optional[str]:
--- a/src/pipecat/pipeline/pipeline.py
+++ b/src/pipecat/pipeline/pipeline.py
@@ -15,7 +15,6 @@ from typing import Callable, Coroutine, List, Optional

 from pipecat.frames.frames import Frame
 from pipecat.pipeline.base_pipeline import BasePipeline
-from pipecat.pipeline.pipeline_node import PipelineNode
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor, FrameProcessorSetup


@@ -118,7 +117,8 @@ class Pipeline(BasePipeline):
        self._source = source or PipelineSource(self.push_frame, name=f"{self}::Source")
        self._sink = sink or PipelineSink(self.push_frame, name=f"{self}::Sink")
        self._processors: List[FrameProcessor] = [self._source] + processors + [self._sink]
-        self._nodes = self._link_processors()
+
+        self._link_processors()

    #
    # Frame processor
@@ -196,22 +196,17 @@ class Pipeline(BasePipeline):

    async def _setup_processors(self, setup: FrameProcessorSetup):
        """Set up all processors in the pipeline."""
-        for n in self._nodes:
-            await n.setup(setup)
+        for p in self._processors:
+            await p.setup(setup)

    async def _cleanup_processors(self):
        """Clean up all processors in the pipeline."""
-        for n in self._nodes:
-            await n.cleanup()
+        for p in self._processors:
+            await p.cleanup()

-    def _link_processors(self) -> List[PipelineNode]:
-        """Link all processors in sequence."""
-        nodes = []
-        prev_node = PipelineNode(self._processors[0])
-        nodes.append(prev_node)
+    def _link_processors(self):
+        """Link all processors in sequence and set their parent."""
+        prev = self._processors[0]
        for curr in self._processors[1:]:
-            curr_node = PipelineNode(curr)
-            nodes.append(curr_node)
-            prev_node.link(curr_node)
-            prev_node = curr_node
-        return nodes
+            prev.link(curr)
+            prev = curr
--- a/src/pipecat/pipeline/pipeline_node.py
+++ b/src/pipecat/pipeline/pipeline_node.py
@@ -1,140 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-"""This module defines pipeline nodes.
-
-A pipeline node (`PipelineNode`) wraps a frame processor (`FrameProcessor`) and
-can link to previous and next nodes in the pipeline. Pipeline nodes allow
-linking frame processors together with the benefit that stateless frame
-processors can be re-used in different pipelines, since what is linked is the
-actual pipeline node, not the frame processor itself.
-
-"""
-
-import asyncio
-from typing import Optional
-
-from loguru import logger
-
-from pipecat.observers.base_observer import FramePushed
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor, FrameProcessorSetup
-from pipecat.utils.base_object import BaseObject
-
-
-class PipelineNode(BaseObject):
-    """A node in a pipeline that hosts a frame processor.
-
-    A `PipelineNode` wraps a single `FrameProcessor` and is responsible for
-    connecting it to previous and next nodes in a pipeline. It pushes frames
-    emitted by its processor to the appropriate neighbor based on frame
-    direction (UPSTREAM or DOWNSTREAM).
-    """
-
-    def __init__(self, processor: FrameProcessor):
-        """Initialize the pipeline node with a given FrameProcessor.
-
-        Args:
-            processor: The FrameProcessor instance that this node will host.
-        """
-        super().__init__()
-        self._processor = processor
-
-        self._prev: Optional["PipelineNode"] = None
-        self._next: Optional["PipelineNode"] = None
-
-        self.__push_task: Optional[asyncio.Task] = None
-
-    @property
-    def processor(self) -> FrameProcessor:
-        """Returns the frame processor of this pipeline node."""
-        return self._processor
-
-    @property
-    def next(self) -> Optional["PipelineNode"]:
-        """Get the next pipeline node.
-
-        Returns:
-            The next node, or None if there's no next node.
-        """
-        return self._next
-
-    @property
-    def previous(self) -> Optional["PipelineNode"]:
-        """Get the previous pipeline node.
-
-        Returns:
-            The previous node, or None if there's no previous node.
-        """
-        return self._prev
-
-    async def setup(self, setup: FrameProcessorSetup):
-        """Set up this pipeline node.
-
-        This sets up the wrapped frame processor with required components.
-
-        Args:
-            setup: Configuration object containing setup parameters.
-        """
-        await self.processor.setup(setup)
-        self._clock = setup.clock
-        self._task_manager = setup.task_manager
-        self._observer = setup.observer
-
-        self.__create_push_task()
-
-    async def cleanup(self):
-        """Clean up this pipeline node."""
-        await super().cleanup()
-        await self.processor.cleanup()
-        if self.__push_task:
-            await self.__push_task
-            self.__push_task = None
-
-    def link(self, node: "PipelineNode"):
-        """Link this node to the next node in the pipeline.
-
-        Args:
-            node: The node to link to.
-        """
-        self._next = node
-        node._prev = self
-        logger.debug(f"Linking {self.processor} -> {node.processor}")
-
-    def __create_push_task(self):
-        """Create the frame push task."""
-        if not self.__push_task:
-            self.__push_task = self._task_manager.create_task(
-                self.__push_task_handler(), f"{self.processor}::_push_task"
-            )
-
-    async def __push_task_handler(self):
-        """Push task handler.
-
-        Receive frames from the wrapped frame processor and push them to the
-        next or previous node depending on the direction.
-        """
-        async for frame, direction in self.processor:
-            destination = None
-            if direction == FrameDirection.DOWNSTREAM and self.next:
-                logger.trace(f"Pushing {frame} from {self.processor} to {self.next.processor}")
-                destination = self.next.processor
-            elif direction == FrameDirection.UPSTREAM and self.previous:
-                logger.trace(f"Pushing {frame} upstream from {self} to {self._prev}")
-                destination = self.previous.processor
-
-            if destination:
-                await destination.queue_frame(frame, direction)
-
-            if self._observer and destination:
-                timestamp = self._clock.get_time() if self._clock else 0
-                data = FramePushed(
-                    source=self.processor,
-                    destination=destination,
-                    frame=frame,
-                    direction=direction,
-                    timestamp=timestamp,
-                )
-                await self._observer.on_push_frame(data)
--- a/src/pipecat/pipeline/runner.py
+++ b/src/pipecat/pipeline/runner.py
@@ -70,15 +70,11 @@ class PipelineRunner(BaseObject):
        """
        logger.debug(f"Runner {self} started running {task}")
        self._tasks[task.name] = task
-
-        # PipelineTask handles asyncio.CancelledError to shutdown the pipeline
-        # properly and re-raises it in case there's more cleanup to do.
+        params = PipelineTaskParams(loop=self._loop)
        try:
-            params = PipelineTaskParams(loop=self._loop)
            await task.run(params)
        except asyncio.CancelledError:
-            pass
-
+            await self._cancel()
        del self._tasks[task.name]

        # Cleanup base object.
--- a/src/pipecat/pipeline/service_switcher.py
+++ b/src/pipecat/pipeline/service_switcher.py
@@ -21,22 +21,10 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


 class ServiceSwitcherStrategy:
-    """Base class for service switching strategies.
-
-    Note:
-        Strategy classes are instantiated internally by ServiceSwitcher.
-        Developers should pass the strategy class (not an instance) to ServiceSwitcher.
-    """
+    """Base class for service switching strategies."""

    def __init__(self, services: List[FrameProcessor]):
-        """Initialize the service switcher strategy with a list of services.
-
-        Note:
-            This is called internally by ServiceSwitcher. Do not instantiate directly.
-
-        Args:
-            services: List of frame processors to switch between.
-        """
+        """Initialize the service switcher strategy with a list of services."""
        self.services = services
        self.active_service: Optional[FrameProcessor] = None

@@ -58,24 +46,10 @@ class ServiceSwitcherStrategyManual(ServiceSwitcherStrategy):

    This strategy allows the user to manually select which service is active.
    The initial active service is the first one in the list.
-
-    Example::
-
-        stt_switcher = ServiceSwitcher(
-            services=[stt_1, stt_2],
-            strategy_type=ServiceSwitcherStrategyManual
-        )
    """

    def __init__(self, services: List[FrameProcessor]):
-        """Initialize the manual service switcher strategy with a list of services.
-
-        Note:
-            This is called internally by ServiceSwitcher. Do not instantiate directly.
-
-        Args:
-            services: List of frame processors to switch between.
-        """
+        """Initialize the manual service switcher strategy with a list of services."""
        super().__init__(services)
        self.active_service = services[0] if services else None

@@ -111,12 +85,7 @@ class ServiceSwitcher(ParallelPipeline, Generic[StrategyType]):
    """A pipeline that switches between different services at runtime."""

    def __init__(self, services: List[FrameProcessor], strategy_type: Type[StrategyType]):
-        """Initialize the service switcher with a list of services and a switching strategy.
-
-        Args:
-            services: List of frame processors to switch between.
-            strategy_type: The strategy class to use for switching between services.
-        """
+        """Initialize the service switcher with a list of services and a switching strategy."""
        strategy = strategy_type(services)
        super().__init__(*self._make_pipeline_definitions(services, strategy))
        self.services = services
@@ -131,20 +100,14 @@ class ServiceSwitcher(ParallelPipeline, Generic[StrategyType]):
            active_service: FrameProcessor,
            direction: FrameDirection,
        ):
-            """Initialize the service switcher filter with a strategy and direction.
-
-            Args:
-                wrapped_service: The service that this filter wraps.
-                active_service: The currently active service.
-                direction: The direction of frame flow to filter.
-            """
-            self._wrapped_service = wrapped_service
-            self._active_service = active_service
+            """Initialize the service switcher filter with a strategy and direction."""

            async def filter(_: Frame) -> bool:
                return self._wrapped_service == self._active_service

-            super().__init__(filter, direction, filter_system_frames=True)
+            super().__init__(filter, direction)
+            self._wrapped_service = wrapped_service
+            self._active_service = active_service

        async def process_frame(self, frame, direction):
            """Process a frame through the filter, handling special internal filter-updating frames."""
--- a/src/pipecat/pipeline/task.py
+++ b/src/pipecat/pipeline/task.py
@@ -12,6 +12,7 @@ including heartbeats, idle detection, and observer integration.
 """

 import asyncio
+import time
 from typing import Any, AsyncIterable, Dict, Iterable, List, Optional, Tuple, Type

 from loguru import logger
@@ -38,7 +39,7 @@ from pipecat.frames.frames import (
    UserSpeakingFrame,
 )
 from pipecat.metrics.metrics import ProcessingMetricsData, TTFBMetricsData
-from pipecat.observers.base_observer import BaseObserver, FramePushed
+from pipecat.observers.base_observer import BaseObserver
 from pipecat.observers.turn_tracking_observer import TurnTrackingObserver
 from pipecat.pipeline.base_task import BasePipelineTask, PipelineTaskParams
 from pipecat.pipeline.pipeline import Pipeline, PipelineSink, PipelineSource
@@ -56,43 +57,6 @@ IDLE_TIMEOUT_SECS = 300
 CANCEL_TIMEOUT_SECS = 20.0


-class IdleFrameObserver(BaseObserver):
-    """Idle timeout observer.
-
-    This observer waits for specific frames being generated in the pipeline. If
-    the frames are generated the given asyncio event is set. If the event is not
-    set it means the pipeline is probably idle.
-
-    """
-
-    def __init__(self, *, idle_event: asyncio.Event, idle_timeout_frames: Tuple[Type[Frame], ...]):
-        """Initialize the observer.
-
-        Args:
-            idle_event: The event to set if the idle timeout frames are being pushed.
-            idle_timeout_frames: A tuple with the frames that should set the event when received
-        """
-        super().__init__()
-        self._idle_event = idle_event
-        self._idle_timeout_frames = idle_timeout_frames
-        self._processed_frames = set()
-
-    async def on_push_frame(self, data: FramePushed):
-        """Callback executed when a frame is pushed in the pipeline.
-
-        Args:
-            data: The frame push event data.
-        """
-        # Skip already processed frames
-        if data.frame.id in self._processed_frames:
-            return
-
-        self._processed_frames.add(data.frame.id)
-
-        if isinstance(data.frame, StartFrame) or isinstance(data.frame, self._idle_timeout_frames):
-            self._idle_event.set()
-
-
 class PipelineParams(BaseModel):
    """Configuration parameters for pipeline execution.

@@ -251,6 +215,7 @@ class PipelineTask(BasePipelineTask):
        self._conversation_id = conversation_id
        self._enable_tracing = enable_tracing and is_tracing_available()
        self._enable_turn_tracking = enable_turn_tracking
+        self._idle_timeout_frames = idle_timeout_frames
        self._idle_timeout_secs = idle_timeout_secs
        if self._params.observers:
            import warnings
@@ -285,24 +250,16 @@ class PipelineTask(BasePipelineTask):
        # This queue is the queue used to push frames to the pipeline.
        self._push_queue = asyncio.Queue()
        self._process_push_task: Optional[asyncio.Task] = None
-
        # This is the heartbeat queue. When a heartbeat frame is received in the
        # down queue we add it to the heartbeat queue for processing.
        self._heartbeat_queue = asyncio.Queue()
        self._heartbeat_push_task: Optional[asyncio.Task] = None
        self._heartbeat_monitor_task: Optional[asyncio.Task] = None
-
-        # This is the idle event. When selected frames are pushed from any
-        # processor we consider the pipeline is not idle. We use an observer
-        # which will be listening any part of the pipeline.
-        self._idle_event = asyncio.Event()
+        # This is the idle queue. When frames are received downstream they are
+        # put in the queue. If no frame is received the pipeline is considered
+        # idle.
+        self._idle_queue = asyncio.Queue()
        self._idle_monitor_task: Optional[asyncio.Task] = None
-        if self._idle_timeout_secs:
-            idle_frame_observer = IdleFrameObserver(
-                idle_event=self._idle_event,
-                idle_timeout_frames=idle_timeout_frames,
-            )
-            observers.append(idle_frame_observer)

        # This event is used to indicate the StartFrame has been received at the
        # end of the pipeline.
@@ -312,9 +269,6 @@ class PipelineTask(BasePipelineTask):
        # StopFrame) has been received at the end of the pipeline.
        self._pipeline_end_event = asyncio.Event()

-        # This event is set when the pipeline truly finishes.
-        self._pipeline_finished_event = asyncio.Event()
-
        # This is the final pipeline. It is composed of a source processor,
        # followed by the user pipeline, and ending with a sink processor. The
        # source allows us to receive and react to upstream frames, and the sink
@@ -446,14 +400,14 @@ class PipelineTask(BasePipelineTask):
        logger.debug(f"Task {self} scheduled to stop when done")
        await self.queue_frame(EndFrame())

-    async def cancel(self, *, reason: Optional[str] = None):
-        """Request the running pipeline to cancel.
+    async def cancel(self):
+        """Immediately stop the running pipeline.

-        Args:
-            reason: Optional reason to indicate why the pipeline is being cancelled.
+        Cancels all running tasks and stops frame processing without
+        waiting for completion.
        """
        if not self._finished:
-            await self._cancel(reason=reason)
+            await self._cancel()

    async def run(self, params: PipelineTaskParams):
        """Start and manage the pipeline execution until completion or cancellation.
@@ -463,38 +417,51 @@ class PipelineTask(BasePipelineTask):
        """
        if self.has_finished():
            return
-
-        # Setup processors.
-        await self._setup(params)
-
-        # Create all main tasks and wait for the main push task. This is the
-        # task that pushes frames to the very beginning of our pipeline (i.e. to
-        # our controlled source processor).
-        await self._create_tasks()
-
+        cleanup_pipeline = True
        try:
-            # Wait for pipeline to finish.
-            await self._wait_for_pipeline_finished()
+            # Setup processors.
+            await self._setup(params)
+
+            # Create all main tasks and wait for the main push task. This is the
+            # task that pushes frames to the very beginning of our pipeline (our
+            # controlled source processor).
+            push_task = await self._create_tasks()
+            await push_task
+
+            # We have already cleaned up the pipeline inside the task.
+            cleanup_pipeline = False
+
+            # Pipeline has finished nicely.
+            self._finished = True
        except asyncio.CancelledError:
-            logger.debug(f"Pipeline task {self} got cancelled from outside...")
-            # We have been cancelled from outside, let's just cancel everything.
-            await self._cancel()
-            # Wait again for pipeline to finish. This time we have really
-            # cancelled, so it should really finish.
-            await self._wait_for_pipeline_finished()
-            # Re-raise in case there's more cleanup to do.
+            # Raise exception back to the pipeline runner so it can cancel this
+            # task properly.
            raise
        finally:
            # We can reach this point for different reasons:
            #
-            # 1. The pipeline task has finished (try case).
-            # 2. By an asyncio task cancellation (except case).
-            logger.debug(f"Pipeline task {self} is finishing...")
-            await self._cancel_tasks()
-            if self._check_dangling_tasks:
-                self._print_dangling_tasks()
-            self._finished = True
-            logger.debug(f"Pipeline task {self} has finished")
+            # 1. The task has finished properly (e.g. `EndFrame`).
+            # 2. By calling `PipelineTask.cancel()`.
+            # 3. By asyncio task cancellation.
+            #
+            # Case (1) will execute the code below without issues because
+            # `self._finished` is true.
+            #
+            # Case (2) will execute the code below without issues because
+            # `self._cancelled` is true.
+            #
+            # Case (3) will re-raise the exception above (because we are
+            # cancelling the asyncio task). The `PipelineRunner` will then
+            # catch it and call `PipelineTask.cancel()`, which turns this
+            # into case (2).
+            if self._finished or self._cancelled:
+                logger.debug(f"Pipeline task {self} is finishing cleanup...")
+                await self._cancel_tasks()
+                await self._cleanup(cleanup_pipeline)
+                if self._check_dangling_tasks:
+                    self._print_dangling_tasks()
+                self._finished = True
+                logger.debug(f"Pipeline task {self} has finished")

    async def queue_frame(self, frame: Frame):
        """Queue a single frame to be pushed down the pipeline.
@@ -517,16 +484,24 @@ class PipelineTask(BasePipelineTask):
            for frame in frames:
                await self.queue_frame(frame)

-    async def _cancel(self, *, reason: Optional[str] = None):
-        """Internal cancellation logic for the pipeline task.
-
-        Args:
-            reason: Optional reason to indicate why the pipeline is being cancelled.
-        """
+    async def _cancel(self):
+        """Internal cancellation logic for the pipeline task."""
        if not self._cancelled:
            logger.debug(f"Cancelling pipeline task {self}")
            self._cancelled = True
-            await self.queue_frame(CancelFrame(reason=reason))
+            cancel_frame = CancelFrame()
+            # Make sure everything is cleaned up downstream. This is sent
+            # out-of-band from the main streaming task which is what we want since
+            # we want to cancel right away.
+            await self._pipeline.queue_frame(cancel_frame)
+            # Wait for CancelFrame to make it through the pipeline.
+            await self._wait_for_pipeline_end(cancel_frame)
+            # Only cancel the push task; we don't want to process any other
+            # frames after cancelling. Everything else will be cancelled in
+            # run().
+            if self._process_push_task:
+                await self._task_manager.cancel_task(self._process_push_task)
+                self._process_push_task = None

    async def _create_tasks(self):
        """Create and start all pipeline processing tasks."""
@@ -581,7 +556,7 @@ class PipelineTask(BasePipelineTask):

    async def _maybe_cancel_idle_task(self):
        """Cancel idle monitoring task if it is running."""
-        if self._idle_monitor_task:
+        if self._idle_timeout_secs and self._idle_monitor_task:
            await self._task_manager.cancel_task(self._idle_monitor_task)
            self._idle_monitor_task = None

@@ -628,17 +603,6 @@ class PipelineTask(BasePipelineTask):

        self._pipeline_end_event.clear()

-        # We are really done.
-        self._pipeline_finished_event.set()
-
-    async def _wait_for_pipeline_finished(self):
-        await self._pipeline_finished_event.wait()
-        self._pipeline_finished_event.clear()
-        # Make sure we wait for the main task to complete.
-        if self._process_push_task:
-            await self._process_push_task
-            self._process_push_task = None
-
    async def _setup(self, params: PipelineTaskParams):
        """Set up the pipeline task and all processors."""
        mgr_params = TaskManagerParams(loop=params.loop)
@@ -724,11 +688,11 @@ class PipelineTask(BasePipelineTask):
        if isinstance(frame, EndTaskFrame):
            # Tell the task we should end nicely.
            logger.debug(f"{self}: received end task frame {frame}")
-            await self.queue_frame(EndFrame(reason=frame.reason))
+            await self.queue_frame(EndFrame())
        elif isinstance(frame, CancelTaskFrame):
            # Tell the task we should end right away.
            logger.debug(f"{self}: received cancel task frame {frame}")
-            await self.queue_frame(CancelFrame(reason=frame.reason))
+            await self.queue_frame(CancelFrame())
        elif isinstance(frame, StopTaskFrame):
            # Tell the task we should stop nicely.
            logger.debug(f"{self}: received stop task frame {frame}")
@@ -757,6 +721,10 @@ class PipelineTask(BasePipelineTask):
        processors have handled the EndFrame and therefore we can exit the task
        cleanly.
        """
+        # Queue received frame to the idle queue so we can monitor idle
+        # pipelines.
+        await self._idle_queue.put(frame)
+
        if isinstance(frame, self._reached_downstream_types):
            await self._call_event_handler("on_frame_reached_downstream", frame)

@@ -819,10 +787,33 @@ class PipelineTask(BasePipelineTask):
        Note: Heartbeats are excluded from idle detection.
        """
        running = True
+        last_frame_time = 0
+
        while running:
            try:
-                await asyncio.wait_for(self._idle_event.wait(), timeout=self._idle_timeout_secs)
-                self._idle_event.clear()
+                frame = await asyncio.wait_for(
+                    self._idle_queue.get(), timeout=self._idle_timeout_secs
+                )
+
+                if isinstance(frame, StartFrame) or isinstance(frame, self._idle_timeout_frames):
+                    # If we find a StartFrame or one of the frames that prevents an
+                    # idle timeout, we update the time.
+                    last_frame_time = time.time()
+                else:
+                    # If we find any other frame we check if the pipeline is
+                    # idle by checking the last time we received one of the
+                    # valid frames.
+                    diff_time = time.time() - last_frame_time
+                    if diff_time >= self._idle_timeout_secs:
+                        running = await self._idle_timeout_detected()
+                        # Reset `last_frame_time` so we don't trigger another
+                        # immediate idle timeout if we are not cancelling. For
+                        # example, we might want to force the bot to say goodbye
+                        # and then clean up nicely with an `EndFrame`.
+                        last_frame_time = time.time()
+
+                self._idle_queue.task_done()
+
            except asyncio.TimeoutError:
                running = await self._idle_timeout_detected()

@@ -834,7 +825,7 @@ class PipelineTask(BasePipelineTask):
        """
        # If we are cancelling, just exit the task.
        if self._cancelled:
-            return False
+            return True

        logger.warning("Idle timeout detected.")
        await self._call_event_handler("on_idle_timeout")
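Note on the queue-based idle monitor in the hunks above: the loop can be sketched in isolation as follows. This is a minimal sketch, not the Pipecat implementation. `ActivityFrame`, `OtherFrame`, `idle_monitor` and `on_idle` are hypothetical names introduced for illustration, and the sketch uses `time.monotonic()` with the timer started at launch, whereas the patch uses `time.time()` with an initial value of 0.

```python
import asyncio
import time


class ActivityFrame:
    """Hypothetical frame type that counts as pipeline activity."""


class OtherFrame:
    """Hypothetical frame type that does not reset the idle timer."""


async def idle_monitor(queue: asyncio.Queue, timeout_secs: float, on_idle) -> None:
    """Call `on_idle()` when no ActivityFrame was seen for `timeout_secs`."""
    running = True
    last_activity = time.monotonic()
    while running:
        try:
            frame = await asyncio.wait_for(queue.get(), timeout=timeout_secs)
            if isinstance(frame, ActivityFrame):
                # Activity frames reset the idle timer.
                last_activity = time.monotonic()
            elif time.monotonic() - last_activity >= timeout_secs:
                # Frames are flowing, but none of them count as activity.
                running = await on_idle()
                last_activity = time.monotonic()
            queue.task_done()
        except asyncio.TimeoutError:
            # No frames at all within the timeout window.
            running = await on_idle()


async def demo() -> int:
    timeouts = 0

    async def on_idle() -> bool:
        nonlocal timeouts
        timeouts += 1
        return False  # Stop monitoring after the first idle timeout.

    queue: asyncio.Queue = asyncio.Queue()
    monitor = asyncio.create_task(idle_monitor(queue, 0.05, on_idle))
    await queue.put(ActivityFrame())
    # Keep the queue busy with frames that do not count as activity.
    for _ in range(4):
        await asyncio.sleep(0.02)
        await queue.put(OtherFrame())
    await monitor
    return timeouts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of queueing every frame, rather than setting an event, is that a steady stream of non-activity frames no longer hides an idle pipeline: the elapsed-time check runs on each frame as well as on the `wait_for` timeout.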
--- a/src/pipecat/pipeline/task_observer.py
+++ b/src/pipecat/pipeline/task_observer.py
@@ -129,7 +129,7 @@ class TaskObserver(BaseObserver):
        for proxy in self._proxies:
            await proxy.cleanup()

-    async def on_process_frame(self, data: FrameProcessed):
+    async def on_process_frame(self, data: FramePushed):
        """Queue frame data for all managed observers.

        Args:
@@ -189,7 +189,7 @@ class TaskObserver(BaseObserver):
            if isinstance(data, FramePushed):
                if on_push_frame_deprecated:
                    await observer.on_push_frame(
-                        data.source, data.destination, data.frame, data.direction, data.timestamp
+                        data.src, data.dst, data.frame, data.direction, data.timestamp
                    )
                else:
                    await observer.on_push_frame(data)
--- a/src/pipecat/processors/aggregators/llm_context.py
+++ b/src/pipecat/processors/aggregators/llm_context.py
@@ -16,9 +16,8 @@ service-specific adapter.

 import base64
 import io
-import wave
 from dataclasses import dataclass
-from typing import TYPE_CHECKING, Any, List, Optional, TypeAlias, Union
+from typing import Any, List, Optional, TypeAlias, Union

 from loguru import logger
 from openai._types import NOT_GIVEN as OPEN_AI_NOT_GIVEN
@@ -29,12 +28,9 @@ from openai.types.chat import (
 )
 from PIL import Image

-from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.frames.frames import AudioRawFrame

-if TYPE_CHECKING:
-    from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-
 # "Re-export" types from OpenAI that we're using as universal context types.
 # NOTE: if universal message types need to someday diverge from OpenAI's, we
 # should consider managing our own definitions. But we should do so carefully,
@@ -69,34 +65,6 @@ class LLMContext:
    and content formatting.
    """

-    @staticmethod
-    def from_openai_context(openai_context: "OpenAILLMContext") -> "LLMContext":
-        """Create a universal LLM context from an OpenAI-specific context.
-
-        NOTE: this should only be used internally, for facilitating migration
-        from OpenAILLMContext to LLMContext. New user code should use
-        LLMContext directly.
-
-        Args:
-            openai_context: The OpenAI LLM context to convert.
-
-        Returns:
-            New LLMContext instance with converted messages and settings.
-        """
-        # Convert tools to ToolsSchema if needed.
-        # If the tools are already a ToolsSchema, this is a no-op.
-        # Otherwise, we wrap them in a shim ToolsSchema.
-        converted_tools = openai_context.tools
-        if isinstance(converted_tools, list):
-            converted_tools = ToolsSchema(
-                standard_tools=[], custom_tools={AdapterType.SHIM: converted_tools}
-            )
-        return LLMContext(
-            messages=openai_context.get_messages(),
-            tools=converted_tools,
-            tool_choice=openai_context.tool_choice,
-        )
-
    def __init__(
        self,
        messages: Optional[List[LLMContextMessage]] = None,
@@ -114,129 +82,6 @@ class LLMContext:
        self._tools: ToolsSchema | NotGiven = LLMContext._normalize_and_validate_tools(tools)
        self._tool_choice: LLMContextToolChoice | NotGiven = tool_choice

-    @staticmethod
-    def create_image_url_message(
-        *,
-        role: str = "user",
-        url: str,
-        text: Optional[str] = None,
-    ) -> LLMContextMessage:
-        """Create a context message containing an image URL.
-
-        Args:
-            role: The role of this message (defaults to "user").
-            url: The URL of the image.
-            text: Optional text to include with the image.
-        """
-        content = []
-        if text:
-            content.append({"type": "text", "text": text})
-
-        content.append({"type": "image_url", "image_url": {"url": url}})
-
-        return {"role": role, "content": content}
-
-    @staticmethod
-    def create_image_message(
-        *,
-        role: str = "user",
-        format: str,
-        size: tuple[int, int],
-        image: bytes,
-        text: Optional[str] = None,
-    ) -> LLMContextMessage:
-        """Create a context message containing an image.
-
-        Args:
-            role: The role of this message (defaults to "user").
-            format: Image format (e.g., 'RGB', 'RGBA').
-            size: Image dimensions as (width, height) tuple.
-            image: Raw image bytes.
-            text: Optional text to include with the image.
-        """
-        buffer = io.BytesIO()
-        Image.frombytes(format, size, image).save(buffer, format="JPEG")
-        encoded_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
-        url = f"data:image/jpeg;base64,{encoded_image}"
-
-        return LLMContext.create_image_url_message(role=role, url=url, text=text)
-
-    @staticmethod
-    def create_audio_message(
-        *, role: str = "user", audio_frames: list[AudioRawFrame], text: str = "Audio follows"
-    ) -> LLMContextMessage:
-        """Create a context message containing audio.
-
-        Args:
-            role: The role of this message (defaults to "user").
-            audio_frames: List of audio frame objects to include.
-            text: Optional text to include with the audio.
-        """
-        sample_rate = audio_frames[0].sample_rate
-        num_channels = audio_frames[0].num_channels
-
-        content = []
-        content.append({"type": "text", "text": text})
-        data = b"".join(frame.audio for frame in audio_frames)
-
-        with io.BytesIO() as buffer:
-            with wave.open(buffer, "wb") as wf:
-                wf.setsampwidth(2)
-                wf.setnchannels(num_channels)
-                wf.setframerate(sample_rate)
-                wf.writeframes(data)
-
-        encoded_audio = base64.b64encode(buffer.getvalue()).decode("utf-8")
-
-        content.append(
-            {
-                "type": "input_audio",
-                "input_audio": {"data": encoded_audio, "format": "wav"},
-            }
-        )
-
-        return {"role": role, "content": content}
-
-    @property
-    def messages(self) -> List[LLMContextMessage]:
-        """Get the current messages list.
-
-        NOTE: This is equivalent to calling `get_messages()` with no filter. If
-        you want to filter out LLM-specific messages that don't pertain to your
-        LLM, use `get_messages()` directly.
-
-        Returns:
-            List of conversation messages.
-        """
-        return self.get_messages()
-
-    def get_messages_for_persistent_storage(self) -> List[LLMContextMessage]:
-        """Get messages suitable for persistent storage.
-
-        NOTE: the only reason this method exists is because we're "silently"
-        switching from OpenAILLMContext to LLMContext under the hood in some
-        services and don't want to trip up users who may have been relying on
-        this method, which is part of the public API of OpenAILLMContext but
-        doesn't need to be for LLMContext.
-
-        .. deprecated::
-            Use `get_messages()` instead.
-
-        Returns:
-            List of conversation messages.
-        """
-        import warnings
-
-        with warnings.catch_warnings():
-            warnings.simplefilter("always")
-            warnings.warn(
-                "get_messages_for_persistent_storage() is deprecated, use get_messages() instead.",
-                DeprecationWarning,
-                stacklevel=2,
-            )
-
-        return self.get_messages()
-
    def get_messages(self, llm_specific_filter: Optional[str] = None) -> List[LLMContextMessage]:
        """Get the current messages list.

@@ -244,8 +89,7 @@ class LLMContext:
            llm_specific_filter: Optional filter to return LLM-specific
                messages for the given LLM, in addition to the standard
                messages. If messages end up being filtered, an error will be
-                logged; this is intended to catch accidental use of
-                incompatible LLM-specific messages.
+                logged.

        Returns:
            List of conversation messages.
@@ -322,7 +166,7 @@ class LLMContext:
        self._tool_choice = tool_choice

    def add_image_frame_message(
-        self, *, format: str, size: tuple[int, int], image: bytes, text: Optional[str] = None
+        self, *, format: str, size: tuple[int, int], image: bytes, text: str = None
    ):
        """Add a message containing an image frame.

@@ -332,8 +176,17 @@ class LLMContext:
            image: Raw image bytes.
            text: Optional text to include with the image.
        """
-        message = LLMContext.create_image_message(format=format, size=size, image=image, text=text)
-        self.add_message(message)
+        buffer = io.BytesIO()
+        Image.frombytes(format, size, image).save(buffer, format="JPEG")
+        encoded_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
+
+        content = []
+        if text:
+            content.append({"type": "text", "text": text})
+        content.append(
+            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}},
+        )
+        self.add_message({"role": "user", "content": content})

    def add_audio_frames_message(
        self, *, audio_frames: list[AudioRawFrame], text: str = "Audio follows"
@@ -344,8 +197,66 @@ class LLMContext:
            audio_frames: List of audio frame objects to include.
            text: Optional text to include with the audio.
        """
-        message = LLMContext.create_audio_message(audio_frames=audio_frames, text=text)
-        self.add_message(message)
+        if not audio_frames:
+            return
+
+        sample_rate = audio_frames[0].sample_rate
+        num_channels = audio_frames[0].num_channels
+
+        content = []
+        content.append({"type": "text", "text": text})
+        data = b"".join(frame.audio for frame in audio_frames)
+        data = bytes(
+            self._create_wav_header(
+                sample_rate,
+                num_channels,
+                16,
+                len(data),
+            )
+            + data
+        )
+        encoded_audio = base64.b64encode(data).decode("utf-8")
+        content.append(
+            {
+                "type": "input_audio",
+                "input_audio": {"data": encoded_audio, "format": "wav"},
+            }
+        )
+        self.add_message({"role": "user", "content": content})
+
+    def _create_wav_header(self, sample_rate, num_channels, bits_per_sample, data_size):
+        """Create a WAV file header for audio data.
+
+        Args:
+            sample_rate: Audio sample rate in Hz.
+            num_channels: Number of audio channels.
+            bits_per_sample: Bits per audio sample.
+            data_size: Size of audio data in bytes.
+
+        Returns:
+            WAV header as a bytearray.
+        """
+        # RIFF chunk descriptor
+        header = bytearray()
+        header.extend(b"RIFF")  # ChunkID
+        header.extend((data_size + 36).to_bytes(4, "little"))  # ChunkSize: total size - 8
+        header.extend(b"WAVE")  # Format
+        # "fmt " sub-chunk
+        header.extend(b"fmt ")  # Subchunk1ID
+        header.extend((16).to_bytes(4, "little"))  # Subchunk1Size (16 for PCM)
+        header.extend((1).to_bytes(2, "little"))  # AudioFormat (1 for PCM)
+        header.extend(num_channels.to_bytes(2, "little"))  # NumChannels
+        header.extend(sample_rate.to_bytes(4, "little"))  # SampleRate
+        # Calculate byte rate and block align
+        byte_rate = sample_rate * num_channels * (bits_per_sample // 8)
+        block_align = num_channels * (bits_per_sample // 8)
+        header.extend(byte_rate.to_bytes(4, "little"))  # ByteRate
+        header.extend(block_align.to_bytes(2, "little"))  # BlockAlign
+        header.extend(bits_per_sample.to_bytes(2, "little"))  # BitsPerSample
+        # "data" sub-chunk
+        header.extend(b"data")  # Subchunk2ID
+        header.extend(data_size.to_bytes(4, "little"))  # Subchunk2Size
+        return header

    @staticmethod
    def _normalize_and_validate_tools(tools: ToolsSchema | NotGiven) -> ToolsSchema | NotGiven:
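Note on `_create_wav_header` above: one way to sanity-check a hand-built header is to parse the result with the standard-library `wave` module. The sketch below copies the header layout into a standalone function (`create_wav_header` is a hypothetical free-function name, not part of Pipecat) and reads the bytes back.

```python
import io
import wave


def create_wav_header(
    sample_rate: int, num_channels: int, bits_per_sample: int, data_size: int
) -> bytearray:
    """Build a 44-byte canonical PCM WAV header (standalone copy for testing)."""
    byte_rate = sample_rate * num_channels * (bits_per_sample // 8)
    block_align = num_channels * (bits_per_sample // 8)

    header = bytearray()
    header.extend(b"RIFF")  # ChunkID
    header.extend((data_size + 36).to_bytes(4, "little"))  # ChunkSize
    header.extend(b"WAVE")  # Format
    header.extend(b"fmt ")  # Subchunk1ID
    header.extend((16).to_bytes(4, "little"))  # Subchunk1Size (16 for PCM)
    header.extend((1).to_bytes(2, "little"))  # AudioFormat (1 for PCM)
    header.extend(num_channels.to_bytes(2, "little"))  # NumChannels
    header.extend(sample_rate.to_bytes(4, "little"))  # SampleRate
    header.extend(byte_rate.to_bytes(4, "little"))  # ByteRate
    header.extend(block_align.to_bytes(2, "little"))  # BlockAlign
    header.extend(bits_per_sample.to_bytes(2, "little"))  # BitsPerSample
    header.extend(b"data")  # Subchunk2ID
    header.extend(data_size.to_bytes(4, "little"))  # Subchunk2Size
    return header


# 100 ms of 16-bit mono silence at 16 kHz: 1600 samples * 2 bytes each.
pcm = b"\x00\x00" * 1600
wav_bytes = bytes(create_wav_header(16000, 1, 16, len(pcm)) + pcm)

# Parse the bytes back with the standard library to confirm the header is valid.
with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
    params = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
```

If the header is well-formed, `wave` reports the same channel count, sample width, sample rate and frame count that went in.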
--- a/src/pipecat/processors/aggregators/llm_response.py
+++ b/src/pipecat/processors/aggregators/llm_response.py
@@ -89,9 +89,7 @@ class LLMAssistantAggregatorParams:

    Parameters:
        expect_stripped_words: Whether to expect and handle stripped words
-            in text frames by adding spaces between tokens. This parameter is
-            ignored when used with the newer LLMAssistantAggregator, which
-            handles word spacing automatically.
+            in text frames by adding spaces between tokens.
    """

    expect_stripped_words: bool = True
--- a/src/pipecat/processors/aggregators/llm_response_universal.py
+++ b/src/pipecat/processors/aggregators/llm_response_universal.py
@@ -13,7 +13,6 @@ LLM processing, and text-to-speech components in conversational AI pipelines.

 import asyncio
 import json
-import warnings
 from abc import abstractmethod
 from typing import Any, Dict, List, Literal, Optional, Set

@@ -66,7 +65,6 @@ from pipecat.processors.aggregators.llm_response import (
    LLMUserAggregatorParams,
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.utils.string import concatenate_aggregated_text
 from pipecat.utils.time import time_now_iso8601


@@ -90,7 +88,7 @@ class LLMContextAggregator(FrameProcessor):
        self._context = context
        self._role = role

-        self._aggregation: List[str] = []
+        self._aggregation: str = ""

    @property
    def messages(self) -> List[LLMContextMessage]:
@@ -170,21 +168,13 @@ class LLMContextAggregator(FrameProcessor):

    async def reset(self):
        """Reset the aggregation state."""
-        self._aggregation = []
+        self._aggregation = ""

    @abstractmethod
    async def push_aggregation(self):
        """Push the current aggregation downstream."""
        pass

-    def aggregation_string(self) -> str:
-        """Get the current aggregation as a string.
-
-        Returns:
-            The concatenated aggregation string.
-        """
-        return concatenate_aggregated_text(self._aggregation)
-

 class LLMUserAggregator(LLMContextAggregator):
    """User LLM aggregator that processes speech-to-text transcriptions.
@@ -222,6 +212,8 @@ class LLMUserAggregator(LLMContextAggregator):
        self._turn_params: Optional[SmartTurnParams] = None

        if "aggregation_timeout" in kwargs:
+            import warnings
+
            with warnings.catch_warnings():
                warnings.simplefilter("always")
                warnings.warn(
@@ -298,12 +290,6 @@ class LLMUserAggregator(LLMContextAggregator):
            await self._handle_llm_messages_update(frame)
        elif isinstance(frame, LLMSetToolsFrame):
            self.set_tools(frame.tools)
-            # Push the LLMSetToolsFrame as well, since speech-to-speech LLM
-            # services (like OpenAI Realtime) may need to know about tool
-            # changes; unlike text-based LLM services they won't just "pick up
-            # the change" on the next LLM run, as the LLM is continuously
-            # running.
-            await self.push_frame(frame, direction)
        elif isinstance(frame, LLMSetToolChoiceFrame):
            self.set_tool_choice(frame.tool_choice)
        elif isinstance(frame, SpeechControlParamsFrame):
@@ -315,7 +301,7 @@ class LLMUserAggregator(LLMContextAggregator):

    async def _process_aggregation(self):
        """Process the current aggregation and push it downstream."""
-        aggregation = self.aggregation_string()
+        aggregation = self._aggregation
        await self.reset()
        self._context.add_message({"role": self.role, "content": aggregation})
        frame = LLMContextFrame(self._context)
@@ -363,7 +349,7 @@ class LLMUserAggregator(LLMContextAggregator):
        """

        async def should_interrupt(strategy: BaseInterruptionStrategy):
-            await strategy.append_text(self.aggregation_string())
+            await strategy.append_text(self._aggregation)
            return await strategy.should_interrupt()

        return any([await should_interrupt(s) for s in self._interruption_strategies])
@@ -433,7 +419,7 @@ class LLMUserAggregator(LLMContextAggregator):
        if not text.strip():
            return

-        self._aggregation.append(text)
+        self._aggregation += f" {text}" if self._aggregation else text
        # We just got a final result, so let's reset interim results.
        self._seen_interim_results = False
        # Reset aggregation timer.
@@ -558,31 +544,23 @@ class LLMAssistantAggregator(LLMContextAggregator):
        Args:
            context: The OpenAI LLM context for conversation storage.
            params: Configuration parameters for aggregation behavior.
-            **kwargs: Additional arguments.
+            **kwargs: Additional arguments. Supports deprecated 'expect_stripped_words'.
        """
        super().__init__(context=context, role="assistant", **kwargs)
        self._params = params or LLMAssistantAggregatorParams()

        if "expect_stripped_words" in kwargs:
+            import warnings
+
            with warnings.catch_warnings():
                warnings.simplefilter("always")
                warnings.warn(
-                    "Parameter 'expect_stripped_words' is deprecated. "
-                    "LLMAssistantAggregator now handles word spacing automatically.",
+                    "Parameter 'expect_stripped_words' is deprecated, use 'params' instead.",
                    DeprecationWarning,
                )

            self._params.expect_stripped_words = kwargs["expect_stripped_words"]

-        if params and not params.expect_stripped_words:
-            with warnings.catch_warnings():
-                warnings.simplefilter("always")
-                warnings.warn(
-                    "params.expect_stripped_words is deprecated. "
-                    "LLMAssistantAggregator now handles word spacing automatically.",
-                    DeprecationWarning,
-                )
-
        self._started = 0
        self._function_calls_in_progress: Dict[str, Optional[FunctionCallInProgressFrame]] = {}
        self._context_updated_tasks: Set[asyncio.Task] = set()
@@ -632,7 +610,7 @@ class LLMAssistantAggregator(LLMContextAggregator):
            await self._handle_function_call_result(frame)
        elif isinstance(frame, FunctionCallCancelFrame):
            await self._handle_function_call_cancel(frame)
-        elif isinstance(frame, UserImageRawFrame):
+        elif isinstance(frame, UserImageRawFrame) and frame.request and frame.request.tool_call_id:
            await self._handle_user_image_frame(frame)
        elif isinstance(frame, BotStoppedSpeakingFrame):
            await self.push_aggregation()
@@ -645,7 +623,7 @@ class LLMAssistantAggregator(LLMContextAggregator):
        if not self._aggregation:
            return

-        aggregation = self.aggregation_string()
+        aggregation = self._aggregation.strip()
        await self.reset()

        if aggregation:
@@ -783,16 +761,27 @@ class LLMAssistantAggregator(LLMContextAggregator):
                message["content"] = result

    async def _handle_user_image_frame(self, frame: UserImageRawFrame):
-        if not frame.append_to_context:
+        logger.debug(
+            f"{self} UserImageRawFrame: [{frame.request.function_name}:{frame.request.tool_call_id}]"
+        )
+
+        if frame.request.tool_call_id not in self._function_calls_in_progress:
+            logger.warning(
+                f"UserImageRawFrame tool_call_id [{frame.request.tool_call_id}] is not running"
+            )
            return

-        logger.debug(f"{self} Appending UserImageRawFrame to LLM context (size: {frame.size})")
+        del self._function_calls_in_progress[frame.request.tool_call_id]

+        # Update context with the image frame
+        self._update_function_call_result(
+            frame.request.function_name, frame.request.tool_call_id, "COMPLETED"
+        )
        self._context.add_image_frame_message(
            format=frame.format,
            size=frame.size,
            image=frame.image,
-            text=frame.text,
+            text=frame.request.context,
        )

        await self.push_aggregation()
@@ -809,11 +798,10 @@ class LLMAssistantAggregator(LLMContextAggregator):
        if not self._started:
            return

-        # Make sure we really have text (spaces count, too!)
-        if len(frame.text) == 0:
-            return
-
-        self._aggregation.append(frame.text)
+        if self._params.expect_stripped_words:
+            self._aggregation += f" {frame.text}" if self._aggregation else frame.text
+        else:
+            self._aggregation += frame.text

    def _context_updated_task_finished(self, task: asyncio.Task):
        self._context_updated_tasks.discard(task)
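Note on the string-based aggregation above: the effect of `expect_stripped_words` can be illustrated with a small standalone function. This is a sketch under the assumption that text frames arrive as plain strings. `aggregate` is a hypothetical helper, not a Pipecat API.

```python
from typing import Iterable


def aggregate(tokens: Iterable[str], expect_stripped_words: bool = True) -> str:
    """Concatenate text tokens the way the `+` lines above do."""
    aggregation = ""
    for text in tokens:
        if expect_stripped_words:
            # Tokens carry no whitespace of their own, so insert a space
            # between them (but not before the first one).
            aggregation += f" {text}" if aggregation else text
        else:
            # Tokens already include their own spacing; append verbatim.
            aggregation += text
    return aggregation.strip()


stripped = aggregate(["Hello", "world"])
verbatim = aggregate(["Hel", "lo ", "world"], expect_stripped_words=False)
punctuated = aggregate(["Hello", ",", "world"])
```

Note that in stripped-word mode a separator is inserted before every token, including punctuation, so the mode should match how the upstream service tokenizes its output.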
--- a/src/pipecat/processors/aggregators/user_response.py
+++ b/src/pipecat/processors/aggregators/user_response.py
@@ -27,24 +27,11 @@ class UserResponseAggregator(LLMUserAggregator):
    def __init__(self, **kwargs):
        """Initialize the user response aggregator.

-        .. deprecated:: 0.0.92
-            `UserResponseAggregator` is deprecated and will be removed in a future version.
-
        Args:
            **kwargs: Additional arguments passed to parent LLMUserAggregator.
        """
        super().__init__(context=LLMContext(), **kwargs)

-        import warnings
-
-        with warnings.catch_warnings():
-            warnings.simplefilter("always")
-            warnings.warn(
-                "`UserResponseAggregator` is deprecated and will be removed in a future version.",
-                DeprecationWarning,
-                stacklevel=2,
-            )
-
    async def push_aggregation(self):
        """Push the aggregated user response as a TextFrame.

--- a/src/pipecat/processors/filters/function_filter.py
+++ b/src/pipecat/processors/filters/function_filter.py
@@ -12,7 +12,7 @@ allowing for flexible frame filtering logic in processing pipelines.

 from typing import Awaitable, Callable

-from pipecat.frames.frames import CancelFrame, EndFrame, Frame, StartFrame, SystemFrame
+from pipecat.frames.frames import EndFrame, Frame, SystemFrame
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


@@ -28,7 +28,6 @@ class FunctionFilter(FrameProcessor):
        self,
        filter: Callable[[Frame], Awaitable[bool]],
        direction: FrameDirection = FrameDirection.DOWNSTREAM,
-        filter_system_frames: bool = False,
    ):
        """Initialize the function filter.

@@ -37,32 +36,22 @@ class FunctionFilter(FrameProcessor):
                frame should pass through, False otherwise.
            direction: The direction to apply filtering. Only frames moving in
                this direction will be filtered. Defaults to DOWNSTREAM.
-            filter_system_frames: Whether to filter system frames. Defaults to False.
        """
        super().__init__()
        self._filter = filter
        self._direction = direction
-        self._filter_system_frames = filter_system_frames

    #
    # Frame processor
    #

+    # Pass through (never filter) system frames, end frames and frames that
+    # are not travelling in the direction of this filter.
    def _should_passthrough_frame(self, frame, direction):
        """Check if a frame should pass through without filtering."""
-        # Always passthrough frames in the wrong direction
-        if direction != self._direction:
-            return True
-
-        # Always passthrough lifecycle frames
-        if isinstance(frame, (StartFrame, EndFrame, CancelFrame)):
-            return True
-
-        # If not filtering system frames, passthrough all other system frames
-        if not self._filter_system_frames and isinstance(frame, SystemFrame):
-            return True
-
-        return False
+        # Pass through (never filter) system frames, end frames and frames
+        # that are not travelling in the direction of this filter.
+        return isinstance(frame, (SystemFrame, EndFrame)) or direction != self._direction

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        """Process a frame through the filter.
--- a/src/pipecat/processors/frame_processor.py
+++ b/src/pipecat/processors/frame_processor.py
@@ -132,17 +132,14 @@ INPUT_TASK_CANCEL_TIMEOUT_SECS = 3


 class FrameProcessor(BaseObject):
-    """Base class for all frame processors in Pipecat.
+    """Base class for all frame processors in the pipeline.

-    A FrameProcessor is an independent, asynchronous component that consumes
-    input frames and produces zero or more output frames. Frames are delivered
-    to the processor via the `queue_frame(frame, direction)` method. The
-    processor internally manages queues and background tasks to handle incoming
-    frames and generate output frames.
-
-    Output frames are made available through the processor's asynchronous
-    iterator interface, allowing consumers to iterate over processed frames
-    using `async for frame in processor`. Frame ordering is guaranteed.
+    Frame processors are the building blocks of Pipecat pipelines. They can be
+    linked to form complex processing pipelines. They receive frames, process
+    them, and pass them to the next or previous processor in the chain. Each
+    frame processor guarantees frame ordering and processes frames in its own
+    task. System frames are processed in a separate task, which guarantees
+    their priority over other frames.

    Event handlers available:

@@ -150,7 +147,6 @@ class FrameProcessor(BaseObject):
    - on_after_process_frame: Called after a frame is processed
    - on_before_push_frame: Called before a frame is pushed
    - on_after_push_frame: Called after a frame is pushed
-
    """

    def __init__(
@@ -170,6 +166,8 @@ class FrameProcessor(BaseObject):
            **kwargs: Additional arguments passed to parent class.
        """
        super().__init__(name=name, **kwargs)
+        self._prev: Optional["FrameProcessor"] = None
+        self._next: Optional["FrameProcessor"] = None

        # Enable direct mode to skip queues and process frames right away.
        self._enable_direct_mode = enable_direct_mode
@@ -236,9 +234,6 @@ class FrameProcessor(BaseObject):
        self._wait_for_interruption = False
        self._wait_interruption_event = asyncio.Event()

-        # Push queue
-        self.__push_queue = asyncio.Queue()
-
        # Frame processor events.
        self._register_event_handler("on_before_process_frame", sync=True)
        self._register_event_handler("on_after_process_frame", sync=True)
@@ -289,6 +284,24 @@ class FrameProcessor(BaseObject):
        """
        return []

+    @property
+    def next(self) -> Optional["FrameProcessor"]:
+        """Get the next processor.
+
+        Returns:
+            The next processor, or None if there's no next processor.
+        """
+        return self._next
+
+    @property
+    def previous(self) -> Optional["FrameProcessor"]:
+        """Get the previous processor.
+
+        Returns:
+            The previous processor, or None if there's no previous processor.
+        """
+        return self._prev
+
    @property
    def interruptions_allowed(self):
        """Check if interruptions are allowed for this processor.
@@ -505,7 +518,16 @@ class FrameProcessor(BaseObject):
        await self.__cancel_process_task()
        if self._metrics is not None:
            await self._metrics.cleanup()
-        await self.__push_queue.put(None)
+
+    def link(self, processor: "FrameProcessor"):
+        """Link this processor to the next processor in the pipeline.
+
+        Args:
+            processor: The processor to link to.
+        """
+        self._next = processor
+        processor._prev = self
+        logger.debug(f"Linking {self} -> {self._next}")

    def get_clock(self) -> BaseClock:
        """Get the clock used by this processor.
@@ -739,7 +761,36 @@ class FrameProcessor(BaseObject):
            frame: The frame to push.
            direction: The direction to push the frame.
        """
-        await self.__push_queue.put((frame, direction))
+        try:
+            timestamp = self._clock.get_time() if self._clock else 0
+            if direction == FrameDirection.DOWNSTREAM and self._next:
+                logger.trace(f"Pushing {frame} from {self} to {self._next}")
+
+                if self._observer:
+                    data = FramePushed(
+                        source=self,
+                        destination=self._next,
+                        frame=frame,
+                        direction=direction,
+                        timestamp=timestamp,
+                    )
+                    await self._observer.on_push_frame(data)
+                await self._next.queue_frame(frame, direction)
+            elif direction == FrameDirection.UPSTREAM and self._prev:
+                logger.trace(f"Pushing {frame} upstream from {self} to {self._prev}")
+                if self._observer:
+                    data = FramePushed(
+                        source=self,
+                        destination=self._prev,
+                        frame=frame,
+                        direction=direction,
+                        timestamp=timestamp,
+                    )
+                    await self._observer.on_push_frame(data)
+                await self._prev.queue_frame(frame, direction)
+        except Exception as e:
+            logger.exception(f"Uncaught exception in {self}: {e}")
+            await self.push_error(ErrorFrame(str(e)))

    def _check_started(self, frame: Frame):
        """Check if the processor has been started.
@@ -861,18 +912,3 @@ class FrameProcessor(BaseObject):
            await self.__process_frame(frame, direction, callback)

            self.__process_queue.task_done()
-
-    def __aiter__(self):
-        """A frame processor is an asynchronous iterator itself."""
-        return self
-
-    async def __anext__(self):
-        """Retrieve the next frame to push from this processor.
-
-        Returns:
-            The next (frame, direction) item to push form this processor.
-        """
-        data = await self.__push_queue.get()
-        if data is None:
-            raise StopAsyncIteration
-        return data
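
Note on the frame_processor.py change above: with this commit, `push_frame` hands the frame straight to the linked neighbour (`_next` for downstream, `_prev` for upstream) instead of placing it on an internal push queue. The sketch below is a minimal illustration of that linking model only. `ToyProcessor` and `Direction` are hypothetical stand-ins, not Pipecat classes, and the observer, clock, task and error handling from the real method are left out.

```python
import asyncio
from enum import Enum, auto
from typing import List, Optional, Tuple


class Direction(Enum):
    DOWNSTREAM = auto()
    UPSTREAM = auto()


class ToyProcessor:
    """Hypothetical stand-in for a linked frame processor."""

    def __init__(self, name: str):
        self.name = name
        self._prev: Optional["ToyProcessor"] = None
        self._next: Optional["ToyProcessor"] = None
        self.received: List[str] = []

    def link(self, processor: "ToyProcessor") -> None:
        # Same wiring as the link() method in the diff: set both ends.
        self._next = processor
        processor._prev = self

    async def queue_frame(self, frame: str, direction: Direction) -> None:
        # Record the frame, then forward it unchanged.
        self.received.append(frame)
        await self.push_frame(frame, direction)

    async def push_frame(self, frame: str, direction: Direction) -> None:
        # Route to the neighbour that matches the direction, if there is one.
        if direction is Direction.DOWNSTREAM and self._next:
            await self._next.queue_frame(frame, direction)
        elif direction is Direction.UPSTREAM and self._prev:
            await self._prev.queue_frame(frame, direction)


async def demo() -> Tuple[List[str], List[str], List[str]]:
    a, b, c = ToyProcessor("a"), ToyProcessor("b"), ToyProcessor("c")
    a.link(b)
    b.link(c)
    await a.queue_frame("hello", Direction.DOWNSTREAM)  # visits a, b, c
    await b.queue_frame("error", Direction.UPSTREAM)    # visits b, a only
    return a.received, b.received, c.received
```

Running `asyncio.run(demo())` shows that a frame pushed upstream from the middle processor never reaches the last one, which is the property the `_prev`/`_next` pair is meant to provide.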
--- a/src/pipecat/processors/frameworks/rtvi.py
+++ b/src/pipecat/processors/frameworks/rtvi.py
@@ -1018,7 +1018,6 @@ class RTVIObserver(BaseObserver):

        if (
            isinstance(frame, (UserStartedSpeakingFrame, UserStoppedSpeakingFrame))
-            and (direction == FrameDirection.DOWNSTREAM)
            and self._params.user_speaking_enabled
        ):
            await self._handle_interruptions(frame)
--- a/src/pipecat/processors/transcript_processor.py
+++ b/src/pipecat/processors/transcript_processor.py
@@ -26,7 +26,6 @@ from pipecat.frames.frames import (
    TTSTextFrame,
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.utils.string import concatenate_aggregated_text
 from pipecat.utils.time import time_now_iso8601


@@ -141,7 +140,29 @@ class AssistantTranscriptProcessor(BaseTranscriptProcessor):
                Result: "Hello there how are you"
        """
        if self._current_text_parts and self._aggregation_start_time:
-            content = concatenate_aggregated_text(self._current_text_parts)
+            # Check specifically for space characters, previously isspace() was used
+            # but that includes all whitespace characters (e.g. \n), not just spaces.
+            has_leading_spaces = any(
+                part and part[0] == " " for part in self._current_text_parts[1:]
+            )
+            has_trailing_spaces = any(
+                part and part[-1] == " " for part in self._current_text_parts[:-1]
+            )
+
+            # If there are embedded spaces in the fragments, use direct concatenation
+            contains_spacing_between_fragments = has_leading_spaces or has_trailing_spaces
+
+            # Apply corresponding joining method
+            if contains_spacing_between_fragments:
+                # Fragments already have spacing - just concatenate
+                content = "".join(self._current_text_parts)
+            else:
+                # Word-by-word fragments - join with spaces
+                content = " ".join(self._current_text_parts)
+
+            # Clean up any excessive whitespace
+            content = content.strip()
+
            if content:
                logger.trace(f"Emitting aggregated assistant message: {content}")
                message = TranscriptionMessage(
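
Note on the transcript_processor.py change above: the inlined logic chooses between plain concatenation and space-joining depending on whether any fragment boundary already carries a space character. The function below is a standalone sketch of that heuristic; the name `join_fragments` is hypothetical and is not part of Pipecat.

```python
from typing import List


def join_fragments(parts: List[str]) -> str:
    """Join text fragments the way the inlined logic in the diff does."""
    # Only the literal space character counts, not other whitespace such as "\n".
    has_leading_spaces = any(part and part[0] == " " for part in parts[1:])
    has_trailing_spaces = any(part and part[-1] == " " for part in parts[:-1])

    if has_leading_spaces or has_trailing_spaces:
        # Fragments already carry their own spacing: concatenate directly.
        content = "".join(parts)
    else:
        # Word-by-word fragments: insert a single space between them.
        content = " ".join(parts)

    return content.strip()
```

One consequence worth keeping in mind when reviewing: fragments that split a word and contain no spaces at all, such as `["Hel", "lo"]`, are treated as separate words and come out as `"Hel lo"`.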
--- a/src/pipecat/runner/daily.py
+++ b/src/pipecat/runner/daily.py
@@ -44,8 +44,6 @@ from loguru import logger
 from pydantic import BaseModel

 from pipecat.transports.daily.utils import (
-    DailyMeetingTokenParams,
-    DailyMeetingTokenProperties,
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
@@ -78,15 +76,12 @@ class DailyRoomConfig(BaseModel):
 async def configure(
    aiohttp_session: aiohttp.ClientSession,
    *,
-    api_key: Optional[str] = None,
    room_exp_duration: Optional[float] = 2.0,
    token_exp_duration: Optional[float] = 2.0,
    sip_caller_phone: Optional[str] = None,
    sip_enable_video: Optional[bool] = False,
    sip_num_endpoints: Optional[int] = 1,
    sip_codecs: Optional[Dict[str, List[str]]] = None,
-    room_properties: Optional[DailyRoomProperties] = None,
-    token_properties: Optional["DailyMeetingTokenProperties"] = None,
 ) -> DailyRoomConfig:
    """Configure Daily room URL and token with optional SIP capabilities.

@@ -96,7 +91,6 @@ async def configure(

    Args:
        aiohttp_session: HTTP session for making API requests.
-        api_key: Daily API key.
        room_exp_duration: Room expiration time in hours.
        token_exp_duration: Token expiration time in hours.
        sip_caller_phone: Phone number or identifier for SIP display name.
@@ -105,13 +99,6 @@ async def configure(
        sip_num_endpoints: Number of allowed SIP endpoints.
        sip_codecs: Codecs to support for audio and video. If None, uses Daily defaults.
            Example: {"audio": ["OPUS"], "video": ["H264"]}
-        room_properties: Optional DailyRoomProperties to use instead of building from
-            individual parameters. When provided, this overrides room_exp_duration and
-            SIP-related parameters. If not provided, properties are built from the
-            individual parameters as before.
-        token_properties: Optional DailyMeetingTokenProperties to customize the meeting
-            token. When provided, these properties are passed to the token creation API.
-            Note that room_name, exp, and is_owner will be set automatically.

    Returns:
        DailyRoomConfig: Object with room_url, token, and optional sip_endpoint.
@@ -128,48 +115,18 @@ async def configure(
        # SIP-enabled room
        sip_config = await configure(session, sip_caller_phone="+15551234567")
        print(f"SIP endpoint: {sip_config.sip_endpoint}")
-
-        # Custom room properties with recording enabled
-        custom_props = DailyRoomProperties(
-            enable_recording="cloud",
-            max_participants=2,
-        )
-        config = await configure(session, room_properties=custom_props)
    """
    # Check for required API key
-    api_key = api_key or os.getenv("DAILY_API_KEY")
+    api_key = os.getenv("DAILY_API_KEY")
    if not api_key:
        raise Exception(
            "DAILY_API_KEY environment variable is required. "
            "Get your API key from https://dashboard.daily.co/developers"
        )

-    # Warn if both room_properties and individual parameters are provided
-    if room_properties is not None:
-        individual_params_provided = any(
-            [
-                room_exp_duration != 2.0,
-                token_exp_duration != 2.0,
-                sip_caller_phone is not None,
-                sip_enable_video is not False,
-                sip_num_endpoints != 1,
-                sip_codecs is not None,
-            ]
-        )
-        if individual_params_provided:
-            logger.warning(
-                "Both room_properties and individual parameters (room_exp_duration, token_exp_duration, "
-                "sip_*) were provided. The room_properties will be used and individual parameters "
-                "will be ignored."
-            )
-
    # Determine if SIP mode is enabled
    sip_enabled = sip_caller_phone is not None

-    # If room_properties is provided, check if it has SIP configuration
-    if room_properties and room_properties.sip:
-        sip_enabled = True
-
    daily_rest_helper = DailyRESTHelper(
        daily_api_key=api_key,
        daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
@@ -185,10 +142,7 @@ async def configure(

        # Create token and return standard format
        expiry_time: float = token_exp_duration * 60 * 60
-        token_params = None
-        if token_properties:
-            token_params = DailyMeetingTokenParams(properties=token_properties)
-        token = await daily_rest_helper.get_token(room_url, expiry_time, params=token_params)
+        token = await daily_rest_helper.get_token(room_url, expiry_time)
        return DailyRoomConfig(room_url=room_url, token=token)

    # Create a new room
@@ -196,29 +150,27 @@ async def configure(
    room_name = f"{room_prefix}-{uuid.uuid4().hex[:8]}"
    logger.info(f"Creating new Daily room: {room_name}")

-    # Use provided room_properties or build from parameters
-    if room_properties is None:
-        # Calculate expiration time
-        expiration_time = time.time() + (room_exp_duration * 60 * 60)
+    # Calculate expiration time
+    expiration_time = time.time() + (room_exp_duration * 60 * 60)

-        # Create room properties
-        room_properties = DailyRoomProperties(
-            exp=expiration_time,
-            eject_at_room_exp=True,
+    # Create room properties
+    room_properties = DailyRoomProperties(
+        exp=expiration_time,
+        eject_at_room_exp=True,
+    )
+
+    # Add SIP configuration if enabled
+    if sip_enabled:
+        sip_params = DailyRoomSipParams(
+            display_name=sip_caller_phone,
+            video=sip_enable_video,
+            sip_mode="dial-in",
+            num_endpoints=sip_num_endpoints,
+            codecs=sip_codecs,
        )
-
-        # Add SIP configuration if enabled
-        if sip_enabled:
-            sip_params = DailyRoomSipParams(
-                display_name=sip_caller_phone,
-                video=sip_enable_video,
-                sip_mode="dial-in",
-                num_endpoints=sip_num_endpoints,
-                codecs=sip_codecs,
-            )
-            room_properties.sip = sip_params
-            room_properties.enable_dialout = True  # Enable outbound calls if needed
-            room_properties.start_video_off = not sip_enable_video  # Voice-only by default
+        room_properties.sip = sip_params
+        room_properties.enable_dialout = True  # Enable outbound calls if needed
+        room_properties.start_video_off = not sip_enable_video  # Voice-only by default

    # Create room parameters
    room_params = DailyRoomParams(name=room_name, properties=room_properties)
@@ -230,12 +182,7 @@ async def configure(

        # Create meeting token
        token_expiry_seconds = token_exp_duration * 60 * 60
-        token_params = None
-        if token_properties:
-            token_params = DailyMeetingTokenParams(properties=token_properties)
-        token = await daily_rest_helper.get_token(
-            room_url, token_expiry_seconds, params=token_params
-        )
+        token = await daily_rest_helper.get_token(room_url, token_expiry_seconds)

        if sip_enabled:
            # Return SIP configuration object
--- a/src/pipecat/runner/run.py
+++ b/src/pipecat/runner/run.py
@@ -70,19 +70,16 @@ import asyncio
 import mimetypes
 import os
 import sys
-import uuid
 from contextlib import asynccontextmanager
-from http import HTTPMethod
 from pathlib import Path
-from typing import Any, Dict, List, Optional, TypedDict
+from typing import Optional

 import aiohttp
-from fastapi.responses import FileResponse, Response
+from fastapi.responses import FileResponse
 from loguru import logger

 from pipecat.runner.types import (
    DailyRunnerArguments,
-    RunnerArguments,
    SmallWebRTCRunnerArguments,
    WebSocketRunnerArguments,
 )
@@ -169,7 +166,6 @@ def _create_server_app(
    host: str = "localhost",
    proxy: str,
    esp32_mode: bool = False,
-    whatsapp_enabled: bool = False,
    folder: Optional[str] = None,
 ):
    """Create FastAPI app with transport-specific routes."""
@@ -186,8 +182,7 @@ def _create_server_app(
    # Set up transport-specific routes
    if transport_type == "webrtc":
        _setup_webrtc_routes(app, esp32_mode=esp32_mode, host=host, folder=folder)
-        if whatsapp_enabled:
-            _setup_whatsapp_routes(app)
+        _setup_whatsapp_routes(app)
    elif transport_type == "daily":
        _setup_daily_routes(app)
    elif transport_type in TELEPHONY_TRANSPORTS:
@@ -205,10 +200,8 @@ def _setup_webrtc_routes(
    try:
        from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI

-        from pipecat.transports.smallwebrtc.connection import IceServer, SmallWebRTCConnection
+        from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
        from pipecat.transports.smallwebrtc.request_handler import (
-            IceCandidate,
-            SmallWebRTCPatchRequest,
            SmallWebRTCRequest,
            SmallWebRTCRequestHandler,
        )
@@ -216,16 +209,6 @@ def _setup_webrtc_routes(
        logger.error(f"WebRTC transport dependencies not installed: {e}")
        return

-    class IceConfig(TypedDict):
-        iceServers: List[IceServer]
-
-    class StartBotResult(TypedDict, total=False):
-        sessionId: str
-        iceConfig: Optional[IceConfig]
-
-    # In-memory store of active sessions: session_id -> session info
-    active_sessions: Dict[str, Dict[str, Any]] = {}
-
    # Mount the frontend
    app.mount("/client", SmallWebRTCPrebuiltUI)

@@ -271,74 +254,6 @@ def _setup_webrtc_routes(
        )
        return answer

-    @app.patch("/api/offer")
-    async def ice_candidate(request: SmallWebRTCPatchRequest):
-        """Handle WebRTC new ice candidate requests."""
-        logger.debug(f"Received patch request: {request}")
-        await small_webrtc_handler.handle_patch_request(request)
-        return {"status": "success"}
-
-    @app.post("/start")
-    async def rtvi_start(request: Request):
-        """Mimic Pipecat Cloud's /start endpoint."""
-        # Parse the request body
-        try:
-            request_data = await request.json()
-            logger.debug(f"Received request: {request_data}")
-        except Exception as e:
-            logger.error(f"Failed to parse request body: {e}")
-            request_data = {}
-
-        # Store session info immediately in memory, replicate the behavior expected on Pipecat Cloud
-        session_id = str(uuid.uuid4())
-        active_sessions[session_id] = request_data
-
-        result: StartBotResult = {"sessionId": session_id}
-        if request_data.get("enableDefaultIceServers"):
-            result["iceConfig"] = IceConfig(
-                iceServers=[IceServer(urls="stun:stun.l.google.com:19302")]
-            )
-
-        return result
-
-    @app.api_route(
-        "/sessions/{session_id}/{path:path}",
-        methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
-    )
-    async def proxy_request(
-        session_id: str, path: str, request: Request, background_tasks: BackgroundTasks
-    ):
-        """Mimic Pipecat Cloud's proxy."""
-        active_session = active_sessions.get(session_id)
-        if active_session is None:
-            return Response(content="Invalid or not-yet-ready session_id", status_code=404)
-
-        if path.endswith("api/offer"):
-            # Parse the request body and convert to SmallWebRTCRequest
-            try:
-                request_data = await request.json()
-                if request.method == HTTPMethod.POST.value:
-                    webrtc_request = SmallWebRTCRequest(
-                        sdp=request_data["sdp"],
-                        type=request_data["type"],
-                        pc_id=request_data.get("pc_id"),
-                        restart_pc=request_data.get("restart_pc"),
-                        request_data=request_data,
-                    )
-                    return await offer(webrtc_request, background_tasks)
-                elif request.method == HTTPMethod.PATCH.value:
-                    patch_request = SmallWebRTCPatchRequest(
-                        pc_id=request_data["pc_id"],
-                        candidates=[IceCandidate(**c) for c in request_data.get("candidates", [])],
-                    )
-                    return await ice_candidate(patch_request)
-            except Exception as e:
-                logger.error(f"Failed to parse WebRTC request: {e}")
-                return Response(content="Invalid WebRTC request", status_code=400)
-
-        logger.info(f"Received request for path: {path}")
-        return Response(status_code=200)
-
    @asynccontextmanager
    async def smallwebrtc_lifespan(app: FastAPI):
        """Manage FastAPI application lifecycle and cleanup connections."""
@@ -374,29 +289,6 @@ def _add_lifespan_to_app(app: FastAPI, new_lifespan):

 def _setup_whatsapp_routes(app: FastAPI):
    """Set up WebRTC-specific routes."""
-    WHATSAPP_APP_SECRET = os.getenv("WHATSAPP_APP_SECRET")
-    WHATSAPP_PHONE_NUMBER_ID = os.getenv("WHATSAPP_PHONE_NUMBER_ID")
-    WHATSAPP_TOKEN = os.getenv("WHATSAPP_TOKEN")
-    WHATSAPP_WEBHOOK_VERIFICATION_TOKEN = os.getenv("WHATSAPP_WEBHOOK_VERIFICATION_TOKEN")
-
-    if not all(
-        [
-            WHATSAPP_APP_SECRET,
-            WHATSAPP_PHONE_NUMBER_ID,
-            WHATSAPP_TOKEN,
-            WHATSAPP_WEBHOOK_VERIFICATION_TOKEN,
-        ]
-    ):
-        logger.error(
-            """Missing required environment variables for WhatsApp transport:
-    WHATSAPP_APP_SECRET
-    WHATSAPP_PHONE_NUMBER_ID
-    WHATSAPP_TOKEN
-    WHATSAPP_WEBHOOK_VERIFICATION_TOKEN
-            """
-        )
-        return
-
    try:
        from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI

@@ -408,7 +300,24 @@ def _setup_whatsapp_routes(app: FastAPI):
        from pipecat.transports.whatsapp.api import WhatsAppWebhookRequest
        from pipecat.transports.whatsapp.client import WhatsAppClient
    except ImportError as e:
-        logger.error(f"WhatsApp transport dependencies not installed: {e}")
+        logger.error(f"WebRTC transport dependencies not installed: {e}")
+        return
+
+    WHATSAPP_TOKEN = os.getenv("WHATSAPP_TOKEN")
+    WHATSAPP_PHONE_NUMBER_ID = os.getenv("WHATSAPP_PHONE_NUMBER_ID")
+    WHATSAPP_WEBHOOK_VERIFICATION_TOKEN = os.getenv("WHATSAPP_WEBHOOK_VERIFICATION_TOKEN")
+    WHATSAPP_APP_SECRET = os.getenv("WHATSAPP_APP_SECRET")
+
+    if not all(
+        [
+            WHATSAPP_TOKEN,
+            WHATSAPP_PHONE_NUMBER_ID,
+            WHATSAPP_WEBHOOK_VERIFICATION_TOKEN,
+        ]
+    ):
+        logger.debug(
+            "Missing required environment variables for WhatsApp transport. Keeping it disabled."
+        )
        return

    # Global WhatsApp client instance
@@ -530,9 +439,9 @@ def _setup_daily_routes(app: FastAPI):
    """Set up Daily-specific routes."""

    @app.get("/")
-    async def create_room_and_start_agent():
+    async def start_agent():
        """Launch a Daily bot and redirect to room."""
-        print("Starting bot with Daily transport and redirecting to Daily room")
+        print("Starting bot with Daily transport")

        import aiohttp

@@ -547,15 +456,14 @@ def _setup_daily_routes(app: FastAPI):
            asyncio.create_task(bot_module.bot(runner_args))
            return RedirectResponse(room_url)

-    @app.post("/start")
-    async def start_agent(request: Request):
-        """Handler for /start endpoints.
+    async def _handle_rtvi_request(request: Request):
+        """Common handler for both /start and /connect endpoints.

        Expects POST body like::
+
            {
                "createDailyRoom": true,
                "dailyRoomProperties": { "start_video_off": true },
-                "dailyMeetingTokenProperties": { "is_owner": true, "user_name": "Bot" },
                "body": { "custom_data": "value" }
            }
        """
@@ -569,68 +477,47 @@ def _setup_daily_routes(app: FastAPI):
            logger.error(f"Failed to parse request body: {e}")
            request_data = {}

-        create_daily_room = request_data.get("createDailyRoom", False)
-        body = request_data.get("body", {})
-        daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
-        daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)
+        # Extract the body data that should be passed to the bot
+        # This mimics Pipecat Cloud's behavior
+        bot_body = request_data.get("body", {})

-        bot_module = _get_bot_module()
-
-        existing_room_url = os.getenv("DAILY_SAMPLE_ROOM_URL")
-
-        result = None
-
-        # Configure room if:
-        # 1. Explicitly requested via createDailyRoom in payload
-        # 2. Using pre-configured room from DAILY_SAMPLE_ROOM_URL env var
-        if create_daily_room or existing_room_url:
-            import aiohttp
-
-            from pipecat.runner.daily import configure
-            from pipecat.transports.daily.utils import (
-                DailyMeetingTokenProperties,
-                DailyRoomProperties,
-            )
-
-            async with aiohttp.ClientSession() as session:
-                # Parse dailyRoomProperties if provided
-                room_properties = None
-                if daily_room_properties_dict:
-                    try:
-                        room_properties = DailyRoomProperties(**daily_room_properties_dict)
-                        logger.debug(f"Using custom room properties: {room_properties}")
-                    except Exception as e:
-                        logger.error(f"Failed to parse dailyRoomProperties: {e}")
-                        # Continue without custom properties
-
-                # Parse dailyMeetingTokenProperties if provided
-                token_properties = None
-                if daily_token_properties_dict:
-                    try:
-                        token_properties = DailyMeetingTokenProperties(
-                            **daily_token_properties_dict
-                        )
-                        logger.debug(f"Using custom token properties: {token_properties}")
-                    except Exception as e:
-                        logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
-                        # Continue without custom properties
-
-                room_url, token = await configure(
-                    session, room_properties=room_properties, token_properties=token_properties
-                )
-                runner_args = DailyRunnerArguments(room_url=room_url, token=token, body=body)
-                result = {
-                    "dailyRoom": room_url,
-                    "dailyToken": token,
-                    "sessionId": str(uuid.uuid4()),
-                }
+        # Log the extracted body data for debugging
+        if bot_body:
+            logger.info(f"Extracted body data for bot: {bot_body}")
        else:
-            runner_args = RunnerArguments(body=body)
+            logger.debug("No body data provided in request")

-        # Start the bot in the background
-        asyncio.create_task(bot_module.bot(runner_args))
+        import aiohttp

-        return result
+        from pipecat.runner.daily import configure
+
+        async with aiohttp.ClientSession() as session:
+            room_url, token = await configure(session)
+
+            # Start the bot in the background with extracted body data
+            bot_module = _get_bot_module()
+            runner_args = DailyRunnerArguments(room_url=room_url, token=token, body=bot_body)
+            asyncio.create_task(bot_module.bot(runner_args))
+            # Match PCC /start endpoint response format:
+            return {"dailyRoom": room_url, "dailyToken": token}
+
+    @app.post("/start")
+    async def rtvi_start(request: Request):
+        """Launch a Daily bot and return connection info for RTVI clients."""
+        return await _handle_rtvi_request(request)
+
+    @app.post("/connect")
+    async def rtvi_connect(request: Request):
+        """Launch a Daily bot and return connection info for RTVI clients.
+
+        .. deprecated:: 0.0.78
+            Use /start instead. This endpoint will be removed in a future version.
+        """
+        logger.warning(
+            "DEPRECATED: /connect endpoint is deprecated. Please use /start instead. "
+            "This endpoint will be removed in a future version."
+        )
+        return await _handle_rtvi_request(request)


 def _setup_telephony_routes(app: FastAPI, *, transport_type: str, proxy: str):
@@ -689,6 +576,8 @@ def _setup_telephony_routes(app: FastAPI, *, transport_type: str, proxy: str):
 async def _run_daily_direct():
    """Run Daily bot with direct connection (no FastAPI server)."""
    try:
+        import aiohttp
+
        from pipecat.runner.daily import configure
    except ImportError as e:
        logger.error("Daily transport dependencies not installed.")
@@ -800,12 +689,6 @@ def main():
    parser.add_argument(
        "--verbose", "-v", action="count", default=0, help="Increase logging verbosity"
    )
-    parser.add_argument(
-        "--whatsapp",
-        action="store_true",
-        default=False,
-        help="Ensure requried WhatsApp environment variables are present",
-    )

    args = parser.parse_args()

@@ -825,6 +708,10 @@ def main():
        logger.error("For ESP32, you need to specify `--host IP` so we can do SDP munging.")
        return

+    if args.transport in TELEPHONY_TRANSPORTS and not args.proxy:
+        logger.error("For telephony transports, you need to specify `--proxy PROXY`.")
+        return
+
    # Log level
    logger.remove()
    logger.add(sys.stderr, level="TRACE" if args.verbose else "DEBUG")
@@ -844,11 +731,10 @@ def main():
        print()
        if args.esp32:
            print(f"🚀 Bot ready! (ESP32 mode)")
-        elif args.whatsapp:
-            print(f"🚀 Bot ready! (WhatsApp)")
+            print(f"   → Open http://{args.host}:{args.port}/client in your browser")
        else:
            print(f"🚀 Bot ready!")
-        print(f"   → Open http://{args.host}:{args.port}/client in your browser")
+            print(f"   → Open http://{args.host}:{args.port}/client in your browser")
        print()
    elif args.transport == "daily":
        print()
@@ -866,7 +752,6 @@ def main():
        host=args.host,
        proxy=args.proxy,
        esp32_mode=args.esp32,
-        whatsapp_enabled=args.whatsapp,
        folder=args.folder,
    )

--- a/src/pipecat/runner/types.py
+++ b/src/pipecat/runner/types.py
@@ -20,11 +20,9 @@ from fastapi import WebSocket
 class RunnerArguments:
    """Base class for runner session arguments."""

-    # Use kw_only so subclasses don't need to worry about ordering.
-    handle_sigint: bool = field(init=False, kw_only=True)
-    handle_sigterm: bool = field(init=False, kw_only=True)
-    pipeline_idle_timeout_secs: int = field(init=False, kw_only=True)
-    body: Optional[Any] = field(default_factory=dict, kw_only=True)
+    handle_sigint: bool = field(init=False)
+    handle_sigterm: bool = field(init=False)
+    pipeline_idle_timeout_secs: int = field(init=False)

    def __post_init__(self):
        self.handle_sigint = False
@@ -44,6 +42,7 @@ class DailyRunnerArguments(RunnerArguments):

    room_url: str
    token: Optional[str] = None
+    body: Optional[Any] = field(default_factory=dict)


@dataclass
@@ -56,6 +55,7 @@ class WebSocketRunnerArguments(RunnerArguments):
    """

    websocket: WebSocket
+    body: Optional[Any] = field(default_factory=dict)


@dataclass
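
Note on the types.py change above: once `kw_only=True` is dropped, a defaulted `body` field can no longer live on the base class, because a subclass that adds a required positional field such as `room_url` would then place a non-default argument after a default one. This is likely why `body` moves into each subclass. The sketch below illustrates the dataclass rule with hypothetical class names (`BaseArgs`, `RoomArgs`); it is not the Pipecat code.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BaseArgs:
    # init=False fields are skipped by the generated __init__, so they do not
    # take part in the "defaults must come last" ordering check.
    handle_sigint: bool = field(init=False)

    def __post_init__(self):
        self.handle_sigint = False


@dataclass
class RoomArgs(BaseArgs):
    room_url: str                                      # required, comes first
    token: Optional[str] = None                        # defaulted
    body: Optional[Any] = field(default_factory=dict)  # defaulted, per instance


def base_default_breaks_subclass() -> bool:
    """Return True if a defaulted base field rejects a required subclass field."""
    try:
        @dataclass
        class Base:
            body: Optional[Any] = field(default_factory=dict)

        @dataclass
        class Child(Base):
            room_url: str  # non-default after default: rejected at class creation
    except TypeError:
        return True
    return False
```

The keyword-only form that this commit removes avoids the ordering problem but requires Python 3.10 or later.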
--- a/src/pipecat/services/assemblyai/models.py
+++ b/src/pipecat/services/assemblyai/models.py
@@ -108,8 +108,6 @@ class AssemblyAIConnectionParams(BaseModel):
        end_of_turn_confidence_threshold: Confidence threshold for end-of-turn detection.
        min_end_of_turn_silence_when_confident: Minimum silence duration when confident about end-of-turn.
        max_turn_silence: Maximum silence duration before forcing end-of-turn.
-        keyterms_prompt: List of key terms to guide transcription. Will be JSON serialized before sending.
-        speech_model: Select between English and multilingual models. Defaults to "universal-streaming-english".
    """

    sample_rate: int = 16000
@@ -119,7 +117,3 @@ class AssemblyAIConnectionParams(BaseModel):
    end_of_turn_confidence_threshold: Optional[float] = None
    min_end_of_turn_silence_when_confident: Optional[int] = None
    max_turn_silence: Optional[int] = None
-    keyterms_prompt: Optional[List[str]] = None
-    speech_model: Literal["universal-streaming-english", "universal-streaming-multilingual"] = (
-        "universal-streaming-english"
-    )
--- a/src/pipecat/services/assemblyai/stt.py
+++ b/src/pipecat/services/assemblyai/stt.py
@@ -174,16 +174,11 @@ class AssemblyAISTTService(STTService):

    def _build_ws_url(self) -> str:
        """Build WebSocket URL with query parameters using urllib.parse.urlencode."""
-        params = {}
-        for k, v in self._connection_params.model_dump().items():
-            if v is not None:
-                if k == "keyterms_prompt":
-                    params[k] = json.dumps(v)
-                elif isinstance(v, bool):
-                    params[k] = str(v).lower()
-                else:
-                    params[k] = v
-
+        params = {
+            k: str(v).lower() if isinstance(v, bool) else v
+            for k, v in self._connection_params.model_dump().items()
+            if v is not None
+        }
        if params:
            query_string = urlencode(params)
            return f"{self._api_endpoint_base_url}?{query_string}"
@@ -202,8 +197,6 @@ class AssemblyAISTTService(STTService):
            )
            self._connected = True
            self._receive_task = self.create_task(self._receive_task_handler())
-
-            await self._call_event_handler("on_connected")
        except Exception as e:
            logger.error(f"Failed to connect to AssemblyAI: {e}")
            self._connected = False
@@ -245,7 +238,6 @@ class AssemblyAISTTService(STTService):
            self._websocket = None
            self._connected = False
            self._receive_task = None
-            await self._call_event_handler("on_disconnected")

    async def _receive_task_handler(self):
        """Handle incoming WebSocket messages."""
--- a/src/pipecat/services/asyncai/tts.py
+++ b/src/pipecat/services/asyncai/tts.py
@@ -235,8 +235,6 @@ class AsyncAITTSService(InterruptibleTTSService):
            }

            await self._get_websocket().send(json.dumps(init_msg))
-
-            await self._call_event_handler("on_connected")
        except Exception as e:
            logger.error(f"{self} initialization error: {e}")
            self._websocket = None
@@ -254,7 +252,6 @@ class AsyncAITTSService(InterruptibleTTSService):
        finally:
            self._websocket = None
            self._started = False
-            await self._call_event_handler("on_disconnected")

    def _get_websocket(self):
        if self._websocket:
--- a/src/pipecat/services/aws/llm.py
+++ b/src/pipecat/services/aws/llm.py
@@ -720,11 +720,11 @@ class AWSBedrockLLMService(LLMService):
            additional_model_request_fields: Additional model-specific parameters.
        """

-        max_tokens: Optional[int] = Field(default=None, ge=1)
-        temperature: Optional[float] = Field(default=None, ge=0.0, le=1.0)
-        top_p: Optional[float] = Field(default=None, ge=0.0, le=1.0)
+        max_tokens: Optional[int] = Field(default_factory=lambda: 4096, ge=1)
+        temperature: Optional[float] = Field(default_factory=lambda: 0.7, ge=0.0, le=1.0)
+        top_p: Optional[float] = Field(default_factory=lambda: 0.999, ge=0.0, le=1.0)
        stop_sequences: Optional[List[str]] = Field(default_factory=lambda: [])
-        latency: Optional[str] = Field(default=None)
+        latency: Optional[str] = Field(default_factory=lambda: "standard")
        additional_model_request_fields: Optional[Dict[str, Any]] = Field(default_factory=dict)

    def __init__(
@@ -801,24 +801,6 @@ class AWSBedrockLLMService(LLMService):
        """
        return True

-    def _build_inference_config(self) -> Dict[str, Any]:
-        """Build inference config with only the parameters that are set.
-
-        This prevents conflicts with models (e.g., Claude Sonnet 4.5) that don't
-        allow certain parameter combinations like temperature and top_p together.
-
-        Returns:
-            Dictionary containing only the inference parameters that are not None.
-        """
-        inference_config = {}
-        if self._settings["max_tokens"] is not None:
-            inference_config["maxTokens"] = self._settings["max_tokens"]
-        if self._settings["temperature"] is not None:
-            inference_config["temperature"] = self._settings["temperature"]
-        if self._settings["top_p"] is not None:
-            inference_config["topP"] = self._settings["top_p"]
-        return inference_config
-
    async def run_inference(self, context: LLMContext | OpenAILLMContext) -> Optional[str]:
        """Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

@@ -844,16 +826,16 @@ class AWSBedrockLLMService(LLMService):
        model_id = self.model_name

        # Prepare request parameters
-        inference_config = self._build_inference_config()
-
        request_params = {
            "modelId": model_id,
            "messages": messages,
+            "inferenceConfig": {
+                "maxTokens": 8192,
+                "temperature": 0.7,
+                "topP": 0.9,
+            },
        }

-        if inference_config:
-            request_params["inferenceConfig"] = inference_config
-
        if system:
            request_params["system"] = system

@@ -992,20 +974,21 @@ class AWSBedrockLLMService(LLMService):
            tools = params_from_context["tools"]
            tool_choice = params_from_context["tool_choice"]

-            # Set up inference config - only include parameters that are set
-            inference_config = self._build_inference_config()
+            # Set up inference config
+            inference_config = {
+                "maxTokens": self._settings["max_tokens"],
+                "temperature": self._settings["temperature"],
+                "topP": self._settings["top_p"],
+            }

            # Prepare request parameters
            request_params = {
                "modelId": self.model_name,
                "messages": messages,
+                "inferenceConfig": inference_config,
                "additionalModelRequestFields": self._settings["additional_model_request_fields"],
            }

-            # Only add inference config if it has parameters
-            if inference_config:
-                request_params["inferenceConfig"] = inference_config
-
            # Add system message
            if system:
                request_params["system"] = system
--- a/src/pipecat/services/aws/nova_sonic/context.py
+++ b/src/pipecat/services/aws/nova_sonic/context.py
@@ -8,77 +8,8 @@

 This module provides specialized context aggregators and message handling for AWS Nova Sonic,
 including conversation history management and role-specific message processing.
-
-.. deprecated:: 0.0.91
-    AWS Nova Sonic no longer uses types from this module under the hood.
-    It now uses `LLMContext` and `LLMContextAggregatorPair`.
-    Using the new patterns should allow you to not need types from this module.
-
-    BEFORE:
-    ```
-    # Setup
-    context = OpenAILLMContext(messages, tools)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    # Context frame type
-    frame: OpenAILLMContextFrame
-
-    # Context type
-    context: AWSNovaSonicLLMContext
-    # or
-    context: OpenAILLMContext
-    ```
-
-    AFTER:
-    ```
-    # Setup
-    context = LLMContext(messages, tools)
-    context_aggregator = LLMContextAggregatorPair(context)
-
-    # Context frame type
-    frame: LLMContextFrame
-
-    # Context type
-    context: LLMContext
-    ```
 """

-import warnings
-
-with warnings.catch_warnings():
-    warnings.simplefilter("always")
-    warnings.warn(
-        "Types in pipecat.services.aws.nova_sonic.context (or "
-        "pipecat.services.aws_nova_sonic.context) are deprecated. \n"
-        "AWS Nova Sonic no longer uses types from this module under the hood. \n"
-        "It now uses `LLMContext` and `LLMContextAggregatorPair`. \n"
-        "Using the new patterns should allow you to not need types from this module.\n\n"
-        "BEFORE:\n"
-        "```\n"
-        "# Setup\n"
-        "context = OpenAILLMContext(messages, tools)\n"
-        "context_aggregator = llm.create_context_aggregator(context)\n\n"
-        "# Context frame type\n"
-        "frame: OpenAILLMContextFrame\n\n"
-        "# Context type\n"
-        "context: AWSNovaSonicLLMContext\n"
-        "# or\n"
-        "context: OpenAILLMContext\n\n"
-        "```\n\n"
-        "AFTER:\n"
-        "```\n"
-        "# Setup\n"
-        "context = LLMContext(messages, tools)\n"
-        "context_aggregator = LLMContextAggregatorPair(context)\n\n"
-        "# Context frame type\n"
-        "frame: LLMContextFrame\n\n"
-        "# Context type\n"
-        "context: LLMContext\n\n"
-        "```",
-        DeprecationWarning,
-        stacklevel=2,
-    )
-
 import copy
 from dataclasses import dataclass, field
 from enum import Enum
--- a/src/pipecat/services/aws/nova_sonic/llm.py
+++ b/src/pipecat/services/aws/nova_sonic/llm.py
@@ -25,7 +25,7 @@ from loguru import logger
 from pydantic import BaseModel, Field

 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.adapters.services.aws_nova_sonic_adapter import AWSNovaSonicLLMAdapter, Role
+from pipecat.adapters.services.aws_nova_sonic_adapter import AWSNovaSonicLLMAdapter
 from pipecat.frames.frames import (
    BotStoppedSpeakingFrame,
    CancelFrame,
@@ -33,30 +33,35 @@ from pipecat.frames.frames import (
    Frame,
    FunctionCallFromLLM,
    InputAudioRawFrame,
-    InterruptionFrame,
+    InterimTranscriptionFrame,
    LLMContextFrame,
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
+    LLMTextFrame,
    StartFrame,
    TranscriptionFrame,
    TTSAudioRawFrame,
    TTSStartedFrame,
    TTSStoppedFrame,
    TTSTextFrame,
-    UserStartedSpeakingFrame,
-    UserStoppedSpeakingFrame,
 )
-from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response import (
    LLMAssistantAggregatorParams,
    LLMUserAggregatorParams,
 )
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.processors.aggregators.openai_llm_context import (
    OpenAILLMContext,
    OpenAILLMContextFrame,
 )
 from pipecat.processors.frame_processor import FrameDirection
+from pipecat.services.aws.nova_sonic.context import (
+    AWSNovaSonicAssistantContextAggregator,
+    AWSNovaSonicContextAggregatorPair,
+    AWSNovaSonicLLMContext,
+    AWSNovaSonicUserContextAggregator,
+    Role,
+)
+from pipecat.services.aws.nova_sonic.frames import AWSNovaSonicFunctionCallResultFrame
 from pipecat.services.llm_service import LLMService
 from pipecat.utils.time import time_now_iso8601

@@ -212,11 +217,6 @@ class AWSNovaSonicLLMService(LLMService):
            system_instruction: System-level instruction for the model.
            tools: Available tools/functions for the model to use.
            send_transcription_frames: Whether to emit transcription frames.
-
-                .. deprecated:: 0.0.91
-                    This parameter is deprecated and will be removed in a future version.
-                    Transcription frames are always sent.
-
            **kwargs: Additional arguments passed to the parent LLMService.
        """
        super().__init__(**kwargs)
@@ -230,20 +230,8 @@ class AWSNovaSonicLLMService(LLMService):
        self._params = params or Params()
        self._system_instruction = system_instruction
        self._tools = tools
-
-        if not send_transcription_frames:
-            import warnings
-
-            with warnings.catch_warnings():
-                warnings.simplefilter("always")
-                warnings.warn(
-                    "`send_transcription_frames` is deprecated and will be removed in a future version. "
-                    "Transcription frames are always sent.",
-                    DeprecationWarning,
-                    stacklevel=2,
-                )
-
-        self._context: Optional[LLMContext] = None
+        self._send_transcription_frames = send_transcription_frames
+        self._context: Optional[AWSNovaSonicLLMContext] = None
        self._stream: Optional[
            DuplexEventStream[
                InvokeModelWithBidirectionalStreamInput,
@@ -256,17 +244,12 @@ class AWSNovaSonicLLMService(LLMService):
        self._input_audio_content_name: Optional[str] = None
        self._content_being_received: Optional[CurrentContent] = None
        self._assistant_is_responding = False
-        self._may_need_repush_assistant_text = False
        self._ready_to_send_context = False
        self._handling_bot_stopped_speaking = False
        self._triggering_assistant_response = False
-        self._waiting_for_trigger_transcription = False
        self._disconnecting = False
        self._connected_time: Optional[float] = None
        self._wants_connection = False
-        self._user_text_buffer = ""
-        self._assistant_text_buffer = ""
-        self._completed_tool_calls = set()

        file_path = files("pipecat.services.aws.nova_sonic").joinpath("ready.wav")
        with wave.open(file_path.open("rb"), "rb") as wav_file:
@@ -319,12 +302,12 @@ class AWSNovaSonicLLMService(LLMService):
        logger.debug("Resetting conversation")
        await self._handle_bot_stopped_speaking(delay_to_catch_trailing_assistant_text=False)

-        # Grab context to carry through disconnect/reconnect
+        # Carry over previous context through disconnect
        context = self._context
-
        await self._disconnect()
+        self._context = context
+
        await self._start_connecting()
-        await self._handle_context(context)

    #
    # frame processing
@@ -339,35 +322,28 @@ class AWSNovaSonicLLMService(LLMService):
        """
        await super().process_frame(frame, direction)

-        if isinstance(frame, (LLMContextFrame, OpenAILLMContextFrame)):
-            context = (
-                frame.context
-                if isinstance(frame, LLMContextFrame)
-                else LLMContext.from_openai_context(frame.context)
+        if isinstance(frame, OpenAILLMContextFrame):
+            await self._handle_context(frame.context)
+        elif isinstance(frame, LLMContextFrame):
+            raise NotImplementedError(
+                "Universal LLMContext is not yet supported for AWS Nova Sonic."
            )
-            await self._handle_context(context)
        elif isinstance(frame, InputAudioRawFrame):
            await self._handle_input_audio_frame(frame)
        elif isinstance(frame, BotStoppedSpeakingFrame):
            await self._handle_bot_stopped_speaking(delay_to_catch_trailing_assistant_text=True)
-        elif isinstance(frame, InterruptionFrame):
-            await self._handle_interruption_frame()
+        elif isinstance(frame, AWSNovaSonicFunctionCallResultFrame):
+            await self._handle_function_call_result(frame)

        await self.push_frame(frame, direction)

-    async def _handle_context(self, context: LLMContext):
-        if self._disconnecting:
-            return
-
+    async def _handle_context(self, context: OpenAILLMContext):
        if not self._context:
-            # We got our initial context
-            # Try to finish connecting
-            self._context = context
+            # We got our initial context - try to finish connecting
+            self._context = AWSNovaSonicLLMContext.upgrade_to_nova_sonic(
+                context, self._system_instruction
+            )
            await self._finish_connecting_if_context_available()
-        else:
-            # We got an updated context
-            # Send results for any newly-completed function calls
-            await self._process_completed_function_calls(send_new_results=True)

    async def _handle_input_audio_frame(self, frame: InputAudioRawFrame):
        # Wait until we're done sending the assistant response trigger audio before sending audio
@@ -417,9 +393,9 @@ class AWSNovaSonicLLMService(LLMService):
        else:
            await finalize_assistant_response()

-    async def _handle_interruption_frame(self):
-        if self._assistant_is_responding:
-            self._may_need_repush_assistant_text = True
+    async def _handle_function_call_result(self, frame: AWSNovaSonicFunctionCallResultFrame):
+        result = frame.result_frame
+        await self._send_tool_result(tool_call_id=result.tool_call_id, result=result.result)

    #
    # LLM communication: lifecycle
@@ -455,17 +431,6 @@ class AWSNovaSonicLLMService(LLMService):
            logger.error(f"{self} initialization error: {e}")
            await self._disconnect()

-    async def _process_completed_function_calls(self, send_new_results: bool):
-        # Check for set of completed function calls in the context
-        for message in self._context.get_messages():
-            if message.get("role") and message.get("content") != "IN_PROGRESS":
-                tool_call_id = message.get("tool_call_id")
-                if tool_call_id and tool_call_id not in self._completed_tool_calls:
-                    # Found a newly-completed function call - send the result to the service
-                    if send_new_results:
-                        await self._send_tool_result(tool_call_id, message.get("content"))
-                    self._completed_tool_calls.add(tool_call_id)
-
    async def _finish_connecting_if_context_available(self):
        # We can only finish connecting once we've gotten our initial context and we're ready to
        # send it
@@ -474,38 +439,30 @@ class AWSNovaSonicLLMService(LLMService):

        logger.info("Finishing connecting (setting up session)...")

-        # Initialize our bookkeeping of already-completed tool calls in the
-        # context
-        await self._process_completed_function_calls(send_new_results=False)
-
        # Read context
-        adapter: AWSNovaSonicLLMAdapter = self.get_llm_adapter()
-        llm_connection_params = adapter.get_llm_invocation_params(self._context)
+        history = self._context.get_messages_for_initializing_history()

        # Send prompt start event, specifying tools.
        # Tools from context take priority over self._tools.
        tools = (
-            llm_connection_params["tools"]
-            if llm_connection_params["tools"]
-            else adapter.from_standard_tools(self._tools)
+            self._context.tools
+            if self._context.tools
+            else self.get_llm_adapter().from_standard_tools(self._tools)
        )
        logger.debug(f"Using tools: {tools}")
        await self._send_prompt_start_event(tools)

        # Send system instruction.
        # Instruction from context takes priority over self._system_instruction.
-        system_instruction = (
-            llm_connection_params["system_instruction"]
-            if llm_connection_params["system_instruction"]
-            else self._system_instruction
-        )
-        logger.debug(f"Using system instruction: {system_instruction}")
-        if system_instruction:
-            await self._send_text_event(text=system_instruction, role=Role.SYSTEM)
+        # (NOTE: this prioritizing occurred automatically behind the scenes: the context was
+        # initialized with self._system_instruction and then updated itself from its messages when
+        # get_messages_for_initializing_history() was called).
+        logger.debug(f"Using system instruction: {history.system_instruction}")
+        if history.system_instruction:
+            await self._send_text_event(text=history.system_instruction, role=Role.SYSTEM)

        # Send conversation history
-        for message in llm_connection_params["messages"]:
-            # logger.debug(f"Seeding conversation history with message: {message}")
+        for message in history.messages:
            await self._send_text_event(text=message.text, role=message.role)

        # Start audio input
@@ -535,12 +492,9 @@ class AWSNovaSonicLLMService(LLMService):
                await self._send_session_end_events()
                self._client = None

-            # Clean up context
-            self._context = None
-
            # Clean up stream
            if self._stream:
-                await self._stream.close()
+                await self._stream.input_stream.close()
                self._stream = None

            # NOTE: see explanation of HACK, below
@@ -556,23 +510,15 @@ class AWSNovaSonicLLMService(LLMService):
                self._receive_task = None

            # Reset remaining connection-specific state
-            # Should be all private state except:
-            # - _wants_connection
-            # - _assistant_response_trigger_audio
            self._prompt_name = None
            self._input_audio_content_name = None
            self._content_being_received = None
            self._assistant_is_responding = False
-            self._may_need_repush_assistant_text = False
            self._ready_to_send_context = False
            self._handling_bot_stopped_speaking = False
            self._triggering_assistant_response = False
-            self._waiting_for_trigger_transcription = False
            self._disconnecting = False
            self._connected_time = None
-            self._user_text_buffer = ""
-            self._assistant_text_buffer = ""
-            self._completed_tool_calls = set()

            logger.info("Finished disconnecting")
        except Exception as e:
@@ -880,10 +826,6 @@ class AWSNovaSonicLLMService(LLMService):
                            # Handle the LLM completion ending
                            await self._handle_completion_end_event(event_json)
        except Exception as e:
-            if self._disconnecting:
-                # Errors are kind of expected while disconnecting, so just
-                # ignore them and do nothing
-                return
            logger.error(f"{self} error processing responses: {e}")
            if self._wants_connection:
                await self.reset_conversation()
@@ -1014,7 +956,7 @@ class AWSNovaSonicLLMService(LLMService):
    async def _report_assistant_response_started(self):
        logger.debug("Assistant response started")

-        # Report the start of the assistant response.
+        # Report that the assistant has started their response.
        await self.push_frame(LLMFullResponseStartFrame())

        # Report that equivalent of TTS (this is a speech-to-speech model) started
@@ -1026,16 +968,23 @@ class AWSNovaSonicLLMService(LLMService):

        logger.debug(f"Assistant response text added: {text}")

-        # Report the text of the assistant response.
+        # Report some text added to the ongoing assistant response
+        await self.push_frame(LLMTextFrame(text))
+
+        # Report some text added to the *equivalent* of TTS (this is a speech-to-speech model)
        await self.push_frame(TTSTextFrame(text))

-        # HACK: here we're also buffering the assistant text ourselves as a
-        # backup rather than relying solely on the assistant context aggregator
-        # to do it, because the text arrives from Nova Sonic only after all the
-        # assistant audio frames have been pushed, meaning that if an
-        # interruption frame were to arrive we would lose all of it (the text
-        # frames sitting in the queue would be wiped).
-        self._assistant_text_buffer += text
+        # TODO: this is a (hopefully temporary) HACK. Here we directly manipulate the context rather
+        # than relying on the frames pushed to the assistant context aggregator. The pattern of
+        # receiving full-sentence text after the assistant has spoken does not easily fit with the
+        # Pipecat expectation of chunks of text streaming in while the assistant is speaking.
+        # Interruption handling was especially challenging. Rather than spend days trying to fit a
+        # square peg in a round hole, I decided on this hack for the time being. We can most cleanly
+        # abandon this hack if/when AWS Nova Sonic implements streaming smaller text chunks
+        # interspersed with audio. Note that when we move away from this hack, we need to make sure
+        # that on an interruption we avoid sending LLMFullResponseEndFrame, which gets the
+        # LLMAssistantContextAggregator into a bad state.
+        self._context.buffer_assistant_text(text)

    async def _report_assistant_response_ended(self):
        if not self._context:  # should never happen
@@ -1043,34 +992,14 @@ class AWSNovaSonicLLMService(LLMService):

        logger.debug("Assistant response ended")

-        # If an interruption frame arrived while the assistant was responding
-        # we may have lost all of the assistant text (see HACK, above), so
-        # re-push it downstream to the aggregator now.
-        if self._may_need_repush_assistant_text:
-            # Just in case, check that assistant text hasn't already made it
-            # into the context (sometimes it does, despite the interruption).
-            messages = self._context.get_messages()
-            last_message = messages[-1] if messages else None
-            if (
-                not last_message
-                or last_message.get("role") != "assistant"
-                or last_message.get("content") != self._assistant_text_buffer
-            ):
-                # We also need to re-push the LLMFullResponseStartFrame since the
-                # TTSTextFrame would be ignored otherwise (the interruption frame
-                # would have cleared the assistant aggregator state).
-                await self.push_frame(LLMFullResponseStartFrame())
-                await self.push_frame(TTSTextFrame(self._assistant_text_buffer))
-            self._may_need_repush_assistant_text = False
-
-        # Report the end of the assistant response.
+        # Report that the assistant has finished their response.
        await self.push_frame(LLMFullResponseEndFrame())

        # Report that equivalent of TTS (this is a speech-to-speech model) stopped.
        await self.push_frame(TTSStoppedFrame())

-        # Clear out the buffered assistant text
-        self._assistant_text_buffer = ""
+        # For an explanation of this hack, see _report_assistant_response_text_added.
+        self._context.flush_aggregated_assistant_text()

    #
    # user transcription reporting
@@ -1087,67 +1016,33 @@ class AWSNovaSonicLLMService(LLMService):

        logger.debug(f"User transcription text added: {text}")

-        # HACK: here we're buffering the user text ourselves rather than
-        # relying on the upstream user context aggregator to do it, because the
-        # text arrives in fairly large chunks spaced fairly far apart in time.
-        # That means the user text would be split between different messages in
-        # context. Even if we sent placeholder InterimTranscriptionFrames in
-        # between each TranscriptionFrame to tell the aggregator to hold off on
-        # finalizing the user message, the aggregator would likely get the last
-        # chunk too late.
-        self._user_text_buffer += f" {text}" if self._user_text_buffer else text
+        # Manually add new user transcription text to context.
+        # We can't rely on the user context aggregator to do this since it's upstream from the LLM.
+        self._context.buffer_user_text(text)
+
+        # Report that some new user transcription text is available.
+        if self._send_transcription_frames:
+            await self.push_frame(
+                InterimTranscriptionFrame(text=text, user_id="", timestamp=time_now_iso8601())
+            )

    async def _report_user_transcription_ended(self):
        if not self._context:  # should never happen
            return

+        # Manually add user transcription to context (if any has been buffered).
+        # We can't rely on the user context aggregator to do this since it's upstream from the LLM.
+        transcription = self._context.flush_aggregated_user_text()
+
+        if not transcription:
+            return
+
        logger.debug(f"User transcription ended")

-        # Report to the upstream user context aggregator that some new user
-        # transcription text is available.
-
-        # HACK: Check if this transcription was triggered by our own
-        # assistant response trigger. If so, we need to wrap it with
-        # UserStarted/StoppedSpeakingFrames; otherwise the user aggregator
-        # would fire an EmulatedUserStartedSpeakingFrame, which would
-        # trigger an interruption, which would prevent us from writing the
-        # assistant response to context.
-        #
-        # Sending an EmulateUserStartedSpeakingFrame ourselves doesn't
-        # work: it just causes the interruption we're trying to avoid.
-        #
-        # Setting enable_emulated_vad_interruptions also doesn't work: at
-        # the time the user aggregator receives the TranscriptionFrame, it
-        # doesn't yet know the assistant has started responding, so it
-        # doesn't know that emulating the user starting to speak would
-        # cause an interruption.
-        should_wrap_in_user_started_stopped_speaking_frames = (
-            self._waiting_for_trigger_transcription
-            and self._user_text_buffer.strip().lower() == "ready"
-        )
-
-        # Start wrapping the upstream transcription in UserStarted/StoppedSpeakingFrames if needed
-        if should_wrap_in_user_started_stopped_speaking_frames:
-            logger.debug(
-                "Wrapping assistant response trigger transcription with upstream UserStarted/StoppedSpeakingFrames"
+        if self._send_transcription_frames:
+            await self.push_frame(
+                TranscriptionFrame(text=transcription, user_id="", timestamp=time_now_iso8601())
            )
-            await self.push_frame(UserStartedSpeakingFrame(), direction=FrameDirection.UPSTREAM)
-
-        # Send the transcription upstream for the user context aggregator
-        frame = TranscriptionFrame(
-            text=self._user_text_buffer, user_id="", timestamp=time_now_iso8601()
-        )
-        await self.push_frame(frame, direction=FrameDirection.UPSTREAM)
-
-        # Finish wrapping the upstream transcription in UserStarted/StoppedSpeakingFrames if needed
-        if should_wrap_in_user_started_stopped_speaking_frames:
-            await self.push_frame(UserStoppedSpeakingFrame(), direction=FrameDirection.UPSTREAM)
-
-        # Clear out the buffered user text
-        self._user_text_buffer = ""
-
-        # We're no longer waiting for a trigger transcription
-        self._waiting_for_trigger_transcription = False

    #
    # context
@@ -1159,26 +1054,23 @@ class AWSNovaSonicLLMService(LLMService):
        *,
        user_params: LLMUserAggregatorParams = LLMUserAggregatorParams(),
        assistant_params: LLMAssistantAggregatorParams = LLMAssistantAggregatorParams(),
-    ) -> LLMContextAggregatorPair:
+    ) -> AWSNovaSonicContextAggregatorPair:
        """Create context aggregator pair for managing conversation context.

-        NOTE: this method exists only for backward compatibility. New code
-        should instead do:
-            context = LLMContext(...)
-            context_aggregator = LLMContextAggregatorPair(context)
-
        Args:
-            context: The OpenAI LLM context.
+            context: The OpenAI LLM context to upgrade.
            user_params: Parameters for the user context aggregator.
            assistant_params: Parameters for the assistant context aggregator.

        Returns:
            A pair of user and assistant context aggregators.
        """
-        context = LLMContext.from_openai_context(context)
-        return LLMContextAggregatorPair(
-            context, user_params=user_params, assistant_params=assistant_params
-        )
+        context.set_llm_adapter(self.get_llm_adapter())
+
+        user = AWSNovaSonicUserContextAggregator(context=context, params=user_params)
+        assistant = AWSNovaSonicAssistantContextAggregator(context=context, params=assistant_params)
+
+        return AWSNovaSonicContextAggregatorPair(user, assistant)

    #
    # assistant response trigger (HACK)
@@ -1216,8 +1108,6 @@ class AWSNovaSonicLLMService(LLMService):
        try:
            logger.debug("Sending assistant response trigger...")

-            self._waiting_for_trigger_transcription = True
-
            chunk_duration = 0.02  # what we might get from InputAudioRawFrame
            chunk_size = int(
                chunk_duration
--- a/src/pipecat/services/aws/stt.py
+++ b/src/pipecat/services/aws/stt.py
@@ -286,7 +286,6 @@ class AWSTranscribeSTTService(STTService):

                logger.info(f"{self} Successfully connected to AWS Transcribe")

-                await self._call_event_handler("on_connected")
            except Exception as e:
                logger.error(f"{self} Failed to connect to AWS Transcribe: {e}")
                await self._disconnect()
@@ -311,7 +310,6 @@ class AWSTranscribeSTTService(STTService):
            logger.warning(f"{self} Error closing WebSocket connection: {e}")
        finally:
            self._ws_client = None
-            await self._call_event_handler("on_disconnected")

    def language_to_service_language(self, language: Language) -> str | None:
        """Convert internal language enum to AWS Transcribe language code.
--- a/src/pipecat/services/aws_nova_sonic/context.py
+++ b/src/pipecat/services/aws_nova_sonic/context.py
@@ -8,14 +8,18 @@

 This module provides specialized context aggregators and message handling for AWS Nova Sonic,
 including conversation history management and role-specific message processing.
-
-.. deprecated:: 0.0.91
-    AWS Nova Sonic no longer uses types from this module under the hood.
-    It now uses `LLMContext` and `LLMContextAggregatorPair`.
-    Using the new patterns should allow you to not need types from this module.
-
-    See deprecation warning in pipecat.services.aws.nova_sonic.context for more
-    details.
 """

+import warnings
+
 from pipecat.services.aws.nova_sonic.context import *
+
+with warnings.catch_warnings():
+    warnings.simplefilter("always")
+    warnings.warn(
+        "Types in pipecat.services.aws_nova_sonic.context are deprecated. "
+        "Please use the equivalent types from "
+        "pipecat.services.aws.nova_sonic.context instead.",
+        DeprecationWarning,
+        stacklevel=2,
+    )
--- a/src/pipecat/services/azure/realtime/llm.py
+++ b/src/pipecat/services/azure/realtime/llm.py
@@ -38,7 +38,7 @@ class AzureRealtimeLLMService(OpenAIRealtimeLLMService):
        Args:
            api_key: The API key for the Azure OpenAI service.
            base_url: The full Azure WebSocket endpoint URL including api-version and deployment.
-                Example: "wss://my-project.openai.azure.com/openai/realtime?api-version=2025-04-01-preview&deployment=my-realtime-deployment"
+                Example: "wss://my-project.openai.azure.com/openai/realtime?api-version=2024-10-01-preview&deployment=my-realtime-deployment"
            **kwargs: Additional arguments passed to parent OpenAIRealtimeLLMService.
        """
        super().__init__(base_url=base_url, api_key=api_key, **kwargs)
@@ -52,7 +52,7 @@ class AzureRealtimeLLMService(OpenAIRealtimeLLMService):
                # handle disconnections in the send/recv code paths.
                return

-            logger.info(f"Connecting to {self.base_url}")
+            logger.info(f"Connecting to {self.base_url}")  # never log the API key
            self._websocket = await websocket_connect(
                uri=self.base_url,
                additional_headers={
--- a/src/pipecat/services/cartesia/stt.py
+++ b/src/pipecat/services/cartesia/stt.py
@@ -28,12 +28,13 @@ from pipecat.frames.frames import (
    UserStoppedSpeakingFrame,
 )
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.stt_service import WebsocketSTTService
+from pipecat.services.stt_service import STTService
 from pipecat.transcriptions.language import Language
 from pipecat.utils.time import time_now_iso8601
 from pipecat.utils.tracing.service_decorators import traced_stt

 try:
+    import websockets
    from websockets.asyncio.client import connect as websocket_connect
    from websockets.protocol import State
 except ModuleNotFoundError as e:
@@ -123,7 +124,7 @@ class CartesiaLiveOptions:
        return cls(**json.loads(json_str))


-class CartesiaSTTService(WebsocketSTTService):
+class CartesiaSTTService(STTService):
    """Speech-to-text service using Cartesia Live API.

    Provides real-time speech transcription through WebSocket connection
@@ -175,7 +176,8 @@ class CartesiaSTTService(WebsocketSTTService):
        self.set_model_name(merged_options.model)
        self._api_key = api_key
        self._base_url = base_url or "api.cartesia.ai"
-        self._receive_task = None
+        self._connection = None
+        self._receiver_task = None

    def can_generate_metrics(self) -> bool:
        """Check if the service can generate processing metrics.
@@ -212,27 +214,6 @@ class CartesiaSTTService(WebsocketSTTService):
        await super().cancel(frame)
        await self._disconnect()

-    async def start_metrics(self):
-        """Start performance metrics collection for transcription processing."""
-        await self.start_ttfb_metrics()
-        await self.start_processing_metrics()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        """Process incoming frames and handle speech events.
-
-        Args:
-            frame: The frame to process.
-            direction: Direction of frame flow in the pipeline.
-        """
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, UserStartedSpeakingFrame):
-            await self.start_metrics()
-        elif isinstance(frame, UserStoppedSpeakingFrame):
-            # Send finalize command to flush the transcription session
-            if self._websocket and self._websocket.state is State.OPEN:
-                await self._websocket.send("finalize")
-
    async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
        """Process audio data for speech-to-text transcription.

@@ -243,71 +224,45 @@ class CartesiaSTTService(WebsocketSTTService):
            None - transcription results are handled via WebSocket responses.
        """
        # If the connection is closed, due to timeout, we need to reconnect when the user starts speaking again
-        if not self._websocket or self._websocket.state is State.CLOSED:
+        if not self._connection or self._connection.state is State.CLOSED:
            await self._connect()

-        await self._websocket.send(audio)
+        await self._connection.send(audio)
        yield None

    async def _connect(self):
-        await self._connect_websocket()
+        params = self._settings.to_dict()
+        ws_url = f"wss://{self._base_url}/stt/websocket?{urllib.parse.urlencode(params)}"
+        logger.debug(f"Connecting to Cartesia: {ws_url}")
+        headers = {"Cartesia-Version": "2025-04-16", "X-API-Key": self._api_key}

-        if self._websocket and not self._receive_task:
-            self._receive_task = asyncio.create_task(self._receive_task_handler(self._report_error))
-
-    async def _disconnect(self):
-        if self._receive_task:
-            await self.cancel_task(self._receive_task)
-            self._receive_task = None
-
-        await self._disconnect_websocket()
-
-    async def _connect_websocket(self):
        try:
-            if self._websocket and self._websocket.state is State.OPEN:
-                return
-            logger.debug("Connecting to Cartesia STT")
-
-            params = self._settings.to_dict()
-            ws_url = f"wss://{self._base_url}/stt/websocket?{urllib.parse.urlencode(params)}"
-            headers = {"Cartesia-Version": "2025-04-16", "X-API-Key": self._api_key}
-
-            self._websocket = await websocket_connect(ws_url, additional_headers=headers)
-            await self._call_event_handler("on_connected")
+            self._connection = await websocket_connect(ws_url, additional_headers=headers)
+            # Set up the receiver task to handle incoming messages from the Cartesia server
+            if self._receiver_task is None or self._receiver_task.done():
+                self._receiver_task = asyncio.create_task(self._receive_messages())
+            logger.debug("Connected to Cartesia")
        except Exception as e:
            logger.error(f"{self}: unable to connect to Cartesia: {e}")

-    async def _disconnect_websocket(self):
-        try:
-            if self._websocket and self._websocket.state is State.OPEN:
-                logger.debug("Disconnecting from Cartesia STT")
-                await self._websocket.close()
-        except Exception as e:
-            logger.error(f"{self} error closing websocket: {e}")
-        finally:
-            self._websocket = None
-            await self._call_event_handler("on_disconnected")
-
-    def _get_websocket(self):
-        if self._websocket:
-            return self._websocket
-        raise Exception("Websocket not connected")
-
-    async def _process_messages(self):
-        async for message in self._get_websocket():
-            try:
-                data = json.loads(message)
-                await self._process_response(data)
-            except json.JSONDecodeError:
-                logger.warning(f"Received non-JSON message: {message}")
-
    async def _receive_messages(self):
-        while True:
-            await self._process_messages()
-            # Cartesia times out after 5 minutes of innactivity (no keepalive
-            # mechanism is available). So, we try to reconnect.
-            logger.debug(f"{self} Cartesia connection was disconnected (timeout?), reconnecting")
-            await self._connect_websocket()
+        try:
+            while True:
+                if not self._connection or self._connection.state is State.CLOSED:
+                    break
+
+                message = await self._connection.recv()
+                try:
+                    data = json.loads(message)
+                    await self._process_response(data)
+                except json.JSONDecodeError:
+                    logger.warning(f"Received non-JSON message: {message}")
+        except asyncio.CancelledError:
+            pass
+        except websockets.exceptions.ConnectionClosed as e:
+            logger.debug(f"WebSocket connection closed: {e}")
+        except Exception as e:
+            logger.error(f"Error in message receiver: {e}")

    async def _process_response(self, data):
        if "type" in data:
@@ -361,3 +316,41 @@ class CartesiaSTTService(WebsocketSTTService):
                        language,
                    )
                )
+
+    async def _disconnect(self):
+        if self._receiver_task:
+            self._receiver_task.cancel()
+            try:
+                await self._receiver_task
+            except asyncio.CancelledError:
+                pass
+            except Exception as e:
+                logger.exception(f"Unexpected exception while cancelling task: {e}")
+            self._receiver_task = None
+
+        if self._connection and self._connection.state is State.OPEN:
+            logger.debug("Disconnecting from Cartesia")
+            await self._connection.close()
+        # Always drop the reference, even if the socket was already closed
+        self._connection = None
+
+    async def start_metrics(self):
+        """Start performance metrics collection for transcription processing."""
+        await self.start_ttfb_metrics()
+        await self.start_processing_metrics()
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        """Process incoming frames and handle speech events.
+
+        Args:
+            frame: The frame to process.
+            direction: Direction of frame flow in the pipeline.
+        """
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserStartedSpeakingFrame):
+            await self.start_metrics()
+        elif isinstance(frame, UserStoppedSpeakingFrame):
+            # Send finalize command to flush the transcription session
+            if self._connection and self._connection.state is State.OPEN:
+                await self._connection.send("finalize")
--- a/src/pipecat/services/cartesia/tts.py
+++ b/src/pipecat/services/cartesia/tts.py
@@ -48,26 +48,6 @@ except ModuleNotFoundError as e:
    raise Exception(f"Missing module: {e}")


-class GenerationConfig(BaseModel):
-    """Configuration for Cartesia Sonic-3 generation parameters.
-
-    Sonic-3 interprets these parameters as guidance to ensure natural speech.
-    Test against your content for best results.
-
-    Parameters:
-        volume: Volume multiplier for generated speech. Valid range: [0.5, 2.0]. Default is 1.0.
-        speed: Speed multiplier for generated speech. Valid range: [0.6, 1.5]. Default is 1.0.
-        emotion: Single emotion string to guide the emotional tone. Examples include neutral,
-            angry, excited, content, sad, scared. Over 60 emotions are supported. For best
-            results, use with recommended voices: Leo, Jace, Kyle, Gavin, Maya, Tessa, Dana,
-            and Marian.
-    """
-
-    volume: Optional[float] = None
-    speed: Optional[float] = None
-    emotion: Optional[str] = None
-
-
 def language_to_cartesia_language(language: Language) -> Optional[str]:
    """Convert a Language enum to Cartesia language code.

@@ -121,20 +101,16 @@ class CartesiaTTSService(AudioContextWordTTSService):

        Parameters:
            language: Language to use for synthesis.
-            speed: Voice speed control for non-Sonic-3 models (literal values).
-            emotion: List of emotion controls for non-Sonic-3 models.
+            speed: Voice speed control.
+            emotion: List of emotion controls.

                .. deprecated:: 0.0.68
                        The `emotion` parameter is deprecated and will be removed in a future version.
-
-            generation_config: Generation configuration for Sonic-3 models. Includes volume,
-                speed (numeric), and emotion (string) parameters.
        """

        language: Optional[Language] = Language.EN
        speed: Optional[Literal["slow", "normal", "fast"]] = None
        emotion: Optional[List[str]] = []
-        generation_config: Optional[GenerationConfig] = None

    def __init__(
        self,
@@ -143,7 +119,7 @@ class CartesiaTTSService(AudioContextWordTTSService):
        voice_id: str,
        cartesia_version: str = "2025-04-16",
        url: str = "wss://api.cartesia.ai/tts/websocket",
-        model: str = "sonic-3",
+        model: str = "sonic-2",
        sample_rate: Optional[int] = None,
        encoding: str = "pcm_s16le",
        container: str = "raw",
@@ -159,7 +135,7 @@ class CartesiaTTSService(AudioContextWordTTSService):
            voice_id: ID of the voice to use for synthesis.
            cartesia_version: API version string for Cartesia service.
            url: WebSocket URL for Cartesia TTS API.
-            model: TTS model to use (e.g., "sonic-3").
+            model: TTS model to use (e.g., "sonic-2").
            sample_rate: Audio sample rate. If None, uses default.
            encoding: Audio encoding format.
            container: Audio container format.
@@ -203,7 +179,6 @@ class CartesiaTTSService(AudioContextWordTTSService):
            else "en",
            "speed": params.speed,
            "emotion": params.emotion,
-            "generation_config": params.generation_config,
        }
        self.set_model_name(model)
        self.set_voice(voice_id)
@@ -322,11 +297,6 @@ class CartesiaTTSService(AudioContextWordTTSService):
        if self._settings["speed"]:
            msg["speed"] = self._settings["speed"]

-        if self._settings["generation_config"]:
-            msg["generation_config"] = self._settings["generation_config"].model_dump(
-                exclude_none=True
-            )
-
        return json.dumps(msg)

    async def start(self, frame: StartFrame):
@@ -374,11 +344,10 @@ class CartesiaTTSService(AudioContextWordTTSService):
        try:
            if self._websocket and self._websocket.state is State.OPEN:
                return
-            logger.debug("Connecting to Cartesia TTS")
+            logger.debug("Connecting to Cartesia")
            self._websocket = await websocket_connect(
                f"{self._url}?api_key={self._api_key}&cartesia_version={self._cartesia_version}"
            )
-            await self._call_event_handler("on_connected")
        except Exception as e:
            logger.error(f"{self} initialization error: {e}")
            self._websocket = None
@@ -396,7 +365,6 @@ class CartesiaTTSService(AudioContextWordTTSService):
        finally:
            self._context_id = None
            self._websocket = None
-            await self._call_event_handler("on_disconnected")

    def _get_websocket(self):
        if self._websocket:
@@ -512,27 +480,23 @@ class CartesiaHttpTTSService(TTSService):

        Parameters:
            language: Language to use for synthesis.
-            speed: Voice speed control for non-Sonic-3 models (literal values).
-            emotion: List of emotion controls for non-Sonic-3 models.
+            speed: Voice speed control.
+            emotion: List of emotion controls.

                .. deprecated:: 0.0.68
                        The `emotion` parameter is deprecated and will be removed in a future version.
-
-            generation_config: Generation configuration for Sonic-3 models. Includes volume,
-                speed (numeric), and emotion (string) parameters.
        """

        language: Optional[Language] = Language.EN
        speed: Optional[Literal["slow", "normal", "fast"]] = None
        emotion: Optional[List[str]] = Field(default_factory=list)
-        generation_config: Optional[GenerationConfig] = None

    def __init__(
        self,
        *,
        api_key: str,
        voice_id: str,
-        model: str = "sonic-3",
+        model: str = "sonic-2",
        base_url: str = "https://api.cartesia.ai",
        cartesia_version: str = "2024-11-13",
        sample_rate: Optional[int] = None,
@@ -546,7 +510,7 @@ class CartesiaHttpTTSService(TTSService):
        Args:
            api_key: Cartesia API key for authentication.
            voice_id: ID of the voice to use for synthesis.
-            model: TTS model to use (e.g., "sonic-3").
+            model: TTS model to use (e.g., "sonic-2").
            base_url: Base URL for Cartesia HTTP API.
            cartesia_version: API version string for Cartesia service.
            sample_rate: Audio sample rate. If None, uses default.
@@ -573,7 +537,6 @@ class CartesiaHttpTTSService(TTSService):
            else "en",
            "speed": params.speed,
            "emotion": params.emotion,
-            "generation_config": params.generation_config,
        }
        self.set_voice(voice_id)
        self.set_model_name(model)
@@ -667,11 +630,6 @@ class CartesiaHttpTTSService(TTSService):
            if self._settings["speed"]:
                payload["speed"] = self._settings["speed"]

-            if self._settings["generation_config"]:
-                payload["generation_config"] = self._settings["generation_config"].model_dump(
-                    exclude_none=True
-                )
-
            yield TTSStartedFrame()

            session = await self._client._get_session()
--- a/src/pipecat/services/deepgram/flux/stt.py
+++ b/src/pipecat/services/deepgram/flux/stt.py
@@ -156,12 +156,6 @@ class DeepgramFluxSTTService(WebsocketSTTService):
        self._language = Language.EN
        self._websocket_url = None
        self._receive_task = None
-        # Flux event handlers
-        self._register_event_handler("on_start_of_turn")
-        self._register_event_handler("on_turn_resumed")
-        self._register_event_handler("on_end_of_turn")
-        self._register_event_handler("on_eager_end_of_turn")
-        self._register_event_handler("on_update")

    async def _connect(self):
        """Connect to WebSocket and start background tasks.
@@ -211,7 +205,6 @@ class DeepgramFluxSTTService(WebsocketSTTService):
                additional_headers={"Authorization": f"Token {self._api_key}"},
            )
            logger.debug("Connected to Deepgram Flux Websocket")
-            await self._call_event_handler("on_connected")
        except Exception as e:
            logger.error(f"{self} initialization error: {e}")
            self._websocket = None
@@ -232,9 +225,6 @@ class DeepgramFluxSTTService(WebsocketSTTService):
                await self._websocket.close()
        except Exception as e:
            logger.error(f"{self} error closing websocket: {e}")
-        finally:
-            self._websocket = None
-            await self._call_event_handler("on_disconnected")

    async def _send_close_stream(self) -> None:
        """Sends a CloseStream control message to the Deepgram Flux WebSocket API.
@@ -529,7 +519,6 @@ class DeepgramFluxSTTService(WebsocketSTTService):
        await self.push_frame(UserStartedSpeakingFrame(), FrameDirection.DOWNSTREAM)
        await self.push_frame(UserStartedSpeakingFrame(), FrameDirection.UPSTREAM)
        await self.start_metrics()
-        await self._call_event_handler("on_start_of_turn", transcript)
        if transcript:
            logger.trace(f"Start of turn transcript: {transcript}")

@@ -544,7 +533,6 @@ class DeepgramFluxSTTService(WebsocketSTTService):
            event: The event type string for logging purposes.
        """
        logger.trace(f"Received event TurnResumed: {event}")
-        await self._call_event_handler("on_turn_resumed")

    async def _handle_end_of_turn(self, transcript: str, data: Dict[str, Any]):
        """Handle EndOfTurn events from Deepgram Flux.
@@ -579,7 +567,6 @@ class DeepgramFluxSTTService(WebsocketSTTService):
        await self.stop_processing_metrics()
        await self.push_frame(UserStoppedSpeakingFrame(), FrameDirection.DOWNSTREAM)
        await self.push_frame(UserStoppedSpeakingFrame(), FrameDirection.UPSTREAM)
-        await self._call_event_handler("on_end_of_turn", transcript)

    async def _handle_eager_end_of_turn(self, transcript: str, data: Dict[str, Any]):
        """Handle EagerEndOfTurn events from Deepgram Flux.
@@ -624,7 +611,6 @@ class DeepgramFluxSTTService(WebsocketSTTService):
                result=data,
            )
        )
-        await self._call_event_handler("on_eager_end_of_turn", transcript)

    async def _handle_update(self, transcript: str):
        """Handle Update events from Deepgram Flux.
@@ -648,4 +634,3 @@ class DeepgramFluxSTTService(WebsocketSTTService):
            # both the "user started speaking" event and the first transcript simultaneously,
            # making this timing measurement meaningless in this context.
            # await self.stop_ttfb_metrics()
-            await self._call_event_handler("on_update", transcript)
--- a/src/pipecat/services/deepgram/tts.py
+++ b/src/pipecat/services/deepgram/tts.py
@@ -12,7 +12,6 @@ for generating speech from text using various voice models.

 from typing import AsyncGenerator, Optional

-import aiohttp
 from loguru import logger

 from pipecat.frames.frames import (
@@ -118,114 +117,3 @@ class DeepgramTTSService(TTSService):
        except Exception as e:
            logger.exception(f"{self} exception: {e}")
            yield ErrorFrame(f"Error getting audio: {str(e)}")
-
-
-class DeepgramHttpTTSService(TTSService):
-    """Deepgram HTTP text-to-speech service.
-
-    Provides text-to-speech synthesis using Deepgram's HTTP TTS API.
-    Supports various voice models and audio encoding formats with
-    configurable sample rates and quality settings.
-    """
-
-    def __init__(
-        self,
-        *,
-        api_key: str,
-        voice: str = "aura-2-helena-en",
-        aiohttp_session: aiohttp.ClientSession,
-        base_url: str = "https://api.deepgram.com",
-        sample_rate: Optional[int] = None,
-        encoding: str = "linear16",
-        **kwargs,
-    ):
-        """Initialize the Deepgram TTS service.
-
-        Args:
-            api_key: Deepgram API key for authentication.
-            voice: Voice model to use for synthesis. Defaults to "aura-2-helena-en".
-            aiohttp_session: Shared aiohttp session for HTTP requests with connection pooling.
-            base_url: Custom base URL for Deepgram API. Defaults to "https://api.deepgram.com".
-            sample_rate: Audio sample rate in Hz. If None, uses service default.
-            encoding: Audio encoding format. Defaults to "linear16".
-            **kwargs: Additional arguments passed to parent TTSService class.
-        """
-        super().__init__(sample_rate=sample_rate, **kwargs)
-
-        self._api_key = api_key
-        self._session = aiohttp_session
-        self._base_url = base_url
-        self._settings = {
-            "encoding": encoding,
-        }
-        self.set_voice(voice)
-
-    def can_generate_metrics(self) -> bool:
-        """Check if the service can generate metrics.
-
-        Returns:
-            True, as Deepgram TTS service supports metrics generation.
-        """
-        return True
-
-    @traced_tts
-    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
-        """Generate speech from text using Deepgram's TTS API.
-
-        Args:
-            text: The text to synthesize into speech.
-
-        Yields:
-            Frame: Audio frames containing the synthesized speech, plus start/stop frames.
-        """
-        logger.debug(f"{self}: Generating TTS [{text}]")
-
-        # Build URL with parameters
-        url = f"{self._base_url}/v1/speak"
-
-        headers = {"Authorization": f"Token {self._api_key}", "Content-Type": "application/json"}
-
-        params = {
-            "model": self._voice_id,
-            "encoding": self._settings["encoding"],
-            "sample_rate": self.sample_rate,
-            "container": "none",
-        }
-
-        payload = {
-            "text": text,
-        }
-
-        try:
-            await self.start_ttfb_metrics()
-
-            async with self._session.post(
-                url, headers=headers, json=payload, params=params
-            ) as response:
-                if response.status != 200:
-                    error_text = await response.text()
-                    raise Exception(f"HTTP {response.status}: {error_text}")
-
-                await self.start_tts_usage_metrics(text)
-                yield TTSStartedFrame()
-
-                CHUNK_SIZE = self.chunk_size
-
-                first_chunk = True
-                async for chunk in response.content.iter_chunked(CHUNK_SIZE):
-                    if first_chunk:
-                        await self.stop_ttfb_metrics()
-                        first_chunk = False
-
-                    if chunk:
-                        yield TTSAudioRawFrame(
-                            audio=chunk,
-                            sample_rate=self.sample_rate,
-                            num_channels=1,
-                        )
-
-            yield TTSStoppedFrame()
-
-        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
-            yield ErrorFrame(f"Error getting audio: {str(e)}")
--- a/src/pipecat/services/elevenlabs/tts.py
+++ b/src/pipecat/services/elevenlabs/tts.py
@@ -168,24 +168,16 @@ def build_elevenlabs_voice_settings(


 def calculate_word_times(
-    alignment_info: Mapping[str, Any],
-    cumulative_time: float,
-    partial_word: str = "",
-    partial_word_start_time: float = 0.0,
-) -> tuple[List[Tuple[str, float]], str, float]:
+    alignment_info: Mapping[str, Any], cumulative_time: float
+) -> List[Tuple[str, float]]:
    """Calculate word timestamps from character alignment information.

    Args:
        alignment_info: Character alignment data from ElevenLabs API.
        cumulative_time: Base time offset for this chunk.
-        partial_word: Partial word carried over from previous chunk.
-        partial_word_start_time: Start time of the partial word.

    Returns:
-        Tuple of (word_times, new_partial_word, new_partial_word_start_time):
-        - word_times: List of (word, timestamp) tuples for complete words
-        - new_partial_word: Incomplete word at end of chunk (empty if chunk ends with space)
-        - new_partial_word_start_time: Start time of the incomplete word
+        List of (word, timestamp) tuples.
    """
    chars = alignment_info["chars"]
    char_start_times_ms = alignment_info["charStartTimesMs"]
@@ -194,37 +186,41 @@ def calculate_word_times(
        logger.error(
            f"calculate_word_times: length mismatch - chars={len(chars)}, times={len(char_start_times_ms)}"
        )
-        return ([], partial_word, partial_word_start_time)
+        return []

    # Build words and track their start positions
    words = []
-    word_start_times = []
-    current_word = partial_word  # Start with any partial word from previous chunk
-    word_start_time = partial_word_start_time if partial_word else None
+    word_start_indices = []
+    current_word = ""
+    word_start_index = None

    for i, char in enumerate(chars):
        if char == " ":
            # End of current word
            if current_word:  # Only add non-empty words
                words.append(current_word)
-                word_start_times.append(word_start_time)
+                word_start_indices.append(word_start_index)
                current_word = ""
-                word_start_time = None
+                word_start_index = None
        else:
            # Building a word
-            if word_start_time is None:  # First character of new word
-                # Convert from milliseconds to seconds and add cumulative offset
-                word_start_time = cumulative_time + (char_start_times_ms[i] / 1000.0)
+            if word_start_index is None:  # First character of new word
+                word_start_index = i
            current_word += char

-    # Build result for complete words
-    word_times = list(zip(words, word_start_times))
+    # Handle the last word if there's no trailing space
+    if current_word and word_start_index is not None:
+        words.append(current_word)
+        word_start_indices.append(word_start_index)

-    # Return any incomplete word at the end of this chunk
-    new_partial_word = current_word if current_word else ""
-    new_partial_word_start_time = word_start_time if word_start_time is not None else 0.0
+    # Calculate timestamps for each word
+    word_times = []
+    for word, start_idx in zip(words, word_start_indices):
+        # Convert from milliseconds to seconds and add cumulative offset
+        start_time_seconds = cumulative_time + (char_start_times_ms[start_idx] / 1000.0)
+        word_times.append((word, start_time_seconds))

-    return (word_times, new_partial_word, new_partial_word_start_time)
+    return word_times


 class ElevenLabsTTSService(AudioContextWordTTSService):
@@ -336,9 +332,6 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
        # there's an interruption or TTSStoppedFrame.
        self._started = False
        self._cumulative_time = 0
-        # Track partial words that span across alignment chunks
-        self._partial_word = ""
-        self._partial_word_start_time = 0.0

        # Context management for v1 multi API
        self._context_id = None
@@ -528,7 +521,6 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
                url, max_size=16 * 1024 * 1024, additional_headers={"xi-api-key": self._api_key}
            )

-            await self._call_event_handler("on_connected")
        except Exception as e:
            logger.error(f"{self} initialization error: {e}")
            self._websocket = None
@@ -551,7 +543,6 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
            self._started = False
            self._context_id = None
            self._websocket = None
-            await self._call_event_handler("on_disconnected")

    def _get_websocket(self):
        if self._websocket:
@@ -579,8 +570,6 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
                logger.error(f"Error closing context on interruption: {e}")
            self._context_id = None
            self._started = False
-            self._partial_word = ""
-            self._partial_word_start_time = 0.0

    async def _receive_messages(self):
        """Handle incoming WebSocket messages from ElevenLabs."""
@@ -620,14 +609,7 @@ class ElevenLabsTTSService(AudioContextWordTTSService):

            if msg.get("alignment"):
                alignment = msg["alignment"]
-                word_times, self._partial_word, self._partial_word_start_time = (
-                    calculate_word_times(
-                        alignment,
-                        self._cumulative_time,
-                        self._partial_word,
-                        self._partial_word_start_time,
-                    )
-                )
+                word_times = calculate_word_times(alignment, self._cumulative_time)

                if word_times:
                    await self.add_word_timestamps(word_times)
@@ -701,8 +683,6 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
                    yield TTSStartedFrame()
                    self._started = True
                    self._cumulative_time = 0
-                    self._partial_word = ""
-                    self._partial_word_start_time = 0.0
                    # If a context ID does not exist, create a new one and
                    # register it. If an ID exists, that means the Pipeline is
                    # configured for allow_interruptions=False, so continue
@@ -776,7 +756,6 @@ class ElevenLabsHttpTTSService(WordTTSService):
        base_url: str = "https://api.elevenlabs.io",
        sample_rate: Optional[int] = None,
        params: Optional[InputParams] = None,
-        aggregate_sentences: Optional[bool] = True,
        **kwargs,
    ):
        """Initialize the ElevenLabs HTTP TTS service.
@@ -789,11 +768,10 @@ class ElevenLabsHttpTTSService(WordTTSService):
            base_url: Base URL for ElevenLabs HTTP API.
            sample_rate: Audio sample rate. If None, uses default.
            params: Additional input parameters for voice customization.
-            aggregate_sentences: Whether to aggregate sentences within the TTSService.
            **kwargs: Additional arguments passed to the parent service.
        """
        super().__init__(
-            aggregate_sentences=aggregate_sentences,
+            aggregate_sentences=True,
            push_text_frames=False,
            push_stop_frames=True,
            sample_rate=sample_rate,
@@ -831,10 +809,6 @@ class ElevenLabsHttpTTSService(WordTTSService):
        # Store previous text for context within a turn
        self._previous_text = ""

-        # Track partial words that span across alignment chunks
-        self._partial_word = ""
-        self._partial_word_start_time = 0.0
-
    def language_to_service_language(self, language: Language) -> Optional[str]:
        """Convert pipecat Language to ElevenLabs language code.

@@ -862,8 +836,6 @@ class ElevenLabsHttpTTSService(WordTTSService):
        self._cumulative_time = 0
        self._started = False
        self._previous_text = ""
-        self._partial_word = ""
-        self._partial_word_start_time = 0.0
        logger.debug(f"{self}: Reset internal state")

    async def start(self, frame: StartFrame):
@@ -898,13 +870,11 @@ class ElevenLabsHttpTTSService(WordTTSService):
    def calculate_word_times(self, alignment_info: Mapping[str, Any]) -> List[Tuple[str, float]]:
        """Calculate word timing from character alignment data.

-        This method handles partial words that may span across multiple alignment chunks.
-
        Args:
            alignment_info: Character timing data from ElevenLabs.

        Returns:
-            List of (word, timestamp) pairs for complete words in this chunk.
+            List of (word, timestamp) pairs.

        Example input data::

@@ -930,28 +900,30 @@ class ElevenLabsHttpTTSService(WordTTSService):
        # Build the words and find their start times
        words = []
        word_start_times = []
-        # Start with any partial word from previous chunk
-        current_word = self._partial_word
-        word_start_time = self._partial_word_start_time if self._partial_word else None
+        current_word = ""
+        first_char_idx = -1

        for i, char in enumerate(chars):
            if char == " ":
                if current_word:  # Only add non-empty words
                    words.append(current_word)
-                    word_start_times.append(word_start_time)
-                    current_word = ""
-                    word_start_time = None
-            else:
-                if word_start_time is None:  # First character of a new word
                    # Use time of the first character of the word, offset by cumulative time
-                    word_start_time = self._cumulative_time + char_start_times[i]
+                    word_start_times.append(
+                        self._cumulative_time + char_start_times[first_char_idx]
+                    )
+                    current_word = ""
+                    first_char_idx = -1
+            else:
+                if not current_word:  # This is the first character of a new word
+                    first_char_idx = i
                current_word += char

-        # Store any incomplete word at the end of this chunk
-        self._partial_word = current_word if current_word else ""
-        self._partial_word_start_time = word_start_time if word_start_time is not None else 0.0
+        # Don't forget the last word if there's no trailing space
+        if current_word and first_char_idx >= 0:
+            words.append(current_word)
+            word_start_times.append(self._cumulative_time + char_start_times[first_char_idx])

-        # Create word-time pairs for complete words only
+        # Create word-time pairs
        word_times = list(zip(words, word_start_times))

        return word_times
@@ -987,9 +959,6 @@ class ElevenLabsHttpTTSService(WordTTSService):
        if self._voice_settings:
            payload["voice_settings"] = self._voice_settings

-        if self._settings["apply_text_normalization"] is not None:
-            payload["apply_text_normalization"] = self._settings["apply_text_normalization"]
-
        language = self._settings["language"]
        if self._model_name in ELEVENLABS_MULTILINGUAL_MODELS and language:
            payload["language_code"] = language
@@ -1010,6 +979,8 @@ class ElevenLabsHttpTTSService(WordTTSService):
        }
        if self._settings["optimize_streaming_latency"] is not None:
            params["optimize_streaming_latency"] = self._settings["optimize_streaming_latency"]
+        if self._settings["apply_text_normalization"] is not None:
+            params["apply_text_normalization"] = self._settings["apply_text_normalization"]

        try:
            await self.start_ttfb_metrics()
@@ -1070,14 +1041,6 @@ class ElevenLabsHttpTTSService(WordTTSService):
                        logger.error(f"Error processing response: {e}", exc_info=True)
                        continue

-                # After processing all chunks, emit any remaining partial word
-                # since this is the end of the utterance
-                if self._partial_word:
-                    final_word_time = [(self._partial_word, self._partial_word_start_time)]
-                    await self.add_word_timestamps(final_word_time)
-                    self._partial_word = ""
-                    self._partial_word_start_time = 0.0
-
                # After processing all chunks, add the total utterance duration
                # to the cumulative time to ensure next utterance starts after this one
                if utterance_duration > 0:
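
For reviewers, the per-chunk word timing logic introduced in the ElevenLabs hunks above can be sketched in isolation. This is a minimal illustration, not the service code: the function name `word_start_times` and its plain-list parameters are hypothetical, chosen so the example runs without the ElevenLabs alignment payload. It assumes one start time in milliseconds per character, as in the WebSocket variant.

```python
from typing import List, Optional, Sequence, Tuple


def word_start_times(
    chars: Sequence[str],
    char_start_times_ms: Sequence[float],
    cumulative_time: float = 0.0,
) -> List[Tuple[str, float]]:
    """Group characters into words and timestamp each word (hypothetical helper).

    Each word gets the start time of its first character, converted from
    milliseconds to seconds and offset by ``cumulative_time``.
    """
    word_times: List[Tuple[str, float]] = []
    current_word = ""
    start_index: Optional[int] = None

    def flush() -> None:
        # Emit the word collected so far, if any, and reset the state.
        nonlocal current_word, start_index
        if current_word and start_index is not None:
            start = cumulative_time + char_start_times_ms[start_index] / 1000.0
            word_times.append((current_word, start))
        current_word = ""
        start_index = None

    for i, char in enumerate(chars):
        if char == " ":
            flush()  # A space ends the current word
        else:
            if start_index is None:
                start_index = i  # First character of a new word
            current_word += char

    flush()  # Last word when the chunk has no trailing space
    return word_times


if __name__ == "__main__":
    times_ms = [0, 100, 200, 300, 400, 500, 600, 700]
    print(word_start_times(list("hi there"), times_ms, cumulative_time=1.0))
```

Because the final word is emitted even without a trailing space, a word that is split across two alignment chunks would come back as two fragments under this approach, one per chunk. That is the trade-off against the removed partial-word tracking.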
--- a/src/pipecat/services/fish/tts.py
+++ b/src/pipecat/services/fish/tts.py
@@ -225,8 +225,6 @@ class FishAudioTTSService(InterruptibleTTSService):
            start_message = {"event": "start", "request": {"text": "", **self._settings}}
            await self._websocket.send(ormsgpack.packb(start_message))
            logger.debug("Sent start message to Fish Audio")
-
-            await self._call_event_handler("on_connected")
        except Exception as e:
            logger.error(f"Fish Audio initialization error: {e}")
            self._websocket = None
@@ -247,7 +245,6 @@ class FishAudioTTSService(InterruptibleTTSService):
            self._request_id = None
            self._started = False
            self._websocket = None
-            await self._call_event_handler("on_disconnected")

    async def flush_audio(self):
        """Flush any buffered audio by sending a flush event to Fish Audio."""
--- a/src/pipecat/services/google/gemini_live/llm.py
+++ b/src/pipecat/services/google/gemini_live/llm.py
@@ -17,7 +17,6 @@ import json
 import random
 import time
 import uuid
-import warnings
 from dataclasses import dataclass
 from enum import Enum
 from typing import Any, Dict, List, Optional, Union
@@ -57,12 +56,10 @@ from pipecat.frames.frames import (
    UserStoppedSpeakingFrame,
 )
 from pipecat.metrics.metrics import LLMTokenUsage
-from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.llm_response import (
    LLMAssistantAggregatorParams,
    LLMUserAggregatorParams,
 )
-from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
 from pipecat.processors.aggregators.openai_llm_context import (
    OpenAILLMContext,
    OpenAILLMContextFrame,
@@ -222,10 +219,6 @@ class GeminiLiveContext(OpenAILLMContext):

    Provides Gemini-specific context management including system instruction
    extraction and message format conversion for the Live API.
-
-    .. deprecated:: 0.0.92
-        Gemini Live no longer uses `GeminiLiveContext` under the hood.
-        It now uses `LLMContext`.
    """

    @staticmethod
@@ -238,22 +231,6 @@ class GeminiLiveContext(OpenAILLMContext):
        Returns:
            The upgraded Gemini context instance.
        """
-        # This warning is here rather than `__init__` since `upgrade()` was the
-        # "main" way that GeminiLiveContext instances were created.
-        # Almost no users should be seeing this message anyway, as
-        # GeminiLiveContext instances were typically created under the hood:
-        # the user would pass an OpenAILLMContext instance, which would be
-        # upgraded without them necessarily knowing.
-        with warnings.catch_warnings():
-            warnings.simplefilter("always")
-            warnings.warn(
-                "GeminiLiveContext is deprecated. "
-                "Gemini Live no longer uses GeminiLiveContext under the hood. "
-                "It now uses LLMContext.",
-                DeprecationWarning,
-                stacklevel=2,
-            )
-
        if isinstance(obj, OpenAILLMContext) and not isinstance(obj, GeminiLiveContext):
            logger.debug(f"Upgrading to Gemini Live Context: {obj}")
            obj.__class__ = GeminiLiveContext
@@ -351,28 +328,8 @@ class GeminiLiveUserContextAggregator(OpenAIUserContextAggregator):

    Extends OpenAI user aggregator to handle Gemini-specific message passing
    while maintaining compatibility with the standard aggregation pipeline.
-
-    .. deprecated:: 0.0.92
-        Gemini Live no longer expects a `GeminiLiveUserContextAggregator`.
-        It now expects a `LLMUserAggregator`.
    """

-    def __init__(self, *args, **kwargs):
-        """Initialize Gemini Live user context aggregator."""
-        # Almost no users should be seeing this message, as
-        # `GeminiLiveUserContextAggregator`` instances were typically created
-        # under the hood, as part of `llm.create_context_aggregator()`.
-        with warnings.catch_warnings():
-            warnings.simplefilter("always")
-            warnings.warn(
-                "GeminiLiveUserContextAggregator is deprecated. "
-                "Gemini Live no longer expects a GeminiLiveUserContextAggregator. "
-                "It now expects a LLMUserAggregator.",
-                DeprecationWarning,
-                stacklevel=2,
-            )
-        super().__init__(*args, **kwargs)
-
    async def process_frame(self, frame, direction):
        """Process incoming frames for user context aggregation.

@@ -392,28 +349,8 @@ class GeminiLiveAssistantContextAggregator(OpenAIAssistantContextAggregator):
    Handles assistant response aggregation while filtering out LLMTextFrames
    to prevent duplicate context entries, as Gemini Live pushes both
    LLMTextFrames and TTSTextFrames.
-
-    .. deprecated:: 0.0.92
-        Gemini Live no longer uses `GeminiLiveAssistantContextAggregator` under the hood.
-        It now uses `LLMAssistantAggregator`.
    """

-    def __init__(self, *args, **kwargs):
-        """Initialize Gemini Live assistant context aggregator."""
-        # Almost no users should be seeing this message, as
-        # `GeminiLiveAssistantContextAggregator` instances were typically
-        # created under the hood, as part of `llm.create_context_aggregator()`.
-        with warnings.catch_warnings():
-            warnings.simplefilter("always")
-            warnings.warn(
-                "GeminiLiveAssistantContextAggregator is deprecated. "
-                "Gemini Live no longer uses GeminiLiveAssistantContextAggregator under the hood. "
-                "It now uses LLMAssistantAggregator.",
-                DeprecationWarning,
-                stacklevel=2,
-            )
-        super().__init__(*args, **kwargs)
-
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        """Process incoming frames for assistant context aggregation.

@@ -443,10 +380,6 @@ class GeminiLiveAssistantContextAggregator(OpenAIAssistantContextAggregator):
 class GeminiLiveContextAggregatorPair:
    """Pair of user and assistant context aggregators for Gemini Live.

-    .. deprecated:: 0.0.92
-        `GeminiLiveContextAggregatorPair` is deprecated.
-        Use `LLMContextAggregatorPair` instead.
-
    Parameters:
        _user: The user context aggregator instance.
        _assistant: The assistant context aggregator instance.
@@ -455,19 +388,6 @@ class GeminiLiveContextAggregatorPair:
    _user: GeminiLiveUserContextAggregator
    _assistant: GeminiLiveAssistantContextAggregator

-    def __post_init__(self):
-        # Almost no users should be seeing this message, as
-        # `GeminiLiveContextAggregatorPair` instances were typically created
-        # under the hood, with `llm.create_context_aggregator()`.
-        with warnings.catch_warnings():
-            warnings.simplefilter("always")
-            warnings.warn(
-                "GeminiLiveContextAggregatorPair is deprecated. "
-                "Use LLMContextAggregatorPair instead.",
-                DeprecationWarning,
-                stacklevel=2,
-            )
-
    def user(self) -> GeminiLiveUserContextAggregator:
        """Get the user context aggregator.

@@ -672,8 +592,8 @@ class GeminiLiveLLMService(LLMService):
        self._voice_id = voice_id
        self._language_code = params.language

-        self._system_instruction_from_init = system_instruction
-        self._tools_from_init = tools
+        self._system_instruction = system_instruction
+        self._tools = tools
        self._inference_on_context_initialization = inference_on_context_initialization
        self._needs_turn_complete_message = False

@@ -689,7 +609,7 @@ class GeminiLiveLLMService(LLMService):
        self._run_llm_when_session_ready = False

        self._user_is_speaking = False
-        self._bot_is_responding = False
+        self._bot_is_speaking = False
        self._user_audio_buffer = bytearray()
        self._user_transcription_buffer = ""
        self._last_transcription_sent = ""
@@ -745,9 +665,6 @@ class GeminiLiveLLMService(LLMService):
        # Initialize the API client. Subclasses can override this if needed.
        self.create_client()

-        # Bookkeeping for tool calls
-        self._completed_tool_calls = set()
-
    def create_client(self):
        """Create the Gemini API client instance. Subclasses can override this."""
        self._client = Client(api_key=self._api_key, http_options=self._http_options)
@@ -870,13 +787,9 @@ class GeminiLiveLLMService(LLMService):
    #

    async def _handle_interruption(self):
-        if self._bot_is_responding:
-            await self._set_bot_is_responding(False)
-            if self._settings.get("modalities") == GeminiModalities.AUDIO:
-                await self.push_frame(TTSStoppedFrame())
-            # Do not send LLMFullResponseEndFrame here - an interruption
-            # already tells the assistant context aggregator that the response
-            # is over.
+        await self._set_bot_is_speaking(False)
+        await self.push_frame(TTSStoppedFrame())
+        await self.push_frame(LLMFullResponseEndFrame())

    async def _handle_user_started_speaking(self, frame):
        self._user_is_speaking = True
@@ -894,6 +807,7 @@ class GeminiLiveLLMService(LLMService):

    #
    # frame processing
+    #
    # StartFrame, StopFrame, CancelFrame implemented in base class
    #

@@ -906,7 +820,7 @@ class GeminiLiveLLMService(LLMService):
        """
        # Defer EndFrame handling until after the bot turn is finished
        if isinstance(frame, EndFrame):
-            if self._bot_is_responding:
+            if self._bot_is_speaking:
                logger.debug("Deferring handling EndFrame until bot turn is finished")
                self._end_frame_pending_bot_turn_finished = frame
                return
@@ -915,13 +829,22 @@ class GeminiLiveLLMService(LLMService):

        if isinstance(frame, TranscriptionFrame):
            await self.push_frame(frame, direction)
-        elif isinstance(frame, (LLMContextFrame, OpenAILLMContextFrame)):
-            context = (
-                frame.context
-                if isinstance(frame, LLMContextFrame)
-                else LLMContext.from_openai_context(frame.context)
-            )
-            await self._handle_context(context)
+        elif isinstance(frame, OpenAILLMContextFrame):
+            context: GeminiLiveContext = GeminiLiveContext.upgrade(frame.context)
+            # For now, we'll only trigger inference here when either:
+            #   1. We have not seen a context frame before
+            #   2. The last message is a tool call result
+            if not self._context:
+                self._context = context
+                if frame.context.tools:
+                    self._tools = frame.context.tools
+                await self._create_initial_response()
+            elif context.messages and context.messages[-1].get("role") == "tool":
+                # Support just one tool call per context frame for now
+                tool_result_message = context.messages[-1]
+                await self._tool_result(tool_result_message)
+        elif isinstance(frame, LLMContextFrame):
+            raise NotImplementedError("Universal LLMContext is not yet supported for Gemini Live.")
        elif isinstance(frame, InputTextRawFrame):
            await self._send_user_text(frame.text)
            await self.push_frame(frame, direction)
@@ -960,83 +883,13 @@ class GeminiLiveLLMService(LLMService):
        else:
            await self.push_frame(frame, direction)

-    async def _handle_context(self, context: LLMContext):
-        if not self._context:
-            # We got our initial context
-            self._context = context
-
-            # If context contains system instruction or tools, reconnect in
-            # order to apply them.
-            # (Context-provided system instruction and tools take precedence
-            # over the ones provided at initialization time. Note that we could
-            # do more sophisticated comparisons here, but for now this is
-            # sufficient: we'll assume folks won't mean to provide these
-            # settings both in the context and at initialization time. In a
-            # future change, we could/should implement the ability to swap
-            # these settings at any point).
-            adapter: GeminiLLMAdapter = self.get_llm_adapter()
-            params = adapter.get_llm_invocation_params(self._context)
-            system_instruction = params["system_instruction"]
-            tools = params["tools"]
-            if system_instruction and self._system_instruction_from_init:
-                logger.warning(
-                    "System instruction provided both at init time and in context; using context-provided value."
-                )
-            if tools and self._tools_from_init:
-                logger.warning(
-                    "Tools provided both at init time and in context; using context-provided value."
-                )
-            if system_instruction or tools:
-                await self._reconnect()
-
-            # Initialize our bookkeeping of already-completed tool calls in
-            # the context
-            await self._process_completed_function_calls(send_new_results=False)
-
-            # Create initial response if needed, based on conversation history
-            # in context
-            await self._create_initial_response()
-        else:
-            # We got an updated context.
-            self._context = context
-
-            # Here we assume that the updated context will contain either:
-            # - new messages (that the Gemini Live service, with its own
-            #   context management, is already aware of), or
-            # - tool call results (that we need to tell the remote service
-            #   about).
-            # (In the future, we could do more sophisticated diffing here,
-            # which would enable the user to programmatically manipulate the
-            # context).
-
-            # Send results for newly-completed function calls, if any.
-            await self._process_completed_function_calls(send_new_results=True)
-
-    async def _process_completed_function_calls(self, send_new_results: bool):
-        # Check for set of completed function calls in the context
-        adapter: GeminiLLMAdapter = self.get_llm_adapter()
-        messages = adapter.get_llm_invocation_params(self._context).get("messages", [])
-        for message in messages:
-            if message.parts:
-                for part in message.parts:
-                    if part.function_response:
-                        tool_call_id = part.function_response.id
-                        tool_name = part.function_response.name
-                        if tool_call_id and tool_call_id not in self._completed_tool_calls:
-                            # Found a newly-completed function call - send the result to the service
-                            if send_new_results:
-                                await self._tool_result(
-                                    tool_call_id, tool_name, part.function_response.response
-                                )
-                            self._completed_tool_calls.add(tool_call_id)
-
-    async def _set_bot_is_responding(self, responding: bool):
-        if self._bot_is_responding == responding:
+    async def _set_bot_is_speaking(self, speaking: bool):
+        if self._bot_is_speaking == speaking:
            return

-        self._bot_is_responding = responding
+        self._bot_is_speaking = speaking

-        if not self._bot_is_responding and self._end_frame_pending_bot_turn_finished:
+        if not self._bot_is_speaking and self._end_frame_pending_bot_turn_finished:
            await self.queue_frame(self._end_frame_pending_bot_turn_finished)
            self._end_frame_pending_bot_turn_finished = None

@@ -1138,25 +991,18 @@ class GeminiLiveLLMService(LLMService):
                        automatic_activity_detection=vad_config
                    )

-            # Add system instruction and tools to configuration, if provided.
-            # These settings from the context take precedence over the ones
-            # provided at initialization time.
-            adapter: GeminiLLMAdapter = self.get_llm_adapter()
-            system_instruction = None
-            tools = None
-            if self._context:
-                params = adapter.get_llm_invocation_params(self._context)
-                system_instruction = params["system_instruction"]
-                tools = params["tools"]
-            else:
-                system_instruction = self._system_instruction_from_init
-                tools = adapter.from_standard_tools(self._tools_from_init)
+            # Add system instruction to configuration, if provided
+            system_instruction = self._system_instruction or ""
+            if self._context and hasattr(self._context, "extract_system_instructions"):
+                system_instruction += "\n" + self._context.extract_system_instructions()
            if system_instruction:
                logger.debug(f"Setting system instruction: {system_instruction}")
                config.system_instruction = system_instruction
-            if tools:
-                logger.debug(f"Setting tools: {tools}")
-                config.tools = tools
+
+            # Add tools to configuration, if provided
+            if self._tools:
+                logger.debug(f"Setting tools: {self._tools}")
+                config.tools = self.get_llm_adapter().from_standard_tools(self._tools)

            # Start the connection
            self._connection_task = self.create_task(self._connection_task_handler(config=config))
@@ -1270,7 +1116,6 @@ class GeminiLiveLLMService(LLMService):
            if self._session:
                await self._session.close()
                self._session = None
-            self._completed_tool_calls = set()
            self._disconnecting = False
        except Exception as e:
            logger.error(f"{self} error disconnecting: {e}")
@@ -1350,8 +1195,7 @@ class GeminiLiveLLMService(LLMService):
            self._run_llm_when_session_ready = True
            return

-        adapter: GeminiLLMAdapter = self.get_llm_adapter()
-        messages = adapter.get_llm_invocation_params(self._context).get("messages", [])
+        messages = self._context.get_messages_for_initializing_history()
        if not messages:
            return

@@ -1379,9 +1223,8 @@ class GeminiLiveLLMService(LLMService):

        # Create a throwaway context just for the purpose of getting messages
        # in the right format
-        context = LLMContext(messages=messages_list)
-        adapter: GeminiLLMAdapter = self.get_llm_adapter()
-        messages = adapter.get_llm_invocation_params(context).get("messages", [])
+        context = GeminiLiveContext.upgrade(OpenAILLMContext(messages=messages_list))
+        messages = context.get_messages_for_initializing_history()

        if not messages:
            return
@@ -1396,16 +1239,17 @@ class GeminiLiveLLMService(LLMService):
            await self._handle_send_error(e)

    @traced_gemini_live(operation="llm_tool_result")
-    async def _tool_result(
-        self, tool_call_id: str, tool_name: str, tool_result_message: Dict[str, Any]
-    ):
+    async def _tool_result(self, tool_result_message):
        """Send tool result back to the API."""
        if self._disconnecting or not self._session:
            return

        # For now we're shoving the name into the tool_call_id field, so this
        # will work until we revisit that.
-        response = FunctionResponse(name=tool_name, id=tool_call_id, response=tool_result_message)
+        id = tool_result_message.get("tool_call_id")
+        name = tool_result_message.get("tool_call_name")
+        result = json.loads(tool_result_message.get("content") or "{}")
+        response = FunctionResponse(name=name, id=id, response=result)

        try:
            await self._session.send_tool_response(function_responses=response)
@@ -1433,10 +1277,7 @@ class GeminiLiveLLMService(LLMService):
        # part.text is added when `modalities` is set to TEXT; otherwise, it's None
        text = part.text
        if text:
-            if not self._bot_is_responding:
-                # Update bot responding state and send service start frame
-                # (AUDIO modality case)
-                await self._set_bot_is_responding(True)
+            if not self._bot_text_buffer:
                await self.push_frame(LLMFullResponseStartFrame())

            self._bot_text_buffer += text
@@ -1447,8 +1288,6 @@ class GeminiLiveLLMService(LLMService):
        if msg.server_content and msg.server_content.grounding_metadata:
            self._accumulated_grounding_metadata = msg.server_content.grounding_metadata

-        # If we have no audio, stop here.
-        # All logic below this point pertains to the AUDIO modality.
        inline_data = part.inline_data
        if not inline_data:
            return
@@ -1474,10 +1313,8 @@ class GeminiLiveLLMService(LLMService):
        if not audio:
            return

-        # Update bot responding state and send service start frames
-        # (AUDIO modality case)
-        if not self._bot_is_responding:
-            await self._set_bot_is_responding(True)
+        if not self._bot_is_speaking:
+            await self._set_bot_is_speaking(True)
            await self.push_frame(TTSStartedFrame())
            await self.push_frame(LLMFullResponseStartFrame())

@@ -1517,6 +1354,7 @@ class GeminiLiveLLMService(LLMService):
    @traced_gemini_live(operation="llm_response")
    async def _handle_msg_turn_complete(self, message: LiveServerMessage):
        """Handle the turn complete message."""
+        await self._set_bot_is_speaking(False)
        text = self._bot_text_buffer

        # Trace the complete LLM response (this will be handled by the decorator)
@@ -1535,15 +1373,13 @@ class GeminiLiveLLMService(LLMService):
        self._search_result_buffer = ""
        self._accumulated_grounding_metadata = None

-        if self._bot_is_responding:
-            await self._set_bot_is_responding(False)
-            if not text:
-                # AUDIO modality case
-                await self.push_frame(TTSStoppedFrame())
-                await self.push_frame(LLMFullResponseEndFrame())
-            else:
-                # TEXT modality case
-                await self.push_frame(LLMFullResponseEndFrame())
+        # Only push TTSStoppedFrame if the bot is outputting audio.
+        # When text is found, modalities is set to TEXT and no audio
+        # is produced.
+        if not text:
+            await self.push_frame(TTSStoppedFrame())
+
+        await self.push_frame(LLMFullResponseEndFrame())

    @traced_stt
    async def _handle_user_transcription(
@@ -1606,8 +1442,8 @@ class GeminiLiveLLMService(LLMService):
            return

        # This is the output transcription text when modalities is set to AUDIO.
-        # In this case, we push TTSTextFrame to be handled by the downstream
-        # assistant context aggregator.
+        # In this case, we push LLMTextFrame and TTSTextFrame to be handled by the
+        # downstream assistant context aggregator.
        text = message.server_content.output_transcription.text

        if not text:
@@ -1622,17 +1458,7 @@ class GeminiLiveLLMService(LLMService):
        # Collect text for tracing
        self._llm_output_buffer += text

-        # NOTE: Shoot. When using Vertex AI, output transcription messages
-        # arrive *before* the model_turn messages with audio, so we need to
-        # handle sending TTSStartedFrame and LLMFullResponseStartFrame here as
-        # well. These messages also contain much *more* text (it looks further
-        # ahead). That means that on an interruption our recorded context will
-        # contain some text that was actually never spoken.
-        if not self._bot_is_responding:
-            await self._set_bot_is_responding(True)
-            await self.push_frame(TTSStartedFrame())
-            await self.push_frame(LLMFullResponseStartFrame())
-
+        await self.push_frame(LLMTextFrame(text=text))
        await self.push_frame(TTSTextFrame(text=text))

    async def _handle_msg_grounding_metadata(self, message: LiveServerMessage):
@@ -1731,26 +1557,26 @@ class GeminiLiveLLMService(LLMService):
        *,
        user_params: LLMUserAggregatorParams = LLMUserAggregatorParams(),
        assistant_params: LLMAssistantAggregatorParams = LLMAssistantAggregatorParams(),
-    ) -> LLMContextAggregatorPair:
+    ) -> GeminiLiveContextAggregatorPair:
        """Create an instance of GeminiLiveContextAggregatorPair from an OpenAILLMContext.

        Constructor keyword arguments for both the user and assistant aggregators can be provided.

-        NOTE: this method exists only for backward compatibility. New code
-        should instead do:
-            context = LLMContext(...)
-            context_aggregator = LLMContextAggregatorPair(context)
-
        Args:
            context: The LLM context to use.
            user_params: User aggregator parameters. Defaults to LLMUserAggregatorParams().
            assistant_params: Assistant aggregator parameters. Defaults to LLMAssistantAggregatorParams().

        Returns:
-            A pair of user and assistant context aggregators.
+            GeminiLiveContextAggregatorPair: A pair of context
+            aggregators, one for the user and one for the assistant,
+            encapsulated in a GeminiLiveContextAggregatorPair.
        """
-        context = LLMContext.from_openai_context(context)
+        context.set_llm_adapter(self.get_llm_adapter())
+
+        GeminiLiveContext.upgrade(context)
+        user = GeminiLiveUserContextAggregator(context, params=user_params)
+
        assistant_params.expect_stripped_words = False
-        return LLMContextAggregatorPair(
-            context, user_params=user_params, assistant_params=assistant_params
-        )
+        assistant = GeminiLiveAssistantContextAggregator(context, params=assistant_params)
+        return GeminiLiveContextAggregatorPair(_user=user, _assistant=assistant)
--- a/src/pipecat/services/google/llm.py
+++ b/src/pipecat/services/google/llm.py
@@ -1034,23 +1034,6 @@ class GoogleLLMService(LLMService):
        if context:
            await self._process_context(context)

-    async def stop(self, frame):
-        """Override stop to gracefully close the client."""
-        await super().stop(frame)
-        await self._close_client()
-
-    async def cancel(self, frame):
-        """Override cancel to gracefully close the client."""
-        await super().cancel(frame)
-        await self._close_client()
-
-    async def _close_client(self):
-        try:
-            await self._client.aio.aclose()
-        except Exception:
-            # Do nothing - we're shutting down anyway
-            pass
-
    def create_context_aggregator(
        self,
        context: OpenAILLMContext,
--- a/src/pipecat/services/google/stt.py
+++ b/src/pipecat/services/google/stt.py
@@ -730,8 +730,6 @@ class GoogleSTTService(STTService):
        self._request_queue = asyncio.Queue()
        self._streaming_task = self.create_task(self._stream_audio())

-        await self._call_event_handler("on_connected")
-
    async def _disconnect(self):
        """Clean up streaming recognition resources."""
        if self._streaming_task:
@@ -739,8 +737,6 @@ class GoogleSTTService(STTService):
            await self.cancel_task(self._streaming_task)
            self._streaming_task = None

-        await self._call_event_handler("on_disconnected")
-
    async def _request_generator(self):
        """Generates requests for the streaming recognize method."""
        recognizer_path = f"projects/{self._project_id}/locations/{self._location}/recognizers/_"