demo: DelayProcessor

Merge pull request #2614 from pipecat-ai/aleix/readme-client-sdks-table
README: update clients' table
2025-09-11 16:05:08 +08:00 · 2025-09-10 10:21:18 -07:00 · 2025-09-10 09:13:04 -07:00 · 2025-09-10 10:40:10 -04:00 · 2025-09-10 15:03:23 +08:00 · 2025-09-08 17:13:28 -04:00
64 changed files with 4935 additions and 3521 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,14 +5,79 @@ All notable changes to **Pipecat** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [Unreleased]
+
+### Added
+
+- Added video streaming support to `LiveKitTransport`.
+
+- Added `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService` which provide
+  access to OpenAI Realtime.
+
+### Removed
+
+- Remove `VisionImageRawFrame` in favor of context frames (`LLMContextFrame` or
+  `OpenAILLMContextFrame`).
+
+### Deprecated
+
+- Deprecate `VisionImageFrameAggregator` because `VisionImageRawFrame` has been
+  removed. See the `12*` examples for the new recommended replacement pattern.
+
+- `NoisereduceFilter` is now deprecated and will be removed in a future
+  version. Use other audio filters like `KrispFilter` or `AICFilter`.
+
+- Deprecated `OpenAIRealtimeBetaLLMService` and `AzureRealtimeBetaLLMService`.
+  Use `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService`, respectively.
+  Each service will be removed in an upcoming version, 1.0.0.
+
+### Fixed
+
+- Fixed a `LiveKitTransport` issue where RTVI messages were not properly
+  encoded.
+
+- Add additional fixups to Mistral context messages to ensure they meet
+  Mistral-specific requirements, avoiding Mistral "invalid request" errors.
+
+- Fixed `DailyTransport` transcription handling to gracefully handle missing
+  `rawResponse` field in transcription messages, preventing KeyError crashes.
+
+## [0.0.84] - 2025-09-05
+
+### Added
+
+- Add the ability to send DTMF to `LiveKitTransport`.
+
+- Expanded support for universal `LLMContext` to the Anthropic LLM service.
+  Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is
+  a pre-requisite for using `LLMSwitcher` to switch between LLMs at runtime.
+
+### Changed
+
+- Updated `daily-python` to 0.19.9.
+
+- Restored `DailyTransport`'s native DTMF support using Daily's `send_dtmf()`
+  method instead of generated audio tones.
+
+### Fixed
+
+- Fixed a `AWSBedrockLLMService` crash caused by an extra `await`.
+
+- Fixed a `OpenAIImageGenService` issue where it was not creating
+  `URLImageRawFrame` correctly.
+
 ## [0.0.83] - 2025-09-03

 ### Added

+- Added multilingual support for AsyncAI in `AsyncAITTSService` and `AsyncAIHttpTTSService`.
+
+  - New `languages`: `es`, `fr`, `de`, `it`.
+
 - Added new frames `InputTransportMessageUrgentFrame` and
  `DailyInputTransportMessageUrgentFrame` for transport messages received from
  external sources.
-  
+
 - Added `UserSpeakingFrame`. This will be sent upstream and downstream while VAD
  detects the user is speaking.

@@ -82,7 +147,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 - Added new config parameters to `GladiaSTTService`.
  - PreProcessingConfig > `audio_enhancer` to enhance audio quality.
-  - CustomVocabularyItem > `pronunciations` and `language` to specify special pronunciations and in which language it will be pronounced.
+  - CustomVocabularyItem > `pronunciations` and `language` to specify special
+    pronunciations and in which language it will be pronounced.

 ### Changed

@@ -101,7 +167,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `pipecat.frames.frames.KeypadEntry` is deprecated and has been moved to
  `pipecat.audio.dtmf.types.KeypadEntry`.

- Updated `RimeTTSService`'s flush_audio message to conform with Rime's official API.
+- Updated `RimeTTSService`'s flush_audio message to conform with Rime's official
+  API.

 - Updated the default model for `CerebrasLLMService` to GPT-OSS-120B.

--- a/README.md
+++ b/README.md
@@ -28,6 +28,41 @@
 - **Composable Pipelines**: Build complex behavior from modular components
 - **Real-Time**: Ultra-low latency interaction with different transports (e.g. WebSockets or WebRTC)

+## 📱 Client SDKs
+
+You can connect to Pipecat from any platform using our official SDKs:
+
+<table>
+  <tr>
+    <td>
+      <img src="https://cdn.jsdelivr.net/gh/devicons/devicon/icons/javascript/javascript-original.svg" width="40" height="40" alt="JavaScript"/>
+      <a href="https://docs.pipecat.ai/client/js/introduction">JavaScript</a>
+    </td>
+    <td>
+      <img src="https://cdn.jsdelivr.net/gh/devicons/devicon/icons/react/react-original.svg" width="40" height="40" alt="React"/>
+      <a href="https://docs.pipecat.ai/client/react/introduction">React</a>
+    </td>
+    <td>
+      <img src="https://cdn.jsdelivr.net/gh/devicons/devicon/icons/react/react-original.svg" width="40" height="40" alt="React Native"/>
+      <a href="https://docs.pipecat.ai/client/react-native/introduction">React Native</a>
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <img src="https://cdn.jsdelivr.net/gh/devicons/devicon/icons/swift/swift-original.svg" width="40" height="40" alt="Swift"/>
+      <a href="https://docs.pipecat.ai/client/ios/introduction">Swift</a>
+    </td>
+    <td>
+      <img src="https://cdn.jsdelivr.net/gh/devicons/devicon/icons/kotlin/kotlin-original.svg" width="40" height="40" alt="Kotlin"/>
+      <a href="https://docs.pipecat.ai/client/android/introduction">Kotlin</a>
+    </td>
+    <td>
+      <img src="https://cdn.jsdelivr.net/gh/devicons/devicon/icons/cplusplus/cplusplus-original.svg" width="40" height="40" alt="JavaScript"/>
+      <a href="https://docs.pipecat.ai/client/c++/introduction">C++</a>
+    </td>
+  </tr>
+</table>
+
 ## 🎬 See it in action

 <p float="left">
@@ -38,17 +73,6 @@
    <a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/moondream-chatbot/image.png" width="400" /></a>
 </p>

-## 📱 Client SDKs
-
-You can connect to Pipecat from any platform using our official SDKs:
-
-| Platform | SDK Repo                                                                       | Description                      |
-| -------- | ------------------------------------------------------------------------------ | -------------------------------- |
-| Web      | [pipecat-client-web](https://github.com/pipecat-ai/pipecat-client-web)         | JavaScript and React client SDKs |
-| iOS      | [pipecat-client-ios](https://github.com/pipecat-ai/pipecat-client-ios)         | Swift SDK for iOS                |
-| Android  | [pipecat-client-android](https://github.com/pipecat-ai/pipecat-client-android) | Kotlin SDK for Android           |
-| C++      | [pipecat-client-cxx](https://github.com/pipecat-ai/pipecat-client-cxx)         | C++ client SDK                   |
-
 ## 🧩 Available services

 | Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
@@ -62,7 +86,7 @@ You can connect to Pipecat from any platform using our official SDKs:
 | Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
 | Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
 | Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
-| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
 | Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |

 📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
--- a/examples/foundational/07-interruptible.py
+++ b/examples/foundational/07-interruptible.py
@@ -4,17 +4,19 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os

 from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import LLMRunFrame
+from pipecat.frames.frames import Frame, LLMFullResponseEndFrame, LLMRunFrame, LLMTextFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -26,6 +28,62 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams

 load_dotenv(override=True)

+
+class DelayProcessor(FrameProcessor):
+    """Custom processor that queues LLM text frames until response is complete.
+
+    This creates a more natural conversation flow by preventing the agent from
+    responding immediately after the user stops speaking. It queues all LLMTextFrames
+    until it sees an LLMFullResponseEndFrame, then waits for the specified delay
+    before releasing all queued frames at once.
+    """
+
+    def __init__(self, *, delay_seconds: float = 1.0, **kwargs) -> None:
+        """Initialize the DelayProcessor.
+
+        Args:
+            delay_seconds: Number of seconds to delay before releasing queued frames (default: 1.0)
+        """
+        super().__init__(**kwargs)
+        self._delay_seconds = delay_seconds
+        self._queued_frames = []
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection) -> None:
+        """Process frames, queuing LLM text frames until response is complete.
+
+        Args:
+            frame: The frame to process
+            direction: Direction of the frame in the pipeline
+        """
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, LLMTextFrame):
+            # Queue LLM text frames instead of pushing them immediately
+            logger.debug(f"Queuing LLMTextFrame: {frame.text}")
+            self._queued_frames.append((frame, direction))
+        elif isinstance(frame, LLMFullResponseEndFrame):
+            # When we see the end frame, wait for delay then push all queued frames
+            logger.debug(
+                f"LLM response complete, delaying {self._delay_seconds} seconds before releasing {len(self._queued_frames)} queued frames"
+            )
+            await asyncio.sleep(self._delay_seconds)
+
+            # Push all queued LLM text frames
+            for queued_frame, queued_direction in self._queued_frames:
+                logger.debug(f"Releasing queued LLMTextFrame: {queued_frame.text}")
+                await self.push_frame(queued_frame, queued_direction)
+
+            # Clear the queue
+            self._queued_frames.clear()
+
+            # Push the end frame
+            logger.debug("Pushing LLMFullResponseEndFrame")
+            await self.push_frame(frame, direction)
+        else:
+            # Push all other frames immediately
+            await self.push_frame(frame, direction)
+
+
 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
 # selected.
@@ -70,12 +128,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

+    # Create delay processor to add 1-second delay before agent responses
+    delay_processor = DelayProcessor(delay_seconds=1.0)
+
    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
            stt,
            context_aggregator.user(),  # User responses
            llm,  # LLM
+            delay_processor,  # Add delay before TTS
            tts,  # TTS
            transport.output(),  # Transport bot output
            context_aggregator.assistant(),  # Assistant spoken responses
--- a/examples/foundational/07s-interruptible-google-audio-in.py
+++ b/examples/foundational/07s-interruptible-google-audio-in.py
@@ -93,9 +93,8 @@ class UserAudioCollector(FrameProcessor):
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self._user_speaking = False
            self._context.add_audio_frames_message(audio_frames=self._audio_frames)
-            await self._user_context_aggregator.push_frame(
-                self._user_context_aggregator.get_context_frame()
-            )
+            await self._user_context_aggregator.push_frame(LLMRunFrame())
+
        elif isinstance(frame, InputAudioRawFrame):
            if self._user_speaking:
                self._audio_frames.append(frame)
@@ -151,7 +150,7 @@ class TranscriptExtractor(FrameProcessor):
        await self.push_frame(frame, direction)


-class TanscriptionContextFixup(FrameProcessor):
+class TranscriptionContextFixup(FrameProcessor):
    def __init__(self, context):
        super().__init__()
        self._context = context
@@ -245,7 +244,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    context_aggregator = llm.create_context_aggregator(context)
    audio_collector = UserAudioCollector(context, context_aggregator.user())
    pull_transcript_out_of_llm_output = TranscriptExtractor(context)
-    fixup_context_messages = TanscriptionContextFixup(context)
+    fixup_context_messages = TranscriptionContextFixup(context)

    pipeline = Pipeline(
        [
--- a/examples/foundational/12-describe-video.py
+++ b/examples/foundational/12-describe-video.py
@@ -11,12 +11,19 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import Frame, TextFrame, TTSSpeakFrame, UserImageRequestFrame
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
-from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
@@ -34,6 +41,8 @@ load_dotenv(override=True)


 class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""
+
    def __init__(self, participant_id: Optional[str] = None):
        super().__init__()
        self._participant_id = participant_id
@@ -46,9 +55,32 @@ class UserImageRequester(FrameProcessor):

        if self._participant_id and isinstance(frame, TextFrame):
            await self.push_frame(
-                UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
            )
-        await self.push_frame(frame, direction)
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -78,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Initialize the image requester without setting the participant ID yet
    image_requester = UserImageRequester()

-    vision_aggregator = VisionImageFrameAggregator()
+    image_processor = UserImageProcessor()

    # If you run into weird description, try with use_cpu=True
    moondream = MoondreamService()
@@ -96,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            stt,
            user_response,
            image_requester,
-            vision_aggregator,
+            image_processor,
            moondream,
            tts,
            transport.output(),
@@ -119,7 +151,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        image_requester.set_participant_id(client_id)

        # Welcome message
-        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me what I see."))
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/12a-describe-video-gemini-flash.py
+++ b/examples/foundational/12a-describe-video-gemini-flash.py
@@ -11,12 +11,19 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import Frame, TextFrame, TTSSpeakFrame, UserImageRequestFrame
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
-from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
@@ -34,6 +41,8 @@ load_dotenv(override=True)


 class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""
+
    def __init__(self, participant_id: Optional[str] = None):
        super().__init__()
        self._participant_id = participant_id
@@ -46,9 +55,32 @@ class UserImageRequester(FrameProcessor):

        if self._participant_id and isinstance(frame, TextFrame):
            await self.push_frame(
-                UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
            )
-        await self.push_frame(frame, direction)
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -78,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Initialize the image requester without setting the participant ID yet
    image_requester = UserImageRequester()

-    vision_aggregator = VisionImageFrameAggregator()
+    image_processor = UserImageProcessor()

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

@@ -96,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            stt,
            user_response,
            image_requester,
-            vision_aggregator,
+            image_processor,
            google,
            tts,
            transport.output(),
@@ -123,7 +155,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        image_requester.set_participant_id(client_id)

        # Welcome message
-        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me what I see."))
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/12b-describe-video-gpt-4o.py
+++ b/examples/foundational/12b-describe-video-gpt-4o.py
@@ -11,12 +11,19 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import Frame, TextFrame, TTSSpeakFrame, UserImageRequestFrame
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
-from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
@@ -34,6 +41,8 @@ load_dotenv(override=True)


 class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""
+
    def __init__(self, participant_id: Optional[str] = None):
        super().__init__()
        self._participant_id = participant_id
@@ -46,9 +55,32 @@ class UserImageRequester(FrameProcessor):

        if self._participant_id and isinstance(frame, TextFrame):
            await self.push_frame(
-                UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
            )
-        await self.push_frame(frame, direction)
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -78,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Initialize the image requester without setting the participant ID yet
    image_requester = UserImageRequester()

-    vision_aggregator = VisionImageFrameAggregator()
+    image_processor = UserImageProcessor()

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

@@ -96,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            stt,
            user_response,
            image_requester,
-            vision_aggregator,
+            image_processor,
            openai,
            tts,
            transport.output(),
@@ -123,7 +155,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        image_requester.set_participant_id(client_id)

        # Welcome message
-        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me what I see."))
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/12c-describe-video-anthropic.py
+++ b/examples/foundational/12c-describe-video-anthropic.py
@@ -11,12 +11,19 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import Frame, TextFrame, TTSSpeakFrame, UserImageRequestFrame
+from pipecat.frames.frames import (
+    Frame,
+    LLMContextFrame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
-from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import (
@@ -34,6 +41,8 @@ load_dotenv(override=True)


 class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""
+
    def __init__(self, participant_id: Optional[str] = None):
        super().__init__()
        self._participant_id = participant_id
@@ -46,9 +55,32 @@ class UserImageRequester(FrameProcessor):

        if self._participant_id and isinstance(frame, TextFrame):
            await self.push_frame(
-                UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
            )
-        await self.push_frame(frame, direction)
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                context = LLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = LLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)


 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -78,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Initialize the image requester without setting the participant ID yet
    image_requester = UserImageRequester()

-    vision_aggregator = VisionImageFrameAggregator()
+    image_processor = UserImageProcessor()

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

@@ -96,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            stt,
            user_response,
            image_requester,
-            vision_aggregator,
+            image_processor,
            anthropic,
            tts,
            transport.output(),
@@ -123,7 +155,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        image_requester.set_participant_id(client_id)

        # Welcome message
-        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me what I see."))
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/12d-describe-video-aws.py
+++ b/examples/foundational/12d-describe-video-aws.py
@@ -0,0 +1,186 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import os
+from typing import Optional
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import (
+    Frame,
+    TextFrame,
+    TTSSpeakFrame,
+    UserImageRawFrame,
+    UserImageRequestFrame,
+)
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import (
+    OpenAILLMContext,
+    OpenAILLMContextFrame,
+)
+from pipecat.processors.aggregators.user_response import UserResponseAggregator
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import (
+    create_transport,
+    get_transport_client_id,
+    maybe_capture_participant_camera,
+)
+from pipecat.services.aws.llm import AWSBedrockLLMService
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+
+load_dotenv(override=True)
+
+
+class UserImageRequester(FrameProcessor):
+    """Converts incoming text into requests for user images."""
+
+    def __init__(self, participant_id: Optional[str] = None):
+        super().__init__()
+        self._participant_id = participant_id
+
+    def set_participant_id(self, participant_id: str):
+        self._participant_id = participant_id
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if self._participant_id and isinstance(frame, TextFrame):
+            await self.push_frame(
+                UserImageRequestFrame(self._participant_id, context=frame.text),
+                FrameDirection.UPSTREAM,
+            )
+        else:
+            await self.push_frame(frame, direction)
+
+
+class UserImageProcessor(FrameProcessor):
+    """Converts incoming user images into context frames."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, UserImageRawFrame):
+            if frame.request and frame.request.context:
+                # Note: AWS Bedrock does not yet support the universal LLMContext
+                context = OpenAILLMContext()
+                context.add_image_frame_message(
+                    image=frame.image,
+                    text=frame.request.context,
+                    size=frame.size,
+                    format=frame.format,
+                )
+                frame = OpenAILLMContextFrame(context)
+                await self.push_frame(frame)
+        else:
+            await self.push_frame(frame, direction)
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        video_in_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        video_in_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    user_response = UserResponseAggregator()
+
+    # Initialize the image requester without setting the participant ID yet
+    image_requester = UserImageRequester()
+
+    image_processor = UserImageProcessor()
+
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+    # AWS for vision analysis
+    aws = AWSBedrockLLMService(
+        aws_region="us-west-2",
+        model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+    )
+
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            stt,
+            user_response,
+            image_requester,
+            image_processor,
+            aws,
+            tts,
+            transport.output(),
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected: {client}")
+
+        await maybe_capture_participant_camera(transport, client)
+
+        # Set the participant ID in the image requester
+        client_id = get_transport_client_id(transport, client)
+        image_requester.set_participant_id(client_id)
+
+        # Welcome message
+        await task.queue_frame(TTSSpeakFrame("Hi there! Feel free to ask me about what I see."))
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/13-whisper-transcription.py
+++ b/examples/foundational/13-whisper-transcription.py
@@ -31,6 +31,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13a-whisper-local.py
+++ b/examples/foundational/13a-whisper-local.py
@@ -32,6 +32,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 async def main():
    transport = LocalAudioTransport(
--- a/examples/foundational/13b-deepgram-transcription.py
+++ b/examples/foundational/13b-deepgram-transcription.py
@@ -31,6 +31,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13c-gladia-transcription.py
+++ b/examples/foundational/13c-gladia-transcription.py
@@ -31,6 +31,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13c-gladia-translation.py
+++ b/examples/foundational/13c-gladia-translation.py
@@ -40,6 +40,9 @@ class TranscriptionLogger(FrameProcessor):
        elif isinstance(frame, TranslationFrame):
            print(f"Translation ({frame.language}): {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13d-assemblyai-transcription.py
+++ b/examples/foundational/13d-assemblyai-transcription.py
@@ -31,6 +31,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13e-whisper-mlx.py
+++ b/examples/foundational/13e-whisper-mlx.py
@@ -52,6 +52,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            self._last_transcription_time = time.time()

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13f-cartesia-transcription.py
+++ b/examples/foundational/13f-cartesia-transcription.py
@@ -31,6 +31,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13g-sambanova-transcription.py
+++ b/examples/foundational/13g-sambanova-transcription.py
@@ -53,6 +53,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            self._last_transcription_time = time.time()

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13h-speechmatics-transcription.py
+++ b/examples/foundational/13h-speechmatics-transcription.py
@@ -32,6 +32,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
 # instantiated. The function will be called when the desired transport gets
--- a/examples/foundational/13i-soniox-transcription.py
+++ b/examples/foundational/13i-soniox-transcription.py
@@ -32,6 +32,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 transport_params = {
    "daily": lambda: DailyParams(
--- a/examples/foundational/13j-azure-transcription.py
+++ b/examples/foundational/13j-azure-transcription.py
@@ -32,6 +32,9 @@ class TranscriptionLogger(FrameProcessor):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")

+        # Push all frames through
+        await self.push_frame(frame, direction)
+

 transport_params = {
    "daily": lambda: DailyParams(
--- a/examples/foundational/14b-function-calling-anthropic-video.py
+++ b/examples/foundational/14b-function-calling-anthropic-video.py
@@ -97,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = AnthropicLLMService(
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        model="claude-3-7-sonnet-latest",
-        enable_prompt_caching_beta=True,
+        params=AnthropicLLMService.InputParams(enable_prompt_caching=True),
    )
    llm.register_function("get_weather", get_weather)
    llm.register_function("get_image", get_image)
--- a/examples/foundational/14z-function-calling-anthropic-universal-context.py
+++ b/examples/foundational/14z-function-calling-anthropic-universal-context.py
@@ -0,0 +1,211 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import asyncio
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import (
+    create_transport,
+    get_transport_client_id,
+    maybe_capture_participant_camera,
+)
+from pipecat.services.anthropic.llm import AnthropicLLMService
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.services.daily import DailyParams
+
+load_dotenv(override=True)
+
+
+# Global variable to store the client ID
+client_id = ""
+
+
+async def get_weather(params: FunctionCallParams):
+    location = params.arguments["location"]
+    await params.result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
+
+
+async def get_image(params: FunctionCallParams):
+    question = params.arguments["question"]
+    logger.debug(f"Requesting image with user_id={client_id}, question={question}")
+
+    # Request the image frame
+    await params.llm.request_image_frame(
+        user_id=client_id,
+        function_name=params.function_name,
+        tool_call_id=params.tool_call_id,
+        text_content=question,
+    )
+
+    # Wait a short time for the frame to be processed
+    await asyncio.sleep(0.5)
+
+    # Return a result to complete the function call
+    await params.result_callback(
+        f"I've captured an image from your camera and I'm analyzing what you asked about: {question}"
+    )
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        video_in_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        video_in_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+    )
+
+    llm = AnthropicLLMService(
+        api_key=os.getenv("ANTHROPIC_API_KEY"),
+        model="claude-3-7-sonnet-latest",
+        params=AnthropicLLMService.InputParams(enable_prompt_caching=True),
+    )
+    llm.register_function("get_weather", get_weather)
+    llm.register_function("get_image", get_image)
+
+    weather_function = FunctionSchema(
+        name="get_weather",
+        description="Get the current weather",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+        },
+        required=["location"],
+    )
+    get_image_function = FunctionSchema(
+        name="get_image",
+        description="Get an image from the video stream.",
+        properties={
+            "question": {
+                "type": "string",
+                "description": "The question that the user is asking about the image.",
+            }
+        },
+        required=["question"],
+    )
+    tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
+
+    system_prompt = """\
+You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
+
+Your response will be turned into speech so use only simple words and punctuation.
+
+You have access to two tools: get_weather and get_image.
+
+You can respond to questions about the weather using the get_weather tool.
+
+You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
+indicate you should use the get_image tool are:
+- What do you see?
+- What's in the video?
+- Can you describe the video?
+- Tell me about what you see.
+- Tell me something interesting about what you see.
+- What's happening in the video?
+
+If you need to use a tool, simply use the tool. Do not tell the user the tool you are using. Be brief and concise.
+    """
+
+    messages = [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": "Start the conversation by introducing yourself."},
+    ]
+
+    context = LLMContext(messages, tools)
+    context_aggregator = LLMContextAggregatorPair(context)
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            context_aggregator.user(),  # User speech to text
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            context_aggregator.assistant(),  # Assistant spoken responses and tool context
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected: {client}")
+
+        await maybe_capture_participant_camera(transport, client)
+
+        global client_id
+        client_id = get_transport_client_id(transport, client)
+
+        # Kick off the conversation.
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/19-openai-realtime.py
+++ b/examples/foundational/19-openai-realtime.py
@@ -0,0 +1,228 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame, TranscriptionMessage
+from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.processors.transcript_processor import TranscriptProcessor
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai_realtime import (
+    InputAudioNoiseReduction,
+    InputAudioTranscription,
+    OpenAIRealtimeLLMService,
+    SemanticTurnDetection,
+    SessionProperties,
+)
+from pipecat.services.openai_realtime.events import AudioConfiguration, AudioInput
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the users location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+restaurant_function = FunctionSchema(
+    name="get_restaurant_recommendation",
+    description="Get a restaurant recommendation",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+    },
+    required=["location"],
+)
+
+# Create tools schema
+tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    session_properties = SessionProperties(
+        audio=AudioConfiguration(
+            input=AudioInput(
+                transcription=InputAudioTranscription(),
+                # Set openai TurnDetection parameters. Not setting this at all will turn it
+                # on by default
+                turn_detection=SemanticTurnDetection(),
+                # Or set to False to disable openai turn detection and use transport VAD
+                # turn_detection=False,
+                noise_reduction=InputAudioNoiseReduction(type="near_field"),
+            )
+        ),
+        # tools=tools,
+        instructions="""You are a helpful and friendly AI.
+
+Act like a human, but remember that you aren't a human and that you can't do human
+things in the real world. Your voice and personality should be warm and engaging, with a lively and
+playful tone.
+
+If interacting in a non-English language, start by using the standard accent or dialect familiar to
+the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
+even if you're asked about them.
+
+You are participating in a voice conversation. Keep your responses concise, short, and to the point
+unless specifically asked to elaborate on a topic.
+
+You have access to the following tools:
+- get_current_weather: Get the current weather for a given location.
+- get_restaurant_recommendation: Get a restaurant recommendation for a given location.
+
+Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
+    )
+
+    llm = OpenAIRealtimeLLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        session_properties=session_properties,
+        start_audio_paused=False,
+    )
+
+    # you can either register a single function for all function calls, or specific functions
+    # llm.register_function(None, fetch_weather_from_api)
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+
+    transcript = TranscriptProcessor()
+
+    # Create a standard OpenAI LLM context object using the normal messages format. The
+    # OpenAIRealtimeLLMService will convert this internally to messages that the
+    # openai WebSocket API can understand.
+    context = OpenAILLMContext(
+        [{"role": "user", "content": "Say hello!"}],
+        tools,
+    )
+
+    context_aggregator = llm.create_context_aggregator(context)
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            context_aggregator.user(),
+            llm,  # LLM
+            transcript.user(),  # Placed after the LLM, as LLM pushes TranscriptionFrames downstream
+            transport.output(),  # Transport bot output
+            transcript.assistant(),  # After the transcript output, to time with the audio output
+            context_aggregator.assistant(),
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[TranscriptionLogObserver()],
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    # Register event handler for transcript updates
+    @transcript.event_handler("on_transcript_update")
+    async def on_transcript_update(processor, frame):
+        for msg in frame.messages:
+            if isinstance(msg, TranscriptionMessage):
+                timestamp = f"[{msg.timestamp}] " if msg.timestamp else ""
+                line = f"{timestamp}{msg.role}: {msg.content}"
+                logger.info(f"Transcript: {line}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/19a-azure-realtime.py
+++ b/examples/foundational/19a-azure-realtime.py
@@ -0,0 +1,221 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai_realtime import (
+    AzureRealtimeLLMService,
+    InputAudioTranscription,
+    SessionProperties,
+)
+from pipecat.services.openai_realtime.events import AudioConfiguration, AudioInput
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+# Define weather function using standardized schema
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the users location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+restaurant_function = FunctionSchema(
+    name="get_restaurant_recommendation",
+    description="Get a restaurant recommendation",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+    },
+    required=["location"],
+)
+
+# Create tools schema
+tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    session_properties = SessionProperties(
+        audio=AudioConfiguration(
+            input=AudioInput(
+                transcription=InputAudioTranscription(model="whisper-1"),
+                # Set openai TurnDetection parameters. Not setting this at all will turn it
+                # on by default
+                # turn_detection=TurnDetection(silence_duration_ms=1000),
+                # Or set to False to disable openai turn detection and use transport VAD
+                # turn_detection=False,
+            )
+        ),
+        # tools=tools,
+        instructions="""You are a helpful and friendly AI.
+
+Act like a human, but remember that you aren't a human and that you can't do human
+things in the real world. Your voice and personality should be warm and engaging, with a lively and
+playful tone.
+
+If interacting in a non-English language, start by using the standard accent or dialect familiar to
+the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
+even if you're asked about them.
+-
+You are participating in a voice conversation. Keep your responses concise, short, and to the point
+unless specifically asked to elaborate on a topic.
+
+You have access to the following tools:
+- get_current_weather: Get the current weather for a given location.
+- get_restaurant_recommendation: Get a restaurant recommendation for a given location.
+
+Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
+    )
+
+    llm = AzureRealtimeLLMService(
+        api_key=os.getenv("AZURE_REALTIME_API_KEY"),
+        base_url=os.getenv("AZURE_REALTIME_BASE_URL"),
+        session_properties=session_properties,
+        start_audio_paused=False,
+    )
+
+    # you can either register a single function for all function calls, or specific functions
+    # llm.register_function(None, fetch_weather_from_api)
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+
+    # Create a standard OpenAI LLM context object using the normal messages format. The
+    # OpenAIRealtimeBetaLLMService will convert this internally to messages that the
+    # openai WebSocket API can understand.
+    context = OpenAILLMContext(
+        [{"role": "user", "content": "Say hello!"}],
+        # [{"role": "user", "content": [{"type": "text", "text": "Say hello!"}]}],
+        #     [
+        #         {
+        #             "role": "user",
+        #             "content": [
+        #                 {"type": "text", "text": "Say"},
+        #                 {"type": "text", "text": "yo what's up!"},
+        #             ],
+        #         }
+        #     ],
+        tools,
+    )
+
+    context_aggregator = llm.create_context_aggregator(context)
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            context_aggregator.user(),
+            llm,  # LLM
+            transport.output(),  # Transport bot output
+            context_aggregator.assistant(),
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/19b-openai-realtime-beta-text.py
+++ b/examples/foundational/19b-openai-realtime-beta-text.py
@@ -31,6 +31,7 @@ from pipecat.services.openai_realtime_beta import (
    SemanticTurnDetection,
    SessionProperties,
 )
+from pipecat.services.openai_realtime_beta.events import AudioConfiguration, AudioInput
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -113,14 +114,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    session_properties = SessionProperties(
-        input_audio_transcription=InputAudioTranscription(),
-        modalities=["text"],
-        # Set openai TurnDetection parameters. Not setting this at all will turn it
-        # on by default
-        turn_detection=SemanticTurnDetection(),
-        # Or set to False to disable openai turn detection and use transport VAD
-        # turn_detection=False,
-        input_audio_noise_reduction=InputAudioNoiseReduction(type="near_field"),
+        audio=AudioConfiguration(
+            input=AudioInput(
+                transcription=InputAudioTranscription(),
+                # Set openai TurnDetection parameters. Not setting this at all will turn it
+                # on by default
+                turn_detection=SemanticTurnDetection(),
+                # Or set to False to disable openai turn detection and use transport VAD
+                # turn_detection=False,
+                noise_reduction=InputAudioNoiseReduction(type="near_field"),
+            )
+        ),
+        output_modalities=["text"],
        # tools=tools,
        instructions="""You are a helpful and friendly AI.

--- a/examples/foundational/19b-openai-realtime-text.py
+++ b/examples/foundational/19b-openai-realtime-text.py
@@ -0,0 +1,234 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame, TranscriptionMessage
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.processors.transcript_processor import TranscriptProcessor
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai_realtime import (
+    InputAudioNoiseReduction,
+    InputAudioTranscription,
+    OpenAIRealtimeLLMService,
+    SemanticTurnDetection,
+    SessionProperties,
+)
+from pipecat.services.openai_realtime.events import AudioConfiguration, AudioInput
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+weather_function = FunctionSchema(
+    name="get_current_weather",
+    description="Get the current weather",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+        "format": {
+            "type": "string",
+            "enum": ["celsius", "fahrenheit"],
+            "description": "The temperature unit to use. Infer this from the users location.",
+        },
+    },
+    required=["location", "format"],
+)
+
+restaurant_function = FunctionSchema(
+    name="get_restaurant_recommendation",
+    description="Get a restaurant recommendation",
+    properties={
+        "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+        },
+    },
+    required=["location"],
+)
+
+# Create tools schema
+tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    session_properties = SessionProperties(
+        audio=AudioConfiguration(
+            input=AudioInput(
+                transcription=InputAudioTranscription(),
+                # Set openai TurnDetection parameters. Not setting this at all will turn it
+                # on by default
+                turn_detection=SemanticTurnDetection(),
+                # Or set to False to disable openai turn detection and use transport VAD
+                # turn_detection=False,
+                noise_reduction=InputAudioNoiseReduction(type="near_field"),
+            )
+        ),
+        output_modalities=["text"],
+        # tools=tools,
+        instructions="""You are a helpful and friendly AI.
+
+Act like a human, but remember that you aren't a human and that you can't do human
+things in the real world. Your voice and personality should be warm and engaging, with a lively and
+playful tone.
+
+If interacting in a non-English language, start by using the standard accent or dialect familiar to
+the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
+even if you're asked about them.
+
+You are participating in a voice conversation. Keep your responses concise, short, and to the point
+unless specifically asked to elaborate on a topic.
+
+You have access to the following tools:
+- get_current_weather: Get the current weather for a given location.
+- get_restaurant_recommendation: Get a restaurant recommendation for a given location.
+
+Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
+    )
+
+    llm = OpenAIRealtimeLLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        session_properties=session_properties,
+        start_audio_paused=False,
+    )
+
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+    )
+
+    # you can either register a single function for all function calls, or specific functions
+    # llm.register_function(None, fetch_weather_from_api)
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+
+    transcript = TranscriptProcessor()
+
+    # Create a standard OpenAI LLM context object using the normal messages format. The
+    # OpenAIRealtimeLLMService will convert this internally to messages that the
+    # openai WebSocket API can understand.
+    context = OpenAILLMContext(
+        [{"role": "user", "content": "Say hello!"}],
+        tools,
+    )
+
+    context_aggregator = llm.create_context_aggregator(context)
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            context_aggregator.user(),
+            llm,  # LLM
+            tts,  # TTS
+            transcript.user(),  # Placed after the LLM, as LLM pushes TranscriptionFrames downstream
+            transport.output(),  # Transport bot output
+            transcript.assistant(),  # After the transcript output, to time with the audio output
+            context_aggregator.assistant(),
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    # Register event handler for transcript updates
+    @transcript.event_handler("on_transcript_update")
+    async def on_transcript_update(processor, frame):
+        for msg in frame.messages:
+            if isinstance(msg, TranscriptionMessage):
+                timestamp = f"[{msg.timestamp}] " if msg.timestamp else ""
+                line = f"{timestamp}{msg.role}: {msg.content}"
+                logger.info(f"Transcript: {line}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/20b-persistent-context-openai-realtime-beta.py
+++ b/examples/foundational/20b-persistent-context-openai-realtime-beta.py
@@ -0,0 +1,274 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import glob
+import json
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import (
+    OpenAILLMContext,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.services.openai_realtime_beta import (
+    InputAudioTranscription,
+    OpenAIRealtimeBetaLLMService,
+    SessionProperties,
+    TurnDetection,
+)
+from pipecat.services.openai_realtime_beta.events import AudioConfiguration, AudioInput
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+BASE_FILENAME = "/tmp/pipecat_conversation_"
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def get_saved_conversation_filenames(params: FunctionCallParams):
+    # Construct the full pattern including the BASE_FILENAME
+    full_pattern = f"{BASE_FILENAME}*.json"
+
+    # Use glob to find all matching files
+    matching_files = glob.glob(full_pattern)
+    logger.debug(f"matching files: {matching_files}")
+
+    await params.result_callback({"filenames": matching_files})
+
+
+async def save_conversation(params: FunctionCallParams):
+    timestamp = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
+    filename = f"{BASE_FILENAME}{timestamp}.json"
+    logger.debug(
+        f"writing conversation to {filename}\n{json.dumps(params.context.messages, indent=4)}"
+    )
+    try:
+        with open(filename, "w") as file:
+            messages = params.context.get_messages_for_persistent_storage()
+            # remove the last message, which is the instruction we just gave to save the conversation
+            messages.pop()
+            json.dump(messages, file, indent=2)
+        await params.result_callback({"success": True})
+    except Exception as e:
+        await params.result_callback({"success": False, "error": str(e)})
+
+
+async def load_conversation(params: FunctionCallParams):
+    async def _reset():
+        filename = params.arguments["filename"]
+        logger.debug(f"loading conversation from {filename}")
+        try:
+            with open(filename, "r") as file:
+                params.context.set_messages(json.load(file))
+                await params.llm.reset_conversation()
+                await params.llm._create_response()
+        except Exception as e:
+            await params.result_callback({"success": False, "error": str(e)})
+
+    asyncio.create_task(_reset())
+
+
+tools = [
+    {
+        "type": "function",
+        "name": "get_current_weather",
+        "description": "Get the current weather",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA",
+                },
+                "format": {
+                    "type": "string",
+                    "enum": ["celsius", "fahrenheit"],
+                    "description": "The temperature unit to use. Infer this from the users location.",
+                },
+            },
+            "required": ["location", "format"],
+        },
+    },
+    {
+        "type": "function",
+        "name": "save_conversation",
+        "description": "Save the current conversatione. Use this function to persist the current conversation to external storage.",
+        "parameters": {
+            "type": "object",
+            "properties": {},
+            "required": [],
+        },
+    },
+    {
+        "type": "function",
+        "name": "get_saved_conversation_filenames",
+        "description": "Get a list of saved conversation histories. Returns a list of filenames. Each filename includes a date and timestamp. Each file is conversation history that can be loaded into this session.",
+        "parameters": {
+            "type": "object",
+            "properties": {},
+            "required": [],
+        },
+    },
+    {
+        "type": "function",
+        "name": "load_conversation",
+        "description": "Load a conversation history. Use this function to load a conversation history into the current session.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "filename": {
+                    "type": "string",
+                    "description": "The filename of the conversation history to load.",
+                }
+            },
+            "required": ["filename"],
+        },
+    },
+]
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+    session_properties = SessionProperties(
+        audio=AudioConfiguration(
+            input=AudioInput(
+                transcription=InputAudioTranscription(),
+                # Set openai TurnDetection parameters. Not setting this at all will turn it
+                # on by default
+                turn_detection=TurnDetection(silence_duration_ms=1000),
+                # Or set to False to disable openai turn detection and use transport VAD
+                # turn_detection=False,
+            )
+        ),
+        # tools=tools,
+        instructions="""Your knowledge cutoff is 2023-10. You are a helpful and friendly AI.
+
+Act like a human, but remember that you aren't a human and that you can't do human
+things in the real world. Your voice and personality should be warm and engaging, with a lively and
+playful tone.
+
+If interacting in a non-English language, start by using the standard accent or dialect familiar to
+the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
+even if you're asked about them.
+-
+You are participating in a voice conversation. Keep your responses concise, short, and to the point
+unless specifically asked to elaborate on a topic.
+
+Remember, your responses should be short. Just one or two sentences, usually.""",
+    )
+
+    llm = OpenAIRealtimeBetaLLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        session_properties=session_properties,
+        start_audio_paused=False,
+    )
+
+    # you can either register a single function for all function calls, or specific functions
+    # llm.register_function(None, fetch_weather_from_api)
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("save_conversation", save_conversation)
+    llm.register_function("get_saved_conversation_filenames", get_saved_conversation_filenames)
+    llm.register_function("load_conversation", load_conversation)
+
+    context = OpenAILLMContext([], tools)
+    context_aggregator = llm.create_context_aggregator(context)
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            context_aggregator.user(),
+            llm,  # LLM
+            transport.output(),  # Transport bot output
+            context_aggregator.assistant(),
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/20b-persistent-context-openai-realtime.py
+++ b/examples/foundational/20b-persistent-context-openai-realtime.py
@@ -25,12 +25,13 @@ from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.llm_service import FunctionCallParams
-from pipecat.services.openai_realtime_beta import (
+from pipecat.services.openai_realtime import (
    InputAudioTranscription,
-    OpenAIRealtimeBetaLLMService,
+    OpenAIRealtimeLLMService,
    SessionProperties,
    TurnDetection,
 )
+from pipecat.services.openai_realtime.events import AudioConfiguration, AudioInput
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -182,12 +183,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

    session_properties = SessionProperties(
-        input_audio_transcription=InputAudioTranscription(),
-        # Set openai TurnDetection parameters. Not setting this at all will turn it
-        # on by default
-        turn_detection=TurnDetection(silence_duration_ms=1000),
-        # Or set to False to disable openai turn detection and use transport VAD
-        # turn_detection=False,
+        audio=AudioConfiguration(
+            input=AudioInput(
+                transcription=InputAudioTranscription(),
+                # Set openai TurnDetection parameters. Not setting this at all will turn it
+                # on by default
+                turn_detection=TurnDetection(silence_duration_ms=1000),
+                # Or set to False to disable openai turn detection and use transport VAD
+                # turn_detection=False,
+            )
+        ),
        # tools=tools,
        instructions="""Your knowledge cutoff is 2023-10. You are a helpful and friendly AI.

@@ -205,7 +210,7 @@ unless specifically asked to elaborate on a topic.
 Remember, your responses should be short. Just one or two sentences, usually.""",
    )

-    llm = OpenAIRealtimeBetaLLMService(
+    llm = OpenAIRealtimeLLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        session_properties=session_properties,
        start_audio_paused=False,
--- a/examples/foundational/43a-heygen-video-service.py
+++ b/examples/foundational/43a-heygen-video-service.py
@@ -21,9 +21,10 @@ from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.google.llm import GoogleLLMService
+from pipecat.services.heygen.api import AvatarQuality, NewSessionRequest
 from pipecat.services.heygen.video import HeyGenVideoService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.daily.transport import DailyParams, DailyTransport

 load_dotenv(override=True)

@@ -38,6 +39,7 @@ transport_params = {
        video_out_is_live=True,
        video_out_width=1280,
        video_out_height=720,
+        video_out_bitrate=2_000_000,  # 2MBps
        vad_analyzer=SileroVADAnalyzer(),
    ),
    "webrtc": lambda: TransportParams(
@@ -64,7 +66,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))

-        heyGen = HeyGenVideoService(api_key=os.getenv("HEYGEN_API_KEY"), session=session)
+        heyGen = HeyGenVideoService(
+            api_key=os.getenv("HEYGEN_API_KEY"),
+            session=session,
+            session_request=NewSessionRequest(
+                avatar_id="Shawn_Therapist_public", version="v2", quality=AvatarQuality.high
+            ),
+        )

        messages = [
            {
@@ -101,6 +109,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
+            # Updating publishing settings to enable adaptive bitrate
+            if isinstance(transport, DailyTransport):
+                await transport.update_publishing(
+                    publishing_settings={
+                        "camera": {
+                            "sendSettings": {
+                                "allowAdaptiveLayers": True,
+                            }
+                        }
+                    }
+                )
+
            # Kick off the conversation.
            messages.append(
                {
--- a/examples/quickstart/pyproject.toml
+++ b/examples/quickstart/pyproject.toml
@@ -4,7 +4,7 @@ version = "0.1.0"
 description = "Quickstart example for building voice AI bots with Pipecat"
 requires-python = ">=3.10"
 dependencies = [
-    "pipecat-ai[webrtc,daily,silero,deepgram,openai,cartesia,runner]>=0.0.82",
+    "pipecat-ai[webrtc,daily,silero,deepgram,openai,cartesia,runner]>=0.0.83",
    "pipecatcloud>=0.2.4"
 ]

--- a/examples/quickstart/uv.lock
+++ b/examples/quickstart/uv.lock
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -55,7 +55,7 @@ azure = [ "azure-cognitiveservices-speech~=1.42.0"]
 cartesia = [ "cartesia~=2.0.3", "websockets>=13.1,<15.0" ]
 cerebras = []
 deepseek = []
-daily = [ "daily-python~=0.19.8" ]
+daily = [ "daily-python~=0.19.9" ]
 deepgram = [ "deepgram-sdk~=4.7.0" ]
 elevenlabs = [ "websockets>=13.1,<15.0" ]
 fal = [ "fal-client~=0.5.9" ]
--- a/scripts/evals/eval.py
+++ b/scripts/evals/eval.py
@@ -47,7 +47,7 @@ from pipecat.transports.daily.transport import DailyParams, DailyTransport
 SCRIPT_DIR = Path(__file__).resolve().parent

 PIPELINE_IDLE_TIMEOUT_SECS = 60
-EVAL_TIMEOUT_SECS = 90
+EVAL_TIMEOUT_SECS = 120

 EvalPrompt = str | Tuple[str, ImageFile]

@@ -266,8 +266,11 @@ async def run_eval_pipeline(
    elif isinstance(prompt, tuple):
        example_prompt, example_image = prompt

-    eval_prompt = f"The answer is correct if it's appropriate for the context and matches: {eval}."
-    common_system_prompt = f"Call the eval function with your assessment only if the user answers the question. {eval_prompt}"
+    eval_prompt = f"The answer is correct if it matches: {eval}."
+    common_system_prompt = (
+        "The user might say things other than the answer and that's allowed. "
+        f"You should only call the eval function with your assessment when the user actually answers the question. {eval_prompt}"
+    )
    if user_speaks_first:
        system_prompt = f"You are an LLM eval, be extremly brief. You will start the conversation by saying: '{example_prompt}'. {common_system_prompt}"
    else:
--- a/src/pipecat/adapters/base_llm_adapter.py
+++ b/src/pipecat/adapters/base_llm_adapter.py
@@ -39,11 +39,12 @@ class BaseLLMAdapter(ABC, Generic[TLLMInvocationParams]):
    """

    @abstractmethod
-    def get_llm_invocation_params(self, context: LLMContext) -> TLLMInvocationParams:
+    def get_llm_invocation_params(self, context: LLMContext, **kwargs) -> TLLMInvocationParams:
        """Get provider-specific LLM invocation parameters from a universal LLM context.

        Args:
            context: The LLM context containing messages, tools, etc.
+            **kwargs: Additional provider-specific arguments that subclasses can use.

        Returns:
            Provider-specific parameters for invoking the LLM.
--- a/src/pipecat/adapters/services/anthropic_adapter.py
+++ b/src/pipecat/adapters/services/anthropic_adapter.py
@@ -6,12 +6,25 @@

 """Anthropic LLM adapter for Pipecat."""

-from typing import Any, Dict, List, TypedDict
+import copy
+import json
+from dataclasses import dataclass
+from typing import Any, Dict, List, Optional, TypedDict
+
+from anthropic import NOT_GIVEN, NotGiven
+from anthropic.types.message_param import MessageParam
+from anthropic.types.tool_union_param import ToolUnionParam
+from loguru import logger

 from pipecat.adapters.base_llm_adapter import BaseLLMAdapter
 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
-from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_context import (
+    LLMContext,
+    LLMContextMessage,
+    LLMSpecificMessage,
+    LLMStandardMessage,
+)


 class AnthropicLLMInvocationParams(TypedDict):
@@ -20,7 +33,9 @@ class AnthropicLLMInvocationParams(TypedDict):
    This is a placeholder until support for universal LLMContext machinery is added for Anthropic.
    """

-    pass
+    system: str | NotGiven
+    messages: List[MessageParam]
+    tools: List[ToolUnionParam]


 class AnthropicLLMAdapter(BaseLLMAdapter[AnthropicLLMInvocationParams]):
@@ -30,20 +45,33 @@ class AnthropicLLMAdapter(BaseLLMAdapter[AnthropicLLMInvocationParams]):
    to the specific format required by Anthropic's Claude models for function calling.
    """

-    def get_llm_invocation_params(self, context: LLMContext) -> AnthropicLLMInvocationParams:
+    def get_llm_invocation_params(
+        self, context: LLMContext, enable_prompt_caching: bool
+    ) -> AnthropicLLMInvocationParams:
        """Get Anthropic-specific LLM invocation parameters from a universal LLM context.

        This is a placeholder until support for universal LLMContext machinery is added for Anthropic.

        Args:
            context: The LLM context containing messages, tools, etc.
+            enable_prompt_caching: Whether prompt caching should be enabled.

        Returns:
            Dictionary of parameters for invoking Anthropic's LLM API.
        """
-        raise NotImplementedError("Universal LLMContext is not yet supported for Anthropic.")
+        messages = self._from_universal_context_messages(self._get_messages(context))
+        return {
+            "system": messages.system,
+            "messages": (
+                self._with_cache_control_markers(messages.messages)
+                if enable_prompt_caching
+                else messages.messages
+            ),
+            # NOTE: LLMContext's tools are guaranteed to be a ToolsSchema (or NOT_GIVEN)
+            "tools": self.from_standard_tools(context.tools) or [],
+        }

-    def get_messages_for_logging(self, context) -> List[Dict[str, Any]]:
+    def get_messages_for_logging(self, context: LLMContext) -> List[Dict[str, Any]]:
        """Get messages from a universal LLM context in a format ready for logging about Anthropic.

        Removes or truncates sensitive data like image content for safe logging.
@@ -56,7 +84,241 @@ class AnthropicLLMAdapter(BaseLLMAdapter[AnthropicLLMInvocationParams]):
        Returns:
            List of messages in a format ready for logging about Anthropic.
        """
-        raise NotImplementedError("Universal LLMContext is not yet supported for Anthropic.")
+        # Get messages in Anthropic's format
+        messages = self._from_universal_context_messages(self._get_messages(context)).messages
+
+        # Sanitize messages for logging
+        messages_for_logging = []
+        for message in messages:
+            msg = copy.deepcopy(message)
+            if "content" in msg:
+                if isinstance(msg["content"], list):
+                    for item in msg["content"]:
+                        if item["type"] == "image":
+                            item["source"]["data"] = "..."
+            messages_for_logging.append(msg)
+        return messages_for_logging
+
+    def _get_messages(self, context: LLMContext) -> List[LLMContextMessage]:
+        return context.get_messages("anthropic")
+
+    @dataclass
+    class ConvertedMessages:
+        """Container for Anthropic-formatted messages converted from universal context."""
+
+        messages: List[MessageParam]
+        system: str | NotGiven
+
+    def _from_universal_context_messages(
+        self, universal_context_messages: List[LLMContextMessage]
+    ) -> ConvertedMessages:
+        system = NOT_GIVEN
+        messages = []
+
+        # first, map messages using self._from_universal_context_message(m)
+        try:
+            messages = [self._from_universal_context_message(m) for m in universal_context_messages]
+        except Exception as e:
+            logger.error(f"Error mapping messages: {e}")
+
+        # See if we should pull the system message out of our messages list.
+        if messages and messages[0]["role"] == "system":
+            if len(messages) == 1:
+                # If we have only have a system message in the list, all we can really do
+                # without introducing too much magic is change the role to "user".
+                messages[0]["role"] = "user"
+            else:
+                # If we have more than one message, we'll pull the system message out of the
+                # list.
+                system = messages[0]["content"]
+                messages.pop(0)
+
+        # Convert any subsequent "system"-role messages to "user"-role
+        # messages, as Anthropic doesn't support system input messages.
+        for message in messages:
+            if message["role"] == "system":
+                message["role"] = "user"
+
+        # Merge consecutive messages with the same role.
+        i = 0
+        while i < len(messages) - 1:
+            current_message = messages[i]
+            next_message = messages[i + 1]
+            if current_message["role"] == next_message["role"]:
+                # Convert content to list of dictionaries if it's a string
+                if isinstance(current_message["content"], str):
+                    current_message["content"] = [
+                        {"type": "text", "text": current_message["content"]}
+                    ]
+                if isinstance(next_message["content"], str):
+                    next_message["content"] = [{"type": "text", "text": next_message["content"]}]
+                # Concatenate the content
+                current_message["content"].extend(next_message["content"])
+                # Remove the next message from the list
+                messages.pop(i + 1)
+            else:
+                i += 1
+
+        # Avoid empty content in messages
+        for message in messages:
+            if isinstance(message["content"], str) and message["content"] == "":
+                message["content"] = "(empty)"
+            elif isinstance(message["content"], list) and len(message["content"]) == 0:
+                message["content"] = [{"type": "text", "text": "(empty)"}]
+
+        return self.ConvertedMessages(messages=messages, system=system)
+
+    def _from_universal_context_message(self, message: LLMContextMessage) -> MessageParam:
+        if isinstance(message, LLMSpecificMessage):
+            return copy.deepcopy(message.message)
+        return self._from_standard_message(message)
+
+    def _from_standard_message(self, message: LLMStandardMessage) -> MessageParam:
+        """Convert standard universal context message to Anthropic format.
+
+        Handles conversion of text content, tool calls, and tool results.
+        Empty text content is converted to "(empty)".
+
+        Args:
+            message: Message in standard universal context format.
+
+        Returns:
+            Message in Anthropic format.
+
+        Examples:
+            Input standard format::
+
+                {
+                    "role": "assistant",
+                    "tool_calls": [
+                        {
+                            "id": "123",
+                            "function": {"name": "search", "arguments": '{"q": "test"}'}
+                        }
+                    ]
+                }
+
+            Output Anthropic format::
+
+                {
+                    "role": "assistant",
+                    "content": [
+                        {
+                            "type": "tool_use",
+                            "id": "123",
+                            "name": "search",
+                            "input": {"q": "test"}
+                        }
+                    ]
+                }
+        """
+        message = copy.deepcopy(message)
+        if message["role"] == "tool":
+            return {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "tool_result",
+                        "tool_use_id": message["tool_call_id"],
+                        "content": message["content"],
+                    },
+                ],
+            }
+        if message.get("tool_calls"):
+            tc = message["tool_calls"]
+            ret = {"role": "assistant", "content": []}
+            for tool_call in tc:
+                function = tool_call["function"]
+                arguments = json.loads(function["arguments"])
+                new_tool_use = {
+                    "type": "tool_use",
+                    "id": tool_call["id"],
+                    "name": function["name"],
+                    "input": arguments,
+                }
+                ret["content"].append(new_tool_use)
+            return ret
+        content = message.get("content")
+        if isinstance(content, str):
+            # fix empty text
+            if content == "":
+                content = "(empty)"
+        elif isinstance(content, list):
+            for item in content:
+                # fix empty text
+                if item["type"] == "text" and item["text"] == "":
+                    item["text"] = "(empty)"
+                # handle image_url -> image conversion
+                if item["type"] == "image_url":
+                    item["type"] = "image"
+                    item["source"] = {
+                        "type": "base64",
+                        "media_type": "image/jpeg",
+                        "data": item["image_url"]["url"].split(",")[1],
+                    }
+                    del item["image_url"]
+            # In the case where there's a single image in the list (like what
+            # would result from a UserImageRawFrame), ensure that the image
+            # comes before text, as recommended by Anthropic docs
+            # (https://docs.anthropic.com/en/docs/build-with-claude/vision#example-one-image)
+            image_indices = [i for i, item in enumerate(content) if item["type"] == "image"]
+            text_indices = [i for i, item in enumerate(content) if item["type"] == "text"]
+            if len(image_indices) == 1 and text_indices:
+                img_idx = image_indices[0]
+                first_txt_idx = text_indices[0]
+                if img_idx > first_txt_idx:
+                    # Move image before the first text
+                    image_item = content.pop(img_idx)
+                    content.insert(first_txt_idx, image_item)
+
+        return message
+
+    def _with_cache_control_markers(self, messages: List[MessageParam]) -> List[MessageParam]:
+        """Add cache control markers to messages for prompt caching.
+
+        Args:
+            messages: List of messages in Anthropic format.
+
+        Returns:
+            List of messages with cache control markers added.
+        """
+
+        def add_cache_control_marker(message: MessageParam):
+            if isinstance(message["content"], str):
+                message["content"] = [{"type": "text", "text": message["content"]}]
+            message["content"][-1]["cache_control"] = {"type": "ephemeral"}
+
+        try:
+            # Add cache control markers to the most recent two user messages.
+            # - The marker at the most recent user message tells Anthropic to
+            #   cache the prompt up to that point.
+            # - The marker at the second-most-recent user message tells Anthropic
+            #   to look up the cached prompt that goes up to that point (the
+            #   point that *was* the last user message the previous turn).
+            # If we only added the marker to the last user message, we'd only
+            # ever be adding to the cache, never looking up from it.
+            # Why user messages? We're assuming that we're primarily running
+            # inference as soon as user turns come in. In Anthropic, turns
+            # strictly alternate between user and assistant.
+
+            messages_with_markers = copy.deepcopy(messages)
+
+            # Find the most recent two user messages
+            user_message_indices = []
+            for i in range(len(messages_with_markers) - 1, -1, -1):
+                if messages_with_markers[i]["role"] == "user":
+                    user_message_indices.append(i)
+                    if len(user_message_indices) == 2:
+                        break
+
+            # Add cache control markers to the identified user messages
+            for index in user_message_indices:
+                add_cache_control_marker(messages_with_markers[index])
+
+            return messages_with_markers
+        except Exception as e:
+            logger.error(f"Error adding cache control marker: {e}")
+            return messages_with_markers

    @staticmethod
    def _to_anthropic_function_format(function: FunctionSchema) -> Dict[str, Any]:
--- a/src/pipecat/adapters/services/gemini_adapter.py
+++ b/src/pipecat/adapters/services/gemini_adapter.py
@@ -67,7 +67,7 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
        return {
            "system_instruction": messages.system_instruction,
            "messages": messages.messages,
-            # NOTE; LLMContext's tools are guaranteed to be a ToolsSchema (or NOT_GIVEN)
+            # NOTE: LLMContext's tools are guaranteed to be a ToolsSchema (or NOT_GIVEN)
            "tools": self.from_standard_tools(context.tools),
        }

@@ -192,14 +192,14 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
    def _from_standard_message(
        self, message: LLMStandardMessage, already_have_system_instruction: bool
    ) -> Content | str:
-        """Convert universal context message to Google Content object.
+        """Convert standard universal context message to Google Content object.

        Handles conversion of text, images, and function calls to Google's
        format.
        System instructions are returned as a plain string.

        Args:
-            message: Message in universal context format.
+            message: Message in standard universal context format.
            already_have_system_instruction: Whether we already have a system instruction

        Returns:
@@ -308,5 +308,4 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
                    audio_bytes = base64.b64decode(input_audio["data"])
                    parts.append(Part(inline_data=Blob(mime_type="audio/wav", data=audio_bytes)))

-        message = Content(role=role, parts=parts)
-        return message
+        return Content(role=role, parts=parts)
--- a/src/pipecat/audio/filters/noisereduce_filter.py
+++ b/src/pipecat/audio/filters/noisereduce_filter.py
@@ -33,6 +33,10 @@ class NoisereduceFilter(BaseAudioFilter):
    Applies spectral gating noise reduction algorithms to suppress background
    noise in audio streams. Uses the noisereduce library's default noise
    reduction parameters.
+
+    .. deprecated:: 0.0.85
+        `NoisereduceFilter` is deprecated and will be removed in a future version.
+        We recommend using other real-time audio filters like `KrispFilter` or `AICFilter`.
    """

    def __init__(self) -> None:
@@ -40,6 +44,17 @@ class NoisereduceFilter(BaseAudioFilter):
        self._filtering = True
        self._sample_rate = 0

+        import warnings
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("always")
+            warnings.warn(
+                "`NoisereduceFilter` is deprecated. "
+                "Use other real-time audio filters like `KrispFilter` or `AICFilter`.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
    async def start(self, sample_rate: int):
        """Initialize the filter with the transport's sample rate.

--- a/src/pipecat/frames/frames.py
+++ b/src/pipecat/frames/frames.py
@@ -1253,23 +1253,6 @@ class UserImageRawFrame(InputImageRawFrame):
        return f"{self.name}(pts: {pts}, user: {self.user_id}, source: {self.transport_source}, size: {self.size}, format: {self.format}, request: {self.request})"


-@dataclass
-class VisionImageRawFrame(InputImageRawFrame):
-    """Image frame for vision/image analysis with associated text prompt.
-
-    An image with an associated text to ask for a description of it.
-
-    Parameters:
-        text: Optional text prompt describing what to analyze in the image.
-    """
-
-    text: Optional[str] = None
-
-    def __str__(self):
-        pts = format_pts(self.pts)
-        return f"{self.name}(pts: {pts}, text: [{self.text}], size: {self.size}, format: {self.format})"
-
-
@dataclass
 class InputDTMFFrame(DTMFFrame, SystemFrame):
    """DTMF keypress input frame from transport."""
--- a/src/pipecat/pipeline/llm_switcher.py
+++ b/src/pipecat/pipeline/llm_switcher.py
@@ -30,25 +30,17 @@ class LLMSwitcher(ServiceSwitcher[StrategyType]):
        """Get the currently active LLM, if any."""
        return self.strategy.active_service

-    async def run_inference(
-        self, context: LLMContext, system_instruction: Optional[str] = None
-    ) -> Optional[str]:
+    async def run_inference(self, context: LLMContext) -> Optional[str]:
        """Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context, using the currently active LLM.

        Args:
            context: The LLM context containing conversation history.
-            system_instruction: Optional system instruction to guide the LLM's
-              behavior. You could also (again, optionally) provide a system
-              instruction directly in the context. If both are provided, the
-              one in the context takes precedence.

        Returns:
            The LLM's response as a string, or None if no response is generated.
        """
        if self.active_llm:
-            return await self.active_llm.run_inference(
-                context=context, system_instruction=system_instruction
-            )
+            return await self.active_llm.run_inference(context=context)
        return None

    def register_function(
--- a/src/pipecat/processors/aggregators/vision_image_frame.py
+++ b/src/pipecat/processors/aggregators/vision_image_frame.py
@@ -10,13 +10,22 @@ This module provides frame aggregation functionality to combine text and image
 frames into vision frames for multimodal processing.
 """

-from pipecat.frames.frames import Frame, InputImageRawFrame, TextFrame, VisionImageRawFrame
+from pipecat.frames.frames import Frame, InputImageRawFrame, TextFrame
+from pipecat.processors.aggregators.openai_llm_context import (
+    OpenAILLMContext,
+    OpenAILLMContextFrame,
+)
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


 class VisionImageFrameAggregator(FrameProcessor):
    """Aggregates consecutive text and image frames into vision frames.

+    .. deprecated:: 0.0.85
+        VisionImageRawFrame has been removed in favor of context frames
+        (LLMContextFrame or OpenAILLMContextFrame), so this aggregator is not
+        needed anymore. See the 12* examples for the new recommended pattern.
+
    This aggregator waits for a consecutive TextFrame and an InputImageRawFrame.
    After the InputImageRawFrame arrives it will output a VisionImageRawFrame
    combining both the text and image data for multimodal processing.
@@ -28,6 +37,17 @@ class VisionImageFrameAggregator(FrameProcessor):
        The aggregator starts with no cached text, waiting for the first
        TextFrame to arrive before it can create vision frames.
        """
+        import warnings
+
+        warnings.warn(
+            "VisionImageFrameAggregator is deprecated. "
+            "VisionImageRawFrame has been removed in favor of context frames "
+            "(LLMContextFrame or OpenAILLMContextFrame), so this aggregator is "
+            "not needed anymore. See the 12* examples for the new recommended "
+            "pattern.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
        super().__init__()
        self._describe_text = None

@@ -47,12 +67,14 @@ class VisionImageFrameAggregator(FrameProcessor):
            self._describe_text = frame.text
        elif isinstance(frame, InputImageRawFrame):
            if self._describe_text:
-                frame = VisionImageRawFrame(
+                context = OpenAILLMContext()
+                context.add_image_frame_message(
                    text=self._describe_text,
                    image=frame.image,
                    size=frame.size,
                    format=frame.format,
                )
+                frame = OpenAILLMContextFrame(context)
                await self.push_frame(frame)
                self._describe_text = None
        else:
--- a/src/pipecat/services/anthropic/llm.py
+++ b/src/pipecat/services/anthropic/llm.py
@@ -24,7 +24,10 @@ from loguru import logger
 from PIL import Image
 from pydantic import BaseModel, Field

-from pipecat.adapters.services.anthropic_adapter import AnthropicLLMAdapter
+from pipecat.adapters.services.anthropic_adapter import (
+    AnthropicLLMAdapter,
+    AnthropicLLMInvocationParams,
+)
 from pipecat.frames.frames import (
    ErrorFrame,
    Frame,
@@ -39,7 +42,6 @@ from pipecat.frames.frames import (
    LLMTextFrame,
    LLMUpdateSettingsFrame,
    UserImageRawFrame,
-    VisionImageRawFrame,
 )
 from pipecat.metrics.metrics import LLMTokenUsage
 from pipecat.processors.aggregators.llm_context import LLMContext
@@ -112,7 +114,12 @@ class AnthropicLLMService(LLMService):
        """Input parameters for Anthropic model inference.

        Parameters:
-            enable_prompt_caching_beta: Whether to enable beta prompt caching feature.
+            enable_prompt_caching: Whether to enable the prompt caching feature.
+            enable_prompt_caching_beta (deprecated): Whether to enable the beta prompt caching feature.
+
+                .. deprecated:: 0.0.84
+                    Use the `enable_prompt_caching` parameter instead.
+
            max_tokens: Maximum tokens to generate. Must be at least 1.
            temperature: Sampling temperature between 0.0 and 1.0.
            top_k: Top-k sampling parameter.
@@ -120,13 +127,26 @@ class AnthropicLLMService(LLMService):
            extra: Additional parameters to pass to the API.
        """

-        enable_prompt_caching_beta: Optional[bool] = False
+        enable_prompt_caching: Optional[bool] = None
+        enable_prompt_caching_beta: Optional[bool] = None
        max_tokens: Optional[int] = Field(default_factory=lambda: 4096, ge=1)
        temperature: Optional[float] = Field(default_factory=lambda: NOT_GIVEN, ge=0.0, le=1.0)
        top_k: Optional[int] = Field(default_factory=lambda: NOT_GIVEN, ge=0)
        top_p: Optional[float] = Field(default_factory=lambda: NOT_GIVEN, ge=0.0, le=1.0)
        extra: Optional[Dict[str, Any]] = Field(default_factory=dict)

+        def model_post_init(self, __context):
+            """Post-initialization to handle deprecated parameters."""
+            if self.enable_prompt_caching_beta is not None:
+                import warnings
+
+                warnings.simplefilter("always")
+                warnings.warn(
+                    "enable_prompt_caching_beta is deprecated. Use enable_prompt_caching instead.",
+                    DeprecationWarning,
+                    stacklevel=2,
+                )
+
    def __init__(
        self,
        *,
@@ -159,7 +179,15 @@ class AnthropicLLMService(LLMService):
        self._retry_on_timeout = retry_on_timeout
        self._settings = {
            "max_tokens": params.max_tokens,
-            "enable_prompt_caching_beta": params.enable_prompt_caching_beta or False,
+            "enable_prompt_caching": (
+                params.enable_prompt_caching
+                if params.enable_prompt_caching is not None
+                else (
+                    params.enable_prompt_caching_beta
+                    if params.enable_prompt_caching_beta is not None
+                    else False
+                )
+            ),
            "temperature": params.temperature,
            "top_k": params.top_k,
            "top_p": params.top_p,
@@ -199,34 +227,28 @@ class AnthropicLLMService(LLMService):
            response = await api_call(**params)
            return response

-    async def run_inference(
-        self, context: LLMContext | OpenAILLMContext, system_instruction: Optional[str] = None
-    ) -> Optional[str]:
+    async def run_inference(self, context: LLMContext | OpenAILLMContext) -> Optional[str]:
        """Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

        Args:
            context: The LLM context containing conversation history.
-            system_instruction: Optional system instruction to guide the LLM's
-              behavior. You could also (again, optionally) provide a system
-              instruction directly in the context. If both are provided, the
-              one in the context takes precedence.

        Returns:
            The LLM's response as a string, or None if no response is generated.
        """
        messages = []
-        system = []
+        system = NOT_GIVEN
        if isinstance(context, LLMContext):
-            # Future code will be something like this:
-            # adapter = self.get_llm_adapter()
-            # params: AnthropicLLMInvocationParams = adapter.get_llm_invocation_params(context)
-            # messages = params["messages"]
-            # system = params["system_instruction"]
-            raise NotImplementedError("Universal LLMContext is not yet supported for Anthropic.")
+            adapter: AnthropicLLMAdapter = self.get_llm_adapter()
+            params = adapter.get_llm_invocation_params(
+                context, enable_prompt_caching=self._settings["enable_prompt_caching"]
+            )
+            messages = params["messages"]
+            system = params["system"]
        else:
            context = AnthropicLLMContext.upgrade_to_anthropic(context)
            messages = context.messages
-            system = getattr(context, "system", None) or system_instruction
+            system = getattr(context, "system", NOT_GIVEN)

        # LLM completion
        response = await self._client.messages.create(
@@ -239,15 +261,6 @@ class AnthropicLLMService(LLMService):

        return response.content[0].text

-    @property
-    def enable_prompt_caching_beta(self) -> bool:
-        """Check if prompt caching beta feature is enabled.
-
-        Returns:
-            True if prompt caching is enabled.
-        """
-        return self._enable_prompt_caching_beta
-
    def create_context_aggregator(
        self,
        context: OpenAILLMContext,
@@ -277,8 +290,31 @@ class AnthropicLLMService(LLMService):
        assistant = AnthropicAssistantContextAggregator(context, params=assistant_params)
        return AnthropicContextAggregatorPair(_user=user, _assistant=assistant)

+    def _get_llm_invocation_params(
+        self, context: OpenAILLMContext | LLMContext
+    ) -> AnthropicLLMInvocationParams:
+        # Universal LLMContext
+        if isinstance(context, LLMContext):
+            adapter: AnthropicLLMAdapter = self.get_llm_adapter()
+            params = adapter.get_llm_invocation_params(
+                context, enable_prompt_caching=self._settings["enable_prompt_caching"]
+            )
+            return params
+
+        # Anthropic-specific context
+        messages = (
+            context.get_messages_with_cache_control_markers()
+            if self._settings["enable_prompt_caching"]
+            else context.messages
+        )
+        return AnthropicLLMInvocationParams(
+            system=context.system,
+            messages=messages,
+            tools=context.tools or [],
+        )
+
    @traced_llm
-    async def _process_context(self, context: OpenAILLMContext):
+    async def _process_context(self, context: OpenAILLMContext | LLMContext):
        # Usage tracking. We track the usage reported by Anthropic in prompt_tokens and
        # completion_tokens. We also estimate the completion tokens from output text
        # and use that estimate if we are interrupted, because we almost certainly won't
@@ -294,24 +330,22 @@ class AnthropicLLMService(LLMService):
            await self.push_frame(LLMFullResponseStartFrame())
            await self.start_processing_metrics()

+            params_from_context = self._get_llm_invocation_params(context)
+
+            if isinstance(context, LLMContext):
+                adapter = self.get_llm_adapter()
+                context_type_for_logging = "universal"
+                messages_for_logging = adapter.get_messages_for_logging(context)
+            else:
+                context_type_for_logging = "LLM-specific"
+                messages_for_logging = context.get_messages_for_logging()
            logger.debug(
-                f"{self}: Generating chat [{context.system}] | {context.get_messages_for_logging()}"
+                f"{self}: Generating chat from {context_type_for_logging} context [{params_from_context['system']}] | {messages_for_logging}"
            )

-            messages = context.messages
-            if self._settings["enable_prompt_caching_beta"]:
-                messages = context.get_messages_with_cache_control_markers()
-
-            api_call = self._client.messages.create
-            if self._settings["enable_prompt_caching_beta"]:
-                api_call = self._client.beta.prompt_caching.messages.create
-
            await self.start_ttfb_metrics()

            params = {
-                "tools": context.tools or [],
-                "system": context.system,
-                "messages": messages,
                "model": self.model_name,
                "max_tokens": self._settings["max_tokens"],
                "stream": True,
@@ -320,9 +354,12 @@ class AnthropicLLMService(LLMService):
                "top_p": self._settings["top_p"],
            }

+            # Messages, system, tools
+            params.update(params_from_context)
+
            params.update(self._settings["extra"])

-            response = await self._create_message_stream(api_call, params)
+            response = await self._create_message_stream(self._client.messages.create, params)

            await self.stop_ttfb_metrics()

@@ -405,7 +442,10 @@ class AnthropicLLMService(LLMService):
                        prompt_tokens + cache_creation_input_tokens + cache_read_input_tokens
                    )
                    if total_input_tokens >= 1024:
-                        context.turns_above_cache_threshold += 1
+                        if hasattr(
+                            context, "turns_above_cache_threshold"
+                        ):  # LLMContext doesn't have this attribute
+                            context.turns_above_cache_threshold += 1

            await self.run_function_calls(function_calls)

@@ -451,20 +491,14 @@ class AnthropicLLMService(LLMService):
        if isinstance(frame, OpenAILLMContextFrame):
            context: "AnthropicLLMContext" = AnthropicLLMContext.upgrade_to_anthropic(frame.context)
        elif isinstance(frame, LLMContextFrame):
-            raise NotImplementedError("Universal LLMContext is not yet supported for Anthropic.")
+            context = frame.context
        elif isinstance(frame, LLMMessagesFrame):
            context = AnthropicLLMContext.from_messages(frame.messages)
-        elif isinstance(frame, VisionImageRawFrame):
-            # This is only useful in very simple pipelines because it creates
-            # a new context. Generally we want a context manager to catch
-            # UserImageRawFrames coming through the pipeline and add them
-            # to the context.
-            context = AnthropicLLMContext.from_image_frame(frame)
        elif isinstance(frame, LLMUpdateSettingsFrame):
            await self._update_settings(frame.settings)
        elif isinstance(frame, LLMEnablePromptCachingFrame):
            logger.debug(f"Setting enable prompt caching to: [{frame.enable}]")
-            self._settings["enable_prompt_caching_beta"] = frame.enable
+            self._settings["enable_prompt_caching"] = frame.enable
        else:
            await self.push_frame(frame, direction)

@@ -585,22 +619,6 @@ class AnthropicLLMContext(OpenAILLMContext):
        self._restructure_from_openai_messages()
        return self

-    @classmethod
-    def from_image_frame(cls, frame: VisionImageRawFrame) -> "AnthropicLLMContext":
-        """Create context from a vision image frame.
-
-        Args:
-            frame: The vision image frame to process.
-
-        Returns:
-            New Anthropic context with the image message.
-        """
-        context = cls()
-        context.add_image_frame_message(
-            format=frame.format, size=frame.size, image=frame.image, text=frame.text
-        )
-        return context
-
    def set_messages(self, messages: List):
        """Set the messages list and reset cache tracking.

--- a/src/pipecat/services/asyncai/tts.py
+++ b/src/pipecat/services/asyncai/tts.py
@@ -52,6 +52,10 @@ def language_to_async_language(language: Language) -> Optional[str]:
    """
    BASE_LANGUAGES = {
        Language.EN: "en",
+        Language.FR: "fr",
+        Language.ES: "es",
+        Language.DE: "de",
+        Language.IT: "it",
    }

    result = BASE_LANGUAGES.get(language)
--- a/src/pipecat/services/aws/llm.py
+++ b/src/pipecat/services/aws/llm.py
@@ -39,7 +39,6 @@ from pipecat.frames.frames import (
    LLMTextFrame,
    LLMUpdateSettingsFrame,
    UserImageRawFrame,
-    VisionImageRawFrame,
 )
 from pipecat.metrics.metrics import LLMTokenUsage
 from pipecat.processors.aggregators.llm_context import LLMContext
@@ -180,22 +179,6 @@ class AWSBedrockLLMContext(OpenAILLMContext):
        self._restructure_from_openai_messages()
        return self

-    @classmethod
-    def from_image_frame(cls, frame: VisionImageRawFrame) -> "AWSBedrockLLMContext":
-        """Create AWS Bedrock context from vision image frame.
-
-        Args:
-            frame: The vision image frame to convert.
-
-        Returns:
-            New AWS Bedrock LLM context instance.
-        """
-        context = cls()
-        context.add_image_frame_message(
-            format=frame.format, size=frame.size, image=frame.image, text=frame.text
-        )
-        return context
-
    def set_messages(self, messages: List):
        """Set the messages list and restructure for Bedrock format.

@@ -399,9 +382,33 @@ class AWSBedrockLLMContext(OpenAILLMContext):
        elif isinstance(content, list):
            new_content = []
            for item in content:
+                # fix empty text
                if item.get("type", "") == "text":
                    text_content = item["text"] if item["text"] != "" else "(empty)"
                    new_content.append({"text": text_content})
+                # handle image_url -> image conversion
+                if item["type"] == "image_url":
+                    new_item = {
+                        "image": {
+                            "format": "jpeg",
+                            "source": {
+                                "bytes": base64.b64decode(item["image_url"]["url"].split(",")[1])
+                            },
+                        }
+                    }
+                    new_content.append(new_item)
+            # In the case where there's a single image in the list (like what
+            # would result from a UserImageRawFrame), ensure that the image
+            # comes before text
+            image_indices = [i for i, item in enumerate(new_content) if "image" in item]
+            text_indices = [i for i, item in enumerate(new_content) if "text" in item]
+            if len(image_indices) == 1 and text_indices:
+                img_idx = image_indices[0]
+                first_txt_idx = text_indices[0]
+                if img_idx > first_txt_idx:
+                    # Move image before the first text
+                    image_item = new_content.pop(img_idx)
+                new_content.insert(first_txt_idx, image_item)
            return {"role": message["role"], "content": new_content}

        return message
@@ -569,7 +576,7 @@ class AWSBedrockLLMContext(OpenAILLMContext):
                if isinstance(msg["content"], list):
                    for item in msg["content"]:
                        if item.get("image"):
-                            item["source"]["bytes"] = "..."
+                            item["image"]["source"]["bytes"] = "..."
            msgs.append(msg)
        return msgs

@@ -792,17 +799,11 @@ class AWSBedrockLLMService(LLMService):
        """
        return True

-    async def run_inference(
-        self, context: LLMContext | OpenAILLMContext, system_instruction: Optional[str] = None
-    ) -> Optional[str]:
+    async def run_inference(self, context: LLMContext | OpenAILLMContext) -> Optional[str]:
        """Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

        Args:
            context: The LLM context containing conversation history.
-            system_instruction: Optional system instruction to guide the LLM's
-              behavior. You could also (again, optionally) provide a system
-              instruction directly in the context. If both are provided, the
-              one in the context takes precedence.

        Returns:
            The LLM's response as a string, or None if no response is generated.
@@ -815,14 +816,14 @@ class AWSBedrockLLMService(LLMService):
                # adapter = self.get_llm_adapter()
                # params: AWSBedrockLLMInvocationParams = adapter.get_llm_invocation_params(context)
                # messages = params["messages"]
-                # system = params["system_instruction"]
+                # system = params["system_instruction"] # [{"text": "system message"}]
                raise NotImplementedError(
                    "Universal LLMContext is not yet supported for AWS Bedrock."
                )
            else:
                context = AWSBedrockLLMContext.upgrade_to_bedrock(context)
                messages = context.messages
-                system = getattr(context, "system", None) or system_instruction
+                system = getattr(context, "system", None)  # [{"text": "system message"}]

            # Determine if we're using Claude or Nova based on model ID
            model_id = self.model_name
@@ -839,7 +840,7 @@ class AWSBedrockLLMService(LLMService):
            }

            if system:
-                request_params["system"] = [{"text": system}]
+                request_params["system"] = system

            async with self._aws_session.client(
                service_name="bedrock-runtime", **self._aws_params
@@ -880,7 +881,7 @@ class AWSBedrockLLMService(LLMService):
        if self._retry_on_timeout:
            try:
                response = await asyncio.wait_for(
-                    await client.converse_stream(**request_params), timeout=self._retry_timeout_secs
+                    client.converse_stream(**request_params), timeout=self._retry_timeout_secs
                )
                return response
            except (ReadTimeoutError, asyncio.TimeoutError) as e:
@@ -973,7 +974,9 @@ class AWSBedrockLLMService(LLMService):
            }

            # Add system message
-            request_params["system"] = context.system
+            system = getattr(context, "system", None)
+            if system:
+                request_params["system"] = system

            # Check if messages contain tool use or tool result content blocks
            has_tool_content = False
@@ -1015,7 +1018,10 @@ class AWSBedrockLLMService(LLMService):
            if self._settings["latency"] in ["standard", "optimized"]:
                request_params["performanceConfig"] = {"latency": self._settings["latency"]}

-            logger.debug(f"Calling AWS Bedrock model with: {request_params}")
+            # Log request params with messages redacted for logging
+            log_params = dict(request_params)
+            log_params["messages"] = context.get_messages_for_logging()
+            logger.debug(f"Calling AWS Bedrock model with: {log_params}")

            async with self._aws_session.client(
                service_name="bedrock-runtime", **self._aws_params
@@ -1126,12 +1132,6 @@ class AWSBedrockLLMService(LLMService):
            raise NotImplementedError("Universal LLMContext is not yet supported for AWS Bedrock.")
        elif isinstance(frame, LLMMessagesFrame):
            context = AWSBedrockLLMContext.from_messages(frame.messages)
-        elif isinstance(frame, VisionImageRawFrame):
-            # This is only useful in very simple pipelines because it creates
-            # a new context. Generally we want a context manager to catch
-            # UserImageRawFrames coming through the pipeline and add them
-            # to the context.
-            context = AWSBedrockLLMContext.from_image_frame(frame)
        elif isinstance(frame, LLMUpdateSettingsFrame):
            await self._update_settings(frame.settings)
        else:
--- a/src/pipecat/services/gemini_multimodal_live/gemini.py
+++ b/src/pipecat/services/gemini_multimodal_live/gemini.py
@@ -33,6 +33,7 @@ from pipecat.frames.frames import (
    InputAudioRawFrame,
    InputImageRawFrame,
    InputTextRawFrame,
+    LLMContextFrame,
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
    LLMMessagesAppendFrame,
@@ -738,6 +739,10 @@ class GeminiMultimodalLiveLLMService(LLMService):
                # Support just one tool call per context frame for now
                tool_result_message = context.messages[-1]
                await self._tool_result(tool_result_message)
+        elif isinstance(frame, LLMContextFrame):
+            raise NotImplementedError(
+                "Universal LLMContext is not yet supported for Gemini Multimodal Live."
+            )
        elif isinstance(frame, InputTextRawFrame):
            await self._send_user_text(frame.text)
            await self.push_frame(frame, direction)
--- a/src/pipecat/services/google/llm.py
+++ b/src/pipecat/services/google/llm.py
@@ -36,7 +36,6 @@ from pipecat.frames.frames import (
    LLMTextFrame,
    LLMUpdateSettingsFrame,
    UserImageRawFrame,
-    VisionImageRawFrame,
 )
 from pipecat.metrics.metrics import LLMTokenUsage
 from pipecat.processors.aggregators.llm_context import LLMContext
@@ -733,17 +732,11 @@ class GoogleLLMService(LLMService):
    def _create_client(self, api_key: str, http_options: Optional[HttpOptions] = None):
        self._client = genai.Client(api_key=api_key, http_options=http_options)

-    async def run_inference(
-        self, context: LLMContext | OpenAILLMContext, system_instruction: Optional[str] = None
-    ) -> Optional[str]:
+    async def run_inference(self, context: LLMContext | OpenAILLMContext) -> Optional[str]:
        """Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

        Args:
            context: The LLM context containing conversation history.
-            system_instruction: Optional system instruction to guide the LLM's
-              behavior. You could also (again, optionally) provide a system
-              instruction directly in the context. If both are provided, the
-              one in the context takes precedence.

        Returns:
            The LLM's response as a string, or None if no response is generated.
@@ -758,7 +751,7 @@ class GoogleLLMService(LLMService):
        else:
            context = GoogleLLMContext.upgrade_to_google(context)
            messages = context.messages
-            system = getattr(context, "system_message", None) or system_instruction
+            system = getattr(context, "system_message", None)

        generation_config = GenerateContentConfig(system_instruction=system)

@@ -858,8 +851,7 @@ class GoogleLLMService(LLMService):
        self, context: OpenAILLMContext
    ) -> AsyncIterator[GenerateContentResponse]:
        logger.debug(
-            # f"{self}: Generating chat [{self._system_instruction}] | {context.get_messages_for_logging()}"
-            f"{self}: Generating chat from OpenAI context {context.get_messages_for_logging()}"
+            f"{self}: Generating chat from LLM-specific context [{context.system_message}] | {context.get_messages_for_logging()}"
        )

        params = GeminiLLMInvocationParams(
@@ -874,13 +866,12 @@ class GoogleLLMService(LLMService):
        self, context: LLMContext
    ) -> AsyncIterator[GenerateContentResponse]:
        adapter = self.get_llm_adapter()
-        logger.debug(
-            # f"{self}: Generating chat [{self._system_instruction}] | {context.get_messages_for_logging()}"
-            f"{self}: Generating chat from universal context {adapter.get_messages_for_logging(context)}"
-        )
-
        params: GeminiLLMInvocationParams = adapter.get_llm_invocation_params(context)

+        logger.debug(
+            f"{self}: Generating chat from universal context [{params['system_instruction']}] | {adapter.get_messages_for_logging(context)}"
+        )
+
        return await self._stream_content(params)

    @traced_llm
@@ -1021,15 +1012,6 @@ class GoogleLLMService(LLMService):
            # NOTE: LLMMessagesFrame is deprecated, so we don't support the newer universal
            # LLMContext with it
            context = GoogleLLMContext(frame.messages)
-        elif isinstance(frame, VisionImageRawFrame):
-            # This is only useful in very simple pipelines because it creates
-            # a new context. Generally we want a context manager to catch
-            # UserImageRawFrames coming through the pipeline and add them
-            # to the context.
-            context = GoogleLLMContext()
-            context.add_image_frame_message(
-                format=frame.format, size=frame.size, image=frame.image, text=frame.text
-            )
        elif isinstance(frame, LLMUpdateSettingsFrame):
            await self._update_settings(frame.settings)
        else:
--- a/src/pipecat/services/llm_service.py
+++ b/src/pipecat/services/llm_service.py
@@ -195,18 +195,13 @@ class LLMService(AIService):
        """
        return self._adapter

-    async def run_inference(
-        self, context: LLMContext | OpenAILLMContext, system_instruction: Optional[str] = None
-    ) -> Optional[str]:
+    async def run_inference(self, context: LLMContext | OpenAILLMContext) -> Optional[str]:
        """Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

        Must be implemented by subclasses.

        Args:
            context: The LLM context containing conversation history.
-            system_instruction: Optional system instruction to guide the LLM's
-              behavior. You could also (again, optionally) provide a system
-              instruction directly in the context.

        Returns:
            The LLM's response as a string, or None if no response is generated.
--- a/src/pipecat/services/mistral/llm.py
+++ b/src/pipecat/services/mistral/llm.py
@@ -57,16 +57,18 @@ class MistralLLMService(OpenAILLMService):
        logger.debug(f"Creating Mistral client with api {base_url}")
        return super().create_client(api_key, base_url, **kwargs)

-    def _apply_mistral_assistant_prefix(
+    def _apply_mistral_fixups(
        self, messages: List[ChatCompletionMessageParam]
    ) -> List[ChatCompletionMessageParam]:
-        """Apply Mistral's assistant message prefix requirement.
+        """Apply fixups to messages to meet Mistral-specific requirements.

-        Mistral requires assistant messages to have prefix=True when they
-        are the final message in a conversation. According to Mistral's API:
-        - Assistant messages with prefix=True MUST be the last message
-        - Only add prefix=True to the final assistant message when needed
-        - This allows assistant messages to be accepted as the last message
+        1. A "tool"-role message must be followed by an assistant message.
+
+        2. "system"-role messages must only appear at the start of a
+           conversation.
+
+        3. Assistant messages must have prefix=True when they are the final
+           message in a conversation (but at no other point).

        Args:
            messages: The original list of messages.
@@ -80,6 +82,25 @@ class MistralLLMService(OpenAILLMService):
        # Create a copy to avoid modifying the original
        fixed_messages = [dict(msg) for msg in messages]

+        # Ensure all tool responses are followed by an assistant message
+        assistant_insert_indices = []
+        for i, msg in enumerate(fixed_messages):
+            if msg.get("role") == "tool":
+                # If this is the last message or the next message is not assistant
+                if i == len(fixed_messages) - 1 or fixed_messages[i + 1].get("role") != "assistant":
+                    assistant_insert_indices.append(i + 1)
+        for idx in reversed(assistant_insert_indices):
+            fixed_messages.insert(idx, {"role": "assistant", "content": " "})
+
+        # Convert any "system" messages that aren't at the start (i.e., after the initial contiguous block) to "user"
+        first_non_system_idx = next(
+            (i for i, msg in enumerate(fixed_messages) if msg.get("role") != "system"),
+            len(fixed_messages),
+        )
+        for i, msg in enumerate(fixed_messages):
+            if msg.get("role") == "system" and i >= first_non_system_idx:
+                msg["role"] = "user"
+
        # Get the last message
        last_message = fixed_messages[-1]

@@ -158,7 +179,7 @@ class MistralLLMService(OpenAILLMService):
        - Core completion settings
        """
        # Apply Mistral's assistant prefix requirement for API compatibility
-        fixed_messages = self._apply_mistral_assistant_prefix(params_from_context["messages"])
+        fixed_messages = self._apply_mistral_fixups(params_from_context["messages"])

        params = {
            "model": self.model_name,
--- a/src/pipecat/services/moondream/vision.py
+++ b/src/pipecat/services/moondream/vision.py
@@ -11,17 +11,20 @@ for image analysis and description generation.
 """

 import asyncio
-from typing import AsyncGenerator
+import base64
+from io import BytesIO
+from typing import AsyncGenerator, Optional

 from loguru import logger
 from PIL import Image

-from pipecat.frames.frames import ErrorFrame, Frame, TextFrame, VisionImageRawFrame
+from pipecat.frames.frames import ErrorFrame, Frame, TextFrame
+from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.services.vision_service import VisionService

 try:
    import torch
-    from transformers import AutoModelForCausalLM, AutoTokenizer
+    from transformers import AutoModelForCausalLM
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.")
@@ -94,11 +97,11 @@ class MoondreamService(VisionService):

        logger.debug("Loaded Moondream model")

-    async def run_vision(self, frame: VisionImageRawFrame) -> AsyncGenerator[Frame, None]:
+    async def run_vision(self, context: LLMContext) -> AsyncGenerator[Frame, None]:
        """Analyze an image and generate a description.

        Args:
-            frame: Vision frame containing the image data and optional question text.
+            context: The context to process, containing image data.

        Yields:
            Frame: TextFrame containing the generated image description, or ErrorFrame
@@ -109,22 +112,45 @@ class MoondreamService(VisionService):
            yield ErrorFrame("Moondream model not available")
            return

-        logger.debug(f"Analyzing image: {frame}")
+        image_bytes = None
+        text = None
+        try:
+            messages = context.get_messages()
+            last_message = messages[-1]
+            last_message_content = last_message.get("content")

-        def get_image_description(frame: VisionImageRawFrame):
-            """Generate description for the given image frame.
+            for item in last_message_content:
+                if isinstance(item, dict):
+                    if (
+                        "image_url" in item
+                        and isinstance(item["image_url"], dict)
+                        and item["image_url"].get("url")
+                    ):
+                        image_bytes = base64.b64decode(item["image_url"]["url"].split(",")[1])
+                    elif "text" in item and isinstance(item["text"], str):
+                        text = item["text"]

-            Args:
-                frame: Vision frame containing image data and question.
+        except Exception as e:
+            logger.error(f"Exception during image extraction: {e}")
+            yield ErrorFrame("Failed to extract image from context")
+            return

-            Returns:
-                str: Generated description of the image.
-            """
-            image = Image.frombytes(frame.format, frame.size, frame.image)
+        if not image_bytes:
+            logger.error("No image found in context")
+            yield ErrorFrame("No image found in context")
+            return
+
+        logger.debug(
+            f"Analyzing image (bytes length: {len(image_bytes) if image_bytes else 'None'})"
+        )
+
+        def get_image_description(bytes: bytes, text: Optional[str]) -> str:
+            image_buffer = BytesIO(bytes)
+            image = Image.open(image_buffer)
            image_embeds = self._model.encode_image(image)
-            description = self._model.query(image_embeds, frame.text)["answer"]
+            description = self._model.query(image_embeds, text)["answer"]
            return description

-        description = await asyncio.to_thread(get_image_description, frame)
+        description = await asyncio.to_thread(get_image_description, image_bytes, text)

        yield TextFrame(text=description)
--- a/src/pipecat/services/openai/base_llm.py
+++ b/src/pipecat/services/openai/base_llm.py
@@ -32,7 +32,6 @@ from pipecat.frames.frames import (
    LLMMessagesFrame,
    LLMTextFrame,
    LLMUpdateSettingsFrame,
-    VisionImageRawFrame,
 )
 from pipecat.metrics.metrics import LLMTokenUsage
 from pipecat.processors.aggregators.llm_context import LLMContext
@@ -245,16 +244,11 @@ class BaseOpenAILLMService(LLMService):
        params.update(self._settings["extra"])
        return params

-    async def run_inference(
-        self, context: LLMContext | OpenAILLMContext, system_instruction: Optional[str] = None
-    ) -> Optional[str]:
+    async def run_inference(self, context: LLMContext | OpenAILLMContext) -> Optional[str]:
        """Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

        Args:
            context: The LLM context containing conversation history.
-            system_instruction: Optional system instruction to guide the LLM's
-              behavior. You could also (again, optionally) provide a system
-              instruction directly in the context.

        Returns:
            The LLM's response as a string, or None if no response is generated.
@@ -279,7 +273,7 @@ class BaseOpenAILLMService(LLMService):
        self, context: OpenAILLMContext
    ) -> AsyncStream[ChatCompletionChunk]:
        logger.debug(
-            f"{self}: Generating chat from OpenAI context {context.get_messages_for_logging()}"
+            f"{self}: Generating chat from LLM-specific context {context.get_messages_for_logging()}"
        )

        messages: List[ChatCompletionMessageParam] = context.get_messages()
@@ -423,8 +417,8 @@ class BaseOpenAILLMService(LLMService):
        """Process frames for LLM completion requests.

        Handles OpenAILLMContextFrame, LLMContextFrame, LLMMessagesFrame,
-        VisionImageRawFrame, and LLMUpdateSettingsFrame to trigger LLM
-        completions and manage settings.
+        and LLMUpdateSettingsFrame to trigger LLM completions and manage
+        settings.

        Args:
            frame: The frame to process.
@@ -443,16 +437,6 @@ class BaseOpenAILLMService(LLMService):
            # NOTE: LLMMessagesFrame is deprecated, so we don't support the newer universal
            # LLMContext with it
            context = OpenAILLMContext.from_messages(frame.messages)
-        elif isinstance(frame, VisionImageRawFrame):
-            # This is only useful in very simple pipelines because it creates
-            # a new context. Generally we want a context manager to catch
-            # UserImageRawFrames coming through the pipeline and add them
-            # to the context.
-            # TODO: support the newer universal LLMContext with a VisionImageRawFrame equivalent?
-            context = OpenAILLMContext()
-            context.add_image_frame_message(
-                format=frame.format, size=frame.size, image=frame.image, text=frame.text
-            )
        elif isinstance(frame, LLMUpdateSettingsFrame):
            await self._update_settings(frame.settings)
        else:
--- a/src/pipecat/services/openai/image.py
+++ b/src/pipecat/services/openai/image.py
@@ -84,5 +84,10 @@ class OpenAIImageGenService(ImageGenService):
        async with self._aiohttp_session.get(image_url) as response:
            image_stream = io.BytesIO(await response.content.read())
            image = Image.open(image_stream)
-            frame = URLImageRawFrame(image_url, image.tobytes(), image.size, image.format)
+            frame = URLImageRawFrame(
+                image=image.tobytes(),
+                size=image.size,
+                format=image.format,
+                url=image_url,
+            )
            yield frame
--- a/src/pipecat/services/openai_realtime/init.py
+++ b/src/pipecat/services/openai_realtime/init.py
@@ -0,0 +1,9 @@
+from .azure import AzureRealtimeLLMService
+from .events import (
+    InputAudioNoiseReduction,
+    InputAudioTranscription,
+    SemanticTurnDetection,
+    SessionProperties,
+    TurnDetection,
+)
+from .openai import OpenAIRealtimeLLMService
--- a/src/pipecat/services/openai_realtime/azure.py
+++ b/src/pipecat/services/openai_realtime/azure.py
@@ -0,0 +1,67 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Azure OpenAI Realtime LLM service implementation."""
+
+from loguru import logger
+
+from .openai import OpenAIRealtimeLLMService
+
+try:
+    from websockets.asyncio.client import connect as websocket_connect
+except ModuleNotFoundError as e:
+    logger.error(f"Exception: {e}")
+    logger.error(
+        "In order to use OpenAI, you need to `pip install pipecat-ai[openai]`. Also, set `OPENAI_API_KEY` environment variable."
+    )
+    raise Exception(f"Missing module: {e}")
+
+
+class AzureRealtimeLLMService(OpenAIRealtimeLLMService):
+    """Azure OpenAI Realtime LLM service with Azure-specific authentication.
+
+    Extends the OpenAI Realtime service to work with Azure OpenAI endpoints,
+    using Azure's authentication headers and endpoint format. Provides the same
+    real-time audio and text communication capabilities as the base OpenAI service.
+    """
+
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        base_url: str,
+        **kwargs,
+    ):
+        """Initialize Azure Realtime LLM service.
+
+        Args:
+            api_key: The API key for the Azure OpenAI service.
+            base_url: The full Azure WebSocket endpoint URL including api-version and deployment.
+                Example: "wss://my-project.openai.azure.com/openai/realtime?api-version=2024-10-01-preview&deployment=my-realtime-deployment"
+            **kwargs: Additional arguments passed to parent OpenAIRealtimeLLMService.
+        """
+        super().__init__(base_url=base_url, api_key=api_key, **kwargs)
+        self.api_key = api_key
+        self.base_url = base_url
+
+    async def _connect(self):
+        try:
+            if self._websocket:
+                # Here we assume that if we have a websocket, we are connected. We
+                # handle disconnections in the send/recv code paths.
+                return
+
+            logger.info(f"Connecting to {self.base_url}, api key: {self.api_key}")
+            self._websocket = await websocket_connect(
+                uri=self.base_url,
+                additional_headers={
+                    "api-key": self.api_key,
+                },
+            )
+            self._receive_task = self.create_task(self._receive_task_handler())
+        except Exception as e:
+            logger.error(f"{self} initialization error: {e}")
+            self._websocket = None
--- a/src/pipecat/services/openai_realtime/context.py
+++ b/src/pipecat/services/openai_realtime/context.py
@@ -0,0 +1,272 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""OpenAI Realtime LLM context and aggregator implementations."""
+
+import copy
+import json
+
+from loguru import logger
+
+from pipecat.frames.frames import (
+    Frame,
+    FunctionCallResultFrame,
+    InterimTranscriptionFrame,
+    LLMMessagesUpdateFrame,
+    LLMSetToolsFrame,
+    LLMTextFrame,
+    TranscriptionFrame,
+)
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.processors.frame_processor import FrameDirection
+from pipecat.services.openai.llm import (
+    OpenAIAssistantContextAggregator,
+    OpenAIUserContextAggregator,
+)
+
+from . import events
+from .frames import RealtimeFunctionCallResultFrame, RealtimeMessagesUpdateFrame
+
+
+class OpenAIRealtimeLLMContext(OpenAILLMContext):
+    """OpenAI Realtime LLM context with session management and message conversion.
+
+    Extends the standard OpenAI LLM context to support real-time session properties,
+    instruction management, and conversion between standard message formats and
+    realtime conversation items.
+    """
+
+    def __init__(self, messages=None, tools=None, **kwargs):
+        """Initialize the OpenAIRealtimeLLMContext.
+
+        Args:
+            messages: Initial conversation messages. Defaults to None.
+            tools: Available function tools. Defaults to None.
+            **kwargs: Additional arguments passed to parent OpenAILLMContext.
+        """
+        super().__init__(messages=messages, tools=tools, **kwargs)
+        self.__setup_local()
+
+    def __setup_local(self):
+        self.llm_needs_settings_update = True
+        self.llm_needs_initial_messages = True
+        self._session_instructions = ""
+
+        return
+
+    @staticmethod
+    def upgrade_to_realtime(obj: OpenAILLMContext) -> "OpenAIRealtimeLLMContext":
+        """Upgrade a standard OpenAI LLM context to a realtime context.
+
+        Args:
+            obj: The OpenAILLMContext instance to upgrade.
+
+        Returns:
+            The upgraded OpenAIRealtimeLLMContext instance.
+        """
+        if isinstance(obj, OpenAILLMContext) and not isinstance(obj, OpenAIRealtimeLLMContext):
+            obj.__class__ = OpenAIRealtimeLLMContext
+            obj.__setup_local()
+        return obj
+
+    # todo
+    #   - finish implementing all frames
+
+    def from_standard_message(self, message):
+        """Convert a standard message format to a realtime conversation item.
+
+        Args:
+            message: The standard message dictionary to convert.
+
+        Returns:
+            A ConversationItem instance for the realtime API.
+        """
+        if message.get("role") == "user":
+            content = message.get("content")
+            if isinstance(message.get("content"), list):
+                content = ""
+                for c in message.get("content"):
+                    if c.get("type") == "text":
+                        content += " " + c.get("text")
+                    else:
+                        logger.error(
+                            f"Unhandled content type in context message: {c.get('type')} - {message}"
+                        )
+            return events.ConversationItem(
+                role="user",
+                type="message",
+                content=[events.ItemContent(type="input_text", text=content)],
+            )
+        if message.get("role") == "assistant" and message.get("tool_calls"):
+            tc = message.get("tool_calls")[0]
+            return events.ConversationItem(
+                type="function_call",
+                call_id=tc["id"],
+                name=tc["function"]["name"],
+                arguments=tc["function"]["arguments"],
+            )
+        logger.error(f"Unhandled message type in from_standard_message: {message}")
+
+    def get_messages_for_initializing_history(self):
+        """Get conversation items for initializing the realtime session history.
+
+        Converts the context's messages to a format suitable for the realtime API,
+        handling system instructions and conversation history packaging.
+
+        Returns:
+            List of conversation items for session initialization.
+        """
+        # We can't load a long conversation history into the openai realtime api yet. (The API/model
+        # forgets that it can do audio, if you do a series of `conversation.item.create` calls.) So
+        # our general strategy until this is fixed is just to put everything into a first "user"
+        # message as a single input.
+        if not self.messages:
+            return []
+
+        messages = copy.deepcopy(self.messages)
+
+        # If we have a "system" message as our first message, let's pull that out into session
+        # "instructions"
+        if messages[0].get("role") == "system":
+            self.llm_needs_settings_update = True
+            system = messages.pop(0)
+            content = system.get("content")
+            if isinstance(content, str):
+                self._session_instructions = content
+            elif isinstance(content, list):
+                self._session_instructions = content[0].get("text")
+            if not messages:
+                return []
+
+        # If we have just a single "user" item, we can just send it normally
+        if len(messages) == 1 and messages[0].get("role") == "user":
+            return [self.from_standard_message(messages[0])]
+
+        # Otherwise, let's pack everything into a single "user" message with a bit of
+        # explanation for the LLM
+        intro_text = """
+        This is a previously saved conversation. Please treat this conversation history as a
+        starting point for the current conversation."""
+
+        trailing_text = """
+        This is the end of the previously saved conversation. Please continue the conversation
+        from here. If the last message is a user instruction or question, act on that instruction
+        or answer the question. If the last message is an assistant response, simple say that you
+        are ready to continue the conversation."""
+
+        return [
+            {
+                "role": "user",
+                "type": "message",
+                "content": [
+                    {
+                        "type": "input_text",
+                        "text": "\n\n".join(
+                            [intro_text, json.dumps(messages, indent=2), trailing_text]
+                        ),
+                    }
+                ],
+            }
+        ]
+
+    def add_user_content_item_as_message(self, item):
+        """Add a user content item as a standard message to the context.
+
+        Args:
+            item: The conversation item to add as a user message.
+        """
+        message = {
+            "role": "user",
+            "content": [{"type": "text", "text": item.content[0].transcript}],
+        }
+        self.add_message(message)
+
+
+class OpenAIRealtimeUserContextAggregator(OpenAIUserContextAggregator):
+    """User context aggregator for OpenAI Realtime API.
+
+    Handles user input frames and generates appropriate context updates
+    for the realtime conversation, including message updates and tool settings.
+
+    Args:
+        context: The OpenAI realtime LLM context.
+        **kwargs: Additional arguments passed to parent aggregator.
+    """
+
+    async def process_frame(
+        self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM
+    ):
+        """Process incoming frames and handle realtime-specific frame types.
+
+        Args:
+            frame: The frame to process.
+            direction: The direction of frame flow in the pipeline.
+        """
+        await super().process_frame(frame, direction)
+        # Parent does not push LLMMessagesUpdateFrame. This ensures that in a typical pipeline,
+        # messages are only processed by the user context aggregator, which is generally what we want. But
+        # we also need to send new messages over the websocket, so the openai realtime API has them
+        # in its context.
+        if isinstance(frame, LLMMessagesUpdateFrame):
+            await self.push_frame(RealtimeMessagesUpdateFrame(context=self._context))
+
+        # Parent also doesn't push the LLMSetToolsFrame.
+        if isinstance(frame, LLMSetToolsFrame):
+            await self.push_frame(frame, direction)
+
+    async def push_aggregation(self):
+        """Push user input aggregation.
+
+        Currently ignores all user input coming into the pipeline as realtime
+        audio input is handled directly by the service.
+        """
+        # for the moment, ignore all user input coming into the pipeline.
+        # todo: think about whether/how to fix this to allow for text input from
+        #       upstream (transport/transcription, or other sources)
+        pass
+
+
+class OpenAIRealtimeAssistantContextAggregator(OpenAIAssistantContextAggregator):
+    """Assistant context aggregator for OpenAI Realtime API.
+
+    Handles assistant output frames from the realtime service, filtering
+    out duplicate text frames and managing function call results.
+
+    Args:
+        context: The OpenAI realtime LLM context.
+        **kwargs: Additional arguments passed to parent aggregator.
+    """
+
+    # The LLMAssistantContextAggregator uses TextFrames to aggregate the LLM output,
+    # but the OpenAIRealtimeLLMService pushes LLMTextFrames and TTSTextFrames. We
+    # need to override this proces_frame for LLMTextFrame, so that only the TTSTextFrames
+    # are process. This ensures that the context gets only one set of messages.
+    # OpenAIRealtimeLLMService also pushes TranscriptionFrames and InterimTranscriptionFrames,
+    # so we need to ignore pushing those as well, as they're also TextFrames.
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        """Process assistant frames, filtering out duplicate text content.
+
+        Args:
+            frame: The frame to process.
+            direction: The direction of frame flow in the pipeline.
+        """
+        if not isinstance(frame, (LLMTextFrame, TranscriptionFrame, InterimTranscriptionFrame)):
+            await super().process_frame(frame, direction)
+
+    async def handle_function_call_result(self, frame: FunctionCallResultFrame):
+        """Handle function call result and notify the realtime service.
+
+        Args:
+            frame: The function call result frame to handle.
+        """
+        await super().handle_function_call_result(frame)
+
+        # The standard function callback code path pushes the FunctionCallResultFrame from the llm itself,
+        # so we didn't have a chance to add the result to the openai realtime api context. Let's push a
+        # special frame to do that.
+        await self.push_frame(
+            RealtimeFunctionCallResultFrame(result_frame=frame), FrameDirection.UPSTREAM
+        )
--- a/src/pipecat/services/openai_realtime/events.py
+++ b/src/pipecat/services/openai_realtime/events.py
--- a/src/pipecat/services/openai_realtime/frames.py
+++ b/src/pipecat/services/openai_realtime/frames.py
@@ -0,0 +1,37 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Custom frame types for OpenAI Realtime API integration."""
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING
+
+from pipecat.frames.frames import DataFrame, FunctionCallResultFrame
+
+if TYPE_CHECKING:
+    from pipecat.services.openai_realtime_beta.context import OpenAIRealtimeLLMContext
+
+
+@dataclass
+class RealtimeMessagesUpdateFrame(DataFrame):
+    """Frame indicating that the realtime context messages have been updated.
+
+    Parameters:
+        context: The updated OpenAI realtime LLM context.
+    """
+
+    context: "OpenAIRealtimeLLMContext"
+
+
+@dataclass
+class RealtimeFunctionCallResultFrame(DataFrame):
+    """Frame containing function call results for the realtime service.
+
+    Parameters:
+        result_frame: The function call result frame to send to the realtime API.
+    """
+
+    result_frame: FunctionCallResultFrame
--- a/src/pipecat/services/openai_realtime/openai.py
+++ b/src/pipecat/services/openai_realtime/openai.py
@@ -0,0 +1,831 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""OpenAI Realtime LLM service implementation with WebSocket support."""
+
+import base64
+import json
+import time
+from dataclasses import dataclass
+from typing import Optional
+
+from loguru import logger
+
+from pipecat.adapters.services.open_ai_realtime_adapter import OpenAIRealtimeLLMAdapter
+from pipecat.frames.frames import (
+    BotStoppedSpeakingFrame,
+    CancelFrame,
+    EndFrame,
+    ErrorFrame,
+    Frame,
+    InputAudioRawFrame,
+    InterimTranscriptionFrame,
+    LLMContextFrame,
+    LLMFullResponseEndFrame,
+    LLMFullResponseStartFrame,
+    LLMMessagesAppendFrame,
+    LLMSetToolsFrame,
+    LLMTextFrame,
+    LLMUpdateSettingsFrame,
+    StartFrame,
+    StartInterruptionFrame,
+    TranscriptionFrame,
+    TTSAudioRawFrame,
+    TTSStartedFrame,
+    TTSStoppedFrame,
+    TTSTextFrame,
+    UserStartedSpeakingFrame,
+    UserStoppedSpeakingFrame,
+)
+from pipecat.metrics.metrics import LLMTokenUsage
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantAggregatorParams,
+    LLMUserAggregatorParams,
+)
+from pipecat.processors.aggregators.openai_llm_context import (
+    OpenAILLMContext,
+    OpenAILLMContextFrame,
+)
+from pipecat.processors.frame_processor import FrameDirection
+from pipecat.services.llm_service import FunctionCallFromLLM, LLMService
+from pipecat.services.openai.llm import OpenAIContextAggregatorPair
+from pipecat.transcriptions.language import Language
+from pipecat.utils.time import time_now_iso8601
+from pipecat.utils.tracing.service_decorators import traced_openai_realtime, traced_stt
+
+from . import events
+from .context import (
+    OpenAIRealtimeAssistantContextAggregator,
+    OpenAIRealtimeLLMContext,
+    OpenAIRealtimeUserContextAggregator,
+)
+from .frames import RealtimeFunctionCallResultFrame, RealtimeMessagesUpdateFrame
+
+try:
+    from websockets.asyncio.client import connect as websocket_connect
+except ModuleNotFoundError as e:
+    logger.error(f"Exception: {e}")
+    logger.error("In order to use OpenAI, you need to `pip install pipecat-ai[openai]`.")
+    raise Exception(f"Missing module: {e}")
+
+
+@dataclass
+class CurrentAudioResponse:
+    """Tracks the current audio response from the assistant.
+
+    Parameters:
+        item_id: Unique identifier for the audio response item.
+        content_index: Index of the audio content within the item.
+        start_time_ms: Timestamp when the audio response started in milliseconds.
+        total_size: Total size of audio data received in bytes. Defaults to 0.
+    """
+
+    item_id: str
+    content_index: int
+    start_time_ms: int
+    total_size: int = 0
+
+
+class OpenAIRealtimeLLMService(LLMService):
+    """OpenAI Realtime LLM service providing real-time audio and text communication.
+
+    Implements the OpenAI Realtime API with WebSocket communication for low-latency
+    bidirectional audio and text interactions. Supports function calling, conversation
+    management, and real-time transcription.
+    """
+
+    # Overriding the default adapter to use the OpenAIRealtimeLLMAdapter one.
+    adapter_class = OpenAIRealtimeLLMAdapter
+
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        model: str = "gpt-realtime",
+        base_url: str = "wss://api.openai.com/v1/realtime",
+        session_properties: Optional[events.SessionProperties] = None,
+        start_audio_paused: bool = False,
+        send_transcription_frames: bool = True,
+        **kwargs,
+    ):
+        """Initialize the OpenAI Realtime LLM service.
+
+        Args:
+            api_key: OpenAI API key for authentication.
+            model: OpenAI model name. Defaults to "gpt-4o-realtime-preview-2025-06-03".
+            base_url: WebSocket base URL for the realtime API.
+                Defaults to "wss://api.openai.com/v1/realtime".
+            session_properties: Configuration properties for the realtime session.
+                If None, uses default SessionProperties.
+            start_audio_paused: Whether to start with audio input paused. Defaults to False.
+            send_transcription_frames: Whether to emit transcription frames. Defaults to True.
+            **kwargs: Additional arguments passed to parent LLMService.
+        """
+        full_url = f"{base_url}?model={model}"
+        super().__init__(base_url=full_url, **kwargs)
+
+        self.api_key = api_key
+        self.base_url = full_url
+        self.set_model_name(model)
+
+        self._session_properties: events.SessionProperties = (
+            session_properties or events.SessionProperties()
+        )
+        self._audio_input_paused = start_audio_paused
+        self._send_transcription_frames = send_transcription_frames
+        self._websocket = None
+        self._receive_task = None
+        self._context = None
+
+        self._disconnecting = False
+        self._api_session_ready = False
+        self._run_llm_when_api_session_ready = False
+
+        self._current_assistant_response = None
+        self._current_audio_response = None
+
+        self._messages_added_manually = {}
+        self._user_and_response_message_tuple = None
+        self._pending_function_calls = {}  # Track function calls by call_id
+
+        self._register_event_handler("on_conversation_item_created")
+        self._register_event_handler("on_conversation_item_updated")
+        self._retrieve_conversation_item_futures = {}
+
+    def can_generate_metrics(self) -> bool:
+        """Check if the service can generate usage metrics.
+
+        Returns:
+            True if metrics generation is supported.
+        """
+        return True
+
+    def set_audio_input_paused(self, paused: bool):
+        """Set whether audio input is paused.
+
+        Args:
+            paused: True to pause audio input, False to resume.
+        """
+        self._audio_input_paused = paused
+
+    def _is_modality_enabled(self, modality: str) -> bool:
+        """Check if a specific modality is enabled, "text" or "audio"."""
+        modalities = self._session_properties.output_modalities or ["audio", "text"]
+        return modality in modalities
+
+    def _get_enabled_modalities(self) -> list[str]:
+        """Get the list of enabled modalities."""
+        modalities = self._session_properties.output_modalities or ["audio", "text"]
+        # API only supports single modality responses: either ["text"] or ["audio"]
+        if "audio" in modalities:
+            return ["audio"]
+        elif "text" in modalities:
+            return ["text"]
+
+    async def retrieve_conversation_item(self, item_id: str):
+        """Retrieve a conversation item by ID from the server.
+
+        Args:
+            item_id: The ID of the conversation item to retrieve.
+
+        Returns:
+            The retrieved conversation item.
+        """
+        future = self.get_event_loop().create_future()
+        retrieval_in_flight = False
+        if not self._retrieve_conversation_item_futures.get(item_id):
+            self._retrieve_conversation_item_futures[item_id] = []
+        else:
+            retrieval_in_flight = True
+        self._retrieve_conversation_item_futures[item_id].append(future)
+        if not retrieval_in_flight:
+            await self.send_client_event(
+                # Set event_id to "rci_{item_id}" so that we can identify an
+                # error later if the retrieval fails. We don't need a UUID
+                # suffix to the event_id because we're ensuring only one
+                # in-flight retrieval per item_id. (Note: "rci" = "retrieve
+                # conversation item")
+                events.ConversationItemRetrieveEvent(item_id=item_id, event_id=f"rci_{item_id}")
+            )
+        return await future
+
+    #
+    # standard AIService frame handling
+    #
+
+    async def start(self, frame: StartFrame):
+        """Start the service and establish WebSocket connection.
+
+        Args:
+            frame: The start frame triggering service initialization.
+        """
+        await super().start(frame)
+        await self._connect()
+
+    async def stop(self, frame: EndFrame):
+        """Stop the service and close WebSocket connection.
+
+        Args:
+            frame: The end frame triggering service shutdown.
+        """
+        await super().stop(frame)
+        await self._disconnect()
+
+    async def cancel(self, frame: CancelFrame):
+        """Cancel the service and close WebSocket connection.
+
+        Args:
+            frame: The cancel frame triggering service cancellation.
+        """
+        await super().cancel(frame)
+        await self._disconnect()
+
+    #
+    # speech and interruption handling
+    #
+
+    async def _handle_interruption(self):
+        # None and False are different. Check for False. None means we're using OpenAI's
+        # built-in turn detection defaults.
+        turn_detection_disabled = (
+            self._session_properties.audio
+            and self._session_properties.audio.input
+            and self._session_properties.audio.input.turn_detection is False
+        )
+        if turn_detection_disabled:
+            await self.send_client_event(events.InputAudioBufferClearEvent())
+            await self.send_client_event(events.ResponseCancelEvent())
+        await self._truncate_current_audio_response()
+        await self.stop_all_metrics()
+        if self._current_assistant_response:
+            await self.push_frame(LLMFullResponseEndFrame())
+            # Only push TTSStoppedFrame if audio modality is enabled
+            if self._is_modality_enabled("audio"):
+                await self.push_frame(TTSStoppedFrame())
+
+    async def _handle_user_started_speaking(self, frame):
+        pass
+
+    async def _handle_user_stopped_speaking(self, frame):
+        # None and False are different. Check for False. None means we're using OpenAI's
+        # built-in turn detection defaults.
+        turn_detection_disabled = (
+            self._session_properties.audio
+            and self._session_properties.audio.input
+            and self._session_properties.audio.input.turn_detection is False
+        )
+        if turn_detection_disabled:
+            await self.send_client_event(events.InputAudioBufferCommitEvent())
+            await self.send_client_event(events.ResponseCreateEvent())
+
+    async def _handle_bot_stopped_speaking(self):
+        self._current_audio_response = None
+
+    def _calculate_audio_duration_ms(
+        self, total_bytes: int, sample_rate: int = 24000, bytes_per_sample: int = 2
+    ) -> int:
+        """Calculate audio duration in milliseconds based on PCM audio parameters."""
+        samples = total_bytes / bytes_per_sample
+        duration_seconds = samples / sample_rate
+        return int(duration_seconds * 1000)
+
+    async def _truncate_current_audio_response(self):
+        """Truncates the current audio response at the appropriate duration.
+
+        Calculates the actual duration of the audio content and truncates at the shorter of
+        either the wall clock time or the actual audio duration to prevent invalid truncation
+        requests.
+        """
+        if not self._current_audio_response:
+            return
+
+        # if the bot is still speaking, truncate the last message
+        try:
+            current = self._current_audio_response
+            self._current_audio_response = None
+
+            # Calculate actual audio duration instead of using wall clock time
+            audio_duration_ms = self._calculate_audio_duration_ms(current.total_size)
+
+            # Use the shorter of wall clock time or actual audio duration
+            elapsed_ms = int(time.time() * 1000 - current.start_time_ms)
+            truncate_ms = min(elapsed_ms, audio_duration_ms)
+
+            logger.trace(
+                f"Truncating audio: duration={audio_duration_ms}ms, "
+                f"elapsed={elapsed_ms}ms, truncate={truncate_ms}ms"
+            )
+
+            await self.send_client_event(
+                events.ConversationItemTruncateEvent(
+                    item_id=current.item_id,
+                    content_index=current.content_index,
+                    audio_end_ms=truncate_ms,
+                )
+            )
+        except Exception as e:
+            # Log warning and don't re-raise - allow session to continue
+            logger.warning(f"Audio truncation failed (non-fatal): {e}")
+
+    #
+    # frame processing
+    #
+    # StartFrame, StopFrame, CancelFrame implemented in base class
+    #
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        """Process incoming frames from the pipeline.
+
+        Args:
+            frame: The frame to process.
+            direction: The direction of frame flow in the pipeline.
+        """
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, TranscriptionFrame):
+            pass
+        elif isinstance(frame, OpenAILLMContextFrame):
+            context: OpenAIRealtimeLLMContext = OpenAIRealtimeLLMContext.upgrade_to_realtime(
+                frame.context
+            )
+            if not self._context:
+                self._context = context
+            elif frame.context is not self._context:
+                # If the context has changed, reset the conversation
+                self._context = context
+                await self.reset_conversation()
+            # Run the LLM at next opportunity
+            await self._create_response()
+        elif isinstance(frame, LLMContextFrame):
+            raise NotImplementedError(
+                "Universal LLMContext is not yet supported for OpenAI Realtime."
+            )
+        elif isinstance(frame, InputAudioRawFrame):
+            if not self._audio_input_paused:
+                await self._send_user_audio(frame)
+        elif isinstance(frame, StartInterruptionFrame):
+            await self._handle_interruption()
+        elif isinstance(frame, UserStartedSpeakingFrame):
+            await self._handle_user_started_speaking(frame)
+        elif isinstance(frame, UserStoppedSpeakingFrame):
+            await self._handle_user_stopped_speaking(frame)
+        elif isinstance(frame, BotStoppedSpeakingFrame):
+            await self._handle_bot_stopped_speaking()
+        elif isinstance(frame, LLMMessagesAppendFrame):
+            await self._handle_messages_append(frame)
+        elif isinstance(frame, RealtimeMessagesUpdateFrame):
+            self._context = frame.context
+        elif isinstance(frame, LLMUpdateSettingsFrame):
+            self._session_properties = events.SessionProperties(**frame.settings)
+            await self._update_settings()
+        elif isinstance(frame, LLMSetToolsFrame):
+            await self._update_settings()
+        elif isinstance(frame, RealtimeFunctionCallResultFrame):
+            await self._handle_function_call_result(frame.result_frame)
+
+        await self.push_frame(frame, direction)
+
+    async def _handle_messages_append(self, frame):
+        logger.error("!!! NEED TO IMPLEMENT MESSAGES APPEND")
+
+    async def _handle_function_call_result(self, frame):
+        item = events.ConversationItem(
+            type="function_call_output",
+            call_id=frame.tool_call_id,
+            output=json.dumps(frame.result),
+        )
+        await self.send_client_event(events.ConversationItemCreateEvent(item=item))
+
+    #
+    # websocket communication
+    #
+
+    async def send_client_event(self, event: events.ClientEvent):
+        """Send a client event to the OpenAI Realtime API.
+
+        Args:
+            event: The client event to send.
+        """
+        await self._ws_send(event.model_dump(exclude_none=True))
+
+    async def _connect(self):
+        try:
+            if self._websocket:
+                # Here we assume that if we have a websocket, we are connected. We
+                # handle disconnections in the send/recv code paths.
+                return
+            self._websocket = await websocket_connect(
+                uri=self.base_url,
+                additional_headers={
+                    "Authorization": f"Bearer {self.api_key}",
+                },
+            )
+            self._receive_task = self.create_task(self._receive_task_handler())
+        except Exception as e:
+            logger.error(f"{self} initialization error: {e}")
+            self._websocket = None
+
+    async def _disconnect(self):
+        try:
+            self._disconnecting = True
+            self._api_session_ready = False
+            await self.stop_all_metrics()
+            if self._websocket:
+                await self._websocket.close()
+                self._websocket = None
+            if self._receive_task:
+                await self.cancel_task(self._receive_task, timeout=1.0)
+                self._receive_task = None
+            self._disconnecting = False
+        except Exception as e:
+            logger.error(f"{self} error disconnecting: {e}")
+
+    async def _ws_send(self, realtime_message):
+        try:
+            if self._websocket:
+                await self._websocket.send(json.dumps(realtime_message))
+        except Exception as e:
+            if self._disconnecting:
+                return
+            logger.error(f"Error sending message to websocket: {e}")
+            # In server-to-server contexts, a WebSocket error should be quite rare. Given how hard
+            # it is to recover from a send-side error with proper state management, and that exponential
+            # backoff for retries can have cost/stability implications for a service cluster, let's just
+            # treat a send-side error as fatal.
+            await self.push_error(ErrorFrame(error=f"Error sending client event: {e}", fatal=True))
+
+    async def _update_settings(self):
+        settings = self._session_properties
+        # tools given in the context override the tools in the session properties
+        if self._context and self._context.tools:
+            settings.tools = self._context.tools
+        # instructions in the context come from an initial "system" message in the
+        # messages list, and override instructions in the session properties
+        if self._context and self._context._session_instructions:
+            settings.instructions = self._context._session_instructions
+        await self.send_client_event(events.SessionUpdateEvent(session=settings))
+
+    #
+    # inbound server event handling
+    # https://platform.openai.com/docs/api-reference/realtime-server-events
+    #
+
+    async def _receive_task_handler(self):
+        async for message in self._websocket:
+            evt = events.parse_server_event(message)
+            if evt.type == "session.created":
+                await self._handle_evt_session_created(evt)
+            elif evt.type == "session.updated":
+                await self._handle_evt_session_updated(evt)
+            elif evt.type == "response.output_audio.delta":
+                await self._handle_evt_audio_delta(evt)
+            elif evt.type == "response.output_audio.done":
+                await self._handle_evt_audio_done(evt)
+            elif evt.type == "conversation.item.added":
+                await self._handle_evt_conversation_item_added(evt)
+            elif evt.type == "conversation.item.done":
+                await self._handle_evt_conversation_item_done(evt)
+            elif evt.type == "conversation.item.input_audio_transcription.delta":
+                await self._handle_evt_input_audio_transcription_delta(evt)
+            elif evt.type == "conversation.item.input_audio_transcription.completed":
+                await self.handle_evt_input_audio_transcription_completed(evt)
+            elif evt.type == "conversation.item.retrieved":
+                await self._handle_conversation_item_retrieved(evt)
+            elif evt.type == "response.done":
+                await self._handle_evt_response_done(evt)
+            elif evt.type == "input_audio_buffer.speech_started":
+                await self._handle_evt_speech_started(evt)
+            elif evt.type == "input_audio_buffer.speech_stopped":
+                await self._handle_evt_speech_stopped(evt)
+            elif evt.type == "response.output_text.delta":
+                await self._handle_evt_text_delta(evt)
+            elif evt.type == "response.output_audio_transcript.delta":
+                await self._handle_evt_audio_transcript_delta(evt)
+            elif evt.type == "response.function_call_arguments.done":
+                await self._handle_evt_function_call_arguments_done(evt)
+            elif evt.type == "error":
+                if not await self._maybe_handle_evt_retrieve_conversation_item_error(evt):
+                    await self._handle_evt_error(evt)
+                    # errors are fatal, so exit the receive loop
+                    return
+
+    @traced_openai_realtime(operation="llm_setup")
+    async def _handle_evt_session_created(self, evt):
+        # session.created is received right after connecting. Send a message
+        # to configure the session properties.
+        await self._update_settings()
+
+    async def _handle_evt_session_updated(self, evt):
+        # If this is our first context frame, run the LLM
+        self._api_session_ready = True
+        # Now that we've configured the session, we can run the LLM if we need to.
+        if self._run_llm_when_api_session_ready:
+            self._run_llm_when_api_session_ready = False
+            await self._create_response()
+
+    async def _handle_evt_audio_delta(self, evt):
+        # note: ttfb is faster by 1/2 RTT than ttfb as measured for other services, since we're getting
+        # this event from the server
+        await self.stop_ttfb_metrics()
+        if not self._current_audio_response:
+            self._current_audio_response = CurrentAudioResponse(
+                item_id=evt.item_id,
+                content_index=evt.content_index,
+                start_time_ms=int(time.time() * 1000),
+            )
+            await self.push_frame(TTSStartedFrame())
+        audio = base64.b64decode(evt.delta)
+        self._current_audio_response.total_size += len(audio)
+        frame = TTSAudioRawFrame(
+            audio=audio,
+            sample_rate=24000,
+            num_channels=1,
+        )
+        await self.push_frame(frame)
+
+    async def _handle_evt_audio_done(self, evt):
+        if self._current_audio_response:
+            await self.push_frame(TTSStoppedFrame())
+            # Don't clear the self._current_audio_response here. We need to wait until we
+            # receive a BotStoppedSpeakingFrame from the output transport.
+
+    async def _handle_evt_conversation_item_added(self, evt):
+        """Handle conversation.item.added event - item is added but may still be processing."""
+        if evt.item.type == "function_call":
+            # Track this function call for when arguments are completed
+            # Only add if not already tracked (prevent duplicates)
+            if evt.item.call_id not in self._pending_function_calls:
+                self._pending_function_calls[evt.item.call_id] = evt.item
+            else:
+                logger.warning(f"Function call {evt.item.call_id} already tracked, skipping")
+
+        await self._call_event_handler("on_conversation_item_created", evt.item.id, evt.item)
+
+        # This will get sent from the server every time a new "message" is added
+        # to the server's conversation state, whether we create it via the API
+        # or the server creates it from LLM output.
+        if self._messages_added_manually.get(evt.item.id):
+            del self._messages_added_manually[evt.item.id]
+            return
+
+        if evt.item.role == "user":
+            # We need to wait for completion of both user message and response message. Then we'll
+            # add both to the context. User message is complete when we have a "transcript" field
+            # that is not None. Response message is complete when we get a "response.done" event.
+            self._user_and_response_message_tuple = (evt.item, {"done": False, "output": []})
+        elif evt.item.role == "assistant":
+            self._current_assistant_response = evt.item
+            await self.push_frame(LLMFullResponseStartFrame())
+
+    async def _handle_evt_conversation_item_done(self, evt):
+        """Handle conversation.item.done event - item is fully completed."""
+        await self._call_event_handler("on_conversation_item_updated", evt.item.id, evt.item)
+        # The item is now fully processed and ready
+        # For now, no additional logic needed beyond the event handler call
+
+    async def _handle_evt_input_audio_transcription_delta(self, evt):
+        if self._send_transcription_frames:
+            await self.push_frame(
+                # no way to get a language code?
+                InterimTranscriptionFrame(evt.delta, "", time_now_iso8601(), result=evt)
+            )
+
+    @traced_stt
+    async def _handle_user_transcription(
+        self, transcript: str, is_final: bool, language: Optional[Language] = None
+    ):
+        """Handle a transcription result with tracing."""
+        pass
+
+    async def handle_evt_input_audio_transcription_completed(self, evt):
+        """Handle completion of input audio transcription.
+
+        Args:
+            evt: The transcription completed event.
+        """
+        await self._call_event_handler("on_conversation_item_updated", evt.item_id, None)
+
+        if self._send_transcription_frames:
+            await self.push_frame(
+                # no way to get a language code?
+                TranscriptionFrame(evt.transcript, "", time_now_iso8601(), result=evt)
+            )
+            await self._handle_user_transcription(evt.transcript, True, Language.EN)
+        pair = self._user_and_response_message_tuple
+        if pair:
+            user, assistant = pair
+            user.content[0].transcript = evt.transcript
+            if assistant["done"]:
+                self._user_and_response_message_tuple = None
+                self._context.add_user_content_item_as_message(user)
+        else:
+            # User message without preceding conversation.item.created. Bug?
+            logger.warning(f"Transcript for unknown user message: {evt}")
+
+    async def _handle_conversation_item_retrieved(self, evt: events.ConversationItemRetrieved):
+        futures = self._retrieve_conversation_item_futures.pop(evt.item.id, None)
+        if futures:
+            for future in futures:
+                future.set_result(evt.item)
+
+    @traced_openai_realtime(operation="llm_response")
+    async def _handle_evt_response_done(self, evt):
+        # todo: figure out whether there's anything we need to do for "cancelled" events
+        # usage metrics
+        tokens = LLMTokenUsage(
+            prompt_tokens=evt.response.usage.input_tokens,
+            completion_tokens=evt.response.usage.output_tokens,
+            total_tokens=evt.response.usage.total_tokens,
+        )
+        await self.start_llm_usage_metrics(tokens)
+        await self.stop_processing_metrics()
+        await self.push_frame(LLMFullResponseEndFrame())
+        self._current_assistant_response = None
+        # error handling
+        if evt.response.status == "failed":
+            await self.push_error(
+                ErrorFrame(error=evt.response.status_details["error"]["message"], fatal=True)
+            )
+            return
+        # response content
+        for item in evt.response.output:
+            await self._call_event_handler("on_conversation_item_updated", item.id, item)
+        pair = self._user_and_response_message_tuple
+        if pair:
+            user, assistant = pair
+            assistant["done"] = True
+            assistant["output"] = evt.response.output
+            if user.content[0].transcript is not None:
+                self._user_and_response_message_tuple = None
+                self._context.add_user_content_item_as_message(user)
+        else:
+            # Response message without preceding user message (standalone response)
+            # Function calls in this response were already processed immediately when arguments were complete
+            logger.debug(f"Handling standalone response: {evt.response.id}")
+
+    async def _handle_evt_text_delta(self, evt):
+        if evt.delta:
+            await self.push_frame(LLMTextFrame(evt.delta))
+
+    async def _handle_evt_audio_transcript_delta(self, evt):
+        if evt.delta:
+            await self.push_frame(LLMTextFrame(evt.delta))
+            await self.push_frame(TTSTextFrame(evt.delta))
+
+    async def _handle_evt_function_call_arguments_done(self, evt):
+        """Handle completion of function call arguments.
+
+        Args:
+            evt: The response.function_call_arguments.done event.
+        """
+        # Process the function call immediately when arguments are complete
+        # This is needed because function calls might not trigger response.done
+        try:
+            # Parse the arguments
+            args = json.loads(evt.arguments)
+
+            # Get the function call item we tracked earlier
+            function_call_item = self._pending_function_calls.get(evt.call_id)
+            if function_call_item:
+                # Remove from pending calls FIRST to prevent duplicate processing
+                del self._pending_function_calls[evt.call_id]
+
+                # Create the function call and process it
+                function_calls = [
+                    FunctionCallFromLLM(
+                        context=self._context,
+                        tool_call_id=evt.call_id,
+                        function_name=function_call_item.name,
+                        arguments=args,
+                    )
+                ]
+
+                await self.run_function_calls(function_calls)
+                logger.debug(f"Processed function call: {function_call_item.name}")
+            else:
+                logger.warning(f"No tracked function call found for call_id: {evt.call_id}")
+                logger.warning(
+                    f"Available pending calls: {list(self._pending_function_calls.keys())}"
+                )
+
+        except Exception as e:
+            logger.error(f"Failed to process function call arguments: {e}")
+
+    async def _handle_evt_speech_started(self, evt):
+        await self._truncate_current_audio_response()
+        await self._start_interruption()  # cancels this processor task
+        await self.push_frame(StartInterruptionFrame())  # cancels downstream tasks
+        await self.push_frame(UserStartedSpeakingFrame())
+
+    async def _handle_evt_speech_stopped(self, evt):
+        await self.start_ttfb_metrics()
+        await self.start_processing_metrics()
+        await self._stop_interruption()
+        await self.push_frame(UserStoppedSpeakingFrame())
+
+    async def _maybe_handle_evt_retrieve_conversation_item_error(self, evt: events.ErrorEvent):
+        """Maybe handle an error event related to retrieving a conversation item.
+
+        If the given error event is an error retrieving a conversation item:
+
+        - set an exception on the future that retrieve_conversation_item() is waiting on
+        - return true
+        Otherwise:
+        - return false
+        """
+        if evt.error.code == "item_retrieve_invalid_item_id":
+            item_id = evt.error.event_id.split("_", 1)[1]  # event_id is of the form "rci_{item_id}"
+            futures = self._retrieve_conversation_item_futures.pop(item_id, None)
+            if futures:
+                for future in futures:
+                    future.set_exception(Exception(evt.error.message))
+            return True
+        return False
+
+    async def _handle_evt_error(self, evt):
+        # Errors are fatal to this connection. Send an ErrorFrame.
+        await self.push_error(ErrorFrame(error=f"Error: {evt}", fatal=True))
+
+    #
+    # state and client events for the current conversation
+    # https://platform.openai.com/docs/api-reference/realtime-client-events
+    #
+
+    async def reset_conversation(self):
+        """Reset the conversation by disconnecting and reconnecting.
+
+        This is the safest way to start a new conversation. Note that this will
+        fail if called from the receive task.
+        """
+        logger.debug("Resetting conversation")
+        await self._disconnect()
+        if self._context:
+            self._context.llm_needs_settings_update = True
+            self._context.llm_needs_initial_messages = True
+        await self._connect()
+
+    @traced_openai_realtime(operation="llm_request")
+    async def _create_response(self):
+        if not self._api_session_ready:
+            self._run_llm_when_api_session_ready = True
+            return
+
+        if self._context.llm_needs_initial_messages:
+            messages = self._context.get_messages_for_initializing_history()
+            for item in messages:
+                evt = events.ConversationItemCreateEvent(item=item)
+                self._messages_added_manually[evt.item.id] = True
+                await self.send_client_event(evt)
+            self._context.llm_needs_initial_messages = False
+
+        if self._context.llm_needs_settings_update:
+            await self._update_settings()
+            self._context.llm_needs_settings_update = False
+
+        logger.debug(f"Creating response: {self._context.get_messages_for_logging()}")
+
+        await self.push_frame(LLMFullResponseStartFrame())
+        await self.start_processing_metrics()
+        await self.start_ttfb_metrics()
+        await self.send_client_event(
+            events.ResponseCreateEvent(
+                response=events.ResponseProperties(output_modalities=self._get_enabled_modalities())
+            )
+        )
+
+    async def _send_user_audio(self, frame):
+        payload = base64.b64encode(frame.audio).decode("utf-8")
+        await self.send_client_event(events.InputAudioBufferAppendEvent(audio=payload))
+
+    def create_context_aggregator(
+        self,
+        context: OpenAILLMContext,
+        *,
+        user_params: LLMUserAggregatorParams = LLMUserAggregatorParams(),
+        assistant_params: LLMAssistantAggregatorParams = LLMAssistantAggregatorParams(),
+    ) -> OpenAIContextAggregatorPair:
+        """Create an instance of OpenAIContextAggregatorPair from an OpenAILLMContext.
+
+        Constructor keyword arguments for both the user and assistant aggregators can be provided.
+
+        Args:
+            context: The LLM context.
+            user_params: User aggregator parameters.
+            assistant_params: Assistant aggregator parameters.
+
+        Returns:
+            OpenAIContextAggregatorPair: A pair of context aggregators, one for
+            the user and one for the assistant, encapsulated in an
+            OpenAIContextAggregatorPair.
+        """
+        context.set_llm_adapter(self.get_llm_adapter())
+
+        OpenAIRealtimeLLMContext.upgrade_to_realtime(context)
+        user = OpenAIRealtimeUserContextAggregator(context, params=user_params)
+
+        assistant_params.expect_stripped_words = False
+        assistant = OpenAIRealtimeAssistantContextAggregator(context, params=assistant_params)
+        return OpenAIContextAggregatorPair(_user=user, _assistant=assistant)
--- a/src/pipecat/services/openai_realtime_beta/azure.py
+++ b/src/pipecat/services/openai_realtime_beta/azure.py
@@ -6,6 +6,8 @@

 """Azure OpenAI Realtime Beta LLM service implementation."""

+import warnings
+
 from loguru import logger

 from .openai import OpenAIRealtimeBetaLLMService
@@ -23,6 +25,10 @@ except ModuleNotFoundError as e:
 class AzureRealtimeBetaLLMService(OpenAIRealtimeBetaLLMService):
    """Azure OpenAI Realtime Beta LLM service with Azure-specific authentication.

+    .. deprecated:: 0.0.84
+        `AzureRealtimeBetaLLMService` is deprecated, use `AzureRealtimeLLMService` instead.
+        This class will be removed in version 1.0.0.
+
    Extends the OpenAI Realtime service to work with Azure OpenAI endpoints,
    using Azure's authentication headers and endpoint format. Provides the same
    real-time audio and text communication capabilities as the base OpenAI service.
@@ -44,6 +50,16 @@ class AzureRealtimeBetaLLMService(OpenAIRealtimeBetaLLMService):
            **kwargs: Additional arguments passed to parent OpenAIRealtimeBetaLLMService.
        """
        super().__init__(base_url=base_url, api_key=api_key, **kwargs)
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("always")
+            warnings.warn(
+                "AzureRealtimeBetaLLMService is deprecated and will be removed in version 1.0.0. "
+                "Use AzureRealtimeLLMService instead.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
        self.api_key = api_key
        self.base_url = base_url

--- a/src/pipecat/services/openai_realtime_beta/openai.py
+++ b/src/pipecat/services/openai_realtime_beta/openai.py
@@ -9,6 +9,7 @@
 import base64
 import json
 import time
+import warnings
 from dataclasses import dataclass
 from typing import Optional

@@ -92,6 +93,10 @@ class CurrentAudioResponse:
 class OpenAIRealtimeBetaLLMService(LLMService):
    """OpenAI Realtime Beta LLM service providing real-time audio and text communication.

+    .. deprecated:: 0.0.84
+        `OpenAIRealtimeBetaLLMService` is deprecated, use `OpenAIRealtimeLLMService` instead.
+        This class will be removed in version 1.0.0.
+
    Implements the OpenAI Realtime API Beta with WebSocket communication for low-latency
    bidirectional audio and text interactions. Supports function calling, conversation
    management, and real-time transcription.
@@ -124,6 +129,15 @@ class OpenAIRealtimeBetaLLMService(LLMService):
            send_transcription_frames: Whether to emit transcription frames. Defaults to True.
            **kwargs: Additional arguments passed to parent LLMService.
        """
+        with warnings.catch_warnings():
+            warnings.simplefilter("always")
+            warnings.warn(
+                "OpenAIRealtimeBetaLLMService is deprecated and will be removed in version 1.0.0. "
+                "Use OpenAIRealtimeLLMService instead.",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
        full_url = f"{base_url}?model={model}"
        super().__init__(base_url=full_url, **kwargs)

--- a/src/pipecat/services/vision_service.py
+++ b/src/pipecat/services/vision_service.py
@@ -14,7 +14,8 @@ visual content.
 from abc import abstractmethod
 from typing import AsyncGenerator

-from pipecat.frames.frames import Frame, VisionImageRawFrame
+from pipecat.frames.frames import Frame, LLMContextFrame
+from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.services.ai_service import AIService

@@ -37,15 +38,15 @@ class VisionService(AIService):
        self._describe_text = None

    @abstractmethod
-    async def run_vision(self, frame: VisionImageRawFrame) -> AsyncGenerator[Frame, None]:
-        """Process a vision image frame and generate results.
+    async def run_vision(self, context: LLMContext) -> AsyncGenerator[Frame, None]:
+        """Process the latest image in the context and generate results.

        This method must be implemented by subclasses to provide actual computer
        vision functionality such as image description, object detection, or
        visual question answering.

        Args:
-            frame: The vision image frame to process, containing image data.
+            context: The context to process, containing image data.

        Yields:
            Frame: Frames containing the vision analysis results, typically TextFrame
@@ -65,9 +66,9 @@ class VisionService(AIService):
        """
        await super().process_frame(frame, direction)

-        if isinstance(frame, VisionImageRawFrame):
+        if isinstance(frame, LLMContextFrame):
            await self.start_processing_metrics()
-            await self.process_generator(self.run_vision(frame))
+            await self.process_generator(self.run_vision(frame.context))
            await self.stop_processing_metrics()
        else:
            await self.push_frame(frame, direction)
--- a/src/pipecat/transports/base_output.py
+++ b/src/pipecat/transports/base_output.py
@@ -219,7 +219,34 @@ class BaseOutputTransport(FrameProcessor):
        pass

    async def write_dtmf(self, frame: OutputDTMFFrame | OutputDTMFUrgentFrame):
-        """Write a DTMF tone to the transport.
+        """Write a DTMF tone using the transport's preferred method.
+
+        Args:
+            frame: The DTMF frame to write.
+        """
+        if self._supports_native_dtmf():
+            await self._write_dtmf_native(frame)
+        else:
+            await self._write_dtmf_audio(frame)
+
+    def _supports_native_dtmf(self) -> bool:
+        """Override in transport implementations that support native DTMF.
+
+        Returns:
+            True if the transport supports native DTMF, False otherwise.
+        """
+        return False
+
+    async def _write_dtmf_native(self, frame: OutputDTMFFrame | OutputDTMFUrgentFrame):
+        """Override in transport implementations for native DTMF.
+
+        Args:
+            frame: The DTMF frame to write.
+        """
+        raise NotImplementedError("Transport claims native DTMF support but doesn't implement it")
+
+    async def _write_dtmf_audio(self, frame: OutputDTMFFrame | OutputDTMFUrgentFrame):
+        """Generate and send audio tones for DTMF.

        Args:
            frame: The DTMF frame to write.
@@ -228,7 +255,6 @@ class BaseOutputTransport(FrameProcessor):
        dtmf_audio_frame = OutputAudioRawFrame(
            audio=dtmf_audio, sample_rate=self._sample_rate, num_channels=1
        )
-        dtmf_audio_frame.transport_destination = frame.transport_destination
        await self.write_audio_frame(dtmf_audio_frame)

    async def send_audio(self, frame: OutputAudioRawFrame):
--- a/src/pipecat/transports/daily/transport.py
+++ b/src/pipecat/transports/daily/transport.py
@@ -61,9 +61,7 @@ try:
        VirtualCameraDevice,
        VirtualSpeakerDevice,
    )
-    from daily import (
-        LogLevel as DailyLogLevel,
-    )
+    from daily import LogLevel as DailyLogLevel
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
@@ -1809,6 +1807,27 @@ class DailyOutputTransport(BaseOutputTransport):
        """
        await self._client.write_video_frame(frame)

+    def _supports_native_dtmf(self) -> bool:
+        """Daily supports native DTMF via telephone events.
+
+        Returns:
+            True, as Daily supports native DTMF transmission.
+        """
+        return True
+
+    async def _write_dtmf_native(self, frame):
+        """Use Daily's native send_dtmf method for telephone events.
+
+        Args:
+            frame: The DTMF frame to write.
+        """
+        await self._client.send_dtmf(
+            {
+                "sessionId": frame.transport_destination,
+                "tones": frame.button.value,
+            }
+        )
+

 class DailyTransport(BaseTransport):
    """Transport implementation for Daily audio and video calls.
@@ -2296,7 +2315,7 @@ class DailyTransport(BaseTransport):
        """Handle participant updated events."""
        await self._call_event_handler("on_participant_updated", participant)

-    async def _on_transcription_message(self, message):
+    async def _on_transcription_message(self, message: Dict[str, Any]) -> None:
        """Handle transcription message events."""
        await self._call_event_handler("on_transcription_message", message)

@@ -2308,9 +2327,10 @@ class DailyTransport(BaseTransport):

        text = message["text"]
        timestamp = message["timestamp"]
-        is_final = message["rawResponse"]["is_final"]
+        raw_response = message.get("rawResponse", {})
+        is_final = raw_response.get("is_final", False)
        try:
-            language = message["rawResponse"]["channel"]["alternatives"][0]["languages"][0]
+            language = raw_response["channel"]["alternatives"][0]["languages"][0]
            language = Language(language)
        except KeyError:
            language = None
--- a/src/pipecat/transports/livekit/transport.py
+++ b/src/pipecat/transports/livekit/transport.py
@@ -12,6 +12,7 @@ event handling for conversational AI applications.
 """

 import asyncio
+import json
 from dataclasses import dataclass
 from typing import Any, Awaitable, Callable, List, Optional

@@ -24,11 +25,15 @@ from pipecat.frames.frames import (
    AudioRawFrame,
    CancelFrame,
    EndFrame,
+    ImageRawFrame,
    OutputAudioRawFrame,
+    OutputDTMFFrame,
+    OutputDTMFUrgentFrame,
    StartFrame,
    TransportMessageFrame,
    TransportMessageUrgentFrame,
    UserAudioRawFrame,
+    UserImageRawFrame,
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessorSetup
 from pipecat.transports.base_input import BaseInputTransport
@@ -38,12 +43,29 @@ from pipecat.utils.asyncio.task_manager import BaseTaskManager

 try:
    from livekit import rtc
+    from livekit.rtc._proto import video_frame_pb2 as proto_video_frame
    from tenacity import retry, stop_after_attempt, wait_exponential
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use LiveKit, you need to `pip install pipecat-ai[livekit]`.")
    raise Exception(f"Missing module: {e}")

+# DTMF mapping according to RFC 4733
+DTMF_CODE_MAP = {
+    "0": 0,
+    "1": 1,
+    "2": 2,
+    "3": 3,
+    "4": 4,
+    "5": 5,
+    "6": 6,
+    "7": 7,
+    "8": 8,
+    "9": 9,
+    "*": 10,
+    "#": 11,
+}
+

@dataclass
 class LiveKitTransportMessageFrame(TransportMessageFrame):
@@ -96,6 +118,8 @@ class LiveKitCallbacks(BaseModel):
    on_participant_disconnected: Callable[[str], Awaitable[None]]
    on_audio_track_subscribed: Callable[[str], Awaitable[None]]
    on_audio_track_unsubscribed: Callable[[str], Awaitable[None]]
+    on_video_track_subscribed: Callable[[str], Awaitable[None]]
+    on_video_track_unsubscribed: Callable[[str], Awaitable[None]]
    on_data_received: Callable[[bytes, str], Awaitable[None]]
    on_first_participant_joined: Callable[[str], Awaitable[None]]

@@ -140,8 +164,11 @@ class LiveKitTransportClient:
        self._audio_track: Optional[rtc.LocalAudioTrack] = None
        self._audio_tracks = {}
        self._audio_queue = asyncio.Queue()
+        self._video_tracks = {}
+        self._video_queue = asyncio.Queue()
        self._other_participant_has_joined = False
        self._task_manager: Optional[BaseTaskManager] = None
+        self._async_lock = asyncio.Lock()

    @property
    def participant_id(self) -> str:
@@ -202,61 +229,63 @@ class LiveKitTransportClient:
    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    async def connect(self):
        """Connect to the LiveKit room with retry logic."""
-        if self._connected:
-            # Increment disconnect counter if already connected.
-            self._disconnect_counter += 1
-            return
+        async with self._async_lock:
+            if self._connected:
+                # Increment disconnect counter if already connected.
+                self._disconnect_counter += 1
+                return

-        logger.info(f"Connecting to {self._room_name}")
+            logger.info(f"Connecting to {self._room_name}")

-        try:
-            await self.room.connect(
-                self._url,
-                self._token,
-                options=rtc.RoomOptions(auto_subscribe=True),
-            )
-            self._connected = True
-            # Increment disconnect counter if we successfully connected.
-            self._disconnect_counter += 1
+            try:
+                await self.room.connect(
+                    self._url,
+                    self._token,
+                    options=rtc.RoomOptions(auto_subscribe=True),
+                )
+                self._connected = True
+                # Increment disconnect counter if we successfully connected.
+                self._disconnect_counter += 1

-            self._participant_id = self.room.local_participant.sid
-            logger.info(f"Connected to {self._room_name}")
+                self._participant_id = self.room.local_participant.sid
+                logger.info(f"Connected to {self._room_name}")

-            # Set up audio source and track
-            self._audio_source = rtc.AudioSource(
-                self._out_sample_rate, self._params.audio_out_channels
-            )
-            self._audio_track = rtc.LocalAudioTrack.create_audio_track(
-                "pipecat-audio", self._audio_source
-            )
-            options = rtc.TrackPublishOptions()
-            options.source = rtc.TrackSource.SOURCE_MICROPHONE
-            await self.room.local_participant.publish_track(self._audio_track, options)
+                # Set up audio source and track
+                self._audio_source = rtc.AudioSource(
+                    self._out_sample_rate, self._params.audio_out_channels
+                )
+                self._audio_track = rtc.LocalAudioTrack.create_audio_track(
+                    "pipecat-audio", self._audio_source
+                )
+                options = rtc.TrackPublishOptions()
+                options.source = rtc.TrackSource.SOURCE_MICROPHONE
+                await self.room.local_participant.publish_track(self._audio_track, options)

-            await self._callbacks.on_connected()
+                await self._callbacks.on_connected()

-            # Check if there are already participants in the room
-            participants = self.get_participants()
-            if participants and not self._other_participant_has_joined:
-                self._other_participant_has_joined = True
-                await self._callbacks.on_first_participant_joined(participants[0])
-        except Exception as e:
-            logger.error(f"Error connecting to {self._room_name}: {e}")
-            raise
+                # Check if there are already participants in the room
+                participants = self.get_participants()
+                if participants and not self._other_participant_has_joined:
+                    self._other_participant_has_joined = True
+                    await self._callbacks.on_first_participant_joined(participants[0])
+            except Exception as e:
+                logger.error(f"Error connecting to {self._room_name}: {e}")
+                raise

    async def disconnect(self):
        """Disconnect from the LiveKit room."""
-        # Decrement leave counter when leaving.
-        self._disconnect_counter -= 1
+        async with self._async_lock:
+            # Decrement leave counter when leaving.
+            self._disconnect_counter -= 1

-        if not self._connected or self._disconnect_counter > 0:
-            return
+            if not self._connected or self._disconnect_counter > 0:
+                return

-        logger.info(f"Disconnecting from {self._room_name}")
-        await self.room.disconnect()
-        self._connected = False
-        logger.info(f"Disconnected from {self._room_name}")
-        await self._callbacks.on_disconnected()
+            logger.info(f"Disconnecting from {self._room_name}")
+            await self.room.disconnect()
+            self._connected = False
+            logger.info(f"Disconnected from {self._room_name}")
+            await self._callbacks.on_disconnected()

    async def send_data(self, data: bytes, participant_id: Optional[str] = None):
        """Send data to participants in the room.
@@ -278,6 +307,26 @@ class LiveKitTransportClient:
        except Exception as e:
            logger.error(f"Error sending data: {e}")

+    async def send_dtmf(self, digit: str):
+        """Send DTMF tone to the room.
+
+        Args:
+            digit: The DTMF digit to send (0-9, *, #).
+        """
+        if not self._connected:
+            return
+
+        if digit not in DTMF_CODE_MAP:
+            logger.warning(f"Invalid DTMF digit: {digit}")
+            return
+
+        code = DTMF_CODE_MAP[digit]
+
+        try:
+            await self.room.local_participant.publish_dtmf(code=code, digit=digit)
+        except Exception as e:
+            logger.error(f"Error sending DTMF tone {digit}: {e}")
+
    async def publish_audio(self, audio_frame: rtc.AudioFrame):
        """Publish an audio frame to the room.

@@ -439,6 +488,15 @@ class LiveKitTransportClient:
                f"{self}::_process_audio_stream",
            )
            await self._callbacks.on_audio_track_subscribed(participant.sid)
+        elif track.kind == rtc.TrackKind.KIND_VIDEO:
+            logger.info(f"Video track subscribed: {track.sid} from participant {participant.sid}")
+            self._video_tracks[participant.sid] = track
+            video_stream = rtc.VideoStream(track)
+            self._task_manager.create_task(
+                self._process_video_stream(video_stream, participant.sid),
+                f"{self}::_process_video_stream",
+            )
+            await self._callbacks.on_video_track_subscribed(participant.sid)

    async def _async_on_track_unsubscribed(
        self,
@@ -450,6 +508,8 @@ class LiveKitTransportClient:
        logger.info(f"Track unsubscribed: {publication.sid} from {participant.identity}")
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            await self._callbacks.on_audio_track_unsubscribed(participant.sid)
+        elif track.kind == rtc.TrackKind.KIND_VIDEO:
+            await self._callbacks.on_video_track_unsubscribed(participant.sid)

    async def _async_on_data_received(self, data: rtc.DataPacket):
        """Handle data received events."""
@@ -480,6 +540,21 @@ class LiveKitTransportClient:
            frame, participant_id = await self._audio_queue.get()
            yield frame, participant_id

+    async def _process_video_stream(self, video_stream: rtc.VideoStream, participant_id: str):
+        """Process incoming video stream from a participant."""
+        logger.info(f"Started processing video stream for participant {participant_id}")
+        async for event in video_stream:
+            if isinstance(event, rtc.VideoFrameEvent):
+                await self._video_queue.put((event, participant_id))
+            else:
+                logger.warning(f"Received unexpected event type: {type(event)}")
+
+    async def get_next_video_frame(self):
+        """Get the next video frame from the queue."""
+        while True:
+            frame, participant_id = await self._video_queue.get()
+            yield frame, participant_id
+
    def __str__(self):
        """String representation of the LiveKit transport client."""
        return f"{self._transport_name}::LiveKitTransportClient"
@@ -512,6 +587,7 @@ class LiveKitInputTransport(BaseInputTransport):
        self._client = client

        self._audio_in_task = None
+        self._video_in_task = None
        self._vad_analyzer: Optional[VADAnalyzer] = params.vad_analyzer
        self._resampler = create_stream_resampler()

@@ -544,6 +620,8 @@ class LiveKitInputTransport(BaseInputTransport):
        await self._client.connect()
        if not self._audio_in_task and self._params.audio_in_enabled:
            self._audio_in_task = self.create_task(self._audio_in_task_handler())
+        if not self._video_in_task and self._params.video_in_enabled:
+            self._video_in_task = self.create_task(self._video_in_task_handler())
        await self.set_transport_ready(frame)
        logger.info("LiveKitInputTransport started")

@@ -557,6 +635,8 @@ class LiveKitInputTransport(BaseInputTransport):
        await self._client.disconnect()
        if self._audio_in_task:
            await self.cancel_task(self._audio_in_task)
+        if self._video_in_task:
+            await self.cancel_task(self._video_in_task)
        logger.info("LiveKitInputTransport stopped")

    async def cancel(self, frame: CancelFrame):
@@ -569,6 +649,8 @@ class LiveKitInputTransport(BaseInputTransport):
        await self._client.disconnect()
        if self._audio_in_task and self._params.audio_in_enabled:
            await self.cancel_task(self._audio_in_task)
+        if self._video_in_task and self._params.video_in_enabled:
+            await self.cancel_task(self._video_in_task)

    async def setup(self, setup: FrameProcessorSetup):
        """Setup the input transport with shared client setup.
@@ -617,6 +699,29 @@ class LiveKitInputTransport(BaseInputTransport):
                )
                await self.push_audio_frame(input_audio_frame)

+    async def _video_in_task_handler(self):
+        """Handle incoming video frames from participants."""
+        logger.info("Video input task started")
+        video_iterator = self._client.get_next_video_frame()
+        async for video_data in video_iterator:
+            if video_data:
+                video_frame_event, participant_id = video_data
+                pipecat_video_frame = await self._convert_livekit_video_to_pipecat(
+                    video_frame_event=video_frame_event
+                )
+
+                # Skip frames with no video data
+                if len(pipecat_video_frame.image) == 0:
+                    continue
+
+                input_video_frame = UserImageRawFrame(
+                    user_id=participant_id,
+                    image=pipecat_video_frame.image,
+                    size=pipecat_video_frame.size,
+                    format=pipecat_video_frame.format,
+                )
+                await self.push_video_frame(input_video_frame)
+
    async def _convert_livekit_audio_to_pipecat(
        self, audio_frame_event: rtc.AudioFrameEvent
    ) -> AudioRawFrame:
@@ -633,6 +738,19 @@ class LiveKitInputTransport(BaseInputTransport):
            num_channels=audio_frame.num_channels,
        )

+    async def _convert_livekit_video_to_pipecat(
+        self,
+        video_frame_event: rtc.VideoFrameEvent,
+    ) -> ImageRawFrame:
+        """Convert LiveKit video frame to Pipecat video frame."""
+        rgb_frame = video_frame_event.frame.convert(proto_video_frame.VideoBufferType.RGB24)
+        image_frame = ImageRawFrame(
+            image=rgb_frame.data,
+            size=(rgb_frame.width, rgb_frame.height),
+            format="RGB",
+        )
+        return image_frame
+

 class LiveKitOutputTransport(BaseOutputTransport):
    """Handles outgoing media streams and events to LiveKit rooms.
@@ -720,10 +838,14 @@ class LiveKitOutputTransport(BaseOutputTransport):
        Args:
            frame: The transport message frame to send.
        """
+        message = frame.message
+        if isinstance(message, dict):
+            # fix message encoding for dict-like messages, e.g. RTVI messages.
+            message = json.dumps(message, ensure_ascii=False)
        if isinstance(frame, (LiveKitTransportMessageFrame, LiveKitTransportMessageUrgentFrame)):
-            await self._client.send_data(frame.message.encode(), frame.participant_id)
+            await self._client.send_data(message.encode(), frame.participant_id)
        else:
-            await self._client.send_data(frame.message.encode())
+            await self._client.send_data(message.encode())

    async def write_audio_frame(self, frame: OutputAudioRawFrame):
        """Write an audio frame to the LiveKit room.
@@ -734,6 +856,22 @@ class LiveKitOutputTransport(BaseOutputTransport):
        livekit_audio = self._convert_pipecat_audio_to_livekit(frame.audio)
        await self._client.publish_audio(livekit_audio)

+    def _supports_native_dtmf(self) -> bool:
+        """LiveKit supports native DTMF via telephone events.
+
+        Returns:
+            True, as LiveKit supports native DTMF transmission.
+        """
+        return True
+
+    async def _write_dtmf_native(self, frame: OutputDTMFFrame | OutputDTMFUrgentFrame):
+        """Use LiveKit's native publish_dtmf method for telephone events.
+
+        Args:
+            frame: The DTMF frame to write.
+        """
+        await self._client.send_dtmf(frame.button.value)
+
    def _convert_pipecat_audio_to_livekit(self, pipecat_audio: bytes) -> rtc.AudioFrame:
        """Convert Pipecat audio data to LiveKit audio frame."""
        bytes_per_sample = 2  # Assuming 16-bit audio
@@ -784,6 +922,8 @@ class LiveKitTransport(BaseTransport):
            on_participant_disconnected=self._on_participant_disconnected,
            on_audio_track_subscribed=self._on_audio_track_subscribed,
            on_audio_track_unsubscribed=self._on_audio_track_unsubscribed,
+            on_video_track_subscribed=self._on_video_track_subscribed,
+            on_video_track_unsubscribed=self._on_video_track_unsubscribed,
            on_data_received=self._on_data_received,
            on_first_participant_joined=self._on_first_participant_joined,
        )
@@ -801,6 +941,8 @@ class LiveKitTransport(BaseTransport):
        self._register_event_handler("on_participant_disconnected")
        self._register_event_handler("on_audio_track_subscribed")
        self._register_event_handler("on_audio_track_unsubscribed")
+        self._register_event_handler("on_video_track_subscribed")
+        self._register_event_handler("on_video_track_unsubscribed")
        self._register_event_handler("on_data_received")
        self._register_event_handler("on_first_participant_joined")
        self._register_event_handler("on_participant_left")
@@ -922,6 +1064,20 @@ class LiveKitTransport(BaseTransport):
        """Handle audio track unsubscribed events."""
        await self._call_event_handler("on_audio_track_unsubscribed", participant_id)

+    async def _on_video_track_subscribed(self, participant_id: str):
+        """Handle video track subscribed events."""
+        await self._call_event_handler("on_video_track_subscribed", participant_id)
+        participant = self._client.room.remote_participants.get(participant_id)
+        if participant:
+            for publication in participant.video_tracks.values():
+                self._client._on_track_subscribed_wrapper(
+                    publication.track, publication, participant
+                )
+
+    async def _on_video_track_unsubscribed(self, participant_id: str):
+        """Handle video track unsubscribed events."""
+        await self._call_event_handler("on_video_track_unsubscribed", participant_id)
+
    async def _on_data_received(self, data: bytes, participant_id: str):
        """Handle data received events."""
        if self._input:
--- a/uv.lock
+++ b/uv.lock
@@ -1236,13 +1236,13 @@ wheels = [

 [[package]]
 name = "daily-python"
-version = "0.19.8"
+version = "0.19.9"
 source = { registry = "https://pypi.org/simple" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/25/33/21029ca23df6bae54dfa4e8af550fdcc557f053dc924a554cfa0e32b2904/daily_python-0.19.8-cp37-abi3-macosx_10_15_x86_64.whl", hash = "sha256:cccd2eb8b223299408a9f1269a6f1a257d03aba749ef9fa97678010474c2b40b", size = 13692303, upload-time = "2025-08-27T21:24:36.567Z" },
-    { url = "https://files.pythonhosted.org/packages/14/5b/c795498ffe7cbdca72530b71b4102c4ef43c2f528a72f0deb2d4c3af1cac/daily_python-0.19.8-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:2c1010d44238d492cee3a6af231ff613899efb54943fe3a191f5b84d6af3330d", size = 12047872, upload-time = "2025-08-27T21:24:39.097Z" },
-    { url = "https://files.pythonhosted.org/packages/94/fd/145d65d6902873f3b44f3da7918d1bbbeeb6228e260e2aeb163311a33aa2/daily_python-0.19.8-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:d2b2fedf92e84b599f18d424ede423116eabffbe01ec6434f478d0b577d8bec3", size = 14111600, upload-time = "2025-08-27T21:24:40.962Z" },
-    { url = "https://files.pythonhosted.org/packages/5b/b6/a0123f00003cee45e488467649e2d69f09058f7815a4c590b4827992d3ef/daily_python-0.19.8-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:faec30ae64e233384d8bb96ad13de871843bb02f67bf3aa793a6bce9722734c5", size = 14582824, upload-time = "2025-08-27T21:24:43.148Z" },
+    { url = "https://files.pythonhosted.org/packages/22/85/6064c3225e5b190e522e8f3bc6a460efc5e3e6632f16fd5f9799c44ba57a/daily_python-0.19.9-cp37-abi3-macosx_10_15_x86_64.whl", hash = "sha256:cbc558ad7d49e79b550bf7567b9ceae75e2864d4fcaf41c90377b620e38a2461", size = 13365213, upload-time = "2025-09-06T00:31:00.224Z" },
+    { url = "https://files.pythonhosted.org/packages/23/58/af986c6881180a46a7b60dd418ce58d6d7c0c4ffc48d261748067c679317/daily_python-0.19.9-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:446bb9ee848d88bc68ca29a2216793c9b5ebaf5991bf604daf76f7c5a53d5919", size = 11711673, upload-time = "2025-09-06T00:31:02.526Z" },
+    { url = "https://files.pythonhosted.org/packages/9d/48/1cad4c3e92cdb5ef06467d972c76a510fe5e807513334b10ad7f8c21bf74/daily_python-0.19.9-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:2facaf82b614404c642c70bbf0874fb045d8ad46400acb051470cd4df93cb4db", size = 13679393, upload-time = "2025-09-06T00:31:04.999Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/e9/354f4699619e83d13e266256b2352b21741ac527e3e5ab5f2264d5c482cd/daily_python-0.19.9-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:ffc205efca7b47739efd358febab17577248c8db2ebc4d17d819307a83b9eefc", size = 14221932, upload-time = "2025-09-06T00:31:07.471Z" },
 ]

 [[package]]
@@ -4432,7 +4432,7 @@ requires-dist = [
    { name = "azure-cognitiveservices-speech", marker = "extra == 'azure'", specifier = "~=1.42.0" },
    { name = "cartesia", marker = "extra == 'cartesia'", specifier = "~=2.0.3" },
    { name = "coremltools", marker = "extra == 'local-smart-turn'", specifier = ">=8.0" },
-    { name = "daily-python", marker = "extra == 'daily'", specifier = "~=0.19.8" },
+    { name = "daily-python", marker = "extra == 'daily'", specifier = "~=0.19.9" },
    { name = "deepgram-sdk", marker = "extra == 'deepgram'", specifier = "~=4.7.0" },
    { name = "docstring-parser", specifier = "~=0.16" },
    { name = "einops", marker = "extra == 'moondream'", specifier = "~=0.8.0" },
Author	SHA1	Message	Date
James Hush	6bb3cb2b83	demo: DelayProcessor	2025-09-11 16:05:08 +08:00
Aleix Conchillo Flaqué	908325484d	Merge pull request #2614 from pipecat-ai/aleix/readme-client-sdks-table README: update clients' table	2025-09-10 10:21:18 -07:00
Mark Backman	dd6ff789c7	Merge pull request #2628 from pipecat-ai/mb/fix-13-push-frame fix: 13 foundational examples now push frames from TranscriptionLogger	2025-09-10 09:13:04 -07:00
Mark Backman	f4938e0fad	fix: 13 foundational examples now push frames from TranscriptionLogger	2025-09-10 10:40:10 -04:00
James Hush	e8f60c7c6f	Handle missing rawResponse in transcription messages (#2623 ) * Handle missing rawResponse in transcription messages - Use message.get('rawResponse', {}) to safely access rawResponse field - Default is_final to False when rawResponse is missing - Add proper type annotations for better code clarity - Minor import formatting cleanup This prevents KeyError crashes when transcription messages from Daily's API don't include the rawResponse field in edge cases. * docs: add changelog line	2025-09-10 15:03:23 +08:00
kompfner	38f6e33f97	Merge pull request #2598 from pipecat-ai/pk/deprecate-vision-image-raw-frame Remove `VisionImageRawFrame`, which was previously being handled dire…	2025-09-08 17:13:28 -04:00
Paul Kompfner	1c3e4e34e5	Minor fix to AWS Bedrock console logging to handle image messages in the context	2025-09-08 17:10:11 -04:00
Paul Kompfner	623c660027	Remove debugging comment	2025-09-08 17:01:51 -04:00
Paul Kompfner	a3e65ab3b5	The `VisionImageRawFrame` removal and corresponding `VisionImageFrameAggregator` deprecation will now happen in version 0.0.85	2025-09-08 17:01:47 -04:00
Paul Kompfner	f3a4b416df	Remove `VisionImageRawFrame`, which was previously being handled directly by the LLM services, and deprecate the associated `VisionImageFrameAggregator`. Removing `VisionImageRawFrame` lets us simplify LLM services' logic, getting us closer to the idealized architecture where all they care about is handling context frames. This change is in service of getting us closer to ready to deprecate usage of `OpenAILLMContext` and subclasses in favor of the universal `LLMContext`, at least for the traditional text-to-text LLMs. Why remove `VisionImageRawFrame` rather than deprecate? It's "internal"—only created by `VisionImageFrameAggregator`—and never intended to be used directly by users (it would be difficult to use directly anyway). Move the logic that was once in `VisionImageFrameAggregator` directly into the examples. Reasoning: - If `UserImageRequester` is defined in the examples, it makes sense for `UserImageProcessor` to be too, as it’s the flip side of the same coin, so to speak - The logic is now pretty trivial - This kind of one-shot, history-less image-describing pipeline shouldn't be common at all; it's ok for it to live in examples rather than as a dedicated class - In the short term, this enables us to create `LLMContext`s for services that support it and `OpenAILLMContext`s for services that don't yet (AWS) This commit also adds missing translation from OpenAI-format image context messages to AWS format. Note that this isn't a wasted effort in the face of the upcoming migration to universal `LLMContext`—this work will be reused as it has to be implemented there too.	2025-09-08 17:00:08 -04:00
Aleix Conchillo Flaqué	aa471a4ef5	update CHANGELOG with LiveKitTransport updates	2025-09-08 13:53:21 -07:00
Aleix Conchillo Flaqué	d55133a44f	Merge pull request #2604 from alexyzhou/feature/livekit_video_and_bug_fix Feature: Add support for livekit video stream and minor bug fixes	2025-09-08 13:51:14 -07:00
Aleix Conchillo Flaqué	0f1cf81691	README: update clients' table	2025-09-08 12:08:32 -07:00
kompfner	ac4d335799	Merge pull request #2613 from pipecat-ai/pk/mistral-message-fixups Apply additional fixups to context messages to meet Mistral-specific …	2025-09-08 13:59:54 -04:00
Paul Kompfner	e65385c151	Tweak the Mistral-specific context messages fixup logic to handle the (mostly academic) possibility of a "tool" message appearing at the end	2025-09-08 13:55:09 -04:00
Paul Kompfner	0bb7df7a6b	Remove stray debugging message	2025-09-08 13:38:26 -04:00
Paul Kompfner	daee1ddf3b	Apply additional fixups to context messages to meet Mistral-specific requirements	2025-09-08 11:26:58 -04:00
Aleix Conchillo Flaqué	1cccb97ccf	Merge pull request #2608 from pipecat-ai/aleix/deprecate-noisereducefilter audio(filters): deprecate NoisereduceFilter	2025-09-07 20:54:09 -07:00
Aleix Conchillo Flaqué	d7794abf21	audio(filters): deprecate NoisereduceFilter	2025-09-07 20:52:17 -07:00
Aleix Conchillo Flaqué	6a6a63a532	Merge pull request #2607 from pipecat-ai/aleix/scripts-evals-improve-eval-prompt scripts(evals): allow user to talk and only eval when needed	2025-09-07 20:49:43 -07:00
Mark Backman	6edb6fed41	Merge pull request #2606 from pipecat-ai/mb/quickstart-lockfile Remove uv.lock from quickstart	2025-09-07 06:10:14 -07:00
Mark Backman	a537382816	Add OpenAIRealtimeLLMService, AzureRealtimeLLMService (#2596 ) * Add OpenAI Realtime module * Add foundational examples for OpenAI Realtime * Add deprecation warning to OpenAIRealtimeBetaLLMService * Add deprecation warning to AzureRealtimeBetaLLMService * Update Changelog	2025-09-07 09:09:57 -04:00
Aleix Conchillo Flaqué	46deaada70	scripts(evals): allow user to talk and only eval when needed	2025-09-06 19:19:08 -07:00
Mark Backman	dbc52bc6b0	Remove uv.lock from quickstart	2025-09-06 11:13:50 -04:00
Alex Zhou	d6432589f6	fix: fix format and lint by ruff	2025-09-06 10:50:47 +08:00
Alex Zhou	13b73d4406	feat: Add support for pipecat video stream; fix the bug of duplicate participants when connecting; fix the bug of RTVI messages sent via livekit messages;	2025-09-06 10:41:33 +08:00
Aleix Conchillo Flaqué	85d8282f7e	Merge pull request #2602 from pipecat-ai/aleix/pipecat-0.0.84 update CHANGELOG for 0.0.84	2025-09-05 19:35:26 -07:00
Aleix Conchillo Flaqué	070690ec64	update CHANGELOG for 0.0.84	2025-09-05 18:22:50 -07:00
Aleix Conchillo Flaqué	b9c96fd623	Merge pull request #2601 from pipecat-ai/aleix/daily-python-0.19.9 pyproject: update daily-python to 0.19.9	2025-09-05 18:21:49 -07:00
Aleix Conchillo Flaqué	f8b2ab6331	pyproject: update daily-python to 0.19.9	2025-09-05 18:14:57 -07:00
Mark Backman	ea3f7e3c34	Merge pull request #2600 from pipecat-ai/mb/livekit-dtmf LiveKitTransport: Add support to send DTMF	2025-09-05 15:25:32 -07:00
Mark Backman	2f44f88b08	LiveKitTransport: Add support to send DTMF	2025-09-05 18:23:04 -04:00
Mark Backman	25747a001b	Merge pull request #2599 from pipecat-ai/mb/fix-daily-dtmf DTMF: Add support for native DTMF implementation where available	2025-09-05 15:20:05 -07:00
Mark Backman	fbe4338440	DTMF: Add support for native DTMF implementation where available	2025-09-05 18:16:56 -04:00
Filipi da Silva Fuchter	64b4c65728	Merge pull request #2595 from pipecat-ai/filipi/heygen_quality Improving HeyGen example video quality.	2025-09-05 17:19:25 -03:00
kompfner	29442969a9	Merge pull request #2597 from pipecat-ai/pk/fix-anthropic-tool-less-usage Fix Anthropic tool-less usage	2025-09-05 15:30:29 -04:00
Paul Kompfner	dc2e1d4ad3	Fix Anthropic tool-less usage	2025-09-05 11:47:31 -04:00
Filipi Fuchter	5477dfcbea	Improving HeyGen example video quality.	2025-09-05 11:30:01 -03:00
kompfner	516f0e08ab	Merge pull request #2590 from pipecat-ai/pk/gemini-multimodal-live-doesnt-support-llm-context Raise an error when attempting to use Gemini Multimodal Live with uni…	2025-09-05 09:22:33 -04:00
Paul Kompfner	246f9f3325	Raise an error when attempting to use Gemini Multimodal Live with universal `LLMContext`. This is exactly the same error we already have for the other s2s models, AWS Nova Sonic and OpenAI Realtime, it was just missing from this service.	2025-09-04 16:47:08 -04:00
kompfner	3d850e8cc5	Merge pull request #2574 from pipecat-ai/pk/expand-universal-llm-context-support-to-anthropic Expand universal `LLMContext` support to Anthropic	2025-09-04 13:09:44 -04:00
Paul Kompfner	6e734a37f9	Fix a bug in `AWSBedrockLLMService.run_inference()`; it was expecting the wrong format for the system instruction	2025-09-04 13:04:15 -04:00
Paul Kompfner	f72ca2fd7d	Remove unnecessary `system_instruction` argument from `run_inference()` methods	2025-09-04 13:04:15 -04:00
Paul Kompfner	0826d72f74	Add deprecation warning for using `enable_prompt_caching_beta` param	2025-09-04 13:04:15 -04:00
Paul Kompfner	ba5ebfa0ec	Fixed subtle CHANGELOG conflict after release of 0.0.83: universal `LLMContext` support for Anthropic didn't make that release. Also, some automatic Prettier fixes.	2025-09-04 13:04:11 -04:00
Paul Kompfner	dc3412b2df	Bump a deprecation to 0.0.84, as 0.0.83 just shipped	2025-09-04 13:03:06 -04:00
Paul Kompfner	b2e9fd9341	Rename Anthropic `enable_prompt_caching_beta` parameter to just `enable_prompt_caching`	2025-09-04 13:03:06 -04:00
Paul Kompfner	c11b207c97	Add Anthropic to CHANGELOG list of services newly supporting runtime LLM switching	2025-09-04 13:03:06 -04:00
Paul Kompfner	d6205027cf	Trivial cleanup	2025-09-04 13:03:06 -04:00
Paul Kompfner	986160c077	Fix a bug where the Anthropic adapter's merge-consecutive-messages-with-the-same-role logic was unintentionally affecting the source `LLMContext`'s messages, resulting in more and more duplication of text with each inference	2025-09-04 13:03:06 -04:00
Paul Kompfner	b56ff86fee	Minor refactor of `AnthropicLLMAdapter` cache-control-marker-adding logic (without really changing its behavior)	2025-09-04 13:03:06 -04:00
Paul Kompfner	5c574eaad9	Add support for universal `LLMContext` to Anthropic LLM service	2025-09-04 13:03:06 -04:00
Paul Kompfner	2df231143a	Add foundational example using Anthropic with universal `LLMContext`	2025-09-04 13:03:06 -04:00
Aleix Conchillo Flaqué	65298ab792	update CHANGELOG with AWSBedrockLLMService fix	2025-09-04 09:24:55 -07:00
Aleix Conchillo Flaqué	b609b02614	Merge pull request #2568 from ezisezis/fix-bedrock-timeouts fix timeout handling in AWSBedrockLLMService	2025-09-04 09:23:28 -07:00
Aleix Conchillo Flaqué	f2b50c14d2	Merge pull request #2573 from pipecat-ai/vp-minor-fixes-07s example 07s: minor typo updates	2025-09-04 09:21:32 -07:00
Aleix Conchillo Flaqué	ee3b023986	update CHANGELOG with OpenAIImageGenService fix	2025-09-04 09:20:02 -07:00
Aleix Conchillo Flaqué	0d9e1190d7	Merge pull request #2583 from sassanh/main fix: openai image generator now initiates URLImageRawFrame with correct order of arguments	2025-09-04 09:17:51 -07:00
Mark Backman	595a7c7fbe	Merge pull request #2587 from pipecat-ai/mb/update-quickstart-0.0.83 Update quickstart pyproject to use 0.0.83	2025-09-04 07:42:56 -07:00
Mark Backman	586586f743	Update quickstart pyproject to use 0.0.83	2025-09-04 10:36:58 -04:00
Mark Backman	a1c6ad539d	Merge pull request #2585 from ashotbagh/feat/asyncai-multilingual-support feat(asyncai): add multilingual TTS support	2025-09-04 05:03:45 -07:00
Ashot	daf7fed8b3	feat(asyncai): add multilingual TTS support	2025-09-04 13:58:50 +04:00
Sassan Haradji	a26647c433	fix: openai image generator now initiates URLImageRawFrame with correct order of arguments	2025-09-04 06:09:57 +03:30
vipyne	83f64ecd3b	example 07s: minor typo updates	2025-09-03 12:11:07 -05:00
Eduards Klavins	0a3e98857e	fix timeout handling in AWSBedrockLLMService	2025-09-03 11:52:30 +03:00