fixup

test cleanup
everything but audioframe and endpipeframe
2024-05-31 14:23:56 +00:00 · 2024-05-31 14:23:56 +00:00 · 2024-05-31 14:23:52 +00:00 · 2024-05-30 14:18:41 -07:00 · 2024-05-30 10:41:00 -07:00 · 2024-05-30 12:25:39 -05:00
76 changed files with 2510 additions and 604 deletions
--- a/.github/workflows/publish_test.yaml
+++ b/.github/workflows/publish_test.yaml
@@ -40,7 +40,7 @@ jobs:
          name: wheels
          path: ./dist

-  publish-to-pypi:
+  publish-to-test-pypi:
    name: "Publish to Test PyPI"
    runs-on: ubuntu-latest
    needs: [ build ]
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,131 @@ All notable changes to **pipecat** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [Unreleased]
+
+### Added
+
+- Added Cartesia TTS support (https://cartesia.ai/)
+
+### Fixed
+
+- Fixed SileroVAD frame processor.
+
+- Fixed an issue where `camera_out_enabled` would cause high CPU usage if no
+  image was provided.
+
+
+## [0.0.24] - 2024-05-29
+
+### Added
+
+- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This
+  notifies when the Daily room SIP endpoints are ready. This allows integrating
+  with third-party services like Twilio.
+
+- Exposed Daily transport `on_app_message` event.
+
+- Added Daily transport `on_call_state_updated` event.
+
+- Added Daily transport `start_recording()`, `stop_recording()` and
+  `stop_dialout()`.
+
+### Changed
+
+- Added `PipelineParams`. This replaces the `allow_interruptions` argument in
+  `PipelineTask` and leaves room for additional parameters in the future.
+
+- Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.
+
+- GoogleLLMService `api_key` argument is now mandatory.
+
+### Fixed
+
+- Daily transport `dialin-ready` handling no longer blocks and it now handles
+  timeouts.
+
+- Fixed AzureLLMService.
+
+## [0.0.23] - 2024-05-23
+
+### Fixed
+
+- Fixed an issue handling Daily transport `dialin-ready` event.
+
+## [0.0.22] - 2024-05-23
+
+### Added
+
+- Added Daily transport `start_dialout()` to be able to make phone or SIP calls.
+  See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout
+
+- Added Daily transport support for dial-in use cases.
+
+- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`,
+  `on_dialout_error` and `on_dialout_warning`. See
+  https://reference-python.daily.co/api_reference.html#daily.EventHandler
+
+## [0.0.21] - 2024-05-22
+
+### Added
+
+- Added vision support to Anthropic service.
+
+- Added `WakeCheckFilter` which allows you to pass information downstream only
+  if you say a certain phrase/word.
+
+### Changed
+
+- `Filter` has been renamed to `FrameFilter` and it's now under
+  `processors/filters`.
+
+### Fixed
+
+- Fixed Anthropic service to use new frame types.
+
+- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator`
+  that would cause frames after a brief pause to not be pushed to the LLM.
+
+- Clear the audio output buffer if we are interrupted.
+
+- Re-add exponential smoothing after volume calculation. This makes sure the
+  volume value being used doesn't fluctuate so much.
+
+## [0.0.20] - 2024-05-22
+
+### Added
+
+- In order to improve interruptions we now compute a loudness level using
+  [pyloudnorm](https://github.com/csteinmetz1/pyloudnorm). The audio coming
+  from WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC)
+  algorithm applied to the signal; however, we don't do that on our local
+  PyAudio signals. This means that incoming audio from PyAudio is currently
+  kind of broken. We will fix it in future releases.
+
+### Fixed
+
+- Fixed an issue where `StartInterruptionFrame` would cause
+  `LLMUserResponseAggregator` to push the accumulated text, causing the LLM to
+  respond in the wrong task. The `StartInterruptionFrame` should not trigger any
+  new LLM response because that would be spoken in a different task.
+
+- Fixed an issue where tasks and threads could be paused because the executor
+  didn't have more tasks available. This was causing issues when cancelling and
+  recreating tasks during interruptions.
+
+## [0.0.19] - 2024-05-20
+
+### Changed
+
+- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal
+  messages are now exposed through the `messages` property.
+
+### Fixed
+
+- Fixed an issue where `LLMAssistantResponseAggregator` accumulated only short
+  sentences instead of the full response. If there's an interruption, we now
+  accumulate only what the bot has spoken so far, even in a long response.
+
 ## [0.0.18] - 2024-05-20

 ### Fixed
--- a/examples/foundational/06-listen-and-respond.py
+++ b/examples/foundational/06-listen-and-respond.py
@@ -56,10 +56,11 @@ async def main(room_url: str, token):

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4-turbo-preview")
+            model="gpt-4o")

-        fl_in = FrameLogger("Inner")
-        fl_out = FrameLogger("Outer")
+        fl = FrameLogger("!!! after LLM", "red")
+        fltts = FrameLogger("@@@ out of tts", "green")
+        flend = FrameLogger("### out of the end", "magenta")

        messages = [
            {
@@ -71,14 +72,15 @@ async def main(room_url: str, token):
        tma_out = LLMAssistantResponseAggregator(messages)

        pipeline = Pipeline([
-            fl_in,
            transport.input(),
            tma_in,
            llm,
-            fl_out,
+            fl,
            tts,
+            fltts,
            transport.output(),
-            tma_out
+            tma_out,
+            flend
        ])

        task = PipelineTask(pipeline)
--- a/examples/foundational/06a-image-sync.py
+++ b/examples/foundational/06a-image-sync.py
@@ -15,14 +15,15 @@ from pipecat.frames.frames import ImageRawFrame, Frame, SystemFrame, TextFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.processors.aggregators.llm_context import (
-    LLMAssistantContextAggregator,
-    LLMUserContextAggregator,
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator,
+    LLMUserResponseAggregator,
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.services.openai import OpenAILLMService
 from pipecat.services.elevenlabs import ElevenLabsTTSService
 from pipecat.transports.services.daily import DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer

 from pipecat.transports.services.daily import DailyParams
 from runner import configure
@@ -66,7 +67,9 @@ async def main(room_url: str, token):
                audio_out_enabled=True,
                camera_out_width=1024,
                camera_out_height=1024,
-                transcription_enabled=True
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
            )
        )

@@ -87,8 +90,8 @@ async def main(room_url: str, token):
            },
        ]

-        tma_in = LLMUserContextAggregator(messages)
-        tma_out = LLMAssistantContextAggregator(messages)
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)

        image_sync_aggregator = ImageSyncAggregator(
            os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
--- a/examples/foundational/07-interruptible.py
+++ b/examples/foundational/07-interruptible.py
@@ -12,7 +12,7 @@ import sys
 from pipecat.frames.frames import LLMMessagesFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineTask
+from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
 from pipecat.services.elevenlabs import ElevenLabsTTSService
@@ -74,7 +74,7 @@ async def main(room_url: str, token):
            tma_out              # Assistant spoken responses
        ])

-        task = PipelineTask(pipeline, allow_interruptions=True)
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
--- a/examples/foundational/07a-interruptible-anthropic.py
+++ b/examples/foundational/07a-interruptible-anthropic.py
@@ -0,0 +1,95 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
+from pipecat.services.elevenlabs import ElevenLabsTTSService
+from pipecat.services.anthropic import AnthropicLLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = AnthropicLLMService(
+            api_key=os.getenv("ANTHROPIC_API_KEY"),
+            model="claude-3-opus-20240229")
+
+        # todo: think more about how to handle system prompts in a more general way. OpenAI,
+        # Google, and Anthropic all have slightly different approaches to providing a system
+        # prompt.
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative, helpful, and brief way. Say hello.",
+            },
+        ]
+
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            tma_in,              # User responses
+            llm,                 # LLM
+            tts,                 # TTS
+            transport.output(),  # Transport bot output
+            tma_out              # Assistant spoken responses
+        ])
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/foundational/07c-interruptible-deepgram.py
+++ b/examples/foundational/07c-interruptible-deepgram.py
@@ -0,0 +1,94 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
+from pipecat.services.deepgram import DeepgramTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
+        )
+
+        tts = DeepgramTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("DEEPGRAM_API_KEY"),
+            voice="aura-helios-en"
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            tma_in,              # User responses
+            llm,                 # LLM
+            tts,                 # TTS
+            transport.output(),  # Transport bot output
+            tma_out              # Assistant spoken responses
+        ])
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append(
+                {"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/foundational/07d-interruptible-cartesia.py
+++ b/examples/foundational/07d-interruptible-cartesia.py
@@ -0,0 +1,93 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_name="Barbershop Man"
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            model="gpt-4o")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            tma_in,              # User responses
+            llm,                 # LLM
+            tts,                 # TTS
+            transport.output(),  # Transport bot output
+            tma_out              # Assistant spoken responses
+        ])
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append(
+                {"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/foundational/08-bots-arguing.py
+++ b/examples/foundational/08-bots-arguing.py
@@ -3,14 +3,14 @@ import aiohttp
 import asyncio
 import logging
 import os
-from pipecat.pipeline.aggregators import SentenceAggregator
+from pipecat.processors.aggregators import SentenceAggregator
 from pipecat.pipeline.pipeline import Pipeline

 from pipecat.transports.daily_transport import DailyTransport
 from pipecat.services.azure_ai_services import AzureLLMService, AzureTTSService
 from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
 from pipecat.services.fal_ai_services import FalImageGenService
-from pipecat.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
+from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame

 from runner import configure

--- a/examples/foundational/10-wake-phrase.py
+++ b/examples/foundational/10-wake-phrase.py
@@ -0,0 +1,94 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
+from pipecat.services.elevenlabs import ElevenLabsTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main(room_url: str, token):
+
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Robot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            model="gpt-4o")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
+            },
+        ]
+
+        hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            hey_robot_filter,    # Filter out speech not directed at the robot
+            tma_in,              # User responses
+            llm,                 # LLM
+            tts,                 # TTS
+            transport.output(),  # Transport bot output
+            tma_out              # Assistant spoken responses
+        ])
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+            await tts.say("Hi! If you want to talk to me, just say 'Hey Robot'.")
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/foundational/10-wake-word.py
+++ b/examples/foundational/10-wake-word.py
@@ -1,189 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import aiohttp
-import os
-import random
-import sys
-
-from PIL import Image
-
-from pipecat.frames.frames import (
-    Frame,
-    SystemFrame,
-    TextFrame,
-    ImageRawFrame,
-    SpriteFrame,
-    TranscriptionFrame,
-)
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineTask
-from pipecat.processors.aggregators.llm_context import (
-    LLMUserContextAggregator,
-    LLMAssistantContextAggregator,
-)
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.elevenlabs import ElevenLabsTTSService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-
-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-sprites = {}
-image_files = [
-    "sc-default.png",
-    "sc-talk.png",
-    "sc-listen-1.png",
-    "sc-think-1.png",
-    "sc-think-2.png",
-    "sc-think-3.png",
-    "sc-think-4.png",
-]
-
-script_dir = os.path.dirname(__file__)
-
-for file in image_files:
-    # Build the full path to the image file
-    full_path = os.path.join(script_dir, "assets", file)
-    # Get the filename without the extension to use as the dictionary key
-    filename = os.path.splitext(os.path.basename(full_path))[0]
-    # Open the image and convert it to bytes
-    with Image.open(full_path) as img:
-        sprites[file] = ImageRawFrame(image=img.tobytes(), size=img.size, format=img.format)
-
-# When the bot isn't talking, show a static image of the cat listening
-quiet_frame = sprites["sc-listen-1.png"]
-
-# When the bot is talking, build an animation from two sprites
-talking_list = [sprites["sc-default.png"], sprites["sc-talk.png"]]
-talking = [random.choice(talking_list) for x in range(30)]
-talking_frame = SpriteFrame(talking)
-
-# TODO: Support "thinking" as soon as we get a valid transcript, while LLM
-# is processing
-thinking_list = [
-    sprites["sc-think-1.png"],
-    sprites["sc-think-2.png"],
-    sprites["sc-think-3.png"],
-    sprites["sc-think-4.png"],
-]
-thinking_frame = SpriteFrame(thinking_list)
-
-
-class NameCheckFilter(FrameProcessor):
-    def __init__(self, names: list[str]):
-        super().__init__()
-        self._names = names
-        self._sentence = ""
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-            return
-
-        content: str = ""
-
-        # TODO: split up transcription by participant
-        if isinstance(frame, TranscriptionFrame):
-            content = frame.text
-            self._sentence += content
-            if self._sentence.endswith((".", "?", "!")):
-                if any(name in self._sentence for name in self._names):
-                    await self.push_frame(TextFrame(self._sentence))
-                    self._sentence = ""
-                else:
-                    self._sentence = ""
-        else:
-            await self.push_frame(frame, direction)
-
-
-class ImageSyncAggregator(FrameProcessor):
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await self.push_frame(talking_frame)
-        await self.push_frame(frame)
-        await self.push_frame(quiet_frame)
-
-
-async def main(room_url: str, token):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Santa Cat",
-            DailyParams(
-                audio_out_enabled=True,
-                camera_out_enabled=True,
-                camera_out_width=720,
-                camera_out_height=1280,
-                camera_out_framerate=10,
-                transcription_enabled=True
-            )
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4-turbo-preview")
-
-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY"),
-            voice_id="jBpfuIE2acCO8z3wKNLl",
-        )
-        isa = ImageSyncAggregator()
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while.  Your responses should only be a few sentences long.",
-            },
-        ]
-
-        tma_in = LLMUserContextAggregator(messages)
-        tma_out = LLMAssistantContextAggregator(messages)
-        ncf = NameCheckFilter(["Santa Cat", "Santa"])
-
-        pipeline = Pipeline([
-            transport.input(),
-            isa,
-            ncf,
-            tma_in,
-            llm,
-            tts,
-            transport.output(),
-            tma_out
-        ])
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            # Send some greeting at the beginning.
-            await tts.say("Hi! If you want to talk to me, just say 'hey Santa Cat'.")
-            transport.capture_participant_transcription(participant["id"])
-
-        async def starting_image():
-            await transport.send_image(quiet_frame)
-
-        runner = PipelineRunner()
-
-        task = PipelineTask(pipeline)
-
-        await asyncio.gather(runner.run(task), starting_image())
-
-
-if __name__ == "__main__":
-    (url, token) = configure()
-    asyncio.run(main(url, token))
--- a/examples/foundational/11-sound-effects.py
+++ b/examples/foundational/11-sound-effects.py
@@ -19,15 +19,16 @@ from pipecat.frames.frames import (
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.processors.aggregators.llm_context import (
-    LLMUserContextAggregator,
-    LLMAssistantContextAggregator,
+from pipecat.processors.aggregators.llm_response import (
+    LLMUserResponseAggregator,
+    LLMAssistantResponseAggregator,
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.processors.logger import FrameLogger
 from pipecat.services.elevenlabs import ElevenLabsTTSService
 from pipecat.services.openai import OpenAILLMService
 from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer

 from runner import configure

@@ -84,7 +85,12 @@ async def main(room_url: str, token):
            room_url,
            token,
            "Respond bot",
-            DailyParams(audio_out_enabled=True, transcription_enabled=True)
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
        )

        llm = OpenAILLMService(
@@ -104,8 +110,8 @@ async def main(room_url: str, token):
            },
        ]

-        tma_in = LLMUserContextAggregator(messages)
-        tma_out = LLMAssistantContextAggregator(messages)
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
        out_sound = OutboundSoundEffectWrapper()
        in_sound = InboundSoundEffectWrapper()
        fl = FrameLogger("LLM Out")
--- a/examples/foundational/12a-describe-video-gemini-flash.py
+++ b/examples/foundational/12a-describe-video-gemini-flash.py
@@ -62,19 +62,15 @@ async def main(room_url: str, token):
            )
        )

-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY"),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
-        )
-
        user_response = UserResponseAggregator()

        image_requester = UserImageRequester()

        vision_aggregator = VisionImageFrameAggregator()

-        google = GoogleLLMService(model="gemini-1.5-flash-latest")
+        google = GoogleLLMService(
+            model="gemini-1.5-flash-latest",
+            api_key=os.getenv("GOOGLE_API_KEY"))

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
--- a/examples/foundational/12b-describe-video-gpt-4o.py
+++ b/examples/foundational/12b-describe-video-gpt-4o.py
@@ -61,12 +61,6 @@ async def main(room_url: str, token):
            )
        )

-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY"),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
-        )
-
        user_response = UserResponseAggregator()

        image_requester = UserImageRequester()
--- a/examples/foundational/12c-describe-video-anthropic.py
+++ b/examples/foundational/12c-describe-video-anthropic.py
@@ -0,0 +1,106 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.aggregators.user_response import UserResponseAggregator
+from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.services.elevenlabs import ElevenLabsTTSService
+from pipecat.services.anthropic import AnthropicLLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+class UserImageRequester(FrameProcessor):
+
+    def __init__(self, participant_id: str | None = None):
+        super().__init__()
+        self._participant_id = participant_id
+
+    def set_participant_id(self, participant_id: str):
+        self._participant_id = participant_id
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        if self._participant_id and isinstance(frame, TextFrame):
+            await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
+        await self.push_frame(frame, direction)
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Describe participant video",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
+        )
+
+        user_response = UserResponseAggregator()
+
+        image_requester = UserImageRequester()
+
+        vision_aggregator = VisionImageFrameAggregator()
+
+        anthropic = AnthropicLLMService(
+            api_key=os.getenv("ANTHROPIC_API_KEY"),
+            model="claude-3-sonnet-20240229"
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await tts.say("Hi there! Feel free to ask me what I see.")
+            transport.capture_participant_video(participant["id"], framerate=0)
+            transport.capture_participant_transcription(participant["id"])
+            image_requester.set_participant_id(participant["id"])
+
+        pipeline = Pipeline([
+            transport.input(),
+            user_response,
+            image_requester,
+            vision_aggregator,
+            anthropic,
+            tts,
+            transport.output()
+        ])
+
+        task = PipelineTask(pipeline)
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/foundational/14-function-calling.py
+++ b/examples/foundational/14-function-calling.py
@@ -0,0 +1,145 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import json
+import sys
+
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantContextAggregator,
+    LLMUserContextAggregator,
+)
+from pipecat.services.openai import OpenAILLMContext
+from pipecat.processors.logger import FrameLogger
+from pipecat.services.elevenlabs import ElevenLabsTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+from openai.types.chat import (
+    ChatCompletionToolParam,
+)
+from pipecat.frames.frames import (
+    TextFrame
+)
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def start_fetch_weather(llm):
+    await llm.push_frame(TextFrame("Let me think."))
+
+
+async def fetch_weather_from_api(llm, args):
+    return {"conditions": "nice", "temperature": "75"}
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            model="gpt-4-turbo-preview")
+        llm.register_function(
+            "get_current_weather",
+            fetch_weather_from_api,
+            start_callback=start_fetch_weather)
+
+        fl_in = FrameLogger("Inner")
+        fl_out = FrameLogger("Outer")
+
+        tools = [
+            ChatCompletionToolParam(
+                type="function",
+                function={
+                    "name": "get_current_weather",
+                    "description": "Get the current weather",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "location": {
+                                "type": "string",
+                                "description": "The city and state, e.g. San Francisco, CA",
+                            },
+                            "format": {
+                                "type": "string",
+                                "enum": [
+                                    "celsius",
+                                    "fahrenheit"],
+                                "description": "The temperature unit to use. Infer this from the users location.",
+                            },
+                        },
+                        "required": [
+                            "location",
+                            "format"],
+                    },
+                })]
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages, tools)
+        tma_in = LLMUserContextAggregator(context)
+        tma_out = LLMAssistantContextAggregator(context)
+        pipeline = Pipeline([
+            fl_in,
+            transport.input(),
+            tma_in,
+            llm,
+            fl_out,
+            tts,
+            transport.output(),
+            tma_out
+        ])
+
+        task = PipelineTask(pipeline)
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await tts.say("Hi! Ask me about the weather in San Francisco.")
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/foundational/websocket-server/sample.py
+++ b/examples/foundational/websocket-server/sample.py
@@ -2,8 +2,8 @@ import asyncio
 import aiohttp
 import logging
 import os
-from pipecat.pipeline.frame_processor import FrameProcessor
-from pipecat.pipeline.frames import TextFrame, TranscriptionFrame
+from pipecat.processors.frame_processor import FrameProcessor
+from pipecat.frames.frames import TextFrame, TranscriptionFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
 from pipecat.transports.websocket_transport import WebsocketTransport
--- a/examples/patient-intake/Dockerfile
+++ b/examples/patient-intake/Dockerfile
@@ -0,0 +1,16 @@
+FROM python:3.10-bullseye
+
+RUN mkdir /app
+RUN mkdir /app/assets
+RUN mkdir /app/utils
+COPY *.py /app/
+COPY requirements.txt /app/
+COPY assets/* /app/assets/
+COPY utils/* /app/utils/
+
+WORKDIR /app
+RUN pip3 install -r requirements.txt
+
+EXPOSE 7860
+
+CMD ["python3", "server.py"]
--- a/examples/patient-intake/README.md
+++ b/examples/patient-intake/README.md
@@ -0,0 +1,37 @@
+# Simple Chatbot
+
+<img src="image.png" width="420px">
+
+This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion.
+
+See a video of it in action: https://x.com/kwindla/status/1778628911817183509
+
+And a quick video walkthrough of the code: https://www.loom.com/share/13df1967161f4d24ade054e7f8753416
+
+ℹ️ The first run may take extra time to start because the VAD (Voice Activity Detection) model needs to be downloaded.
+
+## Get started
+
+```bash
+python3 -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+
+cp env.example .env # and add your credentials
+
+```
+
+## Run the server
+
+```bash
+python server.py
+```
+
+Then, visit `http://localhost:7860/start` in your browser to start a chatbot session.
+
+## Build and test the Docker image
+
+```bash
+docker build -t chatbot .
+docker run --env-file .env -p 7860:7860 chatbot
+```
--- a/examples/patient-intake/assets/clack-short-quiet.wav
+++ b/examples/patient-intake/assets/clack-short-quiet.wav
--- a/examples/patient-intake/assets/clack-short.wav
+++ b/examples/patient-intake/assets/clack-short.wav
--- a/examples/patient-intake/assets/clack.wav
+++ b/examples/patient-intake/assets/clack.wav
--- a/examples/patient-intake/assets/ding.wav
+++ b/examples/patient-intake/assets/ding.wav
--- a/examples/patient-intake/assets/ding2.wav
+++ b/examples/patient-intake/assets/ding2.wav
--- a/examples/patient-intake/assets/ding3.wav
+++ b/examples/patient-intake/assets/ding3.wav
--- a/examples/patient-intake/bot.py
+++ b/examples/patient-intake/bot.py
@@ -0,0 +1,359 @@
+import asyncio
+import aiohttp
+import copy
+import json
+import os
+import re
+import sys
+import wave
+from typing import List
+
+from openai._types import NotGiven, NOT_GIVEN
+
+from openai.types.chat import (
+    ChatCompletionToolParam,
+)
+
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.aggregators.llm_response import LLMUserContextAggregator, LLMAssistantContextAggregator
+from pipecat.processors.logger import FrameLogger
+from pipecat.frames.frames import (
+    Frame,
+    LLMMessagesFrame,
+    AudioRawFrame,
+)
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.services.elevenlabs import ElevenLabsTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.services.ai_services import AIService
+from pipecat.transports.services.daily import DailyParams, DailyTranscriptionSettings, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+sounds = {}
+sound_files = [
+    "clack-short.wav",
+    "clack.wav",
+    "clack-short-quiet.wav",
+    "ding.wav",
+    "ding2.wav",
+]
+
+script_dir = os.path.dirname(__file__)
+
+for file in sound_files:
+    # Build the full path to the sound file
+    full_path = os.path.join(script_dir, "assets", file)
+    # The dictionary is keyed by the file name including its extension
+    # (for example "ding2.wav"), matching the lookups later in this file.
+    # Open the sound and read its frames into an AudioRawFrame
+    with wave.open(full_path) as audio_file:
+        sounds[file] = AudioRawFrame(audio_file.readframes(-1),
+                                     audio_file.getframerate(), audio_file.getnchannels())
+
+
+class IntakeProcessor:
+    def __init__(
+        self,
+        context: OpenAILLMContext,
+        llm: AIService,
+        tools: List[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,
+        *args,
+        **kwargs,
+    ):
+        super().__init__(*args, **kwargs)
+        self._context: OpenAILLMContext = context
+        self._llm = llm
+        print(f"Initializing context from IntakeProcessor")
+        self._context.add_message({"role": "system", "content": "You are Jessica, an agent for a company called Tri-County Health Services. Your job is to collect important information from the user before their doctor visit. You're talking to Chad Bailey. You should address the user by their first name and be polite and professional. You're not a medical professional, so you shouldn't provide any advice. Keep your responses short. Your job is to collect information to give to a doctor. Don't make assumptions about what values to plug into functions. Ask for clarification if a user response is ambiguous. Start by introducing yourself. Then, ask the user to confirm their identity by telling you their birthday, including the year. When they answer with their birthday, call the verify_birthday function."})
+        self._context.set_tools([
+            {
+                "type": "function",
+                "function": {
+                    "name": "verify_birthday",
+                    "description": "Use this function to verify the user has provided their correct birthday.",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "birthday": {
+                                "type": "string",
+                                "description": "The user's birthdate, including the year. The user can provide it in any format, but convert it to YYYY-MM-DD format to call this function.",
+                            }},
+                    },
+                },
+            }])
+        # Create an allowlist of functions that the LLM can call
+        self._functions = [
+            "verify_birthday",
+            "list_prescriptions",
+            "list_allergies",
+            "list_conditions",
+            "list_visit_reasons",
+        ]
+
+    async def verify_birthday(self, llm, args):
+        if args["birthday"] == "1983-01-01":
+            self._context.set_tools(
+                [
+                    {
+                        "type": "function",
+                        "function": {
+                            "name": "list_prescriptions",
+                            "description": "Once the user has provided a list of their prescription medications, call this function.",
+                            "parameters": {
+                                "type": "object",
+                                "properties": {
+                                    "prescriptions": {
+                                        "type": "array",
+                                        "items": {
+                                            "type": "object",
+                                            "properties": {
+                                                "medication": {
+                                                    "type": "string",
+                                                    "description": "The medication's name",
+                                                },
+                                                "dosage": {
+                                                    "type": "string",
+                                                    "description": "The prescription's dosage",
+                                                },
+                                            },
+                                        },
+                                    }},
+                            },
+                        },
+                    }])
+            # It's a bit weird to push this to the LLM, but it gets it into the pipeline
+            await llm.push_frame(sounds["ding2.wav"], FrameDirection.DOWNSTREAM)
+            # We don't need the function call in the context, so just return a new
+            # system message and let the framework re-prompt
+            return [{"role": "system", "content": "Next, thank the user for confirming their identity, then ask the user to list their current prescriptions. Each prescription needs to have a medication name and a dosage. Do not call the list_prescriptions function with any unknown dosages."}]
+        else:
+            # The user provided an incorrect birthday; ask them to try again
+            return [{"role": "system", "content": "The user provided an incorrect birthday. Ask them for their birthday again. When they answer, call the verify_birthday function."}]
+
+    async def start_prescriptions(self, llm):
+        print(f"!!! doing start prescriptions")
+        # Move on to allergies
+        self._context.set_tools(
+            [
+                {
+                    "type": "function",
+                    "function": {
+                        "name": "list_allergies",
+                        "description": "Once the user has provided a list of their allergies, call this function.",
+                        "parameters": {
+                            "type": "object",
+                            "properties": {
+                                "allergies": {
+                                    "type": "array",
+                                    "items": {
+                                        "type": "object",
+                                        "properties": {
+                                            "name": {
+                                                "type": "string",
+                                                "description": "What the user is allergic to",
+                                            }},
+                                    },
+                                }},
+                        },
+                    },
+                }])
+        self._context.add_message(
+            {
+                "role": "system",
+                "content": "Next, ask the user if they have any allergies. Once they have listed their allergies or confirmed they don't have any, call the list_allergies function."})
+        print("!!! about to await llm process frame in start prescriptions")
+        await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
+        print(f"!!! past await process frame in start prescriptions")
+
+    async def start_allergies(self, llm):
+        print("!!! doing start allergies")
+        # Move on to conditions
+        self._context.set_tools(
+            [
+                {
+                    "type": "function",
+                    "function": {
+                        "name": "list_conditions",
+                        "description": "Once the user has provided a list of their medical conditions, call this function.",
+                        "parameters": {
+                            "type": "object",
+                            "properties": {
+                                "conditions": {
+                                    "type": "array",
+                                    "items": {
+                                        "type": "object",
+                                        "properties": {
+                                            "name": {
+                                                "type": "string",
+                                                "description": "The user's medical condition",
+                                            }},
+                                    },
+                                }},
+                        },
+                    },
+                },
+            ])
+        self._context.add_message(
+            {
+                "role": "system",
+                "content": "Now ask the user if they have any medical conditions the doctor should know about. Once they've answered the question, call the list_conditions function."})
+        await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
+
+    async def start_conditions(self, llm):
+        print("!!! doing start conditions")
+        # Move on to visit reasons
+        self._context.set_tools(
+            [
+                {
+                    "type": "function",
+                    "function": {
+                        "name": "list_visit_reasons",
+                        "description": "Once the user has provided a list of the reasons they are visiting a doctor today, call this function.",
+                        "parameters": {
+                            "type": "object",
+                            "properties": {
+                                "visit_reasons": {
+                                    "type": "array",
+                                    "items": {
+                                        "type": "object",
+                                        "properties": {
+                                            "name": {
+                                                "type": "string",
+                                                "description": "The user's reason for visiting the doctor",
+                                            }},
+                                    },
+                                }},
+                        },
+                    },
+                }])
+        self._context.add_message(
+            {"role": "system", "content": "Finally, ask the user the reason for their doctor visit today. Once they answer, call the list_visit_reasons function."})
+        await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
+        pass
+
+    async def start_visit_reasons(self, llm):
+        print("!!! doing start visit reasons")
+        # move to finish call
+        self._context.set_tools([])
+        self._context.add_message({"role": "system",
+                                   "content": "Now, thank the user and end the conversation."})
+        await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
+        pass
+
+    async def save_data(self, llm, args):
+        logger.info(f"!!! Saving data: {args}")
+        # Since this is supposed to be "async", returning None from the callback
+        # will prevent adding anything to context or re-prompting
+        return None
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Chatbot",
+            DailyParams(
+                audio_out_enabled=True,
+                camera_out_enabled=True,
+                camera_out_width=1024,
+                camera_out_height=576,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                transcription_enabled=True,
+                #
+                # Spanish
+                #
+                # transcription_settings=DailyTranscriptionSettings(
+                #     language="es",
+                #     tier="nova",
+                #     model="2-general"
+                # )
+            )
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            #
+            # English
+            #
+            voice_id="pNInz6obpgDQGcFmaJgB",
+
+            #
+            # Spanish
+            #
+            # model="eleven_multilingual_v2",
+            # voice_id="gD1IexrzCvsXPHUuT0s3",
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            model="gpt-4o")
+
+        messages = []
+        context = OpenAILLMContext(
+            messages=messages,
+        )
+        user_context = LLMUserContextAggregator(context)
+        assistant_context = LLMAssistantContextAggregator(context)
+        # checklist = ChecklistProcessor(context, llm)
+        intake = IntakeProcessor(context, llm)
+        llm.register_function("verify_birthday", intake.verify_birthday)
+        llm.register_function(
+            "list_prescriptions",
+            intake.save_data,
+            start_callback=intake.start_prescriptions)
+        llm.register_function(
+            "list_allergies",
+            intake.save_data,
+            start_callback=intake.start_allergies)
+        llm.register_function(
+            "list_conditions",
+            intake.save_data,
+            start_callback=intake.start_conditions)
+        llm.register_function(
+            "list_visit_reasons",
+            intake.save_data,
+            start_callback=intake.start_visit_reasons)
+        fl = FrameLogger("LLM Output")
+
+        pipeline = Pipeline([
+            transport.input(),
+            user_context,
+            llm,
+            fl,
+            tts,
+            transport.output(),
+            assistant_context,
+        ])
+
+        task = PipelineTask(pipeline, allow_interruptions=False)
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+            print(f"Context is: {context}")
+            await task.queue_frames([OpenAILLMContextFrame(context)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/patient-intake/env.example
+++ b/examples/patient-intake/env.example
@@ -0,0 +1,4 @@
+DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
+DAILY_API_KEY=7df...
+OPENAI_API_KEY=sk-PL...
+ELEVENLABS_API_KEY=aeb...
--- a/examples/patient-intake/image.png
+++ b/examples/patient-intake/image.png
--- a/examples/patient-intake/requirements.txt
+++ b/examples/patient-intake/requirements.txt
@@ -0,0 +1,5 @@
+python-dotenv
+requests
+fastapi[all]
+uvicorn
+pipecat-ai[daily,openai,silero]
--- a/examples/patient-intake/runner.py
+++ b/examples/patient-intake/runner.py
@@ -0,0 +1,58 @@
+import argparse
+import os
+import time
+import urllib.parse
+import requests
+
+
+def configure():
+    parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
+    parser.add_argument(
+        "-u",
+        "--url",
+        type=str,
+        required=False,
+        help="URL of the Daily room to join")
+    parser.add_argument(
+        "-k",
+        "--apikey",
+        type=str,
+        required=False,
+        help="Daily API Key (needed to create an owner token for the room)",
+    )
+
+    args, unknown = parser.parse_known_args()
+
+    url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
+    key = args.apikey or os.getenv("DAILY_API_KEY")
+
+    if not url:
+        raise Exception(
+            "No Daily room specified. Use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
+
+    if not key:
+        raise Exception("No Daily API key specified. Use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
+
+    # Create a meeting token for the given room with an expiration 1 hour in
+    # the future.
+    room_name: str = urllib.parse.urlparse(url).path[1:]
+    expiration: float = time.time() + 60 * 60
+
+    res: requests.Response = requests.post(
+        "https://api.daily.co/v1/meeting-tokens",
+        headers={
+            "Authorization": f"Bearer {key}"},
+        json={
+            "properties": {
+                "room_name": room_name,
+                "is_owner": True,
+                "exp": expiration}},
+    )
+
+    if res.status_code != 200:
+        raise Exception(
+            f"Failed to create meeting token: {res.status_code} {res.text}")
+
+    token: str = res.json()["token"]
+
+    return (url, token)
--- a/examples/patient-intake/server.py
+++ b/examples/patient-intake/server.py
@@ -0,0 +1,124 @@
+import os
+import argparse
+import subprocess
+import atexit
+
+from fastapi import FastAPI, Request, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import JSONResponse, RedirectResponse
+
+from utils.daily_helpers import create_room as _create_room, get_token
+
+MAX_BOTS_PER_ROOM = 1
+
+# Bot sub-process dict for status reporting and concurrency control
+bot_procs = {}
+
+
+def cleanup():
+    # Clean up function, just to be extra safe
+    for proc, _room_url in bot_procs.values():
+        proc.terminate()
+        proc.wait()
+
+
+atexit.register(cleanup)
+
+
+app = FastAPI()
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+
+@app.get("/start")
+async def start_agent(request: Request):
+    print(f"!!! Creating room")
+    room_url, room_name = _create_room()
+    print(f"!!! Room URL: {room_url}")
+    # Ensure the room was actually created
+    if not room_url:
+        raise HTTPException(
+            status_code=500,
+            detail="Failed to create a Daily room. Cannot start agent without a target room!")
+
+    # Check if there is already an existing process running in this room
+    num_bots_in_room = sum(
+        1 for proc in bot_procs.values() if proc[1] == room_url and proc[0].poll() is None)
+    if num_bots_in_room >= MAX_BOTS_PER_ROOM:
+        raise HTTPException(
+            status_code=500, detail=f"Max bot limit reached for room: {room_url}")
+
+    # Get the token for the room
+    token = get_token(room_url)
+
+    if not token:
+        raise HTTPException(
+            status_code=500, detail=f"Failed to get token for room: {room_url}")
+
+    # Spawn a new agent, and join the user session
+    # Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
+    try:
+        proc = subprocess.Popen(
+            [
+                f"python3 -m bot -u {room_url} -t {token}"
+            ],
+            shell=True,
+            bufsize=1,
+            cwd=os.path.dirname(os.path.abspath(__file__))
+        )
+        bot_procs[proc.pid] = (proc, room_url)
+    except Exception as e:
+        raise HTTPException(
+            status_code=500, detail=f"Failed to start subprocess: {e}")
+
+    return RedirectResponse(room_url)
+
+
+@app.get("/status/{pid}")
+def get_status(pid: int):
+    # Look up the subprocess
+    proc = bot_procs.get(pid)
+
+    # If the subprocess doesn't exist, return an error
+    if not proc:
+        raise HTTPException(
+            status_code=404, detail=f"Bot with process id: {pid} not found")
+
+    # Check the status of the subprocess
+    if proc[0].poll() is None:
+        status = "running"
+    else:
+        status = "finished"
+
+    return JSONResponse({"bot_id": pid, "status": status})
+
+
+if __name__ == "__main__":
+    import uvicorn
+
+    default_host = os.getenv("HOST", "0.0.0.0")
+    default_port = int(os.getenv("FAST_API_PORT", "7860"))
+
+    parser = argparse.ArgumentParser(
+        description="Daily Storyteller FastAPI server")
+    parser.add_argument("--host", type=str,
+                        default=default_host, help="Host address")
+    parser.add_argument("--port", type=int,
+                        default=default_port, help="Port number")
+    parser.add_argument("--reload", action="store_true",
+                        help="Reload code on change")
+
+    config = parser.parse_args()
+    print(f"to join a test room, visit http://localhost:{config.port}/start")
+    uvicorn.run(
+        "server:app",
+        host=config.host,
+        port=config.port,
+        reload=config.reload,
+    )
--- a/examples/patient-intake/utils/daily_helpers.py
+++ b/examples/patient-intake/utils/daily_helpers.py
@@ -0,0 +1,109 @@
+
+import urllib.parse
+import os
+import time
+import urllib
+import requests
+
+from dotenv import load_dotenv
+load_dotenv()
+
+
+daily_api_path = os.getenv("DAILY_API_URL") or "api.daily.co/v1"
+daily_api_key = os.getenv("DAILY_API_KEY")
+
+
+def create_room() -> tuple[str, str]:
+    """
+    Helper function to create a Daily room.
+    # See: https://docs.daily.co/reference/rest-api/rooms
+
+    Returns:
+        tuple: A tuple containing the room URL and room name.
+
+    Raises:
+        Exception: If the request to create the room fails or if the response does not contain the room URL or room name.
+    """
+    room_props = {
+        "exp": time.time() + 60 * 60,  # 1 hour
+        "enable_chat": True,
+        "enable_emoji_reactions": True,
+        "eject_at_room_exp": True,
+        "enable_prejoin_ui": False,  # Important for the bot to be able to join headlessly
+    }
+    res = requests.post(
+        f"https://{daily_api_path}/rooms",
+        headers={"Authorization": f"Bearer {daily_api_key}"},
+        json={
+            "properties": room_props
+        },
+    )
+    if res.status_code != 200:
+        raise Exception(f"Unable to create room: {res.text}")
+
+    data = res.json()
+    room_url: str = data.get("url")
+    room_name: str = data.get("name")
+    if room_url is None or room_name is None:
+        raise Exception("Missing room URL or room name in response")
+
+    return room_url, room_name
+
+
+def get_name_from_url(room_url: str) -> str:
+    """
+    Extracts the name from a given room URL.
+
+    Args:
+        room_url (str): The URL of the room.
+
+    Returns:
+        str: The extracted name from the room URL.
+    """
+    return urllib.parse.urlparse(room_url).path[1:]
+
+
+def get_token(room_url: str) -> str:
+    """
+    Retrieves a meeting token for the specified Daily room URL.
+    # See: https://docs.daily.co/reference/rest-api/meeting-tokens
+
+    Args:
+        room_url (str): The URL of the Daily room.
+
+    Returns:
+        str: The meeting token.
+
+    Raises:
+        Exception: If no room URL is specified or if no Daily API key is specified.
+        Exception: If there is an error creating the meeting token.
+    """
+    if not room_url:
+        raise Exception(
+            "No Daily room specified. You must specify a Daily room in order for a token to be generated.")
+
+    if not daily_api_key:
+        raise Exception(
+            "No Daily API key specified. Set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
+
+    expiration: float = time.time() + 60 * 60
+    room_name = get_name_from_url(room_url)
+
+    res: requests.Response = requests.post(
+        f"https://{daily_api_path}/meeting-tokens",
+        headers={
+            "Authorization": f"Bearer {daily_api_key}"},
+        json={
+            "properties": {
+                "room_name": room_name,
+                "is_owner": True,  # Owner tokens required for transcription
+                "exp": expiration}},
+    )
+
+    if res.status_code != 200:
+        raise Exception(
+            f"Failed to create meeting token: {res.status_code} {res.text}")
+
+    token: str = res.json()["token"]
+
+    return token
--- a/examples/simple-chatbot/bot.py
+++ b/examples/simple-chatbot/bot.py
@@ -7,7 +7,7 @@ from PIL import Image

 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineTask
+from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
 from pipecat.frames.frames import (
    AudioRawFrame,
@@ -149,7 +149,7 @@ async def main(room_url: str, token):
            assistant_response,
        ])

-        task = PipelineTask(pipeline, allow_interruptions=True)
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
        await task.queue_frame(quiet_frame)

        @transport.event_handler("on_first_participant_joined")
--- a/linux-py3.10-requirements.txt
+++ b/linux-py3.10-requirements.txt
@@ -8,11 +8,11 @@ aiohttp==3.9.5
    # via pipecat-ai (pyproject.toml)
 aiosignal==1.3.1
    # via aiohttp
-annotated-types==0.6.0
+annotated-types==0.7.0
    # via pydantic
 anthropic==0.25.9
    # via pipecat-ai (pyproject.toml)
-anyio==4.3.0
+anyio==4.4.0
    # via
    #   anthropic
    #   httpx
@@ -40,9 +40,9 @@ click==8.1.7
    # via flask
 coloredlogs==15.0.1
    # via onnxruntime
-ctranslate2==4.3.0
+ctranslate2==4.2.1
    # via faster-whisper
-daily-python==0.7.4
+daily-python==0.9.0
    # via pipecat-ai (pyproject.toml)
 distro==1.9.0
    # via
@@ -79,6 +79,8 @@ fsspec==2024.5.0
    # via
    #   huggingface-hub
    #   torch
+future==1.0.0
+    # via pyloudnorm
 google-ai-generativelanguage==0.6.4
    # via google-generativeai
 google-api-core[grpc]==2.19.0
@@ -86,7 +88,7 @@ google-api-core[grpc]==2.19.0
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.129.0
+google-api-python-client==2.131.0
    # via google-generativeai
 google-auth==2.29.0
    # via
@@ -103,7 +105,7 @@ googleapis-common-protos==1.63.0
    # via
    #   google-api-core
    #   grpcio-status
-grpcio==1.63.0
+grpcio==1.64.0
    # via
    #   google-api-core
    #   grpcio-status
@@ -125,7 +127,7 @@ httpx==0.27.0
    #   openai
 httpx-sse==0.4.0
    # via fal-client
-huggingface-hub==0.23.0
+huggingface-hub==0.23.2
    # via
    #   faster-whisper
    #   timm
@@ -164,6 +166,8 @@ numpy==1.26.4
    #   ctranslate2
    #   onnxruntime
    #   pipecat-ai (pyproject.toml)
+    #   pyloudnorm
+    #   scipy
    #   torchvision
    #   transformers
 nvidia-cublas-cu12==12.1.3.1
@@ -191,7 +195,7 @@ nvidia-cusparse-cu12==12.1.0.106
    #   torch
 nvidia-nccl-cu12==2.20.5
    # via torch
-nvidia-nvjitlink-cu12==12.4.127
+nvidia-nvjitlink-cu12==12.5.40
    # via
    #   nvidia-cusolver-cu12
    #   nvidia-cusparse-cu12
@@ -232,15 +236,17 @@ pyasn1-modules==0.4.0
    # via google-auth
 pyaudio==0.2.14
    # via pipecat-ai (pyproject.toml)
-pydantic==2.7.1
+pydantic==2.7.2
    # via
    #   anthropic
    #   google-generativeai
    #   openai
-pydantic-core==2.18.2
+pydantic-core==2.18.3
    # via pydantic
 pyht==0.0.28
    # via pipecat-ai (pyproject.toml)
+pyloudnorm==0.1.1
+    # via pipecat-ai (pyproject.toml)
 pyparsing==3.1.2
    # via httplib2
 python-dotenv==1.0.1
@@ -253,7 +259,7 @@ pyyaml==6.0.1
    #   transformers
 regex==2024.5.15
    # via transformers
-requests==2.31.0
+requests==2.32.2
    # via
    #   google-api-core
    #   huggingface-hub
@@ -265,6 +271,8 @@ safetensors==0.4.3
    # via
    #   timm
    #   transformers
+scipy==1.13.1
+    # via pyloudnorm
 sniffio==1.3.1
    # via
    #   anthropic
--- a/macos-py3.10-requirements.txt
+++ b/macos-py3.10-requirements.txt
@@ -5,14 +5,16 @@
 #    pip-compile --all-extras pyproject.toml
 #
 aiohttp==3.9.5
-    # via pipecat-ai (pyproject.toml)
+    # via
+    #   cartesia
+    #   pipecat-ai (pyproject.toml)
 aiosignal==1.3.1
    # via aiohttp
-annotated-types==0.6.0
+annotated-types==0.7.0
    # via pydantic
 anthropic==0.25.9
    # via pipecat-ai (pyproject.toml)
-anyio==4.3.0
+anyio==4.4.0
    # via
    #   anthropic
    #   httpx
@@ -21,7 +23,7 @@ async-timeout==4.0.3
    # via aiohttp
 attrs==23.2.0
    # via aiohttp
-av==12.0.0
+av==12.1.0
    # via faster-whisper
 azure-cognitiveservices-speech==1.37.0
    # via pipecat-ai (pyproject.toml)
@@ -29,11 +31,15 @@ blinker==1.8.2
    # via flask
 cachetools==5.3.3
    # via google-auth
+cartesia==0.1.0
+    # via pipecat-ai (pyproject.toml)
 certifi==2024.2.2
    # via
    #   httpcore
    #   httpx
    #   requests
+cffi==1.16.0
+    # via sounddevice
 charset-normalizer==3.3.2
    # via requests
 click==8.1.7
@@ -42,7 +48,7 @@ coloredlogs==15.0.1
    # via onnxruntime
 ctranslate2==4.2.1
    # via faster-whisper
-daily-python==0.7.4
+daily-python==0.9.1
    # via pipecat-ai (pyproject.toml)
 distro==1.9.0
    # via
@@ -51,7 +57,9 @@ distro==1.9.0
 einops==0.8.0
    # via pipecat-ai (pyproject.toml)
 exceptiongroup==1.2.1
-    # via anyio
+    # via
+    #   anyio
+    #   pytest
 fal-client==0.4.0
    # via pipecat-ai (pyproject.toml)
 faster-whisper==1.0.2
@@ -78,14 +86,16 @@ fsspec==2024.5.0
    # via
    #   huggingface-hub
    #   torch
-google-ai-generativelanguage==0.6.3
+future==1.0.0
+    # via pyloudnorm
+google-ai-generativelanguage==0.6.4
    # via google-generativeai
 google-api-core[grpc]==2.19.0
    # via
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.129.0
+google-api-python-client==2.131.0
    # via google-generativeai
 google-auth==2.29.0
    # via
@@ -96,13 +106,13 @@ google-auth==2.29.0
    #   google-generativeai
 google-auth-httplib2==0.2.0
    # via google-api-python-client
-google-generativeai==0.5.3
+google-generativeai==0.5.4
    # via pipecat-ai (pyproject.toml)
 googleapis-common-protos==1.63.0
    # via
    #   google-api-core
    #   grpcio-status
-grpcio==1.63.0
+grpcio==1.64.0
    # via
    #   google-api-core
    #   grpcio-status
@@ -120,11 +130,12 @@ httplib2==0.22.0
 httpx==0.27.0
    # via
    #   anthropic
+    #   cartesia
    #   fal-client
    #   openai
 httpx-sse==0.4.0
    # via fal-client
-huggingface-hub==0.23.0
+huggingface-hub==0.23.2
    # via
    #   faster-whisper
    #   timm
@@ -138,6 +149,8 @@ idna==3.7
    #   httpx
    #   requests
    #   yarl
+iniconfig==2.0.0
+    # via pytest
 itsdangerous==2.2.0
    # via flask
 jinja2==3.1.4
@@ -163,9 +176,11 @@ numpy==1.26.4
    #   ctranslate2
    #   onnxruntime
    #   pipecat-ai (pyproject.toml)
+    #   pyloudnorm
+    #   scipy
    #   torchvision
    #   transformers
-onnxruntime==1.17.3
+onnxruntime==1.18.0
    # via faster-whisper
 openai==1.26.0
    # via pipecat-ai (pyproject.toml)
@@ -173,11 +188,14 @@ packaging==24.0
    # via
    #   huggingface-hub
    #   onnxruntime
+    #   pytest
    #   transformers
 pillow==10.3.0
    # via
    #   pipecat-ai (pyproject.toml)
    #   torchvision
+pluggy==1.5.0
+    # via pytest
 proto-plus==1.23.0
    # via
    #   google-ai-generativelanguage
@@ -200,17 +218,25 @@ pyasn1-modules==0.4.0
    # via google-auth
 pyaudio==0.2.14
    # via pipecat-ai (pyproject.toml)
-pydantic==2.7.1
+pycparser==2.22
+    # via cffi
+pydantic==2.7.2
    # via
    #   anthropic
    #   google-generativeai
    #   openai
-pydantic-core==2.18.2
+pydantic-core==2.18.3
    # via pydantic
 pyht==0.0.28
    # via pipecat-ai (pyproject.toml)
+pyloudnorm==0.1.1
+    # via pipecat-ai (pyproject.toml)
 pyparsing==3.1.2
    # via httplib2
+pytest==8.2.1
+    # via pytest-asyncio
+pytest-asyncio==0.23.7
+    # via cartesia
 python-dotenv==1.0.1
    # via pipecat-ai (pyproject.toml)
 pyyaml==6.0.1
@@ -221,8 +247,9 @@ pyyaml==6.0.1
    #   transformers
 regex==2024.5.15
    # via transformers
-requests==2.31.0
+requests==2.32.3
    # via
+    #   cartesia
    #   google-api-core
    #   huggingface-hub
    #   pyht
@@ -233,13 +260,17 @@ safetensors==0.4.3
    # via
    #   timm
    #   transformers
+scipy==1.13.1
+    # via pyloudnorm
 sniffio==1.3.1
    # via
    #   anthropic
    #   anyio
    #   httpx
    #   openai
-sympy==1.12
+sounddevice==0.4.7
+    # via pipecat-ai (pyproject.toml)
+sympy==1.12.1
    # via
    #   onnxruntime
    #   torch
@@ -250,6 +281,8 @@ tokenizers==0.19.1
    #   anthropic
    #   faster-whisper
    #   transformers
+tomli==2.0.1
+    # via pytest
 torch==2.3.0
    # via
    #   pipecat-ai (pyproject.toml)
@@ -284,7 +317,9 @@ uritemplate==4.1.1
 urllib3==2.2.1
    # via requests
 websockets==12.0
-    # via pipecat-ai (pyproject.toml)
+    # via
+    #   cartesia
+    #   pipecat-ai (pyproject.toml)
 werkzeug==3.0.3
    # via flask
 yarl==1.9.4
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -24,6 +24,7 @@ dependencies = [
    "numpy~=1.26.4",
    "loguru~=0.7.0",
    "Pillow~=10.3.0",
+    "pyloudnorm~=0.1.1",
    "typing-extensions~=4.11.0",
 ]

@@ -34,7 +35,8 @@ Website = "https://pipecat.ai"
 [project.optional-dependencies]
 anthropic = [ "anthropic~=0.25.7" ]
 azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
-daily = [ "daily-python~=0.7.4" ]
+cartesia = [ "numpy~=1.26.0", "sounddevice", "cartesia" ]
+daily = [ "daily-python~=0.9.0" ]
 examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
 fal = [ "fal-client~=0.4.0" ]
 google = [ "google-generativeai~=0.5.3" ]
--- a/src/pipecat/frames/frames.py
+++ b/src/pipecat/frames/frames.py
@@ -307,7 +307,7 @@ class UserStoppedSpeakingFrame(ControlFrame):
@dataclass
 class TTSStartedFrame(ControlFrame):
    """Used to indicate the beginning of a TTS response. Following
-    AudioRawFrames are part of the TTS response until an TTSEndFrame. These
+    AudioRawFrames are part of the TTS response until a TTSStoppedFrame. These
    frames can be used for aggregating audio frames in a transport to optimize
    the size of frames sent to the session, without needing to control this in
    the TTS service.
--- a/src/pipecat/pipeline/merge_pipeline.py
+++ b/src/pipecat/pipeline/merge_pipeline.py
@@ -1,5 +1,5 @@
 from typing import List
-from pipecat.pipeline.frames import EndFrame, EndPipeFrame
+from pipecat.frames.frames import EndFrame
 from pipecat.pipeline.pipeline import Pipeline


@@ -16,8 +16,7 @@ class SequentialMergePipeline(Pipeline):
            while True:
                frame = await pipeline.sink.get()
                if isinstance(
-                        frame, EndFrame) or isinstance(
-                        frame, EndPipeFrame):
+                        frame, EndFrame):
                    break
                await self.sink.put(frame)

--- a/src/pipecat/pipeline/task.py
+++ b/src/pipecat/pipeline/task.py
@@ -8,6 +8,8 @@ import asyncio

 from typing import AsyncIterable, Iterable

+from pydantic import BaseModel
+
 from pipecat.frames.frames import CancelFrame, EndFrame, ErrorFrame, Frame, StartFrame, StopTaskFrame
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.utils.utils import obj_count, obj_id
@@ -15,6 +17,10 @@ from pipecat.utils.utils import obj_count, obj_id
 from loguru import logger


+class PipelineParams(BaseModel):
+    allow_interruptions: bool = False
+
+
 class Source(FrameProcessor):

    def __init__(self, up_queue: asyncio.Queue):
@@ -31,12 +37,12 @@ class Source(FrameProcessor):

 class PipelineTask:

-    def __init__(self, pipeline: FrameProcessor, allow_interruptions=False):
+    def __init__(self, pipeline: FrameProcessor, params: PipelineParams = PipelineParams()):
        self.id: int = obj_id()
        self.name: str = f"{self.__class__.__name__}#{obj_count(self)}"

        self._pipeline = pipeline
-        self._allow_interruptions = allow_interruptions
+        self._params = params

        self._down_queue = asyncio.Queue()
        self._up_queue = asyncio.Queue()
@@ -77,7 +83,7 @@ class PipelineTask:

    async def _process_down_queue(self):
        await self._source.process_frame(
-            StartFrame(allow_interruptions=self._allow_interruptions), FrameDirection.DOWNSTREAM)
+            StartFrame(allow_interruptions=self._params.allow_interruptions), FrameDirection.DOWNSTREAM)
        running = True
        should_cleanup = True
        while running:
--- a/src/pipecat/processors/aggregators/gated.py
+++ b/src/pipecat/processors/aggregators/gated.py
@@ -17,7 +17,7 @@ class GatedAggregator(FrameProcessor):
    Yields gate-opening frame before any accumulated frames, then ensuing frames
    until and not including the gate-closed frame.

-    >>> from pipecat.pipeline.frames import ImageFrame
+    >>> from pipecat.frames.frames import ImageFrame

    >>> async def print_frames(aggregator, frame):
    ...     async for frame in aggregator.process_frame(frame):
--- a/src/pipecat/processors/aggregators/llm_context.py
+++ b/src/pipecat/processors/aggregators/llm_context.py
@@ -1,82 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-from pipecat.frames.frames import Frame, InterimTranscriptionFrame, LLMMessagesFrame, TextFrame
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-
-
-class LLMContextAggregator(FrameProcessor):
-    def __init__(
-        self,
-        messages: list[dict],
-        role: str,
-        complete_sentences=True,
-        pass_through=True,
-    ):
-        super().__init__()
-        self._messages = messages
-        self._role = role
-        self._sentence = ""
-        self._complete_sentences = complete_sentences
-        self._pass_through = pass_through
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        # We don't do anything with non-text frames, pass it along to next in
-        # the pipeline.
-        if not isinstance(frame, TextFrame):
-            await self.push_frame(frame, direction)
-            return
-
-        # If we get interim results, we ignore them.
-        if isinstance(frame, InterimTranscriptionFrame):
-            return
-
-        # The common case for "pass through" is receiving frames from the LLM that we'll
-        # use to update the "assistant" LLM messages, but also passing the text frames
-        # along to a TTS service to be spoken to the user.
-        if self._pass_through:
-            await self.push_frame(frame, direction)
-
-        # TODO: split up transcription by participant
-        if self._complete_sentences:
-            # type: ignore -- the linter thinks this isn't a TextFrame, even
-            # though we check it above
-            self._sentence += frame.text
-            if self._sentence.endswith((".", "?", "!")):
-                self._messages.append(
-                    {"role": self._role, "content": self._sentence})
-                self._sentence = ""
-                await self.push_frame(LLMMessagesFrame(self._messages))
-        else:
-            # type: ignore -- the linter thinks this isn't a TextFrame, even
-            # though we check it above
-            self._messages.append({"role": self._role, "content": frame.text})
-            await self.push_frame(LLMMessagesFrame(self._messages))
-
-
-class LLMUserContextAggregator(LLMContextAggregator):
-    def __init__(
-            self,
-            messages: list[dict],
-            complete_sentences=True):
-        super().__init__(
-            messages,
-            "user",
-            complete_sentences,
-            pass_through=False)
-
-
-class LLMAssistantContextAggregator(LLMContextAggregator):
-    def __init__(
-            self,
-            messages: list[dict],
-            complete_sentences=True):
-        super().__init__(
-            messages,
-            "assistant",
-            complete_sentences,
-            pass_through=True,
-        )
--- a/src/pipecat/processors/aggregators/llm_response.py
+++ b/src/pipecat/processors/aggregators/llm_response.py
@@ -6,17 +6,20 @@

 from typing import List

+from pipecat.services.openai import OpenAILLMContextFrame, OpenAILLMContext
+
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.frames.frames import (
    Frame,
    InterimTranscriptionFrame,
    LLMFullResponseEndFrame,
-    LLMMessagesFrame,
-    LLMResponseStartFrame,
-    StartInterruptionFrame,
-    TextFrame,
+    LLMFullResponseStartFrame,
    LLMResponseEndFrame,
+    LLMResponseStartFrame,
+    LLMMessagesFrame,
+    StartInterruptionFrame,
    TranscriptionFrame,
+    TextFrame,
    UserStartedSpeakingFrame,
    UserStoppedSpeakingFrame)

@@ -31,7 +34,8 @@ class LLMResponseAggregator(FrameProcessor):
        start_frame,
        end_frame,
        accumulator_frame: TextFrame,
-        interim_accumulator_frame: TextFrame | None = None
+        interim_accumulator_frame: TextFrame | None = None,
+        handle_interruptions: bool = False
    ):
        super().__init__()

@@ -41,10 +45,19 @@ class LLMResponseAggregator(FrameProcessor):
        self._end_frame = end_frame
        self._accumulator_frame = accumulator_frame
        self._interim_accumulator_frame = interim_accumulator_frame
+        self._handle_interruptions = handle_interruptions

        # Reset our accumulator state.
        self._reset()

+    @property
+    def messages(self):
+        return self._messages
+
+    @property
+    def role(self):
+        return self._role
+
    #
    # Frame processor
    #
@@ -69,10 +82,14 @@ class LLMResponseAggregator(FrameProcessor):
        send_aggregation = False

        if isinstance(frame, self._start_frame):
-            self._seen_start_frame = True
+            self._aggregation = ""
            self._aggregating = True
+            self._seen_start_frame = True
+            self._seen_end_frame = False
+            self._seen_interim_results = False
        elif isinstance(frame, self._end_frame):
            self._seen_end_frame = True
+            self._seen_start_frame = False

            # We might have received the end frame but we might still be
            # aggregating (i.e. we have seen interim results but not the final
@@ -94,7 +111,9 @@ class LLMResponseAggregator(FrameProcessor):
            self._seen_interim_results = False
        elif self._interim_accumulator_frame and isinstance(frame, self._interim_accumulator_frame):
            self._seen_interim_results = True
-        elif isinstance(frame, StartInterruptionFrame):
+        elif self._handle_interruptions and isinstance(frame, StartInterruptionFrame):
+            await self._push_aggregation()
+            # Reset anyways
            self._reset()
            await self.push_frame(frame, direction)
        else:
@@ -106,12 +125,14 @@ class LLMResponseAggregator(FrameProcessor):
    async def _push_aggregation(self):
        if len(self._aggregation) > 0:
            self._messages.append({"role": self._role, "content": self._aggregation})
+
+            # Reset the aggregation. Reset it before pushing it down, otherwise
+            # if the task gets cancelled we won't be able to clear things up.
+            self._aggregation = ""
+
            frame = LLMMessagesFrame(self._messages)
            await self.push_frame(frame)

-            # Reset our accumulator state.
-            self._reset()
-
    def _reset(self):
        self._aggregation = ""
        self._aggregating = False
@@ -125,9 +146,10 @@ class LLMAssistantResponseAggregator(LLMResponseAggregator):
        super().__init__(
            messages=messages,
            role="assistant",
-            start_frame=LLMResponseStartFrame,
-            end_frame=LLMResponseEndFrame,
-            accumulator_frame=TextFrame
+            start_frame=LLMFullResponseStartFrame,
+            end_frame=LLMFullResponseEndFrame,
+            accumulator_frame=TextFrame,
+            handle_interruptions=True
        )


@@ -193,3 +215,44 @@ class LLMFullResponseAggregator(FrameProcessor):
            self._aggregation = ""
        else:
            await self.push_frame(frame, direction)
+
+
+class LLMContextAggregator(LLMResponseAggregator):
+    def __init__(self, *, context: OpenAILLMContext, **kwargs):
+
+        self._context = context
+        super().__init__(**kwargs)
+
+    async def _push_aggregation(self):
+        if len(self._aggregation) > 0:
+            self._context.add_message({"role": self._role, "content": self._aggregation})
+            frame = OpenAILLMContextFrame(self._context)
+            await self.push_frame(frame)
+
+            # Reset our accumulator state.
+            self._reset()
+
+
+class LLMAssistantContextAggregator(LLMContextAggregator):
+    def __init__(self, context: OpenAILLMContext):
+        super().__init__(
+            messages=[],
+            context=context,
+            role="assistant",
+            start_frame=LLMResponseStartFrame,
+            end_frame=LLMResponseEndFrame,
+            accumulator_frame=TextFrame
+        )
+
+
+class LLMUserContextAggregator(LLMContextAggregator):
+    def __init__(self, context: OpenAILLMContext):
+        super().__init__(
+            messages=[],
+            context=context,
+            role="user",
+            start_frame=UserStartedSpeakingFrame,
+            end_frame=UserStoppedSpeakingFrame,
+            accumulator_frame=TranscriptionFrame,
+            interim_accumulator_frame=InterimTranscriptionFrame
+        )
--- a/src/pipecat/processors/aggregators/user_response.py
+++ b/src/pipecat/processors/aggregators/user_response.py
@@ -85,10 +85,13 @@ class ResponseAggregator(FrameProcessor):
        send_aggregation = False

        if isinstance(frame, self._start_frame):
-            self._seen_start_frame = True
            self._aggregating = True
+            self._seen_start_frame = True
+            self._seen_end_frame = False
+            self._seen_interim_results = False
        elif isinstance(frame, self._end_frame):
            self._seen_end_frame = True
+            self._seen_start_frame = False

            # We might have received the end frame but we might still be
            # aggregating (i.e. we have seen interim results but not the final
@@ -110,9 +113,6 @@ class ResponseAggregator(FrameProcessor):
            self._seen_interim_results = False
        elif self._interim_accumulator_frame and isinstance(frame, self._interim_accumulator_frame):
            self._seen_interim_results = True
-        elif isinstance(frame, StartInterruptionFrame):
-            self._reset()
-            await self.push_frame(frame, direction)
        else:
            await self.push_frame(frame, direction)

@@ -121,7 +121,13 @@ class ResponseAggregator(FrameProcessor):

    async def _push_aggregation(self):
        if len(self._aggregation) > 0:
-            await self.push_frame(TextFrame(self._aggregation.strip()))
+            frame = TextFrame(self._aggregation.strip())
+
+            # Reset the aggregation. Reset it before pushing it down, otherwise
+            # if the task gets cancelled we won't be able to clear things up.
+            self._aggregation = ""
+
+            await self.push_frame(frame)

            # Reset our accumulator state.
            self._reset()
--- a/src/pipecat/processors/aggregators/vision_image_frame.py
+++ b/src/pipecat/processors/aggregators/vision_image_frame.py
@@ -12,7 +12,7 @@ class VisionImageFrameAggregator(FrameProcessor):
    """This aggregator waits for a consecutive TextFrame and an
    ImageFrame. After the ImageFrame arrives it will output a VisionImageFrame.

-    >>> from pipecat.pipeline.frames import ImageFrame
+    >>> from pipecat.frames.frames import ImageFrame

    >>> async def print_frames(aggregator, frame):
    ...     async for frame in aggregator.process_frame(frame):
--- a/src/pipecat/processors/filters/frame_filter.py
+++ b/src/pipecat/processors/filters/frame_filter.py
@@ -10,7 +10,7 @@ from pipecat.frames.frames import AppFrame, ControlFrame, Frame, SystemFrame
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


-class Filter(FrameProcessor):
+class FrameFilter(FrameProcessor):

    def __init__(self, types: List[type]):
        super().__init__()
--- a/src/pipecat/processors/filters/wake_check_filter.py
+++ b/src/pipecat/processors/filters/wake_check_filter.py
@@ -0,0 +1,84 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import re
+import time
+
+from enum import Enum
+
+from pipecat.frames.frames import ErrorFrame, Frame, TranscriptionFrame
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+
+from loguru import logger
+
+
+class WakeCheckFilter(FrameProcessor):
+    """
+    This filter looks for wake phrases in the transcription frames and only passes through frames
+    after a wake phrase has been detected. It also has a keepalive timeout to allow for a brief
+    period of continued conversation after a wake phrase has been detected.
+    """
+    class WakeState(Enum):
+        IDLE = 1
+        AWAKE = 2
+
+    class ParticipantState:
+        def __init__(self, participant_id: str):
+            self.participant_id = participant_id
+            self.state = WakeCheckFilter.WakeState.IDLE
+            self.wake_timer = 0.0
+            self.accumulator = ""
+
+    def __init__(self, wake_phrases: list[str], keepalive_timeout: float = 3):
+        super().__init__()
+        self._participant_states = {}
+        self._keepalive_timeout = keepalive_timeout
+        self._wake_patterns = []
+        for name in wake_phrases:
+            pattern = re.compile(r'\b' + r'\s*'.join(re.escape(word)
+                                 for word in name.split()) + r'\b', re.IGNORECASE)
+            self._wake_patterns.append(pattern)
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        try:
+            if isinstance(frame, TranscriptionFrame):
+                p = self._participant_states.get(frame.user_id)
+                if p is None:
+                    p = WakeCheckFilter.ParticipantState(frame.user_id)
+                    self._participant_states[frame.user_id] = p
+
+                # If we have been AWAKE within the last keepalive_timeout seconds, pass
+                # the frame through
+                if p.state == WakeCheckFilter.WakeState.AWAKE:
+                    if time.time() - p.wake_timer < self._keepalive_timeout:
+                        logger.debug(
+                            f"Wake phrase keepalive timeout has not expired. Pushing {frame}")
+                        p.wake_timer = time.time()
+                        await self.push_frame(frame)
+                        return
+                    else:
+                        p.state = WakeCheckFilter.WakeState.IDLE
+
+                p.accumulator += frame.text
+                for pattern in self._wake_patterns:
+                    match = pattern.search(p.accumulator)
+                    if match:
+                        logger.debug(f"Wake phrase triggered: {match.group()}")
+                        # Found the wake word. Discard from the accumulator up to the start of the match
+                        # and modify the frame in place.
+                        p.state = WakeCheckFilter.WakeState.AWAKE
+                        p.wake_timer = time.time()
+                        frame.text = p.accumulator[match.start():]
+                        p.accumulator = ""
+                        await self.push_frame(frame)
+                    else:
+                        pass
+            else:
+                await self.push_frame(frame, direction)
+        except Exception as e:
+            error_msg = f"Error in wake word filter: {e}"
+            logger.error(error_msg)
+            await self.push_error(ErrorFrame(error_msg))
--- a/src/pipecat/processors/frame_processor.py
+++ b/src/pipecat/processors/frame_processor.py
@@ -8,7 +8,7 @@ import asyncio
 from asyncio import AbstractEventLoop
 from enum import Enum

-from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame
+from pipecat.frames.frames import ErrorFrame, Frame
 from pipecat.utils.utils import obj_count, obj_id

 from loguru import logger
--- a/src/pipecat/processors/logger.py
+++ b/src/pipecat/processors/logger.py
@@ -6,17 +6,22 @@

 from pipecat.frames.frames import Frame
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from loguru import logger
+from typing import Optional
+logger = logger.opt(colors=True)


 class FrameLogger(FrameProcessor):
-    def __init__(self, prefix="Frame"):
+    def __init__(self, prefix="Frame", color: Optional[str] = None):
        super().__init__()
        self._prefix = prefix
+        self._color = color

    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        match direction:
-            case FrameDirection.UPSTREAM:
-                print(f"< {self._prefix}: {frame}")
-            case FrameDirection.DOWNSTREAM:
-                print(f"> {self._prefix}: {frame}")
+        arrow = "<" if direction is FrameDirection.UPSTREAM else ">"
+        msg = f"{arrow} {self._prefix}: {frame}"
+        if self._color:
+            msg = f"<{self._color}>{msg}</>"
+        logger.debug(msg)
+
        await self.push_frame(frame, direction)
--- a/src/pipecat/processors/utils/audio.py
+++ b/src/pipecat/processors/utils/audio.py
@@ -1,25 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-from typing import List
-
-from pipecat.frames.frames import AudioRawFrame
-
-
-def maybe_split_audio_frame(frame: AudioRawFrame, largest_write_size: int) -> List[AudioRawFrame]:
-    """Subdivide large audio frames to enable interruption."""
-    frames: List[AudioRawFrame] = []
-    if len(frame.audio) > largest_write_size:
-        for i in range(0, len(frame.audio), largest_write_size):
-            chunk = frame.audio[i: i + largest_write_size]
-            frames.append(
-                AudioRawFrame(
-                    audio=chunk,
-                    sample_rate=frame.sample_rate,
-                    num_channels=frame.num_channels))
-    else:
-        frames.append(frame)
-    return frames
--- a/src/pipecat/serializers/abstract_frame_serializer.py
+++ b/src/pipecat/serializers/abstract_frame_serializer.py
@@ -1,6 +1,6 @@
 from abc import abstractmethod

-from pipecat.pipeline.frames import Frame
+from pipecat.frames.frames import Frame


 class FrameSerializer:
--- a/src/pipecat/serializers/protobuf_serializer.py
+++ b/src/pipecat/serializers/protobuf_serializer.py
@@ -1,14 +1,14 @@
 import dataclasses
 from typing import Text
-from pipecat.pipeline.frames import AudioFrame, Frame, TextFrame, TranscriptionFrame
-import pipecat.pipeline.protobufs.frames_pb2 as frame_protos
+from pipecat.frames.frames import AudioRawFrame, Frame, TextFrame, TranscriptionFrame
+import pipecat.frames.protobufs.frames_pb2 as frame_protos
 from pipecat.serializers.abstract_frame_serializer import FrameSerializer


 class ProtobufFrameSerializer(FrameSerializer):
    SERIALIZABLE_TYPES = {
        TextFrame: "text",
-        AudioFrame: "audio",
+        AudioRawFrame: "audio",
        TranscriptionFrame: "transcription"
    }

--- a/src/pipecat/services/ai_services.py
+++ b/src/pipecat/services/ai_services.py
@@ -4,9 +4,7 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-import array
 import io
-import math
 import wave

 from abc import abstractmethod
@@ -24,6 +22,7 @@ from pipecat.frames.frames import (
    VisionImageRawFrame,
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.utils.audio import calculate_audio_volume
 from pipecat.utils.utils import exp_smoothing


@@ -67,7 +66,7 @@ class TTSService(AIService):
        else:
            self._current_sentence += frame.text
            if self._current_sentence.strip().endswith((".", "?", "!")):
-                text = self._current_sentence
+                text = self._current_sentence.strip()
                self._current_sentence = ""

        if text:
@@ -96,13 +95,13 @@ class STTService(AIService):
    """STTService is a base class for speech-to-text services."""

    def __init__(self,
-                 min_rms: int = 100,
+                 min_volume: float = 0.6,
                 max_silence_secs: float = 0.3,
                 max_buffer_secs: float = 1.5,
                 sample_rate: int = 16000,
                 num_channels: int = 1):
        super().__init__()
-        self._min_rms = min_rms
+        self._min_volume = min_volume
        self._max_silence_secs = max_silence_secs
        self._max_buffer_secs = max_buffer_secs
        self._sample_rate = sample_rate
@@ -110,8 +109,8 @@ class STTService(AIService):
        (self._content, self._wave) = self._new_wave()
        self._silence_num_frames = 0
        # Volume exponential smoothing
-        self._smoothing_factor = 0.5
-        self._prev_rms = 1 - self._smoothing_factor
+        self._smoothing_factor = 0.4
+        self._prev_volume = 1 - self._smoothing_factor

    @abstractmethod
    async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
@@ -126,25 +125,20 @@ class STTService(AIService):
        ww.setframerate(self._sample_rate)
        return (content, ww)

-    def _get_smoothed_volume(self, audio: bytes, prev_rms: float, factor: float) -> float:
-        # https://docs.python.org/3/library/array.html
-        audio_array = array.array('h', audio)
-        squares = [sample**2 for sample in audio_array]
-        mean = sum(squares) / len(audio_array)
-        rms = math.sqrt(mean)
-        return exp_smoothing(rms, prev_rms, factor)
+    def _get_smoothed_volume(self, frame: AudioRawFrame) -> float:
+        volume = calculate_audio_volume(frame.audio, frame.sample_rate)
+        return exp_smoothing(volume, self._prev_volume, self._smoothing_factor)

    async def _append_audio(self, frame: AudioRawFrame):
        # Try to filter out empty background noise
-        # (Very rudimentary approach, can be improved)
-        rms = self._get_smoothed_volume(frame.audio, self._prev_rms, self._smoothing_factor)
-        if rms >= self._min_rms:
+        volume = self._get_smoothed_volume(frame)
+        if volume >= self._min_volume:
            # If volume is high enough, write new data to wave file
            self._wave.writeframes(frame.audio)
            self._silence_num_frames = 0
        else:
            self._silence_num_frames += frame.num_frames
-        self._prev_rms = rms
+        self._prev_volume = volume

        # If buffer is not empty and we have enough data or there's been a long
        # silence, transcribe the audio gathered so far.
--- a/src/pipecat/services/anthropic.py
+++ b/src/pipecat/services/anthropic.py
@@ -4,9 +4,24 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-from pipecat.frames.frames import Frame, LLMMessagesFrame, TextFrame
+import os
+import asyncio
+import time
+import base64
+
+from pipecat.frames.frames import (
+    Frame,
+    TextFrame,
+    VisionImageRawFrame,
+    LLMMessagesFrame,
+    LLMFullResponseStartFrame,
+    LLMResponseStartFrame,
+    LLMResponseEndFrame,
+    LLMFullResponseEndFrame
+)
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.services.ai_services import LLMService
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame

 from loguru import logger

@@ -20,18 +35,98 @@ except ModuleNotFoundError as e:


 class AnthropicLLMService(LLMService):
+    """This class implements inference with Anthropic's AI models.
+
+    This service translates internally from OpenAILLMContext to the messages format
+    expected by the Anthropic Python SDK. We are using the OpenAILLMContext as a lingua
+    franca for all LLM services, so that it is easy to switch between different LLMs.
+    """

    def __init__(
            self,
-            api_key,
-            model="claude-3-opus-20240229",
-            max_tokens=1024):
+            api_key: str,
+            model: str = "claude-3-opus-20240229",
+            max_tokens: int = 1024):
        super().__init__()
-        self.client = AsyncAnthropic(api_key=api_key)
-        self.model = model
-        self.max_tokens = max_tokens
+        self._client = AsyncAnthropic(api_key=api_key)
+        self._model = model
+        self._max_tokens = max_tokens
+
+    def _get_messages_from_openai_context(
+            self, context: OpenAILLMContext):
+        openai_messages = context.get_messages()
+        anthropic_messages = []
+
+        for message in openai_messages:
+            role = message["role"]
+            text = message["content"]
+            if role == "system":
+                role = "user"
+            if message.get("mime_type") == "image/jpeg":
+                # vision frame
+                encoded_image = base64.b64encode(message["data"].getvalue()).decode("utf-8")
+                anthropic_messages.append({
+                    "role": role,
+                    "content": [{
+                        "type": "image",
+                        "source": {
+                            "type": "base64",
+                            "media_type": message.get("mime_type"),
+                            "data": encoded_image,
+                        }
+                    }, {
+                        "type": "text",
+                        "text": text
+                    }]
+                })
+            else:
+                # text frame
+                anthropic_messages.append({"role": role, "content": text})
+
+        return anthropic_messages
+
+    async def _process_context(self, context: OpenAILLMContext):
+        await self.push_frame(LLMFullResponseStartFrame())
+        try:
+            logger.debug(f"Generating chat: {context.get_messages_json()}")
+
+            messages = self._get_messages_from_openai_context(context)
+
+            start_time = time.time()
+            response = await self._client.messages.create(
+                messages=messages,
+                model=self._model,
+                max_tokens=self._max_tokens,
+                stream=True)
+            logger.debug(f"Anthropic LLM TTFB: {time.time() - start_time}")
+            async for event in response:
+                # logger.debug(f"Anthropic LLM event: {event}")
+                if event.type == "content_block_delta":
+                    await self.push_frame(LLMResponseStartFrame())
+                    await self.push_frame(TextFrame(event.delta.text))
+                    await self.push_frame(LLMResponseEndFrame())
+
+        except Exception as e:
+            logger.error(f"Exception: {e}")
+        finally:
+            await self.push_frame(LLMFullResponseEndFrame())

    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        context = None
+
+        if isinstance(frame, OpenAILLMContextFrame):
+            context: OpenAILLMContext = frame.context
+        elif isinstance(frame, LLMMessagesFrame):
+            context = OpenAILLMContext.from_messages(frame.messages)
+        elif isinstance(frame, VisionImageRawFrame):
+            context = OpenAILLMContext.from_image_frame(frame)
+        else:
+            await self.push_frame(frame, direction)
+
+        if context:
+            await self._process_context(context)
+
+    async def x_process_frame(self, frame: Frame, direction: FrameDirection):
        if isinstance(frame, LLMMessagesFrame):
            stream = await self.client.messages.create(
                max_tokens=self.max_tokens,
--- a/src/pipecat/services/azure.py
+++ b/src/pipecat/services/azure.py
@@ -11,6 +11,7 @@ import io
 from PIL import Image
 from typing import AsyncGenerator

+from numpy import str_
 from openai import AsyncAzureOpenAI

 from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame, URLImageRawFrame
@@ -45,7 +46,7 @@ class AzureTTSService(TTSService):
        self._voice = voice

    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
-        logger.debug(f"Transcribing text: {text}")
+        logger.debug(f"Generating TTS: {text}")

        ssml = (
            "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' "
@@ -73,17 +74,18 @@ class AzureLLMService(BaseOpenAILLMService):
    def __init__(
            self,
            *,
-            api_key,
-            endpoint,
-            api_version="2023-12-01-preview",
-            model):
-        super().__init__(api_key=api_key, model=model)
+            api_key: str,
+            endpoint: str,
+            model: str,
+            api_version: str = "2023-12-01-preview"):
+        # Initialize variables before calling parent __init__() because that
+        # will call create_client() and we need those values there.
        self._endpoint = endpoint
        self._api_version = api_version
-        self._model: str = model
+        super().__init__(api_key=api_key, model=model)

    def create_client(self, api_key=None, base_url=None):
-        self._client = AsyncAzureOpenAI(
+        return AsyncAzureOpenAI(
            api_key=api_key,
            azure_endpoint=self._endpoint,
            api_version=self._api_version,
@@ -95,12 +97,12 @@ class AzureImageGenServiceREST(ImageGenService):
    def __init__(
        self,
        *,
-        api_version="2023-06-01-preview",
-        image_size: str,
        aiohttp_session: aiohttp.ClientSession,
-        api_key,
-        endpoint,
-        model,
+        image_size: str,
+        api_key: str,
+        endpoint: str,
+        model: str,
+        api_version="2023-06-01-preview",
    ):
        super().__init__()

--- a/src/pipecat/services/cartesia.py
+++ b/src/pipecat/services/cartesia.py
@@ -0,0 +1,56 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+from cartesia.tts import AsyncCartesiaTTS
+
+import time
+from typing import AsyncGenerator
+
+from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame
+from pipecat.services.ai_services import TTSService
+
+from loguru import logger
+
+
+class CartesiaTTSService(TTSService):
+
+    def __init__(
+            self,
+            *,
+            api_key: str,
+            voice_name: str,
+            **kwargs):
+        super().__init__(**kwargs)
+
+        self._api_key = api_key
+        self._voice_name = voice_name
+
+        self._client = None
+
+    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
+        logger.debug(f"Generating TTS: [{text}]")
+
+        try:
+            if self._client is None:
+                self._client = AsyncCartesiaTTS(api_key=self._api_key)
+                voices = self._client.get_voices()
+                self._voice_id = voices[self._voice_name]["id"]
+                self._voice = self._client.get_voice_embedding(voice_id=self._voice_id)
+
+            chunk_generator = await self._client.generate(
+                transcript=text, voice=self._voice, stream=True,
+                model_id="upbeat-moon", data_rtype='array', output_format='pcm_16000',
+                # A chunk_time of 0.1 seems to be the default. There are small audio pops/gaps
+                # that still need to be debugged.
+                chunk_time=0.1
+            )
+
+            async for chunk in chunk_generator:
+                # Wrap each PCM chunk (16 kHz, mono) in an audio frame.
+                frame = AudioRawFrame(chunk['audio'], 16000, 1)
+                yield frame
+        except Exception as e:
+            logger.error(f"Exception {e}")
--- a/src/pipecat/services/deepgram.py
+++ b/src/pipecat/services/deepgram.py
@@ -8,7 +8,7 @@ import aiohttp

 from typing import AsyncGenerator

-from pipecat.frames.frames import AudioRawFrame, Frame
+from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame
 from pipecat.services.ai_services import TTSService

 from loguru import logger
@@ -21,7 +21,7 @@ class DeepgramTTSService(TTSService):
            *,
            aiohttp_session: aiohttp.ClientSession,
            api_key: str,
-            voice: str = "alpha-asteria-en-v2",
+            voice: str = "aura-helios-en",
            **kwargs):
        super().__init__(**kwargs)

@@ -31,11 +31,22 @@ class DeepgramTTSService(TTSService):

    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        logger.info(f"Running Deepgram TTS for {text}")
-        base_url = "https://api.beta.deepgram.com/v1/speak"
-        request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate=16000"
+        base_url = "https://api.deepgram.com/v1/speak"
+        request_url = (
+            f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate=16000")
        headers = {"authorization": f"token {self._api_key}"}
        body = {"text": text}
-        async with self._aiohttp_session.post(request_url, headers=headers, json=body) as r:
-            async for data in r.content:
-                frame = AudioRawFrame(audio=data, sample_rate=16000, num_channels=1)
-                yield frame
+
+        try:
+            async with self._aiohttp_session.post(request_url, headers=headers, json=body) as r:
+                if r.status != 200:
+                    text = await r.text()
+                    logger.error(f"Error getting audio (status: {r.status}, error: {text})")
+                    yield ErrorFrame(f"Error getting audio (status: {r.status}, error: {text})")
+                    return
+
+                async for data in r.content:
+                    frame = AudioRawFrame(audio=data, sample_rate=16000, num_channels=1)
+                    yield frame
+        except Exception as e:
+            logger.error(f"Exception {e}")
--- a/src/pipecat/services/elevenlabs.py
+++ b/src/pipecat/services/elevenlabs.py
@@ -8,7 +8,7 @@ import aiohttp

 from typing import AsyncGenerator

-from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame, TTSStartedFrame, TTSStoppedFrame, TextFrame
+from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame
 from pipecat.services.ai_services import TTSService

 from loguru import logger
@@ -32,7 +32,7 @@ class ElevenLabsTTSService(TTSService):
        self._model = model

    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
-        logger.debug(f"Transcribing text: {text}")
+        logger.debug(f"Generating TTS: [{text}]")

        url = f"https://api.elevenlabs.io/v1/text-to-speech/{self._voice_id}/stream"

@@ -49,8 +49,9 @@ class ElevenLabsTTSService(TTSService):

        async with self._aiohttp_session.post(url, json=payload, headers=headers, params=querystring) as r:
            if r.status != 200:
-                logger.error(f"Audio fetch status code: {r.status}, error: {r.text}")
-                yield ErrorFrame(f"Audio fetch status code: {r.status}, error: {r.text}")
+                text = await r.text()
+                logger.error(f"Error getting audio (status: {r.status}, error: {text})")
+                yield ErrorFrame(f"Error getting audio (status: {r.status}, error: {text})")
                return

            async for chunk in r.content:
--- a/src/pipecat/services/fireworks.py
+++ b/src/pipecat/services/fireworks.py
@@ -19,6 +19,6 @@ except ModuleNotFoundError as e:

 class FireworksLLMService(BaseOpenAILLMService):
    def __init__(self,
-                 model="accounts/fireworks/models/firefunction-v1",
-                 base_url="https://api.fireworks.ai/inference/v1"):
+                 model: str = "accounts/fireworks/models/firefunction-v1",
+                 base_url: str = "https://api.fireworks.ai/inference/v1"):
        super().__init__(model, base_url)
--- a/src/pipecat/services/google.py
+++ b/src/pipecat/services/google.py
@@ -40,14 +40,10 @@ class GoogleLLMService(LLMService):
    franca for all LLM services, so that it is easy to switch between different LLMs.
    """

-    def __init__(self, model="gemini-1.5-flash-latest", api_key=None, **kwargs):
+    def __init__(self, api_key: str, model: str = "gemini-1.5-flash-latest", **kwargs):
        super().__init__(**kwargs)
-        self.model = model
-        gai.configure(api_key=api_key or os.environ["GOOGLE_API_KEY"])
-        self.create_client()
-
-    def create_client(self):
-        self._client = gai.GenerativeModel(self.model)
+        gai.configure(api_key=api_key)
+        self._client = gai.GenerativeModel(model)

    def _get_messages_from_openai_context(
            self, context: OpenAILLMContext) -> List[glm.Content]:
@@ -90,9 +86,18 @@ class GoogleLLMService(LLMService):
            logger.debug(f"Google LLM TTFB: {time.time() - start_time}")

            async for chunk in self._async_generator_wrapper(response):
-                await self.push_frame(LLMResponseStartFrame())
-                await self.push_frame(TextFrame(chunk.text))
-                await self.push_frame(LLMResponseEndFrame())
+                try:
+                    text = chunk.text
+                    await self.push_frame(LLMResponseStartFrame())
+                    await self.push_frame(TextFrame(text))
+                    await self.push_frame(LLMResponseEndFrame())
+                except Exception as e:
+                    # Google LLMs seem to flag safety issues a lot!
+                    if chunk.candidates[0].finish_reason == 3:
+                        logger.debug(
+                            f"LLM refused to generate content for safety reasons - {messages}.")
+                    else:
+                        logger.error(f"Error {e}")

        except Exception as e:
            logger.error(f"Exception: {e}")
--- a/src/pipecat/services/ollama.py
+++ b/src/pipecat/services/ollama.py
@@ -9,5 +9,5 @@ from pipecat.services.openai import BaseOpenAILLMService

 class OLLamaLLMService(BaseOpenAILLMService):

-    def __init__(self, model="llama2", base_url="http://localhost:11434/v1"):
+    def __init__(self, model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
        super().__init__(model=model, base_url=base_url, api_key="ollama")
--- a/src/pipecat/services/openai.py
+++ b/src/pipecat/services/openai.py
@@ -29,7 +29,12 @@ from pipecat.frames.frames import (
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame
 from pipecat.processors.frame_processor import FrameDirection
 from pipecat.services.ai_services import LLMService, ImageGenService
-
+from openai.types.chat import (
+    ChatCompletionSystemMessageParam,
+    ChatCompletionFunctionMessageParam,
+    ChatCompletionToolParam,
+    ChatCompletionUserMessageParam,
+)
 from loguru import logger

 try:
@@ -47,6 +52,10 @@ except ModuleNotFoundError as e:
    raise Exception(f"Missing module: {e}")


+class OpenAIUnhandledFunctionException(Exception):
+    pass
+
+
 class BaseOpenAILLMService(LLMService):
    """This is the base for all services that use the AsyncOpenAI client.

@@ -60,10 +69,23 @@ class BaseOpenAILLMService(LLMService):
    def __init__(self, model: str, api_key=None, base_url=None):
        super().__init__()
        self._model: str = model
-        self.create_client(api_key=api_key, base_url=base_url)
+        self._client = self.create_client(api_key=api_key, base_url=base_url)
+        self._callbacks = {}
+        self._start_callbacks = {}

    def create_client(self, api_key=None, base_url=None):
-        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)
+        return AsyncOpenAI(api_key=api_key, base_url=base_url)
+
+    # TODO-CB: callback function type
+    def register_function(self, function_name, callback, start_callback=None):
+        self._callbacks[function_name] = callback
+        if start_callback:
+            self._start_callbacks[function_name] = start_callback
+
+    def unregister_function(self, function_name):
+        del self._callbacks[function_name]
+        if function_name in self._start_callbacks:
+            del self._start_callbacks[function_name]

    async def _stream_chat_completions(
        self, context: OpenAILLMContext
@@ -111,13 +133,12 @@ class BaseOpenAILLMService(LLMService):
    async def _process_context(self, context: OpenAILLMContext):
        function_name = ""
        arguments = ""
+        tool_call_id = ""

        chunk_stream: AsyncStream[ChatCompletionChunk] = (
            await self._stream_chat_completions(context)
        )

-        await self.push_frame(LLMFullResponseStartFrame())
-
        async for chunk in chunk_stream:
            if len(chunk.choices) == 0:
                continue
@@ -137,23 +158,77 @@ class BaseOpenAILLMService(LLMService):
                tool_call = chunk.choices[0].delta.tool_calls[0]
                if tool_call.function and tool_call.function.name:
                    function_name += tool_call.function.name
-                    # yield LLMFunctionStartFrame(function_name=tool_call.function.name)
+                    tool_call_id = tool_call.id
+                    # If a handler is registered for this function, run its optional start callback
+                    if function_name in self._callbacks:
+                        if function_name in self._start_callbacks:
+                            await self._start_callbacks[function_name](self)
                if tool_call.function and tool_call.function.arguments:
-                    # Keep iterating through the response to collect all the argument fragments and
-                    # yield a complete LLMFunctionCallFrame after run_llm_async
-                    # completes
+                    # Keep iterating through the response to collect all the argument fragments
                    arguments += tool_call.function.arguments
            elif chunk.choices[0].delta.content:
                await self.push_frame(LLMResponseStartFrame())
                await self.push_frame(TextFrame(chunk.choices[0].delta.content))
                await self.push_frame(LLMResponseEndFrame())

-        await self.push_frame(LLMFullResponseEndFrame())
+        # if we got a function name and arguments, check to see if it's a function with
+        # a registered handler. If so, run the registered callback, save the result to
+        # the context, and re-prompt to get a chat answer. If we don't have a registered
+        # handler, raise an exception.
+        if function_name and arguments:
+            if function_name in self._callbacks:
+                await self._handle_function_call(context, tool_call_id, function_name, arguments)

-        # if we got a function name and arguments, yield the frame with all the info so
-        # frame consumers can take action based on the function call.
-        # if function_name and arguments:
-        #     yield LLMFunctionCallFrame(function_name=function_name, arguments=arguments)
+            else:
+                raise OpenAIUnhandledFunctionException(
+                    f"The LLM tried to call a function named '{function_name}', but there isn't a callback registered for that function.")
+
+    async def _handle_function_call(
+            self,
+            context,
+            tool_call_id,
+            function_name,
+            arguments
+    ):
+        arguments = json.loads(arguments)
+        result = await self._callbacks[function_name](self, arguments)
+        arguments = json.dumps(arguments)
+        if isinstance(result, (str, dict)):
+            # Handle it in "full magic mode"
+            tool_call = ChatCompletionFunctionMessageParam({
+                "role": "assistant",
+                "tool_calls": [
+                    {
+                        "id": tool_call_id,
+                        "function": {
+                            "arguments": arguments,
+                            "name": function_name
+                        },
+                        "type": "function"
+                    }
+                ]
+
+            })
+            context.add_message(tool_call)
+            if isinstance(result, dict):
+                result = json.dumps(result)
+            tool_result = ChatCompletionToolParam({
+                "tool_call_id": tool_call_id,
+                "role": "tool",
+                "content": result
+            })
+            context.add_message(tool_result)
+            # re-prompt to get a human answer
+            await self._process_context(context)
+        elif isinstance(result, list):
+            # reduced magic
+            for msg in result:
+                context.add_message(msg)
+            await self._process_context(context)
+        elif result is None:
+            pass
+        else:
+            raise BaseException(f"Unknown return type from function callback: {type(result)}")

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        context = None
@@ -167,7 +242,9 @@ class BaseOpenAILLMService(LLMService):
            await self.push_frame(frame, direction)

        if context:
+            await self.push_frame(LLMFullResponseStartFrame())
            await self._process_context(context)
+            await self.push_frame(LLMFullResponseEndFrame())


 class OpenAILLMService(BaseOpenAILLMService):
--- a/src/pipecat/transports/base_input.py
+++ b/src/pipecat/transports/base_input.py
@@ -7,6 +7,8 @@
 import asyncio
 import queue

+from concurrent.futures import ThreadPoolExecutor
+
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.frames.frames import (
    AudioRawFrame,
@@ -34,7 +36,9 @@ class BaseInputTransport(FrameProcessor):
        self._running = False
        self._allow_interruptions = False

-        # Start media threads.
+        self._in_executor = ThreadPoolExecutor(max_workers=5)
+
+        # Create audio input queue if needed.
        if self._params.audio_in_enabled or self._params.vad_enabled:
            self._audio_in_queue = queue.Queue()

@@ -55,8 +59,10 @@ class BaseInputTransport(FrameProcessor):

        if self._params.audio_in_enabled or self._params.vad_enabled:
            loop = self.get_event_loop()
-            self._audio_in_thread = loop.run_in_executor(None, self._audio_in_thread_handler)
-            self._audio_out_thread = loop.run_in_executor(None, self._audio_out_thread_handler)
+            self._audio_in_thread = loop.run_in_executor(
+                self._in_executor, self._audio_in_thread_handler)
+            self._audio_out_thread = loop.run_in_executor(
+                self._in_executor, self._audio_out_thread_handler)

    async def stop(self):
        if not self._running:
@@ -131,10 +137,12 @@ class BaseInputTransport(FrameProcessor):
        if self._allow_interruptions:
            # Make sure we notify about interruptions quickly out-of-band
            if isinstance(frame, UserStartedSpeakingFrame):
+                logger.debug("User started speaking")
                self._push_frame_task.cancel()
                self._create_push_task()
                await self.push_frame(StartInterruptionFrame())
            elif isinstance(frame, UserStoppedSpeakingFrame):
+                logger.debug("User stopped speaking")
                await self.push_frame(StopInterruptionFrame())
        await self._internal_push_frame(frame)

--- a/src/pipecat/transports/base_output.py
+++ b/src/pipecat/transports/base_output.py
@@ -11,6 +11,8 @@ import queue
 import time
 import threading

+from concurrent.futures import ThreadPoolExecutor
+
 from PIL import Image
 from typing import List

@@ -41,6 +43,8 @@ class BaseOutputTransport(FrameProcessor):
        self._running = False
        self._allow_interruptions = False

+        self._out_executor = ThreadPoolExecutor(max_workers=5)
+
        # These are the images that we should send to the camera at our desired
        # framerate.
        self._camera_images = None
@@ -67,9 +71,10 @@ class BaseOutputTransport(FrameProcessor):
        loop = self.get_event_loop()

        if self._params.camera_out_enabled:
-            self._camera_out_thread = loop.run_in_executor(None, self._camera_out_thread_handler)
+            self._camera_out_thread = loop.run_in_executor(
+                self._out_executor, self._camera_out_thread_handler)

-        self._sink_thread = loop.run_in_executor(None, self._sink_thread_handler)
+        self._sink_thread = loop.run_in_executor(self._out_executor, self._sink_thread_handler)

        # Create push frame task. This is the task that will push frames in
        # order. We also guarantee that all frames are pushed in the same task.
@@ -153,7 +158,6 @@ class BaseOutputTransport(FrameProcessor):
        while self._running:
            try:
                frame = self._sink_queue.get(timeout=1)
-
                if not self._is_interrupted.is_set():
                    if isinstance(frame, AudioRawFrame):
                        if self._params.audio_out_enabled:
@@ -170,8 +174,7 @@ class BaseOutputTransport(FrameProcessor):
                            self._internal_push_frame(frame), self.get_event_loop())
                        future.result()
                else:
-                    # Send any remaining audio
-                    self._send_audio_truncated(buffer, bytes_size_10ms)
+                    # If we get interrupted just clear the output buffer.
                    buffer = bytearray()

                if isinstance(frame, EndFrame):
@@ -248,6 +251,8 @@ class BaseOutputTransport(FrameProcessor):
                    image = next(self._camera_images)
                    self._draw_image(image)
                    time.sleep(1.0 / self._params.camera_out_framerate)
+                else:
+                    time.sleep(1.0 / self._params.camera_out_framerate)
            except queue.Empty:
                pass
            except Exception as e:
--- a/src/pipecat/transports/services/daily.py
+++ b/src/pipecat/transports/services/daily.py
@@ -4,7 +4,9 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import aiohttp
 import asyncio
+from concurrent.futures import ThreadPoolExecutor
 import inspect
 import queue
 import time
@@ -15,8 +17,8 @@ from functools import partial
 from typing import Any, Callable, Mapping

 from daily import (
-    CallClient,
    Daily,
+    CallClient,
    EventHandler,
    VirtualCameraDevice,
    VirtualMicrophoneDevice,
@@ -80,6 +82,11 @@ class WebRTCVADAnalyzer(VADAnalyzer):
        return confidence


+class DailyDialinSettings(BaseModel):
+    call_id: str = ""
+    call_domain: str = ""
+
+
 class DailyTranscriptionSettings(BaseModel):
    language: str = "en"
    tier: str = "nova"
@@ -95,6 +102,9 @@ class DailyTranscriptionSettings(BaseModel):


 class DailyParams(TransportParams):
+    api_url: str = "https://api.daily.co"
+    api_key: str = ""
+    dialin_settings: DailyDialinSettings | None = None
    transcription_enabled: bool = False
    transcription_settings: DailyTranscriptionSettings = DailyTranscriptionSettings()

@@ -102,11 +112,17 @@ class DailyParams(TransportParams):
 class DailyCallbacks(BaseModel):
    on_joined: Callable[[Mapping[str, Any]], None]
    on_left: Callable[[], None]
+    on_error: Callable[[str], None]
+    on_app_message: Callable[[Any, str], None]
+    on_call_state_updated: Callable[[str], None]
+    on_dialin_ready: Callable[[str], None]
+    on_dialout_connected: Callable[[Any], None]
+    on_dialout_stopped: Callable[[Any], None]
+    on_dialout_error: Callable[[Any], None]
+    on_dialout_warning: Callable[[Any], None]
+    on_first_participant_joined: Callable[[Mapping[str, Any]], None]
    on_participant_joined: Callable[[Mapping[str, Any]], None]
    on_participant_left: Callable[[Mapping[str, Any], str], None]
-    on_first_participant_joined: Callable[[Mapping[str, Any]], None]
-    on_app_message: Callable[[Any, str], None]
-    on_error: Callable[[str], None]


 class DailyTransportClient(EventHandler):
@@ -146,6 +162,8 @@ class DailyTransportClient(EventHandler):
        self._leaving = False
        self._sync_response = {k: queue.Queue() for k in ["join", "leave"]}

+        self._executor = ThreadPoolExecutor(max_workers=5)
+
        self._client: CallClient = CallClient(event_handler=self)

        self._camera: VirtualCameraDevice = Daily.create_camera_device(
@@ -195,7 +213,7 @@ class DailyTransportClient(EventHandler):
        self._joining = True

        loop = asyncio.get_running_loop()
-        await loop.run_in_executor(None, self._join)
+        await loop.run_in_executor(self._executor, self._join)

    def _join(self):
        logger.info(f"Joining {self._room_url}")
@@ -287,7 +305,7 @@ class DailyTransportClient(EventHandler):
        self._leaving = True

        loop = asyncio.get_running_loop()
-        await loop.run_in_executor(None, self._leave)
+        await loop.run_in_executor(self._executor, self._leave)

    def _leave(self):
        logger.info(f"Leaving {self._room_url}")
@@ -318,13 +336,25 @@ class DailyTransportClient(EventHandler):

    async def cleanup(self):
        loop = asyncio.get_running_loop()
-        await loop.run_in_executor(None, self._cleanup)
+        await loop.run_in_executor(self._executor, self._cleanup)

    def _cleanup(self):
        if self._client:
            self._client.release()
            self._client = None

+    def start_dialout(self, settings):
+        self._client.start_dialout(settings)
+
+    def stop_dialout(self, participant_id):
+        self._client.stop_dialout(participant_id)
+
+    def start_recording(self, streaming_settings, stream_id, force_new):
+        self._client.start_recording(streaming_settings, stream_id, force_new)
+
+    def stop_recording(self, stream_id):
+        self._client.stop_recording(stream_id)
+
    def capture_participant_transcription(self, participant_id: str, callback: Callable):
        if not self._params.transcription_enabled:
            return
@@ -358,6 +388,27 @@ class DailyTransportClient(EventHandler):
    # Daily (EventHandler)
    #

+    def on_app_message(self, message: Any, sender: str):
+        self._callbacks.on_app_message(message, sender)
+
+    def on_call_state_updated(self, state: str):
+        self._callbacks.on_call_state_updated(state)
+
+    def on_dialin_ready(self, sip_endpoint: str):
+        self._callbacks.on_dialin_ready(sip_endpoint)
+
+    def on_dialout_connected(self, data: Any):
+        self._callbacks.on_dialout_connected(data)
+
+    def on_dialout_stopped(self, data: Any):
+        self._callbacks.on_dialout_stopped(data)
+
+    def on_dialout_error(self, data: Any):
+        self._callbacks.on_dialout_error(data)
+
+    def on_dialout_warning(self, data: Any):
+        self._callbacks.on_dialout_warning(data)
+
    def on_participant_joined(self, participant):
        id = participant["id"]
        logger.info(f"Participant joined {id}")
@@ -392,9 +443,6 @@ class DailyTransportClient(EventHandler):
    def on_transcription_stopped(self, stopped_by, stopped_by_error):
        logger.debug("Transcription stopped")

-    def on_app_message(self, message: Any, sender: str):
-        self._callbacks.on_app_message(message, sender)
-
    #
    # Daily (CallClient callbacks)
    #
@@ -438,7 +486,8 @@ class DailyInputTransport(BaseInputTransport):
        await super().start(frame)
        # Create camera in thread (runs if _running is true).
        loop = asyncio.get_running_loop()
-        self._camera_in_thread = loop.run_in_executor(None, self._camera_in_thread_handler)
+        self._camera_in_thread = loop.run_in_executor(
+            self._in_executor, self._camera_in_thread_handler)

    async def stop(self):
        if not self._running:
@@ -597,11 +646,17 @@ class DailyTransport(BaseTransport):
        callbacks = DailyCallbacks(
            on_joined=self._on_joined,
            on_left=self._on_left,
+            on_error=self._on_error,
+            on_app_message=self._on_app_message,
+            on_call_state_updated=self._on_call_state_updated,
+            on_dialin_ready=self._on_dialin_ready,
+            on_dialout_connected=self._on_dialout_connected,
+            on_dialout_stopped=self._on_dialout_stopped,
+            on_dialout_error=self._on_dialout_error,
+            on_dialout_warning=self._on_dialout_warning,
            on_first_participant_joined=self._on_first_participant_joined,
            on_participant_joined=self._on_participant_joined,
            on_participant_left=self._on_participant_left,
-            on_app_message=self._on_app_message,
-            on_error=self._on_error,
        )
        self._params = params

@@ -616,9 +671,16 @@ class DailyTransport(BaseTransport):
        # these handlers.
        self._register_event_handler("on_joined")
        self._register_event_handler("on_left")
+        self._register_event_handler("on_app_message")
+        self._register_event_handler("on_call_state_updated")
+        self._register_event_handler("on_dialin_ready")
+        self._register_event_handler("on_dialout_connected")
+        self._register_event_handler("on_dialout_stopped")
+        self._register_event_handler("on_dialout_error")
+        self._register_event_handler("on_dialout_warning")
+        self._register_event_handler("on_first_participant_joined")
        self._register_event_handler("on_participant_joined")
        self._register_event_handler("on_participant_left")
-        self._register_event_handler("on_first_participant_joined")

    #
    # BaseTransport
@@ -650,6 +712,18 @@ class DailyTransport(BaseTransport):
        if self._output:
            await self._output.process_frame(frame, FrameDirection.DOWNSTREAM)

+    def start_dialout(self, settings=None):
+        self._client.start_dialout(settings)
+
+    def stop_dialout(self, participant_id):
+        self._client.stop_dialout(participant_id)
+
+    def start_recording(self, streaming_settings=None, stream_id=None, force_new=None):
+        self._client.start_recording(streaming_settings, stream_id, force_new)
+
+    def stop_recording(self, stream_id=None):
+        self._client.stop_recording(stream_id)
+
    def capture_participant_transcription(self, participant_id: str):
        self._client.capture_participant_transcription(
            participant_id,
@@ -677,6 +751,62 @@ class DailyTransport(BaseTransport):
        # the client should report the error.
        pass

+    def _on_app_message(self, message: Any, sender: str):
+        if self._input:
+            self._input.push_app_message(message, sender)
+        self.on_app_message(message, sender)
+
+    def _on_call_state_updated(self, state: str):
+        self.on_call_state_updated(state)
+
+    async def _handle_dialin_ready(self, sip_endpoint: str):
+        if not self._params.dialin_settings:
+            return
+
+        async with aiohttp.ClientSession() as session:
+            headers = {
+                "Authorization": f"Bearer {self._params.api_key}",
+                "Content-Type": "application/x-www-form-urlencoded"
+            }
+            data = {
+                "callId": self._params.dialin_settings.call_id,
+                "callDomain": self._params.dialin_settings.call_domain,
+                "sipUri": sip_endpoint
+            }
+
+            url = f"{self._params.api_url}/dialin/pinlessCallUpdate"
+
+            try:
+                async with session.post(url, headers=headers, data=data, timeout=10) as r:
+                    if r.status != 200:
+                        text = await r.text()
+                        logger.error(
+                            f"Unable to handle dialin-ready event (status: {r.status}, error: {text})")
+                        return
+
+                    logger.debug("Event dialin-ready was handled successfully")
+            except asyncio.TimeoutError:
+                logger.error(f"Timeout handling dialin-ready event ({url})")
+            except Exception as e:
+                logger.error(f"Error handling dialin-ready event ({url}): {e}")
+
+    def _on_dialin_ready(self, sip_endpoint):
+        if self._params.dialin_settings:
+            asyncio.run_coroutine_threadsafe(self._handle_dialin_ready(sip_endpoint), self._loop)
+        self.on_dialin_ready(sip_endpoint)
+
+    def _on_dialout_connected(self, data):
+        self.on_dialout_connected(data)
+
+    def _on_dialout_stopped(self, data):
+        self.on_dialout_stopped(data)
+
+    def _on_dialout_error(self, data):
+        self.on_dialout_error(data)
+
+    def _on_dialout_warning(self, data):
+        self.on_dialout_warning(data)
+
    def _on_participant_joined(self, participant):
        self.on_participant_joined(participant)

@@ -686,16 +816,13 @@ class DailyTransport(BaseTransport):
    def _on_first_participant_joined(self, participant):
        self.on_first_participant_joined(participant)

-    def _on_app_message(self, message: Any, sender: str):
-        if self._input:
-            self._input.push_app_message(message, sender)
-
    def _on_transcription_message(self, participant_id, message):
        text = message["text"]
        timestamp = message["timestamp"]
        is_final = message["rawResponse"]["is_final"]
        if is_final:
            frame = TranscriptionFrame(text, participant_id, timestamp)
+            logger.debug(f"Transcription (from: {participant_id}): [{text}]")
        else:
            frame = InterimTranscriptionFrame(text, participant_id, timestamp)

@@ -712,15 +839,36 @@ class DailyTransport(BaseTransport):
    def on_left(self):
        pass

+    def on_app_message(self, message, sender):
+        pass
+
+    def on_call_state_updated(self, state):
+        pass
+
+    def on_dialin_ready(self, sip_endpoint):
+        pass
+
+    def on_dialout_connected(self, data):
+        pass
+
+    def on_dialout_stopped(self, data):
+        pass
+
+    def on_dialout_error(self, data):
+        pass
+
+    def on_dialout_warning(self, data):
+        pass
+
+    def on_first_participant_joined(self, participant):
+        pass
+
    def on_participant_joined(self, participant):
        pass

    def on_participant_left(self, participant, reason):
        pass

-    def on_first_participant_joined(self, participant):
-        pass
-
    def event_handler(self, event_name: str):
        def decorator(handler):
            self._add_event_handler(event_name, handler)
@@ -760,8 +908,5 @@ class DailyTransport(BaseTransport):
            logger.error(f"Exception in event handler {event_name}: {e}")
            raise e

-    #     def dialout(self, number):
-    #         self.client.start_dialout({"phoneNumber": number})
-
    #     def start_recording(self):
    #         self.client.start_recording()
--- a/src/pipecat/utils/audio.py
+++ b/src/pipecat/utils/audio.py
@@ -0,0 +1,33 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import numpy as np
+import pyloudnorm as pyln
+
+
+def normalize_value(value, min_value, max_value):
+    normalized = (value - min_value) / (max_value - min_value)
+    normalized_clamped = max(0, min(1, normalized))
+    return normalized_clamped
+
+
+def calculate_audio_volume(audio: bytes, sample_rate: int) -> float:
+    audio_np = np.frombuffer(audio, dtype=np.int16)
+    audio_float = audio_np.astype(np.float64)
+
+    block_size = audio_np.size / sample_rate
+    meter = pyln.Meter(sample_rate, block_size=block_size)
+    loudness = meter.integrated_loudness(audio_float)
+
+    # Loudness goes from -20 to 80 (more or less), where -20 is quiet and 80 is
+    # loud.
+    loudness = normalize_value(loudness, -20, 80)
+
+    return loudness
+
+
+def exp_smoothing(value: float, prev_value: float, factor: float) -> float:
+    return prev_value + factor * (value - prev_value)
--- a/src/pipecat/utils/test_frame_processor.py
+++ b/src/pipecat/utils/test_frame_processor.py
@@ -0,0 +1,41 @@
+from typing import List
+from pipecat.processors.frame_processor import FrameProcessor
+
+
+class TestException(BaseException):
+    pass
+
+
+class TestFrameProcessor(FrameProcessor):
+    def __init__(self, test_frames):
+        self.test_frames = test_frames
+        self._list_counter = 0
+        super().__init__()
+
+    async def process_frame(self, frame, direction):
+        if not self.test_frames:  # No expected frames remain, but frames are still arriving.
+            raise TestException(f"Oops, got an extra frame, {frame}")
+        if isinstance(self.test_frames[0], List):
+            # We need to consume frames until we see the next frame type after this
+            next_frame = self.test_frames[1]
+            if isinstance(frame, next_frame):
+                # The frame that follows the list has arrived, so the list is done.
+                print(f"TestFrameProcessor got expected list exit frame: {frame}")
+                # pop twice to get rid of the list, as well as the next frame
+                self.test_frames.pop(0)
+                self.test_frames.pop(0)
+                self._list_counter = 0
+            else:
+                fl = self.test_frames[0]
+                fl_el = fl[self._list_counter % len(fl)]
+                if isinstance(frame, fl_el):
+                    print(f"TestFrameProcessor got expected list frame: {frame}")
+                    self._list_counter += 1
+                else:
+                    raise TestException(f"Inside a list, expected {fl_el} but got {frame}")
+
+        else:
+            if not isinstance(frame, self.test_frames[0]):
+                raise TestException(f"Expected {self.test_frames[0]}, but got {frame}")
+            print(f"TestFrameProcessor got expected frame: {frame}")
+            self.test_frames.pop(0)
--- a/src/pipecat/vad/silero.py
+++ b/src/pipecat/vad/silero.py
@@ -37,8 +37,6 @@ class SileroVADAnalyzer(VADAnalyzer):
            repo_or_dir="snakers4/silero-vad", model="silero_vad", force_reload=False
        )

-        self._processor_vad_state: VADState = VADState.QUIET
-
        logger.debug("Loaded Silero VAD")

    #
@@ -73,6 +71,8 @@ class SileroVAD(FrameProcessor):
        self._vad_analyzer = SileroVADAnalyzer(sample_rate=sample_rate, params=vad_params)
        self._audio_passthrough = audio_passthrough

+        self._processor_vad_state: VADState = VADState.QUIET
+
    #
    # FrameProcessor
    #
--- a/src/pipecat/vad/vad_analyzer.py
+++ b/src/pipecat/vad/vad_analyzer.py
@@ -4,15 +4,12 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-import array
-import math
-
 from abc import abstractmethod
 from enum import Enum

 from pydantic.main import BaseModel

-from pipecat.utils.utils import exp_smoothing
+from pipecat.utils.audio import calculate_audio_volume, exp_smoothing


 class VADState(Enum):
@@ -26,13 +23,14 @@ class VADParams(BaseModel):
    confidence: float = 0.6
    start_secs: float = 0.2
    stop_secs: float = 0.8
-    min_rms: int = 1000
+    min_volume: float = 0.6


 class VADAnalyzer:

    def __init__(self, sample_rate: int, num_channels: int, params: VADParams):
        self._sample_rate = sample_rate
+        self._num_channels = num_channels
        self._params = params
        self._vad_frames = self.num_frames_required()
        self._vad_frames_num_bytes = self._vad_frames * num_channels * 2
@@ -48,8 +46,8 @@ class VADAnalyzer:
        self._vad_buffer = b""

        # Volume exponential smoothing
-        self._smoothing_factor = 0.5
-        self._prev_rms = 1 - self._smoothing_factor
+        self._smoothing_factor = 0.4
+        self._prev_volume = 1 - self._smoothing_factor

    @property
    def sample_rate(self):
@@ -63,13 +61,9 @@ class VADAnalyzer:
    def voice_confidence(self, buffer) -> float:
        pass

-    def _get_smoothed_volume(self, audio: bytes, prev_rms: float, factor: float) -> float:
-        # https://docs.python.org/3/library/array.html
-        audio_array = array.array('h', audio)
-        squares = [sample**2 for sample in audio_array]
-        mean = sum(squares) / len(audio_array)
-        rms = math.sqrt(mean)
-        return exp_smoothing(rms, prev_rms, factor)
+    def _get_smoothed_volume(self, audio: bytes) -> float:
+        volume = calculate_audio_volume(audio, self._sample_rate)
+        return exp_smoothing(volume, self._prev_volume, self._smoothing_factor)

    def analyze_audio(self, buffer) -> VADState:
        self._vad_buffer += buffer
@@ -82,10 +76,11 @@ class VADAnalyzer:
        self._vad_buffer = self._vad_buffer[num_required_bytes:]

        confidence = self.voice_confidence(audio_frames)
-        rms = self._get_smoothed_volume(audio_frames, self._prev_rms, self._smoothing_factor)
-        self._prev_rms = rms

-        speaking = confidence >= self._params.confidence and rms >= self._params.min_rms
+        volume = self._get_smoothed_volume(audio_frames)
+        self._prev_volume = volume
+
+        speaking = confidence >= self._params.confidence and volume >= self._params.min_volume

        if speaking:
            match self._vad_state:
--- a/tests/integration/integration_azure_llm.py
+++ b/tests/integration/integration_azure_llm.py
@@ -1,8 +1,8 @@
 import asyncio
 import os
-from pipecat.pipeline.openai_frames import OpenAILLMContextFrame
-from pipecat.services.azure_ai_services import AzureLLMService
-from pipecat.services.openai_llm_context import OpenAILLMContext
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContextFrame
+from pipecat.services.azure import AzureLLMService
+from pipecat.services.openai import OpenAILLMContext

 from openai.types.chat import (
    ChatCompletionSystemMessageParam,
--- a/tests/integration/integration_ollama_llm.py
+++ b/tests/integration/integration_ollama_llm.py
@@ -1,11 +1,10 @@
 import asyncio
-from pipecat.pipeline.openai_frames import OpenAILLMContextFrame
-from pipecat.services.openai_llm_context import OpenAILLMContext
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContextFrame, OpenAILLMContext

 from openai.types.chat import (
    ChatCompletionSystemMessageParam,
 )
-from pipecat.services.ollama_ai_services import OLLamaLLMService
+from pipecat.services.ollama import OLLamaLLMService

 if __name__ == "__main__":
    async def test_chat():
--- a/tests/integration/integration_openai_llm.py
+++ b/tests/integration/integration_openai_llm.py
@@ -1,51 +1,75 @@
 import asyncio
+import json
 import os
-from pipecat.pipeline.openai_frames import OpenAILLMContextFrame
-from pipecat.services.openai_llm_context import OpenAILLMContext
+from typing import List

+
+from pipecat.services.openai import OpenAILLMContextFrame, OpenAILLMContext
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.frames.frames import (
+    LLMFullResponseStartFrame,
+    LLMFullResponseEndFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
+    TextFrame
+)
+from pipecat.utils.test_frame_processor import TestFrameProcessor
 from openai.types.chat import (
    ChatCompletionSystemMessageParam,
    ChatCompletionToolParam,
    ChatCompletionUserMessageParam,
 )

-from pipecat.services.openai_api_llm_service import BaseOpenAILLMService
+from pipecat.services.openai import OpenAILLMService
+
+tools = [
+    ChatCompletionToolParam(
+        type="function",
+        function={
+            "name": "get_current_weather",
+            "description": "Get the current weather",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city and state, e.g. San Francisco, CA",
+                    },
+                    "format": {
+                        "type": "string",
+                        "enum": [
+                            "celsius",
+                            "fahrenheit"],
+                        "description": "The temperature unit to use. Infer this from the user's location.",
+                    },
+                },
+                "required": [
+                    "location",
+                    "format"],
+            },
+        })]

 if __name__ == "__main__":
-    async def test_functions():
-        tools = [
-            ChatCompletionToolParam(
-                type="function",
-                function={
-                    "name": "get_current_weather",
-                    "description": "Get the current weather",
-                    "parameters": {
-                        "type": "object",
-                        "properties": {
-                            "location": {
-                                "type": "string",
-                                "description": "The city and state, e.g. San Francisco, CA",
-                            },
-                            "format": {
-                                "type": "string",
-                                "enum": [
-                                    "celsius",
-                                    "fahrenheit"],
-                                "description": "The temperature unit to use. Infer this from the users location.",
-                            },
-                        },
-                        "required": [
-                            "location",
-                            "format"],
-                    },
-                })]
+    async def test_simple_functions():
+
+        async def get_weather_from_api(llm, args):
+            return json.dumps({"conditions": "nice", "temperature": "75"})

        api_key = os.getenv("OPENAI_API_KEY")

-        llm = BaseOpenAILLMService(
+        llm = OpenAILLMService(
            api_key=api_key or "",
            model="gpt-4-1106-preview",
        )
+
+        llm.register_function("get_current_weather", get_weather_from_api)
+        t = TestFrameProcessor([
+            LLMFullResponseStartFrame,
+            [LLMResponseStartFrame, TextFrame, LLMResponseEndFrame],
+            LLMFullResponseEndFrame
+        ])
+        llm.link(t)
+
        context = OpenAILLMContext(tools=tools)
        system_message: ChatCompletionSystemMessageParam = ChatCompletionSystemMessageParam(
            content="Ask the user to ask for a weather report", name="system", role="system"
@@ -58,26 +82,64 @@ if __name__ == "__main__":
        context.add_message(system_message)
        context.add_message(user_message)
        frame = OpenAILLMContextFrame(context)
-        async for s in llm.process_frame(frame):
-            print(s)
+        await llm.process_frame(frame, FrameDirection.DOWNSTREAM)
+
+    async def test_advanced_functions():
+
+        async def get_weather_from_api(llm, args):
+            return [{"role": "system", "content": "The user has asked for live weather. Respond by telling them we don't currently support live weather for that area, but it's coming soon."}]

-    async def test_chat():
        api_key = os.getenv("OPENAI_API_KEY")

-        llm = BaseOpenAILLMService(
+        llm = OpenAILLMService(
            api_key=api_key or "",
            model="gpt-4-1106-preview",
        )
+
+        llm.register_function("get_current_weather", get_weather_from_api)
+        t = TestFrameProcessor([
+            LLMFullResponseStartFrame,
+            [LLMResponseStartFrame, TextFrame, LLMResponseEndFrame],
+            LLMFullResponseEndFrame
+        ])
+        llm.link(t)
+
+        context = OpenAILLMContext(tools=tools)
+        system_message: ChatCompletionSystemMessageParam = ChatCompletionSystemMessageParam(
+            content="Ask the user to ask for a weather report", name="system", role="system"
+        )
+        user_message: ChatCompletionUserMessageParam = ChatCompletionUserMessageParam(
+            content="Could you tell me the weather for Boulder, Colorado",
+            name="user",
+            role="user",
+        )
+        context.add_message(system_message)
+        context.add_message(user_message)
+        frame = OpenAILLMContextFrame(context)
+        await llm.process_frame(frame, FrameDirection.DOWNSTREAM)
+
+    async def test_chat():
+        api_key = os.getenv("OPENAI_API_KEY")
+        t = TestFrameProcessor([
+            LLMFullResponseStartFrame,
+            [LLMResponseStartFrame, TextFrame, LLMResponseEndFrame],
+            LLMFullResponseEndFrame
+        ])
+        llm = OpenAILLMService(
+            api_key=api_key or "",
+            model="gpt-4o",
+        )
+        llm.link(t)
        context = OpenAILLMContext()
        message: ChatCompletionSystemMessageParam = ChatCompletionSystemMessageParam(
            content="Please tell the world hello.", name="system", role="system")
        context.add_message(message)
        frame = OpenAILLMContextFrame(context)
-        async for s in llm.process_frame(frame):
-            print(s)
+        await llm.process_frame(frame, FrameDirection.DOWNSTREAM)

    async def run_tests():
-        await test_functions()
+        await test_simple_functions()
+        await test_advanced_functions()
        await test_chat()

    asyncio.run(run_tests())
--- a/tests/test_aggregators.py
+++ b/tests/test_aggregators.py
@@ -3,16 +3,15 @@ import doctest
 import functools
 import unittest

-from pipecat.pipeline.aggregators import (
-    GatedAggregator,
-    ParallelPipeline,
-    SentenceAggregator,
-    StatelessTextTransformer,
-)
-from pipecat.pipeline.frames import (
-    AudioFrame,
+from pipecat.processors.aggregators.sentence import SentenceAggregator
+from pipecat.processors.text_transformer import StatelessTextTransformer
+from pipecat.processors.aggregators.gated import GatedAggregator
+from pipecat.pipeline.parallel_pipeline import ParallelPipeline
+
+from pipecat.frames.frames import (
+    AudioRawFrame,
    EndFrame,
-    ImageFrame,
+    ImageRawFrame,
    LLMResponseEndFrame,
    LLMResponseStartFrame,
    Frame,
@@ -46,26 +45,26 @@ class TestDailyFrameAggregators(unittest.IsolatedAsyncioTestCase):
    async def test_gated_accumulator(self):
        gated_aggregator = GatedAggregator(
            gate_open_fn=lambda frame: isinstance(
-                frame, ImageFrame), gate_close_fn=lambda frame: isinstance(
+                frame, ImageRawFrame), gate_close_fn=lambda frame: isinstance(
                frame, LLMResponseStartFrame), start_open=False, )

        frames = [
            LLMResponseStartFrame(),
            TextFrame("Hello, "),
            TextFrame("world."),
-            AudioFrame(b"hello"),
-            ImageFrame(b"image", (0, 0)),
-            AudioFrame(b"world"),
+            AudioRawFrame(b"hello", 1, 1),
+            ImageRawFrame(b"image", (0, 0)),
+            AudioRawFrame(b"world", 1, 1),
            LLMResponseEndFrame(),
        ]

        expected_output_frames = [
-            ImageFrame(b"image", (0, 0)),
+            ImageRawFrame(b"image", (0, 0)),
            LLMResponseStartFrame(),
            TextFrame("Hello, "),
            TextFrame("world."),
-            AudioFrame(b"hello"),
-            AudioFrame(b"world"),
+            AudioRawFrame(b"hello", 1, 1),
+            AudioRawFrame(b"world", 1, 1),
            LLMResponseEndFrame(),
        ]
        for frame in frames:
--- a/tests/test_ai_services.py
+++ b/tests/test_ai_services.py
@@ -3,7 +3,7 @@ import unittest
 from typing import AsyncGenerator

 from pipecat.services.ai_services import AIService
-from pipecat.pipeline.frames import EndFrame, Frame, TextFrame
+from pipecat.frames.frames import EndFrame, Frame, TextFrame


 class SimpleAIService(AIService):
--- a/tests/test_pipeline.py
+++ b/tests/test_pipeline.py
@@ -2,9 +2,10 @@ import asyncio
 import unittest
 from unittest.mock import Mock

-from pipecat.pipeline.aggregators import SentenceAggregator, StatelessTextTransformer
-from pipecat.pipeline.frame_processor import FrameProcessor
-from pipecat.pipeline.frames import EndFrame, TextFrame
+from pipecat.processors.text_transformer import StatelessTextTransformer
+from pipecat.processors.aggregators.sentence import SentenceAggregator
+from pipecat.processors.frame_processor import FrameProcessor
+from pipecat.frames.frames import EndFrame, TextFrame

 from pipecat.pipeline.pipeline import Pipeline

--- a/tests/to_be_updated/test_protobuf_serializer.py
+++ b/tests/to_be_updated/test_protobuf_serializer.py
@@ -1,6 +1,6 @@
 import unittest

-from pipecat.pipeline.frames import AudioFrame, TextFrame, TranscriptionFrame
+from pipecat.frames.frames import AudioFrame, TextFrame, TranscriptionFrame
 from pipecat.serializers.protobuf_serializer import ProtobufFrameSerializer


--- a/tests/to_be_updated/test_websocket_transport.py
+++ b/tests/to_be_updated/test_websocket_transport.py
@@ -2,7 +2,7 @@ import asyncio
 import unittest
 from unittest.mock import AsyncMock, patch, Mock

-from pipecat.pipeline.frames import AudioFrame, EndFrame, TextFrame, TTSEndFrame, TTSStartFrame
+from pipecat.frames.frames import AudioRawFrame, EndFrame, TextFrame, TTSStoppedFrame, TTSStartedFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.transports.websocket_transport import WebSocketFrameProcessor, WebsocketTransport

@@ -52,10 +52,10 @@ class TestWebSocketTransportService(unittest.IsolatedAsyncioTestCase):
        processor = WebSocketFrameProcessor(audio_frame_size=4)

        source_frames = [
-            TTSStartFrame(),
-            AudioFrame(b"1234"),
-            AudioFrame(b"5678"),
-            TTSEndFrame(),
+            TTSStartedFrame(),
+            AudioRawFrame(b"1234", 1, 1),
+            AudioRawFrame(b"5678", 1, 1),
+            TTSStoppedFrame(),
            TextFrame("hello world")
        ]

@@ -65,9 +65,9 @@ class TestWebSocketTransportService(unittest.IsolatedAsyncioTestCase):
                frames.append(output_frame)

        self.assertEqual(len(frames), 3)
-        self.assertIsInstance(frames[0], AudioFrame)
+        self.assertIsInstance(frames[0], AudioRawFrame)
        self.assertEqual(frames[0].data, b"1234")
-        self.assertIsInstance(frames[1], AudioFrame)
+        self.assertIsInstance(frames[1], AudioRawFrame)
        self.assertEqual(frames[1].data, b"5678")
        self.assertIsInstance(frames[2], TextFrame)
        self.assertEqual(frames[2].text, "hello world")
Author	SHA1	Message	Date
Chad Bailey	ff1b2961d8	fixup	2024-05-31 14:23:56 +00:00
Chad Bailey	ba42cffcc2	test cleanup	2024-05-31 14:23:56 +00:00
Chad Bailey	9778d86607	everything but audioframe and endpipeframe	2024-05-31 14:23:52 +00:00
Kwindla Hultman Kramer	19caf750fd	Merge pull request #194 from pipecat-ai/khk-cartesia-changelog Added cartesia line to CHANGELOG.md	2024-05-30 14:18:41 -07:00
Kwindla Hultman Kramer	296611714f	added cartesia line to CHANGELOG.md	2024-05-30 10:41:00 -07:00
chadbailey59	4c3d19cc8b	Function calling (#175) * added function calling code back * removed old llm_context file * added integration testing for openai * added function calling example * added function callbacks * added function start callback * fixup * fixup * added different return type support for function calling * intake example working * added frame loggers * cleanup * fixup * Update openai.py * removed function call frame types * fixup * re-added example * renumbered wake phrase * fixup for autopep8 * remove unused imports	2024-05-30 12:25:39 -05:00
Aleix Conchillo Flaqué	a3ba07c7a3	Merge pull request #193 from pipecat-ai/aleix/fix-camera-out-enabled-cpu transport(output): fix high CPU usage with camera_out_enabled and no …	2024-05-31 01:25:06 +08:00
Kwindla Hultman Kramer	a1579808b2	Merge pull request #189 from pipecat-ai/khk-cartesia-etc Cartesia TTS	2024-05-30 10:24:45 -07:00
Aleix Conchillo Flaqué	aecb9f5816	transport(output): fix high CPU usage with camera_out_enabled and no images	2024-05-30 10:18:43 -07:00
Aleix Conchillo Flaqué	a5d42a526c	Merge pull request #191 from pipecat-ai/aleix/fix-silero-vad vad: fix silero vad frame processor	2024-05-30 23:25:52 +08:00
Aleix Conchillo Flaqué	a9472f8116	vad: fix silero vad frame processor	2024-05-30 07:50:58 -07:00
Kwindla Hultman Kramer	d5f106ae19	pr fixes	2024-05-29 23:41:35 -07:00
Kwindla Hultman Kramer	920745345a	cartesia tts support	2024-05-29 23:35:35 -07:00
Aleix Conchillo Flaqué	c444004eec	Merge pull request #186 from pipecat-ai/aleix/update-changelog-0.0.24 update CHANGELOG.md 0.0.24	2024-05-29 23:23:06 +08:00
Aleix Conchillo Flaqué	72cf7896d7	update CHANGELOG.md 0.0.24	2024-05-29 08:22:33 -07:00
Aleix Conchillo Flaqué	31af5f8177	Merge pull request #182 from pipecat-ai/aleix/expo-se-dialin-ready transports(daily): expose dialin-ready and handle timeouts	2024-05-29 23:05:47 +08:00
Aleix Conchillo Flaqué	6a68d9a57e	pyproject: update daily-python to 0.9.0	2024-05-28 18:30:43 -07:00
Aleix Conchillo Flaqué	39f41ab25e	transports(daily): expose dialin-ready and handle timeouts	2024-05-28 18:00:09 -07:00
Aleix Conchillo Flaqué	624cc1e987	Merge pull request #185 from pipecat-ai/aleix/add-start-recording transport(daily): add start_recording, stop_recording and stop_dialout	2024-05-29 08:24:59 +08:00
Aleix Conchillo Flaqué	08a15e5cdd	transports(daily): expose on_app_message	2024-05-28 17:23:34 -07:00
Aleix Conchillo Flaqué	4cd4787e4d	transports(daily): added on_call_state_updated	2024-05-28 17:23:34 -07:00
Aleix Conchillo Flaqué	65afee2808	transport(daily): add start_recording, stop_recording and stop_dialout	2024-05-28 17:16:39 -07:00
Aleix Conchillo Flaqué	00ece864ec	Merge pull request #184 from pipecat-ai/aleix/introduce-pipelineparams introduce PipelineParams	2024-05-29 08:14:58 +08:00
Aleix Conchillo Flaqué	6d6d9bea5a	introduce PipelineParams	2024-05-28 17:14:14 -07:00
Kwindla Hultman Kramer	7c213f8533	Merge pull request #183 from pipecat-ai/khk-deepgram-fix moving Deepgram TTS base_url from beta to prod	2024-05-28 17:04:03 -07:00
Kwindla Hultman Kramer	3685c19b2d	moving Deepgram TTS base_url from beta to prod	2024-05-28 15:59:26 -07:00
Aleix Conchillo Flaqué	650a2b4da4	Merge pull request #174 from pipecat-ai/fix-azure-llm-service services(azure): fix AzureLLMService	2024-05-25 00:27:51 +08:00
Aleix Conchillo Flaqué	afea6f38f6	examples: no need to define tts twice	2024-05-24 09:23:00 -07:00
Aleix Conchillo Flaqué	c45d428551	services(google): make api_key argument mandatory	2024-05-24 09:23:00 -07:00
Aleix Conchillo Flaqué	4e594aa9b0	services: BaseOpenAILLMService.create_client() now returns the client	2024-05-24 09:04:15 -07:00
Aleix Conchillo Flaqué	32f91c5f31	services(azure): fix AzureLLMService Fixes #160	2024-05-23 16:51:04 -07:00
Aleix Conchillo Flaqué	a32ece897a	Merge pull request #179 from pipecat-ai/aleix/aiohttp-response-text fix aiohttp response text	2024-05-24 07:42:05 +08:00
Aleix Conchillo Flaqué	88f6436aaa	fix aiohttp response text	2024-05-23 15:51:00 -07:00
Aleix Conchillo Flaqué	fac43cea06	Merge pull request #178 from pipecat-ai/aleix/daily-python-0.8.0-deps update linux/macos requirements	2024-05-24 05:50:10 +08:00
Aleix Conchillo Flaqué	a9e6aeed54	update linux/macos requirements	2024-05-23 14:49:34 -07:00
Aleix Conchillo Flaqué	fa9f49f5bb	Merge pull request #177 from pipecat-ai/aleix/dialin-ready-missing-sipuri transports(daily): fix dialin-ready event handling	2024-05-24 05:39:31 +08:00
Aleix Conchillo Flaqué	2a6183aba5	transports(daily): fix dialin-ready event handling	2024-05-23 14:38:37 -07:00
Aleix Conchillo Flaqué	b1a622971b	Merge pull request #176 from pipecat-ai/aleix/handle-dialin-ready transport(daily): add support for dial-in use cases	2024-05-24 04:58:10 +08:00
Aleix Conchillo Flaqué	5b72faccb4	update CHANGELOG.md for release 0.0.22	2024-05-23 13:57:28 -07:00
Aleix Conchillo Flaqué	c8732544c7	transport(daily): add support for dial-in use cases	2024-05-23 13:56:50 -07:00
Aleix Conchillo Flaqué	d4219b16b8	Merge pull request #170 from pipecat-ai/add-daily-transport-dialout-support transport(daily): add dialout support	2024-05-24 04:19:51 +08:00
Aleix Conchillo Flaqué	0c33432f64	transport(daily): update CHANGELOG.md with dialout/dialin updates	2024-05-23 13:14:34 -07:00
Aleix Conchillo Flaqué	95bd58cced	pyproject: depend on daily-python 0.8.0	2024-05-23 13:10:48 -07:00
Aleix Conchillo Flaqué	8d7d1a7e24	transport(daily): add dialin-ready event	2024-05-23 07:12:31 -07:00
Aleix Conchillo Flaqué	3768cb2f2c	transport(daily): add dialout support	2024-05-22 22:44:01 -07:00
Aleix Conchillo Flaqué	d4b2741608	Merge pull request #169 from pipecat-ai/update-changelog-0.0.21 update CHANGELOG.md for 0.0.21	2024-05-23 12:42:41 +08:00
Aleix Conchillo Flaqué	aef2152dcc	update CHANGELOG.md for 0.0.21	2024-05-22 21:40:29 -07:00
Aleix Conchillo Flaqué	d0b0221b97	Merge pull request #167 from pipecat-ai/khk-bump-anthropic add new response frame types and vision support for anthropic	2024-05-23 12:16:55 +08:00
Kwindla Hultman Kramer	b4758cd989	update CHANGELOG.md	2024-05-22 21:14:11 -07:00
Kwindla Hultman Kramer	681250f114	add new response frame types and vision support for anthropic	2024-05-22 21:12:30 -07:00
Aleix Conchillo Flaqué	fd13d3c50e	Merge pull request #168 from pipecat-ai/transcription-logging transports(daily): add transcription logging	2024-05-23 11:42:51 +08:00
Aleix Conchillo Flaqué	674b8bb0cd	transports(daily): add transcription logging	2024-05-22 20:41:34 -07:00
Aleix Conchillo Flaqué	5d9a962146	Merge pull request #166 from pipecat-ai/fix-llm-response-wake-check fix llm response wake check	2024-05-23 11:35:11 +08:00
Aleix Conchillo Flaqué	e130aada72	filters(WakeCheckFilter): increase timeout to 3	2024-05-22 19:41:14 -07:00
Aleix Conchillo Flaqué	76709a9a39	enclose text between brackets when logging	2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué	acd2d55b84	examples(14): remove commented code	2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué	fcec0eb812	transports(base): log when user is speaking	2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué	e9965347b5	processors(WakeCheckFilter): log what frame we are pushing	2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué	5a83f75e0d	processors: fix user response processors	2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué	91c706a201	Merge pull request #165 from pipecat-ai/clear-audio-output-buffer-when-interrupted transport(base): clear audio output buffer if interrupted	2024-05-23 07:31:33 +08:00
Aleix Conchillo Flaqué	34384881bc	transport(base): clear audio output buffer if interrupted	2024-05-22 16:30:43 -07:00
Aleix Conchillo Flaqué	71ba28753e	Merge pull request #157 from pipecat-ai/khk-improved-wake-word Improved wake word filter	2024-05-23 06:47:59 +08:00
Aleix Conchillo Flaqué	32d2f0db66	update CHANGELOG.ms with filters updates	2024-05-22 15:46:13 -07:00
Aleix Conchillo Flaqué	e1169a4e82	processors(WakeCheckFilter): push error	2024-05-22 15:44:44 -07:00
Aleix Conchillo Flaqué	0e5711e62d	examples: update 10-wake-work.py to use WakeCheckFilter	2024-05-22 15:44:44 -07:00
Aleix Conchillo Flaqué	0ddfa3de5b	move WakeCheckFilter to processors/filters	2024-05-22 15:44:43 -07:00
Kwindla Hultman Kramer	661aa79b7c	fix user_id str field name in TranscriptionFrame	2024-05-22 15:44:43 -07:00
Kwindla Hultman Kramer	2c32cc2f27	improved wake word filter	2024-05-22 15:44:43 -07:00
Aleix Conchillo Flaqué	d7bb0bc5cb	Merge pull request #164 from pipecat-ai/readd-vad-exp-smoothing vad: re-add volume exponential smoothing	2024-05-23 06:44:27 +08:00
Aleix Conchillo Flaqué	d5644c3ab9	vad: re-add volume exponential smoothing	2024-05-22 15:26:32 -07:00
Aleix Conchillo Flaqué	09ab8e3efd	Merge pull request #163 from pipecat-ai/update-0.0.20-deps update requirements files	2024-05-23 05:40:12 +08:00
Aleix Conchillo Flaqué	2f683529ec	update requirements files	2024-05-22 14:39:26 -07:00
Aleix Conchillo Flaqué	6ac012a82b	Merge pull request #158 from pipecat-ai/use-pyloudnorm-loudness interruptions: introduce pyloudnorm to compute loudness	2024-05-23 05:24:38 +08:00
Aleix Conchillo Flaqué	075194cb54	update CHANGELOG for 0.0.20	2024-05-22 14:21:13 -07:00
Aleix Conchillo Flaqué	269f070051	audio: no need for compute_rms	2024-05-22 14:09:24 -07:00
Aleix Conchillo Flaqué	3342c9d7c2	services(stt): use calculate_audio_volume	2024-05-22 13:05:20 -07:00
Aleix Conchillo Flaqué	b468b2f926	audio: clamp normalized volume	2024-05-22 13:04:09 -07:00
Aleix Conchillo Flaqué	af1c7d0023	interruptions: introduce pyloudnorm to compute loudness https://github.com/csteinmetz1/pyloudnorm	2024-05-22 11:52:07 -07:00
Aleix Conchillo Flaqué	34670eef79	Merge pull request #162 from pipecat-ai/reset-before-pushing processors: reset aggergator before pushing	2024-05-23 02:51:55 +08:00
Aleix Conchillo Flaqué	979739c1b7	processors: reset aggergator before pushing	2024-05-22 11:26:08 -07:00
Aleix Conchillo Flaqué	83ed6870b9	Merge pull request #161 from pipecat-ai/only-interrupt-assistant processors: only interrupt asssisstant	2024-05-23 02:02:43 +08:00
Aleix Conchillo Flaqué	57a568986a	processors: only interrupt asssisstant We were pushing interruption frames in the audio task. This was caussing the LLMUserResponseAggregator to push the accumulated text and then casuing the LLM to respond.	2024-05-22 10:15:35 -07:00
Aleix Conchillo Flaqué	e828e26b5b	Merge pull request #159 from pipecat-ai/create-pool-executor transports: run threads in their own ThreadPoolExecutor	2024-05-22 15:49:03 +08:00
Aleix Conchillo Flaqué	825738440e	transports: run threads in their own ThreadPoolExecutor	2024-05-21 18:52:27 -07:00
Aleix Conchillo Flaqué	147bd1a075	Merge pull request #156 from pipecat-ai/pipecat-0.0.19 update CHANGELOG.md for 0.0.19	2024-05-21 12:36:48 +08:00
Aleix Conchillo Flaqué	209e97f372	update CHANGELOG.md for 0.0.19	2024-05-20 21:33:15 -07:00
Aleix Conchillo Flaqué	47f8627432	Merge pull request #155 from pipecat-ai/llm-accumlate-full-response aggregators: accumulate full responses and take interruptions into ac…	2024-05-21 11:34:39 +08:00
Aleix Conchillo Flaqué	cc6713837a	github: publish test to pypi again. simply always use PRs	2024-05-20 12:19:39 -07:00
Aleix Conchillo Flaqué	728fe0ad88	github: don't publish to test pypi twice	2024-05-20 12:15:54 -07:00
Aleix Conchillo Flaqué	dbba45349f	github: don't run publish_test on main branch	2024-05-20 12:14:00 -07:00
Aleix Conchillo Flaqué	40ccf46b4b	aggregators: accumulate full responses and take interruptions into account	2024-05-20 11:40:57 -07:00
Aleix Conchillo Flaqué	077bb9f20a	Merge pull request #153 from pipecat-ai/expose-llm-messages aggregators: expose LLM messages	2024-05-21 02:40:26 +08:00
Aleix Conchillo Flaqué	e4c990c677	aggregators: expose LLM messages	2024-05-20 10:51:37 -07:00