removed space from event handler

added pause to start and new intro prompt
removed header comment from bot runner
2024-06-26 18:30:56 +01:00 · 2024-06-26 18:24:14 +01:00 · 2024-06-24 17:35:26 +01:00 · 2024-06-24 17:34:25 +01:00 · 2024-06-24 17:28:10 +01:00 · 2024-06-24 16:25:36 +01:00
63 changed files with 962 additions and 1591 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,111 +5,6 @@ All notable changes to **pipecat** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [0.0.36] - 2024-07-02
-
-### Added
-
- Added `GladiaSTTService`.
-  See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition
-
- Added `XTTSService`. This is a local Text-To-Speech service.
-  See https://github.com/coqui-ai/TTS
-
- Added `UserIdleProcessor`. This processor can be used to wait for any
-  interaction with the user. If the user doesn't say anything within a given
-  timeout a provided callback is called.
-
- Added `IdleFrameProcessor`. This processor can be used to wait for frames
-  within a given timeout. If no frame is received within the timeout a provided
-  callback is called.
-
- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
-  upstream while the bot is talking.
-
- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer`
-  or `SileroVAD`.
-
- Added `AysncFrameProcessor` and `AsyncAIService`.  Some services like
-  `DeepgramSTTService` need to process things asynchronously. For example, audio
-  is sent to Deepgram but transcriptions are not returned immediately. In these
-  cases we still require all frames (except system frames) to be pushed
-  downstream from a single task. That's what `AsyncFrameProcessor` is for. It
-  creates a task and all frames should be pushed from that task. So, whenever a
-  new Deepgram transcription is ready that transcription will also be pushed
-  from this internal task.
-
- The `MetricsFrame` now includes processing metrics if metrics are enabled. The
-  processing metrics indicate the time a processor needs to generate all its
-  output. Note that not all processors generate these kind of metrics.
-
-### Changed
-
- `WhisperSTTService` model can now also be a string.
-
- Added missing * keyword separators in services.
-
-### Fixed
-
- `WebsocketServerTransport` doesn't try to send frames anymore if serializers
-  returns `None`.
-
- Fixed an issue where exceptions that occurred inside frame processors were
-  being swallowed and not displayed.
-
- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
-  data to the websocket after being closed.
-
-### Other
-
- Added Fly.io deployment example in `examples/deployment/flyio-example`.
-
- Added new `17-detect-user-idle.py` example that shows how to use the new
-  `UserIdleProcessor`.
-
-## [0.0.35] - 2024-06-28
-
-### Changed
-
- `FastAPIWebsocketParams` now require a serializer.
-
- `TwilioFrameSerializer` now requires a `streamSid`.
-
-### Fixed
-
- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for
-  8000 sample rate.
-
-## [0.0.34] - 2024-06-25
-
-### Fixed
-
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
-  interruptions to ignore transcriptions.
-
- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate
-  shorter output.
-
-## [0.0.33] - 2024-06-25
-
-### Changed
-
- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now
-  expects a voice ID instead of a voice name (you can get the voice ID from
-  Cartesia's playground). You can also specify the audio `sample_rate` and
-  `encoding` instead of the previous `output_format`.
-
-### Fixed
-
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
-  cause static audio issues and interruptions to not work properly when dealing
-  with multiple LLMs sentences.
-
- Fixed an issue that could mix new LLM responses with previous ones when
-  handling interruptions.
-
- Fixed a Daily transport blocking situation that occurred while reading audio
-  frames after a participant left the room. Needs daily-python >= 0.10.1.
-
 ## [0.0.32] - 2024-06-22

 ### Added
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ pip install "pipecat-ai[option,...]"

 Your project may or may not need these, so they're made available as optional requirements. Here is a list:

- **AI services**: `anthropic`, `azure`, `deepgram`, `gladia`, `google`, `fal`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`, `xtts`
+- **AI services**: `anthropic`, `azure`, `deepgram`, `google`, `fal`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`
 - **Transports**: `local`, `websocket`, `daily`

 ## Code examples
--- a/dot-env.template
+++ b/dot-env.template
@@ -27,9 +27,6 @@ FAL_KEY=...
 # Fireworks
 FIREWORKS_API_KEY=...

-# Gladia
-GLADIA_API_KEY=...
-
 # PlayHT
 PLAY_HT_USER_ID=...
 PLAY_HT_API_KEY=...
--- a/examples/deployment/flyio-example/Dockerfile
+++ b/examples/deployment/flyio-example/Dockerfile
@@ -1,16 +0,0 @@
-FROM python:3.11-bullseye
-
-# Open port 7860 for http service
-ENV FAST_API_PORT=7860
-EXPOSE 7860
-
-# Install Python dependencies
-COPY *.py .
-COPY ./requirements.txt requirements.txt
-RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
-
-# Install models
-RUN python3 install_deps.py
-
-# Start the FastAPI server
-CMD python3 bot_runner.py --port ${FAST_API_PORT}
--- a/examples/deployment/flyio-example/README.md
+++ b/examples/deployment/flyio-example/README.md
@@ -1,43 +0,0 @@
-# Fly.io deployment example
-
-This project modifies the `bot_runner.py` server to launch a new machine for each user session. This is a recommended approach for production vs. running shell processess as your deployment will quickly run out of system resources under load.
-
-To speed up machine boot times, we also download and cache Silero VAD as part of the Dockerfile (`install_deps.py`). If you are using other custom models, you can add them here too.
-
-For this example, we are using Daily as a WebRTC transport and provisioning a new room and token for each session. You can use another transport, such as WebSockets, by modifying the `bot.py` and `bot_runner.py` files accordingly.
-
-## Setting up your fly.io deployment
-
-### Create your fly.toml file
-
-You can copy the `example-fly.toml` as a reference. Be sure to change the app name to something unique.
-
-### Create your .env file
-
-Copy the base `env.example` to `.env` and enter the necessary API keys. 
-
-`FLY_APP_NAME` should match that in the `fly.toml` file.
-
-### Launch a new fly.io project
-
-`fly launch` or `fly launch --org your-org-name`
-
-### Set the necessary app secrets from your .env
-
-Note: you can do this manually via the fly.io dashboard under the "secrets" sub-section of your deployment (e.g. "https://fly.io/apps/fly-app-name/secrets") or run the following terminal command:
-
-`cat .env | tr '\n' ' ' | xargs flyctl secrets set`
-
-### Deploy your machine
-
-`fly deploy`
-
-
-## Connecting to your bot
-
-Send a post request to your running fly.io instance:
-
-`curl --location --request POST 'https://YOUR_FLY_APP_NAME/start_bot'`
-
-This request will wait until the machine enters into a `starting` state, before returning the a room URL and token to join.
-
--- a/examples/deployment/flyio-example/bot.py
+++ b/examples/deployment/flyio-example/bot.py
@@ -1,103 +0,0 @@
-import asyncio
-import aiohttp
-import os
-import sys
-import argparse
-
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
-from pipecat.frames.frames import LLMMessagesFrame, EndFrame
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.elevenlabs import ElevenLabsTTSService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-daily_api_key = os.getenv("DAILY_API_KEY", "")
-daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
-
-
-async def main(room_url: str, token: str):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Chatbot",
-            DailyParams(
-                api_url=daily_api_url,
-                api_key=daily_api_key,
-                audio_in_enabled=True,
-                audio_out_enabled=True,
-                camera_out_enabled=False,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
-                transcription_enabled=True,
-            )
-        )
-
-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY", ""),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are Chatbot, a friendly, helpful robot. Your output will be converted to audio so don't include special characters other than '!' or '?' in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying hello.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),
-            tma_in,
-            llm,
-            tts,
-            transport.output(),
-            tma_out,
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        @transport.event_handler("on_participant_left")
-        async def on_participant_left(transport, participant, reason):
-            await task.queue_frame(EndFrame())
-
-        @transport.event_handler("on_call_state_updated")
-        async def on_call_state_updated(transport, state):
-            if state == "left":
-                await task.queue_frame(EndFrame())
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Pipecat Bot")
-    parser.add_argument("-u", type=str, help="Room URL")
-    parser.add_argument("-t", type=str, help="Token")
-    config = parser.parse_args()
-
-    asyncio.run(main(config.u, config.t))
--- a/examples/deployment/flyio-example/env.example
+++ b/examples/deployment/flyio-example/env.example
@@ -1,8 +0,0 @@
-DAILY_API_KEY=
-DAILY_SAMPLE_ROOM_URL= # Enter a Daily room URL to use a set room URL each time (useful for local testing)
-OPENAI_API_KEY=
-ELEVENLABS_API_KEY=
-ELEVENLABS_VOICE_ID=
-FLY_API_KEY=
-FLY_APP_NAME=
-RUN_AS_PROCESS= # Spawn fly.io machine for each session or run as local process
--- a/examples/deployment/flyio-example/example-fly.toml
+++ b/examples/deployment/flyio-example/example-fly.toml
@@ -1,25 +0,0 @@
-# fly.toml app configuration file generated for pipecat-fly-example on 2024-07-01T15:04:53+01:00
-#
-# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
-#
-
-app = 'pipecat-fly-example'
-primary_region = 'sjc'
-
-[build]
-
-[env]
-  FLY_APP_NAME = 'pipecat-fly-example'
-
-[http_service]
-  internal_port = 7860
-  force_https = true
-  auto_stop_machines = true
-  auto_start_machines = true
-  min_machines_running = 0
-  processes = ['app']
-
-[[vm]]
-  memory = 512
-  cpu_kind = 'shared'
-  cpus = 1
--- a/examples/deployment/flyio-example/install_deps.py
+++ b/examples/deployment/flyio-example/install_deps.py
@@ -1,4 +0,0 @@
-import torch
-
-# Download (cache) the Silero VAD model
-torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)
--- a/examples/fast-chatbot/.gitignore
+++ b/examples/fast-chatbot/.gitignore
@@ -0,0 +1,165 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
+runpod.toml
+
+# custom script to recursively upgrade items in requirements.py
+upgrade_requirements.py
+.DS_Store
--- a/examples/deployment/flyio-example/init.py
+++ b/examples/deployment/flyio-example/init.py
--- a/examples/fast-chatbot/bot.py
+++ b/examples/fast-chatbot/bot.py
@@ -0,0 +1,164 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+from loguru import logger
+import argparse
+import asyncio
+import aiohttp
+import os
+import sys
+import time
+from typing import Optional
+
+from pydantic import BaseModel, ValidationError
+
+from pipecat.vad.vad_analyzer import VADParams
+from pipecat.vad.silero import SileroVADAnalyzer
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.services.openai import OpenAILLMService
+from pipecat.services.deepgram import DeepgramSTTService
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.frames.frames import LLMMessagesFrame, EndFrame
+
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator, LLMUserResponseAggregator
+)
+
+from helpers import (
+    ClearableDeepgramTTSService,
+    AudioVolumeTimer,
+    TranscriptionTimingLogger
+)
+
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level=os.getenv("LOG_LEVEL", "DEBUG"))
+
+
+class BotSettings(BaseModel):
+    room_url: str
+    room_token: str
+    bot_name: str = "Pipecat"
+    prompt: Optional[str] = "You are a helpful assistant."
+    deepgram_api_key: Optional[str] = os.getenv("DEEPGRAM_API_KEY", None)
+    deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE", "aura-asteria-en")
+    deepgram_tts_base_url: Optional[str] = os.getenv(
+        "DEEPGRAM_TTS_BASE_URL", "https://api.deepgram.com/v1/speak")
+    deepgram_stt_base_url: Optional[str] = os.getenv(
+        "DEEPGRAM_STT_BASE_URL", "https://api.deepgram.com/v1/speak")
+    openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY", None)
+    openai_model: Optional[str] = os.getenv("OPENAI_MODEL", None)
+    openai_base_url: Optional[str] = os.getenv("OPENAI_BASE_URL", None)
+    vad_stop_secs: Optional[float] = float(os.getenv("VAD_STOP_SECS", "0.2"))
+
+
+async def main(settings: BotSettings):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            settings.room_url,
+            settings.room_token,
+            settings.bot_name,
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=False,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(params=VADParams(
+                    stop_secs=settings.vad_stop_secs
+                )),
+                vad_audio_passthrough=True
+            )
+        )
+
+        stt = DeepgramSTTService(
+            name="STT",
+            api_key=settings.deepgram_api_key,
+            url=settings.deepgram_stt_base_url
+        )
+
+        tts = ClearableDeepgramTTSService(
+            name="Voice",
+            aiohttp_session=session,
+            api_key=settings.deepgram_api_key,
+            voice=settings.deepgram_voice,
+            **({'base_url': url} if (url := settings.deepgram_tts_base_url) else {})
+        )
+
+        llm = OpenAILLMService(
+            name="LLM",
+            api_key=settings.openai_api_key,
+            model=settings.openai_model,
+            base_url=settings.openai_base_url,
+        )
+
+        messages = [
+            {
+                "role": "system",
+                "content": settings.prompt,
+            },
+        ]
+
+        avt = AudioVolumeTimer()
+        tl = TranscriptionTimingLogger(avt)
+
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            avt,                 # Audio volume timer
+            stt,                 # Speech-to-text
+            tl,                  # Transcription timing logger
+            tma_in,              # User responses
+            llm,                 # LLM
+            tts,                 # TTS
+            transport.output(),  # Transport bot output
+            tma_out,             # Assistant spoken responses
+        ])
+
+        task = PipelineTask(
+            pipeline,
+            PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                report_only_initial_ttfb=True
+            ))
+
+        # When the participant leaves, we exit the bot.
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.queue_frame(EndFrame())
+
+        # When the first participant joins, the bot should introduce itself.
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            # Provide some air whilst tracks subscribe
+            await asyncio.sleep(2)
+            messages.append(
+                {
+                    "role": "system",
+                    "content": "Briefly introduce yourself by saying 'hello, I'm FastBot, how can I help you today?'"})
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Pipecat Bot")
+    parser.add_argument("-s", "--settings", type=str, required=True, help="Pipecat bot settings")
+
+    args, unknown = parser.parse_known_args()
+
+    try:
+        settings = BotSettings.model_validate_json(args.settings)
+        asyncio.run(main(settings))
+    except ValidationError as e:
+        print(e)
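
The `on_first_participant_joined` handler in `bot.py` pauses briefly before queueing the intro prompt. The following is a minimal standard-library sketch, not pipecat code, of why a pause inside an async handler is usually written with `asyncio.sleep`: the event loop stays free to run other tasks while the handler waits. The names `ticker`, `handler`, and `run_demo` are hypothetical and exist only for this illustration.

```python
import asyncio
import time


async def ticker(ticks: list, interval: float = 0.01) -> None:
    """Hypothetical background task standing in for other pipeline work."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def handler(pause_secs: float) -> None:
    """Hypothetical event handler that yields to the event loop while pausing."""
    await asyncio.sleep(pause_secs)


async def run_demo(pause_secs: float = 0.1) -> int:
    """Run the handler next to the ticker and count the ticks that got through."""
    ticks: list = []
    background = asyncio.create_task(ticker(ticks))
    await handler(pause_secs)
    background.cancel()
    try:
        await background
    except asyncio.CancelledError:
        pass
    return len(ticks)


if __name__ == "__main__":
    print(f"background ticks during pause: {asyncio.run(run_demo())}")
```

If `handler` called `time.sleep` instead, the background task in this sketch would never get a turn before being cancelled, because a blocking sleep holds the event loop for the whole pause.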
--- a/examples/deployment/flyio-example/bot_runner.py
+++ b/examples/deployment/flyio-example/bot_runner.py
@@ -1,7 +1,15 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
 import os
 import argparse
 import subprocess
-import requests
+
+from pydantic import BaseModel, ValidationError
+from typing import Optional

 from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams

@@ -9,6 +17,8 @@ from fastapi import FastAPI, Request, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import JSONResponse

+from bot import BotSettings
+
 from dotenv import load_dotenv
 load_dotenv(override=True)

@@ -16,29 +26,24 @@ load_dotenv(override=True)
 # ------------ Configuration ------------ #

 MAX_SESSION_TIME = 5 * 60  # 5 minutes
-REQUIRED_ENV_VARS = [
-    'DAILY_API_KEY',
-    'OPENAI_API_KEY',
-    'ELEVENLABS_API_KEY',
-    'ELEVENLABS_VOICE_ID',
-    'FLY_API_KEY',
-    'FLY_APP_NAME',]
-
-FLY_API_HOST = os.getenv("FLY_API_HOST", "https://api.machines.dev/v1")
-FLY_APP_NAME = os.getenv("FLY_APP_NAME", "pipecat-fly-example")
-FLY_API_KEY = os.getenv("FLY_API_KEY", "")
-FLY_HEADERS = {
-    'Authorization': f"Bearer {FLY_API_KEY}",
-    'Content-Type': 'application/json'
-}
+REQUIRED_ENV_VARS = ['DAILY_API_URL', 'DAILY_API_KEY', 'DEEPGRAM_API_KEY']

 daily_rest_helper = DailyRESTHelper(
    os.getenv("DAILY_API_KEY", ""),
    os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))


+class RunnerSettings(BaseModel):
+    prompt: Optional[
+        str] = "You are a fast, low-latency chatbot. Your goal is to demonstrate voice-driven AI capabilities at human-like speeds. When introducing yourself briefly mention your goal is to showcase speed and conversational flow. The technology powering you is Daily for transport, Cerebrium for GPU hosting, Llama 3 (8-B version) LLM, and Deepgram for speech-to-text and text-to-speech. You are hosted on the east coast of the United States. Respond to what the user said in a creative and helpful way, but keep responses short and legible. Ensure responses contain only words. Check again that you have not included special characters other than '?' or '!'."
+    deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE")
+    openai_model: Optional[str] = os.getenv("OPENAI_MODEL", "gpt-4o")
+    openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY")
+    test: Optional[bool] = None
+
 # ----------------- API ----------------- #

+
 app = FastAPI()

 app.add_middleware(
@@ -52,67 +57,25 @@ app.add_middleware(
 # ----------------- Main ----------------- #


-def spawn_fly_machine(room_url: str, token: str):
-    # Use the same image as the bot runner
-    res = requests.get(f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS)
-    if res.status_code != 200:
-        raise Exception(f"Unable to get machine info from Fly: {res.text}")
-    image = res.json()[0]['config']['image']
-
-    # Machine configuration
-    cmd = f"python3 bot.py -u {room_url} -t {token}"
-    cmd = cmd.split()
-    worker_props = {
-        "config": {
-            "image": image,
-            "auto_destroy": True,
-            "init": {
-                "cmd": cmd
-            },
-            "restart": {
-                "policy": "no"
-            },
-            "guest": {
-                "cpu_kind": "shared",
-                "cpus": 1,
-                "memory_mb": 1024
-            }
-        },
-
-    }
-
-    # Spawn a new machine instance
-    res = requests.post(
-        f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines",
-        headers=FLY_HEADERS,
-        json=worker_props)
-
-    if res.status_code != 200:
-        raise Exception(f"Problem starting a bot worker: {res.text}")
-
-    # Wait for the machine to enter the started state
-    vm_id = res.json()['id']
-
-    res = requests.get(
-        f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines/{vm_id}/wait?state=started",
-        headers=FLY_HEADERS)
-
-    if res.status_code != 200:
-        raise Exception(f"Bot was unable to enter started state: {res.text}")
-
-    print(f"Machine joined room: {room_url}")
-
-
@app.post("/start_bot")
 async def start_bot(request: Request) -> JSONResponse:
+    runner_settings = RunnerSettings()
    try:
-        data = await request.json()
-        # Is this a webhook creation request?
-        if "test" in data:
-            return JSONResponse({"test": True})
+        request_body = await request.body()
+        if len(request_body) > 0:
+            runner_settings = RunnerSettings.model_validate_json(request_body)
+    except ValidationError as e:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Invalid request: {e}")
    except Exception as e:
+        # If no data in request, pass
        pass

+    # Is this a webhook creation request?
+    if runner_settings.test is not None:
+        return JSONResponse({"test": True})
+
    # Use specified room URL, or create a new one if not specified
    room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")

@@ -141,25 +104,26 @@ async def start_bot(request: Request) -> JSONResponse:
        raise HTTPException(
            status_code=500, detail=f"Failed to get token for room: {room_url}")

-    # Launch a new fly.io machine, or run as a shell process (not recommended)
-    run_as_process = os.getenv("RUN_AS_PROCESS", False)
+    # Spawn a new agent, and join the user session
+    try:
+        bot_settings = BotSettings(
+            room_url=room.url,
+            room_token=token,
+            prompt=runner_settings.prompt,
+            deepgram_voice=runner_settings.deepgram_voice,
+            openai_model=runner_settings.openai_model,
+            openai_api_key=runner_settings.openai_api_key,
+        )
+        bot_settings_str = bot_settings.model_dump_json(exclude_none=True)

-    if run_as_process:
-        try:
-            subprocess.Popen(
-                [f"python3 -m bot -u {room.url} -t {token}"],
-                shell=True,
-                bufsize=1,
-                cwd=os.path.dirname(os.path.abspath(__file__)))
-        except Exception as e:
-            raise HTTPException(
-                status_code=500, detail=f"Failed to start subprocess: {e}")
-    else:
-        try:
-            spawn_fly_machine(room.url, token)
-        except Exception as e:
-            raise HTTPException(
-                status_code=500, detail=f"Failed to spawn VM: {e}")
+        subprocess.Popen(
+            ["python3", "-m", "bot", "-s", bot_settings_str],
+            shell=False,
+            bufsize=1,
+            cwd=os.path.dirname(os.path.abspath(__file__)))
+    except Exception as e:
+        raise HTTPException(
+            status_code=500, detail=f"Failed to start subprocess: {e}")

    # Grab a token for the user to join with
    user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
@@ -169,6 +133,7 @@ async def start_bot(request: Request) -> JSONResponse:
        "token": user_token,
    })

+
 if __name__ == "__main__":
    # Check environment variables
    for env_var in REQUIRED_ENV_VARS:
@@ -181,7 +146,7 @@ if __name__ == "__main__":
    parser.add_argument("--port", type=int,
                        default=os.getenv("PORT", 7860), help="Port number")
    parser.add_argument("--reload", action="store_true",
-                        default=False, help="Reload code on change")
+                        default=True, help="Reload code on change")

    config = parser.parse_args()
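
The runner hands the bot its settings as a JSON string on the command line. The following is a rough standard-library sketch of that hand-off, assuming the payload is passed as a single element of an argument list so that quotes inside a prompt need no shell escaping. `CHILD_CODE` and `run_child_with_settings` are hypothetical stand-ins for the bot process and the runner; they are not part of pipecat.

```python
import json
import subprocess
import sys

# Hypothetical child program: parse the JSON argument and echo one field back.
CHILD_CODE = "import json, sys; print(json.loads(sys.argv[1])['prompt'])"


def run_child_with_settings(settings: dict) -> str:
    """Serialize settings to JSON and pass them to a child as one argv element."""
    payload = json.dumps(settings)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, payload],  # no shell, so no quoting rules apply
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child_with_settings({"prompt": "Avoid characters other than '?' or '!'."}))
```

Because the JSON travels as one list element rather than inside a shell string, single quotes and other shell metacharacters in the prompt reach the child unchanged.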

--- a/examples/fast-chatbot/env.example
+++ b/examples/fast-chatbot/env.example
@@ -0,0 +1,12 @@
+DAILY_SAMPLE_ROOM_URL= #optional: use the same room each time, or create a new one if unset
+DAILY_API_KEY=
+DAILY_API_URL=
+
+DEEPGRAM_API_KEY=
+DEEPGRAM_VOICE=
+DEEPGRAM_STT_BASE_URL=
+DEEPGRAM_TTS_BASE_URL=
+
+OPENAI_API_KEY=
+OPENAI_MODEL=
+OPENAI_BASE_URL=
--- a/examples/fast-chatbot/helpers.py
+++ b/examples/fast-chatbot/helpers.py
@@ -0,0 +1,267 @@
+from loguru import logger
+import asyncio
+import math
+import struct
+import time
+from dataclasses import dataclass, field
+from typing import List
+
+
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.frames.frames import (
+    Frame,
+    AudioRawFrame,
+    InterimTranscriptionFrame,
+    TranscriptionFrame,
+    TextFrame,
+    StartInterruptionFrame,
+    LLMFullResponseStartFrame,
+    TTSStoppedFrame,
+    MetricsFrame
+)
+
+from pipecat.vad.vad_analyzer import VADAnalyzer, VADState
+from pipecat.services.deepgram import DeepgramTTSService
+from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame
+
+
+class GreedyLLMAggregator(FrameProcessor):
+    def __init__(self, context: OpenAILLMContext = None, **kwargs):
+        super().__init__(**kwargs)
+        self.context: OpenAILLMContext = context if context else OpenAILLMContext()
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        logger.debug(f"{frame}")
+
+        try:
+            if isinstance(frame, InterimTranscriptionFrame):
+                return
+
+            if isinstance(frame, TranscriptionFrame):
+                # append transcribed text to last "user" frame
+                if self.context.messages and self.context.messages[-1]["role"] == "user":
+                    last_frame = self.context.messages.pop()
+                else:
+                    last_frame = {"role": "user", "content": ""}
+
+                last_frame["content"] += " " + frame.text
+                self.context.messages.append(last_frame)
+
+                oai_context_frame = OpenAILLMContextFrame(context=self.context)
+                logger.debug(f"pushing frame {oai_context_frame}")
+                await self.push_frame(oai_context_frame)
+                return
+
+            await self.push_frame(frame, direction)
+        except Exception as e:
+            logger.debug(f"error: {e}")
+
+
+class ClearableDeepgramTTSService(DeepgramTTSService):
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, StartInterruptionFrame):
+            self._current_sentence = ""
+
+
+@dataclass
+class BufferedSentence:
+    audio_frames: List[AudioRawFrame] = field(default_factory=list)
+    text_frame: TextFrame = None
+
+
+class VADGate(FrameProcessor):
+
+    def __init__(
+            self,
+            vad_analyzer: VADAnalyzer = None,
+            context: OpenAILLMContext = None,
+            **kwargs):
+        super().__init__(**kwargs)
+        self.vad_analyzer = vad_analyzer
+        self.context = context
+
+        self._audio_pusher_task = None
+        self._expect_text_frame_next = False
+        self._sentences: List[BufferedSentence] = []
+
+    # queue output from tts one sentence at a time. associate a buffer of audio frames with the content of
+    # each text frame.
+    #
+    # start a coroutine to service the queue and send sentences down the pipeline when possible.
+    # 1. do not send anything when we are not in VADState.QUIET
+    # 2. if we are in VADState.QUIET, send a sentence, estimate how long it will take for that sentence
+    #    to output, sleep until it's time to send another sentence
+    # 3. each time we send a sentence, append it to the conversation context
+    # 4. when the sentence buffer becomes empty, cancel the coroutine
+    # 5. if we get a new LLMFullResponseStartFrame, treat that as a cancellation, too
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        try:
+
+            # A TTSService will emit a series of AudioRawFrame objects, then a TTSStoppedFrame,
+            # then a TextFrame.
+
+            if self._expect_text_frame_next:
+                self._expect_text_frame_next = False
+                if isinstance(frame, TextFrame):
+                    self._sentences[-1].text_frame = frame
+                else:
+                    logger.debug(f"expected a text frame, but received {frame}")
+                    await self.push_frame(frame, direction)
+                return
+            else:
+                if isinstance(frame, TextFrame):
+                    logger.error(f"received an unexpected text frame: {frame}")
+
+            if isinstance(frame, AudioRawFrame):
+                # if our buffer is empty or has a "finished" sentence at the end,
+                # then we need to start buffering a new sentence
+                if not self._sentences or self._sentences[-1].text_frame:
+                    self._sentences.append(BufferedSentence())
+                self._sentences[-1].audio_frames.append(frame)
+                await self.maybe_start_audio_pusher_task()
+                return
+
+            if isinstance(frame, TTSStoppedFrame):
+                self._expect_text_frame_next = True
+                await self.push_frame(frame, direction)
+                return
+
+            # There are two ways we can be interrupted. During greedy inference, a new
+            # LLM response can start. Or, during playout, we can get a traditional
+            # user interruption frame.
+            if (isinstance(frame, LLMFullResponseStartFrame) or
+                    isinstance(frame, StartInterruptionFrame)):
+                logger.debug(f"{frame} - Handle interruption in VADGate")
+                self._sentences = []
+                if self._audio_pusher_task:
+                    self._audio_pusher_task.cancel()
+                    self._audio_pusher_task = None
+                await self.push_frame(frame, direction)
+                return
+
+            await self.push_frame(frame, direction)
+        except Exception as e:
+            logger.debug(f"error: {e}")
+
+    async def maybe_start_audio_pusher_task(self):
+        try:
+            if self._audio_pusher_task:
+                return
+            self._audio_pusher_task = self.get_event_loop().create_task(self.push_audio())
+
+        except Exception as e:
+            logger.debug(f"Exception {e}")
+
+    async def push_audio(self):
+        try:
+            while True:
+                if not self._sentences:
+                    await asyncio.sleep(0.01)
+                    continue
+
+                if self.vad_analyzer._vad_state != VADState.QUIET:
+                    await asyncio.sleep(0.01)
+                    continue
+
+                # we only want to push completed sentence buffers
+                if not self._sentences[0].text_frame:
+                    await asyncio.sleep(0.01)
+                    continue
+
+                s = self._sentences.pop(0)
+                if not s.audio_frames:
+                    continue
+                sample_rate = s.audio_frames[0].sample_rate
+                duration = 0
+                logger.debug(f"Pushing {len(s.audio_frames)} audio frames")
+                for frame in s.audio_frames:
+                    await self.push_frame(frame)
+                    # assume linear16 encoding (2 bytes per sample). todo: add some more
+                    # metadata to AudioRawFrame, maybe
+                    duration += (len(frame.audio) / 2 / frame.num_channels) / sample_rate
+                await asyncio.sleep(duration - 20 / 1000)
+                if self.context:
+                    logger.debug(f"Appending assistant message to context: [{s.text_frame.text}]")
+                    self.context.messages.append(
+                        {"role": "assistant", "content": s.text_frame.text}
+                    )
+                await self.push_frame(s.text_frame)
+
+        except Exception as e:
+            logger.debug(f"Exception {e}")
+
+
+class TranscriptionTimingLogger(FrameProcessor):
+    def __init__(self, avt):
+        super().__init__()
+        self.name = "Transcription"
+        self._avt = avt
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        try:
+            await super().process_frame(frame, direction)
+            if isinstance(frame, TranscriptionFrame):
+                elapsed = time.time() - self._avt.last_transition_ts
+                logger.debug(f"Transcription TTF: {elapsed}")
+                await self.push_frame(MetricsFrame(ttfb={self.name: elapsed}))
+
+            await self.push_frame(frame, direction)
+        except Exception as e:
+            logger.debug(f"Exception {e}")
+
+
+class AudioVolumeTimer(FrameProcessor):
+    def __init__(self):
+        super().__init__()
+        self.last_transition_ts = 0
+        self._prev_volume = -80
+        self._speech_volume_threshold = -50
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, AudioRawFrame):
+            volume = self.calculate_volume(frame)
+            # print(f"Audio volume: {volume:.2f} dB")
+            if (volume >= self._speech_volume_threshold and
+                    self._prev_volume < self._speech_volume_threshold):
+                # logger.debug("transition above speech volume threshold")
+                self.last_transition_ts = time.time()
+            elif (volume < self._speech_volume_threshold and
+                    self._prev_volume >= self._speech_volume_threshold):
+                # logger.debug("transition below non-speech volume threshold")
+                self.last_transition_ts = time.time()
+            self._prev_volume = volume
+
+        await self.push_frame(frame, direction)
+
+    def calculate_volume(self, frame: AudioRawFrame) -> float:
+        if frame.num_channels != 1:
+            raise ValueError(f"Expected 1 channel, got {frame.num_channels}")
+
+        # Unpack audio data into 16-bit integers
+        fmt = f"{len(frame.audio) // 2}h"
+        audio_samples = struct.unpack(fmt, frame.audio)
+
+        # Calculate RMS
+        sum_squares = sum(sample**2 for sample in audio_samples)
+        rms = math.sqrt(sum_squares / len(audio_samples)) if audio_samples else 0.0
+
+        # Convert RMS to decibels (dB)
+        # Reference: maximum value for 16-bit audio is 32767
+        if rms > 0:
+            db = 20 * math.log10(rms / 32767)
+        else:
+            db = -96  # Minimum value (almost silent)
+
+        return db
--- a/examples/deployment/flyio-example/requirements.txt
+++ b/examples/deployment/flyio-example/requirements.txt
@@ -1,4 +1,4 @@
-pipecat-ai[daily,openai,silero]
+pipecat-ai[daily,openai,silero,deepgram]
 fastapi
 uvicorn
 requests
--- a/examples/foundational/06a-image-sync.py
+++ b/examples/foundational/06a-image-sync.py
@@ -67,12 +67,11 @@ async def main(room_url: str, token):
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
-                camera_out_enabled=True,
                camera_out_width=1024,
                camera_out_height=1024,
                transcription_enabled=True,
                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
+                vad_analyzer=SileroVADAnalyzer()
            )
        )

@@ -117,7 +116,7 @@ async def main(room_url: str, token):
        async def on_first_participant_joined(transport, participant):
            participant_name = participant["info"]["userName"] or ''
            transport.capture_participant_transcription(participant["id"])
-            await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
+            await task.queue_frames([TextFrame(f"Hi, this is {participant_name}.")])

        runner = PipelineRunner()

--- a/examples/foundational/07d-interruptible-cartesia.py
+++ b/examples/foundational/07d-interruptible-cartesia.py
@@ -38,6 +38,7 @@ async def main(room_url: str, token):
        "Respond bot",
        DailyParams(
            audio_out_enabled=True,
+            audio_out_sample_rate=44100,
            transcription_enabled=True,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer()
@@ -46,7 +47,8 @@ async def main(room_url: str, token):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",  # Barbershop Man
+        voice_name="British Lady",
+        output_format="pcm_44100"
    )

    llm = OpenAILLMService(
--- a/examples/foundational/07i-interruptible-xtts.py
+++ b/examples/foundational/07i-interruptible-xtts.py
@@ -1,96 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import aiohttp
-import os
-import sys
-
-from pipecat.frames.frames import LLMMessagesFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
-from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.xtts import XTTSService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def main(room_url: str, token):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Respond bot",
-            DailyParams(
-                audio_out_enabled=True,
-                transcription_enabled=True,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
-            )
-        )
-
-        tts = XTTSService(
-            aiohttp_session=session,
-            voice_id="Claribel Dervla",
-            language="en",
-            base_url="http://localhost:8000"
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),   # Transport user input
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # TTS
-            transport.output(),  # Transport bot output
-            tma_out              # Assistant spoken responses
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            # Kick off the conversation.
-            messages.append(
-                {"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    (url, token) = configure()
-    asyncio.run(main(url, token))
--- a/examples/foundational/07j-interruptible-gladia.py
+++ b/examples/foundational/07j-interruptible-gladia.py
@@ -1,101 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import aiohttp
-import os
-import sys
-
-from pipecat.frames.frames import LLMMessagesFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
-from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
-from pipecat.services.gladia import GladiaSTTService
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.xtts import XTTSService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def main(room_url: str, token):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Respond bot",
-            DailyParams(
-                audio_out_enabled=True,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
-                vad_audio_passthrough=True,
-            )
-        )
-
-        stt = GladiaSTTService(
-            api_key=os.getenv("GLADIA_API_KEY"),
-        )
-
-        tts = DeepgramTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("DEEPGRAM_API_KEY"),
-            voice="aura-helios-en"
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),   # Transport user input
-            stt,                 # STT
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # TTS
-            transport.output(),  # Transport bot output
-            tma_out              # Assistant spoken responses
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            # Kick off the conversation.
-            messages.append(
-                {"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    (url, token) = configure()
-    asyncio.run(main(url, token))
--- a/examples/foundational/15-switch-voices.py
+++ b/examples/foundational/15-switch-voices.py
@@ -66,6 +66,7 @@ async def main(room_url: str, token):
            "Pipecat",
            DailyParams(
                audio_out_enabled=True,
+                audio_out_sample_rate=44100,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer()
@@ -74,17 +75,20 @@ async def main(room_url: str, token):

        news_lady = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="bf991597-6c13-47e4-8411-91ec2de5c466",  # Newslady
+            voice_name="Newslady",
+            output_format="pcm_44100"
        )

        british_lady = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+            voice_name="British Lady",
+            output_format="pcm_44100"
        )

        barbershop_man = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",  # Barbershop Man
+            voice_name="Barbershop Man",
+            output_format="pcm_44100"
        )

        llm = OpenAILLMService(
--- a/examples/foundational/17-detect-user-idle.py
+++ b/examples/foundational/17-detect-user-idle.py
@@ -1,108 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import aiohttp
-import os
-import sys
-
-from pipecat.frames.frames import LLMMessagesFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
-from pipecat.processors.frame_processor import FrameDirection
-from pipecat.processors.user_idle_processor import UserIdleProcessor
-from pipecat.services.elevenlabs import ElevenLabsTTSService
-from pipecat.services.openai import OpenAILLMService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def main(room_url: str, token):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Respond bot",
-            DailyParams(
-                audio_out_enabled=True,
-                transcription_enabled=True,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer()
-            )
-        )
-
-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY"),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        async def user_idle_callback(user_idle: UserIdleProcessor):
-            messages.append(
-                {"role": "system", "content": "Ask the user if they are still there and try to prompt for some input, but be short."})
-            await user_idle.queue_frame(LLMMessagesFrame(messages))
-
-        user_idle = UserIdleProcessor(callback=user_idle_callback, timeout=5.0)
-
-        pipeline = Pipeline([
-            transport.input(),   # Transport user input
-            user_idle,           # Idle user check-in
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # TTS
-            transport.output(),  # Transport bot output
-            tma_out              # Assistant spoken responses
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            report_only_initial_ttfb=True,
-        ))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            # Kick off the conversation.
-            messages.append(
-                {"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    (url, token) = configure()
-    asyncio.run(main(url, token))
--- a/examples/storytelling-chatbot/frontend/yarn.lock
+++ b/examples/storytelling-chatbot/frontend/yarn.lock
@@ -899,11 +899,11 @@ brace-expansion@^2.0.1:
    balanced-match "^1.0.0"

 braces@^3.0.2, braces@~3.0.2:
-  version "3.0.3"
-  resolved "https://registry.yarnpkg.com/braces/-/braces-3.0.3.tgz#490332f40919452272d55a8480adc0c441358789"
-  integrity "sha1-SQMy9AkZRSJy1VqEgK3AxEE1h4k= sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA=="
+  version "3.0.2"
+  resolved "https://registry.yarnpkg.com/braces/-/braces-3.0.2.tgz#3454e1a462ee8d599e236df336cd9ea4f8afe107"
+  integrity sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A==
  dependencies:
-    fill-range "^7.1.1"
+    fill-range "^7.0.1"

 browserslist@^4.23.0:
  version "4.23.0"
@@ -1551,10 +1551,10 @@ file-entry-cache@^6.0.1:
  dependencies:
    flat-cache "^3.0.4"

-fill-range@^7.1.1:
-  version "7.1.1"
-  resolved "https://registry.yarnpkg.com/fill-range/-/fill-range-7.1.1.tgz#44265d3cac07e3ea7dc247516380643754a05292"
-  integrity "sha1-RCZdPKwH4+p9wkdRY4BkN1SgUpI= sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg=="
+fill-range@^7.0.1:
+  version "7.0.1"
+  resolved "https://registry.yarnpkg.com/fill-range/-/fill-range-7.0.1.tgz#1919a6a7c75fe38b2c7c77e5198535da9acdda40"
+  integrity sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ==
  dependencies:
    to-regex-range "^5.0.1"

--- a/examples/twilio-chatbot/bot.py
+++ b/examples/twilio-chatbot/bot.py
@@ -15,7 +15,6 @@ from pipecat.services.deepgram import DeepgramSTTService
 from pipecat.services.elevenlabs import ElevenLabsTTSService
 from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams
 from pipecat.vad.silero import SileroVADAnalyzer
-from pipecat.serializers.twilio import TwilioFrameSerializer

 from loguru import logger

@@ -26,7 +25,7 @@ logger.remove(0)
 logger.add(sys.stderr, level="DEBUG")


-async def run_bot(websocket_client, stream_sid):
+async def run_bot(websocket_client):
    async with aiohttp.ClientSession() as session:
        transport = FastAPIWebsocketTransport(
            websocket=websocket_client,
@@ -35,8 +34,7 @@ async def run_bot(websocket_client, stream_sid):
                add_wav_header=False,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
-                vad_audio_passthrough=True,
-                serializer=TwilioFrameSerializer(stream_sid)
+                vad_audio_passthrough=True
            )
        )

--- a/examples/twilio-chatbot/server.py
+++ b/examples/twilio-chatbot/server.py
@@ -1,5 +1,3 @@
-import json
-
 import uvicorn

 from fastapi import FastAPI, WebSocket
@@ -28,13 +26,8 @@ async def start_call():
 @app.websocket("/ws")
 async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
-    start_data = websocket.iter_text()
-    await start_data.__anext__()
-    call_data = json.loads(await start_data.__anext__())
-    print(call_data, flush=True)
-    stream_sid = call_data['start']['streamSid']
    print("WebSocket connection accepted")
-    await run_bot(websocket, stream_sid)
+    await run_bot(websocket)


 if __name__ == "__main__":
--- a/linux-py3.10-requirements.txt
+++ b/linux-py3.10-requirements.txt
@@ -4,7 +4,7 @@
 #
 #    pip-compile --all-extras pyproject.toml
 #
-aiofiles==24.1.0
+aiofiles==23.2.1
    # via deepgram-sdk
 aiohttp==3.9.5
    # via
@@ -17,7 +17,7 @@ aiosignal==1.3.1
    # via aiohttp
 annotated-types==0.7.0
    # via pydantic
-anthropic==0.28.1
+anthropic==0.25.9
    # via
    #   openpipe
    #   pipecat-ai (pyproject.toml)
@@ -36,21 +36,23 @@ attrs==23.2.0
    # via
    #   aiohttp
    #   openpipe
-av==12.2.0
+av==12.1.0
    # via faster-whisper
-azure-cognitiveservices-speech==1.38.0
+azure-cognitiveservices-speech==1.37.0
    # via pipecat-ai (pyproject.toml)
 blinker==1.8.2
    # via flask
 cachetools==5.3.3
    # via google-auth
-cartesia==1.0.3
+cartesia==0.1.1
    # via pipecat-ai (pyproject.toml)
 certifi==2024.6.2
    # via
    #   httpcore
    #   httpx
    #   requests
+cffi==1.16.0
+    # via sounddevice
 charset-normalizer==3.3.2
    # via requests
 click==8.1.7
@@ -62,7 +64,7 @@ coloredlogs==15.0.1
    # via onnxruntime
 ctranslate2==4.3.1
    # via faster-whisper
-daily-python==0.10.1
+daily-python==0.10.0
    # via pipecat-ai (pyproject.toml)
 dataclasses-json==0.6.7
    # via
@@ -84,15 +86,15 @@ exceptiongroup==1.2.1
    # via
    #   anyio
    #   pytest
-fal-client==0.4.1
+fal-client==0.4.0
    # via pipecat-ai (pyproject.toml)
 fastapi==0.111.0
    # via pipecat-ai (pyproject.toml)
 fastapi-cli==0.0.4
    # via fastapi
-faster-whisper==1.0.3
+faster-whisper==1.0.2
    # via pipecat-ai (pyproject.toml)
-filelock==3.15.4
+filelock==3.15.3
    # via
    #   huggingface-hub
    #   pyht
@@ -111,22 +113,22 @@ frozenlist==1.4.1
    # via
    #   aiohttp
    #   aiosignal
-fsspec==2024.6.1
+fsspec==2024.6.0
    # via
    #   huggingface-hub
    #   torch
 future==1.0.0
    # via pyloudnorm
-google-ai-generativelanguage==0.6.6
+google-ai-generativelanguage==0.6.4
    # via google-generativeai
-google-api-core[grpc]==2.19.1
+google-api-core[grpc]==2.19.0
    # via
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.135.0
+google-api-python-client==2.134.0
    # via google-generativeai
-google-auth==2.31.0
+google-auth==2.30.0
    # via
    #   google-ai-generativelanguage
    #   google-api-core
@@ -135,9 +137,9 @@ google-auth==2.31.0
    #   google-generativeai
 google-auth-httplib2==0.2.0
    # via google-api-python-client
-google-generativeai==0.7.1
+google-generativeai==0.5.4
    # via pipecat-ai (pyproject.toml)
-googleapis-common-protos==1.63.2
+googleapis-common-protos==1.63.1
    # via
    #   google-api-core
    #   grpcio-status
@@ -197,35 +199,31 @@ jinja2==3.1.4
    #   fastapi
    #   flask
    #   torch
-jiter==0.5.0
-    # via anthropic
 jsonpatch==1.33
    # via langchain-core
 jsonpointer==3.0.0
    # via jsonpatch
-langchain==0.2.6
+langchain==0.2.5
    # via
    #   langchain-community
    #   pipecat-ai (pyproject.toml)
-langchain-community==0.2.6
+langchain-community==0.2.5
    # via pipecat-ai (pyproject.toml)
-langchain-core==0.2.10
+langchain-core==0.2.9
    # via
    #   langchain
    #   langchain-community
    #   langchain-openai
    #   langchain-text-splitters
-langchain-openai==0.1.10
+langchain-openai==0.1.9
    # via pipecat-ai (pyproject.toml)
-langchain-text-splitters==0.2.2
+langchain-text-splitters==0.2.1
    # via langchain
-langsmith==0.1.83
+langsmith==0.1.81
    # via
    #   langchain
    #   langchain-community
    #   langchain-core
-llvmlite==0.43.0
-    # via numba
 loguru==0.7.2
    # via pipecat-ai (pyproject.toml)
 markdown-it-py==3.0.0
@@ -248,18 +246,14 @@ mypy-extensions==1.0.0
    # via typing-inspect
 networkx==3.3
    # via torch
-numba==0.60.0
-    # via resampy
 numpy==1.26.4
    # via
    #   ctranslate2
    #   langchain
    #   langchain-community
-    #   numba
    #   onnxruntime
    #   pipecat-ai (pyproject.toml)
    #   pyloudnorm
-    #   resampy
    #   scipy
    #   torchvision
    #   transformers
@@ -288,20 +282,20 @@ nvidia-cusparse-cu12==12.1.0.106
    #   torch
 nvidia-nccl-cu12==2.20.5
    # via torch
-nvidia-nvjitlink-cu12==12.5.82
+nvidia-nvjitlink-cu12==12.5.40
    # via
    #   nvidia-cusolver-cu12
    #   nvidia-cusparse-cu12
 nvidia-nvtx-cu12==12.1.105
    # via torch
-onnxruntime==1.18.1
+onnxruntime==1.18.0
    # via faster-whisper
-openai==1.27.0
+openai==1.26.0
    # via
    #   langchain-openai
    #   openpipe
    #   pipecat-ai (pyproject.toml)
-openpipe==4.16.0
+openpipe==4.14.0
    # via pipecat-ai (pyproject.toml)
 orjson==3.10.5
    # via
@@ -344,7 +338,9 @@ pyasn1-modules==0.4.0
    # via google-auth
 pyaudio==0.2.14
    # via pipecat-ai (pyproject.toml)
-pydantic==2.8.0
+pycparser==2.22
+    # via cffi
+pydantic==2.7.4
    # via
    #   anthropic
    #   fastapi
@@ -353,7 +349,7 @@ pydantic==2.8.0
    #   langchain-core
    #   langsmith
    #   openai
-pydantic-core==2.20.0
+pydantic-core==2.18.4
    # via pydantic
 pygments==2.18.0
    # via rich
@@ -400,8 +396,6 @@ requests==2.32.3
    #   pyht
    #   tiktoken
    #   transformers
-resampy==0.4.3
-    # via pipecat-ai (pyproject.toml)
 rich==13.7.1
    # via typer
 rsa==4.9
@@ -410,7 +404,7 @@ safetensors==0.4.3
    # via
    #   timm
    #   transformers
-scipy==1.14.0
+scipy==1.13.1
    # via pyloudnorm
 shellingham==1.5.4
    # via typer
@@ -422,6 +416,8 @@ sniffio==1.3.1
    #   anyio
    #   httpx
    #   openai
+sounddevice==0.4.7
+    # via pipecat-ai (pyproject.toml)
 sqlalchemy==2.0.31
    # via
    #   langchain
@@ -432,7 +428,7 @@ sympy==1.12.1
    # via
    #   onnxruntime
    #   torch
-tenacity==8.4.2
+tenacity==8.4.1
    # via
    #   langchain
    #   langchain-community
--- a/macos-py3.10-requirements.txt
+++ b/macos-py3.10-requirements.txt
@@ -1,10 +1,10 @@
 #
-# This file is autogenerated by pip-compile with Python 3.10
+# This file is autogenerated by pip-compile with Python 3.12
 # by the following command:
 #
 #    pip-compile --all-extras pyproject.toml
 #
-aiofiles==24.1.0
+aiofiles==23.2.1
    # via deepgram-sdk
 aiohttp==3.9.5
    # via
@@ -17,7 +17,7 @@ aiosignal==1.3.1
    # via aiohttp
 annotated-types==0.7.0
    # via pydantic
-anthropic==0.28.1
+anthropic==0.25.9
    # via
    #   openpipe
    #   pipecat-ai (pyproject.toml)
@@ -28,29 +28,27 @@ anyio==4.4.0
    #   openai
    #   starlette
    #   watchfiles
-async-timeout==4.0.3
-    # via
-    #   aiohttp
-    #   langchain
 attrs==23.2.0
    # via
    #   aiohttp
    #   openpipe
-av==12.2.0
+av==12.1.0
    # via faster-whisper
-azure-cognitiveservices-speech==1.38.0
+azure-cognitiveservices-speech==1.37.0
    # via pipecat-ai (pyproject.toml)
 blinker==1.8.2
    # via flask
 cachetools==5.3.3
    # via google-auth
-cartesia==1.0.3
+cartesia==0.1.1
    # via pipecat-ai (pyproject.toml)
 certifi==2024.6.2
    # via
    #   httpcore
    #   httpx
    #   requests
+cffi==1.16.0
+    # via sounddevice
 charset-normalizer==3.3.2
    # via requests
 click==8.1.7
@@ -62,7 +60,7 @@ coloredlogs==15.0.1
    # via onnxruntime
 ctranslate2==4.3.1
    # via faster-whisper
-daily-python==0.10.1
+daily-python==0.10.0
    # via pipecat-ai (pyproject.toml)
 dataclasses-json==0.6.7
    # via
@@ -80,19 +78,15 @@ einops==0.8.0
    # via pipecat-ai (pyproject.toml)
 email-validator==2.2.0
    # via fastapi
-exceptiongroup==1.2.1
-    # via
-    #   anyio
-    #   pytest
-fal-client==0.4.1
+fal-client==0.4.0
    # via pipecat-ai (pyproject.toml)
 fastapi==0.111.0
    # via pipecat-ai (pyproject.toml)
 fastapi-cli==0.0.4
    # via fastapi
-faster-whisper==1.0.3
+faster-whisper==1.0.2
    # via pipecat-ai (pyproject.toml)
-filelock==3.15.4
+filelock==3.15.3
    # via
    #   huggingface-hub
    #   pyht
@@ -110,22 +104,22 @@ frozenlist==1.4.1
    # via
    #   aiohttp
    #   aiosignal
-fsspec==2024.6.1
+fsspec==2024.6.0
    # via
    #   huggingface-hub
    #   torch
 future==1.0.0
    # via pyloudnorm
-google-ai-generativelanguage==0.6.6
+google-ai-generativelanguage==0.6.4
    # via google-generativeai
-google-api-core[grpc]==2.19.1
+google-api-core[grpc]==2.19.0
    # via
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.135.0
+google-api-python-client==2.134.0
    # via google-generativeai
-google-auth==2.31.0
+google-auth==2.30.0
    # via
    #   google-ai-generativelanguage
    #   google-api-core
@@ -134,9 +128,9 @@ google-auth==2.31.0
    #   google-generativeai
 google-auth-httplib2==0.2.0
    # via google-api-python-client
-google-generativeai==0.7.1
+google-generativeai==0.5.4
    # via pipecat-ai (pyproject.toml)
-googleapis-common-protos==1.63.2
+googleapis-common-protos==1.63.1
    # via
    #   google-api-core
    #   grpcio-status
@@ -194,35 +188,31 @@ jinja2==3.1.4
    #   fastapi
    #   flask
    #   torch
-jiter==0.5.0
-    # via anthropic
 jsonpatch==1.33
    # via langchain-core
 jsonpointer==3.0.0
    # via jsonpatch
-langchain==0.2.6
+langchain==0.2.5
    # via
    #   langchain-community
    #   pipecat-ai (pyproject.toml)
-langchain-community==0.2.6
+langchain-community==0.2.5
    # via pipecat-ai (pyproject.toml)
-langchain-core==0.2.10
+langchain-core==0.2.9
    # via
    #   langchain
    #   langchain-community
    #   langchain-openai
    #   langchain-text-splitters
-langchain-openai==0.1.10
+langchain-openai==0.1.9
    # via pipecat-ai (pyproject.toml)
-langchain-text-splitters==0.2.2
+langchain-text-splitters==0.2.1
    # via langchain
-langsmith==0.1.83
+langsmith==0.1.81
    # via
    #   langchain
    #   langchain-community
    #   langchain-core
-llvmlite==0.43.0
-    # via numba
 loguru==0.7.2
    # via pipecat-ai (pyproject.toml)
 markdown-it-py==3.0.0
@@ -245,29 +235,25 @@ mypy-extensions==1.0.0
    # via typing-inspect
 networkx==3.3
    # via torch
-numba==0.60.0
-    # via resampy
 numpy==1.26.4
    # via
    #   ctranslate2
    #   langchain
    #   langchain-community
-    #   numba
    #   onnxruntime
    #   pipecat-ai (pyproject.toml)
    #   pyloudnorm
-    #   resampy
    #   scipy
    #   torchvision
    #   transformers
-onnxruntime==1.18.1
+onnxruntime==1.18.0
    # via faster-whisper
-openai==1.27.0
+openai==1.26.0
    # via
    #   langchain-openai
    #   openpipe
    #   pipecat-ai (pyproject.toml)
-openpipe==4.16.0
+openpipe==4.14.0
    # via pipecat-ai (pyproject.toml)
 orjson==3.10.5
    # via
@@ -310,7 +296,9 @@ pyasn1-modules==0.4.0
    # via google-auth
 pyaudio==0.2.14
    # via pipecat-ai (pyproject.toml)
-pydantic==2.8.0
+pycparser==2.22
+    # via cffi
+pydantic==2.7.4
    # via
    #   anthropic
    #   fastapi
@@ -319,7 +307,7 @@ pydantic==2.8.0
    #   langchain-core
    #   langsmith
    #   openai
-pydantic-core==2.20.0
+pydantic-core==2.18.4
    # via pydantic
 pygments==2.18.0
    # via rich
@@ -366,8 +354,6 @@ requests==2.32.3
    #   pyht
    #   tiktoken
    #   transformers
-resampy==0.4.3
-    # via pipecat-ai (pyproject.toml)
 rich==13.7.1
    # via typer
 rsa==4.9
@@ -376,7 +362,7 @@ safetensors==0.4.3
    # via
    #   timm
    #   transformers
-scipy==1.14.0
+scipy==1.13.1
    # via pyloudnorm
 shellingham==1.5.4
    # via typer
@@ -388,6 +374,8 @@ sniffio==1.3.1
    #   anyio
    #   httpx
    #   openai
+sounddevice==0.4.7
+    # via pipecat-ai (pyproject.toml)
 sqlalchemy==2.0.31
    # via
    #   langchain
@@ -398,7 +386,7 @@ sympy==1.12.1
    # via
    #   onnxruntime
    #   torch
-tenacity==8.4.2
+tenacity==8.4.1
    # via
    #   langchain
    #   langchain-community
@@ -412,8 +400,6 @@ tokenizers==0.19.1
    #   anthropic
    #   faster-whisper
    #   transformers
-tomli==2.0.1
-    # via pytest
 torch==2.3.1
    # via
    #   pipecat-ai (pyproject.toml)
@@ -437,7 +423,6 @@ typer==0.12.3
 typing-extensions==4.12.2
    # via
    #   anthropic
-    #   anyio
    #   deepgram-sdk
    #   fastapi
    #   google-generativeai
@@ -450,7 +435,6 @@ typing-extensions==4.12.2
    #   torch
    #   typer
    #   typing-inspect
-    #   uvicorn
 typing-inspect==0.9.0
    # via dataclasses-json
 ujson==5.10.0
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -34,26 +34,24 @@ Source = "https://github.com/pipecat-ai/pipecat"
 Website = "https://pipecat.ai"

 [project.optional-dependencies]
-anthropic = [ "anthropic~=0.28.1" ]
-azure = [ "azure-cognitiveservices-speech~=1.38.0" ]
-cartesia = [ "cartesia~=1.0.3" ]
-daily = [ "daily-python~=0.10.1" ]
+anthropic = [ "anthropic~=0.25.7" ]
+azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
+cartesia = [ "numpy~=1.26.0", "sounddevice", "cartesia" ]
+daily = [ "daily-python~=0.10.0" ]
 deepgram = [ "deepgram-sdk~=3.2.7" ]
 examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
-fal = [ "fal-client~=0.4.1" ]
-gladia = [ "websockets~=12.0" ]
-google = [ "google-generativeai~=0.7.1" ]
-fireworks = [ "openai~=1.27.0" ]
-langchain = [ "langchain~=0.2.6", "langchain-community~=0.2.6", "langchain-openai~=0.1.10" ]
+fal = [ "fal-client~=0.4.0" ]
+google = [ "google-generativeai~=0.5.3" ]
+fireworks = [ "openai~=1.26.0" ]
+langchain = [ "langchain~=0.2.1", "langchain-community~=0.2.1", "langchain-openai~=0.1.8" ]
 local = [ "pyaudio~=0.2.0" ]
 moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ]
-openai = [ "openai~=1.27.0" ]
-openpipe = [ "openpipe~=4.16.0" ]
+openai = [ "openai~=1.26.0" ]
+openpipe = [ "openpipe~=4.14.0" ]
 playht = [ "pyht~=0.0.28" ]
-silero = [ "torch~=2.3.1", "torchaudio~=2.3.1" ]
+silero = [ "torch~=2.3.0", "torchaudio~=2.3.0" ]
 websocket = [ "websockets~=12.0", "fastapi~=0.111.0" ]
-whisper = [ "faster-whisper~=1.0.3" ]
-xtts = [ "resampy~=0.4.3" ]
+whisper = [ "faster-whisper~=1.0.2" ]

 [tool.setuptools.packages.find]
 # All the following settings are optional:
--- a/src/pipecat/frames/frames.py
+++ b/src/pipecat/frames/frames.py
@@ -240,23 +240,12 @@ class StopInterruptionFrame(SystemFrame):
    pass


-@dataclass
-class BotSpeakingFrame(SystemFrame):
-    """Emitted by transport outputs while the bot is still speaking. This can be
-    used, for example, to detect when a user is idle. That is, while the bot is
-    speaking we don't want to trigger any user idle timeout since the user might
-    be listening.
-
-    """
-    pass
-
-
 @dataclass
 class MetricsFrame(SystemFrame):
    """Emitted by processor that can compute metrics like latencies.
    """
-    ttfb: List[Mapping[str, Any]] | None = None
-    processing: List[Mapping[str, Any]] | None = None
+    ttfb: Mapping[str, float]
+

 #
 # Control frames
--- a/src/pipecat/pipeline/runner.py
+++ b/src/pipecat/pipeline/runner.py
@@ -15,7 +15,7 @@ from loguru import logger

 class PipelineRunner:

-    def __init__(self, *, name: str | None = None, handle_sigint: bool = True):
+    def __init__(self, name: str | None = None, handle_sigint: bool = True):
        self.id: int = obj_id()
        self.name: str = name or f"{self.__class__.__name__}#{obj_count(self)}"

--- a/src/pipecat/pipeline/task.py
+++ b/src/pipecat/pipeline/task.py
@@ -95,9 +95,8 @@ class PipelineTask:

    def _initial_metrics_frame(self) -> MetricsFrame:
        processors = self._pipeline.processors_with_metrics()
-        ttfb = [{"name": p.name, "time": 0.0} for p in processors]
-        processing = [{"name": p.name, "time": 0.0} for p in processors]
-        return MetricsFrame(ttfb=ttfb, processing=processing)
+        ttfb = dict(zip([p.name for p in processors], [0] * len(processors)))
+        return MetricsFrame(ttfb=ttfb)

    async def _process_down_queue(self):
        start_frame = StartFrame(
--- a/src/pipecat/processors/async_frame_processor.py
+++ b/src/pipecat/processors/async_frame_processor.py
@@ -1,63 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-
-from pipecat.frames.frames import EndFrame, Frame, StartInterruptionFrame
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-
-
-class AsyncFrameProcessor(FrameProcessor):
-
-    def __init__(
-            self,
-            *,
-            name: str | None = None,
-            loop: asyncio.AbstractEventLoop | None = None,
-            **kwargs):
-        super().__init__(name=name, loop=loop, **kwargs)
-
-        self._create_push_task()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, StartInterruptionFrame):
-            await self._handle_interruptions(frame)
-
-    async def queue_frame(
-            self,
-            frame: Frame,
-            direction: FrameDirection = FrameDirection.DOWNSTREAM):
-        await self._push_queue.put((frame, direction))
-
-    async def cleanup(self):
-        self._push_frame_task.cancel()
-        await self._push_frame_task
-
-    async def _handle_interruptions(self, frame: Frame):
-        # Cancel the task. This will stop pushing frames downstream.
-        self._push_frame_task.cancel()
-        await self._push_frame_task
-        # Push an out-of-band frame (i.e. not using the ordered push
-        # frame task).
-        await self.push_frame(frame)
-        # Create a new queue and task.
-        self._create_push_task()
-
-    def _create_push_task(self):
-        self._push_queue = asyncio.Queue()
-        self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
-
-    async def _push_frame_task_handler(self):
-        running = True
-        while running:
-            try:
-                (frame, direction) = await self._push_queue.get()
-                await self.push_frame(frame, direction)
-                running = not isinstance(frame, EndFrame)
-            except asyncio.CancelledError:
-                break
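
Note on the removed `AsyncFrameProcessor`: its core idea is that frames produced asynchronously are placed on a queue and pushed downstream from one task, so ordering is preserved. The following is a minimal sketch of that pattern only, not the pipecat implementation. `OrderedPusher`, `END`, `finish()` and `demo()` are hypothetical names, and unlike the removed code the sentinel is not itself forwarded.

```python
import asyncio


class OrderedPusher:
    """Hypothetical stand-in for a processor that pushes frames from one task."""

    END = object()  # sentinel playing the role of an end-of-stream frame

    def __init__(self):
        self.pushed = []  # records the order in which frames were "pushed"
        self._queue = asyncio.Queue()
        # A single task drains the queue, so frames leave in FIFO order.
        self._task = asyncio.get_running_loop().create_task(self._drain())

    async def queue_frame(self, frame):
        await self._queue.put(frame)

    async def _drain(self):
        while True:
            frame = await self._queue.get()
            if frame is self.END:
                break
            self.pushed.append(frame)

    async def finish(self):
        # Enqueue the sentinel and wait for the drain task to exit.
        await self._queue.put(self.END)
        await self._task


async def demo():
    pusher = OrderedPusher()
    for frame in ("a", "b", "c"):
        await pusher.queue_frame(frame)
    await pusher.finish()
    return pusher.pushed
```

Running `asyncio.run(demo())` returns the frames in the order they were queued.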
--- a/src/pipecat/processors/filters/wake_check_filter.py
+++ b/src/pipecat/processors/filters/wake_check_filter.py
@@ -82,5 +82,5 @@ class WakeCheckFilter(FrameProcessor):
                await self.push_frame(frame, direction)
        except Exception as e:
            error_msg = f"Error in wake word filter: {e}"
-            logger.exception(error_msg)
+            logger.error(error_msg)
            await self.push_error(ErrorFrame(error_msg))
--- a/src/pipecat/processors/frame_processor.py
+++ b/src/pipecat/processors/frame_processor.py
@@ -9,7 +9,7 @@ import time

 from enum import Enum

-from pipecat.frames.frames import ErrorFrame, Frame, MetricsFrame, StartFrame, StartInterruptionFrame, UserStoppedSpeakingFrame
+from pipecat.frames.frames import ErrorFrame, Frame, MetricsFrame, StartFrame, UserStoppedSpeakingFrame
 from pipecat.utils.utils import obj_count, obj_id

 from loguru import logger
@@ -20,53 +20,10 @@ class FrameDirection(Enum):
    UPSTREAM = 2


-class FrameProcessorMetrics:
-    def __init__(self, name: str):
-        self._name = name
-        self._start_ttfb_time = 0
-        self._start_processing_time = 0
-        self._should_report_ttfb = True
-
-    async def start_ttfb_metrics(self, report_only_initial_ttfb):
-        if self._should_report_ttfb:
-            self._start_ttfb_time = time.time()
-            self._should_report_ttfb = not report_only_initial_ttfb
-
-    async def stop_ttfb_metrics(self):
-        if self._start_ttfb_time == 0:
-            return None
-
-        value = time.time() - self._start_ttfb_time
-        logger.debug(f"{self._name} TTFB: {value}")
-        ttfb = {
-            "processor": self._name,
-            "value": value
-        }
-        self._start_ttfb_time = 0
-        return MetricsFrame(ttfb=[ttfb])
-
-    async def start_processing_metrics(self):
-        self._start_processing_time = time.time()
-
-    async def stop_processing_metrics(self):
-        if self._start_processing_time == 0:
-            return None
-
-        value = time.time() - self._start_processing_time
-        logger.debug(f"{self._name} processing time: {value}")
-        processing = {
-            "processor": self._name,
-            "value": value
-        }
-        self._start_processing_time = 0
-        return MetricsFrame(processing=[processing])
-
-
 class FrameProcessor:

    def __init__(
            self,
-            *,
            name: str | None = None,
            loop: asyncio.AbstractEventLoop | None = None,
            **kwargs):
@@ -82,7 +39,8 @@ class FrameProcessor:
        self._report_only_initial_ttfb = False

        # Metrics
-        self._metrics = FrameProcessorMetrics(name=self.name)
+        self._start_ttfb_time = 0
+        self._should_report_ttfb = True

    @property
    def interruptions_allowed(self):
@@ -100,28 +58,16 @@ class FrameProcessor:
        return False

    async def start_ttfb_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            await self._metrics.start_ttfb_metrics(self._report_only_initial_ttfb)
+        if self.metrics_enabled and self._should_report_ttfb:
+            self._start_ttfb_time = time.time()
+            self._should_report_ttfb = not self._report_only_initial_ttfb

    async def stop_ttfb_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            frame = await self._metrics.stop_ttfb_metrics()
-            if frame:
-                await self.push_frame(frame)
-
-    async def start_processing_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            await self._metrics.start_processing_metrics()
-
-    async def stop_processing_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            frame = await self._metrics.stop_processing_metrics()
-            if frame:
-                await self.push_frame(frame)
-
-    async def stop_all_metrics(self):
-        await self.stop_ttfb_metrics()
-        await self.stop_processing_metrics()
+        if self.metrics_enabled and self._start_ttfb_time > 0:
+            ttfb = time.time() - self._start_ttfb_time
+            logger.debug(f"{self.name} TTFB: {ttfb}")
+            await self.push_frame(MetricsFrame(ttfb={self.name: ttfb}))
+            self._start_ttfb_time = 0

    async def cleanup(self):
        pass
@@ -139,8 +85,6 @@ class FrameProcessor:
            self._allow_interruptions = frame.allow_interruptions
            self._enable_metrics = frame.enable_metrics
            self._report_only_initial_ttfb = frame.report_only_initial_ttfb
-        elif isinstance(frame, StartInterruptionFrame):
-            await self.stop_all_metrics()
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self._should_report_ttfb = True

@@ -148,15 +92,12 @@ class FrameProcessor:
        await self.push_frame(error, FrameDirection.UPSTREAM)

    async def push_frame(self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM):
-        try:
-            if direction == FrameDirection.DOWNSTREAM and self._next:
-                logger.trace(f"Pushing {frame} from {self} to {self._next}")
-                await self._next.process_frame(frame, direction)
-            elif direction == FrameDirection.UPSTREAM and self._prev:
-                logger.trace(f"Pushing {frame} upstream from {self} to {self._prev}")
-                await self._prev.process_frame(frame, direction)
-        except Exception as e:
-            logger.exception(f"Uncaught exception in {self}: {e}")
+        if direction == FrameDirection.DOWNSTREAM and self._next:
+            logger.trace(f"Pushing {frame} from {self} to {self._next}")
+            await self._next.process_frame(frame, direction)
+        elif direction == FrameDirection.UPSTREAM and self._prev:
+            logger.trace(f"Pushing {frame} upstream from {self} to {self._prev}")
+            await self._prev.process_frame(frame, direction)

    def __str__(self):
        return self.name
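
Note on the TTFB bookkeeping that this change moves back into `FrameProcessor`: a start timestamp is recorded once, and stopping reports the elapsed time as a `{name: seconds}` mapping and resets the timestamp. The sketch below isolates that logic under stated assumptions. `TTFBTimer` and its injectable `clock` parameter are hypothetical, added only so the behaviour can be checked without real time passing.

```python
import time


class TTFBTimer:
    """Hypothetical helper mirroring the start/stop TTFB logic in the diff."""

    def __init__(self, name, report_only_initial=False, clock=time.time):
        self.name = name
        self._clock = clock  # assumption: injectable clock for testing
        self._report_only_initial = report_only_initial
        self._start = 0
        self._should_report = True

    def start(self):
        if self._should_report:
            self._start = self._clock()
            # With report_only_initial set, later starts become no-ops.
            self._should_report = not self._report_only_initial

    def stop(self):
        # A start time of 0 means "not started", as in the diff.
        if self._start > 0:
            ttfb = self._clock() - self._start
            self._start = 0
            return {self.name: ttfb}
        return None
```

In the diff, `_should_report_ttfb` is set back to `True` when a `UserStoppedSpeakingFrame` arrives; the sketch leaves that reset out.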
--- a/src/pipecat/processors/frameworks/langchain.py
+++ b/src/pipecat/processors/frameworks/langchain.py
@@ -75,6 +75,5 @@ class LangchainProcessor(FrameProcessor):
        except GeneratorExit:
            logger.warning(f"{self} generator was closed prematurely")
        except Exception as e:
-            logger.exception(f"{self} an unknown error occurred: {e}")
-        finally:
-            await self.push_frame(LLMFullResponseEndFrame())
+            logger.error(f"{self} an unknown error occurred: {e}")
+        await self.push_frame(LLMFullResponseEndFrame())
--- a/src/pipecat/processors/idle_frame_processor.py
+++ b/src/pipecat/processors/idle_frame_processor.py
@@ -1,76 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-
-from typing import Awaitable, Callable, List
-
-from pipecat.frames.frames import Frame, SystemFrame
-from pipecat.processors.async_frame_processor import AsyncFrameProcessor
-from pipecat.processors.frame_processor import FrameDirection
-
-
-class IdleFrameProcessor(AsyncFrameProcessor):
-    """This class waits to receive any frame or list of desired frames within a
-    given timeout. If the timeout is reached before receiving any of those
-    frames the provided callback will be called.
-
-    The callback can then be used to push frames downstream by using
-    `queue_frame()` (or `push_frame()` for system frames).
-
-    """
-
-    def __init__(
-            self,
-            *,
-            callback: Callable[["IdleFrameProcessor"], Awaitable[None]],
-            timeout: float,
-            types: List[type] = [],
-            **kwargs):
-        super().__init__(**kwargs)
-
-        self._callback = callback
-        self._timeout = timeout
-        self._types = types
-
-        self._create_idle_task()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-        else:
-            await self.queue_frame(frame, direction)
-
-        # If we are not waiting for any specific frame set the event, otherwise
-        # check if we have received one of the desired frames.
-        if not self._types:
-            self._idle_event.set()
-        else:
-            for t in self._types:
-                if isinstance(frame, t):
-                    self._idle_event.set()
-
-        # If we are not waiting for any specific frame set the event, otherwise
-    async def cleanup(self):
-        self._idle_task.cancel()
-        await self._idle_task
-
-    def _create_idle_task(self):
-        self._idle_event = asyncio.Event()
-        self._idle_task = self.get_event_loop().create_task(self._idle_task_handler())
-
-    async def _idle_task_handler(self):
-        while True:
-            try:
-                await asyncio.wait_for(self._idle_event.wait(), timeout=self._timeout)
-            except asyncio.TimeoutError:
-                await self._callback(self)
-            except asyncio.CancelledError:
-                break
-            finally:
-                self._idle_event.clear()
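
Note on the removed `IdleFrameProcessor`: the idle detection rests on `asyncio.wait_for()` around an `asyncio.Event`. Activity sets the event; if the timeout elapses first, the callback runs; either way the event is cleared before the next wait. A minimal sketch of that loop follows. `watch_idle`, `rounds` and `demo` are hypothetical names, and the bounded `rounds` loop replaces the original `while True` so the example terminates.

```python
import asyncio


async def watch_idle(event, timeout, on_idle, rounds):
    """Call on_idle() each time `timeout` elapses without `event` being set."""
    for _ in range(rounds):
        try:
            await asyncio.wait_for(event.wait(), timeout=timeout)
        except asyncio.TimeoutError:
            await on_idle()
        finally:
            # Reset so the next round waits for fresh activity.
            event.clear()


async def demo():
    calls = []

    async def on_idle():
        calls.append("idle")

    event = asyncio.Event()
    event.set()  # simulate activity before the first wait
    await watch_idle(event, timeout=0.05, on_idle=on_idle, rounds=2)
    return calls
```

In `demo()` the first round sees activity and stays silent, and the second round times out and triggers the callback once.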
--- a/src/pipecat/processors/user_idle_processor.py
+++ b/src/pipecat/processors/user_idle_processor.py
@@ -1,77 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-
-from typing import Awaitable, Callable
-
-from pipecat.frames.frames import BotSpeakingFrame, Frame, StartInterruptionFrame, StopInterruptionFrame, SystemFrame
-from pipecat.processors.async_frame_processor import AsyncFrameProcessor
-from pipecat.processors.frame_processor import FrameDirection
-
-
-class UserIdleProcessor(AsyncFrameProcessor):
-    """This class is useful to check if the user is interacting with the bot
-    within a given timeout. If the timeout is reached before any interaction
-    occurred the provided callback will be called.
-
-    The callback can then be used to push frames downstream by using
-    `queue_frame()` (or `push_frame()` for system frames).
-
-    """
-
-    def __init__(
-            self,
-            *,
-            callback: Callable[["UserIdleProcessor"], Awaitable[None]],
-            timeout: float,
-            **kwargs):
-        super().__init__(**kwargs)
-
-        self._callback = callback
-        self._timeout = timeout
-
-        self._interrupted = False
-
-        self._create_idle_task()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-        else:
-            await self.queue_frame(frame, direction)
-
-        # We shouldn't call the idle callback if the user or the bot are speaking.
-        if isinstance(frame, StartInterruptionFrame):
-            self._interrupted = True
-            self._idle_event.set()
-        elif isinstance(frame, StopInterruptionFrame):
-            self._interrupted = False
-            self._idle_event.set()
-        elif isinstance(frame, BotSpeakingFrame):
-            self._idle_event.set()
-
-    async def cleanup(self):
-        self._idle_task.cancel()
-        await self._idle_task
-
-    def _create_idle_task(self):
-        self._idle_event = asyncio.Event()
-        self._idle_task = self.get_event_loop().create_task(self._idle_task_handler())
-
-    async def _idle_task_handler(self):
-        while True:
-            try:
-                await asyncio.wait_for(self._idle_event.wait(), timeout=self._timeout)
-            except asyncio.TimeoutError:
-                if not self._interrupted:
-                    await self._callback(self)
-            except asyncio.CancelledError:
-                break
-            finally:
-                self._idle_event.clear()
--- a/src/pipecat/serializers/twilio.py
+++ b/src/pipecat/serializers/twilio.py
@@ -17,8 +17,8 @@ class TwilioFrameSerializer(FrameSerializer):
        AudioRawFrame: "audio",
    }

-    def __init__(self, stream_sid: str):
-        self._stream_sid = stream_sid
+    def __init__(self):
+        self._sid = None

    def serialize(self, frame: Frame) -> str | bytes | None:
        if not isinstance(frame, AudioRawFrame):
@@ -30,7 +30,7 @@ class TwilioFrameSerializer(FrameSerializer):
        payload = base64.b64encode(serialized_data).decode("utf-8")
        answer = {
            "event": "media",
-            "streamSid": self._stream_sid,
+            "streamSid": self._sid,
            "media": {
                "payload": payload
            }
@@ -41,6 +41,9 @@ class TwilioFrameSerializer(FrameSerializer):
    def deserialize(self, data: str | bytes) -> Frame | None:
        message = json.loads(data)

+        if not self._sid:
+            self._sid = message["streamSid"] if "streamSid" in message else None
+
        if message["event"] != "media":
            return None
        else:
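
Note on the `TwilioFrameSerializer` change: instead of receiving `stream_sid` in the constructor, the serializer now learns it from the first incoming message that carries a `streamSid` key, and outgoing media messages use whatever value has been seen so far. The sketch below shows that lazy capture in isolation. `SidTracker` and the sample messages are invented for illustration and are not part of pipecat.

```python
import json


class SidTracker:
    """Hypothetical helper: remember the first streamSid seen on input."""

    def __init__(self):
        self._sid = None

    def observe(self, data):
        message = json.loads(data)
        if not self._sid:
            # Stays None until a message actually contains "streamSid".
            self._sid = message.get("streamSid")
        return message

    def media_message(self, payload):
        return json.dumps({
            "event": "media",
            "streamSid": self._sid,
            "media": {"payload": payload},
        })
```

One consequence of this design is that anything serialized before a message with `streamSid` has been deserialized goes out with `"streamSid": null`.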
--- a/src/pipecat/services/ai_services.py
+++ b/src/pipecat/services/ai_services.py
@@ -16,15 +16,13 @@ from pipecat.frames.frames import (
    EndFrame,
    ErrorFrame,
    Frame,
-    LLMFullResponseEndFrame,
    StartFrame,
-    StartInterruptionFrame,
    TTSStartedFrame,
    TTSStoppedFrame,
    TextFrame,
    VisionImageRawFrame,
+    LLMFullResponseEndFrame,
 )
-from pipecat.processors.async_frame_processor import AsyncFrameProcessor
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.utils.audio import calculate_audio_volume
 from pipecat.utils.utils import exp_smoothing
@@ -61,30 +59,6 @@ class AIService(FrameProcessor):
                await self.push_frame(f)


-class AsyncAIService(AsyncFrameProcessor):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
-
-    async def start(self, frame: StartFrame):
-        pass
-
-    async def stop(self, frame: EndFrame):
-        pass
-
-    async def cancel(self, frame: CancelFrame):
-        pass
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, StartFrame):
-            await self.start(frame)
-        elif isinstance(frame, CancelFrame):
-            await self.cancel(frame)
-        elif isinstance(frame, EndFrame):
-            await self.stop(frame)
-
-
 class LLMService(AIService):
    """This class is a no-op but serves as a base class for LLM services."""

@@ -118,7 +92,7 @@ class LLMService(AIService):


 class TTSService(AIService):
-    def __init__(self, *, aggregate_sentences: bool = True, **kwargs):
+    def __init__(self, aggregate_sentences: bool = True, **kwargs):
        super().__init__(**kwargs)
        self._aggregate_sentences: bool = aggregate_sentences
        self._current_sentence: str = ""
@@ -140,21 +114,15 @@ class TTSService(AIService):
            if self._current_sentence.strip().endswith(
                    (".", "?", "!")) and not self._current_sentence.strip().endswith(
                    ("Mr,", "Mrs.", "Ms.", "Dr.")):
-                text = self._current_sentence
+                text = self._current_sentence.strip()
                self._current_sentence = ""

        if text:
            await self._push_tts_frames(text)

    async def _push_tts_frames(self, text: str):
-        text = text.strip()
-        if not text:
-            return
-
        await self.push_frame(TTSStartedFrame())
-        await self.start_processing_metrics()
        await self.process_generator(self.run_tts(text))
-        await self.stop_processing_metrics()
        await self.push_frame(TTSStoppedFrame())
        # We send the original text after the audio. This way, if we are
        # interrupted, the text is not added to the assistant context.
@@ -165,12 +133,14 @@ class TTSService(AIService):

        if isinstance(frame, TextFrame):
            await self._process_text_frame(frame)
-        elif isinstance(frame, StartInterruptionFrame):
-            self._current_sentence = ""
-            await self.push_frame(frame, direction)
-        elif isinstance(frame, LLMFullResponseEndFrame) or isinstance(frame, EndFrame):
-            self._current_sentence = ""
-            await self._push_tts_frames(self._current_sentence)
+        elif isinstance(frame, EndFrame):
+            if self._current_sentence:
+                await self._push_tts_frames(self._current_sentence)
+            await self.push_frame(frame)
+        elif isinstance(frame, LLMFullResponseEndFrame):
+            if self._current_sentence:
+                await self._push_tts_frames(self._current_sentence.strip())
+                self._current_sentence = ""
            await self.push_frame(frame)
        else:
            await self.push_frame(frame, direction)
@@ -180,7 +150,6 @@ class STTService(AIService):
    """STTService is a base class for speech-to-text services."""

    def __init__(self,
-                 *,
                 min_volume: float = 0.6,
                 max_silence_secs: float = 0.3,
                 max_buffer_secs: float = 1.5,
@@ -236,9 +205,7 @@ class STTService(AIService):
            self._silence_num_frames = 0
            self._wave.close()
            self._content.seek(0)
-            await self.start_processing_metrics()
            await self.process_generator(self.run_stt(self._content.read()))
-            await self.stop_processing_metrics()
            (self._content, self._wave) = self._new_wave()

    async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -271,9 +238,7 @@ class ImageGenService(AIService):

        if isinstance(frame, TextFrame):
            await self.push_frame(frame, direction)
-            await self.start_processing_metrics()
            await self.process_generator(self.run_image_gen(frame.text))
-            await self.stop_processing_metrics()
        else:
            await self.push_frame(frame, direction)

@@ -293,8 +258,6 @@ class VisionService(AIService):
        await super().process_frame(frame, direction)

        if isinstance(frame, VisionImageRawFrame):
-            await self.start_processing_metrics()
            await self.process_generator(self.run_vision(frame))
-            await self.stop_processing_metrics()
        else:
            await self.push_frame(frame, direction)
--- a/src/pipecat/services/anthropic.py
+++ b/src/pipecat/services/anthropic.py
@@ -41,7 +41,6 @@ class AnthropicLLMService(LLMService):

    def __init__(
            self,
-            *,
            api_key: str,
            model: str = "claude-3-opus-20240229",
            max_tokens: int = 1024):
@@ -123,7 +122,7 @@ class AnthropicLLMService(LLMService):
                    await self.push_frame(LLMResponseEndFrame())

        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")
        finally:
            await self.push_frame(LLMFullResponseEndFrame())

--- a/src/pipecat/services/azure.py
+++ b/src/pipecat/services/azure.py
@@ -12,18 +12,9 @@ import time
 from PIL import Image
 from typing import AsyncGenerator

-from pipecat.frames.frames import (
-    AudioRawFrame,
-    CancelFrame,
-    EndFrame,
-    ErrorFrame,
-    Frame,
-    StartFrame,
-    SystemFrame,
-    TranscriptionFrame,
-    URLImageRawFrame)
+from pipecat.frames.frames import AudioRawFrame, CancelFrame, EndFrame, ErrorFrame, Frame, StartFrame, SystemFrame, TranscriptionFrame, URLImageRawFrame
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.ai_services import AsyncAIService, TTSService, ImageGenService
+from pipecat.services.ai_services import AIService, TTSService, ImageGenService
 from pipecat.services.openai import BaseOpenAILLMService

 from loguru import logger
@@ -43,7 +34,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use Azure, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
+        "In order to use Azure TTS, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
    raise Exception(f"Missing module: {e}")


@@ -82,7 +73,7 @@ class AzureTTSService(TTSService):
        return True

    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
-        logger.debug(f"Generating TTS: [{text}]")
+        logger.debug(f"Generating TTS: {text}")

        await self.start_ttfb_metrics()

@@ -109,7 +100,7 @@ class AzureTTSService(TTSService):
                logger.error(f"{self} error: {cancellation_details.error_details}")


-class AzureSTTService(AsyncAIService):
+class AzureSTTService(AIService):
    def __init__(
            self,
            *,
@@ -132,6 +123,8 @@ class AzureSTTService(AsyncAIService):
            speech_config=speech_config, audio_config=audio_config)
        self._speech_recognizer.recognized.connect(self._on_handle_recognized)

+        self._create_push_task()
+
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

@@ -147,16 +140,34 @@ class AzureSTTService(AsyncAIService):

    async def stop(self, frame: EndFrame):
        self._speech_recognizer.stop_continuous_recognition_async()
-        self._audio_stream.close()
+        await self._push_queue.put((frame, FrameDirection.DOWNSTREAM))
+        await self._push_frame_task

    async def cancel(self, frame: CancelFrame):
        self._speech_recognizer.stop_continuous_recognition_async()
-        self._audio_stream.close()
+        self._push_frame_task.cancel()
+        await self._push_frame_task
+
+    def _create_push_task(self):
+        self._push_queue = asyncio.Queue()
+        self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
+
+    async def _push_frame_task_handler(self):
+        running = True
+        while running:
+            try:
+                (frame, direction) = await self._push_queue.get()
+                await self.push_frame(frame, direction)
+                running = not isinstance(frame, EndFrame)
+            except asyncio.CancelledError:
+                break

    def _on_handle_recognized(self, event):
        if event.result.reason == ResultReason.RecognizedSpeech and len(event.result.text) > 0:
+            direction = FrameDirection.DOWNSTREAM
            frame = TranscriptionFrame(event.result.text, "", int(time.time_ns() / 1000000))
-            asyncio.run_coroutine_threadsafe(self.queue_frame(frame), self.get_event_loop())
+            asyncio.run_coroutine_threadsafe(
+                self._push_queue.put((frame, direction)), self.get_event_loop())


 class AzureImageGenServiceREST(ImageGenService):
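
Note on the `_on_handle_recognized` change above: the Azure recognizer invokes its callback outside the asyncio event loop, which is why the hunk wraps the queue write in `asyncio.run_coroutine_threadsafe`. A minimal sketch of that handoff follows, assuming a plain worker thread as a stand-in for the SDK callback thread (`on_recognized` and the `"hello"` payload are hypothetical, not part of pipecat or the Azure SDK):

```python
import asyncio
import threading


async def main() -> list:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def on_recognized(text: str) -> None:
        # Runs on a non-asyncio thread (stand-in for an SDK callback).
        # asyncio.Queue is not thread-safe, so the put() coroutine is
        # scheduled on the loop instead of being called directly.
        asyncio.run_coroutine_threadsafe(queue.put(text), loop)

    worker = threading.Thread(target=on_recognized, args=("hello",))
    worker.start()
    worker.join()

    # Awaiting get() yields to the loop, which then runs the scheduled put().
    return [await queue.get()]


received = asyncio.run(main())
```

The design point is that only the event loop thread ever touches the queue; the callback thread merely schedules work on it.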
--- a/src/pipecat/services/cartesia.py
+++ b/src/pipecat/services/cartesia.py
@@ -4,11 +4,11 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-from cartesia import AsyncCartesia
+from cartesia.tts import AsyncCartesiaTTS

 from typing import AsyncGenerator

-from pipecat.frames.frames import AudioRawFrame, CancelFrame, EndFrame, Frame, StartFrame
+from pipecat.frames.frames import AudioRawFrame, Frame
 from pipecat.services.ai_services import TTSService

 from loguru import logger
@@ -20,57 +20,44 @@ class CartesiaTTSService(TTSService):
            self,
            *,
            api_key: str,
-            voice_id: str,
-            model_id: str = "sonic-english",
-            encoding: str = "pcm_s16le",
-            sample_rate: int = 16000,
+            voice_name: str,
+            model_id: str = "upbeat-moon",
+            output_format: str = "pcm_16000",
            **kwargs):
        super().__init__(**kwargs)

        self._api_key = api_key
-        self._voice_id = voice_id
+        self._voice_name = voice_name
        self._model_id = model_id
-        self._output_format = {
-            "container": "raw",
-            "encoding": encoding,
-            "sample_rate": sample_rate,
-        }
-        self._client = None
+        self._output_format = output_format
+
+        try:
+            self._client = AsyncCartesiaTTS(api_key=self._api_key)
+            voices = self._client.get_voices()
+            voice_id = voices[self._voice_name]["id"]
+            self._voice = self._client.get_voice_embedding(voice_id=voice_id)
+        except Exception as e:
+            logger.error(f"{self} initialization error: {e}")

    def can_generate_metrics(self) -> bool:
        return True

-    async def start(self, frame: StartFrame):
-        try:
-            self._client = AsyncCartesia(api_key=self._api_key)
-            self._voice = self._client.voices.get(id=self._voice_id)
-        except Exception as e:
-            logger.exception(f"{self} initialization error: {e}")
-
-    async def stop(self, frame: EndFrame):
-        if self._client:
-            await self._client.close()
-
-    async def cancel(self, frame: CancelFrame):
-        if self._client:
-            await self._client.close()
-
    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        logger.debug(f"Generating TTS: [{text}]")

        try:
            await self.start_ttfb_metrics()

-            chunk_generator = await self._client.tts.sse(
+            chunk_generator = await self._client.generate(
                stream=True,
                transcript=text,
-                voice_embedding=self._voice["embedding"],
+                voice=self._voice,
                model_id=self._model_id,
                output_format=self._output_format,
            )

            async for chunk in chunk_generator:
                await self.stop_ttfb_metrics()
-                yield AudioRawFrame(chunk["audio"], self._output_format["sample_rate"], 1)
+                yield AudioRawFrame(chunk["audio"], chunk["sampling_rate"], 1)
        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")
--- a/src/pipecat/services/deepgram.py
+++ b/src/pipecat/services/deepgram.py
@@ -5,6 +5,7 @@
 #

 import aiohttp
+import asyncio
 import time

 from typing import AsyncGenerator
@@ -20,24 +21,17 @@ from pipecat.frames.frames import (
    SystemFrame,
    TranscriptionFrame)
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.ai_services import AsyncAIService, TTSService
+from pipecat.services.ai_services import AIService, TTSService
+
+from deepgram import (
+    DeepgramClient,
+    DeepgramClientOptions,
+    LiveTranscriptionEvents,
+    LiveOptions,
+)

 from loguru import logger

-# See .env.example for Deepgram configuration needed
-try:
-    from deepgram import (
-        DeepgramClient,
-        DeepgramClientOptions,
-        LiveTranscriptionEvents,
-        LiveOptions,
-    )
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error(
-        "In order to use Deepgram, you need to `pip install pipecat-ai[deepgram]`. Also, set `DEEPGRAM_API_KEY` environment variable.")
-    raise Exception(f"Missing module: {e}")
-

 class DeepgramTTSService(TTSService):

@@ -89,12 +83,11 @@ class DeepgramTTSService(TTSService):
                    frame = AudioRawFrame(audio=data, sample_rate=16000, num_channels=1)
                    yield frame
        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")


-class DeepgramSTTService(AsyncAIService):
+class DeepgramSTTService(AIService):
    def __init__(self,
-                 *,
                 api_key: str,
                 url: str = "",
                 live_options: LiveOptions = LiveOptions(
@@ -116,6 +109,8 @@ class DeepgramSTTService(AsyncAIService):
        self._connection = self._client.listen.asynclive.v("1")
        self._connection.on(LiveTranscriptionEvents.Transcript, self._on_message)

+        self._create_push_task()
+
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

@@ -124,7 +119,7 @@ class DeepgramSTTService(AsyncAIService):
        elif isinstance(frame, AudioRawFrame):
            await self._connection.send(frame.audio)
        else:
-            await self.queue_frame(frame, direction)
+            await self._push_queue.put((frame, direction))

    async def start(self, frame: StartFrame):
        if await self._connection.start(self._live_options):
@@ -134,9 +129,27 @@ class DeepgramSTTService(AsyncAIService):

    async def stop(self, frame: EndFrame):
        await self._connection.finish()
+        await self._push_queue.put((frame, FrameDirection.DOWNSTREAM))
+        await self._push_frame_task

    async def cancel(self, frame: CancelFrame):
        await self._connection.finish()
+        self._push_frame_task.cancel()
+        await self._push_frame_task
+
+    def _create_push_task(self):
+        self._push_queue = asyncio.Queue()
+        self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
+
+    async def _push_frame_task_handler(self):
+        running = True
+        while running:
+            try:
+                (frame, direction) = await self._push_queue.get()
+                await self.push_frame(frame, direction)
+                running = not isinstance(frame, EndFrame)
+            except asyncio.CancelledError:
+                break

    async def _on_message(self, *args, **kwargs):
        result = kwargs["result"]
@@ -144,6 +157,6 @@ class DeepgramSTTService(AsyncAIService):
        transcript = result.channel.alternatives[0].transcript
        if len(transcript) > 0:
            if is_final:
-                await self.queue_frame(TranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
+                await self._push_queue.put((TranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)), FrameDirection.DOWNSTREAM))
            else:
-                await self.queue_frame(InterimTranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
+                await self._push_queue.put((InterimTranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)), FrameDirection.DOWNSTREAM))
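
Note on the push task added above: `_push_frame_task_handler` drains a queue from a single task, stops after it has pushed an `EndFrame`, and exits quietly when cancelled. The sketch below reproduces that control flow in isolation, assuming a hypothetical `EndFrame` stand-in and a list in place of `push_frame`:

```python
import asyncio


class EndFrame:
    """Hypothetical stand-in for pipecat's EndFrame."""


async def push_task_handler(queue: asyncio.Queue, pushed: list) -> None:
    running = True
    while running:
        try:
            frame = await queue.get()
            pushed.append(frame)  # stand-in for self.push_frame(frame, direction)
            # The EndFrame itself is pushed before the loop stops.
            running = not isinstance(frame, EndFrame)
        except asyncio.CancelledError:
            break


async def run_until_end() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    pushed: list = []
    task = asyncio.get_running_loop().create_task(push_task_handler(queue, pushed))
    await queue.put("transcription")
    await queue.put(EndFrame())
    await task  # completes once the EndFrame has been pushed
    return pushed


async def run_until_cancelled() -> bool:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.get_running_loop().create_task(push_task_handler(queue, []))
    await asyncio.sleep(0)  # let the task start and block on get()
    task.cancel()
    await task  # returns normally: the handler swallows CancelledError
    return task.done() and not task.cancelled()


pushed = asyncio.run(run_until_end())
cancelled_cleanly = asyncio.run(run_until_cancelled())
```

Because the handler catches `CancelledError` itself, `await self._push_frame_task` in `cancel()` does not re-raise, provided the task has already started running when it is cancelled.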
--- a/src/pipecat/services/fal.py
+++ b/src/pipecat/services/fal.py
@@ -56,7 +56,7 @@ class FalImageGenService(ImageGenService):

        response = await fal_client.run_async(
            self._model,
-            arguments={"prompt": prompt, **self._params.model_dump(exclude_none=True)}
+            arguments={"prompt": prompt, **self._params.model_dump()}
        )

        image_url = response["images"][0]["url"] if response else None
--- a/src/pipecat/services/fireworks.py
+++ b/src/pipecat/services/fireworks.py
@@ -19,7 +19,6 @@ except ModuleNotFoundError as e:

 class FireworksLLMService(BaseOpenAILLMService):
    def __init__(self,
-                 *,
                 model: str = "accounts/fireworks/models/firefunction-v1",
                 base_url: str = "https://api.fireworks.ai/inference/v1"):
        super().__init__(model, base_url)
--- a/src/pipecat/services/gladia.py
+++ b/src/pipecat/services/gladia.py
@@ -1,115 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import base64
-import json
-import time
-
-from typing import Optional
-from pydantic.main import BaseModel
-
-from pipecat.frames.frames import (
-    AudioRawFrame,
-    CancelFrame,
-    EndFrame,
-    Frame,
-    InterimTranscriptionFrame,
-    StartFrame,
-    SystemFrame,
-    TranscriptionFrame)
-from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.ai_services import AsyncAIService
-
-from loguru import logger
-
-# See .env.example for Gladia configuration needed
-try:
-    import websockets
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error(
-        "In order to use Gladia, you need to `pip install pipecat-ai[gladia]`. Also, set `GLADIA_API_KEY` environment variable.")
-    raise Exception(f"Missing module: {e}")
-
-
-class GladiaSTTService(AsyncAIService):
-    class InputParams(BaseModel):
-        sample_rate: Optional[int] = 16000
-        language: Optional[str] = "english"
-        transcription_hint: Optional[str] = None
-        endpointing: Optional[int] = 200
-        prosody: Optional[bool] = None
-
-    def __init__(self,
-                 *,
-                 api_key: str,
-                 url: str = "wss://api.gladia.io/audio/text/audio-transcription",
-                 confidence: float = 0.5,
-                 params: InputParams = InputParams(),
-                 **kwargs):
-        super().__init__(**kwargs)
-
-        self._api_key = api_key
-        self._url = url
-        self._params = params
-        self._confidence = confidence
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-        elif isinstance(frame, AudioRawFrame):
-            await self._send_audio(frame)
-        else:
-            await self.queue_frame(frame, direction)
-
-    async def start(self, frame: StartFrame):
-        self._websocket = await websockets.connect(self._url)
-        self._receive_task = self.get_event_loop().create_task(self._receive_task_handler())
-        await self._setup_gladia()
-
-    async def stop(self, frame: EndFrame):
-        await self._websocket.close()
-
-    async def cancel(self, frame: CancelFrame):
-        await self._websocket.close()
-
-    async def _setup_gladia(self):
-        configuration = {
-            "x_gladia_key": self._api_key,
-            "encoding": "WAV/PCM",
-            "model_type": "fast",
-            "language_behaviour": "manual",
-            **self._params.model_dump(exclude_none=True)
-        }
-
-        await self._websocket.send(json.dumps(configuration))
-
-    async def _send_audio(self, frame: AudioRawFrame):
-        message = {
-            'frames': base64.b64encode(frame.audio).decode("utf-8")
-        }
-        await self._websocket.send(json.dumps(message))
-
-    async def _receive_task_handler(self):
-        async for message in self._websocket:
-            utterance = json.loads(message)
-            if not utterance:
-                continue
-
-            if "error" in utterance:
-                message = utterance["message"]
-                logger.error(f"Gladia error: {message}")
-            elif "confidence" in utterance:
-                type = utterance["type"]
-                confidence = utterance["confidence"]
-                transcript = utterance["transcription"]
-                if confidence >= self._confidence:
-                    if type == "final":
-                        await self.queue_frame(TranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
-                    else:
-                        await self.queue_frame(InterimTranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
--- a/src/pipecat/services/google.py
+++ b/src/pipecat/services/google.py
@@ -42,7 +42,7 @@ class GoogleLLMService(LLMService):
    franca for all LLM services, so that it is easy to switch between different LLMs.
    """

-    def __init__(self, *, api_key: str, model: str = "gemini-1.5-flash-latest", **kwargs):
+    def __init__(self, api_key: str, model: str = "gemini-1.5-flash-latest", **kwargs):
        super().__init__(**kwargs)
        gai.configure(api_key=api_key)
        self._client = gai.GenerativeModel(model)
@@ -104,10 +104,10 @@ class GoogleLLMService(LLMService):
                        logger.debug(
                            f"LLM refused to generate content for safety reasons - {messages}.")
                    else:
-                        logger.exception(f"{self} error: {e}")
+                        logger.error(f"{self} error: {e}")

        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")
        finally:
            await self.push_frame(LLMFullResponseEndFrame())

--- a/src/pipecat/services/moondream.py
+++ b/src/pipecat/services/moondream.py
@@ -46,7 +46,6 @@ def detect_device():
 class MoondreamService(VisionService):
    def __init__(
        self,
-            *,
        model="vikhyatk/moondream2",
        revision="2024-04-02",
        use_cpu=False
--- a/src/pipecat/services/ollama.py
+++ b/src/pipecat/services/ollama.py
@@ -9,5 +9,5 @@ from pipecat.services.openai import BaseOpenAILLMService

 class OLLamaLLMService(BaseOpenAILLMService):

-    def __init__(self, *, model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
+    def __init__(self, model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
        super().__init__(model=model, base_url=base_url, api_key="ollama")
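
Note on the removed `*` markers in this and the neighbouring service constructors: a bare `*` makes every following parameter keyword-only, so dropping it lets callers pass `model` and `base_url` positionally. A small sketch of the difference, using hypothetical free functions rather than the real constructors (the `example.invalid` URL is a placeholder):

```python
def keyword_only(*, model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
    return model, base_url


def positional_ok(model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
    return model, base_url


# With the bare `*`, a positional argument is rejected at call time.
try:
    keyword_only("llama3")
    keyword_only_accepts_positional = True
except TypeError:
    keyword_only_accepts_positional = False

# Without it, a URL passed first is silently bound to `model`.
swapped = positional_ok("http://example.invalid/v1")
```

The keyword-only form guards against the silent mix-up shown in the last line; the positional form is shorter at the call site.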
--- a/src/pipecat/services/openai.py
+++ b/src/pipecat/services/openai.py
@@ -9,7 +9,7 @@ import base64
 import io
 import json

-from typing import AsyncGenerator, List, Literal
+from typing import Any, AsyncGenerator, List, Literal

 from loguru import logger
 from PIL import Image
@@ -53,7 +53,7 @@ except ModuleNotFoundError as e:
    raise Exception(f"Missing module: {e}")


-class OpenAIUnhandledFunctionException(Exception):
+class OpenAIUnhandledFunctionException(BaseException):
    pass


@@ -67,7 +67,7 @@ class BaseOpenAILLMService(LLMService):
    calls from the LLM.
    """

-    def __init__(self, *, model: str, api_key=None, base_url=None, **kwargs):
+    def __init__(self, model: str, api_key=None, base_url=None, **kwargs):
        super().__init__(**kwargs)
        self._model: str = model
        self._client = self.create_client(api_key=api_key, base_url=base_url, **kwargs)
@@ -109,7 +109,10 @@ class BaseOpenAILLMService(LLMService):
                del message["data"]
                del message["mime_type"]

-        chunks = await self.get_chat_completions(context, messages)
+        try:
+            chunks = await self.get_chat_completions(context, messages)
+        except Exception as e:
+            logger.error(f"{self} exception: {e}")

        return chunks

@@ -211,7 +214,7 @@ class BaseOpenAILLMService(LLMService):
        elif isinstance(result, type(None)):
            pass
        else:
-            raise TypeError(f"Unknown return type from function callback: {type(result)}")
+            raise BaseException(f"Unknown return type from function callback: {type(result)}")

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
@@ -228,16 +231,14 @@ class BaseOpenAILLMService(LLMService):

        if context:
            await self.push_frame(LLMFullResponseStartFrame())
-            await self.start_processing_metrics()
            await self._process_context(context)
-            await self.stop_processing_metrics()
            await self.push_frame(LLMFullResponseEndFrame())


 class OpenAILLMService(BaseOpenAILLMService):

-    def __init__(self, *, model: str = "gpt-4o", **kwargs):
-        super().__init__(model=model, **kwargs)
+    def __init__(self, model="gpt-4o", **kwargs):
+        super().__init__(model, **kwargs)


 class OpenAIImageGenService(ImageGenService):
@@ -333,4 +334,4 @@ class OpenAITTSService(TTSService):
                        frame = AudioRawFrame(chunk, 24_000, 1)
                        yield frame
        except BadRequestError as e:
-            logger.exception(f"{self} error generating TTS: {e}")
+            logger.error(f"{self} error generating TTS: {e}")
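
Note on the `OpenAIUnhandledFunctionException(BaseException)` change above: an exception derived from `BaseException` is not caught by `except Exception` handlers, so it propagates past most of the handlers in this file. A minimal sketch with two hypothetical exception classes:

```python
class DerivedFromException(Exception):
    pass


class DerivedFromBaseException(BaseException):
    pass


def caught_by_except_exception(exc_type) -> bool:
    """Return True if `except Exception` intercepts exc_type."""
    try:
        try:
            raise exc_type("boom")
        except Exception:
            return True
    except BaseException:
        # Reached only when the inner handler let the exception through.
        return False


exception_is_caught = caught_by_except_exception(DerivedFromException)
base_exception_is_caught = caught_by_except_exception(DerivedFromBaseException)
```

The same rule applies to the `raise BaseException(...)` line in the function-callback branch: only a bare `except:` or `except BaseException:` will see it.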
--- a/src/pipecat/services/openpipe.py
+++ b/src/pipecat/services/openpipe.py
@@ -25,7 +25,6 @@ class OpenPipeLLMService(BaseOpenAILLMService):

    def __init__(
            self,
-            *,
            model: str = "gpt-4o",
            api_key: str | None = None,
            base_url: str | None = None,
@@ -34,9 +33,9 @@ class OpenPipeLLMService(BaseOpenAILLMService):
            tags: Dict[str, str] | None = None,
            **kwargs):
        super().__init__(
-            model=model,
-            api_key=api_key,
-            base_url=base_url,
+            model,
+            api_key,
+            base_url,
            openpipe_api_key=openpipe_api_key,
            openpipe_base_url=openpipe_base_url,
            **kwargs)
--- a/src/pipecat/services/playht.py
+++ b/src/pipecat/services/playht.py
@@ -80,4 +80,4 @@ class PlayHTTTSService(TTSService):
                        frame = AudioRawFrame(chunk, 16000, 1)
                        yield frame
        except Exception as e:
-            logger.exception(f"{self} error generating TTS: {e}")
+            logger.error(f"{self} error generating TTS: {e}")
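
Note on the `logger.exception` to `logger.error` replacements throughout this diff: the two log at the same level, but `exception` also records the active traceback. The sketch below shows the difference using the standard library `logging` module as an analogue; pipecat itself uses loguru, whose `logger.exception` behaves similarly in this respect, and the `fail` helper is hypothetical:

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger("traceback_demo")
log.setLevel(logging.ERROR)
log.propagate = False
log.addHandler(logging.StreamHandler(stream))


def fail() -> None:
    raise ValueError("bad chunk")


# error(): only the formatted message is written.
try:
    fail()
except Exception as e:
    log.error(f"error generating TTS: {e}")
error_output = stream.getvalue()

# Reset the buffer before the second capture.
stream.seek(0)
stream.truncate(0)

# exception(): the message plus the traceback of the active exception.
try:
    fail()
except Exception as e:
    log.exception(f"error generating TTS: {e}")
exception_output = stream.getvalue()
```

With `error`, the log keeps the message but loses the call stack that led to it.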
--- a/src/pipecat/services/whisper.py
+++ b/src/pipecat/services/whisper.py
@@ -42,8 +42,7 @@ class WhisperSTTService(STTService):
    """Class to transcribe audio with a locally-downloaded Whisper model"""

    def __init__(self,
-                 *,
-                 model: str | Model = Model.DISTIL_MEDIUM_EN,
+                 model: Model = Model.DISTIL_MEDIUM_EN,
                 device: str = "auto",
                 compute_type: str = "default",
                 no_speech_prob: float = 0.4,
@@ -52,7 +51,7 @@ class WhisperSTTService(STTService):
        super().__init__(**kwargs)
        self._device: str = device
        self._compute_type = compute_type
-        self._model_name: str | Model = model
+        self._model_name: Model = model
        self._no_speech_prob = no_speech_prob
        self._model: WhisperModel | None = None
        self._load()
@@ -65,7 +64,7 @@ class WhisperSTTService(STTService):
        this model is being run, it will take time to download."""
        logger.debug("Loading Whisper model...")
        self._model = WhisperModel(
-            self._model_name.value if isinstance(self._model_name, Enum) else self._model_name,
+            self._model_name.value,
            device=self._device,
            compute_type=self._compute_type)
        logger.debug("Loaded Whisper model")
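
Note on the `model: str | Model` to `model: Model` change above: the removed line accepted either an enum member or a plain string and resolved both to a name. A sketch of that resolution, assuming a hypothetical `Model` enum whose values are placeholders rather than the real ones:

```python
from enum import Enum


class Model(Enum):
    # Placeholder values for illustration only.
    TINY = "tiny"
    DISTIL_MEDIUM_EN = "distil-medium.en"


def resolve_model_name(model) -> str:
    # Enum members carry the name in .value; strings are used as-is.
    return model.value if isinstance(model, Enum) else model


from_enum = resolve_model_name(Model.DISTIL_MEDIUM_EN)
from_string = resolve_model_name("my-custom-model")
```

After this change, passing a plain string would fail on `self._model_name.value`, since `str` has no `value` attribute.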
--- a/src/pipecat/services/xtts.py
+++ b/src/pipecat/services/xtts.py
@@ -1,112 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import aiohttp
-
-from typing import AsyncGenerator
-
-from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame
-from pipecat.services.ai_services import TTSService
-
-from loguru import logger
-
-import requests
-
-import numpy as np
-
-try:
-    import resampy
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error("In order to use XTTS, you need to `pip install pipecat-ai[xtts]`.")
-    raise Exception(f"Missing module: {e}")
-
-
-# The server below can connect to XTTS through a local running docker
-#
-# Docker command: $ docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121
-#
-# You can find more information on the official repo:
-# https://github.com/coqui-ai/xtts-streaming-server
-
-
-class XTTSService(TTSService):
-
-    def __init__(
-            self,
-            *,
-            aiohttp_session: aiohttp.ClientSession,
-            voice_id: str,
-            language: str,
-            base_url: str,
-            **kwargs):
-        super().__init__(**kwargs)
-
-        self._voice_id = voice_id
-        self._language = language
-        self._base_url = base_url
-        self._aiohttp_session = aiohttp_session
-        self._studio_speakers = requests.get(self._base_url + "/studio_speakers").json()
-
-    def can_generate_metrics(self) -> bool:
-        return True
-
-    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
-        logger.debug(f"Generating TTS: [{text}]")
-        embeddings = self._studio_speakers[self._voice_id]
-
-        url = self._base_url + "/tts_stream"
-
-        payload = {
-            "text": text.replace('.', '').replace('*', ''),
-            "language": self._language,
-            "speaker_embedding": embeddings["speaker_embedding"],
-            "gpt_cond_latent": embeddings["gpt_cond_latent"],
-            "add_wav_header": False,
-            "stream_chunk_size": 20,
-        }
-
-        await self.start_ttfb_metrics()
-
-        async with self._aiohttp_session.post(url, json=payload) as r:
-            if r.status != 200:
-                text = await r.text()
-                logger.error(f"{self} error getting audio (status: {r.status}, error: {text})")
-                yield ErrorFrame(f"Error getting audio (status: {r.status}, error: {text})")
-                return
-
-            buffer = bytearray()
-
-            async for chunk in r.content.iter_chunked(1024):
-                if len(chunk) > 0:
-                    await self.stop_ttfb_metrics()
-                    # Append new chunk to the buffer
-                    buffer.extend(chunk)
-
-                    # Check if buffer has enough data for processing
-                    while len(buffer) >= 48000:  # Assuming at least 0.5 seconds of audio data at 24000 Hz
-                        # Process the buffer up to a safe size for resampling
-                        process_data = buffer[:48000]
-                        # Remove processed data from buffer
-                        buffer = buffer[48000:]
-
-                        # Convert the byte data to numpy array for resampling
-                        audio_np = np.frombuffer(process_data, dtype=np.int16)
-                        # Resample the audio from 24000 Hz to 16000 Hz
-                        resampled_audio = resampy.resample(audio_np, 24000, 16000)
-                        # Convert the numpy array back to bytes
-                        resampled_audio_bytes = resampled_audio.astype(np.int16).tobytes()
-                        # Create the frame with the resampled audio
-                        frame = AudioRawFrame(resampled_audio_bytes, 16000, 1)
-                        yield frame
-
-            # Process any remaining data in the buffer
-            if len(buffer) > 0:
-                audio_np = np.frombuffer(buffer, dtype=np.int16)
-                resampled_audio = resampy.resample(audio_np, 24000, 16000)
-                resampled_audio_bytes = resampled_audio.astype(np.int16).tobytes()
-                frame = AudioRawFrame(resampled_audio_bytes, 16000, 1)
-                yield frame
--- a/src/pipecat/transports/base_input.py
+++ b/src/pipecat/transports/base_input.py
@@ -55,7 +55,7 @@ class BaseInputTransport(FrameProcessor):

    async def push_audio_frame(self, frame: AudioRawFrame):
        if self._params.audio_in_enabled or self._params.vad_enabled:
-            await self._audio_in_queue.put(frame)
+            self._audio_in_queue.put_nowait(frame)

    #
    # Frame processor
@@ -113,15 +113,10 @@ class BaseInputTransport(FrameProcessor):
            # Make sure we notify about interruptions quickly out-of-band
            if isinstance(frame, UserStartedSpeakingFrame):
                logger.debug("User started speaking")
-                # Cancel the task. This will stop pushing frames downstream.
                self._push_frame_task.cancel()
                await self._push_frame_task
-                # Push an out-of-band frame (i.e. not using the ordered push
-                # frame task) to stop everything, specially at the output
-                # transport.
-                await self.push_frame(StartInterruptionFrame())
-                # Create a new queue and task.
                self._create_push_task()
+                await self.push_frame(StartInterruptionFrame())
            elif isinstance(frame, UserStoppedSpeakingFrame):
                logger.debug("User stopped speaking")
                await self.push_frame(StopInterruptionFrame())
@@ -173,5 +168,5 @@ class BaseInputTransport(FrameProcessor):
                    await self._internal_push_frame(frame)
            except asyncio.CancelledError:
                break
-            except Exception as e:
-                logger.exception(f"{self} error reading audio frames: {e}")
+            except BaseException as e:
+                logger.error(f"{self} error reading audio frames: {e}")
--- a/src/pipecat/transports/base_output.py
+++ b/src/pipecat/transports/base_output.py
@@ -14,7 +14,6 @@ from typing import List
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.frames.frames import (
    AudioRawFrame,
-    BotSpeakingFrame,
    CancelFrame,
    MetricsFrame,
    SpriteFrame,
@@ -181,8 +180,8 @@ class BaseOutputTransport(FrameProcessor):
                self._sink_queue.task_done()
            except asyncio.CancelledError:
                break
-            except Exception as e:
-                logger.exception(f"{self} error processing sink queue: {e}")
+            except BaseException as e:
+                logger.error(f"{self} error processing sink queue: {e}")

    #
    # Push frames task
@@ -251,7 +250,7 @@ class BaseOutputTransport(FrameProcessor):
            except asyncio.CancelledError:
                break
            except Exception as e:
-                logger.exception(f"{self} error writing to camera: {e}")
+                logger.error(f"{self} error writing to camera: {e}")

    #
    # Audio out
@@ -264,5 +263,4 @@ class BaseOutputTransport(FrameProcessor):
        if len(buffer) >= self._audio_chunk_size:
            await self.write_raw_audio_frames(bytes(buffer[:self._audio_chunk_size]))
            buffer = buffer[self._audio_chunk_size:]
-            await self.push_frame(BotSpeakingFrame(), FrameDirection.UPSTREAM)
        return buffer
--- a/src/pipecat/transports/base_transport.py
+++ b/src/pipecat/transports/base_transport.py
@@ -82,4 +82,5 @@ class BaseTransport(ABC):
                else:
                    handler(self, *args, **kwargs)
        except Exception as e:
-            logger.exception(f"Exception in event handler {event_name}: {e}")
+            logger.error(f"Exception in event handler {event_name}: {e}")
+            raise e
--- a/src/pipecat/transports/network/fastapi_websocket.py
+++ b/src/pipecat/transports/network/fastapi_websocket.py
@@ -12,6 +12,7 @@ import wave
 from typing import Awaitable, Callable
 from pydantic.main import BaseModel

+from pipecat.serializers.twilio import TwilioFrameSerializer
 from pipecat.frames.frames import AudioRawFrame, StartFrame
 from pipecat.processors.frame_processor import FrameProcessor
 from pipecat.serializers.base_serializer import FrameSerializer
@@ -34,7 +35,7 @@ except ModuleNotFoundError as e:
 class FastAPIWebsocketParams(TransportParams):
    add_wav_header: bool = False
    audio_frame_size: int = 6400  # 200ms
-    serializer: FrameSerializer
+    serializer: FrameSerializer = TwilioFrameSerializer()


 class FastAPIWebsocketCallbacks(BaseModel):
@@ -113,7 +114,7 @@ class FastAPIWebsocketOutputTransport(BaseOutputTransport):
                frame = wav_frame

            payload = self._params.serializer.serialize(frame)
-            if payload and self._websocket.client_state == WebSocketState.CONNECTED:
+            if payload:
                await self._websocket.send_text(payload)

            self._audio_buffer = self._audio_buffer[self._params.audio_frame_size:]
@@ -124,7 +125,7 @@ class FastAPIWebsocketTransport(BaseTransport):
    def __init__(
            self,
            websocket: WebSocket,
-            params: FastAPIWebsocketParams,
+            params: FastAPIWebsocketParams = FastAPIWebsocketParams(),
            input_name: str | None = None,
            output_name: str | None = None,
            loop: asyncio.AbstractEventLoop | None = None):
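
Note on the `params: FastAPIWebsocketParams = FastAPIWebsocketParams()` default above: Python evaluates a default argument once, when the function is defined, so every call that omits `params` receives the same object. A minimal sketch with hypothetical `Params`, `Transport` and `SaferTransport` classes (a dataclass stands in for the pydantic model):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Params:
    tags: list = field(default_factory=list)


class Transport:
    def __init__(self, params: Params = Params()):  # evaluated once, at definition
        self.params = params


class SaferTransport:
    def __init__(self, params: Optional[Params] = None):
        # A fresh Params per instance when none is supplied.
        self.params = params if params is not None else Params()


a, b = Transport(), Transport()
a.params.tags.append("x")
shared_default = a.params is b.params
leaked_tags = b.params.tags

c, d = SaferTransport(), SaferTransport()
c.params.tags.append("x")
isolated_tags = d.params.tags
```

Whether sharing matters depends on whether the params object is ever mutated; the `None` sentinel avoids the question.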
--- a/src/pipecat/transports/network/websocket_server.py
+++ b/src/pipecat/transports/network/websocket_server.py
@@ -124,9 +124,6 @@ class WebsocketServerOutputTransport(BaseOutputTransport):
        self._websocket = websocket

    async def write_raw_audio_frames(self, frames: bytes):
-        if not self._websocket:
-            return
-
        self._audio_buffer += frames
        while len(self._audio_buffer) >= self._params.audio_frame_size:
            frame = AudioRawFrame(
@@ -151,8 +148,8 @@ class WebsocketServerOutputTransport(BaseOutputTransport):
                frame = wav_frame

            proto = self._params.serializer.serialize(frame)
-            if proto:
-                await self._websocket.send(proto)
+
+            await self._websocket.send(proto)

            self._audio_buffer = self._audio_buffer[self._params.audio_frame_size:]

--- a/src/pipecat/transports/services/daily.py
+++ b/src/pipecat/transports/services/daily.py
@@ -9,7 +9,7 @@ import asyncio
 import time

 from dataclasses import dataclass
-from typing import Any, Awaitable, Callable, Mapping, Optional
+from typing import Any, Awaitable, Callable, Mapping
 from concurrent.futures import ThreadPoolExecutor

 from daily import (
@@ -59,8 +59,8 @@ class DailyTransportMessageFrame(TransportMessageFrame):

 class WebRTCVADAnalyzer(VADAnalyzer):

-    def __init__(self, *, sample_rate=16000, num_channels=1, params: VADParams = VADParams()):
-        super().__init__(sample_rate=sample_rate, num_channels=num_channels, params=params)
+    def __init__(self, sample_rate=16000, num_channels=1, params: VADParams = VADParams()):
+        super().__init__(sample_rate, num_channels, params)

        self._webrtc_vad = Daily.create_native_vad(
            reset_period_ms=VAD_RESET_PERIOD_MS,
@@ -101,7 +101,7 @@ class DailyTranscriptionSettings(BaseModel):
 class DailyParams(TransportParams):
    api_url: str = "https://api.daily.co/v1"
    api_key: str = ""
-    dialin_settings: Optional[DailyDialinSettings] = None
+    dialin_settings: DailyDialinSettings | None = None
    transcription_enabled: bool = False
    transcription_settings: DailyTranscriptionSettings = DailyTranscriptionSettings()

@@ -199,9 +199,6 @@ class DailyTransportClient(EventHandler):
        self._callbacks = callbacks

    async def send_message(self, frame: DailyTransportMessageFrame):
-        if not self._client:
-            return
-
        future = self._loop.create_future()
        self._client.send_app_message(
            frame.message,
@@ -212,18 +209,19 @@ class DailyTransportClient(EventHandler):
    async def read_next_audio_frame(self) -> AudioRawFrame | None:
        sample_rate = self._params.audio_in_sample_rate
        num_channels = self._params.audio_in_channels
-        num_frames = int(sample_rate / 100) * 2  # 20ms of audio

-        future = self._loop.create_future()
-        self._speaker.read_frames(num_frames, completion=completion_callback(future))
-        audio = await future
+        if self._other_participant_has_joined:
+            num_frames = int(sample_rate / 100) * 2  # 20ms of audio
+
+            future = self._loop.create_future()
+            self._speaker.read_frames(num_frames, completion=completion_callback(future))
+            audio = await future

-        if len(audio) > 0:
            return AudioRawFrame(audio=audio, sample_rate=sample_rate, num_channels=num_channels)
        else:
-            # If we don't read any audio it could be there's no participant
-            # connected. daily-python will return immediately if that's the
-            # case, so let's sleep for a little bit (i.e. busy wait).
+            # If no one has ever joined the meeting `read_frames()` would block,
+            # instead we just wait a bit. daily-python should probably return
+            # silence instead.
            await asyncio.sleep(0.01)
            return None

@@ -268,7 +266,7 @@ class DailyTransportClient(EventHandler):
                    logger.info(
                        f"Enabling transcription with settings {self._params.transcription_settings}")
                    self._client.start_transcription(
-                        self._params.transcription_settings.model_dump(exclude_none=True))
+                        self._params.transcription_settings.model_dump())

                await self._callbacks.on_joined(data["participants"]["local"])
            else:
@@ -659,11 +657,11 @@ class DailyOutputTransport(BaseOutputTransport):
        await self._client.send_message(frame)

    async def send_metrics(self, frame: MetricsFrame):
+        ttfb = [{"name": n, "time": t} for n, t in frame.ttfb.items()]
        message = DailyTransportMessageFrame(message={
            "type": "pipecat-metrics",
            "metrics": {
-                "ttfb": frame.ttfb or [],
-                "processing": frame.processing or [],
+                "ttfb": ttfb
            },
        })
        await self._client.send_message(message)
@@ -838,8 +836,8 @@ class DailyTransport(BaseTransport):
                    logger.debug("Event dialin-ready was handled successfully")
            except asyncio.TimeoutError:
                logger.error(f"Timeout handling dialin-ready event ({url})")
-            except Exception as e:
-                logger.exception(f"Error handling dialin-ready event ({url}): {e}")
+            except BaseException as e:
+                logger.error(f"Error handling dialin-ready event ({url}): {e}")

    async def _on_dialin_ready(self, sip_endpoint):
        if self._params.dialin_settings:
--- a/src/pipecat/utils/test_frame_processor.py
+++ b/src/pipecat/utils/test_frame_processor.py
@@ -2,7 +2,7 @@ from typing import List
 from pipecat.processors.frame_processor import FrameProcessor


-class TestException(Exception):
+class TestException(BaseException):
    pass


--- a/src/pipecat/vad/silero.py
+++ b/src/pipecat/vad/silero.py
@@ -33,23 +33,14 @@ _MODEL_RESET_STATES_TIME = 5.0

 class SileroVADAnalyzer(VADAnalyzer):

-    def __init__(
-            self,
-            *,
-            sample_rate: int = 16000,
-            version: str = "v5.0",
-            params: VADParams = VADParams()):
+    def __init__(self, sample_rate=16000, params: VADParams = VADParams()):
        super().__init__(sample_rate=sample_rate, num_channels=1, params=params)

-        if sample_rate != 16000 and sample_rate != 8000:
-            raise ValueError("Silero VAD sample rate needs to be 16000 or 8000")
-
        logger.debug("Loading Silero VAD model...")

-        (self._model, _) = torch.hub.load(repo_or_dir=f"snakers4/silero-vad:{version}",
-                                          model="silero_vad",
-                                          force_reload=False,
-                                          trust_repo=True)
+        (self._model, utils) = torch.hub.load(
+            repo_or_dir="snakers4/silero-vad", model="silero_vad", force_reload=False
+        )

        self._last_reset_time = 0

@@ -60,7 +51,7 @@ class SileroVADAnalyzer(VADAnalyzer):
    #

    def num_frames_required(self) -> int:
-        return 512 if self.sample_rate == 16000 else 256
+        return int(self.sample_rate / 100) * 4  # 40ms

    def voice_confidence(self, buffer) -> float:
        try:
@@ -78,9 +69,9 @@ class SileroVADAnalyzer(VADAnalyzer):
                self._last_reset_time = curr_time

            return new_confidence
-        except Exception as e:
+        except BaseException as e:
            # This comes from an empty audio array
-            logger.exception(f"Error analyzing audio with Silero VAD: {e}")
+            logger.error(f"Error analyzing audio with Silero VAD: {e}")
            return 0


@@ -88,15 +79,12 @@ class SileroVAD(FrameProcessor):

    def __init__(
            self,
-            *,
            sample_rate: int = 16000,
-            version: str = "v5.0",
            vad_params: VADParams = VADParams(),
            audio_passthrough: bool = False):
        super().__init__()

-        self._vad_analyzer = SileroVADAnalyzer(
-            sample_rate=sample_rate, version=version, params=vad_params)
+        self._vad_analyzer = SileroVADAnalyzer(sample_rate=sample_rate, params=vad_params)
        self._audio_passthrough = audio_passthrough

        self._processor_vad_state: VADState = VADState.QUIET
--- a/src/pipecat/vad/vad_analyzer.py
+++ b/src/pipecat/vad/vad_analyzer.py
@@ -28,7 +28,7 @@ class VADParams(BaseModel):

 class VADAnalyzer:

-    def __init__(self, *, sample_rate: int, num_channels: int, params: VADParams):
+    def __init__(self, sample_rate: int, num_channels: int, params: VADParams):
        self._sample_rate = sample_rate
        self._num_channels = num_channels
        self._params = params
Author	SHA1	Message	Date
Jon Taylor	5bd5d22270	removed space from event handler	2024-06-26 18:30:56 +01:00
Jon Taylor	6ee7932337	added pause to start and new intro prompt	2024-06-26 18:24:14 +01:00
Jon Taylor	c407445dd1	removed header comment from bot runner	2024-06-24 17:35:26 +01:00
Jon Taylor	447f37167e	added VAD stop seconds env	2024-06-24 17:34:25 +01:00
Jon Taylor	354c21500e	prompt tweaks	2024-06-24 17:28:10 +01:00
Jon Taylor	5728e25b5a	added fastbot example	2024-06-24 16:25:36 +01:00