removed space from event handler

added pause to start and new intro prompt
removed header comment from bot runner
2024-06-26 18:30:56 +01:00 · 2024-06-26 18:24:14 +01:00 · 2024-06-24 17:35:26 +01:00 · 2024-06-24 17:34:25 +01:00 · 2024-06-24 17:28:10 +01:00 · 2024-06-24 16:25:36 +01:00
80 changed files with 1259 additions and 2847 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,159 +5,6 @@ All notable changes to **pipecat** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [0.0.37] - 2024-07-22
-
-### Added
-
- Added `RTVIProcessor` which implements the RTVI-AI standard.
-  See https://github.com/rtvi-ai
-
- Added `BotInterruptionFrame` which allows interrupting the bot while talking.
-
- Added `LLMMessagesAppendFrame` which allows appending messages to the current
-  LLM context.
-
- Added `LLMMessagesUpdateFrame` which allows changing the LLM context for the
-  one provided in this new frame.
-
- Added `LLMModelUpdateFrame` which allows updating the LLM model.
-
- Added `TTSSpeakFrame` which causes the bot say some text. This text will not
-  be part of the LLM context.
-
- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.
-
-### Removed
-
- We remove the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These
-  were added in the past to properly handle interruptions for the
-  `LLMAssistantContextAggregator`. But the `LLMContextAggregator` is now based
-  on `LLMResponseAggregator` which handles interruptions properly by just
-  processing the `StartInterruptionFrame`, so there's no need for these extra
-  frames any more.
-
-### Fixed
-
- Fixed an issue with `StatelessTextTransformer` where it was pushing a string
-  instead of a `TextFrame`.
-
- `TTSService` end of sentence detection has been improved. It now works with
-  acronyms, numbers, hours and others.
-
- Fixed an issue in `TTSService` that would not properly flush the current
-  aggregated sentence if an `LLMFullResponseEndFrame` was found.
-
-### Performance
-
- `CartesiaTTSService` now uses websockets which improves speed. It also
-  leverages the new Cartesia contexts which maintains generated audio prosody
-  when multiple inputs are sent, therefore improving audio quality a lot.
-
-## [0.0.36] - 2024-07-02
-
-### Added
-
- Added `GladiaSTTService`.
-  See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition
-
- Added `XTTSService`. This is a local Text-To-Speech service.
-  See https://github.com/coqui-ai/TTS
-
- Added `UserIdleProcessor`. This processor can be used to wait for any
-  interaction with the user. If the user doesn't say anything within a given
-  timeout a provided callback is called.
-
- Added `IdleFrameProcessor`. This processor can be used to wait for frames
-  within a given timeout. If no frame is received within the timeout a provided
-  callback is called.
-
- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
-  upstream while the bot is talking.
-
- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer`
-  or `SileroVAD`.
-
- Added `AysncFrameProcessor` and `AsyncAIService`.  Some services like
-  `DeepgramSTTService` need to process things asynchronously. For example, audio
-  is sent to Deepgram but transcriptions are not returned immediately. In these
-  cases we still require all frames (except system frames) to be pushed
-  downstream from a single task. That's what `AsyncFrameProcessor` is for. It
-  creates a task and all frames should be pushed from that task. So, whenever a
-  new Deepgram transcription is ready that transcription will also be pushed
-  from this internal task.
-
- The `MetricsFrame` now includes processing metrics if metrics are enabled. The
-  processing metrics indicate the time a processor needs to generate all its
-  output. Note that not all processors generate these kind of metrics.
-
-### Changed
-
- `WhisperSTTService` model can now also be a string.
-
- Added missing * keyword separators in services.
-
-### Fixed
-
- `WebsocketServerTransport` doesn't try to send frames anymore if serializers
-  returns `None`.
-
- Fixed an issue where exceptions that occurred inside frame processors were
-  being swallowed and not displayed.
-
- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
-  data to the websocket after being closed.
-
-### Other
-
- Added Fly.io deployment example in `examples/deployment/flyio-example`.
-
- Added new `17-detect-user-idle.py` example that shows how to use the new
-  `UserIdleProcessor`.
-
-## [0.0.35] - 2024-06-28
-
-### Changed
-
- `FastAPIWebsocketParams` now require a serializer.
-
- `TwilioFrameSerializer` now requires a `streamSid`.
-
-### Fixed
-
- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for
-  8000 sample rate.
-
-## [0.0.34] - 2024-06-25
-
-### Fixed
-
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
-  interruptions to ignore transcriptions.
-
- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate
-  shorter output.
-
-## [0.0.33] - 2024-06-25
-
-### Changed
-
- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now
-  expects a voice ID instead of a voice name (you can get the voice ID from
-  Cartesia's playground). You can also specify the audio `sample_rate` and
-  `encoding` instead of the previous `output_format`.
-
-### Fixed
-
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
-  cause static audio issues and interruptions to not work properly when dealing
-  with multiple LLMs sentences.
-
- Fixed an issue that could mix new LLM responses with previous ones when
-  handling interruptions.
-
- Fixed a Daily transport blocking situation that occurred while reading audio
-  frames after a participant left the room. Needs daily-python >= 0.10.1.
-
 ## [0.0.32] - 2024-06-22

 ### Added
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ pip install "pipecat-ai[option,...]"

 Your project may or may not need these, so they're made available as optional requirements. Here is a list:

- **AI services**: `anthropic`, `azure`, `deepgram`, `gladia`, `google`, `fal`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`, `xtts`
+- **AI services**: `anthropic`, `azure`, `deepgram`, `google`, `fal`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`
 - **Transports**: `local`, `websocket`, `daily`

 ## Code examples
@@ -70,8 +70,8 @@ async def main():
    transport = DailyTransport(
      room_url=...,
      token=...,
-      bot_name="Bot Name",
-      params=DailyParams(audio_out_enabled=True))
+      bot_name="Bot Name",
+      params=DailyParams(audio_out_enabled=True))

    # Use Eleven Labs for Text-to-Speech
    tts = ElevenLabsTTSService(
@@ -125,7 +125,7 @@ Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://

 Voice Activity Detection &mdash; very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk, and want Pipecat to detect when the user has finished talking, VAD is an essential component for a natural feeling conversation.

-Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
+Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.

 ```shell
 pip install pipecat-ai[silero]
--- a/dot-env.template
+++ b/dot-env.template
@@ -27,9 +27,6 @@ FAL_KEY=...
 # Fireworks
 FIREWORKS_API_KEY=...

-# Gladia
-GLADIA_API_KEY=...
-
 # PlayHT
 PLAY_HT_USER_ID=...
 PLAY_HT_API_KEY=...
--- a/examples/deployment/flyio-example/Dockerfile
+++ b/examples/deployment/flyio-example/Dockerfile
@@ -1,16 +0,0 @@
-FROM python:3.11-bullseye
-
-# Open port 7860 for http service
-ENV FAST_API_PORT=7860
-EXPOSE 7860
-
-# Install Python dependencies
-COPY *.py .
-COPY ./requirements.txt requirements.txt
-RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
-
-# Install models
-RUN python3 install_deps.py
-
-# Start the FastAPI server
-CMD python3 bot_runner.py --port ${FAST_API_PORT}
--- a/examples/deployment/flyio-example/README.md
+++ b/examples/deployment/flyio-example/README.md
@@ -1,43 +0,0 @@
-# Fly.io deployment example
-
-This project modifies the `bot_runner.py` server to launch a new machine for each user session. This is a recommended approach for production vs. running shell processess as your deployment will quickly run out of system resources under load.
-
-To speed up machine boot times, we also download and cache Silero VAD as part of the Dockerfile (`install_deps.py`). If you are using other custom models, you can add them here too.
-
-For this example, we are using Daily as a WebRTC transport and provisioning a new room and token for each session. You can use another transport, such as WebSockets, by modifying the `bot.py` and `bot_runner.py` files accordingly.
-
-## Setting up your fly.io deployment
-
-### Create your fly.toml file
-
-You can copy the `example-fly.toml` as a reference. Be sure to change the app name to something unique.
-
-### Create your .env file
-
-Copy the base `env.example` to `.env` and enter the necessary API keys. 
-
-`FLY_APP_NAME` should match that in the `fly.toml` file.
-
-### Launch a new fly.io project
-
-`fly launch` or `fly launch --org your-org-name`
-
-### Set the necessary app secrets from your .env
-
-Note: you can do this manually via the fly.io dashboard under the "secrets" sub-section of your deployment (e.g. "https://fly.io/apps/fly-app-name/secrets") or run the following terminal command:
-
-`cat .env | tr '\n' ' ' | xargs flyctl secrets set`
-
-### Deploy your machine
-
-`fly deploy`
-
-
-## Connecting to your bot
-
-Send a post request to your running fly.io instance:
-
-`curl --location --request POST 'https://YOUR_FLY_APP_NAME/start_bot'`
-
-This request will wait until the machine enters into a `starting` state, before returning the a room URL and token to join.
-
--- a/examples/deployment/flyio-example/bot.py
+++ b/examples/deployment/flyio-example/bot.py
@@ -1,103 +0,0 @@
-import asyncio
-import aiohttp
-import os
-import sys
-import argparse
-
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
-from pipecat.frames.frames import LLMMessagesFrame, EndFrame
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.elevenlabs import ElevenLabsTTSService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-daily_api_key = os.getenv("DAILY_API_KEY", "")
-daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
-
-
-async def main(room_url: str, token: str):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Chatbot",
-            DailyParams(
-                api_url=daily_api_url,
-                api_key=daily_api_key,
-                audio_in_enabled=True,
-                audio_out_enabled=True,
-                camera_out_enabled=False,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
-                transcription_enabled=True,
-            )
-        )
-
-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY", ""),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are Chatbot, a friendly, helpful robot. Your output will be converted to audio so don't include special characters other than '!' or '?' in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying hello.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),
-            tma_in,
-            llm,
-            tts,
-            transport.output(),
-            tma_out,
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        @transport.event_handler("on_participant_left")
-        async def on_participant_left(transport, participant, reason):
-            await task.queue_frame(EndFrame())
-
-        @transport.event_handler("on_call_state_updated")
-        async def on_call_state_updated(transport, state):
-            if state == "left":
-                await task.queue_frame(EndFrame())
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Pipecat Bot")
-    parser.add_argument("-u", type=str, help="Room URL")
-    parser.add_argument("-t", type=str, help="Token")
-    config = parser.parse_args()
-
-    asyncio.run(main(config.u, config.t))
--- a/examples/deployment/flyio-example/env.example
+++ b/examples/deployment/flyio-example/env.example
@@ -1,8 +0,0 @@
-DAILY_API_KEY=
-DAILY_SAMPLE_ROOM_URL= # Enter a Daily room URL to use a set room URL each time (useful for local testing)
-OPENAI_API_KEY=
-ELEVENLABS_API_KEY=
-ELEVENLABS_VOICE_ID=
-FLY_API_KEY=
-FLY_APP_NAME=
-RUN_AS_PROCESS= # Spawn fly.io machine for each session or run as local process
--- a/examples/deployment/flyio-example/example-fly.toml
+++ b/examples/deployment/flyio-example/example-fly.toml
@@ -1,25 +0,0 @@
-# fly.toml app configuration file generated for pipecat-fly-example on 2024-07-01T15:04:53+01:00
-#
-# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
-#
-
-app = 'pipecat-fly-example'
-primary_region = 'sjc'
-
-[build]
-
-[env]
-  FLY_APP_NAME = 'pipecat-fly-example'
-
-[http_service]
-  internal_port = 7860
-  force_https = true
-  auto_stop_machines = true
-  auto_start_machines = true
-  min_machines_running = 0
-  processes = ['app']
-
-[[vm]]
-  memory = 512
-  cpu_kind = 'shared'
-  cpus = 1
--- a/examples/deployment/flyio-example/install_deps.py
+++ b/examples/deployment/flyio-example/install_deps.py
@@ -1,4 +0,0 @@
-import torch
-
-# Download (cache) the Silero VAD model
-torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)
--- a/examples/fast-chatbot/.gitignore
+++ b/examples/fast-chatbot/.gitignore
@@ -0,0 +1,165 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
+runpod.toml
+
+# custom script to recursively upgrade items in requirements.py
+upgrade_requirements.py
+.DS_Store
--- a/examples/deployment/flyio-example/init.py
+++ b/examples/deployment/flyio-example/init.py
--- a/examples/fast-chatbot/bot.py
+++ b/examples/fast-chatbot/bot.py
@@ -0,0 +1,164 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+from loguru import logger
+import argparse
+import asyncio
+import aiohttp
+import os
+import sys
+import time
+from typing import Optional
+
+from pydantic import BaseModel, ValidationError
+
+from pipecat.vad.vad_analyzer import VADParams
+from pipecat.vad.silero import SileroVADAnalyzer
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.services.openai import OpenAILLMService
+from pipecat.services.deepgram import DeepgramSTTService
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.frames.frames import LLMMessagesFrame, EndFrame
+
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator, LLMUserResponseAggregator
+)
+
+from helpers import (
+    ClearableDeepgramTTSService,
+    AudioVolumeTimer,
+    TranscriptionTimingLogger
+)
+
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level=os.getenv("LOG_LEVEL", "DEBUG"))
+
+
+class BotSettings(BaseModel):
+    room_url: str
+    room_token: str
+    bot_name: str = "Pipecat"
+    prompt: Optional[str] = "You are a helpful assistant."
+    deepgram_api_key: Optional[str] = os.getenv("DEEPGRAM_API_KEY", None)
+    deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE", "aura-asteria-en")
+    deepgram_tts_base_url: Optional[str] = os.getenv(
+        "DEEPGRAM_TTS_BASE_URL", "https://api.deepgram.com/v1/speak")
+    deepgram_stt_base_url: Optional[str] = os.getenv(
+        "DEEPGRAM_STT_BASE_URL", "https://api.deepgram.com/v1/speak")
+    openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY", None)
+    openai_model: Optional[str] = os.getenv("OPENAI_MODEL", None)
+    openai_base_url: Optional[str] = os.getenv("OPENAI_BASE_URL", None)
+    vad_stop_secs: Optional[float] = float(os.getenv("VAD_STOP_SECS") or 0.200)
+
+
+async def main(settings: BotSettings):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            settings.room_url,
+            settings.room_token,
+            settings.bot_name,
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=False,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(params=VADParams(
+                    stop_secs=settings.vad_stop_secs
+                )),
+                vad_audio_passthrough=True
+            )
+        )
+
+        stt = DeepgramSTTService(
+            name="STT",
+            api_key=settings.deepgram_api_key,
+            url=settings.deepgram_stt_base_url
+        )
+
+        tts = ClearableDeepgramTTSService(
+            name="Voice",
+            aiohttp_session=session,
+            api_key=settings.deepgram_api_key,
+            voice=settings.deepgram_voice,
+            **({'base_url': url} if (url := settings.deepgram_tts_base_url) else {})
+        )
+
+        llm = OpenAILLMService(
+            name="LLM",
+            api_key=settings.openai_api_key,
+            model=settings.openai_model,
+            base_url=settings.openai_base_url,
+        )
+
+        messages = [
+            {
+                "role": "system",
+                "content": settings.prompt,
+            },
+        ]
+
+        avt = AudioVolumeTimer()
+        tl = TranscriptionTimingLogger(avt)
+
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            avt,                 # Audio volume timer
+            stt,                 # Speech-to-text
+            tl,                  # Transcription timing logger
+            tma_in,              # User responses
+            llm,                 # LLM
+            tts,                 # TTS
+            transport.output(),  # Transport bot output
+            tma_out,             # Assistant spoken responses
+        ])
+
+        task = PipelineTask(
+            pipeline,
+            PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                report_only_initial_ttfb=True
+            ))
+
+        # When the participant leaves, we exit the bot.
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.queue_frame(EndFrame())
+
+        # When the first participant joins, the bot should introduce itself.
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            # Provide some air whilst tracks subscribe
+            await asyncio.sleep(2)
+            messages.append(
+                {
+                    "role": "system",
+                    "content": "Briefly introduce yourself by saying 'hello, I'm FastBot, how can I help you today?'"})
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Pipecat Bot")
+    parser.add_argument("-s", "--settings", type=str, required=True, help="Pipecat bot settings")
+
+    args, unknown = parser.parse_known_args()
+
+    try:
+        settings = BotSettings.model_validate_json(args.settings)
+        asyncio.run(main(settings))
+    except ValidationError as e:
+        print(e)
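
Review note on the `on_first_participant_joined` handler in `bot.py` above: it pauses before queuing the intro prompt. Inside a coroutine the pause generally needs to be awaited, otherwise the event loop (and with it the rest of the pipeline) stalls for the duration. The sketch below is a stdlib-only illustration of that difference; `ticker`, `handler`, and `run` are hypothetical stand-ins, not Pipecat APIs.

```python
import asyncio
import time


async def ticker(log: list[str]) -> None:
    # Stands in for pipeline work that should keep running during the pause.
    for i in range(3):
        log.append(f"tick {i}")
        await asyncio.sleep(0.01)


async def handler(log: list[str], blocking: bool) -> None:
    # Hypothetical stand-in for the event handler's initial pause.
    if blocking:
        time.sleep(0.1)           # blocks the whole event loop
    else:
        await asyncio.sleep(0.1)  # yields control to other tasks
    log.append("handler done")


async def run(blocking: bool) -> list[str]:
    log: list[str] = []
    # Both coroutines share one event loop, as handlers and pipeline tasks do.
    await asyncio.gather(ticker(log), handler(log, blocking))
    return log


if __name__ == "__main__":
    print("awaited pause: ", asyncio.run(run(blocking=False)))
    print("blocking pause:", asyncio.run(run(blocking=True)))
```

With the awaited pause the ticker should finish all three ticks before the handler completes; with the blocking pause the ticker is held up until the handler returns.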
--- a/examples/deployment/flyio-example/bot_runner.py
+++ b/examples/deployment/flyio-example/bot_runner.py
@@ -1,7 +1,15 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
 import os
 import argparse
 import subprocess
-import requests
+
+from pydantic import BaseModel, ValidationError
+from typing import Optional

 from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams

@@ -9,6 +17,8 @@ from fastapi import FastAPI, Request, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import JSONResponse

+from bot import BotSettings
+
 from dotenv import load_dotenv
 load_dotenv(override=True)

@@ -16,29 +26,24 @@ load_dotenv(override=True)
 # ------------ Configuration ------------ #

 MAX_SESSION_TIME = 5 * 60  # 5 minutes
-REQUIRED_ENV_VARS = [
-    'DAILY_API_KEY',
-    'OPENAI_API_KEY',
-    'ELEVENLABS_API_KEY',
-    'ELEVENLABS_VOICE_ID',
-    'FLY_API_KEY',
-    'FLY_APP_NAME',]
-
-FLY_API_HOST = os.getenv("FLY_API_HOST", "https://api.machines.dev/v1")
-FLY_APP_NAME = os.getenv("FLY_APP_NAME", "pipecat-fly-example")
-FLY_API_KEY = os.getenv("FLY_API_KEY", "")
-FLY_HEADERS = {
-    'Authorization': f"Bearer {FLY_API_KEY}",
-    'Content-Type': 'application/json'
-}
+REQUIRED_ENV_VARS = ['DAILY_API_URL', 'DAILY_API_KEY', 'DEEPGRAM_API_KEY']

 daily_rest_helper = DailyRESTHelper(
    os.getenv("DAILY_API_KEY", ""),
    os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))


+class RunnerSettings(BaseModel):
+    prompt: Optional[
+        str] = "You are a fast, low-latency chatbot. Your goal is to demonstrate voice-driven AI capabilities at human-like speeds. When introducing yourself briefly mention your goal is to showcase speed and conversational flow. The technology powering you is Daily for transport, Cerebrium for GPU hosting, Llama 3 (8-B version) LLM, and Deepgram for speech-to-text and text-to-speech. You are hosted on the east coast of the United States. Respond to what the user said in a creative and helpful way, but keep responses short and legible. Ensure responses contain only words. Check again that you have not included special characters other than '?' or '!'."
+    deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE")
+    openai_model: Optional[str] = os.getenv("OPENAI_MODEL", "gpt-4o")
+    openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY")
+    test: Optional[bool] = None
+
 # ----------------- API ----------------- #

+
 app = FastAPI()

 app.add_middleware(
@@ -52,67 +57,25 @@ app.add_middleware(
 # ----------------- Main ----------------- #


-def spawn_fly_machine(room_url: str, token: str):
-    # Use the same image as the bot runner
-    res = requests.get(f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS)
-    if res.status_code != 200:
-        raise Exception(f"Unable to get machine info from Fly: {res.text}")
-    image = res.json()[0]['config']['image']
-
-    # Machine configuration
-    cmd = f"python3 bot.py -u {room_url} -t {token}"
-    cmd = cmd.split()
-    worker_props = {
-        "config": {
-            "image": image,
-            "auto_destroy": True,
-            "init": {
-                "cmd": cmd
-            },
-            "restart": {
-                "policy": "no"
-            },
-            "guest": {
-                "cpu_kind": "shared",
-                "cpus": 1,
-                "memory_mb": 1024
-            }
-        },
-
-    }
-
-    # Spawn a new machine instance
-    res = requests.post(
-        f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines",
-        headers=FLY_HEADERS,
-        json=worker_props)
-
-    if res.status_code != 200:
-        raise Exception(f"Problem starting a bot worker: {res.text}")
-
-    # Wait for the machine to enter the started state
-    vm_id = res.json()['id']
-
-    res = requests.get(
-        f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines/{vm_id}/wait?state=started",
-        headers=FLY_HEADERS)
-
-    if res.status_code != 200:
-        raise Exception(f"Bot was unable to enter started state: {res.text}")
-
-    print(f"Machine joined room: {room_url}")
-
-
 @app.post("/start_bot")
 async def start_bot(request: Request) -> JSONResponse:
+    runner_settings = RunnerSettings()
    try:
-        data = await request.json()
-        # Is this a webhook creation request?
-        if "test" in data:
-            return JSONResponse({"test": True})
+        request_body = await request.body()
+        if len(request_body) > 0:
+            runner_settings = RunnerSettings.model_validate_json(request_body)
+    except ValidationError as e:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Invalid request: {e}")
    except Exception as e:
+        # If no data in request, pass
        pass

+    # Is this a webhook creation request?
+    if runner_settings.test is not None:
+        return JSONResponse({"test": True})
+
    # Use specified room URL, or create a new one if not specified
    room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")

@@ -141,25 +104,26 @@ async def start_bot(request: Request) -> JSONResponse:
        raise HTTPException(
            status_code=500, detail=f"Failed to get token for room: {room_url}")

-    # Launch a new fly.io machine, or run as a shell process (not recommended)
-    run_as_process = os.getenv("RUN_AS_PROCESS", False)
+    # Spawn a new agent, and join the user session
+    try:
+        bot_settings = BotSettings(
+            room_url=room.url,
+            room_token=token,
+            prompt=runner_settings.prompt,
+            deepgram_voice=runner_settings.deepgram_voice,
+            openai_model=runner_settings.openai_model,
+            openai_api_key=runner_settings.openai_api_key,
+        )
+        bot_settings_str = bot_settings.model_dump_json(exclude_none=True)

-    if run_as_process:
-        try:
-            subprocess.Popen(
-                [f"python3 -m bot -u {room.url} -t {token}"],
-                shell=True,
-                bufsize=1,
-                cwd=os.path.dirname(os.path.abspath(__file__)))
-        except Exception as e:
-            raise HTTPException(
-                status_code=500, detail=f"Failed to start subprocess: {e}")
-    else:
-        try:
-            spawn_fly_machine(room.url, token)
-        except Exception as e:
-            raise HTTPException(
-                status_code=500, detail=f"Failed to spawn VM: {e}")
+        subprocess.Popen(
+            ["python3", "-m", "bot", "-s", bot_settings_str],
+            shell=False,
+            bufsize=1,
+            cwd=os.path.dirname(os.path.abspath(__file__)))
+    except Exception as e:
+        raise HTTPException(
+            status_code=500, detail=f"Failed to start subprocess: {e}")

    # Grab a token for the user to join with
    user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
@@ -169,6 +133,7 @@ async def start_bot(request: Request) -> JSONResponse:
        "token": user_token,
    })

+
 if __name__ == "__main__":
    # Check environment variables
    for env_var in REQUIRED_ENV_VARS:
@@ -181,7 +146,7 @@ if __name__ == "__main__":
    parser.add_argument("--port", type=int,
                        default=os.getenv("PORT", 7860), help="Port number")
    parser.add_argument("--reload", action="store_true",
-                        default=False, help="Reload code on change")
+                        default=True, help="Reload code on change")

    config = parser.parse_args()

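Review note on the `start_bot` change above: the new code interpolates `bot_settings_str` into a single-quoted shell string, so a prompt containing an apostrophe would likely break the command line. A minimal sketch of one alternative, assuming the bot keeps accepting its settings through the `-s` flag as in the diff: pass an argument list so no shell quoting is involved. `build_bot_command` is a hypothetical helper, not something defined in this change.

```python
import json
import sys
from typing import List


def build_bot_command(settings: dict) -> List[str]:
    """Build an argv list for `python -m bot -s <json>` (hypothetical helper)."""
    # Drop unset values, similar in spirit to model_dump_json(exclude_none=True).
    payload = json.dumps({k: v for k, v in settings.items() if v is not None})
    # Each list element reaches the child process as one argument,
    # so quotes inside the JSON are never interpreted by a shell.
    return [sys.executable, "-m", "bot", "-s", payload]
```

The resulting list could be handed to `subprocess.Popen(build_bot_command(...), cwd=...)` without `shell=True`.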
--- a/examples/fast-chatbot/env.example
+++ b/examples/fast-chatbot/env.example
@@ -0,0 +1,12 @@
+DAILY_SAMPLE_ROOM_URL= #optional: use the same room each time, or create a new one if unset
+DAILY_API_KEY=
+DAILY_API_URL=
+
+DEEPGRAM_API_KEY=
+DEEPGRAM_VOICE=
+DEEPGRAM_STT_URL=
+DEEPGRAM_TTS_BASE_URL=
+
+OPENAI_API_KEY=
+OPENAI_MODEL=
+OPENAI_BASE_URL=
--- a/examples/fast-chatbot/helpers.py
+++ b/examples/fast-chatbot/helpers.py
@@ -0,0 +1,267 @@
+from loguru import logger
+import asyncio
+import math
+import struct
+import time
+from dataclasses import dataclass, field
+from typing import List
+
+
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.frames.frames import (
+    Frame,
+    AudioRawFrame,
+    InterimTranscriptionFrame,
+    TranscriptionFrame,
+    TextFrame,
+    StartInterruptionFrame,
+    LLMFullResponseStartFrame,
+    TTSStoppedFrame,
+    MetricsFrame
+)
+
+from pipecat.vad.vad_analyzer import VADAnalyzer, VADState
+from pipecat.services.deepgram import DeepgramTTSService
+from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame
+
+
+class GreedyLLMAggregator(FrameProcessor):
+    def __init__(self, context: OpenAILLMContext = None, **kwargs):
+        super().__init__(**kwargs)
+        self.context: OpenAILLMContext = context if context else OpenAILLMContext()
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        logger.debug(f"{frame}")
+
+        try:
+            if isinstance(frame, InterimTranscriptionFrame):
+                return
+
+            if isinstance(frame, TranscriptionFrame):
+                # append transcribed text to last "user" frame
+                if self.context.messages and self.context.messages[-1]["role"] == "user":
+                    last_frame = self.context.messages.pop()
+                else:
+                    last_frame = {"role": "user", "content": ""}
+
+                last_frame["content"] += " " + frame.text
+                self.context.messages.append(last_frame)
+
+                oai_context_frame = OpenAILLMContextFrame(context=self.context)
+                logger.debug(f"pushing frame {oai_context_frame}")
+                await self.push_frame(oai_context_frame)
+                return
+
+            await self.push_frame(frame, direction)
+        except Exception as e:
+            logger.debug(f"error: {e}")
+
+
+class ClearableDeepgramTTSService(DeepgramTTSService):
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, StartInterruptionFrame):
+            self._current_sentence = ""
+
+
+@dataclass
+class BufferedSentence:
+    audio_frames: List[AudioRawFrame] = field(default_factory=list)
+    text_frame: TextFrame = None
+
+
+class VADGate(FrameProcessor):
+
+    def __init__(
+            self,
+            vad_analyzer: VADAnalyzer = None,
+            context: OpenAILLMContext = None,
+            **kwargs):
+        super().__init__(**kwargs)
+        self.vad_analyzer = vad_analyzer
+        self.context = context
+
+        self._audio_pusher_task = None
+        self._expect_text_frame_next = False
+        self._sentences: List[BufferedSentence] = []
+
+    # queue output from tts one sentence at a time. associate a buffer of audio frames with the content of
+    # each text frame.
+    #
+    # start a coroutine to service the queue and send sentences down the pipeline when possible.
+    # 1. do not send anything when we are not in VADState.QUIET
+    # 2. if we are in VADState.QUIET, send a sentence, estimate how long it will take for that sentence
+    #    to output, sleep until it's time to send another sentence
+    # 3. each time we send a sentence, append it to the conversation context
+    # 4. when the sentence buffer becomes empty, cancel the coroutine
+    # 5. if we get a new LLMFullResponseStartFrame, treat that as a cancellation, too
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        try:
+
+            # A TTSService will emit a series of AudioRawFrame objects, then a TTSStoppedFrame,
+            # then a TextFrame.
+
+            if self._expect_text_frame_next:
+                self._expect_text_frame_next = False
+                if isinstance(frame, TextFrame):
+                    self._sentences[-1].text_frame = frame
+                else:
+                    logger.debug(f"expected a text frame, but received {frame}")
+                    await self.push_frame(frame, direction)
+                return
+            else:
+                if isinstance(frame, TextFrame):
+                    logger.error("received a text frame, wasn't expecting it.")
+
+            if isinstance(frame, AudioRawFrame):
+                # if our buffer is empty or has a "finished" sentence at the end,
+                # then we need to start buffering a new sentence
+                if not self._sentences or self._sentences[-1].text_frame:
+                    self._sentences.append(BufferedSentence())
+                self._sentences[-1].audio_frames.append(frame)
+                await self.maybe_start_audio_pusher_task()
+                return
+
+            if isinstance(frame, TTSStoppedFrame):
+                self._expect_text_frame_next = True
+                await self.push_frame(frame, direction)
+                return
+
+            # There are two ways we can be interrupted. During greedy inference, a new
+            # LLM response can start. Or, during playout, we can get a traditional
+            # user interruption frame.
+            if (isinstance(frame, LLMFullResponseStartFrame) or
+                    isinstance(frame, StartInterruptionFrame)):
+                logger.debug(f"{frame} - Handle interruption in VADGate")
+                self._sentences = []
+                if self._audio_pusher_task:
+                    self._audio_pusher_task.cancel()
+                    self._audio_pusher_task = None
+                await self.push_frame(frame, direction)
+                return
+
+            await self.push_frame(frame, direction)
+        except Exception as e:
+            logger.debug(f"error: {e}")
+
+    async def maybe_start_audio_pusher_task(self):
+        try:
+            if self._audio_pusher_task:
+                return
+            self._audio_pusher_task = self.get_event_loop().create_task(self.push_audio())
+
+        except Exception as e:
+            logger.debug(f"Exception {e}")
+
+    async def push_audio(self):
+        try:
+            while True:
+                if not self._sentences:
+                    await asyncio.sleep(0.01)
+                    continue
+
+                if self.vad_analyzer._vad_state != VADState.QUIET:
+                    await asyncio.sleep(0.01)
+                    continue
+
+                # we only want to push completed sentence buffers
+                if not self._sentences[0].text_frame:
+                    await asyncio.sleep(0.01)
+                    continue
+
+                s = self._sentences.pop(0)
+                if not s.audio_frames:
+                    continue
+                sample_rate = s.audio_frames[0].sample_rate
+                duration = 0
+                logger.debug(f"Pushing {len(s.audio_frames)} audio frames")
+                for frame in s.audio_frames:
+                    await self.push_frame(frame)
+                    # assume linear16 encoding (2 bytes per sample). todo: add some more
+                    # metadata to AudioRawFrame, maybe
+                    duration += (len(frame.audio) / 2 / frame.num_channels) / sample_rate
+                await asyncio.sleep(duration - 20 / 1000)
+                if self.context:
+                    logger.debug(f"Appending assistant message to context: [{s.text_frame.text}]")
+                    self.context.messages.append(
+                        {"role": "assistant", "content": s.text_frame.text}
+                    )
+                await self.push_frame(s.text_frame)
+
+        except Exception as e:
+            logger.debug(f"Exception {e}")
+
+
+class TranscriptionTimingLogger(FrameProcessor):
+    def __init__(self, avt):
+        super().__init__()
+        self.name = "Transcription"
+        self._avt = avt
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        try:
+            await super().process_frame(frame, direction)
+            if isinstance(frame, TranscriptionFrame):
+                elapsed = time.time() - self._avt.last_transition_ts
+                logger.debug(f"Transcription TTF: {elapsed}")
+                await self.push_frame(MetricsFrame(ttfb={self.name: elapsed}))
+
+            await self.push_frame(frame, direction)
+        except Exception as e:
+            logger.debug(f"Exception {e}")
+
+
+class AudioVolumeTimer(FrameProcessor):
+    def __init__(self):
+        super().__init__()
+        self.last_transition_ts = 0
+        self._prev_volume = -80
+        self._speech_volume_threshold = -50
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, AudioRawFrame):
+            volume = self.calculate_volume(frame)
+            # print(f"Audio volume: {volume:.2f} dB")
+            if (volume >= self._speech_volume_threshold and
+                    self._prev_volume < self._speech_volume_threshold):
+                # logger.debug("transition above speech volume threshold")
+                self.last_transition_ts = time.time()
+            elif (volume < self._speech_volume_threshold and
+                    self._prev_volume >= self._speech_volume_threshold):
+                # logger.debug("transition below non-speech volume threshold")
+                self.last_transition_ts = time.time()
+            self._prev_volume = volume
+
+        await self.push_frame(frame, direction)
+
+    def calculate_volume(self, frame: AudioRawFrame) -> float:
+        if frame.num_channels != 1:
+            raise ValueError(f"Expected 1 channel, got {frame.num_channels}")
+
+        # Unpack audio data into 16-bit integers
+        fmt = f"{len(frame.audio) // 2}h"
+        audio_samples = struct.unpack(fmt, frame.audio)
+
+        # Calculate RMS
+        sum_squares = sum(sample**2 for sample in audio_samples)
+        rms = math.sqrt(sum_squares / len(audio_samples))
+
+        # Convert RMS to decibels (dB)
+        # Reference: maximum value for 16-bit audio is 32767
+        if rms > 0:
+            db = 20 * math.log10(rms / 32767)
+        else:
+            db = -96  # Minimum value (almost silent)
+
+        return db
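Review note on `helpers.py`: `VADGate.push_audio` and `AudioVolumeTimer.calculate_volume` both assume 16-bit linear PCM. The two calculations can be sketched on plain `bytes` rather than `AudioRawFrame`, which makes them easy to check in isolation. The function names are hypothetical, and the explicit little-endian format (`<`) is an assumption; the diff uses native byte order.

```python
import math
import struct


def pcm16_duration_seconds(audio: bytes, sample_rate: int, num_channels: int = 1) -> float:
    """Playout time of a linear16 buffer: 2 bytes per sample, per channel."""
    return (len(audio) / 2 / num_channels) / sample_rate


def pcm16_dbfs(audio: bytes) -> float:
    """RMS level in dB relative to full scale (32767) for mono linear16 audio."""
    count = len(audio) // 2
    if count == 0:
        return -96.0  # Same floor the diff uses for silence
    # Assumption: little-endian samples; any trailing odd byte is ignored.
    samples = struct.unpack(f"<{count}h", audio[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    return 20 * math.log10(rms / 32767) if rms > 0 else -96.0
```

With these definitions, one second of silence at 16 kHz gives a duration of 1.0 s and the -96 dB floor, while a full-scale square wave sits at roughly 0 dB.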
--- a/examples/deployment/flyio-example/requirements.txt
+++ b/examples/deployment/flyio-example/requirements.txt
@@ -1,4 +1,4 @@
-pipecat-ai[daily,openai,silero]
+pipecat-ai[daily,openai,silero,deepgram]
 fastapi
 uvicorn
 requests
--- a/examples/foundational/06a-image-sync.py
+++ b/examples/foundational/06a-image-sync.py
@@ -67,12 +67,11 @@ async def main(room_url: str, token):
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
-                camera_out_enabled=True,
                camera_out_width=1024,
                camera_out_height=1024,
                transcription_enabled=True,
                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
+                vad_analyzer=SileroVADAnalyzer()
            )
        )

@@ -117,7 +116,7 @@ async def main(room_url: str, token):
        async def on_first_participant_joined(transport, participant):
            participant_name = participant["info"]["userName"] or ''
            transport.capture_participant_transcription(participant["id"])
-            await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
+            await task.queue_frames([TextFrame(f"Hi, this is {participant_name}.")])

        runner = PipelineRunner()

--- a/examples/foundational/07d-interruptible-cartesia.py
+++ b/examples/foundational/07d-interruptible-cartesia.py
@@ -37,8 +37,8 @@ async def main(room_url: str, token):
        token,
        "Respond bot",
        DailyParams(
-            audio_out_sample_rate=44100,
            audio_out_enabled=True,
+            audio_out_sample_rate=44100,
            transcription_enabled=True,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer()
@@ -47,8 +47,8 @@ async def main(room_url: str, token):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",  # Barbershop Man
-        sample_rate=44100,
+        voice_name="British Lady",
+        output_format="pcm_44100"
    )

    llm = OpenAILLMService(
@@ -70,11 +70,11 @@ async def main(room_url: str, token):
        tma_in,              # User responses
        llm,                 # LLM
        tts,                 # TTS
-        tma_out,             # Goes before the transport because cartesia has word-level timestamps!
        transport.output(),  # Transport bot output
+        tma_out              # Assistant spoken responses
    ])

-    task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
+    task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
--- a/examples/foundational/07i-interruptible-xtts.py
+++ b/examples/foundational/07i-interruptible-xtts.py
@@ -1,96 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import aiohttp
-import os
-import sys
-
-from pipecat.frames.frames import LLMMessagesFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
-from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.xtts import XTTSService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def main(room_url: str, token):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Respond bot",
-            DailyParams(
-                audio_out_enabled=True,
-                transcription_enabled=True,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
-            )
-        )
-
-        tts = XTTSService(
-            aiohttp_session=session,
-            voice_id="Claribel Dervla",
-            language="en",
-            base_url="http://localhost:8000"
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),   # Transport user input
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # TTS
-            transport.output(),  # Transport bot output
-            tma_out              # Assistant spoken responses
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            # Kick off the conversation.
-            messages.append(
-                {"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    (url, token) = configure()
-    asyncio.run(main(url, token))
--- a/examples/foundational/07j-interruptible-gladia.py
+++ b/examples/foundational/07j-interruptible-gladia.py
@@ -1,101 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import aiohttp
-import os
-import sys
-
-from pipecat.frames.frames import LLMMessagesFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
-from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
-from pipecat.services.gladia import GladiaSTTService
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.xtts import XTTSService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def main(room_url: str, token):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Respond bot",
-            DailyParams(
-                audio_out_enabled=True,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
-                vad_audio_passthrough=True,
-            )
-        )
-
-        stt = GladiaSTTService(
-            api_key=os.getenv("GLADIA_API_KEY"),
-        )
-
-        tts = DeepgramTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("DEEPGRAM_API_KEY"),
-            voice="aura-helios-en"
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),   # Transport user input
-            stt,                 # STT
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # TTS
-            transport.output(),  # Transport bot output
-            tma_out              # Assistant spoken responses
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            # Kick off the conversation.
-            messages.append(
-                {"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    (url, token) = configure()
-    asyncio.run(main(url, token))
--- a/examples/foundational/15-switch-voices.py
+++ b/examples/foundational/15-switch-voices.py
@@ -66,6 +66,7 @@ async def main(room_url: str, token):
            "Pipecat",
            DailyParams(
                audio_out_enabled=True,
+                audio_out_sample_rate=44100,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer()
@@ -74,17 +75,20 @@ async def main(room_url: str, token):

        news_lady = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="bf991597-6c13-47e4-8411-91ec2de5c466",  # Newslady
+            voice_name="Newslady",
+            output_format="pcm_44100"
        )

        british_lady = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+            voice_name="British Lady",
+            output_format="pcm_44100"
        )

        barbershop_man = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",  # Barbershop Man
+            voice_name="Barbershop Man",
+            output_format="pcm_44100"
        )

        llm = OpenAILLMService(
--- a/examples/foundational/17-detect-user-idle.py
+++ b/examples/foundational/17-detect-user-idle.py
@@ -1,108 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import aiohttp
-import os
-import sys
-
-from pipecat.frames.frames import LLMMessagesFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
-from pipecat.processors.frame_processor import FrameDirection
-from pipecat.processors.user_idle_processor import UserIdleProcessor
-from pipecat.services.elevenlabs import ElevenLabsTTSService
-from pipecat.services.openai import OpenAILLMService
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def main(room_url: str, token):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            room_url,
-            token,
-            "Respond bot",
-            DailyParams(
-                audio_out_enabled=True,
-                transcription_enabled=True,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer()
-            )
-        )
-
-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY"),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        async def user_idle_callback(user_idle: UserIdleProcessor):
-            messages.append(
-                {"role": "system", "content": "Ask the user if they are still there and try to prompt for some input, but be short."})
-            await user_idle.queue_frame(LLMMessagesFrame(messages))
-
-        user_idle = UserIdleProcessor(callback=user_idle_callback, timeout=5.0)
-
-        pipeline = Pipeline([
-            transport.input(),   # Transport user input
-            user_idle,           # Idle user check-in
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # TTS
-            transport.output(),  # Transport bot output
-            tma_out              # Assistant spoken responses
-        ])
-
-        task = PipelineTask(pipeline, PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            report_only_initial_ttfb=True,
-        ))
-
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            # Kick off the conversation.
-            messages.append(
-                {"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        runner = PipelineRunner()
-
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    (url, token) = configure()
-    asyncio.run(main(url, token))
--- a/examples/storytelling-chatbot/Dockerfile
+++ b/examples/storytelling-chatbot/Dockerfile
@@ -1,4 +1,4 @@
-FROM python:3.11-slim-bookworm
+FROM python:3.11-bullseye

 ARG DEBIAN_FRONTEND=noninteractive
 ARG USE_PERSISTENT_DATA
@@ -51,4 +51,4 @@ COPY --chown=user ./frontend/ frontend/
 RUN cd frontend && npm install && npm run build

 # Start the FastAPI server
-CMD python3 src/bot_runner.py --port ${FAST_API_PORT}
+CMD python3 src/server.py --port ${FAST_API_PORT}
--- a/examples/storytelling-chatbot/README.md
+++ b/examples/storytelling-chatbot/README.md
@@ -48,8 +48,6 @@ pip install -r requirements.txt
 mv env.example .env
 ```

-When deploying to production, to ensure only this app can spawn a new bot, set your `ENV` to `production`
-
 **Build the frontend:**

 This project uses a custom frontend, which needs to built. Note: this is done automatically as part of the Docker deployment.
@@ -66,11 +64,11 @@ The build UI files can be found in `frontend/out`

 Start the API / bot manager:

-`python src/bot_runner.py`
+`python src/server.py`

 If you'd like to run a custom domain or port:

-`python src/bot_runner.py --host somehost --p someport`
+`python src/server.py --host somehost --p 7777`

 ➡️ Open the host URL in your browser `http://localhost:7860`

--- a/examples/storytelling-chatbot/env.example
+++ b/examples/storytelling-chatbot/env.example
@@ -1,9 +1,5 @@
-DAILY_API_KEY=
-DAILY_SAMPLE_ROOM_URL=
-ELEVENLABS_API_KEY=
-ELEVENLABS_VOICE_ID=
-FAL_KEY=
-OPENAI_API_KEY=
-
-ENV= # dev | production
-RUN_AS_VM= # Set this if you want to run bots on process (not launch a new VM)
+DAILY_API_KEY=7df...
+ELEVENLABS_API_KEY=aeb...
+ELEVENLABS_VOICE_ID=7S...
+FAL_KEY=8c...
+OPENAI_API_KEY=sk-PL...
--- a/examples/storytelling-chatbot/frontend/components/App.tsx
+++ b/examples/storytelling-chatbot/frontend/components/App.tsx
@@ -27,11 +27,14 @@ export default function Call() {

    // Create a new room for the story session
    try {
-      const response = await fetch("/start_bot", {
+      const response = await fetch("/create", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
        },
+        body: JSON.stringify({
+          room_url: process.env.NEXT_PUBLIC_ROOM_URL || null,
+        }),
      });

      const { room_url, token } = await response.json();
@@ -52,9 +55,21 @@ export default function Call() {
      // Disable local audio, the bot will say hello first
      daily.setLocalAudio(false);

+      // Start the bot
+      const resp = await fetch("/start", {
+        method: "POST",
+        headers: {
+          "Content-Type": "application/json",
+        },
+        body: JSON.stringify({
+          room_url,
+        }),
+      });
+
      setState("started");
    } catch (error) {
      setState("error");
+      leave();
    }
  }

@@ -64,13 +79,7 @@ export default function Call() {
  }

  if (state === "error") {
-    return (
-      <div className="flex items-center mx-auto">
-        <p className="text-red-500 font-semibold bg-white px-4 py-2 shadow-xl rounded-lg">
-          This demo is currently at capacity. Please try again later.
-        </p>
-      </div>
-    );
+    return <div>An error occurred</div>;
  }

  if (state === "started") {
--- a/examples/storytelling-chatbot/frontend/components/DevicePicker/index.tsx
+++ b/examples/storytelling-chatbot/frontend/components/DevicePicker/index.tsx
@@ -108,26 +108,26 @@ export default function DevicePicker({}: Props) {
      {hasMicError && (
        <div className="error">
          {micState === "blocked" ? (
-            <p className="text-red-500">
+            <p>
              Please check your browser and system permissions. Make sure that
              this app is allowed to access your microphone.
            </p>
          ) : micState === "in-use" ? (
-            <p className="text-red-500">
+            <p>
              Your microphone is being used by another app. Please close any
              other apps using your microphone and restart this app.
            </p>
          ) : micState === "not-found" ? (
-            <p className="text-red-500">
+            <p>
              No microphone seems to be connected. Please connect a microphone.
            </p>
          ) : micState === "not-supported" ? (
-            <p className="text-red-500">
+            <p>
              This app is not supported on your device. Please update your
              software or use a different device.
            </p>
          ) : (
-            <p className="text-red-500">
+            <p>
              There seems to be an issue accessing your microphone. Try
              restarting the app or consult a system administrator.
            </p>
--- a/examples/storytelling-chatbot/frontend/components/Setup.tsx
+++ b/examples/storytelling-chatbot/frontend/components/Setup.tsx
@@ -1,7 +1,7 @@
 import React from "react";
 import { Button } from "@/components/ui/button";
 import DevicePicker from "@/components/DevicePicker";
-import { IconAlertCircle, IconEar, IconLoader2 } from "@tabler/icons-react";
+import { IconEar, IconLoader2 } from "@tabler/icons-react";

 type SetupProps = {
  handleStart: () => void;
@@ -24,6 +24,7 @@ export const Setup: React.FC<SetupProps> = ({ handleStart }) => {
          <h1 className="text-4xl font-bold text-pretty tracking-tighter mb-4">
            Welcome to <span className="text-sky-500">Storytime</span>
          </h1>
+
          {state === "intro" ? (
            <>
              <p className="text-gray-600 leading-relaxed text-pretty">
@@ -37,9 +38,6 @@ export const Setup: React.FC<SetupProps> = ({ handleStart }) => {
                <IconEar size={24} /> For best results, try in a quiet
                environment!
              </p>
-              <p className="flex flex-row gap-2 text-gray-600 font-medium text-red-500">
-                <IconAlertCircle size={24} /> This demo expires after 5 minutes.
-              </p>
            </>
          ) : (
            <>
@@ -51,6 +49,7 @@ export const Setup: React.FC<SetupProps> = ({ handleStart }) => {
              <DevicePicker />
            </>
          )}
+
          <hr className="border-gray-150 my-2" />

          <Button
--- a/examples/storytelling-chatbot/frontend/env.example
+++ b/examples/storytelling-chatbot/frontend/env.example
@@ -1 +1,2 @@
+NEXT_PUBLIC_ROOM_URL=
 SITE_URL=
--- a/examples/storytelling-chatbot/frontend/yarn.lock
+++ b/examples/storytelling-chatbot/frontend/yarn.lock
@@ -899,11 +899,11 @@ brace-expansion@^2.0.1:
    balanced-match "^1.0.0"

 braces@^3.0.2, braces@~3.0.2:
-  version "3.0.3"
-  resolved "https://registry.yarnpkg.com/braces/-/braces-3.0.3.tgz#490332f40919452272d55a8480adc0c441358789"
-  integrity "sha1-SQMy9AkZRSJy1VqEgK3AxEE1h4k= sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA=="
+  version "3.0.2"
+  resolved "https://registry.yarnpkg.com/braces/-/braces-3.0.2.tgz#3454e1a462ee8d599e236df336cd9ea4f8afe107"
+  integrity sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A==
  dependencies:
-    fill-range "^7.1.1"
+    fill-range "^7.0.1"

 browserslist@^4.23.0:
  version "4.23.0"
@@ -1551,10 +1551,10 @@ file-entry-cache@^6.0.1:
  dependencies:
    flat-cache "^3.0.4"

-fill-range@^7.1.1:
-  version "7.1.1"
-  resolved "https://registry.yarnpkg.com/fill-range/-/fill-range-7.1.1.tgz#44265d3cac07e3ea7dc247516380643754a05292"
-  integrity "sha1-RCZdPKwH4+p9wkdRY4BkN1SgUpI= sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg=="
+fill-range@^7.0.1:
+  version "7.0.1"
+  resolved "https://registry.yarnpkg.com/fill-range/-/fill-range-7.0.1.tgz#1919a6a7c75fe38b2c7c77e5198535da9acdda40"
+  integrity sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ==
  dependencies:
    to-regex-range "^5.0.1"

--- a/examples/storytelling-chatbot/src/bot.py
+++ b/examples/storytelling-chatbot/src/bot.py
@@ -5,7 +5,7 @@ import os
 import sys


-from pipecat.frames.frames import LLMMessagesFrame, StopTaskFrame, EndFrame
+from pipecat.frames.frames import LLMMessagesFrame, StopTaskFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
@@ -139,16 +139,6 @@ async def main(room_url, token=None):

        main_task = PipelineTask(main_pipeline)

-        @transport.event_handler("on_participant_left")
-        async def on_participant_left(transport, participant, reason):
-            intro_task.queue_frame(EndFrame())
-            await main_task.queue_frame(EndFrame())
-
-        @transport.event_handler("on_call_state_updated")
-        async def on_call_state_updated(transport, state):
-            if state == "left":
-                await main_task.queue_frame(EndFrame())
-
        await runner.run(main_task)

 if __name__ == "__main__":
--- a/examples/storytelling-chatbot/src/bot_runner.py
+++ b/examples/storytelling-chatbot/src/bot_runner.py
@@ -1,233 +0,0 @@
-import os
-import argparse
-import subprocess
-import requests
-from pathlib import Path
-from typing import Optional
-
-from fastapi import FastAPI, Request, HTTPException
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.staticfiles import StaticFiles
-from fastapi.responses import FileResponse, JSONResponse
-
-from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams
-
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-# ------------ Fast API Config ------------ #
-
-MAX_SESSION_TIME = 5 * 60  # 5 minutes
-
-daily_rest_helper = DailyRESTHelper(
-    os.getenv("DAILY_API_KEY", ""),
-    os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))
-
-
-app = FastAPI()
-
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
-
-# Mount the static directory
-STATIC_DIR = "frontend/out"
-
-
-# ------------ Fast API Routes ------------ #
-
-app.mount("/static", StaticFiles(directory=STATIC_DIR, html=True), name="static")
-
-
-@app.post("/start_bot")
-async def start_bot(request: Request) -> JSONResponse:
-    if os.getenv("ENV", "dev") == "production":
-        # Only allow requests from the specified domain
-        host_header = request.headers.get("host")
-        allowed_domains = ["storytelling-chatbot.fly.dev", "www.storytelling-chatbot.fly.dev"]
-        # Check if the Host header matches the allowed domain
-        if host_header not in allowed_domains:
-            raise HTTPException(status_code=403, detail="Access denied")
-
-    try:
-        data = await request.json()
-        # Is this a webhook creation request?
-        if "test" in data:
-            return JSONResponse({"test": True})
-    except Exception as e:
-        pass
-
-    # Use specified room URL, or create a new one if not specified
-    room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")
-
-    if not room_url:
-        params = DailyRoomParams(
-            properties=DailyRoomProperties()
-        )
-        try:
-            room: DailyRoomObject = daily_rest_helper.create_room(params=params)
-        except Exception as e:
-            raise HTTPException(
-                status_code=500,
-                detail=f"Unable to provision room {e}")
-    else:
-        # Check passed room URL exists, we should assume that it already has a sip set up
-        try:
-            room: DailyRoomObject = daily_rest_helper.get_room_from_url(room_url)
-        except Exception:
-            raise HTTPException(
-                status_code=500, detail=f"Room not found: {room_url}")
-
-    # Give the agent a token to join the session
-    token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
-
-    if not room or not token:
-        raise HTTPException(
-            status_code=500, detail=f"Failed to get token for room: {room_url}")
-
-    # Launch a new VM, or run as a shell process (not recommended)
-    if os.getenv("RUN_AS_VM", False):
-        try:
-            virtualize_bot(room.url, token)
-        except Exception as e:
-            raise HTTPException(
-                status_code=500, detail=f"Failed to spawn VM: {e}")
-    else:
-        try:
-            subprocess.Popen(
-                [f"python3 -m bot -u {room.url} -t {token}"],
-                shell=True,
-                bufsize=1,
-                cwd=os.path.dirname(os.path.abspath(__file__)))
-        except Exception as e:
-            raise HTTPException(
-                status_code=500, detail=f"Failed to start subprocess: {e}")
-
-    # Grab a token for the user to join with
-    user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
-
-    return JSONResponse({
-        "room_url": room.url,
-        "token": user_token,
-    })
-
-
-@app.get("/{path_name:path}", response_class=FileResponse)
-async def catch_all(path_name: Optional[str] = ""):
-    if path_name == "":
-        return FileResponse(f"{STATIC_DIR}/index.html")
-
-    file_path = Path(STATIC_DIR) / (path_name or "")
-
-    if file_path.is_file():
-        return file_path
-
-    html_file_path = file_path.with_suffix(".html")
-    if html_file_path.is_file():
-        return FileResponse(html_file_path)
-
-    raise HTTPException(status_code=450, detail="Incorrect API call")
-
-
-# ------------ Virtualization ------------ #
-
-def virtualize_bot(room_url: str, token: str):
-    """
-    This is an example of how to virtualize the bot using Fly.io
-    You can adapt this method to use whichever cloud provider you prefer.
-    """
-    FLY_API_HOST = os.getenv("FLY_API_HOST", "https://api.machines.dev/v1")
-    FLY_APP_NAME = os.getenv("FLY_APP_NAME", "storytelling-chatbot")
-    FLY_API_KEY = os.getenv("FLY_API_KEY", "")
-    FLY_HEADERS = {
-        'Authorization': f"Bearer {FLY_API_KEY}",
-        'Content-Type': 'application/json'
-    }
-
-    # Use the same image as the bot runner
-    res = requests.get(f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS)
-    if res.status_code != 200:
-        raise Exception(f"Unable to get machine info from Fly: {res.text}")
-    image = res.json()[0]['config']['image']
-
-    # Machine configuration
-    cmd = f"python3 src/bot.py -u {room_url} -t {token}"
-    cmd = cmd.split()
-    worker_props = {
-        "config": {
-            "image": image,
-            "auto_destroy": True,
-            "init": {
-                "cmd": cmd
-            },
-            "restart": {
-                "policy": "no"
-            },
-            "guest": {
-                "cpu_kind": "shared",
-                "cpus": 1,
-                "memory_mb": 512
-            }
-        },
-
-    }
-
-    # Spawn a new machine instance
-    res = requests.post(
-        f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines",
-        headers=FLY_HEADERS,
-        json=worker_props)
-
-    if res.status_code != 200:
-        raise Exception(f"Problem starting a bot worker: {res.text}")
-
-    # Wait for the machine to enter the started state
-    vm_id = res.json()['id']
-
-    res = requests.get(
-        f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines/{vm_id}/wait?state=started",
-        headers=FLY_HEADERS)
-
-    if res.status_code != 200:
-        raise Exception(f"Bot was unable to enter started state: {res.text}")
-
-    print(f"Machine joined room: {room_url}")
-
-
-# ------------ Main ------------ #
-
-if __name__ == "__main__":
-    # Check environment variables
-    required_env_vars = ['OPENAI_API_KEY', 'DAILY_API_KEY',
-                         'FAL_KEY', 'ELEVENLABS_VOICE_ID', 'ELEVENLABS_API_KEY']
-    for env_var in required_env_vars:
-        if env_var not in os.environ:
-            raise Exception(f"Missing environment variable: {env_var}.")
-
-    import uvicorn
-
-    default_host = os.getenv("HOST", "0.0.0.0")
-    default_port = int(os.getenv("FAST_API_PORT", "7860"))
-
-    parser = argparse.ArgumentParser(
-        description="Daily Storyteller FastAPI server")
-    parser.add_argument("--host", type=str,
-                        default=default_host, help="Host address")
-    parser.add_argument("--port", type=int,
-                        default=default_port, help="Port number")
-    parser.add_argument("--reload", action="store_true",
-                        help="Reload code on change")
-
-    config = parser.parse_args()
-
-    uvicorn.run(
-        "bot_runner:app",
-        host=config.host,
-        port=config.port,
-        reload=config.reload
-    )
--- a/examples/storytelling-chatbot/src/server.py
+++ b/examples/storytelling-chatbot/src/server.py
@@ -0,0 +1,175 @@
+import os
+import argparse
+import subprocess
+import atexit
+from pathlib import Path
+from typing import Optional
+
+from fastapi import FastAPI, Request, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import FileResponse, JSONResponse
+
+from utils.daily_helpers import create_room as _create_room, get_token, get_name_from_url
+
+MAX_BOTS_PER_ROOM = 1
+
+# Bot subprocesses keyed by PID; each value is a (process, room_url) tuple used for status reporting and concurrency control
+bot_procs = {}
+
+
+def cleanup():
+    # Terminate and reap any bot subprocesses still tracked at interpreter exit
+    for proc, _room_url in bot_procs.values():
+        proc.terminate()
+        proc.wait()
+
+
+atexit.register(cleanup)
+
+
+app = FastAPI()
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+# Mount the static directory
+STATIC_DIR = "frontend/out"
+
+app.mount("/static", StaticFiles(directory=STATIC_DIR, html=True), name="static")
+
+
+@app.post("/create")
+async def create_room(request: Request) -> JSONResponse:
+    data = await request.json()
+
+    if data.get('room_url') is not None:
+        room_url = data.get('room_url')
+        room_name = get_name_from_url(room_url)
+    else:
+        room_url, room_name = _create_room()
+
+    token = get_token(room_url)
+
+    return JSONResponse({"room_url": room_url, "room_name": room_name, "token": token})
+
+
+@app.post("/start")
+async def start_agent(request: Request) -> JSONResponse:
+    data = await request.json()
+
+    # Is this a webhook creation request?
+    if "test" in data:
+        return JSONResponse({"test": True})
+
+    # Ensure the room_url property is present
+    room_url = data.get('room_url')
+    if not room_url:
+        raise HTTPException(
+            status_code=500,
+            detail="Missing 'room_url' property in request data. Cannot start agent without a target room!")
+
+    # Check if there is already an existing process running in this room
+    num_bots_in_room = sum(
+        1 for proc in bot_procs.values() if proc[1] == room_url and proc[0].poll() is None)
+    if num_bots_in_room >= MAX_BOTS_PER_ROOM:
+        raise HTTPException(
+            status_code=500, detail=f"Max bot limit reached for room: {room_url}")
+
+    # Get the token for the room
+    token = get_token(room_url)
+
+    if not token:
+        raise HTTPException(
+            status_code=500, detail=f"Failed to get token for room: {room_url}")
+
+    # Spawn a new agent, and join the user session
+    # Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
+    try:
+        proc = subprocess.Popen(
+            [
+                f"python3 -m bot -u {room_url} -t {token}"
+            ],
+            shell=True,
+            bufsize=1,
+            cwd=os.path.dirname(os.path.abspath(__file__))
+        )
+        bot_procs[proc.pid] = (proc, room_url)
+    except Exception as e:
+        raise HTTPException(
+            status_code=500, detail=f"Failed to start subprocess: {e}")
+
+    return JSONResponse({"bot_id": proc.pid, "room_url": room_url})
+
+
+@app.get("/status/{pid}")
+def get_status(pid: int):
+    # Look up the subprocess
+    proc = bot_procs.get(pid)
+
+    # If the subprocess doesn't exist, return an error
+    if not proc:
+        raise HTTPException(
+            status_code=404, detail=f"Bot with process id: {pid} not found")
+
+    # Check the status of the subprocess
+    if proc[0].poll() is None:
+        status = "running"
+    else:
+        status = "finished"
+
+    return JSONResponse({"bot_id": pid, "status": status})
+
+
+@app.get("/{path_name:path}", response_class=FileResponse)
+async def catch_all(path_name: Optional[str] = ""):
+    if path_name == "":
+        return FileResponse(f"{STATIC_DIR}/index.html")
+
+    file_path = Path(STATIC_DIR) / (path_name or "")
+
+    if file_path.is_file():
+        return FileResponse(file_path)
+
+    html_file_path = file_path.with_suffix(".html")
+    if html_file_path.is_file():
+        return FileResponse(html_file_path)
+
+    raise HTTPException(status_code=450, detail="Incorrect API call")
+
+
+if __name__ == "__main__":
+    # Check environment variables
+    required_env_vars = ['OPENAI_API_KEY', 'DAILY_API_KEY',
+                         'FAL_KEY', 'ELEVENLABS_VOICE_ID', 'ELEVENLABS_API_KEY']
+    for env_var in required_env_vars:
+        if env_var not in os.environ:
+            raise Exception(f"Missing environment variable: {env_var}.")
+
+    import uvicorn
+
+    default_host = os.getenv("HOST", "0.0.0.0")
+    default_port = int(os.getenv("FAST_API_PORT", "7860"))
+
+    parser = argparse.ArgumentParser(
+        description="Daily Storyteller FastAPI server")
+    parser.add_argument("--host", type=str,
+                        default=default_host, help="Host address")
+    parser.add_argument("--port", type=int,
+                        default=default_port, help="Port number")
+    parser.add_argument("--reload", action="store_true",
+                        help="Reload code on change")
+
+    config = parser.parse_args()
+
+    uvicorn.run(
+        "server:app",
+        host=config.host,
+        port=config.port,
+        reload=config.reload
+    )
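Note on the per-room limit in server.py: the `/start` handler counts registry entries whose room matches and whose process has not yet exited, then refuses to spawn when that count reaches `MAX_BOTS_PER_ROOM`. The check can be sketched in isolation as below. This is only an illustration under stated assumptions: `FakeProc`, `count_running_bots`, `can_start_bot` and the example room URLs are hypothetical names that do not appear in the commit, and `FakeProc` models nothing of `subprocess.Popen` except `poll()`.

```python
from typing import Dict, Optional, Tuple

MAX_BOTS_PER_ROOM = 1


class FakeProc:
    """Hypothetical stand-in for subprocess.Popen; only poll() is modelled."""

    def __init__(self, returncode: Optional[int] = None):
        self.returncode = returncode

    def poll(self) -> Optional[int]:
        # None means "still running", mirroring subprocess.Popen.poll()
        return self.returncode


def count_running_bots(bot_procs: Dict[int, Tuple[FakeProc, str]], room_url: str) -> int:
    """Count registry entries for room_url whose process has not exited."""
    return sum(
        1
        for proc, url in bot_procs.values()
        if url == room_url and proc.poll() is None
    )


def can_start_bot(bot_procs: Dict[int, Tuple[FakeProc, str]], room_url: str) -> bool:
    """True when another bot may be spawned for room_url."""
    return count_running_bots(bot_procs, room_url) < MAX_BOTS_PER_ROOM


# Registry shaped like bot_procs in server.py: pid -> (process, room_url)
registry = {
    101: (FakeProc(returncode=None), "https://example.daily.co/room-a"),  # running
    102: (FakeProc(returncode=0), "https://example.daily.co/room-b"),  # finished
}
```

Because finished processes are never removed from the registry, the check relies on `poll()` rather than on key membership, so a room becomes available again once its bot has exited.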
--- a/examples/twilio-chatbot/bot.py
+++ b/examples/twilio-chatbot/bot.py
@@ -15,7 +15,6 @@ from pipecat.services.deepgram import DeepgramSTTService
 from pipecat.services.elevenlabs import ElevenLabsTTSService
 from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams
 from pipecat.vad.silero import SileroVADAnalyzer
-from pipecat.serializers.twilio import TwilioFrameSerializer

 from loguru import logger

@@ -26,7 +25,7 @@ logger.remove(0)
 logger.add(sys.stderr, level="DEBUG")


-async def run_bot(websocket_client, stream_sid):
+async def run_bot(websocket_client):
    async with aiohttp.ClientSession() as session:
        transport = FastAPIWebsocketTransport(
            websocket=websocket_client,
@@ -35,8 +34,7 @@ async def run_bot(websocket_client, stream_sid):
                add_wav_header=False,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
-                vad_audio_passthrough=True,
-                serializer=TwilioFrameSerializer(stream_sid)
+                vad_audio_passthrough=True
            )
        )

--- a/examples/twilio-chatbot/server.py
+++ b/examples/twilio-chatbot/server.py
@@ -1,5 +1,3 @@
-import json
-
 import uvicorn

 from fastapi import FastAPI, WebSocket
@@ -28,13 +26,8 @@ async def start_call():
 @app.websocket("/ws")
 async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
-    start_data = websocket.iter_text()
-    await start_data.__anext__()
-    call_data = json.loads(await start_data.__anext__())
-    print(call_data, flush=True)
-    stream_sid = call_data['start']['streamSid']
    print("WebSocket connection accepted")
-    await run_bot(websocket, stream_sid)
+    await run_bot(websocket)


 if __name__ == "__main__":
--- a/linux-py3.10-requirements.txt
+++ b/linux-py3.10-requirements.txt
@@ -4,7 +4,7 @@
 #
 #    pip-compile --all-extras pyproject.toml
 #
-aiofiles==24.1.0
+aiofiles==23.2.1
    # via deepgram-sdk
 aiohttp==3.9.5
    # via
@@ -17,7 +17,7 @@ aiosignal==1.3.1
    # via aiohttp
 annotated-types==0.7.0
    # via pydantic
-anthropic==0.28.1
+anthropic==0.25.9
    # via
    #   openpipe
    #   pipecat-ai (pyproject.toml)
@@ -36,21 +36,23 @@ attrs==23.2.0
    # via
    #   aiohttp
    #   openpipe
-av==12.2.0
+av==12.1.0
    # via faster-whisper
-azure-cognitiveservices-speech==1.38.0
+azure-cognitiveservices-speech==1.37.0
    # via pipecat-ai (pyproject.toml)
 blinker==1.8.2
    # via flask
 cachetools==5.3.3
    # via google-auth
-cartesia==1.0.3
+cartesia==0.1.1
    # via pipecat-ai (pyproject.toml)
 certifi==2024.6.2
    # via
    #   httpcore
    #   httpx
    #   requests
+cffi==1.16.0
+    # via sounddevice
 charset-normalizer==3.3.2
    # via requests
 click==8.1.7
@@ -62,7 +64,7 @@ coloredlogs==15.0.1
    # via onnxruntime
 ctranslate2==4.3.1
    # via faster-whisper
-daily-python==0.10.1
+daily-python==0.10.0
    # via pipecat-ai (pyproject.toml)
 dataclasses-json==0.6.7
    # via
@@ -84,15 +86,15 @@ exceptiongroup==1.2.1
    # via
    #   anyio
    #   pytest
-fal-client==0.4.1
+fal-client==0.4.0
    # via pipecat-ai (pyproject.toml)
 fastapi==0.111.0
    # via pipecat-ai (pyproject.toml)
 fastapi-cli==0.0.4
    # via fastapi
-faster-whisper==1.0.3
+faster-whisper==1.0.2
    # via pipecat-ai (pyproject.toml)
-filelock==3.15.4
+filelock==3.15.3
    # via
    #   huggingface-hub
    #   pyht
@@ -111,22 +113,22 @@ frozenlist==1.4.1
    # via
    #   aiohttp
    #   aiosignal
-fsspec==2024.6.1
+fsspec==2024.6.0
    # via
    #   huggingface-hub
    #   torch
 future==1.0.0
    # via pyloudnorm
-google-ai-generativelanguage==0.6.6
+google-ai-generativelanguage==0.6.4
    # via google-generativeai
-google-api-core[grpc]==2.19.1
+google-api-core[grpc]==2.19.0
    # via
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.135.0
+google-api-python-client==2.134.0
    # via google-generativeai
-google-auth==2.31.0
+google-auth==2.30.0
    # via
    #   google-ai-generativelanguage
    #   google-api-core
@@ -135,9 +137,9 @@ google-auth==2.31.0
    #   google-generativeai
 google-auth-httplib2==0.2.0
    # via google-api-python-client
-google-generativeai==0.7.1
+google-generativeai==0.5.4
    # via pipecat-ai (pyproject.toml)
-googleapis-common-protos==1.63.2
+googleapis-common-protos==1.63.1
    # via
    #   google-api-core
    #   grpcio-status
@@ -197,35 +199,31 @@ jinja2==3.1.4
    #   fastapi
    #   flask
    #   torch
-jiter==0.5.0
-    # via anthropic
 jsonpatch==1.33
    # via langchain-core
 jsonpointer==3.0.0
    # via jsonpatch
-langchain==0.2.6
+langchain==0.2.5
    # via
    #   langchain-community
    #   pipecat-ai (pyproject.toml)
-langchain-community==0.2.6
+langchain-community==0.2.5
    # via pipecat-ai (pyproject.toml)
-langchain-core==0.2.10
+langchain-core==0.2.9
    # via
    #   langchain
    #   langchain-community
    #   langchain-openai
    #   langchain-text-splitters
-langchain-openai==0.1.10
+langchain-openai==0.1.9
    # via pipecat-ai (pyproject.toml)
-langchain-text-splitters==0.2.2
+langchain-text-splitters==0.2.1
    # via langchain
-langsmith==0.1.83
+langsmith==0.1.81
    # via
    #   langchain
    #   langchain-community
    #   langchain-core
-llvmlite==0.43.0
-    # via numba
 loguru==0.7.2
    # via pipecat-ai (pyproject.toml)
 markdown-it-py==3.0.0
@@ -248,18 +246,14 @@ mypy-extensions==1.0.0
    # via typing-inspect
 networkx==3.3
    # via torch
-numba==0.60.0
-    # via resampy
 numpy==1.26.4
    # via
    #   ctranslate2
    #   langchain
    #   langchain-community
-    #   numba
    #   onnxruntime
    #   pipecat-ai (pyproject.toml)
    #   pyloudnorm
-    #   resampy
    #   scipy
    #   torchvision
    #   transformers
@@ -288,20 +282,20 @@ nvidia-cusparse-cu12==12.1.0.106
    #   torch
 nvidia-nccl-cu12==2.20.5
    # via torch
-nvidia-nvjitlink-cu12==12.5.82
+nvidia-nvjitlink-cu12==12.5.40
    # via
    #   nvidia-cusolver-cu12
    #   nvidia-cusparse-cu12
 nvidia-nvtx-cu12==12.1.105
    # via torch
-onnxruntime==1.18.1
+onnxruntime==1.18.0
    # via faster-whisper
-openai==1.27.0
+openai==1.26.0
    # via
    #   langchain-openai
    #   openpipe
    #   pipecat-ai (pyproject.toml)
-openpipe==4.16.0
+openpipe==4.14.0
    # via pipecat-ai (pyproject.toml)
 orjson==3.10.5
    # via
@@ -344,7 +338,9 @@ pyasn1-modules==0.4.0
    # via google-auth
 pyaudio==0.2.14
    # via pipecat-ai (pyproject.toml)
-pydantic==2.8.0
+pycparser==2.22
+    # via cffi
+pydantic==2.7.4
    # via
    #   anthropic
    #   fastapi
@@ -353,7 +349,7 @@ pydantic==2.8.0
    #   langchain-core
    #   langsmith
    #   openai
-pydantic-core==2.20.0
+pydantic-core==2.18.4
    # via pydantic
 pygments==2.18.0
    # via rich
@@ -400,8 +396,6 @@ requests==2.32.3
    #   pyht
    #   tiktoken
    #   transformers
-resampy==0.4.3
-    # via pipecat-ai (pyproject.toml)
 rich==13.7.1
    # via typer
 rsa==4.9
@@ -410,7 +404,7 @@ safetensors==0.4.3
    # via
    #   timm
    #   transformers
-scipy==1.14.0
+scipy==1.13.1
    # via pyloudnorm
 shellingham==1.5.4
    # via typer
@@ -422,6 +416,8 @@ sniffio==1.3.1
    #   anyio
    #   httpx
    #   openai
+sounddevice==0.4.7
+    # via pipecat-ai (pyproject.toml)
 sqlalchemy==2.0.31
    # via
    #   langchain
@@ -432,7 +428,7 @@ sympy==1.12.1
    # via
    #   onnxruntime
    #   torch
-tenacity==8.4.2
+tenacity==8.4.1
    # via
    #   langchain
    #   langchain-community
--- a/macos-py3.10-requirements.txt
+++ b/macos-py3.10-requirements.txt
@@ -1,10 +1,10 @@
 #
-# This file is autogenerated by pip-compile with Python 3.10
+# This file is autogenerated by pip-compile with Python 3.12
 # by the following command:
 #
 #    pip-compile --all-extras pyproject.toml
 #
-aiofiles==24.1.0
+aiofiles==23.2.1
    # via deepgram-sdk
 aiohttp==3.9.5
    # via
@@ -17,7 +17,7 @@ aiosignal==1.3.1
    # via aiohttp
 annotated-types==0.7.0
    # via pydantic
-anthropic==0.28.1
+anthropic==0.25.9
    # via
    #   openpipe
    #   pipecat-ai (pyproject.toml)
@@ -28,29 +28,27 @@ anyio==4.4.0
    #   openai
    #   starlette
    #   watchfiles
-async-timeout==4.0.3
-    # via
-    #   aiohttp
-    #   langchain
 attrs==23.2.0
    # via
    #   aiohttp
    #   openpipe
-av==12.2.0
+av==12.1.0
    # via faster-whisper
-azure-cognitiveservices-speech==1.38.0
+azure-cognitiveservices-speech==1.37.0
    # via pipecat-ai (pyproject.toml)
 blinker==1.8.2
    # via flask
 cachetools==5.3.3
    # via google-auth
-cartesia==1.0.3
+cartesia==0.1.1
    # via pipecat-ai (pyproject.toml)
 certifi==2024.6.2
    # via
    #   httpcore
    #   httpx
    #   requests
+cffi==1.16.0
+    # via sounddevice
 charset-normalizer==3.3.2
    # via requests
 click==8.1.7
@@ -62,7 +60,7 @@ coloredlogs==15.0.1
    # via onnxruntime
 ctranslate2==4.3.1
    # via faster-whisper
-daily-python==0.10.1
+daily-python==0.10.0
    # via pipecat-ai (pyproject.toml)
 dataclasses-json==0.6.7
    # via
@@ -80,19 +78,15 @@ einops==0.8.0
    # via pipecat-ai (pyproject.toml)
 email-validator==2.2.0
    # via fastapi
-exceptiongroup==1.2.1
-    # via
-    #   anyio
-    #   pytest
-fal-client==0.4.1
+fal-client==0.4.0
    # via pipecat-ai (pyproject.toml)
 fastapi==0.111.0
    # via pipecat-ai (pyproject.toml)
 fastapi-cli==0.0.4
    # via fastapi
-faster-whisper==1.0.3
+faster-whisper==1.0.2
    # via pipecat-ai (pyproject.toml)
-filelock==3.15.4
+filelock==3.15.3
    # via
    #   huggingface-hub
    #   pyht
@@ -110,22 +104,22 @@ frozenlist==1.4.1
    # via
    #   aiohttp
    #   aiosignal
-fsspec==2024.6.1
+fsspec==2024.6.0
    # via
    #   huggingface-hub
    #   torch
 future==1.0.0
    # via pyloudnorm
-google-ai-generativelanguage==0.6.6
+google-ai-generativelanguage==0.6.4
    # via google-generativeai
-google-api-core[grpc]==2.19.1
+google-api-core[grpc]==2.19.0
    # via
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.135.0
+google-api-python-client==2.134.0
    # via google-generativeai
-google-auth==2.31.0
+google-auth==2.30.0
    # via
    #   google-ai-generativelanguage
    #   google-api-core
@@ -134,9 +128,9 @@ google-auth==2.31.0
    #   google-generativeai
 google-auth-httplib2==0.2.0
    # via google-api-python-client
-google-generativeai==0.7.1
+google-generativeai==0.5.4
    # via pipecat-ai (pyproject.toml)
-googleapis-common-protos==1.63.2
+googleapis-common-protos==1.63.1
    # via
    #   google-api-core
    #   grpcio-status
@@ -194,35 +188,31 @@ jinja2==3.1.4
    #   fastapi
    #   flask
    #   torch
-jiter==0.5.0
-    # via anthropic
 jsonpatch==1.33
    # via langchain-core
 jsonpointer==3.0.0
    # via jsonpatch
-langchain==0.2.6
+langchain==0.2.5
    # via
    #   langchain-community
    #   pipecat-ai (pyproject.toml)
-langchain-community==0.2.6
+langchain-community==0.2.5
    # via pipecat-ai (pyproject.toml)
-langchain-core==0.2.10
+langchain-core==0.2.9
    # via
    #   langchain
    #   langchain-community
    #   langchain-openai
    #   langchain-text-splitters
-langchain-openai==0.1.10
+langchain-openai==0.1.9
    # via pipecat-ai (pyproject.toml)
-langchain-text-splitters==0.2.2
+langchain-text-splitters==0.2.1
    # via langchain
-langsmith==0.1.83
+langsmith==0.1.81
    # via
    #   langchain
    #   langchain-community
    #   langchain-core
-llvmlite==0.43.0
-    # via numba
 loguru==0.7.2
    # via pipecat-ai (pyproject.toml)
 markdown-it-py==3.0.0
@@ -245,29 +235,25 @@ mypy-extensions==1.0.0
    # via typing-inspect
 networkx==3.3
    # via torch
-numba==0.60.0
-    # via resampy
 numpy==1.26.4
    # via
    #   ctranslate2
    #   langchain
    #   langchain-community
-    #   numba
    #   onnxruntime
    #   pipecat-ai (pyproject.toml)
    #   pyloudnorm
-    #   resampy
    #   scipy
    #   torchvision
    #   transformers
-onnxruntime==1.18.1
+onnxruntime==1.18.0
    # via faster-whisper
-openai==1.27.0
+openai==1.26.0
    # via
    #   langchain-openai
    #   openpipe
    #   pipecat-ai (pyproject.toml)
-openpipe==4.16.0
+openpipe==4.14.0
    # via pipecat-ai (pyproject.toml)
 orjson==3.10.5
    # via
@@ -310,7 +296,9 @@ pyasn1-modules==0.4.0
    # via google-auth
 pyaudio==0.2.14
    # via pipecat-ai (pyproject.toml)
-pydantic==2.8.0
+pycparser==2.22
+    # via cffi
+pydantic==2.7.4
    # via
    #   anthropic
    #   fastapi
@@ -319,7 +307,7 @@ pydantic==2.8.0
    #   langchain-core
    #   langsmith
    #   openai
-pydantic-core==2.20.0
+pydantic-core==2.18.4
    # via pydantic
 pygments==2.18.0
    # via rich
@@ -366,8 +354,6 @@ requests==2.32.3
    #   pyht
    #   tiktoken
    #   transformers
-resampy==0.4.3
-    # via pipecat-ai (pyproject.toml)
 rich==13.7.1
    # via typer
 rsa==4.9
@@ -376,7 +362,7 @@ safetensors==0.4.3
    # via
    #   timm
    #   transformers
-scipy==1.14.0
+scipy==1.13.1
    # via pyloudnorm
 shellingham==1.5.4
    # via typer
@@ -388,6 +374,8 @@ sniffio==1.3.1
    #   anyio
    #   httpx
    #   openai
+sounddevice==0.4.7
+    # via pipecat-ai (pyproject.toml)
 sqlalchemy==2.0.31
    # via
    #   langchain
@@ -398,7 +386,7 @@ sympy==1.12.1
    # via
    #   onnxruntime
    #   torch
-tenacity==8.4.2
+tenacity==8.4.1
    # via
    #   langchain
    #   langchain-community
@@ -412,8 +400,6 @@ tokenizers==0.19.1
    #   anthropic
    #   faster-whisper
    #   transformers
-tomli==2.0.1
-    # via pytest
 torch==2.3.1
    # via
    #   pipecat-ai (pyproject.toml)
@@ -437,7 +423,6 @@ typer==0.12.3
 typing-extensions==4.12.2
    # via
    #   anthropic
-    #   anyio
    #   deepgram-sdk
    #   fastapi
    #   google-generativeai
@@ -450,7 +435,6 @@ typing-extensions==4.12.2
    #   torch
    #   typer
    #   typing-inspect
-    #   uvicorn
 typing-inspect==0.9.0
    # via dataclasses-json
 ujson==5.10.0
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -8,7 +8,7 @@ dynamic = ["version"]
 description = "An open source framework for voice (and multimodal) assistants"
 license = { text = "BSD 2-Clause License" }
 readme = "README.md"
-requires-python = ">=3.10"
+requires-python = ">=3.7"
 keywords = ["webrtc", "audio", "video", "ai"]
 classifiers = [
    "Development Status :: 5 - Production/Stable",
@@ -34,26 +34,24 @@ Source = "https://github.com/pipecat-ai/pipecat"
 Website = "https://pipecat.ai"

 [project.optional-dependencies]
-anthropic = [ "anthropic~=0.28.1" ]
-azure = [ "azure-cognitiveservices-speech~=1.38.0" ]
-cartesia = [ "websockets~=12.0" ]
-daily = [ "daily-python~=0.10.1" ]
+anthropic = [ "anthropic~=0.25.7" ]
+azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
+cartesia = [ "numpy~=1.26.0", "sounddevice", "cartesia" ]
+daily = [ "daily-python~=0.10.0" ]
 deepgram = [ "deepgram-sdk~=3.2.7" ]
 examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
-fal = [ "fal-client~=0.4.1" ]
-gladia = [ "websockets~=12.0" ]
-google = [ "google-generativeai~=0.7.1" ]
-fireworks = [ "openai~=1.27.0" ]
-langchain = [ "langchain~=0.2.6", "langchain-community~=0.2.6", "langchain-openai~=0.1.10" ]
+fal = [ "fal-client~=0.4.0" ]
+google = [ "google-generativeai~=0.5.3" ]
+fireworks = [ "openai~=1.26.0" ]
+langchain = [ "langchain~=0.2.1", "langchain-community~=0.2.1", "langchain-openai~=0.1.8" ]
 local = [ "pyaudio~=0.2.0" ]
 moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ]
-openai = [ "openai~=1.27.0" ]
-openpipe = [ "openpipe~=4.16.0" ]
+openai = [ "openai~=1.26.0" ]
+openpipe = [ "openpipe~=4.14.0" ]
 playht = [ "pyht~=0.0.28" ]
-silero = [ "torch~=2.3.1", "torchaudio~=2.3.1" ]
+silero = [ "torch~=2.3.0", "torchaudio~=2.3.0" ]
 websocket = [ "websockets~=12.0", "fastapi~=0.111.0" ]
-whisper = [ "faster-whisper~=1.0.3" ]
-xtts = [ "resampy~=0.4.3" ]
+whisper = [ "faster-whisper~=1.0.2" ]

 [tool.setuptools.packages.find]
 # All the following settings are optional:
--- a/src/pipecat/frames/frames.py
+++ b/src/pipecat/frames/frames.py
@@ -158,34 +158,6 @@ class LLMMessagesFrame(DataFrame):
    messages: List[dict]


-@dataclass
-class LLMMessagesAppendFrame(DataFrame):
-    """A frame containing a list of LLM messages that neeed to be added to the
-    current context.
-
-    """
-    messages: List[dict]
-
-
-@dataclass
-class LLMMessagesUpdateFrame(DataFrame):
-    """A frame containing a list of new LLM messages. These messages will
-    replace the current context LLM messages and should generate a new
-    LLMMessagesFrame.
-
-    """
-    messages: List[dict]
-
-
-@dataclass
-class TTSSpeakFrame(DataFrame):
-    """A frame that contains a text that should be spoken by the TTS in the
-    pipeline (if any).
-
-    """
-    text: str
-
-
 @dataclass
 class TransportMessageFrame(DataFrame):
    message: Any
@@ -268,33 +240,12 @@ class StopInterruptionFrame(SystemFrame):
    pass


-@dataclass
-class BotInterruptionFrame(SystemFrame):
-    """Emitted by when the bot should be interrupted. This will mainly cause the
-    same actions as if the user interrupted except that the
-    UserStartedSpeakingFrame and UserStoppedSpeakingFrame won't be generated.
-
-    """
-    pass
-
-
-@dataclass
-class BotSpeakingFrame(SystemFrame):
-    """Emitted by transport outputs while the bot is still speaking. This can be
-    used, for example, to detect when a user is idle. That is, while the bot is
-    speaking we don't want to trigger any user idle timeout since the user might
-    be listening.
-
-    """
-    pass
-
-
 @dataclass
 class MetricsFrame(SystemFrame):
    """Emitted by processor that can compute metrics like latencies.
    """
-    ttfb: List[Mapping[str, Any]] | None = None
-    processing: List[Mapping[str, Any]] | None = None
+    ttfb: Mapping[str, float]
+

 #
 # Control frames
@@ -320,13 +271,27 @@ class EndFrame(ControlFrame):

 @dataclass
 class LLMFullResponseStartFrame(ControlFrame):
-    """Used to indicate the beginning of an LLM response. Following by one or
-    more TextFrame and a final LLMFullResponseEndFrame."""
+    """Used to indicate the beginning of a full LLM response. Following
+    LLMResponseStartFrame, TextFrame and LLMResponseEndFrame for each sentence
+    until an LLMFullResponseEndFrame."""
    pass


 @dataclass
 class LLMFullResponseEndFrame(ControlFrame):
+    """Indicates the end of a full LLM response."""
+    pass
+
+
+@dataclass
+class LLMResponseStartFrame(ControlFrame):
+    """Used to indicate the beginning of an LLM response. Following TextFrames
+    are part of the LLM response until an LLMResponseEndFrame."""
+    pass
+
+
+@dataclass
+class LLMResponseEndFrame(ControlFrame):
    """Indicates the end of an LLM response."""
    pass

@@ -373,17 +338,3 @@ class UserImageRequestFrame(ControlFrame):

    def __str__(self):
        return f"{self.name}, user: {self.user_id}"
-
-
-@dataclass
-class LLMModelUpdateFrame(ControlFrame):
-    """A control frame containing a request to update to a new LLM model.
-    """
-    model: str
-
-
-@dataclass
-class TTSVoiceUpdateFrame(ControlFrame):
-    """A control frame containing a request to update to a new TTS voice.
-    """
-    voice: str
--- a/src/pipecat/pipeline/pipeline.py
+++ b/src/pipecat/pipeline/pipeline.py
@@ -91,7 +91,5 @@ class Pipeline(BasePipeline):
    def _link_processors(self):
        prev = self._processors[0]
        for curr in self._processors[1:]:
-            prev.set_parent(self)
            prev.link(curr)
            prev = curr
-        prev.set_parent(self)
--- a/src/pipecat/pipeline/runner.py
+++ b/src/pipecat/pipeline/runner.py
@@ -15,7 +15,7 @@ from loguru import logger

 class PipelineRunner:

-    def __init__(self, *, name: str | None = None, handle_sigint: bool = True):
+    def __init__(self, name: str | None = None, handle_sigint: bool = True):
        self.id: int = obj_id()
        self.name: str = name or f"{self.__class__.__name__}#{obj_count(self)}"

--- a/src/pipecat/pipeline/task.py
+++ b/src/pipecat/pipeline/task.py
@@ -95,9 +95,8 @@ class PipelineTask:

    def _initial_metrics_frame(self) -> MetricsFrame:
        processors = self._pipeline.processors_with_metrics()
-        ttfb = [{"name": p.name, "time": 0.0} for p in processors]
-        processing = [{"name": p.name, "time": 0.0} for p in processors]
-        return MetricsFrame(ttfb=ttfb, processing=processing)
+        ttfb = dict(zip([p.name for p in processors], [0] * len(processors)))
+        return MetricsFrame(ttfb=ttfb)

    async def _process_down_queue(self):
        start_frame = StartFrame(
--- a/src/pipecat/processors/aggregators/llm_response.py
+++ b/src/pipecat/processors/aggregators/llm_response.py
@@ -14,9 +14,9 @@ from pipecat.frames.frames import (
    InterimTranscriptionFrame,
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
-    LLMMessagesAppendFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
    LLMMessagesFrame,
-    LLMMessagesUpdateFrame,
    StartInterruptionFrame,
    TranscriptionFrame,
    TextFrame,
@@ -122,19 +122,6 @@ class LLMResponseAggregator(FrameProcessor):
            # Reset anyways
            self._reset()
            await self.push_frame(frame, direction)
-        elif isinstance(frame, LLMMessagesAppendFrame):
-            self._messages.extend(frame.messages)
-            messages_frame = LLMMessagesFrame(self._messages)
-            await self.push_frame(messages_frame)
-        elif isinstance(frame, LLMMessagesUpdateFrame):
-            # We push the frame downstream so the assistant aggregator gets
-            # updated as well.
-            await self.push_frame(frame)
-            # We can now reset this one.
-            self._reset()
-            self._messages = frame.messages
-            messages_frame = LLMMessagesFrame(self._messages)
-            await self.push_frame(messages_frame)
        else:
            await self.push_frame(frame, direction)

@@ -186,7 +173,7 @@ class LLMUserResponseAggregator(LLMResponseAggregator):

 class LLMFullResponseAggregator(FrameProcessor):
    """This class aggregates Text frames until it receives a
-    LLMFullResponseEndFrame, then emits the concatenated text as
+    LLMResponseEndFrame, then emits the concatenated text as
    a single text frame.

    given the following frames:
@@ -195,12 +182,12 @@ class LLMFullResponseAggregator(FrameProcessor):
        TextFrame(" world.")
        TextFrame(" I am")
        TextFrame(" an LLM.")
-        LLMFullResponseEndFrame()]
+        LLMResponseEndFrame()]

    this processor will yield nothing for the first 4 frames, then

        TextFrame("Hello, world. I am an LLM.")
-        LLMFullResponseEndFrame()
+        LLMResponseEndFrame()

    when passed the last frame.

@@ -216,9 +203,9 @@ class LLMFullResponseAggregator(FrameProcessor):
    >>> asyncio.run(print_frames(aggregator, TextFrame(" world.")))
    >>> asyncio.run(print_frames(aggregator, TextFrame(" I am")))
    >>> asyncio.run(print_frames(aggregator, TextFrame(" an LLM.")))
-    >>> asyncio.run(print_frames(aggregator, LLMFullResponseEndFrame()))
+    >>> asyncio.run(print_frames(aggregator, LLMResponseEndFrame()))
    Hello, world. I am an LLM.
-    LLMFullResponseEndFrame
+    LLMResponseEndFrame
    """

    def __init__(self):
@@ -247,11 +234,6 @@ class LLMContextAggregator(LLMResponseAggregator):
    async def _push_aggregation(self):
        if len(self._aggregation) > 0:
            self._context.add_message({"role": self._role, "content": self._aggregation})
-
-            # Reset the aggregation. Reset it before pushing it down, otherwise
-            # if the tasks gets cancelled we won't be able to clear things up.
-            self._aggregation = ""
-
            frame = OpenAILLMContextFrame(self._context)
            await self.push_frame(frame)

@@ -265,10 +247,9 @@ class LLMAssistantContextAggregator(LLMContextAggregator):
            messages=[],
            context=context,
            role="assistant",
-            start_frame=LLMFullResponseStartFrame,
-            end_frame=LLMFullResponseEndFrame,
-            accumulator_frame=TextFrame,
-            handle_interruptions=True
+            start_frame=LLMResponseStartFrame,
+            end_frame=LLMResponseEndFrame,
+            accumulator_frame=TextFrame
        )


--- a/src/pipecat/processors/async_frame_processor.py
+++ b/src/pipecat/processors/async_frame_processor.py
@@ -1,64 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-
-from pipecat.frames.frames import EndFrame, Frame, StartInterruptionFrame
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-
-
-class AsyncFrameProcessor(FrameProcessor):
-
-    def __init__(
-            self,
-            *,
-            name: str | None = None,
-            loop: asyncio.AbstractEventLoop | None = None,
-            **kwargs):
-        super().__init__(name=name, loop=loop, **kwargs)
-
-        self._create_push_task()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, StartInterruptionFrame):
-            await self._handle_interruptions(frame)
-
-    async def queue_frame(
-            self,
-            frame: Frame,
-            direction: FrameDirection = FrameDirection.DOWNSTREAM):
-        await self._push_queue.put((frame, direction))
-
-    async def cleanup(self):
-        self._push_frame_task.cancel()
-        await self._push_frame_task
-
-    async def _handle_interruptions(self, frame: Frame):
-        # Cancel the task. This will stop pushing frames downstream.
-        self._push_frame_task.cancel()
-        await self._push_frame_task
-        # Push an out-of-band frame (i.e. not using the ordered push
-        # frame task).
-        await self.push_frame(frame)
-        # Create a new queue and task.
-        self._create_push_task()
-
-    def _create_push_task(self):
-        self._push_queue = asyncio.Queue()
-        self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
-
-    async def _push_frame_task_handler(self):
-        running = True
-        while running:
-            try:
-                (frame, direction) = await self._push_queue.get()
-                await self.push_frame(frame, direction)
-                running = not isinstance(frame, EndFrame)
-                self._push_queue.task_done()
-            except asyncio.CancelledError:
-                break
--- a/src/pipecat/processors/filters/wake_check_filter.py
+++ b/src/pipecat/processors/filters/wake_check_filter.py
@@ -82,5 +82,5 @@ class WakeCheckFilter(FrameProcessor):
                await self.push_frame(frame, direction)
        except Exception as e:
            error_msg = f"Error in wake word filter: {e}"
-            logger.exception(error_msg)
+            logger.error(error_msg)
            await self.push_error(ErrorFrame(error_msg))
--- a/src/pipecat/processors/frame_processor.py
+++ b/src/pipecat/processors/frame_processor.py
@@ -9,7 +9,7 @@ import time

 from enum import Enum

-from pipecat.frames.frames import ErrorFrame, Frame, MetricsFrame, StartFrame, StartInterruptionFrame, UserStoppedSpeakingFrame
+from pipecat.frames.frames import ErrorFrame, Frame, MetricsFrame, StartFrame, UserStoppedSpeakingFrame
 from pipecat.utils.utils import obj_count, obj_id

 from loguru import logger
@@ -20,59 +20,15 @@ class FrameDirection(Enum):
    UPSTREAM = 2


-class FrameProcessorMetrics:
-    def __init__(self, name: str):
-        self._name = name
-        self._start_ttfb_time = 0
-        self._start_processing_time = 0
-        self._should_report_ttfb = True
-
-    async def start_ttfb_metrics(self, report_only_initial_ttfb):
-        if self._should_report_ttfb:
-            self._start_ttfb_time = time.time()
-            self._should_report_ttfb = not report_only_initial_ttfb
-
-    async def stop_ttfb_metrics(self):
-        if self._start_ttfb_time == 0:
-            return None
-
-        value = time.time() - self._start_ttfb_time
-        logger.debug(f"{self._name} TTFB: {value}")
-        ttfb = {
-            "processor": self._name,
-            "value": value
-        }
-        self._start_ttfb_time = 0
-        return MetricsFrame(ttfb=[ttfb])
-
-    async def start_processing_metrics(self):
-        self._start_processing_time = time.time()
-
-    async def stop_processing_metrics(self):
-        if self._start_processing_time == 0:
-            return None
-
-        value = time.time() - self._start_processing_time
-        logger.debug(f"{self._name} processing time: {value}")
-        processing = {
-            "processor": self._name,
-            "value": value
-        }
-        self._start_processing_time = 0
-        return MetricsFrame(processing=[processing])
-
-
 class FrameProcessor:

    def __init__(
            self,
-            *,
            name: str | None = None,
            loop: asyncio.AbstractEventLoop | None = None,
            **kwargs):
        self.id: int = obj_id()
        self.name = name or f"{self.__class__.__name__}#{obj_count(self)}"
-        self._parent: "FrameProcessor" | None = None
        self._prev: "FrameProcessor" | None = None
        self._next: "FrameProcessor" | None = None
        self._loop: asyncio.AbstractEventLoop = loop or asyncio.get_running_loop()
@@ -83,7 +39,8 @@ class FrameProcessor:
        self._report_only_initial_ttfb = False

        # Metrics
-        self._metrics = FrameProcessorMetrics(name=self.name)
+        self._start_ttfb_time = 0
+        self._should_report_ttfb = True

    @property
    def interruptions_allowed(self):
@@ -101,33 +58,21 @@ class FrameProcessor:
        return False

    async def start_ttfb_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            await self._metrics.start_ttfb_metrics(self._report_only_initial_ttfb)
+        if self.metrics_enabled and self._should_report_ttfb:
+            self._start_ttfb_time = time.time()
+            self._should_report_ttfb = not self._report_only_initial_ttfb

    async def stop_ttfb_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            frame = await self._metrics.stop_ttfb_metrics()
-            if frame:
-                await self.push_frame(frame)
-
-    async def start_processing_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            await self._metrics.start_processing_metrics()
-
-    async def stop_processing_metrics(self):
-        if self.can_generate_metrics() and self.metrics_enabled:
-            frame = await self._metrics.stop_processing_metrics()
-            if frame:
-                await self.push_frame(frame)
-
-    async def stop_all_metrics(self):
-        await self.stop_ttfb_metrics()
-        await self.stop_processing_metrics()
+        if self.metrics_enabled and self._start_ttfb_time > 0:
+            ttfb = time.time() - self._start_ttfb_time
+            logger.debug(f"{self.name} TTFB: {ttfb}")
+            await self.push_frame(MetricsFrame(ttfb={self.name: ttfb}))
+            self._start_ttfb_time = 0

    async def cleanup(self):
        pass

-    def link(self, processor: "FrameProcessor"):
+    def link(self, processor: 'FrameProcessor'):
        self._next = processor
        processor._prev = self
        logger.debug(f"Linking {self} -> {self._next}")
@@ -135,19 +80,11 @@ class FrameProcessor:
    def get_event_loop(self) -> asyncio.AbstractEventLoop:
        return self._loop

-    def set_parent(self, parent: "FrameProcessor"):
-        self._parent = parent
-
-    def get_parent(self) -> "FrameProcessor":
-        return self._parent
-
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if isinstance(frame, StartFrame):
            self._allow_interruptions = frame.allow_interruptions
            self._enable_metrics = frame.enable_metrics
            self._report_only_initial_ttfb = frame.report_only_initial_ttfb
-        elif isinstance(frame, StartInterruptionFrame):
-            await self.stop_all_metrics()
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self._should_report_ttfb = True

@@ -155,15 +92,12 @@ class FrameProcessor:
        await self.push_frame(error, FrameDirection.UPSTREAM)

    async def push_frame(self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM):
-        try:
-            if direction == FrameDirection.DOWNSTREAM and self._next:
-                logger.trace(f"Pushing {frame} from {self} to {self._next}")
-                await self._next.process_frame(frame, direction)
-            elif direction == FrameDirection.UPSTREAM and self._prev:
-                logger.trace(f"Pushing {frame} upstream from {self} to {self._prev}")
-                await self._prev.process_frame(frame, direction)
-        except Exception as e:
-            logger.exception(f"Uncaught exception in {self}: {e}")
+        if direction == FrameDirection.DOWNSTREAM and self._next:
+            logger.trace(f"Pushing {frame} from {self} to {self._next}")
+            await self._next.process_frame(frame, direction)
+        elif direction == FrameDirection.UPSTREAM and self._prev:
+            logger.trace(f"Pushing {frame} upstream from {self} to {self._prev}")
+            await self._prev.process_frame(frame, direction)

    def __str__(self):
        return self.name
--- a/src/pipecat/processors/frameworks/langchain.py
+++ b/src/pipecat/processors/frameworks/langchain.py
@@ -11,6 +11,8 @@ from pipecat.frames.frames import (
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
    LLMMessagesFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
    TextFrame)
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

@@ -67,10 +69,11 @@ class LangchainProcessor(FrameProcessor):
                {self._transcript_key: text},
                config={"configurable": {"session_id": self._participant_id}},
            ):
+                await self.push_frame(LLMResponseStartFrame())
                await self.push_frame(TextFrame(self.__get_token_value(token)))
+                await self.push_frame(LLMResponseEndFrame())
        except GeneratorExit:
            logger.warning(f"{self} generator was closed prematurely")
        except Exception as e:
-            logger.exception(f"{self} an unknown error occurred: {e}")
-        finally:
-            await self.push_frame(LLMFullResponseEndFrame())
+            logger.error(f"{self} an unknown error occurred: {e}")
+        await self.push_frame(LLMFullResponseEndFrame())
--- a/src/pipecat/processors/frameworks/rtvi.py
+++ b/src/pipecat/processors/frameworks/rtvi.py
@@ -1,523 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import dataclasses
-
-from typing import List, Literal, Optional, Type
-from pydantic import BaseModel, ValidationError
-
-from pipecat.frames.frames import (
-    BotInterruptionFrame,
-    Frame,
-    InterimTranscriptionFrame,
-    LLMFullResponseEndFrame,
-    LLMFullResponseStartFrame,
-    LLMMessagesAppendFrame,
-    LLMMessagesUpdateFrame,
-    LLMModelUpdateFrame,
-    StartFrame,
-    SystemFrame,
-    TTSSpeakFrame,
-    TTSVoiceUpdateFrame,
-    TextFrame,
-    TranscriptionFrame,
-    TransportMessageFrame,
-    UserStartedSpeakingFrame,
-    UserStoppedSpeakingFrame)
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.ai_services import AIService
-from pipecat.services.cartesia import CartesiaTTSService
-from pipecat.services.openai import OpenAILLMService, OpenAILLMContext
-from pipecat.transports.base_transport import BaseTransport
-
-DEFAULT_MESSAGES = [
-    {
-        "role": "system",
-        "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-    }
-]
-
-DEFAULT_MODEL = "llama3-70b-8192"
-
-DEFAULT_VOICE = "79a125e8-cd45-4c13-8a67-188112f4dd22"
-
-
-class RTVILLMConfig(BaseModel):
-    model: Optional[str] = None
-    messages: Optional[List[dict]] = None
-
-
-class RTVITTSConfig(BaseModel):
-    voice: Optional[str] = None
-
-
-class RTVIConfig(BaseModel):
-    llm: Optional[RTVILLMConfig] = None
-    tts: Optional[RTVITTSConfig] = None
-
-
-class RTVISetup(BaseModel):
-    config: Optional[RTVIConfig] = None
-
-
-class RTVILLMMessageData(BaseModel):
-    messages: List[dict]
-
-
-class RTVITTSMessageData(BaseModel):
-    text: str
-    interrupt: Optional[bool] = False
-
-
-class RTVIMessageData(BaseModel):
-    setup: Optional[RTVISetup] = None
-    config: Optional[RTVIConfig] = None
-    llm: Optional[RTVILLMMessageData] = None
-    tts: Optional[RTVITTSMessageData] = None
-
-
-class RTVIMessage(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: str
-    id: str
-    data: Optional[RTVIMessageData] = None
-
-
-class RTVIResponseData(BaseModel):
-    success: bool
-    error: Optional[str] = None
-
-
-class RTVIResponse(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["response"] = "response"
-    id: str
-    data: RTVIResponseData
-
-
-class RTVIErrorData(BaseModel):
-    message: str
-
-
-class RTVIError(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["error"] = "error"
-    data: RTVIErrorData
-
-
-class RTVILLMContextMessageData(BaseModel):
-    messages: List[dict]
-
-
-class RTVILLMContextMessage(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["llm-context"] = "llm-context"
-    data: RTVILLMContextMessageData
-
-
-class RTVITTSTextMessageData(BaseModel):
-    text: str
-
-
-class RTVITTSTextMessage(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["tts-text"] = "tts-text"
-    data: RTVITTSTextMessageData
-
-
-class RTVIBotReady(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["bot-ready"] = "bot-ready"
-
-
-class RTVITranscriptionMessageData(BaseModel):
-    text: str
-    user_id: str
-    timestamp: str
-    final: bool
-
-
-class RTVITranscriptionMessage(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["user-transcription"] = "user-transcription"
-    data: RTVITranscriptionMessageData
-
-
-class RTVIUserStartedSpeakingMessage(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["user-started-speaking"] = "user-started-speaking"
-
-
-class RTVIUserStoppedSpeakingMessage(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["user-stopped-speaking"] = "user-stopped-speaking"
-
-
-class RTVIJSONCompletion(BaseModel):
-    label: Literal["rtvi"] = "rtvi"
-    type: Literal["json-completion"] = "json-completion"
-    data: str
-
-
-class FunctionCaller(FrameProcessor):
-
-    def __init__(self, context):
-        super().__init__()
-        self._checking = False
-        self._aggregating = False
-        self._emitted_start = False
-        self._aggregation = ""
-        self._context = context
-
-        self._callbacks = {}
-        self._start_callbacks = {}
-
-    def register_function(self, function_name: str, callback, start_callback=None):
-        self._callbacks[function_name] = callback
-        if start_callback:
-            self._start_callbacks[function_name] = start_callback
-
-    def unregister_function(self, function_name: str):
-        del self._callbacks[function_name]
-        if self._start_callbacks[function_name]:
-            del self._start_callbacks[function_name]
-
-    def has_function(self, function_name: str):
-        return function_name in self._callbacks.keys()
-
-    async def call_function(self, function_name: str, args):
-        if function_name in self._callbacks.keys():
-            return await self._callbacks[function_name](self, args)
-        return None
-
-    async def call_start_function(self, function_name: str):
-        if function_name in self._start_callbacks.keys():
-            await self._start_callbacks[function_name](self)
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, LLMFullResponseStartFrame):
-            self._checking = True
-            await self.push_frame(frame, direction)
-        elif isinstance(frame, TextFrame) and self._checking:
-            # TODO-CB: should we expand this to any non-text character to start the completion?
-            if frame.text.strip().startswith("{") or frame.text.strip().startswith("```"):
-                self._emitted_start = False
-                self._checking = False
-                self._aggregation = frame.text
-                self._aggregating = True
-            else:
-                self._checking = False
-                self._aggregating = False
-                self._aggregation = ""
-                self._emitted_start = False
-                await self.push_frame(frame, direction)
-        elif isinstance(frame, TextFrame) and self._aggregating:
-            self._aggregation += frame.text
-            # TODO-CB: We can probably ignore function start I think
-            # if not self._emitted_start:
-            #     fn = re.search(r'{"function_name":\s*"(.*)",', self._aggregation)
-            #     if fn and fn.group(1):
-            #         await self.call_start_function(fn.group(1))
-            #         self._emitted_start = True
-        elif isinstance(frame, LLMFullResponseEndFrame) and self._aggregating:
-            try:
-                self._aggregation = self._aggregation.replace("```json", "").replace("```", "")
-                self._context.add_message({"role": "assistant", "content": self._aggregation})
-                message = RTVIJSONCompletion(data=self._aggregation)
-                msg = message.model_dump(exclude_none=True)
-                await self.push_frame(TransportMessageFrame(message=msg))
-
-            except Exception as e:
-                print(f"Error parsing function call json: {e}")
-                print(f"aggregation was: {self._aggregation}")
-
-            self._aggregating = False
-            self._aggregation = ""
-            self._emitted_start = False
-        elif isinstance(frame, LLMFullResponseEndFrame):
-            await self.push_frame(frame, direction)
-        else:
-            await self.push_frame(frame, direction)
-
-
-class RTVITTSTextProcessor(FrameProcessor):
-
-    def __init__(self):
-        super().__init__()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        await self.push_frame(frame, direction)
-
-        if isinstance(frame, TextFrame):
-            message = RTVITTSTextMessage(data=RTVITTSTextMessageData(text=frame.text))
-            await self.push_frame(TransportMessageFrame(message=message.model_dump(exclude_none=True)))
-
-
-class RTVIProcessor(FrameProcessor):
-
-    def __init__(
-            self,
-            *,
-            transport: BaseTransport,
-            setup: RTVISetup | None = None,
-            llm_api_key: str = "",
-            llm_base_url: str = "https://api.groq.com/openai/v1",
-            tts_api_key: str = "",
-            llm_cls: Type[AIService] = OpenAILLMService,
-            tts_cls: Type[AIService] = CartesiaTTSService):
-        super().__init__()
-        self._transport = transport
-        self._setup = setup
-        self._llm_api_key = llm_api_key
-        self._llm_base_url = llm_base_url
-        self._tts_api_key = tts_api_key
-        self._llm_cls = llm_cls
-        self._tts_cls = tts_cls
-        self._start_frame: Frame | None = None
-        self._llm: FrameProcessor | None = None
-        self._tts: FrameProcessor | None = None
-        self._pipeline: FrameProcessor | None = None
-
-        self._frame_handler_task = self.get_event_loop().create_task(self._frame_handler())
-        self._frame_queue = asyncio.Queue()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-        else:
-            await self._frame_queue.put((frame, direction))
-
-        if isinstance(frame, StartFrame):
-            self._start_frame = frame
-            try:
-                await self._handle_setup(self._setup)
-            except Exception as e:
-                await self._send_error(f"unable to setup RTVI: {e}")
-
-    async def cleanup(self):
-        self._frame_handler_task.cancel()
-        await self._frame_handler_task
-
-    async def _frame_handler(self):
-        while True:
-            try:
-                (frame, direction) = await self._frame_queue.get()
-                await self._handle_frame(frame, direction)
-                self._frame_queue.task_done()
-            except asyncio.CancelledError:
-                break
-
-    async def _handle_frame(self, frame: Frame, direction: FrameDirection):
-        if isinstance(frame, TransportMessageFrame):
-            await self._handle_message(frame)
-        else:
-            await self.push_frame(frame, direction)
-
-        if isinstance(frame, TranscriptionFrame) or isinstance(frame, InterimTranscriptionFrame):
-            await self._handle_transcriptions(frame)
-        elif isinstance(frame, UserStartedSpeakingFrame) or isinstance(frame, UserStoppedSpeakingFrame):
-            await self._handle_interruptions(frame)
-
-    async def _handle_transcriptions(self, frame: Frame):
-        # TODO(aleix): Once we add support for using custom piplines, the STTs will
-        # be in the pipeline after this processor. This means the STT will have to
-        # push transcriptions upstream as well.
-
-        message = None
-        if isinstance(frame, TranscriptionFrame):
-            message = RTVITranscriptionMessage(
-                data=RTVITranscriptionMessageData(
-                    text=frame.text,
-                    user_id=frame.user_id,
-                    timestamp=frame.timestamp,
-                    final=True))
-        elif isinstance(frame, InterimTranscriptionFrame):
-            message = RTVITranscriptionMessage(
-                data=RTVITranscriptionMessageData(
-                    text=frame.text,
-                    user_id=frame.user_id,
-                    timestamp=frame.timestamp,
-                    final=False))
-
-        if message:
-            frame = TransportMessageFrame(message=message.model_dump(exclude_none=True))
-            await self.push_frame(frame)
-
-    async def _handle_interruptions(self, frame: Frame):
-        message = None
-        if isinstance(frame, UserStartedSpeakingFrame):
-            message = RTVIUserStartedSpeakingMessage()
-        elif isinstance(frame, UserStoppedSpeakingFrame):
-            message = RTVIUserStoppedSpeakingMessage()
-
-        if message:
-            frame = TransportMessageFrame(message=message.model_dump(exclude_none=True))
-            await self.push_frame(frame)
-
-    async def _handle_message(self, frame: TransportMessageFrame):
-        try:
-            message = RTVIMessage.model_validate(frame.message)
-        except ValidationError as e:
-            await self._send_error(f"invalid message: {e}")
-            return
-
-        try:
-            success = True
-            error = None
-            match message.type:
-                case "setup":
-                    setup = None
-                    if message.data:
-                        setup = message.data.setup
-                    await self._handle_setup(setup)
-                case "config-update":
-                    await self._handle_config_update(message.data.config)
-                case "llm-get-context":
-                    await self._handle_llm_get_context()
-                case "llm-append-context":
-                    await self._handle_llm_append_context(message.data.llm)
-                case "llm-update-context":
-                    await self._handle_llm_update_context(message.data.llm)
-                case "tts-speak":
-                    await self._handle_tts_speak(message.data.tts)
-                case "tts-interrupt":
-                    await self._handle_tts_interrupt()
-                case _:
-                    success = False
-                    error = f"unsupported type {message.type}"
-
-            await self._send_response(message.id, success, error)
-        except ValidationError as e:
-            await self._send_response(message.id, False, f"invalid message: {e}")
-        except Exception as e:
-            await self._send_response(message.id, False, f"{e}")
-
-    async def _handle_setup(self, setup: RTVISetup | None):
-        model = DEFAULT_MODEL
-        if setup and setup.config and setup.config.llm and setup.config.llm.model:
-            model = setup.config.llm.model
-
-        messages = DEFAULT_MESSAGES
-        if setup and setup.config and setup.config.llm and setup.config.llm.messages:
-            messages = setup.config.llm.messages
-
-        voice = DEFAULT_VOICE
-        if setup and setup.config and setup.config.tts and setup.config.tts.voice:
-            voice = setup.config.tts.voice
-
-        self._tma_in = LLMUserResponseAggregator(messages)
-        self._tma_out = LLMAssistantResponseAggregator(messages)
-
-        self._llm = self._llm_cls(
-            name="LLM",
-            base_url=self._llm_base_url,
-            api_key=self._llm_api_key,
-            model=model)
-
-        self._tts = self._tts_cls(name="TTS", api_key=self._tts_api_key, voice_id=voice)
-
-        # TODO-CB: Eventually we'll need to switch the context aggregators to use the
-        # OpenAI context frames instead of message frames
-        context = OpenAILLMContext(messages=messages)
-        self._fc = FunctionCaller(context)
-
-        self._tts_text = RTVITTSTextProcessor()
-
-        pipeline = Pipeline([
-            self._tma_in,
-            self._llm,
-            self._fc,
-            self._tts,
-            self._tts_text,
-            self._tma_out,
-            self._transport.output(),
-        ])
-        self._pipeline = pipeline
-
-        parent = self.get_parent()
-        if parent and self._start_frame:
-            parent.link(pipeline)
-
-            # We need to initialize the new pipeline with the same settings
-            # as the initial one.
-            start_frame = dataclasses.replace(self._start_frame)
-            await self.push_frame(start_frame)
-
-        message = RTVIBotReady()
-        frame = TransportMessageFrame(message=message.model_dump(exclude_none=True))
-        await self.push_frame(frame)
-
-    async def _handle_config_update(self, config: RTVIConfig):
-        # Change voice before LLM updates, so we can hear the new voice.
-        if config.tts and config.tts.voice:
-            frame = TTSVoiceUpdateFrame(config.tts.voice)
-            await self.push_frame(frame)
-        if config.llm and config.llm.model:
-            frame = LLMModelUpdateFrame(config.llm.model)
-            await self.push_frame(frame)
-        if config.llm and config.llm.messages:
-            frame = LLMMessagesUpdateFrame(config.llm.messages)
-            await self.push_frame(frame)
-
-    async def _handle_llm_get_context(self):
-        data = RTVILLMContextMessageData(messages=self._tma_in.messages)
-        message = RTVILLMContextMessage(data=data)
-        frame = TransportMessageFrame(message=message.model_dump(exclude_none=True))
-        await self.push_frame(frame)
-
-    async def _handle_llm_append_context(self, data: RTVILLMMessageData):
-        if data and data.messages:
-            frame = LLMMessagesAppendFrame(data.messages)
-            await self.push_frame(frame)
-
-    async def _handle_llm_update_context(self, data: RTVILLMMessageData):
-        if data and data.messages:
-            frame = LLMMessagesUpdateFrame(data.messages)
-            await self.push_frame(frame)
-
-    async def _handle_tts_speak(self, data: RTVITTSMessageData):
-        if data and data.text:
-            if data.interrupt:
-                await self._handle_tts_interrupt()
-            frame = TTSSpeakFrame(text=data.text)
-            await self.push_frame(frame)
-
-    async def _handle_tts_interrupt(self):
-        await self.push_frame(BotInterruptionFrame(), FrameDirection.UPSTREAM)
-
-    async def _send_error(self, error: str):
-        message = RTVIError(data=RTVIErrorData(message=error))
-        frame = TransportMessageFrame(message=message.model_dump(exclude_none=True))
-        await self.push_frame(frame)
-
-    async def _send_response(self, id: str, success: bool, error: str | None = None):
-        # TODO(aleix): This is a bit hacky, but we might get an invalid
-        # configuration or something might go wrong during setup and we would
-        # like to send the error to the client. However, if the pipeline is not
-        # set up yet we don't have an output transport and therefore we can't
-        # send any messages. So, we set up a super basic pipeline with just the
-        # output transport so we can send messages.
-        if not self._pipeline:
-            pipeline = Pipeline([self._transport.output()])
-            self._pipeline = pipeline
-
-            parent = self.get_parent()
-            if parent and self._start_frame:
-                parent.link(pipeline)
-
-        message = RTVIResponse(id=id, data=RTVIResponseData(success=success, error=error))
-        frame = TransportMessageFrame(message=message.model_dump(exclude_none=True))
-        await self.push_frame(frame)
--- a/src/pipecat/processors/idle_frame_processor.py
+++ b/src/pipecat/processors/idle_frame_processor.py
@@ -1,76 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-
-from typing import Awaitable, Callable, List
-
-from pipecat.frames.frames import Frame, SystemFrame
-from pipecat.processors.async_frame_processor import AsyncFrameProcessor
-from pipecat.processors.frame_processor import FrameDirection
-
-
-class IdleFrameProcessor(AsyncFrameProcessor):
-    """This class waits to receive any frame or list of desired frames within a
-    given timeout. If the timeout is reached before receiving any of those
-    frames the provided callback will be called.
-
-    The callback can then be used to push frames downstream by using
-    `queue_frame()` (or `push_frame()` for system frames).
-
-    """
-
-    def __init__(
-            self,
-            *,
-            callback: Callable[["IdleFrameProcessor"], Awaitable[None]],
-            timeout: float,
-            types: List[type] = [],
-            **kwargs):
-        super().__init__(**kwargs)
-
-        self._callback = callback
-        self._timeout = timeout
-        self._types = types
-
-        self._create_idle_task()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-        else:
-            await self.queue_frame(frame, direction)
-
-        # If we are not waiting for any specific frame set the event, otherwise
-        # check if we have received one of the desired frames.
-        if not self._types:
-            self._idle_event.set()
-        else:
-            for t in self._types:
-                if isinstance(frame, t):
-                    self._idle_event.set()
-
-
-    async def cleanup(self):
-        self._idle_task.cancel()
-        await self._idle_task
-
-    def _create_idle_task(self):
-        self._idle_event = asyncio.Event()
-        self._idle_task = self.get_event_loop().create_task(self._idle_task_handler())
-
-    async def _idle_task_handler(self):
-        while True:
-            try:
-                await asyncio.wait_for(self._idle_event.wait(), timeout=self._timeout)
-            except asyncio.TimeoutError:
-                await self._callback(self)
-            except asyncio.CancelledError:
-                break
-            finally:
-                self._idle_event.clear()
--- a/src/pipecat/processors/text_transformer.py
+++ b/src/pipecat/processors/text_transformer.py
@@ -33,6 +33,6 @@ class StatelessTextTransformer(FrameProcessor):
            result = self._transform_fn(frame.text)
            if isinstance(result, Coroutine):
                result = await result
-            await self.push_frame(TextFrame(text=result))
+            await self.push_frame(result)
        else:
            await self.push_frame(frame, direction)
--- a/src/pipecat/processors/user_idle_processor.py
+++ b/src/pipecat/processors/user_idle_processor.py
@@ -1,77 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-
-from typing import Awaitable, Callable
-
-from pipecat.frames.frames import BotSpeakingFrame, Frame, StartInterruptionFrame, StopInterruptionFrame, SystemFrame
-from pipecat.processors.async_frame_processor import AsyncFrameProcessor
-from pipecat.processors.frame_processor import FrameDirection
-
-
-class UserIdleProcessor(AsyncFrameProcessor):
-    """This class is useful to check if the user is interacting with the bot
-    within a given timeout. If the timeout is reached before any interaction
-    occurred the provided callback will be called.
-
-    The callback can then be used to push frames downstream by using
-    `queue_frame()` (or `push_frame()` for system frames).
-
-    """
-
-    def __init__(
-            self,
-            *,
-            callback: Callable[["UserIdleProcessor"], Awaitable[None]],
-            timeout: float,
-            **kwargs):
-        super().__init__(**kwargs)
-
-        self._callback = callback
-        self._timeout = timeout
-
-        self._interrupted = False
-
-        self._create_idle_task()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-        else:
-            await self.queue_frame(frame, direction)
-
-        # We shouldn't call the idle callback if the user or the bot are speaking.
-        if isinstance(frame, StartInterruptionFrame):
-            self._interrupted = True
-            self._idle_event.set()
-        elif isinstance(frame, StopInterruptionFrame):
-            self._interrupted = False
-            self._idle_event.set()
-        elif isinstance(frame, BotSpeakingFrame):
-            self._idle_event.set()
-
-    async def cleanup(self):
-        self._idle_task.cancel()
-        await self._idle_task
-
-    def _create_idle_task(self):
-        self._idle_event = asyncio.Event()
-        self._idle_task = self.get_event_loop().create_task(self._idle_task_handler())
-
-    async def _idle_task_handler(self):
-        while True:
-            try:
-                await asyncio.wait_for(self._idle_event.wait(), timeout=self._timeout)
-            except asyncio.TimeoutError:
-                if not self._interrupted:
-                    await self._callback(self)
-            except asyncio.CancelledError:
-                break
-            finally:
-                self._idle_event.clear()
--- a/src/pipecat/serializers/twilio.py
+++ b/src/pipecat/serializers/twilio.py
@@ -17,8 +17,8 @@ class TwilioFrameSerializer(FrameSerializer):
        AudioRawFrame: "audio",
    }

-    def __init__(self, stream_sid: str):
-        self._stream_sid = stream_sid
+    def __init__(self):
+        self._sid = None

    def serialize(self, frame: Frame) -> str | bytes | None:
        if not isinstance(frame, AudioRawFrame):
@@ -30,7 +30,7 @@ class TwilioFrameSerializer(FrameSerializer):
        payload = base64.b64encode(serialized_data).decode("utf-8")
        answer = {
            "event": "media",
-            "streamSid": self._stream_sid,
+            "streamSid": self._sid,
            "media": {
                "payload": payload
            }
@@ -41,6 +41,9 @@ class TwilioFrameSerializer(FrameSerializer):
    def deserialize(self, data: str | bytes) -> Frame | None:
        message = json.loads(data)

+        if not self._sid:
+            self._sid = message["streamSid"] if "streamSid" in message else None
+
        if message["event"] != "media":
            return None
        else:
--- a/src/pipecat/services/ai_services.py
+++ b/src/pipecat/services/ai_services.py
@@ -16,38 +16,16 @@ from pipecat.frames.frames import (
    EndFrame,
    ErrorFrame,
    Frame,
-    LLMFullResponseEndFrame,
    StartFrame,
-    StartInterruptionFrame,
-    TTSSpeakFrame,
    TTSStartedFrame,
    TTSStoppedFrame,
-    TTSVoiceUpdateFrame,
    TextFrame,
    VisionImageRawFrame,
+    LLMFullResponseEndFrame,
 )
-from pipecat.processors.async_frame_processor import AsyncFrameProcessor
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.utils.audio import calculate_audio_volume
 from pipecat.utils.utils import exp_smoothing
-import re
-
-
-ENDOFSENTENCE_PATTERN_STR = r"""
-    (?<![A-Z])       # Negative lookbehind: not preceded by an uppercase letter (e.g., "U.S.A.")
-    (?<!\d)          # Negative lookbehind: not preceded by a digit (e.g., "1. Let's start")
-    (?<!\d\s[ap])    # Negative lookbehind: not preceded by time (e.g., "3:00 a.m.")
-    (?<!Mr|Ms|Dr)    # Negative lookbehind: not preceded by Mr, Ms, Dr (combined because they have the same length)
-    (?<!Mrs)         # Negative lookbehind: not preceded by "Mrs"
-    (?<!Prof)        # Negative lookbehind: not preceded by "Prof"
-    [\.\?\!:]        # Match a period, question mark, exclamation point, or colon
-    $                # End of string
-"""
-ENDOFSENTENCE_PATTERN = re.compile(ENDOFSENTENCE_PATTERN_STR, re.VERBOSE)
-
-
-def match_endofsentence(text: str) -> bool:
-    return ENDOFSENTENCE_PATTERN.search(text.rstrip()) is not None


 class AIService(FrameProcessor):
@@ -81,30 +59,6 @@ class AIService(FrameProcessor):
                await self.push_frame(f)


-class AsyncAIService(AsyncFrameProcessor):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
-
-    async def start(self, frame: StartFrame):
-        pass
-
-    async def stop(self, frame: EndFrame):
-        pass
-
-    async def cancel(self, frame: CancelFrame):
-        pass
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, StartFrame):
-            await self.start(frame)
-        elif isinstance(frame, CancelFrame):
-            await self.cancel(frame)
-        elif isinstance(frame, EndFrame):
-            await self.stop(frame)
-
-
 class LLMService(AIService):
    """This class is a no-op but serves as a base class for LLM services."""

@@ -138,22 +92,11 @@ class LLMService(AIService):


 class TTSService(AIService):
-    def __init__(
-            self,
-            *,
-            aggregate_sentences: bool = True,
-            # if False, the subclass is responsible for pushing TextFrames and LLMFullResponseEndFrames
-            push_text_frames: bool = True,
-            **kwargs):
+    def __init__(self, aggregate_sentences: bool = True, **kwargs):
        super().__init__(**kwargs)
        self._aggregate_sentences: bool = aggregate_sentences
-        self._push_text_frames: bool = push_text_frames
        self._current_sentence: str = ""

-    @abstractmethod
-    async def set_voice(self, voice: str):
-        pass
-
    # Converts the text to audio.
    @abstractmethod
    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
@@ -162,58 +105,43 @@ class TTSService(AIService):
    async def say(self, text: str):
        await self.process_frame(TextFrame(text=text), FrameDirection.DOWNSTREAM)

-    async def _handle_interruption(self, frame: StartInterruptionFrame, direction: FrameDirection):
-        self._current_sentence = ""
-        await self.push_frame(frame, direction)
-
    async def _process_text_frame(self, frame: TextFrame):
        text: str | None = None
        if not self._aggregate_sentences:
            text = frame.text
        else:
            self._current_sentence += frame.text
-            if match_endofsentence(self._current_sentence):
-                text = self._current_sentence
+            if self._current_sentence.strip().endswith(
+                    (".", "?", "!")) and not self._current_sentence.strip().endswith(
+                    ("Mr.", "Mrs.", "Ms.", "Dr.")):
+                text = self._current_sentence.strip()
                self._current_sentence = ""

        if text:
            await self._push_tts_frames(text)

-    async def _push_tts_frames(self, text: str, text_passthrough: bool = True):
-        text = text.strip()
-        if not text:
-            return
-
+    async def _push_tts_frames(self, text: str):
        await self.push_frame(TTSStartedFrame())
-        await self.start_processing_metrics()
        await self.process_generator(self.run_tts(text))
-        await self.stop_processing_metrics()
        await self.push_frame(TTSStoppedFrame())
-        if self._push_text_frames:
-            # We send the original text after the audio. This way, if we are
-            # interrupted, the text is not added to the assistant context.
-            await self.push_frame(TextFrame(text))
+        # We send the original text after the audio. This way, if we are
+        # interrupted, the text is not added to the assistant context.
+        await self.push_frame(TextFrame(text))

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TextFrame):
            await self._process_text_frame(frame)
-        elif isinstance(frame, StartInterruptionFrame):
-            await self._handle_interruption(frame, direction)
-        elif isinstance(frame, LLMFullResponseEndFrame) or isinstance(frame, EndFrame):
-            sentence = self._current_sentence
-            self._current_sentence = ""
-            await self._push_tts_frames(sentence)
-            if isinstance(frame, LLMFullResponseEndFrame):
-                if self._push_text_frames:
-                    await self.push_frame(frame, direction)
-            else:
-                await self.push_frame(frame, direction)
-        elif isinstance(frame, TTSSpeakFrame):
-            await self._push_tts_frames(frame.text, False)
-        elif isinstance(frame, TTSVoiceUpdateFrame):
-            await self.set_voice(frame.voice)
+        elif isinstance(frame, EndFrame):
+            if self._current_sentence:
+                await self._push_tts_frames(self._current_sentence)
+            await self.push_frame(frame)
+        elif isinstance(frame, LLMFullResponseEndFrame):
+            if self._current_sentence:
+                await self._push_tts_frames(self._current_sentence.strip())
+                self._current_sentence = ""
+            await self.push_frame(frame)
        else:
            await self.push_frame(frame, direction)

@@ -222,7 +150,6 @@ class STTService(AIService):
    """STTService is a base class for speech-to-text services."""

    def __init__(self,
-                 *,
                 min_volume: float = 0.6,
                 max_silence_secs: float = 0.3,
                 max_buffer_secs: float = 1.5,
@@ -278,9 +205,7 @@ class STTService(AIService):
            self._silence_num_frames = 0
            self._wave.close()
            self._content.seek(0)
-            await self.start_processing_metrics()
            await self.process_generator(self.run_stt(self._content.read()))
-            await self.stop_processing_metrics()
            (self._content, self._wave) = self._new_wave()

    async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -313,9 +238,7 @@ class ImageGenService(AIService):

        if isinstance(frame, TextFrame):
            await self.push_frame(frame, direction)
-            await self.start_processing_metrics()
            await self.process_generator(self.run_image_gen(frame.text))
-            await self.stop_processing_metrics()
        else:
            await self.push_frame(frame, direction)

@@ -335,8 +258,6 @@ class VisionService(AIService):
        await super().process_frame(frame, direction)

        if isinstance(frame, VisionImageRawFrame):
-            await self.start_processing_metrics()
            await self.process_generator(self.run_vision(frame))
-            await self.stop_processing_metrics()
        else:
            await self.push_frame(frame, direction)
--- a/src/pipecat/services/anthropic.py
+++ b/src/pipecat/services/anthropic.py
@@ -8,11 +8,12 @@ import base64

 from pipecat.frames.frames import (
    Frame,
-    LLMModelUpdateFrame,
    TextFrame,
    VisionImageRawFrame,
    LLMMessagesFrame,
    LLMFullResponseStartFrame,
+    LLMResponseStartFrame,
+    LLMResponseEndFrame,
    LLMFullResponseEndFrame
 )
 from pipecat.processors.frame_processor import FrameDirection
@@ -40,7 +41,6 @@ class AnthropicLLMService(LLMService):

    def __init__(
            self,
-            *,
            api_key: str,
            model: str = "claude-3-opus-20240229",
            max_tokens: int = 1024):
@@ -117,10 +117,12 @@ class AnthropicLLMService(LLMService):
            async for event in response:
                # logger.debug(f"Anthropic LLM event: {event}")
                if (event.type == "content_block_delta"):
+                    await self.push_frame(LLMResponseStartFrame())
                    await self.push_frame(TextFrame(event.delta.text))
+                    await self.push_frame(LLMResponseEndFrame())

        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")
        finally:
            await self.push_frame(LLMFullResponseEndFrame())

@@ -135,9 +137,6 @@ class AnthropicLLMService(LLMService):
            context = OpenAILLMContext.from_messages(frame.messages)
        elif isinstance(frame, VisionImageRawFrame):
            context = OpenAILLMContext.from_image_frame(frame)
-        elif isinstance(frame, LLMModelUpdateFrame):
-            logger.debug(f"Switching LLM model to: [{frame.model}]")
-            self._model = frame.model
        else:
            await self.push_frame(frame, direction)

--- a/src/pipecat/services/azure.py
+++ b/src/pipecat/services/azure.py
@@ -12,18 +12,9 @@ import time
 from PIL import Image
 from typing import AsyncGenerator

-from pipecat.frames.frames import (
-    AudioRawFrame,
-    CancelFrame,
-    EndFrame,
-    ErrorFrame,
-    Frame,
-    StartFrame,
-    SystemFrame,
-    TranscriptionFrame,
-    URLImageRawFrame)
+from pipecat.frames.frames import AudioRawFrame, CancelFrame, EndFrame, ErrorFrame, Frame, StartFrame, SystemFrame, TranscriptionFrame, URLImageRawFrame
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.ai_services import AsyncAIService, TTSService, ImageGenService
+from pipecat.services.ai_services import AIService, TTSService, ImageGenService
 from pipecat.services.openai import BaseOpenAILLMService

 from loguru import logger
@@ -43,7 +34,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use Azure, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
+        "In order to use Azure TTS, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
    raise Exception(f"Missing module: {e}")


@@ -81,12 +72,8 @@ class AzureTTSService(TTSService):
    def can_generate_metrics(self) -> bool:
        return True

-    async def set_voice(self, voice: str):
-        logger.debug(f"Switching TTS voice to: [{voice}]")
-        self._voice = voice
-
    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
-        logger.debug(f"Generating TTS: [{text}]")
+        logger.debug(f"Generating TTS: {text}")

        await self.start_ttfb_metrics()

@@ -113,7 +100,7 @@ class AzureTTSService(TTSService):
                logger.error(f"{self} error: {cancellation_details.error_details}")


-class AzureSTTService(AsyncAIService):
+class AzureSTTService(AIService):
    def __init__(
            self,
            *,
@@ -136,6 +123,8 @@ class AzureSTTService(AsyncAIService):
            speech_config=speech_config, audio_config=audio_config)
        self._speech_recognizer.recognized.connect(self._on_handle_recognized)

+        self._create_push_task()
+
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

@@ -151,16 +140,34 @@ class AzureSTTService(AsyncAIService):

    async def stop(self, frame: EndFrame):
        self._speech_recognizer.stop_continuous_recognition_async()
-        self._audio_stream.close()
+        await self._push_queue.put((frame, FrameDirection.DOWNSTREAM))
+        await self._push_frame_task

    async def cancel(self, frame: CancelFrame):
        self._speech_recognizer.stop_continuous_recognition_async()
-        self._audio_stream.close()
+        self._push_frame_task.cancel()
+        await self._push_frame_task
+
+    def _create_push_task(self):
+        self._push_queue = asyncio.Queue()
+        self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
+
+    async def _push_frame_task_handler(self):
+        running = True
+        while running:
+            try:
+                (frame, direction) = await self._push_queue.get()
+                await self.push_frame(frame, direction)
+                running = not isinstance(frame, EndFrame)
+            except asyncio.CancelledError:
+                break

    def _on_handle_recognized(self, event):
        if event.result.reason == ResultReason.RecognizedSpeech and len(event.result.text) > 0:
+            direction = FrameDirection.DOWNSTREAM
            frame = TranscriptionFrame(event.result.text, "", int(time.time_ns() / 1000000))
-            asyncio.run_coroutine_threadsafe(self.queue_frame(frame), self.get_event_loop())
+            asyncio.run_coroutine_threadsafe(
+                self._push_queue.put((frame, direction)), self.get_event_loop())


 class AzureImageGenServiceREST(ImageGenService):
--- a/src/pipecat/services/cartesia.py
+++ b/src/pipecat/services/cartesia.py
@@ -4,37 +4,15 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-import json
-import uuid
-import base64
-import asyncio
-import time
+from cartesia.tts import AsyncCartesiaTTS

 from typing import AsyncGenerator

-from pipecat.processors.frame_processor import FrameDirection
-from pipecat.frames.frames import (
-    Frame,
-    AudioRawFrame,
-    StartInterruptionFrame,
-    StartFrame,
-    EndFrame,
-    TextFrame,
-    LLMFullResponseEndFrame
-)
+from pipecat.frames.frames import AudioRawFrame, Frame
 from pipecat.services.ai_services import TTSService

 from loguru import logger

-# See .env.example for Cartesia configuration needed
-try:
-    import websockets
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error(
-        "In order to use Cartesia, you need to `pip install pipecat-ai[cartesia]`. Also, set `CARTESIA_API_KEY` environment variable.")
-    raise Exception(f"Missing module: {e}")
-

 class CartesiaTTSService(TTSService):

@@ -42,184 +20,44 @@ class CartesiaTTSService(TTSService):
            self,
            *,
            api_key: str,
-            voice_id: str,
-            cartesia_version: str = "2024-06-10",
-            url: str = "wss://api.cartesia.ai/tts/websocket",
-            model_id: str = "sonic-english",
-            encoding: str = "pcm_s16le",
-            sample_rate: int = 16000,
-            language: str = "en",
+            voice_name: str,
+            model_id: str = "upbeat-moon",
+            output_format: str = "pcm_16000",
            **kwargs):
        super().__init__(**kwargs)

-        # Aggregating sentences still gives cleaner-sounding results and fewer
-        # artifacts than streaming one word at a time. On average, waiting for
-        # a full sentence should only "cost" us 15ms or so with GPT-4o or a Llama 3
-        # model, and it's worth it for the better audio quality.
-        self._aggregate_sentences = True
-
-        # We don't want to automatically push LLM response text frames, because the
-        # context aggregators will add them to the LLM context even if we're
-        # interrupted. Cartesia gives us word-by-word timestamps, so we can use those
-        # to generate text frames ourselves, aligned with the playout timing of the audio.
-        self._push_text_frames = False
-
        self._api_key = api_key
-        self._cartesia_version = cartesia_version
-        self._url = url
-        self._voice_id = voice_id
+        self._voice_name = voice_name
        self._model_id = model_id
-        self._output_format = {
-            "container": "raw",
-            "encoding": encoding,
-            "sample_rate": sample_rate,
-        }
-        self._language = language
+        self._output_format = output_format

-        self._websocket = None
-        self._context_id = None
-        self._context_id_start_timestamp = None
-        self._timestamped_words_buffer = []
-        self._receive_task = None
-        self._context_appending_task = None
+        try:
+            self._client = AsyncCartesiaTTS(api_key=self._api_key)
+            voices = self._client.get_voices()
+            voice_id = voices[self._voice_name]["id"]
+            self._voice = self._client.get_voice_embedding(voice_id=voice_id)
+        except Exception as e:
+            logger.error(f"{self} initialization error: {e}")

    def can_generate_metrics(self) -> bool:
        return True

-    async def set_voice(self, voice: str):
-        logger.debug(f"Switching TTS voice to: [{voice}]")
-        self._voice_id = voice
-
-    async def start(self, frame: StartFrame):
-        await super().start(frame)
-        await self._connect()
-
-    async def stop(self, frame: EndFrame):
-        await super().stop(frame)
-        await self._disconnect()
-
-    async def _connect(self):
-        try:
-            self._websocket = await websockets.connect(
-                f"{self._url}?api_key={self._api_key}&cartesia_version={self._cartesia_version}"
-            )
-            self._receive_task = self.get_event_loop().create_task(self._receive_task_handler())
-            self._context_appending_task = self.get_event_loop().create_task(self._context_appending_task_handler())
-        except Exception as e:
-            logger.exception(f"{self} initialization error: {e}")
-            self._websocket = None
-
-    async def _disconnect(self):
-        try:
-            if self._context_appending_task:
-                self._context_appending_task.cancel()
-                await self._context_appending_task
-                self._context_appending_task = None
-            if self._receive_task:
-                self._receive_task.cancel()
-                await self._receive_task
-                self._receive_task = None
-            if self._websocket:
-                ws = self._websocket
-                self._websocket = None
-                await ws.close()
-            self._context_id = None
-            self._context_id_start_timestamp = None
-            self._timestamped_words_buffer = []
-            await self.stop_all_metrics()
-        except Exception as e:
-            logger.exception(f"{self} error closing websocket: {e}")
-
-    async def _handle_interruption(self, frame: StartInterruptionFrame, direction: FrameDirection):
-        await super()._handle_interruption(frame, direction)
-        self._context_id = None
-        self._context_id_start_timestamp = None
-        self._timestamped_words_buffer = []
-        await self.stop_all_metrics()
-        await self.push_frame(LLMFullResponseEndFrame())
-
-    async def _receive_task_handler(self):
-        try:
-            async for message in self._websocket:
-                msg = json.loads(message)
-                # logger.debug(f"Received message: {msg['type']} {msg['context_id']}")
-                if not msg or msg["context_id"] != self._context_id:
-                    continue
-                if msg["type"] == "done":
-                    await self.stop_ttfb_metrics()
-                    # unset _context_id but not the _context_id_start_timestamp because we are likely still
-                    # playing out audio and need the timestamp to set send context frames
-                    self._context_id = None
-                    self._timestamped_words_buffer.append(("LLMFullResponseEndFrame", 0))
-                elif msg["type"] == "timestamps":
-                    # logger.debug(f"TIMESTAMPS: {msg}")
-                    self._timestamped_words_buffer.extend(
-                        list(zip(msg["word_timestamps"]["words"], msg["word_timestamps"]["end"]))
-                    )
-                elif msg["type"] == "chunk":
-                    await self.stop_ttfb_metrics()
-                    if not self._context_id_start_timestamp:
-                        self._context_id_start_timestamp = time.time()
-                    frame = AudioRawFrame(
-                        audio=base64.b64decode(msg["data"]),
-                        sample_rate=self._output_format["sample_rate"],
-                        num_channels=1
-                    )
-                    await self.push_frame(frame)
-        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
-
-    async def _context_appending_task_handler(self):
-        try:
-            while True:
-                await asyncio.sleep(0.1)
-                if not self._context_id_start_timestamp:
-                    continue
-                elapsed_seconds = time.time() - self._context_id_start_timestamp
-                # pop all words from self._timestamped_words_buffer that are older than the
-                # elapsed time and print a message about them to the console
-                while self._timestamped_words_buffer and self._timestamped_words_buffer[0][1] <= elapsed_seconds:
-                    word, timestamp = self._timestamped_words_buffer.pop(0)
-                    if word == "LLMFullResponseEndFrame" and timestamp == 0:
-                        await self.push_frame(LLMFullResponseEndFrame())
-                        continue
-                    # print(f"Word '{word}' with timestamp {timestamp:.2f}s has been spoken.")
-                    await self.push_frame(TextFrame(word))
-        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
-
    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        logger.debug(f"Generating TTS: [{text}]")

        try:
-            if not self._websocket:
-                await self._connect()
+            await self.start_ttfb_metrics()

-            if not self._context_id:
-                await self.start_ttfb_metrics()
-                self._context_id = str(uuid.uuid4())
+            chunk_generator = await self._client.generate(
+                stream=True,
+                transcript=text,
+                voice=self._voice,
+                model_id=self._model_id,
+                output_format=self._output_format,
+            )

-            msg = {
-                "transcript": text + " ",
-                "continue": True,
-                "context_id": self._context_id,
-                "model_id": self._model_id,
-                "voice": {
-                    "mode": "id",
-                    "id": self._voice_id
-                },
-                "output_format": self._output_format,
-                "language": self._language,
-                "add_timestamps": True,
-            }
-            # logger.debug(f"SENDING MESSAGE {json.dumps(msg)}")
-            try:
-                await self._websocket.send(json.dumps(msg))
-            except Exception as e:
-                logger.exception(f"{self} error sending message: {e}")
-                await self._disconnect()
-                await self._connect()
-                return
-            yield None
+            async for chunk in chunk_generator:
+                await self.stop_ttfb_metrics()
+                yield AudioRawFrame(chunk["audio"], chunk["sampling_rate"], 1)
        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")
--- a/src/pipecat/services/deepgram.py
+++ b/src/pipecat/services/deepgram.py
@@ -5,6 +5,7 @@
 #

 import aiohttp
+import asyncio
 import time

 from typing import AsyncGenerator
@@ -20,24 +21,17 @@ from pipecat.frames.frames import (
    SystemFrame,
    TranscriptionFrame)
 from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.ai_services import AsyncAIService, TTSService
+from pipecat.services.ai_services import AIService, TTSService
+
+from deepgram import (
+    DeepgramClient,
+    DeepgramClientOptions,
+    LiveTranscriptionEvents,
+    LiveOptions,
+)

 from loguru import logger

-# See .env.example for Deepgram configuration needed
-try:
-    from deepgram import (
-        DeepgramClient,
-        DeepgramClientOptions,
-        LiveTranscriptionEvents,
-        LiveOptions,
-    )
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error(
-        "In order to use Deepgram, you need to `pip install pipecat-ai[deepgram]`. Also, set `DEEPGRAM_API_KEY` environment variable.")
-    raise Exception(f"Missing module: {e}")
-

 class DeepgramTTSService(TTSService):

@@ -59,10 +53,6 @@ class DeepgramTTSService(TTSService):
    def can_generate_metrics(self) -> bool:
        return True

-    async def set_voice(self, voice: str):
-        logger.debug(f"Switching TTS voice to: [{voice}]")
-        self._voice = voice
-
    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        logger.debug(f"Generating TTS: [{text}]")

@@ -93,12 +83,11 @@ class DeepgramTTSService(TTSService):
                    frame = AudioRawFrame(audio=data, sample_rate=16000, num_channels=1)
                    yield frame
        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")


-class DeepgramSTTService(AsyncAIService):
+class DeepgramSTTService(AIService):
    def __init__(self,
-                 *,
                 api_key: str,
                 url: str = "",
                 live_options: LiveOptions = LiveOptions(
@@ -120,6 +109,8 @@ class DeepgramSTTService(AsyncAIService):
        self._connection = self._client.listen.asynclive.v("1")
        self._connection.on(LiveTranscriptionEvents.Transcript, self._on_message)

+        self._create_push_task()
+
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

@@ -128,7 +119,7 @@ class DeepgramSTTService(AsyncAIService):
        elif isinstance(frame, AudioRawFrame):
            await self._connection.send(frame.audio)
        else:
-            await self.queue_frame(frame, direction)
+            await self._push_queue.put((frame, direction))

    async def start(self, frame: StartFrame):
        if await self._connection.start(self._live_options):
@@ -138,9 +129,27 @@ class DeepgramSTTService(AsyncAIService):

    async def stop(self, frame: EndFrame):
        await self._connection.finish()
+        await self._push_queue.put((frame, FrameDirection.DOWNSTREAM))
+        await self._push_frame_task

    async def cancel(self, frame: CancelFrame):
        await self._connection.finish()
+        self._push_frame_task.cancel()
+        await self._push_frame_task
+
+    def _create_push_task(self):
+        self._push_queue = asyncio.Queue()
+        self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
+
+    async def _push_frame_task_handler(self):
+        running = True
+        while running:
+            try:
+                (frame, direction) = await self._push_queue.get()
+                await self.push_frame(frame, direction)
+                running = not isinstance(frame, EndFrame)
+            except asyncio.CancelledError:
+                break

    async def _on_message(self, *args, **kwargs):
        result = kwargs["result"]
@@ -148,6 +157,6 @@ class DeepgramSTTService(AsyncAIService):
        transcript = result.channel.alternatives[0].transcript
        if len(transcript) > 0:
            if is_final:
-                await self.queue_frame(TranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
+                await self._push_queue.put((TranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)), FrameDirection.DOWNSTREAM))
            else:
-                await self.queue_frame(InterimTranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
+                await self._push_queue.put((InterimTranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)), FrameDirection.DOWNSTREAM))
--- a/src/pipecat/services/elevenlabs.py
+++ b/src/pipecat/services/elevenlabs.py
@@ -34,10 +34,6 @@ class ElevenLabsTTSService(TTSService):
    def can_generate_metrics(self) -> bool:
        return True

-    async def set_voice(self, voice: str):
-        logger.debug(f"Switching TTS voice to: [{voice}]")
-        self._voice_id = voice
-
    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        logger.debug(f"Generating TTS: [{text}]")

--- a/src/pipecat/services/fal.py
+++ b/src/pipecat/services/fal.py
@@ -56,7 +56,7 @@ class FalImageGenService(ImageGenService):

        response = await fal_client.run_async(
            self._model,
-            arguments={"prompt": prompt, **self._params.model_dump(exclude_none=True)}
+            arguments={"prompt": prompt, **self._params.model_dump()}
        )

        image_url = response["images"][0]["url"] if response else None
--- a/src/pipecat/services/fireworks.py
+++ b/src/pipecat/services/fireworks.py
@@ -19,7 +19,6 @@ except ModuleNotFoundError as e:

 class FireworksLLMService(BaseOpenAILLMService):
    def __init__(self,
-                 *,
                 model: str = "accounts/fireworks/models/firefunction-v1",
                 base_url: str = "https://api.fireworks.ai/inference/v1"):
        super().__init__(model, base_url)
--- a/src/pipecat/services/gladia.py
+++ b/src/pipecat/services/gladia.py
@@ -1,115 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import base64
-import json
-import time
-
-from typing import Optional
-from pydantic.main import BaseModel
-
-from pipecat.frames.frames import (
-    AudioRawFrame,
-    CancelFrame,
-    EndFrame,
-    Frame,
-    InterimTranscriptionFrame,
-    StartFrame,
-    SystemFrame,
-    TranscriptionFrame)
-from pipecat.processors.frame_processor import FrameDirection
-from pipecat.services.ai_services import AsyncAIService
-
-from loguru import logger
-
-# See .env.example for Gladia configuration needed
-try:
-    import websockets
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error(
-        "In order to use Gladia, you need to `pip install pipecat-ai[gladia]`. Also, set `GLADIA_API_KEY` environment variable.")
-    raise Exception(f"Missing module: {e}")
-
-
-class GladiaSTTService(AsyncAIService):
-    class InputParams(BaseModel):
-        sample_rate: Optional[int] = 16000
-        language: Optional[str] = "english"
-        transcription_hint: Optional[str] = None
-        endpointing: Optional[int] = 200
-        prosody: Optional[bool] = None
-
-    def __init__(self,
-                 *,
-                 api_key: str,
-                 url: str = "wss://api.gladia.io/audio/text/audio-transcription",
-                 confidence: float = 0.5,
-                 params: InputParams = InputParams(),
-                 **kwargs):
-        super().__init__(**kwargs)
-
-        self._api_key = api_key
-        self._url = url
-        self._params = params
-        self._confidence = confidence
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, SystemFrame):
-            await self.push_frame(frame, direction)
-        elif isinstance(frame, AudioRawFrame):
-            await self._send_audio(frame)
-        else:
-            await self.queue_frame(frame, direction)
-
-    async def start(self, frame: StartFrame):
-        self._websocket = await websockets.connect(self._url)
-        self._receive_task = self.get_event_loop().create_task(self._receive_task_handler())
-        await self._setup_gladia()
-
-    async def stop(self, frame: EndFrame):
-        await self._websocket.close()
-
-    async def cancel(self, frame: CancelFrame):
-        await self._websocket.close()
-
-    async def _setup_gladia(self):
-        configuration = {
-            "x_gladia_key": self._api_key,
-            "encoding": "WAV/PCM",
-            "model_type": "fast",
-            "language_behaviour": "manual",
-            **self._params.model_dump(exclude_none=True)
-        }
-
-        await self._websocket.send(json.dumps(configuration))
-
-    async def _send_audio(self, frame: AudioRawFrame):
-        message = {
-            'frames': base64.b64encode(frame.audio).decode("utf-8")
-        }
-        await self._websocket.send(json.dumps(message))
-
-    async def _receive_task_handler(self):
-        async for message in self._websocket:
-            utterance = json.loads(message)
-            if not utterance:
-                continue
-
-            if "error" in utterance:
-                message = utterance["message"]
-                logger.error(f"Gladia error: {message}")
-            elif "confidence" in utterance:
-                type = utterance["type"]
-                confidence = utterance["confidence"]
-                transcript = utterance["transcription"]
-                if confidence >= self._confidence:
-                    if type == "final":
-                        await self.queue_frame(TranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
-                    else:
-                        await self.queue_frame(InterimTranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)))
--- a/src/pipecat/services/google.py
+++ b/src/pipecat/services/google.py
@@ -10,11 +10,12 @@ from typing import List

 from pipecat.frames.frames import (
    Frame,
-    LLMModelUpdateFrame,
    TextFrame,
    VisionImageRawFrame,
    LLMMessagesFrame,
    LLMFullResponseStartFrame,
+    LLMResponseStartFrame,
+    LLMResponseEndFrame,
    LLMFullResponseEndFrame
 )
 from pipecat.processors.frame_processor import FrameDirection
@@ -41,17 +42,14 @@ class GoogleLLMService(LLMService):
    franca for all LLM services, so that it is easy to switch between different LLMs.
    """

-    def __init__(self, *, api_key: str, model: str = "gemini-1.5-flash-latest", **kwargs):
+    def __init__(self, api_key: str, model: str = "gemini-1.5-flash-latest", **kwargs):
        super().__init__(**kwargs)
        gai.configure(api_key=api_key)
-        self._create_client(model)
+        self._client = gai.GenerativeModel(model)

    def can_generate_metrics(self) -> bool:
        return True

-    def _create_client(self, model: str):
-        self._client = gai.GenerativeModel(model)
-
    def _get_messages_from_openai_context(
            self, context: OpenAILLMContext) -> List[glm.Content]:
        openai_messages = context.get_messages()
@@ -97,17 +95,19 @@ class GoogleLLMService(LLMService):
            async for chunk in self._async_generator_wrapper(response):
                try:
                    text = chunk.text
+                    await self.push_frame(LLMResponseStartFrame())
                    await self.push_frame(TextFrame(text))
+                    await self.push_frame(LLMResponseEndFrame())
                except Exception as e:
                    # Google LLMs seem to flag safety issues a lot!
                    if chunk.candidates[0].finish_reason == 3:
                        logger.debug(
                            f"LLM refused to generate content for safety reasons - {messages}.")
                    else:
-                        logger.exception(f"{self} error: {e}")
+                        logger.error(f"{self} error: {e}")

        except Exception as e:
-            logger.exception(f"{self} exception: {e}")
+            logger.error(f"{self} exception: {e}")
        finally:
            await self.push_frame(LLMFullResponseEndFrame())

@@ -122,9 +122,6 @@ class GoogleLLMService(LLMService):
            context = OpenAILLMContext.from_messages(frame.messages)
        elif isinstance(frame, VisionImageRawFrame):
            context = OpenAILLMContext.from_image_frame(frame)
-        elif isinstance(frame, LLMModelUpdateFrame):
-            logger.debug(f"Switching LLM model to: [{frame.model}]")
-            self._create_client(frame.model)
        else:
            await self.push_frame(frame, direction)

--- a/src/pipecat/services/moondream.py
+++ b/src/pipecat/services/moondream.py
@@ -46,7 +46,6 @@ def detect_device():
 class MoondreamService(VisionService):
    def __init__(
        self,
-            *,
        model="vikhyatk/moondream2",
        revision="2024-04-02",
        use_cpu=False
--- a/src/pipecat/services/ollama.py
+++ b/src/pipecat/services/ollama.py
@@ -9,5 +9,5 @@ from pipecat.services.openai import BaseOpenAILLMService

 class OLLamaLLMService(BaseOpenAILLMService):

-    def __init__(self, *, model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
+    def __init__(self, model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
        super().__init__(model=model, base_url=base_url, api_key="ollama")
--- a/src/pipecat/services/openai.py
+++ b/src/pipecat/services/openai.py
@@ -8,9 +8,8 @@ import aiohttp
 import base64
 import io
 import json
-import httpx

-from typing import AsyncGenerator, List, Literal
+from typing import Any, AsyncGenerator, List, Literal

 from loguru import logger
 from PIL import Image
@@ -22,7 +21,8 @@ from pipecat.frames.frames import (
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
    LLMMessagesFrame,
-    LLMModelUpdateFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
    TextFrame,
    URLImageRawFrame,
    VisionImageRawFrame
@@ -39,7 +39,7 @@ from pipecat.services.ai_services import (
 )

 try:
-    from openai import AsyncOpenAI, AsyncStream, DefaultAsyncHttpxClient, BadRequestError
+    from openai import AsyncOpenAI, AsyncStream, BadRequestError
    from openai.types.chat import (
        ChatCompletionChunk,
        ChatCompletionFunctionMessageParam,
@@ -53,7 +53,7 @@ except ModuleNotFoundError as e:
    raise Exception(f"Missing module: {e}")


-class OpenAIUnhandledFunctionException(Exception):
+class OpenAIUnhandledFunctionException(BaseException):
    pass


@@ -67,20 +67,13 @@ class BaseOpenAILLMService(LLMService):
    calls from the LLM.
    """

-    def __init__(self, *, model: str, api_key=None, base_url=None, **kwargs):
+    def __init__(self, model: str, api_key=None, base_url=None, **kwargs):
        super().__init__(**kwargs)
        self._model: str = model
        self._client = self.create_client(api_key=api_key, base_url=base_url, **kwargs)

    def create_client(self, api_key=None, base_url=None, **kwargs):
-        return AsyncOpenAI(
-            api_key=api_key,
-            base_url=base_url,
-            http_client=DefaultAsyncHttpxClient(
-                limits=httpx.Limits(
-                    max_keepalive_connections=100,
-                    max_connections=1000,
-                    keepalive_expiry=None)))
+        return AsyncOpenAI(api_key=api_key, base_url=base_url)

    def can_generate_metrics(self) -> bool:
        return True
@@ -116,7 +109,10 @@ class BaseOpenAILLMService(LLMService):
                del message["data"]
                del message["mime_type"]

-        chunks = await self.get_chat_completions(context, messages)
+        try:
+            chunks = await self.get_chat_completions(context, messages)
+        except Exception as e:
+            logger.error(f"{self} exception: {e}")

        return chunks

@@ -158,7 +154,9 @@ class BaseOpenAILLMService(LLMService):
                    # Keep iterating through the response to collect all the argument fragments
                    arguments += tool_call.function.arguments
            elif chunk.choices[0].delta.content:
+                await self.push_frame(LLMResponseStartFrame())
                await self.push_frame(TextFrame(chunk.choices[0].delta.content))
+                await self.push_frame(LLMResponseEndFrame())

        # if we got a function name and arguments, check to see if it's a function with
        # a registered handler. If so, run the registered callback, save the result to
@@ -216,7 +214,7 @@ class BaseOpenAILLMService(LLMService):
        elif isinstance(result, type(None)):
            pass
        else:
-            raise TypeError(f"Unknown return type from function callback: {type(result)}")
+            raise BaseException(f"Unknown return type from function callback: {type(result)}")

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
@@ -228,24 +226,19 @@ class BaseOpenAILLMService(LLMService):
            context = OpenAILLMContext.from_messages(frame.messages)
        elif isinstance(frame, VisionImageRawFrame):
            context = OpenAILLMContext.from_image_frame(frame)
-        elif isinstance(frame, LLMModelUpdateFrame):
-            logger.debug(f"Switching LLM model to: [{frame.model}]")
-            self._model = frame.model
        else:
            await self.push_frame(frame, direction)

        if context:
            await self.push_frame(LLMFullResponseStartFrame())
-            await self.start_processing_metrics()
            await self._process_context(context)
-            await self.stop_processing_metrics()
            await self.push_frame(LLMFullResponseEndFrame())


 class OpenAILLMService(BaseOpenAILLMService):

-    def __init__(self, *, model: str = "gpt-4o", **kwargs):
-        super().__init__(model=model, **kwargs)
+    def __init__(self, model="gpt-4o", **kwargs):
+        super().__init__(model, **kwargs)


 class OpenAIImageGenService(ImageGenService):
@@ -317,10 +310,6 @@ class OpenAITTSService(TTSService):
    def can_generate_metrics(self) -> bool:
        return True

-    async def set_voice(self, voice: str):
-        logger.debug(f"Switching TTS voice to: [{voice}]")
-        self._voice = voice
-
    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        logger.debug(f"Generating TTS: [{text}]")

@@ -345,4 +334,4 @@ class OpenAITTSService(TTSService):
                        frame = AudioRawFrame(chunk, 24_000, 1)
                        yield frame
        except BadRequestError as e:
-            logger.exception(f"{self} error generating TTS: {e}")
+            logger.error(f"{self} error generating TTS: {e}")
--- a/src/pipecat/services/openpipe.py
+++ b/src/pipecat/services/openpipe.py
@@ -25,7 +25,6 @@ class OpenPipeLLMService(BaseOpenAILLMService):

    def __init__(
            self,
-            *,
            model: str = "gpt-4o",
            api_key: str | None = None,
            base_url: str | None = None,
@@ -34,9 +33,9 @@ class OpenPipeLLMService(BaseOpenAILLMService):
            tags: Dict[str, str] | None = None,
            **kwargs):
        super().__init__(
-            model=model,
-            api_key=api_key,
-            base_url=base_url,
+            model,
+            api_key,
+            base_url,
            openpipe_api_key=openpipe_api_key,
            openpipe_base_url=openpipe_base_url,
            **kwargs)
--- a/src/pipecat/services/playht.py
+++ b/src/pipecat/services/playht.py
@@ -80,4 +80,4 @@ class PlayHTTTSService(TTSService):
                        frame = AudioRawFrame(chunk, 16000, 1)
                        yield frame
        except Exception as e:
-            logger.exception(f"{self} error generating TTS: {e}")
+            logger.error(f"{self} error generating TTS: {e}")
--- a/src/pipecat/services/whisper.py
+++ b/src/pipecat/services/whisper.py
@@ -42,8 +42,7 @@ class WhisperSTTService(STTService):
    """Class to transcribe audio with a locally-downloaded Whisper model"""

    def __init__(self,
-                 *,
-                 model: str | Model = Model.DISTIL_MEDIUM_EN,
+                 model: Model = Model.DISTIL_MEDIUM_EN,
                 device: str = "auto",
                 compute_type: str = "default",
                 no_speech_prob: float = 0.4,
@@ -52,7 +51,7 @@ class WhisperSTTService(STTService):
        super().__init__(**kwargs)
        self._device: str = device
        self._compute_type = compute_type
-        self._model_name: str | Model = model
+        self._model_name: Model = model
        self._no_speech_prob = no_speech_prob
        self._model: WhisperModel | None = None
        self._load()
@@ -65,7 +64,7 @@ class WhisperSTTService(STTService):
        this model is being run, it will take time to download."""
        logger.debug("Loading Whisper model...")
        self._model = WhisperModel(
-            self._model_name.value if isinstance(self._model_name, Enum) else self._model_name,
+            self._model_name.value,
            device=self._device,
            compute_type=self._compute_type)
        logger.debug("Loaded Whisper model")
--- a/src/pipecat/services/xtts.py
+++ b/src/pipecat/services/xtts.py
@@ -1,116 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import aiohttp
-
-from typing import AsyncGenerator
-
-from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame
-from pipecat.services.ai_services import TTSService
-
-from loguru import logger
-
-import requests
-
-import numpy as np
-
-try:
-    import resampy
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error("In order to use XTTS, you need to `pip install pipecat-ai[xtts]`.")
-    raise Exception(f"Missing module: {e}")
-
-
-# The server below can connect to XTTS through a local running docker
-#
-# Docker command: $ docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121
-#
-# You can find more information on the official repo:
-# https://github.com/coqui-ai/xtts-streaming-server
-
-
-class XTTSService(TTSService):
-
-    def __init__(
-            self,
-            *,
-            aiohttp_session: aiohttp.ClientSession,
-            voice_id: str,
-            language: str,
-            base_url: str,
-            **kwargs):
-        super().__init__(**kwargs)
-
-        self._voice_id = voice_id
-        self._language = language
-        self._base_url = base_url
-        self._aiohttp_session = aiohttp_session
-        self._studio_speakers = requests.get(self._base_url + "/studio_speakers").json()
-
-    def can_generate_metrics(self) -> bool:
-        return True
-
-    async def set_voice(self, voice: str):
-        logger.debug(f"Switching TTS voice to: [{voice}]")
-        self._voice_id = voice
-
-    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
-        logger.debug(f"Generating TTS: [{text}]")
-        embeddings = self._studio_speakers[self._voice_id]
-
-        url = self._base_url + "/tts_stream"
-
-        payload = {
-            "text": text.replace('.', '').replace('*', ''),
-            "language": self._language,
-            "speaker_embedding": embeddings["speaker_embedding"],
-            "gpt_cond_latent": embeddings["gpt_cond_latent"],
-            "add_wav_header": False,
-            "stream_chunk_size": 20,
-        }
-
-        await self.start_ttfb_metrics()
-
-        async with self._aiohttp_session.post(url, json=payload) as r:
-            if r.status != 200:
-                text = await r.text()
-                logger.error(f"{self} error getting audio (status: {r.status}, error: {text})")
-                yield ErrorFrame(f"Error getting audio (status: {r.status}, error: {text})")
-                return
-
-            buffer = bytearray()
-
-            async for chunk in r.content.iter_chunked(1024):
-                if len(chunk) > 0:
-                    await self.stop_ttfb_metrics()
-                    # Append new chunk to the buffer
-                    buffer.extend(chunk)
-
-                    # Check if buffer has enough data for processing
-                    while len(buffer) >= 48000:  # Assuming at least 0.5 seconds of audio data at 24000 Hz
-                        # Process the buffer up to a safe size for resampling
-                        process_data = buffer[:48000]
-                        # Remove processed data from buffer
-                        buffer = buffer[48000:]
-
-                        # Convert the byte data to numpy array for resampling
-                        audio_np = np.frombuffer(process_data, dtype=np.int16)
-                        # Resample the audio from 24000 Hz to 16000 Hz
-                        resampled_audio = resampy.resample(audio_np, 24000, 16000)
-                        # Convert the numpy array back to bytes
-                        resampled_audio_bytes = resampled_audio.astype(np.int16).tobytes()
-                        # Create the frame with the resampled audio
-                        frame = AudioRawFrame(resampled_audio_bytes, 16000, 1)
-                        yield frame
-
-            # Process any remaining data in the buffer
-            if len(buffer) > 0:
-                audio_np = np.frombuffer(buffer, dtype=np.int16)
-                resampled_audio = resampy.resample(audio_np, 24000, 16000)
-                resampled_audio_bytes = resampled_audio.astype(np.int16).tobytes()
-                frame = AudioRawFrame(resampled_audio_bytes, 16000, 1)
-                yield frame
--- a/src/pipecat/transports/base_input.py
+++ b/src/pipecat/transports/base_input.py
@@ -11,7 +11,6 @@ from concurrent.futures import ThreadPoolExecutor
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.frames.frames import (
    AudioRawFrame,
-    BotInterruptionFrame,
    CancelFrame,
    StartFrame,
    EndFrame,
@@ -56,7 +55,7 @@ class BaseInputTransport(FrameProcessor):

    async def push_audio_frame(self, frame: AudioRawFrame):
        if self._params.audio_in_enabled or self._params.vad_enabled:
-            await self._audio_in_queue.put(frame)
+            self._audio_in_queue.put_nowait(frame)

    #
    # Frame processor
@@ -79,8 +78,6 @@ class BaseInputTransport(FrameProcessor):
        elif isinstance(frame, EndFrame):
            await self._internal_push_frame(frame, direction)
            await self.stop()
-        elif isinstance(frame, BotInterruptionFrame):
-            await self._handle_interruptions(frame, False)
        else:
            await self._internal_push_frame(frame, direction)

@@ -104,7 +101,6 @@ class BaseInputTransport(FrameProcessor):
            try:
                (frame, direction) = await self._push_queue.get()
                await self.push_frame(frame, direction)
-                self._push_queue.task_done()
            except asyncio.CancelledError:
                break

@@ -112,35 +108,19 @@ class BaseInputTransport(FrameProcessor):
    # Handle interruptions
    #

-    async def _start_interruption(self):
-        # Cancel the task. This will stop pushing frames downstream.
-        self._push_frame_task.cancel()
-        await self._push_frame_task
-        # Push an out-of-band frame (i.e. not using the ordered push
-        # frame task) to stop everything, specially at the output
-        # transport.
-        await self.push_frame(StartInterruptionFrame())
-        # Create a new queue and task.
-        self._create_push_task()
-
-    async def _stop_interruption(self):
-        await self.push_frame(StopInterruptionFrame())
-
-    async def _handle_interruptions(self, frame: Frame, push_frame: bool):
+    async def _handle_interruptions(self, frame: Frame):
        if self.interruptions_allowed:
            # Make sure we notify about interruptions quickly out-of-band
-            if isinstance(frame, BotInterruptionFrame):
-                logger.debug("Bot interruption")
-                await self._start_interruption()
-            elif isinstance(frame, UserStartedSpeakingFrame):
+            if isinstance(frame, UserStartedSpeakingFrame):
                logger.debug("User started speaking")
-                await self._start_interruption()
+                self._push_frame_task.cancel()
+                await self._push_frame_task
+                self._create_push_task()
+                await self.push_frame(StartInterruptionFrame())
            elif isinstance(frame, UserStoppedSpeakingFrame):
                logger.debug("User stopped speaking")
-                await self._stop_interruption()
-
-        if push_frame:
-            await self._internal_push_frame(frame)
+                await self.push_frame(StopInterruptionFrame())
+        await self._internal_push_frame(frame)

    #
    # Audio input
@@ -164,7 +144,7 @@ class BaseInputTransport(FrameProcessor):
                frame = UserStoppedSpeakingFrame()

            if frame:
-                await self._handle_interruptions(frame, True)
+                await self._handle_interruptions(frame)

            vad_state = new_vad_state
        return vad_state
@@ -186,9 +166,7 @@ class BaseInputTransport(FrameProcessor):
                # Push audio downstream if passthrough.
                if audio_passthrough:
                    await self._internal_push_frame(frame)
-
-                self._audio_in_queue.task_done()
            except asyncio.CancelledError:
                break
-            except Exception as e:
-                logger.exception(f"{self} error reading audio frames: {e}")
+            except BaseException as e:
+                logger.error(f"{self} error reading audio frames: {e}")
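Note on the interruption handling in this file: both sides of the diff cancel the push task, await it, and create a new queue and task. The sketch below shows that pattern in isolation with plain `asyncio`; `PushWorker` and its methods are hypothetical stand-ins, not pipecat classes. It assumes, as in the diff, that the handler swallows `asyncio.CancelledError` so that awaiting the cancelled task returns normally.

```python
import asyncio


class PushWorker:
    """Hypothetical stand-in for a processor with an ordered push task."""

    def __init__(self):
        self.delivered = []
        self._create_push_task()

    def _create_push_task(self):
        # A fresh queue means items still waiting in the old one are dropped.
        self._queue = asyncio.Queue()
        self._task = asyncio.get_running_loop().create_task(self._handler())

    async def _handler(self):
        while True:
            try:
                item = await self._queue.get()
                self.delivered.append(item)
            except asyncio.CancelledError:
                # Swallow the cancellation so `await task` returns cleanly.
                break

    def push(self, item):
        self._queue.put_nowait(item)

    async def interrupt(self):
        self._task.cancel()
        await self._task
        self._create_push_task()

    async def stop(self):
        self._task.cancel()
        await self._task


async def demo():
    worker = PushWorker()
    worker.push("a")
    await asyncio.sleep(0)    # let the handler deliver "a"
    worker.push("b")          # queued, not yet delivered
    worker.push("c")
    await worker.interrupt()  # "b" and "c" are discarded with the old queue
    worker.push("d")
    await asyncio.sleep(0)    # let the new handler deliver "d"
    await worker.stop()
    return worker.delivered


def run_demo():
    return asyncio.run(demo())
```

Replacing the queue rather than draining it is what makes the interruption immediate: anything queued before the interruption never reaches downstream consumers.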
--- a/src/pipecat/transports/base_output.py
+++ b/src/pipecat/transports/base_output.py
@@ -14,7 +14,6 @@ from typing import List
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.frames.frames import (
    AudioRawFrame,
-    BotSpeakingFrame,
    CancelFrame,
    MetricsFrame,
    SpriteFrame,
@@ -181,8 +180,8 @@ class BaseOutputTransport(FrameProcessor):
                self._sink_queue.task_done()
            except asyncio.CancelledError:
                break
-            except Exception as e:
-                logger.exception(f"{self} error processing sink queue: {e}")
+            except BaseException as e:
+                logger.error(f"{self} error processing sink queue: {e}")

    #
    # Push frames task
@@ -204,7 +203,6 @@ class BaseOutputTransport(FrameProcessor):
            try:
                (frame, direction) = await self._push_queue.get()
                await self.push_frame(frame, direction)
-                self._push_queue.task_done()
            except asyncio.CancelledError:
                break

@@ -252,7 +250,7 @@ class BaseOutputTransport(FrameProcessor):
            except asyncio.CancelledError:
                break
            except Exception as e:
-                logger.exception(f"{self} error writing to camera: {e}")
+                logger.error(f"{self} error writing to camera: {e}")

    #
    # Audio out
@@ -265,5 +263,4 @@ class BaseOutputTransport(FrameProcessor):
        if len(buffer) >= self._audio_chunk_size:
            await self.write_raw_audio_frames(bytes(buffer[:self._audio_chunk_size]))
            buffer = buffer[self._audio_chunk_size:]
-            await self.push_frame(BotSpeakingFrame(), FrameDirection.UPSTREAM)
        return buffer
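Note on the `except Exception` / `except BaseException` changes in these transports: a small sketch of what each clause catches. `classify` is a hypothetical helper written for illustration. `KeyboardInterrupt`, `SystemExit` and (since Python 3.8) `asyncio.CancelledError` derive from `BaseException` but not from `Exception`, so only the broader clause intercepts them.

```python
import asyncio


def classify(exc: BaseException) -> str:
    """Report which except clause would handle the given exception."""
    try:
        raise exc
    except Exception:
        return "Exception"
    except BaseException:
        return "BaseException only"


if __name__ == "__main__":
    for exc in (ValueError("bad value"), KeyboardInterrupt(), SystemExit(1),
                asyncio.CancelledError()):
        print(type(exc).__name__, "->", classify(exc))
```

In the hunks above, `asyncio.CancelledError` is handled by its own clause first, so the practical difference on the `BaseException` side is that interpreter-level signals such as `KeyboardInterrupt` would be logged and swallowed instead of propagating.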
--- a/src/pipecat/transports/base_transport.py
+++ b/src/pipecat/transports/base_transport.py
@@ -82,4 +82,5 @@ class BaseTransport(ABC):
                else:
                    handler(self, *args, **kwargs)
        except Exception as e:
-            logger.exception(f"Exception in event handler {event_name}: {e}")
+            logger.error(f"Exception in event handler {event_name}: {e}")
+            raise e
--- a/src/pipecat/transports/network/fastapi_websocket.py
+++ b/src/pipecat/transports/network/fastapi_websocket.py
@@ -12,6 +12,7 @@ import wave
 from typing import Awaitable, Callable
 from pydantic.main import BaseModel

+from pipecat.serializers.twilio import TwilioFrameSerializer
 from pipecat.frames.frames import AudioRawFrame, StartFrame
 from pipecat.processors.frame_processor import FrameProcessor
 from pipecat.serializers.base_serializer import FrameSerializer
@@ -34,7 +35,7 @@ except ModuleNotFoundError as e:
 class FastAPIWebsocketParams(TransportParams):
    add_wav_header: bool = False
    audio_frame_size: int = 6400  # 200ms
-    serializer: FrameSerializer
+    serializer: FrameSerializer = TwilioFrameSerializer()


 class FastAPIWebsocketCallbacks(BaseModel):
@@ -113,7 +114,7 @@ class FastAPIWebsocketOutputTransport(BaseOutputTransport):
                frame = wav_frame

            payload = self._params.serializer.serialize(frame)
-            if payload and self._websocket.client_state == WebSocketState.CONNECTED:
+            if payload:
                await self._websocket.send_text(payload)

            self._audio_buffer = self._audio_buffer[self._params.audio_frame_size:]
@@ -124,7 +125,7 @@ class FastAPIWebsocketTransport(BaseTransport):
    def __init__(
            self,
            websocket: WebSocket,
-            params: FastAPIWebsocketParams,
+            params: FastAPIWebsocketParams = FastAPIWebsocketParams(),
            input_name: str | None = None,
            output_name: str | None = None,
            loop: asyncio.AbstractEventLoop | None = None):
--- a/src/pipecat/transports/network/websocket_server.py
+++ b/src/pipecat/transports/network/websocket_server.py
@@ -124,9 +124,6 @@ class WebsocketServerOutputTransport(BaseOutputTransport):
        self._websocket = websocket

    async def write_raw_audio_frames(self, frames: bytes):
-        if not self._websocket:
-            return
-
        self._audio_buffer += frames
        while len(self._audio_buffer) >= self._params.audio_frame_size:
            frame = AudioRawFrame(
@@ -151,8 +148,8 @@ class WebsocketServerOutputTransport(BaseOutputTransport):
                frame = wav_frame

            proto = self._params.serializer.serialize(frame)
-            if proto:
-                await self._websocket.send(proto)
+
+            await self._websocket.send(proto)

            self._audio_buffer = self._audio_buffer[self._params.audio_frame_size:]

--- a/src/pipecat/transports/services/daily.py
+++ b/src/pipecat/transports/services/daily.py
@@ -9,7 +9,7 @@ import asyncio
 import time

 from dataclasses import dataclass
-from typing import Any, Awaitable, Callable, Mapping, Optional
+from typing import Any, Awaitable, Callable, Mapping
 from concurrent.futures import ThreadPoolExecutor

 from daily import (
@@ -59,8 +59,8 @@ class DailyTransportMessageFrame(TransportMessageFrame):

 class WebRTCVADAnalyzer(VADAnalyzer):

-    def __init__(self, *, sample_rate=16000, num_channels=1, params: VADParams = VADParams()):
-        super().__init__(sample_rate=sample_rate, num_channels=num_channels, params=params)
+    def __init__(self, sample_rate=16000, num_channels=1, params: VADParams = VADParams()):
+        super().__init__(sample_rate, num_channels, params)

        self._webrtc_vad = Daily.create_native_vad(
            reset_period_ms=VAD_RESET_PERIOD_MS,
@@ -101,7 +101,7 @@ class DailyTranscriptionSettings(BaseModel):
 class DailyParams(TransportParams):
    api_url: str = "https://api.daily.co/v1"
    api_key: str = ""
-    dialin_settings: Optional[DailyDialinSettings] = None
+    dialin_settings: DailyDialinSettings | None = None
    transcription_enabled: bool = False
    transcription_settings: DailyTranscriptionSettings = DailyTranscriptionSettings()

@@ -198,36 +198,30 @@ class DailyTransportClient(EventHandler):
    def set_callbacks(self, callbacks: DailyCallbacks):
        self._callbacks = callbacks

-    async def send_message(self, frame: TransportMessageFrame):
-        if not self._client:
-            return
-
-        participant_id = None
-        if isinstance(frame, DailyTransportMessageFrame):
-            participant_id = frame.participant_id
-
+    async def send_message(self, frame: DailyTransportMessageFrame):
        future = self._loop.create_future()
        self._client.send_app_message(
            frame.message,
-            participant_id,
+            frame.participant_id,
            completion=completion_callback(future))
        await future

    async def read_next_audio_frame(self) -> AudioRawFrame | None:
        sample_rate = self._params.audio_in_sample_rate
        num_channels = self._params.audio_in_channels
-        num_frames = int(sample_rate / 100) * 2  # 20ms of audio

-        future = self._loop.create_future()
-        self._speaker.read_frames(num_frames, completion=completion_callback(future))
-        audio = await future
+        if self._other_participant_has_joined:
+            num_frames = int(sample_rate / 100) * 2  # 20ms of audio
+
+            future = self._loop.create_future()
+            self._speaker.read_frames(num_frames, completion=completion_callback(future))
+            audio = await future

-        if len(audio) > 0:
            return AudioRawFrame(audio=audio, sample_rate=sample_rate, num_channels=num_channels)
        else:
-            # If we don't read any audio it could be there's no participant
-            # connected. daily-python will return immediately if that's the
-            # case, so let's sleep for a little bit (i.e. busy wait).
+            # If no one has ever joined the meeting `read_frames()` would block,
+            # instead we just wait a bit. daily-python should probably return
+            # silence instead.
            await asyncio.sleep(0.01)
            return None

@@ -272,7 +266,7 @@ class DailyTransportClient(EventHandler):
                    logger.info(
                        f"Enabling transcription with settings {self._params.transcription_settings}")
                    self._client.start_transcription(
-                        self._params.transcription_settings.model_dump(exclude_none=True))
+                        self._params.transcription_settings.model_dump())

                await self._callbacks.on_joined(data["participants"]["local"])
            else:
@@ -659,15 +653,15 @@ class DailyOutputTransport(BaseOutputTransport):
        await super().cleanup()
        await self._client.cleanup()

-    async def send_message(self, frame: TransportMessageFrame):
+    async def send_message(self, frame: DailyTransportMessageFrame):
        await self._client.send_message(frame)

    async def send_metrics(self, frame: MetricsFrame):
+        ttfb = [{"name": n, "time": t} for n, t in frame.ttfb.items()]
        message = DailyTransportMessageFrame(message={
            "type": "pipecat-metrics",
            "metrics": {
-                "ttfb": frame.ttfb or [],
-                "processing": frame.processing or [],
+                "ttfb": ttfb
            },
        })
        await self._client.send_message(message)
@@ -842,8 +836,8 @@ class DailyTransport(BaseTransport):
                    logger.debug("Event dialin-ready was handled successfully")
            except asyncio.TimeoutError:
                logger.error(f"Timeout handling dialin-ready event ({url})")
-            except Exception as e:
-                logger.exception(f"Error handling dialin-ready event ({url}): {e}")
+            except BaseException as e:
+                logger.error(f"Error handling dialin-ready event ({url}): {e}")

    async def _on_dialin_ready(self, sip_endpoint):
        if self._params.dialin_settings:
--- a/src/pipecat/utils/test_frame_processor.py
+++ b/src/pipecat/utils/test_frame_processor.py
@@ -2,7 +2,7 @@ from typing import List
 from pipecat.processors.frame_processor import FrameProcessor


-class TestException(Exception):
+class TestException(BaseException):
    pass


--- a/src/pipecat/vad/silero.py
+++ b/src/pipecat/vad/silero.py
@@ -33,23 +33,14 @@ _MODEL_RESET_STATES_TIME = 5.0

 class SileroVADAnalyzer(VADAnalyzer):

-    def __init__(
-            self,
-            *,
-            sample_rate: int = 16000,
-            version: str = "v5.0",
-            params: VADParams = VADParams()):
+    def __init__(self, sample_rate=16000, params: VADParams = VADParams()):
        super().__init__(sample_rate=sample_rate, num_channels=1, params=params)

-        if sample_rate != 16000 and sample_rate != 8000:
-            raise ValueError("Silero VAD sample rate needs to be 16000 or 8000")
-
        logger.debug("Loading Silero VAD model...")

-        (self._model, _) = torch.hub.load(repo_or_dir=f"snakers4/silero-vad:{version}",
-                                          model="silero_vad",
-                                          force_reload=False,
-                                          trust_repo=True)
+        (self._model, utils) = torch.hub.load(
+            repo_or_dir="snakers4/silero-vad", model="silero_vad", force_reload=False
+        )

        self._last_reset_time = 0

@@ -60,7 +51,7 @@ class SileroVADAnalyzer(VADAnalyzer):
    #

    def num_frames_required(self) -> int:
-        return 512 if self.sample_rate == 16000 else 256
+        return int(self.sample_rate / 100) * 4  # 40ms

    def voice_confidence(self, buffer) -> float:
        try:
@@ -78,9 +69,9 @@ class SileroVADAnalyzer(VADAnalyzer):
                self._last_reset_time = curr_time

            return new_confidence
-        except Exception as e:
+        except BaseException as e:
            # This comes from an empty audio array
-            logger.exception(f"Error analyzing audio with Silero VAD: {e}")
+            logger.error(f"Error analyzing audio with Silero VAD: {e}")
            return 0


@@ -88,15 +79,12 @@ class SileroVAD(FrameProcessor):

    def __init__(
            self,
-            *,
            sample_rate: int = 16000,
-            version: str = "v5.0",
            vad_params: VADParams = VADParams(),
            audio_passthrough: bool = False):
        super().__init__()

-        self._vad_analyzer = SileroVADAnalyzer(
-            sample_rate=sample_rate, version=version, params=vad_params)
+        self._vad_analyzer = SileroVADAnalyzer(sample_rate=sample_rate, params=vad_params)
        self._audio_passthrough = audio_passthrough

        self._processor_vad_state: VADState = VADState.QUIET
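Note on `num_frames_required` in this file: the two sides of the diff size the VAD window differently. The sketch below only reproduces the arithmetic from the `-` and `+` lines; the function names are hypothetical. The fixed sizes (512 at 16000 Hz, 256 otherwise) correspond to a 32 ms window at 16000 Hz and 8000 Hz, while the formula on the `+` side yields a 40 ms window at any rate.

```python
def frames_fixed(sample_rate: int) -> int:
    """Window size from the `-` side: fixed sample counts."""
    return 512 if sample_rate == 16000 else 256


def frames_40ms(sample_rate: int) -> int:
    """Window size from the `+` side: 40 ms worth of samples."""
    return int(sample_rate / 100) * 4


def window_ms(num_frames: int, sample_rate: int) -> float:
    """Duration in milliseconds of a window of `num_frames` samples."""
    return 1000 * num_frames / sample_rate


if __name__ == "__main__":
    for rate in (16000, 8000):
        fixed, timed = frames_fixed(rate), frames_40ms(rate)
        print(rate, fixed, window_ms(fixed, rate), timed, window_ms(timed, rate))
```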
--- a/src/pipecat/vad/vad_analyzer.py
+++ b/src/pipecat/vad/vad_analyzer.py
@@ -28,7 +28,7 @@ class VADParams(BaseModel):

 class VADAnalyzer:

-    def __init__(self, *, sample_rate: int, num_channels: int, params: VADParams):
+    def __init__(self, sample_rate: int, num_channels: int, params: VADParams):
        self._sample_rate = sample_rate
        self._num_channels = num_channels
        self._params = params
--- a/tests/integration/integration_openai_llm.py
+++ b/tests/integration/integration_openai_llm.py
@@ -8,6 +8,8 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.frames.frames import (
    LLMFullResponseStartFrame,
    LLMFullResponseEndFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
    TextFrame
 )
 from pipecat.utils.test_frame_processor import TestFrameProcessor
@@ -62,7 +64,7 @@ if __name__ == "__main__":
        llm.register_function("get_current_weather", get_weather_from_api)
        t = TestFrameProcessor([
            LLMFullResponseStartFrame,
-            TextFrame,
+            [LLMResponseStartFrame, TextFrame, LLMResponseEndFrame],
            LLMFullResponseEndFrame
        ])
        llm.link(t)
@@ -96,7 +98,7 @@ if __name__ == "__main__":
        llm.register_function("get_current_weather", get_weather_from_api)
        t = TestFrameProcessor([
            LLMFullResponseStartFrame,
-            TextFrame,
+            [LLMResponseStartFrame, TextFrame, LLMResponseEndFrame],
            LLMFullResponseEndFrame
        ])
        llm.link(t)
@@ -119,7 +121,7 @@ if __name__ == "__main__":
        api_key = os.getenv("OPENAI_API_KEY")
        t = TestFrameProcessor([
            LLMFullResponseStartFrame,
-            TextFrame,
+            [LLMResponseStartFrame, TextFrame, LLMResponseEndFrame],
            LLMFullResponseEndFrame
        ])
        llm = OpenAILLMService(
--- a/tests/test_ai_services.py
+++ b/tests/test_ai_services.py
@@ -2,8 +2,8 @@ import unittest

 from typing import AsyncGenerator

-from pipecat.services.ai_services import AIService, match_endofsentence
-from pipecat.frames.frames import EndFrame, Frame, TextFrame
+from pipecat.services.ai_services import AIService
+from pipecat.pipeline.frames import EndFrame, Frame, TextFrame


 class SimpleAIService(AIService):
@@ -27,22 +27,6 @@ class TestBaseAIService(unittest.IsolatedAsyncioTestCase):

        self.assertEqual(input_frames, output_frames)

-    async def test_endofsentence(self):
-        assert match_endofsentence("This is a sentence.")
-        assert match_endofsentence("This is a sentence! ")
-        assert match_endofsentence("This is a sentence?")
-        assert match_endofsentence("This is a sentence:")
-        assert not match_endofsentence("This is not a sentence")
-        assert not match_endofsentence("This is not a sentence,")
-        assert not match_endofsentence("This is not a sentence, ")
-        assert not match_endofsentence("Ok, Mr. Smith let's ")
-        assert not match_endofsentence("Dr. Walker, I presume ")
-        assert not match_endofsentence("Prof. Walker, I presume ")
-        assert not match_endofsentence("zweitens, und 3.")
-        assert not match_endofsentence("Heute ist Dienstag, der 3.")  # 3. Juli 2024
-        assert not match_endofsentence("America, or the U.")  # U.S.A.
-        assert not match_endofsentence("It still early, it's 3:00 a.")  # 3:00 a.m.
-

 if __name__ == "__main__":
    unittest.main()
Author	SHA1	Message	Date
Jon Taylor	5bd5d22270	removed space from event handler	2024-06-26 18:30:56 +01:00
Jon Taylor	6ee7932337	added pause to start and new intro prompt	2024-06-26 18:24:14 +01:00
Jon Taylor	c407445dd1	removed header comment from bot runner	2024-06-24 17:35:26 +01:00
Jon Taylor	447f37167e	added VAD stop seconds env	2024-06-24 17:34:25 +01:00
Jon Taylor	354c21500e	prompt tweaks	2024-06-24 17:28:10 +01:00
Jon Taylor	5728e25b5a	added fastbot example	2024-06-24 16:25:36 +01:00