claude sonnet 3.5 ice cream benchmark

fix typo
sonic ice cream ttft
2024-06-22 10:27:01 -04:00 · 2024-06-20 21:05:09 -04:00 · 2024-06-20 20:57:47 -04:00 · 2024-06-20 19:33:33 -04:00 · 2024-06-20 19:33:33 -04:00 · 2024-06-20 19:33:33 -04:00
32 changed files with 413 additions and 1440 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,34 +5,14 @@ All notable changes to **pipecat** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [0.0.32] - 2024-06-22
+## [Unreleased]

 ### Added

- Allow specifying a `DeepgramSTTService` url which allows using on-prem
-  Deepgram.
-
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that
-  can be integrated with FastAPI websockets.
-
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to
-  serialize and deserialize audio frames from Twilio.
-
- Added Daily transport event: `on_dialout_answered`.  See
-  https://reference-python.daily.co/api_reference.html#daily.EventHandler
-
 - Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.

-### Performance
-
- Convert `BaseOutputTransport` and `BaseOutputTransport` to fully use asyncio
-  and remove the use of threads.
-
 ### Other

- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio
-  phone numbers with a Pipecat bot.
-
 - Updated `07f-interruptible-azure.py` to use `AzureLLMService`,
  `AzureSTTService` and `AzureTTSService`.

--- a/examples/README.md
+++ b/examples/README.md
@@ -32,15 +32,14 @@ Next, follow the steps in the README for each demo.

 ## Projects:

-| Project                                      | Description                                                                                                                                | Services                                                          |
-|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
-| [Simple Chatbot](simple-chatbot)             | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework.                                       | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI            |
-| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience.                                            | Deepgram, ElevenLabs, OpenAI, Fal, Daily, Custom UI               |
-| [Translation Chatbot](translation-chatbot)   | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI                 |
-| [Moondream Chatbot](moondream-chatbot)       | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU**                                                       | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
-| [Patient intake](patient-intake)             | A chatbot that can call functions in response to user input.                                                                               | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI            |
-| [Dialin Chatbot](dialin-chatbot)             | A chatbot that connects to an incoming phone call from Daily or Twilio.                                                                    | Deepgram, ElevenLabs, OpenAI, Daily, Twilio                       |
-| [Twilio Chatbot](twilio-chatbot)             | A chatbot that connects to an incoming phone call from Twilio.                                                                             | Deepgram, ElevenLabs, OpenAI, Daily, Twilio                       |
+| Project                                      | Description                                                                                                                                | Services                                       |
+| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
+| [Simple Chatbot](simple-chatbot)             | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework.                                       | Deepgram, OpenAI, Daily, Daily Prebuilt UI            |
+| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience.                                            | Deepgram, ElevenLabs, Open AI, Fal, Daily, Custom UI  |
+| [Translation Chatbot](translation-chatbot)   | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI     |
+| [Moondream Chatbot](moondream-chatbot)       | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU**                                                       | Deepgram, OpenAI, Moondream, Daily, Daily Prebuilt UI |
+| Function-calling Chatbot (TBC)               | A chatbot that can call functions in response to user input.                                                                                | Deepgram, OpenAI, Fireworks, Daily, Daily Prebuilt UI |
+| [Dialin Chatbot](dialin-chatbot)             | A chatbot that connects to an incoming phone call from Daily or Twilio.                                                                                | Deepgram, OpenAI, ElevenLabs, Daily, Twilio |

 > [!IMPORTANT]
 > These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
--- a/examples/fast-chatbot/.gitignore
+++ b/examples/fast-chatbot/.gitignore
@@ -1,165 +0,0 @@
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-#  Usually these files are written by a python script from a template
-#  before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-#   For a library or package, you might want to ignore these files since the code is
-#   intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-#   However, in case of collaboration, if having platform-specific dependencies or dependencies
-#   having no cross-platform support, pipenv may install dependencies that don't work, or not
-#   install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-#   This is especially recommended for binary packages to ensure reproducibility, and is more
-#   commonly ignored for libraries.
-#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-#   in version control.
-#   https://pdm.fming.dev/#use-with-ide
-.pdm.toml
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-#  and can be added to the global gitignore or merged into this file.  For a more nuclear
-#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
-runpod.toml
-
-# custom script to recursively upgrade items in requirements.py
-upgrade_requirements.py
-.DS_Store
--- a/examples/fast-chatbot/README.md
+++ b/examples/fast-chatbot/README.md
--- a/examples/fast-chatbot/bot.py
+++ b/examples/fast-chatbot/bot.py
@@ -1,164 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-from loguru import logger
-import argparse
-import asyncio
-import aiohttp
-import os
-import sys
-import time
-from typing import Optional
-
-from pydantic import BaseModel, ValidationError
-
-from pipecat.vad.vad_analyzer import VADParams
-from pipecat.vad.silero import SileroVADAnalyzer
-from pipecat.transports.services.daily import DailyParams, DailyTransport
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.deepgram import DeepgramSTTService
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.frames.frames import LLMMessagesFrame, EndFrame
-
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator, LLMUserResponseAggregator
-)
-
-from helpers import (
-    ClearableDeepgramTTSService,
-    AudioVolumeTimer,
-    TranscriptionTimingLogger
-)
-
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level=os.getenv("LOG_LEVEL", "DEBUG"))
-
-
-class BotSettings(BaseModel):
-    room_url: str
-    room_token: str
-    bot_name: str = "Pipecat"
-    prompt: Optional[str] = "You are a helpful assistant."
-    deepgram_api_key: Optional[str] = os.getenv("DEEPGRAM_API_KEY", None)
-    deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE", "aura-asteria-en")
-    deepgram_tts_base_url: Optional[str] = os.getenv(
-        "DEEPGRAM_TTS_BASE_URL", "https://api.deepgram.com/v1/speak")
-    deepgram_stt_base_url: Optional[str] = os.getenv(
-        "DEEPGRAM_STT_BASE_URL", "https://api.deepgram.com/v1/speak")
-    openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY", None),
-    openai_model: Optional[str] = os.getenv("OPENAI_MODEL", None),
-    openai_base_url: Optional[str] = os.getenv("OPENAI_BASE_URL", None)
-    vad_stop_secs: Optional[float] = os.getenv("VAD_STOP_SECS", 0.200)
-
-
-async def main(settings: BotSettings):
-    async with aiohttp.ClientSession() as session:
-        transport = DailyTransport(
-            settings.room_url,
-            settings.room_token,
-            settings.bot_name,
-            DailyParams(
-                audio_out_enabled=True,
-                transcription_enabled=False,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(params=VADParams(
-                    stop_secs=settings.vad_stop_secs
-                )),
-                vad_audio_passthrough=True
-            )
-        )
-
-        stt = DeepgramSTTService(
-            name="STT",
-            api_key=settings.deepgram_api_key,
-            url=settings.deepgram_stt_base_url
-        )
-
-        tts = ClearableDeepgramTTSService(
-            name="Voice",
-            aiohttp_session=session,
-            api_key=settings.deepgram_api_key,
-            voice=settings.deepgram_voice,
-            **({'base_url': url} if (url := settings.deepgram_tts_base_url) else {})
-        )
-
-        llm = OpenAILLMService(
-            name="LLM",
-            api_key=settings.openai_api_key,
-            model=settings.openai_model,
-            base_url=settings.openai_base_url,
-        )
-
-        messages = [
-            {
-                "role": "system",
-                "content": settings.prompt,
-            },
-        ]
-
-        avt = AudioVolumeTimer()
-        tl = TranscriptionTimingLogger(avt)
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),   # Transport user input
-            avt,                 # Audio volume timer
-            stt,                 # Speech-to-text
-            tl,                  # Transcription timing logger
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # TTS
-            transport.output(),  # Transport bot output
-            tma_out,             # Assistant spoken responses
-        ])
-
-        task = PipelineTask(
-            pipeline,
-            PipelineParams(
-                allow_interruptions=True,
-                enable_metrics=True,
-                report_only_initial_ttfb=True
-            ))
-
-        # When the participant leaves, we exit the bot.
-        @transport.event_handler("on_participant_left")
-        async def on_participant_left(transport, participant, reason):
-            await task.queue_frame(EndFrame())
-
-        # When the first participant joins, the bot should introduce itself.
-        @transport.event_handler("on_first_participant_joined")
-        async def on_first_participant_joined(transport, participant):
-            # Provide some air whilst tracks subscribe
-            time.sleep(2)
-            messages.append(
-                {
-                    "role": "system",
-                    "content": "Briefly introduce yourself by saying 'hello, I'm FastBot, how can I help you today?'"})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        runner = PipelineRunner()
-        await runner.run(task)
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Pipecat Bot")
-    parser.add_argument("-s", "--settings", type=str, required=True, help="Pipecat bot settings")
-
-    args, unknown = parser.parse_known_args()
-
-    try:
-        settings = BotSettings.model_validate_json(args.settings)
-        asyncio.run(main(settings))
-    except ValidationError as e:
-        print(e)
--- a/examples/fast-chatbot/bot_runner.py
+++ b/examples/fast-chatbot/bot_runner.py
@@ -1,164 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import os
-import argparse
-import subprocess
-
-from pydantic import BaseModel, ValidationError
-from typing import Optional
-
-from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams
-
-from fastapi import FastAPI, Request, HTTPException
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import JSONResponse
-
-from bot import BotSettings
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-
-# ------------ Configuration ------------ #
-
-MAX_SESSION_TIME = 5 * 60  # 5 minutes
-REQUIRED_ENV_VARS = ['DAILY_API_URL', 'DAILY_API_KEY', 'DEEPGRAM_API_KEY']
-
-daily_rest_helper = DailyRESTHelper(
-    os.getenv("DAILY_API_KEY", ""),
-    os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))
-
-
-class RunnerSettings(BaseModel):
-    prompt: Optional[
-        str] = "You are a fast, low-latency chatbot. Your goal is to demonstrate voice-driven AI capabilities at human-like speeds. When introducing yourself briefly mention your goal is to showcase speed and conversational flow. The technology powering you is Daily for transport, Cerebrium for GPU hosting, Llama 3 (8-B version) LLM, and Deepgram for speech-to-text and text-to-speech. You are hosted on the east coast of the United States. Respond to what the user said in a creative and helpful way, but keep responses short and legible. Ensure responses contain only words. Check again that you have not included special characters other than '?' or '!'."
-    deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE")
-    openai_model: Optional[str] = os.getenv("OPENAI_MODEL", "gpt-4o")
-    openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY")
-    test: Optional[bool] = None
-
-# ----------------- API ----------------- #
-
-
-app = FastAPI()
-
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"]
-)
-
-# ----------------- Main ----------------- #
-
-
-@app.post("/start_bot")
-async def start_bot(request: Request) -> JSONResponse:
-    runner_settings = RunnerSettings()
-    try:
-        request_body = await request.body()
-        if len(request_body) > 0:
-            runner_settings = RunnerSettings.model_validate_json(request_body)
-    except ValidationError as e:
-        raise HTTPException(
-            status_code=400,
-            detail=f"Invalid request: {e}")
-    except Exception as e:
-        # If no data in request, pass
-        pass
-
-    # Is this a webhook creation request?
-    if runner_settings.test is not None:
-        return JSONResponse({"test": True})
-
-    # Use specified room URL, or create a new one if not specified
-    room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")
-
-    if not room_url:
-        params = DailyRoomParams(
-            properties=DailyRoomProperties()
-        )
-        try:
-            room: DailyRoomObject = daily_rest_helper.create_room(params=params)
-        except Exception as e:
-            raise HTTPException(
-                status_code=500,
-                detail=f"Unable to provision room {e}")
-    else:
-        # Check passed room URL exists, we should assume that it already has a sip set up
-        try:
-            room: DailyRoomObject = daily_rest_helper.get_room_from_url(room_url)
-        except Exception:
-            raise HTTPException(
-                status_code=500, detail=f"Room not found: {room_url}")
-
-    # Give the agent a token to join the session
-    token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
-
-    if not room or not token:
-        raise HTTPException(
-            status_code=500, detail=f"Failed to get token for room: {room_url}")
-
-    # Spawn a new agent, and join the user session
-    try:
-        bot_settings = BotSettings(
-            room_url=room.url,
-            room_token=token,
-            prompt=runner_settings.prompt,
-            deepgram_voice=runner_settings.deepgram_voice,
-            openai_model=runner_settings.openai_model,
-            openai_api_key=runner_settings.openai_api_key,
-        )
-        bot_settings_str = bot_settings.model_dump_json(exclude_none=True)
-
-        subprocess.Popen(
-            [f"python3 -m bot -s '{bot_settings_str}'"],
-            shell=True,
-            bufsize=1,
-            cwd=os.path.dirname(os.path.abspath(__file__)))
-    except Exception as e:
-        raise HTTPException(
-            status_code=500, detail=f"Failed to start subprocess: {e}")
-
-    # Grab a token for the user to join with
-    user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
-
-    return JSONResponse({
-        "room_url": room.url,
-        "token": user_token,
-    })
-
-
-if __name__ == "__main__":
-    # Check environment variables
-    for env_var in REQUIRED_ENV_VARS:
-        if env_var not in os.environ:
-            raise Exception(f"Missing environment variable: {env_var}.")
-
-    parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
-    parser.add_argument("--host", type=str,
-                        default=os.getenv("HOST", "0.0.0.0"), help="Host address")
-    parser.add_argument("--port", type=int,
-                        default=os.getenv("PORT", 7860), help="Port number")
-    parser.add_argument("--reload", action="store_true",
-                        default=True, help="Reload code on change")
-
-    config = parser.parse_args()
-
-    try:
-        import uvicorn
-
-        uvicorn.run(
-            "bot_runner:app",
-            host=config.host,
-            port=config.port,
-            reload=config.reload
-        )
-
-    except KeyboardInterrupt:
-        print("Pipecat runner shutting down...")
--- a/examples/fast-chatbot/env.example
+++ b/examples/fast-chatbot/env.example
@@ -1,12 +0,0 @@
-DAILY_SAMPLE_ROOM_URL= #optional: use the same room each time, or create a new one if unset
-DAILY_API_KEY=
-DAILY_API_URL=
-
-DEEPGRAM_API_KEY=
-DEEPGRAM_VOICE=
-DEEPGRAM_STT_URL=
-DEEPGRAM_TTS_BASE_URL=
-
-OPENAI_API_KEY=
-OPENAI_MODEL=
-OPENAI_BASE_URL=
--- a/examples/fast-chatbot/requirements.txt
+++ b/examples/fast-chatbot/requirements.txt
@@ -1,6 +0,0 @@
-pipecat-ai[daily,openai,silero,deepgram]
-fastapi
-uvicorn
-requests
-python-dotenv
-loguru
--- a/examples/foundational/tmp-khk-sonnet-ttft.py
+++ b/examples/foundational/tmp-khk-sonnet-ttft.py
@@ -0,0 +1,114 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.anthropic import AnthropicLLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer()
+            )
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_name=sys.argv[1] if len(sys.argv) > 1 else "British Lady"
+        )
+
+        llm = AnthropicLLMService(
+            api_key=os.getenv("ANTHROPIC_API_KEY"),
+            model="claude-3-5-sonnet-20240620",
+            temperature=1.0
+        )
+
+        # todo: think more about how to handle system prompts in a more general way. OpenAI,
+        # Google, and Anthropic all have slightly different approaches to providing a system
+        # prompt.
+        messages = [
+            {
+                "role": "system",
+                "content": (
+                    "You are participating in a friendly competition to invent creative "
+                    "new ice cream flavors. Say the craziest flavor you can think of "
+                    "then wait for your opponent to come up with a different crazy flavor. "
+                    "then respond with another flavor idea. Repeat forever. Say only the "
+                    "ice cream flavors and nothing else. End each ice cream flavor statement "
+                    "with an exclamation mark! Go ..."
+                )
+            },
+        ]
+
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            tma_in,              # User responses
+            llm,                 # LLM
+            tts,                 # TTS
+            transport.output(),  # Transport bot output
+            tma_out,             # Assistant spoken responses
+        ])
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
+
+        # When a participant joins, start transcription for that participant so the
+        # bot can "hear" and respond to them.
+        @ transport.event_handler("on_participant_joined")
+        async def on_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+
+        # When the first participant joins, the bot should introduce itself.
+        @ transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
+
+# '{"action":"app-message","data":{"metrics":{"ttfb":[{"name":"AnthropicLLMService#0","time":0.5975627899169922}]},"type":"pipecat-metrics"},"fromId":"592d3489-90ba-401d-a760-c1a863d64a4a","callFrameId":"17189290998160.035120590426112264"}'
+# [Durian and Limburger Cheese Charcoal Activated Tar Twist!]
+# [Fermented Fish Sauce and Ghost Pepper Bubblegum Cotton Candy Nightmare!]
+# [Spoiled Yogurt and Ghost Pepper Gummy Bear Blizzard!]
+# [Matcha Green Tea and Sour Gummy Worm Fusion!]
--- a/examples/foundational/tmp-khk.py
+++ b/examples/foundational/tmp-khk.py
@@ -1,28 +1,50 @@
-from loguru import logger
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
 import asyncio
-import math
-import struct
-import time
+import aiohttp
+import os
+import sys
+import json
 from dataclasses import dataclass, field
 from typing import List

-
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.frames.frames import (
    Frame,
-    AudioRawFrame,
-    InterimTranscriptionFrame,
-    TranscriptionFrame,
    TextFrame,
+    LLMMessagesFrame,
+    TranscriptionFrame,
+    InterimTranscriptionFrame,
+    AudioRawFrame,
    StartInterruptionFrame,
+    StopInterruptionFrame,
    LLMFullResponseStartFrame,
-    TTSStoppedFrame,
-    MetricsFrame
+    TTSStoppedFrame
 )
-
-from pipecat.vad.vad_analyzer import VADAnalyzer, VADState
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.processors.logger import FrameLogger
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.services.deepgram import DeepgramTTSService
-from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame
+from pipecat.services.openai import OpenAILLMService, OpenAILLMContext, OpenAILLMContextFrame
+from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyTransportMessageFrame
+from pipecat.vad.silero import SileroVADAnalyzer
+from pipecat.vad.vad_analyzer import VADAnalyzer, VADParams, VADState
+
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")


 class GreedyLLMAggregator(FrameProcessor):
@@ -118,9 +140,6 @@ class VADGate(FrameProcessor):
                    logger.debug(f"expected a text frame, but received {frame}")
                    await self.push_frame(frame, direction)
                return
-            else:
-                if isinstance(frame, TextFrame):
-                    logger.error(f"XXXXXXXXXXXXXXXXXXX received a text frame, wasn't expecting it.")

            if isinstance(frame, AudioRawFrame):
                # if our buffer is empty or has a "finished" sentence at the end,
@@ -192,8 +211,11 @@ class VADGate(FrameProcessor):
                await asyncio.sleep(duration - 20 / 1000)
                if self.context:
                    logger.debug(f"Appending assistant message to context: [{s.text_frame.text}]")
-                    self.context.messages.append(
-                        {"role": "assistant", "content": s.text_frame.text}
+                    if self.context.messages and self.context.messages[-1]["role"] == "assistant":
+                        self.context.messages[-1]["content"] += " " + s.text_frame.text
+                    else:
+                        self.context.messages.append(
+                            {"role": "assistant", "content": s.text_frame.text}
                    )
                await self.push_frame(s.text_frame)

@@ -201,67 +223,102 @@ class VADGate(FrameProcessor):
            logger.debug(f"Exception {e}")


-class TranscriptionTimingLogger(FrameProcessor):
-    def __init__(self, avt):
-        super().__init__()
-        self.name = "Transcription"
-        self._avt = avt
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5))
+            )
+        )

-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        try:
-            await super().process_frame(frame, direction)
-            if isinstance(frame, TranscriptionFrame):
-                elapsed = time.time() - self._avt.last_transition_ts
-                logger.debug(f"Transcription TTF: {elapsed}")
-                await self.push_frame(MetricsFrame(ttfb={self.name: elapsed}))
+        tts = ClearableDeepgramTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("DEEPGRAM_API_KEY"),
+            voice="aura-asteria-en",
+            # base_url="http://0.0.0.0:8080/v1/speak"
+        )

-            await self.push_frame(frame, direction)
-        except Exception as e:
-            logger.debug(f"Exception {e}")
+        llm = OpenAILLMService(
+            # To use OpenAI
+            api_key=os.getenv("OPENAI_API_KEY"),
+            model="gpt-4o"
+            # Or, to use a local vLLM (or similar) api server
+            # model="meta-llama/Meta-Llama-3-8B-Instruct",
+            # model="neuralmagic/Meta-Llama-3-70B-Instruct-FP8",
+            # base_url="http://0.0.0.0:8000/v1"
+        )
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM communicating via audio. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        ctx = OpenAILLMContext()
+        greedy = GreedyLLMAggregator(name="greedy", context=ctx)
+        gate = VADGate(name="gate", vad_analyzer=transport.input().vad_analyzer(), context=ctx)
+
+        pipeline = Pipeline([
+            transport.input(),   # Transport user input
+            greedy,
+            llm,                 # LLM
+            tts,                 # TTS
+            gate,
+            transport.output(),  # Transport bot output
+            # FrameLogger()
+        ])
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
+
+        # When a participant joins, start transcription for that participant so the
+        # bot can "hear" and respond to them.
+        @ transport.event_handler("on_participant_joined")
+        async def on_participant_joined(transport, participant):
+            transport.capture_participant_transcription(participant["id"])
+
+        # When the first participant joins, the bot should introduce itself.
+        @ transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            messages.append(
+                {"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        # Handle "latency-ping" messages. The client will send app messages that look like
+        # this:
+        #   { "latency-ping": { ts: <client-side timestamp> }}
+        #
+        # We want to send an immediate pong back to the client from this handler function.
+        # Also, we will push a frame into the top of the pipeline and send it after the
+        #
+        @ transport.event_handler("on_app_message")
+        async def on_app_message(transport, message, sender):
+            try:
+                if "latency-ping" in message:
+                    logger.debug(f"Received latency ping app message: {message}")
+                    ts = message["latency-ping"]["ts"]
+                    # Send immediately
+                    transport.output().send_message(DailyTransportMessageFrame(
+                        message={"latency-pong-msg-handler": {"ts": ts}},
+                        participant_id=sender))
+                    # And push to the pipeline for the Daily transport.output to send
+                    await tma_in.push_frame(
+                        DailyTransportMessageFrame(
+                            message={"latency-pong-pipeline-delivery": {"ts": ts}},
+                            participant_id=sender))
+            except Exception as e:
+                logger.debug(f"message handling error: {e} - {message}")
+
+        runner = PipelineRunner()
+        await runner.run(task)


-class AudioVolumeTimer(FrameProcessor):
-    def __init__(self):
-        super().__init__()
-        self.last_transition_ts = 0
-        self._prev_volume = -80
-        self._speech_volume_threshold = -50
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, AudioRawFrame):
-            volume = self.calculate_volume(frame)
-            # print(f"Audio volume: {volume:.2f} dB")
-            if (volume >= self._speech_volume_threshold and
-                    self._prev_volume < self._speech_volume_threshold):
-                # logger.debug("transition above speech volume threshold")
-                self.last_transition_ts = time.time()
-            elif (volume < self._speech_volume_threshold and
-                    self._prev_volume >= self._speech_volume_threshold):
-                # logger.debug("transition below non-speech volume threshold")
-                self.last_transition_ts = time.time()
-            self._prev_volume = volume
-
-        await self.push_frame(frame, direction)
-
-    def calculate_volume(self, frame: AudioRawFrame) -> float:
-        if frame.num_channels != 1:
-            raise ValueError(f"Expected 1 channel, got {frame.num_channels}")
-
-        # Unpack audio data into 16-bit integers
-        fmt = f"{len(frame.audio) // 2}h"
-        audio_samples = struct.unpack(fmt, frame.audio)
-
-        # Calculate RMS
-        sum_squares = sum(sample**2 for sample in audio_samples)
-        rms = math.sqrt(sum_squares / len(audio_samples))
-
-        # Convert RMS to decibels (dB)
-        # Reference: maximum value for 16-bit audio is 32767
-        if rms > 0:
-            db = 20 * math.log10(rms / 32767)
-        else:
-            db = -96  # Minimum value (almost silent)
-
-        return db
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/examples/twilio-chatbot/.gitignore
+++ b/examples/twilio-chatbot/.gitignore
@@ -1,161 +0,0 @@
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-#  Usually these files are written by a python script from a template
-#  before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-#   For a library or package, you might want to ignore these files since the code is
-#   intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-#   However, in case of collaboration, if having platform-specific dependencies or dependencies
-#   having no cross-platform support, pipenv may install dependencies that don't work, or not
-#   install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-#   This is especially recommended for binary packages to ensure reproducibility, and is more
-#   commonly ignored for libraries.
-#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-#   in version control.
-#   https://pdm.fming.dev/#use-with-ide
-.pdm.toml
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-#  and can be added to the global gitignore or merged into this file.  For a more nuclear
-#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
-runpod.toml
--- a/examples/twilio-chatbot/Dockerfile
+++ b/examples/twilio-chatbot/Dockerfile
@@ -1,20 +0,0 @@
-# Use an official Python runtime as a parent image
-FROM python:3.10-bullseye
-
-# Set the working directory in the container
-WORKDIR /twilio-chatbot
-
-# Copy the requirements file into the container
-COPY requirements.txt .
-
-# Install any needed packages specified in requirements.txt
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy the current directory contents into the container
-COPY . .
-
-# Expose the desired port
-EXPOSE 8765
-
-# Run the application
-CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8765"]
--- a/examples/twilio-chatbot/README.md
+++ b/examples/twilio-chatbot/README.md
@@ -1,82 +0,0 @@
-# Twilio Chatbot
-
-This project is a FastAPI-based chatbot that integrates with Twilio to handle WebSocket connections and provide real-time communication. The project includes endpoints for starting a call and handling WebSocket connections.
-
-## Table of Contents
-
- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Configure Twilio URLs](#configure-twilio-urls)
- [Running the Application](#running-the-application)
- [Usage](#usage)
-
-## Features
-
- **FastAPI**: A modern, fast (high-performance), web framework for building APIs with Python 3.6+.
- **WebSocket Support**: Real-time communication using WebSockets.
- **CORS Middleware**: Allowing cross-origin requests for testing.
- **Dockerized**: Easily deployable using Docker.
-
-## Requirements
-
- Python 3.10
- Docker (for containerized deployment)
- ngrok (for tunneling)
- Twilio Account
-
-## Installation
-
-1. **Set up a virtual environment** (optional but recommended):
-    ```sh
-    python -m venv venv
-    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
-    ```
-
-2. **Install dependencies**:
-    ```sh
-    pip install -r requirements.txt
-    ```
-
-3. **Create .env**:
-    create .env based on env.example
-
-4. **Install ngrok**:
-    Follow the instructions on the [ngrok website](https://ngrok.com/download) to download and install ngrok.
-
-## Configure Twilio URLs
-
-1. **Update the Twilio Webhook**:
-    Copy the ngrok URL and update your Twilio phone number webhook URL to `http://<ngrok_url>/start_call`.
-
-2. **Update the streams.xml**:
-    Copy the ngrok URL and update templates/streams.xml with `wss://<ngrok_url>/ws`.
-
-## Running the Application
-
-### Using Python
-
-1. **Run the FastAPI application**:
-    ```sh
-    python server.py
-    ```
-
-2. **Start ngrok**:
-    In a new terminal, start ngrok to tunnel the local server:
-    ```sh
-    ngrok http 8765
-    ```
-### Using Docker
-
-1. **Build the Docker image**:
-    ```sh
-    docker build -t twilio-chatbot .
-    ```
-
-2. **Run the Docker container**:
-    ```sh
-    docker run -it --rm -p 8765:8765 twilio-chatbot
-    ```
-## Usage
-
-To start a call, simply make a call to your Twilio phone number. The webhook URL will direct the call to your FastAPI application, which will handle it accordingly.
--- a/examples/twilio-chatbot/bot.py
+++ b/examples/twilio-chatbot/bot.py
@@ -1,88 +0,0 @@
-import aiohttp
-import os
-import sys
-
-from pipecat.frames.frames import EndFrame, LLMMessagesFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_response import (
-    LLMAssistantResponseAggregator,
-    LLMUserResponseAggregator
-)
-from pipecat.services.openai import OpenAILLMService
-from pipecat.services.deepgram import DeepgramSTTService
-from pipecat.services.elevenlabs import ElevenLabsTTSService
-from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams
-from pipecat.vad.silero import SileroVADAnalyzer
-
-from loguru import logger
-
-from dotenv import load_dotenv
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def run_bot(websocket_client):
-    async with aiohttp.ClientSession() as session:
-        transport = FastAPIWebsocketTransport(
-            websocket=websocket_client,
-            params=FastAPIWebsocketParams(
-                audio_out_enabled=True,
-                add_wav_header=False,
-                vad_enabled=True,
-                vad_analyzer=SileroVADAnalyzer(),
-                vad_audio_passthrough=True
-            )
-        )
-
-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o")
-
-        stt = DeepgramSTTService(api_key=os.getenv('DEEPGRAM_API_KEY'))
-
-        tts = ElevenLabsTTSService(
-            aiohttp_session=session,
-            api_key=os.getenv("ELEVENLABS_API_KEY"),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
-        )
-
-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in an audio call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        tma_in = LLMUserResponseAggregator(messages)
-        tma_out = LLMAssistantResponseAggregator(messages)
-
-        pipeline = Pipeline([
-            transport.input(),   # Websocket input from client
-            stt,                 # Speech-To-Text
-            tma_in,              # User responses
-            llm,                 # LLM
-            tts,                 # Text-To-Speech
-            transport.output(),  # Websocket output to client
-            tma_out              # LLM responses
-        ])
-
-        task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
-
-        @transport.event_handler("on_client_connected")
-        async def on_client_connected(transport, client):
-            # Kick off the conversation.
-            messages.append(
-                {"role": "system", "content": "Please introduce yourself to the user."})
-            await task.queue_frames([LLMMessagesFrame(messages)])
-
-        @transport.event_handler("on_client_disconnected")
-        async def on_client_disconnected(transport, client):
-            await task.queue_frames([EndFrame()])
-
-        runner = PipelineRunner(handle_sigint=False)
-
-        await runner.run(task)
--- a/examples/twilio-chatbot/env.example
+++ b/examples/twilio-chatbot/env.example
@@ -1,4 +0,0 @@
-OPENAI_API_KEY=
-DEEPGRAM_API_KEY=
-ELEVENLABS_API_KEY=
-ELEVENLABS_VOICE_ID=
--- a/examples/twilio-chatbot/requirements.txt
+++ b/examples/twilio-chatbot/requirements.txt
@@ -1,5 +0,0 @@
-pipecat-ai[daily,openai,silero,deepgram]
-fastapi
-uvicorn
-python-dotenv
-loguru
--- a/examples/twilio-chatbot/server.py
+++ b/examples/twilio-chatbot/server.py
@@ -1,34 +0,0 @@
-import uvicorn
-
-from fastapi import FastAPI, WebSocket
-from fastapi.middleware.cors import CORSMiddleware
-from starlette.responses import HTMLResponse
-
-from bot import run_bot
-
-app = FastAPI()
-
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],  # Allow all origins for testing
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
-
-
-@app.post('/start_call')
-async def start_call():
-    print("POST TwiML")
-    return HTMLResponse(content=open("templates/streams.xml").read(), media_type="application/xml")
-
-
-@app.websocket("/ws")
-async def websocket_endpoint(websocket: WebSocket):
-    await websocket.accept()
-    print("WebSocket connection accepted")
-    await run_bot(websocket)
-
-
-if __name__ == "__main__":
-    uvicorn.run(app, host="0.0.0.0", port=8765)
--- a/examples/twilio-chatbot/templates/streams.xml
+++ b/examples/twilio-chatbot/templates/streams.xml
@@ -1,7 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<Response>
-  <Connect>
-    <Stream url="wss://<your server url>/ws"></Stream>
-  </Connect>
-  <Pause length="40"/>
-</Response>
--- a/linux-py3.10-requirements.txt
+++ b/linux-py3.10-requirements.txt
@@ -26,8 +26,6 @@ anyio==4.4.0
    #   anthropic
    #   httpx
    #   openai
-    #   starlette
-    #   watchfiles
 async-timeout==4.0.3
    # via
    #   aiohttp
@@ -56,15 +54,12 @@ cffi==1.16.0
 charset-normalizer==3.3.2
    # via requests
 click==8.1.7
-    # via
-    #   flask
-    #   typer
-    #   uvicorn
+    # via flask
 coloredlogs==15.0.1
    # via onnxruntime
 ctranslate2==4.3.1
    # via faster-whisper
-daily-python==0.10.0
+daily-python==0.9.1
    # via pipecat-ai (pyproject.toml)
 dataclasses-json==0.6.7
    # via
@@ -76,25 +71,17 @@ distro==1.9.0
    # via
    #   anthropic
    #   openai
-dnspython==2.6.1
-    # via email-validator
 einops==0.8.0
    # via pipecat-ai (pyproject.toml)
-email-validator==2.2.0
-    # via fastapi
 exceptiongroup==1.2.1
    # via
    #   anyio
    #   pytest
 fal-client==0.4.0
    # via pipecat-ai (pyproject.toml)
-fastapi==0.111.0
-    # via pipecat-ai (pyproject.toml)
-fastapi-cli==0.0.4
-    # via fastapi
 faster-whisper==1.0.2
    # via pipecat-ai (pyproject.toml)
-filelock==3.15.3
+filelock==3.15.1
    # via
    #   huggingface-hub
    #   pyht
@@ -126,7 +113,7 @@ google-api-core[grpc]==2.19.0
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.134.0
+google-api-python-client==2.133.0
    # via google-generativeai
 google-auth==2.30.0
    # via
@@ -153,29 +140,24 @@ grpcio==1.64.1
 grpcio-status==1.62.2
    # via google-api-core
 h11==0.14.0
-    # via
-    #   httpcore
-    #   uvicorn
+    # via httpcore
 httpcore==1.0.5
    # via httpx
 httplib2==0.22.0
    # via
    #   google-api-python-client
    #   google-auth-httplib2
-httptools==0.6.1
-    # via uvicorn
 httpx==0.27.0
    # via
    #   anthropic
    #   cartesia
    #   deepgram-sdk
    #   fal-client
-    #   fastapi
    #   openai
    #   openpipe
 httpx-sse==0.4.0
    # via fal-client
-huggingface-hub==0.23.4
+huggingface-hub==0.23.3
    # via
    #   faster-whisper
    #   timm
@@ -186,7 +168,6 @@ humanfriendly==10.0
 idna==3.7
    # via
    #   anyio
-    #   email-validator
    #   httpx
    #   requests
    #   yarl
@@ -196,46 +177,41 @@ itsdangerous==2.2.0
    # via flask
 jinja2==3.1.4
    # via
-    #   fastapi
    #   flask
    #   torch
 jsonpatch==1.33
    # via langchain-core
 jsonpointer==3.0.0
    # via jsonpatch
-langchain==0.2.5
+langchain==0.2.3
    # via
    #   langchain-community
    #   pipecat-ai (pyproject.toml)
-langchain-community==0.2.5
+langchain-community==0.2.4
    # via pipecat-ai (pyproject.toml)
-langchain-core==0.2.9
+langchain-core==0.2.5
    # via
    #   langchain
    #   langchain-community
    #   langchain-openai
    #   langchain-text-splitters
-langchain-openai==0.1.9
+langchain-openai==0.1.8
    # via pipecat-ai (pyproject.toml)
 langchain-text-splitters==0.2.1
    # via langchain
-langsmith==0.1.81
+langsmith==0.1.77
    # via
    #   langchain
    #   langchain-community
    #   langchain-core
 loguru==0.7.2
    # via pipecat-ai (pyproject.toml)
-markdown-it-py==3.0.0
-    # via rich
 markupsafe==2.1.5
    # via
    #   jinja2
    #   werkzeug
 marshmallow==3.21.3
    # via dataclasses-json
-mdurl==0.1.2
-    # via markdown-it-py
 mpmath==1.3.0
    # via sympy
 multidict==6.0.5
@@ -297,11 +273,9 @@ openai==1.26.0
    #   pipecat-ai (pyproject.toml)
 openpipe==4.14.0
    # via pipecat-ai (pyproject.toml)
-orjson==3.10.5
-    # via
-    #   fastapi
-    #   langsmith
-packaging==24.1
+orjson==3.10.4
+    # via langsmith
+packaging==23.2
    # via
    #   huggingface-hub
    #   langchain-core
@@ -315,7 +289,7 @@ pillow==10.3.0
    #   torchvision
 pluggy==1.5.0
    # via pytest
-proto-plus==1.24.0
+proto-plus==1.23.0
    # via
    #   google-ai-generativelanguage
    #   google-api-core
@@ -343,7 +317,6 @@ pycparser==2.22
 pydantic==2.7.4
    # via
    #   anthropic
-    #   fastapi
    #   google-generativeai
    #   langchain
    #   langchain-core
@@ -351,8 +324,6 @@ pydantic==2.7.4
    #   openai
 pydantic-core==2.18.4
    # via pydantic
-pygments==2.18.0
-    # via rich
 pyht==0.0.28
    # via pipecat-ai (pyproject.toml)
 pyloudnorm==0.1.1
@@ -366,11 +337,7 @@ pytest-asyncio==0.23.7
 python-dateutil==2.9.0.post0
    # via openpipe
 python-dotenv==1.0.1
-    # via
-    #   pipecat-ai (pyproject.toml)
-    #   uvicorn
-python-multipart==0.0.9
-    # via fastapi
+    # via pipecat-ai (pyproject.toml)
 pyyaml==6.0.1
    # via
    #   ctranslate2
@@ -380,7 +347,6 @@ pyyaml==6.0.1
    #   langchain-core
    #   timm
    #   transformers
-    #   uvicorn
 regex==2024.5.15
    # via
    #   tiktoken
@@ -396,8 +362,6 @@ requests==2.32.3
    #   pyht
    #   tiktoken
    #   transformers
-rich==13.7.1
-    # via typer
 rsa==4.9
    # via google-auth
 safetensors==0.4.3
@@ -406,8 +370,6 @@ safetensors==0.4.3
    #   transformers
 scipy==1.13.1
    # via pyloudnorm
-shellingham==1.5.4
-    # via typer
 six==1.16.0
    # via python-dateutil
 sniffio==1.3.1
@@ -418,17 +380,15 @@ sniffio==1.3.1
    #   openai
 sounddevice==0.4.7
    # via pipecat-ai (pyproject.toml)
-sqlalchemy==2.0.31
+sqlalchemy==2.0.30
    # via
    #   langchain
    #   langchain-community
-starlette==0.37.2
-    # via fastapi
 sympy==1.12.1
    # via
    #   onnxruntime
    #   torch
-tenacity==8.4.1
+tenacity==8.3.0
    # via
    #   langchain
    #   langchain-community
@@ -464,14 +424,11 @@ transformers==4.40.2
    # via pipecat-ai (pyproject.toml)
 triton==2.3.1
    # via torch
-typer==0.12.3
-    # via fastapi-cli
 typing-extensions==4.12.2
    # via
    #   anthropic
    #   anyio
    #   deepgram-sdk
-    #   fastapi
    #   google-generativeai
    #   huggingface-hub
    #   openai
@@ -480,31 +437,20 @@ typing-extensions==4.12.2
    #   pydantic-core
    #   sqlalchemy
    #   torch
-    #   typer
    #   typing-inspect
-    #   uvicorn
 typing-inspect==0.9.0
    # via dataclasses-json
-ujson==5.10.0
-    # via fastapi
 uritemplate==4.1.1
    # via google-api-python-client
-urllib3==2.2.2
+urllib3==2.2.1
    # via requests
-uvicorn[standard]==0.30.1
-    # via fastapi
-uvloop==0.19.0
-    # via uvicorn
 verboselogs==1.7
    # via deepgram-sdk
-watchfiles==0.22.0
-    # via uvicorn
 websockets==12.0
    # via
    #   cartesia
    #   deepgram-sdk
    #   pipecat-ai (pyproject.toml)
-    #   uvicorn
 werkzeug==3.0.3
    # via flask
 yarl==1.9.4
--- a/macos-py3.10-requirements.txt
+++ b/macos-py3.10-requirements.txt
@@ -1,5 +1,5 @@
 #
-# This file is autogenerated by pip-compile with Python 3.12
+# This file is autogenerated by pip-compile with Python 3.10
 # by the following command:
 #
 #    pip-compile --all-extras pyproject.toml
@@ -26,8 +26,10 @@ anyio==4.4.0
    #   anthropic
    #   httpx
    #   openai
-    #   starlette
-    #   watchfiles
+async-timeout==4.0.3
+    # via
+    #   aiohttp
+    #   langchain
 attrs==23.2.0
    # via
    #   aiohttp
@@ -52,15 +54,12 @@ cffi==1.16.0
 charset-normalizer==3.3.2
    # via requests
 click==8.1.7
-    # via
-    #   flask
-    #   typer
-    #   uvicorn
+    # via flask
 coloredlogs==15.0.1
    # via onnxruntime
 ctranslate2==4.3.1
    # via faster-whisper
-daily-python==0.10.0
+daily-python==0.9.1
    # via pipecat-ai (pyproject.toml)
 dataclasses-json==0.6.7
    # via
@@ -72,21 +71,17 @@ distro==1.9.0
    # via
    #   anthropic
    #   openai
-dnspython==2.6.1
-    # via email-validator
 einops==0.8.0
    # via pipecat-ai (pyproject.toml)
-email-validator==2.2.0
-    # via fastapi
+exceptiongroup==1.2.1
+    # via
+    #   anyio
+    #   pytest
 fal-client==0.4.0
    # via pipecat-ai (pyproject.toml)
-fastapi==0.111.0
-    # via pipecat-ai (pyproject.toml)
-fastapi-cli==0.0.4
-    # via fastapi
 faster-whisper==1.0.2
    # via pipecat-ai (pyproject.toml)
-filelock==3.15.3
+filelock==3.15.1
    # via
    #   huggingface-hub
    #   pyht
@@ -117,7 +112,7 @@ google-api-core[grpc]==2.19.0
    #   google-ai-generativelanguage
    #   google-api-python-client
    #   google-generativeai
-google-api-python-client==2.134.0
+google-api-python-client==2.133.0
    # via google-generativeai
 google-auth==2.30.0
    # via
@@ -142,29 +137,24 @@ grpcio==1.64.1
 grpcio-status==1.62.2
    # via google-api-core
 h11==0.14.0
-    # via
-    #   httpcore
-    #   uvicorn
+    # via httpcore
 httpcore==1.0.5
    # via httpx
 httplib2==0.22.0
    # via
    #   google-api-python-client
    #   google-auth-httplib2
-httptools==0.6.1
-    # via uvicorn
 httpx==0.27.0
    # via
    #   anthropic
    #   cartesia
    #   deepgram-sdk
    #   fal-client
-    #   fastapi
    #   openai
    #   openpipe
 httpx-sse==0.4.0
    # via fal-client
-huggingface-hub==0.23.4
+huggingface-hub==0.23.3
    # via
    #   faster-whisper
    #   timm
@@ -175,7 +165,6 @@ humanfriendly==10.0
 idna==3.7
    # via
    #   anyio
-    #   email-validator
    #   httpx
    #   requests
    #   yarl
@@ -185,46 +174,41 @@ itsdangerous==2.2.0
    # via flask
 jinja2==3.1.4
    # via
-    #   fastapi
    #   flask
    #   torch
 jsonpatch==1.33
    # via langchain-core
 jsonpointer==3.0.0
    # via jsonpatch
-langchain==0.2.5
+langchain==0.2.3
    # via
    #   langchain-community
    #   pipecat-ai (pyproject.toml)
-langchain-community==0.2.5
+langchain-community==0.2.4
    # via pipecat-ai (pyproject.toml)
-langchain-core==0.2.9
+langchain-core==0.2.5
    # via
    #   langchain
    #   langchain-community
    #   langchain-openai
    #   langchain-text-splitters
-langchain-openai==0.1.9
+langchain-openai==0.1.8
    # via pipecat-ai (pyproject.toml)
 langchain-text-splitters==0.2.1
    # via langchain
-langsmith==0.1.81
+langsmith==0.1.77
    # via
    #   langchain
    #   langchain-community
    #   langchain-core
 loguru==0.7.2
    # via pipecat-ai (pyproject.toml)
-markdown-it-py==3.0.0
-    # via rich
 markupsafe==2.1.5
    # via
    #   jinja2
    #   werkzeug
 marshmallow==3.21.3
    # via dataclasses-json
-mdurl==0.1.2
-    # via markdown-it-py
 mpmath==1.3.0
    # via sympy
 multidict==6.0.5
@@ -255,11 +239,9 @@ openai==1.26.0
    #   pipecat-ai (pyproject.toml)
 openpipe==4.14.0
    # via pipecat-ai (pyproject.toml)
-orjson==3.10.5
-    # via
-    #   fastapi
-    #   langsmith
-packaging==24.1
+orjson==3.10.4
+    # via langsmith
+packaging==23.2
    # via
    #   huggingface-hub
    #   langchain-core
@@ -273,7 +255,7 @@ pillow==10.3.0
    #   torchvision
 pluggy==1.5.0
    # via pytest
-proto-plus==1.24.0
+proto-plus==1.23.0
    # via
    #   google-ai-generativelanguage
    #   google-api-core
@@ -301,7 +283,6 @@ pycparser==2.22
 pydantic==2.7.4
    # via
    #   anthropic
-    #   fastapi
    #   google-generativeai
    #   langchain
    #   langchain-core
@@ -309,8 +290,6 @@ pydantic==2.7.4
    #   openai
 pydantic-core==2.18.4
    # via pydantic
-pygments==2.18.0
-    # via rich
 pyht==0.0.28
    # via pipecat-ai (pyproject.toml)
 pyloudnorm==0.1.1
@@ -324,11 +303,7 @@ pytest-asyncio==0.23.7
 python-dateutil==2.9.0.post0
    # via openpipe
 python-dotenv==1.0.1
-    # via
-    #   pipecat-ai (pyproject.toml)
-    #   uvicorn
-python-multipart==0.0.9
-    # via fastapi
+    # via pipecat-ai (pyproject.toml)
 pyyaml==6.0.1
    # via
    #   ctranslate2
@@ -338,7 +313,6 @@ pyyaml==6.0.1
    #   langchain-core
    #   timm
    #   transformers
-    #   uvicorn
 regex==2024.5.15
    # via
    #   tiktoken
@@ -354,8 +328,6 @@ requests==2.32.3
    #   pyht
    #   tiktoken
    #   transformers
-rich==13.7.1
-    # via typer
 rsa==4.9
    # via google-auth
 safetensors==0.4.3
@@ -364,8 +336,6 @@ safetensors==0.4.3
    #   transformers
 scipy==1.13.1
    # via pyloudnorm
-shellingham==1.5.4
-    # via typer
 six==1.16.0
    # via python-dateutil
 sniffio==1.3.1
@@ -376,17 +346,15 @@ sniffio==1.3.1
    #   openai
 sounddevice==0.4.7
    # via pipecat-ai (pyproject.toml)
-sqlalchemy==2.0.31
+sqlalchemy==2.0.30
    # via
    #   langchain
    #   langchain-community
-starlette==0.37.2
-    # via fastapi
 sympy==1.12.1
    # via
    #   onnxruntime
    #   torch
-tenacity==8.4.1
+tenacity==8.3.0
    # via
    #   langchain
    #   langchain-community
@@ -400,6 +368,8 @@ tokenizers==0.19.1
    #   anthropic
    #   faster-whisper
    #   transformers
+tomli==2.0.1
+    # via pytest
 torch==2.3.1
    # via
    #   pipecat-ai (pyproject.toml)
@@ -418,13 +388,11 @@ tqdm==4.66.4
    #   transformers
 transformers==4.40.2
    # via pipecat-ai (pyproject.toml)
-typer==0.12.3
-    # via fastapi-cli
 typing-extensions==4.12.2
    # via
    #   anthropic
+    #   anyio
    #   deepgram-sdk
-    #   fastapi
    #   google-generativeai
    #   huggingface-hub
    #   openai
@@ -433,30 +401,20 @@ typing-extensions==4.12.2
    #   pydantic-core
    #   sqlalchemy
    #   torch
-    #   typer
    #   typing-inspect
 typing-inspect==0.9.0
    # via dataclasses-json
-ujson==5.10.0
-    # via fastapi
 uritemplate==4.1.1
    # via google-api-python-client
-urllib3==2.2.2
+urllib3==2.2.1
    # via requests
-uvicorn[standard]==0.30.1
-    # via fastapi
-uvloop==0.19.0
-    # via uvicorn
 verboselogs==1.7
    # via deepgram-sdk
-watchfiles==0.22.0
-    # via uvicorn
 websockets==12.0
    # via
    #   cartesia
    #   deepgram-sdk
    #   pipecat-ai (pyproject.toml)
-    #   uvicorn
 werkzeug==3.0.3
    # via flask
 yarl==1.9.4
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -37,7 +37,7 @@ Website = "https://pipecat.ai"
 anthropic = [ "anthropic~=0.25.7" ]
 azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
 cartesia = [ "numpy~=1.26.0", "sounddevice", "cartesia" ]
-daily = [ "daily-python~=0.10.0" ]
+daily = [ "daily-python~=0.9.0" ]
 deepgram = [ "deepgram-sdk~=3.2.7" ]
 examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
 fal = [ "fal-client~=0.4.0" ]
@@ -50,7 +50,7 @@ openai = [ "openai~=1.26.0" ]
 openpipe = [ "openpipe~=4.14.0" ]
 playht = [ "pyht~=0.0.28" ]
 silero = [ "torch~=2.3.0", "torchaudio~=2.3.0" ]
-websocket = [ "websockets~=12.0", "fastapi~=0.111.0" ]
+websocket = [ "websockets~=12.0" ]
 whisper = [ "faster-whisper~=1.0.2" ]

 [tool.setuptools.packages.find]
--- a/src/pipecat/serializers/base_serializer.py
+++ b/src/pipecat/serializers/base_serializer.py
@@ -12,9 +12,9 @@ from pipecat.frames.frames import Frame
 class FrameSerializer(ABC):

    @abstractmethod
-    def serialize(self, frame: Frame) -> str | bytes | None:
+    def serialize(self, frame: Frame) -> bytes:
        pass

    @abstractmethod
-    def deserialize(self, data: str | bytes) -> Frame | None:
+    def deserialize(self, data: bytes) -> Frame | None:
        pass
--- a/src/pipecat/serializers/protobuf.py
+++ b/src/pipecat/serializers/protobuf.py
@@ -26,7 +26,7 @@ class ProtobufFrameSerializer(FrameSerializer):
    def __init__(self):
        pass

-    def serialize(self, frame: Frame) -> str | bytes | None:
+    def serialize(self, frame: Frame) -> bytes:
        proto_frame = frame_protos.Frame()
        if type(frame) not in self.SERIALIZABLE_TYPES:
            raise ValueError(
@@ -41,7 +41,7 @@ class ProtobufFrameSerializer(FrameSerializer):
        result = proto_frame.SerializeToString()
        return result

-    def deserialize(self, data: str | bytes) -> Frame | None:
+    def deserialize(self, data: bytes) -> Frame | None:
        """Returns a Frame object from a Frame protobuf. Used to convert frames
        passed over the wire as protobufs to Frame objects used in pipelines
        and frame processors.
--- a/src/pipecat/serializers/twilio.py
+++ b/src/pipecat/serializers/twilio.py
@@ -1,55 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import base64
-import json
-
-from pipecat.frames.frames import AudioRawFrame, Frame
-from pipecat.serializers.base_serializer import FrameSerializer
-from pipecat.utils.audio import ulaw_8000_to_pcm_16000, pcm_16000_to_ulaw_8000
-
-
-class TwilioFrameSerializer(FrameSerializer):
-    SERIALIZABLE_TYPES = {
-        AudioRawFrame: "audio",
-    }
-
-    def __init__(self):
-        self._sid = None
-
-    def serialize(self, frame: Frame) -> str | bytes | None:
-        if not isinstance(frame, AudioRawFrame):
-            return None
-
-        data = frame.audio
-
-        serialized_data = pcm_16000_to_ulaw_8000(data)
-        payload = base64.b64encode(serialized_data).decode("utf-8")
-        answer = {
-            "event": "media",
-            "streamSid": self._sid,
-            "media": {
-                "payload": payload
-            }
-        }
-
-        return json.dumps(answer)
-
-    def deserialize(self, data: str | bytes) -> Frame | None:
-        message = json.loads(data)
-
-        if not self._sid:
-            self._sid = message["streamSid"] if "streamSid" in message else None
-
-        if message["event"] != "media":
-            return None
-        else:
-            payload_base64 = message["media"]["payload"]
-            payload = base64.b64decode(payload_base64)
-
-            deserialized_data = ulaw_8000_to_pcm_16000(payload)
-            audio_frame = AudioRawFrame(audio=deserialized_data, num_channels=1, sample_rate=16000)
-            return audio_frame
--- a/src/pipecat/services/ai_services.py
+++ b/src/pipecat/services/ai_services.py
@@ -21,7 +21,6 @@ from pipecat.frames.frames import (
    TTSStoppedFrame,
    TextFrame,
    VisionImageRawFrame,
-    LLMFullResponseEndFrame,
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.utils.audio import calculate_audio_volume
@@ -111,9 +110,7 @@ class TTSService(AIService):
            text = frame.text
        else:
            self._current_sentence += frame.text
-            if self._current_sentence.strip().endswith(
-                    (".", "?", "!")) and not self._current_sentence.strip().endswith(
-                    ("Mr,", "Mrs.", "Ms.", "Dr.")):
+            if self._current_sentence.strip().endswith((".", "?", "!")):
                text = self._current_sentence.strip()
                self._current_sentence = ""

@@ -137,11 +134,6 @@ class TTSService(AIService):
            if self._current_sentence:
                await self._push_tts_frames(self._current_sentence)
            await self.push_frame(frame)
-        elif isinstance(frame, LLMFullResponseEndFrame):
-            if self._current_sentence:
-                await self._push_tts_frames(self._current_sentence.strip())
-                self._current_sentence = ""
-            await self.push_frame(frame)
        else:
            await self.push_frame(frame, direction)

--- a/src/pipecat/services/anthropic.py
+++ b/src/pipecat/services/anthropic.py
@@ -40,14 +40,17 @@ class AnthropicLLMService(LLMService):
    """

    def __init__(
-            self,
-            api_key: str,
-            model: str = "claude-3-opus-20240229",
-            max_tokens: int = 1024):
+        self,
+        api_key: str,
+        model: str = "claude-3-opus-20240229",
+        max_tokens: int = 1024,
+        temperature: float = 0.0
+    ):
        super().__init__()
        self._client = AsyncAnthropic(api_key=api_key)
        self._model = model
        self._max_tokens = max_tokens
+        self._temperature = temperature

    def can_generate_metrics(self) -> bool:
        return True
@@ -110,6 +113,7 @@ class AnthropicLLMService(LLMService):
                messages=messages,
                model=self._model,
                max_tokens=self._max_tokens,
+                temperature=self._temperature,
                stream=True)

            await self.stop_ttfb_metrics()
--- a/src/pipecat/services/deepgram.py
+++ b/src/pipecat/services/deepgram.py
@@ -89,7 +89,6 @@ class DeepgramTTSService(TTSService):
 class DeepgramSTTService(AIService):
    def __init__(self,
                 api_key: str,
-                 url: str = "",
                 live_options: LiveOptions = LiveOptions(
                     encoding="linear16",
                     language="en-US",
@@ -105,7 +104,7 @@ class DeepgramSTTService(AIService):
        self._live_options = live_options

        self._client = DeepgramClient(
-            api_key, config=DeepgramClientOptions(url=url, options={"keepalive": "true"}))
+            api_key, config=DeepgramClientOptions(options={"keepalive": "true"}))
        self._connection = self._client.listen.asynclive.v("1")
        self._connection.on(LiveTranscriptionEvents.Transcript, self._on_message)

--- a/src/pipecat/transports/network/fastapi_websocket.py
+++ b/src/pipecat/transports/network/fastapi_websocket.py
@@ -1,160 +0,0 @@
-#
-# Copyright (c) 2024, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import asyncio
-import io
-import wave
-
-from typing import Awaitable, Callable
-from pydantic.main import BaseModel
-
-from pipecat.serializers.twilio import TwilioFrameSerializer
-from pipecat.frames.frames import AudioRawFrame, StartFrame
-from pipecat.processors.frame_processor import FrameProcessor
-from pipecat.serializers.base_serializer import FrameSerializer
-from pipecat.transports.base_input import BaseInputTransport
-from pipecat.transports.base_output import BaseOutputTransport
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-
-from loguru import logger
-
-try:
-    from fastapi import WebSocket
-    from starlette.websockets import WebSocketState
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error(
-        "In order to use FastAPI websockets, you need to `pip install pipecat-ai[websocket]`.")
-    raise Exception(f"Missing module: {e}")
-
-
-class FastAPIWebsocketParams(TransportParams):
-    add_wav_header: bool = False
-    audio_frame_size: int = 6400  # 200ms
-    serializer: FrameSerializer = TwilioFrameSerializer()
-
-
-class FastAPIWebsocketCallbacks(BaseModel):
-    on_client_connected: Callable[[WebSocket], Awaitable[None]]
-    on_client_disconnected: Callable[[WebSocket], Awaitable[None]]
-
-
-class FastAPIWebsocketInputTransport(BaseInputTransport):
-
-    def __init__(
-            self,
-            websocket: WebSocket,
-            params: FastAPIWebsocketParams,
-            callbacks: FastAPIWebsocketCallbacks,
-            **kwargs):
-        super().__init__(params, **kwargs)
-
-        self._websocket = websocket
-        self._params = params
-        self._callbacks = callbacks
-
-    async def start(self, frame: StartFrame):
-        await self._callbacks.on_client_connected(self._websocket)
-        await super().start(frame)
-        self._receive_task = self.get_event_loop().create_task(self._receive_messages())
-
-    async def stop(self):
-        if self._websocket.client_state != WebSocketState.DISCONNECTED:
-            await self._websocket.close()
-        await super().stop()
-
-    async def _receive_messages(self):
-        async for message in self._websocket.iter_text():
-            frame = self._params.serializer.deserialize(message)
-
-            if not frame:
-                continue
-
-            if isinstance(frame, AudioRawFrame):
-                await self.push_audio_frame(frame)
-
-        await self._callbacks.on_client_disconnected(self._websocket)
-
-
-class FastAPIWebsocketOutputTransport(BaseOutputTransport):
-
-    def __init__(self, websocket: WebSocket, params: FastAPIWebsocketParams, **kwargs):
-        super().__init__(params, **kwargs)
-
-        self._websocket = websocket
-        self._params = params
-        self._audio_buffer = bytes()
-
-    async def write_raw_audio_frames(self, frames: bytes):
-        self._audio_buffer += frames
-        while len(self._audio_buffer) >= self._params.audio_frame_size:
-            frame = AudioRawFrame(
-                audio=self._audio_buffer[:self._params.audio_frame_size],
-                sample_rate=self._params.audio_out_sample_rate,
-                num_channels=self._params.audio_out_channels
-            )
-
-            if self._params.add_wav_header:
-                content = io.BytesIO()
-                ww = wave.open(content, "wb")
-                ww.setsampwidth(2)
-                ww.setnchannels(frame.num_channels)
-                ww.setframerate(frame.sample_rate)
-                ww.writeframes(frame.audio)
-                ww.close()
-                content.seek(0)
-                wav_frame = AudioRawFrame(
-                    content.read(),
-                    sample_rate=frame.sample_rate,
-                    num_channels=frame.num_channels)
-                frame = wav_frame
-
-            payload = self._params.serializer.serialize(frame)
-            if payload:
-                await self._websocket.send_text(payload)
-
-            self._audio_buffer = self._audio_buffer[self._params.audio_frame_size:]
-
-
-class FastAPIWebsocketTransport(BaseTransport):
-
-    def __init__(
-            self,
-            websocket: WebSocket,
-            params: FastAPIWebsocketParams = FastAPIWebsocketParams(),
-            input_name: str | None = None,
-            output_name: str | None = None,
-            loop: asyncio.AbstractEventLoop | None = None):
-        super().__init__(input_name=input_name, output_name=output_name, loop=loop)
-        self._params = params
-
-        self._callbacks = FastAPIWebsocketCallbacks(
-            on_client_connected=self._on_client_connected,
-            on_client_disconnected=self._on_client_disconnected
-        )
-
-        self._input = FastAPIWebsocketInputTransport(
-            websocket, self._params, self._callbacks, name=self._input_name)
-        self._output = FastAPIWebsocketOutputTransport(
-            websocket, self._params, name=self._output_name)
-
-        # Register supported handlers. The user will only be able to register
-        # these handlers.
-        self._register_event_handler("on_client_connected")
-        self._register_event_handler("on_client_disconnected")
-
-    def input(self) -> FrameProcessor:
-        return self._input
-
-    def output(self) -> FrameProcessor:
-        return self._output
-
-    async def _on_client_connected(self, websocket):
-        await self._call_event_handler("on_client_connected", websocket)
-
-    async def _on_client_disconnected(self, websocket):
-        await self._call_event_handler("on_client_disconnected", websocket)
--- a/src/pipecat/transports/network/websocket_server.py
+++ b/src/pipecat/transports/network/websocket_server.py
@@ -4,9 +4,11 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+
 import asyncio
 import io
 import wave
+import websockets

 from typing import Awaitable, Callable
 from pydantic.main import BaseModel
@@ -21,13 +23,6 @@ from pipecat.transports.base_transport import BaseTransport, TransportParams

 from loguru import logger

-try:
-    import websockets
-except ModuleNotFoundError as e:
-    logger.error(f"Exception: {e}")
-    logger.error("In order to use websockets, you need to `pip install pipecat-ai[websocket]`.")
-    raise Exception(f"Missing module: {e}")
-

 class WebsocketServerParams(TransportParams):
    add_wav_header: bool = False
--- a/src/pipecat/transports/services/daily.py
+++ b/src/pipecat/transports/services/daily.py
@@ -113,7 +113,6 @@ class DailyCallbacks(BaseModel):
    on_app_message: Callable[[Any, str], Awaitable[None]]
    on_call_state_updated: Callable[[str], Awaitable[None]]
    on_dialin_ready: Callable[[str], Awaitable[None]]
-    on_dialout_answered: Callable[[Any], Awaitable[None]]
    on_dialout_connected: Callable[[Any], Awaitable[None]]
    on_dialout_stopped: Callable[[Any], Awaitable[None]]
    on_dialout_error: Callable[[Any], Awaitable[None]]
@@ -433,9 +432,6 @@ class DailyTransportClient(EventHandler):
    def on_dialin_ready(self, sip_endpoint: str):
        self._call_async_callback(self._callbacks.on_dialin_ready, sip_endpoint)

-    def on_dialout_answered(self, data: Any):
-        self._call_async_callback(self._callbacks.on_dialout_answered, data)
-
    def on_dialout_connected(self, data: Any):
        self._call_async_callback(self._callbacks.on_dialout_connected, data)

@@ -693,7 +689,6 @@ class DailyTransport(BaseTransport):
            on_app_message=self._on_app_message,
            on_call_state_updated=self._on_call_state_updated,
            on_dialin_ready=self._on_dialin_ready,
-            on_dialout_answered=self._on_dialout_answered,
            on_dialout_connected=self._on_dialout_connected,
            on_dialout_stopped=self._on_dialout_stopped,
            on_dialout_error=self._on_dialout_error,
@@ -716,7 +711,6 @@ class DailyTransport(BaseTransport):
        self._register_event_handler("on_app_message")
        self._register_event_handler("on_call_state_updated")
        self._register_event_handler("on_dialin_ready")
-        self._register_event_handler("on_dialout_answered")
        self._register_event_handler("on_dialout_connected")
        self._register_event_handler("on_dialout_stopped")
        self._register_event_handler("on_dialout_error")
@@ -844,9 +838,6 @@ class DailyTransport(BaseTransport):
            await self._handle_dialin_ready(sip_endpoint)
        await self._call_event_handler("on_dialin_ready", sip_endpoint)

-    async def _on_dialout_answered(self, data):
-        await self._call_event_handler("on_dialout_answered", data)
-
    async def _on_dialout_connected(self, data):
        await self._call_event_handler("on_dialout_connected", data)

--- a/src/pipecat/utils/audio.py
+++ b/src/pipecat/utils/audio.py
@@ -4,7 +4,6 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-import audioop
 import numpy as np
 import pyloudnorm as pyln

@@ -32,23 +31,3 @@ def calculate_audio_volume(audio: bytes, sample_rate: int) -> float:

 def exp_smoothing(value: float, prev_value: float, factor: float) -> float:
    return prev_value + factor * (value - prev_value)
-
-
-def ulaw_8000_to_pcm_16000(ulaw_8000_bytes):
-    # Convert μ-law to PCM
-    pcm_8000_bytes = audioop.ulaw2lin(ulaw_8000_bytes, 2)
-
-    # Resample from 8000 Hz to 16000 Hz
-    pcm_16000_bytes = audioop.ratecv(pcm_8000_bytes, 2, 1, 8000, 16000, None)[0]
-
-    return pcm_16000_bytes
-
-
-def pcm_16000_to_ulaw_8000(pcm_16000_bytes):
-    # Resample from 16000 Hz to 8000 Hz
-    pcm_8000_bytes = audioop.ratecv(pcm_16000_bytes, 2, 1, 16000, 8000, None)[0]
-
-    # Convert PCM to μ-law
-    ulaw_8000_bytes = audioop.lin2ulaw(pcm_8000_bytes, 2)
-
-    return ulaw_8000_bytes
--- a/tests/vllm-inference-test.py
+++ b/tests/vllm-inference-test.py
@@ -0,0 +1,86 @@
+import asyncio
+import time
+
+from vllm import LLM, SamplingParams
+from vllm.engine.arg_utils import AsyncEngineArgs
+from vllm.engine.async_llm_engine import AsyncLLMEngine
+from vllm.utils import random_uuid
+
+sampling_params = SamplingParams(
+    temperature=0.8,
+    top_p=0.95,
+    max_tokens=4096
+)
+
+prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.<|eot_id|><|start_header_id|>system<|end_header_id|>\n\nPlease introduce yourself to the user.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
+
+
+async def main():
+    print("🥶 cold starting inference")
+    start = time.monotonic_ns()
+
+    engine_args = AsyncEngineArgs(
+        model="meta-llama/Meta-Llama-3-8B-Instruct",
+        enable_prefix_caching=True,
+        gpu_memory_utilization=0.90,
+        enforce_eager=False,        # False means slower starts but faster inference
+        disable_log_stats=True,     # disable logging so we can stream tokens
+        disable_log_requests=True,
+    )
+
+    engine = AsyncLLMEngine.from_engine_args(engine_args)
+    duration_s = (time.monotonic_ns() - start) / 1e9
+    print(f"🏎️ engine started in {duration_s:.0f}s")
+
+    request_id = random_uuid()
+    result_generator = engine.generate(
+        prompt,
+        sampling_params,
+        request_id,
+    )
+    index, num_tokens = 0, 0
+    start = time.monotonic_ns()
+    async for output in result_generator:
+        if (
+            output.outputs[0].text
+            and "\ufffd" == output.outputs[0].text[-1]
+        ):
+            continue
+        text_delta = output.outputs[0].text[index:]
+        index = len(output.outputs[0].text)
+        num_tokens = len(output.outputs[0].token_ids)
+
+        print(text_delta)
+    duration_s = (time.monotonic_ns() - start) / 1e9
+
+    print(
+        f"\n\tGenerated {num_tokens} tokens in {duration_s:.1f}s,"
+        f" throughput = {num_tokens / duration_s:.0f} tokens/second.\n"
+    )
+
+    return
+
+
+async def xmain():
+    llm = LLM(
+        model="meta-llama/Meta-Llama-3-8B-Instruct",
+        enable_prefix_caching=True
+    )
+
+    outputs = llm.generate(prompt, sampling_params)
+
+    for output in outputs:
+        prompt = output.prompt
+        generated_text = output.outputs[0].text
+        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+
+    outputs = llm.generate(prompt, sampling_params)
+
+    for output in outputs:
+        prompt = output.prompt
+        generated_text = output.outputs[0].text
+        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
Author	SHA1	Message	Date
Kwindla Hultman Kramer	de4f3b6c44	claude sonnet 3.5 ice cream benchmark	2024-06-22 10:27:01 -04:00
Kwindla Hultman Kramer	4ed6648f99	fix typo	2024-06-20 21:05:09 -04:00
Kwindla Hultman Kramer	dc5efe3028	sonic ice cream ttft	2024-06-20 20:57:47 -04:00
Kwindla Hultman Kramer	07041cccce	fix for multiple assistant messages in a row	2024-06-20 19:33:33 -04:00
Brian Hill	ce45a5f8bc	sample code for vllm local inference	2024-06-20 19:33:33 -04:00
Kwindla Hultman Kramer	4f1e9e2d50	more robust cancellation	2024-06-20 19:33:33 -04:00
Kwindla Hultman Kramer	b2c92c3225	experimenting with greedy inference	2024-06-20 19:33:33 -04:00
Aleix Conchillo Flaqué	2b324e4b01	transports: fully use asyncio in all read/write operations	2024-06-17 18:16:15 -07:00