allow interrupt

working with summary
more variables
2024-11-02 16:12:29 -07:00 · 2024-11-02 15:33:03 -07:00 · 2024-11-02 14:05:19 -07:00 · 2024-11-02 13:46:28 -07:00 · 2024-11-02 13:37:35 -07:00 · 2024-11-02 13:27:08 -07:00
121 changed files with 2998 additions and 546 deletions
--- a/.github/workflows/format.yaml
+++ b/.github/workflows/format.yaml
@@ -38,4 +38,4 @@ jobs:
        id: ruff
        run: |
          source .venv/bin/activate
-          ruff format --config line-length=100 --diff --exclude "*_pb2.py"
+          ruff format --diff
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,7 @@ __pycache__/
 *~
 venv
 .venv
+/.idea
 #*#

 # Distribution / packaging
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,86 @@ All notable changes to **Pipecat** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [Unreleased]
+
+### Added
+
+- Added `GatedOpenAILLMContextAggregator`. This aggregator keeps the last
+  received OpenAI LLM context frame and it doesn't let it through until the
+  notifier is notified.
+
+- Added `WakeNotifierFilter`. This processor expects a list of frame types and
+  will execute a given callback predicate when a frame of any of those types is
+  being processed. If the callback returns true the notifier will be notified.
+
+- Added `NullFilter`. A null filter doesn't push any frames upstream or
+  downstream. This is usually used to disable one of the pipelines in
+  `ParallelPipeline`.
+
+- Added `EventNotifier`. This can be used as a very simple synchronization
+  feature between processors.
+
+- Added `TavusVideoService`. This is an integration for Tavus digital twins.
+  (see https://www.tavus.io/)
+
+- Added `DailyTransport.update_subscriptions()`. This allows you to have
+  fine-grained control of which media subscriptions you want for each participant in a
+  room.
+
+### Changed
+
+- The following `DailyTransport` functions are now `async` which means they need
+  to be awaited: `start_dialout`, `stop_dialout`, `start_recording`,
+  `stop_recording`, `capture_participant_transcription` and
+  `capture_participant_video`.
+
+- Changed default output sample rate to 24000. This changes all TTS services to
+  output at 24000 and also the default output transport sample rate. This
+  improves audio quality at the cost of some extra bandwidth.
+
+### Fixed
+
+- Improved bot speaking detection for all TTS services by using actual bot
+  audio.
+
+- Fixed an issue that was generating constant bot started/stopped speaking
+  frames for HTTP TTS services.
+
+- Fixed an issue that was causing stuttering with AWS TTS service.
+
+- Fixed an issue with PlayHTTTSService, where the TTFB metrics were reporting
+  very small time values.
+
+### Other
+
+- Added a new foundational example 22-natural-conversation.py. This example
+  shows how to achieve a more natural conversation by detecting when the user
+  ends a statement.
+
+## [0.0.47] - 2024-10-22
+
+### Added
+
+- Added `AssemblyAISTTService` and corresponding foundational examples
+  `07o-interruptible-assemblyai.py` and `13d-assemblyai-transcription.py`.
+
+- Added a foundational example for Gladia transcription:
+  `13c-gladia-transcription.py`
+
+### Changed
+
+- Updated `GladiaSTTService` to use the V2 API.
+
+- Changed `DailyTransport` transcription model to `nova-2-general`.
+
+### Fixed
+
+- Fixed an issue that would cause an import error when importing
+  `SileroVADAnalyzer` from the old package `pipecat.vad.silero`.
+
+- Fixed `enable_usage_metrics` to control LLM/TTS usage metrics separately
+  from `enable_metrics`.
+
 ## [0.0.46] - 2024-10-19

 ### Added
@@ -17,6 +97,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Changed

+- Changed `DeepgramSTTService` model to `nova-2-general`.
+
 - Moved `SileroVAD` audio processor to `processors.audio.vad`.

 - Module `utils.audio` is now `audio.utils`. A new `resample_audio` function has
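The gating behaviour that the changelog entries above describe for `GatedOpenAILLMContextAggregator` and `EventNotifier` can be pictured with plain `asyncio`. The sketch below is not Pipecat's implementation: `SimpleNotifier` and `GatedHolder` are hypothetical stand-ins, assumed here only to illustrate "keep the last item and release it once notified".

```python
import asyncio


class SimpleNotifier:
    """Hypothetical stand-in for a notifier: one side waits, the other notifies."""

    def __init__(self):
        self._event = asyncio.Event()

    def notify(self):
        self._event.set()

    async def wait(self):
        await self._event.wait()
        # Reset so the notifier can be reused for the next round.
        self._event.clear()


class GatedHolder:
    """Hypothetical gate: keeps only the most recent item until notified."""

    def __init__(self, notifier):
        self._notifier = notifier
        self._last = None

    def push(self, item):
        # A newer item replaces any older one that was never released.
        self._last = item

    async def release(self):
        await self._notifier.wait()
        item, self._last = self._last, None
        return item


async def demo():
    notifier = SimpleNotifier()
    gate = GatedHolder(notifier)
    gate.push("context-1")
    gate.push("context-2")  # supersedes context-1

    waiter = asyncio.create_task(gate.release())
    await asyncio.sleep(0)  # let the waiter start; it blocks on the gate
    still_blocked = not waiter.done()

    notifier.notify()
    released = await waiter
    return still_blocked, released


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Only the latest pushed item survives, which mirrors the "keeps the last received frame" wording; the real aggregator operates on frames inside a pipeline rather than on plain strings.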
--- a/README.md
+++ b/README.md
@@ -38,7 +38,7 @@ pip install "pipecat-ai[option,...]"

 Your project may or may not need these, so they're made available as optional requirements. Here is a list:

- **AI services**: `anthropic`, `aws`, `azure`, `deepgram`, `gladia`, `google`, `fal`, `lmnt`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`, `xtts`
+- **AI services**: `anthropic`, `assemblyai`, `aws`, `azure`, `deepgram`, `gladia`, `google`, `fal`, `lmnt`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`, `xtts`
 - **Transports**: `local`, `websocket`, `daily`

 ## Code examples
@@ -64,7 +64,7 @@ async def main():
  # Use Daily as a real-time media transport (WebRTC)
  transport = DailyTransport(
    room_url=...,
-    token=...,
+    token="", # leave empty. Note: token is _not_ your api key
    bot_name="Bot Name",
    params=DailyParams(audio_out_enabled=True))

@@ -178,7 +178,7 @@ You can use [use-package](https://github.com/jwiegley/use-package) to install [e
  :ensure t
  :hook ((python-mode . lazy-ruff-mode))
  :config
-  (setq lazy-ruff-format-command "ruff format --config line-length=100")
+  (setq lazy-ruff-format-command "ruff format")
  (setq lazy-ruff-only-format-block t)
  (setq lazy-ruff-only-format-region t)
  (setq lazy-ruff-only-format-buffer t))
@@ -197,14 +197,13 @@ You can use [use-package](https://github.com/jwiegley/use-package) to install [e
 ### Visual Studio Code

 Install the
-[Ruff](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, enable formatting on save and configure `ruff` arguments:
+[Ruff](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, and enable formatting on save:

 ```json
 "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true
-},
-"ruff.format.args": ["--config", "line-length=100"]
+}
 ```

 ## Getting help
--- a/docs/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
@@ -0,0 +1,165 @@
+## Contributing to Pipecat
+
+We welcome contributions of all kinds! Your help is appreciated. Follow these steps to get involved:
+
+1. **Fork this repository**: Start by forking the Pipecat Documentation repository to your GitHub account.
+
+2. **Clone the repository**: Clone your forked repository to your local machine.
+   ```bash
+   git clone https://github.com/your-username/pipecat
+   ```
+3. **Create a branch**: For your contribution, create a new branch.
+   ```bash
+   git checkout -b your-branch-name
+   ```
+4. **Make your changes**: Edit or add files as necessary.
+5. **Test your changes**: Ensure that your changes look correct and follow the style set in the codebase.
+6. **Commit your changes**: Once you're satisfied with your changes, commit them with a meaningful message.
+
+```bash
+git commit -m "Description of your changes"
+```
+
+7. **Push your changes**: Push your branch to your forked repository.
+
+```bash
+git push origin your-branch-name
+```
+
+8. **Submit a Pull Request (PR)**: Open a PR from your forked repository to the main branch of this repo.
+> Important: Describe the changes you've made clearly!
+
+Our maintainers will review your PR, and once everything is good, your contributions will be merged!
+
+
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, caste, color, religion, or sexual
+identity and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the overall
+  community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or advances of
+  any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email address,
+  without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official email address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at pipecat-ai@daily.co.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series of
+actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or permanent
+ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within the
+community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.1, available at
+[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
+
+Community Impact Guidelines were inspired by
+[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
+
+For answers to common questions about this code of conduct, see the FAQ at
+[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
+[https://www.contributor-covenant.org/translations][translations].
+
+[homepage]: https://www.contributor-covenant.org
+[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
+[Mozilla CoC]: https://github.com/mozilla/diversity
+[FAQ]: https://www.contributor-covenant.org/faq
+[translations]: https://www.contributor-covenant.org/translations
--- a/docs/ISSUE_TEMPLATE.md
+++ b/docs/ISSUE_TEMPLATE.md
@@ -0,0 +1,22 @@
+# Description
+Is this reporting a bug or feature request?
+
+
+If reporting a bug, please fill out the following:
+
+### Environment
+- pipecat-ai version:
+- python version:
+- OS:
+
+### Issue description
+Provide a clear description of the issue.
+
+### Repro steps
+List the steps to reproduce the issue.
+
+### Expected behavior
+
+### Actual behavior
+
+### Logs
--- a/docs/PULL_REQUEST_TEMPLATE.md
+++ b/docs/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1 @@
+#### Please describe the changes in your PR. If it is addressing an issue, please reference that as well.
--- a/dot-env.template
+++ b/dot-env.template
@@ -46,5 +46,10 @@ PLAY_HT_API_KEY=...
 # OpenAI
 OPENAI_API_KEY=...

-#OpenPipe
+# OpenPipe
 OPENPIPE_API_KEY=...
+
+# Tavus
+TAVUS_API_KEY=...
+TAVUS_REPLICA_ID=...
+TAVUS_PERSONA_ID=...
--- a/examples/canonical-metrics/Dockerfile
+++ b/examples/canonical-metrics/Dockerfile
@@ -1,16 +1,10 @@
 FROM python:3.10-bullseye
-
 RUN mkdir /app
-RUN mkdir /app/assets
-RUN mkdir /app/utils
 COPY *.py /app/
 COPY requirements.txt /app/
-copy assets/* /app/assets/
-copy utils/* /app/utils/
-
 WORKDIR /app
 RUN pip3 install -r requirements.txt

 EXPOSE 7860

-CMD ["python3", "server.py"]
+CMD ["python3", "server.py"]
--- a/examples/canonical-metrics/README.md
+++ b/examples/canonical-metrics/README.md
@@ -1,12 +1,41 @@
-# Simple Chatbot
+# Chatbot with canonical-metrics

-<img src="image.png" width="420px">
+This project implements a chatbot using a pipeline architecture that integrates audio processing, transcription, and a language model for conversational interactions. The chatbot runs inside a Daily call and uses separate services for text-to-speech and language model responses.

-This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion.
+## Features

-See a video of it in action: https://x.com/kwindla/status/1778628911817183509
+- **Audio Input and Output**: Captures microphone input and plays back audio responses.
+- **Voice Activity Detection**: Utilizes Silero VAD to manage audio input intelligently.
+- **Text-to-Speech**: Integrates ElevenLabs TTS service to convert text responses into audio.
+- **Language Model Interaction**: Uses OpenAI's GPT-4 model to generate responses based on user input.
+- **Transcription Services**: Captures and transcribes participant speech for analytics.
+- **Metrics Collection**: Sends audio data for analysis via Canonical Metrics Service.
+
+## Requirements
+
+- Python 3.10+
+- `python-dotenv`
+- Additional libraries from the `pipecat` package.
+
+## Setup
+
+1. Clone the repository.
+2. Install the required packages.
+3. Set up environment variables for API keys:
+   - `OPENAI_API_KEY`
+   - `ELEVENLABS_API_KEY`
+   - `CANONICAL_API_KEY`
+   - `CANONICAL_API_URL`
+4. Run the script.
+
+## Usage
+
+The chatbot introduces itself and engages in conversations, providing brief and creative responses. Designed for flexibility, it can support multiple languages with appropriate configuration.
+
+## Events
+
+- Participants joining or leaving the call are handled dynamically, adjusting the chatbot's behavior accordingly.

-And a quick video walkthrough of the code: https://www.loom.com/share/13df1967161f4d24ade054e7f8753416

 ℹ️ The first time, things might take extra time to get started since the VAD (Voice Activity Detection) model needs to be downloaded.

@@ -27,7 +56,7 @@ cp env.example .env # and add your credentials
 python server.py
 ```

-Then, visit `http://localhost:7860/start` in your browser to start a chatbot session.
+Then, visit `http://localhost:7860/` in your browser to start a chatbot session.

 ## Build and test the Docker image

--- a/examples/canonical-metrics/bot.py
+++ b/examples/canonical-metrics/bot.py
@@ -124,7 +124,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            await task.queue_frames([LLMMessagesFrame(messages)])

        @transport.event_handler("on_participant_left")
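The `await` added in this hunk matters because calling a coroutine function without awaiting it only creates a coroutine object and runs nothing. A minimal sketch of the difference, assuming a hypothetical `FakeTransport` in place of `DailyTransport` (whose internals are not shown in this diff):

```python
import asyncio


class FakeTransport:
    """Hypothetical stand-in for a transport whose method became a coroutine."""

    def __init__(self):
        self.captured = []

    async def capture_participant_transcription(self, participant_id):
        self.captured.append(participant_id)


async def handler_without_await(transport, participant):
    # Calling a coroutine function only creates a coroutine object; nothing runs.
    coro = transport.capture_participant_transcription(participant["id"])
    coro.close()  # closed explicitly to avoid the "never awaited" warning


async def handler_with_await(transport, participant):
    # Awaiting actually executes the method body.
    await transport.capture_participant_transcription(participant["id"])


async def demo():
    transport = FakeTransport()
    participant = {"id": "abc"}
    await handler_without_await(transport, participant)
    after_missing_await = list(transport.captured)
    await handler_with_await(transport, participant)
    return after_missing_await, transport.captured


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In real code the un-awaited call would typically surface as a `RuntimeWarning: coroutine ... was never awaited`, which is a useful signal when migrating handlers to the new `async` functions.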
--- a/examples/canonical-metrics/env.example
+++ b/examples/canonical-metrics/env.example
@@ -2,4 +2,5 @@ DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bo
 DAILY_API_KEY=7df...
 OPENAI_API_KEY=sk-PL...
 ELEVENLABS_API_KEY=aeb...
-CANONICAL_API_KEY=can...
+CANONICAL_API_KEY=can...
+CANONICAL_API_URL=
--- a/examples/canonical-metrics/server.py
+++ b/examples/canonical-metrics/server.py
@@ -59,7 +59,7 @@ app.add_middleware(
 )


-@app.get("/start")
+@app.get("/")
 async def start_agent(request: Request):
    print(f"!!! Creating room")
    room = await daily_helpers["rest"].create_room(DailyRoomParams())
--- a/examples/chatbot-audio-recording/README.md
+++ b/examples/chatbot-audio-recording/README.md
@@ -27,7 +27,7 @@ cp env.example .env # and add your credentials
 python server.py
 ```

-Then, visit `http://localhost:7860/start` in your browser to start a chatbot session.
+Then, visit `http://localhost:7860/` in your browser to start a chatbot session.

 ## Build and test the Docker image

--- a/examples/chatbot-audio-recording/bot.py
+++ b/examples/chatbot-audio-recording/bot.py
@@ -123,7 +123,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            await task.queue_frames([LLMMessagesFrame(messages)])

        @transport.event_handler("on_participant_left")
--- a/examples/chatbot-audio-recording/server.py
+++ b/examples/chatbot-audio-recording/server.py
@@ -59,7 +59,7 @@ app.add_middleware(
 )


-@app.get("/start")
+@app.get("/")
 async def start_agent(request: Request):
    print(f"!!! Creating room")
    room = await daily_helpers["rest"].create_room(DailyRoomParams())
--- a/examples/deployment/flyio-example/README.md
+++ b/examples/deployment/flyio-example/README.md
@@ -34,6 +34,6 @@ Note: you can do this manually via the fly.io dashboard under the "secrets" sub-

 Send a post request to your running fly.io instance:

-`curl --location --request POST 'https://YOUR_FLY_APP_NAME/start_bot'`
+`curl --location --request POST 'https://YOUR_FLY_APP_NAME/'`

 This request will wait until the machine enters a `starting` state before returning a room URL and token to join.
--- a/examples/deployment/flyio-example/bot.py
+++ b/examples/deployment/flyio-example/bot.py
@@ -75,7 +75,7 @@ async def main(room_url: str, token: str):

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
-        transport.capture_participant_transcription(participant["id"])
+        await transport.capture_participant_transcription(participant["id"])
        await task.queue_frames([LLMMessagesFrame(messages)])

    @transport.event_handler("on_participant_left")
--- a/examples/deployment/flyio-example/bot_runner.py
+++ b/examples/deployment/flyio-example/bot_runner.py
@@ -124,7 +124,7 @@ async def spawn_fly_machine(room_url: str, token: str):
    print(f"Machine joined room: {room_url}")


-@app.post("/start_bot")
+@app.post("/")
 async def start_bot(request: Request) -> JSONResponse:
    try:
        data = await request.json()
--- a/examples/dialin-chatbot/bot_daily.py
+++ b/examples/dialin-chatbot/bot_daily.py
@@ -81,7 +81,7 @@ async def main(room_url: str, token: str, callId: str, callDomain: str):

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
-        transport.capture_participant_transcription(participant["id"])
+        await transport.capture_participant_transcription(participant["id"])
        await task.queue_frames([LLMMessagesFrame(messages)])

    @transport.event_handler("on_participant_left")
--- a/examples/dialin-chatbot/bot_twilio.py
+++ b/examples/dialin-chatbot/bot_twilio.py
@@ -84,7 +84,7 @@ async def main(room_url: str, token: str, callId: str, sipUri: str):

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
-        transport.capture_participant_transcription(participant["id"])
+        await transport.capture_participant_transcription(participant["id"])
        await task.queue_frames([LLMMessagesFrame(messages)])

    @transport.event_handler("on_participant_left")
--- a/examples/foundational/01b-livekit-audio.py
+++ b/examples/foundational/01b-livekit-audio.py
@@ -81,7 +81,7 @@ async def main():
            url=url,
            token=token,
            room_name=room_name,
-            params=LiveKitParams(audio_out_enabled=True, audio_out_sample_rate=16000),
+            params=LiveKitParams(audio_out_enabled=True),
        )

        tts = CartesiaTTSService(
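The changelog in this diff notes that the new 24000 default trades some bandwidth for audio quality. A rough back-of-the-envelope sketch of that cost follows, assuming uncompressed 16-bit mono PCM and taking 16000 (the value removed from the examples) as the comparison point; both are assumptions, and actual network usage depends on the codec the transport applies.

```python
def pcm_bytes_per_second(sample_rate, channels=1, bytes_per_sample=2):
    """Raw PCM data rate; assumes 16-bit samples and mono audio by default."""
    return sample_rate * channels * bytes_per_sample


old_rate = pcm_bytes_per_second(16000)  # value removed from the examples
new_rate = pcm_bytes_per_second(24000)  # new default output sample rate
ratio = new_rate / old_rate

if __name__ == "__main__":
    print(old_rate, new_rate, ratio)
```

Under these assumptions the raw audio stream grows by half, from 32000 to 48000 bytes per second.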
--- a/examples/foundational/06-listen-and-respond.py
+++ b/examples/foundational/06-listen-and-respond.py
@@ -5,33 +5,31 @@
 #

 import asyncio
-import aiohttp
 import os
 import sys

+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from runner import configure
+
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, LLMMessagesFrame, MetricsFrame
 from pipecat.metrics.metrics import (
-    TTFBMetricsData,
-    ProcessingMetricsData,
    LLMUsageMetricsData,
+    ProcessingMetricsData,
+    TTFBMetricsData,
    TTSUsageMetricsData,
 )
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineTask
+from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.services.cartesia import CartesiaTTSService
 from pipecat.services.openai import OpenAILLMService
 from pipecat.transports.services.daily import DailyParams, DailyTransport

-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-
 load_dotenv(override=True)

 logger.remove(0)
@@ -105,11 +103,14 @@ async def main():
            ]
        )

-        task = PipelineTask(pipeline)
+        task = PipelineTask(
+            pipeline,
+            PipelineParams(enable_metrics=True, enable_usage_metrics=True),
+        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/06a-image-sync.py
+++ b/examples/foundational/06a-image-sync.py
@@ -127,7 +127,7 @@ async def main():
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            participant_name = participant.get("info", {}).get("userName", "")
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])

        runner = PipelineRunner()
--- a/examples/foundational/07-interruptible-vad.py
+++ b/examples/foundational/07-interruptible-vad.py
@@ -89,7 +89,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07-interruptible.py
+++ b/examples/foundational/07-interruptible.py
@@ -87,7 +87,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07a-interruptible-anthropic.py
+++ b/examples/foundational/07a-interruptible-anthropic.py
@@ -82,7 +82,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([LLMMessagesFrame(messages)])

--- a/examples/foundational/07b-interruptible-langchain.py
+++ b/examples/foundational/07b-interruptible-langchain.py
@@ -109,7 +109,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            lc.set_participant_id(participant["id"])
            # Kick off the conversation.
            # the `LLMMessagesFrame` will be picked up by the LangchainProcessor using
--- a/examples/foundational/07d-interruptible-elevenlabs.py
+++ b/examples/foundational/07d-interruptible-elevenlabs.py
@@ -85,7 +85,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07e-interruptible-playht.py
+++ b/examples/foundational/07e-interruptible-playht.py
@@ -40,7 +40,6 @@ async def main():
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
-                audio_out_sample_rate=16000,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
@@ -89,7 +88,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07f-interruptible-azure.py
+++ b/examples/foundational/07f-interruptible-azure.py
@@ -41,7 +41,6 @@ async def main():
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
-                audio_out_sample_rate=16000,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
@@ -90,7 +89,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07g-interruptible-openai-tts.py
+++ b/examples/foundational/07g-interruptible-openai-tts.py
@@ -74,7 +74,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07h-interruptible-openpipe.py
+++ b/examples/foundational/07h-interruptible-openpipe.py
@@ -86,7 +86,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07i-interruptible-xtts.py
+++ b/examples/foundational/07i-interruptible-xtts.py
@@ -81,7 +81,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07j-interruptible-gladia.py
+++ b/examples/foundational/07j-interruptible-gladia.py
@@ -5,12 +5,16 @@
 #

 import asyncio
-import aiohttp
 import os
 import sys

+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from runner import configure
+
 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.frames.frames import EndFrame, LLMMessagesFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -20,12 +24,6 @@ from pipecat.services.gladia import GladiaSTTService
 from pipecat.services.openai import OpenAILLMService
 from pipecat.transports.services.daily import DailyParams, DailyTransport

-from runner import configure
-
-from loguru import logger
-
-from dotenv import load_dotenv
-
 load_dotenv(override=True)

 logger.remove(0)
@@ -85,11 +83,16 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])

+        # Register an event handler to exit the application when the user leaves.
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.queue_frame(EndFrame())
+
        runner = PipelineRunner()

        await runner.run(task)
--- a/examples/foundational/07k-interruptible-lmnt.py
+++ b/examples/foundational/07k-interruptible-lmnt.py
@@ -77,7 +77,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07l-interruptible-together.py
+++ b/examples/foundational/07l-interruptible-together.py
@@ -96,7 +96,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([LLMMessagesFrame(messages)])

--- a/examples/foundational/07m-interruptible-aws.py
+++ b/examples/foundational/07m-interruptible-aws.py
@@ -40,7 +40,6 @@ async def main():
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
-                audio_out_sample_rate=16000,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
@@ -85,7 +84,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07n-interruptible-google.py
+++ b/examples/foundational/07n-interruptible-google.py
@@ -82,7 +82,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/07o-interruptible-assemblyai.py
+++ b/examples/foundational/07o-interruptible-assemblyai.py
@@ -0,0 +1,97 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import os
+import sys
+
+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from runner import configure
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.services.assemblyai import AssemblyAISTTService
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )
+
+        stt = AssemblyAISTTService(
+            api_key=os.getenv("ASSEMBLYAI_API_KEY"),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
+
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/09-mirror.py
+++ b/examples/foundational/09-mirror.py
@@ -63,6 +63,7 @@ async def main():
            "Test",
            DailyParams(
                audio_in_enabled=True,
+                audio_in_sample_rate=24000,
                audio_out_enabled=True,
                camera_out_enabled=True,
                camera_out_is_live=True,
@@ -73,7 +74,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_video(participant["id"])
+            await transport.capture_participant_video(participant["id"])

        pipeline = Pipeline([transport.input(), MirrorProcessor(), transport.output()])

--- a/examples/foundational/09a-local-mirror.py
+++ b/examples/foundational/09a-local-mirror.py
@@ -65,7 +65,7 @@ async def main():
        tk_root.title("Local Mirror")

        daily_transport = DailyTransport(
-            room_url, token, "Test", DailyParams(audio_in_enabled=True)
+            room_url, token, "Test", DailyParams(audio_in_enabled=True, audio_in_sample_rate=24000)
        )

        tk_transport = TkLocalTransport(
@@ -81,7 +81,7 @@ async def main():

        @daily_transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_video(participant["id"])
+            await transport.capture_participant_video(participant["id"])

        pipeline = Pipeline([daily_transport.input(), MirrorProcessor(), tk_transport.output()])

--- a/examples/foundational/10-wake-phrase.py
+++ b/examples/foundational/10-wake-phrase.py
@@ -82,7 +82,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            await tts.say("Hi! If you want to talk to me, just say 'Hey Robot'.")

        runner = PipelineRunner()
--- a/examples/foundational/11-sound-effects.py
+++ b/examples/foundational/11-sound-effects.py
@@ -134,7 +134,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            await tts.say("Hi, I'm listening!")
            await transport.send_audio(sounds["ding1.wav"])

--- a/examples/foundational/12-describe-video.py
+++ b/examples/foundational/12-describe-video.py
@@ -84,8 +84,8 @@ async def main():
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await tts.say("Hi there! Feel free to ask me what I see.")
-            transport.capture_participant_video(participant["id"], framerate=0)
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
            image_requester.set_participant_id(participant["id"])

        pipeline = Pipeline(
--- a/examples/foundational/12a-describe-video-gemini-flash.py
+++ b/examples/foundational/12a-describe-video-gemini-flash.py
@@ -86,8 +86,8 @@ async def main():
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await tts.say("Hi there! Feel free to ask me what I see.")
-            transport.capture_participant_video(participant["id"], framerate=0)
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
            image_requester.set_participant_id(participant["id"])

        pipeline = Pipeline(
--- a/examples/foundational/12b-describe-video-gpt-4o.py
+++ b/examples/foundational/12b-describe-video-gpt-4o.py
@@ -83,8 +83,8 @@ async def main():
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await tts.say("Hi there! Feel free to ask me what I see.")
-            transport.capture_participant_video(participant["id"], framerate=0)
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
            image_requester.set_participant_id(participant["id"])

        pipeline = Pipeline(
--- a/examples/foundational/12c-describe-video-anthropic.py
+++ b/examples/foundational/12c-describe-video-anthropic.py
@@ -78,16 +78,13 @@ async def main():
        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
-            params=CartesiaTTSService.InputParams(
-                sample_rate=16000,
-            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await tts.say("Hi there! Feel free to ask me what I see.")
-            transport.capture_participant_video(participant["id"], framerate=0)
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
            image_requester.set_participant_id(participant["id"])

        pipeline = Pipeline(
--- a/examples/foundational/13c-gladia-transcription.py
+++ b/examples/foundational/13c-gladia-transcription.py
@@ -0,0 +1,63 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import os
+import sys
+
+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from runner import configure
+
+from pipecat.frames.frames import Frame, TranscriptionFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.services.gladia import GladiaSTTService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+class TranscriptionLogger(FrameProcessor):
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, TranscriptionFrame):
+            print(f"Transcription: {frame.text}")
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
+        )
+
+        stt = GladiaSTTService(
+            api_key=os.getenv("GLADIA_API_KEY"),
+            # live_options=LiveOptions(language=Language.FR),
+        )
+
+        tl = TranscriptionLogger()
+
+        pipeline = Pipeline([transport.input(), stt, tl])
+
+        task = PipelineTask(pipeline)
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/13d-assemblyai-transcription.py
+++ b/examples/foundational/13d-assemblyai-transcription.py
@@ -0,0 +1,62 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import os
+import sys
+
+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from runner import configure
+
+from pipecat.frames.frames import Frame, TranscriptionFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineTask
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.services.assemblyai import AssemblyAISTTService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+class TranscriptionLogger(FrameProcessor):
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, TranscriptionFrame):
+            print(f"Transcription: {frame.text}")
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
+        )
+
+        stt = AssemblyAISTTService(
+            api_key=os.getenv("ASSEMBLYAI_API_KEY"),
+        )
+
+        tl = TranscriptionLogger()
+
+        pipeline = Pipeline([transport.input(), stt, tl])
+
+        task = PipelineTask(pipeline)
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/14-function-calling.py
+++ b/examples/foundational/14-function-calling.py
@@ -127,7 +127,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

--- a/examples/foundational/14a-function-calling-anthropic.py
+++ b/examples/foundational/14a-function-calling-anthropic.py
@@ -105,7 +105,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

--- a/examples/foundational/14b-function-calling-anthropic-video.py
+++ b/examples/foundational/14b-function-calling-anthropic-video.py
@@ -160,8 +160,8 @@ If you need to use a tool, simply use the tool. Do not tell the user the tool yo
        async def on_first_participant_joined(transport, participant):
            global video_participant_id
            video_participant_id = participant["id"]
-            transport.capture_participant_transcription(video_participant_id)
-            transport.capture_participant_video(video_participant_id, framerate=0)
+            await transport.capture_participant_transcription(video_participant_id)
+            await transport.capture_participant_video(video_participant_id, framerate=0)
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

--- a/examples/foundational/14c-function-calling-together.py
+++ b/examples/foundational/14c-function-calling-together.py
@@ -123,7 +123,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            # await tts.say("Hi! Ask me about the weather in San Francisco.")

--- a/examples/foundational/14d-function-calling-video.py
+++ b/examples/foundational/14d-function-calling-video.py
@@ -153,8 +153,8 @@ indicate you should use the get_image tool are:
        async def on_first_participant_joined(transport, participant):
            global video_participant_id
            video_participant_id = participant["id"]
-            transport.capture_participant_transcription(participant["id"])
-            transport.capture_participant_video(video_participant_id, framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(video_participant_id, framerate=0)
            # Kick off the conversation.
            await tts.say("Hi! Ask me about the weather in San Francisco.")

--- a/examples/foundational/14e-function-calling-gemini.py
+++ b/examples/foundational/14e-function-calling-gemini.py
@@ -0,0 +1,173 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.google import GoogleLLMService
+from pipecat.services.openai import OpenAILLMContext
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+video_participant_id = None
+
+
+async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
+    location = arguments["location"]
+    await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
+
+
+async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback):
+    logger.debug(f"!!! IN get_image {video_participant_id}, {arguments}")
+    question = arguments["question"]
+    await llm.request_image_frame(user_id=video_participant_id, text_content=question)
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
+
+        llm = GoogleLLMService(model="gemini-1.5-flash-latest", api_key=os.getenv("GOOGLE_API_KEY"))
+        llm.register_function("get_weather", get_weather)
+        llm.register_function("get_image", get_image)
+
+        tools = [
+            {
+                "function_declarations": [
+                    {
+                        "name": "get_weather",
+                        "description": "Get the current weather",
+                        "parameters": {
+                            "type": "object",
+                            "properties": {
+                                "location": {
+                                    "type": "string",
+                                    "description": "The city and state, e.g. San Francisco, CA",
+                                },
+                                "format": {
+                                    "type": "string",
+                                    "enum": ["celsius", "fahrenheit"],
+                                    "description": "The temperature unit to use. Infer this from the user's location.",
+                                },
+                            },
+                            "required": ["location", "format"],
+                        },
+                    },
+                    {
+                        "name": "get_image",
+                        "description": "Get an image from the camera or video stream.",
+                        "parameters": {
+                            "type": "object",
+                            "properties": {
+                                "question": {
+                                    "type": "string",
+                                    "description": "The question to use when running inference on the acquired image.",
+                                },
+                            },
+                            "required": ["question"],
+                        },
+                    },
+                ]
+            }
+        ]
+
+        system_prompt = """\
+You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
+
+Your response will be turned into speech so use only simple words and punctuation.
+
+You have access to two tools: get_weather and get_image.
+
+You can respond to questions about the weather using the get_weather tool.
+
+You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
+indicate you should use the get_image tool are:
+  - What do you see?
+  - What's in the video?
+  - Can you describe the video?
+  - Tell me about what you see.
+  - Tell me something interesting about what you see.
+  - What's happening in the video?
+"""
+        messages = [
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": "Say hello."},
+        ]
+
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                llm,
+                tts,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            global video_participant_id
+            video_participant_id = participant["id"]
+            await transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(video_participant_id, framerate=0)
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/15-switch-voices.py
+++ b/examples/foundational/15-switch-voices.py
@@ -141,7 +141,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append(
                {
--- a/examples/foundational/15a-switch-languages.py
+++ b/examples/foundational/15a-switch-languages.py
@@ -10,7 +10,7 @@ import os
 import sys

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.frames.frames import LLMMessagesFrame, TTSUpdateSettingsFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.parallel_pipeline import ParallelPipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -19,7 +19,6 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.filters.function_filter import FunctionFilter
 from pipecat.services.cartesia import CartesiaTTSService
 from pipecat.services.openai import OpenAILLMService
-from pipecat.services.whisper import Model, WhisperSTTService
 from pipecat.transports.services.daily import DailyParams, DailyTransport

 from openai.types.chat import ChatCompletionToolParam
@@ -61,16 +60,14 @@ async def main():
            token,
            "Pipecat",
            DailyParams(
-                audio_in_enabled=True,
                audio_out_enabled=True,
+                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
            ),
        )

-        stt = WhisperSTTService(model=Model.LARGE)
-
        english_tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
@@ -116,7 +113,6 @@ async def main():
        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
-                stt,  # STT
                context_aggregator.user(),  # User responses
                llm,  # LLM
                ParallelPipeline(  # TTS (bot will speak the chosen language)
@@ -132,7 +128,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append(
                {
--- a/examples/foundational/16-gpu-container-local-bot.py
+++ b/examples/foundational/16-gpu-container-local-bot.py
@@ -92,7 +92,7 @@ async def main():
        # bot can "hear" and respond to them.
        @transport.event_handler("on_participant_joined")
        async def on_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])

        # When the first participant joins, the bot should introduce itself.
        @transport.event_handler("on_first_participant_joined")
--- a/examples/foundational/17-detect-user-idle.py
+++ b/examples/foundational/17-detect-user-idle.py
@@ -99,7 +99,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
--- a/examples/foundational/19-openai-realtime-beta.py
+++ b/examples/foundational/19-openai-realtime-beta.py
@@ -166,7 +166,7 @@ Remember, your responses should be short. Just one or two sentences, usually."""

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

--- a/examples/foundational/20a-persistent-context-openai.py
+++ b/examples/foundational/20a-persistent-context-openai.py
@@ -223,7 +223,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

--- a/examples/foundational/20b-persistent-context-openai-realtime.py
+++ b/examples/foundational/20b-persistent-context-openai-realtime.py
@@ -249,7 +249,7 @@ Remember, your responses should be short. Just one or two sentences, usually."""

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

--- a/examples/foundational/20c-persistent-context-anthropic.py
+++ b/examples/foundational/20c-persistent-context-anthropic.py
@@ -219,7 +219,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

--- a/examples/foundational/20d-persistent-context-gemini.py
+++ b/examples/foundational/20d-persistent-context-gemini.py
@@ -0,0 +1,290 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import glob
+import json
+import os
+import sys
+from datetime import datetime
+
+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from runner import configure
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import (
+    OpenAILLMContext,
+)
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.google import GoogleLLMService
+
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+video_participant_id = None
+
+
+BASE_FILENAME = "/tmp/pipecat_conversation_"
+tts = None
+
+
+async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
+    temperature = 75 if args["format"] == "fahrenheit" else 24
+    await result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": args["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback):
+    question = arguments["question"]
+    await llm.request_image_frame(user_id=video_participant_id, text_content=question)
+
+
+async def get_saved_conversation_filenames(
+    function_name, tool_call_id, args, llm, context, result_callback
+):
+    # Construct the full pattern including the BASE_FILENAME
+    full_pattern = f"{BASE_FILENAME}*.json"
+
+    # Use glob to find all matching files
+    matching_files = glob.glob(full_pattern)
+    logger.debug(f"matching files: {matching_files}")
+
+    await result_callback({"filenames": matching_files})
+
+
+async def save_conversation(function_name, tool_call_id, args, llm, context, result_callback):
+    timestamp = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
+    filename = f"{BASE_FILENAME}{timestamp}.json"
+    logger.debug(
+        f"writing conversation to {filename}\n{json.dumps(context.get_messages_for_logging(), indent=4)}"
+    )
+    try:
+        with open(filename, "w") as file:
+            # todo: extract 'system' into the first message in the list
+            messages = context.get_messages_for_persistent_storage()
+            # remove the last message (the instruction to save the context)
+            messages.pop()
+            json.dump(messages, file, indent=2)
+        await result_callback({"success": True})
+    except Exception as e:
+        logger.debug(f"error saving conversation: {e}")
+        await result_callback({"success": False, "error": str(e)})
+
+
+async def load_conversation(function_name, tool_call_id, args, llm, context, result_callback):
+    global tts
+    filename = args["filename"]
+    logger.debug(f"loading conversation from {filename}")
+    try:
+        with open(filename, "r") as file:
+            context.set_messages(json.load(file))
+        await result_callback(
+            {
+                "success": True,
+                "message": "The most recent conversation has been loaded. Awaiting further instructions.",
+            }
+        )
+    except Exception as e:
+        await result_callback({"success": False, "error": str(e)})
+
+
+# Test message munging ...
+messages = [
+    {
+        "role": "system",
+        "content": """You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your
+capabilities in a succinct way. Your output will be converted to audio so don't include special
+characters in your answers. Respond to what the user said in a creative and helpful way.
+
+You have several tools you can use to help you.
+
+You can respond to questions about the weather using the get_current_weather tool.
+
+You can save the current conversation using the save_conversation tool. This tool allows you to save
+the current conversation to external storage. If the user asks you to save the conversation, use this
+save_conversation tool.
+
+You can load a saved conversation using the load_conversation tool. This tool allows you to load a
+conversation from external storage. You can get a list of conversations that have been saved using the
+get_saved_conversation_filenames tool.
+
+You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
+indicate you should use the get_image tool are:
+  - What do you see?
+  - What's in the video?
+  - Can you describe the video?
+  - Tell me about what you see.
+  - Tell me something interesting about what you see.
+  - What's happening in the video?
+        """,
+    },
+    # {"role": "user", "content": ""},
+    # {"role": "assistant", "content": []},
+    # {"role": "user", "content": "Tell me"},
+    # {"role": "user", "content": "a joke"},
+]
+tools = [
+    {
+        "function_declarations": [
+            {
+                "name": "get_current_weather",
+                "description": "Get the current weather",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "location": {
+                            "type": "string",
+                            "description": "The city and state, e.g. San Francisco, CA",
+                        },
+                        "format": {
+                            "type": "string",
+                            "enum": ["celsius", "fahrenheit"],
+                            "description": "The temperature unit to use. Infer this from the user's location.",
+                        },
+                    },
+                    "required": ["location", "format"],
+                },
+            },
+            {
+                "name": "save_conversation",
+                "description": "Save the current conversation. Use this function to persist the current conversation to external storage.",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "user_request_text": {
+                            "type": "string",
+                            "description": "The text of the user's request to save the conversation.",
+                        }
+                    },
+                    "required": ["user_request_text"],
+                },
+            },
+            {
+                "name": "get_saved_conversation_filenames",
+                "description": "Get a list of saved conversation histories. Returns a list of filenames. Each filename includes a date and timestamp. Each file is conversation history that can be loaded into this session.",
+                "parameters": None,
+            },
+            {
+                "name": "load_conversation",
+                "description": "Load a conversation history. Use this function to load a conversation history into the current session.",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "filename": {
+                            "type": "string",
+                            "description": "The filename of the conversation history to load.",
+                        }
+                    },
+                    "required": ["filename"],
+                },
+            },
+            {
+                "name": "get_image",
+                "description": "Get an image from the camera or video stream.",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "question": {
+                            "type": "string",
+                            "description": "The question to use when running inference on the acquired image.",
+                        },
+                    },
+                    "required": ["question"],
+                },
+            },
+        ]
+    },
+]
+
+
+async def main():
+    global tts
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.8)),
+            ),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
+
+        llm = GoogleLLMService(model="gemini-1.5-flash-latest", api_key=os.getenv("GOOGLE_API_KEY"))
+
+        # you can either register a single function for all function calls, or specific functions
+        # llm.register_function(None, fetch_weather_from_api)
+        llm.register_function("get_current_weather", fetch_weather_from_api)
+        llm.register_function("save_conversation", save_conversation)
+        llm.register_function("get_saved_conversation_filenames", get_saved_conversation_filenames)
+        llm.register_function("load_conversation", load_conversation)
+        llm.register_function("get_image", get_image)
+
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                context_aggregator.user(),
+                llm,  # LLM
+                tts,
+                context_aggregator.assistant(),
+                transport.output(),  # Transport bot output
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                # report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            global video_participant_id
+            video_participant_id = participant["id"]
+            await transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(video_participant_id, framerate=0)
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/21-tavus-layer.py
+++ b/examples/foundational/21-tavus-layer.py
@@ -0,0 +1,133 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from typing import Any, Mapping
+
+from pipecat.frames.frames import LLMMessagesFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_response import (
+    LLMAssistantResponseAggregator,
+    LLMUserResponseAggregator,
+)
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.services.deepgram import DeepgramSTTService
+from pipecat.services.tavus import TavusVideoService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+
+from loguru import logger
+
+from dotenv import load_dotenv
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        tavus = TavusVideoService(
+            api_key=os.getenv("TAVUS_API_KEY"),
+            replica_id=os.getenv("TAVUS_REPLICA_ID"),
+            persona_id=os.getenv("TAVUS_PERSONA_ID", "pipecat0"),
+            session=session,
+        )
+
+        # get persona, look up persona_name, set this as the bot name to ignore
+        persona_name = await tavus.get_persona_name()
+        room_url = await tavus.initialize()
+
+        transport = DailyTransport(
+            room_url=room_url,
+            token=None,
+            bot_name="Pipecat bot",
+            params=DailyParams(
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )
+
+        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="a167e0f3-df7e-4d52-a9c3-f949145efdab",
+        )
+
+        llm = OpenAILLMService(model="gpt-4o-mini")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        tma_in = LLMUserResponseAggregator(messages)
+        tma_out = LLMAssistantResponseAggregator(messages)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                tma_in,  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                tavus,  # Tavus output layer
+                transport.output(),  # Transport bot output
+                tma_out,  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_participant_joined")
+        async def on_participant_joined(
+            transport: DailyTransport, participant: Mapping[str, Any]
+        ) -> None:
+            # Ignore the Tavus replica's microphone
+            if participant.get("info", {}).get("userName", "") == persona_name:
+                logger.debug(f"Ignoring {participant['id']}'s microphone")
+                await transport.update_subscriptions(
+                    participant_settings={
+                        participant["id"]: {
+                            "media": {"microphone": "unsubscribed"},
+                        }
+                    }
+                )
+
+            if participant.get("info", {}).get("userName", "") != persona_name:
+                # Kick off the conversation.
+                messages.append(
+                    {"role": "system", "content": "Please introduce yourself to the user."}
+                )
+                await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/22-natural-conversation.py
+++ b/examples/foundational/22-natural-conversation.py
@@ -0,0 +1,168 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import aiohttp
+import os
+import sys
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMMessagesFrame, TextFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.parallel_pipeline import ParallelPipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.gated_openai_llm_context import GatedOpenAILLMContextAggregator
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.processors.filters.null_filter import NullFilter
+from pipecat.processors.filters.wake_notifier_filter import WakeNotifierFilter
+from pipecat.processors.user_idle_processor import UserIdleProcessor
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.deepgram import DeepgramSTTService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.sync.event_notifier import EventNotifier
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+from runner import configure
+
+from loguru import logger
+
+from dotenv import load_dotenv
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )
+
+        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
+
+        # This is the LLM that will be used to detect if the user has finished a
+        # statement. This doesn't really need to be an LLM; we could use NLP
+        # libraries for that, but it was easier as an example because we
+        # leverage the context aggregators.
+        statement_llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+
+        statement_messages = [
+            {
+                "role": "system",
+                "content": "Determine if the user's statement is a complete sentence or question, ending in a natural pause or punctuation. Return 'YES' if it is complete and 'NO' if it seems to leave a thought unfinished.",
+            },
+        ]
+
+        statement_context = OpenAILLMContext(statement_messages)
+        statement_context_aggregator = statement_llm.create_context_aggregator(statement_context)
+
+        # This is the regular LLM.
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        # We have instructed the LLM to return 'YES' if it thinks the user
+        # completed a sentence. So, if it's 'YES' we will return true in this
+        # predicate which will wake up the notifier.
+        async def wake_check_filter(frame):
+            return frame.text == "YES"
+
+        # This is a notifier that we use to synchronize the two LLMs.
+        notifier = EventNotifier()
+
+        # This is a filter that will wake up the notifier if the given predicate
+        # (wake_check_filter) returns true.
+        completness_check = WakeNotifierFilter(
+            notifier, types=(TextFrame,), filter=wake_check_filter
+        )
+
+        # This processor keeps the last context and will let it through once the
+        # notifier is woken up.
+        gated_context_aggregator = GatedOpenAILLMContextAggregator(notifier)
+
+        # Notify if the user hasn't said anything.
+        async def user_idle_notifier(frame):
+            await notifier.notify()
+
+        # Sometimes the LLM will fail to detect that a user has completed a
+        # sentence; this will wake up the notifier if that happens.
+        user_idle = UserIdleProcessor(callback=user_idle_notifier, timeout=3.0)
+
+        # The ParallelPipeline input is the user transcripts. We have two
+        # contexts. The first one will be used to determine if the user finished
+        # a statement and if so the notifier will be woken up. The second
+        # context is simply the regular context but it's gated waiting for the
+        # notifier to be woken up.
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,
+                ParallelPipeline(
+                    [
+                        statement_context_aggregator.user(),
+                        statement_llm,
+                        completness_check,
+                        NullFilter(),
+                    ],
+                    [context_aggregator.user(), gated_context_aggregator, llm],
+                ),
+                user_idle,
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([LLMMessagesFrame(messages)])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/foundational/99-anthropic-hackathon.py
+++ b/examples/foundational/99-anthropic-hackathon.py
@@ -0,0 +1,298 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import base64
+import io
+import os
+import sys
+from collections import deque
+
+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from PIL import Image
+from runner import configure
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import (
+    BotInterruptionFrame,
+    Frame,
+    ImageRawFrame,
+    LLMFullResponseEndFrame,
+    LLMMessagesFrame,
+    TextFrame,
+    TranscriptionFrame,
+)
+from pipecat.pipeline.parallel_pipeline import ParallelPipeline
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import (
+    OpenAILLMContext,
+    OpenAILLMContextFrame,
+)
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.processors.frameworks.rtvi import (
+    RTVIBotTranscriptionProcessor,
+    RTVIUserTranscriptionProcessor,
+)
+from pipecat.services.anthropic import AnthropicLLMContext, AnthropicLLMService
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+MAX_FRAMES = 5
+FRAMES_PER_SECOND = 0.2
+
+
+video_participant_id = None
+anthropic_context = None
+recent_image_frames = deque(maxlen=MAX_FRAMES)
+most_recent_image_summary = ""
+
+
+class ImageFrameCatcher(FrameProcessor):
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        global recent_image_frames
+
+        await super().process_frame(frame, direction)
+        if isinstance(frame, ImageRawFrame):
+            recent_image_frames.append(frame)
+        else:
+            await self.push_frame(frame, direction)
+
+
+class TranscriptFrameCatcher(FrameProcessor):
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+        if isinstance(frame, TranscriptionFrame):
+            logger.debug(
+                f"TranscriptLogger: {frame}, num frames: {len(recent_image_frames)}, anthropic context: {anthropic_context}"
+            )
+            if anthropic_context:
+                add_message_with_images(
+                    anthropic_context, frame.text, frames=list(recent_image_frames)
+                )
+        await self.push_frame(frame, direction)
+
+
+class MessageFrameCatcher(FrameProcessor):
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+        if isinstance(frame, OpenAILLMContextFrame):
+            last_message = frame.context.messages[-1]
+
+            system_message = """
+Give me a concise summary of the images supplied.
+            """
+            frame = LLMMessagesFrame(
+                messages=[
+                    {
+                        "role": "system",
+                        "content": system_message,
+                    },
+                    last_message,
+                ],
+            )
+            await self.push_frame(frame, direction)
+            return
+
+
+class MessageFrameCatcher2(FrameProcessor):
+    def __init__(self):
+        super().__init__()
+        self.text_blob = ""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        global most_recent_image_summary
+        await super().process_frame(frame, direction)
+        if isinstance(frame, TextFrame):
+            self.text_blob += f" {frame.text}"
+
+        if isinstance(frame, LLMFullResponseEndFrame):
+            logger.debug(f"MessageFrameCatcher2: {self.text_blob}")
+            most_recent_image_summary = self.text_blob
+            self.text_blob = ""
+
+        await self.push_frame(frame, direction)
+
+
+async def main():
+    global llm
+    global anthropic_context
+
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
+
+        llm = AnthropicLLMService(
+            api_key=os.getenv("ANTHROPIC_API_KEY"),
+            model="claude-3-5-sonnet-20240620",
+            enable_prompt_caching_beta=True,
+        )
+
+        vision_llm = AnthropicLLMService(
+            api_key=os.getenv("ANTHROPIC_API_KEY"),
+            model="claude-3-5-sonnet-20240620",
+            enable_prompt_caching_beta=True,
+        )
+
+        # todo: test with very short initial user message
+
+        system_prompt = """\
+You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions. Keep
+your answers brief unless explicitly asked for more information.
+
+Your response will be turned into speech so use only simple words and punctuation.
+        """
+
+        messages = [
+            {
+                "role": "system",
+                "content": [
+                    {
+                        "type": "text",
+                        "text": system_prompt,
+                    }
+                ],
+            },
+            {"role": "user", "content": "Start the conversation by saying 'hello'."},
+        ]
+
+        context = OpenAILLMContext(messages)
+        anthropic_context = AnthropicLLMContext.upgrade_to_anthropic(context)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        rtvi_user_transcription = RTVIUserTranscriptionProcessor()
+        rtvi_bot_transcription = RTVIBotTranscriptionProcessor()
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                ImageFrameCatcher(),
+                TranscriptFrameCatcher(),
+                rtvi_user_transcription,
+                context_aggregator.user(),  # User speech to text
+                ParallelPipeline(
+                    [
+                        llm,  # LLM
+                        rtvi_bot_transcription,
+                        tts,  # TTS
+                        transport.output(),  # Transport bot output
+                        context_aggregator.assistant(),  # Assistant spoken responses and tool context
+                    ],
+                    [MessageFrameCatcher(), vision_llm, MessageFrameCatcher2()],
+                ),
+            ],
+        )
+
+        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            global video_participant_id
+            video_participant_id = participant["id"]
+            await transport.capture_participant_transcription(video_participant_id)
+            await transport.capture_participant_video(
+                video_participant_id, framerate=FRAMES_PER_SECOND, video_source="screenVideo"
+            )
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        @transport.event_handler("on_app_message")
+        async def on_app_message(transport, message, sender):
+            logger.debug(f"Received app message: {message} - {context}")
+
+            if not recent_image_frames:
+                logger.debug("No image frames to send")
+                return
+
+            add_message_with_images(
+                anthropic_context, message["message"], frames=list(recent_image_frames)
+            )
+
+            interrupt_message = "STOP"
+
+            if interrupt_message == message["message"]:
+                logger.debug("Interrupting")
+                await task.queue_frames([BotInterruptionFrame()])
+            else:
+                await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+        await runner.run(task)
+
+
+def add_message_with_images(c, message, frames=None):
+    if frames is None:
+        frames = list(recent_image_frames)
+
+    if not frames:
+        logger.debug("No image frames to send")
+        return
+
+    # Create content list starting with all images
+    content = []
+    for frame in frames:
+        buffer = io.BytesIO()
+        Image.frombytes(frame.format, frame.size, frame.image).save(buffer, format="JPEG")
+        encoded_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
+
+        content.append(
+            {
+                "type": "image",
+                "source": {
+                    "type": "base64",
+                    "media_type": "image/jpeg",
+                    "data": encoded_image,
+                },
+            }
+        )
+
+    # Add text message at the end if provided
+    if message:
+        content.append({"type": "text", "text": message})
+
+    # Go through all messages and replace user messages containing images
+    if c.messages:
+        for i, msg in enumerate(c.messages):
+            if (
+                msg["role"] == "user"
+                and isinstance(msg["content"], list)
+                and len(msg["content"]) > 0
+            ):
+                if msg["content"][0].get("type") == "image":
+                    logger.debug(
+                        f"Replacing user message {i} containing images with summary: {most_recent_image_summary}"
+                    )
+                    c.messages[i] = {"role": "user", "content": most_recent_image_summary}
+
+    c.add_message({"role": "user", "content": content})
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/examples/moondream-chatbot/README.md
+++ b/examples/moondream-chatbot/README.md
@@ -24,7 +24,7 @@ cp env.example .env # and add your credentials
 python server.py
 ```

-Then, visit `http://localhost:7860/start` in your browser to start a chatbot
+Then, visit `http://localhost:7860/` in your browser to start a chatbot
 session.

 ## Build and test the Docker image
@@ -41,4 +41,4 @@ docker build -t moonbot -f Dockerfile.intel .
 docker run --env-file .env -p 7860:7860 --device /dev/dri moonbot
 ```

-You can try to visit `http://localhost:7860/start` again.
+You can try to visit `http://localhost:7860/` again.
--- a/examples/moondream-chatbot/bot.py
+++ b/examples/moondream-chatbot/bot.py
@@ -203,8 +203,8 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
-            transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_video(participant["id"], framerate=0)
            ir.set_participant_id(participant["id"])
            await task.queue_frames([LLMMessagesFrame(messages)])

--- a/examples/moondream-chatbot/server.py
+++ b/examples/moondream-chatbot/server.py
@@ -57,7 +57,7 @@ app.add_middleware(
 )


-@app.get("/start")
+@app.get("/")
 async def start_agent(request: Request):
    print(f"!!! Creating room")
    room = await daily_helpers["rest"].create_room(DailyRoomParams())
--- a/examples/patient-intake/README.md
+++ b/examples/patient-intake/README.md
@@ -54,7 +54,7 @@ cp env.example .env # and add your credentials
 python server.py
 ```

-Then, visit `http://localhost:7860/start` in your browser to start a chatbot session.
+Then, visit `http://localhost:7860/` in your browser to start a chatbot session.

 ## Build and test the Docker image

--- a/examples/patient-intake/bot.py
+++ b/examples/patient-intake/bot.py
@@ -352,7 +352,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            print(f"Context is: {context}")
            await task.queue_frames([OpenAILLMContextFrame(context)])

--- a/examples/patient-intake/server.py
+++ b/examples/patient-intake/server.py
@@ -57,7 +57,7 @@ app.add_middleware(
 )


-@app.get("/start")
+@app.get("/")
 async def start_agent(request: Request):
    print(f"!!! Creating room")
    room = await daily_helpers["rest"].create_room(DailyRoomParams())
@@ -128,7 +128,7 @@ if __name__ == "__main__":
    parser.add_argument("--reload", action="store_true", help="Reload code on change")

    config = parser.parse_args()
-    print(f"to join a test room, visit http://localhost:{config.port}/start")
+    print(f"to join a test room, visit http://localhost:{config.port}/")
    uvicorn.run(
        "server:app",
        host=config.host,
--- a/examples/simple-chatbot/README.md
+++ b/examples/simple-chatbot/README.md
@@ -27,7 +27,7 @@ cp env.example .env # and add your credentials
 python server.py
 ```

-Then, visit `http://localhost:7860/start` in your browser to start a chatbot session.
+Then, visit `http://localhost:7860/` in your browser to start a chatbot session.

 ## Build and test the Docker image

--- a/examples/simple-chatbot/server.py
+++ b/examples/simple-chatbot/server.py
@@ -17,6 +17,10 @@ from fastapi.responses import JSONResponse, RedirectResponse

 from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams

+from dotenv import load_dotenv
+
+load_dotenv(override=True)
+
 MAX_BOTS_PER_ROOM = 1

 # Bot sub-process dict for status reporting and concurrency control
@@ -57,7 +61,7 @@ app.add_middleware(
 )


-@app.get("/start")
+@app.get("/")
 async def start_agent(request: Request):
    print(f"!!! Creating room")
    room = await daily_helpers["rest"].create_room(DailyRoomParams())
--- a/examples/storytelling-chatbot/frontend/components/App.tsx
+++ b/examples/storytelling-chatbot/frontend/components/App.tsx
@@ -27,7 +27,7 @@ export default function Call() {

    // Create a new room for the story session
    try {
-      const response = await fetch("/start_bot", {
+      const response = await fetch("/", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
--- a/examples/storytelling-chatbot/src/bot.py
+++ b/examples/storytelling-chatbot/src/bot.py
@@ -102,7 +102,7 @@ async def main(room_url, token=None):
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            logger.debug("Participant joined, storytime commence!")
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            await intro_task.queue_frames(
                [
                    images["book1"],
--- a/examples/storytelling-chatbot/src/bot_runner.py
+++ b/examples/storytelling-chatbot/src/bot_runner.py
@@ -69,7 +69,7 @@ STATIC_DIR = "frontend/out"
 app.mount("/static", StaticFiles(directory=STATIC_DIR, html=True), name="static")


-@app.post("/start_bot")
+@app.post("/")
 async def start_bot(request: Request) -> JSONResponse:
    if os.getenv("ENV", "dev") == "production":
        # Only allow requests from the specified domain
--- a/examples/studypal/studypal.py
+++ b/examples/studypal/studypal.py
@@ -165,7 +165,7 @@ Your task is to help the user understand and learn from this article in 2 senten

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])
            messages.append(
                {
                    "role": "system",
--- a/examples/translation-chatbot/README.md
+++ b/examples/translation-chatbot/README.md
@@ -23,7 +23,7 @@ cp env.example .env # and add your credentials
 python server.py
 ```

-Then, visit `http://localhost:7860/start` in your browser to start a translatorbot session.
+Then, visit `http://localhost:7860/` in your browser to start a translatorbot session.

 ## Build and test the Docker image

--- a/examples/translation-chatbot/bot.py
+++ b/examples/translation-chatbot/bot.py
@@ -121,7 +121,7 @@ async def main():

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
-            transport.capture_participant_transcription(participant["id"])
+            await transport.capture_participant_transcription(participant["id"])

        runner = PipelineRunner()

--- a/examples/translation-chatbot/server.py
+++ b/examples/translation-chatbot/server.py
@@ -57,7 +57,7 @@ app.add_middleware(
 )


-@app.get("/start")
+@app.get("/")
 async def start_agent(request: Request):
    print(f"!!! Creating room")
    room = await daily_helpers["rest"].create_room(DailyRoomParams())
--- a/examples/twilio-chatbot/README.md
+++ b/examples/twilio-chatbot/README.md
@@ -53,7 +53,7 @@ This project is a FastAPI-based chatbot that integrates with Twilio to handle We
    ```

 2. **Update the Twilio Webhook**:
-    Copy the ngrok URL and update your Twilio phone number webhook URL to `http://<ngrok_url>/start_call`.
+    Copy the ngrok URL and update your Twilio phone number webhook URL to `http://<ngrok_url>/`.

 3. **Update streams.xml**:
    Copy the ngrok URL and update templates/streams.xml with `wss://<ngrok_url>/ws`.
--- a/examples/twilio-chatbot/server.py
+++ b/examples/twilio-chatbot/server.py
@@ -19,7 +19,7 @@ app.add_middleware(
 )


-@app.post("/start_call")
+@app.post("/")
 async def start_call():
    print("POST TwiML")
    return HTMLResponse(content=open("templates/streams.xml").read(), media_type="application/xml")
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -21,14 +21,14 @@ classifiers = [
 ]
 dependencies = [
    "aiohttp~=3.10.3",
+    "loguru~=0.7.2",
    "Markdown~=3.7",
    "numpy~=1.26.4",
-    "loguru~=0.7.2",
    "Pillow~=10.4.0",
    "protobuf~=4.25.4",
    "pydantic~=2.8.2",
    "pyloudnorm~=0.1.1",
-    "scipy~=1.14.1",
+    "resampy~=0.4.3",
 ]

 [project.urls]
@@ -37,17 +37,18 @@ Website = "https://pipecat.ai"

 [project.optional-dependencies]
 anthropic = [ "anthropic~=0.34.0" ]
+assemblyai = [ "assemblyai~=0.34.0" ]
 aws = [ "boto3~=1.35.27" ]
 azure = [ "azure-cognitiveservices-speech~=1.40.0" ]
 canonical = [ "aiofiles~=24.1.0" ]
 cartesia = [ "cartesia~=1.0.13", "websockets~=13.1" ]
-daily = [ "daily-python~=0.11.0" ]
+daily = [ "daily-python~=0.12.0" ]
 deepgram = [ "deepgram-sdk~=3.7.3" ]
 elevenlabs = [ "websockets~=13.1" ]
 examples = [ "python-dotenv~=1.0.1", "flask~=3.0.3", "flask_cors~=4.0.1" ]
 fal = [ "fal-client~=0.4.1" ]
 gladia = [ "websockets~=13.1" ]
-google = [ "google-generativeai~=0.7.2", "google-cloud-texttospeech~=2.17.2" ]
+google = [ "google-generativeai~=0.8.3", "google-cloud-texttospeech~=2.17.2" ]
 gstreamer = [ "pygobject~=3.48.2" ]
 fireworks = [ "openai~=1.37.2" ]
 langchain = [ "langchain~=0.2.14", "langchain-community~=0.2.12", "langchain-openai~=0.1.20" ]
@@ -73,3 +74,7 @@ pythonpath = ["src"]
 [tool.setuptools_scm]
 local_scheme = "no-local-version"
 fallback_version = "0.0.0-dev"
+
+[tool.ruff]
+exclude = ["*_pb2.py"]
+line-length = 100
--- a/src/pipecat/audio/utils.py
+++ b/src/pipecat/audio/utils.py
@@ -7,13 +7,14 @@
 import audioop
 import numpy as np
 import pyloudnorm as pyln
-from scipy import signal
+import resampy


 def resample_audio(audio: bytes, original_rate: int, target_rate: int) -> bytes:
+    if original_rate == target_rate:
+        return audio
    audio_data = np.frombuffer(audio, dtype=np.int16)
-    num_samples = int(len(audio) * target_rate / original_rate)
-    resampled_audio = signal.resample(audio_data, num_samples)
+    resampled_audio = resampy.resample(audio_data, original_rate, target_rate)
    return resampled_audio.astype(np.int16).tobytes()
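
Review note: the removed `scipy` path computed `num_samples` from `len(audio)`, which is a byte count, not a sample count. The sketch below is a rough size check for 16-bit mono PCM, not part of this change; the helper name `resampled_size_bytes` is hypothetical, and the real resampler may round its output length differently.

```python
def resampled_size_bytes(
    num_bytes: int, original_rate: int, target_rate: int, sample_width: int = 2
) -> int:
    """Estimate the byte size of PCM audio after resampling.

    Assumes mono audio with `sample_width` bytes per sample (2 for int16).
    """
    # Convert bytes to samples first; scaling the raw byte count directly
    # would only be right by accident.
    num_samples = num_bytes // sample_width
    # Scale the sample count by the rate ratio, then convert back to bytes.
    return int(num_samples * target_rate / original_rate) * sample_width


# 10240 bytes at 16 kHz is 5120 samples (320 ms of audio).
estimated = resampled_size_bytes(10240, 16000, 24000)
```

With these inputs the estimate is 7680 samples, so 16 kHz audio upsampled to 24 kHz grows by half in size.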


--- a/src/pipecat/audio/vad/silero.py
+++ b/src/pipecat/audio/vad/silero.py
@@ -52,7 +52,7 @@ class SileroOnnxModel:

        if sr not in self.sample_rates:
            raise ValueError(
-                f"Supported sampling rates: {self.sample_rates} (or multiply of 16000)"
+                f"Supported sampling rates: {self.sample_rates} (or multiple of 16000)"
            )
        if sr / np.shape(x)[1] > 31.25:
            raise ValueError("Input audio chunk is too short")
--- a/src/pipecat/audio/vad/vad_analyzer.py
+++ b/src/pipecat/audio/vad/vad_analyzer.py
@@ -12,6 +12,11 @@ from pydantic.main import BaseModel

 from pipecat.audio.utils import calculate_audio_volume, exp_smoothing

+VAD_CONFIDENCE = 0.7
+VAD_START_SECS = 0.2
+VAD_STOP_SECS = 0.8
+VAD_MIN_VOLUME = 0.6
+

 class VADState(Enum):
    QUIET = 1
@@ -21,10 +26,10 @@ class VADState(Enum):


 class VADParams(BaseModel):
-    confidence: float = 0.7
-    start_secs: float = 0.2
-    stop_secs: float = 0.8
-    min_volume: float = 0.6
+    confidence: float = VAD_CONFIDENCE
+    start_secs: float = VAD_START_SECS
+    stop_secs: float = VAD_STOP_SECS
+    min_volume: float = VAD_MIN_VOLUME


 class VADAnalyzer:
@@ -41,13 +46,17 @@ class VADAnalyzer:
        self._prev_volume = 0

    @property
-    def sample_rate(self):
+    def sample_rate(self) -> int:
        return self._sample_rate

    @property
-    def num_channels(self):
+    def num_channels(self) -> int:
        return self._num_channels

+    @property
+    def params(self) -> VADParams:
+        return self._params
+
    @abstractmethod
    def num_frames_required(self) -> int:
        pass
--- a/src/pipecat/pipeline/task.py
+++ b/src/pipecat/pipeline/task.py
@@ -156,7 +156,7 @@ class PipelineTask:
        start_frame = StartFrame(
            allow_interruptions=self._params.allow_interruptions,
            enable_metrics=self._params.enable_metrics,
-            enable_usage_metrics=self._params.enable_metrics,
+            enable_usage_metrics=self._params.enable_usage_metrics,
            report_only_initial_ttfb=self._params.report_only_initial_ttfb,
            clock=self._clock,
        )
--- a/src/pipecat/processors/aggregators/gated_openai_llm_context.py
+++ b/src/pipecat/processors/aggregators/gated_openai_llm_context.py
@@ -0,0 +1,55 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+
+from pipecat.frames.frames import CancelFrame, EndFrame, Frame, StartFrame
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContextFrame
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.sync.base_notifier import BaseNotifier
+
+
+class GatedOpenAILLMContextAggregator(FrameProcessor):
+    """This aggregator keeps the last received OpenAI LLM context frame and it
+    doesn't let it through until the notifier is notified.
+
+    """
+
+    def __init__(self, notifier: BaseNotifier, **kwargs):
+        super().__init__(**kwargs)
+        self._notifier = notifier
+        self._last_context_frame = None
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, StartFrame):
+            await self.push_frame(frame)
+            await self._start()
+        elif isinstance(frame, (EndFrame, CancelFrame)):
+            await self._stop()
+            await self.push_frame(frame)
+        elif isinstance(frame, OpenAILLMContextFrame):
+            self._last_context_frame = frame
+        else:
+            await self.push_frame(frame, direction)
+
+    async def _start(self):
+        self._gate_task = self.get_event_loop().create_task(self._gate_task_handler())
+
+    async def _stop(self):
+        self._gate_task.cancel()
+        await self._gate_task
+
+    async def _gate_task_handler(self):
+        while True:
+            try:
+                await self._notifier.wait()
+                if self._last_context_frame:
+                    await self.push_frame(self._last_context_frame)
+                    self._last_context_frame = None
+            except asyncio.CancelledError:
+                break
--- a/src/pipecat/processors/aggregators/openai_llm_context.py
+++ b/src/pipecat/processors/aggregators/openai_llm_context.py
@@ -70,6 +70,8 @@ class OpenAILLMContext:
            context.add_message(message)
        return context

+    # todo: deprecate from_image_frame. It's only used to create a single-use
+    # context, which isn't useful for most real-world applications.
    @staticmethod
    def from_image_frame(frame: VisionImageRawFrame) -> "OpenAILLMContext":
        """
@@ -77,6 +79,10 @@ class OpenAILLMContext:
        expects images to be base64 encoded, but other vision models may not.
        So we'll store the image as bytes and do the base64 encoding as needed
        in the LLM service.
+
+        NOTE: the above only applies to the deprecated use of this method. The
+        add_image_frame_message() below does the base64 encoding as expected
+        in the OpenAI format.
        """
        context = OpenAILLMContext()
        buffer = io.BytesIO()
--- a/src/pipecat/processors/filters/frame_filter.py
+++ b/src/pipecat/processors/filters/frame_filter.py
@@ -4,14 +4,14 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-from typing import List
+from typing import Tuple, Type

 from pipecat.frames.frames import AppFrame, ControlFrame, Frame, SystemFrame
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


 class FrameFilter(FrameProcessor):
-    def __init__(self, types: List[type]):
+    def __init__(self, types: Tuple[Type[Frame], ...]):
        super().__init__()
        self._types = types

@@ -20,9 +20,8 @@ class FrameFilter(FrameProcessor):
    #

    def _should_passthrough_frame(self, frame):
-        for t in self._types:
-            if isinstance(frame, t):
-                return True
+        if isinstance(frame, self._types):
+            return True

        return (
            isinstance(frame, AppFrame)
--- a/src/pipecat/processors/filters/null_filter.py
+++ b/src/pipecat/processors/filters/null_filter.py
@@ -0,0 +1,14 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+from pipecat.processors.frame_processor import FrameProcessor
+
+
+class NullFilter(FrameProcessor):
+    """This filter doesn't pass any frames upstream or downstream."""
+
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
--- a/src/pipecat/processors/filters/wake_notifier_filter.py
+++ b/src/pipecat/processors/filters/wake_notifier_filter.py
@@ -0,0 +1,40 @@
+#
+# Copyright (c) 2024, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+from typing import Awaitable, Callable, Tuple, Type
+
+from pipecat.frames.frames import Frame
+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+from pipecat.sync.base_notifier import BaseNotifier
+
+
+class WakeNotifierFilter(FrameProcessor):
+    """This processor expects a tuple of frame types and will execute a given
+    callback predicate when a frame of any of those types is being processed. If
+    the callback returns true the notifier will be notified.
+
+    """
+
+    def __init__(
+        self,
+        notifier: BaseNotifier,
+        *,
+        types: Tuple[Type[Frame], ...],
+        filter: Callable[[Frame], Awaitable[bool]],
+        **kwargs,
+    ):
+        super().__init__(**kwargs)
+        self._notifier = notifier
+        self._types = types
+        self._filter = filter
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, self._types) and await self._filter(frame):
+            await self._notifier.notify()
+
+        await self.push_frame(frame, direction)
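
Review note: the gated aggregator and the wake filter added in this change cooperate through a shared notifier. The sketch below shows that hold-until-notified pattern with plain `asyncio` only. `SketchNotifier`, `SketchGate`, and `wake_filter` are hypothetical stand-ins written for illustration; they are not the pipecat classes and make no claim about how `EventNotifier` is implemented.

```python
import asyncio


class SketchNotifier:
    """Hypothetical notifier: wakes a waiter, then resets itself."""

    def __init__(self):
        self._event = asyncio.Event()

    async def notify(self):
        self._event.set()

    async def wait(self):
        await self._event.wait()
        self._event.clear()


class SketchGate:
    """Keeps only the most recent item and releases it when notified."""

    def __init__(self, notifier, released):
        self._notifier = notifier
        self._released = released  # a list standing in for "downstream"
        self._last = None
        self._task = None

    def start(self):
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def stop(self):
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass

    def hold(self, item):
        self._last = item  # a newer item overwrites the older one

    async def _run(self):
        while True:
            await self._notifier.wait()
            if self._last is not None:
                self._released.append(self._last)
                self._last = None


async def wake_filter(item, notifier, types, predicate):
    """Notify when `item` matches one of `types` and the predicate agrees."""
    if isinstance(item, types) and await predicate(item):
        await notifier.notify()


async def demo():
    released = []
    notifier = SketchNotifier()
    gate = SketchGate(notifier, released)
    gate.start()

    async def is_wake_word(text):
        return "go" in text

    gate.hold("context v1")
    gate.hold("context v2")
    await wake_filter("wait", notifier, (str,), is_wake_word)
    await asyncio.sleep(0)  # give the gate task a turn; nothing is released yet
    held_back = list(released)

    await wake_filter("go", notifier, (str,), is_wake_word)
    await asyncio.sleep(0.01)  # let the gate task observe the notification
    await gate.stop()
    return held_back, released
```

Running `asyncio.run(demo())` releases only the latest held item, and only after the predicate accepts an input. Note that `types` is passed as a tuple so it can go straight to `isinstance`, which is the same reason the processors in this change take tuples rather than lists.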
--- a/src/pipecat/services/ai_services.py
+++ b/src/pipecat/services/ai_services.py
@@ -205,7 +205,7 @@ class TTSService(AIService):
        # if push_stop_frames is True, wait for this idle period before pushing TTSStoppedFrame
        stop_frame_timeout_s: float = 1.0,
        # TTS output sample rate
-        sample_rate: int = 16000,
+        sample_rate: int = 24000,
        text_filter: Optional[BaseTextFilter] = None,
        **kwargs,
    ):
@@ -514,7 +514,7 @@ class SegmentedSTTService(STTService):
        min_volume: float = 0.6,
        max_silence_secs: float = 0.3,
        max_buffer_secs: float = 1.5,
-        sample_rate: int = 16000,
+        sample_rate: int = 24000,
        num_channels: int = 1,
        **kwargs,
    ):
--- a/src/pipecat/services/assemblyai.py
+++ b/src/pipecat/services/assemblyai.py
@@ -0,0 +1,152 @@
+import asyncio
+from typing import AsyncGenerator
+
+from loguru import logger
+
+from pipecat.frames.frames import (
+    CancelFrame,
+    EndFrame,
+    ErrorFrame,
+    Frame,
+    InterimTranscriptionFrame,
+    StartFrame,
+    TranscriptionFrame,
+)
+from pipecat.services.ai_services import STTService
+from pipecat.transcriptions.language import Language
+from pipecat.utils.time import time_now_iso8601
+
+try:
+    import assemblyai as aai
+    from assemblyai import AudioEncoding
+except ModuleNotFoundError as e:
+    logger.error(f"Exception: {e}")
+    logger.error(
+        "In order to use AssemblyAI, you need to `pip install pipecat-ai[assemblyai]`. Also, set `ASSEMBLYAI_API_KEY` environment variable."
+    )
+    raise Exception(f"Missing module: {e}")
+
+
+class AssemblyAISTTService(STTService):
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        sample_rate: int = 16000,
+        encoding: AudioEncoding = AudioEncoding("pcm_s16le"),
+        language=Language.EN,  # Only English is supported for Realtime
+        **kwargs,
+    ):
+        super().__init__(**kwargs)
+
+        aai.settings.api_key = api_key
+        self._transcriber: aai.RealtimeTranscriber | None = None
+        # Store reference to the main event loop for use in callback functions
+        self._loop = asyncio.get_event_loop()
+
+        self._settings = {
+            "sample_rate": sample_rate,
+            "encoding": encoding,
+            "language": language,
+        }
+
+    async def set_language(self, language: Language):
+        logger.info(f"Switching STT language to: [{language}]")
+        self._settings["language"] = language
+
+    async def start(self, frame: StartFrame):
+        await super().start(frame)
+        await self._connect()
+
+    async def stop(self, frame: EndFrame):
+        await super().stop(frame)
+        await self._disconnect()
+
+    async def cancel(self, frame: CancelFrame):
+        await super().cancel(frame)
+        await self._disconnect()
+
+    async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
+        """
+        Process an audio chunk for STT transcription.
+
+        This method streams the audio data to AssemblyAI for real-time transcription.
+        Transcription results are handled asynchronously via callback functions.
+
+        :param audio: Audio data as bytes
+        :yield: None (transcription frames are pushed via self.push_frame in callbacks)
+        """
+        if self._transcriber:
+            await self.start_processing_metrics()
+            self._transcriber.stream(audio)
+            await self.stop_processing_metrics()
+        yield None
+
+    async def _connect(self):
+        """
+        Establish a connection to the AssemblyAI real-time transcription service.
+
+        This method sets up the necessary callback functions and initializes the
+        AssemblyAI transcriber.
+        """
+
+        def on_open(session_opened: aai.RealtimeSessionOpened):
+            """Callback for when the connection to AssemblyAI is opened."""
+            logger.info(f"{self}: Connected to AssemblyAI")
+
+        def on_data(transcript: aai.RealtimeTranscript):
+            """
+            Callback for handling incoming transcription data.
+
+            This function runs in a separate thread from the main asyncio event loop.
+            It creates appropriate transcription frames and schedules them to be
+            pushed to the next stage of the pipeline in the main event loop.
+            """
+            if not transcript.text:
+                return
+
+            timestamp = time_now_iso8601()
+
+            if isinstance(transcript, aai.RealtimeFinalTranscript):
+                frame = TranscriptionFrame(
+                    transcript.text, "", timestamp, self._settings["language"]
+                )
+            else:
+                frame = InterimTranscriptionFrame(
+                    transcript.text, "", timestamp, self._settings["language"]
+                )
+
+            # Schedule the coroutine to run in the main event loop
+            # This is necessary because this callback runs in a different thread
+            asyncio.run_coroutine_threadsafe(self.push_frame(frame), self._loop)
+
+        def on_error(error: aai.RealtimeError):
+            """
+            Callback for handling errors from AssemblyAI.
+
+            Like on_data, this runs in a separate thread and schedules error
+            handling in the main event loop.
+            """
+            logger.error(f"{self}: An error occurred: {error}")
+            # Schedule the coroutine to run in the main event loop
+            asyncio.run_coroutine_threadsafe(self.push_frame(ErrorFrame(str(error))), self._loop)
+
+        def on_close():
+            """Callback for when the connection to AssemblyAI is closed."""
+            logger.info(f"{self}: Disconnected from AssemblyAI")
+
+        self._transcriber = aai.RealtimeTranscriber(
+            sample_rate=self._settings["sample_rate"],
+            encoding=self._settings["encoding"],
+            on_data=on_data,
+            on_error=on_error,
+            on_open=on_open,
+            on_close=on_close,
+        )
+        self._transcriber.connect()
+
+    async def _disconnect(self):
+        """Disconnect from the AssemblyAI service and clean up resources."""
+        if self._transcriber:
+            self._transcriber.close()
+            self._transcriber = None
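
Review note: the callbacks in `_connect()` run on an SDK-owned thread and hand frames back to the event loop with `asyncio.run_coroutine_threadsafe`. The sketch below isolates that handoff using only the standard library. The function name `collect_from_worker_thread` and the `push` coroutine are hypothetical; the worker thread stands in for the SDK callback thread.

```python
import asyncio
import threading


async def collect_from_worker_thread(messages):
    loop = asyncio.get_running_loop()
    loop_thread_id = threading.get_ident()
    received = []

    async def push(message):
        # Runs on the event loop thread, so loop-owned state is safe to touch.
        on_loop_thread = threading.get_ident() == loop_thread_id
        received.append((message, on_loop_thread))

    def worker():
        # Runs on a separate thread, as an SDK callback would.
        for message in messages:
            future = asyncio.run_coroutine_threadsafe(push(message), loop)
            # Block the worker (not the loop) until the coroutine has run.
            future.result(timeout=5)

    thread = threading.Thread(target=worker)
    thread.start()
    # Join in a helper thread so the event loop stays free to run push().
    await asyncio.to_thread(thread.join)
    return received
```

Unlike this sketch, the service code above does not wait on the returned future, so an exception raised inside `push_frame` would go unobserved unless the future is checked or given a done callback.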
--- a/src/pipecat/services/aws.py
+++ b/src/pipecat/services/aws.py
@@ -4,11 +4,14 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
+
 from typing import AsyncGenerator, Optional

 from loguru import logger
 from pydantic import BaseModel

+from pipecat.audio.utils import resample_audio
 from pipecat.frames.frames import (
    ErrorFrame,
    Frame,
@@ -45,7 +48,7 @@ class AWSTTSService(TTSService):
        aws_access_key_id: str,
        region: str,
        voice_id: str = "Joanna",
-        sample_rate: int = 16000,
+        sample_rate: int = 24000,
        params: InputParams = InputParams(),
        **kwargs,
    ):
@@ -164,6 +167,14 @@ class AWSTTSService(TTSService):
        return ssml

    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
+        def read_audio_data(**args):
+            response = self._polly_client.synthesize_speech(**args)
+            if "AudioStream" in response:
+                audio_data = response["AudioStream"].read()
+                resampled = resample_audio(audio_data, 16000, self._settings["sample_rate"])
+                return resampled
+            return None
+
        logger.debug(f"Generating TTS: [{text}]")

        try:
@@ -178,28 +189,31 @@ class AWSTTSService(TTSService):
                "OutputFormat": "pcm",
                "VoiceId": self._voice_id,
                "Engine": self._settings["engine"],
-                "SampleRate": str(self._settings["sample_rate"]),
+                # AWS only supports 8000 and 16000 for PCM. We select 16000.
+                "SampleRate": "16000",
            }

            # Filter out None values
            filtered_params = {k: v for k, v in params.items() if v is not None}

-            response = self._polly_client.synthesize_speech(**filtered_params)
+            audio_data = await asyncio.to_thread(read_audio_data, **filtered_params)
+
+            if not audio_data:
+                logger.error(f"{self} No audio data returned")
+                yield None
+                return

            await self.start_tts_usage_metrics(text)

            yield TTSStartedFrame()

-            if "AudioStream" in response:
-                with response["AudioStream"] as stream:
-                    audio_data = stream.read()
-                    chunk_size = 8192
-                    for i in range(0, len(audio_data), chunk_size):
-                        chunk = audio_data[i : i + chunk_size]
-                        if len(chunk) > 0:
-                            await self.stop_ttfb_metrics()
-                            frame = TTSAudioRawFrame(chunk, self._settings["sample_rate"], 1)
-                            yield frame
+            chunk_size = 8192
+            for i in range(0, len(audio_data), chunk_size):
+                chunk = audio_data[i : i + chunk_size]
+                if len(chunk) > 0:
+                    await self.stop_ttfb_metrics()
+                    frame = TTSAudioRawFrame(chunk, self._settings["sample_rate"], 1)
+                    yield frame

            yield TTSStoppedFrame()

--- a/src/pipecat/services/azure.py
+++ b/src/pipecat/services/azure.py
@@ -25,8 +25,14 @@ from pipecat.frames.frames import (
    TTSStoppedFrame,
    URLImageRawFrame,
 )
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.services.ai_services import ImageGenService, STTService, TTSService
-from pipecat.services.openai import BaseOpenAILLMService
+from pipecat.services.openai import (
+    BaseOpenAILLMService,
+    OpenAIAssistantContextAggregator,
+    OpenAIContextAggregatorPair,
+    OpenAIUserContextAggregator,
+)
 from pipecat.transcriptions.language import Language
 from pipecat.utils.time import time_now_iso8601

@@ -38,6 +44,7 @@ try:
        SpeechConfig,
        SpeechRecognizer,
        SpeechSynthesizer,
+        SpeechSynthesisOutputFormat,
    )
    from azure.cognitiveservices.speech.audio import (
        AudioStreamFormat,
@@ -70,6 +77,33 @@ class AzureLLMService(BaseOpenAILLMService):
            api_version=self._api_version,
        )

+    @staticmethod
+    def create_context_aggregator(
+        context: OpenAILLMContext, *, assistant_expect_stripped_words: bool = True
+    ) -> OpenAIContextAggregatorPair:
+        user = OpenAIUserContextAggregator(context)
+        assistant = OpenAIAssistantContextAggregator(
+            user, expect_stripped_words=assistant_expect_stripped_words
+        )
+        return OpenAIContextAggregatorPair(_user=user, _assistant=assistant)
+
+
+def sample_rate_to_output_format(sample_rate: int) -> SpeechSynthesisOutputFormat:
+    match sample_rate:
+        case 8000:
+            return SpeechSynthesisOutputFormat.Raw8Khz16BitMonoPcm
+        case 16000:
+            return SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm
+        case 22050:
+            return SpeechSynthesisOutputFormat.Raw22050Hz16BitMonoPcm
+        case 24000:
+            return SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm
+        case 44100:
+            return SpeechSynthesisOutputFormat.Raw44100Hz16BitMonoPcm
+        case 48000:
+            return SpeechSynthesisOutputFormat.Raw48Khz16BitMonoPcm
+    return SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm
+

 class AzureTTSService(TTSService):
    class InputParams(BaseModel):
@@ -88,13 +122,15 @@ class AzureTTSService(TTSService):
        api_key: str,
        region: str,
        voice="en-US-SaraNeural",
-        sample_rate: int = 16000,
+        sample_rate: int = 24000,
        params: InputParams = InputParams(),
        **kwargs,
    ):
        super().__init__(sample_rate=sample_rate, **kwargs)

        speech_config = SpeechConfig(subscription=api_key, region=region)
+        speech_config.set_speech_synthesis_output_format(sample_rate_to_output_format(sample_rate))
+
        self._speech_synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=None)

        self._settings = {
@@ -283,7 +319,7 @@ class AzureSTTService(STTService):
        api_key: str,
        region: str,
        language=Language.EN_US,
-        sample_rate=16000,
+        sample_rate=24000,
        channels=1,
        **kwargs,
    ):
--- a/src/pipecat/services/cartesia.py
+++ b/src/pipecat/services/cartesia.py
@@ -68,9 +68,6 @@ def language_to_cartesia_language(language: Language) -> str | None:

 class CartesiaTTSService(WordTTSService):
    class InputParams(BaseModel):
-        encoding: Optional[str] = "pcm_s16le"
-        sample_rate: Optional[int] = 16000
-        container: Optional[str] = "raw"
        language: Optional[Language] = Language.EN
        speed: Optional[Union[str, float]] = ""
        emotion: Optional[List[str]] = []
@@ -83,6 +80,9 @@ class CartesiaTTSService(WordTTSService):
        cartesia_version: str = "2024-06-10",
        url: str = "wss://api.cartesia.ai/tts/websocket",
        model: str = "sonic-english",
+        sample_rate: int = 24000,
+        encoding: str = "pcm_s16le",
+        container: str = "raw",
        params: InputParams = InputParams(),
        **kwargs,
    ):
@@ -99,7 +99,7 @@ class CartesiaTTSService(WordTTSService):
        super().__init__(
            aggregate_sentences=True,
            push_text_frames=False,
-            sample_rate=params.sample_rate,
+            sample_rate=sample_rate,
            **kwargs,
        )

@@ -108,9 +108,9 @@ class CartesiaTTSService(WordTTSService):
        self._url = url
        self._settings = {
            "output_format": {
-                "container": params.container,
-                "encoding": params.encoding,
-                "sample_rate": params.sample_rate,
+                "container": container,
+                "encoding": encoding,
+                "sample_rate": sample_rate,
            },
            "language": self.language_to_service_language(params.language)
            if params.language
@@ -288,9 +288,6 @@ class CartesiaTTSService(WordTTSService):

 class CartesiaHttpTTSService(TTSService):
    class InputParams(BaseModel):
-        encoding: Optional[str] = "pcm_s16le"
-        sample_rate: Optional[int] = 16000
-        container: Optional[str] = "raw"
        language: Optional[Language] = Language.EN
        speed: Optional[Union[str, float]] = ""
        emotion: Optional[List[str]] = []
@@ -302,17 +299,20 @@ class CartesiaHttpTTSService(TTSService):
        voice_id: str,
        model: str = "sonic-english",
        base_url: str = "https://api.cartesia.ai",
+        sample_rate: int = 24000,
+        encoding: str = "pcm_s16le",
+        container: str = "raw",
        params: InputParams = InputParams(),
        **kwargs,
    ):
-        super().__init__(**kwargs)
+        super().__init__(sample_rate=sample_rate, **kwargs)

        self._api_key = api_key
        self._settings = {
            "output_format": {
-                "container": params.container,
-                "encoding": params.encoding,
-                "sample_rate": params.sample_rate,
+                "container": container,
+                "encoding": encoding,
+                "sample_rate": sample_rate,
            },
            "language": self.language_to_service_language(params.language)
            if params.language
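
Review note on the `azure.py` hunk above: `sample_rate_to_output_format` falls back to the 16 kHz format for any rate it does not list, while the constructor default moves to 24000. The sketch below restates that mapping as a dictionary lookup so the fallback case is easy to test on its own. It is a minimal illustration, not the project's code: the plain strings are stand-ins for the `SpeechSynthesisOutputFormat` members named in the diff, and `output_format_name` and `uses_fallback` are hypothetical helper names.

```python
# Stand-ins (plain strings) for the SpeechSynthesisOutputFormat members
# that appear in the diff; the Azure SDK is deliberately not imported here.
OUTPUT_FORMATS = {
    8000: "Raw8Khz16BitMonoPcm",
    16000: "Raw16Khz16BitMonoPcm",
    22050: "Raw22050Hz16BitMonoPcm",
    24000: "Raw24Khz16BitMonoPcm",
    44100: "Raw44100Hz16BitMonoPcm",
    48000: "Raw48Khz16BitMonoPcm",
}

# Same fallback as the final `return` in the diff's function.
FALLBACK_FORMAT = "Raw16Khz16BitMonoPcm"


def output_format_name(sample_rate: int) -> str:
    """Return the format name for a sample rate, or the 16 kHz fallback."""
    return OUTPUT_FORMATS.get(sample_rate, FALLBACK_FORMAT)


def uses_fallback(sample_rate: int) -> bool:
    """Hypothetical helper: True when the rate is not in the table."""
    return sample_rate not in OUTPUT_FORMATS


if __name__ == "__main__":
    for rate in (24000, 11025):
        print(rate, output_format_name(rate), uses_fallback(rate))
```

If a caller passes an unlisted rate, the synthesizer would be configured for 16 kHz while the service is still constructed with the requested rate, so a check along the lines of `uses_fallback` (or a logged warning) may be worth considering.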
Author	SHA1	Message	Date
Nikita Gamolsky	0265c1d3ef	alllow interrupt	2024-11-02 16:12:29 -07:00
Nikita Gamolsky	ffa0e5a122	working with summary	2024-11-02 15:33:03 -07:00
Nikita Gamolsky	cdeab597b3	more variables	2024-11-02 14:05:19 -07:00
Nikita Gamolsky	abd486025b	more updates	2024-11-02 13:46:28 -07:00
Nikita Gamolsky	c4cdb2d809	update to use context global	2024-11-02 13:37:35 -07:00
Nikita Gamolsky	05ba10c969	update	2024-11-02 13:27:08 -07:00
Kwindla Hultman Kramer	2f80683dc4	initial commit of screen capture in 99-anthropic-hackathon.py	2024-11-02 10:42:31 -07:00
Kwindla Hultman Kramer	151242d3a0	Merge pull request #666 from pipecat-ai/khk/realtime-pipecat-vad Support using Pipecat turn detection instead of OpenAI Realtime API turn detection	2024-11-02 08:36:31 -07:00
Kwindla Hultman Kramer	93c6e5098c	added comment explaining config of TurnDetection	2024-11-02 08:24:54 -07:00
Mark Backman	84bd767312	Merge pull request #685 from pipecat-ai/mb/add-recording-events Add recording events and callbacks	2024-11-01 12:02:46 -04:00
Mark Backman	802c29e9e1	Add recording events and callbacks	2024-11-01 10:20:00 -04:00
Aleix Conchillo Flaqué	f83381860c	Merge pull request #677 from pipecat-ai/aleix/add-notifier-and-notifier-filters add notifiers and more frame filters	2024-10-31 15:55:07 -07:00
Aleix Conchillo Flaqué	4dad1bfe49	examples: add foundational/22-natural-conversation.py	2024-10-31 12:10:33 -07:00
marcus-daily	9ee8896b64	Removing unnecessary ruff arguments from README	2024-10-31 18:02:29 +00:00
marcus-daily	5f7a2f66d4	Add .idea to .gitignore	2024-10-31 18:02:29 +00:00
marcus-daily	76e5f1e847	Remove unnecessary ruff params in CI	2024-10-31 15:07:28 +00:00
marcus-daily	6975340d6c	Set Ruff config for the project	2024-10-31 15:07:28 +00:00
marcus-daily	0f4cf56418	Load dotenv in simple chatbot server (fixes #415 )	2024-10-31 12:08:30 +00:00
Aleix Conchillo Flaqué	018e51e8a3	add notifiers and more frame filters	2024-10-30 16:36:17 -07:00
Vanessa Pyne	b050143952	Merge pull request #676 from RonakAgarwalVani/fix/chunk-choices-delta-none Fix uncaught exception when accessing 'tool_calls' in NoneType delta in response handling	2024-10-30 14:44:32 -05:00
Mark Backman	98ea1f0791	Merge pull request #675 from pipecat-ai/mb/playht-add-request-id Add a request_id to each TTS sequence	2024-10-30 13:56:15 -04:00
Mark Backman	8272c35527	Use a request_id in TTS commands for the PlayHT websocket service	2024-10-30 13:54:18 -04:00
Mark Backman	e973e82e05	Merge pull request #672 from pipecat-ai/mb/fix-playht Fix PlayHT TTFB metrics	2024-10-30 13:53:02 -04:00
RonakAgarwalVani	d1396bf618	Update openai.py	2024-10-30 14:26:49 +05:30
Vanessa Pyne	8186e423de	Merge pull request #637 from pipecat-ai/vp-issue-template docs: add ISSUE_TEMPLATE.md	2024-10-29 15:08:42 -05:00
vipyne	3010addb8b	docs: add CONTRIBUTING.md	2024-10-29 15:03:07 -05:00
vipyne	029e0d391e	docs: add ISSUE_TEMPLATE.md	2024-10-29 15:03:07 -05:00
Vanessa Pyne	bf31223577	Merge pull request #671 from pipecat-ai/vp-issue-635 docs: small fix	2024-10-29 14:34:13 -05:00
vipyne	42cc79154f	docs: small fix	2024-10-29 14:33:57 -05:00
Mark Backman	05b857006a	Update changelog	2024-10-28 20:56:29 -04:00
Mark Backman	2e57d21b89	Fix ttfb metrics	2024-10-28 20:27:24 -04:00
Aleix Conchillo Flaqué	fa05ec46be	Merge pull request #667 from pipecat-ai/aleix/base-output-bot-speaking-detection transports(base_output): use audio frames for bot speaking detection	2024-10-28 10:54:54 -07:00
Aleix Conchillo Flaqué	e3ce619284	transports(base_output): use audio frames for bot speaking detection	2024-10-28 10:07:37 -07:00
Vanessa Pyne	fb512dcd74	Merge pull request #630 from MoofSoup/update-readme docs: simplify readme	2024-10-28 10:26:30 -05:00
Aleix Conchillo Flaqué	ca15d97383	Merge pull request #662 from pipecat-ai/aleix/daily-transport-async-functions transports(daily): make functions async	2024-10-25 16:14:06 -07:00
Aleix Conchillo Flaqué	b32448e967	transports(daily): make functions async	2024-10-25 15:01:52 -07:00
Aleix Conchillo Flaqué	7e30da6183	Merge pull request #661 from pipecat-ai/aleix/allow-updating-subscritption-before transports(daily): allow updating subscriptions before join	2024-10-25 15:00:34 -07:00
Aleix Conchillo Flaqué	a6dd2600d2	examples(tavus): await update_subscriptions	2024-10-25 14:56:56 -07:00
Aleix Conchillo Flaqué	b905b57dfc	transports(daily): allow updating subscriptions before join	2024-10-25 14:46:17 -07:00
Kwindla Hultman Kramer	e1a7edfb58	make it possible to use Pipecat turn detection instead of OpenAI turn detection	2024-10-25 15:59:48 -05:00
Aleix Conchillo Flaqué	1b30b1fc23	Merge pull request #665 from pipecat-ai/aleix/fix-bot-started-stopped-speaking transports(base_output): fix constant bot started/stopped speaking fr…	2024-10-25 13:00:38 -07:00
Aleix Conchillo Flaqué	55026898f6	transports(base_output): use vad stop secs for bot stopped speaking	2024-10-25 12:59:15 -07:00
Aleix Conchillo Flaqué	4283557894	audio(vad): expose params property	2024-10-25 12:59:15 -07:00
Aleix Conchillo Flaqué	5ab00e01aa	transports(base_output): fix constant bot started/stopped speaking frames	2024-10-25 12:10:24 -07:00
Aleix Conchillo Flaqué	fcfc729e83	Merge pull request #664 from pipecat-ai/aleix/fix-aws-stuttering services(aws): read stream and resample in a thread	2024-10-25 11:49:28 -07:00
Aleix Conchillo Flaqué	4eacb34fd8	services(aws): read stream and resample in a thread	2024-10-25 11:22:28 -07:00
Aleix Conchillo Flaqué	3a8aacccf7	Merge pull request #663 from pipecat-ai/aleix/audio-resampling-with-resampy audio: use resamply for audio resampling	2024-10-25 10:16:20 -07:00
roey	54c0bf0c70	Adding `TavusVideoService` layer (#617 ) Co-authored-by: roey <159067767+roey-tavus@users.noreply.github.com> Co-authored-by: Mert Gerdan <mert@tavus.io> Co-authored-by: Aleix Conchillo Flaqué <aleix@daily.co>	2024-10-25 09:46:25 -07:00
Aleix Conchillo Flaqué	778b05a252	audio: use resamply for audio resampling	2024-10-25 09:22:22 -07:00
Mark Backman	f16a416c2b	Merge pull request #660 from pipecat-ai/mb/add-gemini-inputs Add input params to Google Gemini	2024-10-24 20:58:19 -04:00
Aleix Conchillo Flaqué	1be63bccb8	Merge pull request #647 from pipecat-ai/aleix/daily-transport-only-transcribe-users transport(daily): only transcribe users	2024-10-24 17:40:34 -07:00
Mark Backman	37820ac0df	Add input params to Google Gemini	2024-10-24 20:12:41 -04:00
Aleix Conchillo Flaqué	8ea80d43f4	transports(daily): only transcribe user audio	2024-10-24 17:06:43 -07:00
Aleix Conchillo Flaqué	e117d70a00	update to daily-python 0.12.0	2024-10-24 16:49:19 -07:00
Aleix Conchillo Flaqué	2ba753272a	Merge pull request #658 from pipecat-ai/aleix/default-to-24000-sample-rate update TTS and transport output sample rate to 24000	2024-10-24 16:48:41 -07:00
Aleix Conchillo Flaqué	60c8c2f6e9	examples(15a): use daily transcription instead of local whisper	2024-10-24 16:47:41 -07:00
Aleix Conchillo Flaqué	cfb48200c2	services(azure): support sample rates	2024-10-24 16:47:35 -07:00
Aleix Conchillo Flaqué	6d317c6e8e	audio: don't resample if same sample rate	2024-10-24 16:47:35 -07:00
Aleix Conchillo Flaqué	158d52856f	transports(livekit): fix VADAnalyzer import	2024-10-24 16:47:35 -07:00
Aleix Conchillo Flaqué	92a69e404f	update TTS and transport output sample rate to 24000	2024-10-24 16:47:35 -07:00
Aleix Conchillo Flaqué	d24c6185d8	Merge pull request #654 from pipecat-ai/aleix/daily-allow-completion-futures transport(daily): allow completion futures	2024-10-24 14:28:53 -07:00
Mark Backman	1fd21578a6	Merge pull request #657 from pipecat-ai/mb/add-elevenlabs-output-format-type Add ElevenLabs output format type	2024-10-24 17:07:04 -04:00
Mark Backman	700db87127	Merge pull request #656 from pipecat-ai/mb/add-gemini-metrics Add Gemini token usage metrics	2024-10-24 17:04:56 -04:00
Mark Backman	6f1310569c	Add ElevenLabs output format type	2024-10-24 17:03:45 -04:00
Aleix Conchillo Flaqué	14cedb0be8	Merge pull request #655 from pipecat-ai/aleix/fix-together-params services(together): fix together AI InputParams	2024-10-24 13:51:38 -07:00
Mark Backman	fae97f9051	Add Gemini token usage metrics	2024-10-24 16:37:21 -04:00
Aleix Conchillo Flaqué	d930a46e64	services(together): fix together AI InputParams	2024-10-24 13:08:35 -07:00
Aleix Conchillo Flaqué	2e6b5d1843	transports(daily): fix aiohttp timeout	2024-10-24 11:44:30 -07:00
Aleix Conchillo Flaqué	88362db034	transports(daily): no more need for an output message queue	2024-10-24 11:44:30 -07:00
Aleix Conchillo Flaqué	f7f0c44c32	transports(daily): don't block event handlers	2024-10-24 11:44:30 -07:00
Mark Backman	33553b71d4	Merge pull request #653 from pipecat-ai/mb/align-tts-constructors Align TTSService constructors	2024-10-24 13:52:43 -04:00
Mark Backman	be8ca505cd	Merge pull request #652 from pipecat-ai/khk/more-gemini Gemini new context manager and rewrite to use google data structures internally	2024-10-24 13:47:38 -04:00
Mark Backman	e957cce422	Align TTSService constructors	2024-10-24 13:42:06 -04:00
Mark Backman	418a13a4ec	Merge pull request #650 from pipecat-ai/mb/assembly-fix AssemblyAI: don't disconnect on language change	2024-10-24 11:26:56 -04:00
Mark Backman	fc445c0a1f	Merge pull request #649 from pipecat-ai/mb/open-ai-max-tokens Add max_tokens and max_completion_tokens inputs for OpenAI	2024-10-24 11:26:44 -04:00
Mark Backman	f0c65468ed	AssemblyAI: don't disconnect on language change	2024-10-24 08:30:48 -04:00
Mark Backman	ce6a2bdcf7	Add max tokens inputs to OpenAI	2024-10-24 07:03:45 -04:00
Mark Backman	673542e235	Merge pull request #646 from pipecat-ai/mb/grok-function-calling Support function calling for Grok	2024-10-23 21:56:38 -04:00
Kwindla Hultman Kramer	e032b0b70a	gemini context aggregators	2024-10-23 18:44:09 -07:00
Mark Backman	e39f7e965b	Support function calling for Grok	2024-10-23 17:22:26 -04:00
Mattie Ruth	d26751e968	add missing PipelineParams to enable the metrics (#645 )	2024-10-23 16:46:46 -04:00
Aleix Conchillo Flaqué	e0ca4a9c23	Merge pull request #643 from pipecat-ai/aleix/daily-update-subscriptions transports(daily): add update_subscriptions()	2024-10-22 17:07:07 -07:00
Aleix Conchillo Flaqué	801e52c095	transports(daily): add update_subscriptions()	2024-10-22 15:02:55 -07:00
Aleix Conchillo Flaqué	a46eaa838b	Merge pull request #641 from pipecat-ai/aleix/prepare-0.0.47 prepare 0.0.47	2024-10-22 10:30:42 -07:00
Aleix Conchillo Flaqué	7c432499db	update CHANGELOG for 0.0.47	2024-10-22 10:02:50 -07:00
Aleix Conchillo Flaqué	8d75fcc9f0	use warnings package to report deprecated code	2024-10-22 10:02:21 -07:00
Aleix Conchillo Flaqué	61d73f81ae	Merge pull request #639 from pipecat-ai/aleix/daily-transcription-model transport(daily): use "nova-2-general" for transcription	2024-10-22 09:40:43 -07:00
Aleix Conchillo Flaqué	951255def9	transport(daily): use "nova-2-general" for transcription	2024-10-22 09:40:03 -07:00
Moof Soup	bf5a7c3562	docs: Clarify README example and token usage clarified readme example	2024-10-21 19:54:34 -07:00
Mark Backman	e556f34094	Merge pull request #638 from pipecat-ai/mb/fix-silero-vad-import Fix Silero VAD import issue	2024-10-21 20:48:06 -04:00
Mark Backman	ccc3691620	Fix Silero VAD import issue	2024-10-21 20:39:20 -04:00
Vanessa Pyne	5321affda7	Merge pull request #588 from Allenmylath/patch-11 Update README.md	2024-10-21 11:20:05 -05:00
Mark Backman	e5ad8dc67b	Merge pull request #627 from pipecat-ai/mb/upgrade-gladia-to-v2-api Update GladiaSTTService to use the Gladia V2 API	2024-10-21 12:01:20 -04:00
Mark Backman	46927805bc	Update GladiaSTTService to use the Gladia V2 API	2024-10-21 07:10:38 -04:00
Aleix Conchillo Flaqué	b6b1ef0a40	Merge pull request #589 from Allenmylath/patch-12 Update Dockerfile	2024-10-20 10:59:43 -07:00
Mark Backman	e62f762382	Merge pull request #625 from pipecat-ai/mb/add-assemblyai-stt Add support for AssemblyAI STT	2024-10-20 13:59:33 -04:00
Aleix Conchillo Flaqué	dbfda14342	Merge pull request #587 from Allenmylath/patch-9 Update env.example	2024-10-20 10:58:50 -07:00
Aleix Conchillo Flaqué	fee85418cd	Merge pull request #620 from gregschwartz/main Start agent/call/bot at localhost root	2024-10-20 10:14:10 -07:00
Mark Backman	015faa3dbd	Update CHANGELOG and README	2024-10-20 08:57:57 -04:00
Mark Backman	1dbf4ff27d	Add AssemblyAI STT service	2024-10-20 08:57:57 -04:00
Aleix Conchillo Flaqué	4f1b2dce9b	Merge pull request #624 from pvilchez/fix_enable_usage_metrics Fixing `enable_usage_metrics` setting.	2024-10-20 01:00:12 -07:00
Paul Vilchez	5640bd9447	Fixing a config mismatch which caused usage stats to only report when `enable_metrics` was true.	2024-10-20 03:33:13 -04:00
Greg Schwartz	1fa52b62aa	Put start agent/call at localhost root. Before you had to read in the docs to go to /start, or /start_call or /start_bot. Which isn't mentioned in the console output, and is inconsistent, adding friction to learning the codebase	2024-10-19 16:18:43 -07:00
Kwindla Hultman Kramer	07712cdb16	gemini function calling and partial implementation of standard context stuff	2024-10-18 17:14:57 -07:00
allenmylath	ec98a13a08	Update Dockerfile utils and assets not used in this example hence removed	2024-10-15 08:18:16 +05:30
allenmylath	b999b76f70	Update README.md readme description still shows simple-chatbot definition hence made more accurate description	2024-10-15 08:14:43 +05:30
allenmylath	b64dbe7bb4	Update env.example canonical api url is also used from env.	2024-10-15 08:10:07 +05:30
				`@@ -0,0 +1 @@`
				`#### Please describe the changes in your PR. If it is addressing an issue, please reference that as well.`