added fuzz example

wip
wip: telestrator
2024-03-22 14:20:16 +00:00 · 2024-03-19 22:04:47 +00:00 · 2024-03-19 15:31:19 +00:00 · 2024-03-19 03:08:04 +00:00 · 2024-03-19 01:51:36 +00:00 · 2024-03-18 22:14:02 +00:00
146 changed files with 7169 additions and 2856 deletions
--- a/.github/workflows/lint.yaml
+++ b/.github/workflows/lint.yaml
@@ -0,0 +1,32 @@
+name: lint
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - "**"
+    paths-ignore:
+      - "docs/**"
+
+concurrency:
+  group: build-lint-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  autopep8:
+    name: "Formatting lints"
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v4
+      - name: autopep8
+        id: autopep8
+        uses: peter-evans/autopep8@v2
+        with:
+          args: --exit-code -r -d -a -a src/
+      - name: Fail if autopep8 requires changes
+        if: steps.autopep8.outputs.exit-code == 2
+        run: exit 1
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,7 @@
 env/
 __pycache__/
 *~
+venv
 #*#

 # Distribution / packaging
--- a/24
+++ b/24
@@ -0,0 +1,24 @@
+BSD 2-Clause License
+
+Copyright (c) 2024, Daily
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- a/README.md
+++ b/README.md
@@ -1,8 +1,78 @@
-# dailyai SDK
+# dailyai — an open source framework for real-time, multi-modal, conversational AI applications

-This SDK can help you build applications that participate in WebRTC meetings and use various AI services to interact with other participants.
+Build things like this:

-## Build/Install
+[![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)
+
+
+
+
+**`dailyai` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.
+
+
+In 2023 a *lot* of us got excited about the possibility of having open-ended conversations with LLMs. It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/):
+- low-latency, reliable audio transport
+- echo cancellation
+- phrase endpointing (knowing when the bot should respond to human speech)
+- interruptibility
+- writing clean code to stream data through "pipelines" of speech-to-text, LLM inference, and text-to-speech models
+
+As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like.
+
+Today, `dailyai` is:
+
+1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services
+2. transport services that move audio, video, and events across the Internet
+3. implementations of specific generative AI services
+
+Currently implemented services:
+- Speech-to-text
+  - Deepgram
+  - Whisper
+- LLMs
+  - Azure
+  - OpenAI
+- Image generation
+  - Azure
+  - Fal
+  - OpenAI
+- Text-to-speech
+  - Azure
+  - Deepgram
+  - ElevenLabs
+- Transport
+  - Daily
+  - Local (in progress, intended as a quick start example service)
+
+If you'd like to [implement a service](https://github.com/daily-co/daily-ai-sdk/tree/main/src/dailyai/services), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.
+
+## Step 1: Get started
+
+Today, the easiest way to get started with `dailyai` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations. (The [transport base class](https://github.com/daily-co/daily-ai-sdk/blob/main/src/dailyai/services/base_transport_service.py) is easy to extend, though, so feel free to submit PRs if you'd like to implement another transport service.)
+
+```
+# install the module
+pip install dailyai
+
+# set up an .env file with API keys
+cp dot-env.template .env
+
+# sign up for a free Daily account, if you don't already have one, and
+# join the Daily room URL directly from a browser tab, then run one of the
+# samples
+python src/examples/foundational/02-llm-say-one-thing.py
+```
+
+## Code examples
+
+There are two directories of examples:
+
+- [foundational](https://github.com/daily-co/daily-ai-sdk/tree/main/src/examples/foundational) — demos that build on each other, introducing one or two concepts at a time
+- [starter apps](https://github.com/daily-co/daily-ai-sdk/tree/main/src/examples/starter-apps) — complete applications that you can use as starting points for development
+
+
+
+## Hacking on the framework itself

 _Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_

@@ -29,27 +99,3 @@ If you want to use this package from another directory, you can run:
 ```
 pip install path_to_this_repo
 ```
-
-## Running the samples
-
-Tou can run the simple sample like so:
-
-```
-python src/samples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
-```
-
-Note that the sample uses Azure's TTS and LLM services. You'll need to set the following environment variables for the sample to work:
-
-```
-AZURE_SPEECH_SERVICE_KEY
-AZURE_SPEECH_SERVICE_REGION
-AZURE_CHATGPT_KEY
-AZURE_CHATGPT_ENDPOINT
-AZURE_CHATGPT_DEPLOYMENT_ID
-```
-
-If you have those environment variables stored in an .env file, you can quickly load them into your terminal's environment by running this:
-
-```bash
-export $(grep -v '^#' .env | xargs)
-```
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,13 @@
+# Daily AI SDK Docs
+
+## [Architecture Overview](architecture.md)
+
+Learn about the thinking behind the SDK's design.
+
+## [Example Code](examples/)
+
+The repo includes several example apps in the `src/examples` directory. The docs explain how they work.
+
+## [API Reference](api/)
+
+Complete documentation of the available classes and methods in the SDK.
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,17 @@
+# Daily AI SDK Architecture Guide
+
+## Frames
+
+Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used as control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion.
+
+## FrameProcessors
+
+Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input for an AI Service, and emit chat completions based on message arrays or transform text into audio or images.
+
+## Pipelines
+
+Pipelines are lists of frame processors that read from a source queue and send the processed frames to a sink queue. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport's send queue as its sink. Placing LLM message frames on the pipeline's source queue will cause the LLM's response to be spoken. See example #2 for an implementation of this.
+
+## Transports
+
+Transports provide a receive queue, which is input from "the outside world", and a sink queue, which is data that will be sent "to the outside world". The `LocalTransportService` does this with the local camera, mic, display and speaker. The `DailyTransportService` does this with a WebRTC session joined to a Daily.co room.
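
Review note: the frame processor and pipeline contract described in docs/architecture.md above (one frame in, zero or more frames out, source queue to sink queue) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: `TextFrame`, `EndFrame`, `SentenceAggregator`, `Uppercase`, and `run_pipeline` are hypothetical names invented for this sketch, and only the `process_frame` method name comes from the guide.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class TextFrame:
    """Hypothetical frame carrying a chunk of text."""
    text: str


@dataclass
class EndFrame:
    """Hypothetical control frame: no more data is coming."""


class SentenceAggregator:
    """Joins text fragments into sentences (one frame in, zero or more out)."""

    def __init__(self) -> None:
        self._buffer = ""

    async def process_frame(self, frame) -> AsyncGenerator[object, None]:
        if isinstance(frame, TextFrame):
            self._buffer += frame.text
            # Emit only once the buffered text looks like a full sentence.
            if self._buffer.rstrip().endswith((".", "!", "?")):
                sentence, self._buffer = self._buffer.strip(), ""
                yield TextFrame(sentence)
        else:
            # Flush leftover text, then pass the control frame through.
            if self._buffer.strip():
                yield TextFrame(self._buffer.strip())
                self._buffer = ""
            yield frame


class Uppercase:
    """Stand-in for a service: transforms each text frame it receives."""

    async def process_frame(self, frame) -> AsyncGenerator[object, None]:
        if isinstance(frame, TextFrame):
            yield TextFrame(frame.text.upper())
        else:
            yield frame


async def run_pipeline(processors, source: asyncio.Queue, sink: asyncio.Queue) -> None:
    """Read frames from source, pass each through every processor, write to sink."""
    while True:
        frame = await source.get()
        frames = [frame]
        for processor in processors:
            next_frames = []
            for current in frames:
                async for out in processor.process_frame(current):
                    next_frames.append(out)
            frames = next_frames
        for out in frames:
            await sink.put(out)
        if isinstance(frame, EndFrame):
            break


async def demo() -> List[object]:
    source, sink = asyncio.Queue(), asyncio.Queue()
    for piece in ["Hello ", "there. ", "How are", " you?"]:
        await source.put(TextFrame(piece))
    await source.put(EndFrame())
    await run_pipeline([SentenceAggregator(), Uppercase()], source, sink)
    results = []
    while not sink.empty():
        results.append(sink.get_nowait())
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the guide's terms, the sink here would be a transport's send queue, and the `Uppercase` stand-in would be replaced by LLM and text-to-speech processors.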
--- a/docs/examples/01-say-one-thing.md
+++ b/docs/examples/01-say-one-thing.md
@@ -0,0 +1,119 @@
+# 01: Say One Thing
+
+_video here - youtube?_
+
+This example uses a text-to-speech (TTS) service to say one predefined sentence. But first, a quick overview of the general structure of these examples.
+
+## Running the demos
+
+All of the demos have something like this at the bottom of the file:
+
+```python
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
+```
+
+### `configure()`
+
+The `configure()` function comes from `src/examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:
+
+```bash
+python 01-say-one-thing.py -u https://YOUR_DOMAIN.daily.co/YOUR_ROOM -k YOUR_API_KEY
+# or
+DAILY_ROOM_URL=https://YOUR_DOMAIN.daily.co/YOUR_ROOM DAILY_API_KEY=YOUR_API_KEY python 01-say-one-thing.py
+# or set DAILY_ROOM_URL and DAILY_API_KEY in a .env file
+python 01-say-one-thing.py
+```
+
+You'll need a Daily account to run these demos. You can sign up for free at [daily.co](https://daily.co). Once you've signed up you can create a room from the [Dashboard](https://dashboard.daily.co/rooms), and grab [your API key](https://dashboard.daily.co/developers) while you're there.
+
+Some functionality (such as transcription) requires the bot to have owner privileges in the room. `runner.py` uses the Daily REST API to create a meeting token with owner privileges. You can learn more about meeting tokens in the [Daily docs](https://docs.daily.co/reference/rest-api/meeting-tokens).
+
+### `asyncio.run()`
+
+The AI SDK makes heavy use of Python's `asyncio` module. [This is a reasonable intro to the topic](https://builtin.com/data-science/asyncio) if you haven't worked with `asyncio` and coroutines before.
+
+You can learn a bit more about the specifics of how the Daily AI SDK uses coroutines in the [Architecture Guide](../architecture.md).
+
+## The `main()` function
+
+All of the examples have a `main()` function with a similar structure:
+
+- Configure the transport
+- Configure the AI service(s) used in the demo
+- Configure any event listeners
+- Define a processing pipeline
+- Run the example's coroutine(s)
+
+### Configuring the transport
+
+The first section of the `main()` function configures the transport object:
+
+```python
+meeting_duration_minutes = 5
+transport = DailyTransportService(
+    room_url,
+    None,
+    "Say One Thing",
+    meeting_duration_minutes,
+)
+transport.mic_enabled = True
+```
+
+The [Architecture Guide](../architecture.md) explains the transport object in more detail. In this case, we're configuring a Daily transport object and enabling the virtual microphone, so our bot can play audio.
+
+### Configuring the services
+
+As described in the [Architecture Guide](../architecture.md), a 'Service' is a class that processes 'Frames' as part of a 'Pipeline'. In this demo app, we'll only need one service: a text-to-speech generator. We can create an instance of the `ElevenLabsTTSService` class with this line of code:
+
+```python
+tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
+```
+
+You'll need to make sure those environment variables are set somewhere. The easiest way to do that is to copy the `dot-env.template` file in the repo to `.env`, and then add your credentials to that file. `runner.py` loads the `python-dotenv` module and initializes it, making the values in that file available in the environment.
+
+### Configuring event listeners
+
+This part isn't strictly necessary for an app like this. You could put the body of the `on_participant_joined` handler directly in the `main()` function, and it would run as soon as you started the script from the command line.
+
+Instead, we can use an event handler to wait to run that code until someone else joins the meeting. We'll define a function called `greet_user()`, and use the `@transport.event_handler("on_participant_joined")` decorator to tell the SDK that we want to run that function whenever a user joins the room.
+
+```python
+@transport.event_handler("on_participant_joined")
+async def greet_user(transport, participant):
+    if participant["info"]["isLocal"]:
+        return
+
+    await tts.say(
+        "Hello there, " + participant["info"]["userName"] + "!",
+        transport.send_queue,
+    )
+
+    # wait for the output queue to be empty, then leave the meeting
+    await transport.stop_when_done()
+```
+
+### Defining a processing pipeline
+
+In this example, we don't actually have much of a processing pipeline! In fact, we're doing the whole thing inside the `greet_user()` function already.
+
+Pipelines usually look like a bunch of nested calls to the `run()` or `run_to_queue()` function from different Services. In this example, we're using the `say()` function from the TTS service. This is effectively a convenience wrapper around the `run_to_queue()` function, which we'll discuss more later. It's important to `await` this function so that the speech frames are queued for playback before the next line runs, because `stop_when_done()` is called immediately afterward.
+
+The output of the `say()` function goes to the transport's `send_queue`. This queue is the all-important connection between the world of the Services pipeline that's generating frames asynchronously and the ordered playback of audio and visual media in the WebRTC call.
+
+### Running the coroutines
+
+In this example, we don't actually have any separate processing pipelines—everything happens as a result of an event from the transport. So we only need to run the transport's coroutine, and await its completion:
+
+```python
+await transport.run()
+```
+
+In future examples, we'll run more processes in parallel. For now, this script can run until the transport exits—which will happen based on calling `stop_when_done()` in the `greet_user()` function.
+
+## Next Steps
+
+Next, we'll start connecting multiple AI services together by building a service pipeline.
+
+## [02 - LLM Say One Thing »](02-llm-say-one-thing.md)
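
Review note: the decorator-based event handling described in docs/examples/01-say-one-thing.md above can be illustrated with a small sketch. `SketchTransport` and its `emit()` method are hypothetical and exist only for this illustration; the real `DailyTransportService` may register and dispatch handlers differently. The sketch only shows the general pattern: a decorator records a coroutine under an event name, and the transport awaits it when that event fires.

```python
import asyncio
from collections import defaultdict


class SketchTransport:
    """Hypothetical stand-in for a transport; not the SDK's class."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)
        self.send_queue: asyncio.Queue = asyncio.Queue()

    def event_handler(self, event_name: str):
        """Return a decorator that registers a coroutine for event_name."""
        def decorator(func):
            self._handlers[event_name].append(func)
            return func
        return decorator

    async def emit(self, event_name: str, *args) -> None:
        """Await every handler registered for event_name, in order."""
        for handler in self._handlers[event_name]:
            await handler(self, *args)


async def demo() -> list:
    transport = SketchTransport()

    @transport.event_handler("on_participant_joined")
    async def greet_user(transport, participant):
        # Skip the bot's own join event, as in the example.
        if participant["info"]["isLocal"]:
            return
        greeting = "Hello there, " + participant["info"]["userName"] + "!"
        await transport.send_queue.put(greeting)

    await transport.emit(
        "on_participant_joined", {"info": {"isLocal": True, "userName": "bot"}}
    )
    await transport.emit(
        "on_participant_joined", {"info": {"isLocal": False, "userName": "Ada"}}
    )

    queued = []
    while not transport.send_queue.empty():
        queued.append(transport.send_queue.get_nowait())
    return queued


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Here the greeting string is queued directly; in the example, `tts.say()` queues audio frames on `transport.send_queue` instead.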
--- a/docs/examples/README.md
+++ b/docs/examples/README.md
@@ -0,0 +1,5 @@
+# Daily AI SDK Examples
+
+The docs in this folder pair with the example apps located in `src/examples/foundational`. They are designed to serve as quick references for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.
+
+To start, you can learn about the overall structure of the examples in [01 - Say One Thing](01-say-one-thing.md).
--- a/dot-env.template
+++ b/dot-env.template
@@ -0,0 +1,5 @@
+OPENAI_API_KEY=...
+ELEVENLABS_API_KEY=...
+ELEVENLABS_VOICE_ID=...
+DAILY_API_KEY=...
+DAILY_SAMPLE_ROOM_URL=https://...
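
Review note: a quick way to catch an unfilled `.env` before running an example is to check the variables named in dot-env.template above. The helper below is a hypothetical sketch, not part of the SDK; it assumes every variable in the template is required, which may not hold for every example.

```python
import os
from typing import List, Mapping

# Names taken from dot-env.template in this change.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
    "DAILY_API_KEY",
    "DAILY_SAMPLE_ROOM_URL",
]


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required names that are unset, empty, or still placeholders."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # The template uses "..." (or "https://...") as a placeholder value.
        if not value or value.endswith("..."):
            missing.append(name)
    return missing


if __name__ == "__main__":
    unset = missing_vars()
    if unset:
        print("Missing or placeholder values: " + ", ".join(unset))
```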
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -3,21 +3,44 @@ requires = ["setuptools"]
 build-backend = "setuptools.build_meta"

 [project]
-name = "daily_ai"
-version = "0.0.1"
-description = "Orchestrator for AI bots with Daily"
-dependencies = [
-    "daily-python",
-    "Pillow",
-    "typing-extensions",
-    "openai",
-    "google-cloud-texttospeech",
-    "azure-cognitiveservices-speech",
-    "pyht",
-    "opentelemetry-sdk",
-    "aiohttp",
-    "fal"
+name = "dailyai"
+version = "0.0.3.1"
+description = "An open source framework for real-time, multi-modal, conversational AI applications"
+license = { text = "BSD 2-Clause License" }
+readme = "README.md"
+requires-python = ">=3.7"
+keywords = ["webrtc", "audio", "video", "ai"]
+classifiers = [
+    "Development Status :: 5 - Production/Stable",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: BSD License",
+    "Topic :: Communications :: Conferencing",
+    "Topic :: Multimedia :: Sound/Audio",
+    "Topic :: Multimedia :: Video",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence"
 ]
+dependencies = [
+    "aiohttp",
+    "anthropic",
+    "azure-cognitiveservices-speech",
+    "daily-python",
+    "fal",
+    "faster_whisper",
+    "google-cloud-texttospeech",
+    "numpy",
+    "openai",
+    "Pillow",
+    "pyht",
+    "python-dotenv",
+    "torch",
+    "torchaudio",
+    "pyaudio",
+    "typing-extensions"
+]
+
+[project.urls]
+Source = "https://github.com/daily-co/daily-ai-sdk"
+Website = "https://daily.co"

 [tool.setuptools.packages.find]
 # All the following settings are optional:
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,4 @@
+autopep8==2.0.4
 build==1.0.3
 packaging==23.2
 pyproject_hooks==1.0.0
--- a/src/dailyai/async_processor/__init__.py
+++ b/src/dailyai/async_processor/__init__.py
--- a/src/dailyai/async_processor/async_processor.py
+++ b/src/dailyai/async_processor/async_processor.py
@@ -1,347 +0,0 @@
-import json
-import logging
-import re
-
-from collections import defaultdict
-from dataclasses import dataclass, field
-from enum import Enum
-from queue import Queue, PriorityQueue, Empty
-from threading import Event, Semaphore, Thread
-from typing import Any, Generator, Iterator, Optional, Type
-
-from dailyai.queue_frame import QueueFrame, FrameType
-from dailyai.message_handler.message_handler import MessageHandler
-from dailyai.services.ai_services import AIServiceConfig
-
-class AsyncProcessorState:
-    # Setting class variables, other synchronous activities
-    INIT = 0
-
-    # Making asynchronous requests to LLM and other services to render response
-    PREPARING = 1
-
-    # Ready to start presenting to user (but may not have all data yet)
-    READY = 2
-
-    # Playing response
-    PLAYING = 3
-
-    # An interrupt has been requested and the response is shutting down in-flight processing
-    INTERRUPTING = 4
-
-    # An interrupt has been requested and the response is finished stopping in-flight processing
-    INTERRUPTED = 5
-
-    # Response has been played or interrupted
-    DONE = 6
-
-    # Response is being finalized (updating records of speech, updating LLM context, etc.)
-    FINALIZING = 7
-
-    # Response is complete. This could mean that everything is updated, or that the response
-    # was interrupted.
-    FINALIZED = 8
-
-    state_transitions = {
-        INIT: [PREPARING, INTERRUPTING],
-        PREPARING: [READY, INTERRUPTING],
-        READY: [PLAYING, INTERRUPTING],
-        PLAYING: [DONE, INTERRUPTING],
-        INTERRUPTING: [INTERRUPTED],
-        INTERRUPTED: [DONE],
-        DONE: [FINALIZING],
-        FINALIZING: [FINALIZED],
-        FINALIZED: [FINALIZED],
-    }
-
-
-@dataclass(order=True)
-class StateTransitionItem:
-    state: int
-    evt: Event = field(compare=False)
-
-class AsyncProcessor:
-    def __init__(
-        self,
-        services: AIServiceConfig
-    ) -> None:
-        self.state = AsyncProcessorState.INIT
-        self.prepare_thread = None
-        self.play_thread = None
-        self.finalize_thread = None
-
-        self.services: AIServiceConfig = services
-
-        self.state_transition_semaphore = Semaphore()
-        self.waiting_for_state_changes = PriorityQueue()
-        self.state_queue = Queue()
-
-        self.state_change_callbacks = defaultdict(list)
-
-        self.was_interrupted = False
-
-        self.logger: logging.Logger = logging.getLogger("dailyai")
-
-    def set_state(self, state: int) -> None:
-        if state in AsyncProcessorState.state_transitions[self.state]:
-            self.state_transition_semaphore.acquire()
-
-            self.state: int = state
-            self.state_transition_semaphore.release()
-
-            # wake up any threads waiting for this state transition
-            try:
-                while True:
-                    waiter = self.waiting_for_state_changes.get_nowait()
-                    if waiter.state <= state:
-                        waiter.evt.set()
-                    else:
-                        self.waiting_for_state_changes.put(waiter)
-                        break
-            except Empty:
-                pass
-
-            # make all the callbacks for this state
-            for callback in self.state_change_callbacks[state]:
-                callback(self)
-        else:
-            self.logger.error(
-                f"Invalid state transition from {self.state} to {state} in {self.__class__.__name__}"
-            )
-            raise Exception(f"Invalid state transition from {self.state} to {state}")
-
-    #
-    # This is used for state transitions that could be blocked by an interruption.
-    # If we are interrupted, we silently fail this call. Use only if you know that
-    # this state transition should fail if the processor has been interrupted.
-    #
-
-    def maybe_set_state(self, state: int) -> bool:
-        if state in AsyncProcessorState.state_transitions[self.state]:
-            self.set_state(state)
-            return True
-        else:
-            return False
-
-    def wait_for_state_transition(self, state: int) -> None:
-        if self.state >= state:
-            return
-
-        self.state_transition_semaphore.acquire()
-
-        evt = Event()
-        self.waiting_for_state_changes.put(StateTransitionItem(state, evt))
-        self.state_transition_semaphore.release()
-        result = evt.wait(120.0)
-        if not result:
-            self.logger.error(
-                f"Timed out waiting for state transition to {state} from {self.state}"
-            )
-
-    def set_state_callback(self, state: int, callback: callable) -> None:
-        self.state_change_callbacks[state].append(callback)
-
-    def prepare(self) -> None:
-        self.prepare_thread = Thread(target=self.async_prepare, daemon=True)
-        self.prepare_thread.start()
-        self.wait_for_state_transition(AsyncProcessorState.READY)
-
-    def play(self) -> None:
-        self.wait_for_state_transition(AsyncProcessorState.READY)
-        self.play_thread = Thread(target=self.async_play, daemon=True)
-        self.play_thread.start()
-        self.wait_for_state_transition(AsyncProcessorState.PLAYING)
-
-    def finalize(self) -> None:
-        # don't finalize until we're done playing.
-        self.wait_for_state_transition(AsyncProcessorState.DONE)
-        self.set_state(AsyncProcessorState.FINALIZING)
-        self.do_finalization()
-        self.set_state(AsyncProcessorState.FINALIZED)
-
-    def interrupt(self) -> None:
-        # nothing to interrupt if we're already finalizing or finalized, no-op
-        if self.state in [
-            AsyncProcessorState.FINALIZING,
-            AsyncProcessorState.FINALIZED,
-        ]:
-            return
-
-        self.set_state(AsyncProcessorState.INTERRUPTING)
-        self.was_interrupted = True
-        self.do_interruption()
-        self.set_state(AsyncProcessorState.INTERRUPTED)
-        self.set_state(AsyncProcessorState.DONE)
-
-    def async_play(self) -> None:
-        self.logger.info(f"Starting to play")
-        if self.maybe_set_state(AsyncProcessorState.PLAYING):
-            self.do_play()
-        self.maybe_set_state(AsyncProcessorState.DONE)
-
-    def async_prepare(self) -> None:
-        self.set_state(AsyncProcessorState.PREPARING)
-        self.start_preparation()
-        self.set_state(AsyncProcessorState.READY)
-        self.continue_preparation()
-        self.logger.info(f"Preparation done for {self.__class__.__name__}")
-        self.preparation_done()
-
-    def start_preparation(self) -> None:
-        pass
-
-    def continue_preparation(self) -> None:
-        pass
-
-    def preparation_done(self):
-        pass
-
-    def get_preparation_iterator(self) -> Iterator:
-        yield None
-
-    def process_chunk(self, chunk) -> None:
-        pass
-
-    def do_interruption(self) -> None:
-        pass
-
-    def do_play(self) -> None:
-        pass
-
-    def do_finalization(self) -> None:
-        pass
-
-# A common class for responses that use a message queue and
-# an output queue.
-
-class OrchestratorResponse(AsyncProcessor):
-
-    def __init__(
-        self,
-        services,
-        message_handler,
-        output_queue,
-    ) -> None:
-        super().__init__(services)
-
-        self.message_handler: MessageHandler = message_handler
-        self.output_queue: Queue = output_queue
-
-
-class LLMResponse(OrchestratorResponse):
-    def __init__(
-        self,
-        services,
-        message_handler,
-        output_queue,
-    ) -> None:
-        super().__init__(services, message_handler, output_queue)
-
-        self.has_sent_first_frame = False
-
-        self.chunks_in_preparation = Queue()
-
-        self.llm_responses: list[str] = []
-
-    def get_preparation_iterator(self) -> Iterator:
-        messages_for_llm = self.message_handler.get_llm_messages()
-        self.logger.debug(f"Messages for llm: {json.dumps(messages_for_llm, indent=2)}")
-        return self.clauses_from_chunks(
-            self.services.llm.run_llm_async(messages_for_llm)
-        )
-
-    def clauses_from_chunks(self, chunks) -> Iterator:
-        out = ""
-        for chunk in chunks:
-            if self.state not in [
-                AsyncProcessorState.READY,
-                AsyncProcessorState.PLAYING,
-            ]:
-                break
-
-            out += chunk
-
-            if re.match(r"^.*[.!?]$", out):  # it looks like a sentence
-                yield out.strip()
-                out = ""
-
-        if out.strip():
-            yield out.strip()
-
-    def get_frames_from_tts_response(self, audio_frame) -> list[QueueFrame]:
-        return [QueueFrame(FrameType.AUDIO, audio_frame)]
-
-    def get_frames_from_chunk(self, chunk) -> Generator[list[QueueFrame], Any, None]:
-        for audio_frame in self.services.tts.run_tts(chunk):
-            yield self.get_frames_from_tts_response(audio_frame)
-
-    def start_preparation(self) -> None:
-        self.preparation_iterator = self.get_preparation_iterator()
-
-    def continue_preparation(self) -> None:
-        for chunk in self.preparation_iterator:
-            if self.state not in [
-                AsyncProcessorState.READY,
-                AsyncProcessorState.PLAYING,
-            ]:
-                break
-
-            self.process_chunk(chunk)
-
-    def process_chunk(self, chunk) -> None:
-        self.chunks_in_preparation.put((chunk, self.get_frames_from_chunk(chunk)))
-
-    def preparation_done(self):
-        self.chunks_in_preparation.put((None, None))
-
-    def do_play(self) -> None:
-        while True:
-            if self.state not in [
-                AsyncProcessorState.READY,
-                AsyncProcessorState.PLAYING,
-            ]:
-                break
-            prepared_chunk = self.chunks_in_preparation.get()
-            if prepared_chunk[0] == None:
-                return
-
-            self.play_prepared_chunk(prepared_chunk)
-
-    def play_prepared_chunk(self, prepared_chunk) -> None:
-        chunk, tts_generator = prepared_chunk
-        for frames in tts_generator:
-            if self.state not in [
-                AsyncProcessorState.READY,
-                AsyncProcessorState.PLAYING,
-            ]:
-                break
-
-            if not self.has_sent_first_frame:
-                self.output_queue.put(QueueFrame(FrameType.START_STREAM, None))
-                self.has_sent_first_frame = True
-
-            for frame in frames:
-                self.output_queue.put(frame)
-
-        self.output_queue.join()
-        self.llm_responses.append(chunk)
-
-    def do_finalization(self) -> None:
-        self.message_handler.add_assistant_messages(self.llm_responses)
-
-    def do_interruption(self) -> None:
-        self.chunks_in_preparation.put((None, None))
-
-        if self.prepare_thread and self.prepare_thread.is_alive():
-            self.prepare_thread.join()
-
-        if self.play_thread and self.play_thread.is_alive():
-            self.play_thread.join()
-
-
-@dataclass(frozen=True)
-class ConversationProcessorCollection:
-    introduction: Optional[Type[OrchestratorResponse]] = None
-    waiting: Optional[Type[OrchestratorResponse]] = None
-    response: Optional[Type[OrchestratorResponse]] = None
-    goodbye: Optional[Type[OrchestratorResponse]] = None
--- a/src/dailyai/message_handler/__init__.py
+++ b/src/dailyai/message_handler/__init__.py
--- a/src/dailyai/message_handler/message_handler.py
+++ b/src/dailyai/message_handler/message_handler.py
@@ -1,127 +0,0 @@
-import logging
-import time
-
-from dataclasses import dataclass
-from queue import Queue, Empty
-from threading import Thread
-
-from dailyai.storage.search import SearchIndexer
-from dailyai.services.ai_services import AIServiceConfig
-
-
-@dataclass
-class Message:
-    type: str
-    timestamp: float
-    message: str
-
-
-class MessageHandler:
-    def __init__(self, intro):
-        self.messages: list[Message] = [Message("system", time.time(), intro)]
-        self.last_user_message_idx:int | None = None
-        self.finalized_user_message_idx: int | None = None
-
-    def add_user_message(self, message) -> None:
-        if self.last_user_message_idx is not None and self.last_user_message_idx != self.finalized_user_message_idx:
-            previous_message: str = self.messages[self.last_user_message_idx].message
-            self.messages[self.last_user_message_idx] = Message(
-                "user", time.time(), ' '.join([previous_message, message])
-            )
-            self.messages = self.messages[: self.last_user_message_idx + 1]
-        else:
-            self.messages.append(Message("user", time.time(), message))
-
-        self.last_user_message_idx = len(self.messages) - 1
-
-    def add_assistant_message(self, message) -> None:
-        if self.messages[-1].type == "assistant":
-            self.messages[-1].message += " " + message
-        else:
-            self.messages.append(Message("assistant", time.time(), message))
-
-    def add_assistant_messages(self, messages) -> None:
-        self.messages.append(Message("assistant", time.time(), " ".join(messages)))
-
-    def get_llm_messages(self) -> list[dict[str, str]]:
-        return [{"role": m.type, "content": m.message} for m in self.messages]
-
-    def finalize_user_message(self) -> None:
-        self.finalized_user_message_idx = self.last_user_message_idx
-
-    def shutdown(self) -> None:
-        pass
-
-class IndexingMessageHandler(MessageHandler):
-    def __init__(
-        self, intro, services: AIServiceConfig, indexer: SearchIndexer
-    ) -> None:
-        super().__init__(intro)
-        self.services = services
-
-        self.search_indexer = indexer
-
-        self.last_written_idx = 0
-        self.storage_message_queue = Queue()
-
-        self.index_writer_thread = Thread(target=self.storage_writer, daemon=True)
-        self.index_writer_thread.start()
-
-        self.logger = logging.getLogger("dailyai")
-
-    def shutdown(self):
-        self.finalize_user_message()
-        self.storage_message_queue.put(None)
-        self.index_writer_thread.join()
-
-    def storage_writer(self) -> None:
-        while True:
-            try:
-                message_idx = self.storage_message_queue.get()
-                self.storage_message_queue.task_done()
-
-                if message_idx is None:
-                    return
-
-                if message_idx <= self.last_written_idx:
-                    continue
-
-                self.last_written_idx = message_idx
-
-                message = self.messages[message_idx]
-                content = message.message
-                if message.type == "user":
-                    content = self.cleanup_user_message(content)
-
-                    # sometimes the LLM returns a string wrapped in quotes and sometimes it doesn't.
-                    # if it didn't, wrap it in quotes
-                    if content[0] != '"':
-                        content = '"' + content + '"'
-
-                self.search_indexer.index_text(content)
-            except Empty:
-                pass
-
-    def cleanup_user_message(self, user_message) -> str:
-        return user_message
-
-    def finalize_user_message(self):
-        super().finalize_user_message()
-        self.write_messages_to_storage()
-
-    def write_messages_to_storage(self):
-        if self.finalized_user_message_idx is None:
-            return
-
-        for idx in range(self.last_written_idx, len(self.messages)):
-            self.logger.info(
-                f"Writing to storage: {self.messages[idx].type} {self.messages[idx].message}"
-            )
-            if (
-                self.messages[idx].type == "user"
-                and idx > self.finalized_user_message_idx
-            ):
-                break
-
-            if self.messages[idx].type != "system":
-                self.storage_message_queue.put(idx)
--- a/src/dailyai/orchestrator.py
+++ b/src/dailyai/orchestrator.py
@@ -1,409 +0,0 @@
-import logging
-import os
-import time
-import wave
-
-from dataclasses import dataclass
-from enum import Enum
-from queue import Queue, Empty
-from opentelemetry import trace, context
-
-from dailyai.async_processor.async_processor import (
-    AsyncProcessor,
-    AsyncProcessorState,
-    ConversationProcessorCollection,
-    OrchestratorResponse,
-    LLMResponse,
-)
-from dailyai.queue_frame import QueueFrame, FrameType
-from dailyai.services.ai_services import AIServiceConfig
-from dailyai.message_handler.message_handler import MessageHandler
-
-from threading import Thread, Semaphore, Event, Timer
-
-from opentelemetry import context
-from opentelemetry.context.context import Context
-
-from daily import (
-    EventHandler,
-    CallClient,
-    Daily,
-    VirtualCameraDevice,
-    VirtualMicrophoneDevice,
-    VirtualSpeakerDevice,
-)
-
-
-@dataclass
-class OrchestratorConfig:
-    room_url: str
-    token: str
-    bot_name: str
-    expiration: float
-
-# Note that we use this as a default parameter value in the Orchestrator
-# constructor. The dataclass is defined with Frozen=True, so this should
-# be safe.
-default_conversation_collection = ConversationProcessorCollection(
-    introduction=LLMResponse,
-    waiting=None,
-    response=LLMResponse,
-    goodbye=None,
-)
-
-
-class Orchestrator(EventHandler):
-
-    def __init__(
-        self,
-        daily_config: OrchestratorConfig,
-        ai_service_config: AIServiceConfig,
-        message_handler: MessageHandler,
-        conversation_processors: ConversationProcessorCollection = default_conversation_collection,
-        tracer=None,
-    ):
-        self.bot_name: str = daily_config.bot_name
-        self.room_url: str = daily_config.room_url
-        self.token: str = daily_config.token
-        self.expiration: float = daily_config.expiration
-
-        self.logger: logging.Logger = logging.getLogger("dailyai")
-        self.tracer = tracer or trace.get_tracer("orchestrator")
-
-        self.ctx: Context = context.get_current()
-
-        self.transcription = ""
-        self.last_fragment_at = None
-        self.talked_at = None
-        self.paused_at = None
-
-        self.logger.info(f"Creating Response for introductions")
-        self.services: AIServiceConfig = ai_service_config
-        self.output_queue = Queue()
-        self.is_interrupted = Event()
-        self.stop_threads = Event()
-        self.story_started = False
-
-        self.message_handler = message_handler
-        self.conversation_processors: ConversationProcessorCollection = conversation_processors
-
-        if conversation_processors.introduction is not None:
-            intro = conversation_processors.introduction(
-                services=self.services, message_handler=self.message_handler, output_queue=self.output_queue
-            )
-            intro.prepare()
-            intro.set_state_callback(AsyncProcessorState.DONE, self.on_intro_played)
-            intro.set_state_callback(AsyncProcessorState.FINALIZED, self.on_intro_finished)
-            self.logger.info(f"Introduction is preparing")
-
-            self.current_response: AsyncProcessor = intro
-        self.can_interrupt = False
-        # self.response_event.set()
-        self.response_semaphore = Semaphore()
-
-        self.speech_timeout = None
-        self.interrupt_time = None
-
-        self.logger.info("Configuring daily")
-        self.configure_daily()
-
-    def configure_daily(self):
-        Daily.init()
-        self.client = CallClient(event_handler=self)
-
-        self.logger.info(f"Mic sample rate: {self.services.tts.get_mic_sample_rate()}")
-        self.mic: VirtualMicrophoneDevice  = Daily.create_microphone_device(
-            "mic", sample_rate=self.services.tts.get_mic_sample_rate(), channels=1
-        )
-        self.speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
-            "speaker", sample_rate=16000, channels=1
-        )
-        self.camera: VirtualCameraDevice = Daily.create_camera_device(
-            "camera", width=720, height=1280, color_format="RGB"
-        )
-
-        Daily.select_speaker_device("speaker")
-
-        self.client.set_user_name(self.bot_name)
-        self.client.join(self.room_url, self.token, completion=self.call_joined)
-
-        self.client.update_inputs(
-            {
-                "camera": {
-                    "isEnabled": True,
-                    "settings": {
-                        "deviceId": "camera",
-                    },
-                },
-                "microphone": {
-                    "isEnabled": True,
-                    "settings": {
-                        "deviceId": "mic",
-                        "customConstraints": {
-                            "autoGainControl": {"exact": False},
-                            "echoCancellation": {"exact": False},
-                            "noiseSuppression": {"exact": False},
-                        },
-                    },
-                },
-            }
-        )
-
-        self.client.update_publishing(
-            {
-                "camera": {
-                    "sendSettings": {
-                        "maxQuality": "low",
-                        "encodings": {
-                            "low": {
-                                "maxBitrate": 250000,
-                                "scaleResolutionDownBy": 1.333,
-                                "maxFramerate": 8,
-                            }
-                        },
-                    }
-                }
-            }
-        )
-
-        self.my_participant_id = self.client.participants()["local"]["id"]
-
-    def start(self) -> None:
-        # TODO: this loop could, I think, be replaced with a timer and an event
-        self.participant_left = False
-
-        try:
-            participant_count: int = len(self.client.participants())
-            self.logger.info(f"{participant_count} participants in room")
-            while time.time() < self.expiration and not self.participant_left:
-                # all handling of incoming transcriptions happens in on_transcription_message
-                time.sleep(1)
-        except Exception as e:
-            self.logger.error(f"Exception {e}")
-        finally:
-            self.client.leave()
-
-    def stop(self):
-        self.logger.info("Stop current response")
-        if self.current_response:
-            if self.current_response.state < AsyncProcessorState.INTERRUPTED:
-                self.current_response.interrupt()
-
-            self.logger.info("Wait for state transition")
-            self.current_response.wait_for_state_transition(AsyncProcessorState.FINALIZED)
-
-        self.stop_threads.set()
-        self.camera_thread.join()
-        self.logger.info("Camera thread stopped")
-
-        self.logger.info("Put stop in output queue")
-        self.output_queue.put(QueueFrame(FrameType.END_STREAM, None))
-
-        self.frame_consumer_thread.join()
-        self.logger.info("Orchestrator stopped.")
-
-    def on_intro_played(self, intro):
-        self.logger.info(f"Introduction has played")
-        self.can_interrupt = True
-        intro.finalize()
-
-    def on_intro_finished(self, intro):
-        self.logger.info(f"Introduction has finished")
-        waiting = self.conversation_processors.waiting(self.services, self.message_handler, self.output_queue)
-        waiting.prepare()
-        waiting.play()
-
-    def on_response_played(self, response):
-        response.finalize()
-
-    def on_response_finished(self, response):
-        if not response.was_interrupted:
-            self.message_handler.finalize_user_message()
-
-    def call_joined(self, join_data, client_error):
-        self.logger.info(f"Call_joined: {join_data}, {client_error}")
-        self.client.start_transcription(
-            {
-                "language": "en",
-                "tier": "nova",
-                "model": "2-conversationalai",
-                "profanity_filter": True,
-                "redact": False,
-                "extra": {
-                    "endpointing": True,
-                    "punctuate": False,
-                }
-            }
-        )
-
-    def on_participant_joined(self, participant):
-        with self.tracer.start_as_current_span("on_participant_joined", context=self.ctx):
-            self.logger.info(f"on_participant_joined: {participant}")
-
-            # TODO: figure out the architecture to get the story id to the client
-            # self.client.send_app_message({"event": "story-id", "storyID": self.story_id})
-            time.sleep(2)
-
-            if not self.story_started:
-                self.action()
-                self.story_started = True
-
-    def on_participant_left(self, participant, reason):
-        self.logger.info(f"Participant {participant} left")
-        if len(self.client.participants()) < 2:
-            self.participant_left = True
-
-    def on_app_message(self, message, sender):
-        with self.tracer.start_as_current_span("on_app_message", context=self.ctx):
-            self.logger.info(f"on_app_message {message} from {sender}")
-            if "isSpeaking" in message and message["isSpeaking"] == True:
-                self.handle_user_started_talking()
-
-            if "isSpeaking" in message and message["isSpeaking"] == False:
-                self.handle_user_stopped_talking()
-
-    def on_transcription_message(self, message):
-        with self.tracer.start_as_current_span("on_transcription_message", context=self.ctx):
-            if message["session_id"] != self.my_participant_id:
-                self.handle_transcription_fragment(message['text'])
-
-    def on_transcription_stopped(self, stopped_by, stopped_by_error):
-        self.logger.info(f"Transcription stopped {stopped_by}, {stopped_by_error}")
-
-    def on_transcription_error(self, message):
-        self.logger.error(f"Transcription error {message}")
-
-    def on_transcription_started(self, status):
-        self.logger.info(f"Transcription started {status}")
-
-    def set_image(self, image: bytes):
-        self.image: bytes | None = image
-
-    def run_camera(self):
-        try:
-            while not self.stop_threads.is_set():
-                if self.image:
-                    self.camera.write_frame(self.image)
-
-                time.sleep(1.0 / 8.0)  # 8 fps
-        except Exception as e:
-            self.logger.error(f"Exception {e} in camera thread.")
-
-    def handle_user_started_talking(self):
-        # TODO: allow configuration of the timer timeout
-        self.logger.error("user started talking")
-        self.speech_timeout = Timer(1.0, self.utterance_interrupt)
-
-    def handle_user_stopped_talking(self):
-        self.logger.error("user stopped talking, canceling utterance interrupt")
-        if self.speech_timeout:
-            self.speech_timeout.cancel()
-
-    def utterance_interrupt(self):
-        self.logger.error("utterance interrupt")
-        self.is_interrupted.set()
-
-    def handle_transcription_fragment(self, fragment):
-        if not self.can_interrupt:
-            return
-
-        # start generating a new response. We'll do the fast parts of the interrupt
-        # now but wait for the state transition after we've kicked off the prepare
-        # on the new response.
-        if (
-            self.current_response
-            and self.current_response.state < AsyncProcessorState.INTERRUPTED
-        ):
-            self.interrupt_time = time.perf_counter()
-            self.is_interrupted.set()
-            self.current_response.interrupt()
-
-        self.message_handler.add_user_message(fragment)
-
-        response_type: type[OrchestratorResponse] | type[LLMResponse] = self.conversation_processors.response or LLMResponse
-        new_response: OrchestratorResponse = response_type(
-            self.services, self.message_handler, self.output_queue
-        )
-        new_response.set_state_callback(
-            AsyncProcessorState.DONE, self.on_response_played
-        )
-        new_response.set_state_callback(
-            AsyncProcessorState.FINALIZED, self.on_response_finished
-        )
-        new_response.prepare()
-
-        self.response_semaphore.acquire()
-        if (
-            self.current_response
-            and self.current_response.state < AsyncProcessorState.INTERRUPTED
-        ):
-            self.current_response.wait_for_state_transition(
-                AsyncProcessorState.FINALIZED
-            )
-
-        self.current_response = new_response
-        self.current_response.play()
-
-        self.response_semaphore.release()
-
-    def action(self):
-        self.logger.info("Starting camera thread")
-        self.image: bytes | None = None
-        self.camera_thread = Thread(target=self.run_camera, daemon=True)
-        self.camera_thread.start()
-
-        self.logger.info("Starting frame consumer thread")
-        self.frame_consumer_thread = Thread(target=self.frame_consumer, daemon=True)
-        self.frame_consumer_thread.start()
-
-        self.logger.info("Playing introduction")
-        self.can_interrupt = False
-        self.current_response.play()
-
-    def frame_consumer(self):
-        self.logger.info("🎬 Starting frame consumer thread")
-        b = bytearray()
-        smallest_write_size = 3200
-        all_audio_frames = bytearray()
-        while True:
-            try:
-                frame:QueueFrame = self.output_queue.get()
-                if frame.frame_type == FrameType.END_STREAM:
-                    self.logger.info("Stopping frame consumer thread")
-                    return
-
-                # if interrupted, we just pull frames off the queue and discard them
-                if not self.is_interrupted.is_set():
-                    if frame:
-                        if frame.frame_type == FrameType.AUDIO:
-                            chunk = frame.frame_data
-
-                            all_audio_frames.extend(chunk)
-
-                            b.extend(chunk)
-                            l = len(b) - (len(b) % smallest_write_size)
-                            if l:
-                                self.mic.write_frames(bytes(b[:l]))
-                                b = b[l:]
-                        elif frame.frame_type == FrameType.IMAGE:
-                            self.set_image(frame.frame_data)
-                    elif len(b):
-                        self.mic.write_frames(bytes(b))
-                        b = bytearray()
-                else:
-                    if self.interrupt_time:
-                        self.logger.info(f"Lag to stop stream after interruption {time.perf_counter() - self.interrupt_time}")
-                        self.interrupt_time = None
-
-                    if frame.frame_type == FrameType.START_STREAM:
-                        self.is_interrupted.clear()
-
-                self.output_queue.task_done()
-            except Empty:
-                try:
-                    if len(b):
-                        self.mic.write_frames(bytes(b))
-                except Exception as e:
-                    self.logger.error(f"Exception in frame_consumer: {e}, {len(b)}")
-
-                b = bytearray()
--- a/src/dailyai/pipeline/aggregators.py
+++ b/src/dailyai/pipeline/aggregators.py
@@ -0,0 +1,400 @@
+import asyncio
+import re
+
+from dailyai.pipeline.frame_processor import FrameProcessor
+
+from dailyai.pipeline.frames import (
+    EndFrame,
+    AudioFrame,
+    EndPipeFrame,
+    Frame,
+    ImageFrame,
+    LLMMessagesQueueFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
+    TextFrame,
+    TranscriptionQueueFrame,
+    UserStartedSpeakingFrame,
+    UserStoppedSpeakingFrame,
+)
+from dailyai.pipeline.pipeline import Pipeline
+from dailyai.services.ai_services import AIService
+
+from typing import AsyncGenerator, Callable, Coroutine, List
+
+from dailyai.services.openai_llm_context import OpenAILLMContext
+
+
+class ResponseAggregator(FrameProcessor):
+
+    def __init__(
+        self,
+        *,
+        messages: list[dict] | None,
+        role: str,
+        start_frame,
+        end_frame,
+        accumulator_frame,
+        pass_through=True,
+    ):
+        self.aggregation = ""
+        self.aggregating = False
+        self.messages = messages
+        self._role = role
+        self._start_frame = start_frame
+        self._end_frame = end_frame
+        self._accumulator_frame = accumulator_frame
+        self._pass_through = pass_through
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if self.messages is None:
+            return
+
+        if isinstance(frame, self._start_frame):
+            self.aggregating = True
+        elif isinstance(frame, self._end_frame):
+            self.aggregating = False
+            # Sometimes VAD triggers quickly on and off. If we don't get any transcription,
+            # it creates empty LLM message queue frames
+            if len(self.aggregation) > 0:
+                self.messages.append(
+                    {"role": self._role, "content": self.aggregation})
+                self.aggregation = ""
+                yield self._end_frame()
+                yield LLMMessagesQueueFrame(self.messages)
+        elif isinstance(frame, self._accumulator_frame) and self.aggregating:
+            self.aggregation += f" {frame.text}"
+            if self._pass_through:
+                yield frame
+        else:
+            yield frame
+
+
+class LLMResponseAggregator(ResponseAggregator):
+    def __init__(self, messages: list[dict]):
+        super().__init__(
+            messages=messages,
+            role="assistant",
+            start_frame=LLMResponseStartFrame,
+            end_frame=LLMResponseEndFrame,
+            accumulator_frame=TextFrame,
+        )
+
+
+class UserResponseAggregator(ResponseAggregator):
+    def __init__(self, messages: list[dict]):
+        super().__init__(
+            messages=messages,
+            role="user",
+            start_frame=UserStartedSpeakingFrame,
+            end_frame=UserStoppedSpeakingFrame,
+            accumulator_frame=TranscriptionQueueFrame,
+            pass_through=False,
+        )
+
+
+class LLMContextAggregator(AIService):
+    def __init__(
+        self,
+        messages: list[dict],
+        role: str,
+        bot_participant_id=None,
+        complete_sentences=True,
+        pass_through=True,
+    ):
+        super().__init__()
+        self.messages = messages
+        self.bot_participant_id = bot_participant_id
+        self.role = role
+        self.sentence = ""
+        self.complete_sentences = complete_sentences
+        self.pass_through = pass_through
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        # We don't do anything with non-text frames, pass it along to next in
+        # the pipeline.
+        if not isinstance(frame, TextFrame):
+            yield frame
+            return
+
+        # Ignore transcription frames from the bot
+        if isinstance(frame, TranscriptionQueueFrame):
+            if frame.participantId == self.bot_participant_id:
+                return
+
+        # The common case for "pass through" is receiving frames from the LLM that we'll
+        # use to update the "assistant" LLM messages, but also passing the text frames
+        # along to a TTS service to be spoken to the user.
+        if self.pass_through:
+            yield frame
+
+        # TODO: split up transcription by participant
+        if self.complete_sentences:
+            # type: ignore -- the linter thinks this isn't a TextFrame, even
+            # though we check it above
+            self.sentence += frame.text
+            if self.sentence.endswith((".", "?", "!")):
+                self.messages.append(
+                    {"role": self.role, "content": self.sentence})
+                self.sentence = ""
+                yield LLMMessagesQueueFrame(self.messages)
+        else:
+            # type: ignore -- the linter thinks this isn't a TextFrame, even
+            # though we check it above
+            self.messages.append({"role": self.role, "content": frame.text})
+            yield LLMMessagesQueueFrame(self.messages)
+
+
+class LLMUserContextAggregator(LLMContextAggregator):
+    def __init__(
+            self,
+            messages: list[dict],
+            bot_participant_id=None,
+            complete_sentences=True):
+        super().__init__(
+            messages,
+            "user",
+            bot_participant_id,
+            complete_sentences,
+            pass_through=False)
+
+
+class LLMAssistantContextAggregator(LLMContextAggregator):
+    def __init__(
+            self,
+            messages: list[dict],
+            bot_participant_id=None,
+            complete_sentences=True):
+        super().__init__(
+            messages,
+            "assistant",
+            bot_participant_id,
+            complete_sentences,
+            pass_through=True,
+        )
+
+
+class SentenceAggregator(FrameProcessor):
+    """This frame processor aggregates text frames into complete sentences.
+
+    Frame input/output:
+        TextFrame("Hello,") -> None
+        TextFrame(" world.") -> TextFrame("Hello, world.")
+
+    Doctest:
+    >>> async def print_frames(aggregator, frame):
+    ...     async for frame in aggregator.process_frame(frame):
+    ...         print(frame.text)
+
+    >>> aggregator = SentenceAggregator()
+    >>> asyncio.run(print_frames(aggregator, TextFrame("Hello,")))
+    >>> asyncio.run(print_frames(aggregator, TextFrame(" world.")))
+    Hello, world.
+    """
+
+    def __init__(self):
+        self.aggregation = ""
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, TextFrame):
+            m = re.search("(.*[?.!])(.*)", frame.text)
+            if m:
+                yield TextFrame(self.aggregation + m.group(1))
+                self.aggregation = m.group(2)
+            else:
+                self.aggregation += frame.text
+        elif isinstance(frame, EndFrame):
+            if self.aggregation:
+                yield TextFrame(self.aggregation)
+            yield frame
+        else:
+            yield frame
+
+
+class LLMFullResponseAggregator(FrameProcessor):
+    """This class aggregates Text frames until it receives a
+    LLMResponseEndFrame, then emits the concatenated text as
+    a single text frame.
+
+    given the following frames:
+
+        TextFrame("Hello,")
+        TextFrame(" world.")
+        TextFrame(" I am")
+        TextFrame(" an LLM.")
+        LLMResponseEndFrame()
+
+    this processor will yield nothing for the first 4 frames, then
+
+        TextFrame("Hello, world. I am an LLM.")
+        LLMResponseEndFrame()
+
+    when passed the last frame.
+
+    >>> async def print_frames(aggregator, frame):
+    ...     async for frame in aggregator.process_frame(frame):
+    ...         if isinstance(frame, TextFrame):
+    ...             print(frame.text)
+    ...         else:
+    ...             print(frame.__class__.__name__)
+
+    >>> aggregator = LLMFullResponseAggregator()
+    >>> asyncio.run(print_frames(aggregator, TextFrame("Hello,")))
+    >>> asyncio.run(print_frames(aggregator, TextFrame(" world.")))
+    >>> asyncio.run(print_frames(aggregator, TextFrame(" I am")))
+    >>> asyncio.run(print_frames(aggregator, TextFrame(" an LLM.")))
+    >>> asyncio.run(print_frames(aggregator, LLMResponseEndFrame()))
+    Hello, world. I am an LLM.
+    LLMResponseEndFrame
+    """
+
+    def __init__(self):
+        self.aggregation = ""
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if not isinstance(frame, AudioFrame):
+            print(f"^^^ LFRA got frame: {frame}")
+        if isinstance(frame, TextFrame):
+            self.aggregation += frame.text
+            print(
+                f"^^^ LFRA got textframe. aggregation is now {self.aggregation}")
+        elif isinstance(frame, LLMResponseEndFrame):
+            print(
+                f"^^^ LFRA got an llmresponseendframe. About to yield aggregation: {self.aggregation}")
+            yield TextFrame(self.aggregation)
+            yield frame
+            self.aggregation = ""
+        else:
+            yield frame
+
+
+class StatelessTextTransformer(FrameProcessor):
+    """This processor calls the given function on any text in a text frame.
+
+    >>> async def print_frames(aggregator, frame):
+    ...     async for frame in aggregator.process_frame(frame):
+    ...         print(frame.text)
+
+    >>> aggregator = StatelessTextTransformer(lambda x: x.upper())
+    >>> asyncio.run(print_frames(aggregator, TextFrame("Hello")))
+    HELLO
+    """
+
+    def __init__(self, transform_fn):
+        self.transform_fn = transform_fn
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, TextFrame):
+            result = self.transform_fn(frame.text)
+            if isinstance(result, Coroutine):
+                result = await result
+
+            yield TextFrame(result)
+        else:
+            yield frame
+
+
+class ParallelPipeline(FrameProcessor):
+    """Run multiple pipelines in parallel.
+
+    This class takes frames from its source queue and sends them to each
+    sub-pipeline. Each sub-pipeline emits its frames into this class's
+    sink queue. No guarantees are made about the ordering of frames in
+    the sink queue (that is, no sub-pipeline has higher priority than
+    any other, frames are put on the sink in the order they're emitted
+    by the sub-pipelines).
+
+    After each frame is taken from this class's source queue and placed
+    in each sub-pipeline's source queue, an EndPipeFrame is put on each
+    sub-pipeline's source queue. This indicates to the sub-pipe runner
+    that it should exit.
+
+    Since frame handlers pass through unhandled frames by convention, this
+    class de-dupes frames in its sink before yielding them.
+    """
+
+    def __init__(self, pipeline_definitions: List[List[FrameProcessor]]):
+        self.sources = [asyncio.Queue() for _ in pipeline_definitions]
+        self.sink: asyncio.Queue[Frame] = asyncio.Queue()
+        self.pipelines: list[Pipeline] = [
+            Pipeline(
+                pipeline_definition,
+                source,
+                self.sink,
+            )
+            for source, pipeline_definition in zip(self.sources, pipeline_definitions)
+        ]
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        for source in self.sources:
+            await source.put(frame)
+            await source.put(EndPipeFrame())
+
+        await asyncio.gather(*[pipeline.run_pipeline() for pipeline in self.pipelines])
+
+        seen_ids = set()
+        while not self.sink.empty():
+            frame = await self.sink.get()
+
+            # de-dup frames. Because the convention is to yield a frame that isn't processed,
+            # each pipeline will likely yield the same frame, so we will end up with _n_ copies
+            # of unprocessed frames where _n_ is the number of parallel pipes that don't
+            # process that frame.
+            if id(frame) in seen_ids:
+                continue
+            seen_ids.add(id(frame))
+
+            # Skip passing along EndPipeFrames, because we use them
+            # for our own flow control.
+            if not isinstance(frame, EndPipeFrame):
+                yield frame
+
+
+class GatedAggregator(FrameProcessor):
+    """Accumulate frames, with custom functions to start and stop accumulation.
+    Yields the gate-opening frame before any accumulated frames, then subsequent
+    frames up to, but not including, the gate-closing frame.
+
+    >>> async def print_frames(aggregator, frame):
+    ...     async for frame in aggregator.process_frame(frame):
+    ...         if isinstance(frame, TextFrame):
+    ...             print(frame.text)
+    ...         else:
+    ...             print(frame.__class__.__name__)
+
+    >>> aggregator = GatedAggregator(
+    ...     gate_close_fn=lambda x: isinstance(x, LLMResponseStartFrame),
+    ...     gate_open_fn=lambda x: isinstance(x, ImageFrame),
+    ...     start_open=False)
+    >>> asyncio.run(print_frames(aggregator, TextFrame("Hello")))
+    >>> asyncio.run(print_frames(aggregator, TextFrame("Hello again.")))
+    >>> asyncio.run(print_frames(aggregator, ImageFrame(url='', image=bytes([]))))
+    ImageFrame
+    Hello
+    Hello again.
+    >>> asyncio.run(print_frames(aggregator, TextFrame("Goodbye.")))
+    Goodbye.
+    """
+
+    def __init__(self, gate_open_fn, gate_close_fn, start_open):
+        self.gate_open_fn = gate_open_fn
+        self.gate_close_fn = gate_close_fn
+        self.gate_open = start_open
+        self.accumulator: List[Frame] = []
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if self.gate_open:
+            if self.gate_close_fn(frame):
+                self.gate_open = False
+        else:
+            if self.gate_open_fn(frame):
+                self.gate_open = True
+
+        if self.gate_open:
+            yield frame
+            if self.accumulator:
+                for frame in self.accumulator:
+                    yield frame
+            self.accumulator = []
+        else:
+            self.accumulator.append(frame)
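
Review note on `ParallelPipeline`: the class has no doctest, so here is a minimal, self-contained sketch of the fan-out and identity-based de-dup it describes. It does not import `dailyai`. `Frame`, `TextFrame`, `passthrough`, `upper`, and `fan_out` are hypothetical stand-ins written for illustration, and the real class runs `Pipeline` objects over source queues rather than plain coroutines.

```python
import asyncio


class Frame:
    """Stand-in for dailyai.pipeline.frames.Frame (hypothetical)."""


class TextFrame(Frame):
    def __init__(self, text: str):
        self.text = text


async def passthrough(frame: Frame, sink: asyncio.Queue) -> None:
    # A branch that does not handle the frame forwards the same object.
    await sink.put(frame)


async def upper(frame: Frame, sink: asyncio.Queue) -> None:
    # A branch that handles text frames emits a new frame object.
    if isinstance(frame, TextFrame):
        await sink.put(TextFrame(frame.text.upper()))
    else:
        await sink.put(frame)


async def fan_out(frame: Frame, branches) -> list[Frame]:
    sink: asyncio.Queue = asyncio.Queue()
    # Run every branch on the same input frame and wait for all of them.
    await asyncio.gather(*(branch(frame, sink) for branch in branches))

    seen_ids: set[int] = set()
    result: list[Frame] = []
    while not sink.empty():
        out = sink.get_nowait()
        # Identity-based de-dup: an object forwarded by several branches
        # is kept once; distinct objects are all kept.
        if id(out) in seen_ids:
            continue
        seen_ids.add(id(out))
        result.append(out)
    return result


frame = TextFrame("hello")
frames = asyncio.run(fan_out(frame, [passthrough, passthrough, upper]))
texts = [f.text for f in frames]
```

Two branches forward the original frame untouched and one emits a new frame, so the sink holds three entries but only two survive the de-dup. Because the check uses `id()`, two equal-looking frames created independently are both kept; only the same object seen twice is dropped.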
--- a/src/dailyai/pipeline/frame_processor.py
+++ b/src/dailyai/pipeline/frame_processor.py
@@ -0,0 +1,33 @@
+from abc import abstractmethod
+from typing import AsyncGenerator
+
+from dailyai.pipeline.frames import ControlFrame, Frame
+
+
+class FrameProcessor:
+    """This is the base class for all frame processors. Frame processors consume a frame
+    and yield 0 or more frames. Generally frame processors are used as part of a pipeline
+    where frames come from a source queue, are processed by a series of frame processors,
+    then placed on a sink queue.
+
+    By convention, FrameProcessors should immediately yield any frames they don't process.
+
+    Stateful FrameProcessors should watch for the EndFrame and finalize their
+    output, e.g. yielding an unfinished sentence if they're aggregating LLM output to full
+    sentences. The EndFrame is also a chance to clean up any services that need to
+    be closed, del'd, etc.
+    """
+
+    @abstractmethod
+    async def process_frame(
+        self, frame: Frame
+    ) -> AsyncGenerator[Frame, None]:
+        """Process a single frame and yield 0 or more frames."""
+        # Default behavior: pass every frame through exactly once, control
+        # frames included. Subclasses override this to consume or transform.
+        yield frame
+
+    @abstractmethod
+    async def interrupted(self) -> None:
+        """Handle any cleanup if the pipeline was interrupted."""
+        pass
--- a/src/dailyai/pipeline/frames.py
+++ b/src/dailyai/pipeline/frames.py
@@ -0,0 +1,211 @@
+from dataclasses import dataclass
+from typing import Any, List
+
+from dailyai.services.openai_llm_context import OpenAILLMContext
+
+
+class Frame:
+    def __str__(self):
+        return f"{self.__class__.__name__}"
+
+
+class ControlFrame(Frame):
+    # Control frames should contain no instance data, so
+    # equality is based solely on the class.
+    def __eq__(self, other):
+        return isinstance(other, self.__class__)
+
+
+class StartFrame(ControlFrame):
+    """Used (but not required) to start a pipeline, and is also used to
+    indicate that an interruption has ended and the transport should start
+    processing frames again."""
+    pass
+
+
+class EndFrame(ControlFrame):
+    """Indicates that a pipeline has ended and frame processors and pipelines
+    should be shut down. If the transport receives this frame, it will stop
+    sending frames to its output channel(s) and close all its threads."""
+    pass
+
+
+class EndPipeFrame(ControlFrame):
+    """Indicates that a pipeline has ended but that the transport should
+    continue processing. This frame is used in parallel pipelines and other
+    sub-pipelines."""
+    pass
+
+
+class PipelineStartedFrame(ControlFrame):
+    """
+    Used by the transport to indicate that execution of a pipeline is starting
+    (or restarting). It should be the first frame your app receives when it
+    starts, or when an interruptible pipeline has been interrupted.
+    """
+
+    pass
+
+
+class LLMResponseStartFrame(ControlFrame):
+    """Used to indicate the beginning of an LLM response. Following TextFrames
+    are part of the LLM response until an LLMResponseEndFrame."""
+    pass
+
+
+class LLMResponseEndFrame(ControlFrame):
+    """Indicates the end of an LLM response."""
+    pass
+
+
+@dataclass()
+class AudioFrame(Frame):
+    """A chunk of audio. Will be played by the transport if the transport's mic
+    has been enabled."""
+    data: bytes
+
+    def __str__(self):
+        return f"{self.__class__.__name__}, size: {len(self.data)} B"
+
+
+@dataclass()
+class ImageFrame(Frame):
+    """An image. Will be shown by the transport if the transport's camera is
+    enabled."""
+    url: str | None
+    image: bytes
+
+    def __str__(self):
+        return f"{self.__class__.__name__}, url: {self.url}, image size: {len(self.image)} B"
+
+
+@dataclass()
+class SpriteFrame(Frame):
+    """An animated sprite. Will be shown by the transport if the transport's
+    camera is enabled. Will play at the framerate specified in the transport's
+    `fps` constructor parameter."""
+    images: list[bytes]
+
+    def __str__(self):
+        return f"{self.__class__.__name__}, list size: {len(self.images)}"
+
+
+@dataclass()
+class TextFrame(Frame):
+    """A chunk of text. Emitted by LLM services, consumed by TTS services, can
+    be used to send text through pipelines."""
+    text: str
+
+    def __str__(self):
+        return f'{self.__class__.__name__}: "{self.text}"'
+
+
+@dataclass()
+class TranscriptionQueueFrame(TextFrame):
+    """A text frame with transcription-specific data. Will be placed in the
+    transport's receive queue when a participant speaks."""
+    participantId: str
+    timestamp: str
+
+
+@dataclass()
+class LLMMessagesQueueFrame(Frame):
+    """A frame containing a list of LLM messages. Used to signal that an LLM
+    service should run a chat completion and emit an LLMResponseStartFrame,
+    TextFrames and an LLMResponseEndFrame.
+    Note that the messages property on this class is mutable, and will be
+    updated by various ResponseAggregator frame processors."""
+    messages: List[dict]
+
+
+@dataclass()
+class OpenAILLMContextFrame(Frame):
+    """Like an LLMMessagesQueueFrame, but with extra context specific to the
+    OpenAI API. The context in this message is also mutable, and will be
+    changed by the OpenAIContextAggregator frame processor."""
+    context: OpenAILLMContext
+
+
+@dataclass()
+class ReceivedAppMessageFrame(Frame):
+    message: Any
+    sender: str
+
+    def __str__(self):
+        return f"ReceivedAppMessageFrame: sender: {self.sender}, message: {self.message}"
+
+
+@dataclass()
+class SendAppMessageFrame(Frame):
+    message: Any
+    participantId: str | None
+
+    def __str__(self):
+        return f"SendAppMessageFrame: participantId: {self.participantId}, message: {self.message}"
+
+
+class UserStartedSpeakingFrame(Frame):
+    """Emitted by VAD to indicate that a participant has started speaking.
+    This can be used for interruptions or other times when detecting that
+    someone is speaking is more important than knowing what they're saying
+    (as you would with a TranscriptionQueueFrame)."""
+    pass
+
+
+class UserStoppedSpeakingFrame(Frame):
+    """Emitted by the VAD to indicate that a user stopped speaking."""
+    pass
+
+
+class BotStartedSpeakingFrame(Frame):
+    pass
+
+
+class BotStoppedSpeakingFrame(Frame):
+    pass
+
+
+@dataclass()
+class LLMFunctionStartFrame(Frame):
+    """Emitted when the LLM receives the beginning of a function call
+    completion. A frame processor can use this frame to indicate that it should
+    start preparing to make a function call, if it can do so in the absence of
+    any arguments."""
+    function_name: str
+
+
+@dataclass()
+class LLMFunctionCallFrame(Frame):
+    """Emitted when the LLM has received an entire function call completion."""
+    function_name: str
+    arguments: str
+
+
+@dataclass()
+class VideoImageFrame(Frame):
+    """Contains a still image from a participant's video stream."""
+    participantId: str
+    image: bytes
+
+    # def __str__(self):
+    #     return f"{self.__class__.__name__}, participantId: {self.participantId}, image size: {len(self.image)} B"
+
+
+class TelestratorImageFrame(ImageFrame):
+    pass
+
+
+@dataclass()
+class VisionFrame(Frame):
+    prompt: str
+    image: bytes
+
+    # def __str__(self):
+    #     return f"{self.__class__.__name__}, prompt: {self.prompt}, image size: {len(self.image)} B"
+
+
+@dataclass()
+class RequestVideoImageFrame(Frame):
+    """Send to the transport to request a new video image from a specific participant. Leave participantId
+    empty to request a frame from all participants."""
+    participantId: str | None
--- a/src/dailyai/pipeline/merge_pipeline.py
+++ b/src/dailyai/pipeline/merge_pipeline.py
@@ -0,0 +1,24 @@
+from typing import List
+from dailyai.pipeline.frames import EndFrame, EndPipeFrame
+from dailyai.pipeline.pipeline import Pipeline
+
+
+class SequentialMergePipeline(Pipeline):
+    """This class merges the sink queues from a list of pipelines. Frames from
+    each pipeline's sink are merged in the order of pipelines in the list."""
+
+    def __init__(self, pipelines: List[Pipeline]):
+        super().__init__([])
+        self.pipelines = pipelines
+
+    async def run_pipeline(self):
+        for pipeline in self.pipelines:
+            while True:
+                frame = await pipeline.sink.get()
+                if isinstance(
+                        frame, EndFrame) or isinstance(
+                        frame, EndPipeFrame):
+                    break
+                await self.sink.put(frame)
+
+        await self.sink.put(EndFrame())
--- a/src/dailyai/pipeline/opeanai_llm_aggregator.py
+++ b/src/dailyai/pipeline/opeanai_llm_aggregator.py
@@ -0,0 +1,109 @@
+from typing import Any, AsyncGenerator, Callable
+from dailyai.pipeline.frame_processor import FrameProcessor
+from dailyai.pipeline.frames import (
+    Frame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
+    OpenAILLMContextFrame,
+    TextFrame,
+    TranscriptionQueueFrame,
+    UserStartedSpeakingFrame,
+    UserStoppedSpeakingFrame,
+)
+from dailyai.services.openai_llm_context import OpenAILLMContext
+
+from openai.types.chat import ChatCompletionRole
+
+
+class OpenAIContextAggregator(FrameProcessor):
+
+    def __init__(
+        self,
+        context: OpenAILLMContext,
+        aggregator: Callable[[Frame, str | None], str | None],
+        role: ChatCompletionRole,
+        start_frame: type,
+        end_frame: type,
+        accumulator_frame: type,
+        pass_through=True,
+    ):
+        if not (
+            issubclass(start_frame, Frame)
+            and issubclass(end_frame, Frame)
+            and issubclass(accumulator_frame, Frame)
+        ):
+            raise TypeError(
+                "start_frame, end_frame and accumulator_frame must be subclasses of Frame"
+            )
+
+        self._context: OpenAILLMContext = context
+        self._aggregator: Callable[[Frame, str | None], str | None] = aggregator
+        self._role: ChatCompletionRole = role
+        self._start_frame = start_frame
+        self._end_frame = end_frame
+        self._accumulator_frame = accumulator_frame
+        self._pass_through = pass_through
+
+        self._aggregating = False
+        self._aggregation = None
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, self._start_frame):
+            self._aggregating = True
+        elif isinstance(frame, self._end_frame):
+            self._aggregating = False
+            if self._aggregation:
+                self._context.add_message(
+                    {
+                        "role": self._role,
+                        "content": self._aggregation,
+                        "name": self._role,
+                    }  # type: ignore
+                )
+            self._aggregation = None
+            yield OpenAILLMContextFrame(self._context)
+        elif isinstance(frame, self._accumulator_frame) and self._aggregating:
+            self._aggregation = self._aggregator(frame, self._aggregation)
+            if self._pass_through:
+                yield frame
+        else:
+            yield frame
+
+    def string_aggregator(
+            self,
+            frame: Frame,
+            aggregation: str | None) -> str | None:
+        if not isinstance(frame, TextFrame):
+            raise TypeError(
+                "Frame must be a TextFrame instance to be aggregated by a string aggregator."
+            )
+        if not aggregation:
+            return frame.text
+        return " ".join([aggregation, frame.text])
+
+
+class OpenAIUserContextAggregator(OpenAIContextAggregator):
+    def __init__(self, context: OpenAILLMContext):
+        super().__init__(
+            context=context,
+            aggregator=self.string_aggregator,
+            role="user",
+            start_frame=UserStartedSpeakingFrame,
+            end_frame=UserStoppedSpeakingFrame,
+            accumulator_frame=TranscriptionQueueFrame,
+            pass_through=False,
+        )
+
+
+class OpenAIAssistantContextAggregator(OpenAIContextAggregator):
+
+    def __init__(self, context: OpenAILLMContext):
+        super().__init__(
+            context,
+            aggregator=self.string_aggregator,
+            role="assistant",
+            start_frame=LLMResponseStartFrame,
+            end_frame=LLMResponseEndFrame,
+            accumulator_frame=TextFrame,
+            pass_through=True,
+        )
--- a/src/dailyai/pipeline/pipeline.py
+++ b/src/dailyai/pipeline/pipeline.py
@@ -0,0 +1,110 @@
+import asyncio
+from typing import AsyncGenerator, AsyncIterable, Iterable, List
+from dailyai.pipeline.frame_processor import FrameProcessor
+
+from dailyai.pipeline.frames import EndPipeFrame, EndFrame, Frame
+
+
+class Pipeline:
+    """
+    This class manages a pipe of FrameProcessors, and runs them in sequence. The "source"
+    and "sink" queues are managed by the caller. You can use this class stand-alone to
+    perform specialized processing, or you can use the Transport's run_pipeline method to
+    instantiate and run a pipeline with the Transport's sink and source queues.
+    """
+
+    def __init__(
+        self,
+        processors: List[FrameProcessor],
+        source: asyncio.Queue[Frame] | None = None,
+        sink: asyncio.Queue[Frame] | None = None
+    ):
+        """Create a new pipeline. By default we create the sink and source queues
+        if they're not provided, but these can be overridden to point to other
+        queues. If this pipeline is run by a transport, its sink and source queues
+        will be overridden.
+        """
+        self.processors: List[FrameProcessor] = processors
+
+        self.source: asyncio.Queue[Frame] = source or asyncio.Queue()
+        self.sink: asyncio.Queue[Frame] = sink or asyncio.Queue()
+
+    def set_source(self, source: asyncio.Queue[Frame]):
+        """Set the source queue for this pipeline. Frames from this queue
+        will be processed by each frame_processor in the pipeline, in order
+        from first to last."""
+        self.source = source
+
+    def set_sink(self, sink: asyncio.Queue[Frame]):
+        """Set the sink queue for this pipeline. After the last frame_processor
+        has processed a frame, its output will be placed on this queue."""
+        self.sink = sink
+
+    async def get_next_source_frame(self) -> AsyncGenerator[Frame, None]:
+        """Convenience function to get the next frame from the source queue. This
+        lets us consistently have an AsyncGenerator yield frames, from either the
+        source queue or a frame_processor."""
+
+        yield await self.source.get()
+
+    async def queue_frames(
+        self,
+        frames: Iterable[Frame] | AsyncIterable[Frame],
+    ) -> None:
+        """Insert frames directly into a pipeline. This is typically used inside a transport
+        participant_joined callback to prompt a bot to start a conversation, for example."""
+
+        if isinstance(frames, AsyncIterable):
+            async for frame in frames:
+                await self.source.put(frame)
+        elif isinstance(frames, Iterable):
+            for frame in frames:
+                await self.source.put(frame)
+        else:
+            raise TypeError("Frames must be an iterable or async iterable")
+
+    async def run_pipeline(self):
+        """Run the pipeline. Take each frame from the source queue, pass it to
+        the first frame_processor, pass the output of that frame_processor to the
+        next in the list, etc. until the last frame_processor has processed the
+        resulting frames, then place those frames in the sink queue.
+
+        The source and sink queues must be set before calling this method.
+
+        This method exits once an EndFrame or EndPipeFrame taken from the source
+        queue has been run through the pipeline. Every frame the processors yield
+        for that final frame is placed on the sink queue before the method returns.
+        """
+
+        try:
+            while True:
+                initial_frame = await self.source.get()
+                async for frame in self._run_pipeline_recursively(
+                    initial_frame, self.processors
+                ):
+                    await self.sink.put(frame)
+
+                if isinstance(initial_frame, EndFrame) or isinstance(
+                    initial_frame, EndPipeFrame
+                ):
+                    break
+        except asyncio.CancelledError:
+            # this means there's been an interruption, do any cleanup necessary
+            # here.
+            for processor in self.processors:
+                await processor.interrupted()
+            pass
+
+    async def _run_pipeline_recursively(
+        self, initial_frame: Frame, processors: List[FrameProcessor]
+    ) -> AsyncGenerator[Frame, None]:
+        """Internal function to add frames to the pipeline as they're yielded
+        by each processor."""
+        if processors:
+            async for frame in processors[0].process_frame(initial_frame):
+                async for final_frame in self._run_pipeline_recursively(
+                    frame, processors[1:]
+                ):
+                    yield final_frame
+        else:
+            yield initial_frame
--- a/src/dailyai/queue_frame.py
+++ b/src/dailyai/queue_frame.py
@@ -1,19 +0,0 @@
-from enum import Enum
-from dataclasses import dataclass
-
-class FrameType(Enum):
-    START_STREAM = 0
-    END_STREAM = 1
-    AUDIO = 2
-    IMAGE = 3
-    SENTENCE = 4
-    TEXT_CHUNK = 5
-    LLM_MESSAGE = 6
-    APP_MESSAGE = 7
-    IMAGE_DESCRIPTION = 8
-    TRANSCRIPTION = 9
-
-@dataclass(frozen=True)
-class QueueFrame:
-    frame_type: FrameType
-    frame_data: str | dict | bytes | list | None
--- a/src/dailyai/requirements.txt
+++ b/src/dailyai/requirements.txt
@@ -1,2 +0,0 @@
-Pillow==10.1.0
-typing_extensions==4.9.0
--- a/src/dailyai/services/ai_services.py
+++ b/src/dailyai/services/ai_services.py
@@ -1,169 +1,88 @@
 import asyncio
+import io
 import logging
-import re
+import time
+import wave
+from dailyai.pipeline.frame_processor import FrameProcessor

-from httpx import request
-
-from dailyai.queue_frame import QueueFrame, FrameType
+from dailyai.pipeline.frames import (
+    AudioFrame,
+    EndFrame,
+    EndPipeFrame,
+    ImageFrame,
+    LLMMessagesQueueFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
+    LLMFunctionStartFrame,
+    LLMFunctionCallFrame,
+    Frame,
+    TextFrame,
+    TranscriptionQueueFrame,
+    VisionFrame
+)

 from abc import abstractmethod
-from typing import AsyncGenerator, Iterable
-from dataclasses import dataclass
-from typing import AsyncGenerator
+from typing import AsyncGenerator, BinaryIO

-from collections.abc import Iterable, AsyncIterable
-
-class AIService:

+class AIService(FrameProcessor):
    def __init__(self):
        self.logger = logging.getLogger("dailyai")

-    def stop(self):
-        pass
-
-    def allowed_input_frame_types(self) -> set[FrameType]:
-        return set()
-
-    def possible_output_frame_types(self) -> set[FrameType]:
-        return set()
-
-    async def run_to_queue(self, queue: asyncio.Queue, frames, add_end_of_stream=False) -> None:
-        async for frame in self.run(frames):
-            await queue.put(frame)
-
-        if add_end_of_stream:
-            await queue.put(QueueFrame(FrameType.END_STREAM, None))
-
-    async def run(
-        self,
-        frames: Iterable[QueueFrame]
-        | AsyncIterable[QueueFrame]
-        | asyncio.Queue[QueueFrame],
-        requested_frame_types: set[FrameType] | None=None,
-    ) -> AsyncGenerator[QueueFrame, None]:
-        if requested_frame_types and self.possible_output_frame_types().intersection(requested_frame_types) == set():
-            raise Exception(f"Requested frame types {requested_frame_types} are not supported by this service.")
-
-        if not requested_frame_types:
-            requested_frame_types = self.possible_output_frame_types()
-
-        if isinstance(frames, AsyncIterable):
-            async for frame in frames:
-                async for output_frame in self.process_frame(requested_frame_types, frame):
-                    yield output_frame
-        elif isinstance(frames, Iterable):
-            for frame in frames:
-                async for output_frame in self.process_frame(requested_frame_types, frame):
-                    yield output_frame
-        elif isinstance(frames, asyncio.Queue):
-            while True:
-                frame = await frames.get()
-                async for output_frame in self.process_frame(requested_frame_types, frame):
-                    yield output_frame
-                if frame.frame_type == FrameType.END_STREAM:
-                    break
-        else:
-            raise Exception("Frames must be an iterable or async iterable")
-
-    @abstractmethod
-    async def process_frame(self, requested_frame_types:set[FrameType], frame:QueueFrame) -> AsyncGenerator[QueueFrame, None]:
-        # Yield something so the linter can deduce what should happen here.
-        yield QueueFrame(FrameType.END_STREAM, None)
-
-class SentenceAggregator(AIService):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
-        self.current_sentence = ""
-
-    def allowed_input_frame_types(self) -> set[FrameType]:
-        return set([FrameType.TEXT_CHUNK, FrameType.SENTENCE])
-
-    def possible_output_frame_types(self) -> set[FrameType]:
-        return set([FrameType.SENTENCE])
-
-    async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
-        if not FrameType.SENTENCE in requested_frame_types:
-            return
-
-        if frame.frame_type == FrameType.TEXT_CHUNK:
-            if type(frame.frame_data) != str:
-                raise Exception(
-                    "Sentence aggregator requires a string for the data field"
-                )
-
-            self.current_sentence += frame.frame_data
-            if self.current_sentence.endswith((".", "?", "!")):
-                sentence = self.current_sentence
-                self.current_sentence = ""
-                yield QueueFrame(FrameType.SENTENCE, sentence)
-        elif frame.frame_type == FrameType.END_STREAM:
-            if self.current_sentence:
-                yield QueueFrame(FrameType.SENTENCE, self.current_sentence)
-        elif frame.frame_type == FrameType.SENTENCE:
-            yield frame
-

 class LLMService(AIService):
-    def allowed_input_frame_types(self) -> set[FrameType]:
-        return set([FrameType.LLM_MESSAGE, FrameType.SENTENCE, FrameType.TRANSCRIPTION])
+    """This class is a no-op but serves as a base class for LLM services."""

-    def allowed_output_frame_types(self) -> set[FrameType]:
-        return set([FrameType.SENTENCE, FrameType.TEXT_CHUNK])
-
-    @abstractmethod
-    async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
-        yield ""
-
-    @abstractmethod
-    async def run_llm(self, messages) -> str:
-        pass
-
-    async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
-        if frame.frame_type == FrameType.LLM_MESSAGE:
-            if type(frame.frame_data) != list:
-                raise Exception("LLM service requires a dict for the data field")
-
-            messages: list[dict[str, str]] = frame.frame_data
-            if FrameType.SENTENCE in requested_frame_types:
-                yield QueueFrame(FrameType.SENTENCE, await self.run_llm(messages))
-            else:
-                async for text_chunk in self.run_llm_async(messages):
-                    yield QueueFrame(FrameType.TEXT_CHUNK, text_chunk)
-
-        # TODO: handle other frame types! Need to aggregate into messages
+    def __init__(self):
+        super().__init__()


 class TTSService(AIService):
+    def __init__(self, aggregate_sentences=True):
+        super().__init__()
+        self.aggregate_sentences: bool = aggregate_sentences
+        self.current_sentence: str = ""
+
    # Some TTS services require a specific sample rate. We default to 16k
    def get_mic_sample_rate(self):
        return 16000

-    def allowed_input_frame_types(self) -> set[FrameType]:
-        return set([FrameType.SENTENCE, FrameType.TRANSCRIPTION, FrameType.TEXT_CHUNK])
-
-    def possible_output_frame_types(self) -> set[FrameType]:
-        return set([FrameType.AUDIO])
-
-    # Converts the sentence to audio. Yields a list of audio frames that can
+    # Converts the text to audio. Yields chunks of audio bytes that can
    # be sent to the microphone device
    @abstractmethod
-    async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
+    async def run_tts(self, text) -> AsyncGenerator[bytes, None]:
        # yield empty bytes here, so linting can infer what this method does
        yield bytes()

-    async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
-        if not FrameType.AUDIO in requested_frame_types:
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, EndFrame) or isinstance(frame, EndPipeFrame):
+            if self.current_sentence:
+                async for audio_chunk in self.run_tts(self.current_sentence):
+                    yield AudioFrame(audio_chunk)
+                yield TextFrame(self.current_sentence)
+
+        if not isinstance(frame, TextFrame):
+            self.logger.debug(f"TTS passing through non-text frame: {frame}")
+            yield frame
            return

-        if type(frame.frame_data) != str:
-            raise Exception("TTS service requires a string for the data field")
+        text: str | None = None
+        if not self.aggregate_sentences:
+            text = frame.text
+        else:
+            self.current_sentence += frame.text
+            if self.current_sentence.strip().endswith((".", "?", "!")):
+                text = self.current_sentence
+                self.current_sentence = ""

-        async for audio_chunk in self.run_tts(frame.frame_data):
-            yield QueueFrame(FrameType.AUDIO, audio_chunk)
+        if text:
+            async for audio_chunk in self.run_tts(text):
+                yield AudioFrame(audio_chunk)

-    # Convenience function to send the audio for a sentence to the given queue
-    async def say(self, sentence, queue: asyncio.Queue):
-        await self.run_to_queue(queue, [QueueFrame(FrameType.SENTENCE, sentence)])
+            # note we pass along the text frame *after* the audio, so the text
+            # frame is completed after the audio is processed.
+            self.logger.debug(f"TTS yielding text: {text}")
+            yield TextFrame(text)


 class ImageGenService(AIService):
@@ -171,30 +90,83 @@ class ImageGenService(AIService):
        super().__init__(**kwargs)
        self.image_size = image_size

-    def allowed_input_frame_types(self) -> set[FrameType]:
-        return set([FrameType.SENTENCE, FrameType.TRANSCRIPTION, FrameType.TEXT_CHUNK, FrameType.IMAGE_DESCRIPTION])
-
-    def possible_output_frame_types(self) -> set[FrameType]:
-        return set([FrameType.IMAGE])
-
    # Renders the image. Returns an Image object.
    @abstractmethod
-    async def run_image_gen(self, sentence) -> tuple[str, bytes]:
+    async def run_image_gen(self, sentence: str) -> tuple[str, bytes]:
        pass

-    async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
-        if not FrameType.IMAGE in requested_frame_types:
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if not isinstance(frame, TextFrame):
+            yield frame
            return

-        if type(frame.frame_data) != str:
-            raise Exception("Image service requires a string for the data field")
-
-        (_, image_data) = await self.run_image_gen(frame.frame_data)
-        yield QueueFrame(FrameType.IMAGE, image_data)
+        (url, image_data) = await self.run_image_gen(frame.text)
+        yield ImageFrame(url, image_data)


-@dataclass
-class AIServiceConfig:
-    tts: TTSService
-    image: ImageGenService
-    llm: LLMService
+class STTService(AIService):
+    """STTService is a base class for speech-to-text services."""
+
+    _frame_rate: int
+
+    def __init__(self, frame_rate: int = 16000, **kwargs):
+        super().__init__(**kwargs)
+        self._frame_rate = frame_rate
+
+    @abstractmethod
+    async def run_stt(self, audio: BinaryIO) -> str:
+        """Returns transcript as a string"""
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        """Processes a frame of audio data, either buffering or transcribing it."""
+        if not isinstance(frame, AudioFrame):
+            return
+
+        data = frame.data
+        content = io.BufferedRandom(io.BytesIO())
+        ww = wave.open(content, "wb")
+        ww.setnchannels(1)
+        ww.setsampwidth(2)
+        ww.setframerate(self._frame_rate)
+        ww.writeframesraw(data)
+        ww.close()
+        content.seek(0)
+        text = await self.run_stt(content)
+        yield TranscriptionQueueFrame(text, "", str(time.time()))
+
+
+class VisionService(AIService):
+    def __init__(self):
+        super().__init__()
+
+    # Runs the vision model on an image with the given prompt.
+    # TODO-CB: return type
+    @abstractmethod
+    async def run_vision(self, prompt: str, image: bytes):
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, VisionFrame):
+            async for frame in self.run_vision(frame.prompt, frame.image):
+                self.logger.debug(
+                    f"VisionService yielding frame: {frame}")
+                yield frame
+            yield LLMResponseEndFrame()
+        else:
+            yield frame
+
+
+class FrameLogger(AIService):
+    def __init__(self, prefix="Frame", **kwargs):
+        super().__init__(**kwargs)
+        self.prefix = prefix
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, AudioFrame):
+            # self.logger.info(f"{self.prefix}: {type(frame)}")
+            pass
+        else:
+            print(f"{self.prefix}: {frame}")
+
+        yield frame
--- a/src/dailyai/services/anthropic_llm_service.py
+++ b/src/dailyai/services/anthropic_llm_service.py
@@ -0,0 +1,39 @@
+import asyncio
+import os
+from typing import AsyncGenerator
+from anthropic import AsyncAnthropic
+from dailyai.pipeline.frames import Frame, LLMMessagesQueueFrame, TextFrame
+
+from dailyai.services.ai_services import LLMService
+
+
+class AnthropicLLMService(LLMService):
+
+    def __init__(
+            self,
+            api_key,
+            model="claude-3-opus-20240229",
+            max_tokens=1024):
+        super().__init__()
+        self.client = AsyncAnthropic(api_key=api_key)
+        self.model = model
+        self.max_tokens = max_tokens
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if not isinstance(frame, LLMMessagesQueueFrame):
+            yield frame
+
+        stream = await self.client.messages.create(
+            max_tokens=self.max_tokens,
+            messages=[
+                {
+                    "role": "user",
+                    "content": "Hello, Claude",
+                }
+            ],
+            model=self.model,
+            stream=True,
+        )
+        async for event in stream:
+            if event.type == "content_block_delta":
+                yield TextFrame(event.delta.text)
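Reviewer note: the streaming loop in `AnthropicLLMService.process_frame` forwards only `content_block_delta` events and drops every other event type. The sketch below reproduces that filter without touching the network. `FakeDelta`, `FakeEvent`, `fake_stream`, `text_chunks` and `collect_text` are hypothetical stand-ins, not part of the Anthropic SDK; the event shape is an assumption modeled only on the two fields the diff reads (`event.type` and `event.delta.text`).

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Optional


@dataclass
class FakeDelta:
    # Hypothetical stand-in for the delta payload of a streamed event.
    text: str


@dataclass
class FakeEvent:
    # Hypothetical stand-in for a streamed event; only `type` and `delta`
    # are modeled because those are the fields the service reads.
    type: str
    delta: Optional[FakeDelta] = None


async def fake_stream() -> AsyncGenerator[FakeEvent, None]:
    # Emit a mix of event types, as a streaming response might.
    yield FakeEvent("message_start")
    yield FakeEvent("content_block_delta", FakeDelta("Hel"))
    yield FakeEvent("content_block_delta", FakeDelta("lo"))
    yield FakeEvent("message_stop")


async def text_chunks(stream) -> AsyncGenerator[str, None]:
    # Keep only the text-bearing events, mirroring the loop in the diff.
    async for event in stream:
        if event.type == "content_block_delta":
            yield event.delta.text


async def collect_text() -> List[str]:
    return [chunk async for chunk in text_chunks(fake_stream())]


if __name__ == "__main__":
    print(asyncio.run(collect_text()))
```

In the service itself each chunk would be wrapped in a `TextFrame` rather than collected into a list.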
--- a/src/dailyai/services/azure_ai_services.py
+++ b/src/dailyai/services/azure_ai_services.py
@@ -2,6 +2,7 @@ import aiohttp
 import asyncio
 import io
 import json
+import time
 from openai import AsyncAzureOpenAI

 import os
@@ -13,32 +14,38 @@ from dailyai.services.ai_services import LLMService, TTSService, ImageGenService
 from PIL import Image

 # See .env.example for Azure configuration needed
-from azure.cognitiveservices.speech import SpeechSynthesizer, SpeechConfig, ResultReason, CancellationReason
+from azure.cognitiveservices.speech import (
+    SpeechSynthesizer,
+    SpeechConfig,
+    ResultReason,
+    CancellationReason,
+)
+
+from dailyai.services.openai_api_llm_service import BaseOpenAILLMService
+

 class AzureTTSService(TTSService):
-    def __init__(self, speech_key=None, speech_region=None):
+    def __init__(self, *, api_key, region, voice="en-US-SaraNeural"):
        super().__init__()

-        speech_key = speech_key or os.getenv("AZURE_SPEECH_SERVICE_KEY")
-        speech_region = speech_region or os.getenv("AZURE_SPEECH_SERVICE_REGION")
-
-        self.speech_config = SpeechConfig(subscription=speech_key, region=speech_region)
-        self.speech_synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=None)
+        self.speech_config = SpeechConfig(subscription=api_key, region=region)
+        self.speech_synthesizer = SpeechSynthesizer(
+            speech_config=self.speech_config, audio_config=None
+        )
+        self._voice = voice

    async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
        self.logger.info("Running azure tts")
-        ssml = "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' " \
-           "xmlns:mstts='http://www.w3.org/2001/mstts'>" \
-           "<voice name='en-US-SaraNeural'>" \
-           "<mstts:silence type='Sentenceboundary' value='20ms' />" \
-           "<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>" \
-           "<prosody rate='1.05'>" \
-           f"{sentence}" \
-           "</prosody></mstts:express-as></voice></speak> "
-        try:
-            result = await asyncio.to_thread(self.speech_synthesizer.speak_ssml, (ssml))
-        except Exception as e:
-            self.logger.error("Error in azure tts", e)
+        ssml = (
+            "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' "
+            "xmlns:mstts='http://www.w3.org/2001/mstts'>"
+            f"<voice name='{self._voice}'>"
+            "<mstts:silence type='Sentenceboundary' value='20ms' />"
+            "<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>"
+            "<prosody rate='1.05'>"
+            f"{sentence}"
+            "</prosody></mstts:express-as></voice></speak> ")
+        result = await asyncio.to_thread(self.speech_synthesizer.speak_ssml, (ssml))
        self.logger.info("Got azure tts result")
        if result.reason == ResultReason.SynthesizingAudioCompleted:
            self.logger.info("Returning result")
@@ -46,130 +53,96 @@ class AzureTTSService(TTSService):
            yield result.audio_data[44:]
        elif result.reason == ResultReason.Canceled:
            cancellation_details = result.cancellation_details
-            self.logger.info("Speech synthesis canceled: {}".format(cancellation_details.reason))
+            self.logger.info(
+                "Speech synthesis canceled: {}".format(
+                    cancellation_details.reason))
            if cancellation_details.reason == CancellationReason.Error:
-                self.logger.info("Error details: {}".format(cancellation_details.error_details))
+                self.logger.info(
+                    "Error details: {}".format(
+                        cancellation_details.error_details))

-class AzureLLMService(LLMService):
-    def __init__(self, api_key=None, azure_endpoint=None, api_version=None, model=None):
-        super().__init__()
-        api_key = api_key or os.getenv("AZURE_CHATGPT_KEY")

-        azure_endpoint = azure_endpoint or os.getenv("AZURE_CHATGPT_ENDPOINT")
-        if not azure_endpoint:
-            raise Exception("No azure endpoint specified for Azure LLM, please set AZURE_CHATGPT_ENDPOINT in the environment or pass it to the AzureLLMService constructor")
+class AzureLLMService(BaseOpenAILLMService):
+    def __init__(
+            self,
+            *,
+            api_key,
+            endpoint,
+            api_version="2023-12-01-preview",
+            model):
+        self._endpoint = endpoint
+        self._api_version = api_version

-        model: str | None = model or os.getenv("AZURE_CHATGPT_DEPLOYMENT_ID")
-        if not model:
-            raise Exception("No model specified for Azure LLM, please set AZURE_CHATGPT_DEPLOYMENT_ID in the environment or pass it to the AzureLLMService constructor")
-        self.model: str = model
+        super().__init__(api_key=api_key, model=model)
+        self._model: str = model

-        api_version = api_version or "2023-12-01-preview"
-        self.client = AsyncAzureOpenAI(
+    def create_client(self, api_key=None, base_url=None):
+        self._client = AsyncAzureOpenAI(
            api_key=api_key,
-            azure_endpoint=azure_endpoint,
-            api_version=api_version,
+            azure_endpoint=self._endpoint,
+            api_version=self._api_version,
        )

-    async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
-        messages_for_log = json.dumps(messages)
-        self.logger.debug(f"Generating chat via azure: {messages_for_log}")
-
-        chunks = await self.client.chat.completions.create(model=self.model, stream=True, messages=messages)
-        async for chunk in chunks:
-            if len(chunk.choices) == 0:
-                continue
-
-            if chunk.choices[0].delta.content:
-                yield chunk.choices[0].delta.content
-
-    async def run_llm(self, messages) -> str | None:
-        messages_for_log = json.dumps(messages)
-        self.logger.debug(f"Generating chat via azure: {messages_for_log}")
-
-        response = await self.client.chat.completions.create(model=self.model, stream=False, messages=messages)
-        if response and len(response.choices) > 0:
-            return response.choices[0].message.content
-        else:
-            return None

 class AzureImageGenServiceREST(ImageGenService):

-    def __init__(self, image_size:str, api_key=None, azure_endpoint=None, api_version=None, model=None):
+    def __init__(
+        self,
+        *,
+        api_version="2023-06-01-preview",
+        image_size: str,
+        aiohttp_session: aiohttp.ClientSession,
+        api_key,
+        endpoint,
+        model,
+    ):
        super().__init__(image_size=image_size)
-        self.api_key = api_key or os.getenv("AZURE_DALLE_KEY")
-        self.azure_endpoint = azure_endpoint or os.getenv("AZURE_DALLE_ENDPOINT")
-        self.api_version = api_version or "2023-06-01-preview"
-        self.model = model or os.getenv("AZURE_DALLE_DEPLOYMENT_ID")
+
+        self._api_key = api_key
+        self._azure_endpoint = endpoint
+        self._api_version = api_version
+        self._model = model
+        self._aiohttp_session = aiohttp_session

    async def run_image_gen(self, sentence) -> tuple[str, bytes]:
-        # TODO hoist the session to app-level
-        async with aiohttp.ClientSession() as session:
-            url = f"{self.azure_endpoint}openai/images/generations:submit?api-version={self.api_version}"
-            headers= { "api-key": self.api_key, "Content-Type": "application/json" }
-            body = {
-                # Enter your prompt text here
-                "prompt": sentence,
-                "size": self.image_size,
-                "n": 1,
-            }
-            async with session.post(url, headers=headers, json=body) as submission:
-                operation_location = submission.headers['operation-location']
+        url = f"{self._azure_endpoint}openai/images/generations:submit?api-version={self._api_version}"
+        headers = {
+            "api-key": self._api_key,
+            "Content-Type": "application/json"}
+        body = {
+            # Enter your prompt text here
+            "prompt": sentence,
+            "size": self.image_size,
+            "n": 1,
+        }
+        async with self._aiohttp_session.post(
+            url, headers=headers, json=body
+        ) as submission:
+            # We never get past this line, because this header isn't
+            # defined on a 429 response, but something is eating our
+            # exceptions!
+            operation_location = submission.headers["operation-location"]
+            status = ""
+            attempts_left = 120
+            json_response = None
+            while status != "succeeded":
+                attempts_left -= 1
+                if attempts_left == 0:
+                    raise Exception("Image generation timed out")

-                status = ""
-                attempts_left = 120
-                json_response = None
-                while status != "succeeded":
-                    attempts_left -= 1
-                    if attempts_left == 0:
-                        raise Exception("Image generation timed out")
+                await asyncio.sleep(1)
+                async with self._aiohttp_session.get(
+                    operation_location, headers=headers
+                ) as response:
+                    json_response = await response.json()
+                status = json_response["status"]

-                    await asyncio.sleep(1)
-                    response = await session.get(operation_location, headers=headers)
-                    json_response = await response.json()
-                    status = json_response["status"]
-
-                image_url = json_response["result"]["data"][0]["url"] if json_response else None
-                if not image_url:
-                    raise Exception("Image generation failed")
-
-                # Load the image from the url
-                async with session.get(image_url) as response:
-                    image_stream = io.BytesIO(await response.content.read())
-                    image = Image.open(image_stream)
-                    return (image_url, image.tobytes())
-
-
-class AzureImageGenService(ImageGenService):
-
-    def __init__(self, api_key=None, azure_endpoint=None, api_version=None, model=None):
-        super().__init__()
-
-        api_key = api_key or os.getenv("AZURE_DALLE_KEY")
-        azure_endpoint = azure_endpoint or os.getenv("AZURE_DALLE_ENDPOINT")
-        api_version = api_version or "2023-06-01-preview"
-        self.model = model or os.getenv("AZURE_DALLE_DEPLOYMENT_ID")
-
-        self.client = AzureOpenAI(
-            api_key=api_key,
-            azure_endpoint=azure_endpoint,
-            api_version=api_version,
-        )
-
-    async def run_image_gen(self, sentence) -> tuple[str, bytes]:
-        self.logger.info("Generating azure image", sentence)
-
-        image = self.client.images.generate(
-            model=self.model,
-            prompt=sentence,
-            n=1,
-            size=self.image_size,
-        )
-
-        url = image["data"][0]["url"]
-        response = requests.get(url)
-
-        dalle_stream = io.BytesIO(response.content)
-        dalle_im = Image.open(dalle_stream.tobytes())
-
-        return (url, dalle_im)
+            image_url = (
+                json_response["result"]["data"][0]["url"] if json_response else None)
+            if not image_url:
+                raise Exception("Image generation failed")
+            # Load the image from the url
+            async with self._aiohttp_session.get(image_url) as response:
+                image_stream = io.BytesIO(await response.content.read())
+                image = Image.open(image_stream)
+                return (image_url, image.tobytes())
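Reviewer note: `run_image_gen` polls the operation URL once per second until the status is `succeeded` or the attempt budget runs out. A minimal sketch of that bounded-polling pattern, separated from aiohttp so it can be exercised on its own, might look like the following. `poll_until_succeeded` and `make_fake_fetch` are hypothetical names, and treating `"failed"` as a terminal status is an assumption; the diff only checks for `"succeeded"`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def poll_until_succeeded(
    fetch_status: Callable[[], Awaitable[Dict]],
    max_attempts: int = 120,
    delay_s: float = 1.0,
) -> Dict:
    # Poll at most `max_attempts` times, sleeping before each request.
    for _ in range(max_attempts):
        await asyncio.sleep(delay_s)
        response = await fetch_status()
        status = response.get("status")
        if status == "succeeded":
            return response
        # Assumption: a "failed" status is terminal, so stop early.
        if status == "failed":
            raise RuntimeError("operation reported failure")
    raise TimeoutError("operation did not finish in time")


def make_fake_fetch(statuses: List[str]) -> Callable[[], Awaitable[Dict]]:
    # Hypothetical test helper: returns the given statuses one per call.
    remaining = iter(statuses)

    async def fetch() -> Dict:
        return {"status": next(remaining)}

    return fetch


if __name__ == "__main__":
    fake = make_fake_fetch(["running", "running", "succeeded"])
    print(asyncio.run(poll_until_succeeded(fake, delay_s=0)))
```

Raising distinct exception types for timeout and failure lets a caller decide whether a retry is worthwhile.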
--- a/src/dailyai/services/base_transport_service.py
+++ b/src/dailyai/services/base_transport_service.py
@@ -0,0 +1,515 @@
+from abc import abstractmethod
+import asyncio
+import itertools
+import logging
+import numpy as np
+import pyaudio
+import torch
+import queue
+import threading
+import time
+from typing import Any, AsyncGenerator
+from enum import Enum
+from dailyai.pipeline.frame_processor import FrameProcessor
+
+from dailyai.pipeline.frames import (
+    SendAppMessageFrame,
+    AudioFrame,
+    EndFrame,
+    ImageFrame,
+    Frame,
+    PipelineStartedFrame,
+    SpriteFrame,
+    StartFrame,
+    TextFrame,
+    UserStartedSpeakingFrame,
+    UserStoppedSpeakingFrame,
+    RequestVideoImageFrame,
+    TelestratorImageFrame
+)
+from dailyai.pipeline.pipeline import Pipeline
+from dailyai.services.ai_services import TTSService
+
+torch.set_num_threads(1)
+
+model, utils = torch.hub.load(
+    repo_or_dir="snakers4/silero-vad", model="silero_vad", force_reload=False
+)
+
+(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils
+
+# Taken from utils_vad.py
+
+
+def validate(model, inputs: torch.Tensor):
+    with torch.no_grad():
+        outs = model(inputs)
+    return outs
+
+
+# Provided by Alexander Veysov
+
+
+def int2float(sound):
+    abs_max = np.abs(sound).max()
+    sound = sound.astype("float32")
+    if abs_max > 0:
+        sound *= 1 / 32768
+    sound = sound.squeeze()  # depends on the use case
+    return sound
+
+
+FORMAT = pyaudio.paInt16
+CHANNELS = 1
+SAMPLE_RATE = 16000
+CHUNK = int(SAMPLE_RATE / 10)
+
+audio = pyaudio.PyAudio()
+
+
+class VADState(Enum):
+    QUIET = 1
+    STARTING = 2
+    SPEAKING = 3
+    STOPPING = 4
+
+
+class BaseTransportService:
+
+    def __init__(
+        self,
+        **kwargs,
+    ) -> None:
+        self._mic_enabled = kwargs.get("mic_enabled") or False
+        self._mic_sample_rate = kwargs.get("mic_sample_rate") or 16000
+        self._camera_enabled = kwargs.get("camera_enabled") or False
+        self._camera_width = kwargs.get("camera_width") or 1024
+        self._camera_height = kwargs.get("camera_height") or 768
+        self._speaker_enabled = kwargs.get("speaker_enabled") or False
+        self._speaker_sample_rate = kwargs.get("speaker_sample_rate") or 16000
+        self._fps = kwargs.get("fps") or 8
+        self._vad_start_s = kwargs.get("vad_start_s") or 0.2
+        self._vad_stop_s = kwargs.get("vad_stop_s") or 0.8
+        self._context = kwargs.get("context") or []
+        self._vad_enabled = kwargs.get("vad_enabled") or False
+        self._receive_video = kwargs.get("receive_video") or False
+        self._receive_video_fps = kwargs.get("receive_video_fps") or 0.0
+        self._participant_frame_times = {}
+        if self._vad_enabled and self._speaker_enabled:
+            raise Exception(
+                "Sorry, you can't use speaker_enabled and vad_enabled at the same time. Please set one to False."
+            )
+
+        self._vad_samples = 1536
+        vad_frame_s = self._vad_samples / SAMPLE_RATE
+        self._vad_start_frames = round(self._vad_start_s / vad_frame_s)
+        self._vad_stop_frames = round(self._vad_stop_s / vad_frame_s)
+        self._vad_starting_count = 0
+        self._vad_stopping_count = 0
+        self._vad_state = VADState.QUIET
+        self._user_is_speaking = False
+
+        duration_minutes = kwargs.get("duration_minutes") or 10
+        self._expiration = time.time() + duration_minutes * 60
+
+        self.send_queue = asyncio.Queue()
+        self.receive_queue = asyncio.Queue()
+
+        self.completed_queue = asyncio.Queue()
+
+        self._threadsafe_send_queue = queue.Queue()
+
+        self._images = None
+
+        try:
+            self._loop: asyncio.AbstractEventLoop | None = asyncio.get_running_loop()
+        except RuntimeError:
+            self._loop = None
+
+        self._stop_threads = threading.Event()
+        self._is_interrupted = threading.Event()
+
+        self._logger: logging.Logger = logging.getLogger()
+
+    async def run(self, pipeline: Pipeline | None = None, override_pipeline_source_queue=True):
+        self._prerun()
+
+        async_output_queue_marshal_task = asyncio.create_task(
+            self._marshal_frames())
+
+        self._camera_thread = threading.Thread(
+            target=self._run_camera, daemon=True)
+        self._camera_thread.start()
+
+        self._frame_consumer_thread = threading.Thread(
+            target=self._frame_consumer, daemon=True
+        )
+        self._frame_consumer_thread.start()
+
+        if self._speaker_enabled:
+            self._receive_audio_thread = threading.Thread(
+                target=self._receive_audio, daemon=True
+            )
+            self._receive_audio_thread.start()
+
+        if self._vad_enabled:
+            self._vad_thread = threading.Thread(target=self._vad, daemon=True)
+            self._vad_thread.start()
+
+        pipeline_task = None
+        if pipeline:
+            pipeline_task = asyncio.create_task(
+                self.run_pipeline(pipeline, override_pipeline_source_queue)
+            )
+
+        try:
+            while time.time() < self._expiration and not self._stop_threads.is_set():
+                await asyncio.sleep(1)
+        except Exception as e:
+            self._logger.error(f"Exception {e}")
+            raise e
+        finally:
+            # Do anything that must be done to clean up
+            self._post_run()
+
+        self._stop_threads.set()
+
+        if pipeline_task:
+            pipeline_task.cancel()
+
+        await self.send_queue.put(EndFrame())
+
+        await async_output_queue_marshal_task
+        self._frame_consumer_thread.join()
+
+        if self._speaker_enabled:
+            self._receive_audio_thread.join()
+
+        if self._vad_enabled:
+            self._vad_thread.join()
+
+    async def run_pipeline(self, pipeline: Pipeline, override_pipeline_source_queue=True):
+        pipeline.set_sink(self.send_queue)
+        if override_pipeline_source_queue:
+            pipeline.set_source(self.receive_queue)
+        await pipeline.run_pipeline()
+
+    async def run_interruptible_pipeline(
+        self,
+        pipeline: Pipeline,
+        allow_interruptions=True,
+        pre_processor=None,
+        post_processor: FrameProcessor | None = None,
+    ):
+        # The pipeline reads from a private queue fed by the loop below.
+        source_queue = asyncio.Queue()
+        pipeline.set_source(source_queue)
+        pipeline.set_sink(self.send_queue)
+        pipeline_task = asyncio.create_task(pipeline.run_pipeline())
+
+        async def yield_frame(frame: Frame) -> AsyncGenerator[Frame, None]:
+            yield frame
+
+        async def post_process(post_processor: FrameProcessor):
+            while True:
+                frame = await self.completed_queue.get()
+
+                # We ignore the output of the post_processor's process frame;
+                # this is called to update the post-processor's state.
+                async for frame in post_processor.process_frame(frame):
+                    pass
+
+                if isinstance(frame, EndFrame):
+                    break
+
+        if post_processor:
+            post_process_task = asyncio.create_task(
+                post_process(post_processor))
+
+        started = False
+
+        async for frame in self.get_receive_frames():
+            if isinstance(frame, UserStartedSpeakingFrame):
+                pipeline_task.cancel()
+                self.interrupt()
+                pipeline_task = asyncio.create_task(pipeline.run_pipeline())
+                started = False
+
+            if not started:
+                await self.send_queue.put(StartFrame())
+                started = True
+            if pre_processor:
+                frame_generator = pre_processor.process_frame(frame)
+            else:
+                frame_generator = yield_frame(frame)
+
+            async for frame in frame_generator:
+                await source_queue.put(frame)
+
+            if isinstance(frame, EndFrame):
+                break
+        extra_tasks = [post_process_task] if post_processor else []
+        await asyncio.gather(pipeline_task, *extra_tasks)
+
+    async def say(self, text: str, tts: TTSService):
+        """Say a phrase. Use with caution; this bypasses any running pipelines."""
+        async for frame in tts.process_frame(TextFrame(text)):
+            await self.send_queue.put(frame)
+
+    def _post_run(self):
+        # Note that this function must be idempotent! It can be called multiple times
+        # if, for example, a keyboard interrupt occurs.
+        pass
+
+    def stop(self):
+        self._stop_threads.set()
+
+    async def stop_when_done(self):
+        await self._wait_for_send_queue_to_empty()
+        self.stop()
+
+    async def _wait_for_send_queue_to_empty(self):
+        await self.send_queue.join()
+        await asyncio.to_thread(self._threadsafe_send_queue.join)
+
+    @abstractmethod
+    def write_frame_to_camera(self, frame: bytes):
+        pass
+
+    @abstractmethod
+    def write_frame_to_mic(self, frame: bytes):
+        pass
+
+    @abstractmethod
+    def read_audio_frames(self, desired_frame_count):
+        return bytes()
+
+    @abstractmethod
+    def _prerun(self):
+        pass
+
+    def _vad(self):
+        # CB: Starting silero VAD stuff
+        # TODO-CB: Probably need to force virtual speaker creation if we're
+        # going to build this in?
+        # TODO-CB: pyaudio installation
+        while not self._stop_threads.is_set():
+            audio_chunk = self.read_audio_frames(self._vad_samples)
+            audio_int16 = np.frombuffer(audio_chunk, np.int16)
+            audio_float32 = int2float(audio_int16)
+            new_confidence = model(
+                torch.from_numpy(audio_float32), SAMPLE_RATE).item()
+            speaking = new_confidence > 0.5
+
+            if speaking:
+                match self._vad_state:
+                    case VADState.QUIET:
+                        self._vad_state = VADState.STARTING
+                        self._vad_starting_count = 1
+                    case VADState.STARTING:
+                        self._vad_starting_count += 1
+                    case VADState.STOPPING:
+                        self._vad_state = VADState.SPEAKING
+                        self._vad_stopping_count = 0
+            else:
+                match self._vad_state:
+                    case VADState.STARTING:
+                        self._vad_state = VADState.QUIET
+                        self._vad_starting_count = 0
+                    case VADState.SPEAKING:
+                        self._vad_state = VADState.STOPPING
+                        self._vad_stopping_count = 1
+                    case VADState.STOPPING:
+                        self._vad_stopping_count += 1
+
+            if (
+                self._vad_state == VADState.STARTING
+                and self._vad_starting_count >= self._vad_start_frames
+            ):
+                if self._loop:
+                    asyncio.run_coroutine_threadsafe(
+                        self.receive_queue.put(
+                            UserStartedSpeakingFrame()), self._loop)
+                # self.interrupt()
+                self._vad_state = VADState.SPEAKING
+                self._vad_starting_count = 0
+            if (
+                self._vad_state == VADState.STOPPING
+                and self._vad_stopping_count >= self._vad_stop_frames
+            ):
+                if self._loop:
+                    asyncio.run_coroutine_threadsafe(
+                        self.receive_queue.put(
+                            UserStoppedSpeakingFrame()), self._loop)
+                self._vad_state = VADState.QUIET
+                self._vad_stopping_count = 0
+
+    async def _marshal_frames(self):
+        while True:
+            frame: Frame | list = await self.send_queue.get()
+            self._threadsafe_send_queue.put(frame)
+            self.send_queue.task_done()
+            if isinstance(frame, EndFrame):
+                break
+
+    def interrupt(self):
+        self._logger.debug("### Interrupting")
+        self._is_interrupted.set()
+
+    async def get_receive_frames(self) -> AsyncGenerator[Frame, None]:
+        while True:
+            frame = await self.receive_queue.get()
+            yield frame
+            if isinstance(frame, EndFrame):
+                break
+
+    def _receive_audio(self):
+        if not self._loop:
+            self._logger.error("No loop available for audio thread")
+            return
+
+        seconds = 1
+        desired_frame_count = self._speaker_sample_rate * seconds
+        while not self._stop_threads.is_set():
+            buffer = self.read_audio_frames(desired_frame_count)
+            if len(buffer) > 0:
+                frame = AudioFrame(buffer)
+                asyncio.run_coroutine_threadsafe(
+                    self.receive_queue.put(frame), self._loop
+                )
+
+        asyncio.run_coroutine_threadsafe(
+            self.receive_queue.put(
+                EndFrame()), self._loop)
+
+    def _set_image(self, image: bytes):
+        self._images = itertools.cycle([image])
+
+    def _set_images(self, images: list[bytes], start_frame=0):
+        self._images = itertools.cycle(images)
+
+    def send_app_message(self, message: Any, participantId: str | None):
+        """ Child classes should override this to send a custom message to the room. """
+        pass
+
+    def _run_camera(self):
+        try:
+            while not self._stop_threads.is_set():
+                if self._images:
+                    this_frame = next(self._images)
+                    self.write_frame_to_camera(this_frame)
+
+                time.sleep(1.0 / self._fps)
+        except Exception as e:
+            self._logger.error(f"Exception {e} in camera thread.")
+            raise e
+
+    def _frame_consumer(self):
+        self._logger.info("🎬 Starting frame consumer thread")
+        b = bytearray()
+        smallest_write_size = 3200
+        largest_write_size = 8000
+        while True:
+            try:
+                frames_or_frame: Frame | list[Frame] = self._threadsafe_send_queue.get(
+                )
+                if (
+                    isinstance(frames_or_frame, AudioFrame)
+                    and len(frames_or_frame.data) > largest_write_size
+                ):
+                    # subdivide large audio frames to enable interruption
+                    frames = []
+                    for i in range(0, len(frames_or_frame.data),
+                                   largest_write_size):
+                        frames.append(AudioFrame(
+                            frames_or_frame.data[i: i + largest_write_size]))
+                elif isinstance(frames_or_frame, Frame):
+                    frames: list[Frame] = [frames_or_frame]
+                elif isinstance(frames_or_frame, list):
+                    frames: list[Frame] = frames_or_frame
+                else:
+                    raise Exception("Unknown type in output queue")
+
+                for frame in frames:
+                    if isinstance(frame, EndFrame):
+                        self._logger.info("Stopping frame consumer thread")
+                        self._stop_threads.set()
+                        self._threadsafe_send_queue.task_done()
+                        if self._loop:
+                            asyncio.run_coroutine_threadsafe(
+                                self.completed_queue.put(frame), self._loop
+                            )
+                        return
+
+                    # if interrupted, we just pull frames off the queue and
+                    # discard them
+                    if not self._is_interrupted.is_set():
+                        if frame:
+
+                            if isinstance(frame, AudioFrame):
+                                chunk = frame.data
+
+                                b.extend(chunk)
+                                truncated_length: int = len(b) - (
+                                    len(b) % smallest_write_size
+                                )
+                                if truncated_length:
+                                    self.write_frame_to_mic(
+                                        bytes(b[:truncated_length]))
+                                    b = b[truncated_length:]
+                            elif isinstance(frame, TelestratorImageFrame):
+                                self._set_image(frame.image)
+                                asyncio.run_coroutine_threadsafe(
+                                    self.receive_queue.put(frame),
+                                    self._loop,
+                                )
+                            elif isinstance(frame, ImageFrame):
+                                self._set_image(frame.image)
+                            elif isinstance(frame, SpriteFrame):
+                                self._set_images(frame.images)
+                            elif isinstance(frame, SendAppMessageFrame):
+                                self.send_app_message(
+                                    frame.message, frame.participantId)
+                            elif isinstance(frame, RequestVideoImageFrame):
+                                # removing one or all participant IDs from _participant_frame_times
+                                # will cause the transport to send the next available frame from
+                                # that participant
+                                if frame.participantId:
+                                    self._participant_frame_times.pop(
+                                        frame.participantId, None)
+                                else:
+                                    self._participant_frame_times.clear()
+                        elif len(b):
+                            self.write_frame_to_mic(bytes(b))
+                            b = bytearray()
+                    else:
+                        # if there are leftover audio bytes, write them now; failing to do so
+                        # can cause static in the audio stream.
+                        if len(b):
+                            truncated_length = len(b) - (len(b) % 160)
+                            self.write_frame_to_mic(
+                                bytes(b[:truncated_length]))
+                            b = bytearray()
+
+                        if isinstance(frame, StartFrame):
+                            self._is_interrupted.clear()
+                            asyncio.run_coroutine_threadsafe(
+                                self.receive_queue.put(PipelineStartedFrame()),
+                                self._loop,
+                            )
+
+                    if self._loop:
+                        asyncio.run_coroutine_threadsafe(
+                            self.completed_queue.put(frame), self._loop
+                        )
+
+                self._threadsafe_send_queue.task_done()
+            except queue.Empty:
+                if len(b):
+                    self.write_frame_to_mic(bytes(b))
+
+                b = bytearray()
+            except Exception as e:
+                self._logger.error(
+                    f"Exception in frame_consumer: {e}, {len(b)}")
+                raise e
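Reviewer note: the `_vad` thread debounces the per-chunk speech confidence through a four-state machine, so a single noisy chunk neither starts nor stops a speaking turn. The sketch below isolates that state machine as a pure class so its transitions can be checked without audio hardware or the silero model. `VADDebouncer` and the `"started"`/`"stopped"` return values are hypothetical; the transitions are intended to follow the ones in `_vad`, with the two counters collapsed into one since only one is active at a time.

```python
from enum import Enum
from typing import Optional


class VADState(Enum):
    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4


class VADDebouncer:
    # Hypothetical helper mirroring the transitions in `_vad`.
    def __init__(self, start_frames: int, stop_frames: int) -> None:
        self.start_frames = start_frames
        self.stop_frames = stop_frames
        self.state = VADState.QUIET
        self.count = 0

    def update(self, speaking: bool) -> Optional[str]:
        # First, move between states based on this chunk's verdict.
        if speaking:
            if self.state is VADState.QUIET:
                self.state, self.count = VADState.STARTING, 1
            elif self.state is VADState.STARTING:
                self.count += 1
            elif self.state is VADState.STOPPING:
                self.state, self.count = VADState.SPEAKING, 0
        else:
            if self.state is VADState.STARTING:
                self.state, self.count = VADState.QUIET, 0
            elif self.state is VADState.SPEAKING:
                self.state, self.count = VADState.STOPPING, 1
            elif self.state is VADState.STOPPING:
                self.count += 1

        # Then, commit a transition once enough consecutive chunks agree.
        if self.state is VADState.STARTING and self.count >= self.start_frames:
            self.state, self.count = VADState.SPEAKING, 0
            return "started"
        if self.state is VADState.STOPPING and self.count >= self.stop_frames:
            self.state, self.count = VADState.QUIET, 0
            return "stopped"
        return None


if __name__ == "__main__":
    vad = VADDebouncer(start_frames=2, stop_frames=3)
    verdicts = [True, False, True, True, False, True, False, False, False]
    print([vad.update(v) for v in verdicts])
```

In the transport, the `"started"` and `"stopped"` results correspond to putting `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` on the receive queue.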
--- a/src/dailyai/services/daily_transport_service.py
+++ b/src/dailyai/services/daily_transport_service.py
@@ -1,15 +1,22 @@
 import asyncio
 import inspect
 import logging
+import signal
 import time
+import threading
 import types

 from functools import partial
-from queue import Queue, Empty
+from typing import Any

-from dailyai.queue_frame import QueueFrame, FrameType
+from dailyai.pipeline.frames import (
+    ReceivedAppMessageFrame,
+    TranscriptionQueueFrame,
+    VideoImageFrame,
+    TelestratorImageFrame
+)

-from threading import Thread, Event, Timer
+from threading import Event

 from daily import (
    EventHandler,
@@ -20,43 +27,44 @@ from daily import (
    VirtualSpeakerDevice,
 )

-class DailyTransportService(EventHandler):
+from dailyai.services.base_transport_service import BaseTransportService
+
+
+class DailyTransportService(BaseTransportService, EventHandler):
+    _daily_initialized = False
+    _lock = threading.Lock()
+
+    _speaker_enabled: bool
+    _speaker_sample_rate: int
+    _vad_enabled: bool
+
+    # This is necessary to override EventHandler's __new__ method.
+    def __new__(cls, *args, **kwargs):
+        return super().__new__(cls)
+
    def __init__(
        self,
        room_url: str,
        token: str | None,
        bot_name: str,
-        duration: float = 10,
+        min_others_count: int = 1,
+        start_transcription: bool = False,
+        **kwargs,
    ):
-        super().__init__()
-        self.bot_name: str = bot_name
-        self.room_url: str = room_url
-        self.token: str | None = token
-        self.duration: float = duration
-        self.expiration = time.time() + duration * 60
+        # This will call BaseTransportService.__init__ method, not EventHandler
+        super().__init__(**kwargs)

-        # This queue is used to marshal frames from the async send queue to the thread that emits audio & video.
-        # We need this to maintain the asynchronous behavior of asyncio queues -- to give async functions
-        # a chance to run while waiting for queue items -- but also to maintain thread safety and have a threaded
-        # handler to send frames, to ensure that sending isn't subject to pauses in the async thread.
-        self.threadsafe_send_queue = Queue()
+        self._room_url: str = room_url
+        self._bot_name: str = bot_name
+        self._token: str | None = token
+        self._min_others_count = min_others_count
+        self._start_transcription = start_transcription

-        self.is_interrupted = Event()
-        self.stop_threads = Event()
-        self.story_started = False
-        self.mic_enabled = False
-        self.mic_sample_rate = 16000
-        self.camera_width = 1024
-        self.camera_height = 768
-        self.camera_enabled = False
+        self._is_interrupted = Event()
+        self._stop_threads = Event()

-        self.send_queue = asyncio.Queue()
-        self.receive_queue = asyncio.Queue()
-
-        self.other_participant_has_joined = False
-
-        self.camera_thread = None
-        self.frame_consumer_thread = None
+        self._other_participant_has_joined = False
+        self._my_participant_id = None

        self.transcription_settings = {
            "language": "en",
@@ -70,41 +78,48 @@ class DailyTransportService(EventHandler):
            },
        }

-        self.logger: logging.Logger = logging.getLogger("dailyai")
+        self._logger: logging.Logger = logging.getLogger("dailyai")

-        self.event_handlers = {}
+        self._event_handlers = {}

+    def _patch_method(self, event_name, *args, **kwargs):
        try:
-            self.loop = asyncio.get_running_loop()
-        except RuntimeError:
-            self.loop = None
-
-    def patch_method(self, event_name, *args, **kwargs):
-        try:
-            for handler in self.event_handlers[event_name]:
+            for handler in self._event_handlers[event_name]:
                if inspect.iscoroutinefunction(handler):
-                    if self.loop:
-                        asyncio.run_coroutine_threadsafe(handler(*args, **kwargs), self.loop)
+                    if self._loop:
+                        future = asyncio.run_coroutine_threadsafe(
+                            handler(*args, **kwargs), self._loop)
+
+                        # wait for the coroutine to finish. This will also
+                        # raise any exceptions raised by the coroutine.
+                        future.result()
                    else:
-                        raise Exception("No event loop to run coroutine. In order to use async event handlers, you must run the DailyTransportService in an asyncio event loop.")
+                        raise Exception(
+                            "No event loop to run coroutine. In order to use async event handlers, you must run the DailyTransportService in an asyncio event loop.")
                else:
                    handler(*args, **kwargs)
        except Exception as e:
-            self.logger.error(f"Exception in event handler {event_name}: {e}")
+            self._logger.error(f"Exception in event handler {event_name}: {e}")
+            raise

    def add_event_handler(self, event_name: str, handler):
        if not event_name.startswith("on_"):
-            raise Exception(f"Event handler {event_name} must start with 'on_'")
+            raise Exception(
+                f"Event handler {event_name} must start with 'on_'")

        methods = inspect.getmembers(self, predicate=inspect.ismethod)
        if event_name not in [method[0] for method in methods]:
            raise Exception(f"Event handler {event_name} not found")

-        if not event_name in self.event_handlers:
-            self.event_handlers[event_name] = [getattr(self, event_name), types.MethodType(handler, self)]
-            setattr(self, event_name, partial(self.patch_method, event_name))
+        if event_name not in self._event_handlers:
+            self._event_handlers[event_name] = [
+                getattr(
+                    self, event_name), types.MethodType(
+                    handler, self)]
+            setattr(self, event_name, partial(self._patch_method, event_name))
        else:
-            self.event_handlers[event_name].append(types.MethodType(handler, self))
+            self._event_handlers[event_name].append(
+                types.MethodType(handler, self))

    def event_handler(self, event_name: str):
        def decorator(handler):
@@ -113,245 +128,189 @@ class DailyTransportService(EventHandler):

        return decorator

-    def configure_daily(self):
-        Daily.init()
+    def write_frame_to_camera(self, frame: bytes):
+        self.camera.write_frame(frame)
+
+    def write_frame_to_mic(self, frame: bytes):
+        self.mic.write_frames(frame)
+
+    def send_app_message(self, message: Any, participantId: str | None):
+        self.client.send_app_message(message, participantId)
+
+    def read_audio_frames(self, desired_frame_count):
+        frames = self._speaker.read_frames(desired_frame_count)
+        return frames
+
+    def _prerun(self):
+        # Only initialize Daily once; check the flag under the lock to avoid a race.
+        with DailyTransportService._lock:
+            if not DailyTransportService._daily_initialized:
+                Daily.init()
+                DailyTransportService._daily_initialized = True
        self.client = CallClient(event_handler=self)

-        if self.mic_enabled:
+        if self._mic_enabled:
            self.mic: VirtualMicrophoneDevice = Daily.create_microphone_device(
-                "mic", sample_rate=self.mic_sample_rate, channels=1
+                "mic", sample_rate=self._mic_sample_rate, channels=1
            )

-        if self.camera_enabled:
+        if self._camera_enabled:
            self.camera: VirtualCameraDevice = Daily.create_camera_device(
-                "camera", width=self.camera_width, height=self.camera_height, color_format="RGB"
+                "camera", width=self._camera_width, height=self._camera_height, color_format="RGB")
+
+        if self._speaker_enabled or self._vad_enabled:
+            self._speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
+                "speaker", sample_rate=self._speaker_sample_rate, channels=1
            )
+            Daily.select_speaker_device("speaker")

-        self.speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
-            "speaker", sample_rate=16000, channels=1
-        )
-
-        self.image: bytes | None = None
-        self.camera_thread = Thread(target=self.run_camera, daemon=True)
-        self.camera_thread.start()
-
-        self.logger.info("Starting frame consumer thread")
-        self.frame_consumer_thread = Thread(target=self.frame_consumer, daemon=True)
-        self.frame_consumer_thread.start()
-
-        Daily.select_speaker_device("speaker")
-
-        self.client.set_user_name(self.bot_name)
-        self.client.join(self.room_url, self.token, completion=self.call_joined)
-
-        self.client.update_inputs(
-            {
-                "camera": {
-                    "isEnabled": True,
-                    "settings": {
-                        "deviceId": "camera",
+        self.client.set_user_name(self._bot_name)
+        self.client.join(
+            self._room_url,
+            self._token,
+            completion=self.call_joined,
+            client_settings={
+                "inputs": {
+                    "camera": {
+                        "isEnabled": True,
+                        "settings": {
+                            "deviceId": "camera",
+                        },
                    },
-                },
-                "microphone": {
-                    "isEnabled": True,
-                    "settings": {
-                        "deviceId": "mic",
-                        "customConstraints": {
-                            "autoGainControl": {"exact": False},
-                            "echoCancellation": {"exact": False},
-                            "noiseSuppression": {"exact": False},
+                    "microphone": {
+                        "isEnabled": True,
+                        "settings": {
+                            "deviceId": "mic",
+                            "customConstraints": {
+                                "autoGainControl": {"exact": False},
+                                "echoCancellation": {"exact": False},
+                                "noiseSuppression": {"exact": False},
+                            },
                        },
                    },
                },
-            }
-        )
-
-        self.client.update_publishing(
-            {
-                "camera": {
-                    "sendSettings": {
-                        "maxQuality": "low",
-                        "encodings": {
-                            "low": {
-                                "maxBitrate": 250000,
-                                "scaleResolutionDownBy": 1.333,
-                                "maxFramerate": 8,
-                            }
-                        },
+                "publishing": {
+                    "camera": {
+                        "sendSettings": {
+                            "maxQuality": "low",
+                            "encodings": {
+                                "low": {
+                                    "maxBitrate": 250000,
+                                    "scaleResolutionDownBy": 1.333,
+                                    "maxFramerate": 8,
+                                }
+                            },
+                        }
                    }
-                }
-            }
+                },
+            },
        )
+        self._my_participant_id = self.client.participants()["local"]["id"]

-        if self.token:
+        if not self._receive_video:
+            self.client.update_subscription_profiles({
+                "base": {
+                    "camera": "unsubscribed",
+                }
+            })
+
+        if self._token and self._start_transcription:
            self.client.start_transcription(self.transcription_settings)

-        self.my_participant_id = self.client.participants()["local"]["id"]
+        self.original_sigint_handler = signal.getsignal(signal.SIGINT)
+        signal.signal(signal.SIGINT, self.process_interrupt_handler)

-    async def get_receive_frames(self):
-        while True:
-            frame = await self.receive_queue.get()
-            yield frame
-            if frame.frame_type == FrameType.END_STREAM:
-                break
+    def process_interrupt_handler(self, signum, frame):
+        self._post_run()
+        if callable(self.original_sigint_handler):
+            self.original_sigint_handler(signum, frame)

-    def get_async_send_queue(self):
-        return self.send_queue
+    def _post_run(self):
+        self.client.leave()
+        self.client.release()

-    async def marshal_frames(self):
-        while True:
-            frame: QueueFrame | list = await self.send_queue.get()
-            self.threadsafe_send_queue.put(frame)
-            self.send_queue.task_done()
-            if type(frame) == QueueFrame and frame.frame_type == FrameType.END_STREAM:
-                break
+    def _handle_video_frame(self, participant_id, video_frame):
+        """If receive_video is true, this function is called once for each frame from each participant. We
+         don't need to send every frame to the pipeline, so there are two ways to decide how to send frames:
+         1. Set a greater-than-zero value for receive_video_fps. The transport will track the last send time
+            for each participant and send a new frame once 1/receive_video_fps seconds have elapsed. For
+            example, receive_video_fps=1 yields at most one image per second per participant.
+         2. Set receive_video_fps less than or equal to zero to disable timed frame sending. Then, put a
+            RequestVideoImageFrame in the pipeline to get a new frame for one or all participants. By
+            sending a RequestVideoImageFrame immediately after successfully processing an image, you can
+            ensure you don't end up queueing up frames faster than you can process them.
+            """
+        send_frame = False
+        if participant_id not in self._participant_frame_times:
+            # then it's a new participant; send the first frame
+            send_frame = True
+        elif self._receive_video_fps > 0 and time.time() > self._participant_frame_times[participant_id] + 1.0/self._receive_video_fps:
+            # Then it's an existing participant who is due to send a new frame
+            send_frame = True

-    async def wait_for_send_queue_to_empty(self):
-        await self.send_queue.join()
-        self.threadsafe_send_queue.join()
-
-    async def stop_when_done(self):
-        await self.wait_for_send_queue_to_empty()
-        self.stop()
-
-    async def run(self) -> None:
-        self.configure_daily()
-
-        self.participant_left = False
-
-        async_output_queue_marshal_task = asyncio.create_task(self.marshal_frames())
-
-        try:
-            participant_count: int = len(self.client.participants())
-            self.logger.info(f"{participant_count} participants in room")
-            while time.time() < self.expiration and not self.participant_left and not self.stop_threads.is_set():
-                await asyncio.sleep(1)
-        except Exception as e:
-            self.logger.error(f"Exception {e}")
-        finally:
-            self.client.leave()
-
-        self.stop_threads.set()
-
-        await self.receive_queue.put(QueueFrame(FrameType.END_STREAM, None))
-        await self.send_queue.put(QueueFrame(FrameType.END_STREAM, None))
-        await async_output_queue_marshal_task
-
-        if self.camera_thread and self.camera_thread.is_alive():
-            self.camera_thread.join()
-        if self.frame_consumer_thread and self.frame_consumer_thread.is_alive():
-            self.frame_consumer_thread.join()
-
-    def stop(self):
-        self.stop_threads.set()
+        if send_frame:
+            self._participant_frame_times[participant_id] = time.time()
+            asyncio.run_coroutine_threadsafe(
+                self.receive_queue.put(
+                    VideoImageFrame(participant_id, video_frame)), self._loop)

    def on_first_other_participant_joined(self):
        pass

    def call_joined(self, join_data, client_error):
-        self.logger.info(f"Call_joined: {join_data}, {client_error}")
+        # self._logger.info(f"Call_joined: {join_data}, {client_error}")
+        pass
+
+    def dialout(self, number):
+        self.client.start_dialout({"phoneNumber": number})
+
+    def start_recording(self):
+        self.client.start_recording()

    def on_error(self, error):
-        self.logger.error(f"on_error: {error}")
+        self._logger.error(f"on_error: {error}")

    def on_call_state_updated(self, state):
        pass

    def on_participant_joined(self, participant):
-        if not self.other_participant_has_joined and participant["id"] != self.my_participant_id:
-            self.other_participant_has_joined = True
+        if not self._other_participant_has_joined and participant["id"] != self._my_participant_id:
+            self._other_participant_has_joined = True
            self.on_first_other_participant_joined()
+        if self._receive_video:
+            self.client.set_video_renderer(
+                participant["id"], self._handle_video_frame)

    def on_participant_left(self, participant, reason):
-        if len(self.client.participants()) < 2:
-            self.participant_left = True
-        pass
+        if len(self.client.participants()) < self._min_others_count + 1:
+            self._stop_threads.set()

-    def on_app_message(self, message, sender):
-        pass
+    def on_app_message(self, message: Any, sender: str):
+        if self._loop:
+            frame = ReceivedAppMessageFrame(message, sender)
+            self._logger.debug(f"Received app message: {frame}")
+            asyncio.run_coroutine_threadsafe(
+                self.receive_queue.put(frame), self._loop
+            )

-    def on_transcription_message(self, message:dict):
-        if self.loop:
-            frame = QueueFrame(FrameType.TRANSCRIPTION, message)
-            asyncio.run_coroutine_threadsafe(self.receive_queue.put(frame), self.loop)
-
-    def on_transcription_stopped(self, stopped_by, stopped_by_error):
-        pass
+    def on_transcription_message(self, message: dict):
+        if self._loop:
+            participantId = ""
+            if "participantId" in message:
+                participantId = message["participantId"]
+            elif "session_id" in message:
+                participantId = message["session_id"]
+            if self._my_participant_id and participantId != self._my_participant_id:
+                frame = TranscriptionQueueFrame(
+                    message["text"], participantId, message["timestamp"])
+                asyncio.run_coroutine_threadsafe(
+                    self.receive_queue.put(frame), self._loop)

    def on_transcription_error(self, message):
-        pass
+        self._logger.error(f"Transcription error: {message}")

    def on_transcription_started(self, status):
        pass

-    def set_image(self, image: bytes):
-        self.image: bytes | None = image
-
-    def run_camera(self):
-        try:
-            while not self.stop_threads.is_set():
-                if self.image:
-                    self.camera.write_frame(self.image)
-
-                time.sleep(1.0 / 8)  # 8 fps
-        except Exception as e:
-            self.logger.error(f"Exception {e} in camera thread.")
-
-    def frame_consumer(self):
-        self.logger.info("🎬 Starting frame consumer thread")
-        b = bytearray()
-        smallest_write_size = 3200
-        all_audio_frames = bytearray()
-        while True:
-            try:
-                frames_or_frame: QueueFrame | list[QueueFrame] = self.threadsafe_send_queue.get()
-                if type(frames_or_frame) == QueueFrame:
-                    frames: list[QueueFrame] = [frames_or_frame]
-                elif type(frames_or_frame) == list:
-                    frames: list[QueueFrame] = frames_or_frame
-                else:
-                    raise Exception("Unknown type in output queue")
-
-                for frame in frames:
-                    if frame.frame_type == FrameType.END_STREAM:
-                        self.logger.info("Stopping frame consumer thread")
-                        self.threadsafe_send_queue.task_done()
-                        return
-
-                    # if interrupted, we just pull frames off the queue and discard them
-                    if not self.is_interrupted.is_set():
-                        if frame:
-                            if frame.frame_type == FrameType.AUDIO:
-                                chunk = frame.frame_data
-
-                                all_audio_frames.extend(chunk)
-
-                                b.extend(chunk)
-                                l = len(b) - (len(b) % smallest_write_size)
-                                if l:
-                                    self.mic.write_frames(bytes(b[:l]))
-                                    b = b[l:]
-                            elif frame.frame_type == FrameType.IMAGE:
-                                self.set_image(frame.frame_data)
-                        elif len(b):
-                            self.mic.write_frames(bytes(b))
-                            b = bytearray()
-                    else:
-                        if self.interrupt_time:
-                            self.logger.info(
-                                f"Lag to stop stream after interruption {time.perf_counter() - self.interrupt_time}"
-                            )
-                            self.interrupt_time = None
-
-                        if frame.frame_type == FrameType.START_STREAM:
-                            self.is_interrupted.clear()
-
-                self.threadsafe_send_queue.task_done()
-            except Empty:
-                try:
-                    if len(b):
-                        self.mic.write_frames(bytes(b))
-                except Exception as e:
-                    self.logger.error(f"Exception in frame_consumer: {e}, {len(b)}")
-
-                b = bytearray()
+    def on_transcription_stopped(self, stopped_by, stopped_by_error):
+        pass
--- a/src/dailyai/services/deepgram_ai_service.py
+++ b/src/dailyai/services/deepgram_ai_service.py
@@ -0,0 +1,38 @@
+import os
+import aiohttp
+import requests
+
+from dailyai.services.ai_services import TTSService
+
+
+class DeepgramAIService(TTSService):
+    def __init__(
+        self,
+        *,
+        aiohttp_session: aiohttp.ClientSession,
+        api_key,
+        voice,
+        sample_rate=16000
+    ):
+        super().__init__()
+
+        self._api_key = api_key
+        self._voice = voice
+        self._sample_rate = sample_rate
+        self._aiohttp_session = aiohttp_session
+
+    async def run_tts(self, sentence):
+        self.logger.info(f"Running deepgram tts for {sentence}")
+        base_url = "https://api.beta.deepgram.com/v1/speak"
+        request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate={self._sample_rate}"
+        headers = {
+            "authorization": f"token {self._api_key}",
+            "Content-Type": "application/json"}
+        data = {"text": sentence}
+
+        async with self._aiohttp_session.post(
+            request_url, headers=headers, json=data
+        ) as r:
+            async for chunk in r.content:
+                if chunk:
+                    yield chunk
--- a/src/dailyai/services/deepgram_ai_services.py
+++ b/src/dailyai/services/deepgram_ai_services.py
@@ -7,23 +7,29 @@ import requests
 from collections.abc import AsyncGenerator
 from dailyai.services.ai_services import TTSService

+
 class DeepgramTTSService(TTSService):
-    def __init__(self, speech_key=None, voice=None):
+    def __init__(
+            self,
+            *,
+            aiohttp_session,
+            api_key,
+            voice="alpha-asteria-en-v2"):
        super().__init__()

-        self.voice = voice or os.getenv("DEEPGRAM_VOICE") or "alpha-asteria-en-v2"
-        self.speech_key = speech_key or os.getenv("DEEPGRAM_API_KEY")
-    
+        self._voice = voice
+        self._api_key = api_key
+        self._aiohttp_session = aiohttp_session
+
    def get_mic_sample_rate(self):
        return 24000

    async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
        self.logger.info(f"Running deepgram tts for {sentence}")
        base_url = "https://api.beta.deepgram.com/v1/speak"
-        request_url = f"{base_url}?model={self.voice}&encoding=linear16&container=none&sample_rate=16000"
-        headers = {"authorization": f"token {self.speech_key}"}
-        body = { "text": sentence }
-        async with aiohttp.ClientSession() as session:
-            async with session.post(request_url, headers=headers, json=body) as r:
-                async for data in r.content:
-                    yield data
+        request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate=16000"
+        headers = {"authorization": f"token {self._api_key}"}
+        body = {"text": sentence}
+        async with self._aiohttp_session.post(request_url, headers=headers, json=body) as r:
+            async for data in r.content:
+                yield data
--- a/src/dailyai/services/elevenlabs_ai_service.py
+++ b/src/dailyai/services/elevenlabs_ai_service.py
@@ -9,28 +9,43 @@ from dailyai.services.ai_services import TTSService


 class ElevenLabsTTSService(TTSService):
-    def __init__(self, api_key=None, voice_id=None):
-        super().__init__()

-        self.api_key = api_key or os.getenv("ELEVENLABS_API_KEY")
-        self.voice_id = voice_id or os.getenv("ELEVENLABS_VOICE_ID")
+    def __init__(
+        self,
+        *,
+        aiohttp_session: aiohttp.ClientSession,
+        api_key,
+        narrator,
+        model="eleven_turbo_v2",
+        aggregate_sentences=True
+    ):
+        super().__init__(aggregate_sentences)
+
+        self._api_key = api_key
+        self._narrator = narrator
+        self._aiohttp_session = aiohttp_session
+        self._model = model

    async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
-        async with aiohttp.ClientSession() as session:
-            url = f"https://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}/stream"
-            payload = {"text": sentence, "model_id": "eleven_turbo_v2"}
-            querystring = {"output_format": "pcm_16000", "optimize_streaming_latency": 2}
-            headers = {
-                "xi-api-key": self.api_key,
-                "Content-Type": "application/json",
-            }
-            async with session.post(url, json=payload, headers=headers, params=querystring) as r:
-                if r.status != 200:
-                    self.logger.error(
-                        f"audio fetch status code: {r.status}, error: {r.text}"
-                    )
-                    return
+        url = f"https://api.elevenlabs.io/v1/text-to-speech/{self._narrator['narrator']['voice_id']}/stream"
+        payload = {"text": sentence, "model_id": self._model}
+        querystring = {
+            "output_format": "pcm_16000",
+            "optimize_streaming_latency": 2}
+        headers = {
+            "xi-api-key": self._api_key,
+            "Content-Type": "application/json",
+        }

-                async for chunk in r.content:
-                    if chunk:
-                        yield chunk
+        async with self._aiohttp_session.post(
+            url, json=payload, headers=headers, params=querystring
+        ) as r:
+            if r.status != 200:
+                self.logger.error(
+                    f"audio fetch status code: {r.status}, error: {await r.text()}"
+                )
+                return
+
+            async for chunk in r.content:
+                if chunk:
+                    yield chunk
--- a/src/dailyai/services/fal_ai_services.py
+++ b/src/dailyai/services/fal_ai_services.py
@@ -2,30 +2,43 @@ import fal
 import aiohttp
 import asyncio
 import io
-import json
+import os
 from PIL import Image

+from dailyai.services.ai_services import ImageGenService
+
+
+from dailyai.services.ai_services import ImageGenService

-from dailyai.services.ai_services import LLMService, TTSService, ImageGenService
 # Fal expects FAL_KEY_ID and FAL_KEY_SECRET to be set in the env
+
+
 class FalImageGenService(ImageGenService):
-    def __init__(self, image_size):
+    def __init__(
+        self,
+        *,
+        image_size,
+        aiohttp_session: aiohttp.ClientSession,
+        key_id=None,
+        key_secret=None
+    ):
        super().__init__(image_size)
+        self._aiohttp_session = aiohttp_session
+        if key_id:
+            os.environ["FAL_KEY_ID"] = key_id
+        if key_secret:
+            os.environ["FAL_KEY_SECRET"] = key_secret

    async def run_image_gen(self, sentence) -> tuple[str, bytes]:
        def get_image_url(sentence, size):
-            print("starting fal submit...")
            handler = fal.apps.submit(
                "110602490-fast-sdxl",
-                arguments={
-                "prompt": sentence
-                },
-                )
-            print("past fal handler init, about to wait for iter_events...")
+                # "fal-ai/fast-sdxl",
+                arguments={"prompt": sentence},
+            )
            for event in handler.iter_events():
                if isinstance(event, fal.apps.InProgress):
-                    print('Request in progress')
-                    print(event.logs)
+                    pass

            result = handler.get()

@@ -34,16 +47,13 @@ class FalImageGenService(ImageGenService):
                raise Exception("Image generation failed")

            return image_url
-        print(f"fetching image url...")
-        image_url = await asyncio.to_thread(get_image_url, sentence, self.image_size)
-        print(f"got image url, downloading image...")
-        # Load the image from the url
-        async with aiohttp.ClientSession() as session:
-            async with session.get(image_url) as response:
-                print("got image response")
-                image_stream = io.BytesIO(await response.content.read())
-                print("read image stream")
-                image = Image.open(image_stream)
-                return (image_url, image.tobytes())

-        # return (image_url, dalle_im.tobytes())
+        image_url = await asyncio.to_thread(get_image_url, sentence, self.image_size)
+        # Load the image from the url
+        async with self._aiohttp_session.get(image_url) as response:
+            image_stream = io.BytesIO(await response.content.read())
+            image = Image.open(image_stream)
+            # tobytes() returns the decoded raw pixel data, not the encoded
+            # image file that was downloaded.
+            image_bytes = image.tobytes()
+            return (image_url, image_bytes)
--- a/src/dailyai/services/local_stt_service.py
+++ b/src/dailyai/services/local_stt_service.py
@@ -0,0 +1,73 @@
+import array
+import io
+import math
+import time
+from typing import AsyncGenerator
+import wave
+from dailyai.pipeline.frames import AudioFrame, Frame, TranscriptionQueueFrame
+from dailyai.services.ai_services import STTService
+
+
+class LocalSTTService(STTService):
+    _content: io.BufferedRandom
+    _wave: wave.Wave_write
+    _current_silence_frames: int
+
+    # Configuration
+    _min_rms: int
+    _max_silence_frames: int
+    _frame_rate: int
+
+    def __init__(self,
+                 min_rms: int = 400,
+                 max_silence_frames: int = 3,
+                 frame_rate: int = 16000,
+                 **kwargs):
+        super().__init__(frame_rate, **kwargs)
+        self._current_silence_frames = 0
+        self._min_rms = min_rms
+        self._max_silence_frames = max_silence_frames
+        self._frame_rate = frame_rate
+        self._new_wave()
+
+    def _new_wave(self):
+        """Creates a new wave object and content buffer."""
+        self._content = io.BufferedRandom(io.BytesIO())
+        ww = wave.open(self._content, "wb")
+        ww.setnchannels(1)
+        ww.setsampwidth(2)
+        ww.setframerate(self._frame_rate)
+        self._wave = ww
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        """Processes a frame of audio data, either buffering or transcribing it."""
+        if not isinstance(frame, AudioFrame):
+            return
+
+        data = frame.data
+        # Try to filter out empty background noise
+        # (Very rudimentary approach, can be improved)
+        rms = self._get_volume(data)
+        if rms >= self._min_rms:
+            # Volume is high enough: buffer the audio and reset the silence count
+            self._wave.writeframesraw(data)
+            self._current_silence_frames = 0
+        else:
+            self._current_silence_frames += 1
+
+        # If the buffer is not empty and the pause exceeds _max_silence_frames, transcribe it.
+        if self._content.tell() > 0 and self._current_silence_frames > self._max_silence_frames:
+            self._current_silence_frames = 0
+            self._wave.close()
+            self._content.seek(0)
+            text = await self.run_stt(self._content)
+            self._new_wave()
+            yield TranscriptionQueueFrame(text, '', str(time.time()))
+
+    def _get_volume(self, audio: bytes) -> float:
+        # https://docs.python.org/3/library/array.html
+        audio_array = array.array('h', audio)
+        squares = [sample**2 for sample in audio_array]
+        mean = sum(squares) / len(audio_array)
+        rms = math.sqrt(mean)
+        return rms
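Review note: the silence gate in `LocalSTTService` rests on a per-frame RMS check over signed 16-bit PCM samples. The standalone sketch below illustrates that check and a trailing-silence count. It is not the service's code: `rms_volume` and `count_trailing_silence` are hypothetical helper names, and the empty-frame guard is an added assumption.

```python
import array
import math


def rms_volume(audio: bytes) -> float:
    """Root-mean-square amplitude of signed 16-bit PCM audio."""
    samples = array.array("h", audio)
    if not samples:
        # Assumption for this sketch: an empty frame counts as silence.
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def count_trailing_silence(frames: list[bytes], min_rms: float = 400) -> int:
    """Number of consecutive quiet frames at the end of `frames`."""
    silent = 0
    for frame in frames:
        # A loud frame resets the count; a quiet frame extends it.
        silent = 0 if rms_volume(frame) >= min_rms else silent + 1
    return silent


loud = array.array("h", [1000, -1000] * 80).tobytes()
quiet = array.array("h", [10, -10] * 80).tobytes()
trailing = count_trailing_silence([loud, quiet, quiet])
```

With a threshold of 400, the loud frame resets the count and the two quiet frames that follow are counted as silence.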
--- a/src/dailyai/services/local_transport_service.py
+++ b/src/dailyai/services/local_transport_service.py
@@ -0,0 +1,84 @@
+import asyncio
+import time
+import numpy as np
+import tkinter as tk
+import pyaudio
+
+from dailyai.services.base_transport_service import BaseTransportService
+
+
+class LocalTransportService(BaseTransportService):
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        self._sample_width = kwargs.get("sample_width") or 2
+        self._n_channels = kwargs.get("n_channels") or 1
+        self._tk_root = kwargs.get("tk_root") or None
+
+        if self._camera_enabled and not self._tk_root:
+            raise ValueError(
+                "If camera is enabled, a tkinter root must be provided")
+
+        if self._speaker_enabled:
+            self._speaker_buffer_pending = bytearray()
+
+    async def _write_frame_to_tkinter(self, frame: bytes):
+        data = f"P6 {self._camera_width} {self._camera_height} 255 ".encode() + \
+            frame
+        photo = tk.PhotoImage(
+            width=self._camera_width,
+            height=self._camera_height,
+            data=data,
+            format="PPM")
+        self._image_label.config(image=photo)
+
+        # This holds a reference to the photo, preventing it from being garbage
+        # collected.
+        self._image_label.image = photo  # type: ignore
+
+    def write_frame_to_camera(self, frame: bytes):
+        if self._camera_enabled and self._loop:
+            asyncio.run_coroutine_threadsafe(
+                self._write_frame_to_tkinter(frame), self._loop
+            )
+
+    def write_frame_to_mic(self, frame: bytes):
+        self._audio_stream.write(frame)
+
+    def read_frames(self, desired_frame_count):
+        frames = self._speaker_stream.read(
+            desired_frame_count,
+            exception_on_overflow=False,
+        )
+        return frames
+
+    def _prerun(self):
+        if self._mic_enabled or self._speaker_enabled:
+            self._pyaudio = pyaudio.PyAudio()
+        if self._mic_enabled:
+            self._audio_stream = self._pyaudio.open(
+                format=self._pyaudio.get_format_from_width(self._sample_width),
+                channels=self._n_channels,
+                rate=self._speaker_sample_rate,
+                output=True)
+
+        if self._camera_enabled:
+            # Start with a neutral gray background.
+            array = np.ones((1024, 1024, 3)) * 128
+            data = f"P6 {1024} {1024} 255 ".encode(
+            ) + array.astype(np.uint8).tobytes()
+            photo = tk.PhotoImage(
+                width=1024,
+                height=1024,
+                data=data,
+                format="PPM")
+            self._image_label = tk.Label(self._tk_root, image=photo)
+            self._image_label.pack()
+
+        if self._speaker_enabled:
+            self._speaker_stream = self._pyaudio.open(
+                format=self._pyaudio.get_format_from_width(self._sample_width),
+                channels=self._n_channels,
+                rate=self._speaker_sample_rate,
+                frames_per_buffer=self._speaker_sample_rate,
+                input=True
+            )
--- a/src/dailyai/services/ollama_ai_services.py
+++ b/src/dailyai/services/ollama_ai_services.py
@@ -0,0 +1,7 @@
+from dailyai.services.openai_api_llm_service import BaseOpenAILLMService
+
+
+class OLLamaLLMService(BaseOpenAILLMService):
+
+    def __init__(self, model="llama2", base_url="http://localhost:11434/v1"):
+        super().__init__(model=model, base_url=base_url, api_key="ollama")
--- a/src/dailyai/services/open_ai_services.py
+++ b/src/dailyai/services/open_ai_services.py
@@ -1,67 +1,52 @@
-import requests
 import aiohttp
-import asyncio
 from PIL import Image
 import io
-from openai import AsyncOpenAI
+import time
+import base64
+from openai import AsyncOpenAI, AsyncStream

-import os
 import json
 from collections.abc import AsyncGenerator

-from dailyai.services.ai_services import AIService, TTSService, LLMService, ImageGenService
+from openai.types.chat import (
+    ChatCompletion,
+    ChatCompletionChunk,
+    ChatCompletionMessageParam,
+)
+
+from daily import VideoFrame
+from dailyai.services.ai_services import LLMService, ImageGenService, VisionService
+from dailyai.services.openai_api_llm_service import BaseOpenAILLMService
+from dailyai.pipeline.frames import TextFrame


-class OpenAILLMService(LLMService):
-    def __init__(self, api_key=None, model=None):
-        super().__init__()
-        api_key = api_key or os.getenv("OPEN_AI_KEY")
-        self.model = model or os.getenv("OPEN_AI_LLM_MODEL") or "gpt-4"
-        self.client = AsyncOpenAI(api_key=api_key)
+class OpenAILLMService(BaseOpenAILLMService):

-    async def get_response(self, messages, stream):
-        return await self.client.chat.completions.create(
-            stream=stream,
-            messages=messages,
-            model=self.model
-        )
+    def __init__(self, model="gpt-4", *args, **kwargs):
+        super().__init__(model, *args, **kwargs)

-    async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
-        messages_for_log = json.dumps(messages)
-        self.logger.debug(f"Generating chat via openai: {messages_for_log}")
-
-        response = await self.get_response(messages, stream=True)
-
-        for chunk in response:
-            if len(chunk.choices) == 0:
-                continue
-
-            if chunk.choices[0].delta.content:
-                yield chunk.choices[0].delta.content
-
-    async def run_llm(self, messages) -> str | None:
-        messages_for_log = json.dumps(messages)
-        self.logger.debug(f"Generating chat via openai: {messages_for_log}")
-
-        response = await self.get_response(messages, stream=False)
-        if response and len(response.choices) > 0:
-            return response.choices[0].message.content
-        else:
-            return None

 class OpenAIImageGenService(ImageGenService):
-    def __init__(self, image_size:str, api_key=None, model=None):
+
+    def __init__(
+        self,
+        *,
+        image_size: str,
+        aiohttp_session: aiohttp.ClientSession,
+        api_key,
+        model="dall-e-3",
+    ):
        super().__init__(image_size=image_size)
-        api_key = api_key or os.getenv("OPEN_AI_KEY")
-        self.model = model or os.getenv("OPEN_AI_IMAGE_MODEL") or "dall-e-3"
-        self.client = AsyncOpenAI(api_key=api_key)
+        self._model = model
+        self._client = AsyncOpenAI(api_key=api_key)
+        self._aiohttp_session = aiohttp_session

    async def run_image_gen(self, sentence) -> tuple[str, bytes]:
        self.logger.info("Generating OpenAI image", sentence)

-        image = await self.client.images.generate(
+        image = await self._client.images.generate(
            prompt=sentence,
-            model=self.model,
+            model=self._model,
            n=1,
            size=self.image_size
        )
@@ -70,10 +55,71 @@ class OpenAIImageGenService(ImageGenService):
            raise Exception("No image provided in response", image)

        # Load the image from the url
-        async with aiohttp.ClientSession() as session:
-            async with session.get(image_url) as response:
-                image_stream = io.BytesIO(await response.content.read())
-                image = Image.open(image_stream)
-                return (image_url, image.tobytes())
+        async with self._aiohttp_session.get(image_url) as response:
+            image_stream = io.BytesIO(await response.content.read())
+            image = Image.open(image_stream)
+            return (image_url, image.tobytes())

-        return (image_url, dalle_im.tobytes())
+
+class OpenAIVisionService(VisionService):
+    def __init__(
+        self,
+        *,
+        model="gpt-4-vision-preview",
+        api_key,
+    ):
+        self._model = model
+        self._client = AsyncOpenAI(api_key=api_key)
+
+    async def run_vision(self, prompt: str, image: bytes):
+        if isinstance(image, VideoFrame):
+            # Then it's from a daily video frame
+            print("### processing daily video frame for recognition")
+            IMAGE_WIDTH = image.width
+            IMAGE_HEIGHT = image.height
+            COLOR_FORMAT = image.color_format
+            a_image = Image.frombytes(
+                'RGBA', (IMAGE_WIDTH, IMAGE_HEIGHT), image.buffer)
+            new_image = a_image.convert('RGB')
+        else:
+            # handle it as a byte stream from image gen
+            new_image = Image.frombytes('RGB', (1024, 1024), image)
+            # Uncomment these lines (and `import os`) to save the frame as a JPEG.
+            # current_path = os.getcwd()
+            # image_path = os.path.join(current_path, "image.jpg")
+            # new_image.save(image_path, format="JPEG")
+
+        jpeg_buffer = io.BytesIO()
+
+        new_image.save(jpeg_buffer, format='JPEG')
+
+        jpeg_bytes = jpeg_buffer.getvalue()
+        base64_image = base64.b64encode(jpeg_bytes).decode('utf-8')
+
+        messages = [
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": prompt},
+                    {
+                        "type": "image_url",
+                        "image_url": {
+                            "url": f"data:image/jpeg;base64,{base64_image}"
+                        },
+                    },
+                ],
+            }
+        ]
+        chunks: AsyncStream[ChatCompletionChunk] = (
+            await self._client.chat.completions.create(
+                model=self._model,
+                stream=True,
+                messages=messages,
+            )
+        )
+        async for chunk in chunks:
+            print(f"%%% chunk: {chunk}")
+            if len(chunk.choices) == 0:
+                continue
+            if chunk.choices[0].delta.content:
+                yield TextFrame(chunk.choices[0].delta.content)
--- a/src/dailyai/services/openai_api_llm_service.py
+++ b/src/dailyai/services/openai_api_llm_service.py
@@ -0,0 +1,124 @@
+import json
+import time
+from typing import AsyncGenerator, List
+from openai import AsyncOpenAI, AsyncStream
+from dailyai.pipeline.frames import (
+    Frame,
+    LLMFunctionCallFrame,
+    LLMFunctionStartFrame,
+    LLMMessagesQueueFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
+    OpenAILLMContextFrame,
+    TextFrame,
+)
+from dailyai.services.ai_services import LLMService
+from dailyai.services.openai_llm_context import OpenAILLMContext
+
+from openai.types.chat import (
+    ChatCompletion,
+    ChatCompletionChunk,
+    ChatCompletionMessageParam,
+)
+
+
+class BaseOpenAILLMService(LLMService):
+    """This is the base for all services that use the AsyncOpenAI client.
+
+    This service consumes OpenAILLMContextFrame frames, which contain a reference
+    to an OpenAILLMContext object. The OpenAILLMContext object defines the context
+    sent to the LLM for a completion. This includes user, assistant and system
+    messages, as well as the tools and tool choice used when requesting function
+    calls from the LLM.
+    """
+
+    def __init__(self, model: str, api_key=None, base_url=None):
+        super().__init__()
+        self._model: str = model
+        self.create_client(api_key=api_key, base_url=base_url)
+
+    def create_client(self, api_key=None, base_url=None):
+        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)
+
+    async def _stream_chat_completions(
+        self, context: OpenAILLMContext
+    ) -> AsyncStream[ChatCompletionChunk]:
+        messages: List[ChatCompletionMessageParam] = context.get_messages()
+        messages_for_log = json.dumps(messages)
+        self.logger.debug(f"Generating chat via openai: {messages_for_log}")
+
+        start_time = time.time()
+        chunks: AsyncStream[ChatCompletionChunk] = (
+            await self._client.chat.completions.create(
+                model=self._model,
+                stream=True,
+                messages=messages,
+                tools=context.tools,
+                tool_choice=context.tool_choice,
+            )
+        )
+        self.logger.info(f"=== OpenAI LLM TTFB: {time.time() - start_time}")
+        return chunks
+
+    async def _chat_completions(self, messages) -> str | None:
+        messages_for_log = json.dumps(messages)
+        self.logger.debug(f"Generating chat via openai: {messages_for_log}")
+
+        response: ChatCompletion = await self._client.chat.completions.create(
+            model=self._model, stream=False, messages=messages
+        )
+        if response and len(response.choices) > 0:
+            return response.choices[0].message.content
+        else:
+            return None
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, OpenAILLMContextFrame):
+            context: OpenAILLMContext = frame.context
+        elif isinstance(frame, LLMMessagesQueueFrame):
+            context = OpenAILLMContext.from_messages(frame.messages)
+        else:
+            yield frame
+            return
+
+        function_name = ""
+        arguments = ""
+
+        yield LLMResponseStartFrame()
+        chunk_stream: AsyncStream[ChatCompletionChunk] = (
+            await self._stream_chat_completions(context)
+        )
+        async for chunk in chunk_stream:
+            if len(chunk.choices) == 0:
+                continue
+
+            if chunk.choices[0].delta.tool_calls:
+                # We're streaming the LLM response to enable the fastest response times.
+                # For text, we just yield each chunk as we receive it and count on consumers
+                # to do whatever coalescing they need (eg. to pass full sentences to TTS)
+                #
+                # If the LLM response is a function call, we'll do some coalescing here.
+                # If the response contains a function name, we'll yield a frame to tell consumers
+                # that they can start preparing to call the function with that name.
+                # We accumulate all the arguments for the rest of the streamed response, then when
+                # the response is done, we package up all the arguments and the function name and
+                # yield a frame containing the function name and the arguments.
+
+                tool_call = chunk.choices[0].delta.tool_calls[0]
+                if tool_call.function and tool_call.function.name:
+                    function_name += tool_call.function.name
+                    yield LLMFunctionStartFrame(function_name=tool_call.function.name)
+                if tool_call.function and tool_call.function.arguments:
+                    # Keep iterating through the response to collect all the argument fragments and
+                    # yield a complete LLMFunctionCallFrame after the stream
+                    # completes
+                    arguments += tool_call.function.arguments
+            elif chunk.choices[0].delta.content:
+                yield TextFrame(chunk.choices[0].delta.content)
+
+        # if we got a function name and arguments, yield the frame with all the info so
+        # frame consumers can take action based on the function call.
+        if function_name and arguments:
+            yield LLMFunctionCallFrame(function_name=function_name, arguments=arguments)
+
+        yield LLMResponseEndFrame()
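Review note: the coalescing strategy described in the `process_frame` comments (pass text through immediately, accumulate function-call argument fragments, emit one complete call at the end) can be sketched without the OpenAI client. Everything below is illustrative: `Delta`, `coalesce`, and the event tuples are hypothetical stand-ins for the streamed chunks and the frame classes, not the library's API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# (kind, text_or_function_name, arguments); hypothetical event shape.
Event = tuple[str, str, str]


@dataclass
class Delta:
    """Hypothetical stand-in for the delta of one streamed chunk."""
    content: Optional[str] = None
    function_name: Optional[str] = None
    arguments: Optional[str] = None


def coalesce(deltas: Iterable[Delta]) -> Iterator[Event]:
    """Yield text right away; yield a single complete call at the end."""
    name, args = "", ""
    for d in deltas:
        if d.function_name or d.arguments:
            if d.function_name:
                name += d.function_name
                # Lets consumers start preparing before arguments arrive.
                yield ("start", d.function_name, "")
            if d.arguments:
                # Argument JSON arrives in fragments; keep accumulating.
                args += d.arguments
        elif d.content:
            yield ("text", d.content, "")
    if name and args:
        yield ("call", name, args)


stream = [
    Delta(function_name="get_current_weather"),
    Delta(arguments='{"loc'),
    Delta(arguments='ation": "SF"}'),
]
events = list(coalesce(stream))
```

Here the argument JSON only becomes parseable once every fragment has been joined, which is why the complete call is emitted after the loop rather than inside it.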
--- a/src/dailyai/services/openai_llm_context.py
+++ b/src/dailyai/services/openai_llm_context.py
@@ -0,0 +1,54 @@
+from typing import List
+from openai._types import NOT_GIVEN, NotGiven
+
+from openai.types.chat import (
+    ChatCompletionToolParam,
+    ChatCompletionToolChoiceOptionParam,
+    ChatCompletionMessageParam,
+)
+
+
+class OpenAILLMContext:
+
+    def __init__(
+        self,
+        messages: List[ChatCompletionMessageParam] | None = None,
+        tools: List[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,
+        tool_choice: ChatCompletionToolChoiceOptionParam | NotGiven = NOT_GIVEN
+    ):
+        self.messages: List[ChatCompletionMessageParam] = messages if messages else [
+        ]
+        self.tool_choice: ChatCompletionToolChoiceOptionParam | NotGiven = tool_choice
+        self.tools: List[ChatCompletionToolParam] | NotGiven = tools
+
+    @staticmethod
+    def from_messages(messages: List[dict]) -> "OpenAILLMContext":
+        context = OpenAILLMContext()
+        for message in messages:
+            context.add_message({
+                "content": message["content"],
+                "role": message["role"],
+                "name": message["name"] if "name" in message else message["role"]
+            })
+        return context
+
+    # def __deepcopy__(self, memo):
+
+    def add_message(self, message: ChatCompletionMessageParam):
+        self.messages.append(message)
+
+    def get_messages(self) -> List[ChatCompletionMessageParam]:
+        return self.messages
+
+    def set_tool_choice(
+        self, tool_choice: ChatCompletionToolChoiceOptionParam | NotGiven
+    ):
+        self.tool_choice = tool_choice
+
+    def set_tools(
+            self,
+            tools: List[ChatCompletionToolParam] | NotGiven = NOT_GIVEN):
+        if tools is not NOT_GIVEN and len(tools) == 0:
+            tools = NOT_GIVEN
+
+        self.tools = tools
--- a/src/dailyai/services/to_be_updated/playht_ai_service.py
+++ b/src/dailyai/services/to_be_updated/playht_ai_service.py
@@ -1,36 +1,40 @@
 import io
-import os
 import struct
 from pyht import Client
-from dotenv import load_dotenv
 from pyht.client import TTSOptions
 from pyht.protos.api_pb2 import Format

-from services.ai_service import AIService
+from dailyai.services.ai_services import TTSService

-class PlayHTAIService(AIService):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)

-        self.speech_key = os.getenv("PLAY_HT_KEY") or ''
-        self.user_id = os.getenv("PLAY_HT_USER_ID") or ''
+class PlayHTAIService(TTSService):
+
+    def __init__(
+        self,
+        *,
+        api_key,
+        user_id,
+        voice_url
+    ):
+        super().__init__()
+
+        self.speech_key = api_key
+        self.user_id = user_id

        self.client = Client(
            user_id=self.user_id,
            api_key=self.speech_key,
        )
        self.options = TTSOptions(
-            voice="s3://voice-cloning-zero-shot/820da3d2-3a3b-42e7-844d-e68db835a206/sarah/manifest.json",
+            voice=voice_url,
            sample_rate=16000,
            quality="higher",
-            format=Format.FORMAT_WAV
-        )
+            format=Format.FORMAT_WAV)

-    def close(self):
-        super().close()
+    def __del__(self):
        self.client.close()

-    def run_tts(self, sentence):
+    async def run_tts(self, sentence):
        b = bytearray()
        in_header = True
        for chunk in self.client.tts(sentence, self.options):
@@ -43,14 +47,15 @@ class PlayHTAIService(AIService):
                    fh = io.BytesIO(b)
                    fh.seek(36)
                    (data, size) = struct.unpack('<4sI', fh.read(8))
-                    self.logger.info(f"first attempt: data: {data}, size: {hex(size)}, position: {fh.tell()}")
+                    self.logger.info(
+                        f"first attempt: data: {data}, size: {hex(size)}, position: {fh.tell()}")
                    while data != b'data':
                        fh.read(size)
                        (data, size) = struct.unpack('<4sI', fh.read(8))
-                        self.logger.info(f"subsequent data: {data}, size: {hex(size)}, position: {fh.tell()}, data != data: {data != b'data'}")
+                        self.logger.info(
+                            f"subsequent data: {data}, size: {hex(size)}, position: {fh.tell()}, data != data: {data != b'data'}")
                    self.logger.info("position: ", fh.tell())
                    in_header = False
            else:
                if len(chunk):
                    yield chunk
-
--- a/src/dailyai/services/to_be_updated/cloudflare_ai_service.py
+++ b/src/dailyai/services/to_be_updated/cloudflare_ai_service.py
@@ -4,6 +4,8 @@ from services.ai_service import AIService

 # Note that Cloudflare's AI workers are still in beta.
 # https://developers.cloudflare.com/workers-ai/
+
+
 class CloudflareAIService(AIService):
    def __init__(self):
        super().__init__()
@@ -15,15 +17,18 @@ class CloudflareAIService(AIService):

    # base endpoint, used by the others
    def run(self, model, input):
-        response = requests.post(f"{self.api_base_url}{model}", headers=self.headers, json=input)
+        response = requests.post(
+            f"{self.api_base_url}{model}",
+            headers=self.headers,
+            json=input)
        return response.json()

    # https://developers.cloudflare.com/workers-ai/models/llm/
-    def run_llm(self, messages, latest_user_message=None, stream = True):
+    def run_llm(self, messages, latest_user_message=None, stream=True):
        input = {
            "messages": [
-                { "role": "system", "content": "You are a friendly assistant" },
-                { "role": "user", "content": sentence }
+                {"role": "system", "content": "You are a friendly assistant"},
+                {"role": "user", "content": sentence}
            ]
        }

@@ -39,7 +44,8 @@ class CloudflareAIService(AIService):

    # https://developers.cloudflare.com/workers-ai/models/sentiment-analysis/
    def run_text_sentiment(self, sentence):
-        return self.run("@cf/huggingface/distilbert-sst-2-int8", {"text": sentence})
+        return self.run("@cf/huggingface/distilbert-sst-2-int8",
+                        {"text": sentence})

    # https://developers.cloudflare.com/workers-ai/models/image-classification/
    def run_image_classification(self, image_url):
@@ -57,9 +63,9 @@ class CloudflareAIService(AIService):
    # https://developers.cloudflare.com/workers-ai/models/embedding/
    def run_embeddings(self, texts, size="medium"):
        models = {
-            "small": "@cf/baai/bge-small-en-v1.5", # 384 output dimensions
-            "medium": "@cf/baai/bge-base-en-v1.5", # 768 output dimensions
-            "large": "@cf/baai/bge-large-en-v1.5" #1024 output dimensions
+            "small": "@cf/baai/bge-small-en-v1.5",  # 384 output dimensions
+            "medium": "@cf/baai/bge-base-en-v1.5",  # 768 output dimensions
+            "large": "@cf/baai/bge-large-en-v1.5"  # 1024 output dimensions
        }

        return self.run(models[size], {"text": texts})
--- a/src/dailyai/services/to_be_updated/deepgram_ai_service.py
+++ b/src/dailyai/services/to_be_updated/deepgram_ai_service.py
@@ -1,28 +0,0 @@
-import os
-import requests
-
-from services.ai_service import AIService
-from PIL import Image
-
-
-class DeepgramAIService(AIService):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
-
-        self.api_key = os.getenv("DEEPGRAM_API_KEY")
-
-    def get_mic_sample_rate(self):
-        return 24000
-
-    def run_tts(self, sentence):
-        self.logger.info(f"Running deepgram tts for {sentence}")
-        base_url = "https://api.beta.deepgram.com/v1/speak"
-        voice = os.getenv("DEEPGRAM_VOICE") or "alpha-apollo-en-v1"  # move this to an environment variable
-        request_url = f"{base_url}?model={voice}&encoding=linear16&container=none"
-        headers = {"authorization": f"token {self.api_key}"}
-
-        r = requests.post(request_url, headers=headers, data=sentence)
-        self.logger.info(
-            f"audio fetch status code: {r.status_code}, content length: {len(r.content)}"
-        )
-        yield r.content
--- a/src/dailyai/services/to_be_updated/google_ai_service.py
+++ b/src/dailyai/services/to_be_updated/google_ai_service.py
@@ -2,9 +2,12 @@ from services.ai_service import AIService
 import openai
 import os

-# To use Google Cloud's AI products, you'll need to install Google Cloud CLI and enable the TTS and in your project: https://cloud.google.com/sdk/docs/install
+# To use Google Cloud's AI products, you'll need to install the Google Cloud
+# CLI and enable the Text-to-Speech API in your project:
+# https://cloud.google.com/sdk/docs/install
 from google.cloud import texttospeech

+
 class GoogleAIService(AIService):
    def __init__(self):
        super().__init__()
@@ -15,11 +18,14 @@ class GoogleAIService(AIService):
        )

        self.audio_config = texttospeech.AudioConfig(
-            audio_encoding = texttospeech.AudioEncoding.LINEAR16,
-            sample_rate_hertz = 16000
+            audio_encoding=texttospeech.AudioEncoding.LINEAR16,
+            sample_rate_hertz=16000
        )

    def run_tts(self, sentence):
-        synthesis_input = texttospeech.SynthesisInput(text = sentence.strip())
-        result = self.client.synthesize_speech(input=synthesis_input, voice=self.voice, audio_config=self.audio_config)
+        synthesis_input = texttospeech.SynthesisInput(text=sentence.strip())
+        result = self.client.synthesize_speech(
+            input=synthesis_input,
+            voice=self.voice,
+            audio_config=self.audio_config)
        return result
--- a/src/dailyai/services/to_be_updated/huggingface_ai_service.py
+++ b/src/dailyai/services/to_be_updated/huggingface_ai_service.py
@@ -1,7 +1,12 @@
 from services.ai_service import AIService
 from transformers import pipeline

-# These functions are just intended for testing, not production use. If you'd like to use HuggingFace, you should use your own models, or do some research into the specific models that will work best for your use case.
+# These functions are just intended for testing, not production use. If
+# you'd like to use HuggingFace, you should use your own models, or do
+# some research into the specific models that will work best for your use
+# case.
+
+
 class HuggingFaceAIService(AIService):
    def __init__(self):
        super().__init__()
@@ -10,9 +15,12 @@ class HuggingFaceAIService(AIService):
        classifier = pipeline("sentiment-analysis")
        return classifier(sentence)

-    # available models at https://huggingface.co/Helsinki-NLP (**not all models use 2-character language codes**)
+    # available models at https://huggingface.co/Helsinki-NLP (**not all
+    # models use 2-character language codes**)
    def run_text_translation(self, sentence, source_language, target_language):
-        translator = pipeline(f"translation", model=f"Helsinki-NLP/opus-mt-{source_language}-{target_language}")
+        translator = pipeline(
+            "translation",
+            model=f"Helsinki-NLP/opus-mt-{source_language}-{target_language}")

        return translator(sentence)[0]["translation_text"]

--- a/src/dailyai/services/to_be_updated/mock_ai_service.py
+++ b/src/dailyai/services/to_be_updated/mock_ai_service.py
@@ -4,6 +4,7 @@ import time
 from PIL import Image
 from services.ai_service import AIService

+
 class MockAIService(AIService):
    def __init__(self):
        super().__init__()
@@ -20,8 +21,7 @@ class MockAIService(AIService):
        time.sleep(1)
        return (image_url, image)

-    def run_llm(self, messages, latest_user_message=None, stream = True):
+    def run_llm(self, messages, latest_user_message=None, stream=True):
        for i in range(5):
            time.sleep(1)
-            yield({"choices": [{"delta": {"content": f"hello {i}!"}}]})
-
+            yield ({"choices": [{"delta": {"content": f"hello {i}!"}}]})
--- a/src/dailyai/services/whisper_ai_services.py
+++ b/src/dailyai/services/whisper_ai_services.py
@@ -0,0 +1,55 @@
+"""This module implements Whisper transcription with a locally-downloaded model."""
+import asyncio
+from enum import Enum
+import logging
+from typing import BinaryIO
+from faster_whisper import WhisperModel
+from dailyai.services.local_stt_service import LocalSTTService
+
+
+class Model(Enum):
+    """Class of basic Whisper model selection options"""
+    TINY = "tiny"
+    BASE = "base"
+    MEDIUM = "medium"
+    LARGE = "large-v3"
+    DISTIL_LARGE_V2 = "Systran/faster-distil-whisper-large-v2"
+    DISTIL_MEDIUM_EN = "Systran/faster-distil-whisper-medium.en"
+
+
+class WhisperSTTService(LocalSTTService):
+    """Class to transcribe audio with a locally-downloaded Whisper model"""
+    _model: WhisperModel
+
+    # Model configuration
+    _model_name: Model
+    _device: str
+    _compute_type: str
+
+    def __init__(self, model_name: Model = Model.DISTIL_MEDIUM_EN,
+                 device: str = "auto",
+                 compute_type: str = "default"):
+
+        super().__init__()
+        self.logger: logging.Logger = logging.getLogger("dailyai")
+        self._model_name = model_name
+        self._device = device
+        self._compute_type = compute_type
+        self._load()
+
+    def _load(self):
+        """Loads the Whisper model. Note that if this is the first time
+        this model is being run, it will take time to download."""
+        model = WhisperModel(
+            self._model_name.value,
+            device=self._device,
+            compute_type=self._compute_type)
+        self._model = model
+
+    async def run_stt(self, audio: BinaryIO) -> str:
+        """Transcribes given audio using Whisper"""
+        def transcribe() -> str:
+            # segments is a lazy generator, so consume it off the event loop
+            segments, _ = self._model.transcribe(audio)
+            return "".join(f"{segment.text} " for segment in segments)
+        return await asyncio.to_thread(transcribe)
--- a/src/dailyai/tests/integration/integration_azure_llm.py
+++ b/src/dailyai/tests/integration/integration_azure_llm.py
@@ -0,0 +1,28 @@
+import asyncio
+import os
+from dailyai.pipeline.frames import (
+    OpenAILLMContextFrame,
+)
+from dailyai.services.azure_ai_services import AzureLLMService
+from dailyai.services.openai_llm_context import OpenAILLMContext
+
+from openai.types.chat import (
+    ChatCompletionSystemMessageParam,
+)
+
+if __name__ == "__main__":
+    async def test_chat():
+        llm = AzureLLMService(
+            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
+            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+        )
+        context = OpenAILLMContext()
+        message: ChatCompletionSystemMessageParam = ChatCompletionSystemMessageParam(
+            content="Please tell the world hello.", name="system", role="system")
+        context.add_message(message)
+        frame = OpenAILLMContextFrame(context)
+        async for s in llm.process_frame(frame):
+            print(s)
+
+    asyncio.run(test_chat())
--- a/src/dailyai/tests/integration/integration_ollama_llm.py
+++ b/src/dailyai/tests/integration/integration_ollama_llm.py
@@ -0,0 +1,23 @@
+import asyncio
+from dailyai.pipeline.frames import (
+    OpenAILLMContextFrame,
+)
+from dailyai.services.openai_llm_context import OpenAILLMContext
+
+from openai.types.chat import (
+    ChatCompletionSystemMessageParam,
+)
+from dailyai.services.ollama_ai_services import OLLamaLLMService
+
+if __name__ == "__main__":
+    async def test_chat():
+        llm = OLLamaLLMService()
+        context = OpenAILLMContext()
+        message: ChatCompletionSystemMessageParam = ChatCompletionSystemMessageParam(
+            content="Please tell the world hello.", name="system", role="system")
+        context.add_message(message)
+        frame = OpenAILLMContextFrame(context)
+        async for s in llm.process_frame(frame):
+            print(s)
+
+    asyncio.run(test_chat())
--- a/src/dailyai/tests/integration/integration_openai_llm.py
+++ b/src/dailyai/tests/integration/integration_openai_llm.py
@@ -0,0 +1,85 @@
+import asyncio
+import os
+from dailyai.pipeline.frames import (
+    OpenAILLMContextFrame,
+)
+from dailyai.services.openai_llm_context import OpenAILLMContext
+
+from openai.types.chat import (
+    ChatCompletionSystemMessageParam,
+    ChatCompletionToolParam,
+    ChatCompletionUserMessageParam,
+)
+
+from dailyai.services.openai_api_llm_service import BaseOpenAILLMService
+
+if __name__ == "__main__":
+    async def test_functions():
+        tools = [
+            ChatCompletionToolParam(
+                type="function",
+                function={
+                    "name": "get_current_weather",
+                    "description": "Get the current weather",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "location": {
+                                "type": "string",
+                                "description": "The city and state, e.g. San Francisco, CA",
+                            },
+                            "format": {
+                                "type": "string",
+                                "enum": [
+                                    "celsius",
+                                    "fahrenheit"],
+                                "description": "The temperature unit to use. Infer this from the users location.",
+                            },
+                        },
+                        "required": [
+                            "location",
+                            "format"],
+                    },
+                })]
+
+        api_key = os.getenv("OPENAI_API_KEY")
+
+        llm = BaseOpenAILLMService(
+            api_key=api_key or "",
+            model="gpt-4-1106-preview",
+        )
+        context = OpenAILLMContext(tools=tools)
+        system_message: ChatCompletionSystemMessageParam = ChatCompletionSystemMessageParam(
+            content="Ask the user to ask for a weather report", name="system", role="system"
+        )
+        user_message: ChatCompletionUserMessageParam = ChatCompletionUserMessageParam(
+            content="Could you tell me the weather for Boulder, Colorado",
+            name="user",
+            role="user",
+        )
+        context.add_message(system_message)
+        context.add_message(user_message)
+        frame = OpenAILLMContextFrame(context)
+        async for s in llm.process_frame(frame):
+            print(s)
+
+    async def test_chat():
+        api_key = os.getenv("OPENAI_API_KEY")
+
+        llm = BaseOpenAILLMService(
+            api_key=api_key or "",
+            model="gpt-4-1106-preview",
+        )
+        context = OpenAILLMContext()
+        message: ChatCompletionSystemMessageParam = ChatCompletionSystemMessageParam(
+            content="Please tell the world hello.", name="system", role="system")
+        context.add_message(message)
+        frame = OpenAILLMContextFrame(context)
+        async for s in llm.process_frame(frame):
+            print(s)
+
+    async def run_tests():
+        await test_functions()
+        await test_chat()
+
+    asyncio.run(run_tests())
--- a/src/dailyai/tests/test_aggregators.py
+++ b/src/dailyai/tests/test_aggregators.py
@@ -0,0 +1,129 @@
+import asyncio
+import doctest
+import functools
+import unittest
+
+from dailyai.pipeline.aggregators import (
+    GatedAggregator,
+    ParallelPipeline,
+    SentenceAggregator,
+    StatelessTextTransformer,
+)
+from dailyai.pipeline.frames import (
+    AudioFrame,
+    EndFrame,
+    ImageFrame,
+    LLMResponseEndFrame,
+    LLMResponseStartFrame,
+    Frame,
+    TextFrame,
+)
+
+from dailyai.pipeline.pipeline import Pipeline
+
+
+class TestDailyFrameAggregators(unittest.IsolatedAsyncioTestCase):
+    async def test_sentence_aggregator(self):
+        sentence = "Hello, world. How are you? I am fine"
+        expected_sentences = ["Hello, world.", " How are you?", " I am fine "]
+        aggregator = SentenceAggregator()
+        for word in sentence.split(" "):
+            async for frame in aggregator.process_frame(TextFrame(word + " ")):
+                self.assertIsInstance(frame, TextFrame)
+                if isinstance(frame, TextFrame):
+                    self.assertEqual(frame.text, expected_sentences.pop(0))
+
+        async for frame in aggregator.process_frame(EndFrame()):
+            if len(expected_sentences):
+                self.assertIsInstance(frame, TextFrame)
+                if isinstance(frame, TextFrame):
+                    self.assertEqual(frame.text, expected_sentences.pop(0))
+            else:
+                self.assertIsInstance(frame, EndFrame)
+
+        self.assertEqual(expected_sentences, [])
+
+    async def test_gated_aggregator(self):
+        gated_aggregator = GatedAggregator(
+            gate_open_fn=lambda frame: isinstance(
+                frame, ImageFrame), gate_close_fn=lambda frame: isinstance(
+                frame, LLMResponseStartFrame), start_open=False, )
+
+        frames = [
+            LLMResponseStartFrame(),
+            TextFrame("Hello, "),
+            TextFrame("world."),
+            AudioFrame(b"hello"),
+            ImageFrame("image", b"image"),
+            AudioFrame(b"world"),
+            LLMResponseEndFrame(),
+        ]
+
+        expected_output_frames = [
+            ImageFrame("image", b"image"),
+            LLMResponseStartFrame(),
+            TextFrame("Hello, "),
+            TextFrame("world."),
+            AudioFrame(b"hello"),
+            AudioFrame(b"world"),
+            LLMResponseEndFrame(),
+        ]
+        for frame in frames:
+            async for out_frame in gated_aggregator.process_frame(frame):
+                self.assertEqual(out_frame, expected_output_frames.pop(0))
+        self.assertEqual(expected_output_frames, [])
+
+    async def test_parallel_pipeline(self):
+
+        async def slow_add(sleep_time: float, name: str, x: str):
+            await asyncio.sleep(sleep_time)
+            return ":".join([x, name])
+
+        pipe1_annotation = StatelessTextTransformer(
+            functools.partial(slow_add, 0.1, 'pipe1'))
+        pipe2_annotation = StatelessTextTransformer(
+            functools.partial(slow_add, 0.2, 'pipe2'))
+        sentence_aggregator = SentenceAggregator()
+        add_dots = StatelessTextTransformer(lambda x: x + ".")
+
+        source = asyncio.Queue()
+        sink = asyncio.Queue()
+        pipeline = Pipeline(
+            [
+                ParallelPipeline(
+                    [[pipe1_annotation], [sentence_aggregator, pipe2_annotation]]
+                ),
+                add_dots,
+            ],
+            source,
+            sink,
+        )
+
+        frames = [
+            TextFrame("Hello, "),
+            TextFrame("world."),
+            EndFrame()
+        ]
+
+        expected_output_frames: list[Frame] = [
+            TextFrame(text='Hello, :pipe1.'),
+            TextFrame(text='world.:pipe1.'),
+            TextFrame(text='Hello, world.:pipe2.'),
+            EndFrame()
+        ]
+
+        for frame in frames:
+            await source.put(frame)
+
+        await pipeline.run_pipeline()
+
+        while not sink.empty():
+            frame = await sink.get()
+            self.assertEqual(frame, expected_output_frames.pop(0))
+
+
+def load_tests(loader, tests, ignore):
+    """ Run doctests on the aggregators module. """
+    from dailyai.pipeline import aggregators
+    tests.addTests(doctest.DocTestSuite(aggregators))
+    return tests
--- a/src/dailyai/tests/test_ai_services.py
+++ b/src/dailyai/tests/test_ai_services.py
@@ -1,129 +1,32 @@
-from re import A
 import unittest

 from typing import AsyncGenerator, Generator

-from dailyai.services.ai_services import AIService, SentenceAggregator
-from dailyai.queue_frame import QueueFrame, FrameType
+from dailyai.services.ai_services import AIService
+from dailyai.pipeline.frames import EndFrame, Frame, TextFrame
+

 class SimpleAIService(AIService):
-    def allowed_input_frame_types(self) -> set[FrameType]:
-        return set([FrameType.TEXT_CHUNK])
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        yield frame

-    def possible_output_frame_types(self) -> set[FrameType]:
-        return set([FrameType.TEXT_CHUNK])
-
-    async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> QueueFrame | None:
-        return frame

 class TestBaseAIService(unittest.IsolatedAsyncioTestCase):
-    async def test_async_input(self):
+    async def test_simple_processing(self):
        service = SimpleAIService()

        input_frames = [
-            QueueFrame(FrameType.TEXT_CHUNK, "hello"),
-            QueueFrame(FrameType.END_STREAM, None),
+            TextFrame("hello"),
+            EndFrame()
        ]
-        async def iterate_frames() -> AsyncGenerator[QueueFrame, None]:
-            for frame in input_frames:
-                yield frame

        output_frames = []
-        async for frame in service.run(set([FrameType.TEXT_CHUNK]), iterate_frames()):
-            output_frames.append(frame)
+        for input_frame in input_frames:
+            async for output_frame in service.process_frame(input_frame):
+                output_frames.append(output_frame)

        self.assertEqual(input_frames, output_frames)

-    async def test_nonasync_input(self):
-        service = SimpleAIService()
-
-        input_frames = [
-            QueueFrame(FrameType.TEXT_CHUNK, "hello"),
-            QueueFrame(FrameType.END_STREAM, None),
-        ]
-
-        def iterate_frames() -> Generator[QueueFrame, None, None]:
-            for frame in input_frames:
-                yield frame
-
-        output_frames = []
-        async for frame in service.run(set([FrameType.TEXT_CHUNK]), iterate_frames()):
-            output_frames.append(frame)
-
-        self.assertEqual(input_frames, output_frames)
-
-
-class TestSentenceAggregator(unittest.IsolatedAsyncioTestCase):
-    async def test_clause(self) -> None:
-        input_frames = [
-            QueueFrame(FrameType.TEXT_CHUNK, "hello"),
-            QueueFrame(FrameType.END_STREAM, None),
-        ]
-
-        service = SentenceAggregator()
-        output_frames = []
-        async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
-            output_frames.append(frame)
-
-        self.assertEqual(1, len(output_frames))
-        self.assertEqual(QueueFrame(FrameType.SENTENCE, "hello"), output_frames[0])
-
-    async def test_sentence(self) -> None:
-        input_frames = [
-            QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
-            QueueFrame(FrameType.TEXT_CHUNK, "world."),
-            QueueFrame(FrameType.END_STREAM, None),
-        ]
-
-        service = SentenceAggregator()
-        output_frames = []
-        async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
-            output_frames.append(frame)
-
-        self.assertEqual(1, len(output_frames))
-        self.assertEqual(QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0])
-
-    async def test_sentence_and_clause(self) -> None:
-        input_frames = [
-            QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
-            QueueFrame(FrameType.TEXT_CHUNK, "world."),
-            QueueFrame(FrameType.TEXT_CHUNK, " How are"),
-            QueueFrame(FrameType.END_STREAM, None),
-        ]
-
-        service = SentenceAggregator()
-        output_frames = []
-        async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
-            output_frames.append(frame)
-
-        self.assertEqual(2, len(output_frames))
-        self.assertEqual(
-            QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0]
-        )
-        self.assertEqual(
-            QueueFrame(FrameType.SENTENCE, " How are"), output_frames[1]
-        )
-
-    async def test_two_sentences(self) -> None:
-        input_frames = [
-            QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
-            QueueFrame(FrameType.TEXT_CHUNK, "world."),
-            QueueFrame(FrameType.TEXT_CHUNK, " How are"),
-            QueueFrame(FrameType.TEXT_CHUNK, " you doing?"),
-            QueueFrame(FrameType.END_STREAM, None),
-        ]
-
-        service = SentenceAggregator()
-        output_frames = []
-        async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
-            output_frames.append(frame)
-
-        self.assertEqual(2, len(output_frames))
-        self.assertEqual(
-            QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0]
-        )
-        self.assertEqual(QueueFrame(FrameType.SENTENCE, " How are you doing?"), output_frames[1])
-

 if __name__ == "__main__":
    unittest.main()
--- a/src/dailyai/tests/test_asyncprocessor.py
+++ b/src/dailyai/tests/test_asyncprocessor.py
@@ -1,180 +0,0 @@
-import time
-import unittest
-
-from queue import Queue, Empty
-from threading import Thread, Event
-from typing import Generator
-
-from dailyai.async_processor.async_processor import (
-    AsyncProcessor,
-    AsyncProcessorState,
-    LLMResponse,
-)
-from dailyai.message_handler.message_handler import MessageHandler
-from dailyai.queue_frame import QueueFrame, FrameType
-from dailyai.services.ai_services import (
-    AIServiceConfig,
-    ImageGenService,
-    LLMService,
-    TTSService,
-)
-"""
-class MockTTSService(TTSService):
-    def run_tts(self, sentence):
-        for word in sentence.split(' '):
-            time.sleep(0.1)
-            yield bytes(word, "utf-8")
-
-class MockLLMService(LLMService):
-    def run_llm_async(self, messages) -> Generator[str, None, None]:
-        for i in ["Hello ", "there.", "How are ", "you?", "I ", "hope ", "you ", "are ", "well."]:
-            time.sleep(0.1)
-            yield i
-
-class MockImageService(ImageGenService):
-    def run_image_gen(self, sentence) -> None:
-        return None
-
-class TestResponse(unittest.TestCase):
-    def test_base_state_transitions(self):
-        mock_tts_service = MockTTSService()
-        mock_llm_service = MockLLMService()
-        mock_image_service = MockImageService()
-        processor = AsyncProcessor(AIServiceConfig(tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service))
-        processor.prepare()
-        processor.play()
-        processor.finalize()
-        self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
-
-    def test_state_transitions(self):
-        output_queue = Queue()
-        mock_tts_service = MockTTSService()
-        mock_llm_service = MockLLMService()
-        mock_image_service = MockImageService()
-        message_handler = MessageHandler("Hello World")
-        processor = LLMResponse(
-            AIServiceConfig(
-                tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
-            ),
-            message_handler,
-            output_queue,
-        )
-        processor.prepare()
-        processor.play()
-
-        # Consume the output from the output queue. It's necessary to mark these tasks as done for the
-        # play function to return.
-        expected_words = ["Hello", "there.", "How", "are", "you?", "I", "hope", "you", "are", "well."]
-
-        # remove the "start_stream" message from the queue
-        output_queue.get()
-        output_queue.task_done()
-
-        while expected_words:
-            actual_word:QueueFrame = output_queue.get()
-            word = expected_words.pop(0)
-            self.assertEqual(actual_word.frame_type, FrameType.AUDIO_FRAME)
-            self.assertEqual(actual_word.frame_data, bytes(word, "utf-8"))
-            output_queue.task_done()
-
-        processor.finalize()
-
-        self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
-
-    def test_interrupt_preparation(self):
-        output_queue = Queue()
-        mock_tts_service = MockTTSService()
-        mock_llm_service = MockLLMService()
-        mock_image_service = MockImageService()
-        message_handler = MessageHandler("System Message")
-        processor = LLMResponse(
-            AIServiceConfig(
-                tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
-            ),
-            message_handler,
-            output_queue,
-        )
-        processor.prepare()
-        interrupt_request_at = time.perf_counter()
-        processor.interrupt()
-        processor.finalize()
-        finalized_at = time.perf_counter()
-        self.assertTrue(0.1 < finalized_at - interrupt_request_at < 0.2)
-        print(f"delta: {interrupt_request_at, finalized_at}")
-        self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
-
-    def test_interrupt_play(self):
-        output_queue = Queue()
-        mock_tts_service = MockTTSService()
-        mock_llm_service = MockLLMService()
-        mock_image_service = MockImageService()
-        message_handler = MessageHandler("System Message")
-        processor = LLMResponse(
-            AIServiceConfig(
-                tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
-            ),
-            message_handler,
-            output_queue,
-        )
-        processor.prepare()
-        processor.play()
-
-        stop_processing_output_queue = Event()
-        def process_output_queue_async():
-            # Consume the output from the output queue. It's necessary to mark these tasks as done for the
-            # play function to return.
-            time.sleep(0.1)
-            expected_words = ["Hello", "there.", "How", "are", "you?", "I", "hope", "you", "are", "well."]
-            while expected_words and not stop_processing_output_queue.is_set():
-                try:
-                    actual_word:QueueFrame = output_queue.get_nowait()
-                    if actual_word.frame_type == FrameType.AUDIO_FRAME:
-                        time.sleep(0.1)
-                        word = expected_words.pop(0)
-                        self.assertEqual(actual_word.frame_type, FrameType.AUDIO_FRAME)
-                        self.assertEqual(actual_word.frame_data, bytes(word, "utf-8"))
-                    output_queue.task_done()
-                except Empty:
-                    pass
-
-        process_output_queue = Thread(target=process_output_queue_async, daemon=True)
-        process_output_queue.start()
-
-        time.sleep(0.5)
-        processor.interrupt()
-
-        stop_processing_output_queue.set()
-        process_output_queue.join()
-
-        processor.finalize()
-        self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
-
-    def test_statechange_callback(self):
-        mock_tts_service = MockTTSService()
-        mock_llm_service = MockLLMService()
-        mock_image_service = MockImageService()
-        processor = AsyncProcessor(
-            AIServiceConfig(
-                tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
-            )
-        )
-        is_finalized = False
-        def set_is_finalized(async_processor:AsyncProcessor):
-            nonlocal is_finalized
-            is_finalized = True
-
-        processor.set_state_callback(
-            AsyncProcessorState.FINALIZED, set_is_finalized
-        )
-        processor.prepare()
-        self.assertFalse(is_finalized)
-        processor.play()
-        self.assertFalse(is_finalized)
-        processor.finalize()
-        self.assertTrue(is_finalized)
-        self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
-
-
-if __name__ == '__main__':
-    unittest.main()
-"""
--- a/src/dailyai/tests/test_daily_transport_service.py
+++ b/src/dailyai/tests/test_daily_transport_service.py
@@ -0,0 +1,92 @@
+import asyncio
+import threading
+import unittest
+
+from unittest.mock import MagicMock, patch
+
+from dailyai.pipeline.frames import AudioFrame, ImageFrame
+
+
+class TestDailyTransport(unittest.IsolatedAsyncioTestCase):
+
+    async def test_event_handler(self):
+        from dailyai.services.daily_transport_service import DailyTransportService
+
+        transport = DailyTransportService("mock.daily.co/mock", "token", "bot")
+
+        was_called = False
+
+        @transport.event_handler("on_first_other_participant_joined")
+        def test_event_handler(transport):
+            nonlocal was_called
+            was_called = True
+
+        transport.on_first_other_participant_joined()
+
+        self.assertTrue(was_called)
+
+    """
+    TODO: fix this test, it broke when I added the `.result` call in the patch.
+    async def test_event_handler_async(self):
+        from dailyai.services.daily_transport_service import DailyTransportService
+
+        transport = DailyTransportService("mock.daily.co/mock", "token", "bot")
+
+        event = asyncio.Event()
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def test_event_handler(transport):
+            nonlocal event
+            print("sleeping")
+            await asyncio.sleep(0.1)
+            print("setting")
+            event.set()
+            print("returning")
+
+        thread = threading.Thread(target=transport.on_first_other_participant_joined)
+        thread.start()
+        thread.join()
+
+        await asyncio.wait_for(event.wait(), timeout=1)
+        self.assertTrue(event.is_set())
+    """
+
+    """
+    @patch("dailyai.services.daily_transport_service.CallClient")
+    @patch("dailyai.services.daily_transport_service.Daily")
+    async def test_run_with_camera_and_mic(self, daily_mock, callclient_mock):
+        from dailyai.services.daily_transport_service import DailyTransportService
+        transport = DailyTransportService(
+            "https://mock.daily.co/mock",
+            "token",
+            "bot",
+            mic_enabled=True,
+            camera_enabled=True,
+            duration_minutes=0.01,
+        )
+
+        mic = MagicMock()
+        camera = MagicMock()
+        daily_mock.create_microphone_device.return_value = mic
+        daily_mock.create_camera_device.return_value = camera
+
+        async def send_audio_frame():
+            await transport.send_queue.put(AudioFrame(bytes([0] * 3300)))
+
+        async def send_video_frame():
+            await transport.send_queue.put(ImageFrame(None, b"test"))
+
+        await asyncio.gather(transport.run(), send_audio_frame(), send_video_frame())
+
+        daily_mock.init.assert_called_once_with()
+        daily_mock.create_microphone_device.assert_called_once()
+        daily_mock.create_camera_device.assert_called_once()
+
+        callclient_mock.return_value.set_user_name.assert_called_once_with("bot")
+        callclient_mock.return_value.join.assert_called_once_with(
+            "https://mock.daily.co/mock", "token", completion=transport.call_joined
+        )
+
+        camera.write_frame.assert_called_with(b"test")
+        mic.write_frames.assert_called()
+    """
--- a/src/dailyai/tests/test_message_handler.py
+++ b/src/dailyai/tests/test_message_handler.py
@@ -1,147 +0,0 @@
-import time
-import unittest
-
-from unittest.mock import MagicMock, call
-
-from dailyai.message_handler.message_handler import MessageHandler, IndexingMessageHandler
-from dailyai.services.ai_services import (
-    AIServiceConfig,
-    TTSService,
-    LLMService,
-    ImageGenService,
-)
-from ..storage.search import SearchIndexer
-
-
-class TestMessageHandler(unittest.TestCase):
-    def test_simple_intro(self):
-        message_handler = MessageHandler("Hello world")
-        self.assertEqual(
-            message_handler.get_llm_messages(),
-            [{"role": "system", "content": "Hello world"}],
-        )
-
-    def test_simple_user_message(self):
-        message_handler = MessageHandler("System prompt")
-        message_handler.add_user_message("User message")
-        self.assertEqual(
-            message_handler.get_llm_messages(),
-            [
-                {"role": "system", "content": "System prompt"},
-                {"role": "user", "content": "User message"},
-            ],
-        )
-
-    def test_simple_user_and_assistant_message(self):
-        message_handler = MessageHandler("System prompt")
-        message_handler.add_user_message("User message")
-        message_handler.add_assistant_message("Assistant message")
-        self.assertEqual(
-            message_handler.get_llm_messages(),
-            [
-                {"role": "system", "content": "System prompt"},
-                {"role": "user", "content": "User message"},
-                {"role": "assistant", "content": "Assistant message"},
-            ],
-        )
-
-    def test_user_message_overwrite(self):
-        message_handler = MessageHandler("System prompt")
-        message_handler.add_user_message("User message")
-        message_handler.add_assistant_message("Assistant message")
-        message_handler.add_user_message("plus something else")
-        self.assertEqual(
-            message_handler.get_llm_messages(),
-            [
-                {"role": "system", "content": "System prompt"},
-                {"role": "user", "content": "User message plus something else"},
-            ],
-        )
-
-    def test_user_message_after_assistant(self):
-        message_handler = MessageHandler("System prompt")
-        message_handler.add_user_message("User message")
-        message_handler.add_assistant_message("Assistant message")
-        message_handler.finalize_user_message()
-        message_handler.add_user_message("other user message")
-        self.assertEqual(
-            message_handler.get_llm_messages(),
-            [
-                {"role": "system", "content": "System prompt"},
-                {"role": "user", "content": "User message"},
-                {"role": "assistant", "content": "Assistant message"},
-                {"role": "user", "content": "other user message"},
-            ],
-        )
-
-
-class MockTTSService(TTSService):
-    def run_tts(self, sentence):
-        for word in sentence.split(" "):
-            time.sleep(0.1)
-            yield bytes(word, "utf-8")
-
-
-class MockLLMService(LLMService):
-    def run_llm(self, messages) -> str:
-        return "Parsed user message."
-
-class MockImageService(ImageGenService):
-    def run_image_gen(self, sentence) -> None:
-        return None
-
-
-class TestStorageMessageHandler(unittest.TestCase):
-    def test_user_message_finalized(self):
-        mock_tts_service = MockTTSService()
-        mock_llm_service = MockLLMService()
-        mock_image_service = MockImageService()
-
-        service_config = AIServiceConfig(
-            tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
-        )
-
-        mock_indexer = MagicMock(spec=SearchIndexer)
-
-        message_handler = IndexingMessageHandler(
-            "Hello world", service_config, mock_indexer
-        )
-        message_handler.cleanup_user_message = MagicMock(return_value="Parsed user message.")
-        message_handler.add_user_message("User message")
-        message_handler.add_assistant_message("Assistant message will be ignored")
-        message_handler.add_user_message("plus something else")
-        message_handler.finalize_user_message()
-        message_handler.add_assistant_message(
-            "New assistant message will not be ignored"
-        )
-        message_handler.add_user_message("User message second time")
-        message_handler.add_assistant_message("Assistant message second time")
-        message_handler.write_messages_to_storage()
-
-        time.sleep(0.5)
-        message_handler.cleanup_user_message.assert_called_with("User message plus something else")
-        self.assertEqual(
-            mock_indexer.mock_calls,
-            [
-                call.index_text('"Parsed user message."'),
-                call.index_text("New assistant message will not be ignored"),
-            ],
-        )
-
-        mock_indexer.reset_mock()
-
-        message_handler.finalize_user_message()
-
-        time.sleep(0.5)
-
-        self.assertEqual(
-            mock_indexer.mock_calls,
-            [
-                call.index_text('"Parsed user message."'),
-                call.index_text("Assistant message second time"),
-            ],
-        )
-
-
-if __name__ == "__main__":
-    unittest.main()
--- a/src/dailyai/tests/test_pipeline.py
+++ b/src/dailyai/tests/test_pipeline.py
@@ -0,0 +1,59 @@
+import asyncio
+import unittest
+from dailyai.pipeline.aggregators import SentenceAggregator, StatelessTextTransformer
+from dailyai.pipeline.frames import EndFrame, TextFrame
+
+from dailyai.pipeline.pipeline import Pipeline
+
+
+class TestDailyPipeline(unittest.IsolatedAsyncioTestCase):
+
+    async def test_pipeline_simple(self):
+        aggregator = SentenceAggregator()
+
+        outgoing_queue = asyncio.Queue()
+        incoming_queue = asyncio.Queue()
+        pipeline = Pipeline([aggregator], incoming_queue, outgoing_queue)
+
+        await incoming_queue.put(TextFrame("Hello, "))
+        await incoming_queue.put(TextFrame("world."))
+        await incoming_queue.put(EndFrame())
+
+        await pipeline.run_pipeline()
+
+        self.assertEqual(await outgoing_queue.get(), TextFrame("Hello, world."))
+        self.assertIsInstance(await outgoing_queue.get(), EndFrame)
+
+    async def test_pipeline_multiple_stages(self):
+        sentence_aggregator = SentenceAggregator()
+        to_upper = StatelessTextTransformer(lambda x: x.upper())
+        add_space = StatelessTextTransformer(lambda x: x + " ")
+
+        outgoing_queue = asyncio.Queue()
+        incoming_queue = asyncio.Queue()
+        pipeline = Pipeline(
+            [add_space, sentence_aggregator, to_upper],
+            incoming_queue,
+            outgoing_queue
+        )
+
+        sentence = "Hello, world. It's me, a pipeline."
+        for c in sentence:
+            await incoming_queue.put(TextFrame(c))
+        await incoming_queue.put(EndFrame())
+
+        await pipeline.run_pipeline()
+
+        self.assertEqual(
+            await outgoing_queue.get(), TextFrame("H E L L O ,   W O R L D .")
+        )
+        self.assertEqual(
+            await outgoing_queue.get(),
+            TextFrame("   I T ' S   M E ,   A   P I P E L I N E ."),
+        )
+        # The space appended after the final "." is flushed as its own frame on EndFrame.
+        self.assertEqual(
+            await outgoing_queue.get(),
+            TextFrame(" "),
+        )
+        self.assertIsInstance(await outgoing_queue.get(), EndFrame)
--- a/src/examples/foundational/01-say-one-thing.py
+++ b/src/examples/foundational/01-say-one-thing.py
@@ -0,0 +1,51 @@
+import asyncio
+import aiohttp
+import logging
+import os
+from dailyai.pipeline.frames import EndFrame, TextFrame
+from dailyai.pipeline.pipeline import Pipeline
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            None,
+            "Say One Thing",
+            mic_enabled=True,
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        pipeline = Pipeline([tts])
+
+        # Register an event handler so we can play the audio when the
+        # participant joins.
+        @transport.event_handler("on_participant_joined")
+        async def on_participant_joined(transport, participant):
+            if participant["info"]["isLocal"]:
+                return
+
+            participant_name = participant["info"]["userName"] or ''
+            await pipeline.queue_frames([TextFrame("Hello there, " + participant_name + "!"), EndFrame()])
+
+        await transport.run(pipeline)
+        del tts
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url))
--- a/src/examples/foundational/01a-local-transport.py
+++ b/src/examples/foundational/01a-local-transport.py
@@ -0,0 +1,38 @@
+import asyncio
+import aiohttp
+import logging
+import os
+
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.local_transport_service import LocalTransportService
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        meeting_duration_minutes = 1
+        transport = LocalTransportService(
+            duration_minutes=meeting_duration_minutes, mic_enabled=True
+        )
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        async def say_something():
+            await asyncio.sleep(1)
+            await tts.say(
+                "Hello there.",
+                transport.send_queue,
+            )
+            await transport.stop_when_done()
+
+        await asyncio.gather(transport.run(), say_something())
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/src/examples/foundational/02-llm-say-one-thing.py
+++ b/src/examples/foundational/02-llm-say-one-thing.py
@@ -0,0 +1,56 @@
+import asyncio
+import os
+import logging
+
+import aiohttp
+
+from dailyai.pipeline.frames import EndFrame, LLMMessagesQueueFrame
+from dailyai.pipeline.pipeline import Pipeline
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.open_ai_services import OpenAILLMService
+
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            None,
+            "Say One Thing From an LLM",
+            mic_enabled=True,
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
+            }]
+
+        pipeline = Pipeline([llm, tts])
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            await pipeline.queue_frames([LLMMessagesQueueFrame(messages), EndFrame()])
+
+        await transport.run(pipeline)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url))
--- a/src/examples/foundational/03-still-frame.py
+++ b/src/examples/foundational/03-still-frame.py
@@ -0,0 +1,54 @@
+import asyncio
+import aiohttp
+import logging
+import os
+
+from dailyai.pipeline.frames import EndFrame, TextFrame
+from dailyai.pipeline.pipeline import Pipeline
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.fal_ai_services import FalImageGenService
+
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            None,
+            "Show a still frame image",
+            camera_enabled=True,
+            camera_width=1024,
+            camera_height=1024,
+            duration_minutes=1
+        )
+
+        imagegen = FalImageGenService(
+            image_size="square_hd",
+            aiohttp_session=session,
+            key_id=os.getenv("FAL_KEY_ID"),
+            key_secret=os.getenv("FAL_KEY_SECRET"),
+        )
+
+        pipeline = Pipeline([imagegen])
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            # Note that we do not put an EndFrame() item in the pipeline for this demo.
+            # This means that the bot will stay in the channel until it times out.
+            # An EndFrame() in the pipeline would cause the transport to shut
+            # down.
+            await pipeline.queue_frames(
+                [TextFrame("a cat in the style of picasso")]
+            )
+
+        await transport.run(pipeline)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url))
--- a/src/examples/foundational/03a-image-local.py
+++ b/src/examples/foundational/03a-image-local.py
@@ -0,0 +1,55 @@
+import asyncio
+import aiohttp
+import logging
+import os
+
+import tkinter as tk
+
+from dailyai.pipeline.frames import TextFrame
+from dailyai.services.fal_ai_services import FalImageGenService
+from dailyai.services.local_transport_service import LocalTransportService
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+local_joined = False
+participant_joined = False
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        meeting_duration_minutes = 2
+        tk_root = tk.Tk()
+        tk_root.title("Calendar")
+        transport = LocalTransportService(
+            tk_root=tk_root,
+            mic_enabled=True,
+            camera_enabled=True,
+            camera_width=1024,
+            camera_height=1024,
+            duration_minutes=meeting_duration_minutes,
+        )
+
+        imagegen = FalImageGenService(
+            image_size="1024x1024",
+            aiohttp_session=session,
+            key_id=os.getenv("FAL_KEY_ID"),
+            key_secret=os.getenv("FAL_KEY_SECRET"),
+        )
+        image_task = asyncio.create_task(
+            imagegen.run_to_queue(
+                transport.send_queue, [
+                    TextFrame("a cat in the style of picasso")]))
+
+        async def run_tk():
+            while not transport._stop_threads.is_set():
+                tk_root.update()
+                tk_root.update_idletasks()
+                await asyncio.sleep(0.1)
+
+        await asyncio.gather(transport.run(), image_task, run_tk())
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/src/examples/foundational/04-utterance-and-speech.py
+++ b/src/examples/foundational/04-utterance-and-speech.py
@@ -0,0 +1,81 @@
+import asyncio
+import logging
+import os
+
+import aiohttp
+from dailyai.pipeline.merge_pipeline import SequentialMergePipeline
+from dailyai.pipeline.pipeline import Pipeline
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
+from dailyai.services.deepgram_ai_services import DeepgramTTSService
+from dailyai.pipeline.frames import EndFrame, EndPipeFrame, LLMMessagesQueueFrame, TextFrame
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url: str):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            None,
+            "Static And Dynamic Speech",
+            duration_minutes=1,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+        )
+
+        llm = AzureLLMService(
+            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
+            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+        )
+        azure_tts = AzureTTSService(
+            api_key=os.getenv("AZURE_SPEECH_API_KEY"),
+            region=os.getenv("AZURE_SPEECH_REGION"),
+        )
+
+        deepgram_tts = DeepgramTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("DEEPGRAM_API_KEY"),
+        )
+        elevenlabs_tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        messages = [{"role": "system",
+                     "content": "tell the user a joke about llamas"}]
+
+        # Queue the LLM prompt on its own pipeline, which converts the LLM output to audio frames. This
+        # pipeline runs in parallel with generating and speaking the audio for the static text, so there's
+        # no delay before the LLM response is spoken.
+        llm_pipeline = Pipeline([llm, elevenlabs_tts])
+        await llm_pipeline.queue_frames([LLMMessagesQueueFrame(messages), EndPipeFrame()])
+
+        simple_tts_pipeline = Pipeline([azure_tts])
+        await simple_tts_pipeline.queue_frames(
+            [
+                TextFrame("My friend the LLM is going to tell a joke about llamas"),
+                EndPipeFrame(),
+            ]
+        )
+
+        merge_pipeline = SequentialMergePipeline(
+            [simple_tts_pipeline, llm_pipeline])
+
+        await asyncio.gather(
+            transport.run(merge_pipeline),
+            simple_tts_pipeline.run_pipeline(),
+            llm_pipeline.run_pipeline(),
+        )
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url))
--- a/src/examples/foundational/05-sync-speech-and-image.py
+++ b/src/examples/foundational/05-sync-speech-and-image.py
@@ -0,0 +1,143 @@
+import asyncio
+import aiohttp
+import os
+import logging
+
+from dataclasses import dataclass
+from typing import AsyncGenerator
+
+from dailyai.pipeline.aggregators import (
+    GatedAggregator,
+    LLMFullResponseAggregator,
+    ParallelPipeline,
+    SentenceAggregator,
+)
+from dailyai.pipeline.frames import (
+    Frame,
+    TextFrame,
+    EndFrame,
+    ImageFrame,
+    LLMMessagesQueueFrame,
+    LLMResponseStartFrame,
+)
+from dailyai.pipeline.frame_processor import FrameProcessor
+
+from dailyai.pipeline.pipeline import Pipeline
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.open_ai_services import OpenAILLMService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.fal_ai_services import FalImageGenService
+
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+@dataclass
+class MonthFrame(Frame):
+    month: str
+
+
+class MonthPrepender(FrameProcessor):
+    def __init__(self):
+        self.most_recent_month = "Placeholder, month frame not yet received"
+        self.prepend_to_next_text_frame = False
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, MonthFrame):
+            self.most_recent_month = frame.month
+        elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):
+            yield TextFrame(f"{self.most_recent_month}: {frame.text}")
+            self.prepend_to_next_text_frame = False
+        elif isinstance(frame, LLMResponseStartFrame):
+            self.prepend_to_next_text_frame = True
+            yield frame
+        else:
+            yield frame
+
+
+async def main(room_url):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            None,
+            "Month Narration Bot",
+            mic_enabled=True,
+            camera_enabled=True,
+            mic_sample_rate=16000,
+            camera_width=1024,
+            camera_height=1024,
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        imagegen = FalImageGenService(
+            image_size="square_hd",
+            aiohttp_session=session,
+            key_id=os.getenv("FAL_KEY_ID"),
+            key_secret=os.getenv("FAL_KEY_SECRET"),
+        )
+
+        gated_aggregator = GatedAggregator(
+            gate_open_fn=lambda f: isinstance(f, ImageFrame),
+            gate_close_fn=lambda f: isinstance(f, LLMResponseStartFrame),
+            start_open=False)
+
+        sentence_aggregator = SentenceAggregator()
+        month_prepender = MonthPrepender()
+        llm_full_response_aggregator = LLMFullResponseAggregator()
+
+        pipeline = Pipeline(
+            processors=[
+                llm,
+                sentence_aggregator,
+                ParallelPipeline(
+                    [[month_prepender, tts], [llm_full_response_aggregator, imagegen]]
+                ),
+                gated_aggregator,
+            ],
+        )
+
+        frames = []
+        for month in [
+            "January",
+            "February",
+            "March",
+            "April",
+            "May",
+            "June",
+            "July",
+            "August",
+            "September",
+            "October",
+            "November",
+            "December",
+        ]:
+            messages = [
+                {
+                    "role": "system",
+                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
+                }
+            ]
+            frames.append(MonthFrame(month))
+            frames.append(LLMMessagesQueueFrame(messages))
+
+        frames.append(EndFrame())
+        await pipeline.queue_frames(frames)
+
+        await transport.run(pipeline, override_pipeline_source_queue=False)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url))
--- a/src/examples/foundational/05a-local-sync-speech-and-text.py
+++ b/src/examples/foundational/05a-local-sync-speech-and-text.py
@@ -0,0 +1,146 @@
+import aiohttp
+import argparse
+import asyncio
+import logging
+import tkinter as tk
+import os
+
+from dailyai.pipeline.frames import AudioFrame, ImageFrame
+from dailyai.services.open_ai_services import OpenAILLMService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.fal_ai_services import FalImageGenService
+from dailyai.services.local_transport_service import LocalTransportService
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url):
+    async with aiohttp.ClientSession() as session:
+        meeting_duration_minutes = 5
+        tk_root = tk.Tk()
+        tk_root.title("Calendar")
+
+        transport = LocalTransportService(
+            mic_enabled=True,
+            camera_enabled=True,
+            camera_width=1024,
+            camera_height=1024,
+            duration_minutes=meeting_duration_minutes,
+            tk_root=tk_root,
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        dalle = FalImageGenService(
+            image_size="1024x1024",
+            aiohttp_session=session,
+            key_id=os.getenv("FAL_KEY_ID"),
+            key_secret=os.getenv("FAL_KEY_SECRET"),
+        )
+
+        # Get a complete audio chunk from the given text. Splitting this into its own
+        # coroutine lets us ensure proper ordering of the audio chunks on the
+        # send queue.
+        async def get_all_audio(text):
+            all_audio = bytearray()
+            async for audio in tts.run_tts(text):
+                all_audio.extend(audio)
+
+            return all_audio
+
+        async def get_month_data(month):
+            messages = [
+                {
+                    "role": "system",
+                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
+                }
+            ]
+
+            image_description = await llm.run_llm(messages)
+            if not image_description:
+                return
+
+            to_speak = f"{month}: {image_description}"
+            audio_task = asyncio.create_task(get_all_audio(to_speak))
+            image_task = asyncio.create_task(
+                dalle.run_image_gen(image_description))
+            (audio, image_data) = await asyncio.gather(audio_task, image_task)
+
+            return {
+                "month": month,
+                "text": image_description,
+                "image_url": image_data[0],
+                "image": image_data[1],
+                "audio": audio,
+            }
+
+        months: list[str] = [
+            "January",
+            "February",
+            "March",
+            "April",
+            "May",
+            "June",
+            "July",
+            "August",
+            "September",
+            "October",
+            "November",
+            "December",
+        ]
+
+        async def show_images():
+            # This will play the months in the order they're completed. The benefit
+            # is we'll have as little delay as possible before the first month, and
+            # likely no delay between months, but the months won't display in
+            # order.
+            for month_data_task in asyncio.as_completed(month_tasks):
+                data = await month_data_task
+                if data:
+                    await transport.send_queue.put(
+                        [
+                            ImageFrame(data["image_url"], data["image"]),
+                            AudioFrame(data["audio"]),
+                        ]
+                    )
+
+            await asyncio.sleep(25)
+
+            # wait for the output queue to be empty, then leave the meeting
+            await transport.stop_when_done()
+
+        async def run_tk():
+            while not transport._stop_threads.is_set():
+                tk_root.update()
+                tk_root.update_idletasks()
+                await asyncio.sleep(0.1)
+
+        month_tasks = [
+            asyncio.create_task(
+                get_month_data(month)) for month in months]
+
+        await asyncio.gather(transport.run(), show_images(), run_tk())
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
+    parser.add_argument(
+        "-u",
+        "--url",
+        type=str,
+        required=True,
+        help="URL of the Daily room to join")
+
+    args, unknown = parser.parse_known_args()
+
+    asyncio.run(main(args.url))
--- a/src/examples/foundational/06-listen-and-respond.py
+++ b/src/examples/foundational/06-listen-and-respond.py
@@ -0,0 +1,85 @@
+import asyncio
+import aiohttp
+import logging
+import os
+from dailyai.pipeline.frames import LLMMessagesQueueFrame
+from dailyai.pipeline.pipeline import Pipeline
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.open_ai_services import OpenAILLMService
+from dailyai.services.ai_services import FrameLogger
+from dailyai.pipeline.aggregators import (
+    LLMAssistantContextAggregator,
+    LLMUserContextAggregator,
+)
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            token,
+            "Respond bot",
+            duration_minutes=5,
+            start_transcription=True,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+            camera_enabled=False,
+            vad_enabled=True,
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+        fl = FrameLogger("Inner")
+        fl2 = FrameLogger("Outer")
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        tma_in = LLMUserContextAggregator(
+            messages, transport._my_participant_id)
+        tma_out = LLMAssistantContextAggregator(
+            messages, transport._my_participant_id
+        )
+        pipeline = Pipeline(
+            processors=[
+                fl,
+                tma_in,
+                llm,
+                fl2,
+                tts,
+                tma_out,
+            ],
+        )
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            # Kick off the conversation.
+            messages.append(
+                {"role": "system", "content": "Please introduce yourself to the user."})
+            await pipeline.queue_frames([LLMMessagesQueueFrame(messages)])
+
+        transport.transcription_settings["extra"]["endpointing"] = True
+        transport.transcription_settings["extra"]["punctuate"] = True
+        await transport.run(pipeline)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/src/examples/foundational/06a-image-sync.py
+++ b/src/examples/foundational/06a-image-sync.py
@@ -0,0 +1,118 @@
+import asyncio
+import os
+import logging
+from typing import AsyncGenerator
+import aiohttp
+from PIL import Image
+
+from dailyai.pipeline.frames import ImageFrame, Frame
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.ai_services import AIService
+from dailyai.pipeline.aggregators import (
+    LLMAssistantContextAggregator,
+    LLMUserContextAggregator,
+)
+from dailyai.services.open_ai_services import OpenAILLMService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.fal_ai_services import FalImageGenService
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+class ImageSyncAggregator(AIService):
+    def __init__(self, speaking_path: str, waiting_path: str):
+        self._speaking_image = Image.open(speaking_path)
+        self._speaking_image_bytes = self._speaking_image.tobytes()
+
+        self._waiting_image = Image.open(waiting_path)
+        self._waiting_image_bytes = self._waiting_image.tobytes()
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        yield ImageFrame(None, self._speaking_image_bytes)
+        yield frame
+        yield ImageFrame(None, self._waiting_image_bytes)
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            token,
+            "Respond bot",
+            5,
+        )
+        transport._camera_enabled = True
+        transport._camera_width = 1024
+        transport._camera_height = 1024
+        transport._mic_enabled = True
+        transport._mic_sample_rate = 16000
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        img = FalImageGenService(
+            image_size="1024x1024",
+            aiohttp_session=session,
+            key_id=os.getenv("FAL_KEY_ID"),
+            key_secret=os.getenv("FAL_KEY_SECRET"),
+        )
+
+        async def get_images():
+            get_speaking_task = asyncio.create_task(
+                img.run_image_gen("An image of a cat speaking")
+            )
+            get_waiting_task = asyncio.create_task(
+                img.run_image_gen("An image of a cat waiting")
+            )
+
+            (speaking_data, waiting_data) = await asyncio.gather(
+                get_speaking_task, get_waiting_task
+            )
+
+            return speaking_data, waiting_data
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            await tts.say("Hi, I'm listening!", transport.send_queue)
+
+        async def handle_transcriptions():
+            messages = [
+                {
+                    "role": "system",
+                    "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
+                },
+            ]
+
+            tma_in = LLMUserContextAggregator(
+                messages, transport._my_participant_id)
+            tma_out = LLMAssistantContextAggregator(
+                messages, transport._my_participant_id
+            )
+            image_sync_aggregator = ImageSyncAggregator(
+                os.path.join(
+                    os.path.dirname(__file__), "assets", "speaking.png"), os.path.join(
+                    os.path.dirname(__file__), "assets", "waiting.png"), )
+            await tts.run_to_queue(
+                transport.send_queue,
+                image_sync_aggregator.run(
+                    tma_out.run(llm.run(tma_in.run(transport.get_receive_frames())))
+                ),
+            )
+
+        transport.transcription_settings["extra"]["punctuate"] = True
+        await asyncio.gather(transport.run(), handle_transcriptions())
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/src/examples/foundational/07-interruptible.py
+++ b/src/examples/foundational/07-interruptible.py
@@ -0,0 +1,74 @@
+import asyncio
+import aiohttp
+import logging
+import os
+from dailyai.pipeline.aggregators import (
+    LLMAssistantContextAggregator,
+    LLMResponseAggregator,
+    LLMUserContextAggregator,
+    UserResponseAggregator,
+)
+
+from dailyai.pipeline.pipeline import Pipeline
+from dailyai.services.ai_services import FrameLogger
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.open_ai_services import OpenAILLMService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            token,
+            "Respond bot",
+            duration_minutes=5,
+            start_transcription=True,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+            camera_enabled=False,
+            vad_enabled=True,
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        pipeline = Pipeline([FrameLogger(), llm, FrameLogger(), tts])
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            await transport.say("Hi, I'm listening!", tts)
+
+        async def run_conversation():
+            messages = [
+                {
+                    "role": "system",
+                    "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
+                },
+            ]
+
+            await transport.run_interruptible_pipeline(
+                pipeline,
+                post_processor=LLMResponseAggregator(messages),
+                pre_processor=UserResponseAggregator(messages),
+            )
+
+        transport.transcription_settings["extra"]["punctuate"] = False
+        await asyncio.gather(transport.run(), run_conversation())
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/src/examples/foundational/08-bots-arguing.py
+++ b/src/examples/foundational/08-bots-arguing.py
@@ -0,0 +1,143 @@
+from typing import Tuple
+import aiohttp
+import asyncio
+import logging
+import os
+from dailyai.pipeline.aggregators import SentenceAggregator
+from dailyai.pipeline.pipeline import Pipeline
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.fal_ai_services import FalImageGenService
+from dailyai.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesQueueFrame, TextFrame
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url: str):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            None,
+            "Respond bot",
+            duration_minutes=10,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+            camera_enabled=True,
+            camera_width=1024,
+            camera_height=1024,
+        )
+
+        llm = AzureLLMService(
+            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
+            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+        )
+        tts1 = AzureTTSService(
+            api_key=os.getenv("AZURE_SPEECH_API_KEY"),
+            region=os.getenv("AZURE_SPEECH_REGION"),
+        )
+        tts2 = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id="jBpfuIE2acCO8z3wKNLl",
+        )
+        dalle = FalImageGenService(
+            image_size="1024x1024",
+            aiohttp_session=session,
+            key_id=os.getenv("FAL_KEY_ID"),
+            key_secret=os.getenv("FAL_KEY_SECRET"),
+        )
+
+        bot1_messages = [
+            {
+                "role": "system",
+                "content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long.",
+            },
+        ]
+        bot2_messages = [
+            {
+                "role": "system",
+                "content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich.",
+            },
+        ]
+
+        async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
+            """Stream text from the LLM, convert it to speech with the TTS service
+            as it's received, and return the full text and the combined audio."""
+            source_queue = asyncio.Queue()
+            sink_queue = asyncio.Queue()
+            sentence_aggregator = SentenceAggregator()
+            pipeline = Pipeline(
+                [llm, sentence_aggregator, tts1], source_queue, sink_queue
+            )
+
+            await source_queue.put(LLMMessagesQueueFrame(messages))
+            await source_queue.put(EndFrame())
+            await pipeline.run_pipeline()
+
+            message = ""
+            all_audio = bytearray()
+            while sink_queue.qsize():
+                frame = sink_queue.get_nowait()
+                if isinstance(frame, TextFrame):
+                    message += frame.text
+                elif isinstance(frame, AudioFrame):
+                    all_audio.extend(frame.data)
+
+            return (message, all_audio)
+
+        async def get_bot1_statement():
+            message, audio = await get_text_and_audio(bot1_messages)
+
+            bot1_messages.append({"role": "assistant", "content": message})
+            bot2_messages.append({"role": "user", "content": message})
+
+            return audio
+
+        async def get_bot2_statement():
+            message, audio = await get_text_and_audio(bot2_messages)
+
+            bot2_messages.append({"role": "assistant", "content": message})
+            bot1_messages.append({"role": "user", "content": message})
+
+            return audio
+
+        async def argue():
+            for i in range(100):
+                print(f"In iteration {i}")
+
+                bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
+
+                (audio1, image_data1) = await asyncio.gather(
+                    get_bot1_statement(), dalle.run_image_gen(bot1_description)
+                )
+                await transport.send_queue.put(
+                    [
+                        ImageFrame(None, image_data1[1]),
+                        AudioFrame(audio1),
+                    ]
+                )
+
+                bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
+
+                (audio2, image_data2) = await asyncio.gather(
+                    get_bot2_statement(), dalle.run_image_gen(bot2_description)
+                )
+                await transport.send_queue.put(
+                    [
+                        ImageFrame(None, image_data2[1]),
+                        AudioFrame(audio2),
+                    ]
+                )
+
+        await asyncio.gather(transport.run(), argue())
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url))
--- a/src/examples/foundational/10-wake-word.py
+++ b/src/examples/foundational/10-wake-word.py
@@ -0,0 +1,186 @@
+import aiohttp
+import asyncio
+import logging
+import os
+import random
+from typing import AsyncGenerator
+from PIL import Image
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.open_ai_services import OpenAILLMService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.pipeline.aggregators import (
+    LLMUserContextAggregator,
+    LLMAssistantContextAggregator,
+)
+from dailyai.pipeline.frames import (
+    Frame,
+    TextFrame,
+    ImageFrame,
+    SpriteFrame,
+    TranscriptionQueueFrame,
+)
+from dailyai.services.ai_services import AIService
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+sprites = {}
+image_files = [
+    "sc-default.png",
+    "sc-talk.png",
+    "sc-listen-1.png",
+    "sc-think-1.png",
+    "sc-think-2.png",
+    "sc-think-3.png",
+    "sc-think-4.png",
+]
+
+script_dir = os.path.dirname(__file__)
+
+for file in image_files:
+    # Build the full path to the image file
+    full_path = os.path.join(script_dir, "assets", file)
+    # Key the dictionary by the full filename, extension included, because the
+    # sprite lookups below use names such as "sc-listen-1.png"
+    # Open the image and convert it to bytes
+    with Image.open(full_path) as img:
+        sprites[file] = img.tobytes()
+
+# When the bot isn't talking, show a static image of the cat listening
+quiet_frame = ImageFrame("", sprites["sc-listen-1.png"])
+# When the bot is talking, build an animation from two sprites
+talking_list = [sprites["sc-default.png"], sprites["sc-talk.png"]]
+talking = [random.choice(talking_list) for _ in range(30)]
+talking_frame = SpriteFrame(images=talking)
+
+# TODO: Support "thinking" as soon as we get a valid transcript, while LLM
+# is processing
+thinking_list = [
+    sprites["sc-think-1.png"],
+    sprites["sc-think-2.png"],
+    sprites["sc-think-3.png"],
+    sprites["sc-think-4.png"],
+]
+thinking_frame = SpriteFrame(images=thinking_list)
+
+
+class TranscriptFilter(AIService):
+    def __init__(self, bot_participant_id=None):
+        self.bot_participant_id = bot_participant_id
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, TranscriptionQueueFrame):
+            if frame.participantId != self.bot_participant_id:
+                yield frame
+
+
+class NameCheckFilter(AIService):
+    def __init__(self, names: list[str]):
+        self.names = names
+        self.sentence = ""
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        content: str = ""
+
+        # TODO: split up transcription by participant
+        if isinstance(frame, TextFrame):
+            content = frame.text
+
+        self.sentence += content
+        if self.sentence.endswith((".", "?", "!")):
+            # Take the completed sentence and reset the buffer either way
+            out = self.sentence
+            self.sentence = ""
+            # Only pass the sentence along if it mentions one of the wake
+            # names; otherwise drop it
+            if any(name in out for name in self.names):
+                yield TextFrame(out)
+
+
+class ImageSyncAggregator(AIService):
+    def __init__(self):
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        yield talking_frame
+        yield frame
+        yield quiet_frame
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            token,
+            "Santa Cat",
+            duration_minutes=3,
+            start_transcription=True,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+            camera_enabled=True,
+            camera_width=720,
+            camera_height=1280,
+        )
+        transport._mic_enabled = True
+        transport._mic_sample_rate = 16000
+        transport._camera_enabled = True
+        transport._camera_width = 720
+        transport._camera_height = 1280
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id="jBpfuIE2acCO8z3wKNLl",
+        )
+        isa = ImageSyncAggregator()
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            await tts.say(
+                "Hi! If you want to talk to me, just say 'hey Santa Cat'.",
+                transport.send_queue,
+            )
+
+        async def handle_transcriptions():
+            messages = [
+                {
+                    "role": "system",
+                    "content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while.  Your responses should only be a few sentences long.",
+                },
+            ]
+
+            tma_in = LLMUserContextAggregator(
+                messages, transport._my_participant_id)
+            tma_out = LLMAssistantContextAggregator(
+                messages, transport._my_participant_id
+            )
+            tf = TranscriptFilter(transport._my_participant_id)
+            ncf = NameCheckFilter(["Santa Cat", "Santa"])
+            await tts.run_to_queue(
+                transport.send_queue,
+                isa.run(
+                    tma_out.run(
+                        llm.run(
+                            tma_in.run(ncf.run(tf.run(transport.get_receive_frames())))
+                        )
+                    )
+                ),
+            )
+
+        async def starting_image():
+            await transport.send_queue.put(quiet_frame)
+
+        transport.transcription_settings["extra"]["punctuate"] = True
+        await asyncio.gather(transport.run(), handle_transcriptions(), starting_image())
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/src/examples/foundational/11-sound-effects.py
+++ b/src/examples/foundational/11-sound-effects.py
@@ -0,0 +1,138 @@
+import aiohttp
+import asyncio
+import logging
+import os
+import wave
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.open_ai_services import OpenAILLMService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.pipeline.aggregators import (
+    LLMContextAggregator,
+    LLMUserContextAggregator,
+    LLMAssistantContextAggregator,
+)
+from dailyai.services.ai_services import AIService, FrameLogger
+from dailyai.pipeline.frames import (
+    Frame,
+    AudioFrame,
+    LLMResponseEndFrame,
+    LLMMessagesQueueFrame,
+)
+from typing import AsyncGenerator
+
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+sounds = {}
+sound_files = ["ding1.wav", "ding2.wav"]
+
+script_dir = os.path.dirname(__file__)
+
+for file in sound_files:
+    # Build the full path to the sound file
+    full_path = os.path.join(script_dir, "assets", file)
+    # Key the dictionary by the full filename, extension included, because the
+    # lookups below use names such as "ding1.wav"
+    # Open the WAV file and read its audio frames as raw bytes
+    with wave.open(full_path) as audio_file:
+        sounds[file] = audio_file.readframes(-1)
+
+
+class OutboundSoundEffectWrapper(AIService):
+    def __init__(self):
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, LLMResponseEndFrame):
+            yield AudioFrame(sounds["ding1.wav"])
+            # In case anything else up the stack needs it
+            yield frame
+        else:
+            yield frame
+
+
+class InboundSoundEffectWrapper(AIService):
+    def __init__(self):
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, LLMMessagesQueueFrame):
+            yield AudioFrame(sounds["ding2.wav"])
+            # In case anything else up the stack needs it
+            yield frame
+        else:
+            yield frame
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            token,
+            "Respond bot",
+            duration_minutes=5,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+            camera_enabled=False,
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id="ErXwobaYiN019PkySvjV",
+        )
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            await tts.say("Hi, I'm listening!", transport.send_queue)
+            await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
+
+        async def handle_transcriptions():
+            messages = [
+                {
+                    "role": "system",
+                    "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
+                },
+            ]
+
+            tma_in = LLMUserContextAggregator(
+                messages, transport._my_participant_id)
+            tma_out = LLMAssistantContextAggregator(
+                messages, transport._my_participant_id
+            )
+            out_sound = OutboundSoundEffectWrapper()
+            in_sound = InboundSoundEffectWrapper()
+            fl = FrameLogger("LLM Out")
+            fl2 = FrameLogger("Transcription In")
+            await out_sound.run_to_queue(
+                transport.send_queue,
+                tts.run(
+                    fl.run(
+                        tma_out.run(
+                            llm.run(
+                                fl2.run(
+                                    in_sound.run(
+                                        tma_in.run(transport.get_receive_frames())
+                                    )
+                                )
+                            )
+                        )
+                    )
+                ),
+            )
+
+        transport.transcription_settings["extra"]["punctuate"] = True
+        await asyncio.gather(transport.run(), handle_transcriptions())
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/src/examples/foundational/12-describe-video.py
+++ b/src/examples/foundational/12-describe-video.py
@@ -0,0 +1,97 @@
+import asyncio
+import aiohttp
+import logging
+import os
+from typing import AsyncGenerator
+
+from dailyai.pipeline.frames import Frame, LLMMessagesQueueFrame, RequestVideoImageFrame, LLMResponseEndFrame
+from dailyai.pipeline.pipeline import Pipeline
+from dailyai.pipeline.frame_processor import FrameProcessor
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
+from dailyai.services.open_ai_services import OpenAILLMService, OpenAIVisionService
+from dailyai.services.deepgram_ai_services import DeepgramTTSService
+from dailyai.services.ai_services import FrameLogger
+from dailyai.pipeline.aggregators import (
+    LLMAssistantContextAggregator,
+    LLMUserContextAggregator,
+)
+from dailyai.pipeline.frames import VideoImageFrame, VisionFrame
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+class VideoImageFrameProcessor(FrameProcessor):
+    def __init__(self):
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, VideoImageFrame):
+            yield VisionFrame("Describe the image in one sentence.", frame.image)
+        else:
+            yield frame
+
+
+class ImageRefresher(FrameProcessor):
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, LLMResponseEndFrame):
+            yield RequestVideoImageFrame(participantId=None)
+            yield frame
+        else:
+            yield frame
+
+
+async def main(room_url: str, token):
+    async with aiohttp.ClientSession() as session:
+        transport = DailyTransportService(
+            room_url,
+            token,
+            "Respond bot",
+            duration_minutes=5,
+            start_transcription=True,
+            mic_enabled=True,
+            mic_sample_rate=16000,
+            camera_enabled=False,
+            vad_enabled=True,
+            receive_video=True,
+            receive_video_fps=0
+        )
+
+        tts = ElevenLabsTTSService(
+            aiohttp_session=session,
+            api_key=os.getenv("ELEVENLABS_API_KEY"),
+            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
+        )
+
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_CHATGPT_API_KEY"),
+            model="gpt-4-turbo-preview")
+
+        vs = OpenAIVisionService(api_key=os.getenv("OPENAI_CHATGPT_API_KEY"))
+        vifp = VideoImageFrameProcessor()
+        ir = ImageRefresher()
+        pipeline = Pipeline(
+            processors=[
+                vifp,
+                vs,
+                llm,
+                tts,
+                ir,
+            ],
+        )
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            await pipeline.queue_frames([RequestVideoImageFrame(participantId=None)])
+
+        transport.transcription_settings["extra"]["endpointing"] = True
+        transport.transcription_settings["extra"]["punctuate"] = True
+        await transport.run(pipeline)
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
--- a/src/examples/foundational/13-whisper-transcription.py
+++ b/src/examples/foundational/13-whisper-transcription.py
@@ -0,0 +1,43 @@
+import asyncio
+import logging
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.whisper_ai_services import WhisperSTTService
+from examples.support.runner import configure
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url: str):
+    transport = DailyTransportService(
+        room_url,
+        None,
+        "Transcription bot",
+        start_transcription=True,
+        mic_enabled=False,
+        camera_enabled=False,
+        speaker_enabled=True,
+    )
+
+    stt = WhisperSTTService()
+    transcription_output_queue = asyncio.Queue()
+
+    async def handle_transcription():
+        print("`````````TRANSCRIPTION`````````")
+        while True:
+            item = await transcription_output_queue.get()
+            print(item.text)
+
+    async def handle_speaker():
+        await stt.run_to_queue(
+            transcription_output_queue, transport.get_receive_frames()
+        )
+
+    await asyncio.gather(transport.run(), handle_speaker(), handle_transcription())
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url))
--- a/src/examples/foundational/13a-whisper-local.py
+++ b/src/examples/foundational/13a-whisper-local.py
@@ -0,0 +1,67 @@
+import argparse
+import asyncio
+import logging
+import wave
+from dailyai.pipeline.frames import EndFrame, TranscriptionQueueFrame
+
+from dailyai.services.local_transport_service import LocalTransportService
+from dailyai.services.whisper_ai_services import WhisperSTTService
+
+logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
+logger = logging.getLogger("dailyai")
+logger.setLevel(logging.DEBUG)
+
+
+async def main(room_url: str):
+    global transport
+    global stt
+
+    meeting_duration_minutes = 1
+    transport = LocalTransportService(
+        mic_enabled=True,
+        camera_enabled=False,
+        speaker_enabled=True,
+        duration_minutes=meeting_duration_minutes,
+        start_transcription=True,
+    )
+    stt = WhisperSTTService()
+    transcription_output_queue = asyncio.Queue()
+    transport_done = asyncio.Event()
+
+    async def handle_transcription():
+        print("`````````TRANSCRIPTION`````````")
+        while not transport_done.is_set():
+            item = await transcription_output_queue.get()
+            print("got item from queue", item)
+            if isinstance(item, TranscriptionQueueFrame):
+                print(item.text)
+            elif isinstance(item, EndFrame):
+                break
+        print("handle_transcription done")
+
+    async def handle_speaker():
+        await stt.run_to_queue(
+            transcription_output_queue, transport.get_receive_frames()
+        )
+        await transcription_output_queue.put(EndFrame())
+        print("handle speaker done.")
+
+    async def run_until_done():
+        await transport.run()
+        transport_done.set()
+        print("run_until_done done")
+
+    await asyncio.gather(run_until_done(), handle_speaker(), handle_transcription())
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
+    parser.add_argument(
+        "-u",
+        "--url",
+        type=str,
+        required=True,
+        help="URL of the Daily room to join")
+
+    args, unknown = parser.parse_known_args()
+    asyncio.run(main(args.url))
--- a/src/examples/foundational/assets/ding1.wav
+++ b/src/examples/foundational/assets/ding1.wav
--- a/src/examples/foundational/assets/ding2.wav
+++ b/src/examples/foundational/assets/ding2.wav
--- a/src/samples/deprecated/static-sprite/sprites/intro.png
+++ b/src/samples/deprecated/static-sprite/sprites/intro.png
--- a/src/samples/deprecated/static-sprite/sprites/wait.png
+++ b/src/samples/deprecated/static-sprite/sprites/wait.png
--- a/src/examples/foundational/assets/sc-listen-2.png
+++ b/src/examples/foundational/assets/sc-listen-2.png
--- a/src/samples/deprecated/static-sprite/sprites/talk-1.png
+++ b/src/samples/deprecated/static-sprite/sprites/talk-1.png
--- a/src/samples/deprecated/static-sprite/sprites/talk-2.png
+++ b/src/samples/deprecated/static-sprite/sprites/talk-2.png
--- a/src/examples/foundational/assets/sc-think-2.png
+++ b/src/examples/foundational/assets/sc-think-2.png
--- a/src/examples/foundational/assets/sc-think-3.png
+++ b/src/examples/foundational/assets/sc-think-3.png
--- a/src/examples/foundational/assets/sc-think-4.png
+++ b/src/examples/foundational/assets/sc-think-4.png
--- a/src/examples/foundational/assets/speaking.png
+++ b/src/examples/foundational/assets/speaking.png
--- a/src/examples/foundational/assets/waiting.png
+++ b/src/examples/foundational/assets/waiting.png
--- a/src/examples/image-gen.py
+++ b/src/examples/image-gen.py
@@ -7,11 +7,12 @@ import random

 from dailyai.services.daily_transport_service import DailyTransportService
 from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
-from dailyai.queue_frame import QueueFrame, FrameType
+from dailyai.pipeline.frames import Frame, FrameType
 from dailyai.services.fal_ai_services import FalImageGenService
 from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService

-async def main(room_url:str, token):
+
+async def main(room_url: str, token):
    global transport
    global llm
    global tts
@@ -22,44 +23,46 @@ async def main(room_url:str, token):
        "Imagebot",
        1,
    )
-    transport.mic_enabled = True
-    transport.camera_enabled = True
-    transport.mic_sample_rate = 16000
-    transport.camera_width = 1024
-    transport.camera_height = 1024
+    transport._mic_enabled = True
+    transport._camera_enabled = True
+    transport._mic_sample_rate = 16000
+    transport._camera_width = 1024
+    transport._camera_height = 1024

    llm = AzureLLMService()
    tts = AzureTTSService()
    img = FalImageGenService()

-
    async def handle_transcriptions():
        print("handle_transcriptions got called")

        sentence = ""
        async for message in transport.get_transcriptions():
            print(f"transcription message: {message}")
-            if message["session_id"] == transport.my_participant_id:
+            if message["session_id"] == transport._my_participant_id:
                continue
-            finder =  message["text"].find("start over")
+            finder = message["text"].find("start over")
            print(f"finder: {finder}")
            if finder >= 0:
                async for audio in tts.run_tts(f"Resetting."):
-                    transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
+                    transport.output_queue.put(
+                        Frame(FrameType.AUDIO_FRAME, audio))
                sentence = ""
                continue
-            # todo: we could differentiate between transcriptions from different participants
+            # todo: we could differentiate between transcriptions from
+            # different participants
            sentence += f" {message['text']}"
            print(f"sentence is now: {sentence}")
            # TODO: Cache this audio
-            phrase = random.choice(["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
+            phrase = random.choice(
+                ["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
            async for audio in tts.run_tts(phrase):
-                transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
+                transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
            img_result = img.run_image_gen(sentence, "1024x1024")
            awaited_img = await asyncio.gather(img_result)
            transport.output_queue.put(
                [
-                    QueueFrame(FrameType.IMAGE_FRAME, awaited_img[0][1]),
+                    Frame(FrameType.IMAGE_FRAME, awaited_img[0][1]),
                ]
            )

@@ -69,9 +72,10 @@ async def main(room_url:str, token):
        if participant["info"]["isLocal"]:
            return
        async for audio in tts.run_tts("Describe an image, and I'll create it."):
-            audio_generator = tts.run_tts(f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
+            audio_generator = tts.run_tts(
+                f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
            async for audio in audio_generator:
-                transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
+                transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))

    transport.transcription_settings["extra"]["punctuate"] = False
    transport.transcription_settings["extra"]["endpointing"] = False
@@ -81,8 +85,11 @@ async def main(room_url:str, token):
 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
    parser.add_argument(
-        "-u", "--url", type=str, required=True, help="URL of the Daily room to join"
-    )
+        "-u",
+        "--url",
+        type=str,
+        required=True,
+        help="URL of the Daily room to join")
    parser.add_argument(
        "-k",
        "--apikey",
@@ -93,20 +100,25 @@ if __name__ == "__main__":

    args, unknown = parser.parse_known_args()

-    # Create a meeting token for the given room with an expiration 1 hour in the future.
+    # Create a meeting token for the given room with an expiration 1 hour in
+    # the future.
    room_name: str = urllib.parse.urlparse(args.url).path[1:]
    expiration: float = time.time() + 60 * 60

    res: requests.Response = requests.post(
        f"https://api.daily.co/v1/meeting-tokens",
-        headers={"Authorization": f"Bearer {args.apikey}"},
+        headers={
+            "Authorization": f"Bearer {args.apikey}"},
        json={
-            "properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
-        },
+            "properties": {
+                "room_name": room_name,
+                "is_owner": True,
+                "exp": expiration}},
    )

    if res.status_code != 200:
-        raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
+        raise Exception(
+            f"Failed to create meeting token: {res.status_code} {res.text}")

    token: str = res.json()["token"]

--- a/src/examples/internal/11a-dial-out.py
+++ b/src/examples/internal/11a-dial-out.py
@@ -0,0 +1,134 @@
+import aiohttp
+import asyncio
+import os
+import wave
+
+from dailyai.services.daily_transport_service import DailyTransportService
+from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
+from dailyai.pipeline.aggregators import LLMContextAggregator
+from dailyai.services.ai_services import AIService, FrameLogger
+from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesQueueFrame
+from typing import AsyncGenerator
+
+from examples.support.runner import configure
+
+sounds = {}
+sound_files = [
+    'ding1.wav',
+    'ding2.wav'
+]
+
+script_dir = os.path.dirname(__file__)
+
+for file in sound_files:
+    # Build the full path to the sound file
+    full_path = os.path.join(script_dir, "assets", file)
+    # Key the dictionary by the full filename, extension included, because the
+    # lookups below use names such as "ding1.wav"
+    # Open the WAV file and read its audio frames as raw bytes
+    with wave.open(full_path) as audio_file:
+        sounds[file] = audio_file.readframes(-1)
+
+
+class OutboundSoundEffectWrapper(AIService):
+    def __init__(self):
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, LLMResponseEndFrame):
+            yield AudioFrame(sounds["ding1.wav"])
+            # In case anything else up the stack needs it
+            yield frame
+        else:
+            yield frame
+
+
+class InboundSoundEffectWrapper(AIService):
+    def __init__(self):
+        pass
+
+    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
+        if isinstance(frame, LLMMessagesQueueFrame):
+            yield AudioFrame(sounds["ding2.wav"])
+            # In case anything else up the stack needs it
+            yield frame
+        else:
+            yield frame
+
+
+async def main(room_url: str, token, phone=None):
+    async with aiohttp.ClientSession() as session:
+
+        global transport
+        global llm
+        global tts
+
+        transport = DailyTransportService(
+            room_url,
+            token,
+            "Respond bot",
+            300,
+        )
+        transport._mic_enabled = True
+        transport._mic_sample_rate = 16000
+        transport._camera_enabled = False
+
+        llm = AzureLLMService()
+        tts = AzureTTSService()
+
+        @transport.event_handler("on_first_other_participant_joined")
+        async def on_first_other_participant_joined(transport):
+            await tts.say("Hi, I'm listening!", transport.send_queue)
+            await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
+
+        async def handle_transcriptions():
+            messages = [
+                {"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
+            ]
+
+            tma_in = LLMContextAggregator(
+                messages, "user", transport._my_participant_id
+            )
+            tma_out = LLMContextAggregator(
+                messages, "assistant", transport._my_participant_id
+            )
+            out_sound = OutboundSoundEffectWrapper()
+            in_sound = InboundSoundEffectWrapper()
+            fl = FrameLogger("LLM Out")
+            fl2 = FrameLogger("Transcription In")
+            await out_sound.run_to_queue(
+                transport.send_queue,
+                tts.run(
+                    tma_out.run(
+                        llm.run(
+                            fl2.run(
+                                in_sound.run(
+                                    tma_in.run(
+                                        transport.get_receive_frames()
+                                    )
+                                )
+                            )
+                        )
+                    )
+                )
+            )
+
+        @transport.event_handler("on_participant_joined")
+        async def pax_joined(transport, pax):
+            print(f"PARTICIPANT JOINED: {pax}")
+
+        @transport.event_handler("on_call_state_updated")
+        async def on_call_state_updated(transport, state):
+            if (state == "joined"):
+                if (phone):
+                    transport.start_recording()
+                    transport.dialout(phone)
+
+        transport.transcription_settings["extra"]["punctuate"] = True
+
+        await asyncio.gather(transport.run(), handle_transcriptions())
+
+
+if __name__ == "__main__":
+    (url, token) = configure()
+    asyncio.run(main(url, token))
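Review note on `11a-dial-out.py`: both sound-effect wrappers follow the same pattern, an async generator that emits an extra `AudioFrame` before passing a trigger frame through. A minimal standalone sketch of that pattern is below. The `Frame` classes here are stand-ins written for illustration, not the `dailyai` classes, and `SoundEffectInjector` and `collect` are hypothetical names that do not appear in this diff.

```python
import asyncio
from typing import AsyncGenerator, List


class Frame:
    """Stand-in for a pipeline frame (not the dailyai class)."""


class AudioFrame(Frame):
    def __init__(self, data: bytes):
        self.data = data


class LLMResponseEndFrame(Frame):
    """Stand-in for the frame that marks the end of an LLM response."""


class SoundEffectInjector:
    """Emit a sound before each frame of the trigger type; pass all frames through."""

    def __init__(self, trigger: type, sound: bytes):
        self._trigger = trigger
        self._sound = sound

    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
        if isinstance(frame, self._trigger):
            yield AudioFrame(self._sound)
        # Always forward the original frame so later stages still see it.
        yield frame


async def collect(processor: SoundEffectInjector, frames: List[Frame]) -> List[Frame]:
    """Run every input frame through the processor and gather the output."""
    out: List[Frame] = []
    for frame in frames:
        async for produced in processor.process_frame(frame):
            out.append(produced)
    return out


def demo() -> List[str]:
    injector = SoundEffectInjector(LLMResponseEndFrame, b"\x00\x01")
    result = asyncio.run(collect(injector, [Frame(), LLMResponseEndFrame()]))
    return [type(f).__name__ for f in result]
```

Parameterizing the trigger type and the sound would let one class cover both the inbound and the outbound case, assuming the two wrappers are meant to differ only in those two values.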
--- a/src/examples/server/Dockerfile
+++ b/src/examples/server/Dockerfile
@@ -0,0 +1,39 @@
+# setup
+FROM python:3.11.5
+
+WORKDIR /app
+COPY requirements.txt /app
+COPY *.py /app
+COPY pyproject.toml /app
+
+COPY src/ /app/src/
+
+WORKDIR /app
+RUN ls --recursive /app/
+RUN pip3 install --upgrade -r requirements.txt
+RUN python -m build .
+RUN pip3 install .
+
+# If running on Ubuntu, Azure TTS requires some extra config
+# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
+
+RUN wget -O - https://www.openssl.org/source/openssl-1.1.1w.tar.gz | tar zxf -
+WORKDIR openssl-1.1.1w
+RUN ./config --prefix=/usr/local
+RUN make -j $(nproc)
+RUN make install_sw install_ssldirs
+RUN ldconfig -v
+ENV SSL_CERT_DIR=/etc/ssl/certs
+
+#ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
+RUN apt clean
+RUN apt-get update
+RUN apt-get -y install build-essential libssl-dev ca-certificates libasound2 wget
+
+ENV PYTHONUNBUFFERED=1
+
+WORKDIR /app
+
+EXPOSE 8000
+# run
+CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]
--- a/src/examples/server/README.md
+++ b/src/examples/server/README.md
@@ -0,0 +1,13 @@
+# Server Example
+
+This is an example server based on [Santa Cat](https://santacat.ai). You can run the server with this command:
+
+```
+flask --app daily-bot-manager.py --debug run
+```
+
+Once the server is started, you can load `http://127.0.0.1:5000/spin-up-kitty` in a browser, and the server will do the following:
+
+- Create a new, randomly-named Daily room with `DAILY_API_KEY` from your .env file or environment
+- Start the `10-wake-word.py` example and connect it to that room
+- 301 redirect your browser to the room
--- a/src/examples/server/auth.py
+++ b/src/examples/server/auth.py
@@ -0,0 +1,34 @@
+import time
+import urllib.parse
+
+from dotenv import load_dotenv
+import requests
+from flask import jsonify
+import os
+
+load_dotenv()
+
+
+def get_meeting_token(room_name, daily_api_key, token_expiry):
+    api_path = os.getenv('DAILY_API_PATH') or 'https://api.daily.co/v1'
+
+    if not token_expiry:
+        token_expiry = time.time() + 600
+    res = requests.post(
+        f'{api_path}/meeting-tokens',
+        headers={
+            'Authorization': f'Bearer {daily_api_key}'},
+        json={
+            'properties': {
+                'room_name': room_name,
+                'is_owner': True,
+                'exp': token_expiry}})
+    if res.status_code != 200:
+        raise RuntimeError(
+            f'Unable to create meeting token: {res.status_code} {res.text}')
+    meeting_token = res.json()['token']
+    return meeting_token
+
+
+def get_room_name(room_url):
+    return urllib.parse.urlparse(room_url).path[1:]
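Review note on `auth.py`: the token request has two parts that can be checked without calling the Daily API, namely the room name taken from the URL path and the JSON body with its default ten-minute expiry. A rough sketch follows. `room_name_from_url` and `meeting_token_payload` are hypothetical helper names, and the `now` parameter is an assumption added only to make the expiry testable.

```python
import time
import urllib.parse
from typing import Optional


def room_name_from_url(room_url: str) -> str:
    """Return the URL path without its leading slash, as get_room_name does."""
    return urllib.parse.urlparse(room_url).path[1:]


def meeting_token_payload(
    room_name: str,
    token_expiry: Optional[float] = None,
    now: Optional[float] = None,
) -> dict:
    """Build the JSON body posted to the meeting-tokens endpoint in the diff."""
    if not token_expiry:
        # Same default as the diff: expire 600 seconds from now.
        current = time.time() if now is None else now
        token_expiry = current + 600
    return {
        "properties": {
            "room_name": room_name,
            "is_owner": True,
            "exp": token_expiry,
        }
    }
```

For a URL such as `https://example.daily.co/my-room`, the path is `/my-room`, so the room name is `my-room`.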
--- a/src/examples/server/daily-bot-manager.py
+++ b/src/examples/server/daily-bot-manager.py
@@ -0,0 +1,103 @@
+import os
+import requests
+import subprocess
+import time
+
+from flask import Flask, jsonify, request, redirect
+from flask_cors import CORS
+from examples.server.auth import get_meeting_token
+
+from dotenv import load_dotenv
+
+load_dotenv()
+
+app = Flask(__name__)
+CORS(app)
+
+print(
+    f"I loaded an environment, and my FAL_KEY_ID is {os.getenv('FAL_KEY_ID')}")
+
+
+def start_bot(bot_path, args=None):
+    daily_api_key = os.getenv("DAILY_API_KEY")
+    api_path = os.getenv("DAILY_API_PATH") or "https://api.daily.co/v1"
+
+    timeout = int(os.getenv("DAILY_ROOM_TIMEOUT")
+                  or os.getenv("DAILY_BOT_MAX_DURATION") or 300)
+    exp = time.time() + timeout
+    res = requests.post(
+        f"{api_path}/rooms",
+        headers={"Authorization": f"Bearer {daily_api_key}"},
+        json={
+            "properties": {
+                "exp": exp,
+                "enable_chat": True,
+                "enable_emoji_reactions": True,
+                "eject_at_room_exp": True,
+                "enable_prejoin_ui": False,
+                "enable_recording": "cloud"
+            }
+        },
+    )
+    if res.status_code != 200:
+        return (
+            jsonify(
+                {
+                    "error": "Unable to create room",
+                    "status_code": res.status_code,
+                    "text": res.text,
+                }
+            ),
+            500,
+        )
+    room_url = res.json()["url"]
+    room_name = res.json()["name"]
+
+    meeting_token = get_meeting_token(room_name, daily_api_key, exp)
+
+    if args:
+        extra_args = " ".join([f'-{x[0]} "{x[1]}"' for x in args])
+    else:
+        extra_args = ""
+
+    proc = subprocess.Popen(
+        [f"python {bot_path} -u {room_url} -t {meeting_token} -k {daily_api_key} {extra_args}"],
+        shell=True,
+        bufsize=1,
+    )
+
+    # Don't return until the bot has joined the room, but wait for at most 2
+    # seconds.
+    attempts = 0
+    while attempts < 20:
+        time.sleep(0.1)
+        attempts += 1
+        res = requests.get(
+            f"{api_path}/rooms/{room_name}/get-session-data",
+            headers={"Authorization": f"Bearer {daily_api_key}"},
+        )
+        if res.status_code == 200:
+            break
+    print(f"Took {attempts} attempts to join room {room_name}")
+
+    # Additional client config
+    config = {}
+    if os.getenv("DAILY_CLIENT_VAD_TIMEOUT_SEC"):
+        config['vad_timeout_sec'] = float(
+            os.getenv("DAILY_CLIENT_VAD_TIMEOUT_SEC"))
+    else:
+        config['vad_timeout_sec'] = 1.5
+
+    # return jsonify({"room_url": room_url, "token": meeting_token, "config":
+    # config}), 200
+    return redirect(room_url, code=301)
+
+
+@app.route("/spin-up-kitty", methods=["GET", "POST"])
+def spin_up_kitty():
+    return start_bot("./src/examples/foundational/10-wake-word.py")
+
+
+@app.route("/healthz")
+def health_check():
+    return "ok", 200
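Review note on `daily-bot-manager.py`: `start_bot` interpolates the room URL, token, API key, and extra arguments into one shell string. An alternative, which this diff does not use, is to pass `subprocess.Popen` an argument list without `shell=True`, so values containing spaces or quotes need no manual quoting. A sketch under that assumption, with `build_bot_argv` as a hypothetical helper name:

```python
from typing import Iterable, List, Optional, Tuple


def build_bot_argv(
    bot_path: str,
    room_url: str,
    meeting_token: str,
    daily_api_key: str,
    args: Optional[Iterable[Tuple[str, str]]] = None,
    python: str = "python",
) -> List[str]:
    """Build the bot command as a list suitable for subprocess.Popen(argv)."""
    argv = [python, bot_path, "-u", room_url, "-t", meeting_token, "-k", daily_api_key]
    # Each extra argument is a (flag, value) pair, mirroring the x[0]/x[1]
    # pairs that start_bot formats in the diff.
    for flag, value in args or []:
        argv += [f"-{flag}", str(value)]
    return argv
```

The resulting list could be passed as `subprocess.Popen(build_bot_argv(...))`; whether that fits the deployment environment is for the author to decide.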
--- a/src/examples/starter-apps/assets/clack-short-quiet.wav
+++ b/src/examples/starter-apps/assets/clack-short-quiet.wav
--- a/src/examples/starter-apps/assets/clack-short.wav
+++ b/src/examples/starter-apps/assets/clack-short.wav
--- a/src/examples/starter-apps/assets/clack.wav
+++ b/src/examples/starter-apps/assets/clack.wav
--- a/src/examples/starter-apps/assets/ding.wav
+++ b/src/examples/starter-apps/assets/ding.wav
--- a/src/examples/starter-apps/assets/ding2.wav
+++ b/src/examples/starter-apps/assets/ding2.wav
--- a/src/examples/starter-apps/assets/ding3.wav
+++ b/src/examples/starter-apps/assets/ding3.wav
--- a/src/examples/starter-apps/assets/grandma-listening.png
+++ b/src/examples/starter-apps/assets/grandma-listening.png
--- a/src/examples/starter-apps/assets/grandma-writing.png
+++ b/src/examples/starter-apps/assets/grandma-writing.png
--- a/src/examples/starter-apps/assets/listening.wav
+++ b/src/examples/starter-apps/assets/listening.wav
--- a/src/examples/starter-apps/assets/robot01.png
+++ b/src/examples/starter-apps/assets/robot01.png
Author	SHA1	Message	Date
Chad Bailey	c73fb4750f	added fuzz example	2024-03-22 14:20:16 +00:00
Chad Bailey	34b10cb4c7	wip	2024-03-19 22:04:47 +00:00
Chad Bailey	e726f15c4e	wip: telestrator	2024-03-19 15:31:19 +00:00
Chad Bailey	25ca8b751e	cleanup	2024-03-19 03:08:04 +00:00
Chad Bailey	0b4b63d2ee	Working vision example	2024-03-19 01:51:36 +00:00
Chad Bailey	6c9425d66a	wip: video image frames	2024-03-18 22:14:02 +00:00
Chad Bailey	6d3c52ae81	added app message	2024-03-18 19:52:31 +00:00
Aleix Conchillo Flaqué	2f4e31d1b2	Merge pull request #69 from daily-co/add-github-linting-workflow github: add linting workflow	2024-03-19 02:46:50 +08:00
Aleix Conchillo Flaqué	9385270775	autopep8 formatting	2024-03-18 11:28:32 -07:00
Aleix Conchillo Flaqué	2914e43350	github: add linting workflow	2024-03-18 11:28:06 -07:00
chadbailey59	78638d2dba	Live translation (#61 ) * added translator * fixup	2024-03-18 13:26:05 -05:00
Aleix Conchillo Flaqué	141a5bb548	Merge pull request #68 from daily-co/log-transcription-errors daily: log transcription errors	2024-03-19 01:53:40 +08:00
Aleix Conchillo Flaqué	3957813202	Merge pull request #67 from daily-co/add-dot-env-template add dot-env.template	2024-03-19 01:49:21 +08:00
Aleix Conchillo Flaqué	549862ef99	daily: log transcription errors	2024-03-18 10:47:20 -07:00
Aleix Conchillo Flaqué	1000ca5b55	add dot-env.template	2024-03-18 10:43:57 -07:00
Moishe Lettvin	91dbfef4c3	Merge pull request #64 from daily-co/docs Some docs	2024-03-18 13:38:32 -04:00
Moishe Lettvin	3b61d0b41a	fix typos	2024-03-18 13:38:00 -04:00
Moishe Lettvin	bf3ae091b9	Merge pull request #62 from daily-co/anthropic-support Anthropic LLM service	2024-03-18 13:36:39 -04:00
Aleix Conchillo Flaqué	34ac796607	Merge pull request #66 from daily-co/daily-transport-release-client services: release daily client after leave	2024-03-19 01:36:22 +08:00
Aleix Conchillo Flaqué	e0551e9d85	services: release daily client after leave	2024-03-18 10:32:46 -07:00
Moishe Lettvin	b1ab6f91b9	Merge pull request #65 from daily-co/app-messages Support for app messages	2024-03-18 11:37:10 -04:00
Moishe Lettvin	58726dc20d	clean up imports	2024-03-18 10:14:51 -04:00
Moishe Lettvin	8e61fe8e36	Support for app messages	2024-03-18 10:08:41 -04:00
Moishe Lettvin	99b836c227	added docstrings to frames.	2024-03-18 09:08:12 -04:00
Moishe Lettvin	1c27f77f1a	drafty architecture doc	2024-03-18 08:39:50 -04:00
Moishe Lettvin	c91fa39a99	Remove testing code	2024-03-15 19:42:46 -04:00
Moishe Lettvin	eacaea7db4	Anthropic LLM service	2024-03-15 19:40:37 -04:00
Moishe Lettvin	c6dfcb6f7a	Merge pull request #60 from daily-co/remove-ai-service-methods Remove run_to_queue and run from AIService class	2024-03-15 15:28:28 -04:00
Moishe Lettvin	18bf26de14	Update apps	2024-03-15 13:39:33 -04:00
Moishe Lettvin	b8b35db89c	Remove run_to_queue and run from AIService class	2024-03-15 11:04:22 -04:00
Moishe Lettvin	358166f347	Merge pull request #59 from daily-co/remove-requirements Remove unused requirements file	2024-03-13 16:23:42 -04:00
Moishe Lettvin	c006c123b2	Remove unused requirements file	2024-03-13 16:19:03 -04:00
chadbailey59	cf302fb765	Storybot and Chatbot examples (#58 ) * storybot * storybot * added pipeline.queue_frames * fixup	2024-03-13 15:12:59 -05:00
Moishe Lettvin	e33820fe36	Merge pull request #56 from daily-co/fal-redux Use other model in FAL	2024-03-12 15:14:57 -04:00
Moishe Lettvin	b84b3d59f3	Use other model in FAL	2024-03-12 14:47:00 -04:00
Moishe Lettvin	7b5b88b99b	Merge pull request #55 from daily-co/fix-fal set FAL param correctly	2024-03-12 14:12:16 -04:00
Moishe Lettvin	e87196cce7	set FAL param correctly	2024-03-12 14:03:43 -04:00
chadbailey59	bbfc9e703b	intake cleanup (#54 )	2024-03-12 13:01:39 -05:00
Moishe Lettvin	c21a63d48b	Merge pull request #49 from daily-co/openai-base-llm Base OpenAI LLM service	2024-03-12 12:58:31 -04:00
Moishe Lettvin	f546bb32da	Make 08- work again	2024-03-12 10:34:52 -04:00
Moishe Lettvin	d9378e23ba	Base OpenAI LLM service	2024-03-11 16:52:41 -04:00
Moishe Lettvin	c75a3fb0d0	Merge pull request #53 from daily-co/fix_other_joined_event Don't do time-consuming processing in `on_other_joined_event`	2024-03-11 13:27:13 -04:00
Moishe Lettvin	f8ae264957	remove unnecessary print	2024-03-11 13:20:28 -04:00
Moishe Lettvin	977c12d530	undo fal change	2024-03-11 13:19:47 -04:00
Moishe Lettvin	61c55d2f47	Fix up other examples	2024-03-11 13:17:31 -04:00
Moishe Lettvin	fd2fa23e9c	Fix example 2	2024-03-11 13:00:29 -04:00
Moishe Lettvin	de026ccc8a	Merge pull request #50 from daily-co/khk/launch-samples Khk/launch samples	2024-03-11 12:50:38 -04:00
Moishe Lettvin	c5bb0e14ab	Merge pull request #51 from daily-co/khk/readme updated README	2024-03-11 12:50:22 -04:00
chadbailey59	a4f3c51184	the smallest commit in history	2024-03-11 09:47:00 -05:00
Moishe Lettvin	7786e685cc	Merge pull request #52 from daily-co/pypi-updates updates to pyproject.toml	2024-03-11 10:34:35 -04:00
Moishe Lettvin	33793ca9f8	update description	2024-03-11 07:31:39 -04:00
Moishe Lettvin	d26aede667	updates to pyproject.toml	2024-03-11 07:25:20 -04:00
Moishe Lettvin	ad993056d8	rename to dailyai	2024-03-11 07:16:20 -04:00
Kwindla Hultman Kramer	5b1f26aacb	updated README	2024-03-10 22:06:23 -07:00
Kwindla Hultman Kramer	4e16e514dd	attempting to change tts to deepgram in example 04	2024-03-10 19:43:06 -07:00
Kwindla Hultman Kramer	959ffa9d36	small streamlining of example 03	2024-03-10 19:42:19 -07:00
Kwindla Hultman Kramer	4396b1018a	small streamlining of example 02	2024-03-10 19:41:32 -07:00
Kwindla Hultman Kramer	37e904ce68	changed fal to a maybe slightly faster model	2024-03-10 19:40:51 -07:00
Kwindla Hultman Kramer	ef39d842a5	custom processor in example 05	2024-03-10 19:18:37 -07:00
Kwindla Hultman Kramer	72f631a066	working on foundational examples	2024-03-10 17:21:46 -07:00
chadbailey59	5d46302b9e	changed default services (#47 )	2024-03-08 15:36:30 -06:00
chadbailey59	8241dc0bed	cleaned up example logging (#46 )	2024-03-08 15:25:17 -06:00
Moishe Lettvin	95a1efbe75	Merge pull request #45 from daily-co/exception_handling_callbacks Wait for the callback's result, so exceptions get raised	2024-03-08 15:04:15 -05:00
Moishe Lettvin	e59df8476e	Wait for the callback's result, so exceptions get raised	2024-03-08 15:02:15 -05:00
chadbailey59	824df8ca7c	moved patient intake and example runner (#44 )	2024-03-08 12:07:51 -06:00
chadbailey59	0db8a51b27	cleaned up function calling frames (#43 )	2024-03-08 10:13:28 -06:00
chadbailey59	ce9c6ede66	function allowlist (#42 )	2024-03-08 08:49:09 -06:00
Moishe Lettvin	192b46bbab	Merge pull request #41 from daily-co/optimize-pipeline Optimize pipeline processing	2024-03-07 21:01:03 -05:00
Moishe Lettvin	196279e342	Add endframe to sample 4	2024-03-07 19:24:27 -05:00
Moishe Lettvin	edd93bc4cb	remove errant print statement	2024-03-07 19:05:03 -05:00
Moishe Lettvin	d0076dd4ee	Optimize pipeline processing so we don't wait for the completion of one generator to move onto the next.	2024-03-07 18:59:47 -05:00
chadbailey59	3c5f4800d4	Chad's big patient intake PR (#40 ) * at least it runs, kind of * wip * wip with user response aggregator * frame and pipeline docstrings * Getting started on docstrings * finish docstrings for aggregators * patient intake is working! * cleanup * cleanup --------- Co-authored-by: Moishe Lettvin <moishel@gmail.com>	2024-03-07 17:41:32 -06:00
Moishe Lettvin	2bcb4966d3	Merge pull request #39 from daily-co/docstrings Docstrings	2024-03-07 15:39:50 -05:00
Moishe Lettvin	b14f08a7d5	finish docstrings for aggregators	2024-03-07 15:16:23 -05:00
Moishe Lettvin	8fb92e3fd7	Getting started on docstrings	2024-03-07 12:51:19 -05:00
Moishe Lettvin	337ca7f581	frame and pipeline docstrings	2024-03-07 10:16:27 -05:00
Moishe Lettvin	eb430621f1	Merge pull request #37 from daily-co/fix-interruptible Fix interruptible pipeline runner and aggregator.	2024-03-07 09:09:41 -05:00
Moishe Lettvin	d5683c4f24	Fix interruptible pipeline runner and aggregator.	2024-03-07 09:05:49 -05:00
chadbailey59	b4505b7eff	added audio chunking for better interruption support (#35 )	2024-03-06 18:20:04 -06:00
Moishe Lettvin	3e46d28aff	Add start frame to interrupt loop	2024-03-06 15:58:19 -05:00
Moishe Lettvin	d3e76c4fd6	Merge pull request #34 from daily-co/rename-frames Remove Queue in frame names	2024-03-06 14:10:56 -05:00
Moishe Lettvin	62fd371b97	Remove Queue in frame names	2024-03-06 14:09:06 -05:00
Moishe Lettvin	b9556716dd	Merge pull request #33 from daily-co/pipeline-instead-of-nest Pipeline instead of nest	2024-03-05 11:04:20 -05:00
Moishe Lettvin	2708dcf7b5	Remove conversation wrapper	2024-03-04 14:07:49 -05:00
Moishe Lettvin	d3f86dab2e	starting on interruptions	2024-03-04 13:41:28 -05:00
Moishe Lettvin	18e7626b9f	Getting started on interruptible transport pipeline runner	2024-03-04 07:51:22 -05:00
Moishe Lettvin	763a50f8ec	First cut at sample 6 rewrite with pipelines	2024-03-04 07:28:10 -05:00
Moishe Lettvin	3b282cc921	some comments	2024-03-03 20:17:48 -05:00
Moishe Lettvin	434772dc23	Update sample 5!	2024-03-03 19:50:13 -05:00
Moishe Lettvin	15df4a9d58	cleanup, make sample 4 work with new stuff	2024-03-03 19:37:30 -05:00
Moishe Lettvin	643be238f9	getting started	2024-03-03 16:31:31 -05:00
chadbailey59	d90fdb1cae	Isolated changes to add VAD (#32 ) * added VAD * added separate 'vad enabled' property	2024-02-28 15:16:44 -06:00
Moishe Lettvin	f710aeae95	Merge pull request #30 from daily-co/unsub-video cleanup client properties and unsubscribe from camera	2024-02-27 13:16:20 -05:00
Moishe Lettvin	20091d91c9	cleanup client properties and unsubscribe from camera	2024-02-27 13:09:55 -05:00
Moishe Lettvin	92ec5641d4	update deepgram tts to new service structure	2024-02-14 13:44:59 -05:00
Moishe Lettvin	53e97bd872	Merge pull request #28 from daily-co/update-playht-service Update playht service	2024-02-14 12:54:34 -05:00
Moishe Lettvin	dcbd79333a	make destructor call client.close in PlayHT service	2024-02-14 12:53:20 -05:00
Moishe Lettvin	97a4cb8b7f	Update playht tts service	2024-02-14 12:40:13 -05:00
Moishe Lettvin	cc7877f626	Merge pull request #26 from daily-co/fix-sigint fix sigint handling	2024-02-14 12:11:44 -05:00
Moishe Lettvin	1992b7e79e	fix sigint handling	2024-02-14 12:10:47 -05:00
Moishe Lettvin	2516670874	Merge pull request #25 from daily-co/keyboard-interrupt Call client.leave on keyboard interrupt	2024-02-13 14:18:42 -05:00
Moishe Lettvin	4fecc10808	Call client.leave on keyboard interrupt	2024-02-13 14:17:09 -05:00
Moishe Lettvin	08144fc560	Merge pull request #24 from daily-co/another-formatting-pass Another autopep8 formatting pass	2024-02-10 09:39:51 -05:00
Moishe Lettvin	815aa2bc3e	Another autopep8 formatting pass	2024-02-10 09:29:08 -05:00
Moishe Lettvin	560c98f2fa	Merge pull request #23 from daily-co/ollama-service Ollama LLM service	2024-02-10 09:27:17 -05:00
Moishe Lettvin	0e0c992f59	Ollama LLM service	2024-02-10 09:22:52 -05:00
Moishe Lettvin	d76139ac1a	Merge pull request #22 from daily-co/temp-readme-patch Make the README okay-enough for limited public release	2024-02-09 11:57:39 -05:00
Moishe Lettvin	444418d94c	Make the README okay-enough for limited public release	2024-02-09 10:26:39 -05:00
Moishe Lettvin	d27122e35e	Create LICENSE	2024-02-09 09:10:28 -06:00
Chad Bailey	0ae83577c6	renamed samples to examples	2024-02-08 16:34:48 +00:00
Chad Bailey	5c402eee81	started adding docs	2024-02-08 16:31:17 +00:00
Moishe Lettvin	80750fe022	Remove old/deprecated/broken samples	2024-02-08 09:56:22 -05:00
Moishe Lettvin	ccfba04ea2	Remove mistakenly-added file	2024-02-08 09:55:28 -05:00
Moishe Lettvin	5b8198cf9e	Merge pull request #21 from daily-co/cleanup_constructor_args Cleanup constructor args in examples	2024-02-08 09:44:51 -05:00
Moishe Lettvin	3fa00c4db8	Cleanup constructor args in examples	2024-02-08 09:41:51 -05:00
Moishe Lettvin	4ce36f8c63	Merge pull request #20 from daily-co/base_transport Add a "Local Transport" as a proof of concept	2024-02-08 08:25:03 -05:00
Moishe Lettvin	9620080cc5	A little example cleanup	2024-02-08 08:24:25 -05:00
Moishe Lettvin	ee1ce8f288	Abstract base transport class & local transport class	2024-02-08 08:15:28 -05:00
chadbailey59	70d07b6ea2	WIP: environment cleanup (#19 ) * removed env var usage from SDK services * started consolidating configure.py * 1–3 work * cleaned up the rest * more cleanup * cleanup and 05 tinkering * made fal keys optional	2024-02-06 15:07:16 -06:00
Moishe Lettvin	9d5ad5675c	Fix 06- demo and also fix bugs where dangling sentences wouldn't be spoken	2024-02-01 12:54:23 -05:00
chadbailey59	0d96f91cde	Added sound effect example (#18 ) * added sound effect example * added dialout to this branch too * fixup * fixup for more dialout testing * cleanup	2024-02-01 10:26:50 -06:00
Moishe Lettvin	4e9586595d	minor cleanup	2024-01-29 15:06:39 -05:00
Moishe Lettvin	d0bcddfd70	Fix 06a-image-sync.py	2024-01-29 14:29:32 -05:00
Chad Bailey	065a213ebb	example renaming	2024-01-29 17:42:45 +00:00
Chad Bailey	7d6c94d604	added 09 examples	2024-01-29 17:39:28 +00:00
Chad Bailey	0859b57b00	Added 09 examples	2024-01-29 17:39:14 +00:00
Moishe Lettvin	09838c9b1f	Merge pull request #17 from daily-co/start_tests Add some basic daily_transport tests	2024-01-29 07:57:33 -05:00
Moishe Lettvin	c39920132c	Add some basic daily_transport tests	2024-01-29 07:56:12 -05:00
Moishe Lettvin	860129a4be	Merge pull request #16 from daily-co/image_tweaks Minor Cleanup	2024-01-27 19:10:52 -05:00
Moishe Lettvin	4416f36ae9	some minor cleanup, and coalesce image/images into one thing, and use itertools.cycle	2024-01-27 19:07:29 -05:00
chadbailey59	86af896150	Wake word and animation sprites (#15 ) * WIP: golden kitty * added web server * added health check * added flask to module build * trying requirements.txt * added dotenv * flask_cors * gunicorn * requirements cleanup * Dockerfile * WOOF * basic wake word * removed otel * basic animation kind of works * i think animation defeated me * added santa cat assets * cleanup * cleanup * server example and cleanup * more cleanup * fix up some class variable names * minor cleanup, remove mistakenly-added print and logger stuff * cleanup * cleanup --------- Co-authored-by: Moishe Lettvin <moishel@gmail.com>	2024-01-26 15:37:39 -06:00
Moishe Lettvin	5cbac4701b	minor cleanup, remove mistakenly-added print and logger stuff	2024-01-26 15:27:12 -05:00
Moishe Lettvin	5d9aa530e2	fix up some class variable names	2024-01-26 15:15:44 -05:00
Moishe Lettvin	d4c4d49035	Merge pull request #14 from daily-co/aiosessions Don't create aiohttp sessions inside services	2024-01-26 14:01:24 -05:00
Moishe Lettvin	e81f247845	Don't create aiohttp sessions inside services	2024-01-26 12:30:37 -05:00
Liza	8baf137511	prefix suspected private members (#13 )	2024-01-26 18:28:54 +01:00
Moishe Lettvin	fcceb32bd7	Merge pull request #12 from daily-co/frame_sync Speaking / waiting images	2024-01-26 10:17:01 -05:00
Moishe Lettvin	ead655fe23	some more fixup	2024-01-26 10:07:16 -05:00
Moishe Lettvin	bab102f197	little more cleanup	2024-01-26 09:54:51 -05:00
Moishe Lettvin	95fc802607	Speaking / waiting images	2024-01-26 09:15:29 -05:00
Moishe Lettvin	2886997693	Merge pull request #11 from daily-co/autopep Autopep linter fixes	2024-01-25 12:17:26 -05:00
Moishe Lettvin	5fdda43bed	Autopep linter fixes	2024-01-25 12:12:46 -05:00
Moishe Lettvin	f0d9b0613e	Add faster_whisper to module dependencies; remove unneeded import	2024-01-25 11:27:00 -05:00
Moishe Lettvin	a661905d7f	Merge pull request #9 from daily-co/interruptions Interruptable conversation wrapper	2024-01-25 11:24:57 -05:00
Moishe Lettvin	c9c2e5f561	Remove unnecessary try/except	2024-01-25 11:18:55 -05:00
Moishe Lettvin	795a339542	Add InterruptibleConversationWrapper	2024-01-25 11:15:04 -05:00
Liza	31db156dfc	Local Whisper transcription (#10 ) * First pass at Whisper transcription * deletions * Revise based on feedback, add autopep8	2024-01-25 13:43:25 +01:00
Moishe Lettvin	690cf2e47d	Merge pull request #8 from daily-co/queueframe-refactor Refactor QueueFrame	2024-01-23 13:15:11 -05:00
Moishe Lettvin	ba89e41c5b	remove commented-out code	2024-01-23 09:37:15 -05:00
Moishe Lettvin	c134598a77	Refactor QueueFrame	2024-01-23 09:33:51 -05:00
Liza	b51abd2969	facilitate manual call management (#7 )	2024-01-23 14:33:27 +01:00
Moishe Lettvin	3fda9b0ecb	Use more flexibile aggregator	2024-01-22 16:02:35 -05:00
Moishe Lettvin	95c92e5304	Aggregators for LLM messages	2024-01-22 10:59:13 -05:00
Moishe Lettvin	b443fbdb60	Very rough draft at intro/overview in README	2024-01-19 16:20:08 -05:00
Moishe Lettvin	ccd2fa31e5	Rename 'theoretical-to-real' samples to 'foundational'	2024-01-19 13:57:52 -05:00
Moishe Lettvin	9b65286216	Merge pull request #6 from daily-co/rm-sentence-aggregator Cleanup: no more sentence aggregator	2024-01-19 13:42:27 -05:00
Moishe Lettvin	6ae733ebfe	Cleanup: no more sentence aggregator, let the TTS service deal with that; also removed the queue typing stuff from ai_services	2024-01-19 13:06:15 -05:00
Liza	1071dede1a	Only initialize Daily once (#5 )	2024-01-19 14:59:48 +01:00