Compare commits

...

88 Commits

Author SHA1 Message Date
Moishe Lettvin
b168c53e44 Adding some more doscstrings, cleanup 2024-03-08 09:39:51 -05:00
chadbailey59
3c5f4800d4 Chad's big patient intake PR (#40)
* at least it runs, kind of

* wip

* wip with user response aggregator

* frame and pipeline docstrings

* Getting started on docstrings

* finish docstrings for aggregators

* patient intake is working!

* cleanup

* cleanup

---------

Co-authored-by: Moishe Lettvin <moishel@gmail.com>
2024-03-07 17:41:32 -06:00
Moishe Lettvin
2bcb4966d3 Merge pull request #39 from daily-co/docstrings
Docstrings
2024-03-07 15:39:50 -05:00
Moishe Lettvin
b14f08a7d5 finish docstrings for aggregators 2024-03-07 15:16:23 -05:00
Moishe Lettvin
8fb92e3fd7 Getting started on docstrings 2024-03-07 12:51:19 -05:00
Moishe Lettvin
337ca7f581 frame and pipeline docstrings 2024-03-07 10:16:27 -05:00
Moishe Lettvin
eb430621f1 Merge pull request #37 from daily-co/fix-interruptible
Fix interruptible pipeline runner and aggregator.
2024-03-07 09:09:41 -05:00
Moishe Lettvin
d5683c4f24 Fix interruptible pipeline runner and aggregator. 2024-03-07 09:05:49 -05:00
chadbailey59
b4505b7eff added audio chunking for better interruption support (#35) 2024-03-06 18:20:04 -06:00
Moishe Lettvin
3e46d28aff Add start frame to interrupt loop 2024-03-06 15:58:19 -05:00
Moishe Lettvin
d3e76c4fd6 Merge pull request #34 from daily-co/rename-frames
Remove Queue in frame names
2024-03-06 14:10:56 -05:00
Moishe Lettvin
62fd371b97 Remove Queue in frame names 2024-03-06 14:09:06 -05:00
Moishe Lettvin
b9556716dd Merge pull request #33 from daily-co/pipeline-instead-of-nest
Pipeline instead of nest
2024-03-05 11:04:20 -05:00
Moishe Lettvin
2708dcf7b5 Remove conversation wrapper 2024-03-04 14:07:49 -05:00
Moishe Lettvin
d3f86dab2e starting on interruptions 2024-03-04 13:41:28 -05:00
Moishe Lettvin
18e7626b9f Getting started on interruptible transport pipeline runner 2024-03-04 07:51:22 -05:00
Moishe Lettvin
763a50f8ec First cut at sample 6 rewrite with pipelines 2024-03-04 07:28:10 -05:00
Moishe Lettvin
3b282cc921 some comments 2024-03-03 20:17:48 -05:00
Moishe Lettvin
434772dc23 Update sample 5! 2024-03-03 19:50:13 -05:00
Moishe Lettvin
15df4a9d58 cleanup, make sample 4 work with new stuff 2024-03-03 19:37:30 -05:00
Moishe Lettvin
643be238f9 getting started 2024-03-03 16:31:31 -05:00
chadbailey59
d90fdb1cae Isolated changes to add VAD (#32)
* added VAD

* added separate 'vad enabled' property
2024-02-28 15:16:44 -06:00
Moishe Lettvin
f710aeae95 Merge pull request #30 from daily-co/unsub-video
cleanup client properties and unsubscribe from camera
2024-02-27 13:16:20 -05:00
Moishe Lettvin
20091d91c9 cleanup client properties and unsubscribe from camera 2024-02-27 13:09:55 -05:00
Moishe Lettvin
92ec5641d4 update deepgram tts to new service structure 2024-02-14 13:44:59 -05:00
Moishe Lettvin
53e97bd872 Merge pull request #28 from daily-co/update-playht-service
Update playht service
2024-02-14 12:54:34 -05:00
Moishe Lettvin
dcbd79333a make destructor call client.close in PlayHT service 2024-02-14 12:53:20 -05:00
Moishe Lettvin
97a4cb8b7f Update playht tts service 2024-02-14 12:40:13 -05:00
Moishe Lettvin
cc7877f626 Merge pull request #26 from daily-co/fix-sigint
fix sigint handling
2024-02-14 12:11:44 -05:00
Moishe Lettvin
1992b7e79e fix sigint handling 2024-02-14 12:10:47 -05:00
Moishe Lettvin
2516670874 Merge pull request #25 from daily-co/keyboard-interrupt
Call client.leave on keyboard interrupt
2024-02-13 14:18:42 -05:00
Moishe Lettvin
4fecc10808 Call client.leave on keyboard interrupt 2024-02-13 14:17:09 -05:00
Moishe Lettvin
08144fc560 Merge pull request #24 from daily-co/another-formatting-pass
Another autopep8 formatting pass
2024-02-10 09:39:51 -05:00
Moishe Lettvin
815aa2bc3e Another autopep8 formatting pass 2024-02-10 09:29:08 -05:00
Moishe Lettvin
560c98f2fa Merge pull request #23 from daily-co/ollama-service
Ollama LLM service
2024-02-10 09:27:17 -05:00
Moishe Lettvin
0e0c992f59 Ollama LLM service 2024-02-10 09:22:52 -05:00
Moishe Lettvin
d76139ac1a Merge pull request #22 from daily-co/temp-readme-patch
Make the README okay-enough for limited public release
2024-02-09 11:57:39 -05:00
Moishe Lettvin
444418d94c Make the README okay-enough for limited public release 2024-02-09 10:26:39 -05:00
Moishe Lettvin
d27122e35e Create LICENSE 2024-02-09 09:10:28 -06:00
Chad Bailey
0ae83577c6 renamed samples to examples 2024-02-08 16:34:48 +00:00
Chad Bailey
5c402eee81 started adding docs 2024-02-08 16:31:17 +00:00
Moishe Lettvin
80750fe022 Remove old/deprecated/broken samples 2024-02-08 09:56:22 -05:00
Moishe Lettvin
ccfba04ea2 Remove mistakenly-added file 2024-02-08 09:55:28 -05:00
Moishe Lettvin
5b8198cf9e Merge pull request #21 from daily-co/cleanup_constructor_args
Cleanup constructor args in examples
2024-02-08 09:44:51 -05:00
Moishe Lettvin
3fa00c4db8 Cleanup constructor args in examples 2024-02-08 09:41:51 -05:00
Moishe Lettvin
4ce36f8c63 Merge pull request #20 from daily-co/base_transport
Add a "Local Transport" as a proof of concept
2024-02-08 08:25:03 -05:00
Moishe Lettvin
9620080cc5 A little example cleanup 2024-02-08 08:24:25 -05:00
Moishe Lettvin
ee1ce8f288 Abstract base transport class & local transport class 2024-02-08 08:15:28 -05:00
chadbailey59
70d07b6ea2 WIP: environment cleanup (#19)
* removed env var usage from SDK services

* started consolidating configure.py

* 1–3 work

* cleaned up the rest

* more cleanup

* cleanup and 05 tinkering

* made fal keys optional
2024-02-06 15:07:16 -06:00
Moishe Lettvin
9d5ad5675c Fix 06- demo and also fix bugs where dangling sentences wouldn't be spoken 2024-02-01 12:54:23 -05:00
chadbailey59
0d96f91cde Added sound effect example (#18)
* added sound effect example

* added dialout to this branch too

* fixup

* fixup for more dialout testing

* cleanup
2024-02-01 10:26:50 -06:00
Moishe Lettvin
4e9586595d minor cleanup 2024-01-29 15:06:39 -05:00
Moishe Lettvin
d0bcddfd70 Fix 06a-image-sync.py 2024-01-29 14:29:32 -05:00
Chad Bailey
065a213ebb example renaming 2024-01-29 17:42:45 +00:00
Chad Bailey
7d6c94d604 added 09 examples 2024-01-29 17:39:28 +00:00
Chad Bailey
0859b57b00 Added 09 examples 2024-01-29 17:39:14 +00:00
Moishe Lettvin
09838c9b1f Merge pull request #17 from daily-co/start_tests
Add some basic daily_transport tests
2024-01-29 07:57:33 -05:00
Moishe Lettvin
c39920132c Add some basic daily_transport tests 2024-01-29 07:56:12 -05:00
Moishe Lettvin
860129a4be Merge pull request #16 from daily-co/image_tweaks
Minor Cleanup
2024-01-27 19:10:52 -05:00
Moishe Lettvin
4416f36ae9 some minor cleanup, and coalesce image/images into one thing, and use itertools.cycle 2024-01-27 19:07:29 -05:00
chadbailey59
86af896150 Wake word and animation sprites (#15)
* WIP: golden kitty

* added web server

* added health check

* added flask to module build

* trying requirements.txt

* added dotenv

* flask_cors

* gunicorn

* requirements cleanup

* Dockerfile

* WOOF

* basic wake word

* removed otel

* basic animation kind of works

* i think animation defeated me

* added santa cat assets

* cleanup

* cleanup

* server example and cleanup

* more cleanup

* fix up some class variable names

* minor cleanup, remove mistakenly-added print and logger stuff

* cleanup

* cleanup

---------

Co-authored-by: Moishe Lettvin <moishel@gmail.com>
2024-01-26 15:37:39 -06:00
Moishe Lettvin
5cbac4701b minor cleanup, remove mistakenly-added print and logger stuff 2024-01-26 15:27:12 -05:00
Moishe Lettvin
5d9aa530e2 fix up some class variable names 2024-01-26 15:15:44 -05:00
Moishe Lettvin
d4c4d49035 Merge pull request #14 from daily-co/aiosessions
Don't create aiohttp sessions inside services
2024-01-26 14:01:24 -05:00
Moishe Lettvin
e81f247845 Don't create aiohttp sessions inside services 2024-01-26 12:30:37 -05:00
Liza
8baf137511 prefix suspected private members (#13) 2024-01-26 18:28:54 +01:00
Moishe Lettvin
fcceb32bd7 Merge pull request #12 from daily-co/frame_sync
Speaking / waiting images
2024-01-26 10:17:01 -05:00
Moishe Lettvin
ead655fe23 some more fixup 2024-01-26 10:07:16 -05:00
Moishe Lettvin
bab102f197 little more cleanup 2024-01-26 09:54:51 -05:00
Moishe Lettvin
95fc802607 Speaking / waiting images 2024-01-26 09:15:29 -05:00
Moishe Lettvin
2886997693 Merge pull request #11 from daily-co/autopep
Autopep linter fixes
2024-01-25 12:17:26 -05:00
Moishe Lettvin
5fdda43bed Autopep linter fixes 2024-01-25 12:12:46 -05:00
Moishe Lettvin
f0d9b0613e Add faster_whisper to module dependencies; remove unneeded import 2024-01-25 11:27:00 -05:00
Moishe Lettvin
a661905d7f Merge pull request #9 from daily-co/interruptions
Interruptable conversation wrapper
2024-01-25 11:24:57 -05:00
Moishe Lettvin
c9c2e5f561 Remove unnecessary try/except 2024-01-25 11:18:55 -05:00
Moishe Lettvin
795a339542 Add InterruptibleConversationWrapper 2024-01-25 11:15:04 -05:00
Liza
31db156dfc Local Whisper transcription (#10)
* First pass at Whisper transcription

* deletions

* Revise based on feedback, add autopep8
2024-01-25 13:43:25 +01:00
Moishe Lettvin
690cf2e47d Merge pull request #8 from daily-co/queueframe-refactor
Refactor QueueFrame
2024-01-23 13:15:11 -05:00
Moishe Lettvin
ba89e41c5b remove commented-out code 2024-01-23 09:37:15 -05:00
Moishe Lettvin
c134598a77 Refactor QueueFrame 2024-01-23 09:33:51 -05:00
Liza
b51abd2969 facilitate manual call management (#7) 2024-01-23 14:33:27 +01:00
Moishe Lettvin
3fda9b0ecb Use more flexibile aggregator 2024-01-22 16:02:35 -05:00
Moishe Lettvin
95c92e5304 Aggregators for LLM messages 2024-01-22 10:59:13 -05:00
Moishe Lettvin
b443fbdb60 Very rough draft at intro/overview in README 2024-01-19 16:20:08 -05:00
Moishe Lettvin
ccd2fa31e5 Rename 'theoretical-to-real' samples to 'foundational' 2024-01-19 13:57:52 -05:00
Moishe Lettvin
9b65286216 Merge pull request #6 from daily-co/rm-sentence-aggregator
Cleanup: no more sentence aggregator
2024-01-19 13:42:27 -05:00
Moishe Lettvin
6ae733ebfe Cleanup: no more sentence aggregator, let the TTS service deal with that; also removed the queue typing stuff from ai_services 2024-01-19 13:06:15 -05:00
Liza
1071dede1a Only initialize Daily once (#5) 2024-01-19 14:59:48 +01:00
94 changed files with 4544 additions and 2739 deletions

1
.gitignore vendored
View File

@@ -2,6 +2,7 @@
env/
__pycache__/
*~
venv
#*#
# Distribution / packaging

24
LICENSE Normal file
View File

@@ -0,0 +1,24 @@
BSD 2-Clause License
Copyright (c) 2024, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

128
README.md
View File

@@ -1,6 +1,19 @@
# dailyai SDK
# Daily AI SDK
This SDK can help you build applications that participate in WebRTC meetings and use various AI services to interact with other participants.
Build conversational, multi-modal AI apps with real-time voice and video, like this:
_Demo Video to come_
With built-in support for many of the best AI platforms (or [add your own](/docs)):
- Azure - DALL-E, ChatGPT, and Azure AI Text-to-Speech
- Deepgram - Speech-to-text, and Aura text-to-speech
- Eleven Labs text-to-speech
- Fal.ai image generation
- OpenAI DALL-E and ChatGPT
- Whisper local speech-to-text
## Step 1: Get Started
## Build/Install
@@ -35,21 +48,112 @@ pip install path_to_this_repo
Tou can run the simple sample like so:
```
python src/samples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
python src/examples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
```
## Overview
Note that the sample uses Azure's TTS and LLM services. You'll need to set the following environment variables for the sample to work:
The Daily AI SDK allows you to build applications that can participate in WebRTC sessions and interact with AI Services. Some examples of what you can build with this:
- conversational bots that interact 1:1 with a user, using voice recognition and text-to-speech
- assistant bots that aggregate transcriptions from multiple participants in a meeting and provide realtime summaries or other AI-generated output.
- image-recognition bots
- etc
## Concepts
### Transport Service
The SDK provides one “transport service”, which is a wrapper around Dailys `daily-python` client (tk add link). You can use this service to listen for events related to a WebRTC session, such as “a participant joined the meeting”.
The transport service also exposes a send queue, and a receive queue. You can use the send queue to send audio and video to the WebRTC session, and you can listen to the receive queue to see audio, video and transcription data from the WebRTC session.
### AI Services
The AI Service classes provide wrappers around various AI providers, and allow you to query LLMs, convert text to speech and make images from text. The audio and images can then be placed on the transport services send queue, where theyll be sent to the WebRTC session.
### Queue Frames
Communication between the transport service and AI services, and between various AI services, takes place in Queue Frames. These frames contain an indication of the type of data as well as the data itself.
## Using Transports, AI Services and Frames
AI Services all define a `.run` method. This method consumes and generates `QueueFrame` frames. The kind of frames that can be consumed and generated depend on the kind of service. For instance, an LLM AI Service consumes `LLM_MESSAGE` frames (which define a history of interaction with an LLM) and emit `TEXT` frames (the response from the LLM).
The `.run` method is an `AsyncIterable`, and it takes an `iterable`, `AsyncIterable` or `asyncio.Queue` that produces QueueFrames as a parameter. This makes it easy to chain AI Services, and consume input from the Transports `receive_queue` .
AI Services also have a `.run_to_queue` method. This method is not an AsyncIterable, but instead sends processed QueueFrames to a queue. This makes it easy to send the output of an AI Service to the Transports `send_queue`.
AI Services also define convenience functions that let you bypass creating QueueFrames for some simple cases (eg. using the TTS service to convert a string to audio output and send that audio to the transports `send_queue`). See below for examples.
## Examples
### Say Something
The base TTS AI service exposes a `.say` method. After creating a transport and TTS service, you can use this method like so:
```
AZURE_SPEECH_SERVICE_KEY
AZURE_SPEECH_SERVICE_REGION
AZURE_CHATGPT_KEY
AZURE_CHATGPT_ENDPOINT
AZURE_CHATGPT_DEPLOYMENT_ID
transport = DailyTransportService(...)
tts = AzureTTSService()
await tts.say("hello world", transport.send_queue)
```
If you have those environment variables stored in an .env file, you can quickly load them into your terminal's environment by running this:
This will call the TTS service to render the text to audio frames, then put the audio frames on the transports send queue. The transport will then send those frames along to the WebRTC session.
### Speak an LLM response
Given a system prompt contained in a `messages` array, you can emit the LLMs response as audio with a chain like this:
```bash
export $(grep -v '^#' .env | xargs)
```
transport = DailyTransportService(...) # setup parameters omitted
tts = AzureTTSService()
llm = AzureLLMService()
messages = [...] # system prompt omitted for brevity
await tts.run_to_queue(
transport.send_queue,
llm.run([QueueFrame.LLM_MESSAGES, messages])
)
```
In this code, the LLM service object sends the messages to Azures OpenAI implementation, which streams chunks back asynchronously. Those chunks are aggregated by the TTS Service to ensure the best audio response (TTS works best when it gets complete sentence, so it can inflect correctly), then sent to Azures TTS service, converted to audio frames, and sent to the WebRTC session via the Daily transport.
### Pre-cache an LLM response
Sometimes LLMs can be slower than wed like for natural-feeling communication. Heres an example where we take advantage of the time it takes to speak some pre-defined text to get a head start on the LLM response:
(TK link to 04- sample)
In this sample, we set up a buffer queue to receive the audio frames from the LLM response before while we are joining the call and start an asynchronous task to start filling this buffer:
```
buffer_queue = asyncio.Queue()
llm_response_task = asyncio.create_task(
elevenlabs_tts.run_to_queue(
buffer_queue,
llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)]),
True,
)
)
```
Then, when weve joined the call, we speak the static text:
```
await azure_tts.say("My friend...", transport.send_queue)
```
As that text is being spoken, the asynchronous LLM task continues in the background. When the text is done, we pull the frames off the buffer queue and put them in the transports `send_queue`:
```
async def buffer_to_send_queue():
while True:
frame = await buffer_queue.get()
await transport.send_queue.put(frame)
buffer_queue.task_done()
if frame.frame_type == FrameType.END_STREAM:
break
await asyncio.gather(llm_response_task, buffer_to_send_queue())
```
One thing to note here is the last parameter to `run_to_queue` in the first code clause above: this causes the `run_to_queue` method to send an `END_STREAM` frame when its done rendering. This lets us know when to stop our `buffer_to_send_queue` task above.

13
docs/README.md Normal file
View File

@@ -0,0 +1,13 @@
# Daily AI SDK Docs
## [Architecture Overview](architecture.md)
Learn about the thinking behind the SDK's design.
## [Example Code](examples/)
The repo includes several example apps in the `src/examples` directory. The docs explain how they work.
## [API Reference](api/)
Complete documentation of the available classes and methods in the SDK.

2
docs/architecture.md Normal file
View File

@@ -0,0 +1,2 @@
# Daily AI SDK Architecture Guide

View File

@@ -0,0 +1,119 @@
# 01: Say One Thing
_video here - youtube?_
This example uses a text-to-speech (TTS) service to say one predefined sentence. But first, a quick overview of the general structure of these examples.
## Running the demos
All of the demos have something like this at the bottom of the file:
```python
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
```
### `configure()`
The `configure()` function comes from `src/examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:
```bash
python 01-say-one-thing.py -u https://YOUR_DOMAIN.daily.co/YOUR_ROOM -k YOUR_API_KEY
# or
DAILY_ROOM_URL=https://YOUR_DOMAIN.daily.co/YOUR_ROOM DAILY_API_KEY=YOUR_API_KEY python 01-say-one-thing.py
# or set DAILY_ROOM_URL and DAILY_API_KEY in a .env file
python 01-say-one-thing.py
```
You'll need a Daily account to run these demos. You can sign up for free at [daily.co](https://daily.co). Once you've signed up you can create a room from the [Dashboard](https://dashboard.daily.co/rooms), and grab [your API key](https://dashboard.daily.co/developers) while you're there.
Some functionality (such as transcription) requires the bot to have owner privileges in the room. `runner.py` uses the Daily REST API to create a meeting token with owner privileges. You can learn more about meeting tokens in the [Daily docs](https://docs.daily.co/reference/rest-api/meeting-tokens).
### `asyncio.run()`
The AI SDK makes heavy use of Python's `asyncio` module. [This is a reasonable intro to the topic](https://builtin.com/data-science/asyncio) if you haven't worked with `asyncio` and coroutines before.
You can learn a bit more about the specifics of how the Daily AI SDK uses coroutines in the [Architecture Guide](../architecture.md).
## The `main()` function
All of the examples have a `main()` function with a similar structure:
- Configure the transport
- Configure the AI service(s) used in the demo
- Configure any event listeners
- Define a processing pipeline
- Run the example's coroutine(s)
### Configuring the transport
The first section of the `main()` function configures the transport object:
```python
meeting_duration_minutes = 5
transport = DailyTransportService(
room_url,
None,
"Say One Thing",
meeting_duration_minutes,
)
transport.mic_enabled = True
```
The [Architecture Guide](../architecture.md) explains the transport object in more detail. In this case, we're configuring a Daily transport object and enabling the virtual microphone, so our bot can play audio.
### Configuring the services
As described in the [Architecture Guide](../architecture.md), 'a 'Service' is a class that processes 'Frames' as part of a 'Pipeline'. In this demo app, we'll only need one service: a text-to-speech generator. We can create an instance of the `ElevenLabsTTSService` class with this line of code:
```python
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
```
You'll need to make sure and set those environment variables somewhere. The easiest way to do that is to copy the `example.env` file in the repo and rename it to `.env`, and then add your credentials to that file. `runner.py` loads the `python-dotenv` module and initializes it, making the values in that file available in the environment.
### Configuring event listeners
This part isn't strictly necessary for an app like this. You could include the contents of the `on_participant_joined` function directly in the body of the `main()` function, and it would run as soon as you started the script from the command line.
Instead, we can use an event handler to wait to run that code until someone else joins the meeting. We'll define a function called `greet_user()`, and use the `@transport.event_handler("on_participant_joined")` decorator to tell the SDK that we want to run that function whenever a user joins the room.
```python
@transport.event_handler("on_participant_joined")
async def greet_user(transport, participant):
if participant["info"]["isLocal"]:
return
await tts.say(
"Hello there, " + participant["info"]["userName"] + "!",
transport.send_queue,
)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
```
### Defining a processing pipeline
In this example, we don't actually have much of a processing pipeline! In fact, we're doing the whole thing inside the `greet_user()` function already.
Pipelines usually look like a bunch of nested calls to the `run()` or `run_to_queue()` function from different Services. In this example, we're using the `say()` function from the TTS service. This is effectively a convenience wrapper around the `run_to_queue()` function, which we'll discuss more later. It's important to `await` this function to ensure that the speech frames are queued for playback before the next line of code, because of the `stop_when_done()` function being called immediately afterward.
The output of the `say()` function goes to the transport's `send_queue`. This queue is the all-important connection between the world of the Services pipeline that's generating frames asynchronously and the ordered playback of audio and visual media in the WebRTC call.
### Running the coroutines
In this example, we don't actually have any separate processing pipelines—everything happens as a result of an event from the transport. So we only need to run the transport's coroutine, and await its completion:
```python
await transport.run()
```
In future examples, we'll run more processes in parallel. For now, this script can run until the transport exits—which will happen based on calling `stop_when_done()` in the `greet_user()` function.
## Next Steps
Next, we'll start connecting multiple AI services together by building a service pipeline.
## [02 - LLM Say One Thing »](02-llm-say-one-thing.md)

5
docs/examples/README.md Normal file
View File

@@ -0,0 +1,5 @@
# Daily AI SDK Examples
The docs in this folder pair with the example apps located in `src/examples/foundational`. They are designed to serve as a quick references for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.
To start, you can learn about the overall structure of the examples in [01 - Say One Thing](01-say-one-thing.md).

View File

@@ -7,16 +7,21 @@ name = "daily_ai"
version = "0.0.1"
description = "Orchestrator for AI bots with Daily"
dependencies = [
"daily-python",
"Pillow",
"typing-extensions",
"openai",
"google-cloud-texttospeech",
"azure-cognitiveservices-speech",
"pyht",
"opentelemetry-sdk",
"aiohttp",
"fal"
"azure-cognitiveservices-speech",
"daily-python",
"fal",
"faster_whisper",
"google-cloud-texttospeech",
"numpy",
"openai",
"Pillow",
"pyht",
"python-dotenv",
"torch",
"torchaudio",
"pyaudio",
"typing-extensions"
]
[tool.setuptools.packages.find]

View File

@@ -1,3 +1,4 @@
autopep8==2.0.4
build==1.0.3
packaging==23.2
pyproject_hooks==1.0.0

View File

@@ -1,347 +0,0 @@
import json
import logging
import re
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from queue import Queue, PriorityQueue, Empty
from threading import Event, Semaphore, Thread
from typing import Any, Generator, Iterator, Optional, Type
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.message_handler.message_handler import MessageHandler
from dailyai.services.ai_services import AIServiceConfig
class AsyncProcessorState:
# Setting class variables, other synchronous activities
INIT = 0
# Making asynchronous requests to LLM and other services to render response
PREPARING = 1
# Ready to start presenting to user (but may not have all data yet)
READY = 2
# Playing response
PLAYING = 3
# An interrupt has been requested and the response is shutting down in-flight processing
INTERRUPTING = 4
# An interrupt has been requested and the response is finished stopping in-flight processing
INTERRUPTED = 5
# Response has been played or interrupted
DONE = 6
# Response is being finalized (updating records of speech, updating LLM context, etc.)
FINALIZING = 7
# Response is complete. This could mean that everything is updated, or that the response
# was interrupted.
FINALIZED = 8
state_transitions = {
INIT: [PREPARING, INTERRUPTING],
PREPARING: [READY, INTERRUPTING],
READY: [PLAYING, INTERRUPTING],
PLAYING: [DONE, INTERRUPTING],
INTERRUPTING: [INTERRUPTED],
INTERRUPTED: [DONE],
DONE: [FINALIZING],
FINALIZING: [FINALIZED],
FINALIZED: [FINALIZED],
}
@dataclass(order=True)
class StateTransitionItem:
state: int
evt: Event = field(compare=False)
class AsyncProcessor:
def __init__(
self,
services: AIServiceConfig
) -> None:
self.state = AsyncProcessorState.INIT
self.prepare_thread = None
self.play_thread = None
self.finalize_thread = None
self.services: AIServiceConfig = services
self.state_transition_semaphore = Semaphore()
self.waiting_for_state_changes = PriorityQueue()
self.state_queue = Queue()
self.state_change_callbacks = defaultdict(list)
self.was_interrupted = False
self.logger: logging.Logger = logging.getLogger("dailyai")
def set_state(self, state: int) -> None:
if state in AsyncProcessorState.state_transitions[self.state]:
self.state_transition_semaphore.acquire()
self.state: int = state
self.state_transition_semaphore.release()
# wake up any threads waiting for this state transition
try:
while True:
waiter = self.waiting_for_state_changes.get_nowait()
if waiter.state <= state:
waiter.evt.set()
else:
self.waiting_for_state_changes.put(waiter)
break
except Empty:
pass
# make all the callbacks for this state
for callback in self.state_change_callbacks[state]:
callback(self)
else:
self.logger.error(
f"Invalid state transition from {self.state} to {state} in {self.__class__.__name__}"
)
raise Exception(f"Invalid state transition from {self.state} to {state}")
#
# This is used for state transitions that could be blocked by an interruption.
# If we are interrupted, we silently fail this call. Use only if you know that
# this state transition should fail if the processor has been interrupted.
#
def maybe_set_state(self, state: int) -> bool:
if state in AsyncProcessorState.state_transitions[self.state]:
self.set_state(state)
return True
else:
return False
def wait_for_state_transition(self, state: int) -> None:
if self.state >= state:
return
self.state_transition_semaphore.acquire()
evt = Event()
self.waiting_for_state_changes.put(StateTransitionItem(state, evt))
self.state_transition_semaphore.release()
result = evt.wait(120.0)
if not result:
self.logger.error(
f"Timed out waiting for state transition to {state} from {self.state}"
)
def set_state_callback(self, state: int, callback: callable) -> None:
self.state_change_callbacks[state].append(callback)
def prepare(self) -> None:
self.prepare_thread = Thread(target=self.async_prepare, daemon=True)
self.prepare_thread.start()
self.wait_for_state_transition(AsyncProcessorState.READY)
def play(self) -> None:
self.wait_for_state_transition(AsyncProcessorState.READY)
self.play_thread = Thread(target=self.async_play, daemon=True)
self.play_thread.start()
self.wait_for_state_transition(AsyncProcessorState.PLAYING)
def finalize(self) -> None:
# don't finalize until we're done playing.
self.wait_for_state_transition(AsyncProcessorState.DONE)
self.set_state(AsyncProcessorState.FINALIZING)
self.do_finalization()
self.set_state(AsyncProcessorState.FINALIZED)
def interrupt(self) -> None:
# nothing to interrupt if we're already finalizing or finalized, no-op
if self.state in [
AsyncProcessorState.FINALIZING,
AsyncProcessorState.FINALIZED,
]:
return
self.set_state(AsyncProcessorState.INTERRUPTING)
self.was_interrupted = True
self.do_interruption()
self.set_state(AsyncProcessorState.INTERRUPTED)
self.set_state(AsyncProcessorState.DONE)
def async_play(self) -> None:
self.logger.info(f"Starting to play")
if self.maybe_set_state(AsyncProcessorState.PLAYING):
self.do_play()
self.maybe_set_state(AsyncProcessorState.DONE)
def async_prepare(self) -> None:
self.set_state(AsyncProcessorState.PREPARING)
self.start_preparation()
self.set_state(AsyncProcessorState.READY)
self.continue_preparation()
self.logger.info(f"Preparation done for {self.__class__.__name__}")
self.preparation_done()
def start_preparation(self) -> None:
pass
def continue_preparation(self) -> None:
pass
def preparation_done(self):
pass
def get_preparation_iterator(self) -> Iterator:
yield None
def process_chunk(self, chunk) -> None:
pass
def do_interruption(self) -> None:
pass
def do_play(self) -> None:
pass
def do_finalization(self) -> None:
pass
# A common class for responses that use a message queue and
# an output queue.
class OrchestratorResponse(AsyncProcessor):
def __init__(
self,
services,
message_handler,
output_queue,
) -> None:
super().__init__(services)
self.message_handler: MessageHandler = message_handler
self.output_queue: Queue = output_queue
class LLMResponse(OrchestratorResponse):
def __init__(
self,
services,
message_handler,
output_queue,
) -> None:
super().__init__(services, message_handler, output_queue)
self.has_sent_first_frame = False
self.chunks_in_preparation = Queue()
self.llm_responses: list[str] = []
def get_preparation_iterator(self) -> Iterator:
messages_for_llm = self.message_handler.get_llm_messages()
self.logger.debug(f"Messages for llm: {json.dumps(messages_for_llm, indent=2)}")
return self.clauses_from_chunks(
self.services.llm.run_llm_async(messages_for_llm)
)
def clauses_from_chunks(self, chunks) -> Iterator:
out = ""
for chunk in chunks:
if self.state not in [
AsyncProcessorState.READY,
AsyncProcessorState.PLAYING,
]:
break
out += chunk
if re.match(r"^.*[.!?]$", out): # it looks like a sentence
yield out.strip()
out = ""
if out.strip():
yield out.strip()
def get_frames_from_tts_response(self, audio_frame) -> list[QueueFrame]:
return [QueueFrame(FrameType.AUDIO, audio_frame)]
def get_frames_from_chunk(self, chunk) -> Generator[list[QueueFrame], Any, None]:
for audio_frame in self.services.tts.run_tts(chunk):
yield self.get_frames_from_tts_response(audio_frame)
def start_preparation(self) -> None:
self.preparation_iterator = self.get_preparation_iterator()
def continue_preparation(self) -> None:
for chunk in self.preparation_iterator:
if self.state not in [
AsyncProcessorState.READY,
AsyncProcessorState.PLAYING,
]:
break
self.process_chunk(chunk)
def process_chunk(self, chunk) -> None:
self.chunks_in_preparation.put((chunk, self.get_frames_from_chunk(chunk)))
def preparation_done(self):
self.chunks_in_preparation.put((None, None))
def do_play(self) -> None:
while True:
if self.state not in [
AsyncProcessorState.READY,
AsyncProcessorState.PLAYING,
]:
break
prepared_chunk = self.chunks_in_preparation.get()
if prepared_chunk[0] == None:
return
self.play_prepared_chunk(prepared_chunk)
def play_prepared_chunk(self, prepared_chunk) -> None:
chunk, tts_generator = prepared_chunk
for frames in tts_generator:
if self.state not in [
AsyncProcessorState.READY,
AsyncProcessorState.PLAYING,
]:
break
if not self.has_sent_first_frame:
self.output_queue.put(QueueFrame(FrameType.START_STREAM, None))
self.has_sent_first_frame = True
for frame in frames:
self.output_queue.put(frame)
self.output_queue.join()
self.llm_responses.append(chunk)
def do_finalization(self) -> None:
self.message_handler.add_assistant_messages(self.llm_responses)
def do_interruption(self) -> None:
self.chunks_in_preparation.put((None, None))
if self.prepare_thread and self.prepare_thread.is_alive():
self.prepare_thread.join()
if self.play_thread and self.play_thread.is_alive():
self.play_thread.join()
@dataclass(frozen=True)
class ConversationProcessorCollection:
introduction: Optional[Type[OrchestratorResponse]] = None
waiting: Optional[Type[OrchestratorResponse]] = None
response: Optional[Type[OrchestratorResponse]] = None
goodbye: Optional[Type[OrchestratorResponse]] = None

View File

@@ -1,127 +0,0 @@
import logging
import time
from dataclasses import dataclass
from queue import Queue, Empty
from threading import Thread
from dailyai.storage.search import SearchIndexer
from dailyai.services.ai_services import AIServiceConfig
@dataclass
class Message:
type: str
timestamp: float
message: str
class MessageHandler:
def __init__(self, intro):
self.messages: list[Message] = [Message("system", time.time(), intro)]
self.last_user_message_idx:int | None = None
self.finalized_user_message_idx: int | None = None
def add_user_message(self, message) -> None:
if self.last_user_message_idx is not None and self.last_user_message_idx != self.finalized_user_message_idx:
previous_message: str = self.messages[self.last_user_message_idx].message
self.messages[self.last_user_message_idx] = Message(
"user", time.time(), ' '.join([previous_message, message])
)
self.messages = self.messages[: self.last_user_message_idx + 1]
else:
self.messages.append(Message("user", time.time(), message))
self.last_user_message_idx = len(self.messages) - 1
def add_assistant_message(self, message) -> None:
if self.messages[-1].type == "assistant":
self.messages[-1].message += " " + message
else:
self.messages.append(Message("assistant", time.time(), message))
def add_assistant_messages(self, messages) -> None:
self.messages.append(Message("assistant", time.time(), " ".join(messages)))
def get_llm_messages(self) -> list[dict[str, str]]:
return [{"role": m.type, "content": m.message} for m in self.messages]
def finalize_user_message(self) -> None:
self.finalized_user_message_idx = self.last_user_message_idx
def shutdown(self) -> None:
pass
class IndexingMessageHandler(MessageHandler):
def __init__(
self, intro, services: AIServiceConfig, indexer: SearchIndexer
) -> None:
super().__init__(intro)
self.services = services
self.search_indexer = indexer
self.last_written_idx = 0
self.storage_message_queue = Queue()
self.index_writer_thread = Thread(target=self.storage_writer, daemon=True)
self.index_writer_thread.start()
self.logger = logging.getLogger("dailyai")
def shutdown(self):
self.finalize_user_message()
self.storage_message_queue.put(None)
self.index_writer_thread.join()
def storage_writer(self) -> None:
while True:
try:
message_idx = self.storage_message_queue.get()
self.storage_message_queue.task_done()
if message_idx is None:
return
if message_idx <= self.last_written_idx:
continue
self.last_written_idx = message_idx
message = self.messages[message_idx]
content = message.message
if message.type == "user":
content = self.cleanup_user_message(content)
# sometimes the LLM returns a string wrapped in quotes and sometimes it doesn't.
# if it didn't, wrap it in quotes
if content[0] != '"':
content = '"' + content + '"'
self.search_indexer.index_text(content)
except Empty:
pass
def cleanup_user_message(self, user_message) -> str:
return user_message
def finalize_user_message(self):
super().finalize_user_message()
self.write_messages_to_storage()
def write_messages_to_storage(self):
if self.finalized_user_message_idx is None:
return
for idx in range(self.last_written_idx, len(self.messages)):
self.logger.info(
f"Writing to storage: {self.messages[idx].type} {self.messages[idx].message}"
)
if (
self.messages[idx].type == "user"
and idx > self.finalized_user_message_idx
):
break
if self.messages[idx].type != "system":
self.storage_message_queue.put(idx)

View File

@@ -1,409 +0,0 @@
import logging
import os
import time
import wave
from dataclasses import dataclass
from enum import Enum
from queue import Queue, Empty
from opentelemetry import trace, context
from dailyai.async_processor.async_processor import (
AsyncProcessor,
AsyncProcessorState,
ConversationProcessorCollection,
OrchestratorResponse,
LLMResponse,
)
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.ai_services import AIServiceConfig
from dailyai.message_handler.message_handler import MessageHandler
from threading import Thread, Semaphore, Event, Timer
from opentelemetry import context
from opentelemetry.context.context import Context
from daily import (
EventHandler,
CallClient,
Daily,
VirtualCameraDevice,
VirtualMicrophoneDevice,
VirtualSpeakerDevice,
)
@dataclass
class OrchestratorConfig:
room_url: str
token: str
bot_name: str
expiration: float
# Note that we use this as a default parameter value in the Orchestrator
# constructor. The dataclass is defined with Frozen=True, so this should
# be safe.
default_conversation_collection = ConversationProcessorCollection(
introduction=LLMResponse,
waiting=None,
response=LLMResponse,
goodbye=None,
)
class Orchestrator(EventHandler):
def __init__(
self,
daily_config: OrchestratorConfig,
ai_service_config: AIServiceConfig,
message_handler: MessageHandler,
conversation_processors: ConversationProcessorCollection = default_conversation_collection,
tracer=None,
):
self.bot_name: str = daily_config.bot_name
self.room_url: str = daily_config.room_url
self.token: str = daily_config.token
self.expiration: float = daily_config.expiration
self.logger: logging.Logger = logging.getLogger("dailyai")
self.tracer = tracer or trace.get_tracer("orchestrator")
self.ctx: Context = context.get_current()
self.transcription = ""
self.last_fragment_at = None
self.talked_at = None
self.paused_at = None
self.logger.info(f"Creating Response for introductions")
self.services: AIServiceConfig = ai_service_config
self.output_queue = Queue()
self.is_interrupted = Event()
self.stop_threads = Event()
self.story_started = False
self.message_handler = message_handler
self.conversation_processors: ConversationProcessorCollection = conversation_processors
if conversation_processors.introduction is not None:
intro = conversation_processors.introduction(
services=self.services, message_handler=self.message_handler, output_queue=self.output_queue
)
intro.prepare()
intro.set_state_callback(AsyncProcessorState.DONE, self.on_intro_played)
intro.set_state_callback(AsyncProcessorState.FINALIZED, self.on_intro_finished)
self.logger.info(f"Introduction is preparing")
self.current_response: AsyncProcessor = intro
self.can_interrupt = False
# self.response_event.set()
self.response_semaphore = Semaphore()
self.speech_timeout = None
self.interrupt_time = None
self.logger.info("Configuring daily")
self.configure_daily()
def configure_daily(self):
Daily.init()
self.client = CallClient(event_handler=self)
self.logger.info(f"Mic sample rate: {self.services.tts.get_mic_sample_rate()}")
self.mic: VirtualMicrophoneDevice = Daily.create_microphone_device(
"mic", sample_rate=self.services.tts.get_mic_sample_rate(), channels=1
)
self.speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
"speaker", sample_rate=16000, channels=1
)
self.camera: VirtualCameraDevice = Daily.create_camera_device(
"camera", width=720, height=1280, color_format="RGB"
)
Daily.select_speaker_device("speaker")
self.client.set_user_name(self.bot_name)
self.client.join(self.room_url, self.token, completion=self.call_joined)
self.client.update_inputs(
{
"camera": {
"isEnabled": True,
"settings": {
"deviceId": "camera",
},
},
"microphone": {
"isEnabled": True,
"settings": {
"deviceId": "mic",
"customConstraints": {
"autoGainControl": {"exact": False},
"echoCancellation": {"exact": False},
"noiseSuppression": {"exact": False},
},
},
},
}
)
self.client.update_publishing(
{
"camera": {
"sendSettings": {
"maxQuality": "low",
"encodings": {
"low": {
"maxBitrate": 250000,
"scaleResolutionDownBy": 1.333,
"maxFramerate": 8,
}
},
}
}
}
)
self.my_participant_id = self.client.participants()["local"]["id"]
def start(self) -> None:
# TODO: this loop could, I think, be replaced with a timer and an event
self.participant_left = False
try:
participant_count: int = len(self.client.participants())
self.logger.info(f"{participant_count} participants in room")
while time.time() < self.expiration and not self.participant_left:
# all handling of incoming transcriptions happens in on_transcription_message
time.sleep(1)
except Exception as e:
self.logger.error(f"Exception {e}")
finally:
self.client.leave()
def stop(self):
self.logger.info("Stop current response")
if self.current_response:
if self.current_response.state < AsyncProcessorState.INTERRUPTED:
self.current_response.interrupt()
self.logger.info("Wait for state transition")
self.current_response.wait_for_state_transition(AsyncProcessorState.FINALIZED)
self.stop_threads.set()
self.camera_thread.join()
self.logger.info("Camera thread stopped")
self.logger.info("Put stop in output queue")
self.output_queue.put(QueueFrame(FrameType.END_STREAM, None))
self.frame_consumer_thread.join()
self.logger.info("Orchestrator stopped.")
def on_intro_played(self, intro):
self.logger.info(f"Introduction has played")
self.can_interrupt = True
intro.finalize()
def on_intro_finished(self, intro):
self.logger.info(f"Introduction has finished")
waiting = self.conversation_processors.waiting(self.services, self.message_handler, self.output_queue)
waiting.prepare()
waiting.play()
def on_response_played(self, response):
response.finalize()
def on_response_finished(self, response):
if not response.was_interrupted:
self.message_handler.finalize_user_message()
def call_joined(self, join_data, client_error):
self.logger.info(f"Call_joined: {join_data}, {client_error}")
self.client.start_transcription(
{
"language": "en",
"tier": "nova",
"model": "2-conversationalai",
"profanity_filter": True,
"redact": False,
"extra": {
"endpointing": True,
"punctuate": False,
}
}
)
def on_participant_joined(self, participant):
with self.tracer.start_as_current_span("on_participant_joined", context=self.ctx):
self.logger.info(f"on_participant_joined: {participant}")
# TODO: figure out the architecture to get the story id to the client
# self.client.send_app_message({"event": "story-id", "storyID": self.story_id})
time.sleep(2)
if not self.story_started:
self.action()
self.story_started = True
def on_participant_left(self, participant, reason):
self.logger.info(f"Participant {participant} left")
if len(self.client.participants()) < 2:
self.participant_left = True
def on_app_message(self, message, sender):
with self.tracer.start_as_current_span("on_app_message", context=self.ctx):
self.logger.info(f"on_app_message {message} from {sender}")
if "isSpeaking" in message and message["isSpeaking"] == True:
self.handle_user_started_talking()
if "isSpeaking" in message and message["isSpeaking"] == False:
self.handle_user_stopped_talking()
def on_transcription_message(self, message):
with self.tracer.start_as_current_span("on_transcription_message", context=self.ctx):
if message["session_id"] != self.my_participant_id:
self.handle_transcription_fragment(message['text'])
def on_transcription_stopped(self, stopped_by, stopped_by_error):
self.logger.info(f"Transcription stopped {stopped_by}, {stopped_by_error}")
def on_transcription_error(self, message):
self.logger.error(f"Transcription error {message}")
def on_transcription_started(self, status):
self.logger.info(f"Transcription started {status}")
def set_image(self, image: bytes):
self.image: bytes | None = image
def run_camera(self):
try:
while not self.stop_threads.is_set():
if self.image:
self.camera.write_frame(self.image)
time.sleep(1.0 / 8.0) # 8 fps
except Exception as e:
self.logger.error(f"Exception {e} in camera thread.")
def handle_user_started_talking(self):
# TODO: allow configuration of the timer timeout
self.logger.error("user started talking")
self.speech_timeout = Timer(1.0, self.utterance_interrupt)
def handle_user_stopped_talking(self):
self.logger.error("user stopped talking, canceling utterance interrupt")
if self.speech_timeout:
self.speech_timeout.cancel()
def utterance_interrupt(self):
self.logger.error("utterance interrupt")
self.is_interrupted.set()
def handle_transcription_fragment(self, fragment):
if not self.can_interrupt:
return
# start generating a new response. We'll do the fast parts of the interrupt
# now but wait for the state transition after we've kicked off the prepare
# on the new response.
if (
self.current_response
and self.current_response.state < AsyncProcessorState.INTERRUPTED
):
self.interrupt_time = time.perf_counter()
self.is_interrupted.set()
self.current_response.interrupt()
self.message_handler.add_user_message(fragment)
response_type: type[OrchestratorResponse] | type[LLMResponse] = self.conversation_processors.response or LLMResponse
new_response: OrchestratorResponse = response_type(
self.services, self.message_handler, self.output_queue
)
new_response.set_state_callback(
AsyncProcessorState.DONE, self.on_response_played
)
new_response.set_state_callback(
AsyncProcessorState.FINALIZED, self.on_response_finished
)
new_response.prepare()
self.response_semaphore.acquire()
if (
self.current_response
and self.current_response.state < AsyncProcessorState.INTERRUPTED
):
self.current_response.wait_for_state_transition(
AsyncProcessorState.FINALIZED
)
self.current_response = new_response
self.current_response.play()
self.response_semaphore.release()
def action(self):
self.logger.info("Starting camera thread")
self.image: bytes | None = None
self.camera_thread = Thread(target=self.run_camera, daemon=True)
self.camera_thread.start()
self.logger.info("Starting frame consumer thread")
self.frame_consumer_thread = Thread(target=self.frame_consumer, daemon=True)
self.frame_consumer_thread.start()
self.logger.info("Playing introduction")
self.can_interrupt = False
self.current_response.play()
def frame_consumer(self):
self.logger.info("🎬 Starting frame consumer thread")
b = bytearray()
smallest_write_size = 3200
all_audio_frames = bytearray()
while True:
try:
frame:QueueFrame = self.output_queue.get()
if frame.frame_type == FrameType.END_STREAM:
self.logger.info("Stopping frame consumer thread")
return
# if interrupted, we just pull frames off the queue and discard them
if not self.is_interrupted.is_set():
if frame:
if frame.frame_type == FrameType.AUDIO:
chunk = frame.frame_data
all_audio_frames.extend(chunk)
b.extend(chunk)
l = len(b) - (len(b) % smallest_write_size)
if l:
self.mic.write_frames(bytes(b[:l]))
b = b[l:]
elif frame.frame_type == FrameType.IMAGE:
self.set_image(frame.frame_data)
elif len(b):
self.mic.write_frames(bytes(b))
b = bytearray()
else:
if self.interrupt_time:
self.logger.info(f"Lag to stop stream after interruption {time.perf_counter() - self.interrupt_time}")
self.interrupt_time = None
if frame.frame_type == FrameType.START_STREAM:
self.is_interrupted.clear()
self.output_queue.task_done()
except Empty:
try:
if len(b):
self.mic.write_frames(bytes(b))
except Exception as e:
self.logger.error(f"Exception in frame_consumer: {e}, {len(b)}")
b = bytearray()

View File

@@ -0,0 +1,360 @@
import asyncio
import re
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.frames import (
EndFrame,
EndPipeFrame,
Frame,
ImageFrame,
LLMMessagesQueueFrame,
LLMResponseEndFrame,
LLMResponseStartFrame,
TextFrame,
TranscriptionQueueFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame
)
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.ai_services import AIService
from typing import AsyncGenerator, Coroutine, List
class ResponseAggregator(FrameProcessor):
def __init__(self, *, messages: list[dict], role: str, start_frame, end_frame, accumulator_frame, pass_through=True):
self.aggregation = ""
self.aggregating = False
self.messages = messages
self._role = role
self._start_frame = start_frame
self._end_frame = end_frame
self._accumulator_frame = accumulator_frame
self._pass_through = pass_through
async def process_frame(
self, frame: Frame
) -> AsyncGenerator[Frame, None]:
if isinstance(frame, self._start_frame):
self.aggregating = True
elif isinstance(frame, self._end_frame):
self.aggregating = False
self.messages.append({"role": self._role, "content": self.aggregation})
self.aggregation = ""
yield LLMMessagesQueueFrame(self.messages)
elif isinstance(frame, self._accumulator_frame) and self.aggregating:
self.aggregation += f" {frame.text}"
if self._pass_through:
yield frame
else:
yield frame
class LLMResponseAggregator(ResponseAggregator):
def __init__(self, messages: list[dict]):
super().__init__(
messages=messages,
role="assistant",
start_frame=LLMResponseStartFrame,
end_frame=LLMResponseEndFrame,
accumulator_frame=TextFrame
)
class UserResponseAggregator(ResponseAggregator):
def __init__(self, messages: list[dict]):
super().__init__(
messages=messages,
role="user",
start_frame=UserStartedSpeakingFrame,
end_frame=UserStoppedSpeakingFrame,
accumulator_frame=TranscriptionQueueFrame,
pass_through=False
)
class LLMContextAggregator(AIService):
def __init__(
self,
messages: list[dict],
role: str,
bot_participant_id=None,
complete_sentences=True,
pass_through=True,
):
super().__init__()
self.messages = messages
self.bot_participant_id = bot_participant_id
self.role = role
self.sentence = ""
self.complete_sentences = complete_sentences
self.pass_through = pass_through
async def process_frame(
self, frame: Frame
) -> AsyncGenerator[Frame, None]:
# We don't do anything with non-text frames, pass it along to next in the pipeline.
if not isinstance(frame, TextFrame):
yield frame
return
# Ignore transcription frames from the bot
if isinstance(frame, TranscriptionQueueFrame):
if frame.participantId == self.bot_participant_id:
return
# The common case for "pass through" is receiving frames from the LLM that we'll
# use to update the "assistant" LLM messages, but also passing the text frames
# along to a TTS service to be spoken to the user.
if self.pass_through:
yield frame
# TODO: split up transcription by participant
if self.complete_sentences:
# type: ignore -- the linter thinks this isn't a TextQueueFrame, even
# though we check it above
self.sentence += frame.text
if self.sentence.endswith((".", "?", "!")):
self.messages.append({"role": self.role, "content": self.sentence})
self.sentence = ""
yield LLMMessagesQueueFrame(self.messages)
else:
# type: ignore -- the linter thinks this isn't a TextQueueFrame, even
# though we check it above
self.messages.append({"role": self.role, "content": frame.text})
yield LLMMessagesQueueFrame(self.messages)
class LLMUserContextAggregator(LLMContextAggregator):
def __init__(
self, messages: list[dict], bot_participant_id=None, complete_sentences=True
):
super().__init__(
messages, "user", bot_participant_id, complete_sentences, pass_through=False
)
class LLMAssistantContextAggregator(LLMContextAggregator):
def __init__(
self, messages: list[dict], bot_participant_id=None, complete_sentences=True
):
super().__init__(
messages,
"assistant",
bot_participant_id,
complete_sentences,
pass_through=True,
)
class SentenceAggregator(FrameProcessor):
"""This frame processor aggregates text frames into complete sentences.
Frame input/output:
TextFrame("Hello,") -> None
TextFrame(" world.") -> TextFrame("Hello world.")
Doctest:
>>> async def print_frames(aggregator, frame):
... async for frame in aggregator.process_frame(frame):
... print(frame.text)
>>> aggregator = SentenceAggregator()
>>> asyncio.run(print_frames(aggregator, TextFrame("Hello,")))
>>> asyncio.run(print_frames(aggregator, TextFrame(" world.")))
Hello, world.
"""
def __init__(self):
self.aggregation = ""
async def process_frame(
self, frame: Frame
) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TextFrame):
m = re.search("(.*[?.!])(.*)", frame.text)
if m:
yield TextFrame(self.aggregation + m.group(1))
self.aggregation = m.group(2)
else:
self.aggregation += frame.text
elif isinstance(frame, EndFrame):
if self.aggregation:
yield TextFrame(self.aggregation)
yield frame
else:
yield frame
class LLMFullResponseAggregator(FrameProcessor):
"""This class aggregates Text frames until it receives a
LLMResponseEndFrame, then emits the concatenated text as
a single text frame.
given the following frames:
TextFrame("Hello,")
TextFrame(" world.")
TextFrame(" I am")
TextFrame(" an LLM.")
LLMResponseEndFrame()]
this processor will yield nothing for the first 4 frames, then
TextFrame("Hello, world. I am an LLM.")
LLMResponseEndFrame()
when passed the last frame.
>>> async def print_frames(aggregator, frame):
... async for frame in aggregator.process_frame(frame):
... if isinstance(frame, TextFrame):
... print(frame.text)
... else:
... print(frame.__class__.__name__)
>>> aggregator = LLMFullResponseAggregator()
>>> asyncio.run(print_frames(aggregator, TextFrame("Hello,")))
>>> asyncio.run(print_frames(aggregator, TextFrame(" world.")))
>>> asyncio.run(print_frames(aggregator, TextFrame(" I am")))
>>> asyncio.run(print_frames(aggregator, TextFrame(" an LLM.")))
>>> asyncio.run(print_frames(aggregator, LLMResponseEndFrame()))
Hello, world. I am an LLM.
LLMResponseEndFrame
"""
def __init__(self):
self.aggregation = ""
async def process_frame(
self, frame: Frame
) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TextFrame):
self.aggregation += frame.text
elif isinstance(frame, LLMResponseEndFrame):
yield TextFrame(self.aggregation)
yield frame
self.aggregation = ""
else:
yield frame
class StatelessTextTransformer(FrameProcessor):
"""This processor calls the given function on any text in a text frame.
>>> async def print_frames(aggregator, frame):
... async for frame in aggregator.process_frame(frame):
... print(frame.text)
>>> aggregator = StatelessTextTransformer(lambda x: x.upper())
>>> asyncio.run(print_frames(aggregator, TextFrame("Hello")))
HELLO
"""
def __init__(self, transform_fn):
self.transform_fn = transform_fn
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TextFrame):
result = self.transform_fn(frame.text)
if isinstance(result, Coroutine):
result = await result
yield TextFrame(result)
else:
yield frame
class ParallelPipeline(FrameProcessor):
""" Run multiple pipelines in parallel.
This class takes frames from its source queue and sends them to each
sub-pipeline. Each sub-pipeline emits its frames into this class's
sink queue. No guarantees are made about the ordering of frames in
the sink queue (that is, no sub-pipeline has higher priority than
any other, frames are put on the sink in the order they're emitted
by the sub-pipelines).
After each frame is taken from this class's source queue and placed
in each sub-pipeline's source queue, an EndPipeFrame is put on each
sub-pipeline's source queue. This indicates to the sub-pipe runner
that it should exit.
Since frame handlers pass through unhandled frames by convention, this
class de-dupes frames in its sink before yielding them.
"""
def __init__(self, pipeline_definitions: List[List[FrameProcessor]]):
self.sources = [asyncio.Queue() for _ in pipeline_definitions]
self.sink: asyncio.Queue[Frame] = asyncio.Queue()
self.pipelines: list[Pipeline] = [
Pipeline(
pipeline_definition,
source,
self.sink,
)
for source, pipeline_definition in zip(self.sources, pipeline_definitions)
]
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
for source in self.sources:
await source.put(frame)
await source.put(EndPipeFrame())
await asyncio.gather(*[pipeline.run_pipeline() for pipeline in self.pipelines])
seen_ids = set()
while not self.sink.empty():
frame = await self.sink.get()
# de-dup frames. Because the convention is to yield a frame that isn't processed,
# each pipeline will likely yield the same frame, so we will end up with _n_ copies
# of unprocessed frames where _n_ is the number of parallel pipes that don't
# process that frame.
if id(frame) in seen_ids:
continue
seen_ids.add(id(frame))
# Skip passing along EndParallelPipeQueueFrame, because we use them for our own flow control.
if not isinstance(frame, EndPipeFrame):
yield frame
class GatedAggregator(FrameProcessor):
"""Accumulate frames, with custom functions to start and stop accumulation.
Yields gate-opening frame before any accumulated frames, then ensuing frames
until and not including the gate-closed frame.
>>> async def print_frames(aggregator, frame):
... async for frame in aggregator.process_frame(frame):
... if isinstance(frame, TextFrame):
... print(frame.text)
... else:
... print(frame.__class__.__name__)
>>> aggregator = GatedAggregator(
... gate_close_fn=lambda x: isinstance(x, LLMResponseStartFrame),
... gate_open_fn=lambda x: isinstance(x, ImageFrame),
... start_open=False)
>>> asyncio.run(print_frames(aggregator, TextFrame("Hello")))
>>> asyncio.run(print_frames(aggregator, TextFrame("Hello again.")))
>>> asyncio.run(print_frames(aggregator, ImageFrame(url='', image=bytes([]))))
ImageFrame
Hello
Hello again.
>>> asyncio.run(print_frames(aggregator, TextFrame("Goodbye.")))
Goodbye.
"""
def __init__(self, gate_open_fn, gate_close_fn, start_open):
self.gate_open_fn = gate_open_fn
self.gate_close_fn = gate_close_fn
self.gate_open = start_open
self.accumulator: List[Frame] = []
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if self.gate_open:
if self.gate_close_fn(frame):
self.gate_open = False
else:
if self.gate_open_fn(frame):
self.gate_open = True
if self.gate_open:
yield frame
if self.accumulator:
for frame in self.accumulator:
yield frame
self.accumulator = []
else:
self.accumulator.append(frame)

View File

@@ -0,0 +1,33 @@
from abc import abstractmethod
from typing import AsyncGenerator
from dailyai.pipeline.frames import ControlFrame, Frame
class FrameProcessor:
"""This is the base class for all frame processors. Frame processors consume a frame
and yield 0 or more frames. Generally frame processors are used as part of a pipeline
where frames come from a source queue, are processed by a series of frame processors,
then placed on a sink queue.
By convention, FrameProcessors should immediately yield any frames they don't process.
Stateful FrameProcessors should watch for the EndStreamQueueFrame and finalize their
output, eg. yielding an unfinished sentence if they're aggregating LLM output to full
sentences. EndStreamQueueFrame is also a chance to clean up any services that need to
be closed, del'd, etc.
"""
@abstractmethod
async def process_frame(
self, frame: Frame
) -> AsyncGenerator[Frame, None]:
"""Process a single frame and yield 0 or more frames."""
if isinstance(frame, ControlFrame):
yield frame
yield frame
@abstractmethod
async def interrupted(self) -> None:
"""Handle any cleanup if the pipeline was interrupted."""
pass

View File

@@ -0,0 +1,79 @@
from dataclasses import dataclass
from typing import Any
class Frame:
pass
class ControlFrame(Frame):
# Control frames should contain no instance data, so
# equality is based solely on the class.
def __eq__(self, other):
return type(other) == self.__class__
class StartFrame(ControlFrame):
pass
class EndFrame(ControlFrame):
pass
class EndPipeFrame(ControlFrame):
pass
class LLMResponseStartFrame(ControlFrame):
pass
class LLMResponseEndFrame(ControlFrame):
pass
@dataclass()
class AudioFrame(Frame):
data: bytes
@dataclass()
class ImageFrame(Frame):
url: str | None
image: bytes
@dataclass()
class SpriteFrame(Frame):
images: list[bytes]
@dataclass()
class TextFrame(Frame):
text: str
@dataclass()
class TranscriptionQueueFrame(TextFrame):
participantId: str
timestamp: str
@dataclass()
class LLMMessagesQueueFrame(Frame):
messages: list[dict[str, str]] # TODO: define this more concretely!
class AppMessageQueueFrame(Frame):
message: Any
participantId: str
class UserStartedSpeakingFrame(Frame):
pass
class UserStoppedSpeakingFrame(Frame):
pass
@dataclass()
class LLMFunctionCallFrame(Frame):
function_name: str
arguments: str

View File

@@ -0,0 +1,89 @@
import asyncio
from typing import AsyncGenerator, List
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.frames import EndPipeFrame, EndFrame, Frame
class Pipeline:
"""
This class manages a pipe of FrameProcessors, and runs them in sequence. The "source"
and "sink" queues are managed by the caller. You can use this class stand-alone to
perform specialized processing, or you can use the Transport's run_pipeline method to
instantiate and run a pipeline with the Transport's sink and source queues.
"""
def __init__(
self,
processors: List[FrameProcessor],
source: asyncio.Queue | None = None,
sink: asyncio.Queue[Frame] | None = None,
):
""" Create a new pipeline. By default neither the source nor sink
queues are set, so you'll need to pass them to this constructor or
call set_source and set_sink before using the pipeline. Note that
the transport's run_*_pipeline methods will set the source and sink
queues on the pipeline for you.
"""
self.processors = processors
self.source: asyncio.Queue[Frame] | None = source
self.sink: asyncio.Queue[Frame] | None = sink
def set_source(self, source: asyncio.Queue[Frame]):
""" Set the source queue for this pipeline. Frames from this queue
will be processed by each frame_processor in the pipeline, or order
from first to last. """
self.source = source
def set_sink(self, sink: asyncio.Queue[Frame]):
""" Set the sink queue for this pipeline. After the last frame_processor
has processed a frame, its output will be placed on this queue."""
self.sink = sink
async def get_next_source_frame(self) -> AsyncGenerator[Frame, None]:
""" Convenience function to get the next frame from the source queue. This
lets us consistently have an AsyncGenerator yield frames, from either the
source queue or a frame_processor."""
if self.source is None:
raise ValueError("Source queue not set")
yield await self.source.get()
async def run_pipeline(self):
""" Run the pipeline. Take each frame from the source queue, pass it to
the first frame_processor, pass the output of that frame_processor to the
next in the list, etc. until the last frame_processor has processed the
resulting frames, then place those frames in the sink queue.
The source and sink queues must be set before calling this method.
This method will exit when an EndStreamQueueFrame is placed on the sink queue.
No more frames will be placed on the sink queue after an EndStreamQueueFrame, even
if it's not the last frame yielded by the last frame_processor in the pipeline.."""
if self.source is None or self.sink is None:
raise ValueError("Source or sink queue not set")
try:
while True:
frame_generators = [self.get_next_source_frame()]
for processor in self.processors:
next_frame_generators = []
for frame_generator in frame_generators:
async for frame in frame_generator:
next_frame_generators.append(processor.process_frame(frame))
frame_generators = next_frame_generators
for frame_generator in frame_generators:
async for frame in frame_generator:
await self.sink.put(frame)
if isinstance(
frame, EndFrame
) or isinstance(
frame, EndPipeFrame
):
return
except asyncio.CancelledError:
# this means there's been an interruption, do any cleanup necessary here.
for processor in self.processors:
await processor.interrupted()
pass

View File

@@ -1,19 +0,0 @@
from enum import Enum
from dataclasses import dataclass
class FrameType(Enum):
START_STREAM = 0
END_STREAM = 1
AUDIO = 2
IMAGE = 3
SENTENCE = 4
TEXT_CHUNK = 5
LLM_MESSAGE = 6
APP_MESSAGE = 7
IMAGE_DESCRIPTION = 8
TRANSCRIPTION = 9
@dataclass(frozen=True)
class QueueFrame:
frame_type: FrameType
frame_data: str | dict | bytes | list | None

View File

@@ -1,2 +1,3 @@
Pillow==10.1.0
typing_extensions==4.9.0
typing_extensions==4.9.0
faster-whisper==0.10.0

View File

@@ -1,19 +1,35 @@
import asyncio
import io
import logging
import re
import time
import wave
from dailyai.pipeline.frame_processor import FrameProcessor
from httpx import request
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.pipeline.frames import (
AudioFrame,
EndFrame,
ImageFrame,
LLMMessagesQueueFrame,
LLMResponseEndFrame,
LLMResponseStartFrame,
LLMFunctionCallFrame,
Frame,
TextFrame,
TranscriptionQueueFrame,
)
from abc import abstractmethod
from typing import AsyncGenerator, Iterable
from dataclasses import dataclass
from typing import AsyncGenerator
from typing import AsyncGenerator, AsyncIterable, BinaryIO, Iterable, List
from collections.abc import Iterable, AsyncIterable
class AIService(FrameProcessor):
""" This is the base class for various AI services (LLM, TTS and Image)
class AIService:
This class adds some convenienence functions to run, effectively, a one-stage
pipeline where the incoming frames can come from an iterable or queue
and the processed frames go to a queue. Child classes extend those convenience
functions, eg. TTS's `say` method runs the TTS and emits the AudioFrames to a
queue.
"""
def __init__(self):
self.logger = logging.getLogger("dailyai")
@@ -21,149 +37,143 @@ class AIService:
def stop(self):
pass
def allowed_input_frame_types(self) -> set[FrameType]:
return set()
def possible_output_frame_types(self) -> set[FrameType]:
return set()
async def run_to_queue(self, queue: asyncio.Queue, frames, add_end_of_stream=False) -> None:
async def run_to_queue(
self,
queue: asyncio.Queue,
frames: Iterable[Frame] | AsyncIterable[Frame] | asyncio.Queue[Frame]
) -> None:
""" Process the given frames (from an iterable or queue) and send them to
the given queue.
"""
async for frame in self.run(frames):
await queue.put(frame)
if add_end_of_stream:
await queue.put(QueueFrame(FrameType.END_STREAM, None))
async def run(
self,
frames: Iterable[QueueFrame]
| AsyncIterable[QueueFrame]
| asyncio.Queue[QueueFrame],
requested_frame_types: set[FrameType] | None=None,
) -> AsyncGenerator[QueueFrame, None]:
if requested_frame_types and self.possible_output_frame_types().intersection(requested_frame_types) == set():
raise Exception(f"Requested frame types {requested_frame_types} are not supported by this service.")
frames: Iterable[Frame]
| AsyncIterable[Frame]
| asyncio.Queue[Frame],
) -> AsyncGenerator[Frame, None]:
""" Generates 0 or more frames from the given iterable or queue.
if not requested_frame_types:
requested_frame_types = self.possible_output_frame_types()
This is a convenience function to take a collection of frames, process
them, and yield processed frames.
if isinstance(frames, AsyncIterable):
async for frame in frames:
async for output_frame in self.process_frame(requested_frame_types, frame):
yield output_frame
elif isinstance(frames, Iterable):
for frame in frames:
async for output_frame in self.process_frame(requested_frame_types, frame):
yield output_frame
elif isinstance(frames, asyncio.Queue):
while True:
frame = await frames.get()
async for output_frame in self.process_frame(requested_frame_types, frame):
yield output_frame
if frame.frame_type == FrameType.END_STREAM:
break
else:
raise Exception("Frames must be an iterable or async iterable")
@abstractmethod
async def process_frame(self, requested_frame_types:set[FrameType], frame:QueueFrame) -> AsyncGenerator[QueueFrame, None]:
# Yield something so the linter can deduce what should happen here.
yield QueueFrame(FrameType.END_STREAM, None)
class SentenceAggregator(AIService):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.current_sentence = ""
def allowed_input_frame_types(self) -> set[FrameType]:
return set([FrameType.TEXT_CHUNK, FrameType.SENTENCE])
def possible_output_frame_types(self) -> set[FrameType]:
return set([FrameType.SENTENCE])
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
if not FrameType.SENTENCE in requested_frame_types:
return
if frame.frame_type == FrameType.TEXT_CHUNK:
if type(frame.frame_data) != str:
raise Exception(
"Sentence aggregator requires a string for the data field"
)
self.current_sentence += frame.frame_data
if self.current_sentence.endswith((".", "?", "!")):
sentence = self.current_sentence
self.current_sentence = ""
yield QueueFrame(FrameType.SENTENCE, sentence)
elif frame.frame_type == FrameType.END_STREAM:
if self.current_sentence:
yield QueueFrame(FrameType.SENTENCE, self.current_sentence)
elif frame.frame_type == FrameType.SENTENCE:
yield frame
The preferred way to use FrameProcessors is with a pipeline, but if you
have a very simple case (eg. a list of static text blocks you want to speak,
or a list of static image description you want to render) this function
will be helpful.
"""
try:
if isinstance(frames, AsyncIterable):
async for frame in frames:
async for output_frame in self.process_frame(frame):
yield output_frame
elif isinstance(frames, Iterable):
for frame in frames:
async for output_frame in self.process_frame(frame):
yield output_frame
elif isinstance(frames, asyncio.Queue):
while True:
frame = await frames.get()
async for output_frame in self.process_frame(frame):
yield output_frame
if isinstance(frame, EndFrame):
break
else:
raise Exception("Frames must be an iterable or async iterable")
except Exception as e:
self.logger.error("Exception occurred while running AI service", e)
raise e
class LLMService(AIService):
def allowed_input_frame_types(self) -> set[FrameType]:
return set([FrameType.LLM_MESSAGE, FrameType.SENTENCE, FrameType.TRANSCRIPTION])
def allowed_output_frame_types(self) -> set[FrameType]:
return set([FrameType.SENTENCE, FrameType.TEXT_CHUNK])
def __init__(self, messages=None, tools=None):
super().__init__()
self._tools = tools
self._messages = messages
@abstractmethod
async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
async def run_llm_async(self, messages, tool_choice=None) -> AsyncGenerator[str, None]:
yield ""
@abstractmethod
async def run_llm(self, messages) -> str:
pass
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
if frame.frame_type == FrameType.LLM_MESSAGE:
if type(frame.frame_data) != list:
raise Exception("LLM service requires a dict for the data field")
async def process_frame(self, frame: Frame, tool_choice: str | None = None) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesQueueFrame):
function_name = ""
arguments = ""
if isinstance(frame, LLMMessagesQueueFrame):
yield LLMResponseStartFrame()
async for text_chunk in self.run_llm_async(frame.messages, tool_choice):
if isinstance(text_chunk, str):
yield TextFrame(text_chunk)
elif text_chunk.function:
if text_chunk.function.name:
# function_name += text_chunk.function.name
yield LLMFunctionCallFrame(function_name=text_chunk.function.name, arguments=None)
if text_chunk.function.arguments:
# arguments += text_chunk.function.arguments
yield LLMFunctionCallFrame(function_name=None, arguments=text_chunk.function.arguments)
messages: list[dict[str, str]] = frame.frame_data
if FrameType.SENTENCE in requested_frame_types:
yield QueueFrame(FrameType.SENTENCE, await self.run_llm(messages))
else:
async for text_chunk in self.run_llm_async(messages):
yield QueueFrame(FrameType.TEXT_CHUNK, text_chunk)
# TODO: handle other frame types! Need to aggregate into messages
if (function_name and arguments):
function_name = ""
arguments = ""
yield LLMResponseEndFrame()
else:
yield frame
class TTSService(AIService):
def __init__(self, aggregate_sentences=True):
super().__init__()
self.aggregate_sentences: bool = aggregate_sentences
self.current_sentence: str = ""
# Some TTS services require a specific sample rate. We default to 16k
def get_mic_sample_rate(self):
return 16000
def allowed_input_frame_types(self) -> set[FrameType]:
return set([FrameType.SENTENCE, FrameType.TRANSCRIPTION, FrameType.TEXT_CHUNK])
def possible_output_frame_types(self) -> set[FrameType]:
return set([FrameType.AUDIO])
# Converts the sentence to audio. Yields a list of audio frames that can
# Converts the text to audio. Yields a list of audio frames that can
# be sent to the microphone device
@abstractmethod
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
async def run_tts(self, text) -> AsyncGenerator[bytes, None]:
# yield empty bytes here, so linting can infer what this method does
yield bytes()
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
if not FrameType.AUDIO in requested_frame_types:
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, EndFrame):
if self.current_sentence:
async for audio_chunk in self.run_tts(self.current_sentence):
yield AudioFrame(audio_chunk)
yield TextFrame(self.current_sentence)
if not isinstance(frame, TextFrame):
yield frame
return
if type(frame.frame_data) != str:
raise Exception("TTS service requires a string for the data field")
text: str | None = None
if not self.aggregate_sentences:
text = frame.text
else:
self.current_sentence += frame.text
if self.current_sentence.endswith((".", "?", "!")):
text = self.current_sentence
self.current_sentence = ""
async for audio_chunk in self.run_tts(frame.frame_data):
yield QueueFrame(FrameType.AUDIO, audio_chunk)
if text:
async for audio_chunk in self.run_tts(text):
yield AudioFrame(audio_chunk)
# note we pass along the text frame *after* the audio, so the text frame is completed after the audio is processed.
yield TextFrame(text)
# Convenience function to send the audio for a sentence to the given queue
async def say(self, sentence, queue: asyncio.Queue):
await self.run_to_queue(queue, [QueueFrame(FrameType.SENTENCE, sentence)])
await self.run_to_queue(queue, [LLMResponseStartFrame(), TextFrame(sentence), LLMResponseEndFrame()])
class ImageGenService(AIService):
@@ -171,30 +181,61 @@ class ImageGenService(AIService):
super().__init__(**kwargs)
self.image_size = image_size
def allowed_input_frame_types(self) -> set[FrameType]:
return set([FrameType.SENTENCE, FrameType.TRANSCRIPTION, FrameType.TEXT_CHUNK, FrameType.IMAGE_DESCRIPTION])
def possible_output_frame_types(self) -> set[FrameType]:
return set([FrameType.IMAGE])
# Renders the image. Returns an Image object.
@abstractmethod
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
async def run_image_gen(self, sentence: str) -> tuple[str, bytes]:
pass
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
if not FrameType.IMAGE in requested_frame_types:
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if not isinstance(frame, TextFrame):
yield frame
return
if type(frame.frame_data) != str:
raise Exception("Image service requires a string for the data field")
(_, image_data) = await self.run_image_gen(frame.frame_data)
yield QueueFrame(FrameType.IMAGE, image_data)
(url, image_data) = await self.run_image_gen(frame.text)
yield ImageFrame(url, image_data)
@dataclass
class AIServiceConfig:
tts: TTSService
image: ImageGenService
llm: LLMService
class STTService(AIService):
"""STTService is a base class for speech-to-text services."""
_frame_rate: int
def __init__(self, frame_rate: int = 16000, **kwargs):
super().__init__(**kwargs)
self._frame_rate = frame_rate
@abstractmethod
async def run_stt(self, audio: BinaryIO) -> str:
"""Returns transcript as a string"""
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
"""Processes a frame of audio data, either buffering or transcribing it."""
if not isinstance(frame, AudioFrame):
return
data = frame.data
content = io.BufferedRandom(io.BytesIO())
ww = wave.open(self._content, "wb")
ww.setnchannels(1)
ww.setsampwidth(2)
ww.setframerate(self._frame_rate)
ww.writeframesraw(data)
ww.close()
content.seek(0)
text = await self.run_stt(content)
yield TranscriptionQueueFrame(text, '', str(time.time()))
class FrameLogger(AIService):
def __init__(self, prefix="Frame", **kwargs):
super().__init__(**kwargs)
self.prefix = prefix
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, (AudioFrame, ImageFrame)):
self.logger.info(f"{self.prefix}: {type(frame)}")
else:
print(f"{self.prefix}: {frame}")
yield frame

View File

@@ -2,6 +2,7 @@ import aiohttp
import asyncio
import io
import json
import time
from openai import AsyncAzureOpenAI
import os
@@ -15,30 +16,26 @@ from PIL import Image
# See .env.example for Azure configuration needed
from azure.cognitiveservices.speech import SpeechSynthesizer, SpeechConfig, ResultReason, CancellationReason
class AzureTTSService(TTSService):
def __init__(self, speech_key=None, speech_region=None):
def __init__(self, *, api_key, region):
super().__init__()
speech_key = speech_key or os.getenv("AZURE_SPEECH_SERVICE_KEY")
speech_region = speech_region or os.getenv("AZURE_SPEECH_SERVICE_REGION")
self.speech_config = SpeechConfig(subscription=speech_key, region=speech_region)
self.speech_synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=None)
self.speech_config = SpeechConfig(subscription=api_key, region=region)
self.speech_synthesizer = SpeechSynthesizer(
speech_config=self.speech_config, audio_config=None)
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
self.logger.info("Running azure tts")
ssml = "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' " \
"xmlns:mstts='http://www.w3.org/2001/mstts'>" \
"<voice name='en-US-SaraNeural'>" \
"<mstts:silence type='Sentenceboundary' value='20ms' />" \
"<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>" \
"<prosody rate='1.05'>" \
f"{sentence}" \
"</prosody></mstts:express-as></voice></speak> "
try:
result = await asyncio.to_thread(self.speech_synthesizer.speak_ssml, (ssml))
except Exception as e:
self.logger.error("Error in azure tts", e)
"xmlns:mstts='http://www.w3.org/2001/mstts'>" \
"<voice name='en-US-SaraNeural'>" \
"<mstts:silence type='Sentenceboundary' value='20ms' />" \
"<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>" \
"<prosody rate='1.05'>" \
f"{sentence}" \
"</prosody></mstts:express-as></voice></speak> "
result = await asyncio.to_thread(self.speech_synthesizer.speak_ssml, (ssml))
self.logger.info("Got azure tts result")
if result.reason == ResultReason.SynthesizingAudioCompleted:
self.logger.info("Returning result")
@@ -50,126 +47,101 @@ class AzureTTSService(TTSService):
if cancellation_details.reason == CancellationReason.Error:
self.logger.info("Error details: {}".format(cancellation_details.error_details))
class AzureLLMService(LLMService):
def __init__(self, api_key=None, azure_endpoint=None, api_version=None, model=None):
super().__init__()
api_key = api_key or os.getenv("AZURE_CHATGPT_KEY")
def __init__(self, *, api_key, endpoint, api_version="2023-12-01-preview", model, tools=None, messages=None):
super().__init__(tools=tools, messages=messages)
self._model: str = model
azure_endpoint = azure_endpoint or os.getenv("AZURE_CHATGPT_ENDPOINT")
if not azure_endpoint:
raise Exception("No azure endpoint specified for Azure LLM, please set AZURE_CHATGPT_ENDPOINT in the environment or pass it to the AzureLLMService constructor")
model: str | None = model or os.getenv("AZURE_CHATGPT_DEPLOYMENT_ID")
if not model:
raise Exception("No model specified for Azure LLM, please set AZURE_CHATGPT_DEPLOYMENT_ID in the environment or pass it to the AzureLLMService constructor")
self.model: str = model
api_version = api_version or "2023-12-01-preview"
self.client = AsyncAzureOpenAI(
self._client = AsyncAzureOpenAI(
api_key=api_key,
azure_endpoint=azure_endpoint,
azure_endpoint=endpoint,
api_version=api_version,
)
async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
async def run_llm_async(self, messages, tool_choice=None) -> AsyncGenerator[str, None]:
messages_for_log = json.dumps(messages)
self.logger.debug(f"Generating chat via azure: {messages_for_log}")
chunks = await self.client.chat.completions.create(model=self.model, stream=True, messages=messages)
if self._tools:
tools = self._tools
else:
tools = None
start_time = time.time()
chunks = await self._client.chat.completions.create(model=self._model, stream=True, messages=messages, tools=tools, tool_choice=tool_choice)
self.logger.info(f"=== Azure OpenAI LLM TTFB: {time.time() - start_time}")
async for chunk in chunks:
if len(chunk.choices) == 0:
continue
if chunk.choices[0].delta.content:
if chunk.choices[0].delta.tool_calls:
yield chunk.choices[0].delta.tool_calls[0]
elif chunk.choices[0].delta.content:
yield chunk.choices[0].delta.content
async def run_llm(self, messages) -> str | None:
messages_for_log = json.dumps(messages)
self.logger.debug(f"Generating chat via azure: {messages_for_log}")
response = await self.client.chat.completions.create(model=self.model, stream=False, messages=messages)
response = await self._client.chat.completions.create(model=self._model, stream=False, messages=messages)
if response and len(response.choices) > 0:
return response.choices[0].message.content
else:
return None
class AzureImageGenServiceREST(ImageGenService):
def __init__(self, image_size:str, api_key=None, azure_endpoint=None, api_version=None, model=None):
def __init__(
self,
*,
api_version="2023-06-01-preview",
image_size: str,
aiohttp_session: aiohttp.ClientSession,
api_key,
endpoint,
model):
super().__init__(image_size=image_size)
self.api_key = api_key or os.getenv("AZURE_DALLE_KEY")
self.azure_endpoint = azure_endpoint or os.getenv("AZURE_DALLE_ENDPOINT")
self.api_version = api_version or "2023-06-01-preview"
self.model = model or os.getenv("AZURE_DALLE_DEPLOYMENT_ID")
self._api_key = api_key
self._azure_endpoint = endpoint
self._api_version = api_version
self._model = model
self._aiohttp_session = aiohttp_session
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
# TODO hoist the session to app-level
async with aiohttp.ClientSession() as session:
url = f"{self.azure_endpoint}openai/images/generations:submit?api-version={self.api_version}"
headers= { "api-key": self.api_key, "Content-Type": "application/json" }
body = {
# Enter your prompt text here
"prompt": sentence,
"size": self.image_size,
"n": 1,
}
async with session.post(url, headers=headers, json=body) as submission:
operation_location = submission.headers['operation-location']
url = f"{self._azure_endpoint}openai/images/generations:submit?api-version={self._api_version}"
headers = {"api-key": self._api_key, "Content-Type": "application/json"}
body = {
# Enter your prompt text here
"prompt": sentence,
"size": self.image_size,
"n": 1,
}
async with self._aiohttp_session.post(
url, headers=headers, json=body
) as submission:
# We never get past this line, because this header isn't
# defined on a 429 response, but something is eating our exceptions!
operation_location = submission.headers['operation-location']
status = ""
attempts_left = 120
json_response = None
while status != "succeeded":
attempts_left -= 1
if attempts_left == 0:
raise Exception("Image generation timed out")
status = ""
attempts_left = 120
json_response = None
while status != "succeeded":
attempts_left -= 1
if attempts_left == 0:
raise Exception("Image generation timed out")
await asyncio.sleep(1)
response = await self._aiohttp_session.get(
operation_location, headers=headers
)
json_response = await response.json()
status = json_response["status"]
await asyncio.sleep(1)
response = await session.get(operation_location, headers=headers)
json_response = await response.json()
status = json_response["status"]
image_url = json_response["result"]["data"][0]["url"] if json_response else None
if not image_url:
raise Exception("Image generation failed")
# Load the image from the url
async with session.get(image_url) as response:
image_stream = io.BytesIO(await response.content.read())
image = Image.open(image_stream)
return (image_url, image.tobytes())
class AzureImageGenService(ImageGenService):
def __init__(self, api_key=None, azure_endpoint=None, api_version=None, model=None):
super().__init__()
api_key = api_key or os.getenv("AZURE_DALLE_KEY")
azure_endpoint = azure_endpoint or os.getenv("AZURE_DALLE_ENDPOINT")
api_version = api_version or "2023-06-01-preview"
self.model = model or os.getenv("AZURE_DALLE_DEPLOYMENT_ID")
self.client = AzureOpenAI(
api_key=api_key,
azure_endpoint=azure_endpoint,
api_version=api_version,
)
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
self.logger.info("Generating azure image", sentence)
image = self.client.images.generate(
model=self.model,
prompt=sentence,
n=1,
size=self.image_size,
)
url = image["data"][0]["url"]
response = requests.get(url)
dalle_stream = io.BytesIO(response.content)
dalle_im = Image.open(dalle_stream.tobytes())
return (url, dalle_im)
image_url = json_response["result"]["data"][0]["url"] if json_response else None
if not image_url:
raise Exception("Image generation failed")
# Load the image from the url
async with self._aiohttp_session.get(image_url) as response:
image_stream = io.BytesIO(await response.content.read())
image = Image.open(image_stream)
return (image_url, image.tobytes())

View File

@@ -0,0 +1,460 @@
from abc import abstractmethod
import asyncio
import itertools
import logging
import numpy as np
import pyaudio
import torch
import queue
import threading
import time
from typing import AsyncGenerator
from enum import Enum
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.frames import (
AudioFrame,
EndFrame,
ImageFrame,
Frame,
SpriteFrame,
StartFrame,
TranscriptionQueueFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame
)
from dailyai.pipeline.pipeline import Pipeline
torch.set_num_threads(1)
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
model='silero_vad',
force_reload=False)
(get_speech_timestamps,
save_audio,
read_audio,
VADIterator,
collect_chunks) = utils
# Taken from utils_vad.py
def validate(model,
inputs: torch.Tensor):
with torch.no_grad():
outs = model(inputs)
return outs
# Provided by Alexander Veysov
def int2float(sound):
abs_max = np.abs(sound).max()
sound = sound.astype('float32')
if abs_max > 0:
sound *= 1/32768
sound = sound.squeeze() # depends on the use case
return sound
FORMAT = pyaudio.paInt16
CHANNELS = 1
SAMPLE_RATE = 16000
CHUNK = int(SAMPLE_RATE / 10)
audio = pyaudio.PyAudio()
class VADState(Enum):
QUIET = 1
STARTING = 2
SPEAKING = 3
STOPPING = 4
class BaseTransportService():
def __init__(
self,
**kwargs,
) -> None:
self._mic_enabled = kwargs.get("mic_enabled") or False
self._mic_sample_rate = kwargs.get("mic_sample_rate") or 16000
self._camera_enabled = kwargs.get("camera_enabled") or False
self._camera_width = kwargs.get("camera_width") or 1024
self._camera_height = kwargs.get("camera_height") or 768
self._speaker_enabled = kwargs.get("speaker_enabled") or False
self._speaker_sample_rate = kwargs.get("speaker_sample_rate") or 16000
self._fps = kwargs.get("fps") or 8
self._vad_start_s = kwargs.get("vad_start_s") or 0.2
self._vad_stop_s = kwargs.get("vad_stop_s") or 0.8
self._context = kwargs.get("context") or []
self._vad_enabled = kwargs.get("vad_enabled") or False
if self._vad_enabled and self._speaker_enabled:
raise Exception(
"Sorry, you can't use speaker_enabled and vad_enabled at the same time. Please set one to False.")
self._vad_samples = 1536
vad_frame_s = self._vad_samples / SAMPLE_RATE
self._vad_start_frames = round(self._vad_start_s / vad_frame_s)
self._vad_stop_frames = round(self._vad_stop_s / vad_frame_s)
self._vad_starting_count = 0
self._vad_stopping_count = 0
self._vad_state = VADState.QUIET
self._user_is_speaking = False
duration_minutes = kwargs.get("duration_minutes") or 10
self._expiration = time.time() + duration_minutes * 60
self.send_queue = asyncio.Queue()
self.receive_queue = asyncio.Queue()
self.completed_queue = asyncio.Queue()
self._threadsafe_send_queue = queue.Queue()
self._images = None
try:
self._loop: asyncio.AbstractEventLoop | None = asyncio.get_running_loop()
except RuntimeError:
self._loop = None
self._stop_threads = threading.Event()
self._is_interrupted = threading.Event()
self._logger: logging.Logger = logging.getLogger()
async def run(self):
self._prerun()
async_output_queue_marshal_task = asyncio.create_task(
self._marshal_frames())
self._camera_thread = threading.Thread(
target=self._run_camera, daemon=True)
self._camera_thread.start()
self._frame_consumer_thread = threading.Thread(
target=self._frame_consumer, daemon=True)
self._frame_consumer_thread.start()
if self._speaker_enabled:
self._receive_audio_thread = threading.Thread(
target=self._receive_audio, daemon=True)
self._receive_audio_thread.start()
if self._vad_enabled:
self._vad_thread = threading.Thread(target=self._vad, daemon=True)
self._vad_thread.start()
try:
while (
time.time() < self._expiration
and not self._stop_threads.is_set()
):
await asyncio.sleep(1)
except Exception as e:
self._logger.error(f"Exception {e}")
raise e
finally:
# Do anything that must be done to clean up
self._post_run()
self._stop_threads.set()
await self.send_queue.put(EndFrame())
await async_output_queue_marshal_task
await self.send_queue.join()
self._frame_consumer_thread.join()
if self._speaker_enabled:
self._receive_audio_thread.join()
if self._vad_enabled:
self._vad_thread.join()
async def run_uninterruptible_pipeline(self, pipeline: Pipeline):
pipeline.set_sink(self.send_queue)
pipeline.set_source(self.receive_queue)
await pipeline.run_pipeline()
async def run_interruptible_pipeline(
self,
pipeline: Pipeline,
allow_interruptions=True,
pre_processor=None,
post_processor: FrameProcessor | None = None,
):
pipeline.set_sink(self.send_queue)
source_queue = asyncio.Queue()
pipeline.set_source(source_queue)
pipeline.set_sink(self.send_queue)
pipeline_task = asyncio.create_task(pipeline.run_pipeline())
async def yield_frame(frame: Frame) -> AsyncGenerator[Frame, None]:
yield frame
async def post_process(post_processor: FrameProcessor):
while True:
frame = await self.completed_queue.get()
# We ignore the output of the post_processor's process frame;
# this is called to update the post-processor's state.
async for frame in post_processor.process_frame(frame):
pass
if isinstance(frame, EndFrame):
break
if post_processor:
post_process_task = asyncio.create_task(post_process(post_processor))
started = False
async for frame in self.get_receive_frames():
if isinstance(frame, UserStartedSpeakingFrame):
pipeline_task.cancel()
self.interrupt()
pipeline_task = asyncio.create_task(pipeline.run_pipeline())
started = False
if not started:
await self.send_queue.put(StartFrame())
if pre_processor:
frame_generator = pre_processor.process_frame(frame)
else:
frame_generator = yield_frame(frame)
async for frame in frame_generator:
await source_queue.put(frame)
if isinstance(frame, EndFrame):
break
await asyncio.gather(pipeline_task, post_process_task)
def _post_run(self):
# Note that this function must be idempotent! It can be called multiple times
# if, for example, a keyboard interrupt occurs.
pass
def stop(self):
self._stop_threads.set()
async def stop_when_done(self):
await self._wait_for_send_queue_to_empty()
self.stop()
async def _wait_for_send_queue_to_empty(self):
await self.send_queue.join()
self._threadsafe_send_queue.join()
@abstractmethod
def write_frame_to_camera(self, frame: bytes):
pass
@abstractmethod
def write_frame_to_mic(self, frame: bytes):
pass
@abstractmethod
def read_audio_frames(self, desired_frame_count):
return bytes()
@abstractmethod
def _prerun(self):
pass
def _vad(self):
# CB: Starting silero VAD stuff
# TODO-CB: Probably need to force virtual speaker creation if we're
# going to build this in?
# TODO-CB: pyaudio installation
while not self._stop_threads.is_set():
audio_chunk = self.read_audio_frames(self._vad_samples)
audio_int16 = np.frombuffer(audio_chunk, np.int16)
audio_float32 = int2float(audio_int16)
new_confidence = model(
torch.from_numpy(audio_float32), 16000).item()
speaking = new_confidence > 0.5
if speaking:
match self._vad_state:
case VADState.QUIET:
self._vad_state = VADState.STARTING
self._vad_starting_count = 1
case VADState.STARTING:
self._vad_starting_count += 1
case VADState.STOPPING:
self._vad_state = VADState.SPEAKING
self._vad_stopping_count = 0
else:
match self._vad_state:
case VADState.STARTING:
self._vad_state = VADState.QUIET
self._vad_starting_count = 0
case VADState.SPEAKING:
self._vad_state = VADState.STOPPING
self._vad_stopping_count = 1
case VADState.STOPPING:
self._vad_stopping_count += 1
if self._vad_state == VADState.STARTING and self._vad_starting_count >= self._vad_start_frames:
asyncio.run_coroutine_threadsafe(
self.receive_queue.put(
UserStartedSpeakingFrame()), self._loop
)
# self.interrupt()
self._vad_state = VADState.SPEAKING
self._vad_starting_count = 0
if self._vad_state == VADState.STOPPING and self._vad_stopping_count >= self._vad_stop_frames:
asyncio.run_coroutine_threadsafe(
self.receive_queue.put(
UserStoppedSpeakingFrame()), self._loop
)
self._vad_state = VADState.QUIET
self._vad_stopping_count = 0
async def _marshal_frames(self):
while True:
frame: Frame | list = await self.send_queue.get()
self._threadsafe_send_queue.put(frame)
self.send_queue.task_done()
if isinstance(frame, EndFrame):
break
def interrupt(self):
self._logger.debug("!!! Interrupting")
self._is_interrupted.set()
async def get_receive_frames(self) -> AsyncGenerator[Frame, None]:
while True:
frame = await self.receive_queue.get()
yield frame
if isinstance(frame, EndFrame):
break
def _receive_audio(self):
if not self._loop:
self._logger.error("No loop available for audio thread")
return
seconds = 1
desired_frame_count = self._speaker_sample_rate * seconds
while not self._stop_threads.is_set():
buffer = self.read_audio_frames(desired_frame_count)
if len(buffer) > 0:
frame = AudioFrame(buffer)
asyncio.run_coroutine_threadsafe(
self.receive_queue.put(frame), self._loop
)
asyncio.run_coroutine_threadsafe(
self.receive_queue.put(EndFrame()), self._loop
)
def _set_image(self, image: bytes):
self._images = itertools.cycle([image])
def _set_images(self, images: list[bytes], start_frame=0):
self._images = itertools.cycle(images)
def _run_camera(self):
try:
while not self._stop_threads.is_set():
if self._images:
this_frame = next(self._images)
self.write_frame_to_camera(this_frame)
time.sleep(1.0 / self._fps)
except Exception as e:
self._logger.error(f"Exception {e} in camera thread.")
raise e
def _frame_consumer(self):
self._logger.info("🎬 Starting frame consumer thread")
b = bytearray()
smallest_write_size = 3200
largest_write_size = 8000
all_audio_frames = bytearray()
while True:
try:
frames_or_frame: Frame | list[Frame] = (
self._threadsafe_send_queue.get()
)
if isinstance(frames_or_frame, AudioFrame) and len(frames_or_frame.data) > largest_write_size:
# subdivide large audio frames to enable interruption
frames = []
for i in range(0, len(frames_or_frame.data), largest_write_size):
frames.append(AudioFrame(
frames_or_frame.data[i: i+largest_write_size]))
elif isinstance(frames_or_frame, Frame):
frames: list[Frame] = [frames_or_frame]
elif isinstance(frames_or_frame, list):
frames: list[Frame] = frames_or_frame
else:
raise Exception("Unknown type in output queue")
for frame in frames:
if isinstance(frame, EndFrame):
self._logger.info("Stopping frame consumer thread")
self._threadsafe_send_queue.task_done()
if self._loop:
asyncio.run_coroutine_threadsafe(
self.completed_queue.put(frame), self._loop
)
return
# if interrupted, we just pull frames off the queue and discard them
if not self._is_interrupted.is_set():
if frame:
if isinstance(frame, AudioFrame):
chunk = frame.data
all_audio_frames.extend(chunk)
b.extend(chunk)
truncated_length: int = len(b) - (
len(b) % smallest_write_size
)
if truncated_length:
self.write_frame_to_mic(
bytes(b[:truncated_length]))
b = b[truncated_length:]
elif isinstance(frame, ImageFrame):
self._set_image(frame.image)
elif isinstance(frame, SpriteFrame):
self._set_images(frame.images)
elif len(b):
self.write_frame_to_mic(bytes(b))
b = bytearray()
else:
# if there are leftover audio bytes, write them now; failing to do so
# can cause static in the audio stream.
if len(b):
truncated_length = len(b) - (len(b) % 160)
self.write_frame_to_mic(
bytes(b[:truncated_length]))
b = bytearray()
if isinstance(frame, StartFrame):
self._is_interrupted.clear()
if self._loop:
asyncio.run_coroutine_threadsafe(
self.completed_queue.put(frame), self._loop
)
self._threadsafe_send_queue.task_done()
except queue.Empty:
if len(b):
self.write_frame_to_mic(bytes(b))
b = bytearray()
except Exception as e:
self._logger.error(
f"Exception in frame_consumer: {e}, {len(b)}")
raise e

View File

@@ -1,15 +1,17 @@
import asyncio
import inspect
import logging
import time
import signal
import threading
import types
from functools import partial
from queue import Queue, Empty
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.pipeline.frames import (
TranscriptionQueueFrame,
)
from threading import Thread, Event, Timer
from threading import Event
from daily import (
EventHandler,
@@ -20,43 +22,43 @@ from daily import (
VirtualSpeakerDevice,
)
class DailyTransportService(EventHandler):
from dailyai.services.base_transport_service import BaseTransportService
class DailyTransportService(BaseTransportService, EventHandler):
_daily_initialized = False
_lock = threading.Lock()
_speaker_enabled: bool
_speaker_sample_rate: int
_vad_enabled: bool
# This is necessary to override EventHandler's __new__ method.
def __new__(cls, *args, **kwargs):
return super().__new__(cls)
def __init__(
self,
room_url: str,
token: str | None,
bot_name: str,
duration: float = 10,
min_others_count: int = 1,
start_transcription: bool = False,
**kwargs,
):
super().__init__()
self.bot_name: str = bot_name
self.room_url: str = room_url
self.token: str | None = token
self.duration: float = duration
self.expiration = time.time() + duration * 60
super().__init__(**kwargs) # This will call BaseTransportService.__init__ method, not EventHandler
# This queue is used to marshal frames from the async send queue to the thread that emits audio & video.
# We need this to maintain the asynchronous behavior of asyncio queues -- to give async functions
# a chance to run while waiting for queue items -- but also to maintain thread safety and have a threaded
# handler to send frames, to ensure that sending isn't subject to pauses in the async thread.
self.threadsafe_send_queue = Queue()
self._room_url: str = room_url
self._bot_name: str = bot_name
self._token: str | None = token
self._min_others_count = min_others_count
self._start_transcription = start_transcription
self.is_interrupted = Event()
self.stop_threads = Event()
self.story_started = False
self.mic_enabled = False
self.mic_sample_rate = 16000
self.camera_width = 1024
self.camera_height = 768
self.camera_enabled = False
self._is_interrupted = Event()
self._stop_threads = Event()
self.send_queue = asyncio.Queue()
self.receive_queue = asyncio.Queue()
self.other_participant_has_joined = False
self.camera_thread = None
self.frame_consumer_thread = None
self._other_participant_has_joined = False
self._my_participant_id = None
self.transcription_settings = {
"language": "en",
@@ -70,27 +72,24 @@ class DailyTransportService(EventHandler):
},
}
self.logger: logging.Logger = logging.getLogger("dailyai")
self._logger: logging.Logger = logging.getLogger("dailyai")
self.event_handlers = {}
self._event_handlers = {}
def _patch_method(self, event_name, *args, **kwargs):
try:
self.loop = asyncio.get_running_loop()
except RuntimeError:
self.loop = None
def patch_method(self, event_name, *args, **kwargs):
try:
for handler in self.event_handlers[event_name]:
for handler in self._event_handlers[event_name]:
if inspect.iscoroutinefunction(handler):
if self.loop:
asyncio.run_coroutine_threadsafe(handler(*args, **kwargs), self.loop)
if self._loop:
asyncio.run_coroutine_threadsafe(handler(*args, **kwargs), self._loop)
else:
raise Exception("No event loop to run coroutine. In order to use async event handlers, you must run the DailyTransportService in an asyncio event loop.")
raise Exception(
"No event loop to run coroutine. In order to use async event handlers, you must run the DailyTransportService in an asyncio event loop.")
else:
handler(*args, **kwargs)
except Exception as e:
self.logger.error(f"Exception in event handler {event_name}: {e}")
self._logger.error(f"Exception in event handler {event_name}: {e}")
raise e
def add_event_handler(self, event_name: str, handler):
if not event_name.startswith("on_"):
@@ -100,11 +99,14 @@ class DailyTransportService(EventHandler):
if event_name not in [method[0] for method in methods]:
raise Exception(f"Event handler {event_name} not found")
if not event_name in self.event_handlers:
self.event_handlers[event_name] = [getattr(self, event_name), types.MethodType(handler, self)]
setattr(self, event_name, partial(self.patch_method, event_name))
if event_name not in self._event_handlers:
self._event_handlers[event_name] = [
getattr(
self, event_name), types.MethodType(
handler, self)]
setattr(self, event_name, partial(self._patch_method, event_name))
else:
self.event_handlers[event_name].append(types.MethodType(handler, self))
self._event_handlers[event_name].append(types.MethodType(handler, self))
def event_handler(self, event_name: str):
def decorator(handler):
@@ -113,167 +115,143 @@ class DailyTransportService(EventHandler):
return decorator
def configure_daily(self):
Daily.init()
def write_frame_to_camera(self, frame: bytes):
self.camera.write_frame(frame)
def write_frame_to_mic(self, frame: bytes):
self.mic.write_frames(frame)
def read_audio_frames(self, desired_frame_count):
bytes = self._speaker.read_frames(desired_frame_count)
return bytes
def _prerun(self):
# Only initialize Daily once
if not DailyTransportService._daily_initialized:
with DailyTransportService._lock:
Daily.init()
DailyTransportService._daily_initialized = True
self.client = CallClient(event_handler=self)
if self.mic_enabled:
if self._mic_enabled:
self.mic: VirtualMicrophoneDevice = Daily.create_microphone_device(
"mic", sample_rate=self.mic_sample_rate, channels=1
"mic", sample_rate=self._mic_sample_rate, channels=1
)
if self.camera_enabled:
if self._camera_enabled:
self.camera: VirtualCameraDevice = Daily.create_camera_device(
"camera", width=self.camera_width, height=self.camera_height, color_format="RGB"
"camera", width=self._camera_width, height=self._camera_height, color_format="RGB"
)
self.speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
"speaker", sample_rate=16000, channels=1
)
if self._speaker_enabled or self._vad_enabled:
self._speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
"speaker", sample_rate=self._speaker_sample_rate, channels=1
)
Daily.select_speaker_device("speaker")
self.image: bytes | None = None
self.camera_thread = Thread(target=self.run_camera, daemon=True)
self.camera_thread.start()
self.logger.info("Starting frame consumer thread")
self.frame_consumer_thread = Thread(target=self.frame_consumer, daemon=True)
self.frame_consumer_thread.start()
Daily.select_speaker_device("speaker")
self.client.set_user_name(self.bot_name)
self.client.join(self.room_url, self.token, completion=self.call_joined)
self.client.update_inputs(
{
"camera": {
"isEnabled": True,
"settings": {
"deviceId": "camera",
self.client.set_user_name(self._bot_name)
self.client.join(
self._room_url,
self._token,
completion=self.call_joined,
client_settings={
"inputs": {
"camera": {
"isEnabled": True,
"settings": {
"deviceId": "camera",
},
},
},
"microphone": {
"isEnabled": True,
"settings": {
"deviceId": "mic",
"customConstraints": {
"autoGainControl": {"exact": False},
"echoCancellation": {"exact": False},
"noiseSuppression": {"exact": False},
"microphone": {
"isEnabled": True,
"settings": {
"deviceId": "mic",
"customConstraints": {
"autoGainControl": {"exact": False},
"echoCancellation": {"exact": False},
"noiseSuppression": {"exact": False},
},
},
},
},
}
)
self.client.update_publishing(
{
"camera": {
"sendSettings": {
"maxQuality": "low",
"encodings": {
"low": {
"maxBitrate": 250000,
"scaleResolutionDownBy": 1.333,
"maxFramerate": 8,
}
},
"publishing": {
"camera": {
"sendSettings": {
"maxQuality": "low",
"encodings": {
"low": {
"maxBitrate": 250000,
"scaleResolutionDownBy": 1.333,
"maxFramerate": 8,
}
},
}
}
}
}
},
},
)
self._my_participant_id = self.client.participants()["local"]["id"]
if self.token:
self.client.update_subscription_profiles({
"base": {
"camera": "unsubscribed",
}
})
if self._token and self._start_transcription:
self.client.start_transcription(self.transcription_settings)
self.my_participant_id = self.client.participants()["local"]["id"]
self.original_sigint_handler = signal.getsignal(signal.SIGINT)
signal.signal(signal.SIGINT, self.process_interrupt_handler)
async def get_receive_frames(self):
while True:
frame = await self.receive_queue.get()
yield frame
if frame.frame_type == FrameType.END_STREAM:
break
def process_interrupt_handler(self, signum, frame):
self._post_run()
if callable(self.original_sigint_handler):
self.original_sigint_handler(signum, frame)
def get_async_send_queue(self):
return self.send_queue
async def marshal_frames(self):
while True:
frame: QueueFrame | list = await self.send_queue.get()
self.threadsafe_send_queue.put(frame)
self.send_queue.task_done()
if type(frame) == QueueFrame and frame.frame_type == FrameType.END_STREAM:
break
async def wait_for_send_queue_to_empty(self):
await self.send_queue.join()
self.threadsafe_send_queue.join()
async def stop_when_done(self):
await self.wait_for_send_queue_to_empty()
self.stop()
async def run(self) -> None:
self.configure_daily()
self.participant_left = False
async_output_queue_marshal_task = asyncio.create_task(self.marshal_frames())
try:
participant_count: int = len(self.client.participants())
self.logger.info(f"{participant_count} participants in room")
while time.time() < self.expiration and not self.participant_left and not self.stop_threads.is_set():
await asyncio.sleep(1)
except Exception as e:
self.logger.error(f"Exception {e}")
finally:
self.client.leave()
self.stop_threads.set()
await self.receive_queue.put(QueueFrame(FrameType.END_STREAM, None))
await self.send_queue.put(QueueFrame(FrameType.END_STREAM, None))
await async_output_queue_marshal_task
if self.camera_thread and self.camera_thread.is_alive():
self.camera_thread.join()
if self.frame_consumer_thread and self.frame_consumer_thread.is_alive():
self.frame_consumer_thread.join()
def stop(self):
self.stop_threads.set()
def _post_run(self):
self.client.leave()
def on_first_other_participant_joined(self):
pass
def call_joined(self, join_data, client_error):
self.logger.info(f"Call_joined: {join_data}, {client_error}")
self._logger.info(f"Call_joined: {join_data}, {client_error}")
def dialout(self, number):
self.client.start_dialout({"phoneNumber": number})
def start_recording(self):
self.client.start_recording()
def on_error(self, error):
self.logger.error(f"on_error: {error}")
self._logger.error(f"on_error: {error}")
def on_call_state_updated(self, state):
pass
def on_participant_joined(self, participant):
if not self.other_participant_has_joined and participant["id"] != self.my_participant_id:
self.other_participant_has_joined = True
if not self._other_participant_has_joined and participant["id"] != self._my_participant_id:
self._other_participant_has_joined = True
self.on_first_other_participant_joined()
def on_participant_left(self, participant, reason):
if len(self.client.participants()) < 2:
self.participant_left = True
pass
if len(self.client.participants()) < self._min_others_count + 1:
self._stop_threads.set()
def on_app_message(self, message, sender):
pass
def on_transcription_message(self, message:dict):
if self.loop:
frame = QueueFrame(FrameType.TRANSCRIPTION, message)
asyncio.run_coroutine_threadsafe(self.receive_queue.put(frame), self.loop)
def on_transcription_message(self, message: dict):
if self._loop:
participantId = ""
if "participantId" in message:
participantId = message["participantId"]
elif "session_id" in message:
participantId = message["session_id"]
if self._my_participant_id and participantId != self._my_participant_id:
frame = TranscriptionQueueFrame(message["text"], participantId, message["timestamp"])
asyncio.run_coroutine_threadsafe(self.receive_queue.put(frame), self._loop)
def on_transcription_stopped(self, stopped_by, stopped_by_error):
pass
@@ -283,75 +261,3 @@ class DailyTransportService(EventHandler):
def on_transcription_started(self, status):
pass
def set_image(self, image: bytes):
self.image: bytes | None = image
def run_camera(self):
try:
while not self.stop_threads.is_set():
if self.image:
self.camera.write_frame(self.image)
time.sleep(1.0 / 8) # 8 fps
except Exception as e:
self.logger.error(f"Exception {e} in camera thread.")
def frame_consumer(self):
self.logger.info("🎬 Starting frame consumer thread")
b = bytearray()
smallest_write_size = 3200
all_audio_frames = bytearray()
while True:
try:
frames_or_frame: QueueFrame | list[QueueFrame] = self.threadsafe_send_queue.get()
if type(frames_or_frame) == QueueFrame:
frames: list[QueueFrame] = [frames_or_frame]
elif type(frames_or_frame) == list:
frames: list[QueueFrame] = frames_or_frame
else:
raise Exception("Unknown type in output queue")
for frame in frames:
if frame.frame_type == FrameType.END_STREAM:
self.logger.info("Stopping frame consumer thread")
self.threadsafe_send_queue.task_done()
return
# if interrupted, we just pull frames off the queue and discard them
if not self.is_interrupted.is_set():
if frame:
if frame.frame_type == FrameType.AUDIO:
chunk = frame.frame_data
all_audio_frames.extend(chunk)
b.extend(chunk)
l = len(b) - (len(b) % smallest_write_size)
if l:
self.mic.write_frames(bytes(b[:l]))
b = b[l:]
elif frame.frame_type == FrameType.IMAGE:
self.set_image(frame.frame_data)
elif len(b):
self.mic.write_frames(bytes(b))
b = bytearray()
else:
if self.interrupt_time:
self.logger.info(
f"Lag to stop stream after interruption {time.perf_counter() - self.interrupt_time}"
)
self.interrupt_time = None
if frame.frame_type == FrameType.START_STREAM:
self.is_interrupted.clear()
self.threadsafe_send_queue.task_done()
except Empty:
try:
if len(b):
self.mic.write_frames(bytes(b))
except Exception as e:
self.logger.error(f"Exception in frame_consumer: {e}, {len(b)}")
b = bytearray()

View File

@@ -0,0 +1,36 @@
import os
import aiohttp
import requests
from dailyai.services.ai_services import TTSService
class DeepgramAIService(TTSService):
def __init__(
self,
*,
aiohttp_session: aiohttp.ClientSession,
api_key,
voice,
sample_rate=16000
):
super().__init__()
self._api_key = api_key
self._voice = voice
self._sample_rate = sample_rate
self._aiohttp_session = aiohttp_session
async def run_tts(self, sentence):
self.logger.info(f"Running deepgram tts for {sentence}")
base_url = "https://api.beta.deepgram.com/v1/speak"
request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate={self._sample_rate}"
headers = {"authorization": f"token {self._api_key}", "Content-Type": "application/json"}
data = {"text": sentence}
async with self._aiohttp_session.post(
request_url, headers=headers, json=data
) as r:
async for chunk in r.content:
if chunk:
yield chunk

View File

@@ -7,23 +7,24 @@ import requests
from collections.abc import AsyncGenerator
from dailyai.services.ai_services import TTSService
class DeepgramTTSService(TTSService):
def __init__(self, speech_key=None, voice=None):
def __init__(self, *, aiohttp_session, api_key, voice="alpha-asteria-en-v2"):
super().__init__()
self.voice = voice or os.getenv("DEEPGRAM_VOICE") or "alpha-asteria-en-v2"
self.speech_key = speech_key or os.getenv("DEEPGRAM_API_KEY")
self._voice = voice
self._api_key = api_key
self._aiohttp_session = aiohttp_session
def get_mic_sample_rate(self):
return 24000
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
self.logger.info(f"Running deepgram tts for {sentence}")
base_url = "https://api.beta.deepgram.com/v1/speak"
request_url = f"{base_url}?model={self.voice}&encoding=linear16&container=none&sample_rate=16000"
headers = {"authorization": f"token {self.speech_key}"}
body = { "text": sentence }
async with aiohttp.ClientSession() as session:
async with session.post(request_url, headers=headers, json=body) as r:
async for data in r.content:
yield data
request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate=16000"
headers = {"authorization": f"token {self._api_key}"}
body = {"text": sentence}
async with self._aiohttp_session.post(request_url, headers=headers, json=body) as r:
async for data in r.content:
yield data

View File

@@ -9,28 +9,37 @@ from dailyai.services.ai_services import TTSService
class ElevenLabsTTSService(TTSService):
def __init__(self, api_key=None, voice_id=None):
def __init__(
self,
*,
aiohttp_session: aiohttp.ClientSession,
api_key,
voice_id,
):
super().__init__()
self.api_key = api_key or os.getenv("ELEVENLABS_API_KEY")
self.voice_id = voice_id or os.getenv("ELEVENLABS_VOICE_ID")
self._api_key = api_key
self._voice_id = voice_id
self._aiohttp_session = aiohttp_session
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
async with aiohttp.ClientSession() as session:
url = f"https://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}/stream"
payload = {"text": sentence, "model_id": "eleven_turbo_v2"}
querystring = {"output_format": "pcm_16000", "optimize_streaming_latency": 2}
headers = {
"xi-api-key": self.api_key,
"Content-Type": "application/json",
}
async with session.post(url, json=payload, headers=headers, params=querystring) as r:
if r.status != 200:
self.logger.error(
f"audio fetch status code: {r.status}, error: {r.text}"
)
return
url = f"https://api.elevenlabs.io/v1/text-to-speech/{self._voice_id}/stream"
payload = {"text": sentence, "model_id": "eleven_turbo_v2"}
querystring = {"output_format": "pcm_16000", "optimize_streaming_latency": 2}
headers = {
"xi-api-key": self._api_key,
"Content-Type": "application/json",
}
async with self._aiohttp_session.post(
url, json=payload, headers=headers, params=querystring
) as r:
if r.status != 200:
self.logger.error(
f"audio fetch status code: {r.status}, error: {r.text}"
)
return
async for chunk in r.content:
if chunk:
yield chunk
async for chunk in r.content:
if chunk:
yield chunk

View File

@@ -2,30 +2,42 @@ import fal
import aiohttp
import asyncio
import io
import json
import os
from PIL import Image
from dailyai.services.ai_services import ImageGenService
from dailyai.services.ai_services import LLMService, TTSService, ImageGenService
from dailyai.services.ai_services import ImageGenService
# Fal expects FAL_KEY_ID and FAL_KEY_SECRET to be set in the env
class FalImageGenService(ImageGenService):
def __init__(self, image_size):
def __init__(
self,
*,
image_size,
aiohttp_session: aiohttp.ClientSession,
key_id=None,
key_secret=None):
super().__init__(image_size)
self._aiohttp_session = aiohttp_session
if key_id:
os.environ["FAL_KEY_ID"] = key_id
if key_secret:
os.environ["FAL_KEY_SECRET"] = key_secret
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
def get_image_url(sentence, size):
print("starting fal submit...")
handler = fal.apps.submit(
"110602490-fast-sdxl",
arguments={
"prompt": sentence
"prompt": sentence
},
)
print("past fal handler init, about to wait for iter_events...")
)
for event in handler.iter_events():
if isinstance(event, fal.apps.InProgress):
print('Request in progress')
print(event.logs)
pass
result = handler.get()
@@ -34,16 +46,9 @@ class FalImageGenService(ImageGenService):
raise Exception("Image generation failed")
return image_url
print(f"fetching image url...")
image_url = await asyncio.to_thread(get_image_url, sentence, self.image_size)
print(f"got image url, downloading image...")
# Load the image from the url
async with aiohttp.ClientSession() as session:
async with session.get(image_url) as response:
print("got image response")
image_stream = io.BytesIO(await response.content.read())
print("read image stream")
image = Image.open(image_stream)
return (image_url, image.tobytes())
# return (image_url, dalle_im.tobytes())
async with self._aiohttp_session.get(image_url) as response:
image_stream = io.BytesIO(await response.content.read())
image = Image.open(image_stream)
return (image_url, image.tobytes())

View File

@@ -0,0 +1,73 @@
import array
import io
import math
import time
from typing import AsyncGenerator
import wave
from dailyai.pipeline.frames import AudioFrame, Frame, TranscriptionQueueFrame
from dailyai.services.ai_services import STTService
class LocalSTTService(STTService):
_content: io.BufferedRandom
_wave: wave.Wave_write
_current_silence_frames: int
# Configuration
_min_rms: int
_max_silence_frames: int
_frame_rate: int
def __init__(self,
min_rms: int = 400,
max_silence_frames: int = 3,
frame_rate: int = 16000,
**kwargs):
super().__init__(frame_rate, **kwargs)
self._current_silence_frames = 0
self._min_rms = min_rms
self._max_silence_frames = max_silence_frames
self._frame_rate = frame_rate
self._new_wave()
def _new_wave(self):
"""Creates a new wave object and content buffer."""
self._content = io.BufferedRandom(io.BytesIO())
ww = wave.open(self._content, "wb")
ww.setnchannels(1)
ww.setsampwidth(2)
ww.setframerate(self._frame_rate)
self._wave = ww
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
"""Processes a frame of audio data, either buffering or transcribing it."""
if not isinstance(frame, AudioFrame):
return
data = frame.data
# Try to filter out empty background noise
# (Very rudimentary approach, can be improved)
rms = self._get_volume(data)
if rms >= self._min_rms:
# If volume is high enough, write new data to wave file
self._wave.writeframesraw(data)
# If buffer is not empty and we detect a 3-frame pause in speech,
# transcribe the audio gathered so far.
if self._content.tell() > 0 and self._current_silence_frames > self._max_silence_frames:
self._current_silence_frames = 0
self._wave.close()
self._content.seek(0)
text = await self.run_stt(self._content)
self._new_wave()
yield TranscriptionQueueFrame(text, '', str(time.time()))
# If we get this far, this is a frame of silence
self._current_silence_frames += 1
def _get_volume(self, audio: bytes) -> float:
# https://docs.python.org/3/library/array.html
audio_array = array.array('h', audio)
squares = [sample**2 for sample in audio_array]
mean = sum(squares) / len(audio_array)
rms = math.sqrt(mean)
return rms

View File

@@ -0,0 +1,76 @@
import asyncio
import time
import numpy as np
import tkinter as tk
import pyaudio
from dailyai.services.base_transport_service import BaseTransportService
class LocalTransportService(BaseTransportService):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self._sample_width = kwargs.get("sample_width") or 2
self._n_channels = kwargs.get("n_channels") or 1
self._tk_root = kwargs.get("tk_root") or None
if self._camera_enabled and not self._tk_root:
raise ValueError("If camera is enabled, a tkinter root must be provided")
if self._speaker_enabled:
self._speaker_buffer_pending = bytearray()
async def _write_frame_to_tkinter(self, frame: bytes):
data = f"P6 {self._camera_width} {self._camera_height} 255 ".encode() + frame
photo = tk.PhotoImage(
width=self._camera_width,
height=self._camera_height,
data=data,
format="PPM")
self._image_label.config(image=photo)
# This holds a reference to the photo, preventing it from being garbage collected.
self._image_label.image = photo # type: ignore
def write_frame_to_camera(self, frame: bytes):
if self._camera_enabled and self._loop:
asyncio.run_coroutine_threadsafe(
self._write_frame_to_tkinter(frame), self._loop
)
def write_frame_to_mic(self, frame: bytes):
self._audio_stream.write(frame)
def read_frames(self, desired_frame_count):
bytes = self._speaker_stream.read(
desired_frame_count,
exception_on_overflow=False,
)
return bytes
def _prerun(self):
if self._mic_enabled:
self._pyaudio = pyaudio.PyAudio()
self._audio_stream = self._pyaudio.open(
format=self._pyaudio.get_format_from_width(self._sample_width),
channels=self._n_channels,
rate=self._speaker_sample_rate,
output=True,
)
if self._camera_enabled:
# Start with a neutral gray background.
array = np.ones((1024, 1024, 3)) * 128
data = f"P5 {1024} {1024} 255 ".encode() + array.astype(np.uint8).tobytes()
photo = tk.PhotoImage(width=1024, height=1024, data=data, format="PPM")
self._image_label = tk.Label(self._tk_root, image=photo)
self._image_label.pack()
if self._speaker_enabled:
self._speaker_stream = self._pyaudio.open(
format=self._pyaudio.get_format_from_width(self._sample_width),
channels=self._n_channels,
rate=self._speaker_sample_rate,
frames_per_buffer=self._speaker_sample_rate,
input=True
)

View File

@@ -0,0 +1,44 @@
from openai import AsyncOpenAI
import json
from collections.abc import AsyncGenerator
from dailyai.services.ai_services import LLMService
class OLLamaLLMService(LLMService):
def __init__(self, model="llama2", base_url='http://localhost:11434/v1'):
super().__init__()
self._model = model
self._client = AsyncOpenAI(api_key="ollama", base_url=base_url)
async def get_response(self, messages, stream):
return await self._client.chat.completions.create(
stream=stream,
messages=messages,
model=self._model
)
async def run_llm_async(self, messages, tool_choice=None) -> AsyncGenerator[str, None]:
messages_for_log = json.dumps(messages)
self.logger.debug(f"Generating chat via openai: {messages_for_log}")
chunks = await self._client.chat.completions.create(
model=self._model, stream=True, messages=messages
)
async for chunk in chunks:
if len(chunk.choices) == 0:
continue
if chunk.choices[0].delta.content:
yield chunk.choices[0].delta.content
async def run_llm(self, messages) -> str | None:
messages_for_log = json.dumps(messages)
self.logger.debug(f"Generating chat via ollama: {messages_for_log}")
response = await self._client.chat.completions.create(model=self._model, stream=False, messages=messages)
if response and len(response.choices) > 0:
return response.choices[0].message.content
else:
return None

View File

@@ -1,67 +1,79 @@
import requests
import aiohttp
import asyncio
from PIL import Image
import io
import time
from openai import AsyncOpenAI
import os
import json
from collections.abc import AsyncGenerator
from dailyai.services.ai_services import AIService, TTSService, LLMService, ImageGenService
from dailyai.services.ai_services import LLMService, ImageGenService
class OpenAILLMService(LLMService):
def __init__(self, api_key=None, model=None):
super().__init__()
api_key = api_key or os.getenv("OPEN_AI_KEY")
self.model = model or os.getenv("OPEN_AI_LLM_MODEL") or "gpt-4"
self.client = AsyncOpenAI(api_key=api_key)
def __init__(self, *, api_key, model="gpt-4", tools=None, messages=None):
super().__init__(tools=tools, messages=messages)
self._model = model
self._client = AsyncOpenAI(api_key=api_key)
async def get_response(self, messages, stream):
return await self.client.chat.completions.create(
return await self._client.chat.completions.create(
stream=stream,
messages=messages,
model=self.model
model=self._model,
tools=self._tools
)
async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
async def run_llm_async(self, messages, tool_choice=None) -> AsyncGenerator[str, None]:
messages_for_log = json.dumps(messages)
self.logger.debug(f"Generating chat via openai: {messages_for_log}")
response = await self.get_response(messages, stream=True)
for chunk in response:
if self._tools:
tools = self._tools
else:
tools = None
start_time = time.time()
chunks = await self._client.chat.completions.create(model=self._model, stream=True, messages=messages, tools=tools, tool_choice=tool_choice)
self.logger.info(f"=== OpenAI LLM TTFB: {time.time() - start_time}")
async for chunk in chunks:
if len(chunk.choices) == 0:
continue
if chunk.choices[0].delta.content:
if chunk.choices[0].delta.tool_calls:
yield chunk.choices[0].delta.tool_calls[0]
elif chunk.choices[0].delta.content:
yield chunk.choices[0].delta.content
async def run_llm(self, messages) -> str | None:
messages_for_log = json.dumps(messages)
self.logger.debug(f"Generating chat via openai: {messages_for_log}")
response = await self.get_response(messages, stream=False)
response = await self._client.chat.completions.create(model=self._model, stream=False, messages=messages)
if response and len(response.choices) > 0:
return response.choices[0].message.content
else:
return None
class OpenAIImageGenService(ImageGenService):
def __init__(self, image_size:str, api_key=None, model=None):
def __init__(
self,
*,
image_size: str,
aiohttp_session: aiohttp.ClientSession,
api_key,
model="dall-e-3",
):
super().__init__(image_size=image_size)
api_key = api_key or os.getenv("OPEN_AI_KEY")
self.model = model or os.getenv("OPEN_AI_IMAGE_MODEL") or "dall-e-3"
self.client = AsyncOpenAI(api_key=api_key)
self._model = model
self._client = AsyncOpenAI(api_key=api_key)
self._aiohttp_session = aiohttp_session
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
self.logger.info("Generating OpenAI image", sentence)
image = await self.client.images.generate(
image = await self._client.images.generate(
prompt=sentence,
model=self.model,
model=self._model,
n=1,
size=self.image_size
)
@@ -70,10 +82,7 @@ class OpenAIImageGenService(ImageGenService):
raise Exception("No image provided in response", image)
# Load the image from the url
async with aiohttp.ClientSession() as session:
async with session.get(image_url) as response:
image_stream = io.BytesIO(await response.content.read())
image = Image.open(image_stream)
return (image_url, image.tobytes())
return (image_url, dalle_im.tobytes())
async with self._aiohttp_session.get(image_url) as response:
image_stream = io.BytesIO(await response.content.read())
image = Image.open(image_stream)
return (image_url, image.tobytes())

View File

@@ -1,36 +1,40 @@
import io
import os
import struct
from pyht import Client
from dotenv import load_dotenv
from pyht.client import TTSOptions
from pyht.protos.api_pb2 import Format
from services.ai_service import AIService
from dailyai.services.ai_services import TTSService
class PlayHTAIService(AIService):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.speech_key = os.getenv("PLAY_HT_KEY") or ''
self.user_id = os.getenv("PLAY_HT_USER_ID") or ''
class PlayHTAIService(TTSService):
def __init__(
self,
*,
api_key,
user_id,
voice_url
):
super().__init__()
self.speech_key = api_key
self.user_id = user_id
self.client = Client(
user_id=self.user_id,
api_key=self.speech_key,
)
self.options = TTSOptions(
voice="s3://voice-cloning-zero-shot/820da3d2-3a3b-42e7-844d-e68db835a206/sarah/manifest.json",
voice=voice_url,
sample_rate=16000,
quality="higher",
format=Format.FORMAT_WAV
)
format=Format.FORMAT_WAV)
def close(self):
super().close()
def __del__(self):
self.client.close()
def run_tts(self, sentence):
async def run_tts(self, sentence):
b = bytearray()
in_header = True
for chunk in self.client.tts(sentence, self.options):
@@ -43,14 +47,15 @@ class PlayHTAIService(AIService):
fh = io.BytesIO(b)
fh.seek(36)
(data, size) = struct.unpack('<4sI', fh.read(8))
self.logger.info(f"first attempt: data: {data}, size: {hex(size)}, position: {fh.tell()}")
self.logger.info(
f"first attempt: data: {data}, size: {hex(size)}, position: {fh.tell()}")
while data != b'data':
fh.read(size)
(data, size) = struct.unpack('<4sI', fh.read(8))
self.logger.info(f"subsequent data: {data}, size: {hex(size)}, position: {fh.tell()}, data != data: {data != b'data'}")
self.logger.info(
f"subsequent data: {data}, size: {hex(size)}, position: {fh.tell()}, data != data: {data != b'data'}")
self.logger.info("position: ", fh.tell())
in_header = False
else:
if len(chunk):
yield chunk

View File

@@ -4,6 +4,8 @@ from services.ai_service import AIService
# Note that Cloudflare's AI workers are still in beta.
# https://developers.cloudflare.com/workers-ai/
class CloudflareAIService(AIService):
def __init__(self):
super().__init__()
@@ -19,11 +21,11 @@ class CloudflareAIService(AIService):
return response.json()
# https://developers.cloudflare.com/workers-ai/models/llm/
def run_llm(self, messages, latest_user_message=None, stream = True):
def run_llm(self, messages, latest_user_message=None, stream=True):
input = {
"messages": [
{ "role": "system", "content": "You are a friendly assistant" },
{ "role": "user", "content": sentence }
{"role": "system", "content": "You are a friendly assistant"},
{"role": "user", "content": sentence}
]
}
@@ -57,9 +59,9 @@ class CloudflareAIService(AIService):
# https://developers.cloudflare.com/workers-ai/models/embedding/
def run_embeddings(self, texts, size="medium"):
models = {
"small": "@cf/baai/bge-small-en-v1.5", # 384 output dimensions
"medium": "@cf/baai/bge-base-en-v1.5", # 768 output dimensions
"large": "@cf/baai/bge-large-en-v1.5" #1024 output dimensions
"small": "@cf/baai/bge-small-en-v1.5", # 384 output dimensions
"medium": "@cf/baai/bge-base-en-v1.5", # 768 output dimensions
"large": "@cf/baai/bge-large-en-v1.5" # 1024 output dimensions
}
return self.run(models[size], {"text": texts})

View File

@@ -1,28 +0,0 @@
import os
import requests
from services.ai_service import AIService
from PIL import Image
class DeepgramAIService(AIService):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.api_key = os.getenv("DEEPGRAM_API_KEY")
def get_mic_sample_rate(self):
return 24000
def run_tts(self, sentence):
self.logger.info(f"Running deepgram tts for {sentence}")
base_url = "https://api.beta.deepgram.com/v1/speak"
voice = os.getenv("DEEPGRAM_VOICE") or "alpha-apollo-en-v1" # move this to an environment variable
request_url = f"{base_url}?model={voice}&encoding=linear16&container=none"
headers = {"authorization": f"token {self.api_key}"}
r = requests.post(request_url, headers=headers, data=sentence)
self.logger.info(
f"audio fetch status code: {r.status_code}, content length: {len(r.content)}"
)
yield r.content

View File

@@ -2,9 +2,12 @@ from services.ai_service import AIService
import openai
import os
# To use Google Cloud's AI products, you'll need to install Google Cloud CLI and enable the TTS and in your project: https://cloud.google.com/sdk/docs/install
# To use Google Cloud's AI products, you'll need to install Google Cloud
# CLI and enable the TTS and in your project:
# https://cloud.google.com/sdk/docs/install
from google.cloud import texttospeech
class GoogleAIService(AIService):
def __init__(self):
super().__init__()
@@ -15,11 +18,14 @@ class GoogleAIService(AIService):
)
self.audio_config = texttospeech.AudioConfig(
audio_encoding = texttospeech.AudioEncoding.LINEAR16,
sample_rate_hertz = 16000
audio_encoding=texttospeech.AudioEncoding.LINEAR16,
sample_rate_hertz=16000
)
def run_tts(self, sentence):
synthesis_input = texttospeech.SynthesisInput(text = sentence.strip())
result = self.client.synthesize_speech(input=synthesis_input, voice=self.voice, audio_config=self.audio_config)
synthesis_input = texttospeech.SynthesisInput(text=sentence.strip())
result = self.client.synthesize_speech(
input=synthesis_input,
voice=self.voice,
audio_config=self.audio_config)
return result

View File

@@ -1,7 +1,12 @@
from services.ai_service import AIService
from transformers import pipeline
# These functions are just intended for testing, not production use. If you'd like to use HuggingFace, you should use your own models, or do some research into the specific models that will work best for your use case.
# These functions are just intended for testing, not production use. If
# you'd like to use HuggingFace, you should use your own models, or do
# some research into the specific models that will work best for your use
# case.
class HuggingFaceAIService(AIService):
def __init__(self):
super().__init__()
@@ -10,9 +15,12 @@ class HuggingFaceAIService(AIService):
classifier = pipeline("sentiment-analysis")
return classifier(sentence)
# available models at https://huggingface.co/Helsinki-NLP (**not all models use 2-character language codes**)
# available models at https://huggingface.co/Helsinki-NLP (**not all
# models use 2-character language codes**)
def run_text_translation(self, sentence, source_language, target_language):
translator = pipeline(f"translation", model=f"Helsinki-NLP/opus-mt-{source_language}-{target_language}")
translator = pipeline(
f"translation",
model=f"Helsinki-NLP/opus-mt-{source_language}-{target_language}")
return translator(sentence)[0]["translation_text"]

View File

@@ -4,6 +4,7 @@ import time
from PIL import Image
from services.ai_service import AIService
class MockAIService(AIService):
def __init__(self):
super().__init__()
@@ -20,8 +21,7 @@ class MockAIService(AIService):
time.sleep(1)
return (image_url, image)
def run_llm(self, messages, latest_user_message=None, stream = True):
def run_llm(self, messages, latest_user_message=None, stream=True):
for i in range(5):
time.sleep(1)
yield({"choices": [{"delta": {"content": f"hello {i}!"}}]})
yield ({"choices": [{"delta": {"content": f"hello {i}!"}}]})

View File

@@ -0,0 +1,55 @@
"""This module implements Whisper transcription with a locally-downloaded model."""
import asyncio
from enum import Enum
import logging
from typing import BinaryIO
from faster_whisper import WhisperModel
from dailyai.services.local_stt_service import LocalSTTService
class Model(Enum):
"""Class of basic Whisper model selection options"""
TINY = "tiny"
BASE = "base"
MEDIUM = "medium"
LARGE = "large-v3"
DISTIL_LARGE_V2 = "Systran/faster-distil-whisper-large-v2"
DISTIL_MEDIUM_EN = "Systran/faster-distil-whisper-medium.en"
class WhisperSTTService(LocalSTTService):
"""Class to transcribe audio with a locally-downloaded Whisper model"""
_model: WhisperModel
# Model configuration
_model_name: Model
_device: str
_compute_type: str
def __init__(self, model_name: Model = Model.DISTIL_MEDIUM_EN,
device: str = "auto",
compute_type: str = "default"):
super().__init__()
self.logger: logging.Logger = logging.getLogger("dailyai")
self._model_name = model_name
self._device = device
self._compute_type = compute_type
self._load()
def _load(self):
"""Loads the Whisper model. Note that if this is the first time
this model is being run, it will take time to download."""
model = WhisperModel(
self._model_name.value,
device=self._device,
compute_type=self._compute_type)
self._model = model
async def run_stt(self, audio: BinaryIO) -> str:
"""Transcribes given audio using Whisper"""
segments, _ = await asyncio.to_thread(self._model.transcribe, audio)
res: str = ""
for segment in segments:
res += f"{segment.text} "
return res

View File

@@ -0,0 +1,128 @@
import asyncio
import doctest
import functools
import unittest
from dailyai.pipeline.aggregators import (
GatedAggregator,
ParallelPipeline,
SentenceAggregator,
StatelessTextTransformer,
)
from dailyai.pipeline.frames import (
AudioFrame,
EndFrame,
ImageFrame,
LLMResponseEndFrame,
LLMResponseStartFrame,
Frame,
TextFrame,
)
from dailyai.pipeline.pipeline import Pipeline
class TestDailyFrameAggregators(unittest.IsolatedAsyncioTestCase):
async def test_sentence_aggregator(self):
sentence = "Hello, world. How are you? I am fine"
expected_sentences = ["Hello, world.", " How are you?", " I am fine "]
aggregator = SentenceAggregator()
for word in sentence.split(" "):
async for sentence in aggregator.process_frame(TextFrame(word + " ")):
self.assertIsInstance(sentence, TextFrame)
if isinstance(sentence, TextFrame):
self.assertEqual(sentence.text, expected_sentences.pop(0))
async for sentence in aggregator.process_frame(EndFrame()):
if len(expected_sentences):
self.assertIsInstance(sentence, TextFrame)
if isinstance(sentence, TextFrame):
self.assertEqual(sentence.text, expected_sentences.pop(0))
else:
self.assertIsInstance(sentence, EndFrame)
self.assertEqual(expected_sentences, [])
async def test_gated_accumulator(self):
gated_aggregator = GatedAggregator(
gate_open_fn=lambda frame: isinstance(frame, ImageFrame),
gate_close_fn=lambda frame: isinstance(frame, LLMResponseStartFrame),
start_open=False,
)
frames = [
LLMResponseStartFrame(),
TextFrame("Hello, "),
TextFrame("world."),
AudioFrame(b"hello"),
ImageFrame("image", b"image"),
AudioFrame(b"world"),
LLMResponseEndFrame(),
]
expected_output_frames = [
ImageFrame("image", b"image"),
LLMResponseStartFrame(),
TextFrame("Hello, "),
TextFrame("world."),
AudioFrame(b"hello"),
AudioFrame(b"world"),
LLMResponseEndFrame(),
]
for frame in frames:
async for out_frame in gated_aggregator.process_frame(frame):
self.assertEqual(out_frame, expected_output_frames.pop(0))
self.assertEqual(expected_output_frames, [])
async def test_parallel_pipeline(self):
async def slow_add(sleep_time:float, name:str, x: str):
await asyncio.sleep(sleep_time)
return ":".join([x, name])
pipe1_annotation = StatelessTextTransformer(functools.partial(slow_add, 0.1, 'pipe1'))
pipe2_annotation = StatelessTextTransformer(functools.partial(slow_add, 0.2, 'pipe2'))
sentence_aggregator = SentenceAggregator()
add_dots = StatelessTextTransformer(lambda x: x + ".")
source = asyncio.Queue()
sink = asyncio.Queue()
pipeline = Pipeline(
[
ParallelPipeline(
[[pipe1_annotation], [sentence_aggregator, pipe2_annotation]]
),
add_dots,
],
source,
sink,
)
frames = [
TextFrame("Hello, "),
TextFrame("world."),
EndFrame()
]
expected_output_frames: list[Frame] = [
TextFrame(text='Hello, :pipe1.'),
TextFrame(text='world.:pipe1.'),
TextFrame(text='Hello, world.:pipe2.'),
EndFrame()
]
for frame in frames:
await source.put(frame)
await pipeline.run_pipeline()
while not sink.empty():
frame = await sink.get()
self.assertEqual(frame, expected_output_frames.pop(0))
def load_tests(loader, tests, ignore):
""" Run doctests on the aggregators module. """
from dailyai.pipeline import aggregators
tests.addTests(doctest.DocTestSuite(aggregators))
return tests

View File

@@ -1,35 +1,31 @@
from re import A
import unittest
from typing import AsyncGenerator, Generator
from dailyai.services.ai_services import AIService, SentenceAggregator
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.ai_services import AIService
from dailyai.pipeline.frames import EndFrame, Frame, TextFrame
class SimpleAIService(AIService):
def allowed_input_frame_types(self) -> set[FrameType]:
return set([FrameType.TEXT_CHUNK])
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
yield frame
def possible_output_frame_types(self) -> set[FrameType]:
return set([FrameType.TEXT_CHUNK])
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> QueueFrame | None:
return frame
class TestBaseAIService(unittest.IsolatedAsyncioTestCase):
async def test_async_input(self):
service = SimpleAIService()
input_frames = [
QueueFrame(FrameType.TEXT_CHUNK, "hello"),
QueueFrame(FrameType.END_STREAM, None),
TextFrame("hello"),
EndFrame()
]
async def iterate_frames() -> AsyncGenerator[QueueFrame, None]:
async def iterate_frames() -> AsyncGenerator[Frame, None]:
for frame in input_frames:
yield frame
output_frames = []
async for frame in service.run(set([FrameType.TEXT_CHUNK]), iterate_frames()):
async for frame in service.run(iterate_frames()):
output_frames.append(frame)
self.assertEqual(input_frames, output_frames)
@@ -37,93 +33,18 @@ class TestBaseAIService(unittest.IsolatedAsyncioTestCase):
async def test_nonasync_input(self):
service = SimpleAIService()
input_frames = [
QueueFrame(FrameType.TEXT_CHUNK, "hello"),
QueueFrame(FrameType.END_STREAM, None),
]
input_frames = [TextFrame("hello"), EndFrame()]
def iterate_frames() -> Generator[QueueFrame, None, None]:
def iterate_frames() -> Generator[Frame, None, None]:
for frame in input_frames:
yield frame
output_frames = []
async for frame in service.run(set([FrameType.TEXT_CHUNK]), iterate_frames()):
async for frame in service.run(iterate_frames()):
output_frames.append(frame)
self.assertEqual(input_frames, output_frames)
class TestSentenceAggregator(unittest.IsolatedAsyncioTestCase):
async def test_clause(self) -> None:
input_frames = [
QueueFrame(FrameType.TEXT_CHUNK, "hello"),
QueueFrame(FrameType.END_STREAM, None),
]
service = SentenceAggregator()
output_frames = []
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
output_frames.append(frame)
self.assertEqual(1, len(output_frames))
self.assertEqual(QueueFrame(FrameType.SENTENCE, "hello"), output_frames[0])
async def test_sentence(self) -> None:
input_frames = [
QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
QueueFrame(FrameType.TEXT_CHUNK, "world."),
QueueFrame(FrameType.END_STREAM, None),
]
service = SentenceAggregator()
output_frames = []
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
output_frames.append(frame)
self.assertEqual(1, len(output_frames))
self.assertEqual(QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0])
async def test_sentence_and_clause(self) -> None:
input_frames = [
QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
QueueFrame(FrameType.TEXT_CHUNK, "world."),
QueueFrame(FrameType.TEXT_CHUNK, " How are"),
QueueFrame(FrameType.END_STREAM, None),
]
service = SentenceAggregator()
output_frames = []
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
output_frames.append(frame)
self.assertEqual(2, len(output_frames))
self.assertEqual(
QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0]
)
self.assertEqual(
QueueFrame(FrameType.SENTENCE, " How are"), output_frames[1]
)
async def test_two_sentences(self) -> None:
input_frames = [
QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
QueueFrame(FrameType.TEXT_CHUNK, "world."),
QueueFrame(FrameType.TEXT_CHUNK, " How are"),
QueueFrame(FrameType.TEXT_CHUNK, " you doing?"),
QueueFrame(FrameType.END_STREAM, None),
]
service = SentenceAggregator()
output_frames = []
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
output_frames.append(frame)
self.assertEqual(2, len(output_frames))
self.assertEqual(
QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0]
)
self.assertEqual(QueueFrame(FrameType.SENTENCE, " How are you doing?"), output_frames[1])
if __name__ == "__main__":
unittest.main()

View File

@@ -1,180 +0,0 @@
import time
import unittest
from queue import Queue, Empty
from threading import Thread, Event
from typing import Generator
from dailyai.async_processor.async_processor import (
AsyncProcessor,
AsyncProcessorState,
LLMResponse,
)
from dailyai.message_handler.message_handler import MessageHandler
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.ai_services import (
AIServiceConfig,
ImageGenService,
LLMService,
TTSService,
)
"""
class MockTTSService(TTSService):
def run_tts(self, sentence):
for word in sentence.split(' '):
time.sleep(0.1)
yield bytes(word, "utf-8")
class MockLLMService(LLMService):
def run_llm_async(self, messages) -> Generator[str, None, None]:
for i in ["Hello ", "there.", "How are ", "you?", "I ", "hope ", "you ", "are ", "well."]:
time.sleep(0.1)
yield i
class MockImageService(ImageGenService):
def run_image_gen(self, sentence) -> None:
return None
class TestResponse(unittest.TestCase):
def test_base_state_transitions(self):
mock_tts_service = MockTTSService()
mock_llm_service = MockLLMService()
mock_image_service = MockImageService()
processor = AsyncProcessor(AIServiceConfig(tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service))
processor.prepare()
processor.play()
processor.finalize()
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
def test_state_transitions(self):
output_queue = Queue()
mock_tts_service = MockTTSService()
mock_llm_service = MockLLMService()
mock_image_service = MockImageService()
message_handler = MessageHandler("Hello World")
processor = LLMResponse(
AIServiceConfig(
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
),
message_handler,
output_queue,
)
processor.prepare()
processor.play()
# Consume the output from the output queue. It's necessary to mark these tasks as done for the
# play function to return.
expected_words = ["Hello", "there.", "How", "are", "you?", "I", "hope", "you", "are", "well."]
# remove the "start_stream" message from the queue
output_queue.get()
output_queue.task_done()
while expected_words:
actual_word:QueueFrame = output_queue.get()
word = expected_words.pop(0)
self.assertEqual(actual_word.frame_type, FrameType.AUDIO_FRAME)
self.assertEqual(actual_word.frame_data, bytes(word, "utf-8"))
output_queue.task_done()
processor.finalize()
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
def test_interrupt_preparation(self):
output_queue = Queue()
mock_tts_service = MockTTSService()
mock_llm_service = MockLLMService()
mock_image_service = MockImageService()
message_handler = MessageHandler("System Message")
processor = LLMResponse(
AIServiceConfig(
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
),
message_handler,
output_queue,
)
processor.prepare()
interrupt_request_at = time.perf_counter()
processor.interrupt()
processor.finalize()
finalized_at = time.perf_counter()
self.assertTrue(0.1 < finalized_at - interrupt_request_at < 0.2)
print(f"delta: {interrupt_request_at, finalized_at}")
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
def test_interrupt_play(self):
output_queue = Queue()
mock_tts_service = MockTTSService()
mock_llm_service = MockLLMService()
mock_image_service = MockImageService()
message_handler = MessageHandler("System Message")
processor = LLMResponse(
AIServiceConfig(
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
),
message_handler,
output_queue,
)
processor.prepare()
processor.play()
stop_processing_output_queue = Event()
def process_output_queue_async():
# Consume the output from the output queue. It's necessary to mark these tasks as done for the
# play function to return.
time.sleep(0.1)
expected_words = ["Hello", "there.", "How", "are", "you?", "I", "hope", "you", "are", "well."]
while expected_words and not stop_processing_output_queue.is_set():
try:
actual_word:QueueFrame = output_queue.get_nowait()
if actual_word.frame_type == FrameType.AUDIO_FRAME:
time.sleep(0.1)
word = expected_words.pop(0)
self.assertEqual(actual_word.frame_type, FrameType.AUDIO_FRAME)
self.assertEqual(actual_word.frame_data, bytes(word, "utf-8"))
output_queue.task_done()
except Empty:
pass
process_output_queue = Thread(target=process_output_queue_async, daemon=True)
process_output_queue.start()
time.sleep(0.5)
processor.interrupt()
stop_processing_output_queue.set()
process_output_queue.join()
processor.finalize()
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
def test_statechange_callback(self):
mock_tts_service = MockTTSService()
mock_llm_service = MockLLMService()
mock_image_service = MockImageService()
processor = AsyncProcessor(
AIServiceConfig(
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
)
)
is_finalized = False
def set_is_finalized(async_processor:AsyncProcessor):
nonlocal is_finalized
is_finalized = True
processor.set_state_callback(
AsyncProcessorState.FINALIZED, set_is_finalized
)
processor.prepare()
self.assertFalse(is_finalized)
processor.play()
self.assertFalse(is_finalized)
processor.finalize()
self.assertTrue(is_finalized)
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
if __name__ == '__main__':
unittest.main()
"""

View File

@@ -0,0 +1,83 @@
import asyncio
import unittest
from unittest.mock import MagicMock, patch
from dailyai.pipeline.frames import AudioFrame, ImageFrame
class TestDailyTransport(unittest.IsolatedAsyncioTestCase):
async def test_event_handler(self):
from dailyai.services.daily_transport_service import DailyTransportService
transport = DailyTransportService("mock.daily.co/mock", "token", "bot")
was_called = False
@transport.event_handler("on_first_other_participant_joined")
def test_event_handler(transport):
nonlocal was_called
was_called = True
transport.on_first_other_participant_joined()
self.assertTrue(was_called)
async def test_event_handler_async(self):
from dailyai.services.daily_transport_service import DailyTransportService
transport = DailyTransportService("mock.daily.co/mock", "token", "bot")
event = asyncio.Event()
@transport.event_handler("on_first_other_participant_joined")
async def test_event_handler(transport):
nonlocal event
await asyncio.sleep(0.1)
event.set()
transport.on_first_other_participant_joined()
await asyncio.wait_for(event.wait(), timeout=1)
self.assertTrue(event.is_set())
"""
@patch("dailyai.services.daily_transport_service.CallClient")
@patch("dailyai.services.daily_transport_service.Daily")
async def test_run_with_camera_and_mic(self, daily_mock, callclient_mock):
from dailyai.services.daily_transport_service import DailyTransportService
transport = DailyTransportService(
"https://mock.daily.co/mock",
"token",
"bot",
mic_enabled=True,
camera_enabled=True,
duration_minutes=0.01,
)
mic = MagicMock()
camera = MagicMock()
daily_mock.create_microphone_device.return_value = mic
daily_mock.create_camera_device.return_value = camera
async def send_audio_frame():
await transport.send_queue.put(AudioQueueFrame(bytes([0] * 3300)))
async def send_video_frame():
await transport.send_queue.put(ImageQueueFrame(None, b"test"))
await asyncio.gather(transport.run(), send_audio_frame(), send_video_frame())
daily_mock.init.assert_called_once_with()
daily_mock.create_microphone_device.assert_called_once()
daily_mock.create_camera_device.assert_called_once()
callclient_mock.return_value.set_user_name.assert_called_once_with("bot")
callclient_mock.return_value.join.assert_called_once_with(
"https://mock.daily.co/mock", "token", completion=transport.call_joined
)
camera.write_frame.assert_called_with(b"test")
mic.write_frames.assert_called()
"""

View File

@@ -1,147 +0,0 @@
import time
import unittest
from unittest.mock import MagicMock, call
from dailyai.message_handler.message_handler import MessageHandler, IndexingMessageHandler
from dailyai.services.ai_services import (
AIServiceConfig,
TTSService,
LLMService,
ImageGenService,
)
from ..storage.search import SearchIndexer
class TestMessageHandler(unittest.TestCase):
def test_simple_intro(self):
message_handler = MessageHandler("Hello world")
self.assertEqual(
message_handler.get_llm_messages(),
[{"role": "system", "content": "Hello world"}],
)
def test_simple_user_message(self):
message_handler = MessageHandler("System prompt")
message_handler.add_user_message("User message")
self.assertEqual(
message_handler.get_llm_messages(),
[
{"role": "system", "content": "System prompt"},
{"role": "user", "content": "User message"},
],
)
def test_simple_user_and_assistant_message(self):
message_handler = MessageHandler("System prompt")
message_handler.add_user_message("User message")
message_handler.add_assistant_message("Assistant message")
self.assertEqual(
message_handler.get_llm_messages(),
[
{"role": "system", "content": "System prompt"},
{"role": "user", "content": "User message"},
{"role": "assistant", "content": "Assistant message"},
],
)
def test_user_message_overwrite(self):
message_handler = MessageHandler("System prompt")
message_handler.add_user_message("User message")
message_handler.add_assistant_message("Assistant message")
message_handler.add_user_message("plus something else")
self.assertEqual(
message_handler.get_llm_messages(),
[
{"role": "system", "content": "System prompt"},
{"role": "user", "content": "User message plus something else"},
],
)
def test_user_message_after_assistant(self):
message_handler = MessageHandler("System prompt")
message_handler.add_user_message("User message")
message_handler.add_assistant_message("Assistant message")
message_handler.finalize_user_message()
message_handler.add_user_message("other user message")
self.assertEqual(
message_handler.get_llm_messages(),
[
{"role": "system", "content": "System prompt"},
{"role": "user", "content": "User message"},
{"role": "assistant", "content": "Assistant message"},
{"role": "user", "content": "other user message"},
],
)
class MockTTSService(TTSService):
def run_tts(self, sentence):
for word in sentence.split(" "):
time.sleep(0.1)
yield bytes(word, "utf-8")
class MockLLMService(LLMService):
def run_llm(self, messages) -> str:
return "Parsed user message."
class MockImageService(ImageGenService):
def run_image_gen(self, sentence) -> None:
return None
class TestStorageMessageHandler(unittest.TestCase):
def test_user_message_finalized(self):
mock_tts_service = MockTTSService()
mock_llm_service = MockLLMService()
mock_image_service = MockImageService()
service_config = AIServiceConfig(
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
)
mock_indexer = MagicMock(spec=SearchIndexer)
message_handler = IndexingMessageHandler(
"Hello world", service_config, mock_indexer
)
message_handler.cleanup_user_message = MagicMock(return_value="Parsed user message.")
message_handler.add_user_message("User message")
message_handler.add_assistant_message("Assistant message will be ignored")
message_handler.add_user_message("plus something else")
message_handler.finalize_user_message()
message_handler.add_assistant_message(
"New assistant message will not be ignored"
)
message_handler.add_user_message("User message second time")
message_handler.add_assistant_message("Assistant message second time")
message_handler.write_messages_to_storage()
time.sleep(0.5)
message_handler.cleanup_user_message.assert_called_with("User message plus something else")
self.assertEqual(
mock_indexer.mock_calls,
[
call.index_text('"Parsed user message."'),
call.index_text("New assistant message will not be ignored"),
],
)
mock_indexer.reset_mock()
message_handler.finalize_user_message()
time.sleep(0.5)
self.assertEqual(
mock_indexer.mock_calls,
[
call.index_text('"Parsed user message."'),
call.index_text("Assistant message second time"),
],
)
if __name__ == "__main__":
unittest.main()

View File

@@ -0,0 +1,60 @@
import asyncio
from doctest import OutputChecker
import unittest
from dailyai.pipeline.aggregators import SentenceAggregator, StatelessTextTransformer
from dailyai.pipeline.frames import EndFrame, TextFrame
from dailyai.pipeline.pipeline import Pipeline
class TestDailyPipeline(unittest.IsolatedAsyncioTestCase):
async def test_pipeline_simple(self):
aggregator = SentenceAggregator()
outgoing_queue = asyncio.Queue()
incoming_queue = asyncio.Queue()
pipeline = Pipeline([aggregator], incoming_queue, outgoing_queue)
await incoming_queue.put(TextFrame("Hello, "))
await incoming_queue.put(TextFrame("world."))
await incoming_queue.put(EndFrame())
await pipeline.run_pipeline()
self.assertEqual(await outgoing_queue.get(), TextFrame("Hello, world."))
self.assertIsInstance(await outgoing_queue.get(), EndFrame)
async def test_pipeline_multiple_stages(self):
sentence_aggregator = SentenceAggregator()
to_upper = StatelessTextTransformer(lambda x: x.upper())
add_space = StatelessTextTransformer(lambda x: x + " ")
outgoing_queue = asyncio.Queue()
incoming_queue = asyncio.Queue()
pipeline = Pipeline(
[add_space, sentence_aggregator, to_upper],
incoming_queue,
outgoing_queue
)
sentence = "Hello, world. It's me, a pipeline."
for c in sentence:
await incoming_queue.put(TextFrame(c))
await incoming_queue.put(EndFrame())
await pipeline.run_pipeline()
self.assertEqual(
await outgoing_queue.get(), TextFrame("H E L L O , W O R L D .")
)
self.assertEqual(
await outgoing_queue.get(),
TextFrame(" I T ' S M E , A P I P E L I N E ."),
)
# leftover little bit because of the spacing
self.assertEqual(
await outgoing_queue.get(),
TextFrame(" "),
)
self.assertIsInstance(await outgoing_queue.get(), EndFrame)

View File

@@ -0,0 +1,64 @@
import asyncio
import aiohttp
import os
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.playht_ai_service import PlayHTAIService
from examples.foundational.support.runner import configure
async def main(room_url):
async with aiohttp.ClientSession() as session:
# create a transport service object using environment variables for
# the transport service's API key, room url, and any other configuration.
# services can all define and document the environment variables they use.
# services all also take an optional config object that is used instead of
# environment variables.
#
# the abstract transport service APIs presumably can map pretty closely
# to the daily-python basic API
meeting_duration_minutes = 5
transport = DailyTransportService(
room_url,
None,
"Say One Thing",
meeting_duration_minutes,
mic_enabled=True
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
"""
tts = PlayHTAIService(
api_key=os.getenv("PLAY_HT_API_KEY"),
user_id=os.getenv("PLAY_HT_USER_ID"),
voice_url=os.getenv("PLAY_HT_VOICE_URL"),
)
"""
# Register an event handler so we can play the audio when the participant joins.
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
nonlocal tts
if participant["info"]["isLocal"]:
return
await tts.say(
"Hello there, " + participant["info"]["userName"] + "!",
transport.send_queue,
)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
await transport.run()
del(tts)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,34 @@
import asyncio
import aiohttp
import os
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.local_transport_service import LocalTransportService
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 1
transport = LocalTransportService(
duration_minutes=meeting_duration_minutes,
mic_enabled=True
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
async def say_something():
await asyncio.sleep(1)
await tts.say(
"Hello there.",
transport.send_queue,
)
await transport.stop_when_done()
await asyncio.gather(transport.run(), say_something())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,59 @@
import asyncio
import os
import aiohttp
from dailyai.pipeline.frames import LLMMessagesQueueFrame
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from examples.foundational.support.runner import configure
async def main(room_url):
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 1
transport = DailyTransportService(
room_url,
None,
"Say One Thing From an LLM",
duration_minutes=meeting_duration_minutes,
mic_enabled=True
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
# tts = AzureTTSService(api_key=os.getenv("AZURE_SPEECH_API_KEY"), region=os.getenv("AZURE_SPEECH_REGION"))
# tts = DeepgramTTSService(aiohttp_session=session, api_key=os.getenv("DEEPGRAM_API_KEY"), voice=os.getenv("DEEPGRAM_VOICE"))
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
# llm = OpenAILLMService(api_key=os.getenv("OPENAI_CHATGPT_API_KEY"))
messages = [{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world."
}]
tts_task = asyncio.create_task(
tts.run_to_queue(
transport.send_queue,
llm.run([LLMMessagesQueueFrame(messages)]),
)
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts_task
await transport.stop_when_done()
await transport.run()
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,53 @@
import asyncio
import aiohttp
import os
from dailyai.pipeline.frames import TextFrame
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.open_ai_services import OpenAIImageGenService
from dailyai.services.azure_ai_services import AzureImageGenServiceREST
from examples.foundational.support.runner import configure
local_joined = False
participant_joined = False
async def main(room_url):
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 1
transport = DailyTransportService(
room_url,
None,
"Show a still frame image",
duration_minutes=meeting_duration_minutes,
mic_enabled=False,
camera_enabled=True,
camera_width=1024,
camera_height=1024
)
imagegen = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"))
# imagegen = OpenAIImageGenService(aiohttp_session=session, api_key=os.getenv("OPENAI_DALLE_API_KEY"), image_size="1024x1024")
# imagegen = AzureImageGenServiceREST(image_size="1024x1024", aiohttp_session=session, api_key=os.getenv("AZURE_DALLE_API_KEY"), endpoint=os.getenv("AZURE_DALLE_ENDPOINT"), model=os.getenv("AZURE_DALLE_MODEL"))
image_task = asyncio.create_task(
imagegen.run_to_queue(
transport.send_queue, [
TextFrame("a cat in the style of picasso")]))
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await image_task
await transport.run()
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,50 @@
import asyncio
import aiohttp
import os
import tkinter as tk
from dailyai.pipeline.frames import TextFrame
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.local_transport_service import LocalTransportService
local_joined = False
participant_joined = False
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 2
tk_root = tk.Tk()
tk_root.title("Calendar")
transport = LocalTransportService(
tk_root=tk_root,
mic_enabled=True,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=meeting_duration_minutes,
)
imagegen = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
image_task = asyncio.create_task(
imagegen.run_to_queue(
transport.send_queue, [TextFrame("a cat in the style of picasso")]
)
)
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(transport.run(), image_task, run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,71 @@
import asyncio
import os
import aiohttp
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.frames import EndFrame, LLMMessagesQueueFrame
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from examples.foundational.support.runner import configure
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
room_url,
None,
"Static And Dynamic Speech",
duration_minutes=1,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
azure_tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"))
elevenlabs_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
messages = [{"role": "system", "content": "tell the user a joke about llamas"}]
# Start a task to run the LLM to create a joke, and convert the LLM output to audio frames. This task
# will run in parallel with generating and speaking the audio for static text, so there's no delay to
# speak the LLM response.
buffer_queue = asyncio.Queue()
source_queue = asyncio.Queue()
pipeline = Pipeline(source = source_queue, sink=buffer_queue, processors=[llm, elevenlabs_tts])
source_queue.put_nowait(LLMMessagesQueueFrame(messages))
pipeline_run_task = pipeline.run_pipeline()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await azure_tts.say("My friend the LLM is now going to tell a joke about llamas.", transport.send_queue)
async def buffer_to_send_queue():
while True:
frame = await buffer_queue.get()
await transport.send_queue.put(frame)
buffer_queue.task_done()
if isinstance(frame, EndFrame):
break
await asyncio.gather(pipeline_run_task, buffer_to_send_queue())
await transport.stop_when_done()
await transport.run()
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,93 @@
import asyncio
from re import S
import aiohttp
import os
from dailyai.pipeline.aggregators import GatedAggregator, LLMFullResponseAggregator, ParallelPipeline, SentenceAggregator
from dailyai.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesQueueFrame, LLMResponseStartFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.azure_ai_services import AzureLLMService, AzureImageGenServiceREST, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.open_ai_services import OpenAIImageGenService
from examples.foundational.support.runner import configure
async def main(room_url):
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 5
transport = DailyTransportService(
room_url,
None,
"Month Narration Bot",
duration_minutes=meeting_duration_minutes,
mic_enabled=True,
camera_enabled=True,
mic_sample_rate=16000,
camera_width=1024,
camera_height=1024
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV")
dalle = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"))
source_queue = asyncio.Queue()
for month in ["January", "February"]:
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
await source_queue.put(LLMMessagesQueueFrame(messages))
await source_queue.put(EndFrame())
gated_aggregator = GatedAggregator(
gate_open_fn=lambda frame: isinstance(frame, ImageFrame),
gate_close_fn=lambda frame: isinstance(frame, LLMResponseStartFrame),
start_open=False,
)
sentence_aggregator = SentenceAggregator()
llm_full_response_aggregator = LLMFullResponseAggregator()
pipeline = Pipeline(
source=source_queue,
sink=transport.send_queue,
processors=[
llm,
sentence_aggregator,
ParallelPipeline([[tts], [llm_full_response_aggregator, dalle]]),
gated_aggregator,
],
)
pipeline_task = pipeline.run_pipeline()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await pipeline_task
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
await transport.run()
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,134 @@
import aiohttp
import argparse
import asyncio
import tkinter as tk
import os
from dailyai.pipeline.frames import AudioFrame, ImageFrame
from dailyai.services.azure_ai_services import AzureLLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.local_transport_service import LocalTransportService
async def main(room_url):
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 5
tk_root = tk.Tk()
tk_root.title("Calendar")
transport = LocalTransportService(
mic_enabled=True,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=meeting_duration_minutes,
tk_root=tk_root,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV",
)
dalle = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
# Get a complete audio chunk from the given text. Splitting this into its own
# coroutine lets us ensure proper ordering of the audio chunks on the send queue.
async def get_all_audio(text):
all_audio = bytearray()
async for audio in tts.run_tts(text):
all_audio.extend(audio)
return all_audio
async def get_month_data(month):
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
image_description = await llm.run_llm(messages)
if not image_description:
return
to_speak = f"{month}: {image_description}"
audio_task = asyncio.create_task(get_all_audio(to_speak))
image_task = asyncio.create_task(dalle.run_image_gen(image_description))
(audio, image_data) = await asyncio.gather(
audio_task, image_task
)
return {
"month": month,
"text": image_description,
"image_url": image_data[0],
"image": image_data[1],
"audio": audio,
}
months: list[str] = [
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
]
async def show_images():
# This will play the months in the order they're completed. The benefit
# is we'll have as little delay as possible before the first month, and
# likely no delay between months, but the months won't display in order.
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
if data:
await transport.send_queue.put(
[
ImageFrame(data["image_url"], data["image"]),
AudioFrame(data["audio"]),
]
)
await asyncio.sleep(25)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
await asyncio.gather(transport.run(), show_images(), run_tk())
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))

View File

@@ -0,0 +1,67 @@
import asyncio
import os
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.ai_services import FrameLogger
from dailyai.pipeline.aggregators import LLMAssistantContextAggregator, LLMUserContextAggregator
from examples.foundational.support.runner import configure
async def main(room_url: str, token):
transport = DailyTransportService(
room_url,
token,
"Respond bot",
duration_minutes=5,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
vad_enabled=True
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"))
fl = FrameLogger("Inner")
fl2 = FrameLogger("Outer")
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts.say("Hi, I'm listening!", transport.send_queue)
async def handle_transcriptions():
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(messages, transport._my_participant_id)
pipeline = Pipeline(
processors=[
fl,
tma_in,
llm,
fl2,
tma_out,
tts
],
)
await transport.run_uninterruptible_pipeline(pipeline)
transport.transcription_settings["extra"]["endpointing"] = True
transport.transcription_settings["extra"]["punctuate"] = True
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,115 @@
import argparse
import asyncio
import os
from typing import AsyncGenerator
import aiohttp
import requests
import time
import urllib.parse
from PIL import Image
from dailyai.pipeline.frames import ImageFrame, Frame
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.ai_services import AIService
from dailyai.pipeline.aggregators import LLMAssistantContextAggregator, LLMUserContextAggregator
from dailyai.services.fal_ai_services import FalImageGenService
from examples.foundational.support.runner import configure
class ImageSyncAggregator(AIService):
def __init__(self, speaking_path: str, waiting_path: str):
self._speaking_image = Image.open(speaking_path)
self._speaking_image_bytes = self._speaking_image.tobytes()
self._waiting_image = Image.open(waiting_path)
self._waiting_image_bytes = self._waiting_image.tobytes()
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
yield ImageFrame(None, self._speaking_image_bytes)
yield frame
yield ImageFrame(None, self._waiting_image_bytes)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
room_url,
token,
"Respond bot",
5,
)
transport._camera_enabled = True
transport._camera_width = 1024
transport._camera_height = 1024
transport._mic_enabled = True
transport._mic_sample_rate = 16000
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"))
img = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"))
async def get_images():
get_speaking_task = asyncio.create_task(
img.run_image_gen("An image of a cat speaking")
)
get_waiting_task = asyncio.create_task(
img.run_image_gen("An image of a cat waiting")
)
(speaking_data, waiting_data) = await asyncio.gather(
get_speaking_task, get_waiting_task
)
return speaking_data, waiting_data
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts.say("Hi, I'm listening!", transport.send_queue)
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id
)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
await tts.run_to_queue(
transport.send_queue,
image_sync_aggregator.run(
tma_out.run(
llm.run(
tma_in.run(
transport.get_receive_frames()
)
)
)
)
)
transport.transcription_settings["extra"]["punctuate"] = True
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,63 @@
import asyncio
import aiohttp
import os
from dailyai.pipeline.aggregators import LLMAssistantContextAggregator, LLMResponseAggregator, LLMUserContextAggregator, UserResponseAggregator
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.ai_services import FrameLogger
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from support.runner import configure
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
room_url,
token,
"Respond bot",
duration_minutes=5,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
vad_enabled=True,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"))
pipeline = Pipeline([FrameLogger(), llm, FrameLogger(), tts])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts.say("Hi, I'm listening!", transport.send_queue)
async def run_conversation():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
]
await transport.run_interruptible_pipeline(
pipeline,
post_processor=LLMResponseAggregator(
messages
),
pre_processor=UserResponseAggregator(
messages
),
)
transport.transcription_settings["extra"]["punctuate"] = False
await asyncio.gather(transport.run(), run_conversation())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,115 @@
import aiohttp
import asyncio
import os
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.pipeline.frames import AudioFrame, ImageFrame
from examples.foundational.support.runner import configure
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
room_url,
None,
"Respond bot",
duration_minutes=10,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=1024,
camera_height=1024
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts1 = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"))
tts2 = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl")
dalle = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"))
bot1_messages = [
{"role": "system", "content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long."},
]
bot2_messages = [
{
"role": "system",
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich."},
]
async def get_bot1_statement():
# Run the LLMs synchronously for the back-and-forth
bot1_msg = await llm.run_llm(bot1_messages)
print(f"bot1_msg: {bot1_msg}")
if bot1_msg:
bot1_messages.append({"role": "assistant", "content": bot1_msg})
bot2_messages.append({"role": "user", "content": bot1_msg})
all_audio = bytearray()
async for audio in tts1.run_tts(bot1_msg):
all_audio.extend(audio)
return all_audio
async def get_bot2_statement():
# Run the LLMs synchronously for the back-and-forth
bot2_msg = await llm.run_llm(bot2_messages)
print(f"bot2_msg: {bot2_msg}")
if bot2_msg:
bot2_messages.append({"role": "assistant", "content": bot2_msg})
bot1_messages.append({"role": "user", "content": bot2_msg})
all_audio = bytearray()
async for audio in tts2.run_tts(bot2_msg):
all_audio.extend(audio)
return all_audio
async def argue():
for i in range(100):
print(f"In iteration {i}")
bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
(audio1, image_data1) = await asyncio.gather(
get_bot1_statement(), dalle.run_image_gen(bot1_description)
)
await transport.send_queue.put(
[
ImageFrame(None, image_data1[1]),
AudioFrame(audio1),
]
)
bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
(audio2, image_data2) = await asyncio.gather(
get_bot2_statement(), dalle.run_image_gen(bot2_description)
)
await transport.send_queue.put(
[
ImageFrame(None, image_data2[1]),
AudioFrame(audio2),
]
)
await asyncio.gather(transport.run(), argue())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,179 @@
import aiohttp
import asyncio
import os
import random
from typing import AsyncGenerator
from PIL import Image
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import LLMUserContextAggregator, LLMAssistantContextAggregator
from dailyai.pipeline.frames import (
Frame,
TextFrame,
ImageFrame,
SpriteFrame,
TranscriptionQueueFrame,
)
from dailyai.services.ai_services import AIService
from examples.foundational.support.runner import configure
sprites = {}
image_files = [
'sc-default.png',
'sc-talk.png',
'sc-listen-1.png',
'sc-think-1.png',
'sc-think-2.png',
'sc-think-3.png',
'sc-think-4.png'
]
script_dir = os.path.dirname(__file__)
for file in image_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites[file] = img.tobytes()
# When the bot isn't talking, show a static image of the cat listening
quiet_frame = ImageFrame("", sprites["sc-listen-1.png"])
# When the bot is talking, build an animation from two sprites
talking_list = [sprites['sc-default.png'], sprites['sc-talk.png']]
talking = [random.choice(talking_list) for x in range(30)]
talking_frame = SpriteFrame(images=talking)
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM is processing
thinking_list = [
sprites['sc-think-1.png'],
sprites['sc-think-2.png'],
sprites['sc-think-3.png'],
sprites['sc-think-4.png']]
thinking_frame = SpriteFrame(images=thinking_list)
class TranscriptFilter(AIService):
def __init__(self, bot_participant_id=None):
self.bot_participant_id = bot_participant_id
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TranscriptionQueueFrame):
if frame.participantId != self.bot_participant_id:
yield frame
class NameCheckFilter(AIService):
def __init__(self, names: list[str]):
self.names = names
self.sentence = ""
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
content: str = ""
# TODO: split up transcription by participant
if isinstance(frame, TextFrame):
content = frame.text
self.sentence += content
if self.sentence.endswith((".", "?", "!")):
if any(name in self.sentence for name in self.names):
out = self.sentence
self.sentence = ""
yield TextFrame(out)
else:
out = self.sentence
self.sentence = ""
class ImageSyncAggregator(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
yield talking_frame
yield frame
yield quiet_frame
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
room_url,
token,
"Santa Cat",
duration_minutes=3,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=720,
camera_height=1280
)
transport._mic_enabled = True
transport._mic_sample_rate = 16000
transport._camera_enabled = True
transport._camera_width = 720
transport._camera_height = 1280
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl")
isa = ImageSyncAggregator()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts.say("Hi! If you want to talk to me, just say 'hey Santa Cat'.", transport.send_queue)
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long."},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id
)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
tf = TranscriptFilter(transport._my_participant_id)
ncf = NameCheckFilter(["Santa Cat", "Santa"])
await tts.run_to_queue(
transport.send_queue,
isa.run(
tma_out.run(
llm.run(
tma_in.run(
ncf.run(
tf.run(
transport.get_receive_frames()
)
)
)
)
)
)
)
async def starting_image():
await transport.send_queue.put(quiet_frame)
transport.transcription_settings["extra"]["punctuate"] = True
await asyncio.gather(transport.run(), handle_transcriptions(), starting_image())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,131 @@
import aiohttp
import asyncio
import logging
import os
import wave
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import LLMContextAggregator, LLMUserContextAggregator, LLMAssistantContextAggregator
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesQueueFrame
from typing import AsyncGenerator
from examples.foundational.support.runner import configure
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s") # or whatever
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
sounds = {}
sound_files = [
'ding1.wav',
'ding2.wav'
]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = audio_file.readframes(-1)
class OutboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMResponseEndFrame):
yield AudioFrame(sounds["ding1.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
class InboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesQueueFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
room_url,
token,
"Respond bot",
duration_minutes=5,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV")
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts.say("Hi, I'm listening!", transport.send_queue)
await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id
)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
await out_sound.run_to_queue(
transport.send_queue,
tts.run(
fl.run(
tma_out.run(
llm.run(
fl2.run(
in_sound.run(
tma_in.run(
transport.get_receive_frames()
)
)
)
)
)
)
)
)
transport.transcription_settings["extra"]["punctuate"] = True
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,39 @@
import asyncio
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.whisper_ai_services import WhisperSTTService
from examples.foundational.support.runner import configure
async def main(room_url: str):
transport = DailyTransportService(
room_url,
None,
"Transcription bot",
start_transcription=True,
mic_enabled=False,
camera_enabled=False,
speaker_enabled=True
)
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while True:
item = await transcription_output_queue.get()
print(item.text)
async def handle_speaker():
await stt.run_to_queue(
transcription_output_queue,
transport.get_receive_frames()
)
await asyncio.gather(transport.run(), handle_speaker(), handle_transcription())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,59 @@
import argparse
import asyncio
import wave
from dailyai.pipeline.frames import EndFrame, TranscriptionQueueFrame
from dailyai.services.local_transport_service import LocalTransportService
from dailyai.services.whisper_ai_services import WhisperSTTService
async def main(room_url: str):
global transport
global stt
meeting_duration_minutes = 1
transport = LocalTransportService(
mic_enabled=True,
camera_enabled=False,
speaker_enabled=True,
duration_minutes=meeting_duration_minutes,
start_transcription=True
)
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
transport_done = asyncio.Event()
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while not transport_done.is_set():
item = await transcription_output_queue.get()
print("got item from queue", item)
if isinstance(item, TranscriptionQueueFrame):
print(item.text)
elif isinstance(item, EndFrame):
break
print("handle_transcription done")
async def handle_speaker():
await stt.run_to_queue(
transcription_output_queue, transport.get_receive_frames()
)
await transcription_output_queue.put(EndFrame())
print("handle speaker done.")
async def run_until_done():
await transport.run()
transport_done.set()
print("run_until_done done")
await asyncio.gather(run_until_done(), handle_speaker(), handle_transcription())
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))

View File

@@ -0,0 +1,376 @@
import aiohttp
import asyncio
import json
import random
import os
import re
import wave
from typing import AsyncGenerator
from PIL import Image
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import LLMAssistantContextAggregator, LLMContextAggregator, LLMUserContextAggregator, UserResponseAggregator, LLMResponseAggregator
from support.runner import configure
from dailyai.pipeline.frames import LLMMessagesQueueFrame, TranscriptionQueueFrame, Frame, TextFrame, LLMFunctionCallFrame, LLMResponseEndFrame, StartFrame, AudioFrame, SpriteFrame, ImageFrame
from dailyai.services.ai_services import FrameLogger, AIService
import logging
logging.basicConfig(level=logging.DEBUG)
sounds = {}
sound_files = [
'clack-short.wav',
'clack.wav',
'clack-short-quiet.wav'
]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = audio_file.readframes(-1)
steps = [
{
"prompt": "Start by introducing yourself. Then, ask the user to confirm their identity by telling you their birthday, including the year. When they answer with their birthday, call the verify_birthday function.",
"run_async": False,
"failed": "The user provided an incorrect birthday. Ask them for their birthday again. When they answer, call the verify_birthday function.", "tools": [{
"type": "function",
"function": {
"name": "verify_birthday",
"description": "Use this function to verify the user has provided their correct birthday.",
"parameters": {
"type": "object",
"properties": {
"birthday": {
"type": "string",
"description": "The user's birthdate, including the year. The user can provide it in any format, but convert it to YYYY-MM-DD format to call this function."
}
}
}
}
}]},
{
"prompt": "Next, thank the user for confirming their identity, then ask the user to list their current prescriptions. Each prescription needs to have a medication name and a dosage. Do not call the list_prescriptions function with any unknown dosages.",
"run_async": True,
"tools": [{
"type": "function",
"function": {
"name": "list_prescriptions",
"description": "Once the user has provided a list of their prescription medications, call this function.",
"parameters": {
"type": "object",
"properties": {
"prescriptions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"medication": {
"type": "string",
"description": "The medication's name"
},
"dosage": {
"type": "string",
"description": "The prescription's dosage"
}
}
}
}
}
}
}
}]
},
{
"prompt": "Next, ask the user if they have any allergies. Once they have listed their allergies or confirmed they don't have any, call the list_allergies function.",
"run_async": True,
"tools": [
{
"type": "function",
"function": {
"name": "list_allergies",
"description": "Once the user has provided a list of their allergies, call this function.",
"parameters": {
"type": "object",
"properties": {
"allergies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "What the user is allergic to"
}
}
}
}
}
}
}
}
]
},
{
"prompt": "Now ask the user if they have any medical conditions the doctor should know about. Once they've answered the question, call the list_conditions function.",
"run_async": True,
"tools": [
{
"type": "function",
"function": {
"name": "list_conditions",
"description": "Once the user has provided a list of their medical conditions, call this function.",
"parameters": {
"type": "object",
"properties": {
"conditions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The user's medical condition"
}
}
}
}
}
}
}
},
],
},
{
"prompt": "Finally, ask the user the reason for their doctor visit today. Once they answer, call the list_visit_reasons function.",
"run_async": True,
"tools": [
{
"type": "function",
"function": {
"name": "list_visit_reasons",
"description": "Once the user has provided a list of the reasons they are visiting a doctor today, call this function.",
"parameters": {
"type": "object",
"properties": {
"visit_reasons": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The user's reason for visiting the doctor"
}
}
}
}
}
}
}
}
]
},
{"prompt": "Now, thank the user and end the conversation.",
"run_async": True, "tools": []},
{"prompt": "", "run_async": True, "tools": []}
]
current_step = 0
class TranscriptFilter(AIService):
def __init__(self, bot_participant_id=None):
super().__init__()
self.bot_participant_id = bot_participant_id
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TranscriptionQueueFrame):
if frame.participantId != self.bot_participant_id:
yield frame
class ChecklistProcessor(AIService):
def __init__(self, messages, llm, tools, *args, **kwargs):
super().__init__(*args, **kwargs)
self._messages = messages
self._llm = llm
self._tools = tools
self._function_name = ""
self._arguments = ""
self._id = "You are Jessica, an agent for a company called Tri-County Health Services. Your job is to collect important information from the user before their doctor visit. You're talking to Chad Bailey. You should address the user by their first name and be polite and professional. You're not a medical professional, so you shouldn't provide any advice. Keep your responses short. Your job is to collect information to give to a doctor. Don't make assumptions about what values to plug into functions. Ask for clarification if a user response is ambiguous."
self._acks = ["One sec.", "Let me confirm that.", "Thanks.", "OK."]
messages.append(
{"role": "system", "content": f"{self._id} {steps[0]['prompt']}"})
def verify_birthday(self, args):
return args['birthday'] == "1983-01-01"
def list_prescriptions(self, args):
# print(f"--- Prescriptions: {args['prescriptions']}\n")
pass
def list_allergies(self, args):
# print(f"--- Allergies: {args['allergies']}\n")
pass
def list_conditions(self, args):
# print(f"--- Medical Conditions: {args['conditions']}")
pass
def list_visit_reasons(self, args):
# print(f"Visit Reasons: {args['visit_reasons']}")
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
global current_step
this_step = steps[current_step]
# TODO-CB: forcing a global here :/
self._tools.clear()
self._tools.extend(this_step['tools'])
if isinstance(frame, LLMFunctionCallFrame) and frame.function_name:
print(f"... Preparing function call: {frame.function_name}")
self._function_name = frame.function_name
if this_step['run_async']:
# Get the LLM talking about the next step before getting the rest
# of the function call completion
current_step += 1
# yield TextFrame(f"We should move on to Step {current_step}.")
self._messages.append({
"role": "system", "content": steps[current_step]['prompt']})
# yield LLMMessagesQueueFrame(self._messages)
yield LLMMessagesQueueFrame(self._messages)
async for frame in llm.process_frame(LLMMessagesQueueFrame(self._messages), tool_choice="none"):
yield frame
else:
# Insert a quick response while we run the function
# yield AudioFrame(sounds["clack-short-quiet.wav"])
pass
elif isinstance(frame, LLMFunctionCallFrame) and frame.arguments:
self._arguments += frame.arguments
elif isinstance(frame, LLMResponseEndFrame):
if self._function_name and self._arguments:
print(
f"--> Calling function: {self._function_name} with arguments:")
pretty_json = re.sub("\n", "\n ", json.dumps(
json.loads(self._arguments), indent=2))
print(f"--> {pretty_json}\n")
fn = getattr(self, self._function_name)
result = fn(json.loads(self._arguments))
self._function_name = ""
self._arguments = ""
if not this_step['run_async']:
if result:
current_step += 1
# yield TextFrame(f"We should move on to Step {current_step}.")
self._messages.append({
"role": "system", "content": steps[current_step]['prompt']})
# yield LLMMessagesQueueFrame(self._messages)
yield LLMMessagesQueueFrame(self._messages)
async for frame in llm.process_frame(LLMMessagesQueueFrame(self._messages), tool_choice="none"):
yield frame
else:
self._messages.append({
"role": "system", "content": this_step['failed']})
# yield LLMMessagesQueueFrame(self._messages)
yield LLMMessagesQueueFrame(self._messages)
async for frame in llm.process_frame(LLMMessagesQueueFrame(self._messages), tool_choice="none"):
yield frame
print(f"<-- Verify result: {result}\n")
else:
yield frame
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
global transport
global llm
global tts
transport = DailyTransportService(
room_url,
token,
"Intake Bot",
5,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
start_transcription=True,
vad_enabled=True
)
# TODO-CB: Go back to vad_enabled
messages = []
tools = []
# llm = AzureLLMService(api_key=os.getenv("AZURE_CHATGPT_API_KEY"), endpoint=os.getenv(
# "AZURE_CHATGPT_ENDPOINT"), model=os.getenv("AZURE_CHATGPT_MODEL"))
llm = OpenAILLMService(api_key=os.getenv(
"OPENAI_CHATGPT_API_KEY"), model="gpt-4-1106-preview", tools=tools) # gpt-4-1106-preview
# tts = AzureTTSService(api_key=os.getenv(
# "AZURE_SPEECH_API_KEY"), region=os.getenv("AZURE_SPEECH_REGION"))
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv(
"ELEVENLABS_API_KEY"), voice_id="XrExE9yKIg1WjnnlVkGX") # matilda
# tts = DeepgramTTSService(aiohttp_session=session, api_key=os.getenv(
# "DEEPGRAM_API_KEY"), voice="aura-asteria-en")
# lca = LLMContextAggregator(
# messages=messages, bot_participant_id=transport._my_participant_id)
checklist = ChecklistProcessor(messages, llm, tools)
fl = FrameLogger("FRAME LOGGER 1:")
fl2 = FrameLogger("FRAME LOGGER 2:")
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
fl = FrameLogger("first other participant")
# TODO-CB: Make sure this message gets into the context somehow
await tts.run_to_queue(
transport.send_queue,
llm.run([LLMMessagesQueueFrame(messages)]),
)
async def handle_intake():
pipeline = Pipeline(
processors=[
fl,
llm,
fl2,
checklist,
tts
]
)
await transport.run_interruptible_pipeline(pipeline,
post_processor=LLMResponseAggregator(
messages
),
pre_processor=UserResponseAggregator(messages)
)
transport.transcription_settings["extra"]["endpointing"] = True
transport.transcription_settings["extra"]["punctuate"] = True
try:
await asyncio.gather(transport.run(), handle_intake())
except (asyncio.CancelledError, KeyboardInterrupt):
print('whoops')
transport.stop()
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 870 KiB

After

Width:  |  Height:  |  Size: 870 KiB

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 871 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 872 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 868 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

View File

@@ -0,0 +1,53 @@
import argparse
import os
import time
import urllib
import requests
from dotenv import load_dotenv
load_dotenv()
def configure():
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
if not key:
raise Exception("No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
# Create a meeting token for the given room with an expiration 1 hour in the future.
room_name: str = urllib.parse.urlparse(url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {key}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
)
if res.status_code != 200:
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
return (url, token)

View File

@@ -7,11 +7,12 @@ import random
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.pipeline.frames import Frame, FrameType
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
async def main(room_url:str, token):
async def main(room_url: str, token):
global transport
global llm
global tts
@@ -22,30 +23,29 @@ async def main(room_url:str, token):
"Imagebot",
1,
)
transport.mic_enabled = True
transport.camera_enabled = True
transport.mic_sample_rate = 16000
transport.camera_width = 1024
transport.camera_height = 1024
transport._mic_enabled = True
transport._camera_enabled = True
transport._mic_sample_rate = 16000
transport._camera_width = 1024
transport._camera_height = 1024
llm = AzureLLMService()
tts = AzureTTSService()
img = FalImageGenService()
async def handle_transcriptions():
print("handle_transcriptions got called")
sentence = ""
async for message in transport.get_transcriptions():
print(f"transcription message: {message}")
if message["session_id"] == transport.my_participant_id:
if message["session_id"] == transport._my_participant_id:
continue
finder = message["text"].find("start over")
finder = message["text"].find("start over")
print(f"finder: {finder}")
if finder >= 0:
async for audio in tts.run_tts(f"Resetting."):
transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
sentence = ""
continue
# todo: we could differentiate between transcriptions from different participants
@@ -54,12 +54,12 @@ async def main(room_url:str, token):
# TODO: Cache this audio
phrase = random.choice(["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
async for audio in tts.run_tts(phrase):
transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
img_result = img.run_image_gen(sentence, "1024x1024")
awaited_img = await asyncio.gather(img_result)
transport.output_queue.put(
[
QueueFrame(FrameType.IMAGE_FRAME, awaited_img[0][1]),
Frame(FrameType.IMAGE_FRAME, awaited_img[0][1]),
]
)
@@ -69,9 +69,10 @@ async def main(room_url:str, token):
if participant["info"]["isLocal"]:
return
async for audio in tts.run_tts("Describe an image, and I'll create it."):
audio_generator = tts.run_tts(f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
audio_generator = tts.run_tts(
f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
async for audio in audio_generator:
transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
transport.transcription_settings["extra"]["punctuate"] = False
transport.transcription_settings["extra"]["endpointing"] = False

View File

@@ -0,0 +1,134 @@
import aiohttp
import asyncio
import os
import wave
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.aggregators import LLMContextAggregator
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesQueueFrame
from typing import AsyncGenerator
from examples.foundational.support.runner import configure
sounds = {}
sound_files = [
'ding1.wav',
'ding2.wav'
]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = audio_file.readframes(-1)
class OutboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMResponseEndFrame):
yield AudioFrame(sounds["ding1.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
class InboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesQueueFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
async def main(room_url: str, token, phone):
async with aiohttp.ClientSession() as session:
global transport
global llm
global tts
transport = DailyTransportService(
room_url,
token,
"Respond bot",
300,
)
transport._mic_enabled = True
transport._mic_sample_rate = 16000
transport._camera_enabled = False
llm = AzureLLMService()
tts = AzureTTSService()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts.say("Hi, I'm listening!", transport.send_queue)
await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
]
tma_in = LLMContextAggregator(
messages, "user", transport._my_participant_id
)
tma_out = LLMContextAggregator(
messages, "assistant", transport._my_participant_id
)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
await out_sound.run_to_queue(
transport.send_queue,
tts.run(
tma_out.run(
llm.run(
fl2.run(
in_sound.run(
tma_in.run(
transport.get_receive_frames()
)
)
)
)
)
)
)
@transport.event_handler("on_participant_joined")
async def pax_joined(transport, pax):
print(f"PARTICIPANT JOINED: {pax}")
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if (state == "joined"):
if (phone):
transport.start_recording()
transport.dialout(phone)
transport.transcription_settings["extra"]["punctuate"] = True
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,39 @@
# setup
FROM python:3.11.5
WORKDIR /app
COPY requirements.txt /app
COPY *.py /app
COPY pyproject.toml /app
COPY src/ /app/src/
WORKDIR /app
RUN ls --recursive /app/
RUN pip3 install --upgrade -r requirements.txt
RUN python -m build .
RUN pip3 install .
# If running on Ubuntu, Azure TTS requires some extra config
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
RUN wget -O - https://www.openssl.org/source/openssl-1.1.1w.tar.gz | tar zxf -
WORKDIR openssl-1.1.1w
RUN ./config --prefix=/usr/local
RUN make -j $(nproc)
RUN make install_sw install_ssldirs
RUN ldconfig -v
ENV SSL_CERT_DIR=/etc/ssl/certs
#ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
RUN apt clean
RUN apt-get update
RUN apt-get -y install build-essential libssl-dev ca-certificates libasound2 wget
ENV PYTHONUNBUFFERED=1
WORKDIR /app
EXPOSE 8000
# run
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]

View File

@@ -0,0 +1,13 @@
# Server Example
This is an example server based on [Santa Cat](https://santacat.ai). You can run the server with this command:
```
flask --app daily-bot-manager.py --debug run
```
Once the server is started, you can load `http://127.0.0.1:5000/spin-up-kitty` in a browser, and the server will do the following:
- Create a new, randomly-named Daily room with `DAILY_API_KEY` from your .env file or environment
- Start the `10-wake-word.py` example and connect it to that room
- 301 redirect your browser to the room

View File

@@ -0,0 +1,33 @@
import time
import urllib
from dotenv import load_dotenv
import requests
from flask import jsonify
import os
load_dotenv()
def get_meeting_token(room_name, daily_api_key, token_expiry):
api_path = os.getenv('DAILY_API_PATH') or 'https://api.daily.co/v1'
if not token_expiry:
token_expiry = time.time() + 600
res = requests.post(
f'{api_path}/meeting-tokens',
headers={
'Authorization': f'Bearer {daily_api_key}'},
json={
'properties': {
'room_name': room_name,
'is_owner': True,
'exp': token_expiry}})
if res.status_code != 200:
return jsonify({'error': 'Unable to create meeting token', 'detail': res.text}), 500
meeting_token = res.json()['token']
return meeting_token
def get_room_name(room_url):
return urllib.parse.urlparse(room_url).path[1:]

View File

@@ -0,0 +1,100 @@
import os
import requests
import subprocess
import time
from flask import Flask, jsonify, request, redirect
from flask_cors import CORS
from examples.server.auth import get_meeting_token
from dotenv import load_dotenv
load_dotenv()
app = Flask(__name__)
CORS(app)
print(f"I loaded an environment, and my FAL_KEY_ID is {os.getenv('FAL_KEY_ID')}")
def start_bot(bot_path, args=None):
daily_api_key = os.getenv("DAILY_API_KEY")
api_path = os.getenv("DAILY_API_PATH") or "https://api.daily.co/v1"
timeout = int(os.getenv("DAILY_ROOM_TIMEOUT") or os.getenv("DAILY_BOT_MAX_DURATION") or 300)
exp = time.time() + timeout
res = requests.post(
f"{api_path}/rooms",
headers={"Authorization": f"Bearer {daily_api_key}"},
json={
"properties": {
"exp": exp,
"enable_chat": True,
"enable_emoji_reactions": True,
"eject_at_room_exp": True,
"enable_prejoin_ui": False,
"enable_recording": "cloud"
}
},
)
if res.status_code != 200:
return (
jsonify(
{
"error": "Unable to create room",
"status_code": res.status_code,
"text": res.text,
}
),
500,
)
room_url = res.json()["url"]
room_name = res.json()["name"]
meeting_token = get_meeting_token(room_name, daily_api_key, exp)
if args:
extra_args = " ".join([f'-{x[0]} "{x[1]}"' for x in args])
else:
extra_args = ""
proc = subprocess.Popen(
[
f"python {bot_path} -u {room_url} -t {meeting_token} -k {daily_api_key} {extra_args}"
],
shell=True,
bufsize=1,
)
# Don't return until the bot has joined the room, but wait for at most 2 seconds.
attempts = 0
while attempts < 20:
time.sleep(0.1)
attempts += 1
res = requests.get(
f"{api_path}/rooms/{room_name}/get-session-data",
headers={"Authorization": f"Bearer {daily_api_key}"},
)
if res.status_code == 200:
break
print(f"Took {attempts} attempts to join room {room_name}")
# Additional client config
config = {}
if os.getenv("CLIENT_VAD_TIMEOUT_SEC"):
config['vad_timeout_sec'] = float(os.getenv("DAILY_CLIENT_VAD_TIMEOUT_SEC"))
else:
config['vad_timeout_sec'] = 1.5
# return jsonify({"room_url": room_url, "token": meeting_token, "config": config}), 200
return redirect(room_url, code=301)
@app.route("/spin-up-kitty", methods=["GET", "POST"])
def spin_up_kitty():
return start_bot("./src/examples/foundational/10-wake-word.py")
@app.route("/healthz")
def health_check():
return "ok", 200

View File

@@ -1 +0,0 @@
These samples need to be updated! Don't rely on them.

View File

@@ -1,91 +0,0 @@
import argparse
from email.mime import image
from re import A
import requests
import time
import urllib.parse
from dailyai.async_processor.async_processor import (
LLMResponse,
ConversationProcessorCollection,
)
from dailyai.orchestrator import OrchestratorConfig, Orchestrator
from dailyai.message_handler.message_handler import MessageHandler
from dailyai.services.ai_services import AIServiceConfig
from dailyai.services.azure_ai_services import AzureImageGenService, AzureTTSService, AzureLLMService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
def add_bot_to_room(room_url, token, expiration) -> None:
# A simple prompt for a simple sample.
message_handler = MessageHandler(
"""
You are a sample bot in a WebRTC session. You'll receive input as transcriptions of user's
speech, and your responses will be converted to audio via a TTS service.
Answer user's questions and be friendly, and if you can, give some ideas about how someone
could use a bot like you in a more in-depth way. Because your responses will be spoken,
try to keep them short.
"""
)
# Use Azure services for the TTS, image generation, and LLM.
# Note that you'll need to set the following environment variables:
# - AZURE_SPEECH_SERVICE_KEY
# - AZURE_SPEECH_SERVICE_REGION
# - AZURE_CHATGPT_KEY
# - AZURE_CHATGPT_ENDPOINT
# - AZURE_CHATGPT_DEPLOYMENT_ID
services = AIServiceConfig(
tts=AzureTTSService(), image=None, llm=AzureLLMService()
)
orchestrator_config = OrchestratorConfig(
room_url=room_url,
token=token,
bot_name="Simple Bot",
expiration=expiration,
)
orchestrator = Orchestrator(
orchestrator_config,
services,
message_handler,
)
orchestrator.start()
# When the orchestrator's done, we need to shut it down,
# and the various services and handlers we've created.
orchestrator.stop()
message_handler.shutdown()
services.tts.close()
services.llm.close()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room")
parser.add_argument(
"-k", "--apikey", type=str, required=True, help="Daily API Key (needed to create token)"
)
args: argparse.Namespace = parser.parse_args()
# Create a meeting token for the given room with an expiration 1 hour in the future.
room_name: str = urllib.parse.urlparse(args.url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {args.apikey}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
)
if res.status_code != 200:
raise Exception(f'Failed to create meeting token: {res.status_code} {res.text}')
token: str = res.json()['token']
add_bot_to_room(args.url, token, expiration)

View File

@@ -1,172 +0,0 @@
import argparse
from email.mime import image
import logging
import os
import random
import requests
import time
import urllib.parse
from PIL import Image
from dailyai.async_processor.async_processor import (
ConversationProcessorCollection,
LLMResponse,
OrchestratorResponse
)
from dailyai.orchestrator import OrchestratorConfig, Orchestrator
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.message_handler.message_handler import MessageHandler
from dailyai.services.ai_services import AIServiceConfig
from dailyai.services.azure_ai_services import AzureImageGenService, AzureTTSService, AzureLLMService
class StaticSpriteResponse(OrchestratorResponse):
def __init__(
self,
services,
message_handler,
output_queue
) -> None:
super().__init__(services, message_handler, output_queue)
self.image_bytes:bytes | None = None
self.filenames = None # override this in subclasses
def start_preparation(self) -> None:
full_path = os.path.join(os.path.dirname(__file__), "sprites/", self.filename)
print(full_path)
with Image.open(full_path) as img:
self.image_bytes = img.tobytes()
def do_play(self) -> None:
self.output_queue.put(QueueFrame(FrameType.IMAGE, self.image_bytes))
class IntroSpriteResponse(StaticSpriteResponse):
def __init__(self, services, message_handler, output_queue) -> None:
super().__init__(services, message_handler, output_queue)
self.filename = "intro.png"
class WaitingSpriteResponse(StaticSpriteResponse):
def __init__(self, services, message_handler, output_queue) -> None:
super().__init__(services, message_handler, output_queue)
self.filename = "waiting.png"
class AnimatedSpriteLLMResponse(LLMResponse):
def __init__(self, services, message_handler, output_queue) -> None:
super().__init__(services, message_handler, output_queue)
self.filenames = ["talk-1.png", "talk-2.png"]
self.image_bytes = []
def start_preparation(self) -> None:
super().start_preparation()
for filename in self.filenames:
full_path = os.path.join(os.path.dirname(__file__), "sprites/", filename)
print(full_path)
with Image.open(full_path) as img:
self.image_bytes.append(img.tobytes())
def get_frames_from_tts_response(self, audio_frame) -> list[QueueFrame]:
return [
QueueFrame(FrameType.AUDIO, audio_frame),
QueueFrame(FrameType.IMAGE, random.choice(self.image_bytes))
]
def add_bot_to_room(room_url, token, expiration) -> None:
# A simple prompt for a simple sample.
message_handler = MessageHandler(
"""
You are a sample bot in a WebRTC session. You'll receive input as transcriptions of user's
speech, and your responses will be converted to audio via a TTS service.
Answer user's questions and be friendly, and if you can, give some ideas about how someone
could use a bot like you in a more in-depth way. Because your responses will be spoken,
try to keep them short.
"""
)
# Use Azure services for the TTS, image generation, and LLM.
# Note that you'll need to set the following environment variables:
# - AZURE_SPEECH_SERVICE_KEY
# - AZURE_SPEECH_SERVICE_REGION
# - AZURE_CHATGPT_KEY
# - AZURE_CHATGPT_ENDPOINT
# - AZURE_CHATGPT_DEPLOYMENT_ID
#
# This demo doesn't use image generation, but if you extend it to do so,
# you'll also need to set:
# - AZURE_DALLE_KEY
# - AZURE_DALLE_ENDPOINT
# - AZURE_DALLE_DEPLOYMENT_ID
services = AIServiceConfig(
tts=AzureTTSService(), image=AzureImageGenService(), llm=AzureLLMService()
)
sprite_conversation_processors = ConversationProcessorCollection(
introduction=IntroSpriteResponse,
waiting=WaitingSpriteResponse,
response=AnimatedSpriteLLMResponse,
)
orchestrator_config = OrchestratorConfig(
room_url=room_url,
token=token,
bot_name="Simple Bot",
expiration=expiration,
)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger: logging.Logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
orchestrator = Orchestrator(
orchestrator_config,
services,
message_handler,
sprite_conversation_processors
)
orchestrator.start()
# When the orchestrator's done, we need to shut it down,
# and the various services and handlers we've created.
orchestrator.stop()
message_handler.shutdown()
services.tts.close()
services.image.close()
services.llm.close()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room")
parser.add_argument(
"-k", "--apikey", type=str, required=True, help="Daily API Key (needed to create token)"
)
args: argparse.Namespace = parser.parse_args()
# Create a meeting token for the given room with an expiration 1 hour in the future.
room_name: str = urllib.parse.urlparse(args.url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {args.apikey}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
)
if res.status_code != 200:
raise Exception(f'Failed to create meeting token: {res.status_code} {res.text}')
token: str = res.json()['token']
add_bot_to_room(args.url, token, expiration)

View File

@@ -1,54 +0,0 @@
import argparse
import asyncio
from typing import AsyncGenerator
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
async def main(room_url):
# create a transport service object using environment variables for
# the transport service's API key, room url, and any other configuration.
# services can all define and document the environment variables they use.
# services all also take an optional config object that is used instead of
# environment variables.
#
# the abstract transport service APIs presumably can map pretty closely
# to the daily-python basic API
meeting_duration_minutes = 1
transport = DailyTransportService(
room_url,
None,
"Say One Thing",
meeting_duration_minutes,
)
transport.mic_enabled = True
tts = ElevenLabsTTSService(voice_id="ErXwobaYiN019PkySvjV")
# Register an event handler so we can play the audio when the participant joins.
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
if participant["info"]["isLocal"]:
return
await tts.say(
"Hello there, " + participant["info"]["userName"] + "!",
transport.send_queue,
)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
await transport.run()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))

View File

@@ -1,55 +0,0 @@
import asyncio
import time
from typing import AsyncGenerator
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureTTSService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
async def main(room_url):
# create a transport service object using environment variables for
# the transport service's API key, room url, and any other configuration.
# services can all define and document the environment variables they use.
# services all also take an optional config object that is used instead of
# environment variables.
#
# the abstract transport service APIs presumably can map pretty closely
# to the daily-python basic API
meeting_duration_minutes = 1
transport = DailyTransportService(
room_url,
None,
"Greeter",
meeting_duration_minutes,
)
transport.mic_enabled = True
# similarly, create a tts service
tts = DeepgramTTSService()
# Get the generator for the audio. This will start running in the background,
# and when we ask the generator for its items, we'll get what it's generated.
# Register an event handler so we can play the audio when the participant joins.
print("settting up handler")
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
print(f"participant joined: {participant['info']['userName']}")
if participant["info"]["isLocal"]:
return
audio_generator: AsyncGenerator[bytes, None] = tts.run_tts(f"Hello there, {participant['info']['userName']}!")
async for audio in audio_generator:
transport.output_queue.put(QueueFrame(FrameType.AUDIO, audio))
print("setting up call state handler")
@transport.event_handler("on_call_state_updated")
async def on_call_joined(transport, state):
print(f"call state callback: {state}")
await transport.run()
if __name__ == "__main__":
asyncio.run(main("https://chad-hq.daily.co/howdy"))

View File

@@ -1,52 +0,0 @@
import argparse
import asyncio
from typing import AsyncGenerator
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.ai_services import SentenceAggregator
from dailyai.services.azure_ai_services import AzureLLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
async def main(room_url):
meeting_duration_minutes = 1
transport = DailyTransportService(
room_url,
None,
"Say One Thing From an LLM",
meeting_duration_minutes,
)
transport.mic_enabled = True
tts = ElevenLabsTTSService(voice_id="29vD33N1CtxCmqQRPOHJ")
llm = AzureLLMService()
messages = [{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world."
}]
tts_task = asyncio.create_task(
tts.run_to_queue(
transport.send_queue,
SentenceAggregator().run(
llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)])
)
)
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts_task
await transport.stop_when_done()
await transport.run()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))

View File

@@ -1,44 +0,0 @@
import argparse
import asyncio
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.open_ai_services import OpenAIImageGenService
local_joined = False
participant_joined = False
async def main(room_url):
meeting_duration_minutes = 1
transport = DailyTransportService(
room_url,
None,
"Show a still frame image",
meeting_duration_minutes,
)
transport.mic_enabled = False
transport.camera_enabled = True
transport.camera_width = 1024
transport.camera_height = 1024
imagegen = OpenAIImageGenService(image_size="1024x1024")
image_task = asyncio.create_task(
imagegen.run_to_queue(transport.send_queue, [QueueFrame(FrameType.IMAGE_DESCRIPTION, "a cat in the style of picasso")])
)
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
await image_task
await transport.run()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))

View File

@@ -1,79 +0,0 @@
import argparse
import asyncio
import re
from dailyai.services.ai_services import SentenceAggregator
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
async def main(room_url:str):
global transport
global llm
global tts
transport = DailyTransportService(
room_url,
None,
"Say Two Things Bot",
1,
)
transport.mic_enabled = True
transport.mic_sample_rate = 16000
transport.camera_enabled = False
llm = AzureLLMService()
azure_tts = AzureTTSService()
elevenlabs_tts = ElevenLabsTTSService(voice_id="ErXwobaYiN019PkySvjV")
messages = [{"role": "system", "content": "tell the user a joke about llamas"}]
# Start a task to run the LLM to create a joke, and convert the LLM output to audio frames. This task
# will run in parallel with generating and speaking the audio for static text, so there's no delay to
# speak the LLM response.
buffer_queue = asyncio.Queue()
llm_response_task = asyncio.create_task(
elevenlabs_tts.run_to_queue(
buffer_queue,
SentenceAggregator().run(
llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)])
),
True,
)
)
@transport.event_handler("on_participant_joined")
async def on_joined(transport, participant):
if participant["id"] == transport.my_participant_id:
return
await azure_tts.run_to_queue(
transport.send_queue,
[QueueFrame(FrameType.SENTENCE, "My friend the LLM is now going to tell a joke about llamas.")]
)
async def buffer_to_send_queue():
while True:
frame = await buffer_queue.get()
await transport.send_queue.put(frame)
buffer_queue.task_done()
if frame.frame_type == FrameType.END_STREAM:
break
await asyncio.gather(llm_response_task, buffer_to_send_queue())
await transport.stop_when_done()
await transport.run()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))

View File

@@ -1,108 +0,0 @@
import argparse
import asyncio
from asyncio.queues import Queue
import re
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.services.ai_services import SentenceAggregator
from dailyai.services.azure_ai_services import AzureLLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAIImageGenService
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.fal_ai_services import FalImageGenService
async def main(room_url):
meeting_duration_minutes = 5
transport = DailyTransportService(
room_url,
None,
"Month Narration Bot",
meeting_duration_minutes,
)
transport.mic_enabled = True
transport.camera_enabled = True
transport.mic_sample_rate = 16000
transport.camera_width = 1024
transport.camera_height = 1024
llm = AzureLLMService()
dalle = FalImageGenService(image_size="1024x1024")
tts = ElevenLabsTTSService(voice_id="ErXwobaYiN019PkySvjV")
# dalle = OpenAIImageGenService(image_size="1024x1024")
# Get a complete audio chunk from the given text. Splitting this into its own
# coroutine lets us ensure proper ordering of the audio chunks on the send queue.
async def get_all_audio(text):
all_audio = bytearray()
async for audio in tts.run_tts(text):
all_audio.extend(audio)
return all_audio
async def get_month_data(month):
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
image_description = await llm.run_llm(messages)
to_speak = f"{month}: {image_description}"
(audio, image_data) = await asyncio.gather(
get_all_audio(to_speak), dalle.run_image_gen(image_description)
)
return {
"month": month,
"text": image_description,
"image": image_data[1],
"audio": audio,
}
months: list[str] = [
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
]
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
# This will play the months in the order they're completed. The benefit
# is we'll have as little delay as possible before the first month, and
# likely no delay between months, but the months won't display in order.
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await transport.send_queue.put(
[
QueueFrame(FrameType.IMAGE, data["image"]),
QueueFrame(FrameType.AUDIO, data["audio"]),
]
)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
await transport.run()
if __name__=="__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))

View File

@@ -1,93 +0,0 @@
import argparse
import asyncio
import requests
import time
import urllib.parse
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.queue_frame import QueueFrame, FrameType
async def main(room_url:str, token):
global transport
global llm
global tts
transport = DailyTransportService(
room_url,
token,
"Respond bot",
1,
)
transport.mic_enabled = True
transport.mic_sample_rate = 16000
transport.camera_enabled = False
llm = AzureLLMService()
tts = AzureTTSService()
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
]
sentence = ""
async for frame in transport.get_receive_frames():
if frame.frame_type != FrameType.TRANSCRIPTION:
continue
message = frame.frame_data
if message["session_id"] == transport.my_participant_id:
continue
# todo: we could differentiate between transcriptions from different participants
sentence += message["text"]
if sentence.endswith((".", "?", "!")):
messages.append({"role": "user", "content": sentence})
sentence = ''
full_response = ""
async for response in llm.run_llm_async_sentences(messages):
full_response += response
async for audio in tts.run_tts(response):
await transport.send_queue.put(QueueFrame(FrameType.AUDIO, audio))
messages.append({"role": "assistant", "content": full_response})
transport.transcription_settings["extra"]["punctuate"] = True
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=True,
help="Daily API Key (needed to create token)",
)
args, unknown = parser.parse_known_args()
# Create a meeting token for the given room with an expiration 1 hour in the future.
room_name: str = urllib.parse.urlparse(args.url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {args.apikey}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
)
if res.status_code != 200:
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
asyncio.run(main(args.url, token))