Compare commits
80 Commits
transcript
...
cb/valoran
| Author | SHA1 | Date | |
|---|---|---|---|
| | 06b03dcc33 | | |
| | af4ab95713 | | |
| | a8d618ede1 | | |
| | d6108dae5c | | |
| | d90fdb1cae | | |
| | 1f9c5d132f | | |
| | f710aeae95 | | |
| | 20091d91c9 | | |
| | 3794e86868 | | |
| | 5a47a3d5cd | | |
| | 0cae54e79e | | |
| | 92ec5641d4 | | |
| | 53e97bd872 | | |
| | dcbd79333a | | |
| | 97a4cb8b7f | | |
| | cc7877f626 | | |
| | 1992b7e79e | | |
| | aee5087a46 | | |
| | 2516670874 | | |
| | 4fecc10808 | | |
| | 4548f91fdc | | |
| | 33ea1f9925 | | |
| | 08144fc560 | | |
| | 815aa2bc3e | | |
| | 560c98f2fa | | |
| | 0e0c992f59 | | |
| | fd5ff5fee5 | | |
| | d76139ac1a | | |
| | 444418d94c | | |
| | d27122e35e | | |
| | 237db19c40 | | |
| | 0ae83577c6 | | |
| | 5c402eee81 | | |
| | 80750fe022 | | |
| | ccfba04ea2 | | |
| | 5b8198cf9e | | |
| | 3fa00c4db8 | | |
| | 4ce36f8c63 | | |
| | 9620080cc5 | | |
| | ee1ce8f288 | | |
| | 70d07b6ea2 | | |
| | 9d5ad5675c | | |
| | 0d96f91cde | | |
| | 4e9586595d | | |
| | d0bcddfd70 | | |
| | 065a213ebb | | |
| | 7d6c94d604 | | |
| | 0859b57b00 | | |
| | 09838c9b1f | | |
| | c39920132c | | |
| | 860129a4be | | |
| | 4416f36ae9 | | |
| | 86af896150 | | |
| | 5cbac4701b | | |
| | 5d9aa530e2 | | |
| | d4c4d49035 | | |
| | e81f247845 | | |
| | 8baf137511 | | |
| | fcceb32bd7 | | |
| | ead655fe23 | | |
| | bab102f197 | | |
| | 95fc802607 | | |
| | 2886997693 | | |
| | 5fdda43bed | | |
| | f0d9b0613e | | |
| | a661905d7f | | |
| | c9c2e5f561 | | |
| | 795a339542 | | |
| | 31db156dfc | | |
| | 690cf2e47d | | |
| | ba89e41c5b | | |
| | c134598a77 | | |
| | b51abd2969 | | |
| | 3fda9b0ecb | | |
| | 95c92e5304 | | |
| | b443fbdb60 | | |
| | ccd2fa31e5 | | |
| | 9b65286216 | | |
| | 6ae733ebfe | | |
| | 1071dede1a | | |
1
.gitignore
vendored
@@ -2,6 +2,7 @@
env/
__pycache__/
*~
venv
#*#

# Distribution / packaging
24
LICENSE
Normal file
@@ -0,0 +1,24 @@
BSD 2-Clause License

Copyright (c) 2024, Daily

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
128
README.md
@@ -1,6 +1,19 @@
|
||||
# dailyai SDK
|
||||
# Daily AI SDK
|
||||
|
||||
This SDK can help you build applications that participate in WebRTC meetings and use various AI services to interact with other participants.
|
||||
Build conversational, multi-modal AI apps with real-time voice and video, like this:

_Demo Video to come_

With built-in support for many of the best AI platforms (or [add your own](/docs)):

- Azure - DALL-E, ChatGPT, and Azure AI Text-to-Speech
- Deepgram - Speech-to-text, and Aura text-to-speech
- Eleven Labs text-to-speech
- Fal.ai image generation
- OpenAI DALL-E and ChatGPT
- Whisper local speech-to-text

## Step 1: Get Started

## Build/Install

@@ -35,21 +48,112 @@ pip install path_to_this_repo
|
||||
You can run the simple sample like so:
|
||||
|
||||
```
|
||||
python src/samples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
|
||||
python src/examples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
|
||||
```
|
||||
## Overview
|
||||
|
||||
Note that the sample uses Azure's TTS and LLM services. You'll need to set the following environment variables for the sample to work:

The Daily AI SDK allows you to build applications that can participate in WebRTC sessions and interact with AI Services. Some examples of what you can build with this:

- conversational bots that interact 1:1 with a user, using voice recognition and text-to-speech
- assistant bots that aggregate transcriptions from multiple participants in a meeting and provide realtime summaries or other AI-generated output
- image-recognition bots
- etc

## Concepts

### Transport Service

The SDK provides one “transport service”, which is a wrapper around Daily’s `daily-python` client (tk add link). You can use this service to listen for events related to a WebRTC session, such as “a participant joined the meeting”.
The transport service also exposes a send queue and a receive queue. You can use the send queue to send audio and video to the WebRTC session, and you can listen to the receive queue to see audio, video and transcription data from the WebRTC session.

### AI Services

The AI Service classes provide wrappers around various AI providers, and allow you to query LLMs, convert text to speech and make images from text. The audio and images can then be placed on the transport service’s send queue, where they’ll be sent to the WebRTC session.

### Queue Frames

Communication between the transport service and AI services, and between various AI services, takes place in Queue Frames. These frames contain an indication of the type of data as well as the data itself.
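
To make the "type tag plus payload" idea concrete, here is a minimal sketch of what such a frame could look like. It is not the SDK's actual `QueueFrame` definition: `SketchFrame`, `SketchFrameType`, and `describe` are hypothetical names, and the frame kinds are only a subset named after those mentioned in this README.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any


class SketchFrameType(Enum):
    # Hypothetical subset of frame kinds, named after those mentioned in this README.
    LLM_MESSAGE = auto()
    TEXT = auto()
    AUDIO = auto()
    END_STREAM = auto()


@dataclass(frozen=True)
class SketchFrame:
    frame_type: SketchFrameType  # what kind of data this frame carries
    frame_data: Any = None       # the payload itself (text, bytes, message list, ...)


def describe(frame: SketchFrame) -> str:
    """Dispatch on the type tag, as a consumer of frames would."""
    if frame.frame_type is SketchFrameType.TEXT:
        return f"text: {frame.frame_data}"
    if frame.frame_type is SketchFrameType.AUDIO:
        return f"audio: {len(frame.frame_data)} bytes"
    if frame.frame_type is SketchFrameType.END_STREAM:
        return "end of stream"
    return f"other: {frame.frame_type.name}"
```

A consumer only needs to look at `frame_type` to decide what to do with `frame_data`, which is what lets differently typed data share one queue.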

## Using Transports, AI Services and Frames

AI Services all define a `.run` method. This method consumes and generates `QueueFrame` frames. The kinds of frames that can be consumed and generated depend on the kind of service. For instance, an LLM AI Service consumes `LLM_MESSAGE` frames (which define a history of interaction with an LLM) and emits `TEXT` frames (the response from the LLM).

The `.run` method returns an `AsyncIterable`, and it takes an `Iterable`, `AsyncIterable` or `asyncio.Queue` that produces QueueFrames as a parameter. This makes it easy to chain AI Services and to consume input from the Transport’s `receive_queue`.
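
The chaining idea can be sketched with plain async generators. This is an illustration of the pattern only, not the SDK's implementation: `as_async`, `fake_llm`, and `sentences` are hypothetical stand-ins, plain strings take the place of frames, and `asyncio.Queue` input is left out for brevity.

```python
import asyncio
import re
from typing import AsyncIterable, AsyncIterator, Iterable, Union


async def as_async(source: Union[Iterable[str], AsyncIterable[str]]) -> AsyncIterator[str]:
    """Accept either a plain iterable or an async iterable and yield uniformly."""
    if hasattr(source, "__aiter__"):
        async for item in source:
            yield item
    else:
        for item in source:
            yield item


async def fake_llm(prompts) -> AsyncIterator[str]:
    """Stand-in for an LLM stage: streams a canned reply in small chunks."""
    async for _prompt in as_async(prompts):
        for chunk in ["Hello", " there.", " How are", " you?"]:
            await asyncio.sleep(0)  # yield control, as a network read would
            yield chunk


async def sentences(chunks) -> AsyncIterator[str]:
    """Stand-in for a downstream stage: buffers chunks into whole sentences."""
    buffer = ""
    async for chunk in as_async(chunks):
        buffer += chunk
        if re.search(r"[.!?]$", buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()


async def collect() -> list:
    # Chaining is just nesting: the output of one stage is the input of the next.
    return [s async for s in sentences(fake_llm(["say hi"]))]


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Because each stage both accepts and produces an iterable, any stage can be swapped out or wrapped without the others needing to know.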

AI Services also have a `.run_to_queue` method. This method is not an AsyncIterable, but instead sends processed QueueFrames to a queue. This makes it easy to send the output of an AI Service to the Transport’s `send_queue`.
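
A rough sketch of the "drain an async iterable into a queue" pattern follows. It assumes nothing about the SDK's real `run_to_queue` signature: `drain_to_queue`, `END_OF_STREAM`, and the `add_end_marker` flag are hypothetical names chosen for this illustration.

```python
import asyncio
from typing import Any, AsyncIterable

END_OF_STREAM = object()  # hypothetical sentinel standing in for an end-of-stream frame


async def drain_to_queue(
    queue: asyncio.Queue,
    frames: AsyncIterable[Any],
    add_end_marker: bool = False,
) -> None:
    """Push every item from `frames` onto `queue`, optionally followed by a sentinel."""
    async for frame in frames:
        await queue.put(frame)
    if add_end_marker:
        await queue.put(END_OF_STREAM)


async def numbers(n: int):
    # Tiny producer used only for the demo below.
    for i in range(n):
        yield i


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(drain_to_queue(queue, numbers(3), add_end_marker=True))
    received = []
    while True:
        item = await queue.get()
        if item is END_OF_STREAM:
            break  # the sentinel tells the consumer that no more items will arrive
        received.append(item)
    await producer
    return received
```

The sentinel matters whenever the consumer runs as its own task: without it, the consumer has no way to tell "nothing yet" apart from "nothing more".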

AI Services also define convenience functions that let you bypass creating QueueFrames for some simple cases (e.g., using the TTS service to convert a string to audio output and send that audio to the transport’s `send_queue`). See below for examples.

## Examples

### Say Something

The base TTS AI service exposes a `.say` method. After creating a transport and TTS service, you can use this method like so:

```
|
||||
AZURE_SPEECH_SERVICE_KEY
|
||||
AZURE_SPEECH_SERVICE_REGION
|
||||
AZURE_CHATGPT_KEY
|
||||
AZURE_CHATGPT_ENDPOINT
|
||||
AZURE_CHATGPT_DEPLOYMENT_ID
|
||||
transport = DailyTransportService(...)
|
||||
tts = AzureTTSService()
|
||||
await tts.say("hello world", transport.send_queue)
|
||||
```
|
||||
|
||||
If you have those environment variables stored in an .env file, you can quickly load them into your terminal's environment by running this:
|
||||
This will call the TTS service to render the text to audio frames, then put the audio frames on the transport’s send queue. The transport will then send those frames along to the WebRTC session.
|
||||
|
||||
### Speak an LLM response
|
||||
|
||||
Given a system prompt contained in a `messages` array, you can emit the LLM’s response as audio with a chain like this:
|
||||
|
||||
```bash
|
||||
export $(grep -v '^#' .env | xargs)
|
||||
```
transport = DailyTransportService(...) # setup parameters omitted
tts = AzureTTSService()
llm = AzureLLMService()
messages = [...] # system prompt omitted for brevity

await tts.run_to_queue(
    transport.send_queue,
    llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)])
)
```

In this code, the LLM service object sends the messages to Azure’s OpenAI implementation, which streams chunks back asynchronously. Those chunks are aggregated by the TTS Service to ensure the best audio response (TTS works best when it gets complete sentences, so it can inflect correctly), then sent to Azure’s TTS service, converted to audio frames, and sent to the WebRTC session via the Daily transport.

### Pre-cache an LLM response

Sometimes LLMs can be slower than we’d like for natural-feeling communication. Here’s an example where we take advantage of the time it takes to speak some pre-defined text to get a head start on the LLM response:

(TK link to 04- sample)

In this sample, we set up a buffer queue to receive the audio frames from the LLM response while we are joining the call, and start an asynchronous task to begin filling this buffer:

```
buffer_queue = asyncio.Queue()
llm_response_task = asyncio.create_task(
    elevenlabs_tts.run_to_queue(
        buffer_queue,
        llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)]),
        True,
    )
)
```

Then, when we’ve joined the call, we speak the static text:

```
await azure_tts.say("My friend...", transport.send_queue)
```

As that text is being spoken, the asynchronous LLM task continues in the background. When the text is done, we pull the frames off the buffer queue and put them in the transport’s `send_queue`:

```
async def buffer_to_send_queue():
    while True:
        frame = await buffer_queue.get()
        await transport.send_queue.put(frame)
        buffer_queue.task_done()
        if frame.frame_type == FrameType.END_STREAM:
            break

await asyncio.gather(llm_response_task, buffer_to_send_queue())

```

One thing to note here is the last parameter to `run_to_queue` in the `llm_response_task` code above: passing `True` causes the `run_to_queue` method to send an `END_STREAM` frame when it’s done rendering. This lets us know when to stop our `buffer_to_send_queue` task above.

13
docs/README.md
Normal file
@@ -0,0 +1,13 @@
# Daily AI SDK Docs

## [Architecture Overview](architecture.md)

Learn about the thinking behind the SDK's design.

## [Example Code](examples/)

The repo includes several example apps in the `src/examples` directory. The docs explain how they work.

## [API Reference](api/)

Complete documentation of the available classes and methods in the SDK.
2
docs/architecture.md
Normal file
@@ -0,0 +1,2 @@
# Daily AI SDK Architecture Guide

119
docs/examples/01-say-one-thing.md
Normal file
@@ -0,0 +1,119 @@
# 01: Say One Thing

_video here - youtube?_

This example uses a text-to-speech (TTS) service to say one predefined sentence. But first, a quick overview of the general structure of these examples.

## Running the demos

All of the demos have something like this at the bottom of the file:

```python
if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))
```

### `configure()`

The `configure()` function comes from `src/examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:

```bash
python 01-say-one-thing.py -u https://YOUR_DOMAIN.daily.co/YOUR_ROOM -k YOUR_API_KEY
# or
DAILY_ROOM_URL=https://YOUR_DOMAIN.daily.co/YOUR_ROOM DAILY_API_KEY=YOUR_API_KEY python 01-say-one-thing.py
# or set DAILY_ROOM_URL and DAILY_API_KEY in a .env file
python 01-say-one-thing.py
```

You'll need a Daily account to run these demos. You can sign up for free at [daily.co](https://daily.co). Once you've signed up you can create a room from the [Dashboard](https://dashboard.daily.co/rooms), and grab [your API key](https://dashboard.daily.co/developers) while you're there.

Some functionality (such as transcription) requires the bot to have owner privileges in the room. `runner.py` uses the Daily REST API to create a meeting token with owner privileges. You can learn more about meeting tokens in the [Daily docs](https://docs.daily.co/reference/rest-api/meeting-tokens).
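
The flag-then-environment lookup described above can be sketched roughly as follows. This is not the contents of `runner.py`: the function name `configure_sketch` and the long option names `--url` and `--apikey` are assumptions for illustration, and the sketch neither loads a `.env` file nor requests a meeting token.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence, Tuple


def configure_sketch(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Resolve the room URL and API key from flags first, then the environment."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Hypothetical example runner")
    parser.add_argument("-u", "--url", help="URL of the Daily room to join")
    parser.add_argument("-k", "--apikey", help="Daily API key")
    args = parser.parse_args(argv)

    # Command-line flags win; environment variables are the fallback.
    url = args.url or env.get("DAILY_ROOM_URL")
    key = args.apikey or env.get("DAILY_API_KEY")
    if not url or not key:
        raise ValueError("Provide -u/-k or set DAILY_ROOM_URL and DAILY_API_KEY")
    return url, key
```

Passing `argv` and `env` explicitly keeps the function easy to test; called with no arguments it reads `sys.argv` and `os.environ`.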

### `asyncio.run()`

The AI SDK makes heavy use of Python's `asyncio` module. [This is a reasonable intro to the topic](https://builtin.com/data-science/asyncio) if you haven't worked with `asyncio` and coroutines before.

You can learn a bit more about the specifics of how the Daily AI SDK uses coroutines in the [Architecture Guide](../architecture.md).

## The `main()` function

All of the examples have a `main()` function with a similar structure:

- Configure the transport
- Configure the AI service(s) used in the demo
- Configure any event listeners
- Define a processing pipeline
- Run the example's coroutine(s)

### Configuring the transport

The first section of the `main()` function configures the transport object:

```python
meeting_duration_minutes = 5
transport = DailyTransportService(
    room_url,
    None,
    "Say One Thing",
    meeting_duration_minutes,
)
transport.mic_enabled = True
```

The [Architecture Guide](../architecture.md) explains the transport object in more detail. In this case, we're configuring a Daily transport object and enabling the virtual microphone, so our bot can play audio.

### Configuring the services

As described in the [Architecture Guide](../architecture.md), a 'Service' is a class that processes 'Frames' as part of a 'Pipeline'. In this demo app, we'll only need one service: a text-to-speech generator. We can create an instance of the `ElevenLabsTTSService` class with this line of code:

```python
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
```

You'll need to make sure to set those environment variables somewhere. The easiest way to do that is to copy the `example.env` file in the repo and rename it to `.env`, and then add your credentials to that file. `runner.py` loads the `python-dotenv` module and initializes it, making the values in that file available in the environment.

### Configuring event listeners

This part isn't strictly necessary for an app like this. You could include the contents of the `greet_user()` function shown below directly in the body of the `main()` function, and it would run as soon as you started the script from the command line.

Instead, we can use an event handler to wait to run that code until someone else joins the meeting. We'll define a function called `greet_user()`, and use the `@transport.event_handler("on_participant_joined")` decorator to tell the SDK that we want to run that function whenever a user joins the room.

```python
@transport.event_handler("on_participant_joined")
async def greet_user(transport, participant):
    if participant["info"]["isLocal"]:
        return

    await tts.say(
        "Hello there, " + participant["info"]["userName"] + "!",
        transport.send_queue,
    )

    # wait for the output queue to be empty, then leave the meeting
    await transport.stop_when_done()
```
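
For readers unfamiliar with decorator-based registration, here is a minimal sketch of how an `event_handler` decorator of this shape could work. It is an assumption about the general pattern, not the SDK's code: `SketchTransport` and its `emit` method are hypothetical, and the handler records a greeting in a list instead of calling a TTS service.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, List


class SketchTransport:
    """Hypothetical stand-in for a transport that dispatches named events."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., Awaitable[None]]]] = defaultdict(list)

    def event_handler(self, event_name: str):
        # Returns a decorator that records the coroutine function under `event_name`.
        def decorator(handler):
            self._handlers[event_name].append(handler)
            return handler
        return decorator

    async def emit(self, event_name: str, *args) -> None:
        # Await each registered handler in registration order.
        for handler in self._handlers[event_name]:
            await handler(self, *args)


transport = SketchTransport()
greeted = []


@transport.event_handler("on_participant_joined")
async def greet_user(transport, participant):
    if participant["info"]["isLocal"]:
        return  # ignore the bot's own join event
    greeted.append("Hello there, " + participant["info"]["userName"] + "!")


async def demo() -> None:
    await transport.emit("on_participant_joined", {"info": {"isLocal": True, "userName": "bot"}})
    await transport.emit("on_participant_joined", {"info": {"isLocal": False, "userName": "Ada"}})
```

The decorator returns the handler unchanged, so `greet_user` can still be called directly; registration is a side effect.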

### Defining a processing pipeline

In this example, we don't actually have much of a processing pipeline! In fact, we're doing the whole thing inside the `greet_user()` function already.

Pipelines usually look like a bunch of nested calls to the `run()` or `run_to_queue()` functions from different Services. In this example, we're using the `say()` function from the TTS service. This is effectively a convenience wrapper around the `run_to_queue()` function, which we'll discuss more later. It's important to `await` this function: `stop_when_done()` is called immediately afterward, so the speech frames must already be queued for playback by the time the next line of code runs.

The output of the `say()` function goes to the transport's `send_queue`. This queue is the all-important connection between the world of the Services pipeline that's generating frames asynchronously and the ordered playback of audio and visual media in the WebRTC call.

### Running the coroutines

In this example, we don't actually have any separate processing pipelines; everything happens as a result of an event from the transport. So we only need to run the transport's coroutine, and await its completion:

```python
await transport.run()
```

In future examples, we'll run more processes in parallel. For now, this script can run until the transport exits, which will happen when `stop_when_done()` is called in the `greet_user()` function.

## Next Steps

Next, we'll start connecting multiple AI services together by building a service pipeline.

## [02 - LLM Say One Thing »](02-llm-say-one-thing.md)
5
docs/examples/README.md
Normal file
@@ -0,0 +1,5 @@
# Daily AI SDK Examples

The docs in this folder pair with the example apps located in `src/examples/foundational`. They are designed to serve as quick references for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.

To start, you can learn about the overall structure of the examples in [01 - Say One Thing](01-say-one-thing.md).
@@ -7,16 +7,20 @@ name = "daily_ai"
|
||||
version = "0.0.1"
|
||||
description = "Orchestrator for AI bots with Daily"
|
||||
dependencies = [
|
||||
"daily-python",
|
||||
"Pillow",
|
||||
"typing-extensions",
|
||||
"openai",
|
||||
"google-cloud-texttospeech",
|
||||
"azure-cognitiveservices-speech",
|
||||
"pyht",
|
||||
"opentelemetry-sdk",
|
||||
"aiohttp",
|
||||
"fal"
|
||||
"azure-cognitiveservices-speech",
|
||||
"daily-python",
|
||||
"fal",
|
||||
"faster_whisper",
|
||||
"google-cloud-texttospeech",
|
||||
"numpy",
|
||||
"openai",
|
||||
"Pillow",
|
||||
"pyht",
|
||||
"python-dotenv",
|
||||
"torch",
|
||||
"pyaudio",
|
||||
"typing-extensions"
|
||||
]
|
||||
|
||||
[tool.setuptools.packages.find]
|
||||
|
||||
@@ -1,3 +1,4 @@
autopep8==2.0.4
build==1.0.3
packaging==23.2
pyproject_hooks==1.0.0
@@ -1,347 +0,0 @@
import json
import logging
import re

from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from queue import Queue, PriorityQueue, Empty
from threading import Event, Semaphore, Thread
from typing import Any, Generator, Iterator, Optional, Type

from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.message_handler.message_handler import MessageHandler
from dailyai.services.ai_services import AIServiceConfig

class AsyncProcessorState:
    # Setting class variables, other synchronous activities
    INIT = 0

    # Making asynchronous requests to LLM and other services to render response
    PREPARING = 1

    # Ready to start presenting to user (but may not have all data yet)
    READY = 2

    # Playing response
    PLAYING = 3

    # An interrupt has been requested and the response is shutting down in-flight processing
    INTERRUPTING = 4

    # An interrupt has been requested and the response is finished stopping in-flight processing
    INTERRUPTED = 5

    # Response has been played or interrupted
    DONE = 6

    # Response is being finalized (updating records of speech, updating LLM context, etc.)
    FINALIZING = 7

    # Response is complete. This could mean that everything is updated, or that the response
    # was interrupted.
    FINALIZED = 8

    state_transitions = {
        INIT: [PREPARING, INTERRUPTING],
        PREPARING: [READY, INTERRUPTING],
        READY: [PLAYING, INTERRUPTING],
        PLAYING: [DONE, INTERRUPTING],
        INTERRUPTING: [INTERRUPTED],
        INTERRUPTED: [DONE],
        DONE: [FINALIZING],
        FINALIZING: [FINALIZED],
        FINALIZED: [FINALIZED],
    }


@dataclass(order=True)
class StateTransitionItem:
    state: int
    evt: Event = field(compare=False)

class AsyncProcessor:
    def __init__(
        self,
        services: AIServiceConfig
    ) -> None:
        self.state = AsyncProcessorState.INIT
        self.prepare_thread = None
        self.play_thread = None
        self.finalize_thread = None

        self.services: AIServiceConfig = services

        self.state_transition_semaphore = Semaphore()
        self.waiting_for_state_changes = PriorityQueue()
        self.state_queue = Queue()

        self.state_change_callbacks = defaultdict(list)

        self.was_interrupted = False

        self.logger: logging.Logger = logging.getLogger("dailyai")

    def set_state(self, state: int) -> None:
        if state in AsyncProcessorState.state_transitions[self.state]:
            self.state_transition_semaphore.acquire()

            self.state: int = state
            self.state_transition_semaphore.release()

            # wake up any threads waiting for this state transition
            try:
                while True:
                    waiter = self.waiting_for_state_changes.get_nowait()
                    if waiter.state <= state:
                        waiter.evt.set()
                    else:
                        self.waiting_for_state_changes.put(waiter)
                        break
            except Empty:
                pass

            # make all the callbacks for this state
            for callback in self.state_change_callbacks[state]:
                callback(self)
        else:
            self.logger.error(
                f"Invalid state transition from {self.state} to {state} in {self.__class__.__name__}"
            )
            raise Exception(f"Invalid state transition from {self.state} to {state}")

    #
    # This is used for state transitions that could be blocked by an interruption.
    # If we are interrupted, we silently fail this call. Use only if you know that
    # this state transition should fail if the processor has been interrupted.
    #

    def maybe_set_state(self, state: int) -> bool:
        if state in AsyncProcessorState.state_transitions[self.state]:
            self.set_state(state)
            return True
        else:
            return False

    def wait_for_state_transition(self, state: int) -> None:
        if self.state >= state:
            return

        self.state_transition_semaphore.acquire()

        evt = Event()
        self.waiting_for_state_changes.put(StateTransitionItem(state, evt))
        self.state_transition_semaphore.release()
        result = evt.wait(120.0)
        if not result:
            self.logger.error(
                f"Timed out waiting for state transition to {state} from {self.state}"
            )

    def set_state_callback(self, state: int, callback: callable) -> None:
        self.state_change_callbacks[state].append(callback)

    def prepare(self) -> None:
        self.prepare_thread = Thread(target=self.async_prepare, daemon=True)
        self.prepare_thread.start()
        self.wait_for_state_transition(AsyncProcessorState.READY)

    def play(self) -> None:
        self.wait_for_state_transition(AsyncProcessorState.READY)
        self.play_thread = Thread(target=self.async_play, daemon=True)
        self.play_thread.start()
        self.wait_for_state_transition(AsyncProcessorState.PLAYING)

    def finalize(self) -> None:
        # don't finalize until we're done playing.
        self.wait_for_state_transition(AsyncProcessorState.DONE)
        self.set_state(AsyncProcessorState.FINALIZING)
        self.do_finalization()
        self.set_state(AsyncProcessorState.FINALIZED)

    def interrupt(self) -> None:
        # nothing to interrupt if we're already finalizing or finalized, no-op
        if self.state in [
            AsyncProcessorState.FINALIZING,
            AsyncProcessorState.FINALIZED,
        ]:
            return

        self.set_state(AsyncProcessorState.INTERRUPTING)
        self.was_interrupted = True
        self.do_interruption()
        self.set_state(AsyncProcessorState.INTERRUPTED)
        self.set_state(AsyncProcessorState.DONE)

    def async_play(self) -> None:
        self.logger.info(f"Starting to play")
        if self.maybe_set_state(AsyncProcessorState.PLAYING):
            self.do_play()
            self.maybe_set_state(AsyncProcessorState.DONE)

    def async_prepare(self) -> None:
        self.set_state(AsyncProcessorState.PREPARING)
        self.start_preparation()
        self.set_state(AsyncProcessorState.READY)
        self.continue_preparation()
        self.logger.info(f"Preparation done for {self.__class__.__name__}")
        self.preparation_done()

    def start_preparation(self) -> None:
        pass

    def continue_preparation(self) -> None:
        pass

    def preparation_done(self):
        pass

    def get_preparation_iterator(self) -> Iterator:
        yield None

    def process_chunk(self, chunk) -> None:
        pass

    def do_interruption(self) -> None:
        pass

    def do_play(self) -> None:
        pass

    def do_finalization(self) -> None:
        pass

# A common class for responses that use a message queue and
# an output queue.

class OrchestratorResponse(AsyncProcessor):

    def __init__(
        self,
        services,
        message_handler,
        output_queue,
    ) -> None:
        super().__init__(services)

        self.message_handler: MessageHandler = message_handler
        self.output_queue: Queue = output_queue


class LLMResponse(OrchestratorResponse):
    def __init__(
        self,
        services,
        message_handler,
        output_queue,
    ) -> None:
        super().__init__(services, message_handler, output_queue)

        self.has_sent_first_frame = False

        self.chunks_in_preparation = Queue()

        self.llm_responses: list[str] = []

    def get_preparation_iterator(self) -> Iterator:
        messages_for_llm = self.message_handler.get_llm_messages()
        self.logger.debug(f"Messages for llm: {json.dumps(messages_for_llm, indent=2)}")
        return self.clauses_from_chunks(
            self.services.llm.run_llm_async(messages_for_llm)
        )

    def clauses_from_chunks(self, chunks) -> Iterator:
        out = ""
        for chunk in chunks:
            if self.state not in [
                AsyncProcessorState.READY,
                AsyncProcessorState.PLAYING,
            ]:
                break

            out += chunk

            if re.match(r"^.*[.!?]$", out): # it looks like a sentence
                yield out.strip()
                out = ""

        if out.strip():
            yield out.strip()

    def get_frames_from_tts_response(self, audio_frame) -> list[QueueFrame]:
        return [QueueFrame(FrameType.AUDIO, audio_frame)]

    def get_frames_from_chunk(self, chunk) -> Generator[list[QueueFrame], Any, None]:
        for audio_frame in self.services.tts.run_tts(chunk):
            yield self.get_frames_from_tts_response(audio_frame)

    def start_preparation(self) -> None:
        self.preparation_iterator = self.get_preparation_iterator()

    def continue_preparation(self) -> None:
        for chunk in self.preparation_iterator:
            if self.state not in [
                AsyncProcessorState.READY,
                AsyncProcessorState.PLAYING,
            ]:
                break

            self.process_chunk(chunk)

    def process_chunk(self, chunk) -> None:
        self.chunks_in_preparation.put((chunk, self.get_frames_from_chunk(chunk)))

    def preparation_done(self):
        self.chunks_in_preparation.put((None, None))

    def do_play(self) -> None:
        while True:
            if self.state not in [
                AsyncProcessorState.READY,
                AsyncProcessorState.PLAYING,
            ]:
                break
            prepared_chunk = self.chunks_in_preparation.get()
            if prepared_chunk[0] == None:
                return

            self.play_prepared_chunk(prepared_chunk)

    def play_prepared_chunk(self, prepared_chunk) -> None:
        chunk, tts_generator = prepared_chunk
        for frames in tts_generator:
            if self.state not in [
                AsyncProcessorState.READY,
                AsyncProcessorState.PLAYING,
            ]:
                break

            if not self.has_sent_first_frame:
                self.output_queue.put(QueueFrame(FrameType.START_STREAM, None))
                self.has_sent_first_frame = True

            for frame in frames:
                self.output_queue.put(frame)

        self.output_queue.join()
        self.llm_responses.append(chunk)

    def do_finalization(self) -> None:
        self.message_handler.add_assistant_messages(self.llm_responses)

    def do_interruption(self) -> None:
        self.chunks_in_preparation.put((None, None))

        if self.prepare_thread and self.prepare_thread.is_alive():
            self.prepare_thread.join()

        if self.play_thread and self.play_thread.is_alive():
            self.play_thread.join()


@dataclass(frozen=True)
class ConversationProcessorCollection:
    introduction: Optional[Type[OrchestratorResponse]] = None
    waiting: Optional[Type[OrchestratorResponse]] = None
    response: Optional[Type[OrchestratorResponse]] = None
    goodbye: Optional[Type[OrchestratorResponse]] = None
78
src/dailyai/conversation_wrappers.py
Normal file
@@ -0,0 +1,78 @@
import asyncio
import copy
import functools
from typing import AsyncGenerator, Awaitable, Callable
from dailyai.queue_aggregators import LLMAssistantContextAggregator, LLMContextAggregator, LLMUserContextAggregator
from dailyai.queue_frame import EndStreamQueueFrame, QueueFrame, TranscriptionQueueFrame, UserStartedSpeakingFrame


class InterruptibleConversationWrapper:

    def __init__(
        self,
        frame_generator: Callable[[], AsyncGenerator[QueueFrame, None]],
        runner: Callable[
            [str, LLMContextAggregator, LLMContextAggregator], Awaitable[None]
        ],
        interrupt: Callable[[], None],
        my_participant_id: str | None,
        llm_messages: list[dict[str, str]],
        llm_context_aggregator_in=LLMUserContextAggregator,
        llm_context_aggregator_out=LLMAssistantContextAggregator,
        delay_before_speech_seconds: float = 1.0,
    ):
        self._frame_generator: Callable[[], AsyncGenerator[QueueFrame, None]] = frame_generator
        self._runner: Callable[
            [str, LLMContextAggregator, LLMContextAggregator], Awaitable[None]
        ] = runner
        self._interrupt: Callable[[], None] = interrupt
        self._my_participant_id = my_participant_id
        self._messages: list[dict[str, str]] = llm_messages
        self._delay_before_speech_seconds = delay_before_speech_seconds
        self._llm_context_aggregator_in = llm_context_aggregator_in
        self._llm_context_aggregator_out = llm_context_aggregator_out

        self._current_phrase = ""

    def update_messages(self, new_messages: list[dict[str, str]], task: asyncio.Task | None):
        if task:
            if not task.cancelled():
                self._current_phrase = ""
                self._messages = new_messages

    async def speak_after_delay(self, user_speech, messages):
        await asyncio.sleep(self._delay_before_speech_seconds)
        tma_in = self._llm_context_aggregator_in(
            messages, self._my_participant_id, complete_sentences=False
        )
        tma_out = self._llm_context_aggregator_out(
            messages, self._my_participant_id
        )

        await self._runner(user_speech, tma_in, tma_out)

    async def run_conversation(self):
        current_response_task = None

        async for frame in self._frame_generator():
            if isinstance(frame, EndStreamQueueFrame):
                break
            elif not isinstance(frame, TranscriptionQueueFrame):
                continue

            if frame.participantId == self._my_participant_id:
                continue

            if current_response_task and isinstance(frame, UserStartedSpeakingFrame):
|
||||
current_response_task.cancel()
|
||||
self._interrupt()
|
||||
|
||||
|
||||
self._current_phrase += " " + frame.text
|
||||
current_llm_messages = copy.deepcopy(self._messages)
|
||||
current_response_task = asyncio.create_task(
|
||||
self.speak_after_delay(self._current_phrase, current_llm_messages)
|
||||
)
|
||||
current_response_task.add_done_callback(
|
||||
functools.partial(self.update_messages, current_llm_messages)
|
||||
)
|
||||
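The wrapper above waits for a pause in speech before responding, and restarts the wait whenever more speech arrives. A minimal sketch of that debounce-and-cancel pattern follows, assuming a simplified model in which every new fragment cancels the pending task (the code above only cancels on a user-started-speaking frame). `DebouncedResponder`, `on_fragment`, and `demo` are hypothetical names used for illustration, not part of the diff.

```python
import asyncio
from typing import List, Optional


class DebouncedResponder:
    """Hypothetical stand-in for the wrapper: respond only after a pause."""

    def __init__(self, delay_seconds: float) -> None:
        self._delay_seconds = delay_seconds
        self._phrase = ""
        self._task: Optional[asyncio.Task] = None
        self.responses: List[str] = []

    async def _speak_after_delay(self, phrase: str) -> None:
        # If the task is cancelled during this sleep, nothing is recorded.
        await asyncio.sleep(self._delay_seconds)
        self.responses.append(phrase)
        self._phrase = ""

    def on_fragment(self, text: str) -> None:
        # Cancel the pending response so the delay starts over.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._phrase = (self._phrase + " " + text).strip()
        self._task = asyncio.create_task(self._speak_after_delay(self._phrase))


async def demo() -> List[str]:
    responder = DebouncedResponder(delay_seconds=0.05)
    responder.on_fragment("hello")
    await asyncio.sleep(0.01)   # shorter than the delay: first task is cancelled
    responder.on_fragment("there")
    await asyncio.sleep(0.2)    # longer than the delay: the phrase is spoken
    responder.on_fragment("bye")
    await asyncio.sleep(0.2)
    return responder.responses
```

Running `asyncio.run(demo())` collects one response per pause rather than one per fragment. The accumulated phrase is reset only when a response completes, which mirrors how `update_messages` clears `_current_phrase` for tasks that were not cancelled.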
@@ -1,127 +0,0 @@
|
||||
import logging
|
||||
import time
|
||||
|
||||
from dataclasses import dataclass
|
||||
from queue import Queue, Empty
|
||||
from threading import Thread
|
||||
|
||||
from dailyai.storage.search import SearchIndexer
|
||||
from dailyai.services.ai_services import AIServiceConfig
|
||||
|
||||
|
||||
@dataclass
|
||||
class Message:
|
||||
type: str
|
||||
timestamp: float
|
||||
message: str
|
||||
|
||||
|
||||
class MessageHandler:
|
||||
def __init__(self, intro):
|
||||
self.messages: list[Message] = [Message("system", time.time(), intro)]
|
||||
self.last_user_message_idx:int | None = None
|
||||
self.finalized_user_message_idx: int | None = None
|
||||
|
||||
def add_user_message(self, message) -> None:
|
||||
if self.last_user_message_idx is not None and self.last_user_message_idx != self.finalized_user_message_idx:
|
||||
previous_message: str = self.messages[self.last_user_message_idx].message
|
||||
self.messages[self.last_user_message_idx] = Message(
|
||||
"user", time.time(), ' '.join([previous_message, message])
|
||||
)
|
||||
self.messages = self.messages[: self.last_user_message_idx + 1]
|
||||
else:
|
||||
self.messages.append(Message("user", time.time(), message))
|
||||
|
||||
self.last_user_message_idx = len(self.messages) - 1
|
||||
|
||||
def add_assistant_message(self, message) -> None:
|
||||
if self.messages[-1].type == "assistant":
|
||||
self.messages[-1].message += " " + message
|
||||
else:
|
||||
self.messages.append(Message("assistant", time.time(), message))
|
||||
|
||||
def add_assistant_messages(self, messages) -> None:
|
||||
self.messages.append(Message("assistant", time.time(), " ".join(messages)))
|
||||
|
||||
def get_llm_messages(self) -> list[dict[str, str]]:
|
||||
return [{"role": m.type, "content": m.message} for m in self.messages]
|
||||
|
||||
def finalize_user_message(self) -> None:
|
||||
self.finalized_user_message_idx = self.last_user_message_idx
|
||||
|
||||
def shutdown(self) -> None:
|
||||
pass
|
||||
|
||||
class IndexingMessageHandler(MessageHandler):
|
||||
def __init__(
|
||||
self, intro, services: AIServiceConfig, indexer: SearchIndexer
|
||||
) -> None:
|
||||
super().__init__(intro)
|
||||
self.services = services
|
||||
|
||||
self.search_indexer = indexer
|
||||
|
||||
self.last_written_idx = 0
|
||||
self.storage_message_queue = Queue()
|
||||
|
||||
self.index_writer_thread = Thread(target=self.storage_writer, daemon=True)
|
||||
self.index_writer_thread.start()
|
||||
|
||||
self.logger = logging.getLogger("dailyai")
|
||||
|
||||
def shutdown(self):
|
||||
self.finalize_user_message()
|
||||
self.storage_message_queue.put(None)
|
||||
self.index_writer_thread.join()
|
||||
|
||||
def storage_writer(self) -> None:
|
||||
while True:
|
||||
try:
|
||||
message_idx = self.storage_message_queue.get()
|
||||
self.storage_message_queue.task_done()
|
||||
|
||||
if message_idx is None:
|
||||
return
|
||||
|
||||
if message_idx <= self.last_written_idx:
|
||||
continue
|
||||
|
||||
self.last_written_idx = message_idx
|
||||
|
||||
message = self.messages[message_idx]
|
||||
content = message.message
|
||||
if message.type == "user":
|
||||
content = self.cleanup_user_message(content)
|
||||
|
||||
# sometimes the LLM returns a string wrapped in quotes and sometimes it doesn't.
|
||||
# if it doesn't, wrap it in quotes
|
||||
if content[0] != '"':
|
||||
content = '"' + content + '"'
|
||||
|
||||
self.search_indexer.index_text(content)
|
||||
except Empty:
|
||||
pass
|
||||
|
||||
def cleanup_user_message(self, user_message) -> str:
|
||||
return user_message
|
||||
|
||||
def finalize_user_message(self):
|
||||
super().finalize_user_message()
|
||||
self.write_messages_to_storage()
|
||||
|
||||
def write_messages_to_storage(self):
|
||||
if self.finalized_user_message_idx is None:
|
||||
return
|
||||
|
||||
for idx in range(self.last_written_idx, len(self.messages)):
|
||||
self.logger.info(
|
||||
f"Writing to storage: {self.messages[idx].type} {self.messages[idx].message}"
|
||||
)
|
||||
if (
|
||||
self.messages[idx].type == "user"
|
||||
and idx > self.finalized_user_message_idx
|
||||
):
|
||||
break
|
||||
|
||||
if self.messages[idx].type != "system":
|
||||
self.storage_message_queue.put(idx)
|
||||
@@ -1,409 +0,0 @@
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
import wave
|
||||
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
from queue import Queue, Empty
|
||||
from opentelemetry import trace, context
|
||||
|
||||
from dailyai.async_processor.async_processor import (
|
||||
AsyncProcessor,
|
||||
AsyncProcessorState,
|
||||
ConversationProcessorCollection,
|
||||
OrchestratorResponse,
|
||||
LLMResponse,
|
||||
)
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.ai_services import AIServiceConfig
|
||||
from dailyai.message_handler.message_handler import MessageHandler
|
||||
|
||||
from threading import Thread, Semaphore, Event, Timer
|
||||
|
||||
from opentelemetry import context
|
||||
from opentelemetry.context.context import Context
|
||||
|
||||
from daily import (
|
||||
EventHandler,
|
||||
CallClient,
|
||||
Daily,
|
||||
VirtualCameraDevice,
|
||||
VirtualMicrophoneDevice,
|
||||
VirtualSpeakerDevice,
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class OrchestratorConfig:
|
||||
room_url: str
|
||||
token: str
|
||||
bot_name: str
|
||||
expiration: float
|
||||
|
||||
# Note that we use this as a default parameter value in the Orchestrator
|
||||
# constructor. The dataclass is defined with frozen=True, so this should
|
||||
# be safe.
|
||||
default_conversation_collection = ConversationProcessorCollection(
|
||||
introduction=LLMResponse,
|
||||
waiting=None,
|
||||
response=LLMResponse,
|
||||
goodbye=None,
|
||||
)
|
||||
|
||||
|
||||
class Orchestrator(EventHandler):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
daily_config: OrchestratorConfig,
|
||||
ai_service_config: AIServiceConfig,
|
||||
message_handler: MessageHandler,
|
||||
conversation_processors: ConversationProcessorCollection = default_conversation_collection,
|
||||
tracer=None,
|
||||
):
|
||||
self.bot_name: str = daily_config.bot_name
|
||||
self.room_url: str = daily_config.room_url
|
||||
self.token: str = daily_config.token
|
||||
self.expiration: float = daily_config.expiration
|
||||
|
||||
self.logger: logging.Logger = logging.getLogger("dailyai")
|
||||
self.tracer = tracer or trace.get_tracer("orchestrator")
|
||||
|
||||
self.ctx: Context = context.get_current()
|
||||
|
||||
self.transcription = ""
|
||||
self.last_fragment_at = None
|
||||
self.talked_at = None
|
||||
self.paused_at = None
|
||||
|
||||
self.logger.info(f"Creating Response for introductions")
|
||||
self.services: AIServiceConfig = ai_service_config
|
||||
self.output_queue = Queue()
|
||||
self.is_interrupted = Event()
|
||||
self.stop_threads = Event()
|
||||
self.story_started = False
|
||||
|
||||
self.message_handler = message_handler
|
||||
self.conversation_processors: ConversationProcessorCollection = conversation_processors
|
||||
|
||||
if conversation_processors.introduction is not None:
|
||||
intro = conversation_processors.introduction(
|
||||
services=self.services, message_handler=self.message_handler, output_queue=self.output_queue
|
||||
)
|
||||
intro.prepare()
|
||||
intro.set_state_callback(AsyncProcessorState.DONE, self.on_intro_played)
|
||||
intro.set_state_callback(AsyncProcessorState.FINALIZED, self.on_intro_finished)
|
||||
self.logger.info(f"Introduction is preparing")
|
||||
|
||||
self.current_response: AsyncProcessor = intro
|
||||
self.can_interrupt = False
|
||||
# self.response_event.set()
|
||||
self.response_semaphore = Semaphore()
|
||||
|
||||
self.speech_timeout = None
|
||||
self.interrupt_time = None
|
||||
|
||||
self.logger.info("Configuring daily")
|
||||
self.configure_daily()
|
||||
|
||||
def configure_daily(self):
|
||||
Daily.init()
|
||||
self.client = CallClient(event_handler=self)
|
||||
|
||||
self.logger.info(f"Mic sample rate: {self.services.tts.get_mic_sample_rate()}")
|
||||
self.mic: VirtualMicrophoneDevice = Daily.create_microphone_device(
|
||||
"mic", sample_rate=self.services.tts.get_mic_sample_rate(), channels=1
|
||||
)
|
||||
self.speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
|
||||
"speaker", sample_rate=16000, channels=1
|
||||
)
|
||||
self.camera: VirtualCameraDevice = Daily.create_camera_device(
|
||||
"camera", width=720, height=1280, color_format="RGB"
|
||||
)
|
||||
|
||||
Daily.select_speaker_device("speaker")
|
||||
|
||||
self.client.set_user_name(self.bot_name)
|
||||
self.client.join(self.room_url, self.token, completion=self.call_joined)
|
||||
|
||||
self.client.update_inputs(
|
||||
{
|
||||
"camera": {
|
||||
"isEnabled": True,
|
||||
"settings": {
|
||||
"deviceId": "camera",
|
||||
},
|
||||
},
|
||||
"microphone": {
|
||||
"isEnabled": True,
|
||||
"settings": {
|
||||
"deviceId": "mic",
|
||||
"customConstraints": {
|
||||
"autoGainControl": {"exact": False},
|
||||
"echoCancellation": {"exact": False},
|
||||
"noiseSuppression": {"exact": False},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
self.client.update_publishing(
|
||||
{
|
||||
"camera": {
|
||||
"sendSettings": {
|
||||
"maxQuality": "low",
|
||||
"encodings": {
|
||||
"low": {
|
||||
"maxBitrate": 250000,
|
||||
"scaleResolutionDownBy": 1.333,
|
||||
"maxFramerate": 8,
|
||||
}
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
self.my_participant_id = self.client.participants()["local"]["id"]
|
||||
|
||||
def start(self) -> None:
|
||||
# TODO: this loop could, I think, be replaced with a timer and an event
|
||||
self.participant_left = False
|
||||
|
||||
try:
|
||||
participant_count: int = len(self.client.participants())
|
||||
self.logger.info(f"{participant_count} participants in room")
|
||||
while time.time() < self.expiration and not self.participant_left:
|
||||
# all handling of incoming transcriptions happens in on_transcription_message
|
||||
time.sleep(1)
|
||||
except Exception as e:
|
||||
self.logger.error(f"Exception {e}")
|
||||
finally:
|
||||
self.client.leave()
|
||||
|
||||
def stop(self):
|
||||
self.logger.info("Stop current response")
|
||||
if self.current_response:
|
||||
if self.current_response.state < AsyncProcessorState.INTERRUPTED:
|
||||
self.current_response.interrupt()
|
||||
|
||||
self.logger.info("Wait for state transition")
|
||||
self.current_response.wait_for_state_transition(AsyncProcessorState.FINALIZED)
|
||||
|
||||
self.stop_threads.set()
|
||||
self.camera_thread.join()
|
||||
self.logger.info("Camera thread stopped")
|
||||
|
||||
self.logger.info("Put stop in output queue")
|
||||
self.output_queue.put(QueueFrame(FrameType.END_STREAM, None))
|
||||
|
||||
self.frame_consumer_thread.join()
|
||||
self.logger.info("Orchestrator stopped.")
|
||||
|
||||
def on_intro_played(self, intro):
|
||||
self.logger.info(f"Introduction has played")
|
||||
self.can_interrupt = True
|
||||
intro.finalize()
|
||||
|
||||
def on_intro_finished(self, intro):
|
||||
self.logger.info(f"Introduction has finished")
|
||||
waiting = self.conversation_processors.waiting(self.services, self.message_handler, self.output_queue)
|
||||
waiting.prepare()
|
||||
waiting.play()
|
||||
|
||||
def on_response_played(self, response):
|
||||
response.finalize()
|
||||
|
||||
def on_response_finished(self, response):
|
||||
if not response.was_interrupted:
|
||||
self.message_handler.finalize_user_message()
|
||||
|
||||
def call_joined(self, join_data, client_error):
|
||||
self.logger.info(f"Call_joined: {join_data}, {client_error}")
|
||||
self.client.start_transcription(
|
||||
{
|
||||
"language": "en",
|
||||
"tier": "nova",
|
||||
"model": "2-conversationalai",
|
||||
"profanity_filter": True,
|
||||
"redact": False,
|
||||
"extra": {
|
||||
"endpointing": True,
|
||||
"punctuate": False,
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
def on_participant_joined(self, participant):
|
||||
with self.tracer.start_as_current_span("on_participant_joined", context=self.ctx):
|
||||
self.logger.info(f"on_participant_joined: {participant}")
|
||||
|
||||
# TODO: figure out the architecture to get the story id to the client
|
||||
# self.client.send_app_message({"event": "story-id", "storyID": self.story_id})
|
||||
time.sleep(2)
|
||||
|
||||
if not self.story_started:
|
||||
self.action()
|
||||
self.story_started = True
|
||||
|
||||
def on_participant_left(self, participant, reason):
|
||||
self.logger.info(f"Participant {participant} left")
|
||||
if len(self.client.participants()) < 2:
|
||||
self.participant_left = True
|
||||
|
||||
def on_app_message(self, message, sender):
|
||||
with self.tracer.start_as_current_span("on_app_message", context=self.ctx):
|
||||
self.logger.info(f"on_app_message {message} from {sender}")
|
||||
if "isSpeaking" in message and message["isSpeaking"] == True:
|
||||
self.handle_user_started_talking()
|
||||
|
||||
if "isSpeaking" in message and message["isSpeaking"] == False:
|
||||
self.handle_user_stopped_talking()
|
||||
|
||||
def on_transcription_message(self, message):
|
||||
with self.tracer.start_as_current_span("on_transcription_message", context=self.ctx):
|
||||
if message["session_id"] != self.my_participant_id:
|
||||
self.handle_transcription_fragment(message['text'])
|
||||
|
||||
def on_transcription_stopped(self, stopped_by, stopped_by_error):
|
||||
self.logger.info(f"Transcription stopped {stopped_by}, {stopped_by_error}")
|
||||
|
||||
def on_transcription_error(self, message):
|
||||
self.logger.error(f"Transcription error {message}")
|
||||
|
||||
def on_transcription_started(self, status):
|
||||
self.logger.info(f"Transcription started {status}")
|
||||
|
||||
def set_image(self, image: bytes):
|
||||
self.image: bytes | None = image
|
||||
|
||||
def run_camera(self):
|
||||
try:
|
||||
while not self.stop_threads.is_set():
|
||||
if self.image:
|
||||
self.camera.write_frame(self.image)
|
||||
|
||||
time.sleep(1.0 / 8.0) # 8 fps
|
||||
except Exception as e:
|
||||
self.logger.error(f"Exception {e} in camera thread.")
|
||||
|
||||
def handle_user_started_talking(self):
|
||||
# TODO: allow configuration of the timer timeout
|
||||
self.logger.error("user started talking")
|
||||
self.speech_timeout = Timer(1.0, self.utterance_interrupt)
|
||||
|
||||
def handle_user_stopped_talking(self):
|
||||
self.logger.error("user stopped talking, canceling utterance interrupt")
|
||||
if self.speech_timeout:
|
||||
self.speech_timeout.cancel()
|
||||
|
||||
def utterance_interrupt(self):
|
||||
self.logger.error("utterance interrupt")
|
||||
self.is_interrupted.set()
|
||||
|
||||
def handle_transcription_fragment(self, fragment):
|
||||
if not self.can_interrupt:
|
||||
return
|
||||
|
||||
# start generating a new response. We'll do the fast parts of the interrupt
|
||||
# now but wait for the state transition after we've kicked off the prepare
|
||||
# on the new response.
|
||||
if (
|
||||
self.current_response
|
||||
and self.current_response.state < AsyncProcessorState.INTERRUPTED
|
||||
):
|
||||
self.interrupt_time = time.perf_counter()
|
||||
self.is_interrupted.set()
|
||||
self.current_response.interrupt()
|
||||
|
||||
self.message_handler.add_user_message(fragment)
|
||||
|
||||
response_type: type[OrchestratorResponse] | type[LLMResponse] = self.conversation_processors.response or LLMResponse
|
||||
new_response: OrchestratorResponse = response_type(
|
||||
self.services, self.message_handler, self.output_queue
|
||||
)
|
||||
new_response.set_state_callback(
|
||||
AsyncProcessorState.DONE, self.on_response_played
|
||||
)
|
||||
new_response.set_state_callback(
|
||||
AsyncProcessorState.FINALIZED, self.on_response_finished
|
||||
)
|
||||
new_response.prepare()
|
||||
|
||||
self.response_semaphore.acquire()
|
||||
if (
|
||||
self.current_response
|
||||
and self.current_response.state < AsyncProcessorState.INTERRUPTED
|
||||
):
|
||||
self.current_response.wait_for_state_transition(
|
||||
AsyncProcessorState.FINALIZED
|
||||
)
|
||||
|
||||
self.current_response = new_response
|
||||
self.current_response.play()
|
||||
|
||||
self.response_semaphore.release()
|
||||
|
||||
def action(self):
|
||||
self.logger.info("Starting camera thread")
|
||||
self.image: bytes | None = None
|
||||
self.camera_thread = Thread(target=self.run_camera, daemon=True)
|
||||
self.camera_thread.start()
|
||||
|
||||
self.logger.info("Starting frame consumer thread")
|
||||
self.frame_consumer_thread = Thread(target=self.frame_consumer, daemon=True)
|
||||
self.frame_consumer_thread.start()
|
||||
|
||||
self.logger.info("Playing introduction")
|
||||
self.can_interrupt = False
|
||||
self.current_response.play()
|
||||
|
||||
def frame_consumer(self):
|
||||
self.logger.info("🎬 Starting frame consumer thread")
|
||||
b = bytearray()
|
||||
smallest_write_size = 3200
|
||||
all_audio_frames = bytearray()
|
||||
while True:
|
||||
try:
|
||||
frame:QueueFrame = self.output_queue.get()
|
||||
if frame.frame_type == FrameType.END_STREAM:
|
||||
self.logger.info("Stopping frame consumer thread")
|
||||
return
|
||||
|
||||
# if interrupted, we just pull frames off the queue and discard them
|
||||
if not self.is_interrupted.is_set():
|
||||
if frame:
|
||||
if frame.frame_type == FrameType.AUDIO:
|
||||
chunk = frame.frame_data
|
||||
|
||||
all_audio_frames.extend(chunk)
|
||||
|
||||
b.extend(chunk)
|
||||
l = len(b) - (len(b) % smallest_write_size)
|
||||
if l:
|
||||
self.mic.write_frames(bytes(b[:l]))
|
||||
b = b[l:]
|
||||
elif frame.frame_type == FrameType.IMAGE:
|
||||
self.set_image(frame.frame_data)
|
||||
elif len(b):
|
||||
self.mic.write_frames(bytes(b))
|
||||
b = bytearray()
|
||||
else:
|
||||
if self.interrupt_time:
|
||||
self.logger.info(f"Lag to stop stream after interruption {time.perf_counter() - self.interrupt_time}")
|
||||
self.interrupt_time = None
|
||||
|
||||
if frame.frame_type == FrameType.START_STREAM:
|
||||
self.is_interrupted.clear()
|
||||
|
||||
self.output_queue.task_done()
|
||||
except Empty:
|
||||
try:
|
||||
if len(b):
|
||||
self.mic.write_frames(bytes(b))
|
||||
except Exception as e:
|
||||
self.logger.error(f"Exception in frame_consumer: {e}, {len(b)}")
|
||||
|
||||
b = bytearray()
|
||||
103
src/dailyai/queue_aggregators.py
Normal file
@@ -0,0 +1,103 @@
|
||||
import asyncio
|
||||
|
||||
from dailyai.queue_frame import LLMMessagesQueueFrame, QueueFrame, TextQueueFrame, TranscriptionQueueFrame
|
||||
from dailyai.services.ai_services import AIService
|
||||
|
||||
from typing import AsyncGenerator, List
|
||||
|
||||
|
||||
class QueueTee:
|
||||
async def run_to_queue_and_generate(
|
||||
self,
|
||||
output_queue: asyncio.Queue,
|
||||
generator: AsyncGenerator[QueueFrame, None]
|
||||
) -> AsyncGenerator[QueueFrame, None]:
|
||||
async for frame in generator:
|
||||
await output_queue.put(frame)
|
||||
yield frame
|
||||
|
||||
async def run_to_queues(
|
||||
self,
|
||||
output_queues: List[asyncio.Queue],
|
||||
generator: AsyncGenerator[QueueFrame, None]
|
||||
):
|
||||
async for frame in generator:
|
||||
for queue in output_queues:
|
||||
await queue.put(frame)
|
||||
|
||||
|
||||
class LLMContextAggregator(AIService):
|
||||
def __init__(
|
||||
self,
|
||||
messages: list[dict],
|
||||
role: str,
|
||||
bot_participant_id=None,
|
||||
complete_sentences=True,
|
||||
pass_through=True):
|
||||
super().__init__()
|
||||
self.messages = messages
|
||||
self.bot_participant_id = bot_participant_id
|
||||
self.role = role
|
||||
self.sentence = ""
|
||||
self.complete_sentences = complete_sentences
|
||||
self.pass_through = pass_through
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
# We don't do anything with non-text frames; pass them along to the next service in the pipeline.
|
||||
if not isinstance(frame, TextQueueFrame):
|
||||
yield frame
|
||||
return
|
||||
|
||||
# Ignore transcription frames from the bot
|
||||
if isinstance(frame, TranscriptionQueueFrame):
|
||||
if frame.participantId == self.bot_participant_id:
|
||||
return
|
||||
print(f"@@@ tma got a frame: {frame.text}")
|
||||
# The common case for "pass through" is receiving frames from the LLM that we'll
|
||||
# use to update the "assistant" LLM messages, but also passing the text frames
|
||||
# along to a TTS service to be spoken to the user.
|
||||
if self.pass_through:
|
||||
yield frame
|
||||
|
||||
# TODO: split up transcription by participant
|
||||
if self.complete_sentences:
|
||||
# type: ignore -- the linter thinks this isn't a TextQueueFrame, even
|
||||
# though we check it above
|
||||
self.sentence += frame.text
|
||||
if self.sentence.endswith((".", "?", "!")):
|
||||
self.messages.append(
|
||||
{"role": self.role, "content": self.sentence})
|
||||
self.sentence = ""
|
||||
# for message in self.messages:
|
||||
# print(f"{message['role']}: {message['content']}")
|
||||
yield LLMMessagesQueueFrame(self.messages)
|
||||
else:
|
||||
# type: ignore -- the linter thinks this isn't a TextQueueFrame, even
|
||||
# though we check it above
|
||||
self.messages.append({"role": self.role, "content": frame.text})
|
||||
yield LLMMessagesQueueFrame(self.messages)
|
||||
|
||||
async def finalize(self) -> AsyncGenerator[QueueFrame, None]:
|
||||
# Send any dangling words that weren't finished with punctuation.
|
||||
if self.complete_sentences and self.sentence:
|
||||
self.messages.append({"role": self.role, "content": self.sentence})
|
||||
# for message in self.messages:
|
||||
# print(f"{message['role']}: {message['content']}")
|
||||
yield LLMMessagesQueueFrame(self.messages)
|
||||
|
||||
|
||||
class LLMUserContextAggregator(LLMContextAggregator):
|
||||
def __init__(self,
|
||||
messages: list[dict],
|
||||
bot_participant_id=None,
|
||||
complete_sentences=True):
|
||||
super().__init__(messages, "user", bot_participant_id, complete_sentences, pass_through=False)
|
||||
|
||||
|
||||
class LLMAssistantContextAggregator(LLMContextAggregator):
|
||||
def __init__(
|
||||
self, messages: list[dict], bot_participant_id=None, complete_sentences=True
|
||||
):
|
||||
super().__init__(
|
||||
messages, "assistant", bot_participant_id, complete_sentences, pass_through=True
|
||||
)
|
||||
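The sentence-completion logic in `LLMContextAggregator` can be sketched on its own, without the frame and service machinery. This is a simplified, synchronous illustration that assumes plain strings stand in for text frames. The function name `aggregate_sentences` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def aggregate_sentences(
    chunks: Iterable[str],
    role: str,
    messages: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Collect text chunks into one message per completed sentence."""
    messages = [] if messages is None else messages
    sentence = ""
    for chunk in chunks:
        sentence += chunk
        # A sentence is treated as complete once it ends with punctuation.
        if sentence.endswith((".", "?", "!")):
            messages.append({"role": role, "content": sentence})
            sentence = ""
    # Mirror finalize(): flush any dangling words without end punctuation.
    if sentence:
        messages.append({"role": role, "content": sentence})
    return messages
```

For example, the chunks `["Hello", " world.", " How are", " you"]` produce two messages: the completed sentence and the unpunctuated remainder. Whitespace is kept exactly as it arrives, as in the code above.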
@@ -1,19 +1,77 @@
|
||||
from enum import Enum
|
||||
from dataclasses import dataclass
|
||||
from typing import Any
|
||||
|
||||
class FrameType(Enum):
|
||||
START_STREAM = 0
|
||||
END_STREAM = 1
|
||||
AUDIO = 2
|
||||
IMAGE = 3
|
||||
SENTENCE = 4
|
||||
TEXT_CHUNK = 5
|
||||
LLM_MESSAGE = 6
|
||||
APP_MESSAGE = 7
|
||||
IMAGE_DESCRIPTION = 8
|
||||
TRANSCRIPTION = 9
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class QueueFrame:
|
||||
frame_type: FrameType
|
||||
frame_data: str | dict | bytes | list | None
|
||||
pass
|
||||
|
||||
|
||||
class ControlQueueFrame(QueueFrame):
|
||||
pass
|
||||
|
||||
|
||||
class StartStreamQueueFrame(ControlQueueFrame):
|
||||
pass
|
||||
|
||||
|
||||
class EndStreamQueueFrame(ControlQueueFrame):
|
||||
pass
|
||||
|
||||
|
||||
class LLMResponseEndQueueFrame(QueueFrame):
|
||||
pass
|
||||
|
||||
|
||||
@dataclass()
|
||||
class ChatMessageQueueFrame(QueueFrame):
|
||||
message: str
|
||||
|
||||
|
||||
@dataclass()
|
||||
class LLMFunctionCallFrame(QueueFrame):
|
||||
function_name: str
|
||||
arguments: str
|
||||
|
||||
|
||||
@dataclass()
|
||||
class AudioQueueFrame(QueueFrame):
|
||||
data: bytes
|
||||
|
||||
|
||||
@dataclass()
|
||||
class ImageQueueFrame(QueueFrame):
|
||||
url: str | None
|
||||
image: bytes
|
||||
|
||||
|
||||
@dataclass()
|
||||
class SpriteQueueFrame(QueueFrame):
|
||||
images: list[bytes]
|
||||
|
||||
|
||||
@dataclass()
|
||||
class TextQueueFrame(QueueFrame):
|
||||
text: str
|
||||
|
||||
|
||||
@dataclass()
|
||||
class TranscriptionQueueFrame(TextQueueFrame):
|
||||
participantId: str
|
||||
timestamp: str
|
||||
|
||||
|
||||
@dataclass()
|
||||
class LLMMessagesQueueFrame(QueueFrame):
|
||||
messages: list[dict[str, str]] # TODO: define this more concretely!
|
||||
|
||||
|
||||
@dataclass()
class AppMessageQueueFrame(QueueFrame):
|
||||
message: Any
|
||||
participantId: str
|
||||
|
||||
class UserStartedSpeakingFrame(QueueFrame):
|
||||
pass
|
||||
|
||||
class UserStoppedSpeakingFrame(QueueFrame):
|
||||
pass
|
||||
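With frames modeled as a class hierarchy instead of a `FrameType` enum, consumers dispatch with `isinstance`. The sketch below uses hypothetical stand-in classes (`Frame`, `TextFrame`, `TranscriptionFrame`, `EndStreamFrame`) rather than the classes in this diff, and shows one detail worth keeping in mind: subclasses have to be checked before their parents.

```python
from dataclasses import dataclass


class Frame:
    """Hypothetical base class, standing in for QueueFrame."""


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class TranscriptionFrame(TextFrame):
    participant_id: str


class EndStreamFrame(Frame):
    pass


def describe(frame: Frame) -> str:
    # Check the most specific class first: a TranscriptionFrame is also a TextFrame.
    if isinstance(frame, TranscriptionFrame):
        return f"transcription from {frame.participant_id}: {frame.text}"
    if isinstance(frame, TextFrame):
        return f"text: {frame.text}"
    if isinstance(frame, EndStreamFrame):
        return "end of stream"
    return "ignored"
```

Frame classes that declare fields need the `@dataclass` decorator to get a generated constructor. A class with bare annotations and no decorator cannot be built from positional arguments.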
@@ -1,2 +1,3 @@
|
||||
Pillow==10.1.0
|
||||
typing_extensions==4.9.0
|
||||
typing_extensions==4.9.0
|
||||
faster-whisper==0.10.0
|
||||
@@ -1,73 +0,0 @@
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from dailyai.queue_frame import FrameType, QueueFrame
|
||||
from dailyai.services.ai_services import AIService
|
||||
|
||||
class SentenceAggregator(AIService):
|
||||
def __init__(self, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
self.current_sentence = ""
|
||||
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.TEXT_CHUNK, FrameType.SENTENCE])
|
||||
|
||||
def possible_output_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.SENTENCE])
|
||||
|
||||
async def process_frame(
|
||||
self, requested_frame_types: set[FrameType], frame: QueueFrame
|
||||
) -> AsyncGenerator[QueueFrame, None]:
|
||||
if not FrameType.SENTENCE in requested_frame_types:
|
||||
return
|
||||
|
||||
if frame.frame_type == FrameType.TEXT_CHUNK:
|
||||
if type(frame.frame_data) != str:
|
||||
raise Exception(
|
||||
"Sentence aggregator requires a string for the data field"
|
||||
)
|
||||
|
||||
self.current_sentence += frame.frame_data
|
||||
if self.current_sentence.endswith((".", "?", "!")):
|
||||
sentence = self.current_sentence
|
||||
self.current_sentence = ""
|
||||
yield QueueFrame(FrameType.SENTENCE, sentence)
|
||||
elif frame.frame_type == FrameType.END_STREAM:
|
||||
if self.current_sentence:
|
||||
yield QueueFrame(FrameType.SENTENCE, self.current_sentence)
|
||||
elif frame.frame_type == FrameType.SENTENCE:
|
||||
yield frame
|
||||
|
||||
|
||||
class TranscriptionSentenceAggregator(AIService):
|
||||
def __init__(self, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
self.current_sentence = ""
|
||||
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.TEXT_CHUNK, FrameType.SENTENCE])
|
||||
|
||||
def possible_output_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.SENTENCE])
|
||||
|
||||
async def process_frame(
|
||||
self, requested_frame_types: set[FrameType], frame: QueueFrame
|
||||
) -> AsyncGenerator[QueueFrame, None]:
|
||||
if not FrameType.SENTENCE in requested_frame_types:
|
||||
return
|
||||
|
||||
if frame.frame_type == FrameType.TEXT_CHUNK:
|
||||
if type(frame.frame_data) != str:
|
||||
raise Exception(
|
||||
"Sentence aggregator requires a string for the data field"
|
||||
)
|
||||
|
||||
self.current_sentence += frame.frame_data
|
||||
if self.current_sentence.endswith((".", "?", "!")):
|
||||
sentence = self.current_sentence
|
||||
self.current_sentence = ""
|
||||
yield QueueFrame(FrameType.SENTENCE, sentence)
|
||||
elif frame.frame_type == FrameType.END_STREAM:
|
||||
if self.current_sentence:
|
||||
yield QueueFrame(FrameType.SENTENCE, self.current_sentence)
|
||||
elif frame.frame_type == FrameType.SENTENCE:
|
||||
yield frame
|
||||
@@ -1,17 +1,27 @@
|
||||
import asyncio
|
||||
import io
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
import wave
|
||||
|
||||
from httpx import request
|
||||
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.queue_frame import (
|
||||
AudioQueueFrame,
|
||||
ControlQueueFrame,
|
||||
EndStreamQueueFrame,
|
||||
ImageQueueFrame,
|
||||
LLMMessagesQueueFrame,
|
||||
LLMResponseEndQueueFrame,
|
||||
LLMFunctionCallFrame,
|
||||
ChatMessageQueueFrame,
|
||||
QueueFrame,
|
||||
TextQueueFrame,
|
||||
TranscriptionQueueFrame,
|
||||
)
|
||||
|
||||
from abc import abstractmethod
|
||||
from typing import AsyncGenerator, Iterable
|
||||
from typing import AsyncGenerator, AsyncIterable, BinaryIO, Iterable
|
||||
from dataclasses import dataclass
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from collections.abc import Iterable, AsyncIterable
|
||||
|
||||
class AIService:
|
||||
|
||||
@@ -21,95 +31,57 @@ class AIService:
|
||||
def stop(self):
|
||||
pass
|
||||
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set()
|
||||
|
||||
def possible_output_frame_types(self) -> set[FrameType]:
|
||||
return set()
|
||||
|
||||
async def run_to_queue(self, queue: asyncio.Queue, frames, add_end_of_stream=False) -> None:
|
||||
async for frame in self.run(frames):
|
||||
await queue.put(frame)
|
||||
|
||||
if add_end_of_stream:
|
||||
await queue.put(QueueFrame(FrameType.END_STREAM, None))
|
||||
await queue.put(EndStreamQueueFrame())
|
||||
|
||||
async def run(
|
||||
self,
|
||||
frames: Iterable[QueueFrame]
|
||||
| AsyncIterable[QueueFrame]
|
||||
| asyncio.Queue[QueueFrame],
|
||||
requested_frame_types: set[FrameType] | None=None,
|
||||
) -> AsyncGenerator[QueueFrame, None]:
|
||||
if requested_frame_types and self.possible_output_frame_types().intersection(requested_frame_types) == set():
|
||||
raise Exception(f"Requested frame types {requested_frame_types} are not supported by this service.")
|
||||
**kwargs) -> AsyncGenerator[QueueFrame, None]:
|
||||
try:
|
||||
if isinstance(frames, AsyncIterable):
|
||||
async for frame in frames:
|
||||
async for output_frame in self.process_frame(frame):
|
||||
yield output_frame
|
||||
elif isinstance(frames, Iterable):
|
||||
for frame in frames:
|
||||
async for output_frame in self.process_frame(frame):
|
||||
yield output_frame
|
||||
elif isinstance(frames, asyncio.Queue):
|
||||
while True:
|
||||
frame = await frames.get()
|
||||
async for output_frame in self.process_frame(frame):
|
||||
yield output_frame
|
||||
if isinstance(frame, EndStreamQueueFrame):
|
||||
break
|
||||
else:
|
||||
raise Exception("Frames must be an iterable or async iterable")
|
||||
|
||||
if not requested_frame_types:
|
||||
requested_frame_types = self.possible_output_frame_types()
|
||||
|
||||
if isinstance(frames, AsyncIterable):
|
||||
async for frame in frames:
|
||||
async for output_frame in self.process_frame(requested_frame_types, frame):
|
||||
yield output_frame
|
||||
elif isinstance(frames, Iterable):
|
||||
for frame in frames:
|
||||
async for output_frame in self.process_frame(requested_frame_types, frame):
|
||||
yield output_frame
|
||||
elif isinstance(frames, asyncio.Queue):
|
||||
while True:
|
||||
frame = await frames.get()
|
||||
async for output_frame in self.process_frame(requested_frame_types, frame):
|
||||
yield output_frame
|
||||
if frame.frame_type == FrameType.END_STREAM:
|
||||
break
|
||||
else:
|
||||
raise Exception("Frames must be an iterable or async iterable")
|
||||
async for output_frame in self.finalize():
|
||||
yield output_frame
|
||||
except Exception as e:
|
||||
self.logger.error("Exception occurred while running AI service: %s", e)
|
||||
raise e
|
||||
|
||||
@abstractmethod
|
||||
async def process_frame(self, requested_frame_types:set[FrameType], frame:QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
# Yield something so the linter can deduce what should happen here.
|
||||
yield QueueFrame(FrameType.END_STREAM, None)
|
||||
|
||||
class SentenceAggregator(AIService):
|
||||
def __init__(self, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
self.current_sentence = ""
|
||||
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.TEXT_CHUNK, FrameType.SENTENCE])
|
||||
|
||||
def possible_output_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.SENTENCE])
|
||||
|
||||
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if not FrameType.SENTENCE in requested_frame_types:
|
||||
return
|
||||
|
||||
if frame.frame_type == FrameType.TEXT_CHUNK:
|
||||
if type(frame.frame_data) != str:
|
||||
raise Exception(
|
||||
"Sentence aggregator requires a string for the data field"
|
||||
)
|
||||
|
||||
self.current_sentence += frame.frame_data
|
||||
if self.current_sentence.endswith((".", "?", "!")):
|
||||
sentence = self.current_sentence
|
||||
self.current_sentence = ""
|
||||
yield QueueFrame(FrameType.SENTENCE, sentence)
|
||||
elif frame.frame_type == FrameType.END_STREAM:
|
||||
if self.current_sentence:
|
||||
yield QueueFrame(FrameType.SENTENCE, self.current_sentence)
|
||||
elif frame.frame_type == FrameType.SENTENCE:
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if isinstance(frame, ControlQueueFrame):
|
||||
yield frame
|
||||
|
||||
@abstractmethod
|
||||
async def finalize(self) -> AsyncGenerator[QueueFrame, None]:
|
||||
# This is a trick for the interpreter (and linter) to know that this is a generator.
|
||||
if False:
|
||||
yield QueueFrame()
|
||||
|
||||
|
||||
class LLMService(AIService):
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.LLM_MESSAGE, FrameType.SENTENCE, FrameType.TRANSCRIPTION])
|
||||
|
||||
def allowed_output_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.SENTENCE, FrameType.TEXT_CHUNK])
|
||||
|
||||
@abstractmethod
|
||||
async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
|
||||
yield ""
|
||||
@@ -118,52 +90,74 @@ class LLMService(AIService):
|
||||
async def run_llm(self, messages) -> str:
|
||||
pass
|
||||
|
||||
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if frame.frame_type == FrameType.LLM_MESSAGE:
|
||||
if type(frame.frame_data) != list:
|
||||
raise Exception("LLM service requires a dict for the data field")
|
||||
async def process_frame(self, frame: QueueFrame, tool_choice: str = None) -> AsyncGenerator[QueueFrame, None]:
|
||||
function_name = ""
|
||||
arguments = ""
|
||||
if isinstance(frame, LLMMessagesQueueFrame):
|
||||
async for text_chunk in self.run_llm_async(frame.messages, tool_choice):
|
||||
if isinstance(text_chunk, str):
|
||||
yield TextQueueFrame(text_chunk)
|
||||
elif text_chunk.function:
|
||||
if text_chunk.function.name:
|
||||
# function_name += text_chunk.function.name
|
||||
yield LLMFunctionCallFrame(function_name=text_chunk.function.name, arguments=None)
|
||||
if text_chunk.function.arguments:
|
||||
# arguments += text_chunk.function.arguments
|
||||
yield LLMFunctionCallFrame(function_name=None, arguments=text_chunk.function.arguments)
|
||||
|
||||
messages: list[dict[str, str]] = frame.frame_data
|
||||
if FrameType.SENTENCE in requested_frame_types:
|
||||
yield QueueFrame(FrameType.SENTENCE, await self.run_llm(messages))
|
||||
else:
|
||||
async for text_chunk in self.run_llm_async(messages):
|
||||
yield QueueFrame(FrameType.TEXT_CHUNK, text_chunk)
|
||||
|
||||
# TODO: handle other frame types! Need to aggregate into messages
|
||||
if (function_name and arguments):
|
||||
# yield LLMFunctionCallFrame(function_name=function_name, arguments=arguments)
|
||||
function_name = ""
|
||||
arguments = ""
|
||||
yield LLMResponseEndQueueFrame()
|
||||
else:
|
||||
yield frame
|
||||
|
||||
|
||||
class TTSService(AIService):
|
||||
def __init__(self, aggregate_sentences=True):
|
||||
super().__init__()
|
||||
self.aggregate_sentences: bool = aggregate_sentences
|
||||
self.current_sentence: str = ""
|
||||
|
||||
# Some TTS services require a specific sample rate. We default to 16k
|
||||
def get_mic_sample_rate(self):
|
||||
return 16000
|
||||
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.SENTENCE, FrameType.TRANSCRIPTION, FrameType.TEXT_CHUNK])
|
||||
|
||||
def possible_output_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.AUDIO])
|
||||
|
||||
# Converts the sentence to audio. Yields a list of audio frames that can
|
||||
# Converts the text to audio. Yields a list of audio frames that can
|
||||
# be sent to the microphone device
|
||||
@abstractmethod
|
||||
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
|
||||
async def run_tts(self, text) -> AsyncGenerator[bytes, None]:
|
||||
# yield empty bytes here, so linting can infer what this method does
|
||||
yield bytes()
|
||||
|
||||
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if not FrameType.AUDIO in requested_frame_types:
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if not isinstance(frame, TextQueueFrame):
|
||||
yield frame
|
||||
return
|
||||
|
||||
if type(frame.frame_data) != str:
|
||||
raise Exception("TTS service requires a string for the data field")
|
||||
text: str | None = None
|
||||
if not self.aggregate_sentences:
|
||||
text = frame.text
|
||||
else:
|
||||
self.current_sentence += frame.text
|
||||
if self.current_sentence.endswith((".", "?", "!")):
|
||||
text = self.current_sentence
|
||||
self.current_sentence = ""
|
||||
|
||||
async for audio_chunk in self.run_tts(frame.frame_data):
|
||||
yield QueueFrame(FrameType.AUDIO, audio_chunk)
|
||||
if text:
|
||||
# yield ChatMessageQueueFrame(message=text)
|
||||
async for audio_chunk in self.run_tts(text):
|
||||
yield AudioQueueFrame(audio_chunk)
|
||||
|
||||
async def finalize(self):
|
||||
if self.current_sentence:
|
||||
async for audio_chunk in self.run_tts(self.current_sentence):
|
||||
yield AudioQueueFrame(audio_chunk)
|
||||
|
||||
# Convenience function to send the audio for a sentence to the given queue
|
||||
async def say(self, sentence, queue: asyncio.Queue):
|
||||
await self.run_to_queue(queue, [QueueFrame(FrameType.SENTENCE, sentence)])
|
||||
await self.run_to_queue(queue, [TextQueueFrame(sentence)])
|
||||
|
||||
|
||||
class ImageGenService(AIService):
|
||||
@@ -171,30 +165,61 @@ class ImageGenService(AIService):
|
||||
super().__init__(**kwargs)
|
||||
self.image_size = image_size
|
||||
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.SENTENCE, FrameType.TRANSCRIPTION, FrameType.TEXT_CHUNK, FrameType.IMAGE_DESCRIPTION])
|
||||
|
||||
def possible_output_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.IMAGE])
|
||||
|
||||
# Renders the image. Returns an Image object.
|
||||
@abstractmethod
|
||||
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
|
||||
async def run_image_gen(self, sentence: str) -> tuple[str, bytes]:
|
||||
pass
|
||||
|
||||
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if not FrameType.IMAGE in requested_frame_types:
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if not isinstance(frame, TextQueueFrame):
|
||||
yield frame
|
||||
return
|
||||
|
||||
if type(frame.frame_data) != str:
|
||||
raise Exception("Image service requires a string for the data field")
|
||||
|
||||
(_, image_data) = await self.run_image_gen(frame.frame_data)
|
||||
yield QueueFrame(FrameType.IMAGE, image_data)
|
||||
(url, image_data) = await self.run_image_gen(frame.text)
|
||||
yield ImageQueueFrame(url, image_data)
|
||||
|
||||
|
||||
@dataclass
|
||||
class AIServiceConfig:
|
||||
tts: TTSService
|
||||
image: ImageGenService
|
||||
llm: LLMService
|
||||
class STTService(AIService):
    """STTService is a base class for speech-to-text services."""

    _frame_rate: int

    def __init__(self, frame_rate: int = 16000, **kwargs):
        super().__init__(**kwargs)
        self._frame_rate = frame_rate

    @abstractmethod
    async def run_stt(self, audio: BinaryIO) -> str:
        """Returns transcript as a string"""
        pass

    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        """Processes a frame of audio data, either buffering or transcribing it."""
        if not isinstance(frame, AudioQueueFrame):
            return

        data = frame.data
        # Wrap the raw PCM bytes in an in-memory, mono, 16-bit WAV container
        # so that run_stt receives a seekable file-like object.
        content = io.BufferedRandom(io.BytesIO())
        ww = wave.open(content, "wb")
        ww.setnchannels(1)
        ww.setsampwidth(2)
        ww.setframerate(self._frame_rate)
        ww.writeframesraw(data)
        ww.close()
        content.seek(0)
        text = await self.run_stt(content)
        yield TranscriptionQueueFrame(text, '', str(time.time()))
|
||||
|
||||
|
||||
class FrameLogger(AIService):
    def __init__(self, prefix="Frame", **kwargs):
        super().__init__(**kwargs)
        self.prefix = prefix

    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        if isinstance(frame, (AudioQueueFrame, ImageQueueFrame)):
            self.logger.info(f"{self.prefix}: {type(frame)}")
        else:
            self.logger.info(f"{self.prefix}: {frame}")

        yield frame
|
||||
|
||||
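The sentence aggregation that TTSService performs in process_frame and finalize can be sketched on its own, without any audio. This is a minimal illustration, not the library's code: the class name SentenceBuffer and its push/flush methods are hypothetical. Only the end-of-sentence check on ".", "?" and "!" and the flush of leftover text at end of stream are taken from the diff above.

```python
from typing import Optional


class SentenceBuffer:
    """Hypothetical helper mirroring the aggregation logic in TTSService."""

    END_MARKS = (".", "?", "!")

    def __init__(self) -> None:
        self.current_sentence = ""

    def push(self, text: str) -> Optional[str]:
        """Append a text chunk; return the sentence once it ends in punctuation."""
        self.current_sentence += text
        if self.current_sentence.endswith(self.END_MARKS):
            sentence, self.current_sentence = self.current_sentence, ""
            return sentence
        return None

    def flush(self) -> Optional[str]:
        """Return any leftover text, as finalize() does at end of stream."""
        remainder, self.current_sentence = self.current_sentence, ""
        return remainder or None


buffer = SentenceBuffer()
chunks = ["Hello", " there.", " How are", " you"]
# Collect only the chunks that completed a sentence.
sentences = [s for s in (buffer.push(c) for c in chunks) if s]
leftover = buffer.flush()
```

Because the check uses endswith on the whole accumulated string, a chunk that contains a full stop in the middle (for example "Hi. How") does not trigger a flush until a later chunk ends in punctuation.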
@@ -15,30 +15,26 @@ from PIL import Image
|
||||
# See .env.example for Azure configuration needed
|
||||
from azure.cognitiveservices.speech import SpeechSynthesizer, SpeechConfig, ResultReason, CancellationReason
|
||||
|
||||
|
||||
class AzureTTSService(TTSService):
|
||||
def __init__(self, speech_key=None, speech_region=None):
|
||||
def __init__(self, *, api_key, region):
|
||||
super().__init__()
|
||||
|
||||
speech_key = speech_key or os.getenv("AZURE_SPEECH_SERVICE_KEY")
|
||||
speech_region = speech_region or os.getenv("AZURE_SPEECH_SERVICE_REGION")
|
||||
|
||||
self.speech_config = SpeechConfig(subscription=speech_key, region=speech_region)
|
||||
self.speech_synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=None)
|
||||
self.speech_config = SpeechConfig(subscription=api_key, region=region)
|
||||
self.speech_synthesizer = SpeechSynthesizer(
|
||||
speech_config=self.speech_config, audio_config=None)
|
||||
|
||||
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
|
||||
self.logger.info("Running azure tts")
|
||||
ssml = "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' " \
|
||||
"xmlns:mstts='http://www.w3.org/2001/mstts'>" \
|
||||
"<voice name='en-US-SaraNeural'>" \
|
||||
"<mstts:silence type='Sentenceboundary' value='20ms' />" \
|
||||
"<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>" \
|
||||
"<prosody rate='1.05'>" \
|
||||
f"{sentence}" \
|
||||
"</prosody></mstts:express-as></voice></speak> "
|
||||
try:
|
||||
result = await asyncio.to_thread(self.speech_synthesizer.speak_ssml, (ssml))
|
||||
except Exception as e:
|
||||
self.logger.error("Error in azure tts", e)
|
||||
"xmlns:mstts='http://www.w3.org/2001/mstts'>" \
|
||||
"<voice name='en-US-SaraNeural'>" \
|
||||
"<mstts:silence type='Sentenceboundary' value='20ms' />" \
|
||||
"<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>" \
|
||||
"<prosody rate='1.05'>" \
|
||||
f"{sentence}" \
|
||||
"</prosody></mstts:express-as></voice></speak> "
|
||||
result = await asyncio.to_thread(self.speech_synthesizer.speak_ssml, (ssml))
|
||||
self.logger.info("Got azure tts result")
|
||||
if result.reason == ResultReason.SynthesizingAudioCompleted:
|
||||
self.logger.info("Returning result")
|
||||
@@ -50,24 +46,15 @@ class AzureTTSService(TTSService):
|
||||
if cancellation_details.reason == CancellationReason.Error:
|
||||
self.logger.info("Error details: {}".format(cancellation_details.error_details))
|
||||
|
||||
|
||||
class AzureLLMService(LLMService):
|
||||
def __init__(self, api_key=None, azure_endpoint=None, api_version=None, model=None):
|
||||
def __init__(self, *, api_key, endpoint, api_version="2023-12-01-preview", model):
|
||||
super().__init__()
|
||||
api_key = api_key or os.getenv("AZURE_CHATGPT_KEY")
|
||||
self._model: str = model
|
||||
|
||||
azure_endpoint = azure_endpoint or os.getenv("AZURE_CHATGPT_ENDPOINT")
|
||||
if not azure_endpoint:
|
||||
raise Exception("No azure endpoint specified for Azure LLM, please set AZURE_CHATGPT_ENDPOINT in the environment or pass it to the AzureLLMService constructor")
|
||||
|
||||
model: str | None = model or os.getenv("AZURE_CHATGPT_DEPLOYMENT_ID")
|
||||
if not model:
|
||||
raise Exception("No model specified for Azure LLM, please set AZURE_CHATGPT_DEPLOYMENT_ID in the environment or pass it to the AzureLLMService constructor")
|
||||
self.model: str = model
|
||||
|
||||
api_version = api_version or "2023-12-01-preview"
|
||||
self.client = AsyncAzureOpenAI(
|
||||
self._client = AsyncAzureOpenAI(
|
||||
api_key=api_key,
|
||||
azure_endpoint=azure_endpoint,
|
||||
azure_endpoint=endpoint,
|
||||
api_version=api_version,
|
||||
)
|
||||
|
||||
@@ -75,7 +62,7 @@ class AzureLLMService(LLMService):
|
||||
messages_for_log = json.dumps(messages)
|
||||
self.logger.debug(f"Generating chat via azure: {messages_for_log}")
|
||||
|
||||
chunks = await self.client.chat.completions.create(model=self.model, stream=True, messages=messages)
|
||||
chunks = await self._client.chat.completions.create(model=self._model, stream=True, messages=messages)
|
||||
async for chunk in chunks:
|
||||
if len(chunk.choices) == 0:
|
||||
continue
|
||||
@@ -87,89 +74,70 @@ class AzureLLMService(LLMService):
|
||||
messages_for_log = json.dumps(messages)
|
||||
self.logger.debug(f"Generating chat via azure: {messages_for_log}")
|
||||
|
||||
response = await self.client.chat.completions.create(model=self.model, stream=False, messages=messages)
|
||||
response = await self._client.chat.completions.create(model=self._model, stream=False, messages=messages)
|
||||
if response and len(response.choices) > 0:
|
||||
return response.choices[0].message.content
|
||||
else:
|
||||
return None
|
||||
|
||||
|
||||
class AzureImageGenServiceREST(ImageGenService):
|
||||
|
||||
def __init__(self, image_size:str, api_key=None, azure_endpoint=None, api_version=None, model=None):
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
api_version="2023-06-01-preview",
|
||||
image_size: str,
|
||||
aiohttp_session: aiohttp.ClientSession,
|
||||
api_key,
|
||||
endpoint,
|
||||
model):
|
||||
super().__init__(image_size=image_size)
|
||||
self.api_key = api_key or os.getenv("AZURE_DALLE_KEY")
|
||||
self.azure_endpoint = azure_endpoint or os.getenv("AZURE_DALLE_ENDPOINT")
|
||||
self.api_version = api_version or "2023-06-01-preview"
|
||||
self.model = model or os.getenv("AZURE_DALLE_DEPLOYMENT_ID")
|
||||
|
||||
self._api_key = api_key
|
||||
self._azure_endpoint = endpoint
|
||||
self._api_version = api_version
|
||||
self._model = model
|
||||
self._aiohttp_session = aiohttp_session
|
||||
|
||||
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
|
||||
# TODO hoist the session to app-level
|
||||
async with aiohttp.ClientSession() as session:
|
||||
url = f"{self.azure_endpoint}openai/images/generations:submit?api-version={self.api_version}"
|
||||
headers= { "api-key": self.api_key, "Content-Type": "application/json" }
|
||||
body = {
|
||||
# Enter your prompt text here
|
||||
"prompt": sentence,
|
||||
"size": self.image_size,
|
||||
"n": 1,
|
||||
}
|
||||
async with session.post(url, headers=headers, json=body) as submission:
|
||||
operation_location = submission.headers['operation-location']
|
||||
url = f"{self._azure_endpoint}openai/images/generations:submit?api-version={self._api_version}"
|
||||
headers = {"api-key": self._api_key, "Content-Type": "application/json"}
|
||||
body = {
|
||||
# Enter your prompt text here
|
||||
"prompt": sentence,
|
||||
"size": self.image_size,
|
||||
"n": 1,
|
||||
}
|
||||
async with self._aiohttp_session.post(
|
||||
url, headers=headers, json=body
|
||||
) as submission:
|
||||
print(f"submission: {submission}")
|
||||
# We never get past this line, because this header isn't
|
||||
# defined on a 429 response, but something is eating our exceptions!
|
||||
operation_location = submission.headers['operation-location']
|
||||
print(f"submission status: {submission.status}")
|
||||
status = ""
|
||||
attempts_left = 120
|
||||
json_response = None
|
||||
while status != "succeeded":
|
||||
attempts_left -= 1
|
||||
if attempts_left == 0:
|
||||
raise Exception("Image generation timed out")
|
||||
|
||||
status = ""
|
||||
attempts_left = 120
|
||||
json_response = None
|
||||
while status != "succeeded":
|
||||
attempts_left -= 1
|
||||
if attempts_left == 0:
|
||||
raise Exception("Image generation timed out")
|
||||
await asyncio.sleep(1)
|
||||
response = await self._aiohttp_session.get(
|
||||
operation_location, headers=headers
|
||||
)
|
||||
json_response = await response.json()
|
||||
status = json_response["status"]
|
||||
|
||||
await asyncio.sleep(1)
|
||||
response = await session.get(operation_location, headers=headers)
|
||||
json_response = await response.json()
|
||||
status = json_response["status"]
|
||||
|
||||
image_url = json_response["result"]["data"][0]["url"] if json_response else None
|
||||
if not image_url:
|
||||
raise Exception("Image generation failed")
|
||||
|
||||
# Load the image from the url
|
||||
async with session.get(image_url) as response:
|
||||
image_stream = io.BytesIO(await response.content.read())
|
||||
image = Image.open(image_stream)
|
||||
return (image_url, image.tobytes())
|
||||
|
||||
|
||||
class AzureImageGenService(ImageGenService):
|
||||
|
||||
def __init__(self, api_key=None, azure_endpoint=None, api_version=None, model=None):
|
||||
super().__init__()
|
||||
|
||||
api_key = api_key or os.getenv("AZURE_DALLE_KEY")
|
||||
azure_endpoint = azure_endpoint or os.getenv("AZURE_DALLE_ENDPOINT")
|
||||
api_version = api_version or "2023-06-01-preview"
|
||||
self.model = model or os.getenv("AZURE_DALLE_DEPLOYMENT_ID")
|
||||
|
||||
self.client = AzureOpenAI(
|
||||
api_key=api_key,
|
||||
azure_endpoint=azure_endpoint,
|
||||
api_version=api_version,
|
||||
)
|
||||
|
||||
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
|
||||
self.logger.info("Generating azure image", sentence)
|
||||
|
||||
image = self.client.images.generate(
|
||||
model=self.model,
|
||||
prompt=sentence,
|
||||
n=1,
|
||||
size=self.image_size,
|
||||
)
|
||||
|
||||
url = image["data"][0]["url"]
|
||||
response = requests.get(url)
|
||||
|
||||
dalle_stream = io.BytesIO(response.content)
|
||||
dalle_im = Image.open(dalle_stream.tobytes())
|
||||
|
||||
return (url, dalle_im)
|
||||
image_url = json_response["result"]["data"][0]["url"] if json_response else None
|
||||
if not image_url:
|
||||
raise Exception("Image generation failed")
|
||||
# Load the image from the url
|
||||
async with self._aiohttp_session.get(image_url) as response:
|
||||
image_stream = io.BytesIO(await response.content.read())
|
||||
image = Image.open(image_stream)
|
||||
print("i got an image file!")
|
||||
return (image_url, image.tobytes())
|
||||
|
||||
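The status polling in run_image_gen (sleep, fetch the operation status, stop on "succeeded", give up after a fixed number of attempts) can be sketched as a small helper. This is a simplified sketch under stated assumptions, not the author's loop: poll_until_succeeded and fetch_status are hypothetical names, the attempt accounting is slightly different from the diff, and a fake status source stands in for the aiohttp session.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_succeeded(
    fetch_status: Callable[[], Awaitable[dict]],
    attempts: int = 120,
    delay_s: float = 1.0,
) -> dict:
    """Call fetch_status until it reports "succeeded" or attempts run out."""
    for _ in range(attempts):
        await asyncio.sleep(delay_s)
        response = await fetch_status()
        if response.get("status") == "succeeded":
            return response
    raise TimeoutError("Image generation timed out")


async def _demo() -> dict:
    # Fake status endpoint: reports "running" twice, then "succeeded".
    replies = iter([
        {"status": "running"},
        {"status": "running"},
        {"status": "succeeded",
         "result": {"data": [{"url": "https://example.invalid/img.png"}]}},
    ])

    async def fake_fetch() -> dict:
        return next(replies)

    return await poll_until_succeeded(fake_fetch, attempts=5, delay_s=0)


final = asyncio.run(_demo())
```

A production version would probably also stop early on a terminal failure status and check the HTTP status code of the submission before reading its headers; neither is shown here.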
435
src/dailyai/services/base_transport_service.py
Normal file
@@ -0,0 +1,435 @@
|
||||
from abc import abstractmethod
import asyncio
import functools
import itertools
import logging
import numpy as np
import pyaudio
import torch
import torchaudio
import queue
import threading
import time
from typing import AsyncGenerator
from enum import Enum
from typing import AsyncGenerator, AsyncIterable, BinaryIO, Iterable

from dailyai.queue_frame import (
    AudioQueueFrame,
    ChatMessageQueueFrame,
    EndStreamQueueFrame,
    ImageQueueFrame,
    QueueFrame,
    SpriteQueueFrame,
    StartStreamQueueFrame,
    UserStartedSpeakingFrame,
    UserStoppedSpeakingFrame
)
|
||||
|
||||
|
||||
torch.set_num_threads(1)

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=False)

(get_speech_timestamps,
 save_audio,
 read_audio,
 VADIterator,
 collect_chunks) = utils

# Taken from utils_vad.py


def validate(model,
             inputs: torch.Tensor):
    with torch.no_grad():
        outs = model(inputs)
    return outs

# Provided by Alexander Veysov


def int2float(sound):
    abs_max = np.abs(sound).max()
    sound = sound.astype('float32')
    if abs_max > 0:
        sound *= 1/32768
    sound = sound.squeeze()  # depends on the use case
    return sound


FORMAT = pyaudio.paInt16
CHANNELS = 1
SAMPLE_RATE = 16000
CHUNK = int(SAMPLE_RATE / 10)

audio = pyaudio.PyAudio()


class VADState(Enum):
    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4
||||
|
||||
class BaseTransportService():
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
**kwargs,
|
||||
) -> None:
|
||||
self._mic_enabled = kwargs.get("mic_enabled") or False
|
||||
self._mic_sample_rate = kwargs.get("mic_sample_rate") or 16000
|
||||
self._camera_enabled = kwargs.get("camera_enabled") or False
|
||||
self._camera_width = kwargs.get("camera_width") or 1024
|
||||
self._camera_height = kwargs.get("camera_height") or 768
|
||||
self._speaker_enabled = kwargs.get("speaker_enabled") or False
|
||||
self._speaker_sample_rate = kwargs.get("speaker_sample_rate") or 16000
|
||||
self._fps = kwargs.get("fps") or 8
|
||||
self._vad_start_s = kwargs.get("vad_start_s") or 0.2
|
||||
self._vad_stop_s = kwargs.get("vad_stop_s") or 0.8
|
||||
self._context = kwargs.get("context") or []
|
||||
self._vad_enabled = kwargs.get("vad_enabled") or False
|
||||
|
||||
if self._vad_enabled and self._speaker_enabled:
|
||||
raise Exception("Sorry, you can't use speaker_enabled and vad_enabled at the same time. Please set one to False.")
|
||||
|
||||
self._vad_samples = 1536
|
||||
vad_frame_s = self._vad_samples / SAMPLE_RATE
|
||||
self._vad_start_frames = round(self._vad_start_s / vad_frame_s)
|
||||
self._vad_stop_frames = round(self._vad_stop_s / vad_frame_s)
|
||||
self._vad_starting_count = 0
|
||||
self._vad_stopping_count = 0
|
||||
self._vad_state = VADState.QUIET
|
||||
self._user_is_speaking = False
|
||||
|
||||
duration_minutes = kwargs.get("duration_minutes") or 10
|
||||
self._expiration = time.time() + duration_minutes * 60
|
||||
|
||||
self.send_queue = asyncio.Queue()
|
||||
self.receive_queue = asyncio.Queue()
|
||||
|
||||
self._threadsafe_send_queue = queue.Queue()
|
||||
|
||||
self._images = None
|
||||
|
||||
try:
|
||||
self._loop: asyncio.AbstractEventLoop | None = asyncio.get_running_loop()
|
||||
except RuntimeError:
|
||||
self._loop = None
|
||||
|
||||
self._stop_threads = threading.Event()
|
||||
self._is_interrupted = threading.Event()
|
||||
|
||||
self._logger: logging.Logger = logging.getLogger()
|
||||
|
||||
def update_messages(self, new_context: list[dict[str, str]], task: asyncio.Task | None):
|
||||
if task:
|
||||
if not task.cancelled():
|
||||
self._current_phrase = ""
|
||||
self._context = new_context
|
||||
|
||||
|
||||
|
||||
async def run_pipeline(self, frame):
|
||||
# TODO-CB: This exception for missing class gets eaten!
|
||||
await self._runner(frame)
|
||||
|
||||
async def run_conversation(self, runner: Iterable[QueueFrame]
|
||||
| AsyncIterable[QueueFrame]
|
||||
| asyncio.Queue[QueueFrame],
|
||||
) -> AsyncGenerator[QueueFrame, None]:
|
||||
current_response_task = None
|
||||
self._runner = runner
|
||||
|
||||
async for frame in self.get_receive_frames():
|
||||
if isinstance(frame, EndStreamQueueFrame):
|
||||
break
|
||||
# elif not isinstance(frame, TranscriptionQueueFrame):
|
||||
# continue
|
||||
# TODO-CB: Verify this is an accurate replacement
|
||||
# if hasattr(frame, 'participantId') and frame.participantId == self._my_participant_id:
|
||||
# if not isinstance(frame, UserStoppedSpeakingFrame):
|
||||
# continue
|
||||
|
||||
if current_response_task and isinstance(frame, UserStartedSpeakingFrame):
|
||||
# TODO-CB: Maybe not always interrupt? Are there frame types we can pass through?
|
||||
current_response_task.cancel()
|
||||
self.interrupt()
|
||||
|
||||
# self._current_phrase += " " + frame.text
|
||||
# current_llm_context = copy.deepcopy(self._context)
|
||||
current_response_task = asyncio.create_task(
|
||||
self.run_pipeline(
|
||||
frame)
|
||||
)
|
||||
current_response_task.add_done_callback(
|
||||
functools.partial(self.update_messages, self._context)
|
||||
)
|
||||
|
||||
async def run(self):
|
||||
self._prerun()
|
||||
|
||||
async_output_queue_marshal_task = asyncio.create_task(
|
||||
self._marshal_frames())
|
||||
|
||||
self._camera_thread = threading.Thread(
|
||||
target=self._run_camera, daemon=True)
|
||||
self._camera_thread.start()
|
||||
|
||||
self._frame_consumer_thread = threading.Thread(
|
||||
target=self._frame_consumer, daemon=True)
|
||||
self._frame_consumer_thread.start()
|
||||
|
||||
if self._speaker_enabled:
|
||||
self._receive_audio_thread = threading.Thread(
|
||||
target=self._receive_audio, daemon=True)
|
||||
self._receive_audio_thread.start()
|
||||
|
||||
if self._vad_enabled:
|
||||
self._vad_thread = threading.Thread(target=self._vad, daemon=True)
|
||||
self._vad_thread.start()
|
||||
|
||||
try:
|
||||
while (
|
||||
time.time() < self._expiration
|
||||
and not self._stop_threads.is_set()
|
||||
):
|
||||
await asyncio.sleep(1)
|
||||
except Exception as e:
|
||||
self._logger.error(f"Exception {e}")
|
||||
raise e
|
||||
finally:
|
||||
# Do anything that must be done to clean up
|
||||
self._post_run()
|
||||
|
||||
self._stop_threads.set()
|
||||
|
||||
await self.send_queue.put(EndStreamQueueFrame())
|
||||
await async_output_queue_marshal_task
|
||||
await self.send_queue.join()
|
||||
self._frame_consumer_thread.join()
|
||||
|
||||
if self._speaker_enabled:
|
||||
self._receive_audio_thread.join()
|
||||
|
||||
if self._vad_enabled:
|
||||
self._vad_thread.join()
|
||||
|
||||
|
||||
def _post_run(self):
|
||||
# Note that this function must be idempotent! It can be called multiple times
|
||||
# if, for example, a keyboard interrupt occurs.
|
||||
pass
|
||||
|
||||
def stop(self):
|
||||
self._stop_threads.set()
|
||||
|
||||
async def stop_when_done(self):
|
||||
await self._wait_for_send_queue_to_empty()
|
||||
self.stop()
|
||||
|
||||
async def _wait_for_send_queue_to_empty(self):
|
||||
await self.send_queue.join()
|
||||
self._threadsafe_send_queue.join()
|
||||
|
||||
@abstractmethod
|
||||
def write_frame_to_camera(self, frame: bytes):
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def write_frame_to_mic(self, frame: bytes):
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def read_audio_frames(self, desired_frame_count):
|
||||
return bytes()
|
||||
|
||||
@abstractmethod
|
||||
def _prerun(self):
|
||||
pass
|
||||
|
||||
    def _vad(self):
        # CB: Starting silero VAD stuff
        # TODO-CB: Probably need to force virtual speaker creation if we're
        # going to build this in?
        # TODO-CB: pyaudio installation
        while not self._stop_threads.is_set():
            audio_chunk = self.read_audio_frames(self._vad_samples)
            audio_int16 = np.frombuffer(audio_chunk, np.int16)
            audio_float32 = int2float(audio_int16)
            new_confidence = model(
                torch.from_numpy(audio_float32), 16000).item()
            speaking = new_confidence > 0.5
            if speaking:
                match self._vad_state:
                    case VADState.QUIET:
                        self._vad_state = VADState.STARTING
                        self._vad_starting_count = 1
                    case VADState.STARTING:
                        self._vad_starting_count += 1
                    case VADState.STOPPING:
                        self._vad_state = VADState.SPEAKING
                        self._vad_stopping_count = 0
            else:
                match self._vad_state:
                    case VADState.STARTING:
                        self._vad_state = VADState.QUIET
                        self._vad_starting_count = 0
                    case VADState.SPEAKING:
                        self._vad_state = VADState.STOPPING
                        self._vad_stopping_count = 1
                    case VADState.STOPPING:
                        self._vad_stopping_count += 1

            if self._vad_state == VADState.STARTING and self._vad_starting_count >= self._vad_start_frames:
                print("##### VAD START")
                asyncio.run_coroutine_threadsafe(
                    self.receive_queue.put(
                        UserStartedSpeakingFrame()), self._loop
                )
                self.interrupt()
                self._vad_state = VADState.SPEAKING
                self._vad_starting_count = 0
            if self._vad_state == VADState.STOPPING and self._vad_stopping_count >= self._vad_stop_frames:
                print("##### VAD STOP")
                asyncio.run_coroutine_threadsafe(
                    self.receive_queue.put(
                        UserStoppedSpeakingFrame()), self._loop
                )
                self._vad_state = VADState.QUIET
                self._vad_stopping_count = 0
|
||||
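The four-state debounce in _vad can be exercised without torch or audio by feeding it one boolean per VAD frame. The sketch below is an illustration only: VADDebouncer and its update method are hypothetical names, and the model call is replaced by a precomputed list of speaking/not-speaking decisions. The transitions and counters follow the method above. With the defaults in __init__ (1536 samples per frame at 16000 Hz, so 0.096 s per frame), vad_start_s = 0.2 rounds to 2 frames and vad_stop_s = 0.8 rounds to 8 frames; the demo uses 2 and 3 to keep the trace short.

```python
from enum import Enum
from typing import Optional


class VADState(Enum):
    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4


class VADDebouncer:
    """Hypothetical stand-alone version of the counters used in _vad."""

    def __init__(self, start_frames: int, stop_frames: int) -> None:
        self.start_frames = start_frames
        self.stop_frames = stop_frames
        self.state = VADState.QUIET
        self.starting_count = 0
        self.stopping_count = 0

    def update(self, speaking: bool) -> Optional[str]:
        """Feed one per-frame decision; return "started", "stopped", or None."""
        if speaking:
            if self.state == VADState.QUIET:
                self.state = VADState.STARTING
                self.starting_count = 1
            elif self.state == VADState.STARTING:
                self.starting_count += 1
            elif self.state == VADState.STOPPING:
                # Speech resumed before the stop threshold was reached.
                self.state = VADState.SPEAKING
                self.stopping_count = 0
        else:
            if self.state == VADState.STARTING:
                # A short blip of speech: fall back to quiet.
                self.state = VADState.QUIET
                self.starting_count = 0
            elif self.state == VADState.SPEAKING:
                self.state = VADState.STOPPING
                self.stopping_count = 1
            elif self.state == VADState.STOPPING:
                self.stopping_count += 1

        if self.state == VADState.STARTING and self.starting_count >= self.start_frames:
            self.state = VADState.SPEAKING
            self.starting_count = 0
            return "started"
        if self.state == VADState.STOPPING and self.stopping_count >= self.stop_frames:
            self.state = VADState.QUIET
            self.stopping_count = 0
            return "stopped"
        return None


vad = VADDebouncer(start_frames=2, stop_frames=3)
decisions = [True, False, True, True, True, False, True, False, False, False]
events = [vad.update(d) for d in decisions]
```

In this trace the single speaking frame at the start and the single silent frame in the middle are both absorbed, so only one "started" and one "stopped" event are produced.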
|
||||
async def _marshal_frames(self):
|
||||
while True:
|
||||
frame: QueueFrame | list = await self.send_queue.get()
|
||||
self._threadsafe_send_queue.put(frame)
|
||||
self.send_queue.task_done()
|
||||
if isinstance(frame, EndStreamQueueFrame):
|
||||
break
|
||||
|
||||
def interrupt(self):
|
||||
print("!!!!! INTERRUPT")
|
||||
self._is_interrupted.set()
|
||||
|
||||
async def get_receive_frames(self) -> AsyncGenerator[QueueFrame, None]:
|
||||
while True:
|
||||
frame = await self.receive_queue.get()
|
||||
yield frame
|
||||
if isinstance(frame, EndStreamQueueFrame):
|
||||
break
|
||||
|
||||
def _receive_audio(self):
|
||||
if not self._loop:
|
||||
self._logger.error("No loop available for audio thread")
|
||||
return
|
||||
|
||||
seconds = 1
|
||||
desired_frame_count = self._speaker_sample_rate * seconds
|
||||
while not self._stop_threads.is_set():
|
||||
buffer = self.read_audio_frames(desired_frame_count)
|
||||
if len(buffer) > 0:
|
||||
frame = AudioQueueFrame(buffer)
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
self.receive_queue.put(frame), self._loop
|
||||
)
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
self.receive_queue.put(EndStreamQueueFrame()), self._loop
|
||||
)
|
||||
|
||||
def _set_image(self, image: bytes):
|
||||
self._images = itertools.cycle([image])
|
||||
|
||||
def _set_images(self, images: list[bytes], start_frame=0):
|
||||
self._images = itertools.cycle(images)
|
||||
|
||||
def _run_camera(self):
|
||||
try:
|
||||
while not self._stop_threads.is_set():
|
||||
if self._images:
|
||||
this_frame = next(self._images)
|
||||
self.write_frame_to_camera(this_frame)
|
||||
|
||||
time.sleep(1.0 / self._fps)
|
||||
except Exception as e:
|
||||
self._logger.error(f"Exception {e} in camera thread.")
|
||||
raise e
|
||||
|
||||
    def _frame_consumer(self):
        self._logger.info("🎬 Starting frame consumer thread")
        b = bytearray()
        smallest_write_size = 3200
        largest_write_size = 8000
        all_audio_frames = bytearray()
        while True:
            try:
                frames_or_frame: QueueFrame | list[QueueFrame] = (
                    self._threadsafe_send_queue.get()
                )
                if isinstance(frames_or_frame, AudioQueueFrame) and len(frames_or_frame.data) > largest_write_size:
                    # subdivide large audio frames to enable interruption
                    frames = []
                    for i in range(0, len(frames_or_frame.data), largest_write_size):
                        frames.append(AudioQueueFrame(frames_or_frame.data[i : i+largest_write_size]))
                elif isinstance(frames_or_frame, QueueFrame):
                    frames: list[QueueFrame] = [frames_or_frame]
                elif isinstance(frames_or_frame, list):
                    frames: list[QueueFrame] = frames_or_frame
                else:
                    raise Exception("Unknown type in output queue")

                for frame in frames:
                    if isinstance(frame, EndStreamQueueFrame):
                        self._logger.info("Stopping frame consumer thread")
                        self._threadsafe_send_queue.task_done()
                        return

                    # if interrupted, we just pull frames off the queue and discard them
                    if not self._is_interrupted.is_set():
                        if frame:
                            if isinstance(frame, AudioQueueFrame):
                                chunk = frame.data

                                all_audio_frames.extend(chunk)

                                b.extend(chunk)
                                truncated_length: int = len(b) - (
                                    len(b) % smallest_write_size
                                )
                                if truncated_length:
                                    self.write_frame_to_mic(
                                        bytes(b[:truncated_length]))
                                    b = b[truncated_length:]
                            elif isinstance(frame, ImageQueueFrame):
                                self._set_image(frame.image)
                            elif isinstance(frame, SpriteQueueFrame):
                                self._set_images(frame.images)
                            elif isinstance(frame, ChatMessageQueueFrame):
                                self._send_chat_message(frame)
                        elif len(b):
                            self.write_frame_to_mic(bytes(b))
                            b = bytearray()
                    else:
                        # if there are leftover audio bytes, write them now; failing to do so
                        # can cause static in the audio stream.
                        if len(b):
                            truncated_length = len(b) - (len(b) % 160)
                            self.write_frame_to_mic(
                                bytes(b[:truncated_length]))
                            b = bytearray()

                        if isinstance(frame, StartStreamQueueFrame):
                            self._is_interrupted.clear()

                self._threadsafe_send_queue.task_done()
            except queue.Empty:
                if len(b):
                    self.write_frame_to_mic(bytes(b))

                b = bytearray()
            except Exception as e:
                print(
                    f"Exception in frame_consumer: {e}, {len(b)}")
                raise e
|
||||
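Review note: the audio path in `_frame_consumer` does two separate things. It splits oversized audio frames into pieces of at most `largest_write_size` bytes so an interruption can take effect between pieces, and it only writes whole multiples of `smallest_write_size` to the mic, carrying the remainder over to the next chunk. The sketch below pulls both into small helpers. `split_frame` and `ChunkAligner` are hypothetical names that do not exist in this diff, and the defaults simply reuse the constants from the method.

```python
def split_frame(data: bytes, largest: int = 8000) -> list[bytes]:
    """Cut one large payload into pieces of at most `largest` bytes."""
    return [data[i:i + largest] for i in range(0, len(data), largest)]


class ChunkAligner:
    """Hypothetical helper: buffers bytes and releases whole multiples of `size`."""

    def __init__(self, size: int = 3200):
        self.size = size
        self.pending = bytearray()

    def push(self, chunk: bytes) -> bytes:
        """Add a chunk; return the aligned prefix ready to write (may be empty)."""
        self.pending.extend(chunk)
        ready = len(self.pending) - (len(self.pending) % self.size)
        out = bytes(self.pending[:ready])
        del self.pending[:ready]
        return out

    def flush(self) -> bytes:
        """Return whatever is left, for example when a non-audio frame arrives."""
        out = bytes(self.pending)
        self.pending.clear()
        return out
```

A caller would write the result of `push` only when it is non-empty, and call `flush` at the points where the method currently writes `bytes(b)` and resets the buffer.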
@@ -1,15 +1,17 @@
|
||||
import asyncio
|
||||
import inspect
|
||||
import logging
|
||||
import time
|
||||
import signal
|
||||
import threading
|
||||
import types
|
||||
|
||||
from functools import partial
|
||||
from queue import Queue, Empty
|
||||
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.queue_frame import (
|
||||
TranscriptionQueueFrame,
|
||||
)
|
||||
|
||||
from threading import Thread, Event, Timer
|
||||
from threading import Event
|
||||
|
||||
from daily import (
|
||||
EventHandler,
|
||||
@@ -20,43 +22,43 @@ from daily import (
|
||||
VirtualSpeakerDevice,
|
||||
)
|
||||
|
||||
class DailyTransportService(EventHandler):
|
||||
from dailyai.services.base_transport_service import BaseTransportService
|
||||
|
||||
|
||||
class DailyTransportService(BaseTransportService, EventHandler):
|
||||
_daily_initialized = False
|
||||
_lock = threading.Lock()
|
||||
|
||||
_speaker_enabled: bool
|
||||
_speaker_sample_rate: int
|
||||
_vad_enabled: bool
|
||||
|
||||
# This is necessary to override EventHandler's __new__ method.
|
||||
def __new__(cls, *args, **kwargs):
|
||||
return super().__new__(cls)
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
room_url: str,
|
||||
token: str | None,
|
||||
bot_name: str,
|
||||
duration: float = 10,
|
||||
min_others_count: int = 1,
|
||||
start_transcription: bool = False,
|
||||
**kwargs,
|
||||
):
|
||||
super().__init__()
|
||||
self.bot_name: str = bot_name
|
||||
self.room_url: str = room_url
|
||||
self.token: str | None = token
|
||||
self.duration: float = duration
|
||||
self.expiration = time.time() + duration * 60
|
||||
super().__init__(**kwargs) # This will call BaseTransportService.__init__ method, not EventHandler
|
||||
|
||||
# This queue is used to marshal frames from the async send queue to the thread that emits audio & video.
|
||||
# We need this to maintain the asynchronous behavior of asyncio queues -- to give async functions
|
||||
# a chance to run while waiting for queue items -- but also to maintain thread safety and have a threaded
|
||||
# handler to send frames, to ensure that sending isn't subject to pauses in the async thread.
|
||||
self.threadsafe_send_queue = Queue()
|
||||
self._room_url: str = room_url
|
||||
self._bot_name: str = bot_name
|
||||
self._token: str | None = token
|
||||
self._min_others_count = min_others_count
|
||||
self._start_transcription = start_transcription
|
||||
|
||||
self.is_interrupted = Event()
|
||||
self.stop_threads = Event()
|
||||
self.story_started = False
|
||||
self.mic_enabled = False
|
||||
self.mic_sample_rate = 16000
|
||||
self.camera_width = 1024
|
||||
self.camera_height = 768
|
||||
self.camera_enabled = False
|
||||
self._is_interrupted = Event()
|
||||
self._stop_threads = Event()
|
||||
|
||||
self.send_queue = asyncio.Queue()
|
||||
self.receive_queue = asyncio.Queue()
|
||||
|
||||
self.other_participant_has_joined = False
|
||||
|
||||
self.camera_thread = None
|
||||
self.frame_consumer_thread = None
|
||||
self._other_participant_has_joined = False
|
||||
self._my_participant_id = None
|
||||
|
||||
self.transcription_settings = {
|
||||
"language": "en",
|
||||
@@ -70,41 +72,44 @@ class DailyTransportService(EventHandler):
|
||||
},
|
||||
}
|
||||
|
||||
self.logger: logging.Logger = logging.getLogger("dailyai")
|
||||
self._logger: logging.Logger = logging.getLogger("dailyai")
|
||||
|
||||
self.event_handlers = {}
|
||||
self._event_handlers = {}
|
||||
|
||||
def _patch_method(self, event_name, *args, **kwargs):
|
||||
try:
|
||||
self.loop = asyncio.get_running_loop()
|
||||
except RuntimeError:
|
||||
self.loop = None
|
||||
|
||||
def patch_method(self, event_name, *args, **kwargs):
|
||||
try:
|
||||
for handler in self.event_handlers[event_name]:
|
||||
for handler in self._event_handlers[event_name]:
|
||||
if inspect.iscoroutinefunction(handler):
|
||||
if self.loop:
|
||||
asyncio.run_coroutine_threadsafe(handler(*args, **kwargs), self.loop)
|
||||
if self._loop:
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
handler(*args, **kwargs), self._loop)
|
||||
else:
|
||||
raise Exception("No event loop to run coroutine. In order to use async event handlers, you must run the DailyTransportService in an asyncio event loop.")
|
||||
raise Exception(
|
||||
"No event loop to run coroutine. In order to use async event handlers, you must run the DailyTransportService in an asyncio event loop.")
|
||||
else:
|
||||
handler(*args, **kwargs)
|
||||
except Exception as e:
|
||||
self.logger.error(f"Exception in event handler {event_name}: {e}")
|
||||
self._logger.error(f"Exception in event handler {event_name}: {e}")
|
||||
raise e
|
||||
|
||||
def add_event_handler(self, event_name: str, handler):
|
||||
if not event_name.startswith("on_"):
|
||||
raise Exception(f"Event handler {event_name} must start with 'on_'")
|
||||
raise Exception(
|
||||
f"Event handler {event_name} must start with 'on_'")
|
||||
|
||||
methods = inspect.getmembers(self, predicate=inspect.ismethod)
|
||||
if event_name not in [method[0] for method in methods]:
|
||||
raise Exception(f"Event handler {event_name} not found")
|
||||
|
||||
if not event_name in self.event_handlers:
|
||||
self.event_handlers[event_name] = [getattr(self, event_name), types.MethodType(handler, self)]
|
||||
setattr(self, event_name, partial(self.patch_method, event_name))
|
||||
if event_name not in self._event_handlers:
|
||||
self._event_handlers[event_name] = [
|
||||
getattr(
|
||||
self, event_name), types.MethodType(
|
||||
handler, self)]
|
||||
setattr(self, event_name, partial(self._patch_method, event_name))
|
||||
else:
|
||||
self.event_handlers[event_name].append(types.MethodType(handler, self))
|
||||
self._event_handlers[event_name].append(
|
||||
types.MethodType(handler, self))
|
||||
|
||||
def event_handler(self, event_name: str):
|
||||
def decorator(handler):
|
||||
@@ -113,167 +118,153 @@ class DailyTransportService(EventHandler):
|
||||
|
||||
return decorator
|
||||
|
||||
def configure_daily(self):
|
||||
Daily.init()
|
||||
def write_frame_to_camera(self, frame: bytes):
|
||||
self.camera.write_frame(frame)
|
||||
|
||||
def write_frame_to_mic(self, frame: bytes):
|
||||
self.mic.write_frames(frame)
|
||||
|
||||
def read_audio_frames(self, desired_frame_count):
|
||||
bytes = self._speaker.read_frames(desired_frame_count)
|
||||
return bytes
|
||||
|
||||
def _prerun(self):
|
||||
# Only initialize Daily once
|
||||
if not DailyTransportService._daily_initialized:
|
||||
with DailyTransportService._lock:
|
||||
Daily.init()
|
||||
DailyTransportService._daily_initialized = True
|
||||
self.client = CallClient(event_handler=self)
|
||||
|
||||
if self.mic_enabled:
|
||||
if self._mic_enabled:
|
||||
self.mic: VirtualMicrophoneDevice = Daily.create_microphone_device(
|
||||
"mic", sample_rate=self.mic_sample_rate, channels=1
|
||||
"mic", sample_rate=self._mic_sample_rate, channels=1
|
||||
)
|
||||
|
||||
if self.camera_enabled:
|
||||
if self._camera_enabled:
|
||||
self.camera: VirtualCameraDevice = Daily.create_camera_device(
|
||||
"camera", width=self.camera_width, height=self.camera_height, color_format="RGB"
|
||||
"camera", width=self._camera_width, height=self._camera_height, color_format="RGB"
|
||||
)
|
||||
|
||||
self.speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
|
||||
"speaker", sample_rate=16000, channels=1
|
||||
)
|
||||
if self._speaker_enabled or self._vad_enabled:
|
||||
self._speaker: VirtualSpeakerDevice = Daily.create_speaker_device(
|
||||
"speaker", sample_rate=self._speaker_sample_rate, channels=1
|
||||
)
|
||||
Daily.select_speaker_device("speaker")
|
||||
|
||||
self.image: bytes | None = None
|
||||
self.camera_thread = Thread(target=self.run_camera, daemon=True)
|
||||
self.camera_thread.start()
|
||||
self.client.set_user_name(self._bot_name)
|
||||
|
||||
self.logger.info("Starting frame consumer thread")
|
||||
self.frame_consumer_thread = Thread(target=self.frame_consumer, daemon=True)
|
||||
self.frame_consumer_thread.start()
|
||||
|
||||
Daily.select_speaker_device("speaker")
|
||||
|
||||
self.client.set_user_name(self.bot_name)
|
||||
self.client.join(self.room_url, self.token, completion=self.call_joined)
|
||||
|
||||
self.client.update_inputs(
|
||||
{
|
||||
"camera": {
|
||||
"isEnabled": True,
|
||||
"settings": {
|
||||
"deviceId": "camera",
|
||||
self.client.join(
|
||||
self._room_url,
|
||||
self._token,
|
||||
completion=self.call_joined,
|
||||
client_settings={
|
||||
"inputs": {
|
||||
"camera": {
|
||||
"isEnabled": True,
|
||||
"settings": {
|
||||
"deviceId": "camera",
|
||||
},
|
||||
},
|
||||
},
|
||||
"microphone": {
|
||||
"isEnabled": True,
|
||||
"settings": {
|
||||
"deviceId": "mic",
|
||||
"customConstraints": {
|
||||
"autoGainControl": {"exact": False},
|
||||
"echoCancellation": {"exact": False},
|
||||
"noiseSuppression": {"exact": False},
|
||||
"microphone": {
|
||||
"isEnabled": True,
|
||||
"settings": {
|
||||
"deviceId": "mic",
|
||||
"customConstraints": {
|
||||
"autoGainControl": {"exact": False},
|
||||
"echoCancellation": {"exact": False},
|
||||
"noiseSuppression": {"exact": False},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
self.client.update_publishing(
|
||||
{
|
||||
"camera": {
|
||||
"sendSettings": {
|
||||
"maxQuality": "low",
|
||||
"encodings": {
|
||||
"low": {
|
||||
"maxBitrate": 250000,
|
||||
"scaleResolutionDownBy": 1.333,
|
||||
"maxFramerate": 8,
|
||||
}
|
||||
},
|
||||
"publishing": {
|
||||
"camera": {
|
||||
"sendSettings": {
|
||||
"maxQuality": "low",
|
||||
"encodings": {
|
||||
"low": {
|
||||
"maxBitrate": 250000,
|
||||
"scaleResolutionDownBy": 1.333,
|
||||
"maxFramerate": 8,
|
||||
}
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
},
|
||||
)
|
||||
self._my_participant_id = self.client.participants()["local"]["id"]
|
||||
|
||||
if self.token:
|
||||
self.client.update_subscription_profiles({
|
||||
"base": {
|
||||
"camera": "unsubscribed",
|
||||
}
|
||||
})
|
||||
|
||||
if self._token and self._start_transcription:
|
||||
self.client.start_transcription(self.transcription_settings)
|
||||
|
||||
self.my_participant_id = self.client.participants()["local"]["id"]
|
||||
self.original_sigint_handler = signal.getsignal(signal.SIGINT)
|
||||
signal.signal(signal.SIGINT, self.process_interrupt_handler)
|
||||
|
||||
async def get_receive_frames(self):
|
||||
while True:
|
||||
frame = await self.receive_queue.get()
|
||||
yield frame
|
||||
if frame.frame_type == FrameType.END_STREAM:
|
||||
break
|
||||
def process_interrupt_handler(self, signum, frame):
|
||||
self._post_run()
|
||||
if callable(self.original_sigint_handler):
|
||||
self.original_sigint_handler(signum, frame)
|
||||
|
||||
def get_async_send_queue(self):
|
||||
return self.send_queue
|
||||
|
||||
async def marshal_frames(self):
|
||||
while True:
|
||||
frame: QueueFrame | list = await self.send_queue.get()
|
||||
self.threadsafe_send_queue.put(frame)
|
||||
self.send_queue.task_done()
|
||||
if type(frame) == QueueFrame and frame.frame_type == FrameType.END_STREAM:
|
||||
break
|
||||
|
||||
async def wait_for_send_queue_to_empty(self):
|
||||
await self.send_queue.join()
|
||||
self.threadsafe_send_queue.join()
|
||||
|
||||
async def stop_when_done(self):
|
||||
await self.wait_for_send_queue_to_empty()
|
||||
self.stop()
|
||||
|
||||
async def run(self) -> None:
|
||||
self.configure_daily()
|
||||
|
||||
self.participant_left = False
|
||||
|
||||
async_output_queue_marshal_task = asyncio.create_task(self.marshal_frames())
|
||||
|
||||
try:
|
||||
participant_count: int = len(self.client.participants())
|
||||
self.logger.info(f"{participant_count} participants in room")
|
||||
while time.time() < self.expiration and not self.participant_left and not self.stop_threads.is_set():
|
||||
await asyncio.sleep(1)
|
||||
except Exception as e:
|
||||
self.logger.error(f"Exception {e}")
|
||||
finally:
|
||||
self.client.leave()
|
||||
|
||||
self.stop_threads.set()
|
||||
|
||||
await self.receive_queue.put(QueueFrame(FrameType.END_STREAM, None))
|
||||
await self.send_queue.put(QueueFrame(FrameType.END_STREAM, None))
|
||||
await async_output_queue_marshal_task
|
||||
|
||||
if self.camera_thread and self.camera_thread.is_alive():
|
||||
self.camera_thread.join()
|
||||
if self.frame_consumer_thread and self.frame_consumer_thread.is_alive():
|
||||
self.frame_consumer_thread.join()
|
||||
|
||||
def stop(self):
|
||||
self.stop_threads.set()
|
||||
def _post_run(self):
|
||||
self.client.leave()
|
||||
|
||||
def on_first_other_participant_joined(self):
|
||||
pass
|
||||
|
||||
def call_joined(self, join_data, client_error):
|
||||
self.logger.info(f"Call_joined: {join_data}, {client_error}")
|
||||
self._logger.info(f"Call_joined: {join_data}, {client_error}")
|
||||
|
||||
def dialout(self, number):
|
||||
self.client.start_dialout({"phoneNumber": number})
|
||||
|
||||
def start_recording(self):
|
||||
self.client.start_recording()
|
||||
|
||||
def on_error(self, error):
|
||||
self.logger.error(f"on_error: {error}")
|
||||
self._logger.error(f"on_error: {error}")
|
||||
|
||||
def on_call_state_updated(self, state):
|
||||
pass
|
||||
|
||||
def on_participant_joined(self, participant):
|
||||
if not self.other_participant_has_joined and participant["id"] != self.my_participant_id:
|
||||
self.other_participant_has_joined = True
|
||||
if not self._other_participant_has_joined and participant["id"] != self._my_participant_id:
|
||||
self._other_participant_has_joined = True
|
||||
self.on_first_other_participant_joined()
|
||||
|
||||
"""
|
||||
def on_participant_left(self, participant, reason):
|
||||
if len(self.client.participants()) < 2:
|
||||
self.participant_left = True
|
||||
pass
|
||||
if len(self.client.participants()) < self._min_others_count + 1:
|
||||
self._stop_threads.set()
|
||||
"""
|
||||
|
||||
def on_app_message(self, message, sender):
|
||||
print(f"app message: {message}")
|
||||
if self._loop:
|
||||
frame = TranscriptionQueueFrame(
|
||||
message["message"], message["name"], message["date"])
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
self.receive_queue.put(frame), self._loop)
|
||||
pass
|
||||
|
||||
def on_transcription_message(self, message:dict):
|
||||
if self.loop:
|
||||
frame = QueueFrame(FrameType.TRANSCRIPTION, message)
|
||||
asyncio.run_coroutine_threadsafe(self.receive_queue.put(frame), self.loop)
|
||||
def on_transcription_message(self, message: dict):
|
||||
if self._loop:
|
||||
participantId = ""
|
||||
if "participantId" in message:
|
||||
participantId = message["participantId"]
|
||||
elif "session_id" in message:
|
||||
participantId = message["session_id"]
|
||||
frame = TranscriptionQueueFrame(
|
||||
message["text"], participantId, message["timestamp"])
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
self.receive_queue.put(frame), self._loop)
|
||||
|
||||
def on_transcription_stopped(self, stopped_by, stopped_by_error):
|
||||
pass
|
||||
@@ -284,74 +275,10 @@ class DailyTransportService(EventHandler):
|
||||
def on_transcription_started(self, status):
|
||||
pass
|
||||
|
||||
def set_image(self, image: bytes):
|
||||
self.image: bytes | None = image
|
||||
def _send_chat_message(self, frame):
|
||||
self.client.send_app_message(
|
||||
{'message': frame.message, 'event': 'chat-msg', 'name': self._bot_name, 'date': time.time(), 'room': 'main-room'})
|
||||
|
||||
def run_camera(self):
|
||||
try:
|
||||
while not self.stop_threads.is_set():
|
||||
if self.image:
|
||||
self.camera.write_frame(self.image)
|
||||
|
||||
time.sleep(1.0 / 8) # 8 fps
|
||||
except Exception as e:
|
||||
self.logger.error(f"Exception {e} in camera thread.")
|
||||
|
||||
def frame_consumer(self):
|
||||
self.logger.info("🎬 Starting frame consumer thread")
|
||||
b = bytearray()
|
||||
smallest_write_size = 3200
|
||||
all_audio_frames = bytearray()
|
||||
while True:
|
||||
try:
|
||||
frames_or_frame: QueueFrame | list[QueueFrame] = self.threadsafe_send_queue.get()
|
||||
if type(frames_or_frame) == QueueFrame:
|
||||
frames: list[QueueFrame] = [frames_or_frame]
|
||||
elif type(frames_or_frame) == list:
|
||||
frames: list[QueueFrame] = frames_or_frame
|
||||
else:
|
||||
raise Exception("Unknown type in output queue")
|
||||
|
||||
for frame in frames:
|
||||
if frame.frame_type == FrameType.END_STREAM:
|
||||
self.logger.info("Stopping frame consumer thread")
|
||||
self.threadsafe_send_queue.task_done()
|
||||
return
|
||||
|
||||
# if interrupted, we just pull frames off the queue and discard them
|
||||
if not self.is_interrupted.is_set():
|
||||
if frame:
|
||||
if frame.frame_type == FrameType.AUDIO:
|
||||
chunk = frame.frame_data
|
||||
|
||||
all_audio_frames.extend(chunk)
|
||||
|
||||
b.extend(chunk)
|
||||
l = len(b) - (len(b) % smallest_write_size)
|
||||
if l:
|
||||
self.mic.write_frames(bytes(b[:l]))
|
||||
b = b[l:]
|
||||
elif frame.frame_type == FrameType.IMAGE:
|
||||
self.set_image(frame.frame_data)
|
||||
elif len(b):
|
||||
self.mic.write_frames(bytes(b))
|
||||
b = bytearray()
|
||||
else:
|
||||
if self.interrupt_time:
|
||||
self.logger.info(
|
||||
f"Lag to stop stream after interruption {time.perf_counter() - self.interrupt_time}"
|
||||
)
|
||||
self.interrupt_time = None
|
||||
|
||||
if frame.frame_type == FrameType.START_STREAM:
|
||||
self.is_interrupted.clear()
|
||||
|
||||
self.threadsafe_send_queue.task_done()
|
||||
except Empty:
|
||||
try:
|
||||
if len(b):
|
||||
self.mic.write_frames(bytes(b))
|
||||
except Exception as e:
|
||||
self.logger.error(f"Exception in frame_consumer: {e}, {len(b)}")
|
||||
|
||||
b = bytearray()
|
||||
def stop(self):
|
||||
super().stop()
|
||||
self.client.leave()
|
||||
|
||||
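Review note: `_prerun` in this diff guards `Daily.init()` with a class-level flag and lock, but the flag is tested only before the lock is taken. If two threads reach that check at the same time, both can enter the locked section in turn and both call the initializer. Whether concurrent `_prerun` calls actually happen in this codebase is not established by the diff, so treat the following as a defensive sketch. `OneTimeInit` and `init_fn` are hypothetical names; the pattern shown is ordinary double-checked locking.

```python
import threading


class OneTimeInit:
    """Hypothetical helper: run an initializer at most once across threads."""

    def __init__(self, init_fn):
        self._init_fn = init_fn
        self._done = False
        self._lock = threading.Lock()

    def ensure(self) -> None:
        if self._done:              # fast path, no lock needed
            return
        with self._lock:
            if not self._done:      # re-check: another thread may have won the race
                self._init_fn()
                self._done = True
```

In the transport class the same effect could be had by repeating the `_daily_initialized` test inside the `with` block.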
36
src/dailyai/services/deepgram_ai_service.py
Normal file
@@ -0,0 +1,36 @@
import os
import aiohttp
import requests

from dailyai.services.ai_services import TTSService


class DeepgramAIService(TTSService):
    def __init__(
        self,
        *,
        aiohttp_session: aiohttp.ClientSession,
        api_key,
        voice,
        sample_rate=16000
    ):
        super().__init__()

        self._api_key = api_key
        self._voice = voice
        self._sample_rate = sample_rate
        self._aiohttp_session = aiohttp_session

    async def run_tts(self, sentence):
        self.logger.info(f"Running deepgram tts for {sentence}")
        base_url = "https://api.beta.deepgram.com/v1/speak"
        request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate={self._sample_rate}"
        headers = {"authorization": f"token {self._api_key}", "Content-Type": "application/json"}
        data = {"text": sentence}

        async with self._aiohttp_session.post(
            request_url, headers=headers, json=data
        ) as r:
            async for chunk in r.content:
                if chunk:
                    yield chunk
@@ -7,23 +7,24 @@ import requests
|
||||
from collections.abc import AsyncGenerator
|
||||
from dailyai.services.ai_services import TTSService
|
||||
|
||||
|
||||
class DeepgramTTSService(TTSService):
|
||||
def __init__(self, speech_key=None, voice=None):
|
||||
def __init__(self, *, aiohttp_session, api_key, voice="alpha-asteria-en-v2"):
|
||||
super().__init__()
|
||||
|
||||
self.voice = voice or os.getenv("DEEPGRAM_VOICE") or "alpha-asteria-en-v2"
|
||||
self.speech_key = speech_key or os.getenv("DEEPGRAM_API_KEY")
|
||||
|
||||
self._voice = voice
|
||||
self._api_key = api_key
|
||||
self._aiohttp_session = aiohttp_session
|
||||
|
||||
def get_mic_sample_rate(self):
|
||||
return 24000
|
||||
|
||||
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
|
||||
self.logger.info(f"Running deepgram tts for {sentence}")
|
||||
base_url = "https://api.beta.deepgram.com/v1/speak"
|
||||
request_url = f"{base_url}?model={self.voice}&encoding=linear16&container=none&sample_rate=16000"
|
||||
headers = {"authorization": f"token {self.speech_key}"}
|
||||
body = { "text": sentence }
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.post(request_url, headers=headers, json=body) as r:
|
||||
async for data in r.content:
|
||||
yield data
|
||||
request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate=16000"
|
||||
headers = {"authorization": f"token {self._api_key}"}
|
||||
body = {"text": sentence}
|
||||
async with self._aiohttp_session.post(request_url, headers=headers, json=body) as r:
|
||||
async for data in r.content:
|
||||
yield data
|
||||
|
||||
@@ -9,28 +9,38 @@ from dailyai.services.ai_services import TTSService
|
||||
|
||||
|
||||
class ElevenLabsTTSService(TTSService):
|
||||
def __init__(self, api_key=None, voice_id=None):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
aiohttp_session: aiohttp.ClientSession,
|
||||
api_key,
|
||||
voice_id,
|
||||
):
|
||||
super().__init__()
|
||||
|
||||
self.api_key = api_key or os.getenv("ELEVENLABS_API_KEY")
|
||||
self.voice_id = voice_id or os.getenv("ELEVENLABS_VOICE_ID")
|
||||
self._api_key = api_key
|
||||
self._voice_id = voice_id
|
||||
self._aiohttp_session = aiohttp_session
|
||||
|
||||
async def run_tts(self, sentence) -> AsyncGenerator[bytes, None]:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
url = f"https://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}/stream"
|
||||
payload = {"text": sentence, "model_id": "eleven_turbo_v2"}
|
||||
querystring = {"output_format": "pcm_16000", "optimize_streaming_latency": 2}
|
||||
headers = {
|
||||
"xi-api-key": self.api_key,
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
async with session.post(url, json=payload, headers=headers, params=querystring) as r:
|
||||
if r.status != 200:
|
||||
self.logger.error(
|
||||
f"audio fetch status code: {r.status}, error: {r.text}"
|
||||
)
|
||||
return
|
||||
url = f"https://api.elevenlabs.io/v1/text-to-speech/{self._voice_id}/stream"
|
||||
payload = {"text": sentence, "model_id": "eleven_turbo_v2"}
|
||||
querystring = {"output_format": "pcm_16000",
|
||||
"optimize_streaming_latency": 2}
|
||||
headers = {
|
||||
"xi-api-key": self._api_key,
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
async with self._aiohttp_session.post(
|
||||
url, json=payload, headers=headers, params=querystring
|
||||
) as r:
|
||||
if r.status != 200:
|
||||
self.logger.error(
|
||||
f"audio fetch status code: {r.status}, error: {await r.text()}"
|
||||
)
|
||||
return
|
||||
|
||||
async for chunk in r.content:
|
||||
if chunk:
|
||||
yield chunk
|
||||
async for chunk in r.content:
|
||||
if chunk:
|
||||
yield chunk
|
||||
|
||||
@@ -2,30 +2,42 @@ import fal
|
||||
import aiohttp
|
||||
import asyncio
|
||||
import io
|
||||
import json
|
||||
import os
|
||||
from PIL import Image
|
||||
|
||||
from dailyai.services.ai_services import ImageGenService
|
||||
|
||||
from dailyai.services.ai_services import LLMService, TTSService, ImageGenService
|
||||
|
||||
from dailyai.services.ai_services import ImageGenService
|
||||
# Fal expects FAL_KEY_ID and FAL_KEY_SECRET to be set in the env
|
||||
|
||||
|
||||
class FalImageGenService(ImageGenService):
|
||||
def __init__(self, image_size):
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
image_size,
|
||||
aiohttp_session: aiohttp.ClientSession,
|
||||
key_id=None,
|
||||
key_secret=None):
|
||||
super().__init__(image_size)
|
||||
self._aiohttp_session = aiohttp_session
|
||||
if key_id:
|
||||
os.environ["FAL_KEY_ID"] = key_id
|
||||
if key_secret:
|
||||
os.environ["FAL_KEY_SECRET"] = key_secret
|
||||
|
||||
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
|
||||
def get_image_url(sentence, size):
|
||||
print("starting fal submit...")
|
||||
handler = fal.apps.submit(
|
||||
"110602490-fast-sdxl",
|
||||
arguments={
|
||||
"prompt": sentence
|
||||
"prompt": sentence
|
||||
},
|
||||
)
|
||||
print("past fal handler init, about to wait for iter_events...")
|
||||
)
|
||||
for event in handler.iter_events():
|
||||
if isinstance(event, fal.apps.InProgress):
|
||||
print('Request in progress')
|
||||
print(event.logs)
|
||||
pass
|
||||
|
||||
result = handler.get()
|
||||
|
||||
@@ -34,16 +46,9 @@ class FalImageGenService(ImageGenService):
|
||||
raise Exception("Image generation failed")
|
||||
|
||||
return image_url
|
||||
print(f"fetching image url...")
|
||||
image_url = await asyncio.to_thread(get_image_url, sentence, self.image_size)
|
||||
print(f"got image url, downloading image...")
|
||||
# Load the image from the url
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(image_url) as response:
|
||||
print("got image response")
|
||||
image_stream = io.BytesIO(await response.content.read())
|
||||
print("read image stream")
|
||||
image = Image.open(image_stream)
|
||||
return (image_url, image.tobytes())
|
||||
|
||||
# return (image_url, dalle_im.tobytes())
|
||||
async with self._aiohttp_session.get(image_url) as response:
|
||||
image_stream = io.BytesIO(await response.content.read())
|
||||
image = Image.open(image_stream)
|
||||
return (image_url, image.tobytes())
|
||||
|
||||
73
src/dailyai/services/local_stt_service.py
Normal file
@@ -0,0 +1,73 @@
import array
import io
import math
import time
from typing import AsyncGenerator
import wave
from dailyai.queue_frame import AudioQueueFrame, QueueFrame, TranscriptionQueueFrame
from dailyai.services.ai_services import STTService


class LocalSTTService(STTService):
    _content: io.BufferedRandom
    _wave: wave.Wave_write
    _current_silence_frames: int

    # Configuration
    _min_rms: int
    _max_silence_frames: int
    _frame_rate: int

    def __init__(self,
                 min_rms: int = 400,
                 max_silence_frames: int = 3,
                 frame_rate: int = 16000,
                 **kwargs):
        super().__init__(frame_rate, **kwargs)
        self._current_silence_frames = 0
        self._min_rms = min_rms
        self._max_silence_frames = max_silence_frames
        self._frame_rate = frame_rate
        self._new_wave()

    def _new_wave(self):
        """Creates a new wave object and content buffer."""
        self._content = io.BufferedRandom(io.BytesIO())
        ww = wave.open(self._content, "wb")
        ww.setnchannels(1)
        ww.setsampwidth(2)
        ww.setframerate(self._frame_rate)
        self._wave = ww

    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        """Processes a frame of audio data, either buffering or transcribing it."""
        if not isinstance(frame, AudioQueueFrame):
            return

        data = frame.data
        # Try to filter out empty background noise
        # (Very rudimentary approach, can be improved)
        rms = self._get_volume(data)
        if rms >= self._min_rms:
            # If volume is high enough, write new data to wave file
            self._wave.writeframesraw(data)

        # If buffer is not empty and we detect a 3-frame pause in speech,
        # transcribe the audio gathered so far.
        if self._content.tell() > 0 and self._current_silence_frames > self._max_silence_frames:
            self._current_silence_frames = 0
            self._wave.close()
            self._content.seek(0)
            text = await self.run_stt(self._content)
            self._new_wave()
            yield TranscriptionQueueFrame(text, '', str(time.time()))
        # If we get this far, this is a frame of silence
        self._current_silence_frames += 1

    def _get_volume(self, audio: bytes) -> float:
        # https://docs.python.org/3/library/array.html
        audio_array = array.array('h', audio)
        squares = [sample**2 for sample in audio_array]
        mean = sum(squares) / len(audio_array)
        rms = math.sqrt(mean)
        return rms
|
||||
76
src/dailyai/services/local_transport_service.py
Normal file
@@ -0,0 +1,76 @@
|
||||
import asyncio
|
||||
import time
|
||||
import numpy as np
|
||||
import tkinter as tk
|
||||
import pyaudio
|
||||
|
||||
from dailyai.services.base_transport_service import BaseTransportService
|
||||
|
||||
|
||||
class LocalTransportService(BaseTransportService):
|
||||
def __init__(self, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
self._sample_width = kwargs.get("sample_width") or 2
|
||||
self._n_channels = kwargs.get("n_channels") or 1
|
||||
self._tk_root = kwargs.get("tk_root") or None
|
||||
|
||||
if self._camera_enabled and not self._tk_root:
|
||||
raise ValueError("If camera is enabled, a tkinter root must be provided")
|
||||
|
||||
if self._speaker_enabled:
|
||||
self._speaker_buffer_pending = bytearray()
|
||||
|
||||
async def _write_frame_to_tkinter(self, frame: bytes):
|
||||
data = f"P6 {self._camera_width} {self._camera_height} 255 ".encode() + frame
|
||||
photo = tk.PhotoImage(
|
||||
width=self._camera_width,
|
||||
height=self._camera_height,
|
||||
data=data,
|
||||
format="PPM")
|
||||
self._image_label.config(image=photo)
|
||||
|
||||
# This holds a reference to the photo, preventing it from being garbage collected.
|
||||
self._image_label.image = photo # type: ignore
|
||||
|
||||
def write_frame_to_camera(self, frame: bytes):
|
||||
if self._camera_enabled and self._loop:
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
self._write_frame_to_tkinter(frame), self._loop
|
||||
)
|
||||
|
||||
def write_frame_to_mic(self, frame: bytes):
|
||||
self._audio_stream.write(frame)
|
||||
|
||||
def read_frames(self, desired_frame_count):
|
||||
bytes = self._speaker_stream.read(
|
||||
desired_frame_count,
|
||||
exception_on_overflow=False,
|
||||
)
|
||||
return bytes
|
||||
|
||||
def _prerun(self):
|
||||
if self._mic_enabled:
|
||||
self._pyaudio = pyaudio.PyAudio()
|
||||
self._audio_stream = self._pyaudio.open(
|
||||
format=self._pyaudio.get_format_from_width(self._sample_width),
|
||||
channels=self._n_channels,
|
||||
rate=self._speaker_sample_rate,
|
||||
output=True,
|
||||
)
|
||||
|
||||
if self._camera_enabled:
|
||||
# Start with a neutral gray background.
|
||||
array = np.ones((1024, 1024, 3)) * 128
|
||||
data = f"P6 {1024} {1024} 255 ".encode() + array.astype(np.uint8).tobytes()
|
||||
photo = tk.PhotoImage(width=1024, height=1024, data=data, format="PPM")
|
||||
self._image_label = tk.Label(self._tk_root, image=photo)
|
||||
self._image_label.pack()
|
||||
|
||||
if self._speaker_enabled:
|
||||
self._speaker_stream = self._pyaudio.open(
|
||||
format=self._pyaudio.get_format_from_width(self._sample_width),
|
||||
channels=self._n_channels,
|
||||
rate=self._speaker_sample_rate,
|
||||
frames_per_buffer=self._speaker_sample_rate,
|
||||
input=True
|
||||
)
|
||||
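The camera path hands tkinter raw pixels wrapped in a Netpbm header. As a rough sketch of that wrapping, assuming 8-bit RGB input: `P6` is the binary RGB variant (three bytes per pixel), while `P5` is the single-channel grayscale variant. The helper names `ppm_bytes` and `gray_frame` are hypothetical and not part of the transport.

```python
def ppm_bytes(width: int, height: int, rgb: bytes) -> bytes:
    """Wrap raw 8-bit RGB pixels in a binary PPM (P6) container."""
    expected = width * height * 3
    if len(rgb) != expected:
        raise ValueError(f"expected {expected} bytes of RGB data, got {len(rgb)}")
    # Header fields: magic number, width, height, maximum channel value.
    return f"P6 {width} {height} 255 ".encode("ascii") + rgb


def gray_frame(width: int, height: int, level: int = 128) -> bytes:
    """A solid frame with every channel set to `level` (neutral gray by default)."""
    return bytes([level]) * (width * height * 3)
```

Checking the payload length up front turns a mismatched frame size into a clear error instead of a garbled image.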
42
src/dailyai/services/ollama_ai_services.py
Normal file
@@ -0,0 +1,42 @@
|
||||
from openai import AsyncOpenAI
|
||||
|
||||
import json
|
||||
from collections.abc import AsyncGenerator
|
||||
|
||||
from dailyai.services.ai_services import LLMService
|
||||
|
||||
|
||||
class OLLamaLLMService(LLMService):
|
||||
def __init__(self, model="llama2", base_url='http://localhost:11434/v1'):
|
||||
super().__init__()
|
||||
self._model = model
|
||||
self._client = AsyncOpenAI(api_key="ollama", base_url=base_url)
|
||||
|
||||
async def get_response(self, messages, stream):
|
||||
return await self._client.chat.completions.create(
|
||||
stream=stream,
|
||||
messages=messages,
|
||||
model=self._model
|
||||
)
|
||||
|
||||
async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
|
||||
messages_for_log = json.dumps(messages)
|
||||
self.logger.debug(f"Generating chat via openai: {messages_for_log}")
|
||||
|
||||
chunks = await self._client.chat.completions.create(model=self._model, stream=True, messages=messages)
|
||||
async for chunk in chunks:
|
||||
if len(chunk.choices) == 0:
|
||||
continue
|
||||
|
||||
if chunk.choices[0].delta.content:
|
||||
yield chunk.choices[0].delta.content
|
||||
|
||||
async def run_llm(self, messages) -> str | None:
|
||||
messages_for_log = json.dumps(messages)
|
||||
self.logger.debug(f"Generating chat via openai: {messages_for_log}")
|
||||
|
||||
response = await self._client.chat.completions.create(model=self._model, stream=False, messages=messages)
|
||||
if response and len(response.choices) > 0:
|
||||
return response.choices[0].message.content
|
||||
else:
|
||||
return None
|
||||
@@ -1,67 +1,83 @@
|
||||
import requests
|
||||
import aiohttp
|
||||
import asyncio
|
||||
from PIL import Image
|
||||
import io
|
||||
from openai import AsyncOpenAI
|
||||
|
||||
import os
|
||||
import json
|
||||
from collections.abc import AsyncGenerator
|
||||
|
||||
from dailyai.services.ai_services import AIService, TTSService, LLMService, ImageGenService
|
||||
from dailyai.services.ai_services import LLMService, ImageGenService
|
||||
|
||||
|
||||
class OpenAILLMService(LLMService):
|
||||
def __init__(self, api_key=None, model=None):
|
||||
def __init__(self, *, api_key, model="gpt-4", tools=None):
|
||||
super().__init__()
|
||||
api_key = api_key or os.getenv("OPEN_AI_KEY")
|
||||
self.model = model or os.getenv("OPEN_AI_LLM_MODEL") or "gpt-4"
|
||||
self.client = AsyncOpenAI(api_key=api_key)
|
||||
self._model = model
|
||||
self._tools = tools
|
||||
self._client = AsyncOpenAI(api_key=api_key)
|
||||
|
||||
async def get_response(self, messages, stream):
|
||||
return await self.client.chat.completions.create(
|
||||
return await self._client.chat.completions.create(
|
||||
stream=stream,
|
||||
messages=messages,
|
||||
model=self.model
|
||||
model=self._model,
|
||||
tools=self._tools
|
||||
)
|
||||
|
||||
async def run_llm_async(self, messages) -> AsyncGenerator[str, None]:
|
||||
async def run_llm_async(self, messages, tool_choice=None) -> AsyncGenerator[str, None]:
|
||||
messages_for_log = json.dumps(messages)
|
||||
self.logger.debug(f"Generating chat via openai: {messages_for_log}")
|
||||
|
||||
response = await self.get_response(messages, stream=True)
|
||||
|
||||
for chunk in response:
|
||||
print("---")
|
||||
print(f"tools: {self._tools}")
|
||||
print("---")
|
||||
print(f"messages: {messages_for_log}")
|
||||
print("-----")
|
||||
if self._tools:
|
||||
tools = self._tools
|
||||
else:
|
||||
tools = None
|
||||
chunks = await self._client.chat.completions.create(model=self._model, stream=True, messages=messages, tools=tools, tool_choice=tool_choice)
|
||||
async for chunk in chunks:
|
||||
if len(chunk.choices) == 0:
|
||||
continue
|
||||
|
||||
if chunk.choices[0].delta.content:
|
||||
if chunk.choices[0].delta.tool_calls:
|
||||
yield chunk.choices[0].delta.tool_calls[0]
|
||||
elif chunk.choices[0].delta.content:
|
||||
yield chunk.choices[0].delta.content
|
||||
|
||||
async def run_llm(self, messages) -> str | None:
|
||||
messages_for_log = json.dumps(messages)
|
||||
self.logger.debug(f"Generating chat via openai: {messages_for_log}")
|
||||
|
||||
response = await self.get_response(messages, stream=False)
|
||||
response = await self._client.chat.completions.create(model=self._model, stream=False, messages=messages)
|
||||
if response and len(response.choices) > 0:
|
||||
return response.choices[0].message.content
|
||||
else:
|
||||
return None
|
||||
|
||||
|
||||
class OpenAIImageGenService(ImageGenService):
|
||||
def __init__(self, image_size:str, api_key=None, model=None):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
image_size: str,
|
||||
aiohttp_session: aiohttp.ClientSession,
|
||||
api_key,
|
||||
model="dall-e-3",
|
||||
):
|
||||
super().__init__(image_size=image_size)
|
||||
api_key = api_key or os.getenv("OPEN_AI_KEY")
|
||||
self.model = model or os.getenv("OPEN_AI_IMAGE_MODEL") or "dall-e-3"
|
||||
self.client = AsyncOpenAI(api_key=api_key)
|
||||
self._model = model
|
||||
print(f"api key: {api_key}")
|
||||
self._client = AsyncOpenAI(api_key=api_key)
|
||||
self._aiohttp_session = aiohttp_session
|
||||
|
||||
async def run_image_gen(self, sentence) -> tuple[str, bytes]:
|
||||
self.logger.info("Generating OpenAI image: %s", sentence)
|
||||
|
||||
image = await self.client.images.generate(
|
||||
image = await self._client.images.generate(
|
||||
prompt=sentence,
|
||||
model=self.model,
|
||||
model=self._model,
|
||||
n=1,
|
||||
size=self.image_size
|
||||
)
|
||||
@@ -70,10 +86,7 @@ class OpenAIImageGenService(ImageGenService):
|
||||
raise Exception("No image provided in response", image)
|
||||
|
||||
# Load the image from the url
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(image_url) as response:
|
||||
image_stream = io.BytesIO(await response.content.read())
|
||||
image = Image.open(image_stream)
|
||||
return (image_url, image.tobytes())
|
||||
|
||||
return (image_url, dalle_im.tobytes())
|
||||
async with self._aiohttp_session.get(image_url) as response:
|
||||
image_stream = io.BytesIO(await response.content.read())
|
||||
image = Image.open(image_stream)
|
||||
return (image_url, image.tobytes())
|
||||
|
||||
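The streaming branch in `run_llm_async` yields either a tool-call delta or a text delta per chunk. A minimal sketch of that dispatch, using `SimpleNamespace` stand-ins rather than real API objects: `split_stream` and the tagged-tuple output are hypothetical, and only the attribute shape (`choices[0].delta.content` / `.tool_calls`) is taken from the code above.

```python
import asyncio
from types import SimpleNamespace as NS


async def split_stream(chunks):
    """Yield ("text", str) or ("tool_call", delta) for each non-empty chunk."""
    async for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        delta = chunk.choices[0].delta
        if getattr(delta, "tool_calls", None):
            yield ("tool_call", delta.tool_calls[0])
        elif delta.content:
            yield ("text", delta.content)


async def fake_chunks():
    """Stand-in stream that mimics the attribute layout of streamed chunks."""
    yield NS(choices=[])
    yield NS(choices=[NS(delta=NS(content="Hi", tool_calls=None))])
    yield NS(choices=[NS(delta=NS(content=None, tool_calls=[NS(index=0)]))])


async def collect():
    return [item async for item in split_stream(fake_chunks())]
```

Tagging each item lets a consumer tell text from tool-call fragments without inspecting types, which the untagged generator above leaves to the caller.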
@@ -1,36 +1,40 @@
|
||||
import io
|
||||
import os
|
||||
import struct
|
||||
from pyht import Client
|
||||
from dotenv import load_dotenv
|
||||
from pyht.client import TTSOptions
|
||||
from pyht.protos.api_pb2 import Format
|
||||
|
||||
from services.ai_service import AIService
|
||||
from dailyai.services.ai_services import TTSService
|
||||
|
||||
class PlayHTAIService(AIService):
|
||||
def __init__(self, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
self.speech_key = os.getenv("PLAY_HT_KEY") or ''
|
||||
self.user_id = os.getenv("PLAY_HT_USER_ID") or ''
|
||||
class PlayHTAIService(TTSService):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
api_key,
|
||||
user_id,
|
||||
voice_url
|
||||
):
|
||||
super().__init__()
|
||||
|
||||
self.speech_key = api_key
|
||||
self.user_id = user_id
|
||||
|
||||
self.client = Client(
|
||||
user_id=self.user_id,
|
||||
api_key=self.speech_key,
|
||||
)
|
||||
self.options = TTSOptions(
|
||||
voice="s3://voice-cloning-zero-shot/820da3d2-3a3b-42e7-844d-e68db835a206/sarah/manifest.json",
|
||||
voice=voice_url,
|
||||
sample_rate=16000,
|
||||
quality="higher",
|
||||
format=Format.FORMAT_WAV
|
||||
)
|
||||
format=Format.FORMAT_WAV)
|
||||
|
||||
def close(self):
|
||||
super().close()
|
||||
def __del__(self):
|
||||
self.client.close()
|
||||
|
||||
def run_tts(self, sentence):
|
||||
async def run_tts(self, sentence):
|
||||
b = bytearray()
|
||||
in_header = True
|
||||
for chunk in self.client.tts(sentence, self.options):
|
||||
@@ -43,14 +47,15 @@ class PlayHTAIService(AIService):
|
||||
fh = io.BytesIO(b)
|
||||
fh.seek(36)
|
||||
(data, size) = struct.unpack('<4sI', fh.read(8))
|
||||
self.logger.info(f"first attempt: data: {data}, size: {hex(size)}, position: {fh.tell()}")
|
||||
self.logger.info(
|
||||
f"first attempt: data: {data}, size: {hex(size)}, position: {fh.tell()}")
|
||||
while data != b'data':
|
||||
fh.read(size)
|
||||
(data, size) = struct.unpack('<4sI', fh.read(8))
|
||||
self.logger.info(f"subsequent data: {data}, size: {hex(size)}, position: {fh.tell()}, data != data: {data != b'data'}")
|
||||
self.logger.info(
|
||||
f"subsequent data: {data}, size: {hex(size)}, position: {fh.tell()}, data != data: {data != b'data'}")
|
||||
self.logger.info("position: %s", fh.tell())
|
||||
in_header = False
|
||||
else:
|
||||
if len(chunk):
|
||||
yield chunk
|
||||
|
||||
@@ -4,6 +4,8 @@ from services.ai_service import AIService
|
||||
|
||||
# Note that Cloudflare's AI workers are still in beta.
|
||||
# https://developers.cloudflare.com/workers-ai/
|
||||
|
||||
|
||||
class CloudflareAIService(AIService):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
@@ -15,15 +17,16 @@ class CloudflareAIService(AIService):
|
||||
|
||||
# base endpoint, used by the others
|
||||
def run(self, model, input):
|
||||
response = requests.post(f"{self.api_base_url}{model}", headers=self.headers, json=input)
|
||||
response = requests.post(
|
||||
f"{self.api_base_url}{model}", headers=self.headers, json=input)
|
||||
return response.json()
|
||||
|
||||
# https://developers.cloudflare.com/workers-ai/models/llm/
|
||||
def run_llm(self, messages, latest_user_message=None, stream = True):
|
||||
def run_llm(self, messages, latest_user_message=None, stream=True):
|
||||
input = {
|
||||
"messages": [
|
||||
{ "role": "system", "content": "You are a friendly assistant" },
|
||||
{ "role": "user", "content": sentence }
|
||||
{"role": "system", "content": "You are a friendly assistant"},
|
||||
{"role": "user", "content": sentence}
|
||||
]
|
||||
}
|
||||
|
||||
@@ -57,9 +60,9 @@ class CloudflareAIService(AIService):
|
||||
# https://developers.cloudflare.com/workers-ai/models/embedding/
|
||||
def run_embeddings(self, texts, size="medium"):
|
||||
models = {
|
||||
"small": "@cf/baai/bge-small-en-v1.5", # 384 output dimensions
|
||||
"medium": "@cf/baai/bge-base-en-v1.5", # 768 output dimensions
|
||||
"large": "@cf/baai/bge-large-en-v1.5" #1024 output dimensions
|
||||
"small": "@cf/baai/bge-small-en-v1.5", # 384 output dimensions
|
||||
"medium": "@cf/baai/bge-base-en-v1.5", # 768 output dimensions
|
||||
"large": "@cf/baai/bge-large-en-v1.5" # 1024 output dimensions
|
||||
}
|
||||
|
||||
return self.run(models[size], {"text": texts})
|
||||
|
||||
@@ -1,28 +0,0 @@
|
||||
import os
|
||||
import requests
|
||||
|
||||
from services.ai_service import AIService
|
||||
from PIL import Image
|
||||
|
||||
|
||||
class DeepgramAIService(AIService):
|
||||
def __init__(self, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
self.api_key = os.getenv("DEEPGRAM_API_KEY")
|
||||
|
||||
def get_mic_sample_rate(self):
|
||||
return 24000
|
||||
|
||||
def run_tts(self, sentence):
|
||||
self.logger.info(f"Running deepgram tts for {sentence}")
|
||||
base_url = "https://api.beta.deepgram.com/v1/speak"
|
||||
voice = os.getenv("DEEPGRAM_VOICE") or "alpha-apollo-en-v1" # move this to an environment variable
|
||||
request_url = f"{base_url}?model={voice}&encoding=linear16&container=none"
|
||||
headers = {"authorization": f"token {self.api_key}"}
|
||||
|
||||
r = requests.post(request_url, headers=headers, data=sentence)
|
||||
self.logger.info(
|
||||
f"audio fetch status code: {r.status_code}, content length: {len(r.content)}"
|
||||
)
|
||||
yield r.content
|
||||
@@ -2,9 +2,12 @@ from services.ai_service import AIService
|
||||
import openai
|
||||
import os
|
||||
|
||||
# To use Google Cloud's AI products, you'll need to install Google Cloud CLI and enable the Text-to-Speech API in your project: https://cloud.google.com/sdk/docs/install
|
||||
# To use Google Cloud's AI products, you'll need to install Google Cloud
|
||||
# CLI and enable the Text-to-Speech API in your project:
|
||||
# https://cloud.google.com/sdk/docs/install
|
||||
from google.cloud import texttospeech
|
||||
|
||||
|
||||
class GoogleAIService(AIService):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
@@ -15,11 +18,14 @@ class GoogleAIService(AIService):
|
||||
)
|
||||
|
||||
self.audio_config = texttospeech.AudioConfig(
|
||||
audio_encoding = texttospeech.AudioEncoding.LINEAR16,
|
||||
sample_rate_hertz = 16000
|
||||
audio_encoding=texttospeech.AudioEncoding.LINEAR16,
|
||||
sample_rate_hertz=16000
|
||||
)
|
||||
|
||||
def run_tts(self, sentence):
|
||||
synthesis_input = texttospeech.SynthesisInput(text = sentence.strip())
|
||||
result = self.client.synthesize_speech(input=synthesis_input, voice=self.voice, audio_config=self.audio_config)
|
||||
synthesis_input = texttospeech.SynthesisInput(text=sentence.strip())
|
||||
result = self.client.synthesize_speech(
|
||||
input=synthesis_input,
|
||||
voice=self.voice,
|
||||
audio_config=self.audio_config)
|
||||
return result
|
||||
|
||||
@@ -1,7 +1,12 @@
|
||||
from services.ai_service import AIService
|
||||
from transformers import pipeline
|
||||
|
||||
# These functions are just intended for testing, not production use. If you'd like to use HuggingFace, you should use your own models, or do some research into the specific models that will work best for your use case.
|
||||
# These functions are just intended for testing, not production use. If
|
||||
# you'd like to use HuggingFace, you should use your own models, or do
|
||||
# some research into the specific models that will work best for your use
|
||||
# case.
|
||||
|
||||
|
||||
class HuggingFaceAIService(AIService):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
@@ -10,9 +15,12 @@ class HuggingFaceAIService(AIService):
|
||||
classifier = pipeline("sentiment-analysis")
|
||||
return classifier(sentence)
|
||||
|
||||
# available models at https://huggingface.co/Helsinki-NLP (**not all models use 2-character language codes**)
|
||||
# available models at https://huggingface.co/Helsinki-NLP (**not all
|
||||
# models use 2-character language codes**)
|
||||
def run_text_translation(self, sentence, source_language, target_language):
|
||||
translator = pipeline(f"translation", model=f"Helsinki-NLP/opus-mt-{source_language}-{target_language}")
|
||||
translator = pipeline(
|
||||
f"translation",
|
||||
model=f"Helsinki-NLP/opus-mt-{source_language}-{target_language}")
|
||||
|
||||
return translator(sentence)[0]["translation_text"]
|
||||
|
||||
|
||||
@@ -4,6 +4,7 @@ import time
|
||||
from PIL import Image
|
||||
from services.ai_service import AIService
|
||||
|
||||
|
||||
class MockAIService(AIService):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
@@ -20,8 +21,7 @@ class MockAIService(AIService):
|
||||
time.sleep(1)
|
||||
return (image_url, image)
|
||||
|
||||
def run_llm(self, messages, latest_user_message=None, stream = True):
|
||||
def run_llm(self, messages, latest_user_message=None, stream=True):
|
||||
for i in range(5):
|
||||
time.sleep(1)
|
||||
yield({"choices": [{"delta": {"content": f"hello {i}!"}}]})
|
||||
|
||||
yield ({"choices": [{"delta": {"content": f"hello {i}!"}}]})
|
||||
|
||||
55
src/dailyai/services/whisper_ai_services.py
Normal file
@@ -0,0 +1,55 @@
|
||||
"""This module implements Whisper transcription with a locally-downloaded model."""
|
||||
import asyncio
|
||||
from enum import Enum
|
||||
import logging
|
||||
from typing import BinaryIO
|
||||
from faster_whisper import WhisperModel
|
||||
from dailyai.services.local_stt_service import LocalSTTService
|
||||
|
||||
|
||||
class Model(Enum):
|
||||
"""Class of basic Whisper model selection options"""
|
||||
TINY = "tiny"
|
||||
BASE = "base"
|
||||
MEDIUM = "medium"
|
||||
LARGE = "large-v3"
|
||||
DISTIL_LARGE_V2 = "Systran/faster-distil-whisper-large-v2"
|
||||
DISTIL_MEDIUM_EN = "Systran/faster-distil-whisper-medium.en"
|
||||
|
||||
|
||||
class WhisperSTTService(LocalSTTService):
|
||||
"""Class to transcribe audio with a locally-downloaded Whisper model"""
|
||||
_model: WhisperModel
|
||||
|
||||
# Model configuration
|
||||
_model_name: Model
|
||||
_device: str
|
||||
_compute_type: str
|
||||
|
||||
def __init__(self, model_name: Model = Model.DISTIL_MEDIUM_EN,
|
||||
device: str = "auto",
|
||||
compute_type: str = "default"):
|
||||
|
||||
super().__init__()
|
||||
self.logger: logging.Logger = logging.getLogger("dailyai")
|
||||
self._model_name = model_name
|
||||
self._device = device
|
||||
self._compute_type = compute_type
|
||||
self._load()
|
||||
|
||||
def _load(self):
|
||||
"""Loads the Whisper model. Note that if this is the first time
|
||||
this model is being run, it will take time to download."""
|
||||
model = WhisperModel(
|
||||
self._model_name.value,
|
||||
device=self._device,
|
||||
compute_type=self._compute_type)
|
||||
self._model = model
|
||||
|
||||
async def run_stt(self, audio: BinaryIO) -> str:
|
||||
"""Transcribes given audio using Whisper"""
|
||||
segments, _ = await asyncio.to_thread(self._model.transcribe, audio)
|
||||
res: str = ""
|
||||
for segment in segments:
|
||||
res += f"{segment.text} "
|
||||
return res
|
||||
@@ -1,35 +1,31 @@
|
||||
from re import A
|
||||
import unittest
|
||||
|
||||
from typing import AsyncGenerator, Generator
|
||||
|
||||
from dailyai.services.ai_services import AIService, SentenceAggregator
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.ai_services import AIService
|
||||
from dailyai.queue_frame import EndStreamQueueFrame, QueueFrame, TextQueueFrame
|
||||
|
||||
|
||||
class SimpleAIService(AIService):
|
||||
def allowed_input_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.TEXT_CHUNK])
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
yield frame
|
||||
|
||||
def possible_output_frame_types(self) -> set[FrameType]:
|
||||
return set([FrameType.TEXT_CHUNK])
|
||||
|
||||
async def process_frame(self, requested_frame_types: set[FrameType], frame: QueueFrame) -> QueueFrame | None:
|
||||
return frame
|
||||
|
||||
class TestBaseAIService(unittest.IsolatedAsyncioTestCase):
|
||||
async def test_async_input(self):
|
||||
service = SimpleAIService()
|
||||
|
||||
input_frames = [
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "hello"),
|
||||
QueueFrame(FrameType.END_STREAM, None),
|
||||
TextQueueFrame("hello"),
|
||||
EndStreamQueueFrame()
|
||||
]
|
||||
|
||||
async def iterate_frames() -> AsyncGenerator[QueueFrame, None]:
|
||||
for frame in input_frames:
|
||||
yield frame
|
||||
|
||||
output_frames = []
|
||||
async for frame in service.run(set([FrameType.TEXT_CHUNK]), iterate_frames()):
|
||||
async for frame in service.run(iterate_frames()):
|
||||
output_frames.append(frame)
|
||||
|
||||
self.assertEqual(input_frames, output_frames)
|
||||
@@ -37,93 +33,18 @@ class TestBaseAIService(unittest.IsolatedAsyncioTestCase):
|
||||
async def test_nonasync_input(self):
|
||||
service = SimpleAIService()
|
||||
|
||||
input_frames = [
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "hello"),
|
||||
QueueFrame(FrameType.END_STREAM, None),
|
||||
]
|
||||
input_frames = [TextQueueFrame("hello"), EndStreamQueueFrame()]
|
||||
|
||||
def iterate_frames() -> Generator[QueueFrame, None, None]:
|
||||
for frame in input_frames:
|
||||
yield frame
|
||||
|
||||
output_frames = []
|
||||
async for frame in service.run(set([FrameType.TEXT_CHUNK]), iterate_frames()):
|
||||
async for frame in service.run(iterate_frames()):
|
||||
output_frames.append(frame)
|
||||
|
||||
self.assertEqual(input_frames, output_frames)
|
||||
|
||||
|
||||
class TestSentenceAggregator(unittest.IsolatedAsyncioTestCase):
|
||||
async def test_clause(self) -> None:
|
||||
input_frames = [
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "hello"),
|
||||
QueueFrame(FrameType.END_STREAM, None),
|
||||
]
|
||||
|
||||
service = SentenceAggregator()
|
||||
output_frames = []
|
||||
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
|
||||
output_frames.append(frame)
|
||||
|
||||
self.assertEqual(1, len(output_frames))
|
||||
self.assertEqual(QueueFrame(FrameType.SENTENCE, "hello"), output_frames[0])
|
||||
|
||||
async def test_sentence(self) -> None:
|
||||
input_frames = [
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "world."),
|
||||
QueueFrame(FrameType.END_STREAM, None),
|
||||
]
|
||||
|
||||
service = SentenceAggregator()
|
||||
output_frames = []
|
||||
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
|
||||
output_frames.append(frame)
|
||||
|
||||
self.assertEqual(1, len(output_frames))
|
||||
self.assertEqual(QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0])
|
||||
|
||||
async def test_sentence_and_clause(self) -> None:
|
||||
input_frames = [
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "world."),
|
||||
QueueFrame(FrameType.TEXT_CHUNK, " How are"),
|
||||
QueueFrame(FrameType.END_STREAM, None),
|
||||
]
|
||||
|
||||
service = SentenceAggregator()
|
||||
output_frames = []
|
||||
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
|
||||
output_frames.append(frame)
|
||||
|
||||
self.assertEqual(2, len(output_frames))
|
||||
self.assertEqual(
|
||||
QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0]
|
||||
)
|
||||
self.assertEqual(
|
||||
QueueFrame(FrameType.SENTENCE, " How are"), output_frames[1]
|
||||
)
|
||||
|
||||
async def test_two_sentences(self) -> None:
|
||||
input_frames = [
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "hello, "),
|
||||
QueueFrame(FrameType.TEXT_CHUNK, "world."),
|
||||
QueueFrame(FrameType.TEXT_CHUNK, " How are"),
|
||||
QueueFrame(FrameType.TEXT_CHUNK, " you doing?"),
|
||||
QueueFrame(FrameType.END_STREAM, None),
|
||||
]
|
||||
|
||||
service = SentenceAggregator()
|
||||
output_frames = []
|
||||
async for frame in service.run(set([FrameType.SENTENCE]), input_frames):
|
||||
output_frames.append(frame)
|
||||
|
||||
self.assertEqual(2, len(output_frames))
|
||||
self.assertEqual(
|
||||
QueueFrame(FrameType.SENTENCE, "hello, world."), output_frames[0]
|
||||
)
|
||||
self.assertEqual(QueueFrame(FrameType.SENTENCE, " How are you doing?"), output_frames[1])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
|
||||
@@ -1,180 +0,0 @@
|
||||
import time
|
||||
import unittest
|
||||
|
||||
from queue import Queue, Empty
|
||||
from threading import Thread, Event
|
||||
from typing import Generator
|
||||
|
||||
from dailyai.async_processor.async_processor import (
|
||||
AsyncProcessor,
|
||||
AsyncProcessorState,
|
||||
LLMResponse,
|
||||
)
|
||||
from dailyai.message_handler.message_handler import MessageHandler
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.ai_services import (
|
||||
AIServiceConfig,
|
||||
ImageGenService,
|
||||
LLMService,
|
||||
TTSService,
|
||||
)
|
||||
"""
|
||||
class MockTTSService(TTSService):
|
||||
def run_tts(self, sentence):
|
||||
for word in sentence.split(' '):
|
||||
time.sleep(0.1)
|
||||
yield bytes(word, "utf-8")
|
||||
|
||||
class MockLLMService(LLMService):
|
||||
def run_llm_async(self, messages) -> Generator[str, None, None]:
|
||||
for i in ["Hello ", "there.", "How are ", "you?", "I ", "hope ", "you ", "are ", "well."]:
|
||||
time.sleep(0.1)
|
||||
yield i
|
||||
|
||||
class MockImageService(ImageGenService):
|
||||
def run_image_gen(self, sentence) -> None:
|
||||
return None
|
||||
|
||||
class TestResponse(unittest.TestCase):
|
||||
def test_base_state_transitions(self):
|
||||
mock_tts_service = MockTTSService()
|
||||
mock_llm_service = MockLLMService()
|
||||
mock_image_service = MockImageService()
|
||||
processor = AsyncProcessor(AIServiceConfig(tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service))
|
||||
processor.prepare()
|
||||
processor.play()
|
||||
processor.finalize()
|
||||
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
|
||||
|
||||
def test_state_transitions(self):
|
||||
output_queue = Queue()
|
||||
mock_tts_service = MockTTSService()
|
||||
mock_llm_service = MockLLMService()
|
||||
mock_image_service = MockImageService()
|
||||
message_handler = MessageHandler("Hello World")
|
||||
processor = LLMResponse(
|
||||
AIServiceConfig(
|
||||
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
|
||||
),
|
||||
message_handler,
|
||||
output_queue,
|
||||
)
|
||||
processor.prepare()
|
||||
processor.play()
|
||||
|
||||
# Consume the output from the output queue. It's necessary to mark these tasks as done for the
|
||||
# play function to return.
|
||||
expected_words = ["Hello", "there.", "How", "are", "you?", "I", "hope", "you", "are", "well."]
|
||||
|
||||
# remove the "start_stream" message from the queue
|
||||
output_queue.get()
|
||||
output_queue.task_done()
|
||||
|
||||
while expected_words:
|
||||
actual_word:QueueFrame = output_queue.get()
|
||||
word = expected_words.pop(0)
|
||||
self.assertEqual(actual_word.frame_type, FrameType.AUDIO_FRAME)
|
||||
self.assertEqual(actual_word.frame_data, bytes(word, "utf-8"))
|
||||
output_queue.task_done()
|
||||
|
||||
processor.finalize()
|
||||
|
||||
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
|
||||
|
||||
def test_interrupt_preparation(self):
|
||||
output_queue = Queue()
|
||||
mock_tts_service = MockTTSService()
|
||||
mock_llm_service = MockLLMService()
|
||||
mock_image_service = MockImageService()
|
||||
message_handler = MessageHandler("System Message")
|
||||
processor = LLMResponse(
|
||||
AIServiceConfig(
|
||||
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
|
||||
),
|
||||
message_handler,
|
||||
output_queue,
|
||||
)
|
||||
processor.prepare()
|
||||
interrupt_request_at = time.perf_counter()
|
||||
processor.interrupt()
|
||||
processor.finalize()
|
||||
finalized_at = time.perf_counter()
|
||||
self.assertTrue(0.1 < finalized_at - interrupt_request_at < 0.2)
|
||||
print(f"delta: {interrupt_request_at, finalized_at}")
|
||||
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
|
||||
|
||||
def test_interrupt_play(self):
|
||||
output_queue = Queue()
|
||||
mock_tts_service = MockTTSService()
|
||||
mock_llm_service = MockLLMService()
|
||||
mock_image_service = MockImageService()
|
||||
message_handler = MessageHandler("System Message")
|
||||
processor = LLMResponse(
|
||||
AIServiceConfig(
|
||||
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
|
||||
),
|
||||
message_handler,
|
||||
output_queue,
|
||||
)
|
||||
processor.prepare()
|
||||
processor.play()
|
||||
|
||||
stop_processing_output_queue = Event()
|
||||
def process_output_queue_async():
|
||||
# Consume the output from the output queue. It's necessary to mark these tasks as done for the
|
||||
# play function to return.
|
||||
time.sleep(0.1)
|
||||
expected_words = ["Hello", "there.", "How", "are", "you?", "I", "hope", "you", "are", "well."]
|
||||
while expected_words and not stop_processing_output_queue.is_set():
|
||||
try:
|
||||
actual_word:QueueFrame = output_queue.get_nowait()
|
||||
if actual_word.frame_type == FrameType.AUDIO_FRAME:
|
||||
time.sleep(0.1)
|
||||
word = expected_words.pop(0)
|
||||
self.assertEqual(actual_word.frame_type, FrameType.AUDIO_FRAME)
|
||||
self.assertEqual(actual_word.frame_data, bytes(word, "utf-8"))
|
||||
output_queue.task_done()
|
||||
except Empty:
|
||||
pass
|
||||
|
||||
process_output_queue = Thread(target=process_output_queue_async, daemon=True)
|
||||
process_output_queue.start()
|
||||
|
||||
time.sleep(0.5)
|
||||
processor.interrupt()
|
||||
|
||||
stop_processing_output_queue.set()
|
||||
process_output_queue.join()
|
||||
|
||||
processor.finalize()
|
||||
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
|
||||
|
||||
def test_statechange_callback(self):
|
||||
mock_tts_service = MockTTSService()
|
||||
mock_llm_service = MockLLMService()
|
||||
mock_image_service = MockImageService()
|
||||
processor = AsyncProcessor(
|
||||
AIServiceConfig(
|
||||
tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
|
||||
)
|
||||
)
|
||||
is_finalized = False
|
||||
def set_is_finalized(async_processor:AsyncProcessor):
|
||||
nonlocal is_finalized
|
||||
is_finalized = True
|
||||
|
||||
processor.set_state_callback(
|
||||
AsyncProcessorState.FINALIZED, set_is_finalized
|
||||
)
|
||||
processor.prepare()
|
||||
self.assertFalse(is_finalized)
|
||||
processor.play()
|
||||
self.assertFalse(is_finalized)
|
||||
processor.finalize()
|
||||
self.assertTrue(is_finalized)
|
||||
self.assertEqual(processor.state, AsyncProcessorState.FINALIZED)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
|
||||
"""
|
||||
81
src/dailyai/tests/test_daily_transport_service.py
Normal file
@@ -0,0 +1,81 @@
import asyncio
import unittest

from unittest.mock import MagicMock, patch

from dailyai.queue_frame import AudioQueueFrame, ImageQueueFrame


class TestDailyTransport(unittest.IsolatedAsyncioTestCase):

    async def test_event_handler(self):
        from dailyai.services.daily_transport_service import DailyTransportService

        transport = DailyTransportService("mock.daily.co/mock", "token", "bot")

        was_called = False

        @transport.event_handler("on_first_other_participant_joined")
        def test_event_handler(transport):
            nonlocal was_called
            was_called = True

        transport.on_first_other_participant_joined()

        self.assertTrue(was_called)

    async def test_event_handler_async(self):
        from dailyai.services.daily_transport_service import DailyTransportService

        transport = DailyTransportService("mock.daily.co/mock", "token", "bot")

        event = asyncio.Event()

        @transport.event_handler("on_first_other_participant_joined")
        async def test_event_handler(transport):
            nonlocal event
            await asyncio.sleep(0.1)
            event.set()

        transport.on_first_other_participant_joined()

        await asyncio.wait_for(event.wait(), timeout=1)
        self.assertTrue(event.is_set())

    @patch("dailyai.services.daily_transport_service.CallClient")
    @patch("dailyai.services.daily_transport_service.Daily")
    async def test_run_with_camera_and_mic(self, daily_mock, callclient_mock):
        from dailyai.services.daily_transport_service import DailyTransportService
        transport = DailyTransportService(
            "https://mock.daily.co/mock",
            "token",
            "bot",
            mic_enabled=True,
            camera_enabled=True,
            duration_minutes=0.01,
        )

        mic = MagicMock()
        camera = MagicMock()
        daily_mock.create_microphone_device.return_value = mic
        daily_mock.create_camera_device.return_value = camera

        async def send_audio_frame():
            await transport.send_queue.put(AudioQueueFrame(bytes([0] * 3300)))

        async def send_video_frame():
            await transport.send_queue.put(ImageQueueFrame(None, b"test"))

        await asyncio.gather(transport.run(), send_audio_frame(), send_video_frame())

        daily_mock.init.assert_called_once_with()
        daily_mock.create_microphone_device.assert_called_once()
        daily_mock.create_camera_device.assert_called_once()

        callclient_mock.return_value.set_user_name.assert_called_once_with("bot")
        callclient_mock.return_value.join.assert_called_once_with(
            "https://mock.daily.co/mock", "token", completion=transport.call_joined
        )

        camera.write_frame.assert_called_with(b"test")
        mic.write_frames.assert_called()
|
||||
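Reviewer sketch: the two event-handler tests in test_daily_transport_service.py expect `event_handler(name)` to accept both plain functions and coroutines. One way that behavior could be implemented is shown below. This is a minimal sketch under assumptions, not the actual `DailyTransportService` code: `SketchTransport` and `fire` are hypothetical names, and the real service dispatches through methods such as `on_first_other_participant_joined()` rather than a generic `fire`.

```python
import asyncio
import inspect


class SketchTransport:
    """Hypothetical stand-in; not the real DailyTransportService."""

    def __init__(self):
        self._handlers = {}  # event name -> list of registered callables
        self._tasks = set()  # hold references so pending tasks are not collected

    def event_handler(self, event_name):
        """Decorator that registers a sync or async handler for event_name."""
        def decorator(handler):
            self._handlers.setdefault(event_name, []).append(handler)
            return handler
        return decorator

    def fire(self, event_name, *args):
        """Call sync handlers directly; schedule async ones on the running loop."""
        for handler in self._handlers.get(event_name, []):
            if inspect.iscoroutinefunction(handler):
                task = asyncio.get_running_loop().create_task(handler(self, *args))
                self._tasks.add(task)
                task.add_done_callback(self._tasks.discard)
            else:
                handler(self, *args)
```

Scheduling coroutines as tasks rather than awaiting them keeps `fire` synchronous, which matches how the async test waits on an `asyncio.Event` after triggering the event.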
@@ -1,147 +0,0 @@
import time
import unittest

from unittest.mock import MagicMock, call

from dailyai.message_handler.message_handler import MessageHandler, IndexingMessageHandler
from dailyai.services.ai_services import (
    AIServiceConfig,
    TTSService,
    LLMService,
    ImageGenService,
)
from ..storage.search import SearchIndexer


class TestMessageHandler(unittest.TestCase):
    def test_simple_intro(self):
        message_handler = MessageHandler("Hello world")
        self.assertEqual(
            message_handler.get_llm_messages(),
            [{"role": "system", "content": "Hello world"}],
        )

    def test_simple_user_message(self):
        message_handler = MessageHandler("System prompt")
        message_handler.add_user_message("User message")
        self.assertEqual(
            message_handler.get_llm_messages(),
            [
                {"role": "system", "content": "System prompt"},
                {"role": "user", "content": "User message"},
            ],
        )

    def test_simple_user_and_assistant_message(self):
        message_handler = MessageHandler("System prompt")
        message_handler.add_user_message("User message")
        message_handler.add_assistant_message("Assistant message")
        self.assertEqual(
            message_handler.get_llm_messages(),
            [
                {"role": "system", "content": "System prompt"},
                {"role": "user", "content": "User message"},
                {"role": "assistant", "content": "Assistant message"},
            ],
        )

    def test_user_message_overwrite(self):
        message_handler = MessageHandler("System prompt")
        message_handler.add_user_message("User message")
        message_handler.add_assistant_message("Assistant message")
        message_handler.add_user_message("plus something else")
        self.assertEqual(
            message_handler.get_llm_messages(),
            [
                {"role": "system", "content": "System prompt"},
                {"role": "user", "content": "User message plus something else"},
            ],
        )

    def test_user_message_after_assistant(self):
        message_handler = MessageHandler("System prompt")
        message_handler.add_user_message("User message")
        message_handler.add_assistant_message("Assistant message")
        message_handler.finalize_user_message()
        message_handler.add_user_message("other user message")
        self.assertEqual(
            message_handler.get_llm_messages(),
            [
                {"role": "system", "content": "System prompt"},
                {"role": "user", "content": "User message"},
                {"role": "assistant", "content": "Assistant message"},
                {"role": "user", "content": "other user message"},
            ],
        )


class MockTTSService(TTSService):
    def run_tts(self, sentence):
        for word in sentence.split(" "):
            time.sleep(0.1)
            yield bytes(word, "utf-8")


class MockLLMService(LLMService):
    def run_llm(self, messages) -> str:
        return "Parsed user message."


class MockImageService(ImageGenService):
    def run_image_gen(self, sentence) -> None:
        return None


class TestStorageMessageHandler(unittest.TestCase):
    def test_user_message_finalized(self):
        mock_tts_service = MockTTSService()
        mock_llm_service = MockLLMService()
        mock_image_service = MockImageService()

        service_config = AIServiceConfig(
            tts=mock_tts_service, llm=mock_llm_service, image=mock_image_service
        )

        mock_indexer = MagicMock(spec=SearchIndexer)

        message_handler = IndexingMessageHandler(
            "Hello world", service_config, mock_indexer
        )
        message_handler.cleanup_user_message = MagicMock(return_value="Parsed user message.")
        message_handler.add_user_message("User message")
        message_handler.add_assistant_message("Assistant message will be ignored")
        message_handler.add_user_message("plus something else")
        message_handler.finalize_user_message()
        message_handler.add_assistant_message(
            "New assistant message will not be ignored"
        )
        message_handler.add_user_message("User message second time")
        message_handler.add_assistant_message("Assistant message second time")
        message_handler.write_messages_to_storage()

        time.sleep(0.5)
        message_handler.cleanup_user_message.assert_called_with("User message plus something else")
        self.assertEqual(
            mock_indexer.mock_calls,
            [
                call.index_text('"Parsed user message."'),
                call.index_text("New assistant message will not be ignored"),
            ],
        )

        mock_indexer.reset_mock()

        message_handler.finalize_user_message()

        time.sleep(0.5)

        self.assertEqual(
            mock_indexer.mock_calls,
            [
                call.index_text('"Parsed user message."'),
                call.index_text("Assistant message second time"),
            ],
        )


if __name__ == "__main__":
    unittest.main()
|
||||
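Reviewer sketch: the `TestMessageHandler` cases in this deleted file pin down a small piece of turn-merging logic. While a user turn is still open, further user text is appended to it and any assistant reply given in between is discarded; after `finalize_user_message()`, the next user text starts a new turn. The class below is a hypothetical reconstruction inferred only from those test expectations. `SketchMessageHandler` and its internals are assumptions, not the removed `MessageHandler` implementation.

```python
class SketchMessageHandler:
    """Hypothetical stand-in inferred from the tests; not the removed class."""

    def __init__(self, intro):
        self.messages = [{"role": "system", "content": intro}]
        self.user_turn_open = False  # True while the user may still be talking

    def add_user_message(self, text):
        if self.user_turn_open:
            # The user kept talking: drop assistant replies to the partial turn.
            while self.messages[-1]["role"] == "assistant":
                self.messages.pop()
            if self.messages[-1]["role"] == "user":
                self.messages[-1]["content"] += " " + text
                return
        self.messages.append({"role": "user", "content": text})
        self.user_turn_open = True

    def add_assistant_message(self, text):
        self.messages.append({"role": "assistant", "content": text})

    def finalize_user_message(self):
        self.user_turn_open = False

    def get_llm_messages(self):
        return list(self.messages)
```

The system message always sits at index 0, so the `while` loop that pops assistant replies cannot empty the list.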
64
src/examples/foundational/01-say-one-thing.py
Normal file
@@ -0,0 +1,64 @@
import asyncio
import aiohttp
import os

from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.playht_ai_service import PlayHTAIService

from examples.foundational.support.runner import configure


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        # Create a transport service object using environment variables for
        # the transport service's API key, room URL, and any other configuration.
        # Each service can define and document the environment variables it uses.
        # Each service also takes an optional config object that is used instead
        # of environment variables.
        #
        # The abstract transport service APIs should map fairly closely
        # to the basic daily-python API.
        meeting_duration_minutes = 5
        transport = DailyTransportService(
            room_url,
            None,
            "Say One Thing",
            meeting_duration_minutes,
            mic_enabled=True
        )

        """
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
        """
        tts = PlayHTAIService(
            api_key=os.getenv("PLAY_HT_API_KEY"),
            user_id=os.getenv("PLAY_HT_USER_ID"),
            voice_url=os.getenv("PLAY_HT_VOICE_URL"),
        )

        # Register an event handler so we can play the audio when the participant joins.
        @transport.event_handler("on_participant_joined")
        async def on_participant_joined(transport, participant):
            nonlocal tts
            if participant["info"]["isLocal"]:
                return

            await tts.say(
                "Hello there, " + participant["info"]["userName"] + "!",
                transport.send_queue,
            )

            # Wait for the output queue to be empty, then leave the meeting.
            await transport.stop_when_done()

        await transport.run()
        del tts


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))
34
src/examples/foundational/01a-local-transport.py
Normal file
@@ -0,0 +1,34 @@
import asyncio
import aiohttp
import os

from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.local_transport_service import LocalTransportService


async def main():
    async with aiohttp.ClientSession() as session:
        meeting_duration_minutes = 1
        transport = LocalTransportService(
            duration_minutes=meeting_duration_minutes,
            mic_enabled=True
        )
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        async def say_something():
            await asyncio.sleep(1)
            await tts.say(
                "Hello there.",
                transport.send_queue,
            )
            await transport.stop_when_done()

        await asyncio.gather(transport.run(), say_something())


if __name__ == "__main__":
    asyncio.run(main())
59
src/examples/foundational/02-llm-say-one-thing.py
Normal file
@@ -0,0 +1,59 @@
import asyncio
import os

import aiohttp

from dailyai.queue_frame import LLMMessagesQueueFrame
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from examples.foundational.support.runner import configure


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        meeting_duration_minutes = 1
        transport = DailyTransportService(
            room_url,
            None,
            "Say One Thing From an LLM",
            duration_minutes=meeting_duration_minutes,
            mic_enabled=True
        )

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
        # tts = AzureTTSService(api_key=os.getenv("AZURE_SPEECH_API_KEY"), region=os.getenv("AZURE_SPEECH_REGION"))
        # tts = DeepgramTTSService(aiohttp_session=session, api_key=os.getenv("DEEPGRAM_API_KEY"), voice=os.getenv("DEEPGRAM_VOICE"))

        llm = AzureLLMService(
            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
            model=os.getenv("AZURE_CHATGPT_MODEL"))
        # llm = OpenAILLMService(api_key=os.getenv("OPENAI_CHATGPT_API_KEY"))
        messages = [{
            "role": "system",
            "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world."
        }]
        tts_task = asyncio.create_task(
            tts.run_to_queue(
                transport.send_queue,
                llm.run([LLMMessagesQueueFrame(messages)]),
            )
        )

        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            await tts_task
            await transport.stop_when_done()

        await transport.run()


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))
53
src/examples/foundational/03-still-frame.py
Normal file
@@ -0,0 +1,53 @@
import asyncio
import aiohttp
import os

from dailyai.queue_frame import TextQueueFrame
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.open_ai_services import OpenAIImageGenService
from dailyai.services.azure_ai_services import AzureImageGenServiceREST

from examples.foundational.support.runner import configure

local_joined = False
participant_joined = False


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        meeting_duration_minutes = 1
        transport = DailyTransportService(
            room_url,
            None,
            "Show a still frame image",
            duration_minutes=meeting_duration_minutes,
            mic_enabled=False,
            camera_enabled=True,
            camera_width=1024,
            camera_height=1024
        )

        imagegen = FalImageGenService(
            image_size="1024x1024",
            aiohttp_session=session,
            key_id=os.getenv("FAL_KEY_ID"),
            key_secret=os.getenv("FAL_KEY_SECRET"))
        # imagegen = OpenAIImageGenService(aiohttp_session=session, api_key=os.getenv("OPENAI_DALLE_API_KEY"), image_size="1024x1024")
        # imagegen = AzureImageGenServiceREST(image_size="1024x1024", aiohttp_session=session, api_key=os.getenv("AZURE_DALLE_API_KEY"), endpoint=os.getenv("AZURE_DALLE_ENDPOINT"), model=os.getenv("AZURE_DALLE_MODEL"))

        image_task = asyncio.create_task(
            imagegen.run_to_queue(
                transport.send_queue, [
                    TextQueueFrame("a cat in the style of picasso")]))

        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            await image_task

        await transport.run()


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))
50
src/examples/foundational/03a-image-local.py
Normal file
@@ -0,0 +1,50 @@
import asyncio
import aiohttp
import os

import tkinter as tk

from dailyai.queue_frame import TextQueueFrame
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.local_transport_service import LocalTransportService

local_joined = False
participant_joined = False


async def main():
    async with aiohttp.ClientSession() as session:
        meeting_duration_minutes = 2
        tk_root = tk.Tk()
        tk_root.title("Calendar")
        transport = LocalTransportService(
            tk_root=tk_root,
            mic_enabled=True,
            camera_enabled=True,
            camera_width=1024,
            camera_height=1024,
            duration_minutes=meeting_duration_minutes,
        )

        imagegen = FalImageGenService(
            image_size="1024x1024",
            aiohttp_session=session,
            key_id=os.getenv("FAL_KEY_ID"),
            key_secret=os.getenv("FAL_KEY_SECRET"),
        )
        image_task = asyncio.create_task(
            imagegen.run_to_queue(
                transport.send_queue, [TextQueueFrame("a cat in the style of picasso")]
            )
        )

        async def run_tk():
            while not transport._stop_threads.is_set():
                tk_root.update()
                tk_root.update_idletasks()
                await asyncio.sleep(0.1)

        await asyncio.gather(transport.run(), image_task, run_tk())

if __name__ == "__main__":
    asyncio.run(main())
73
src/examples/foundational/04-utterance-and-speech.py
Normal file
@@ -0,0 +1,73 @@
import asyncio
import os

import aiohttp

from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.queue_frame import EndStreamQueueFrame, LLMMessagesQueueFrame
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService

from examples.foundational.support.runner import configure


async def main(room_url: str):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransportService(
            room_url,
            None,
            "Static And Dynamic Speech",
            duration_minutes=1,
            mic_enabled=True,
            mic_sample_rate=16000,
            camera_enabled=False
        )

        llm = AzureLLMService(
            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
            model=os.getenv("AZURE_CHATGPT_MODEL"))
        azure_tts = AzureTTSService(
            api_key=os.getenv("AZURE_SPEECH_API_KEY"),
            region=os.getenv("AZURE_SPEECH_REGION"))
        elevenlabs_tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"))

        messages = [{"role": "system", "content": "tell the user a joke about llamas"}]

        # Start a task to run the LLM to create a joke, and convert the LLM output to audio frames. This task
        # will run in parallel with generating and speaking the audio for static text, so there's no delay to
        # speak the LLM response.
        buffer_queue = asyncio.Queue()
        llm_response_task = asyncio.create_task(
            elevenlabs_tts.run_to_queue(
                buffer_queue,
                llm.run([LLMMessagesQueueFrame(messages)]),
                True,
            )
        )

        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            await azure_tts.say("My friend the LLM is now going to tell a joke about llamas.", transport.send_queue)

            async def buffer_to_send_queue():
                while True:
                    frame = await buffer_queue.get()
                    await transport.send_queue.put(frame)
                    buffer_queue.task_done()
                    if isinstance(frame, EndStreamQueueFrame):
                        break

            await asyncio.gather(llm_response_task, buffer_to_send_queue())

            await transport.stop_when_done()

        await transport.run()


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))
|
||||
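Reviewer sketch: the buffering pattern in 04-utterance-and-speech.py starts the slow producer early, puts the static speech on the send queue first, then drains a buffer queue until an end-of-stream frame arrives. It can be reproduced with plain `asyncio` queues. The block below is a minimal sketch under assumptions, not dailyai code: `END`, `produce`, `forward`, and `demo` are hypothetical names, `END` stands in for `EndStreamQueueFrame`, and the strings stand in for audio frames.

```python
import asyncio

END = object()  # hypothetical sentinel standing in for EndStreamQueueFrame


async def produce(buffer_queue):
    """Pretend to be the LLM-to-TTS task: emit chunks, then the end marker."""
    for chunk in ("why", "did", "the", "llama"):
        await asyncio.sleep(0)  # yield control, as a real network call would
        await buffer_queue.put(chunk)
    await buffer_queue.put(END)


async def forward(buffer_queue, send_queue):
    """Move frames from the buffer to the send queue until END is seen."""
    while True:
        frame = await buffer_queue.get()
        await send_queue.put(frame)
        buffer_queue.task_done()
        if frame is END:
            break


async def demo():
    buffer_queue = asyncio.Queue()
    send_queue = asyncio.Queue()
    # The producer starts immediately and fills the buffer in the background.
    producer = asyncio.create_task(produce(buffer_queue))
    # The static introduction is queued first, so it is always sent first.
    await send_queue.put("intro")
    await asyncio.gather(producer, forward(buffer_queue, send_queue))
    sent = []
    while not send_queue.empty():
        sent.append(send_queue.get_nowait())
    return sent
```

Because the producer writes only to the buffer, its output cannot overtake the introduction regardless of which finishes first.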
134
src/examples/foundational/05-sync-speech-and-image.py
Normal file
@@ -0,0 +1,134 @@
import asyncio
import aiohttp
import os

from dailyai.queue_frame import AudioQueueFrame, ImageQueueFrame
from dailyai.services.azure_ai_services import AzureLLMService, AzureImageGenServiceREST, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.open_ai_services import OpenAIImageGenService

from examples.foundational.support.runner import configure


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        meeting_duration_minutes = 5
        transport = DailyTransportService(
            room_url,
            None,
            "Month Narration Bot",
            duration_minutes=meeting_duration_minutes,
            mic_enabled=True,
            camera_enabled=True,
            mic_sample_rate=16000,
            camera_width=1024,
            camera_height=1024
        )

        llm = AzureLLMService(
            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
            model=os.getenv("AZURE_CHATGPT_MODEL"))
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id="ErXwobaYiN019PkySvjV")
        # tts = AzureTTSService(api_key=os.getenv("AZURE_SPEECH_API_KEY"), region=os.getenv("AZURE_SPEECH_REGION"))

        dalle = FalImageGenService(
            image_size="1024x1024",
            aiohttp_session=session,
            key_id=os.getenv("FAL_KEY_ID"),
            key_secret=os.getenv("FAL_KEY_SECRET"))
        # dalle = OpenAIImageGenService(aiohttp_session=session, api_key=os.getenv("OPENAI_DALLE_API_KEY"), image_size="1024x1024")
        # dalle = AzureImageGenServiceREST(image_size="1024x1024", aiohttp_session=session, api_key=os.getenv("AZURE_DALLE_API_KEY"), endpoint=os.getenv("AZURE_DALLE_ENDPOINT"), model=os.getenv("AZURE_DALLE_MODEL"))

        # Get a complete audio chunk from the given text. Splitting this into its own
        # coroutine lets us ensure proper ordering of the audio chunks on the send queue.
        async def get_all_audio(text):
            all_audio = bytearray()
            async for audio in tts.run_tts(text):
                all_audio.extend(audio)

            return all_audio

        async def get_month_data(month):
            messages = [
                {
                    "role": "system",
                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
                }
            ]

            image_description = await llm.run_llm(messages)
            if not image_description:
                return

            to_speak = f"{month}: {image_description}"
            audio_task = asyncio.create_task(get_all_audio(to_speak))
            image_task = asyncio.create_task(dalle.run_image_gen(image_description))
            print(f"about to gather tasks for {month}")
            (audio, image_data) = await asyncio.gather(
                audio_task, image_task
            )
            print(f"about to return from get_month_data for {month}")
            return {
                "month": month,
                "text": image_description,
                "image_url": image_data[0],
                "image": image_data[1],
                "audio": audio,
            }

        months: list[str] = [
            "January",
            "February",
            "March",
            "April",
            "May",
            "June"
        ]
        """
            "February",
            "March",
            "April",
            "May",
            "June",
            "July",
            "August",
            "September",
            "October",
            "November",
            "December",
        """
        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            # This will play the months in the order they're completed. The benefit
            # is we'll have as little delay as possible before the first month, and
            # likely no delay between months, but the months won't display in order.
            for month_data_task in asyncio.as_completed(month_tasks):
                print(f"month_data_task: {month_data_task}")
                try:
                    data = await month_data_task
                except Exception as e:
                    # Skip this month; otherwise `data` would be unbound (or stale) below.
                    print(f"Exception while getting month data: {e}")
                    continue
                if data:
                    await transport.send_queue.put(
                        [
                            ImageQueueFrame(data["image_url"], data["image"]),
                            AudioQueueFrame(data["audio"]),
                        ]
                    )

            # wait for the output queue to be empty, then leave the meeting
            await transport.stop_when_done()

        month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]

        await transport.run()

if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))
|
||||
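Reviewer sketch: the handler in 05-sync-speech-and-image.py consumes results in completion order with `asyncio.as_completed`. If a task raises, the loop should skip that item instead of falling through to code that reads the result. The block below illustrates this with standard-library calls only. It is a sketch under assumptions: `get_item` and `demo` are hypothetical names, and the sleeps stand in for the LLM, TTS, and image-generation calls.

```python
import asyncio


async def get_item(name, delay, fail=False):
    """Hypothetical stand-in for get_month_data: wait, then return or raise."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"could not build {name}")
    return {"name": name}


async def demo():
    tasks = [
        asyncio.create_task(get_item("January", 0.10)),
        asyncio.create_task(get_item("February", 0.01)),
        asyncio.create_task(get_item("March", 0.05, fail=True)),
    ]
    finished = []
    # as_completed yields awaitables in the order the tasks finish,
    # not the order in which they were created.
    for next_done in asyncio.as_completed(tasks):
        try:
            data = await next_done
        except Exception as exc:
            print(f"skipping failed task: {exc}")
            continue  # without this, `data` would be unbound or stale
        finished.append(data["name"])
    return finished
```

With these delays, February finishes first and March is skipped, so the collected names come back in completion order rather than calendar order.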
134
src/examples/foundational/05a-local-sync-speech-and-text.py
Normal file
@@ -0,0 +1,134 @@
import aiohttp
import argparse
import asyncio
import tkinter as tk
import os

from dailyai.queue_frame import AudioQueueFrame, ImageQueueFrame
from dailyai.services.azure_ai_services import AzureLLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.local_transport_service import LocalTransportService


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        meeting_duration_minutes = 5
        tk_root = tk.Tk()
        tk_root.title("Calendar")

        transport = LocalTransportService(
            mic_enabled=True,
            camera_enabled=True,
            camera_width=1024,
            camera_height=1024,
            duration_minutes=meeting_duration_minutes,
            tk_root=tk_root,
        )

        llm = AzureLLMService(
            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
            model=os.getenv("AZURE_CHATGPT_MODEL"),
        )
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id="ErXwobaYiN019PkySvjV",
        )
        dalle = FalImageGenService(
            image_size="1024x1024",
            aiohttp_session=session,
            key_id=os.getenv("FAL_KEY_ID"),
            key_secret=os.getenv("FAL_KEY_SECRET"),
        )

        # Get a complete audio chunk from the given text. Splitting this into its own
        # coroutine lets us ensure proper ordering of the audio chunks on the send queue.
        async def get_all_audio(text):
            all_audio = bytearray()
            async for audio in tts.run_tts(text):
                all_audio.extend(audio)

            return all_audio

        async def get_month_data(month):
            messages = [
                {
                    "role": "system",
                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
                }
            ]

            image_description = await llm.run_llm(messages)
            if not image_description:
                return

            to_speak = f"{month}: {image_description}"
            audio_task = asyncio.create_task(get_all_audio(to_speak))
            image_task = asyncio.create_task(dalle.run_image_gen(image_description))
            (audio, image_data) = await asyncio.gather(
                audio_task, image_task
            )

            return {
                "month": month,
                "text": image_description,
                "image_url": image_data[0],
                "image": image_data[1],
                "audio": audio,
            }

        months: list[str] = [
            "January",
            "February",
            "March",
            "April",
            "May",
            "June",
            "July",
            "August",
            "September",
            "October",
            "November",
            "December",
        ]

        async def show_images():
            # This will play the months in the order they're completed. The benefit
            # is we'll have as little delay as possible before the first month, and
            # likely no delay between months, but the months won't display in order.
            for month_data_task in asyncio.as_completed(month_tasks):
                data = await month_data_task
                if data:
                    await transport.send_queue.put(
                        [
                            ImageQueueFrame(data["image_url"], data["image"]),
                            AudioQueueFrame(data["audio"]),
                        ]
                    )

            await asyncio.sleep(25)

            # wait for the output queue to be empty, then leave the meeting
            await transport.stop_when_done()

        async def run_tk():
            while not transport._stop_threads.is_set():
                tk_root.update()
                tk_root.update_idletasks()
                await asyncio.sleep(0.1)

        month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]

        await asyncio.gather(transport.run(), show_images(), run_tk())

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
    parser.add_argument(
        "-u", "--url", type=str, required=True, help="URL of the Daily room to join"
    )

    args, unknown = parser.parse_known_args()

    asyncio.run(main(args.url))
99
src/examples/foundational/06-listen-and-respond.py
Normal file
@@ -0,0 +1,99 @@
import aiohttp
import asyncio
import os
from typing import AsyncGenerator

from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.queue_aggregators import LLMAssistantContextAggregator, LLMContextAggregator, LLMUserContextAggregator
from examples.foundational.support.runner import configure
from dailyai.queue_frame import LLMMessagesQueueFrame, TranscriptionQueueFrame, QueueFrame, TextQueueFrame
from dailyai.services.ai_services import FrameLogger, AIService

class TranscriptFilter(AIService):
    def __init__(self, bot_participant_id=None):
        super().__init__()
        self.bot_participant_id = bot_participant_id
        print(f"Filtering transcripts from: {self.bot_participant_id}")

    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        if isinstance(frame, TranscriptionQueueFrame):
            if frame.participantId != self.bot_participant_id:
                yield frame

async def main(room_url: str, token):
    async with aiohttp.ClientSession() as session:
        global transport
        global llm
        global tts

        transport = DailyTransportService(
            room_url,
            token,
            "Respond bot",
            5,
            mic_enabled=True,
            mic_sample_rate=16000,
            camera_enabled=False
        )

        # llm = AzureLLMService(api_key=os.getenv("AZURE_CHATGPT_API_KEY"), endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"), model=os.getenv("AZURE_CHATGPT_MODEL"))
        llm = OpenAILLMService(api_key=os.getenv("OPENAI_CHATGPT_API_KEY"))
        # tts = AzureTTSService(api_key=os.getenv("AZURE_SPEECH_API_KEY"), region=os.getenv("AZURE_SPEECH_REGION"))
        tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id="EXAVITQu4vr4xnSDxMaL")

        messages = [
            {"role": "system", "content": """You are Valerie, an agent for a company called Valorant Health. Your job is to help users get access to health care. You're talking to Chad Bailey, a 40 year old male who needs to see a doctor.

You need to do three things, in this order:

1. Confirm the user's identity.
2. Find out what kinds of doctors the user needs to see.
3. Get the name of their insurance company.

Start by introducing yourself and asking the user to verify their identity by providing their date of birth. Once their identity is confirmed, move on to step 2, then to step 3.

Once you have collected all of that information, respond with a JSON object containing the answers."""}
        ]
        tma_in = LLMUserContextAggregator(messages, transport._my_participant_id)
        tma_out = LLMAssistantContextAggregator(messages, transport._my_participant_id)
        # checklist = ChecklistProcessor(messages, llm)

        async def handle_transcriptions():
            tf = TranscriptFilter(transport._my_participant_id)
            await tts.run_to_queue(
                transport.send_queue,
                tma_out.run(
                    llm.run(
                        tma_in.run(
                            tf.run(
                                transport.get_receive_frames()
                            )
                        )
                    )
                )
            )


        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            fl = FrameLogger("first other participant")
            await tts.run_to_queue(
                transport.send_queue,
                fl.run(
                    tma_out.run(
                        llm.run([LLMMessagesQueueFrame(messages)]),
                    )
                )
            )
        transport.transcription_settings["extra"]["endpointing"] = True
        transport.transcription_settings["extra"]["punctuate"] = True
        await asyncio.gather(transport.run(), handle_transcriptions())



if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))
|
||||
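The filtering step in this example can be tried in isolation. The sketch below is a minimal, self-contained approximation: `Frame`, `TranscriptionFrame`, `TranscriptFilterSketch`, and `collect_filtered` are hypothetical stand-ins invented for illustration, not dailyai classes, and the `run` method only assumes that a service feeds each upstream frame through `process_frame`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, AsyncIterable, List


# Hypothetical stand-ins for the frame classes; not the real dailyai API.
@dataclass
class Frame:
    pass


@dataclass
class TranscriptionFrame(Frame):
    text: str
    participant_id: str


class TranscriptFilterSketch:
    """Pass transcriptions through unless the bot itself spoke them."""

    def __init__(self, bot_participant_id: str):
        self.bot_participant_id = bot_participant_id

    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
        # Non-transcription frames and the bot's own speech are dropped.
        if isinstance(frame, TranscriptionFrame):
            if frame.participant_id != self.bot_participant_id:
                yield frame

    async def run(self, frames: AsyncIterable[Frame]) -> AsyncGenerator[Frame, None]:
        # Assumed behaviour: feed every upstream frame through process_frame.
        async for frame in frames:
            async for out in self.process_frame(frame):
                yield out


async def collect_filtered(frames: List[Frame], bot_id: str) -> List[Frame]:
    async def source():
        for frame in frames:
            yield frame

    return [f async for f in TranscriptFilterSketch(bot_id).run(source())]
```

Because the filter yields nothing for frames it does not recognise, anything placed upstream of it that is not a transcription never reaches the LLM stage.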
115
src/examples/foundational/06a-image-sync.py
Normal file
@@ -0,0 +1,115 @@
|
||||
import asyncio
import os
from typing import AsyncGenerator

import aiohttp
|
||||
|
||||
from PIL import Image
|
||||
from dailyai.queue_frame import ImageQueueFrame, QueueFrame
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.services.ai_services import AIService
|
||||
from dailyai.queue_aggregators import LLMAssistantContextAggregator, LLMUserContextAggregator
|
||||
from dailyai.services.fal_ai_services import FalImageGenService
|
||||
|
||||
from examples.foundational.support.runner import configure
|
||||
|
||||
|
||||
class ImageSyncAggregator(AIService):
    def __init__(self, speaking_path: str, waiting_path: str):
        super().__init__()
        self._speaking_image = Image.open(speaking_path)
        self._speaking_image_bytes = self._speaking_image.tobytes()

        self._waiting_image = Image.open(waiting_path)
        self._waiting_image_bytes = self._waiting_image.tobytes()

    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        # Show the "speaking" image while the frame is handled downstream,
        # then switch back to the "waiting" image.
        yield ImageQueueFrame(None, self._speaking_image_bytes)
        yield frame
        yield ImageQueueFrame(None, self._waiting_image_bytes)
|
||||
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
5,
|
||||
)
|
||||
transport._camera_enabled = True
|
||||
transport._camera_width = 1024
|
||||
transport._camera_height = 1024
|
||||
transport._mic_enabled = True
|
||||
transport._mic_sample_rate = 16000
|
||||
|
||||
llm = AzureLLMService(
|
||||
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
|
||||
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
|
||||
model=os.getenv("AZURE_CHATGPT_MODEL"))
|
||||
tts = AzureTTSService(
|
||||
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
|
||||
region=os.getenv("AZURE_SPEECH_REGION"))
|
||||
img = FalImageGenService(
|
||||
image_size="1024x1024",
|
||||
aiohttp_session=session,
|
||||
key_id=os.getenv("FAL_KEY_ID"),
|
||||
key_secret=os.getenv("FAL_KEY_SECRET"))
|
||||
|
||||
async def get_images():
|
||||
get_speaking_task = asyncio.create_task(
|
||||
img.run_image_gen("An image of a cat speaking")
|
||||
)
|
||||
get_waiting_task = asyncio.create_task(
|
||||
img.run_image_gen("An image of a cat waiting")
|
||||
)
|
||||
|
||||
(speaking_data, waiting_data) = await asyncio.gather(
|
||||
get_speaking_task, get_waiting_task
|
||||
)
|
||||
|
||||
return speaking_data, waiting_data
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
await tts.say("Hi, I'm listening!", transport.send_queue)
|
||||
|
||||
async def handle_transcriptions():
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
|
||||
]
|
||||
|
||||
tma_in = LLMUserContextAggregator(
|
||||
messages, transport._my_participant_id
|
||||
)
|
||||
tma_out = LLMAssistantContextAggregator(
|
||||
messages, transport._my_participant_id
|
||||
)
|
||||
image_sync_aggregator = ImageSyncAggregator(
|
||||
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
|
||||
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
|
||||
)
|
||||
await tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
image_sync_aggregator.run(
|
||||
tma_out.run(
|
||||
llm.run(
|
||||
tma_in.run(
|
||||
transport.get_receive_frames()
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
transport.transcription_settings["extra"]["punctuate"] = True
|
||||
await asyncio.gather(transport.run(), handle_transcriptions())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
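The bracketing behaviour of `ImageSyncAggregator` can be checked without Daily or PIL. The following is a rough sketch under stated assumptions: frames are modelled as plain `(kind, payload)` tuples, and `ImageSyncSketch` and `bracket` are hypothetical names used only here.

```python
import asyncio
from typing import AsyncGenerator, AsyncIterable, List, Tuple

# (kind, payload): a hypothetical stand-in for QueueFrame subclasses.
Frame = Tuple[str, object]


class ImageSyncSketch:
    """Surround every incoming frame with a speaking and a waiting image."""

    def __init__(self, speaking: bytes, waiting: bytes):
        self._speaking = speaking
        self._waiting = waiting

    async def run(self, frames: AsyncIterable[Frame]) -> AsyncGenerator[Frame, None]:
        async for frame in frames:
            yield ("image", self._speaking)  # shown while the frame plays
            yield frame
            yield ("image", self._waiting)   # shown once the frame is done


async def bracket(frames: List[Frame]) -> List[Frame]:
    async def source():
        for frame in frames:
            yield frame

    sync = ImageSyncSketch(b"S", b"W")
    return [f async for f in sync.run(source())]
```

Note that every frame is bracketed, so a sentence split into several frames switches images once per frame, not once per sentence.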
120
src/examples/foundational/06a-multi-step.py
Normal file
@@ -0,0 +1,120 @@
|
||||
import aiohttp
|
||||
import asyncio
|
||||
import os
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.services.open_ai_services import OpenAILLMService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
from dailyai.queue_aggregators import LLMAssistantContextAggregator, LLMContextAggregator, LLMUserContextAggregator
|
||||
from examples.foundational.support.runner import configure
|
||||
from dailyai.queue_frame import LLMMessagesQueueFrame, TranscriptionQueueFrame, QueueFrame, TextQueueFrame
|
||||
from dailyai.services.ai_services import FrameLogger, AIService
|
||||
|
||||
class TranscriptFilter(AIService):
    """Drop the bot's own transcriptions so it does not respond to itself."""

    def __init__(self, bot_participant_id=None):
        super().__init__()
        self.bot_participant_id = bot_participant_id
        print(f"Filtering transcripts from: {self.bot_participant_id}")

    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        # Only transcription frames from other participants pass through;
        # every other frame type is dropped.
        if isinstance(frame, TranscriptionQueueFrame):
            if frame.participantId != self.bot_participant_id:
                yield frame
|
||||
|
||||
class ChecklistProcessor(AIService):
|
||||
def __init__(self, messages, llm, *args, **kwargs):
|
||||
super().__init__(*args, **kwargs)
|
||||
self._current_step = 0
|
||||
self._messages = messages
|
||||
self._llm = llm
|
||||
self._id = "You are Valerie, an agent for a company called Valorant Health. Your job is to help users get access to health care. You're talking to Chad Bailey, a 40 year old male who needs to see a doctor."
|
||||
self._steps = [
|
||||
"Start by introducing yourself. Then, ask the user to confirm their identity by telling you their birthday. After the user has confirmed their identity, respond only with ABC.",
|
||||
"Now that the user has confirmed their identity, ask them to describe what kind of doctor they need to see. When the user has responded with at least one kind of doctor, respond only with ABC.",
|
||||
"Next, you need to ask the user what kind of health insurance they have. Once the user has told you what insurance company they use, respond only with ABC.",
|
||||
"Tell the user goodbye.",
|
||||
""
|
||||
]
|
||||
messages.append({"role": "system", "content": f"{self._id} {self._steps[0]}"})
|
||||
|
||||
    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        if isinstance(frame, TextQueueFrame):
            print(f"got a text frame: {frame.text}")
        if isinstance(frame, TextQueueFrame) and frame.text == "ABC":
            # "ABC" is the sentinel the prompts ask the LLM to emit when a step is complete.
            self._current_step += 1
            # yield TextQueueFrame(f"We should move on to Step {self._current_step}.")
            self._messages.append({"role": "system", "content": self._steps[self._current_step]})
            yield LLMMessagesQueueFrame(self._messages)
            print("past llmmessagesqueueframe yield")
            # Use the LLM service passed to the constructor rather than the module-level global.
            async for frame in self._llm.process_frame(LLMMessagesQueueFrame(self._messages)):
                yield frame
        else:
            yield frame
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
global transport
|
||||
global llm
|
||||
global tts
|
||||
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
5,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
transport.mic_sample_rate = 16000
|
||||
transport.camera_enabled = False
|
||||
|
||||
# llm = AzureLLMService(api_key=os.getenv("AZURE_CHATGPT_API_KEY"), endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"), model=os.getenv("AZURE_CHATGPT_MODEL"))
|
||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_CHATGPT_API_KEY"))
|
||||
# tts = AzureTTSService(api_key=os.getenv("AZURE_SPEECH_API_KEY"), region=os.getenv("AZURE_SPEECH_REGION"))
|
||||
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id="EXAVITQu4vr4xnSDxMaL")
|
||||
|
||||
messages = [
|
||||
]
|
||||
tma_in = LLMUserContextAggregator(messages, transport._my_participant_id)
|
||||
tma_out = LLMAssistantContextAggregator(messages, transport._my_participant_id)
|
||||
checklist = ChecklistProcessor(messages, llm)
|
||||
|
||||
async def handle_transcriptions():
|
||||
tf = TranscriptFilter(transport._my_participant_id)
|
||||
await tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
checklist.run(
|
||||
tma_out.run(
|
||||
llm.run(
|
||||
tma_in.run(
|
||||
tf.run(
|
||||
transport.get_receive_frames()
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
)
|
||||
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
fl = FrameLogger("first other participant")
|
||||
await tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
fl.run(
|
||||
tma_out.run(
|
||||
llm.run([LLMMessagesQueueFrame(messages)]),
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
transport.transcription_settings["extra"]["punctuate"] = True
|
||||
await asyncio.gather(transport.run(), handle_transcriptions())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
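The step-advancing logic of `ChecklistProcessor` reduces to a small state machine. The sketch below isolates it under a few assumptions: `ChecklistSketch`, `SENTINEL`, and `on_text` are hypothetical names, the text is stripped before comparison, and an explicit bounds check replaces the trailing empty step the example uses to avoid running off the end of the list.

```python
from typing import Dict, List

SENTINEL = "ABC"  # marker the prompts ask the LLM to emit when a step is complete


class ChecklistSketch:
    """Append the next step's prompt whenever the sentinel is seen."""

    def __init__(self, messages: List[Dict[str, str]], steps: List[str]):
        self._messages = messages
        self._steps = steps
        self._current_step = 0
        messages.append({"role": "system", "content": steps[0]})

    @property
    def current_step(self) -> int:
        return self._current_step

    def on_text(self, text: str) -> bool:
        """Return True if the text advanced the checklist to a new step."""
        if text.strip() != SENTINEL:
            return False
        if self._current_step + 1 >= len(self._steps):
            return False  # already on the last step; nothing left to append
        self._current_step += 1
        self._messages.append(
            {"role": "system", "content": self._steps[self._current_step]}
        )
        return True
```

A caller would re-run the LLM on the shared `messages` list whenever `on_text` returns True, which is what the example does by yielding a new `LLMMessagesQueueFrame`.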
483
src/examples/foundational/06b-patient-intake.py
Normal file
@@ -0,0 +1,483 @@
|
||||
import aiohttp
|
||||
import asyncio
|
||||
import json
|
||||
import random
|
||||
import os
|
||||
import wave
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.services.open_ai_services import OpenAILLMService
|
||||
from dailyai.services.deepgram_ai_services import DeepgramTTSService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
from dailyai.queue_aggregators import LLMAssistantContextAggregator, LLMContextAggregator, LLMUserContextAggregator
|
||||
from support.runner import configure
|
||||
from dailyai.queue_frame import LLMMessagesQueueFrame, TranscriptionQueueFrame, QueueFrame, TextQueueFrame, LLMFunctionCallFrame, LLMResponseEndQueueFrame, StartStreamQueueFrame, AudioQueueFrame
|
||||
from dailyai.services.ai_services import FrameLogger, AIService
|
||||
from dailyai.conversation_wrappers import InterruptibleConversationWrapper
|
||||
|
||||
import logging
|
||||
logging.basicConfig(level=logging.ERROR)
|
||||
|
||||
sounds = {}
|
||||
sound_files = [
|
||||
'clack-short.wav',
|
||||
'clack.wav',
|
||||
'clack-short-quiet.wav'
|
||||
]
|
||||
|
||||
script_dir = os.path.dirname(__file__)
|
||||
|
||||
for file in sound_files:
    # Build the full path to the sound file
    full_path = os.path.join(script_dir, "assets", file)
    # Read the WAV file's raw audio frames; the file name (with extension) is the dictionary key
    with wave.open(full_path) as audio_file:
        sounds[file] = audio_file.readframes(-1)
|
||||
|
||||
tools = [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "verify_birthday",
|
||||
"description": "Use this function to verify the user has provided their correct birthday.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"birthday": {
|
||||
"type": "string",
|
||||
"description": "The user's birthdate, including the year. The user can provide it in any format, but convert it to YYYY-MM-DD format to call this function."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_prescriptions",
|
||||
"description": "Once the user has provided a list of their prescription medications, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"prescriptions": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"type": "string",
|
||||
"description": "The medication's name"
|
||||
},
|
||||
"dosage": {
|
||||
"type": "string",
|
||||
"description": "The prescription's dosage"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_allergies",
|
||||
"description": "Once the user has provided a list of their allergies, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"allergies": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"type": "string",
|
||||
"description": "What the user is allergic to"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_conditions",
|
||||
"description": "Once the user has provided a list of their medical conditions, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"conditions": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"type": "string",
|
||||
"description": "The user's medical condition"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_visit_reasons",
|
||||
"description": "Once the user has provided a list of the reasons they are visiting a doctor today, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"visit_reasons": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"type": "string",
|
||||
"description": "The user's reason for visiting the doctor"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
steps = [
|
||||
{
|
||||
"prompt": "Start by introducing yourself. Then, ask the user to confirm their identity by telling you their birthday, including the year. When they answer with their birthday, call the verify_birthday function.",
|
||||
"run_async": False,
|
||||
"failed": "The user provided an incorrect birthday. Ask them for their birthday again. When they answer, call the verify_birthday function.",
"tools": [{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "verify_birthday",
|
||||
"description": "Use this function to verify the user has provided their correct birthday.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"birthday": {
|
||||
"type": "string",
|
||||
"description": "The user's birthdate, including the year. The user can provide it in any format, but convert it to YYYY-MM-DD format to call this function."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}]},
|
||||
{
|
||||
"prompt": "Next, thank the user for confirming their identity, then ask the user to list their current prescriptions. Each prescription needs to have a medication name and a dosage. Do not call the list_prescriptions function with any unknown dosages.",
|
||||
"run_async": True,
|
||||
"tools": [{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_prescriptions",
|
||||
"description": "Once the user has provided a list of their prescription medications, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"prescriptions": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"medication": {
|
||||
"type": "string",
|
||||
"description": "The medication's name"
|
||||
},
|
||||
"dosage": {
|
||||
"type": "string",
|
||||
"description": "The prescription's dosage"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}]
|
||||
},
|
||||
{
|
||||
"prompt": "Next, ask the user if they have any allergies. Once they have listed their allergies or confirmed they don't have any, call the list_allergies function.",
|
||||
"run_async": True,
|
||||
"tools": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_allergies",
|
||||
"description": "Once the user has provided a list of their allergies, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"allergies": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"type": "string",
|
||||
"description": "What the user is allergic to"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"prompt": "Now ask the user if they have any medical conditions the doctor should know about. Once they've answered the question, call the list_conditions function.",
|
||||
"run_async": True,
|
||||
"tools": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_conditions",
|
||||
"description": "Once the user has provided a list of their medical conditions, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"conditions": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"type": "string",
|
||||
"description": "The user's medical condition"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
"prompt": "Finally, ask the user the reason for their doctor visit today. Once they answer, double-check to make sure they don't have any other health concerns. After that, call the list_visit_reasons function.",
|
||||
"run_async": True,
|
||||
"tools": [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "list_visit_reasons",
|
||||
"description": "Once the user has provided a list of the reasons they are visiting a doctor today, call this function.",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"visit_reasons": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"type": "string",
|
||||
"description": "The user's reason for visiting the doctor"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{"prompt": "Now, thank the user and end the conversation.", "run_async": True, "tools": []},
|
||||
{"prompt": "", "run_async": True, "tools": []}
|
||||
]
|
||||
current_step = 0
|
||||
|
||||
class TranscriptFilter(AIService):
    """Drop the bot's own transcriptions so it does not respond to itself."""

    def __init__(self, bot_participant_id=None):
        super().__init__()
        self.bot_participant_id = bot_participant_id
        print(f"Filtering transcripts from: {self.bot_participant_id}")

    async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
        # Only transcription frames from other participants pass through;
        # every other frame type is dropped.
        if isinstance(frame, TranscriptionQueueFrame):
            if frame.participantId != self.bot_participant_id:
                yield frame
|
||||
|
||||
|
||||
class ChecklistProcessor(AIService):
|
||||
def __init__(self, messages, llm, tools, *args, **kwargs):
|
||||
super().__init__(*args, **kwargs)
|
||||
self._messages = messages
|
||||
self._llm = llm
|
||||
self._tools = tools
|
||||
self._function_name = ""
|
||||
self._arguments = ""
|
||||
self._id = "You are Jessica, an agent for a company called Tri-County Advanced Optimum Health Solution Specialists. Your job is to collect important information from the user before they visit a doctor. You're talking to Chad Bailey. You should address the user by their first name and be polite and professional. You're not a medical professional, so you shouldn't provide any advice. Keep your responses short. Your job is to collect information to give to a doctor. Don't make assumptions about what values to plug into functions. Ask for clarification if a user response is ambiguous."
|
||||
self._acks = [ "One sec.", "Let me confirm that.", "Thanks.", "OK."]
|
||||
|
||||
messages.append(
|
||||
{"role": "system", "content": f"{self._id} {steps[0]['prompt']}"})
|
||||
|
||||
def verify_birthday(self, args):
|
||||
return args['birthday'] == "1983-08-19"
|
||||
|
||||
def list_prescriptions(self, args):
|
||||
print(f"Prescriptions: {args['prescriptions']}")
|
||||
|
||||
def list_allergies(self, args):
|
||||
print(f"Allergies: {args['allergies']}")
|
||||
|
||||
def list_conditions(self, args):
|
||||
print(f"Medical Conditions: {args['conditions']}")
|
||||
|
||||
def list_visit_reasons(self, args):
|
||||
print(f"Visit Reasons: {args['visit_reasons']}")
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
global current_step
|
||||
this_step = steps[current_step]
|
||||
# TODO-CB: forcing a global here :/
|
||||
self._tools.clear()
|
||||
self._tools.extend(this_step['tools'])
|
||||
if isinstance(frame, LLMFunctionCallFrame) and frame.function_name:
|
||||
print(f"FUNCTION CALL: {frame}")
|
||||
self._function_name = frame.function_name
|
||||
if this_step['run_async']:
|
||||
# Get the LLM talking about the next step before getting the rest
|
||||
# of the function call completion
|
||||
current_step += 1
|
||||
# yield TextQueueFrame(f"We should move on to Step {current_step}.")
|
||||
self._messages.append({
|
||||
"role": "system", "content": steps[current_step]['prompt']})
|
||||
# yield LLMMessagesQueueFrame(self._messages)
|
||||
yield LLMMessagesQueueFrame(self._messages)
|
||||
async for frame in self._llm.process_frame(LLMMessagesQueueFrame(self._messages), tool_choice="none"):
|
||||
yield frame
|
||||
else:
|
||||
# Insert a quick response while we run the function
|
||||
yield AudioQueueFrame(sounds["clack-short-quiet.wav"])
|
||||
elif isinstance(frame, LLMFunctionCallFrame) and frame.arguments:
|
||||
self._arguments += frame.arguments
|
||||
elif isinstance(frame, LLMResponseEndQueueFrame):
|
||||
print(
|
||||
f"%%% got a response end. function_name is {self._function_name}, arguments is {self._arguments}")
|
||||
print(f"%%%% messages is {self._messages}")
|
||||
|
||||
if self._function_name and self._arguments:
|
||||
|
||||
fn = getattr(self, self._function_name)
|
||||
print(f"fn is: {fn}")
|
||||
result = fn(json.loads(self._arguments))
|
||||
self._function_name = ""
|
||||
self._arguments = ""
|
||||
if not this_step['run_async']:
|
||||
if result:
|
||||
current_step += 1
|
||||
# yield TextQueueFrame(f"We should move on to Step {current_step}.")
|
||||
self._messages.append({
|
||||
"role": "system", "content": steps[current_step]['prompt']})
|
||||
# yield LLMMessagesQueueFrame(self._messages)
|
||||
yield LLMMessagesQueueFrame(self._messages)
|
||||
async for frame in self._llm.process_frame(LLMMessagesQueueFrame(self._messages), tool_choice="none"):
|
||||
yield frame
|
||||
else:
|
||||
self._messages.append({
|
||||
"role": "system", "content": this_step['failed']})
|
||||
# yield LLMMessagesQueueFrame(self._messages)
|
||||
yield LLMMessagesQueueFrame(self._messages)
|
||||
async for frame in self._llm.process_frame(LLMMessagesQueueFrame(self._messages), tool_choice="none"):
|
||||
yield frame
|
||||
print(f"VERIFY RESULT: {result}")
|
||||
|
||||
else:
|
||||
yield frame
|
||||
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
global transport
|
||||
global llm
|
||||
global tts
|
||||
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
5,
|
||||
mic_enabled=True,
|
||||
mic_sample_rate=16000,
|
||||
camera_enabled=False,
|
||||
start_transcription=True,
|
||||
vad_enabled=True
|
||||
)
|
||||
|
||||
messages = []
|
||||
tools = []
|
||||
|
||||
# llm = AzureLLMService(api_key=os.getenv("AZURE_CHATGPT_API_KEY"), endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"), model=os.getenv("AZURE_CHATGPT_MODEL"))
|
||||
llm = OpenAILLMService(api_key=os.getenv(
|
||||
"OPENAI_CHATGPT_API_KEY"), model="gpt-4-turbo-preview", tools=tools)
|
||||
# tts = AzureTTSService(api_key=os.getenv(
|
||||
# "AZURE_SPEECH_API_KEY"), region=os.getenv("AZURE_SPEECH_REGION"))
|
||||
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv(
|
||||
"ELEVENLABS_API_KEY"), voice_id="XrExE9yKIg1WjnnlVkGX") # matilda
|
||||
# tts = DeepgramTTSService(aiohttp_session=session, api_key=os.getenv("DEEPGRAM_API_KEY"), voice=os.getenv("DEEPGRAM_VOICE"))
|
||||
|
||||
tma_in = LLMUserContextAggregator(
|
||||
messages, transport._my_participant_id)
|
||||
tma_out = LLMAssistantContextAggregator(
|
||||
messages, transport._my_participant_id)
|
||||
checklist = ChecklistProcessor(messages, llm, tools)
|
||||
fl = FrameLogger("got transcript")
|
||||
fl2 = FrameLogger("just above the checklist")
|
||||
|
||||
async def run_response(user_speech, tma_in, tma_out):
|
||||
tf = TranscriptFilter(transport._my_participant_id)
|
||||
await tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
checklist.run(
|
||||
tma_out.run(
|
||||
llm.run(
|
||||
tma_in.run(
|
||||
[StartStreamQueueFrame(), TextQueueFrame(user_speech)]
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
fl = FrameLogger("first other participant")
|
||||
await tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
fl.run(
|
||||
tma_out.run(
|
||||
llm.run([LLMMessagesQueueFrame(messages)]),
|
||||
)
|
||||
)
|
||||
)
|
||||
transport.transcription_settings["extra"]["endpointing"] = True
|
||||
transport.transcription_settings["extra"]["punctuate"] = True
|
||||
try:
|
||||
await asyncio.gather(transport.run(), transport.run_conversation(run_response))
|
||||
except (asyncio.CancelledError, KeyboardInterrupt):
|
||||
print('whoops')
|
||||
transport.stop()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
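The function-call handling in this example accumulates a function name and streamed argument fragments, then dispatches once the response ends. A minimal sketch of that pattern follows. `FunctionCallAccumulator` and its method names are hypothetical, and dispatch goes through an explicit handler table instead of `getattr`, which is a deliberate substitution so that an unexpected function name from the LLM cannot reach arbitrary attributes.

```python
import json
from typing import Any, Callable, Dict, Optional


class FunctionCallAccumulator:
    """Collect a streamed function call, then dispatch it at end of response."""

    def __init__(self, handlers: Dict[str, Callable[[dict], Any]]):
        self._handlers = handlers
        self._name = ""
        self._arguments = ""

    def on_name(self, name: str) -> None:
        self._name = name

    def on_arguments_chunk(self, chunk: str) -> None:
        # Arguments arrive as JSON fragments; they only parse once complete.
        self._arguments += chunk

    def on_response_end(self) -> Optional[Any]:
        name, raw = self._name, self._arguments
        # Reset before dispatching so a failing handler leaves no stale state.
        self._name = ""
        self._arguments = ""
        if not name or not raw:
            return None
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"no handler registered for {name!r}")
        return handler(json.loads(raw))


def verify_birthday(args: dict) -> bool:
    # Same hard-coded check as the example's verify_birthday method.
    return args["birthday"] == "1983-08-19"
```

A handler's return value plays the role of `result` in the example: a truthy value advances to the next step, and a falsy one re-prompts with the step's `failed` message.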
72
src/examples/foundational/07-interruptible.py
Normal file
@@ -0,0 +1,72 @@
|
||||
import asyncio
|
||||
import aiohttp
|
||||
import os
|
||||
from dailyai.conversation_wrappers import InterruptibleConversationWrapper
|
||||
|
||||
from dailyai.queue_frame import StartStreamQueueFrame, TextQueueFrame
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.services.open_ai_services import OpenAILLMService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
|
||||
from examples.foundational.support.runner import configure
|
||||
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
duration_minutes=5,
|
||||
start_transcription=True,
|
||||
mic_enabled=True,
|
||||
mic_sample_rate=16000,
|
||||
camera_enabled=False,
|
||||
)
|
||||
|
||||
llm = AzureLLMService(
|
||||
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
|
||||
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
|
||||
model=os.getenv("AZURE_CHATGPT_MODEL"))
|
||||
tts = AzureTTSService(
|
||||
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
|
||||
region=os.getenv("AZURE_SPEECH_REGION"))
|
||||
|
||||
async def run_response(user_speech, tma_in, tma_out):
|
||||
await tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
tma_out.run(
|
||||
llm.run(
|
||||
tma_in.run(
|
||||
[StartStreamQueueFrame(), TextQueueFrame(user_speech)]
|
||||
)
|
||||
)
|
||||
),
|
||||
)
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
await tts.say("Hi, I'm listening!", transport.send_queue)
|
||||
|
||||
async def run_conversation():
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
|
||||
]
|
||||
|
||||
conversation_wrapper = InterruptibleConversationWrapper(
|
||||
frame_generator=transport.get_receive_frames,
|
||||
runner=run_response,
|
||||
interrupt=transport.interrupt,
|
||||
my_participant_id=transport._my_participant_id,
|
||||
llm_messages=messages,
|
||||
)
|
||||
await conversation_wrapper.run_conversation()
|
||||
|
||||
transport.transcription_settings["extra"]["punctuate"] = False
|
||||
await asyncio.gather(transport.run(), run_conversation())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
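The internals of `InterruptibleConversationWrapper` are not shown in this diff. One plausible way to get interruptible responses, sketched here purely as an assumption, is to run each response as an asyncio task and cancel it when new user speech arrives. `InterruptibleRunner` and `demo` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class InterruptibleRunner:
    """Run one response at a time; new speech cancels the response in flight."""

    def __init__(self, runner: Callable[[str], Awaitable[None]]):
        self._runner = runner
        self._task: Optional[asyncio.Task] = None

    async def on_user_speech(self, text: str) -> None:
        await self._cancel_current()
        self._task = asyncio.create_task(self._runner(text))

    async def wait(self) -> None:
        if self._task is not None:
            try:
                await self._task
            except asyncio.CancelledError:
                pass

    async def _cancel_current(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass  # expected: the interrupted response was cancelled


async def demo() -> List[str]:
    finished: List[str] = []

    async def respond(text: str) -> None:
        await asyncio.sleep(0.05)  # stands in for LLM and TTS latency
        finished.append(text)

    runner = InterruptibleRunner(respond)
    await runner.on_user_speech("first")
    await asyncio.sleep(0)  # let the first response start
    await runner.on_user_speech("second")  # interrupts the first response
    await runner.wait()
    return finished
```

In a real pipeline the cancelled response would also need its queued audio discarded, which is presumably the job of the `interrupt=transport.interrupt` argument in the example.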
115
src/examples/foundational/08-bots-arguing.py
Normal file
@@ -0,0 +1,115 @@
|
||||
import aiohttp
|
||||
import asyncio
|
||||
import os
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
from dailyai.services.fal_ai_services import FalImageGenService
|
||||
from dailyai.queue_frame import AudioQueueFrame, ImageQueueFrame
|
||||
|
||||
from examples.foundational.support.runner import configure
|
||||
|
||||
|
||||
async def main(room_url: str):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Respond bot",
|
||||
duration_minutes=10,
|
||||
mic_enabled=True,
|
||||
mic_sample_rate=16000,
|
||||
camera_enabled=True,
|
||||
camera_width=1024,
|
||||
camera_height=1024
|
||||
)
|
||||
|
||||
llm = AzureLLMService(
|
||||
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
|
||||
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
|
||||
model=os.getenv("AZURE_CHATGPT_MODEL"))
|
||||
tts1 = AzureTTSService(
|
||||
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
|
||||
region=os.getenv("AZURE_SPEECH_REGION"))
|
||||
tts2 = ElevenLabsTTSService(
|
||||
aiohttp_session=session,
|
||||
api_key=os.getenv("ELEVENLABS_API_KEY"),
|
||||
voice_id="jBpfuIE2acCO8z3wKNLl")
|
||||
dalle = FalImageGenService(
|
||||
image_size="1024x1024",
|
||||
aiohttp_session=session,
|
||||
key_id=os.getenv("FAL_KEY_ID"),
|
||||
key_secret=os.getenv("FAL_KEY_SECRET"))
|
||||
|
||||
bot1_messages = [
|
||||
{"role": "system", "content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long."},
|
||||
]
|
||||
bot2_messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich."},
|
||||
]
|
||||
|
||||
async def get_bot1_statement():
|
||||
# Run the LLMs synchronously for the back-and-forth
|
||||
bot1_msg = await llm.run_llm(bot1_messages)
|
||||
print(f"bot1_msg: {bot1_msg}")
|
||||
if bot1_msg:
|
||||
bot1_messages.append({"role": "assistant", "content": bot1_msg})
|
||||
bot2_messages.append({"role": "user", "content": bot1_msg})
|
||||
|
||||
all_audio = bytearray()
|
||||
async for audio in tts1.run_tts(bot1_msg):
|
||||
all_audio.extend(audio)
|
||||
|
||||
return all_audio
|
||||
|
||||
async def get_bot2_statement():
|
||||
# Run the LLMs synchronously for the back-and-forth
|
||||
bot2_msg = await llm.run_llm(bot2_messages)
|
||||
print(f"bot2_msg: {bot2_msg}")
|
||||
if bot2_msg:
|
||||
bot2_messages.append({"role": "assistant", "content": bot2_msg})
|
||||
bot1_messages.append({"role": "user", "content": bot2_msg})
|
||||
|
||||
all_audio = bytearray()
|
||||
async for audio in tts2.run_tts(bot2_msg):
|
||||
all_audio.extend(audio)
|
||||
|
||||
return all_audio
|
||||
|
||||
async def argue():
|
||||
for i in range(100):
|
||||
print(f"In iteration {i}")
|
||||
|
||||
bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
|
||||
|
||||
(audio1, image_data1) = await asyncio.gather(
|
||||
get_bot1_statement(), dalle.run_image_gen(bot1_description)
|
||||
)
|
||||
await transport.send_queue.put(
|
||||
[
|
||||
ImageQueueFrame(None, image_data1[1]),
|
||||
AudioQueueFrame(audio1),
|
||||
]
|
||||
)
|
||||
|
||||
bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
|
||||
|
||||
(audio2, image_data2) = await asyncio.gather(
|
||||
get_bot2_statement(), dalle.run_image_gen(bot2_description)
|
||||
)
|
||||
await transport.send_queue.put(
|
||||
[
|
||||
ImageQueueFrame(None, image_data2[1]),
|
||||
AudioQueueFrame(audio2),
|
||||
]
|
||||
)
|
||||
|
||||
await asyncio.gather(transport.run(), argue())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url))
|
||||
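The back-and-forth in `08-bots-arguing.py` depends on each reply being recorded twice: as `assistant` output in the speaker's history and as `user` input in the other bot's history. Below is a minimal sketch of that bookkeeping on its own. `take_turn`, `fake_llm` and `demo` are hypothetical names, and `fake_llm` is a deterministic stand-in for `llm.run_llm` so the sketch runs without API keys.

```python
import asyncio
from typing import Awaitable, Callable

Message = dict[str, str]
# Hypothetical stand-in type for llm.run_llm: takes a history, returns reply text.
LLMFn = Callable[[list[Message]], Awaitable[str]]


async def take_turn(llm: LLMFn, speaker: list[Message], listener: list[Message]) -> str:
    """Ask the LLM for the speaker's next line and record it in both histories."""
    reply = await llm(speaker)
    if reply:
        # The speaker remembers its own line as "assistant" output...
        speaker.append({"role": "assistant", "content": reply})
        # ...while the other bot sees the same line as "user" input.
        listener.append({"role": "user", "content": reply})
    return reply


async def fake_llm(history: list[Message]) -> str:
    # Assumption: a deterministic fake used only for illustration.
    return f"reply #{len(history)}"


async def demo() -> tuple[list[Message], list[Message]]:
    bot1 = [{"role": "system", "content": "A hot dog is a sandwich."}]
    bot2 = [{"role": "system", "content": "A hot dog is not a sandwich."}]
    await take_turn(fake_llm, bot1, bot2)  # bot 1 speaks first
    await take_turn(fake_llm, bot2, bot1)  # bot 2 answers
    return bot1, bot2
```

Using one helper for both directions would remove the near-duplicate `get_bot1_statement` / `get_bot2_statement` bodies. The text-to-speech step is left out of this sketch.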
179
src/examples/foundational/10-wake-word.py
Normal file
@@ -0,0 +1,179 @@
|
||||
import aiohttp
|
||||
import asyncio
|
||||
import os
|
||||
import random
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from PIL import Image
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
from dailyai.queue_aggregators import LLMUserContextAggregator, LLMAssistantContextAggregator
|
||||
from dailyai.queue_frame import (
|
||||
QueueFrame,
|
||||
TextQueueFrame,
|
||||
ImageQueueFrame,
|
||||
SpriteQueueFrame,
|
||||
TranscriptionQueueFrame,
|
||||
)
|
||||
from dailyai.services.ai_services import AIService
|
||||
|
||||
from examples.foundational.support.runner import configure
|
||||
|
||||
|
||||
sprites = {}
|
||||
image_files = [
|
||||
'sc-default.png',
|
||||
'sc-talk.png',
|
||||
'sc-listen-1.png',
|
||||
'sc-think-1.png',
|
||||
'sc-think-2.png',
|
||||
'sc-think-3.png',
|
||||
'sc-think-4.png'
|
||||
]
|
||||
|
||||
script_dir = os.path.dirname(__file__)
|
||||
|
||||
for file in image_files:
|
||||
# Build the full path to the image file
|
||||
full_path = os.path.join(script_dir, "assets", file)
|
||||
# Get the filename without the extension to use as the dictionary key
|
||||
filename = os.path.splitext(os.path.basename(full_path))[0]
|
||||
# Open the image and convert it to bytes
|
||||
with Image.open(full_path) as img:
|
||||
sprites[file] = img.tobytes()
|
||||
|
||||
# When the bot isn't talking, show a static image of the cat listening
|
||||
quiet_frame = ImageQueueFrame("", sprites["sc-listen-1.png"])
|
||||
# When the bot is talking, build an animation from two sprites
|
||||
talking_list = [sprites['sc-default.png'], sprites['sc-talk.png']]
|
||||
talking = [random.choice(talking_list) for x in range(30)]
|
||||
talking_frame = SpriteQueueFrame(images=talking)
|
||||
|
||||
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM is processing
|
||||
thinking_list = [
|
||||
sprites['sc-think-1.png'],
|
||||
sprites['sc-think-2.png'],
|
||||
sprites['sc-think-3.png'],
|
||||
sprites['sc-think-4.png']]
|
||||
thinking_frame = SpriteQueueFrame(images=thinking_list)
|
||||
|
||||
|
||||
class TranscriptFilter(AIService):
|
||||
def __init__(self, bot_participant_id=None):
|
||||
self.bot_participant_id = bot_participant_id
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if isinstance(frame, TranscriptionQueueFrame):
|
||||
if frame.participantId != self.bot_participant_id:
|
||||
yield frame
|
||||
|
||||
|
||||
class NameCheckFilter(AIService):
|
||||
def __init__(self, names: list[str]):
|
||||
self.names = names
|
||||
self.sentence = ""
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
content: str = ""
|
||||
|
||||
# TODO: split up transcription by participant
|
||||
if isinstance(frame, TextQueueFrame):
|
||||
content = frame.text
|
||||
|
||||
self.sentence += content
|
||||
if self.sentence.endswith((".", "?", "!")):
|
||||
if any(name in self.sentence for name in self.names):
|
||||
out = self.sentence
|
||||
self.sentence = ""
|
||||
yield TextQueueFrame(out)
|
||||
else:
|
||||
out = self.sentence
|
||||
self.sentence = ""
|
||||
|
||||
|
||||
class ImageSyncAggregator(AIService):
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
yield talking_frame
|
||||
yield frame
|
||||
yield quiet_frame
|
||||
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Santa Cat",
|
||||
duration_minutes=3,
|
||||
start_transcription=True,
|
||||
mic_enabled=True,
|
||||
mic_sample_rate=16000,
|
||||
camera_enabled=True,
|
||||
camera_width=720,
|
||||
camera_height=1280
|
||||
)
|
||||
transport._mic_enabled = True
|
||||
transport._mic_sample_rate = 16000
|
||||
transport._camera_enabled = True
|
||||
transport._camera_width = 720
|
||||
transport._camera_height = 1280
|
||||
|
||||
llm = AzureLLMService(
|
||||
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
|
||||
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
|
||||
model=os.getenv("AZURE_CHATGPT_MODEL"))
|
||||
tts = ElevenLabsTTSService(
|
||||
aiohttp_session=session,
|
||||
api_key=os.getenv("ELEVENLABS_API_KEY"),
|
||||
voice_id="jBpfuIE2acCO8z3wKNLl")
|
||||
isa = ImageSyncAggregator()
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
await tts.say("Hi! If you want to talk to me, just say 'hey Santa Cat'.", transport.send_queue)
|
||||
|
||||
async def handle_transcriptions():
|
||||
messages = [
|
||||
{"role": "system", "content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long."},
|
||||
]
|
||||
|
||||
tma_in = LLMUserContextAggregator(
|
||||
messages, transport._my_participant_id
|
||||
)
|
||||
tma_out = LLMAssistantContextAggregator(
|
||||
messages, transport._my_participant_id
|
||||
)
|
||||
tf = TranscriptFilter(transport._my_participant_id)
|
||||
ncf = NameCheckFilter(["Santa Cat", "Santa"])
|
||||
await tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
isa.run(
|
||||
tma_out.run(
|
||||
llm.run(
|
||||
tma_in.run(
|
||||
ncf.run(
|
||||
tf.run(
|
||||
transport.get_receive_frames()
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
async def starting_image():
|
||||
await transport.send_queue.put(quiet_frame)
|
||||
|
||||
transport.transcription_settings["extra"]["punctuate"] = True
|
||||
await asyncio.gather(transport.run(), handle_transcriptions(), starting_image())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
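The wake-word check in `NameCheckFilter` can be tried out without a transport. The sketch below reproduces the same buffering rule under a hypothetical `WakeWordBuffer` class: text accumulates until it ends in sentence punctuation, and the buffer is then released only if it mentions one of the names. It returns plain strings rather than `TextQueueFrame` objects; that is a simplification, not the SDK's API.

```python
from typing import Optional


class WakeWordBuffer:
    """Accumulate text fragments; release a sentence only if it mentions a name."""

    def __init__(self, names: list[str]):
        self.names = names
        self.sentence = ""

    def feed(self, text: str) -> Optional[str]:
        self.sentence += text
        if not self.sentence.endswith((".", "?", "!")):
            return None  # sentence still incomplete; keep buffering
        # The sentence is complete: clear the buffer whether or not it matches.
        finished, self.sentence = self.sentence, ""
        if any(name in finished for name in self.names):
            return finished
        return None


buf = WakeWordBuffer(["Santa Cat", "Santa"])
buf.feed("Hey Santa Cat, ")         # None: no terminal punctuation yet
print(buf.feed("tell me a joke."))  # Hey Santa Cat, tell me a joke.
```

As in the original filter, matching is case-sensitive, so a transcript containing "santa cat" in lower case would be dropped. The check only works when transcription punctuation is enabled, which is why the example sets `punctuate` to `True`.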
131
src/examples/foundational/11-sound-effects.py
Normal file
@@ -0,0 +1,131 @@
|
||||
import aiohttp
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import wave
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
from dailyai.queue_aggregators import LLMContextAggregator, LLMUserContextAggregator, LLMAssistantContextAggregator
|
||||
from dailyai.services.ai_services import AIService, FrameLogger
|
||||
from dailyai.queue_frame import QueueFrame, AudioQueueFrame, LLMResponseEndQueueFrame, LLMMessagesQueueFrame
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from examples.foundational.support.runner import configure
|
||||
|
||||
logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")  # adjust the format as needed
|
||||
logger = logging.getLogger("dailyai")
|
||||
logger.setLevel(logging.DEBUG)
|
||||
|
||||
sounds = {}
|
||||
sound_files = [
|
||||
'ding1.wav',
|
||||
'ding2.wav'
|
||||
]
|
||||
|
||||
script_dir = os.path.dirname(__file__)
|
||||
|
||||
for file in sound_files:
|
||||
# Build the full path to the sound file
|
||||
full_path = os.path.join(script_dir, "assets", file)
|
||||
# Get the filename without the extension to use as the dictionary key
|
||||
filename = os.path.splitext(os.path.basename(full_path))[0]
|
||||
# Open the WAV file and read all of its frames as bytes
|
||||
with wave.open(full_path) as audio_file:
|
||||
sounds[file] = audio_file.readframes(-1)
|
||||
|
||||
|
||||
class OutboundSoundEffectWrapper(AIService):
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if isinstance(frame, LLMResponseEndQueueFrame):
|
||||
yield AudioQueueFrame(sounds["ding1.wav"])
|
||||
# In case anything else up the stack needs it
|
||||
yield frame
|
||||
else:
|
||||
yield frame
|
||||
|
||||
|
||||
class InboundSoundEffectWrapper(AIService):
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if isinstance(frame, LLMMessagesQueueFrame):
|
||||
yield AudioQueueFrame(sounds["ding2.wav"])
|
||||
# In case anything else up the stack needs it
|
||||
yield frame
|
||||
else:
|
||||
yield frame
|
||||
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
duration_minutes=5,
|
||||
mic_enabled=True,
|
||||
mic_sample_rate=16000,
|
||||
camera_enabled=False
|
||||
)
|
||||
|
||||
llm = AzureLLMService(
|
||||
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
|
||||
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
|
||||
model=os.getenv("AZURE_CHATGPT_MODEL"))
|
||||
tts = ElevenLabsTTSService(
|
||||
aiohttp_session=session,
|
||||
api_key=os.getenv("ELEVENLABS_API_KEY"),
|
||||
voice_id="ErXwobaYiN019PkySvjV")
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
await tts.say("Hi, I'm listening!", transport.send_queue)
|
||||
await transport.send_queue.put(AudioQueueFrame(sounds["ding1.wav"]))
|
||||
|
||||
async def handle_transcriptions():
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
|
||||
]
|
||||
|
||||
tma_in = LLMUserContextAggregator(
|
||||
messages, transport._my_participant_id
|
||||
)
|
||||
tma_out = LLMAssistantContextAggregator(
|
||||
messages, transport._my_participant_id
|
||||
)
|
||||
out_sound = OutboundSoundEffectWrapper()
|
||||
in_sound = InboundSoundEffectWrapper()
|
||||
fl = FrameLogger("LLM Out")
|
||||
fl2 = FrameLogger("Transcription In")
|
||||
await out_sound.run_to_queue(
|
||||
transport.send_queue,
|
||||
tts.run(
|
||||
fl.run(
|
||||
tma_out.run(
|
||||
llm.run(
|
||||
fl2.run(
|
||||
in_sound.run(
|
||||
tma_in.run(
|
||||
transport.get_receive_frames()
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
transport.transcription_settings["extra"]["punctuate"] = True
|
||||
await asyncio.gather(transport.run(), handle_transcriptions())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
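The nested `run(...)` calls in `handle_transcriptions` chain async generators: each stage consumes the frames of the stage inside it and yields frames onward. Here is a rough, self-contained sketch of that shape, including a sound-injecting stage like the two wrappers above. `Frame`, `source`, `ding_before` and `collect` are hypothetical stand-ins, not part of `dailyai`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class Frame:
    """Hypothetical stand-in for QueueFrame and its subclasses."""
    kind: str
    payload: str = ""


async def source(frames: list[Frame]) -> AsyncGenerator[Frame, None]:
    # Plays the role of transport.get_receive_frames() in this sketch.
    for frame in frames:
        yield frame


async def ding_before(
    kind: str, sound: str, upstream: AsyncGenerator[Frame, None]
) -> AsyncGenerator[Frame, None]:
    """Pass every frame through, emitting an audio frame ahead of frames of `kind`."""
    async for frame in upstream:
        if frame.kind == kind:
            yield Frame("audio", sound)
        yield frame


async def collect(frames: list[Frame]) -> list[str]:
    # Innermost stage runs first, mirroring the nesting order in the example.
    pipeline = ding_before(
        "response_end", "ding1", ding_before("messages", "ding2", source(frames))
    )
    return [f"{f.kind}:{f.payload}" async for f in pipeline]
```

Because every stage re-yields the frame it matched on, downstream stages still see it. The comment "In case anything else up the stack needs it" in the wrappers describes the same behaviour.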
39
src/examples/foundational/13-whisper-transcription.py
Normal file
@@ -0,0 +1,39 @@
|
||||
import asyncio
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.whisper_ai_services import WhisperSTTService
|
||||
|
||||
from examples.foundational.support.runner import configure
|
||||
|
||||
|
||||
async def main(room_url: str):
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Transcription bot",
|
||||
start_transcription=True,
|
||||
mic_enabled=False,
|
||||
camera_enabled=False,
|
||||
speaker_enabled=True
|
||||
)
|
||||
|
||||
stt = WhisperSTTService()
|
||||
transcription_output_queue = asyncio.Queue()
|
||||
|
||||
async def handle_transcription():
|
||||
print("`````````TRANSCRIPTION`````````")
|
||||
while True:
|
||||
item = await transcription_output_queue.get()
|
||||
print(item.text)
|
||||
|
||||
async def handle_speaker():
|
||||
await stt.run_to_queue(
|
||||
transcription_output_queue,
|
||||
transport.get_receive_frames()
|
||||
)
|
||||
await asyncio.gather(transport.run(), handle_speaker(), handle_transcription())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url))
|
||||
59
src/examples/foundational/13a-whisper-local.py
Normal file
@@ -0,0 +1,59 @@
|
||||
import argparse
|
||||
import asyncio
|
||||
import wave
|
||||
from dailyai.queue_frame import EndStreamQueueFrame, TranscriptionQueueFrame
|
||||
|
||||
from dailyai.services.local_transport_service import LocalTransportService
|
||||
from dailyai.services.whisper_ai_services import WhisperSTTService
|
||||
|
||||
|
||||
async def main(room_url: str):
|
||||
global transport
|
||||
global stt
|
||||
|
||||
meeting_duration_minutes = 1
|
||||
transport = LocalTransportService(
|
||||
mic_enabled=True,
|
||||
camera_enabled=False,
|
||||
speaker_enabled=True,
|
||||
duration_minutes=meeting_duration_minutes,
|
||||
start_transcription=True
|
||||
)
|
||||
stt = WhisperSTTService()
|
||||
transcription_output_queue = asyncio.Queue()
|
||||
transport_done = asyncio.Event()
|
||||
|
||||
async def handle_transcription():
|
||||
print("`````````TRANSCRIPTION`````````")
|
||||
while not transport_done.is_set():
|
||||
item = await transcription_output_queue.get()
|
||||
print("got item from queue", item)
|
||||
if isinstance(item, TranscriptionQueueFrame):
|
||||
print(item.text)
|
||||
elif isinstance(item, EndStreamQueueFrame):
|
||||
break
|
||||
print("handle_transcription done")
|
||||
|
||||
async def handle_speaker():
|
||||
await stt.run_to_queue(
|
||||
transcription_output_queue, transport.get_receive_frames()
|
||||
)
|
||||
await transcription_output_queue.put(EndStreamQueueFrame())
|
||||
print("handle speaker done.")
|
||||
|
||||
async def run_until_done():
|
||||
await transport.run()
|
||||
transport_done.set()
|
||||
print("run_until_done done")
|
||||
|
||||
await asyncio.gather(run_until_done(), handle_speaker(), handle_transcription())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
asyncio.run(main(args.url))
|
||||
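In `13a-whisper-local.py`, the consumer loop appears to depend on the `EndStreamQueueFrame` sentinel to exit, because a task blocked in `queue.get()` does not re-check `transport_done` until another item arrives. A minimal sketch of that sentinel pattern with plain `asyncio` follows. `EndOfStream`, `producer`, `consumer` and `demo` are hypothetical names for illustration.

```python
import asyncio


class EndOfStream:
    """Hypothetical sentinel, playing the role of EndStreamQueueFrame."""


async def producer(queue: asyncio.Queue, items: list[str]) -> None:
    for item in items:
        await queue.put(item)
    # Tell the consumer that nothing more will arrive.
    await queue.put(EndOfStream())


async def consumer(queue: asyncio.Queue) -> list[str]:
    seen = []
    while True:
        item = await queue.get()
        if isinstance(item, EndOfStream):
            break  # the sentinel, not an external flag, ends the loop
        seen.append(item)
    return seen


async def demo() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    _, seen = await asyncio.gather(producer(queue, ["hello", "world"]), consumer(queue))
    return seen
```

In this sketch the sentinel alone is enough for a clean shutdown, so no separate `asyncio.Event` is needed.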
BIN
src/examples/foundational/assets/clack-short-quiet.wav
Normal file
BIN
src/examples/foundational/assets/clack-short.wav
Normal file
BIN
src/examples/foundational/assets/clack.wav
Normal file
BIN
src/examples/foundational/assets/ding1.wav
Normal file
BIN
src/examples/foundational/assets/ding2.wav
Normal file
|
Before Width: | Height: | Size: 871 KiB After Width: | Height: | Size: 871 KiB |
|
Before Width: | Height: | Size: 868 KiB After Width: | Height: | Size: 868 KiB |
BIN
src/examples/foundational/assets/sc-listen-2.png
Normal file
|
After Width: | Height: | Size: 868 KiB |
|
Before Width: | Height: | Size: 870 KiB After Width: | Height: | Size: 870 KiB |
|
Before Width: | Height: | Size: 871 KiB After Width: | Height: | Size: 871 KiB |
BIN
src/examples/foundational/assets/sc-think-2.png
Normal file
|
After Width: | Height: | Size: 871 KiB |
BIN
src/examples/foundational/assets/sc-think-3.png
Normal file
|
After Width: | Height: | Size: 872 KiB |
BIN
src/examples/foundational/assets/sc-think-4.png
Normal file
|
After Width: | Height: | Size: 868 KiB |
BIN
src/examples/foundational/assets/speaking.png
Normal file
|
After Width: | Height: | Size: 33 KiB |
BIN
src/examples/foundational/assets/waiting.png
Normal file
|
After Width: | Height: | Size: 30 KiB |
53
src/examples/foundational/support/runner.py
Normal file
@@ -0,0 +1,53 @@
|
||||
import argparse
|
||||
import os
|
||||
import time
|
||||
import urllib.parse
|
||||
import requests
|
||||
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv()
|
||||
|
||||
|
||||
def configure():
|
||||
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
|
||||
)
|
||||
parser.add_argument(
|
||||
"-k",
|
||||
"--apikey",
|
||||
type=str,
|
||||
required=False,
|
||||
help="Daily API Key (needed to create an owner token for the room)",
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
|
||||
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
|
||||
key = args.apikey or os.getenv("DAILY_API_KEY")
|
||||
|
||||
if not url:
|
||||
raise Exception(
|
||||
"No Daily room specified. Use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
|
||||
|
||||
if not key:
|
||||
raise Exception("No Daily API key specified. Use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
|
||||
|
||||
# Create a meeting token for the given room with an expiration 1 hour in the future.
|
||||
room_name: str = urllib.parse.urlparse(url).path[1:]
|
||||
expiration: float = time.time() + 60 * 60
|
||||
|
||||
res: requests.Response = requests.post(
|
||||
"https://api.daily.co/v1/meeting-tokens",
|
||||
headers={"Authorization": f"Bearer {key}"},
|
||||
json={
|
||||
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
|
||||
},
|
||||
)
|
||||
|
||||
if res.status_code != 200:
|
||||
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
|
||||
|
||||
token: str = res.json()["token"]
|
||||
|
||||
return (url, token)
|
||||
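The token request in `configure()` needs a room name derived from the URL and an expiry timestamp. The sketch below pulls those two steps into pure functions so they can be checked without a network call. The helper names, the `now` parameter and the `example.daily.co` URL are assumptions added for illustration; the request body mirrors the one in the example above.

```python
import time
import urllib.parse
from typing import Optional


def room_name_from_url(room_url: str) -> str:
    """Return the path of the room URL without its leading slash."""
    return urllib.parse.urlparse(room_url).path[1:]


def token_request_body(
    room_url: str, ttl_seconds: int = 60 * 60, now: Optional[float] = None
) -> dict:
    """Build the JSON body for a meeting-token request expiring ttl_seconds from now."""
    # `now` is injectable only so the function is deterministic under test.
    now = time.time() if now is None else now
    return {
        "properties": {
            "room_name": room_name_from_url(room_url),
            "is_owner": True,
            "exp": now + ttl_seconds,
        }
    }


# Hypothetical room URL, for illustration only.
print(room_name_from_url("https://example.daily.co/my-room"))  # my-room
```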
@@ -11,7 +11,8 @@ from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.fal_ai_services import FalImageGenService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
|
||||
async def main(room_url:str, token):
|
||||
|
||||
async def main(room_url: str, token):
|
||||
global transport
|
||||
global llm
|
||||
global tts
|
||||
@@ -22,26 +23,25 @@ async def main(room_url:str, token):
|
||||
"Imagebot",
|
||||
1,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
transport.camera_enabled = True
|
||||
transport.mic_sample_rate = 16000
|
||||
transport.camera_width = 1024
|
||||
transport.camera_height = 1024
|
||||
transport._mic_enabled = True
|
||||
transport._camera_enabled = True
|
||||
transport._mic_sample_rate = 16000
|
||||
transport._camera_width = 1024
|
||||
transport._camera_height = 1024
|
||||
|
||||
llm = AzureLLMService()
|
||||
tts = AzureTTSService()
|
||||
img = FalImageGenService()
|
||||
|
||||
|
||||
async def handle_transcriptions():
|
||||
print("handle_transcriptions got called")
|
||||
|
||||
sentence = ""
|
||||
async for message in transport.get_transcriptions():
|
||||
print(f"transcription message: {message}")
|
||||
if message["session_id"] == transport.my_participant_id:
|
||||
if message["session_id"] == transport._my_participant_id:
|
||||
continue
|
||||
finder = message["text"].find("start over")
|
||||
finder = message["text"].find("start over")
|
||||
print(f"finder: {finder}")
|
||||
if finder >= 0:
|
||||
async for audio in tts.run_tts(f"Resetting."):
|
||||
@@ -69,7 +69,8 @@ async def main(room_url:str, token):
|
||||
if participant["info"]["isLocal"]:
|
||||
return
|
||||
async for audio in tts.run_tts("Describe an image, and I'll create it."):
|
||||
audio_generator = tts.run_tts(f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
|
||||
audio_generator = tts.run_tts(
|
||||
f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
|
||||
async for audio in audio_generator:
|
||||
transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
|
||||
|
||||
134
src/examples/internal/11a-dial-out.py
Normal file
@@ -0,0 +1,134 @@
|
||||
import aiohttp
|
||||
import asyncio
|
||||
import os
|
||||
import wave
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.queue_aggregators import LLMContextAggregator
|
||||
from dailyai.services.ai_services import AIService, FrameLogger
|
||||
from dailyai.queue_frame import QueueFrame, AudioQueueFrame, LLMResponseEndQueueFrame, LLMMessagesQueueFrame
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from examples.foundational.support.runner import configure
|
||||
|
||||
sounds = {}
|
||||
sound_files = [
|
||||
'ding1.wav',
|
||||
'ding2.wav'
|
||||
]
|
||||
|
||||
script_dir = os.path.dirname(__file__)
|
||||
|
||||
for file in sound_files:
|
||||
# Build the full path to the sound file
|
||||
full_path = os.path.join(script_dir, "assets", file)
|
||||
# Get the filename without the extension to use as the dictionary key
|
||||
filename = os.path.splitext(os.path.basename(full_path))[0]
|
||||
# Open the WAV file and read all of its frames as bytes
|
||||
with wave.open(full_path) as audio_file:
|
||||
sounds[file] = audio_file.readframes(-1)
|
||||
|
||||
|
||||
class OutboundSoundEffectWrapper(AIService):
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if isinstance(frame, LLMResponseEndQueueFrame):
|
||||
yield AudioQueueFrame(sounds["ding1.wav"])
|
||||
# In case anything else up the stack needs it
|
||||
yield frame
|
||||
else:
|
||||
yield frame
|
||||
|
||||
|
||||
class InboundSoundEffectWrapper(AIService):
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
async def process_frame(self, frame: QueueFrame) -> AsyncGenerator[QueueFrame, None]:
|
||||
if isinstance(frame, LLMMessagesQueueFrame):
|
||||
yield AudioQueueFrame(sounds["ding2.wav"])
|
||||
# In case anything else up the stack needs it
|
||||
yield frame
|
||||
else:
|
||||
yield frame
|
||||
|
||||
|
||||
async def main(room_url: str, token, phone=None):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
|
||||
global transport
|
||||
global llm
|
||||
global tts
|
||||
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
300,
|
||||
)
|
||||
transport._mic_enabled = True
|
||||
transport._mic_sample_rate = 16000
|
||||
transport._camera_enabled = False
|
||||
|
||||
llm = AzureLLMService()
|
||||
tts = AzureTTSService()
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
await tts.say("Hi, I'm listening!", transport.send_queue)
|
||||
await transport.send_queue.put(AudioQueueFrame(sounds["ding1.wav"]))
|
||||
|
||||
async def handle_transcriptions():
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
|
||||
]
|
||||
|
||||
tma_in = LLMContextAggregator(
|
||||
messages, "user", transport._my_participant_id
|
||||
)
|
||||
tma_out = LLMContextAggregator(
|
||||
messages, "assistant", transport._my_participant_id
|
||||
)
|
||||
out_sound = OutboundSoundEffectWrapper()
|
||||
in_sound = InboundSoundEffectWrapper()
|
||||
fl = FrameLogger("LLM Out")
|
||||
fl2 = FrameLogger("Transcription In")
|
||||
await out_sound.run_to_queue(
|
||||
transport.send_queue,
|
||||
tts.run(
|
||||
tma_out.run(
|
||||
llm.run(
|
||||
fl2.run(
|
||||
in_sound.run(
|
||||
tma_in.run(
|
||||
transport.get_receive_frames()
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
@transport.event_handler("on_participant_joined")
|
||||
async def pax_joined(transport, pax):
|
||||
print(f"PARTICIPANT JOINED: {pax}")
|
||||
|
||||
@transport.event_handler("on_call_state_updated")
|
||||
async def on_call_state_updated(transport, state):
|
||||
if (state == "joined"):
|
||||
if (phone):
|
||||
transport.start_recording()
|
||||
transport.dialout(phone)
|
||||
|
||||
transport.transcription_settings["extra"]["punctuate"] = True
|
||||
|
||||
await asyncio.gather(transport.run(), handle_transcriptions())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
39
src/examples/server/Dockerfile
Normal file
@@ -0,0 +1,39 @@
|
||||
# setup
|
||||
FROM python:3.11.5
|
||||
|
||||
WORKDIR /app
|
||||
COPY requirements.txt /app
|
||||
COPY *.py /app
|
||||
COPY pyproject.toml /app
|
||||
|
||||
COPY src/ /app/src/
|
||||
|
||||
WORKDIR /app
|
||||
RUN ls --recursive /app/
|
||||
RUN pip3 install --upgrade -r requirements.txt
|
||||
RUN python -m build .
|
||||
RUN pip3 install .
|
||||
|
||||
# If running on Ubuntu, Azure TTS requires some extra config
|
||||
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
|
||||
|
||||
RUN wget -O - https://www.openssl.org/source/openssl-1.1.1w.tar.gz | tar zxf -
|
||||
WORKDIR openssl-1.1.1w
|
||||
RUN ./config --prefix=/usr/local
|
||||
RUN make -j $(nproc)
|
||||
RUN make install_sw install_ssldirs
|
||||
RUN ldconfig -v
|
||||
ENV SSL_CERT_DIR=/etc/ssl/certs
|
||||
|
||||
#ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
|
||||
RUN apt clean
|
||||
RUN apt-get update
|
||||
RUN apt-get -y install build-essential libssl-dev ca-certificates libasound2 wget
|
||||
|
||||
ENV PYTHONUNBUFFERED=1
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
EXPOSE 8000
|
||||
# run
|
||||
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]
|
||||
13
src/examples/server/README.md
Normal file
@@ -0,0 +1,13 @@
|
||||
# Server Example
|
||||
|
||||
This is an example server based on [Santa Cat](https://santacat.ai). You can run the server with this command:
|
||||
|
||||
```
|
||||
flask --app daily-bot-manager.py --debug run
|
||||
```
|
||||
|
||||
Once the server is started, you can load `http://127.0.0.1:5000/spin-up-kitty` in a browser, and the server will do the following:
|
||||
|
||||
- Create a new, randomly named Daily room using the `DAILY_API_KEY` from your .env file or environment
|
||||
- Start the `10-wake-word.py` example and connect it to that room
|
||||
- Redirect your browser to the room with an HTTP 301 response
|
||||
33
src/examples/server/auth.py
Normal file
@@ -0,0 +1,33 @@
|
||||
import time
|
||||
import urllib.parse
|
||||
|
||||
from dotenv import load_dotenv
|
||||
import requests
|
||||
from flask import jsonify
|
||||
import os
|
||||
|
||||
load_dotenv()
|
||||
|
||||
|
||||
def get_meeting_token(room_name, daily_api_key, token_expiry):
|
||||
api_path = os.getenv('DAILY_API_PATH') or 'https://api.daily.co/v1'
|
||||
|
||||
if not token_expiry:
|
||||
token_expiry = time.time() + 600
|
||||
res = requests.post(
|
||||
f'{api_path}/meeting-tokens',
|
||||
headers={
|
||||
'Authorization': f'Bearer {daily_api_key}'},
|
||||
json={
|
||||
'properties': {
|
||||
'room_name': room_name,
|
||||
'is_owner': True,
|
||||
'exp': token_expiry}})
|
||||
if res.status_code != 200:
|
||||
return jsonify({'error': 'Unable to create meeting token', 'detail': res.text}), 500
|
||||
meeting_token = res.json()['token']
|
||||
return meeting_token
|
||||
|
||||
|
||||
def get_room_name(room_url):
|
||||
return urllib.parse.urlparse(room_url).path[1:]
|
||||
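In `get_meeting_token` above, the failure branch returns a Flask response tuple, and a caller such as `start_bot` would then appear to treat that tuple as a token. One possible alternative, sketched here with hypothetical names (`TokenError`, `token_from_response`), is to raise from the helper and leave it to the route to turn the error into an HTTP response.

```python
class TokenError(RuntimeError):
    """Raised when the token endpoint does not return a usable token."""


def token_from_response(status_code: int, payload: dict, detail: str = "") -> str:
    """Return the token from a decoded response, or raise TokenError on failure."""
    if status_code != 200:
        raise TokenError(f"Unable to create meeting token ({status_code}): {detail}")
    return payload["token"]
```

A route could then wrap the call in `try`/`except TokenError` and build its own `jsonify(...)` error, so the helper always returns a string.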
100
src/examples/server/daily-bot-manager.py
Normal file
@@ -0,0 +1,100 @@
|
||||
import os
|
||||
import requests
|
||||
import subprocess
|
||||
import time
|
||||
|
||||
from flask import Flask, jsonify, request, redirect
|
||||
from flask_cors import CORS
|
||||
from examples.server.auth import get_meeting_token
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
||||
app = Flask(__name__)
|
||||
CORS(app)
|
||||
|
||||
print(f"Loaded environment; FAL_KEY_ID is {'set' if os.getenv('FAL_KEY_ID') else 'not set'}")
|
||||
|
||||
|
||||
def start_bot(bot_path, args=None):
|
||||
daily_api_key = os.getenv("DAILY_API_KEY")
|
||||
api_path = os.getenv("DAILY_API_PATH") or "https://api.daily.co/v1"
|
||||
|
||||
timeout = int(os.getenv("DAILY_ROOM_TIMEOUT") or os.getenv("DAILY_BOT_MAX_DURATION") or 300)
|
||||
exp = time.time() + timeout
|
||||
res = requests.post(
|
||||
f"{api_path}/rooms",
|
||||
headers={"Authorization": f"Bearer {daily_api_key}"},
|
||||
json={
|
||||
"properties": {
|
||||
"exp": exp,
|
||||
"enable_chat": True,
|
||||
"enable_emoji_reactions": True,
|
||||
"eject_at_room_exp": True,
|
||||
"enable_prejoin_ui": False,
|
||||
"enable_recording": "cloud"
|
||||
}
|
||||
},
|
||||
)
|
||||
if res.status_code != 200:
|
||||
return (
|
||||
jsonify(
|
||||
{
|
||||
"error": "Unable to create room",
|
||||
"status_code": res.status_code,
|
||||
"text": res.text,
|
||||
}
|
||||
),
|
||||
500,
|
||||
)
|
||||
room_url = res.json()["url"]
|
||||
room_name = res.json()["name"]
|
||||
|
||||
meeting_token = get_meeting_token(room_name, daily_api_key, exp)
|
||||
|
||||
if args:
|
||||
extra_args = " ".join([f'-{x[0]} "{x[1]}"' for x in args])
|
||||
else:
|
||||
extra_args = ""
|
||||
|
||||
proc = subprocess.Popen(
|
||||
[
|
||||
f"python {bot_path} -u {room_url} -t {meeting_token} -k {daily_api_key} {extra_args}"
|
||||
],
|
||||
shell=True,
|
||||
bufsize=1,
|
||||
)
|
||||
|
||||
# Don't return until the bot has joined the room, but wait for at most 2 seconds.
|
||||
attempts = 0
|
||||
while attempts < 20:
|
||||
time.sleep(0.1)
|
||||
attempts += 1
|
||||
res = requests.get(
|
||||
f"{api_path}/rooms/{room_name}/get-session-data",
|
||||
headers={"Authorization": f"Bearer {daily_api_key}"},
|
||||
)
|
||||
if res.status_code == 200:
|
||||
break
|
||||
print(f"Took {attempts} attempts to join room {room_name}")
|
||||
|
||||
# Additional client config
|
||||
config = {}
|
||||
if os.getenv("CLIENT_VAD_TIMEOUT_SEC"):
|
||||
config['vad_timeout_sec'] = float(os.getenv("DAILY_CLIENT_VAD_TIMEOUT_SEC"))
|
||||
else:
|
||||
config['vad_timeout_sec'] = 1.5
|
||||
|
||||
# return jsonify({"room_url": room_url, "token": meeting_token, "config": config}), 200
|
||||
return redirect(room_url, code=301)
|
||||
|
||||
|
||||
@app.route("/spin-up-kitty", methods=["GET", "POST"])
|
||||
def spin_up_kitty():
|
||||
return start_bot("./src/examples/foundational/10-wake-word.py")
|
||||
|
||||
|
||||
@app.route("/healthz")
|
||||
def health_check():
|
||||
return "ok", 200
|
||||
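`start_bot` launches the bot by formatting one command string and running it with `shell=True`, so any value containing quotes or spaces has to survive shell parsing. A minimal sketch of an alternative, assuming the same `-u`, `-t`, and `-k` flags as above: build an argument list and run it without a shell. `build_bot_command` is a hypothetical helper, not part of this change, and the room URL, token, and key are placeholders.

```python
import shlex
import sys


def build_bot_command(bot_path, room_url, token, api_key, args=None):
    # One list element per argument, so no shell quoting is involved.
    cmd = [sys.executable, bot_path, "-u", room_url, "-t", token, "-k", api_key]
    for flag, value in args or []:
        cmd += [f"-{flag}", str(value)]
    return cmd


cmd = build_bot_command(
    "bot.py", "https://example.daily.co/room", "tok", "key", [("n", 'Kitty "the" cat')]
)
# shlex.join shows how the list would look as a correctly quoted shell line.
print(shlex.join(cmd))
# subprocess.Popen(cmd, bufsize=1) would then start the bot without shell=True.
```

The API key is still visible in the process list when passed as an argument. Handing it to the child through its environment would avoid that.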
@@ -1 +0,0 @@
|
||||
These samples need to be updated! Don't rely on them.
|
||||
@@ -1,91 +0,0 @@
|
||||
import argparse
|
||||
from email.mime import image
|
||||
from re import A
|
||||
import requests
|
||||
import time
|
||||
import urllib.parse
|
||||
|
||||
from dailyai.async_processor.async_processor import (
|
||||
LLMResponse,
|
||||
ConversationProcessorCollection,
|
||||
)
|
||||
from dailyai.orchestrator import OrchestratorConfig, Orchestrator
|
||||
from dailyai.message_handler.message_handler import MessageHandler
|
||||
from dailyai.services.ai_services import AIServiceConfig
|
||||
from dailyai.services.azure_ai_services import AzureImageGenService, AzureTTSService, AzureLLMService
|
||||
from dailyai.services.deepgram_ai_services import DeepgramTTSService
|
||||
|
||||
def add_bot_to_room(room_url, token, expiration) -> None:
|
||||
|
||||
# A simple prompt for a simple sample.
|
||||
message_handler = MessageHandler(
|
||||
"""
|
||||
You are a sample bot in a WebRTC session. You'll receive input as transcriptions of user's
|
||||
speech, and your responses will be converted to audio via a TTS service.
|
||||
Answer user's questions and be friendly, and if you can, give some ideas about how someone
|
||||
could use a bot like you in a more in-depth way. Because your responses will be spoken,
|
||||
try to keep them short.
|
||||
"""
|
||||
)
|
||||
|
||||
# Use Azure services for the TTS, image generation, and LLM.
|
||||
# Note that you'll need to set the following environment variables:
|
||||
# - AZURE_SPEECH_SERVICE_KEY
|
||||
# - AZURE_SPEECH_SERVICE_REGION
|
||||
# - AZURE_CHATGPT_KEY
|
||||
# - AZURE_CHATGPT_ENDPOINT
|
||||
# - AZURE_CHATGPT_DEPLOYMENT_ID
|
||||
|
||||
services = AIServiceConfig(
|
||||
tts=AzureTTSService(), image=None, llm=AzureLLMService()
|
||||
)
|
||||
|
||||
orchestrator_config = OrchestratorConfig(
|
||||
room_url=room_url,
|
||||
token=token,
|
||||
bot_name="Simple Bot",
|
||||
expiration=expiration,
|
||||
)
|
||||
|
||||
orchestrator = Orchestrator(
|
||||
orchestrator_config,
|
||||
services,
|
||||
message_handler,
|
||||
)
|
||||
orchestrator.start()
|
||||
|
||||
# When the orchestrator's done, we need to shut it down,
|
||||
# and the various services and handlers we've created.
|
||||
orchestrator.stop()
|
||||
message_handler.shutdown()
|
||||
|
||||
services.tts.close()
|
||||
services.llm.close()
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room")
|
||||
parser.add_argument(
|
||||
"-k", "--apikey", type=str, required=True, help="Daily API Key (needed to create token)"
|
||||
)
|
||||
|
||||
args: argparse.Namespace = parser.parse_args()
|
||||
|
||||
# Create a meeting token for the given room with an expiration 1 hour in the future.
|
||||
room_name: str = urllib.parse.urlparse(args.url).path[1:]
|
||||
expiration: float = time.time() + 60 * 60
|
||||
|
||||
res: requests.Response = requests.post(
|
||||
f"https://api.daily.co/v1/meeting-tokens",
|
||||
headers={"Authorization": f"Bearer {args.apikey}"},
|
||||
json={
|
||||
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
|
||||
},
|
||||
)
|
||||
|
||||
if res.status_code != 200:
|
||||
raise Exception(f'Failed to create meeting token: {res.status_code} {res.text}')
|
||||
|
||||
token: str = res.json()['token']
|
||||
|
||||
add_bot_to_room(args.url, token, expiration)
|
||||
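The token-creation block in `__main__` is repeated nearly verbatim in several of these samples. A minimal sketch of pulling it into one helper, assuming the same endpoint and request body as the sample: `build_token_request` and its `now` parameter are hypothetical names, and the function only builds the request so it can be inspected without network access.

```python
import time
import urllib.parse


def build_token_request(room_url, api_key, ttl_seconds=60 * 60, now=None):
    # Room name is the URL path without the leading slash, as in the sample.
    room_name = urllib.parse.urlparse(room_url).path[1:]
    # `now` is injectable so the expiry can be checked deterministically.
    expiration = (time.time() if now is None else now) + ttl_seconds
    return {
        "url": "https://api.daily.co/v1/meeting-tokens",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {
            "properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
        },
    }


req = build_token_request("https://example.daily.co/demo", "KEY", now=1000.0)
print(req["json"]["properties"])
```

Sending it would then be `requests.post(req["url"], headers=req["headers"], json=req["json"])`, followed by the same status check the sample performs.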
@@ -1,172 +0,0 @@
|
||||
import argparse
|
||||
from email.mime import image
|
||||
import logging
|
||||
import os
|
||||
import random
|
||||
import requests
|
||||
import time
|
||||
import urllib.parse
|
||||
|
||||
from PIL import Image
|
||||
|
||||
from dailyai.async_processor.async_processor import (
|
||||
ConversationProcessorCollection,
|
||||
LLMResponse,
|
||||
OrchestratorResponse
|
||||
)
|
||||
from dailyai.orchestrator import OrchestratorConfig, Orchestrator
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.message_handler.message_handler import MessageHandler
|
||||
from dailyai.services.ai_services import AIServiceConfig
|
||||
from dailyai.services.azure_ai_services import AzureImageGenService, AzureTTSService, AzureLLMService
|
||||
|
||||
class StaticSpriteResponse(OrchestratorResponse):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
services,
|
||||
message_handler,
|
||||
output_queue
|
||||
) -> None:
|
||||
super().__init__(services, message_handler, output_queue)
|
||||
self.image_bytes:bytes | None = None
|
||||
self.filename = None # override this in subclasses
|
||||
|
||||
def start_preparation(self) -> None:
|
||||
full_path = os.path.join(os.path.dirname(__file__), "sprites/", self.filename)
|
||||
print(full_path)
|
||||
|
||||
with Image.open(full_path) as img:
|
||||
self.image_bytes = img.tobytes()
|
||||
|
||||
def do_play(self) -> None:
|
||||
self.output_queue.put(QueueFrame(FrameType.IMAGE, self.image_bytes))
|
||||
|
||||
|
||||
class IntroSpriteResponse(StaticSpriteResponse):
|
||||
def __init__(self, services, message_handler, output_queue) -> None:
|
||||
super().__init__(services, message_handler, output_queue)
|
||||
self.filename = "intro.png"
|
||||
|
||||
|
||||
class WaitingSpriteResponse(StaticSpriteResponse):
|
||||
def __init__(self, services, message_handler, output_queue) -> None:
|
||||
super().__init__(services, message_handler, output_queue)
|
||||
self.filename = "waiting.png"
|
||||
|
||||
|
||||
class AnimatedSpriteLLMResponse(LLMResponse):
|
||||
def __init__(self, services, message_handler, output_queue) -> None:
|
||||
super().__init__(services, message_handler, output_queue)
|
||||
self.filenames = ["talk-1.png", "talk-2.png"]
|
||||
self.image_bytes = []
|
||||
|
||||
def start_preparation(self) -> None:
|
||||
super().start_preparation()
|
||||
|
||||
for filename in self.filenames:
|
||||
full_path = os.path.join(os.path.dirname(__file__), "sprites/", filename)
|
||||
print(full_path)
|
||||
|
||||
with Image.open(full_path) as img:
|
||||
self.image_bytes.append(img.tobytes())
|
||||
|
||||
def get_frames_from_tts_response(self, audio_frame) -> list[QueueFrame]:
|
||||
return [
|
||||
QueueFrame(FrameType.AUDIO, audio_frame),
|
||||
QueueFrame(FrameType.IMAGE, random.choice(self.image_bytes))
|
||||
]
|
||||
|
||||
|
||||
def add_bot_to_room(room_url, token, expiration) -> None:
|
||||
|
||||
# A simple prompt for a simple sample.
|
||||
message_handler = MessageHandler(
|
||||
"""
|
||||
You are a sample bot in a WebRTC session. You'll receive input as transcriptions of user's
|
||||
speech, and your responses will be converted to audio via a TTS service.
|
||||
Answer user's questions and be friendly, and if you can, give some ideas about how someone
|
||||
could use a bot like you in a more in-depth way. Because your responses will be spoken,
|
||||
try to keep them short.
|
||||
"""
|
||||
)
|
||||
|
||||
# Use Azure services for the TTS, image generation, and LLM.
|
||||
# Note that you'll need to set the following environment variables:
|
||||
# - AZURE_SPEECH_SERVICE_KEY
|
||||
# - AZURE_SPEECH_SERVICE_REGION
|
||||
# - AZURE_CHATGPT_KEY
|
||||
# - AZURE_CHATGPT_ENDPOINT
|
||||
# - AZURE_CHATGPT_DEPLOYMENT_ID
|
||||
#
|
||||
# This demo doesn't use image generation, but if you extend it to do so,
|
||||
# you'll also need to set:
|
||||
# - AZURE_DALLE_KEY
|
||||
# - AZURE_DALLE_ENDPOINT
|
||||
# - AZURE_DALLE_DEPLOYMENT_ID
|
||||
|
||||
services = AIServiceConfig(
|
||||
tts=AzureTTSService(), image=AzureImageGenService(), llm=AzureLLMService()
|
||||
)
|
||||
|
||||
sprite_conversation_processors = ConversationProcessorCollection(
|
||||
introduction=IntroSpriteResponse,
|
||||
waiting=WaitingSpriteResponse,
|
||||
response=AnimatedSpriteLLMResponse,
|
||||
)
|
||||
|
||||
orchestrator_config = OrchestratorConfig(
|
||||
room_url=room_url,
|
||||
token=token,
|
||||
bot_name="Simple Bot",
|
||||
expiration=expiration,
|
||||
)
|
||||
|
||||
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
|
||||
logger: logging.Logger = logging.getLogger("dailyai")
|
||||
logger.setLevel(logging.DEBUG)
|
||||
|
||||
orchestrator = Orchestrator(
|
||||
orchestrator_config,
|
||||
services,
|
||||
message_handler,
|
||||
sprite_conversation_processors
|
||||
)
|
||||
orchestrator.start()
|
||||
|
||||
# When the orchestrator's done, we need to shut it down,
|
||||
# and the various services and handlers we've created.
|
||||
orchestrator.stop()
|
||||
message_handler.shutdown()
|
||||
|
||||
services.tts.close()
|
||||
services.image.close()
|
||||
services.llm.close()
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room")
|
||||
parser.add_argument(
|
||||
"-k", "--apikey", type=str, required=True, help="Daily API Key (needed to create token)"
|
||||
)
|
||||
|
||||
args: argparse.Namespace = parser.parse_args()
|
||||
|
||||
# Create a meeting token for the given room with an expiration 1 hour in the future.
|
||||
room_name: str = urllib.parse.urlparse(args.url).path[1:]
|
||||
expiration: float = time.time() + 60 * 60
|
||||
|
||||
res: requests.Response = requests.post(
|
||||
f"https://api.daily.co/v1/meeting-tokens",
|
||||
headers={"Authorization": f"Bearer {args.apikey}"},
|
||||
json={
|
||||
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
|
||||
},
|
||||
)
|
||||
|
||||
if res.status_code != 200:
|
||||
raise Exception(f'Failed to create meeting token: {res.status_code} {res.text}')
|
||||
|
||||
token: str = res.json()['token']
|
||||
|
||||
add_bot_to_room(args.url, token, expiration)
|
||||
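`AnimatedSpriteLLMResponse.get_frames_from_tts_response` pairs every audio chunk with a randomly chosen sprite, which is what makes the character appear to talk. The sketch below isolates that pairing. `FrameType` and `QueueFrame` here are simplified stand-ins for the `dailyai.queue_frame` classes, and `frames_for_audio` is a hypothetical free function, so treat it as an illustration and not as the library's behavior.

```python
import random
from dataclasses import dataclass
from enum import Enum, auto


class FrameType(Enum):  # stand-in for dailyai.queue_frame.FrameType
    AUDIO = auto()
    IMAGE = auto()


@dataclass
class QueueFrame:  # stand-in for dailyai.queue_frame.QueueFrame
    frame_type: FrameType
    frame_data: bytes


def frames_for_audio(audio_chunk, sprites, rng=random):
    # Emit the audio chunk followed by one randomly chosen sprite image.
    return [
        QueueFrame(FrameType.AUDIO, audio_chunk),
        QueueFrame(FrameType.IMAGE, rng.choice(sprites)),
    ]


sprites = [b"talk-1", b"talk-2"]
frames = frames_for_audio(b"\x00\x01", sprites, random.Random(0))
print([frame.frame_type.name for frame in frames])
```

Passing a seeded `random.Random` makes the sprite choice reproducible in tests, while the default keeps the sample's random behavior.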
@@ -1,54 +0,0 @@
|
||||
import argparse
|
||||
import asyncio
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureTTSService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
|
||||
async def main(room_url):
|
||||
# create a transport service object using environment variables for
|
||||
# the transport service's API key, room url, and any other configuration.
|
||||
# services can all define and document the environment variables they use.
|
||||
# services all also take an optional config object that is used instead of
|
||||
# environment variables.
|
||||
#
|
||||
# the abstract transport service APIs presumably can map pretty closely
|
||||
# to the daily-python basic API
|
||||
meeting_duration_minutes = 1
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Say One Thing",
|
||||
meeting_duration_minutes,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
tts = ElevenLabsTTSService(voice_id="ErXwobaYiN019PkySvjV")
|
||||
|
||||
# Register an event handler so we can play the audio when the participant joins.
|
||||
@transport.event_handler("on_participant_joined")
|
||||
async def on_participant_joined(transport, participant):
|
||||
if participant["info"]["isLocal"]:
|
||||
return
|
||||
|
||||
await tts.say(
|
||||
"Hello there, " + participant["info"]["userName"] + "!",
|
||||
transport.send_queue,
|
||||
)
|
||||
|
||||
# wait for the output queue to be empty, then leave the meeting
|
||||
await transport.stop_when_done()
|
||||
|
||||
await transport.run()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
|
||||
asyncio.run(main(args.url))
|
||||
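The sample registers its greeting with `@transport.event_handler("on_participant_joined")`. As a rough mental model of that decorator pattern, here is a small registry that stores coroutine handlers by event name and awaits them when the event fires. `MiniTransport` and its `emit` method are hypothetical and are not how `DailyTransportService` is implemented.

```python
import asyncio


class MiniTransport:
    """Hypothetical stand-in; not the DailyTransportService implementation."""

    def __init__(self):
        self._handlers = {}

    def event_handler(self, event_name):
        # Returns a decorator that records the coroutine under event_name.
        def decorator(func):
            self._handlers.setdefault(event_name, []).append(func)
            return func

        return decorator

    async def emit(self, event_name, *args):
        # Await each registered handler, passing the transport first.
        for handler in self._handlers.get(event_name, []):
            await handler(self, *args)


greetings = []
transport = MiniTransport()


@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
    # Skip the bot's own join event, as the sample does.
    if participant["info"]["isLocal"]:
        return
    greetings.append("Hello there, " + participant["info"]["userName"] + "!")


async def demo():
    await transport.emit("on_participant_joined", {"info": {"isLocal": True, "userName": "bot"}})
    await transport.emit("on_participant_joined", {"info": {"isLocal": False, "userName": "Ada"}})


asyncio.run(demo())
print(greetings)
```

Only the remote participant produces a greeting, which mirrors the `isLocal` check in the sample.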
@@ -1,55 +0,0 @@
|
||||
import asyncio
|
||||
import time
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureTTSService
|
||||
from dailyai.services.deepgram_ai_services import DeepgramTTSService
|
||||
|
||||
async def main(room_url):
|
||||
# create a transport service object using environment variables for
|
||||
# the transport service's API key, room url, and any other configuration.
|
||||
# services can all define and document the environment variables they use.
|
||||
# services all also take an optional config object that is used instead of
|
||||
# environment variables.
|
||||
#
|
||||
# the abstract transport service APIs presumably can map pretty closely
|
||||
# to the daily-python basic API
|
||||
meeting_duration_minutes = 1
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Greeter",
|
||||
meeting_duration_minutes,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
|
||||
# similarly, create a tts service
|
||||
tts = DeepgramTTSService()
|
||||
|
||||
# Get the generator for the audio. This will start running in the background,
|
||||
# and when we ask the generator for its items, we'll get what it's generated.
|
||||
|
||||
# Register an event handler so we can play the audio when the participant joins.
|
||||
print("setting up handler")
|
||||
@transport.event_handler("on_participant_joined")
|
||||
async def on_participant_joined(transport, participant):
|
||||
print(f"participant joined: {participant['info']['userName']}")
|
||||
if participant["info"]["isLocal"]:
|
||||
return
|
||||
audio_generator: AsyncGenerator[bytes, None] = tts.run_tts(f"Hello there, {participant['info']['userName']}!")
|
||||
|
||||
async for audio in audio_generator:
|
||||
transport.output_queue.put(QueueFrame(FrameType.AUDIO, audio))
|
||||
|
||||
print("setting up call state handler")
|
||||
@transport.event_handler("on_call_state_updated")
|
||||
async def on_call_joined(transport, state):
|
||||
print(f"call state callback: {state}")
|
||||
|
||||
await transport.run()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main("https://chad-hq.daily.co/howdy"))
|
||||
@@ -1,52 +0,0 @@
|
||||
import argparse
|
||||
import asyncio
|
||||
from typing import AsyncGenerator
|
||||
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.ai_services import SentenceAggregator
|
||||
from dailyai.services.azure_ai_services import AzureLLMService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
|
||||
async def main(room_url):
|
||||
meeting_duration_minutes = 1
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Say One Thing From an LLM",
|
||||
meeting_duration_minutes,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
|
||||
tts = ElevenLabsTTSService(voice_id="29vD33N1CtxCmqQRPOHJ")
|
||||
llm = AzureLLMService()
|
||||
|
||||
messages = [{
|
||||
"role": "system",
|
||||
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world."
|
||||
}]
|
||||
tts_task = asyncio.create_task(
|
||||
tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
SentenceAggregator().run(
|
||||
llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)])
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
await tts_task
|
||||
await transport.stop_when_done()
|
||||
|
||||
await transport.run()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
asyncio.run(main(args.url))
|
||||
@@ -1,44 +0,0 @@
|
||||
import argparse
|
||||
import asyncio
|
||||
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.open_ai_services import OpenAIImageGenService
|
||||
|
||||
local_joined = False
|
||||
participant_joined = False
|
||||
|
||||
async def main(room_url):
|
||||
meeting_duration_minutes = 1
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Show a still frame image",
|
||||
meeting_duration_minutes,
|
||||
)
|
||||
transport.mic_enabled = False
|
||||
transport.camera_enabled = True
|
||||
transport.camera_width = 1024
|
||||
transport.camera_height = 1024
|
||||
|
||||
imagegen = OpenAIImageGenService(image_size="1024x1024")
|
||||
image_task = asyncio.create_task(
|
||||
imagegen.run_to_queue(transport.send_queue, [QueueFrame(FrameType.IMAGE_DESCRIPTION, "a cat in the style of picasso")])
|
||||
)
|
||||
|
||||
@transport.event_handler("on_participant_joined")
|
||||
async def on_participant_joined(transport, participant):
|
||||
await image_task
|
||||
|
||||
await transport.run()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
|
||||
asyncio.run(main(args.url))
|
||||
@@ -1,79 +0,0 @@
|
||||
import argparse
|
||||
import asyncio
|
||||
import re
|
||||
|
||||
from dailyai.services.ai_services import SentenceAggregator
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
|
||||
async def main(room_url:str):
|
||||
global transport
|
||||
global llm
|
||||
global tts
|
||||
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Say Two Things Bot",
|
||||
1,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
transport.mic_sample_rate = 16000
|
||||
transport.camera_enabled = False
|
||||
|
||||
llm = AzureLLMService()
|
||||
azure_tts = AzureTTSService()
|
||||
elevenlabs_tts = ElevenLabsTTSService(voice_id="ErXwobaYiN019PkySvjV")
|
||||
|
||||
messages = [{"role": "system", "content": "tell the user a joke about llamas"}]
|
||||
|
||||
# Start a task to run the LLM to create a joke, and convert the LLM output to audio frames. This task
|
||||
# will run in parallel with generating and speaking the audio for static text, so there's no delay to
|
||||
# speak the LLM response.
|
||||
buffer_queue = asyncio.Queue()
|
||||
llm_response_task = asyncio.create_task(
|
||||
elevenlabs_tts.run_to_queue(
|
||||
buffer_queue,
|
||||
SentenceAggregator().run(
|
||||
llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)])
|
||||
),
|
||||
True,
|
||||
)
|
||||
)
|
||||
|
||||
@transport.event_handler("on_participant_joined")
|
||||
async def on_joined(transport, participant):
|
||||
if participant["id"] == transport.my_participant_id:
|
||||
return
|
||||
|
||||
await azure_tts.run_to_queue(
|
||||
transport.send_queue,
|
||||
[QueueFrame(FrameType.SENTENCE, "My friend the LLM is now going to tell a joke about llamas.")]
|
||||
)
|
||||
|
||||
async def buffer_to_send_queue():
|
||||
while True:
|
||||
frame = await buffer_queue.get()
|
||||
await transport.send_queue.put(frame)
|
||||
buffer_queue.task_done()
|
||||
if frame.frame_type == FrameType.END_STREAM:
|
||||
break
|
||||
|
||||
await asyncio.gather(llm_response_task, buffer_to_send_queue())
|
||||
|
||||
await transport.stop_when_done()
|
||||
|
||||
await transport.run()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
|
||||
asyncio.run(main(args.url))
|
||||
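The sample starts generating the joke into `buffer_queue` right away and only forwards it to the send queue after the introduction has been queued, so the two pieces play in order without waiting on the LLM. A minimal sketch of that buffering pattern using only `asyncio`: strings stand in for audio frames, and the `END_STREAM` sentinel object stands in for a frame whose type is `FrameType.END_STREAM`.

```python
import asyncio

END_STREAM = object()  # stand-in for a frame of type FrameType.END_STREAM


async def producer(buffer_queue):
    # Plays the role of the LLM-to-TTS task filling the buffer in the background.
    for item in ["joke part 1", "joke part 2"]:
        await asyncio.sleep(0)
        await buffer_queue.put(item)
    await buffer_queue.put(END_STREAM)


async def drain(buffer_queue, send_queue):
    # Forward buffered frames until the end-of-stream marker has been passed on.
    while True:
        frame = await buffer_queue.get()
        await send_queue.put(frame)
        buffer_queue.task_done()
        if frame is END_STREAM:
            break


async def demo():
    buffer_queue, send_queue = asyncio.Queue(), asyncio.Queue()
    producer_task = asyncio.create_task(producer(buffer_queue))
    # The static introduction goes straight to the send queue first.
    await send_queue.put("intro sentence")
    await asyncio.gather(producer_task, drain(buffer_queue, send_queue))
    sent = []
    while not send_queue.empty():
        sent.append(send_queue.get_nowait())
    return sent


result = asyncio.run(demo())
print(result[:3])
```

Because the producer task is created before the introduction is queued, its work overlaps with the introduction instead of starting after it.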
@@ -1,108 +0,0 @@
|
||||
import argparse
|
||||
import asyncio
|
||||
|
||||
from asyncio.queues import Queue
|
||||
import re
|
||||
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
from dailyai.services.ai_services import SentenceAggregator
|
||||
from dailyai.services.azure_ai_services import AzureLLMService
|
||||
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
|
||||
from dailyai.services.open_ai_services import OpenAIImageGenService
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.fal_ai_services import FalImageGenService
|
||||
|
||||
async def main(room_url):
|
||||
meeting_duration_minutes = 5
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
None,
|
||||
"Month Narration Bot",
|
||||
meeting_duration_minutes,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
transport.camera_enabled = True
|
||||
transport.mic_sample_rate = 16000
|
||||
transport.camera_width = 1024
|
||||
transport.camera_height = 1024
|
||||
|
||||
llm = AzureLLMService()
|
||||
dalle = FalImageGenService(image_size="1024x1024")
|
||||
tts = ElevenLabsTTSService(voice_id="ErXwobaYiN019PkySvjV")
|
||||
# dalle = OpenAIImageGenService(image_size="1024x1024")
|
||||
|
||||
# Get a complete audio chunk from the given text. Splitting this into its own
|
||||
# coroutine lets us ensure proper ordering of the audio chunks on the send queue.
|
||||
async def get_all_audio(text):
|
||||
all_audio = bytearray()
|
||||
async for audio in tts.run_tts(text):
|
||||
all_audio.extend(audio)
|
||||
|
||||
return all_audio
|
||||
|
||||
async def get_month_data(month):
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
|
||||
}
|
||||
]
|
||||
|
||||
image_description = await llm.run_llm(messages)
|
||||
to_speak = f"{month}: {image_description}"
|
||||
(audio, image_data) = await asyncio.gather(
|
||||
get_all_audio(to_speak), dalle.run_image_gen(image_description)
|
||||
)
|
||||
|
||||
return {
|
||||
"month": month,
|
||||
"text": image_description,
|
||||
"image": image_data[1],
|
||||
"audio": audio,
|
||||
}
|
||||
|
||||
months: list[str] = [
|
||||
"January",
|
||||
"February",
|
||||
"March",
|
||||
"April",
|
||||
"May",
|
||||
"June",
|
||||
"July",
|
||||
"August",
|
||||
"September",
|
||||
"October",
|
||||
"November",
|
||||
"December",
|
||||
]
|
||||
|
||||
@transport.event_handler("on_first_other_participant_joined")
|
||||
async def on_first_other_participant_joined(transport):
|
||||
# This will play the months in the order they're completed. The benefit
|
||||
# is we'll have as little delay as possible before the first month, and
|
||||
# likely no delay between months, but the months won't display in order.
|
||||
for month_data_task in asyncio.as_completed(month_tasks):
|
||||
data = await month_data_task
|
||||
await transport.send_queue.put(
|
||||
[
|
||||
QueueFrame(FrameType.IMAGE, data["image"]),
|
||||
QueueFrame(FrameType.AUDIO, data["audio"]),
|
||||
]
|
||||
)
|
||||
|
||||
# wait for the output queue to be empty, then leave the meeting
|
||||
await transport.stop_when_done()
|
||||
|
||||
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
|
||||
|
||||
await transport.run()
|
||||
|
||||
if __name__=="__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
|
||||
asyncio.run(main(args.url))
|
||||
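The handler's comment notes that months play in the order their data finishes, not in calendar order. That follows from `asyncio.as_completed`, which yields awaitables as each task completes. The sketch below shows the effect with artificial delays. `get_month_data` here only sleeps, and the delay values are made up.

```python
import asyncio


async def get_month_data(month, delay):
    # Stand-in for the LLM, TTS, and image generation work in the sample.
    await asyncio.sleep(delay)
    return {"month": month}


async def demo():
    delays = {"January": 0.06, "February": 0.02, "March": 0.04}
    tasks = [asyncio.create_task(get_month_data(m, d)) for m, d in delays.items()]
    finished = []
    # as_completed hands back results in completion order, not list order.
    for next_done in asyncio.as_completed(tasks):
        data = await next_done
        finished.append(data["month"])
    return finished


order = asyncio.run(demo())
print(order)
```

If calendar order matters more than latency, awaiting the tasks in list order (or using `asyncio.gather`) preserves it, at the cost of waiting on a slow early month.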
@@ -1,94 +0,0 @@
|
||||
import argparse
|
||||
import asyncio
|
||||
import requests
|
||||
import time
|
||||
import urllib.parse
|
||||
from dailyai.services.ai_services import SentenceAggregator
|
||||
|
||||
from dailyai.services.daily_transport_service import DailyTransportService
|
||||
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
|
||||
from dailyai.queue_frame import QueueFrame, FrameType
|
||||
|
||||
async def main(room_url:str, token):
|
||||
global transport
|
||||
global llm
|
||||
global tts
|
||||
|
||||
transport = DailyTransportService(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
1,
|
||||
)
|
||||
transport.mic_enabled = True
|
||||
transport.mic_sample_rate = 16000
|
||||
transport.camera_enabled = False
|
||||
|
||||
llm = AzureLLMService()
|
||||
tts = AzureTTSService()
|
||||
|
||||
async def handle_transcriptions():
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
|
||||
]
|
||||
|
||||
sentence = ""
|
||||
async for frame in transport.get_receive_frames():
|
||||
if frame.frame_type != FrameType.TRANSCRIPTION:
|
||||
continue
|
||||
|
||||
message = frame.frame_data
|
||||
if message["session_id"] == transport.my_participant_id:
|
||||
continue
|
||||
|
||||
# todo: we could differentiate between transcriptions from different participants
|
||||
sentence += message["text"]
|
||||
if sentence.endswith((".", "?", "!")):
|
||||
messages.append({"role": "user", "content": sentence})
|
||||
sentence = ''
|
||||
|
||||
full_response = ""
|
||||
async for response in llm.run_llm_async_sentences(messages):
|
||||
full_response += response
|
||||
async for audio in tts.run_tts(response):
|
||||
await transport.send_queue.put(QueueFrame(FrameType.AUDIO, audio))
|
||||
|
||||
messages.append({"role": "assistant", "content": full_response})
|
||||
|
||||
transport.transcription_settings["extra"]["punctuate"] = True
|
||||
await asyncio.gather(transport.run(), handle_transcriptions())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
|
||||
parser.add_argument(
|
||||
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
|
||||
)
|
||||
parser.add_argument(
|
||||
"-k",
|
||||
"--apikey",
|
||||
type=str,
|
||||
required=True,
|
||||
help="Daily API Key (needed to create token)",
|
||||
)
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
|
||||
# Create a meeting token for the given room with an expiration 1 hour in the future.
|
||||
room_name: str = urllib.parse.urlparse(args.url).path[1:]
|
||||
expiration: float = time.time() + 60 * 60
|
||||
|
||||
res: requests.Response = requests.post(
|
||||
f"https://api.daily.co/v1/meeting-tokens",
|
||||
headers={"Authorization": f"Bearer {args.apikey}"},
|
||||
json={
|
||||
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
|
||||
},
|
||||
)
|
||||
|
||||
if res.status_code != 200:
|
||||
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
|
||||
|
||||
token: str = res.json()["token"]
|
||||
|
||||
asyncio.run(main(args.url, token))
|
||||
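`handle_transcriptions` concatenates transcription fragments and only hands the text to the LLM once it ends in `.`, `?`, or `!`. The sketch below isolates that rule as a plain function so its edge cases are easy to see. `collect_utterances` is a hypothetical name, and the fragments are invented input.

```python
def collect_utterances(fragments):
    """Group transcription fragments into utterances ending in '.', '?' or '!'."""
    utterances = []
    sentence = ""
    for text in fragments:
        # Fragments are joined as-is, exactly like `sentence += message["text"]`.
        sentence += text
        if sentence.endswith((".", "?", "!")):
            utterances.append(sentence)
            sentence = ""
    return utterances


print(collect_utterances(["Hello", " there.", " How are", " you?", " trailing"]))
# prints ['Hello there.', ' How are you?']
```

Two things stand out: spacing depends entirely on what the transcription service sends, and an unpunctuated tail stays buffered until a later fragment closes it. This is why the sample enables `punctuate` in the transcription settings.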