Compare commits


140 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
86025723e7 github: one more publish workflow fix 2024-04-04 17:36:20 -07:00
Aleix Conchillo Flaqué
6f4270a552 github: avoid caching in publish workflow 2024-04-04 17:32:50 -07:00
Aleix Conchillo Flaqué
31f050c02b github: more publish workflows fixes 2024-04-04 17:31:59 -07:00
Aleix Conchillo Flaqué
a0fe57721b github: fix publish workflows 2024-04-04 17:17:15 -07:00
Aleix Conchillo Flaqué
abf5e57319 Merge pull request #103 from daily-co/aleix/fix-github-cache-name
github: fix github cache name
2024-04-05 08:03:15 +08:00
Aleix Conchillo Flaqué
44de9007c3 Merge pull request #102 from daily-co/examples-cleanup
examples cleanup
2024-04-05 08:02:57 +08:00
Aleix Conchillo Flaqué
46d265514e pyproject: update github url 2024-04-04 15:52:28 -07:00
Aleix Conchillo Flaqué
9e64de8606 Merge pull request #101 from daily-co/cb/bot-exit
Allow transport exit to end a running pipeline
2024-04-05 06:51:06 +08:00
Aleix Conchillo Flaqué
1ea503c1e6 examples: fix 03a-image-local 2024-04-04 15:35:58 -07:00
Aleix Conchillo Flaqué
d0aeeccb68 github: fix github cache name 2024-04-04 14:36:04 -07:00
Aleix Conchillo Flaqué
d687c8cdeb transports: updated silero vad not found message 2024-04-04 14:05:40 -07:00
Aleix Conchillo Flaqué
951f20c788 transports: don't write/read if microphone/speaker not enabled 2024-04-04 14:05:15 -07:00
Aleix Conchillo Flaqué
982c0a0749 examples: move non-working examples to to_be_updated 2024-04-04 14:04:53 -07:00
Chad Bailey
27cef7cd70 add endframe to transport receive queue 2024-04-04 20:45:23 +00:00
chadbailey59
03ea208361 VAD fallback (#97)
* Silero VAD preferred with webrtc fallback

* webrtc VAD neds a different sample size

* fixup

* fixup
2024-04-04 13:31:07 -05:00
Aleix Conchillo Flaqué
385b51ac83 Merge pull request #98 from daily-co/use-pip-features
use pip optional dependencies
2024-04-05 01:00:21 +08:00
Aleix Conchillo Flaqué
a37e4fabad github: only run publish-test on main 2024-04-04 09:58:42 -07:00
Aleix Conchillo Flaqué
8bc3c03a69 add a requirements.txt per platform 2024-04-03 21:39:10 -07:00
Aleix Conchillo Flaqué
1fc800754b github: no need to install dependencies when building/deploying 2024-04-03 16:26:58 -07:00
Aleix Conchillo Flaqué
18c4bccc13 github: rename deploy to publish 2024-04-03 16:22:23 -07:00
Aleix Conchillo Flaqué
d57d473c13 pyproject.toml: use setuptools_scm to auto manage versions 2024-04-03 16:13:07 -07:00
Aleix Conchillo Flaqué
48bb3c6955 github: add publish to pypi workflows 2024-04-03 15:57:59 -07:00
Aleix Conchillo Flaqué
e3ee3f9cc6 github(lint): use requirements-dev.txt 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
3528f5d735 use conditional imports and show help errors if modules not found 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
23735cb3a3 dot-env.example: cleanup and add missing environment variables 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
6918dc69f0 github: separate build and test workflows 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
128d350abc pyproject.toml: use project optional dependencies and pin them 2024-04-03 15:27:31 -07:00
chadbailey59
2f59e38a7a Modularize tricky dependencies (#95)
* removed pyaudio from threaded transport

* modularized torch and torchaudio

* modularized local transport

* Working Dockerfile as well

* docker updates for fly.io
2024-04-03 10:48:11 -05:00
chadbailey59
c21014860f Added app messages to translator example (#94) 2024-04-01 14:25:20 -05:00
chadbailey59
d4e3e1710f Server updates (#90)
* updated server readme

* fixup

* Refactored server

* fixup
2024-03-28 15:03:08 -05:00
Moishe Lettvin
e7f9296b5a Merge pull request #93 from daily-co/frame-name-cleanup
Cleanup the last few badly-named Frame types
2024-03-28 14:25:59 -04:00
Moishe Lettvin
27322108b7 Cleanup the last few badly-named Frame types 2024-03-28 12:36:24 -04:00
Moishe Lettvin
22bbedec93 Merge pull request #92 from daily-co/remove-bad-print
Remove mistakenly-added print statement
2024-03-28 11:54:49 -04:00
Moishe Lettvin
ed91bc0f66 Remove mistakenly-added print statement 2024-03-28 11:47:11 -04:00
Moishe Lettvin
565acfa9c9 Merge pull request #86 from daily-co/transport-refactor
Starting refactor of transports into their own directory
2024-03-28 11:17:32 -04:00
Moishe Lettvin
a2295b6b1d Merge pull request #91 from daily-co/pipeline-logging
Add logging for pipeline
2024-03-28 11:16:26 -04:00
Moishe Lettvin
fef1366c84 Merge pull request #88 from daily-co/frame-progress-diagram
Frame progress diagram
2024-03-28 11:13:21 -04:00
Moishe Lettvin
5c0ba1b6f0 Fix off by one errors, add tests and comment 2024-03-28 08:34:34 -04:00
Moishe Lettvin
05c77bce25 Add logging for pipeline 2024-03-27 18:48:30 -04:00
Moishe Lettvin
4ce140bf84 Move some things to AbstractTransport class 2024-03-27 12:59:08 -04:00
James Hush
a3293c6d7a fix: force overriding environment variables from .env files (#89) 2024-03-27 23:38:55 +08:00
Moishe Lettvin
ce04d4a54a Add text to md 2024-03-27 08:10:14 -04:00
Moishe Lettvin
758ed2d895 Frame progress images 2024-03-26 20:40:10 -04:00
Moishe Lettvin
85cd795b2b fix image 2024-03-26 20:36:18 -04:00
Moishe Lettvin
6c36d5f686 Testing 2024-03-26 20:33:09 -04:00
Moishe Lettvin
b2425d6dcd Testing 2024-03-26 20:32:32 -04:00
Moishe Lettvin
e8a6560ac1 Merge forgotten files 2024-03-26 16:24:47 -04:00
Moishe Lettvin
78c80d8941 some more renames 2024-03-26 15:57:19 -04:00
Moishe Lettvin
2fc5de6afe Starting refactor of transports into their own directory 2024-03-26 08:35:04 -04:00
Moishe Lettvin
24fb7c5a05 Merge pull request #81 from daily-co/websocket-transport
Websocket transport
2024-03-25 14:40:34 -04:00
Moishe Lettvin
5761e23af1 remove unnecessary checks 2024-03-25 14:00:08 -04:00
Moishe Lettvin
960c659d5a Remove duplicated constant 2024-03-25 13:59:03 -04:00
Moishe Lettvin
2bda4c3307 Websocket transport 2024-03-25 13:54:34 -04:00
Aleix Conchillo Flaqué
2c5628a621 Merge pull request #85 from daily-co/minor-readme-update
README: minor fixes
2024-03-22 04:33:42 +08:00
Aleix Conchillo Flaqué
9b4cfd9a6c README: minor fixes 2024-03-21 13:16:50 -07:00
Aleix Conchillo Flaqué
8f9aeb0751 Merge pull request #82 from daily-co/remove-unused-imports
remove unused imports
2024-03-22 03:02:07 +08:00
Aleix Conchillo Flaqué
e8a9d43287 Merge pull request #84 from daily-co/use-openai-api-key
use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
2024-03-21 21:57:40 +08:00
Aleix Conchillo Flaqué
cf5d516d51 use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
Fixes #77
2024-03-20 15:26:32 -07:00
Aleix Conchillo Flaqué
0666dd1194 remove unused imports 2024-03-20 14:52:19 -07:00
Aleix Conchillo Flaqué
42e25ccd13 create missing __init__.py 2024-03-20 14:41:39 -07:00
Aleix Conchillo Flaqué
520cee273f Merge pull request #80 from daily-co/move-src-daily-tests-to-tests
move src/dailyai/tests to tests
2024-03-21 00:27:07 +08:00
Aleix Conchillo Flaqué
a189e2618f github: source venv in every step 2024-03-19 15:31:03 -07:00
Aleix Conchillo Flaqué
ae2dcf88ed github: use virtual environment 2024-03-19 15:23:09 -07:00
Aleix Conchillo Flaqué
5cdb82ad3c README: one more autopep8 emacs update 2024-03-19 15:18:29 -07:00
Aleix Conchillo Flaqué
593513c84a github: add venv caching 2024-03-19 15:17:48 -07:00
Aleix Conchillo Flaqué
16257f8ec0 move src/dailyai/tests to tests 2024-03-19 14:59:48 -07:00
Aleix Conchillo Flaqué
5fc21a7508 Merge pull request #73 from daily-co/github-unittests-workflow
github: add workflow for unit tests
2024-03-20 03:01:03 +08:00
Aleix Conchillo Flaqué
cc05429135 github: add workflow for unit tests 2024-03-19 11:51:14 -07:00
Aleix Conchillo Flaqué
85e66dddbe Merge pull request #79 from daily-co/readme-emacs-autopep8-update
README: emacs autopep8 update
2024-03-20 02:17:44 +08:00
Aleix Conchillo Flaqué
03ea559839 README: emacs autopep8 update 2024-03-19 10:28:11 -07:00
Aleix Conchillo Flaqué
b6c9859e34 Merge pull request #78 from daily-co/readme-editor-setup
README: add editor setup
2024-03-20 01:10:57 +08:00
Aleix Conchillo Flaqué
bc47c909a3 README: add editor setup 2024-03-19 10:10:14 -07:00
Aleix Conchillo Flaqué
428659730d Merge pull request #70 from daily-co/move-src-example-to-examples
move src/examples to examples
2024-03-20 01:09:13 +08:00
Aleix Conchillo Flaqué
a573277a10 examples: copy runner.py and auth.py where needed 2024-03-18 17:10:23 -07:00
Aleix Conchillo Flaqué
69c2637a25 README.md: update examples 2024-03-18 14:53:53 -07:00
Aleix Conchillo Flaqué
90c34d278f move src/examples to examples 2024-03-18 11:51:38 -07:00
Aleix Conchillo Flaqué
2f4e31d1b2 Merge pull request #69 from daily-co/add-github-linting-workflow
github: add linting workflow
2024-03-19 02:46:50 +08:00
Aleix Conchillo Flaqué
9385270775 autopep8 formatting 2024-03-18 11:28:32 -07:00
Aleix Conchillo Flaqué
2914e43350 github: add linting workflow 2024-03-18 11:28:06 -07:00
chadbailey59
78638d2dba Live translation (#61)
* added translator

* fixup
2024-03-18 13:26:05 -05:00
Aleix Conchillo Flaqué
141a5bb548 Merge pull request #68 from daily-co/log-transcription-errors
daily: log transcription errors
2024-03-19 01:53:40 +08:00
Aleix Conchillo Flaqué
3957813202 Merge pull request #67 from daily-co/add-dot-env-template
add dot-env.template
2024-03-19 01:49:21 +08:00
Aleix Conchillo Flaqué
549862ef99 daily: log transcription errors 2024-03-18 10:47:20 -07:00
Aleix Conchillo Flaqué
1000ca5b55 add dot-env.template 2024-03-18 10:43:57 -07:00
Moishe Lettvin
91dbfef4c3 Merge pull request #64 from daily-co/docs
Some docs
2024-03-18 13:38:32 -04:00
Moishe Lettvin
3b61d0b41a fix typos 2024-03-18 13:38:00 -04:00
Moishe Lettvin
bf3ae091b9 Merge pull request #62 from daily-co/anthropic-support
Anthropic LLM service
2024-03-18 13:36:39 -04:00
Aleix Conchillo Flaqué
34ac796607 Merge pull request #66 from daily-co/daily-transport-release-client
services: release daily client after leave
2024-03-19 01:36:22 +08:00
Aleix Conchillo Flaqué
e0551e9d85 services: release daily client after leave 2024-03-18 10:32:46 -07:00
Moishe Lettvin
b1ab6f91b9 Merge pull request #65 from daily-co/app-messages
Support for app messages
2024-03-18 11:37:10 -04:00
Moishe Lettvin
58726dc20d clean up imports 2024-03-18 10:14:51 -04:00
Moishe Lettvin
8e61fe8e36 Support for app messages 2024-03-18 10:08:41 -04:00
Moishe Lettvin
99b836c227 added docstrings to frames. 2024-03-18 09:08:12 -04:00
Moishe Lettvin
1c27f77f1a drafty architecture doc 2024-03-18 08:39:50 -04:00
Moishe Lettvin
c91fa39a99 Remove testing code 2024-03-15 19:42:46 -04:00
Moishe Lettvin
eacaea7db4 Anthropic LLM service 2024-03-15 19:40:37 -04:00
Moishe Lettvin
c6dfcb6f7a Merge pull request #60 from daily-co/remove-ai-service-methods
Remove run_to_queue and run from AIService class
2024-03-15 15:28:28 -04:00
Moishe Lettvin
18bf26de14 Update apps 2024-03-15 13:39:33 -04:00
Moishe Lettvin
b8b35db89c Remove run_to_queue and run from AIService class 2024-03-15 11:04:22 -04:00
Moishe Lettvin
358166f347 Merge pull request #59 from daily-co/remove-requirements
Remove unused requirements file
2024-03-13 16:23:42 -04:00
Moishe Lettvin
c006c123b2 Remove unused requirements file 2024-03-13 16:19:03 -04:00
chadbailey59
cf302fb765 Storybot and Chatbot examples (#58)
* storybot

* storybot

* added pipeline.queue_frames

* fixup
2024-03-13 15:12:59 -05:00
Moishe Lettvin
e33820fe36 Merge pull request #56 from daily-co/fal-redux
Use other model in FAL
2024-03-12 15:14:57 -04:00
Moishe Lettvin
b84b3d59f3 Use other model in FAL 2024-03-12 14:47:00 -04:00
Moishe Lettvin
7b5b88b99b Merge pull request #55 from daily-co/fix-fal
set FAL param correctly
2024-03-12 14:12:16 -04:00
Moishe Lettvin
e87196cce7 set FAL param correctly 2024-03-12 14:03:43 -04:00
chadbailey59
bbfc9e703b intake cleanup (#54) 2024-03-12 13:01:39 -05:00
Moishe Lettvin
c21a63d48b Merge pull request #49 from daily-co/openai-base-llm
Base OpenAI LLM service
2024-03-12 12:58:31 -04:00
Moishe Lettvin
f546bb32da Make 08- work again 2024-03-12 10:34:52 -04:00
Moishe Lettvin
d9378e23ba Base OpenAI LLM service 2024-03-11 16:52:41 -04:00
Moishe Lettvin
c75a3fb0d0 Merge pull request #53 from daily-co/fix_other_joined_event
Don't do time-consuming processing in `on_other_joined_event`
2024-03-11 13:27:13 -04:00
Moishe Lettvin
f8ae264957 remove unnecessary print 2024-03-11 13:20:28 -04:00
Moishe Lettvin
977c12d530 undo fal change 2024-03-11 13:19:47 -04:00
Moishe Lettvin
61c55d2f47 Fix up other examples 2024-03-11 13:17:31 -04:00
Moishe Lettvin
fd2fa23e9c Fix example 2 2024-03-11 13:00:29 -04:00
Moishe Lettvin
de026ccc8a Merge pull request #50 from daily-co/khk/launch-samples
Khk/launch samples
2024-03-11 12:50:38 -04:00
Moishe Lettvin
c5bb0e14ab Merge pull request #51 from daily-co/khk/readme
updated README
2024-03-11 12:50:22 -04:00
chadbailey59
a4f3c51184 the smallest commit in history 2024-03-11 09:47:00 -05:00
Moishe Lettvin
7786e685cc Merge pull request #52 from daily-co/pypi-updates
updates to pyproject.toml
2024-03-11 10:34:35 -04:00
Moishe Lettvin
33793ca9f8 update description 2024-03-11 07:31:39 -04:00
Moishe Lettvin
d26aede667 updates to pyproject.toml 2024-03-11 07:25:20 -04:00
Moishe Lettvin
ad993056d8 rename to dailyai 2024-03-11 07:16:20 -04:00
Kwindla Hultman Kramer
5b1f26aacb updated README 2024-03-10 22:06:23 -07:00
Kwindla Hultman Kramer
4e16e514dd attempting to change tts to deepgram in example 04 2024-03-10 19:43:06 -07:00
Kwindla Hultman Kramer
959ffa9d36 small streamlining of example 03 2024-03-10 19:42:19 -07:00
Kwindla Hultman Kramer
4396b1018a small streamlining of example 02 2024-03-10 19:41:32 -07:00
Kwindla Hultman Kramer
37e904ce68 changed fal to a maybe slightly faster model 2024-03-10 19:40:51 -07:00
Kwindla Hultman Kramer
ef39d842a5 custom processor in example 05 2024-03-10 19:18:37 -07:00
Kwindla Hultman Kramer
72f631a066 working on foundational examples 2024-03-10 17:21:46 -07:00
chadbailey59
5d46302b9e changed default services (#47) 2024-03-08 15:36:30 -06:00
chadbailey59
8241dc0bed cleaned up example logging (#46) 2024-03-08 15:25:17 -06:00
Moishe Lettvin
95a1efbe75 Merge pull request #45 from daily-co/exception_handling_callbacks
Wait for the callback's result, so exceptions get raised
2024-03-08 15:04:15 -05:00
Moishe Lettvin
e59df8476e Wait for the callback's result, so exceptions get raised 2024-03-08 15:02:15 -05:00
chadbailey59
824df8ca7c moved patient intake and example runner (#44) 2024-03-08 12:07:51 -06:00
chadbailey59
0db8a51b27 cleaned up function calling frames (#43) 2024-03-08 10:13:28 -06:00
chadbailey59
ce9c6ede66 function allowlist (#42) 2024-03-08 08:49:09 -06:00
Moishe Lettvin
192b46bbab Merge pull request #41 from daily-co/optimize-pipeline
Optimize pipeline processing
2024-03-07 21:01:03 -05:00
Moishe Lettvin
196279e342 Add endframe to sample 4 2024-03-07 19:24:27 -05:00
Moishe Lettvin
edd93bc4cb remove errant print statement 2024-03-07 19:05:03 -05:00
Moishe Lettvin
d0076dd4ee Optimize pipeline processing so we don't wait for the completion of one generator to move onto the next. 2024-03-07 18:59:47 -05:00
169 changed files with 5301 additions and 2089 deletions

30
.dockerignore Normal file

@@ -0,0 +1,30 @@
# flyctl launch added from .gitignore
**/.vscode
**/env
**/__pycache__
**/*~
**/venv
#*#
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
**/.DS_Store
**/.env
fly.toml

44
.github/workflows/build.yaml vendored Normal file

@@ -0,0 +1,44 @@
name: build

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"
    paths-ignore:
      - "docs/**"

concurrency:
  group: build-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  build:
    name: "Build and Install"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: Build project
        run: |
          source .venv/bin/activate
          python -m build
      - name: Install project and other Python dependencies
        run: |
          source .venv/bin/activate
          pip install --editable .

44
.github/workflows/lint.yaml vendored Normal file

@@ -0,0 +1,44 @@
name: lint

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"
    paths-ignore:
      - "docs/**"

concurrency:
  group: build-lint-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  autopep8:
    name: "Formatting lints"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install development Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: autopep8
        id: autopep8
        run: |
          source .venv/bin/activate
          autopep8 --max-line-length 100 --exit-code -r -d --exclude "*_pb2.py" -a -a src/
      - name: Fail if autopep8 requires changes
        if: steps.autopep8.outputs.exit-code == 2
        run: exit 1

62
.github/workflows/publish.yaml vendored Normal file

@@ -0,0 +1,62 @@
name: publish

on:
  workflow_dispatch:
    inputs:
      gitref:
        type: string
        description: "what git ref to build"
        required: true

jobs:
  build:
    name: "Build and upload wheels"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.gitref }}
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: Build project
        run: |
          source .venv/bin/activate
          python -m build
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels
          path: ./dist

  publish-to-pypi:
    name: "Publish to PyPI"
    runs-on: ubuntu-latest
    needs: [ build ]
    environment:
      name: pypi
      url: https://pypi.org/p/dailyai
    permissions:
      id-token: write
    steps:
      - name: Download wheels
        uses: actions/download-artifact@v4
        with:
          name: wheels
          path: ./dist
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          verbose: true
          print-hash: true

59
.github/workflows/publish_test.yaml vendored Normal file

@@ -0,0 +1,59 @@
name: publish-test

on:
  workflow_dispatch:
  push:
    branches:
      - main

jobs:
  build:
    name: "Build and upload wheels"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.gitref }}
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: Build project
        run: |
          source .venv/bin/activate
          python -m build
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          path: ./dist

  publish-to-pypi:
    name: "Test publish to PyPI"
    runs-on: ubuntu-latest
    needs: [ build ]
    environment:
      name: pypi
      url: https://pypi.org/p/dailyai
    permissions:
      id-token: write
    steps:
      - name: Download wheels
        uses: actions/download-artifact@v4
        with:
          path: ./dist
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          verbose: true
          print-hash: true
          repository-url: https://test.pypi.org/legacy/

49
.github/workflows/tests.yaml vendored Normal file

@@ -0,0 +1,49 @@
name: test

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"
    paths-ignore:
      - "docs/**"

concurrency:
  group: build-test-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  test:
    name: "Unit and Integration Tests"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Cache virtual environment
        uses: actions/cache@v3
        with:
          # We are hashing linux-py3.10-requirements.txt and dev-requirements.txt,
          # which contain all dependencies needed to run the tests and examples.
          key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('linux-py3.10-requirements.txt') }}-${{ hashFiles('dev-requirements.txt') }}
          path: .venv
      - name: Install system packages
        run: sudo apt-get install -y portaudio19-dev
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r linux-py3.10-requirements.txt -r dev-requirements.txt
      - name: Test with pytest
        run: |
          source .venv/bin/activate
          pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests

1
.gitignore vendored

@@ -26,3 +26,4 @@ share/python-wheels/
MANIFEST
.DS_Store
.env
fly.toml


@@ -7,13 +7,14 @@ COPY *.py /app
COPY pyproject.toml /app
COPY src/ /app/src/
COPY examples/ /app/examples/
WORKDIR /app
RUN ls --recursive /app/
RUN pip3 install --upgrade -r requirements.txt
RUN python -m build .
RUN pip3 install .
RUN pip3 install gunicorn
# If running on Ubuntu, Azure TTS requires some extra config
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
@@ -36,4 +37,4 @@ WORKDIR /app
EXPOSE 8000
# run
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--chdir", "examples/server", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]

235
README.md

@@ -1,33 +1,103 @@
# Daily AI SDK
# dailyai — an open source framework for real-time, multi-modal, conversational AI applications
Build conversational, multi-modal AI apps with real-time voice and video, like this:
Build things like this:
_Demo Video to come_
[![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)
With built-in support for many of the best AI platforms (or [add your own](/docs)):
**`dailyai` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.
- Azure - DALL-E, ChatGPT, and Azure AI Text-to-Speech
- Deepgram - Speech-to-text, and Aura text-to-speech
- Eleven Labs text-to-speech
- Fal.ai image generation
- OpenAI DALL-E and ChatGPT
- Whisper local speech-to-text
In 2023 a *lot* of us got excited about the possibility of having open-ended conversations with LLMs. It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/):
- low-latency, reliable audio transport
- echo cancellation
- phrase endpointing (knowing when the bot should respond to human speech)
- interruptibility
- writing clean code to stream data through "pipelines" of speech-to-text, LLM inference, and text-to-speech models
## Step 1: Get Started
As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like.
## Build/Install
Today, `dailyai` is:
1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services
2. transport services that move audio, video, and events across the Internet
3. implementations of specific generative AI services
Currently implemented services:
- Speech-to-text
- Deepgram
- Whisper
- LLMs
- Azure
- OpenAI
- Image generation
- Azure
- Fal
- OpenAI
- Text-to-speech
- Azure
- Deepgram
- ElevenLabs
- Transport
- Daily
- Local (in progress, intended as a quick start example service)
If you'd like to [implement a service](https://github.com/daily-co/daily-ai-sdk/tree/main/src/dailyai/services), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.
## Getting started
Today, the easiest way to get started with `dailyai` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations. (The [transport base class](https://github.com/daily-co/daily-ai-sdk/blob/main/src/dailyai/transports/abstract_transport.py) is easy to extend, though, so feel free to submit PRs if you'd like to implement another transport service.)
```
# install the module
pip install dailyai
# set up an .env file with API keys
cp dot-env.template .env
```
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional
dependencies that you can install with:
```
pip install dailyai[option,...]
```
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- **AI services**: `anthropic`, `azure`, `fal`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `daily`, `local`, `websocket`
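The optional extras listed above pair naturally with the "use conditional imports and show help errors if modules not found" commit in this comparison. A minimal sketch of that pattern might look like the following; the helper name `require_optional` and the error wording are hypothetical and are not taken from the `dailyai` source.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable hint.

    Hypothetical helper for illustration only; not part of the dailyai API.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise with a message that names the pip extra to install.
        raise ModuleNotFoundError(
            f"Missing module '{module_name}'. "
            f"Install it with: pip install dailyai[{extra}]"
        ) from exc


# A stdlib module stands in here for an installed optional dependency.
json_module = require_optional("json", "example")
```

The idea is that a service module would call such a helper at import time, so a missing extra surfaces as a clear installation hint rather than a bare import failure.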
## Code examples
There are two directories of examples:
- [foundational](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/foundational) — demos that build on each other, introducing one or two concepts at a time
- [starter apps](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/starter-apps) — complete applications that you can use as starting points for development
Before running the examples, install the dependencies (this installs everything needed to run all of the examples):
```
pip install -r {env}-requirements.txt
```
To run the example below you need to sign up for a [free Daily account](https://dashboard.daily.co/u/signup) and create a Daily room (so you can hear the LLM talking). After that, join the room's URL directly from a browser tab and run:
```
python examples/foundational/02-llm-say-one-thing.py
```
## Hacking on the framework itself
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
```
python3 -m venv env
source env/bin/activate
python3 -m venv venv
source venv/bin/activate
```
From the root of this repo, run the following:
```
pip install -r requirements.txt
pip install -r {env}-requirements.txt -r dev-requirements.txt
python -m build
```
@@ -43,117 +113,54 @@ If you want to use this package from another directory, you can run:
pip install path_to_this_repo
```
## Running the samples
### Running tests
You can run the simple sample like so:
From the root directory, run:
```
python src/examples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
```
## Overview
The Daily AI SDK allows you to build applications that can participate in WebRTC sessions and interact with AI Services. Some examples of what you can build with this:
- conversational bots that interact 1:1 with a user, using voice recognition and text-to-speech
- assistant bots that aggregate transcriptions from multiple participants in a meeting and provide realtime summaries or other AI-generated output.
- image-recognition bots
- etc
## Concepts
### Transport Service
The SDK provides one “transport service”, which is a wrapper around Daily's `daily-python` client (tk add link). You can use this service to listen for events related to a WebRTC session, such as “a participant joined the meeting”.
The transport service also exposes a send queue and a receive queue. You can use the send queue to send audio and video to the WebRTC session, and you can listen to the receive queue to see audio, video and transcription data from the WebRTC session.
### AI Services
The AI Service classes provide wrappers around various AI providers, and allow you to query LLMs, convert text to speech and make images from text. The audio and images can then be placed on the transport service's send queue, where they'll be sent to the WebRTC session.
### Queue Frames
Communication between the transport service and AI services, and between various AI services, takes place in Queue Frames. These frames contain an indication of the type of data as well as the data itself.
## Using Transports, AI Services and Frames
AI Services all define a `.run` method. This method consumes and generates `QueueFrame` frames. The kinds of frames that can be consumed and generated depend on the kind of service. For instance, an LLM AI Service consumes `LLM_MESSAGE` frames (which define a history of interaction with an LLM) and emits `TEXT` frames (the response from the LLM).
The `.run` method is an `AsyncIterable`, and it takes an `iterable`, `AsyncIterable` or `asyncio.Queue` that produces QueueFrames as a parameter. This makes it easy to chain AI Services, and consume input from the Transport's `receive_queue`.
AI Services also have a `.run_to_queue` method. This method is not an AsyncIterable, but instead sends processed QueueFrames to a queue. This makes it easy to send the output of an AI Service to the Transport's `send_queue`.
AI Services also define convenience functions that let you bypass creating QueueFrames for some simple cases (e.g. using the TTS service to convert a string to audio output and send that audio to the transport's `send_queue`). See below for examples.
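The chaining idea described above (typed frames flowing through async iterables and ending up on a queue) can be sketched in a self-contained way. Everything in this sketch is a hypothetical stand-in: `Frame`, `source`, `upper_service` and `drain_to_queue` are illustrative names, not the SDK's `QueueFrame`, `.run` or `.run_to_queue`.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterable, AsyncIterator


@dataclass
class Frame:
    """Hypothetical stand-in for a queue frame: a type tag plus a payload."""
    kind: str
    data: Any


async def source(items: list[str]) -> AsyncIterator[Frame]:
    """Produce one TEXT frame per input string."""
    for item in items:
        yield Frame("TEXT", item)


async def upper_service(frames: AsyncIterable[Frame]) -> AsyncIterator[Frame]:
    """Toy 'service': consume TEXT frames and emit transformed TEXT frames."""
    async for frame in frames:
        if frame.kind == "TEXT":
            yield Frame("TEXT", frame.data.upper())
        else:
            yield frame  # pass through frames this stage does not handle


async def drain_to_queue(queue: asyncio.Queue, frames: AsyncIterable[Frame]) -> None:
    """Push every frame onto the queue, then a sentinel marking the end."""
    async for frame in frames:
        await queue.put(frame)
    await queue.put(Frame("END_STREAM", None))


async def main() -> list[Frame]:
    send_queue: asyncio.Queue = asyncio.Queue()
    # Stages compose by nesting: each one wraps the previous async iterable.
    await drain_to_queue(send_queue, upper_service(source(["hello", "world"])))
    out = []
    while not send_queue.empty():
        out.append(send_queue.get_nowait())
    return out


frames = asyncio.run(main())
```

Because each stage is an async generator, a downstream stage can start consuming frames as soon as the upstream stage yields them, without waiting for the whole input to finish.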
## Examples
### Say Something
The base TTS AI service exposes a `.say` method. After creating a transport and TTS service, you can use this method like so:
```
transport = DailyTransportService(...)
tts = AzureTTSService()
await tts.say("hello world", transport.send_queue)
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests
```
This will call the TTS service to render the text to audio frames, then put the audio frames on the transport's send queue. The transport will then send those frames along to the WebRTC session.
## Setting up your editor
### Speak an LLM response
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting.
Given a system prompt contained in a `messages` array, you can emit the LLM's response as audio with a chain like this:
### Emacs
```
transport = DailyTransportService(...) # setup parameters omitted
tts = AzureTTSService()
llm = AzureLLMService()
messages = [...] # system prompt omitted for brevity
You can use [use-package](https://github.com/jwiegley/use-package) to install the [py-autopep8](https://codeberg.org/ideasman42/emacs-py-autopep8) package and configure `autopep8` arguments:
await tts.run_to_queue(
    transport.send_queue,
    llm.run([QueueFrame.LLM_MESSAGES, messages])
)
```elisp
(use-package py-autopep8
  :ensure t
  :defer t
  :hook ((python-mode . py-autopep8-mode))
  :config
  (setq py-autopep8-options '("-a" "-a" "--max-line-length=100")))
```
In this code, the LLM service object sends the messages to Azure's OpenAI implementation, which streams chunks back asynchronously. Those chunks are aggregated by the TTS Service to ensure the best audio response (TTS works best when it gets complete sentences, so it can inflect correctly), then sent to Azure's TTS service, converted to audio frames, and sent to the WebRTC session via the Daily transport.
`autopep8` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
### Pre-cache an LLM response
Sometimes LLMs can be slower than we'd like for natural-feeling communication. Here's an example where we take advantage of the time it takes to speak some pre-defined text to get a head start on the LLM response:
(TK link to 04- sample)
In this sample, we set up a buffer queue to receive the audio frames from the LLM response while we are joining the call, and start an asynchronous task to begin filling this buffer:
```
buffer_queue = asyncio.Queue()
llm_response_task = asyncio.create_task(
    elevenlabs_tts.run_to_queue(
        buffer_queue,
        llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)]),
        True,
    )
)
```
Then, when we've joined the call, we speak the static text:
```
await azure_tts.say("My friend...", transport.send_queue)
```
As that text is being spoken, the asynchronous LLM task continues in the background. When the text is done, we pull the frames off the buffer queue and put them in the transport's `send_queue`:
```
async def buffer_to_send_queue():
    while True:
        frame = await buffer_queue.get()
        await transport.send_queue.put(frame)
        buffer_queue.task_done()
        if frame.frame_type == FrameType.END_STREAM:
            break

await asyncio.gather(llm_response_task, buffer_to_send_queue())
```elisp
(use-package pyvenv-auto
  :ensure t
  :defer t
  :hook ((python-mode . pyvenv-auto-run)))
```
One thing to note here is the last parameter to `run_to_queue` in the first code block above: this causes the `run_to_queue` method to send an `END_STREAM` frame when it's done rendering. This lets us know when to stop our `buffer_to_send_queue` task above.
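The pre-caching pattern can be tried without any of the services. The following is only a sketch under stated assumptions: plain strings stand in for audio frames, the `END_STREAM` string stands in for the real end-of-stream frame, and `render_llm_response` and `demo` are hypothetical helpers, not part of the SDK.

```python
import asyncio

END_STREAM = "END_STREAM"  # hypothetical stand-in for the end-of-stream frame


async def render_llm_response(buffer_queue):
    """Pretend to render an LLM response slowly, then mark the end."""
    for chunk in ("llama", "joke", "audio"):
        await asyncio.sleep(0.01)  # simulated LLM and TTS latency
        await buffer_queue.put(chunk)
    await buffer_queue.put(END_STREAM)


async def buffer_to_send_queue(buffer_queue, send_queue):
    """Move frames to the send queue until the end marker has been forwarded."""
    while True:
        frame = await buffer_queue.get()
        await send_queue.put(frame)
        buffer_queue.task_done()
        if frame == END_STREAM:
            break


async def demo():
    buffer_queue, send_queue = asyncio.Queue(), asyncio.Queue()
    # Start filling the buffer in the background.
    task = asyncio.create_task(render_llm_response(buffer_queue))
    # Meanwhile, the static text goes out first.
    await send_queue.put("static intro audio")
    await asyncio.gather(task, buffer_to_send_queue(buffer_queue, send_queue))
    sent = []
    while not send_queue.empty():
        sent.append(send_queue.get_nowait())
    return sent
```

Without the end marker, the draining loop would wait on `buffer_queue.get()` forever, which is why the final argument to `run_to_queue` matters.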
### Visual Studio Code
Install the
[autopep8](https://marketplace.visualstudio.com/items?itemName=ms-python.autopep8) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, enable formatting on save and configure `autopep8` arguments:
```json
"[python]": {
    "editor.defaultFormatter": "ms-python.autopep8",
    "editor.formatOnSave": true
},
"autopep8.args": [
    "-a",
    "-a",
    "--max-line-length=100"
],
```

6
dev-requirements.txt Normal file

@@ -0,0 +1,6 @@
autopep8==2.0.4
build==1.0.3
pip-tools==7.4.1
pytest==8.1.1
setuptools==69.2.0
setuptools_scm==8.0.4


@@ -4,9 +4,13 @@
Learn about the thinking behind the SDK's design.
## [A Frame's Progress](frame-progress.md)
See how a Frame is processed through a Transport, a Pipeline, and a series of Frame Processors.
## [Example Code](examples/)
The repo includes several example apps in the `src/examples` directory. The docs explain how they work.
The repo includes several example apps in the `examples` directory. The docs explain how they work.
## [API Reference](api/)


@@ -1,2 +1,17 @@
# Daily AI SDK Architecture Guide
## Frames
Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used for control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion.
## FrameProcessors
Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input for an AI Service, and emit chat completions based on message arrays or transform text into audio or images.
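To make the "one frame in, zero or more frames out" contract concrete, here is a minimal sketch of a sentence-joining processor. It is an illustration only: the `Frame`, `TextFrame` and `EndFrame` classes below are simplified stand-ins defined for this example, and `SentenceJoiner`, `collect` and `run_demo` are hypothetical names rather than SDK classes.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class Frame:
    """Simplified stand-in for a base frame type."""


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class EndFrame(Frame):
    """Control frame: no more data is coming."""


class SentenceJoiner:
    """Buffers text fragments and yields one TextFrame per complete sentence."""

    def __init__(self) -> None:
        self._buffer = ""

    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
        if isinstance(frame, TextFrame):
            self._buffer += frame.text
            # Nothing is yielded until the buffer ends with sentence punctuation.
            if self._buffer.rstrip().endswith((".", "?", "!")):
                sentence, self._buffer = self._buffer, ""
                yield TextFrame(sentence)
        else:
            # Frames this processor does not handle pass through unchanged.
            yield frame


async def collect(processor, frames):
    """Feed frames through one processor and gather everything it yields."""
    out = []
    for frame in frames:
        async for produced in processor.process_frame(frame):
            out.append(produced)
    return out


def run_demo():
    frames = [TextFrame("Hi, "), TextFrame("there."), EndFrame()]
    return asyncio.run(collect(SentenceJoiner(), frames))
```

Here the first fragment produces no output, the second completes the sentence and produces one frame, and the control frame is yielded untouched.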
## Pipelines
Pipelines are lists of frame processors that read from a source queue and send the processed frames to a sink queue. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport's send queue as its sink. Placing LLM message frames on the pipeline's source queue will cause the LLM's response to be spoken. See example #2 for an implementation of this.
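The source-to-sink flow can be sketched in a few lines of asyncio. This is not the SDK's `Pipeline` class: `MiniPipeline`, `Upper`, `Exclaim` and the `END` sentinel are hypothetical names used only for this illustration, and plain strings stand in for frames.

```python
import asyncio
from typing import AsyncGenerator

END = object()  # hypothetical sentinel marking the end of the stream


class Upper:
    async def process_frame(self, frame) -> AsyncGenerator[object, None]:
        # Transform strings; pass anything else through unchanged.
        yield frame.upper() if isinstance(frame, str) else frame


class Exclaim:
    async def process_frame(self, frame) -> AsyncGenerator[object, None]:
        yield frame + "!" if isinstance(frame, str) else frame


class MiniPipeline:
    """Reads frames from a source queue, runs them through each processor
    in order, and writes the results to a sink queue."""

    def __init__(self, processors, source, sink):
        self.processors = processors
        self.source = source
        self.sink = sink

    async def _run_through(self, frame, index):
        if index == len(self.processors):
            await self.sink.put(frame)
            return
        # Each processor may yield zero or more frames for the next stage.
        async for produced in self.processors[index].process_frame(frame):
            await self._run_through(produced, index + 1)

    async def run(self):
        while True:
            frame = await self.source.get()
            await self._run_through(frame, 0)
            if frame is END:
                break


async def demo():
    source, sink = asyncio.Queue(), asyncio.Queue()
    pipeline = MiniPipeline([Upper(), Exclaim()], source, sink)
    for frame in ["hello", "world", END]:
        await source.put(frame)
    await pipeline.run()
    results = []
    while not sink.empty():
        results.append(sink.get_nowait())
    return results
```

Swapping the sink for a transport's send queue is what would turn processed frames into something a participant can hear or see.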
## Transports
Transports provide a receive queue, which is input from "the outside world", and a sink queue, which is data that will be sent "to the outside world". The `LocalTransportService` does this with the local camera, mic, display and speaker. The `DailyTransportService` does this with a WebRTC session joined to a Daily.co room.


@@ -16,7 +16,7 @@ if __name__ == "__main__":
### `configure()`
The `configure()` function comes from `src/examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:
The `configure()` function comes from `examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:
```bash
python 01-say-one-thing.py -u https://YOUR_DOMAIN.daily.co/YOUR_ROOM -k YOUR_API_KEY


@@ -1,5 +1,5 @@
# Daily AI SDK Examples
The docs in this folder pair with the example apps located in `src/examples/foundational`. They are designed to serve as a quick reference for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.
The docs in this folder pair with the example apps located in `examples/foundational`. They are designed to serve as a quick reference for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.
To start, you can learn about the overall structure of the examples in [01 - Say One Thing](01-say-one-thing.md).

46
docs/frame-progress.md Normal file

@@ -0,0 +1,46 @@
# A Frame's Progress
1. A user says “Hello, LLM” and the cloud transcription service delivers a transcription to the Transport.
![A transcript frame arrives](images/frame-progress-01.png)
2. The Transport places a Transcription frame in the Pipeline's source queue.
![Frame in source queue](images/frame-progress-02.png)
3. The Pipeline passes the Transcription frame to the first Frame Processor in its list, the LLM User Message Aggregator.
![To UMA](images/frame-progress-03.png)
4. The LLM User Message Aggregator updates the LLM Context with a `{“user”: “Hello, LLM”}` message.
![Update context](images/frame-progress-04.png)
5. The LLM User Message Aggregator yields an LLM Message Frame, containing the updated LLM Context. The Pipeline passes this frame to the LLM Frame Processor.
![Update context](images/frame-progress-05.png)
6. The LLM Frame Processor creates a streaming chat completion based on the LLM context and yields the first chunk of a response, a Text Frame with the value “Hi, “. The Pipeline passes this frame to the TTS Frame Processor. The TTS Frame Processor aggregates this response but doesn't yield anything yet, because it's waiting for a full sentence.
![LLM yields Text](images/frame-progress-06.png)
7. The LLM Frame Processor yields another Text Frame with the value “there.”. The Pipeline passes this frame to the TTS Frame Processor.
![LLM yields more Text](images/frame-progress-07.png)
8. The TTS Frame Processor now has a full sentence, so it starts streaming audio based on “Hi, there.” It yields the first chunk of streaming audio as an Audio frame, which the Pipeline passes to the LLM Assistant Message Aggregator.
![TTS yields Audio](images/frame-progress-08.png)
9. The LLM Assistant Message Aggregator doesn't do anything with Audio frames, so it immediately yields the frame, unchanged. This is the convention for all Frame Processors: frames that the processor doesn't process should be immediately yielded.
![pass-through](images/frame-progress-09.png)
10. The Pipeline places the first Audio frame in its sink queue, which is being watched by the Transport. Since the frame is now in a queue, the Pipeline can continue processing other frames. Note that the source and sink queues form a sort of “boundary of concurrent processing” between a Pipeline and the outside world. In a Pipeline, Frames are processed sequentially; once a Frame is on a queue it can be processed in parallel with the frames being processed by the Pipeline. TODO: link to a more in-depth section about this.
![sink queue](images/frame-progress-10.png)
11. The TTS Frame Processor yields another Audio frame as the Transport transmits the first Audio frame.
![parallel audio](images/frame-progress-11.png)
12. As before, the LLM Assistant Message Aggregator immediately yields the Audio frame and the Pipeline places the Audio frame in the sink queue.
![sink queue 2](images/frame-progress-12.png)
13. The TTS Frame Processor has no more frames to yield. The LLM Frame Processor emits an LLM Response End Frame, which the Pipeline passes to the TTS Frame Processor.
![response end](images/frame-progress-13.png)
14. The TTS Frame Processor immediately yields the LLM Response End Frame, so the Pipeline passes it along to the LLM Assistant Message Aggregator. The LLM Assistant Message Aggregator updates the LLM Context with the full response from the LLM. TODO: this walkthrough omits that the TTS Frame Processor also yields the Text frames that the LLM emitted, so that the LLM Assistant Message Aggregator can accumulate them.
![response end](images/frame-progress-14.png)
15. The system is quiet, and waiting for the next message from the Transport.
![response end](images/frame-progress-15.png)
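The “boundary of concurrent processing” from step 10 can be sketched with a single asyncio queue. This is an illustration under stated assumptions, not SDK code: `pipeline_side`, `transport_side` and `demo` are hypothetical names, strings stand in for Audio frames, and `None` is used as a made-up end marker.

```python
import asyncio


async def pipeline_side(sink, log):
    """Queue two audio frames without waiting for them to be transmitted."""
    for n in (1, 2):
        log.append(f"pipeline queued audio {n}")
        await sink.put(f"audio {n}")
    await sink.put(None)  # hypothetical end marker


async def transport_side(sink, log):
    """Watch the sink queue and 'transmit' each frame as it arrives."""
    while True:
        frame = await sink.get()
        if frame is None:
            break
        await asyncio.sleep(0.01)  # simulated network transmission
        log.append(f"transport sent {frame}")


async def demo():
    sink = asyncio.Queue()
    log = []
    await asyncio.gather(pipeline_side(sink, log), transport_side(sink, log))
    return log
```

Because the queue decouples the two sides, the pipeline side can queue its second frame before the transport side has finished sending the first.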

15 binary image files not shown (sizes from 91 KiB to 117 KiB).

25
dot-env.template Normal file

@@ -0,0 +1,25 @@
# Anthropic
ANTHROPIC_API_KEY=...
# Azure
SPEECH_KEY=...
SPEECH_REGION=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
# ElevenLabs
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
# Fal
FAL_KEY_ID=...
FAL_KEY_SECRET=...
# PlayHT
PLAY_HT_USER_ID=...
PLAY_HT_API_KEY=...
# OpenAI
OPENAI_API_KEY=...


@@ -0,0 +1,54 @@
import asyncio
import aiohttp
import logging
import os

from dailyai.pipeline.frames import EndFrame, TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService

from runner import configure

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            None,
            "Say One Thing",
            mic_enabled=True,
        )

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        pipeline = Pipeline([tts])

        # Register an event handler so we can play the audio when the
        # participant joins.
        @transport.event_handler("on_participant_joined")
        async def on_participant_joined(transport, participant):
            if participant["info"]["isLocal"]:
                return

            participant_name = participant["info"]["userName"] or ''
            await pipeline.queue_frames([TextFrame("Hello there, " + participant_name + "!"), EndFrame()])

        await transport.run(pipeline)
        del tts


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))


@@ -1,17 +1,24 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.local_transport_service import LocalTransportService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 1
transport = LocalTransportService(
duration_minutes=meeting_duration_minutes,
mic_enabled=True
transport = LocalTransport(
duration_minutes=meeting_duration_minutes, mic_enabled=True
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
@@ -21,10 +28,7 @@ async def main():
async def say_something():
await asyncio.sleep(1)
await tts.say(
"Hello there.",
transport.send_queue,
)
await transport.say("Hello there.", tts)
await transport.stop_when_done()
await asyncio.gather(transport.run(), say_something())


@@ -0,0 +1,59 @@
import asyncio
import os
import logging

import aiohttp

from dailyai.pipeline.frames import EndFrame, LLMMessagesFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAILLMService

from runner import configure

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            None,
            "Say One Thing From an LLM",
            mic_enabled=True,
        )

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4-turbo-preview")

        messages = [
            {
                "role": "system",
                "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
            }]

        pipeline = Pipeline([llm, tts])

        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            await pipeline.queue_frames([LLMMessagesFrame(messages), EndFrame()])

        await transport.run(pipeline)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))


@@ -0,0 +1,57 @@
import asyncio
import aiohttp
import logging
import os

from dailyai.pipeline.frames import TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.fal_ai_services import FalImageGenService

from runner import configure

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            None,
            "Show a still frame image",
            camera_enabled=True,
            camera_width=1024,
            camera_height=1024,
            duration_minutes=1
        )

        imagegen = FalImageGenService(
            image_size="square_hd",
            aiohttp_session=session,
            key_id=os.getenv("FAL_KEY_ID"),
            key_secret=os.getenv("FAL_KEY_SECRET"),
        )

        pipeline = Pipeline([imagegen])

        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            # Note that we do not put an EndFrame() item in the pipeline for this demo.
            # This means that the bot will stay in the channel until it times out.
            # An EndFrame() in the pipeline would cause the transport to shut
            # down.
            await pipeline.queue_frames(
                [TextFrame("a cat in the style of picasso")]
            )

        await transport.run(pipeline)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))


@@ -1,25 +1,33 @@
import asyncio
import aiohttp
import logging
import os
import tkinter as tk
from dailyai.pipeline.frames import TextFrame
from dailyai.pipeline.frames import TextFrame, EndFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.local_transport_service import LocalTransportService
from dailyai.transports.local_transport import LocalTransport
local_joined = False
participant_joined = False
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 2
tk_root = tk.Tk()
tk_root.title("Calendar")
transport = LocalTransportService(
tk_root.title("dailyai")
transport = LocalTransport(
tk_root=tk_root,
mic_enabled=True,
mic_enabled=False,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
@@ -27,16 +35,14 @@ async def main():
)
imagegen = FalImageGenService(
image_size="1024x1024",
image_size="square_hd",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
image_task = asyncio.create_task(
imagegen.run_to_queue(
transport.send_queue, [TextFrame("a cat in the style of picasso")]
)
)
pipeline = Pipeline([imagegen])
await pipeline.queue_frames([TextFrame("a cat in the style of picasso")])
async def run_tk():
while not transport._stop_threads.is_set():
@@ -44,7 +50,8 @@ async def main():
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(transport.run(), image_task, run_tk())
await asyncio.gather(transport.run(pipeline, override_pipeline_source_queue=False), run_tk())
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,85 @@
import asyncio
import logging
import os

import aiohttp

from dailyai.pipeline.merge_pipeline import SequentialMergePipeline
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
from dailyai.pipeline.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService

from runner import configure

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)


async def main(room_url: str):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            None,
            "Static And Dynamic Speech",
            duration_minutes=1,
            mic_enabled=True,
            mic_sample_rate=16000,
        )

        llm = AzureLLMService(
            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
            model=os.getenv("AZURE_CHATGPT_MODEL"),
        )
        azure_tts = AzureTTSService(
            api_key=os.getenv("AZURE_SPEECH_API_KEY"),
            region=os.getenv("AZURE_SPEECH_REGION"),
        )
        deepgram_tts = DeepgramTTSService(
            aiohttp_session=session,
            api_key=os.getenv("DEEPGRAM_API_KEY"),
        )
        elevenlabs_tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        messages = [{"role": "system",
                     "content": "tell the user a joke about llamas"}]

        # Start a task to run the LLM to create a joke, and convert the LLM output to audio frames. This task
        # will run in parallel with generating and speaking the audio for static text, so there's no delay to
        # speak the LLM response.
        llm_pipeline = Pipeline([llm, elevenlabs_tts])
        await llm_pipeline.queue_frames([LLMMessagesFrame(messages), EndPipeFrame()])

        simple_tts_pipeline = Pipeline([azure_tts])
        await simple_tts_pipeline.queue_frames(
            [
                TextFrame("My friend the LLM is going to tell a joke about llamas."),
                EndPipeFrame(),
            ]
        )

        merge_pipeline = SequentialMergePipeline(
            [simple_tts_pipeline, llm_pipeline])

        await asyncio.gather(
            transport.run(merge_pipeline),
            simple_tts_pipeline.run_pipeline(),
            llm_pipeline.run_pipeline(),
        )


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))


@@ -0,0 +1,146 @@
import asyncio
import aiohttp
import os
import logging

from dataclasses import dataclass
from typing import AsyncGenerator

from dailyai.pipeline.aggregators import (
    GatedAggregator,
    LLMFullResponseAggregator,
    ParallelPipeline,
    SentenceAggregator,
)
from dailyai.pipeline.frames import (
    Frame,
    TextFrame,
    EndFrame,
    ImageFrame,
    LLMMessagesFrame,
    LLMResponseStartFrame,
)
from dailyai.pipeline.frame_processor import FrameProcessor

from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService

from runner import configure

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)


@dataclass
class MonthFrame(Frame):
    month: str


class MonthPrepender(FrameProcessor):
    def __init__(self):
        self.most_recent_month = "Placeholder, month frame not yet received"
        self.prepend_to_next_text_frame = False

    async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
        if isinstance(frame, MonthFrame):
            self.most_recent_month = frame.month
        elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):
            yield TextFrame(f"{self.most_recent_month}: {frame.text}")
            self.prepend_to_next_text_frame = False
        elif isinstance(frame, LLMResponseStartFrame):
            self.prepend_to_next_text_frame = True
            yield frame
        else:
            yield frame


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            None,
            "Month Narration Bot",
            mic_enabled=True,
            camera_enabled=True,
            mic_sample_rate=16000,
            camera_width=1024,
            camera_height=1024,
        )

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4-turbo-preview")

        imagegen = FalImageGenService(
            image_size="square_hd",
            aiohttp_session=session,
            key_id=os.getenv("FAL_KEY_ID"),
            key_secret=os.getenv("FAL_KEY_SECRET"),
        )

        gated_aggregator = GatedAggregator(
            gate_open_fn=lambda frame: isinstance(
                frame, ImageFrame), gate_close_fn=lambda frame: isinstance(
                frame, LLMResponseStartFrame), start_open=False, )

        sentence_aggregator = SentenceAggregator()
        month_prepender = MonthPrepender()
        llm_full_response_aggregator = LLMFullResponseAggregator()

        pipeline = Pipeline(
            processors=[
                llm,
                sentence_aggregator,
                ParallelPipeline(
                    [[month_prepender, tts], [llm_full_response_aggregator, imagegen]]
                ),
                gated_aggregator,
            ],
        )

        frames = []
        for month in [
            "January",
            "February",
            "March",
            "April",
            "May",
            "June",
            "July",
            "August",
            "September",
            "October",
            "November",
            "December",
        ]:
            messages = [
                {
                    "role": "system",
                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
                }
            ]
            frames.append(MonthFrame(month))
            frames.append(LLMMessagesFrame(messages))

        frames.append(EndFrame())
        await pipeline.queue_frames(frames)

        await transport.run(pipeline, override_pipeline_source_queue=False)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))


@@ -0,0 +1,88 @@
import asyncio
import aiohttp
import logging
import os

from dailyai.pipeline.frames import LLMMessagesFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.ai_services import FrameLogger
from dailyai.pipeline.aggregators import (
    LLMAssistantContextAggregator,
    LLMUserContextAggregator,
)

from runner import configure

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)


async def main(room_url: str, token):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            duration_minutes=5,
            start_transcription=True,
            mic_enabled=True,
            mic_sample_rate=16000,
            camera_enabled=False,
            vad_enabled=True,
        )

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4-turbo-preview")

        fl = FrameLogger("Inner")
        fl2 = FrameLogger("Outer")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
            },
        ]

        tma_in = LLMUserContextAggregator(
            messages, transport._my_participant_id)
        tma_out = LLMAssistantContextAggregator(
            messages, transport._my_participant_id
        )

        pipeline = Pipeline(
            processors=[
                fl,
                tma_in,
                llm,
                fl2,
                tts,
                tma_out,
            ],
        )

        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            # Kick off the conversation.
            messages.append(
                {"role": "system", "content": "Please introduce yourself to the user."})
            await pipeline.queue_frames([LLMMessagesFrame(messages)])

        transport.transcription_settings["extra"]["endpointing"] = True
        transport.transcription_settings["extra"]["punctuate"] = True

        await transport.run(pipeline)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))


@@ -0,0 +1,76 @@
import asyncio
import aiohttp
import logging
import os

from dailyai.pipeline.aggregators import (
    LLMResponseAggregator,
    UserResponseAggregator,
)
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.ai_services import FrameLogger
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService

from runner import configure

from dotenv import load_dotenv
load_dotenv(override=True)

logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)


async def main(room_url: str, token):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            duration_minutes=5,
            start_transcription=True,
            mic_enabled=True,
            mic_sample_rate=16000,
            camera_enabled=False,
            vad_enabled=True,
        )

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4-turbo-preview")

        pipeline = Pipeline([FrameLogger(), llm, FrameLogger(), tts])

        @transport.event_handler("on_first_other_participant_joined")
        async def on_first_other_participant_joined(transport):
            await transport.say("Hi, I'm listening!", tts)

        async def run_conversation():
            messages = [
                {
                    "role": "system",
                    "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
                },
            ]

            await transport.run_interruptible_pipeline(
                pipeline,
                post_processor=LLMResponseAggregator(messages),
                pre_processor=UserResponseAggregator(messages),
            )

        transport.transcription_settings["extra"]["punctuate"] = False
        await asyncio.gather(transport.run(), run_conversation())


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))


@@ -1,19 +1,30 @@
from typing import Tuple
import aiohttp
import asyncio
import logging
import os
from dailyai.pipeline.aggregators import SentenceAggregator
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.pipeline.frames import AudioFrame, ImageFrame
from dailyai.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from examples.foundational.support.runner import configure
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
transport = DailyTransport(
room_url,
None,
"Respond bot",
@@ -22,62 +33,83 @@ async def main(room_url: str):
mic_sample_rate=16000,
camera_enabled=True,
camera_width=1024,
camera_height=1024
camera_height=1024,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
tts1 = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"))
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts2 = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl")
voice_id="jBpfuIE2acCO8z3wKNLl",
)
dalle = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"))
key_secret=os.getenv("FAL_KEY_SECRET"),
)
bot1_messages = [
{"role": "system", "content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long."},
{
"role": "system",
"content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long.",
},
]
bot2_messages = [
{
"role": "system",
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich."},
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich.",
},
]
async def get_bot1_statement():
# Run the LLMs synchronously for the back-and-forth
bot1_msg = await llm.run_llm(bot1_messages)
print(f"bot1_msg: {bot1_msg}")
if bot1_msg:
bot1_messages.append({"role": "assistant", "content": bot1_msg})
bot2_messages.append({"role": "user", "content": bot1_msg})
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
"""This function streams text from the LLM and uses the TTS service to convert
that text to speech as it's received. """
source_queue = asyncio.Queue()
sink_queue = asyncio.Queue()
sentence_aggregator = SentenceAggregator()
pipeline = Pipeline(
[llm, sentence_aggregator, tts1], source_queue, sink_queue
)
await source_queue.put(LLMMessagesFrame(messages))
await source_queue.put(EndFrame())
await pipeline.run_pipeline()
message = ""
all_audio = bytearray()
async for audio in tts1.run_tts(bot1_msg):
all_audio.extend(audio)
while sink_queue.qsize():
frame = sink_queue.get_nowait()
if isinstance(frame, TextFrame):
message += frame.text
elif isinstance(frame, AudioFrame):
all_audio.extend(frame.data)
return all_audio
return (message, all_audio)
async def get_bot1_statement():
message, audio = await get_text_and_audio(bot1_messages)
bot1_messages.append({"role": "assistant", "content": message})
bot2_messages.append({"role": "user", "content": message})
return audio
async def get_bot2_statement():
# Run the LLMs synchronously for the back-and-forth
bot2_msg = await llm.run_llm(bot2_messages)
print(f"bot2_msg: {bot2_msg}")
if bot2_msg:
bot2_messages.append({"role": "assistant", "content": bot2_msg})
bot1_messages.append({"role": "user", "content": bot2_msg})
message, audio = await get_text_and_audio(bot2_messages)
all_audio = bytearray()
async for audio in tts2.run_tts(bot2_msg):
all_audio.extend(audio)
bot2_messages.append({"role": "assistant", "content": message})
bot1_messages.append({"role": "user", "content": message})
return all_audio
return audio
async def argue():
for i in range(100):

View File

@@ -1,37 +1,42 @@
import asyncio
import logging
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
from dailyai.pipeline.pipeline import Pipeline
from examples.foundational.support.runner import configure
from runner import configure
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
transport = DailyTransportService(
transport = DailyTransport(
room_url,
None,
"Transcription bot",
start_transcription=True,
start_transcription=False,
mic_enabled=False,
camera_enabled=False,
speaker_enabled=True
speaker_enabled=True,
)
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
pipeline = Pipeline([stt])
pipeline.set_sink(transcription_output_queue)
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while True:
item = await transcription_output_queue.get()
print(item.text)
async def handle_speaker():
await stt.run_to_queue(
transcription_output_queue,
transport.get_receive_frames()
)
await asyncio.gather(transport.run(), handle_speaker(), handle_transcription())
await asyncio.gather(transport.run(pipeline), handle_transcription())
if __name__ == "__main__":

View File

@@ -0,0 +1,53 @@
import asyncio
import logging
from dailyai.pipeline.frames import EndFrame, TranscriptionFrame
from dailyai.transports.local_transport import LocalTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
from dailyai.pipeline.pipeline import Pipeline
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
meeting_duration_minutes = 1
transport = LocalTransport(
mic_enabled=False,
camera_enabled=False,
speaker_enabled=True,
duration_minutes=meeting_duration_minutes,
start_transcription=False,
)
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
transport_done = asyncio.Event()
pipeline = Pipeline([stt])
pipeline.set_sink(transcription_output_queue)
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while not transport_done.is_set():
item = await transcription_output_queue.get()
print("got item from queue", item)
if isinstance(item, TranscriptionFrame):
print(item.text)
elif isinstance(item, EndFrame):
break
print("handle_transcription done")
async def run_until_done():
await transport.run(pipeline)
transport_done.set()
print("run_until_done done")
await asyncio.gather(run_until_done(), handle_transcription())
if __name__ == "__main__":
asyncio.run(main())

Binary file not shown.

Binary image files (10) not shown; each has the same size before and after: 871, 868, 868, 870, 871, 871, 872, 868, 33, and 30 KiB.

View File

@@ -4,15 +4,15 @@ import time
import urllib
import requests
from dotenv import load_dotenv
load_dotenv()
def configure():
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
"-u",
"--url",
type=str,
required=False,
help="URL of the Daily room to join")
parser.add_argument(
"-k",
"--apikey",
@@ -33,20 +33,25 @@ def configure():
if not key:
raise Exception("No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
# Create a meeting token for the given room with an expiration 1 hour in the future.
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {key}"},
headers={
"Authorization": f"Bearer {key}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]

View File

@@ -1,23 +1,30 @@
import aiohttp
import argparse
import asyncio
import logging
import tkinter as tk
import os
from dailyai.pipeline.frames import AudioFrame, ImageFrame
from dailyai.services.azure_ai_services import AzureLLMService
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.local_transport_service import LocalTransportService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url):
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 5
tk_root = tk.Tk()
tk_root.title("Calendar")
transport = LocalTransportService(
transport = LocalTransport(
mic_enabled=True,
camera_enabled=True,
camera_width=1024,
@@ -26,16 +33,16 @@ async def main(room_url):
tk_root=tk_root,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV",
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
dalle = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
@@ -44,7 +51,8 @@ async def main(room_url):
)
# Get a complete audio chunk from the given text. Splitting this into its own
# coroutine lets us ensure proper ordering of the audio chunks on the send queue.
# coroutine lets us ensure proper ordering of the audio chunks on the
# send queue.
async def get_all_audio(text):
all_audio = bytearray()
async for audio in tts.run_tts(text):
@@ -53,12 +61,8 @@ async def main(room_url):
return all_audio
async def get_month_data(month):
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
messages = [{"role": "system", "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {
month}. Include only the image description with no preamble. Limit the description to one sentence, please.", }]
image_description = await llm.run_llm(messages)
if not image_description:
@@ -66,10 +70,9 @@ async def main(room_url):
to_speak = f"{month}: {image_description}"
audio_task = asyncio.create_task(get_all_audio(to_speak))
image_task = asyncio.create_task(dalle.run_image_gen(image_description))
(audio, image_data) = await asyncio.gather(
audio_task, image_task
)
image_task = asyncio.create_task(
dalle.run_image_gen(image_description))
(audio, image_data) = await asyncio.gather(audio_task, image_task)
return {
"month": month,
@@ -97,7 +100,8 @@ async def main(room_url):
async def show_images():
# This will play the months in the order they're completed. The benefit
# is we'll have as little delay as possible before the first month, and
# likely no delay between months, but the months won't display in order.
# likely no delay between months, but the months won't display in
# order.
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
if data:
@@ -119,16 +123,12 @@ async def main(room_url):
tk_root.update_idletasks()
await asyncio.sleep(0.1)
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
month_tasks = [
asyncio.create_task(
get_month_data(month)) for month in months]
await asyncio.gather(transport.run(), show_images(), run_tk())
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
args, unknown = parser.parse_known_args()
asyncio.run(main(args.url))
asyncio.run(main())

View File

@@ -1,22 +1,29 @@
import argparse
import asyncio
import os
import logging
from typing import AsyncGenerator
import aiohttp
import requests
import time
import urllib.parse
from PIL import Image
from dailyai.pipeline.frames import ImageFrame, Frame
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.frames import ImageFrame, Frame
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.ai_services import AIService
from dailyai.pipeline.aggregators import LLMAssistantContextAggregator, LLMUserContextAggregator
from dailyai.pipeline.aggregators import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
)
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from examples.foundational.support.runner import configure
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class ImageSyncAggregator(AIService):
@@ -35,7 +42,7 @@ class ImageSyncAggregator(AIService):
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
transport = DailyTransport(
room_url,
token,
"Respond bot",
@@ -47,18 +54,22 @@ async def main(room_url: str, token):
transport._mic_enabled = True
transport._mic_sample_rate = 16000
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
img = FalImageGenService(
image_size="1024x1024",
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"))
key_secret=os.getenv("FAL_KEY_SECRET"),
)
async def get_images():
get_speaking_task = asyncio.create_task(
@@ -80,30 +91,26 @@ async def main(room_url: str, token):
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id
)
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
os.path.join(
os.path.dirname(__file__), "assets", "speaking.png"), os.path.join(
os.path.dirname(__file__), "assets", "waiting.png"), )
await tts.run_to_queue(
transport.send_queue,
image_sync_aggregator.run(
tma_out.run(
llm.run(
tma_in.run(
transport.get_receive_frames()
)
)
)
)
tma_out.run(llm.run(tma_in.run(transport.get_receive_frames())))
),
)
transport.transcription_settings["extra"]["punctuate"] = True

View File

@@ -1,36 +1,45 @@
import aiohttp
import asyncio
import logging
import os
import random
from typing import AsyncGenerator
from PIL import Image
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import LLMUserContextAggregator, LLMAssistantContextAggregator
from dailyai.pipeline.aggregators import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from dailyai.pipeline.frames import (
Frame,
TextFrame,
ImageFrame,
SpriteFrame,
TranscriptionQueueFrame,
TranscriptionFrame,
)
from dailyai.services.ai_services import AIService
from examples.foundational.support.runner import configure
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
sprites = {}
image_files = [
'sc-default.png',
'sc-talk.png',
'sc-listen-1.png',
'sc-think-1.png',
'sc-think-2.png',
'sc-think-3.png',
'sc-think-4.png'
"sc-default.png",
"sc-talk.png",
"sc-listen-1.png",
"sc-think-1.png",
"sc-think-2.png",
"sc-think-3.png",
"sc-think-4.png",
]
script_dir = os.path.dirname(__file__)
@@ -47,16 +56,18 @@ for file in image_files:
# When the bot isn't talking, show a static image of the cat listening
quiet_frame = ImageFrame("", sprites["sc-listen-1.png"])
# When the bot is talking, build an animation from two sprites
talking_list = [sprites['sc-default.png'], sprites['sc-talk.png']]
talking_list = [sprites["sc-default.png"], sprites["sc-talk.png"]]
talking = [random.choice(talking_list) for x in range(30)]
talking_frame = SpriteFrame(images=talking)
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM is processing
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM
# is processing
thinking_list = [
sprites['sc-think-1.png'],
sprites['sc-think-2.png'],
sprites['sc-think-3.png'],
sprites['sc-think-4.png']]
sprites["sc-think-1.png"],
sprites["sc-think-2.png"],
sprites["sc-think-3.png"],
sprites["sc-think-4.png"],
]
thinking_frame = SpriteFrame(images=thinking_list)
@@ -65,7 +76,7 @@ class TranscriptFilter(AIService):
self.bot_participant_id = bot_participant_id
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TranscriptionQueueFrame):
if isinstance(frame, TranscriptionFrame):
if frame.participantId != self.bot_participant_id:
yield frame
@@ -105,7 +116,7 @@ class ImageSyncAggregator(AIService):
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
transport = DailyTransport(
room_url,
token,
"Santa Cat",
@@ -115,7 +126,7 @@ async def main(room_url: str, token):
mic_sample_rate=16000,
camera_enabled=True,
camera_width=720,
camera_height=1280
camera_height=1280,
)
transport._mic_enabled = True
transport._mic_sample_rate = 16000
@@ -123,28 +134,34 @@ async def main(room_url: str, token):
transport._camera_width = 720
transport._camera_height = 1280
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl")
voice_id="jBpfuIE2acCO8z3wKNLl",
)
isa = ImageSyncAggregator()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
await tts.say("Hi! If you want to talk to me, just say 'hey Santa Cat'.", transport.send_queue)
await tts.say(
"Hi! If you want to talk to me, just say 'hey Santa Cat'.",
transport.send_queue,
)
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long."},
{
"role": "system",
"content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id
)
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
@@ -156,15 +173,10 @@ async def main(room_url: str, token):
tma_out.run(
llm.run(
tma_in.run(
ncf.run(
tf.run(
transport.get_receive_frames()
)
)
)
ncf.run(tf.run(transport.get_receive_frames())))
)
)
)
),
)
async def starting_image():

View File

@@ -4,25 +4,33 @@ import logging
import os
import wave
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import LLMContextAggregator, LLMUserContextAggregator, LLMAssistantContextAggregator
from dailyai.pipeline.aggregators import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesQueueFrame
from dailyai.pipeline.frames import (
Frame,
AudioFrame,
LLMResponseEndFrame,
LLMMessagesFrame,
)
from typing import AsyncGenerator
from examples.foundational.support.runner import configure
from runner import configure
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s") # or whatever
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
sounds = {}
sound_files = [
'ding1.wav',
'ding2.wav'
]
sound_files = ["ding1.wav", "ding2.wav"]
script_dir = os.path.dirname(__file__)
@@ -54,7 +62,7 @@ class InboundSoundEffectWrapper(AIService):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesQueueFrame):
if isinstance(frame, LLMMessagesFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
@@ -64,24 +72,25 @@ class InboundSoundEffectWrapper(AIService):
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransportService(
transport = DailyTransport(
room_url,
token,
"Respond bot",
duration_minutes=5,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False
camera_enabled=False,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV")
voice_id="ErXwobaYiN019PkySvjV",
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport):
@@ -90,12 +99,14 @@ async def main(room_url: str, token):
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id
)
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
@@ -111,15 +122,13 @@ async def main(room_url: str, token):
llm.run(
fl2.run(
in_sound.run(
tma_in.run(
transport.get_receive_frames()
)
tma_in.run(transport.get_receive_frames())
)
)
)
)
)
)
),
)
transport.transcription_settings["extra"]["punctuate"] = True

View File

@@ -0,0 +1,25 @@
syntax = "proto3";
package dailyai_proto;
message TextFrame {
string text = 1;
}
message AudioFrame {
bytes audio = 1;
}
message TranscriptionFrame {
string text = 1;
string participant_id = 2;
string timestamp = 3;
}
message Frame {
oneof frame {
TextFrame text = 1;
AudioFrame audio = 2;
TranscriptionFrame transcription = 3;
}
}

View File

@@ -0,0 +1,134 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="//cdn.jsdelivr.net/npm/protobufjs@7.X.X/dist/protobuf.min.js"></script>
<title>WebSocket Audio Stream</title>
</head>
<body>
<h1>WebSocket Audio Stream</h1>
<button id="startAudioBtn">Start Audio</button>
<button id="stopAudioBtn">Stop Audio</button>
<script>
const SAMPLE_RATE = 16000;
const BUFFER_SIZE = 8192;
const MIN_AUDIO_SIZE = 6400;
let audioContext;
let microphoneStream;
let scriptProcessor;
let source;
let frame;
let audioChunks = [];
let isPlaying = false;
let ws;
const proto = protobuf.load("frames.proto", (err, root) => {
if (err) throw err;
frame = root.lookupType("dailyai_proto.Frame");
});
function initWebSocket() {
ws = new WebSocket('ws://localhost:8765');
ws.addEventListener('open', () => console.log('WebSocket connection established.'));
ws.addEventListener('message', handleWebSocketMessage);
ws.addEventListener('close', (event) => console.log("WebSocket connection closed.", event.code, event.reason));
ws.addEventListener('error', (event) => console.error('WebSocket error:', event));
}
async function handleWebSocketMessage(event) {
const arrayBuffer = await event.data.arrayBuffer();
enqueueAudioFromProto(arrayBuffer);
}
function enqueueAudioFromProto(arrayBuffer) {
const parsedFrame = frame.decode(new Uint8Array(arrayBuffer));
if (!parsedFrame?.audio) return false;
const frameCount = parsedFrame.audio.data.length / 2;
const audioOutBuffer = audioContext.createBuffer(1, frameCount, SAMPLE_RATE);
const nowBuffering = audioOutBuffer.getChannelData(0);
const view = new Int16Array(parsedFrame.audio.data.buffer);
for (let i = 0; i < frameCount; i++) {
const word = view[i];
nowBuffering[i] = ((word + 32768) % 65536 - 32768) / 32768.0;
}
audioChunks.push(audioOutBuffer);
if (!isPlaying) playNextChunk();
}
function playNextChunk() {
if (audioChunks.length === 0) {
isPlaying = false;
return;
}
isPlaying = true;
const audioOutBuffer = audioChunks.shift();
const source = audioContext.createBufferSource();
source.buffer = audioOutBuffer;
source.connect(audioContext.destination);
source.onended = playNextChunk;
source.start();
}
function startAudio() {
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
alert('getUserMedia is not supported in your browser.');
return;
}
navigator.mediaDevices.getUserMedia({ audio: true })
.then((stream) => {
microphoneStream = stream;
audioContext = new (window.AudioContext || window.webkitAudioContext)();
scriptProcessor = audioContext.createScriptProcessor(BUFFER_SIZE, 1, 1);
source = audioContext.createMediaStreamSource(stream);
source.connect(scriptProcessor);
scriptProcessor.connect(audioContext.destination);
const audioBuffer = [];
const skipRatio = Math.floor(audioContext.sampleRate / (SAMPLE_RATE * 2));
scriptProcessor.onaudioprocess = (event) => {
const rawLeftChannelData = event.inputBuffer.getChannelData(0);
for (let i = 0; i < rawLeftChannelData.length; i += skipRatio) {
const normalized = ((rawLeftChannelData[i] * 32768.0) + 32768) % 65536 - 32768;
const swappedBytes = ((normalized & 0xff) << 8) | ((normalized >> 8) & 0xff);
audioBuffer.push(swappedBytes);
}
if (audioBuffer.length >= MIN_AUDIO_SIZE) {
const audioFrame = frame.create({ audio: { audio: audioBuffer.slice(0, MIN_AUDIO_SIZE) } });
const encodedFrame = new Uint8Array(frame.encode(audioFrame).finish());
ws.send(encodedFrame);
audioBuffer.splice(0, MIN_AUDIO_SIZE);
}
};
initWebSocket();
})
.catch((error) => console.error('Error accessing microphone:', error));
}
function stopAudio() {
if (ws) {
ws.close();
scriptProcessor.disconnect();
source.disconnect();
ws = undefined;
}
}
document.getElementById('startAudioBtn').addEventListener('click', startAudio);
document.getElementById('stopAudioBtn').addEventListener('click', stopAudio);
</script>
</body>
</html>

View File

@@ -0,0 +1,50 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.frames import TextFrame, TranscriptionFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.transports.websocket_transport import WebsocketTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class WhisperTranscriber(FrameProcessor):
async def process_frame(self, frame):
if isinstance(frame, TranscriptionFrame):
print(f"Transcribed: {frame.text}")
else:
yield frame
async def main():
async with aiohttp.ClientSession() as session:
transport = WebsocketTransport(
mic_enabled=True,
speaker_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([
WhisperSTTService(),
WhisperTranscriber(),
tts,
])
@transport.on_connection
async def queue_frame():
await pipeline.queue_frames([TextFrame("Hello there!")])
await transport.run(pipeline)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -5,7 +5,7 @@ import time
import urllib.parse
import random
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.frames import Frame, FrameType
from dailyai.services.fal_ai_services import FalImageGenService
@@ -17,7 +17,7 @@ async def main(room_url: str, token):
global llm
global tts
transport = DailyTransportService(
transport = DailyTransport(
room_url,
token,
"Imagebot",
@@ -45,14 +45,17 @@ async def main(room_url: str, token):
print(f"finder: {finder}")
if finder >= 0:
async for audio in tts.run_tts(f"Resetting."):
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
transport.output_queue.put(
Frame(FrameType.AUDIO_FRAME, audio))
sentence = ""
continue
# todo: we could differentiate between transcriptions from different participants
# todo: we could differentiate between transcriptions from
# different participants
sentence += f" {message['text']}"
print(f"sentence is now: {sentence}")
# TODO: Cache this audio
phrase = random.choice(["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
phrase = random.choice(
["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
async for audio in tts.run_tts(phrase):
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
img_result = img.run_image_gen(sentence, "1024x1024")
@@ -82,8 +85,11 @@ async def main(room_url: str, token):
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
"-u",
"--url",
type=str,
required=True,
help="URL of the Daily room to join")
parser.add_argument(
"-k",
"--apikey",
@@ -94,20 +100,25 @@ if __name__ == "__main__":
args, unknown = parser.parse_known_args()
# Create a meeting token for the given room with an expiration 1 hour in the future.
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(args.url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {args.apikey}"},
headers={
"Authorization": f"Bearer {args.apikey}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]

View File

@@ -3,14 +3,17 @@ import asyncio
import os
import wave
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.aggregators import LLMContextAggregator
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesQueueFrame
from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesFrame
from typing import AsyncGenerator
from examples.foundational.support.runner import configure
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
sounds = {}
sound_files = [
@@ -48,7 +51,7 @@ class InboundSoundEffectWrapper(AIService):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesQueueFrame):
if isinstance(frame, LLMMessagesFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
@@ -63,7 +66,7 @@ async def main(room_url: str, token, phone):
global llm
global tts
transport = DailyTransportService(
transport = DailyTransport(
room_url,
token,
"Respond bot",

34
examples/server/README.md Normal file
View File

@@ -0,0 +1,34 @@
# Server Example
Use this server app to quickly host a bot on the web:
```
flask --app daily-bot-manager.py --debug run
```
It's currently configured to serve example apps defined in the APPS constant in the server file:
```
chatbot
patient-intake
storybot
translator
```
Once the server is started, you can create a bot instance by opening `http://127.0.0.1:5000/start/chatbot` in a browser, and the server will do the following:
- Create a new, randomly-named Daily room with `DAILY_API_KEY` from your .env file or environment
- Start an instance of `chatbot.py` and connect it to that room
- Redirect your browser to the room (the server code sends a 302)
### Options
The server supports several options, which can be set in the body of a POST request, or as params in the URL of a GET request.
- `room_url` (default: none): A room URL to join. If empty, the server will create a Daily room and return the URL in the response.
- `room_properties` (none): A JSON object (URL encoded if included as a GET parameter) for overriding default room creation properties, as described at https://docs.daily.co/reference/rest-api/rooms/create-room. This is ignored if a `room_url` is provided.
- `token_properties` (none): A JSON object (URL encoded if included as a GET parameter) for overriding default token properties. By default, the server creates an owner token with an expiration time of one hour.
- `duration` (7200 seconds, or two hours): Use this property to set a time limit for the bot, as well as an expiration time for the room (if the server is creating one). This will not add an expiration time to an existing room. Expiration times in `token_properties` or `room_properties` will also take precedence over this value. You can set this property to `0` to disable timeouts, but this isn't recommended.
- `bot_args` (none): A string containing any additional command-line args to pass to the bot.
- `wait_for_bot` (true): Whether to wait for the bot to successfully join the room before returning a response from the server. If true, the server will start the bot script, then poll the room for up to 5 seconds to confirm the bot has joined the room. If it doesn't, the server will stop the bot and return a 500 response. If set to `false`, the server will start the bot, but immediately return a 200 response. This can be useful if the server is creating rooms for you, and you need the room URL to join the user to the room.
- `redirect` (true): Instead of returning a 200 for GET requests, the server will return a 302 redirect to the room URL. This is handy for testing by creating a bot with a GET request directly in the browser. POST requests will never return redirects. Set to `false` to get 200 responses with info in a JSON object even for GET requests.
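As a rough illustration of how these options might be passed, the sketch below builds the URL and body for a `/start/<botname>` request without sending anything. The helper name `build_start_request` and the `BASE_URL` constant are hypothetical and not part of the server. The GET encoding follows the description above (JSON objects serialized and URL encoded, booleans sent as the strings `true`/`false`); it has not been checked against every code path in the server.

```python
import json
from urllib.parse import urlencode

# Assumption: the Flask development server is running on its default address.
BASE_URL = "http://127.0.0.1:5000"


def build_start_request(botname, method="GET", **options):
    """Return (url, json_body) for a /start/<botname> request.

    Hypothetical helper for illustration only.
    GET: options go in the query string; dict values are JSON-encoded first
    and urlencode() then percent-encodes them.
    POST: options are returned unchanged, to be sent as the JSON body.
    """
    url = f"{BASE_URL}/start/{botname}"
    if method == "POST":
        return url, options

    params = {}
    for key, value in options.items():
        if isinstance(value, bool):
            # Query strings carry text, so booleans become "true"/"false".
            params[key] = "true" if value else "false"
        elif isinstance(value, dict):
            params[key] = json.dumps(value)
        else:
            params[key] = value

    query = urlencode(params)
    return (f"{url}?{query}" if query else url), None
```

For example, `build_start_request("chatbot", redirect=False, duration=600)` yields a GET URL ending in `/start/chatbot?redirect=false&duration=600`, while `method="POST"` returns the plain `/start/chatbot` URL plus a dict that could be passed as `json=` to `requests.post`.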

View File

@@ -0,0 +1,165 @@
import os
import requests
import urllib.parse
import subprocess
import time
from flask import Flask, jsonify, redirect, request
from flask_cors import CORS
from dotenv import load_dotenv
load_dotenv(override=True)
app = Flask(__name__)
CORS(app)
APPS = {
"chatbot": "../starter-apps/chatbot.py",
"patient-intake": "../starter-apps/patient-intake.py",
"storybot": "../starter-apps/storybot.py",
"translator": "../starter-apps/translator.py"
}
daily_api_key = os.getenv("DAILY_API_KEY")
api_path = os.getenv("DAILY_API_PATH") or "https://api.daily.co/v1"
def get_room_name(room_url):
return urllib.parse.urlparse(room_url).path[1:]
def create_room(room_properties, exp):
room_props = {
"exp": exp,
"enable_chat": True,
"enable_emoji_reactions": True,
"eject_at_room_exp": True,
"enable_prejoin_ui": False,
"enable_recording": "cloud"
}
if room_properties:
room_props |= room_properties
res = requests.post(
f"{api_path}/rooms",
headers={"Authorization": f"Bearer {daily_api_key}"},
json={
"properties": room_props
},
)
if res.status_code != 200:
raise Exception(f"Unable to create room: {res.text}")
room_url = res.json()["url"]
room_name = res.json()["name"]
return (room_url, room_name)
def create_token(room_name, token_properties, exp):
token_props = {"exp": exp, "is_owner": True}
if token_properties:
token_props |= token_properties
# Force the token to be limited to the room
token_props |= {"room_name": room_name}
res = requests.post(
f'{api_path}/meeting-tokens',
headers={
'Authorization': f'Bearer {daily_api_key}'},
json={
'properties': token_props})
if res.status_code != 200:
raise Exception(f"Unable to create meeting token: {res.text}")
meeting_token = res.json()['token']
return meeting_token
def start_bot(*, bot_path, room_url, token, bot_args, wait_for_bot):
room_name = get_room_name(room_url)
proc = subprocess.Popen(
[f"python {bot_path} -u {room_url} -t {token} -k {daily_api_key} {bot_args}"],
shell=True,
bufsize=1,
)
if wait_for_bot:
# Don't return until the bot has joined the room, but wait for at most 5
# seconds.
attempts = 0
while attempts < 50:
time.sleep(0.1)
attempts += 1
res = requests.get(
f"{api_path}/rooms/{room_name}/get-session-data",
headers={"Authorization": f"Bearer {daily_api_key}"},
)
if res.status_code == 200:
print(f"Took {attempts} attempts to join room {room_name}")
return True
# If we don't break from the loop, that means we never found the bot in the room
raise Exception("The bot was unable to join the room. Please try again.")
return True
@app.route("/start/<string:botname>", methods=["GET", "POST"])
def start(botname):
try:
if botname not in APPS:
raise Exception(f"Bot '{botname}' is not in the allowlist.")
bot_path = APPS[botname]
props = {
"room_url": None,
"room_properties": None,
"token_properties": None,
"bot_args": None,
"wait_for_bot": True,
"duration": None,
"redirect": True
}
props |= request.values.to_dict() # gets URL params as well as plaintext POST body
try:
props |= request.json
except Exception:
pass
if props['redirect'] == "false":
props['redirect'] = False
if props['wait_for_bot'] == "false":
props['wait_for_bot'] = False
duration = int(os.getenv("DAILY_BOT_DURATION") or 7200)
if props['duration']:
# Values from URL params or form bodies arrive as strings
duration = int(props['duration'])
exp = time.time() + duration
if (props['room_url']):
room_url = props['room_url']
try:
room_name = get_room_name(room_url)
except ValueError:
raise Exception(
"There was a problem detecting the room name. Please double-check the value of room_url.")
else:
room_url, room_name = create_room(props['room_properties'], exp)
token = create_token(room_name, props['token_properties'], exp)
bot = start_bot(
room_url=room_url,
bot_path=bot_path,
token=token,
bot_args=props['bot_args'],
wait_for_bot=props['wait_for_bot'])
if props['redirect'] and request.method == "GET":
return redirect(room_url, 302)
else:
return jsonify({"room_url": room_url, "token": token})
except Exception as e:
return f"There was a problem starting the bot: {e}", 500
@app.route("/healthz")
def health_check():
return "ok", 200

Binary file not shown.


Some files were not shown because too many files have changed in this diff