Compare commits

...

279 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
db05a9b29b Merge pull request #116 from daily-co/moondream-use-cpu
moondream: allow passing use_cpu
2024-04-11 09:08:11 +08:00
Aleix Conchillo Flaqué
130e418800 moondream: allow passing use_cpu 2024-04-10 17:43:44 -07:00
Aleix Conchillo Flaqué
1a0a66e503 Merge pull request #114 from daily-co/jpt/fal-updates
Updated Fal.ai service to take a params model and allow for model string param
2024-04-11 00:47:33 +08:00
Aleix Conchillo Flaqué
e22babbae2 examples: update with new FalImageGenService parameters 2024-04-10 09:45:08 -07:00
Aleix Conchillo Flaqué
bfe2e0f36e services: don't use image_size in ImageGenService 2024-04-10 09:44:42 -07:00
Aleix Conchillo Flaqué
26d401e5de Merge pull request #115 from daily-co/add-vision-and-moondream-service
add vision and moondream service
2024-04-11 00:22:26 +08:00
Aleix Conchillo Flaqué
3c20f9153d added VisionImageFrame and VisionImageFrameAggregator 2024-04-10 09:19:34 -07:00
Aleix Conchillo Flaqué
2f9899af5a update macos-py3.10 requirements 2024-04-09 22:39:04 -07:00
Aleix Conchillo Flaqué
5ef5cf30f4 update linux-py3.10 requirements 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
34a6c5691b examples: added 12-describe-video 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
18bf09c704 services: added MoondreamService 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
84cfa7cc95 services: added VisionService 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
a5eba0106b transport: allow requesting a user frame 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
b117a185e3 frames: added UserImageRequestFrame 2024-04-09 22:14:54 -07:00
Aleix Conchillo Flaqué
0219230827 Merge pull request #113 from daily-co/aleix/only-subcribe-to-participant
only subcribe to participant
2024-04-10 10:47:29 +08:00
Aleix Conchillo Flaqué
9fcbb36997 examples: add 14a-local-render-remote-participant 2024-04-09 19:46:10 -07:00
Aleix Conchillo Flaqué
0bf15fd6eb daily: only subscribe to participant video source 2024-04-09 19:46:10 -07:00
Aleix Conchillo Flaqué
989252bb52 daily: always check camera/mic/speaker enabled 2024-04-09 19:46:10 -07:00
Jon Taylor
7b44a79a5b added params and model attribute to fal service 2024-04-09 17:43:27 -07:00
Aleix Conchillo Flaqué
4bd29b0080 Merge pull request #110 from daily-co/compatible-versions
pyproject: use compatible version
2024-04-10 00:41:22 +08:00
Aleix Conchillo Flaqué
ebb76fdae9 update macos-py3.10 requirements 2024-04-09 08:52:37 -07:00
Aleix Conchillo Flaqué
5d52def0fe update linux-py3.10 requirements 2024-04-09 08:49:41 -07:00
Aleix Conchillo Flaqué
9ada56d0b0 pyproject: use compatible version 2024-04-09 08:41:54 -07:00
Aleix Conchillo Flaqué
8d73cdb2ee Merge pull request #111 from daily-co/user-transcription-aggregator
pipeline: add UserTranscriptionAggregator
2024-04-09 23:34:52 +08:00
Aleix Conchillo Flaqué
4f04b10202 Merge pull request #112 from daily-co/user-image-frame
user image frames and other updates
2024-04-09 23:34:32 +08:00
Aleix Conchillo Flaqué
97b923e37e llm user and assistant aggregator renames 2024-04-09 08:31:48 -07:00
Aleix Conchillo Flaqué
57aabea0a3 examples: added 14-render-remote-participant 2024-04-09 08:01:14 -07:00
Aleix Conchillo Flaqué
319b8e7816 updated ImageFrame and added URLImageFrame and UserImageFrame 2024-04-08 23:23:33 -07:00
Aleix Conchillo Flaqué
96950ca6df daily: on_first_other_participant_joined now gets the participant 2024-04-08 23:23:33 -07:00
Aleix Conchillo Flaqué
d7b2e67c35 pipeline: add UserTranscriptionAggregator 2024-04-08 17:15:14 -07:00
Aleix Conchillo Flaqué
53930b47a5 github: just some rewording 2024-04-06 18:03:53 -07:00
Aleix Conchillo Flaqué
86c8ab02cc github: also publish stables releases to test pypi 2024-04-06 17:58:13 -07:00
Aleix Conchillo Flaqué
b678097f6d Merge pull request #109 from daily-co/only-use-fps
transport: only use fps to set maxFramerate
2024-04-07 07:02:44 +08:00
Aleix Conchillo Flaqué
eb455043c4 transport: use camera_bitrate and camera_framerate 2024-04-06 12:27:05 -07:00
Aleix Conchillo Flaqué
dd696be04c Merge pull request #108 from daily-co/add-camera-max-framerate
transport: add camera_max_framerate argument
2024-04-06 11:18:42 +08:00
Aleix Conchillo Flaqué
96b2337183 transport: add camera_max_framerate argument 2024-04-05 20:16:03 -07:00
Aleix Conchillo Flaqué
ea52e73f57 Merge pull request #107 from daily-co/increase-max-framerate
transport: increase daily maxFramerate to 30
2024-04-06 11:08:21 +08:00
Aleix Conchillo Flaqué
88404e4739 Merge pull request #106 from daily-co/updated-to-be-updated-examples
examples: updated to_be_updated examples
2024-04-06 11:06:30 +08:00
Aleix Conchillo Flaqué
0fd323714e transport: add camera_max_bitrate argument 2024-04-05 20:05:58 -07:00
Aleix Conchillo Flaqué
a362ca4d3d transport: increase daily maxFramerate to 30 2024-04-05 19:44:25 -07:00
Aleix Conchillo Flaqué
02b5c3dd5f update dot-env.template 2024-04-05 16:16:56 -07:00
Aleix Conchillo Flaqué
497a09cbc8 examples: updated to_be_updated examples 2024-04-05 16:01:23 -07:00
Aleix Conchillo Flaqué
172a14245d Merge pull request #104 from daily-co/threaded-transport-allow-sink-override
examples: fix whisper examples
2024-04-06 04:46:12 +08:00
Aleix Conchillo Flaqué
302246399b Merge pull request #105 from daily-co/local-tranport-read-audio-frames
transports: fix local transport read_audio_frames
2024-04-06 04:44:37 +08:00
Aleix Conchillo Flaqué
9590cc2fbc examples: fix whisper examples 2024-04-05 13:43:51 -07:00
Aleix Conchillo Flaqué
09e4044c72 transports: fix local transport read_audio_frames 2024-04-05 13:34:01 -07:00
Aleix Conchillo Flaqué
efdfb74dc3 github: increase fetch-depth to 100 for test publish 2024-04-05 08:32:29 -07:00
Aleix Conchillo Flaqué
158de6f20b github: fetch-tags and increase fetch-depth for test publish 2024-04-05 08:25:37 -07:00
Aleix Conchillo Flaqué
47f68b742d pyproject: user proper environment for test pypi 2024-04-05 08:02:45 -07:00
Aleix Conchillo Flaqué
2654ca1f62 pyproject: don't use local version for test pypi 2024-04-05 07:51:52 -07:00
Aleix Conchillo Flaqué
4263827ee8 README: use double-quotes with optional dependencies 2024-04-04 17:47:16 -07:00
Aleix Conchillo Flaqué
97fe529b0e github: update test publish workflow 2024-04-04 17:41:31 -07:00
Aleix Conchillo Flaqué
86025723e7 github: one more publish workflow fix 2024-04-04 17:36:20 -07:00
Aleix Conchillo Flaqué
6f4270a552 github: avoid caching in publish workflow 2024-04-04 17:32:50 -07:00
Aleix Conchillo Flaqué
31f050c02b github: more publish workflows fixes 2024-04-04 17:31:59 -07:00
Aleix Conchillo Flaqué
a0fe57721b github: fix publish workflows 2024-04-04 17:17:15 -07:00
Aleix Conchillo Flaqué
abf5e57319 Merge pull request #103 from daily-co/aleix/fix-github-cache-name
github: fix github cache name
2024-04-05 08:03:15 +08:00
Aleix Conchillo Flaqué
44de9007c3 Merge pull request #102 from daily-co/examples-cleanup
examples cleanup
2024-04-05 08:02:57 +08:00
Aleix Conchillo Flaqué
46d265514e pyproject: update github url 2024-04-04 15:52:28 -07:00
Aleix Conchillo Flaqué
9e64de8606 Merge pull request #101 from daily-co/cb/bot-exit
Allow transport exit to end a running pipeline
2024-04-05 06:51:06 +08:00
Aleix Conchillo Flaqué
1ea503c1e6 examples: fix 03a-image-local 2024-04-04 15:35:58 -07:00
Aleix Conchillo Flaqué
d0aeeccb68 github: fix github cache name 2024-04-04 14:36:04 -07:00
Aleix Conchillo Flaqué
d687c8cdeb transports: updated silero vad not found message 2024-04-04 14:05:40 -07:00
Aleix Conchillo Flaqué
951f20c788 transports: don't write/read if microphone/speaker not enabled 2024-04-04 14:05:15 -07:00
Aleix Conchillo Flaqué
982c0a0749 examples: move non-working examples to to_be_updated 2024-04-04 14:04:53 -07:00
Chad Bailey
27cef7cd70 add endframe to transport receive queue 2024-04-04 20:45:23 +00:00
chadbailey59
03ea208361 VAD fallback (#97)
* Silero VAD preferred with webrtc fallback

* webrtc VAD neds a different sample size

* fixup

* fixup
2024-04-04 13:31:07 -05:00
Aleix Conchillo Flaqué
385b51ac83 Merge pull request #98 from daily-co/use-pip-features
use pip optional dependencies
2024-04-05 01:00:21 +08:00
Aleix Conchillo Flaqué
a37e4fabad github: only run publish-test on main 2024-04-04 09:58:42 -07:00
Aleix Conchillo Flaqué
8bc3c03a69 add a requirements.txt per platform 2024-04-03 21:39:10 -07:00
Aleix Conchillo Flaqué
1fc800754b github: no need to install dependencies when building/deploying 2024-04-03 16:26:58 -07:00
Aleix Conchillo Flaqué
18c4bccc13 github: rename deploy to publish 2024-04-03 16:22:23 -07:00
Aleix Conchillo Flaqué
d57d473c13 pyproject.toml: use setuptools_scm to auto manage versions 2024-04-03 16:13:07 -07:00
Aleix Conchillo Flaqué
48bb3c6955 github: add publish to pypi workflows 2024-04-03 15:57:59 -07:00
Aleix Conchillo Flaqué
e3ee3f9cc6 github(lint): use requirements-dev.txt 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
3528f5d735 use conditional imports and show help errors if modules not found 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
23735cb3a3 dot-env.example: cleanup and add missing environment variables 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
6918dc69f0 github: separate build and test workflows 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
128d350abc pyproject.toml: use project optional dependencies and pin them 2024-04-03 15:27:31 -07:00
chadbailey59
2f59e38a7a Modularize tricky dependencies (#95)
* removed pyaudio from threaded transport

* modularized torch and torchaudio

* modularized local transport

* Working Dockerfile as well

* docker updates for fly.io
2024-04-03 10:48:11 -05:00
chadbailey59
c21014860f Added app messages to translator example (#94) 2024-04-01 14:25:20 -05:00
chadbailey59
d4e3e1710f Server updates (#90)
* updated server readme

* fixup

* Refactored server

* fixup
2024-03-28 15:03:08 -05:00
Moishe Lettvin
e7f9296b5a Merge pull request #93 from daily-co/frame-name-cleanup
Cleanup the last few badly-named Frame types
2024-03-28 14:25:59 -04:00
Moishe Lettvin
27322108b7 Cleanup the last few badly-named Frame types 2024-03-28 12:36:24 -04:00
Moishe Lettvin
22bbedec93 Merge pull request #92 from daily-co/remove-bad-print
Remove mistakenly-added print statement
2024-03-28 11:54:49 -04:00
Moishe Lettvin
ed91bc0f66 Remove mistakenly-added print statement 2024-03-28 11:47:11 -04:00
Moishe Lettvin
565acfa9c9 Merge pull request #86 from daily-co/transport-refactor
Starting refactor of transports into their own directory
2024-03-28 11:17:32 -04:00
Moishe Lettvin
a2295b6b1d Merge pull request #91 from daily-co/pipeline-logging
Add logging for pipeline
2024-03-28 11:16:26 -04:00
Moishe Lettvin
fef1366c84 Merge pull request #88 from daily-co/frame-progress-diagram
Frame progress diagram
2024-03-28 11:13:21 -04:00
Moishe Lettvin
5c0ba1b6f0 Fix off by one errors, add tests and comment 2024-03-28 08:34:34 -04:00
Moishe Lettvin
05c77bce25 Add logging for pipeline 2024-03-27 18:48:30 -04:00
Moishe Lettvin
4ce140bf84 Move some things to AbstractTransport class 2024-03-27 12:59:08 -04:00
James Hush
a3293c6d7a fix: force overriding environment variables from .env files (#89) 2024-03-27 23:38:55 +08:00
Moishe Lettvin
ce04d4a54a Add text to md 2024-03-27 08:10:14 -04:00
Moishe Lettvin
758ed2d895 Frame progress images 2024-03-26 20:40:10 -04:00
Moishe Lettvin
85cd795b2b fix image 2024-03-26 20:36:18 -04:00
Moishe Lettvin
6c36d5f686 Testing 2024-03-26 20:33:09 -04:00
Moishe Lettvin
b2425d6dcd Testing 2024-03-26 20:32:32 -04:00
Moishe Lettvin
e8a6560ac1 Merge forgotten files 2024-03-26 16:24:47 -04:00
Moishe Lettvin
78c80d8941 some more renames 2024-03-26 15:57:19 -04:00
Moishe Lettvin
2fc5de6afe Starting refactor of transports into their own directory 2024-03-26 08:35:04 -04:00
Moishe Lettvin
24fb7c5a05 Merge pull request #81 from daily-co/websocket-transport
Websocket transport
2024-03-25 14:40:34 -04:00
Moishe Lettvin
5761e23af1 remove unnecessary checks 2024-03-25 14:00:08 -04:00
Moishe Lettvin
960c659d5a Remove duplicated constant 2024-03-25 13:59:03 -04:00
Moishe Lettvin
2bda4c3307 Websocket transport 2024-03-25 13:54:34 -04:00
Aleix Conchillo Flaqué
2c5628a621 Merge pull request #85 from daily-co/minor-readme-update
README: minor fixes
2024-03-22 04:33:42 +08:00
Aleix Conchillo Flaqué
9b4cfd9a6c README: minor fixes 2024-03-21 13:16:50 -07:00
Aleix Conchillo Flaqué
8f9aeb0751 Merge pull request #82 from daily-co/remove-unused-imports
remove unused imports
2024-03-22 03:02:07 +08:00
Aleix Conchillo Flaqué
e8a9d43287 Merge pull request #84 from daily-co/use-openai-api-key
use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
2024-03-21 21:57:40 +08:00
Aleix Conchillo Flaqué
cf5d516d51 use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
Fixes #77
2024-03-20 15:26:32 -07:00
Aleix Conchillo Flaqué
0666dd1194 remove unused imports 2024-03-20 14:52:19 -07:00
Aleix Conchillo Flaqué
42e25ccd13 create missing __init__.py 2024-03-20 14:41:39 -07:00
Aleix Conchillo Flaqué
520cee273f Merge pull request #80 from daily-co/move-src-daily-tests-to-tests
move src/dailyai/tests to tests
2024-03-21 00:27:07 +08:00
Aleix Conchillo Flaqué
a189e2618f github: source venv in every step 2024-03-19 15:31:03 -07:00
Aleix Conchillo Flaqué
ae2dcf88ed github: use virtual environment 2024-03-19 15:23:09 -07:00
Aleix Conchillo Flaqué
5cdb82ad3c README: one more autopep8 emacs update 2024-03-19 15:18:29 -07:00
Aleix Conchillo Flaqué
593513c84a github: add venv caching 2024-03-19 15:17:48 -07:00
Aleix Conchillo Flaqué
16257f8ec0 move src/dailyai/tests to tests 2024-03-19 14:59:48 -07:00
Aleix Conchillo Flaqué
5fc21a7508 Merge pull request #73 from daily-co/github-unittests-workflow
github: add workflow for unit tests
2024-03-20 03:01:03 +08:00
Aleix Conchillo Flaqué
cc05429135 github: add workflow for unit tests 2024-03-19 11:51:14 -07:00
Aleix Conchillo Flaqué
85e66dddbe Merge pull request #79 from daily-co/readme-emacs-autopep8-update
README: emacs autopep8 update
2024-03-20 02:17:44 +08:00
Aleix Conchillo Flaqué
03ea559839 README: emacs autopep8 update 2024-03-19 10:28:11 -07:00
Aleix Conchillo Flaqué
b6c9859e34 Merge pull request #78 from daily-co/readme-editor-setup
README: add editor setup
2024-03-20 01:10:57 +08:00
Aleix Conchillo Flaqué
bc47c909a3 README: add editor setup 2024-03-19 10:10:14 -07:00
Aleix Conchillo Flaqué
428659730d Merge pull request #70 from daily-co/move-src-example-to-examples
move src/examples to examples
2024-03-20 01:09:13 +08:00
Aleix Conchillo Flaqué
a573277a10 examples: copy runner.py and auth.py where needed 2024-03-18 17:10:23 -07:00
Aleix Conchillo Flaqué
69c2637a25 README.md: update examples 2024-03-18 14:53:53 -07:00
Aleix Conchillo Flaqué
90c34d278f move src/examples to examples 2024-03-18 11:51:38 -07:00
Aleix Conchillo Flaqué
2f4e31d1b2 Merge pull request #69 from daily-co/add-github-linting-workflow
github: add linting workflow
2024-03-19 02:46:50 +08:00
Aleix Conchillo Flaqué
9385270775 autopep8 formatting 2024-03-18 11:28:32 -07:00
Aleix Conchillo Flaqué
2914e43350 github: add linting workflow 2024-03-18 11:28:06 -07:00
chadbailey59
78638d2dba Live translation (#61)
* added translator

* fixup
2024-03-18 13:26:05 -05:00
Aleix Conchillo Flaqué
141a5bb548 Merge pull request #68 from daily-co/log-transcription-errors
daily: log transcription errors
2024-03-19 01:53:40 +08:00
Aleix Conchillo Flaqué
3957813202 Merge pull request #67 from daily-co/add-dot-env-template
add dot-env.template
2024-03-19 01:49:21 +08:00
Aleix Conchillo Flaqué
549862ef99 daily: log transcription errors 2024-03-18 10:47:20 -07:00
Aleix Conchillo Flaqué
1000ca5b55 add dot-env.template 2024-03-18 10:43:57 -07:00
Moishe Lettvin
91dbfef4c3 Merge pull request #64 from daily-co/docs
Some docs
2024-03-18 13:38:32 -04:00
Moishe Lettvin
3b61d0b41a fix typos 2024-03-18 13:38:00 -04:00
Moishe Lettvin
bf3ae091b9 Merge pull request #62 from daily-co/anthropic-support
Anthropic LLM service
2024-03-18 13:36:39 -04:00
Aleix Conchillo Flaqué
34ac796607 Merge pull request #66 from daily-co/daily-transport-release-client
services: release daily client after leave
2024-03-19 01:36:22 +08:00
Aleix Conchillo Flaqué
e0551e9d85 services: release daily client after leave 2024-03-18 10:32:46 -07:00
Moishe Lettvin
b1ab6f91b9 Merge pull request #65 from daily-co/app-messages
Support for app messages
2024-03-18 11:37:10 -04:00
Moishe Lettvin
58726dc20d clean up imports 2024-03-18 10:14:51 -04:00
Moishe Lettvin
8e61fe8e36 Support for app messages 2024-03-18 10:08:41 -04:00
Moishe Lettvin
99b836c227 added docstrings to frames. 2024-03-18 09:08:12 -04:00
Moishe Lettvin
1c27f77f1a drafty architecture doc 2024-03-18 08:39:50 -04:00
Moishe Lettvin
c91fa39a99 Remove testing code 2024-03-15 19:42:46 -04:00
Moishe Lettvin
eacaea7db4 Anthropic LLM service 2024-03-15 19:40:37 -04:00
Moishe Lettvin
c6dfcb6f7a Merge pull request #60 from daily-co/remove-ai-service-methods
Remove run_to_queue and run from AIService class
2024-03-15 15:28:28 -04:00
Moishe Lettvin
18bf26de14 Update apps 2024-03-15 13:39:33 -04:00
Moishe Lettvin
b8b35db89c Remove run_to_queue and run from AIService class 2024-03-15 11:04:22 -04:00
Moishe Lettvin
358166f347 Merge pull request #59 from daily-co/remove-requirements
Remove unused requirements file
2024-03-13 16:23:42 -04:00
Moishe Lettvin
c006c123b2 Remove unused requirements file 2024-03-13 16:19:03 -04:00
chadbailey59
cf302fb765 Storybot and Chatbot examples (#58)
* storybot

* storybot

* added pipeline.queue_frames

* fixup
2024-03-13 15:12:59 -05:00
Moishe Lettvin
e33820fe36 Merge pull request #56 from daily-co/fal-redux
Use other model in FAL
2024-03-12 15:14:57 -04:00
Moishe Lettvin
b84b3d59f3 Use other model in FAL 2024-03-12 14:47:00 -04:00
Moishe Lettvin
7b5b88b99b Merge pull request #55 from daily-co/fix-fal
set FAL param correctly
2024-03-12 14:12:16 -04:00
Moishe Lettvin
e87196cce7 set FAL param correctly 2024-03-12 14:03:43 -04:00
chadbailey59
bbfc9e703b intake cleanup (#54) 2024-03-12 13:01:39 -05:00
Moishe Lettvin
c21a63d48b Merge pull request #49 from daily-co/openai-base-llm
Base OpenAI LLM service
2024-03-12 12:58:31 -04:00
Moishe Lettvin
f546bb32da Make 08- work again 2024-03-12 10:34:52 -04:00
Moishe Lettvin
d9378e23ba Base OpenAI LLM service 2024-03-11 16:52:41 -04:00
Moishe Lettvin
c75a3fb0d0 Merge pull request #53 from daily-co/fix_other_joined_event
Don't do time-consuming processing in `on_other_joined_event`
2024-03-11 13:27:13 -04:00
Moishe Lettvin
f8ae264957 remove unnecessary print 2024-03-11 13:20:28 -04:00
Moishe Lettvin
977c12d530 undo fal change 2024-03-11 13:19:47 -04:00
Moishe Lettvin
61c55d2f47 Fix up other examples 2024-03-11 13:17:31 -04:00
Moishe Lettvin
fd2fa23e9c Fix example 2 2024-03-11 13:00:29 -04:00
Moishe Lettvin
de026ccc8a Merge pull request #50 from daily-co/khk/launch-samples
Khk/launch samples
2024-03-11 12:50:38 -04:00
Moishe Lettvin
c5bb0e14ab Merge pull request #51 from daily-co/khk/readme
updated README
2024-03-11 12:50:22 -04:00
chadbailey59
a4f3c51184 the smallest commit in history 2024-03-11 09:47:00 -05:00
Moishe Lettvin
7786e685cc Merge pull request #52 from daily-co/pypi-updates
updates to pyproject.toml
2024-03-11 10:34:35 -04:00
Moishe Lettvin
33793ca9f8 update description 2024-03-11 07:31:39 -04:00
Moishe Lettvin
d26aede667 updates to pyproject.toml 2024-03-11 07:25:20 -04:00
Moishe Lettvin
ad993056d8 rename to dailyai 2024-03-11 07:16:20 -04:00
Kwindla Hultman Kramer
5b1f26aacb updated README 2024-03-10 22:06:23 -07:00
Kwindla Hultman Kramer
4e16e514dd attempting to change tts to deepgram in example 04 2024-03-10 19:43:06 -07:00
Kwindla Hultman Kramer
959ffa9d36 small streamlining of example 03 2024-03-10 19:42:19 -07:00
Kwindla Hultman Kramer
4396b1018a small streamlining of example 02 2024-03-10 19:41:32 -07:00
Kwindla Hultman Kramer
37e904ce68 changed fal to a maybe slightly faster model 2024-03-10 19:40:51 -07:00
Kwindla Hultman Kramer
ef39d842a5 custom processor in example 05 2024-03-10 19:18:37 -07:00
Kwindla Hultman Kramer
72f631a066 working on foundational examples 2024-03-10 17:21:46 -07:00
chadbailey59
5d46302b9e changed default services (#47) 2024-03-08 15:36:30 -06:00
chadbailey59
8241dc0bed cleaned up example logging (#46) 2024-03-08 15:25:17 -06:00
Moishe Lettvin
95a1efbe75 Merge pull request #45 from daily-co/exception_handling_callbacks
Wait for the callback's result, so exceptions get raised
2024-03-08 15:04:15 -05:00
Moishe Lettvin
e59df8476e Wait for the callback's result, so exceptions get raised 2024-03-08 15:02:15 -05:00
chadbailey59
824df8ca7c moved patient intake and example runner (#44) 2024-03-08 12:07:51 -06:00
chadbailey59
0db8a51b27 cleaned up function calling frames (#43) 2024-03-08 10:13:28 -06:00
chadbailey59
ce9c6ede66 function allowlist (#42) 2024-03-08 08:49:09 -06:00
Moishe Lettvin
192b46bbab Merge pull request #41 from daily-co/optimize-pipeline
Optimize pipeline processing
2024-03-07 21:01:03 -05:00
Moishe Lettvin
196279e342 Add endframe to sample 4 2024-03-07 19:24:27 -05:00
Moishe Lettvin
edd93bc4cb remove errant print statement 2024-03-07 19:05:03 -05:00
Moishe Lettvin
d0076dd4ee Optimize pipeline processing so we don't wait for the completion of one generator to move onto the next. 2024-03-07 18:59:47 -05:00
chadbailey59
3c5f4800d4 Chad's big patient intake PR (#40)
* at least it runs, kind of

* wip

* wip with user response aggregator

* frame and pipeline docstrings

* Getting started on docstrings

* finish docstrings for aggregators

* patient intake is working!

* cleanup

* cleanup

---------

Co-authored-by: Moishe Lettvin <moishel@gmail.com>
2024-03-07 17:41:32 -06:00
Moishe Lettvin
2bcb4966d3 Merge pull request #39 from daily-co/docstrings
Docstrings
2024-03-07 15:39:50 -05:00
Moishe Lettvin
b14f08a7d5 finish docstrings for aggregators 2024-03-07 15:16:23 -05:00
Moishe Lettvin
8fb92e3fd7 Getting started on docstrings 2024-03-07 12:51:19 -05:00
Moishe Lettvin
337ca7f581 frame and pipeline docstrings 2024-03-07 10:16:27 -05:00
Moishe Lettvin
eb430621f1 Merge pull request #37 from daily-co/fix-interruptible
Fix interruptible pipeline runner and aggregator.
2024-03-07 09:09:41 -05:00
Moishe Lettvin
d5683c4f24 Fix interruptible pipeline runner and aggregator. 2024-03-07 09:05:49 -05:00
chadbailey59
b4505b7eff added audio chunking for better interruption support (#35) 2024-03-06 18:20:04 -06:00
Moishe Lettvin
3e46d28aff Add start frame to interrupt loop 2024-03-06 15:58:19 -05:00
Moishe Lettvin
d3e76c4fd6 Merge pull request #34 from daily-co/rename-frames
Remove Queue in frame names
2024-03-06 14:10:56 -05:00
Moishe Lettvin
62fd371b97 Remove Queue in frame names 2024-03-06 14:09:06 -05:00
Moishe Lettvin
b9556716dd Merge pull request #33 from daily-co/pipeline-instead-of-nest
Pipeline instead of nest
2024-03-05 11:04:20 -05:00
Moishe Lettvin
2708dcf7b5 Remove conversation wrapper 2024-03-04 14:07:49 -05:00
Moishe Lettvin
d3f86dab2e starting on interruptions 2024-03-04 13:41:28 -05:00
Moishe Lettvin
18e7626b9f Getting started on interruptible transport pipeline runner 2024-03-04 07:51:22 -05:00
Moishe Lettvin
763a50f8ec First cut at sample 6 rewrite with pipelines 2024-03-04 07:28:10 -05:00
Moishe Lettvin
3b282cc921 some comments 2024-03-03 20:17:48 -05:00
Moishe Lettvin
434772dc23 Update sample 5! 2024-03-03 19:50:13 -05:00
Moishe Lettvin
15df4a9d58 cleanup, make sample 4 work with new stuff 2024-03-03 19:37:30 -05:00
Moishe Lettvin
643be238f9 getting started 2024-03-03 16:31:31 -05:00
chadbailey59
d90fdb1cae Isolated changes to add VAD (#32)
* added VAD

* added separate 'vad enabled' property
2024-02-28 15:16:44 -06:00
Moishe Lettvin
f710aeae95 Merge pull request #30 from daily-co/unsub-video
cleanup client properties and unsubscribe from camera
2024-02-27 13:16:20 -05:00
Moishe Lettvin
20091d91c9 cleanup client properties and unsubscribe from camera 2024-02-27 13:09:55 -05:00
Moishe Lettvin
92ec5641d4 update deepgram tts to new service structure 2024-02-14 13:44:59 -05:00
Moishe Lettvin
53e97bd872 Merge pull request #28 from daily-co/update-playht-service
Update playht service
2024-02-14 12:54:34 -05:00
Moishe Lettvin
dcbd79333a make destructor call client.close in PlayHT service 2024-02-14 12:53:20 -05:00
Moishe Lettvin
97a4cb8b7f Update playht tts service 2024-02-14 12:40:13 -05:00
Moishe Lettvin
cc7877f626 Merge pull request #26 from daily-co/fix-sigint
fix sigint handling
2024-02-14 12:11:44 -05:00
Moishe Lettvin
1992b7e79e fix sigint handling 2024-02-14 12:10:47 -05:00
Moishe Lettvin
2516670874 Merge pull request #25 from daily-co/keyboard-interrupt
Call client.leave on keyboard interrupt
2024-02-13 14:18:42 -05:00
Moishe Lettvin
4fecc10808 Call client.leave on keyboard interrupt 2024-02-13 14:17:09 -05:00
Moishe Lettvin
08144fc560 Merge pull request #24 from daily-co/another-formatting-pass
Another autopep8 formatting pass
2024-02-10 09:39:51 -05:00
Moishe Lettvin
815aa2bc3e Another autopep8 formatting pass 2024-02-10 09:29:08 -05:00
Moishe Lettvin
560c98f2fa Merge pull request #23 from daily-co/ollama-service
Ollama LLM service
2024-02-10 09:27:17 -05:00
Moishe Lettvin
0e0c992f59 Ollama LLM service 2024-02-10 09:22:52 -05:00
Moishe Lettvin
d76139ac1a Merge pull request #22 from daily-co/temp-readme-patch
Make the README okay-enough for limited public release
2024-02-09 11:57:39 -05:00
Moishe Lettvin
444418d94c Make the README okay-enough for limited public release 2024-02-09 10:26:39 -05:00
Moishe Lettvin
d27122e35e Create LICENSE 2024-02-09 09:10:28 -06:00
Chad Bailey
0ae83577c6 renamed samples to examples 2024-02-08 16:34:48 +00:00
Chad Bailey
5c402eee81 started adding docs 2024-02-08 16:31:17 +00:00
Moishe Lettvin
80750fe022 Remove old/deprecated/broken samples 2024-02-08 09:56:22 -05:00
Moishe Lettvin
ccfba04ea2 Remove mistakenly-added file 2024-02-08 09:55:28 -05:00
Moishe Lettvin
5b8198cf9e Merge pull request #21 from daily-co/cleanup_constructor_args
Cleanup constructor args in examples
2024-02-08 09:44:51 -05:00
Moishe Lettvin
3fa00c4db8 Cleanup constructor args in examples 2024-02-08 09:41:51 -05:00
Moishe Lettvin
4ce36f8c63 Merge pull request #20 from daily-co/base_transport
Add a "Local Transport" as a proof of concept
2024-02-08 08:25:03 -05:00
Moishe Lettvin
9620080cc5 A little example cleanup 2024-02-08 08:24:25 -05:00
Moishe Lettvin
ee1ce8f288 Abstract base transport class & local transport class 2024-02-08 08:15:28 -05:00
chadbailey59
70d07b6ea2 WIP: environment cleanup (#19)
* removed env var usage from SDK services

* started consolidating configure.py

* 1–3 work

* cleaned up the rest

* more cleanup

* cleanup and 05 tinkering

* made fal keys optional
2024-02-06 15:07:16 -06:00
Moishe Lettvin
9d5ad5675c Fix 06- demo and also fix bugs where dangling sentences wouldn't be spoken 2024-02-01 12:54:23 -05:00
chadbailey59
0d96f91cde Added sound effect example (#18)
* added sound effect example

* added dialout to this branch too

* fixup

* fixup for more dialout testing

* cleanup
2024-02-01 10:26:50 -06:00
Moishe Lettvin
4e9586595d minor cleanup 2024-01-29 15:06:39 -05:00
Moishe Lettvin
d0bcddfd70 Fix 06a-image-sync.py 2024-01-29 14:29:32 -05:00
Chad Bailey
065a213ebb example renaming 2024-01-29 17:42:45 +00:00
Chad Bailey
7d6c94d604 added 09 examples 2024-01-29 17:39:28 +00:00
Chad Bailey
0859b57b00 Added 09 examples 2024-01-29 17:39:14 +00:00
Moishe Lettvin
09838c9b1f Merge pull request #17 from daily-co/start_tests
Add some basic daily_transport tests
2024-01-29 07:57:33 -05:00
Moishe Lettvin
c39920132c Add some basic daily_transport tests 2024-01-29 07:56:12 -05:00
Moishe Lettvin
860129a4be Merge pull request #16 from daily-co/image_tweaks
Minor Cleanup
2024-01-27 19:10:52 -05:00
Moishe Lettvin
4416f36ae9 some minor cleanup, and coalesce image/images into one thing, and use itertools.cycle 2024-01-27 19:07:29 -05:00
chadbailey59
86af896150 Wake word and animation sprites (#15)
* WIP: golden kitty

* added web server

* added health check

* added flask to module build

* trying requirements.txt

* added dotenv

* flask_cors

* gunicorn

* requirements cleanup

* Dockerfile

* WOOF

* basic wake word

* removed otel

* basic animation kind of works

* i think animation defeated me

* added santa cat assets

* cleanup

* cleanup

* server example and cleanup

* more cleanup

* fix up some class variable names

* minor cleanup, remove mistakenly-added print and logger stuff

* cleanup

* cleanup

---------

Co-authored-by: Moishe Lettvin <moishel@gmail.com>
2024-01-26 15:37:39 -06:00
Moishe Lettvin
5cbac4701b minor cleanup, remove mistakenly-added print and logger stuff 2024-01-26 15:27:12 -05:00
Moishe Lettvin
5d9aa530e2 fix up some class variable names 2024-01-26 15:15:44 -05:00
Moishe Lettvin
d4c4d49035 Merge pull request #14 from daily-co/aiosessions
Don't create aiohttp sessions inside services
2024-01-26 14:01:24 -05:00
Moishe Lettvin
e81f247845 Don't create aiohttp sessions inside services 2024-01-26 12:30:37 -05:00
Liza
8baf137511 prefix suspected private members (#13) 2024-01-26 18:28:54 +01:00
Moishe Lettvin
fcceb32bd7 Merge pull request #12 from daily-co/frame_sync
Speaking / waiting images
2024-01-26 10:17:01 -05:00
Moishe Lettvin
ead655fe23 some more fixup 2024-01-26 10:07:16 -05:00
Moishe Lettvin
bab102f197 little more cleanup 2024-01-26 09:54:51 -05:00
Moishe Lettvin
95fc802607 Speaking / waiting images 2024-01-26 09:15:29 -05:00
Moishe Lettvin
2886997693 Merge pull request #11 from daily-co/autopep
Autopep linter fixes
2024-01-25 12:17:26 -05:00
Moishe Lettvin
5fdda43bed Autopep linter fixes 2024-01-25 12:12:46 -05:00
Moishe Lettvin
f0d9b0613e Add faster_whisper to module dependencies; remove unneeded import 2024-01-25 11:27:00 -05:00
Moishe Lettvin
a661905d7f Merge pull request #9 from daily-co/interruptions
Interruptable conversation wrapper
2024-01-25 11:24:57 -05:00
Moishe Lettvin
c9c2e5f561 Remove unnecessary try/except 2024-01-25 11:18:55 -05:00
Moishe Lettvin
795a339542 Add InterruptibleConversationWrapper 2024-01-25 11:15:04 -05:00
Liza
31db156dfc Local Whisper transcription (#10)
* First pass at Whisper transcription

* deletions

* Revise based on feedback, add autopep8
2024-01-25 13:43:25 +01:00
Moishe Lettvin
690cf2e47d Merge pull request #8 from daily-co/queueframe-refactor
Refactor QueueFrame
2024-01-23 13:15:11 -05:00
Moishe Lettvin
ba89e41c5b remove commented-out code 2024-01-23 09:37:15 -05:00
Moishe Lettvin
c134598a77 Refactor QueueFrame 2024-01-23 09:33:51 -05:00
Liza
b51abd2969 facilitate manual call management (#7) 2024-01-23 14:33:27 +01:00
Moishe Lettvin
3fda9b0ecb Use more flexibile aggregator 2024-01-22 16:02:35 -05:00
Moishe Lettvin
95c92e5304 Aggregators for LLM messages 2024-01-22 10:59:13 -05:00
Moishe Lettvin
b443fbdb60 Very rough draft at intro/overview in README 2024-01-19 16:20:08 -05:00
Moishe Lettvin
ccd2fa31e5 Rename 'theoretical-to-real' samples to 'foundational' 2024-01-19 13:57:52 -05:00
Moishe Lettvin
9b65286216 Merge pull request #6 from daily-co/rm-sentence-aggregator
Cleanup: no more sentence aggregator
2024-01-19 13:42:27 -05:00
Moishe Lettvin
6ae733ebfe Cleanup: no more sentence aggregator, let the TTS service deal with that; also removed the queue typing stuff from ai_services 2024-01-19 13:06:15 -05:00
Liza
1071dede1a Only initialize Daily once (#5) 2024-01-19 14:59:48 +01:00
183 changed files with 8720 additions and 3055 deletions

30
.dockerignore Normal file
View File

@@ -0,0 +1,30 @@
# flyctl launch added from .gitignore
**/.vscode
**/env
**/__pycache__
**/*~
**/venv
#*#
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
**/.DS_Store
**/.env
fly.toml

44
.github/workflows/build.yaml vendored Normal file
View File

@@ -0,0 +1,44 @@
name: build
on:
workflow_dispatch:
push:
branches:
- main
pull_request:
branches:
- "**"
paths-ignore:
- "docs/**"
concurrency:
group: build-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build:
name: "Build and Install"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: Build project
run: |
source .venv/bin/activate
python -m build
- name: Install project and other Python dependencies
run: |
source .venv/bin/activate
pip install --editable .

44
.github/workflows/lint.yaml vendored Normal file
View File

@@ -0,0 +1,44 @@
name: lint
on:
workflow_dispatch:
push:
branches:
- main
pull_request:
branches:
- "**"
paths-ignore:
- "docs/**"
concurrency:
group: build-lint-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
autopep8:
name: "Formatting lints"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install development Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: autopep8
id: autopep8
run: |
source .venv/bin/activate
autopep8 --max-line-length 100 --exit-code -r -d --exclude "*_pb2.py" -a -a src/
- name: Fail if autopep8 requires changes
if: steps.autopep8.outputs.exit-code == 2
run: exit 1

84
.github/workflows/publish.yaml vendored Normal file
View File

@@ -0,0 +1,84 @@
name: publish
on:
workflow_dispatch:
inputs:
gitref:
type: string
description: "what git ref to build"
required: true
jobs:
build:
name: "Build and upload wheels"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.gitref }}
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: Build project
run: |
source .venv/bin/activate
python -m build
- name: Upload wheels
uses: actions/upload-artifact@v4
with:
name: wheels
path: ./dist
publish-to-pypi:
name: "Publish to PyPI"
runs-on: ubuntu-latest
needs: [ build ]
environment:
name: pypi
url: https://pypi.org/p/dailyai
permissions:
id-token: write
steps:
- name: Download wheels
uses: actions/download-artifact@v4
with:
name: wheels
path: ./dist
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true
print-hash: true
publish-to-test-pypi:
name: "Publish to Test PyPI"
runs-on: ubuntu-latest
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/dailyai
permissions:
id-token: write
steps:
- name: Download wheels
uses: actions/download-artifact@v4
with:
name: wheels
path: ./dist
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true
print-hash: true
repository-url: https://test.pypi.org/legacy/

63
.github/workflows/publish_test.yaml vendored Normal file
View File

@@ -0,0 +1,63 @@
name: publish-test
on:
workflow_dispatch:
push:
branches:
- main
jobs:
build:
name: "Build and upload wheels"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.gitref }}
fetch-tags: true
fetch-depth: 100
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: Build project
run: |
source .venv/bin/activate
python -m build
- name: Upload wheels
uses: actions/upload-artifact@v4
with:
name: wheels
path: ./dist
publish-to-pypi:
name: "Publish to Test PyPI"
runs-on: ubuntu-latest
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/dailyai
permissions:
id-token: write
steps:
- name: Download wheels
uses: actions/download-artifact@v4
with:
name: wheels
path: ./dist
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true
print-hash: true
repository-url: https://test.pypi.org/legacy/

49
.github/workflows/tests.yaml vendored Normal file
View File

@@ -0,0 +1,49 @@
name: test
on:
workflow_dispatch:
push:
branches:
- main
pull_request:
branches:
- "**"
paths-ignore:
- "docs/**"
concurrency:
group: build-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
test:
name: "Unit and Integration Tests"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Cache virtual environment
uses: actions/cache@v3
with:
# We are hashing requirements-dev.txt and requirements-extra.txt which
# contain all dependencies needed to run the tests and examples.
key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('linux-py3.10-requirements.txt') }}-${{ hashFiles('dev-requirements.txt') }}
path: .venv
- name: Install system packages
run: sudo apt-get install -y portaudio19-dev
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r linux-py3.10-requirements.txt -r dev-requirements.txt
- name: Test with pytest
run: |
source .venv/bin/activate
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests

3
.gitignore vendored
View File

@@ -2,6 +2,8 @@
env/
__pycache__/
*~
venv
.venv
#*#
# Distribution / packaging
@@ -25,3 +27,4 @@ share/python-wheels/
MANIFEST
.DS_Store
.env
fly.toml

40
Dockerfile Normal file
View File

@@ -0,0 +1,40 @@
# setup
FROM python:3.11.5
WORKDIR /app
COPY requirements.txt /app
COPY *.py /app
COPY pyproject.toml /app
COPY src/ /app/src/
COPY examples/ /app/examples/
WORKDIR /app
RUN ls --recursive /app/
RUN pip3 install --upgrade -r requirements.txt
RUN python -m build .
RUN pip3 install .
RUN pip3 install gunicorn
# If running on Ubuntu, Azure TTS requires some extra config
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
RUN wget -O - https://www.openssl.org/source/openssl-1.1.1w.tar.gz | tar zxf -
WORKDIR openssl-1.1.1w
RUN ./config --prefix=/usr/local
RUN make -j $(nproc)
RUN make install_sw install_ssldirs
RUN ldconfig -v
ENV SSL_CERT_DIR=/etc/ssl/certs
#ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
RUN apt clean
RUN apt-get update
RUN apt-get -y install build-essential libssl-dev ca-certificates libasound2 wget
ENV PYTHONUNBUFFERED=1
WORKDIR /app
EXPOSE 8000
# run
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--chdir", "examples/server", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]

24
LICENSE Normal file
View File

@@ -0,0 +1,24 @@
BSD 2-Clause License
Copyright (c) 2024, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

151
README.md
View File

@@ -1,20 +1,105 @@
# dailyai SDK
# dailyai — an open source framework for real-time, multi-modal, conversational AI applications
This SDK can help you build applications that participate in WebRTC meetings and use various AI services to interact with other participants.
Build things like this:
## Build/Install
[![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)
**`dailyai` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.
In 2023 a *lot* of us got excited about the possibility of having open-ended conversations with LLMs. It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/):
- low-latency, reliable audio transport
- echo cancellation
- phrase endpointing (knowing when the bot should respond to human speech)
- interruptibility
- writing clean code to stream data through "pipelines" of speech-to-text, LLM inference, and text-to-speech models
As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like.
Today, `dailyai` is:
1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services
2. transport services that moves audio, video, and events across the Internet
3. implementations of specific generative AI services
Currently implemented services:
- Speech-to-text
- Deepgram
- Whisper
- LLMs
- Azure
- OpenAI
- Image generation
- Azure
- Fal
- OpenAI
- Text-to-speech
- Azure
- Deepgram
- ElevenLabs
- Transport
- Daily
- Local (in progress, intended as a quick start example service)
- Vision
- Moondream
If you'd like to [implement a service]((https://github.com/daily-co/daily-ai-sdk/tree/main/src/dailyai/services)), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.
## Getting started
Today, the easiest way to get started with `dailyai` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations. (The [transport base class](https://github.com/daily-co/daily-ai-sdk/blob/main/src/dailyai/transports/abstract_transport.py) is easy to extend, though, so feel free to submit PRs if you'd like to implement another transport service.)
```
# install the module
pip install dailyai
# set up an .env file with API keys
cp dot-env.template .env
```
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional
dependencies that you can install with:
```
pip install "dailyai[option,...]"
```
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `daily`, `local`, `websocket`
## Code examples
There are two directories of examples:
- [foundational](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/foundational) — demos that build on each other, introducing one or two concepts at a time
- [starter apps](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/starter-apps) — complete applications that you can use as starting points for development
Before running the examples you need to install the dependencies (which will install all the dependencies to run all of the examples):
```
pip install -r {env}-requirements.txt
```
To run the example below you need to sign up for a [free Daily account](https://dashboard.daily.co/u/signup) and create a Daily room (so you can hear the LLM talking). After that, join the room's URL directly from a browser tab and run:
```
python examples/foundational/02-llm-say-one-thing.py
```
## Hacking on the framework itself
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
```
python3 -m venv env
source env/bin/activate
python3 -m venv venv
source venv/bin/activate
```
From the root of this repo, run the following:
```
pip install -r requirements.txt
pip install -r {env}-requirements.txt -r dev-requirements.txt
python -m build
```
@@ -30,26 +115,54 @@ If you want to use this package from another directory, you can run:
pip install path_to_this_repo
```
## Running the samples
### Running tests
Tou can run the simple sample like so:
From the root directory, run:
```
python src/samples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests
```
Note that the sample uses Azure's TTS and LLM services. You'll need to set the following environment variables for the sample to work:
## Setting up your editor
```
AZURE_SPEECH_SERVICE_KEY
AZURE_SPEECH_SERVICE_REGION
AZURE_CHATGPT_KEY
AZURE_CHATGPT_ENDPOINT
AZURE_CHATGPT_DEPLOYMENT_ID
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting.
### Emacs
You can use [use-package](https://github.com/jwiegley/use-package) to install [py-autopep8](https://codeberg.org/ideasman42/emacs-py-autopep8) package and configure `autopep8` arguments:
```elisp
(use-package py-autopep8
:ensure t
:defer t
:hook ((python-mode . py-autopep8-mode))
:config
(setq py-autopep8-options '("-a" "-a", "--max-line-length=100")))
```
If you have those environment variables stored in an .env file, you can quickly load them into your terminal's environment by running this:
`autopep8` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
```elisp
(use-package pyvenv-auto
:ensure t
:defer t
:hook ((python-mode . pyvenv-auto-run)))
```bash
export $(grep -v '^#' .env | xargs)
```
### Visual Studio Code
Install the
[autopep8](https://marketplace.visualstudio.com/items?itemName=ms-python.autopep8) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, enable formatting on save and configure `autopep8` arguments:
```json
"[python]": {
"editor.defaultFormatter": "ms-python.autopep8",
"editor.formatOnSave": true
},
"autopep8.args": [
"-a",
"-a",
"--max-line-length=100"
],
```

6
dev-requirements.txt Normal file
View File

@@ -0,0 +1,6 @@
autopep8==2.0.4
build==1.0.3
pip-tools==7.4.1
pytest==8.1.1
setuptools==69.2.0
setuptools_scm==8.0.4

17
docs/README.md Normal file
View File

@@ -0,0 +1,17 @@
# Daily AI SDK Docs
## [Architecture Overview](architecture.md)
Learn about the thinking behind the SDK's design.
## [A Frame's Progress](frame-progress.md)
See how a Frame is processed through a Transport, a Pipeline, and a series of Frame Processors.
## [Example Code](examples/)
The repo includes several example apps in the `examples` directory. The docs explain how they work.
## [API Reference](api/)
Complete documentation of the available classes and methods in the SDK.

17
docs/architecture.md Normal file
View File

@@ -0,0 +1,17 @@
# Daily AI SDK Architecture Guide
## Frames
Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used to as control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion.
## FrameProcessors
Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input for an AI Service, and emit chat completions based on message arrays or transform text into audio or images.
## Pipelines
Pipelines are lists of frame processors that read from a source queue and send the processed frames to a sink queue. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport's send queue as its sync. Placing LLM message frames on the pipeline's source queue will cause the LLM's response to be spoken. See example #2 for an implementation of this.
## Transports
Transports provide a receive queue, which is input from "the outside world", and a sink queue, which is data that will be sent "to the outside world". The `LocalTransportService` does this with the local camera, mic, display and speaker. The `DailyTransportService` does this with a WebRTC session joined to a Daily.co room.

View File

@@ -0,0 +1,119 @@
# 01: Say One Thing
_video here - youtube?_
This example uses a text-to-speech (TTS) service to say one predefined sentence. But first, a quick overview of the general structure of these examples.
## Running the demos
All of the demos have something like this at the bottom of the file:
```python
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
```
### `configure()`
The `configure()` function comes from `examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:
```bash
python 01-say-one-thing.py -u https://YOUR_DOMAIN.daily.co/YOUR_ROOM -k YOUR_API_KEY
# or
DAILY_ROOM_URL=https://YOUR_DOMAIN.daily.co/YOUR_ROOM DAILY_API_KEY=YOUR_API_KEY python 01-say-one-thing.py
# or set DAILY_ROOM_URL and DAILY_API_KEY in a .env file
python 01-say-one-thing.py
```
You'll need a Daily account to run these demos. You can sign up for free at [daily.co](https://daily.co). Once you've signed up you can create a room from the [Dashboard](https://dashboard.daily.co/rooms), and grab [your API key](https://dashboard.daily.co/developers) while you're there.
Some functionality (such as transcription) requires the bot to have owner privileges in the room. `runner.py` uses the Daily REST API to create a meeting token with owner privileges. You can learn more about meeting tokens in the [Daily docs](https://docs.daily.co/reference/rest-api/meeting-tokens).
### `asyncio.run()`
The AI SDK makes heavy use of Python's `asyncio` module. [This is a reasonable intro to the topic](https://builtin.com/data-science/asyncio) if you haven't worked with `asyncio` and coroutines before.
You can learn a bit more about the specifics of how the Daily AI SDK uses coroutines in the [Architecture Guide](../architecture.md).
## The `main()` function
All of the examples have a `main()` function with a similar structure:
- Configure the transport
- Configure the AI service(s) used in the demo
- Configure any event listeners
- Define a processing pipeline
- Run the example's coroutine(s)
### Configuring the transport
The first section of the `main()` function configures the transport object:
```python
meeting_duration_minutes = 5
transport = DailyTransportService(
room_url,
None,
"Say One Thing",
meeting_duration_minutes,
)
transport.mic_enabled = True
```
The [Architecture Guide](../architecture.md) explains the transport object in more detail. In this case, we're configuring a Daily transport object and enabling the virtual microphone, so our bot can play audio.
### Configuring the services
As described in the [Architecture Guide](../architecture.md), 'a 'Service' is a class that processes 'Frames' as part of a 'Pipeline'. In this demo app, we'll only need one service: a text-to-speech generator. We can create an instance of the `ElevenLabsTTSService` class with this line of code:
```python
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
```
You'll need to make sure and set those environment variables somewhere. The easiest way to do that is to copy the `example.env` file in the repo and rename it to `.env`, and then add your credentials to that file. `runner.py` loads the `python-dotenv` module and initializes it, making the values in that file available in the environment.
### Configuring event listeners
This part isn't strictly necessary for an app like this. You could include the contents of the `on_participant_joined` function directly in the body of the `main()` function, and it would run as soon as you started the script from the command line.
Instead, we can use an event handler to wait to run that code until someone else joins the meeting. We'll define a function called `greet_user()`, and use the `@transport.event_handler("on_participant_joined")` decorator to tell the SDK that we want to run that function whenever a user joins the room.
```python
@transport.event_handler("on_participant_joined")
async def greet_user(transport, participant):
if participant["info"]["isLocal"]:
return
await tts.say(
"Hello there, " + participant["info"]["userName"] + "!",
transport.send_queue,
)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
```
### Defining a processing pipeline
In this example, we don't actually have much of a processing pipeline! In fact, we're doing the whole thing inside the `greet_user()` function already.
Pipelines usually look like a bunch of nested calls to the `run()` or `run_to_queue()` function from different Services. In this example, we're using the `say()` function from the TTS service. This is effectively a convenience wrapper around the `run_to_queue()` function, which we'll discuss more later. It's important to `await` this function to ensure that the speech frames are queued for playback before the next line of code, because of the `stop_when_done()` function being called immediately afterward.
The output of the `say()` function goes to the transport's `send_queue`. This queue is the all-important connection between the world of the Services pipeline that's generating frames asynchronously and the ordered playback of audio and visual media in the WebRTC call.
### Running the coroutines
In this example, we don't actually have any separate processing pipelines—everything happens as a result of an event from the transport. So we only need to run the transport's coroutine, and await its completion:
```python
await transport.run()
```
In future examples, we'll run more processes in parallel. For now, this script can run until the transport exits—which will happen based on calling `stop_when_done()` in the `greet_user()` function.
## Next Steps
Next, we'll start connecting multiple AI services together by building a service pipeline.
## [02 - LLM Say One Thing »](02-llm-say-one-thing.md)

5
docs/examples/README.md Normal file
View File

@@ -0,0 +1,5 @@
# Daily AI SDK Examples
The docs in this folder pair with the example apps located in `examples/foundational`. They are designed to serve as a quick references for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.
To start, you can learn about the overall structure of the examples in [01 - Say One Thing](01-say-one-thing.md).

46
docs/frame-progress.md Normal file
View File

@@ -0,0 +1,46 @@
# A Frame's Progress
1. A user says “Hello, LLM” and the cloud transcription service delivers a transcription to the Transport.
![A transcript frame arrives](images/frame-progress-01.png)
2. The Transport places a Transcription frame in the Pipelines source queue.
![Frame in source queue](images/frame-progress-02.png)
3. The Pipeline passes the Transcription frame to the first Frame Processor in its list, the LLM User Message Aggregator.
![To UMA](images/frame-progress-03.png)
4. The LLM User Message Aggregator updates the LLM Context with a `{“user”: “Hello LLM”}` message.
![Update context](images/frame-progress-04.png)
5. The LLM User Message Aggregator yields an LLM Message Frame, containing the updated LLM Context. The Pipeline passes this frame to the LLM Frame Processor.
![Update context](images/frame-progress-05.png)
6. The LLM Frame Processor creates a streaming chat completion based on the LLM context and yields the first chunk of a response, Text Frame with the value “Hi, “. The Pipeline passes this frame to the TTS Frame Processor. The TTS Frame Processor aggregates this response but doesnt yield anything, yet, because its waiting for a full sentence.
![LLM yields Text](images/frame-progress-06.png)
7. The LLM Frame Processor yields another Text Frame with the value “there.”. The Pipeline passes this frame to the TTS Frame Processor.
![LLM yields more Text](images/frame-progress-07.png)
8. The TTS Frame Processor now has a full sentence, so it starts streaming audio based on “Hi, there.” It yields the first chunk of streaming audio as an Audio frame, which the Pipeline passes to the LLM Assistant Message Aggregator.
![TTS yields Audio](images/frame-progress-08.png)
9. The LLM Assistant Message Aggregator doesnt do anything with Audio frames, so it immediately yields the frame, unchanged. This is the convention for all Frame Processors: frames that the processor doesnt process should be immediately yielded.
![pass-through](images/frame-progress-09.png)
10. The Pipeline places the first Audio frame in its sink queue, which is being watched by the Transport. Since the frame is now in a queue, the Pipeline can continue processing other frames. Note that the source and sink queues form a sort of “boundary of concurrent processing” between a Pipeline and the outside world. In a Pipeline, Frames are processed sequentially; once a Frame is on a queue it can be processed in parallel with the frames being processed by the Pipeline. TODO: link to a more in-depth section about this.
![sink queue](images/frame-progress-10.png)
11. The TTS Frame Processor yields another Audio frame as the Transport transmits the first Audio frame.
![parallel audio](images/frame-progress-11.png)
12. As before, the LLM Assistant Message Aggregator immediately yields the Audio frame and the Pipeline places the Audio frame in the sink queue.
![sink queue 2](images/frame-progress-12.png)
13. The TTS Frame Processor has no more frames to yield. The LLM Frame Processor emits an LLM Response End Frame, which the Pipeline passes to the TTS Frame Processor.
![response end](images/frame-progress-13.png)
14. The TTS Frame Processor immediately yields the LLM Response End Frame, so the Pipeline passes it along to the LLM Assistant Message Aggregator. The LLM Assistant Message Aggregator updates the LLM Context with the full response from the LLM. TODO TODO: I realized I forgot that the TSS Frame Processor also yields the Text frames that the LLM emitted so that the LLM Assistant Message Aggregator could accumulate them, arrggh.
![response end](images/frame-progress-14.png)
15. The system is quiet, and waiting for the next message from the Transport.
![response end](images/frame-progress-15.png)

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 91 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 95 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 96 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 110 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 111 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 117 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

33
dot-env.template Normal file
View File

@@ -0,0 +1,33 @@
# Anthropic
ANTHROPIC_API_KEY=...
# Azure
AZURE_SPEECH_REGION=...
AZURE_SPEECH_API_KEY=...
AZURE_CHATGPT_API_KEY=...
AZURE_CHATGPT_ENDPOINT=https://...
AZURE_CHATGPT_MODEL=...
AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
# ElevenLabs
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
# Fal
FAL_KEY_ID=...
FAL_KEY_SECRET=...
# PlayHT
PLAY_HT_USER_ID=...
PLAY_HT_API_KEY=...
# OpenAI
OPENAI_API_KEY=...

View File

@@ -0,0 +1,54 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frames import EndFrame, TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Say One Thing",
mic_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([tts])
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
if participant["info"]["isLocal"]:
return
participant_name = participant["info"]["userName"] or ''
await pipeline.queue_frames([TextFrame("Hello there, " + participant_name + "!"), EndFrame()])
await transport.run(pipeline)
del tts
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,38 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 1
transport = LocalTransport(
duration_minutes=meeting_duration_minutes, mic_enabled=True
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
async def say_something():
await asyncio.sleep(1)
await transport.say("Hello there.", tts)
await transport.stop_when_done()
await asyncio.gather(transport.run(), say_something())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,59 @@
import asyncio
import os
import logging
import aiohttp
from dailyai.pipeline.frames import EndFrame, LLMMessagesFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Say One Thing From an LLM",
mic_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
messages = [
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}]
pipeline = Pipeline([llm, tts])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await pipeline.queue_frames([LLMMessagesFrame(messages), EndFrame()])
await transport.run(pipeline)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,59 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frames import TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.fal_ai_services import FalImageGenService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Show a still frame image",
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=1
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
pipeline = Pipeline([imagegen])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
# Note that we do not put an EndFrame() item in the pipeline for this demo.
# This means that the bot will stay in the channel until it times out.
# An EndFrame() in the pipeline would cause the transport to shut
# down.
await pipeline.queue_frames(
[TextFrame("a cat in the style of picasso")]
)
await transport.run(pipeline)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,59 @@
import asyncio
import aiohttp
import logging
import os
import tkinter as tk
from dailyai.pipeline.frames import TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 2
tk_root = tk.Tk()
tk_root.title("dailyai")
transport = LocalTransport(
tk_root=tk_root,
mic_enabled=False,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=meeting_duration_minutes,
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
pipeline = Pipeline([imagegen])
await pipeline.queue_frames([TextFrame("a cat in the style of picasso")])
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(transport.run(pipeline, override_pipeline_source_queue=False), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,85 @@
import asyncio
import logging
import os
import aiohttp
from dailyai.pipeline.merge_pipeline import SequentialMergePipeline
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
from dailyai.pipeline.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Static And Dynamic Speech",
duration_minutes=1,
mic_enabled=True,
mic_sample_rate=16000,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
azure_tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
deepgram_tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
)
elevenlabs_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
messages = [{"role": "system",
"content": "tell the user a joke about llamas"}]
# Start a task to run the LLM to create a joke, and convert the LLM output to audio frames. This task
# will run in parallel with generating and speaking the audio for static text, so there's no delay to
# speak the LLM response.
llm_pipeline = Pipeline([llm, elevenlabs_tts])
await llm_pipeline.queue_frames([LLMMessagesFrame(messages), EndPipeFrame()])
simple_tts_pipeline = Pipeline([azure_tts])
await simple_tts_pipeline.queue_frames(
[
TextFrame("My friend the LLM is going to tell a joke about llamas."),
EndPipeFrame(),
]
)
merge_pipeline = SequentialMergePipeline(
[simple_tts_pipeline, llm_pipeline])
await asyncio.gather(
transport.run(merge_pipeline),
simple_tts_pipeline.run_pipeline(),
llm_pipeline.run_pipeline(),
)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,148 @@
import asyncio
import aiohttp
import os
import logging
from dataclasses import dataclass
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import (
GatedAggregator,
LLMFullResponseAggregator,
ParallelPipeline,
SentenceAggregator,
)
from dailyai.pipeline.frames import (
Frame,
TextFrame,
EndFrame,
ImageFrame,
LLMMessagesFrame,
LLMResponseStartFrame,
)
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
@dataclass
class MonthFrame(Frame):
month: str
class MonthPrepender(FrameProcessor):
def __init__(self):
self.most_recent_month = "Placeholder, month frame not yet received"
self.prepend_to_next_text_frame = False
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, MonthFrame):
self.most_recent_month = frame.month
elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):
yield TextFrame(f"{self.most_recent_month}: {frame.text}")
self.prepend_to_next_text_frame = False
elif isinstance(frame, LLMResponseStartFrame):
self.prepend_to_next_text_frame = True
yield frame
else:
yield frame
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Month Narration Bot",
mic_enabled=True,
camera_enabled=True,
mic_sample_rate=16000,
camera_width=1024,
camera_height=1024,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
gated_aggregator = GatedAggregator(
gate_open_fn=lambda frame: isinstance(
frame, ImageFrame), gate_close_fn=lambda frame: isinstance(
frame, LLMResponseStartFrame), start_open=False, )
sentence_aggregator = SentenceAggregator()
month_prepender = MonthPrepender()
llm_full_response_aggregator = LLMFullResponseAggregator()
pipeline = Pipeline(
processors=[
llm,
sentence_aggregator,
ParallelPipeline(
[[month_prepender, tts], [llm_full_response_aggregator, imagegen]]
),
gated_aggregator,
],
)
frames = []
for month in [
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
]:
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
frames.append(MonthFrame(month))
frames.append(LLMMessagesFrame(messages))
frames.append(EndFrame())
await pipeline.queue_frames(frames)
await transport.run(pipeline, override_pipeline_source_queue=False)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,147 @@
import aiohttp
import asyncio
import logging
import tkinter as tk
import os
from dailyai.pipeline.aggregators import LLMFullResponseAggregator
from dailyai.pipeline.frames import AudioFrame, URLImageFrame, LLMMessagesFrame, TextFrame
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 5
tk_root = tk.Tk()
tk_root.title("dailyai")
transport = LocalTransport(
mic_enabled=True,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=meeting_duration_minutes,
tk_root=tk_root,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="1024x1024"
),
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
# Get a complete audio chunk from the given text. Splitting this into its own
# coroutine lets us ensure proper ordering of the audio chunks on the
# send queue.
async def get_all_audio(text):
all_audio = bytearray()
async for audio in tts.run_tts(text):
all_audio.extend(audio)
return all_audio
async def get_month_description(aggregator, frame):
async for frame in aggregator.process_frame(frame):
if isinstance(frame, TextFrame):
return frame.text
async def get_month_data(month):
messages = [{"role": "system", "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.", }]
messages_frame = LLMMessagesFrame(messages)
llm_full_response_aggregator = LLMFullResponseAggregator()
image_description = None
async for frame in llm.process_frame(messages_frame):
result = await get_month_description(llm_full_response_aggregator, frame)
if result:
image_description = result
break
if not image_description:
return
to_speak = f"{month}: {image_description}"
audio_task = asyncio.create_task(get_all_audio(to_speak))
image_task = asyncio.create_task(
imagegen.run_image_gen(image_description))
(audio, image_data) = await asyncio.gather(audio_task, image_task)
return {
"month": month,
"text": image_description,
"image_url": image_data[0],
"image": image_data[1],
"image_size": image_data[2],
"audio": audio,
}
# We only specify 5 months as we create tasks all at once and we might
# get rate limited otherwise.
months: list[str] = [
"January",
"February",
"March",
"April",
"May",
]
async def show_images():
# This will play the months in the order they're completed. The benefit
# is we'll have as little delay as possible before the first month, and
# likely no delay between months, but the months won't display in
# order.
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
if data:
await transport.send_queue.put(
[
URLImageFrame(data["image_url"], data["image"], data["image_size"]),
AudioFrame(data["audio"]),
]
)
await asyncio.sleep(25)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
month_tasks = [
asyncio.create_task(
get_month_data(month)) for month in months]
await asyncio.gather(transport.run(), show_images(), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,88 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frames import LLMMessagesFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.ai_services import FrameLogger
from dailyai.pipeline.aggregators import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
)
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
duration_minutes=5,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
vad_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
fl = FrameLogger("Inner")
fl2 = FrameLogger("Outer")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
pipeline = Pipeline(
processors=[
fl,
tma_in,
llm,
fl2,
tts,
tma_out,
],
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await pipeline.queue_frames([LLMMessagesFrame(messages)])
transport.transcription_settings["extra"]["endpointing"] = True
transport.transcription_settings["extra"]["punctuate"] = True
await transport.run(pipeline)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,96 @@
import asyncio
import os
import logging
from typing import AsyncGenerator
import aiohttp
from PIL import Image
from dailyai.pipeline.frames import ImageFrame, Frame, TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.ai_services import AIService
from dailyai.pipeline.aggregators import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
)
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class ImageSyncAggregator(AIService):
def __init__(self, speaking_path: str, waiting_path: str):
self._speaking_image = Image.open(speaking_path)
self._speaking_image_bytes = self._speaking_image.tobytes()
self._waiting_image = Image.open(waiting_path)
self._waiting_image_bytes = self._waiting_image.tobytes()
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
yield ImageFrame(self._speaking_image_bytes, (1024, 1024))
yield frame
yield ImageFrame(self._waiting_image_bytes, (1024, 1024))
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
5,
)
transport._camera_enabled = True
transport._camera_width = 1024
transport._camera_height = 1024
transport._mic_enabled = True
transport._mic_sample_rate = 16000
transport.transcription_settings["extra"]["punctuate"] = True
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so it should not include any special characters. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
pipeline = Pipeline([image_sync_aggregator, tma_in, llm, tma_out, tts])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await pipeline.queue_frames([TextFrame("Hi, I'm listening!")])
await transport.run(pipeline)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,76 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.aggregators import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.ai_services import FrameLogger
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
duration_minutes=5,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
vad_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
pipeline = Pipeline([FrameLogger(), llm, FrameLogger(), tts])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say("Hi, I'm listening!", tts)
async def run_conversation():
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
await transport.run_interruptible_pipeline(
pipeline,
post_processor=LLMAssistantResponseAggregator(messages),
pre_processor=LLMUserResponseAggregator(messages),
)
transport.transcription_settings["extra"]["punctuate"] = False
await asyncio.gather(transport.run(), run_conversation())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,149 @@
from typing import Tuple
import aiohttp
import asyncio
import logging
import os
from dailyai.pipeline.aggregators import SentenceAggregator
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Respond bot",
duration_minutes=10,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
tts1 = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts2 = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
dalle = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="1024x1024"
),
aiohttp_session=session,
key_id=os.getenv("FAL_KEY_ID"),
key_secret=os.getenv("FAL_KEY_SECRET"),
)
bot1_messages = [
{
"role": "system",
"content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long.",
},
]
bot2_messages = [
{
"role": "system",
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich.",
},
]
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
"""This function streams text from the LLM and uses the TTS service to convert
that text to speech as it's received. """
source_queue = asyncio.Queue()
sink_queue = asyncio.Queue()
sentence_aggregator = SentenceAggregator()
pipeline = Pipeline(
[llm, sentence_aggregator, tts1], source_queue, sink_queue
)
await source_queue.put(LLMMessagesFrame(messages))
await source_queue.put(EndFrame())
await pipeline.run_pipeline()
message = ""
all_audio = bytearray()
while sink_queue.qsize():
frame = sink_queue.get_nowait()
if isinstance(frame, TextFrame):
message += frame.text
elif isinstance(frame, AudioFrame):
all_audio.extend(frame.data)
return (message, all_audio)
async def get_bot1_statement():
message, audio = await get_text_and_audio(bot1_messages)
bot1_messages.append({"role": "assistant", "content": message})
bot2_messages.append({"role": "user", "content": message})
return audio
async def get_bot2_statement():
message, audio = await get_text_and_audio(bot2_messages)
bot2_messages.append({"role": "assistant", "content": message})
bot1_messages.append({"role": "user", "content": message})
return audio
async def argue():
for i in range(100):
print(f"In iteration {i}")
bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
(audio1, image_data1) = await asyncio.gather(
get_bot1_statement(), dalle.run_image_gen(bot1_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data1[1], image_data1[2]),
AudioFrame(audio1),
]
)
bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
(audio2, image_data2) = await asyncio.gather(
get_bot2_statement(), dalle.run_image_gen(bot2_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data2[1], image_data2[2]),
AudioFrame(audio2),
]
)
await asyncio.gather(transport.run(), argue())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,182 @@
import aiohttp
import asyncio
import logging
import os
import random
from typing import AsyncGenerator
from PIL import Image
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from dailyai.pipeline.frames import (
Frame,
TextFrame,
ImageFrame,
SpriteFrame,
TranscriptionFrame,
)
from dailyai.services.ai_services import AIService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
sprites = {}
image_files = [
"sc-default.png",
"sc-talk.png",
"sc-listen-1.png",
"sc-think-1.png",
"sc-think-2.png",
"sc-think-3.png",
"sc-think-4.png",
]
script_dir = os.path.dirname(__file__)
for file in image_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites[file] = img.tobytes()
# When the bot isn't talking, show a static image of the cat listening
quiet_frame = ImageFrame(sprites["sc-listen-1.png"], (720, 1280))
# When the bot is talking, build an animation from two sprites
talking_list = [sprites["sc-default.png"], sprites["sc-talk.png"]]
talking = [random.choice(talking_list) for x in range(30)]
talking_frame = SpriteFrame(images=talking)
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM
# is processing
thinking_list = [
sprites["sc-think-1.png"],
sprites["sc-think-2.png"],
sprites["sc-think-3.png"],
sprites["sc-think-4.png"],
]
thinking_frame = SpriteFrame(images=thinking_list)
class TranscriptFilter(AIService):
def __init__(self, bot_participant_id=None):
self.bot_participant_id = bot_participant_id
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TranscriptionFrame):
if frame.participantId != self.bot_participant_id:
yield frame
class NameCheckFilter(AIService):
def __init__(self, names: list[str]):
self.names = names
self.sentence = ""
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
content: str = ""
# TODO: split up transcription by participant
if isinstance(frame, TextFrame):
content = frame.text
self.sentence += content
if self.sentence.endswith((".", "?", "!")):
if any(name in self.sentence for name in self.names):
out = self.sentence
self.sentence = ""
yield TextFrame(out)
else:
out = self.sentence
self.sentence = ""
class ImageSyncAggregator(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
yield talking_frame
yield frame
yield quiet_frame
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Santa Cat",
duration_minutes=3,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=720,
camera_height=1280,
)
transport._mic_enabled = True
transport._mic_sample_rate = 16000
transport._camera_enabled = True
transport._camera_width = 720
transport._camera_height = 1280
transport.transcription_settings["extra"]["punctuate"] = True
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
isa = ImageSyncAggregator()
messages = [
{
"role": "system",
"content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
tf = TranscriptFilter(transport._my_participant_id)
ncf = NameCheckFilter(["Santa Cat", "Santa"])
pipeline = Pipeline([isa, tf, ncf, tma_in, llm, tma_out, tts])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say(
"Hi! If you want to talk to me, just say 'hey Santa Cat'.",
tts,
)
async def starting_image():
await transport.send_queue.put(quiet_frame)
await asyncio.gather(transport.run(pipeline), starting_image())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,126 @@
import aiohttp
import asyncio
import logging
import os
import wave
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import (
Frame,
AudioFrame,
LLMResponseEndFrame,
LLMMessagesFrame,
)
from typing import AsyncGenerator
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
sounds = {}
sound_files = ["ding1.wav", "ding2.wav"]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = audio_file.readframes(-1)
class OutboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMResponseEndFrame):
yield AudioFrame(sounds["ding1.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
class InboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
duration_minutes=5,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
)
transport.transcription_settings["extra"]["punctuate"] = True
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
pipeline = Pipeline([tma_in, in_sound, fl2, llm, tma_out, fl, tts, out_sound])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say("Hi, I'm listening!", tts)
await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
await asyncio.gather(transport.run(pipeline))
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,85 @@
import asyncio
import aiohttp
import logging
import os
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import FrameProcessor, UserResponseAggregator, VisionImageFrameAggregator
from dailyai.pipeline.frames import Frame, TextFrame, UserImageRequestFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.moondream_ai_service import MoondreamService
from dailyai.transports.daily_transport import DailyTransport
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class UserImageRequester(FrameProcessor):
participant_id: str
def set_participant_id(self, participant_id: str):
self.participant_id = participant_id
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if self.participant_id and isinstance(frame, TextFrame):
yield UserImageRequestFrame(self.participant_id)
yield frame
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
duration_minutes=5,
mic_enabled=True,
mic_sample_rate=16000,
vad_enabled=True,
start_transcription=True,
video_rendering_enabled=True
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
# If you run into weird description, try with use_cpu=True
moondream = MoondreamService()
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say("Hi there! Feel free to ask me what I see.", tts)
transport.render_participant_video(participant["id"], framerate=0)
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([user_response, image_requester, vision_aggregator, moondream, tts])
await transport.run(pipeline)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,58 @@
import asyncio
import logging
from dailyai.pipeline.frames import EndFrame, TranscriptionFrame
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
from dailyai.pipeline.pipeline import Pipeline
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
transport = DailyTransport(
room_url,
None,
"Transcription bot",
start_transcription=False,
mic_enabled=False,
camera_enabled=False,
speaker_enabled=True,
)
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
transport_done = asyncio.Event()
pipeline = Pipeline([stt], source=transport.receive_queue, sink=transcription_output_queue)
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while not transport_done.is_set():
item = await transcription_output_queue.get()
print("got item from queue", item)
if isinstance(item, TranscriptionFrame):
print(item.text)
elif isinstance(item, EndFrame):
break
print("handle_transcription done")
async def run_until_done():
await transport.run()
transport_done.set()
print("run_until_done done")
await asyncio.gather(run_until_done(), pipeline.run_pipeline(), handle_transcription())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,51 @@
import asyncio
import logging
from dailyai.pipeline.frames import EndFrame, TranscriptionFrame
from dailyai.transports.local_transport import LocalTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
from dailyai.pipeline.pipeline import Pipeline
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
meeting_duration_minutes = 1
transport = LocalTransport(
mic_enabled=True,
camera_enabled=False,
speaker_enabled=True,
duration_minutes=meeting_duration_minutes,
)
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
transport_done = asyncio.Event()
pipeline = Pipeline([stt], source=transport.receive_queue, sink=transcription_output_queue)
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while not transport_done.is_set():
item = await transcription_output_queue.get()
print("got item from queue", item)
if isinstance(item, TranscriptionFrame):
print(item.text)
elif isinstance(item, EndFrame):
break
print("handle_transcription done")
async def run_until_done():
await transport.run()
transport_done.set()
print("run_until_done done")
await asyncio.gather(run_until_done(), pipeline.run_pipeline(), handle_transcription())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,52 @@
import asyncio
import logging
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import FrameProcessor
from dailyai.pipeline.frames import ImageFrame, Frame, UserImageFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class UserImageProcessor(FrameProcessor):
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, UserImageFrame):
yield ImageFrame(frame.image, frame.size)
else:
yield frame
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Render participant video",
camera_width=1280,
camera_height=720,
camera_enabled=True,
video_rendering_enabled=True
)
@ transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
transport.render_participant_video(participant["id"])
pipeline = Pipeline([UserImageProcessor()])
await asyncio.gather(transport.run(pipeline))
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,71 @@
import asyncio
import logging
import tkinter as tk
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import FrameProcessor
from dailyai.pipeline.frames import ImageFrame, Frame, UserImageFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.transports.local_transport import LocalTransport
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class UserImageProcessor(FrameProcessor):
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, UserImageFrame):
yield ImageFrame(frame.image, frame.size)
else:
yield frame
async def main(room_url: str, token):
tk_root = tk.Tk()
tk_root.title("dailyai")
local_transport = LocalTransport(
tk_root=tk_root,
camera_enabled=True,
camera_width=1280,
camera_height=720
)
transport = DailyTransport(
room_url,
token,
"Render participant video",
video_rendering_enabled=True
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
transport.render_participant_video(participant["id"])
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
local_pipeline = Pipeline([UserImageProcessor()], source=transport.receive_queue)
await asyncio.gather(
transport.run(),
local_transport.run(local_pipeline, override_pipeline_source_queue=False),
run_tk()
)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

Binary file not shown.

Binary file not shown.

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 870 KiB

After

Width:  |  Height:  |  Size: 870 KiB

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 871 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 872 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 868 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

View File

@@ -0,0 +1,58 @@
import argparse
import os
import time
import urllib
import requests
def configure():
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u",
"--url",
type=str,
required=False,
help="URL of the Daily room to join")
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
if not key:
raise Exception("No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={
"Authorization": f"Bearer {key}"},
json={
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
return (url, token)

View File

@@ -0,0 +1,25 @@
syntax = "proto3";
package dailyai_proto;
message TextFrame {
string text = 1;
}
message AudioFrame {
bytes audio = 1;
}
message TranscriptionFrame {
string text = 1;
string participant_id = 2;
string timestamp = 3;
}
message Frame {
oneof frame {
TextFrame text = 1;
AudioFrame audio = 2;
TranscriptionFrame transcription = 3;
}
}

View File

@@ -0,0 +1,134 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="//cdn.jsdelivr.net/npm/protobufjs@7.X.X/dist/protobuf.min.js"></script>
<title>WebSocket Audio Stream</title>
</head>
<body>
<h1>WebSocket Audio Stream</h1>
<button id="startAudioBtn">Start Audio</button>
<button id="stopAudioBtn">Stop Audio</button>
<script>
const SAMPLE_RATE = 16000;
const BUFFER_SIZE = 8192;
const MIN_AUDIO_SIZE = 6400;
let audioContext;
let microphoneStream;
let scriptProcessor;
let source;
let frame;
let audioChunks = [];
let isPlaying = false;
let ws;
const proto = protobuf.load("frames.proto", (err, root) => {
if (err) throw err;
frame = root.lookupType("dailyai_proto.Frame");
});
function initWebSocket() {
ws = new WebSocket('ws://localhost:8765');
ws.addEventListener('open', () => console.log('WebSocket connection established.'));
ws.addEventListener('message', handleWebSocketMessage);
ws.addEventListener('close', (event) => console.log("WebSocket connection closed.", event.code, event.reason));
ws.addEventListener('error', (event) => console.error('WebSocket error:', event));
}
async function handleWebSocketMessage(event) {
const arrayBuffer = await event.data.arrayBuffer();
enqueueAudioFromProto(arrayBuffer);
}
function enqueueAudioFromProto(arrayBuffer) {
const parsedFrame = frame.decode(new Uint8Array(arrayBuffer));
if (!parsedFrame?.audio) return false;
const frameCount = parsedFrame.audio.data.length / 2;
const audioOutBuffer = audioContext.createBuffer(1, frameCount, SAMPLE_RATE);
const nowBuffering = audioOutBuffer.getChannelData(0);
const view = new Int16Array(parsedFrame.audio.data.buffer);
for (let i = 0; i < frameCount; i++) {
const word = view[i];
nowBuffering[i] = ((word + 32768) % 65536 - 32768) / 32768.0;
}
audioChunks.push(audioOutBuffer);
if (!isPlaying) playNextChunk();
}
function playNextChunk() {
if (audioChunks.length === 0) {
isPlaying = false;
return;
}
isPlaying = true;
const audioOutBuffer = audioChunks.shift();
const source = audioContext.createBufferSource();
source.buffer = audioOutBuffer;
source.connect(audioContext.destination);
source.onended = playNextChunk;
source.start();
}
function startAudio() {
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
alert('getUserMedia is not supported in your browser.');
return;
}
navigator.mediaDevices.getUserMedia({ audio: true })
.then((stream) => {
microphoneStream = stream;
audioContext = new (window.AudioContext || window.webkitAudioContext)();
scriptProcessor = audioContext.createScriptProcessor(BUFFER_SIZE, 1, 1);
source = audioContext.createMediaStreamSource(stream);
source.connect(scriptProcessor);
scriptProcessor.connect(audioContext.destination);
const audioBuffer = [];
const skipRatio = Math.floor(audioContext.sampleRate / (SAMPLE_RATE * 2));
scriptProcessor.onaudioprocess = (event) => {
const rawLeftChannelData = event.inputBuffer.getChannelData(0);
for (let i = 0; i < rawLeftChannelData.length; i += skipRatio) {
const normalized = ((rawLeftChannelData[i] * 32768.0) + 32768) % 65536 - 32768;
const swappedBytes = ((normalized & 0xff) << 8) | ((normalized >> 8) & 0xff);
audioBuffer.push(swappedBytes);
}
if (audioBuffer.length >= MIN_AUDIO_SIZE) {
const audioFrame = frame.create({ audio: { audio: audioBuffer.slice(0, MIN_AUDIO_SIZE) } });
const encodedFrame = new Uint8Array(frame.encode(audioFrame).finish());
ws.send(encodedFrame);
audioBuffer.splice(0, MIN_AUDIO_SIZE);
}
};
initWebSocket();
})
.catch((error) => console.error('Error accessing microphone:', error));
}
function stopAudio() {
if (ws) {
ws.close();
scriptProcessor.disconnect();
source.disconnect();
ws = undefined;
}
}
document.getElementById('startAudioBtn').addEventListener('click', startAudio);
document.getElementById('stopAudioBtn').addEventListener('click', stopAudio);
</script>
</body>
</html>

View File

@@ -0,0 +1,50 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.frames import TextFrame, TranscriptionFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.transports.websocket_transport import WebsocketTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class WhisperTranscriber(FrameProcessor):
async def process_frame(self, frame):
if isinstance(frame, TranscriptionFrame):
print(f"Transcribed: {frame.text}")
else:
yield frame
async def main():
async with aiohttp.ClientSession() as session:
transport = WebsocketTransport(
mic_enabled=True,
speaker_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([
WhisperSTTService(),
WhisperTranscriber(),
tts,
])
@transport.on_connection
async def queue_frame():
await pipeline.queue_frames([TextFrame("Hello there!")])
await transport.run(pipeline)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -5,61 +5,64 @@ import time
import urllib.parse
import random
from dailyai.services.daily_transport_service import DailyTransportService
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.queue_frame import QueueFrame, FrameType
from dailyai.pipeline.frames import Frame, FrameType
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
async def main(room_url:str, token):
async def main(room_url: str, token):
global transport
global llm
global tts
transport = DailyTransportService(
transport = DailyTransport(
room_url,
token,
"Imagebot",
1,
)
transport.mic_enabled = True
transport.camera_enabled = True
transport.mic_sample_rate = 16000
transport.camera_width = 1024
transport.camera_height = 1024
transport._mic_enabled = True
transport._camera_enabled = True
transport._mic_sample_rate = 16000
transport._camera_width = 1024
transport._camera_height = 1024
llm = AzureLLMService()
tts = AzureTTSService()
img = FalImageGenService()
async def handle_transcriptions():
print("handle_transcriptions got called")
sentence = ""
async for message in transport.get_transcriptions():
print(f"transcription message: {message}")
if message["session_id"] == transport.my_participant_id:
if message["session_id"] == transport._my_participant_id:
continue
finder = message["text"].find("start over")
finder = message["text"].find("start over")
print(f"finder: {finder}")
if finder >= 0:
async for audio in tts.run_tts(f"Resetting."):
transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
transport.output_queue.put(
Frame(FrameType.AUDIO_FRAME, audio))
sentence = ""
continue
# todo: we could differentiate between transcriptions from different participants
# todo: we could differentiate between transcriptions from
# different participants
sentence += f" {message['text']}"
print(f"sentence is now: {sentence}")
# TODO: Cache this audio
phrase = random.choice(["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
phrase = random.choice(
["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
async for audio in tts.run_tts(phrase):
transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
img_result = img.run_image_gen(sentence, "1024x1024")
awaited_img = await asyncio.gather(img_result)
transport.output_queue.put(
[
QueueFrame(FrameType.IMAGE_FRAME, awaited_img[0][1]),
Frame(FrameType.IMAGE_FRAME, awaited_img[0][1]),
]
)
@@ -69,9 +72,10 @@ async def main(room_url:str, token):
if participant["info"]["isLocal"]:
return
async for audio in tts.run_tts("Describe an image, and I'll create it."):
audio_generator = tts.run_tts(f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
audio_generator = tts.run_tts(
f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
async for audio in audio_generator:
transport.output_queue.put(QueueFrame(FrameType.AUDIO_FRAME, audio))
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
transport.transcription_settings["extra"]["punctuate"] = False
transport.transcription_settings["extra"]["endpointing"] = False
@@ -81,8 +85,11 @@ async def main(room_url:str, token):
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=True, help="URL of the Daily room to join"
)
"-u",
"--url",
type=str,
required=True,
help="URL of the Daily room to join")
parser.add_argument(
"-k",
"--apikey",
@@ -93,20 +100,25 @@ if __name__ == "__main__":
args, unknown = parser.parse_known_args()
# Create a meeting token for the given room with an expiration 1 hour in the future.
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(args.url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {args.apikey}"},
headers={
"Authorization": f"Bearer {args.apikey}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]

View File

@@ -0,0 +1,137 @@
import aiohttp
import asyncio
import os
import wave
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.aggregators import LLMContextAggregator
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesFrame
from typing import AsyncGenerator
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
sounds = {}
sound_files = [
'ding1.wav',
'ding2.wav'
]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = audio_file.readframes(-1)
class OutboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMResponseEndFrame):
yield AudioFrame(sounds["ding1.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
class InboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
async def main(room_url: str, token, phone):
async with aiohttp.ClientSession() as session:
global transport
global llm
global tts
transport = DailyTransport(
room_url,
token,
"Respond bot",
300,
)
transport._mic_enabled = True
transport._mic_sample_rate = 16000
transport._camera_enabled = False
llm = AzureLLMService()
tts = AzureTTSService()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await tts.say("Hi, I'm listening!", transport.send_queue)
await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
]
tma_in = LLMContextAggregator(
messages, "user", transport._my_participant_id
)
tma_out = LLMContextAggregator(
messages, "assistant", transport._my_participant_id
)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
await out_sound.run_to_queue(
transport.send_queue,
tts.run(
tma_out.run(
llm.run(
fl2.run(
in_sound.run(
tma_in.run(
transport.get_receive_frames()
)
)
)
)
)
)
)
@transport.event_handler("on_participant_joined")
async def pax_joined(transport, pax):
print(f"PARTICIPANT JOINED: {pax}")
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if (state == "joined"):
if (phone):
transport.start_recording()
transport.dialout(phone)
transport.transcription_settings["extra"]["punctuate"] = True
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

34
examples/server/README.md Normal file
View File

@@ -0,0 +1,34 @@
# Server Example
Use this server app to quickly host a bot on the web:
```
flask --app daily-bot-manager.py --debug run
```
It's currently configured to serve example apps defined in the APPS constant in the server file:
```
chatbot
patient-intake
storybot
translator
```
Once the server is started, you can create a bot instance by opening `http://127.0.0.1:5000/start/chatbot` in a browser, and the server will do the following:
- Create a new, randomly-named Daily room with `DAILY_API_KEY` from your .env file or environment
- Start an instance of `chatbot.py` and connect it to that room
- 301 redirect your browser to the room
### Options
The server supports several options, which can be set in the body of a POST request, or as params in the URL of a GET request.
- `room_url` (default: none): A room URL to join. If empty, the server will create a Daily room and return the URL in the response.
room_properties (none): A JSON object (URL encoded if included as a GET parameter) for overriding default room creation properties, as described here: https://docs.daily.co/reference/rest-api/rooms/create-room This will be ignored if a room_url is provided.
- `token_properties` (none): A JSON object (URL encoded if included as a GET parameter) for overriding default token properties. By default, the server creates an owner token with an expiration time of one hour.
- `duration` (7200 seconds, or two hours): Use this property to set a time limit for the bot, as well as an expiration time for the room (if the server is creating one). This will not add an expiration time to an existing room. Expiration times in `token_properties` or `room_properties` will also take precedence over this value. You can set this property to `0` to disable timeouts, but this isn't recommended.
- `bot_args` (none): A string containing any additional command-line args to pass to the bot.
- `wait_for_bot` (true): Whether to wait for the bot to successfully join the room before returning a response from the server. If true, the server will start the bot script, then poll the room for up to 5 seconds to confirm the bot has joined the room. If it doesn't, the server will stop the bot and return a 500 response. If set to `false`, the server will start the bot, but immediately return a 200 response. This can be useful if the server is creating rooms for you, and you need the room URL to join the user to the room.
- `redirect` (true): Instead of returning a 200 for GET requests, the server will return a 301 redirect to the ROOM_URL. This is handy for testing by creating a bot with a GET request directly in the browser. POST requests will never return redirects. Set to `false` to get 200 responses with info in a JSON object even for GET requests.

View File

@@ -0,0 +1,165 @@
import os
import requests
import urllib
import subprocess
import time
from flask import Flask, jsonify, redirect, request
from flask_cors import CORS
from dotenv import load_dotenv
load_dotenv(override=True)
app = Flask(__name__)
CORS(app)
APPS = {
"chatbot": "../starter-apps/chatbot.py",
"patient-intake": "../starter-apps/patient-intake.py",
"storybot": "../starter-apps/storybot.py",
"translator": "../starter-apps/translator.py"
}
daily_api_key = os.getenv("DAILY_API_KEY")
api_path = os.getenv("DAILY_API_PATH") or "https://api.daily.co/v1"
def get_room_name(room_url):
return urllib.parse.urlparse(room_url).path[1:]
def create_room(room_properties, exp):
room_props = {
"exp": exp,
"enable_chat": True,
"enable_emoji_reactions": True,
"eject_at_room_exp": True,
"enable_prejoin_ui": False,
"enable_recording": "cloud"
}
if room_properties:
room_props |= room_properties
res = requests.post(
f"{api_path}/rooms",
headers={"Authorization": f"Bearer {daily_api_key}"},
json={
"properties": room_props
},
)
if res.status_code != 200:
raise Exception(f"Unable to create room: {res.text}")
room_url = res.json()["url"]
room_name = res.json()["name"]
return (room_url, room_name)
def create_token(room_name, token_properties, exp):
token_props = {"exp": exp, "is_owner": True}
if token_properties:
token_props |= token_properties
# Force the token to be limited to the room
token_props |= {"room_name": room_name}
res = requests.post(
f'{api_path}/meeting-tokens',
headers={
'Authorization': f'Bearer {daily_api_key}'},
json={
'properties': token_props})
if res.status_code != 200:
if res.status_code != 200:
raise Exception(f"Unable to create meeting token: {res.text}")
meeting_token = res.json()['token']
return meeting_token
def start_bot(*, bot_path, room_url, token, bot_args, wait_for_bot):
room_name = get_room_name(room_url)
proc = subprocess.Popen(
[f"python {bot_path} -u {room_url} -t {token} -k {daily_api_key} {bot_args}"],
shell=True,
bufsize=1,
)
if wait_for_bot:
# Don't return until the bot has joined the room, but wait for at most 5
# seconds.
attempts = 0
while attempts < 50:
time.sleep(0.1)
attempts += 1
res = requests.get(
f"{api_path}/rooms/{room_name}/get-session-data",
headers={"Authorization": f"Bearer {daily_api_key}"},
)
if res.status_code == 200:
print(f"Took {attempts} attempts to join room {room_name}")
return True
# If we don't break from the loop, that means we never found the bot in the room
raise Exception("The bot was unable to join the room. Please try again.")
return True
@app.route("/start/<string:botname>", methods=["GET", "POST"])
def start(botname):
try:
if botname not in APPS:
raise Exception(f"Bot '{botname}' is not in the allowlist.")
bot_path = APPS[botname]
props = {
"room_url": None,
"room_properties": None,
"token_properties": None,
"bot_args": None,
"wait_for_bot": True,
"duration": None,
"redirect": True
}
props |= request.values.to_dict() # gets URL params as well as plaintext POST body
try:
props |= request.json
except BaseException:
pass
if props['redirect'] == "false":
props['redirect'] = False
if props['wait_for_bot'] == "false":
props['wait_for_bot'] = False
duration = int(os.getenv("DAILY_BOT_DURATION") or 7200)
if props['duration']:
duration = props['duration']
exp = time.time() + duration
if (props['room_url']):
room_url = props['room_url']
try:
room_name = get_room_name(room_url)
except ValueError:
raise Exception(
"There was a problem detecting the room name. Please double-check the value of room_url.")
else:
room_url, room_name = create_room(props['room_properties'], exp)
token = create_token(room_name, props['token_properties'], exp)
bot = start_bot(
room_url=room_url,
bot_path=bot_path,
token=token,
bot_args=props['bot_args'],
wait_for_bot=props['wait_for_bot'])
if props['redirect'] and request.method == "GET":
return redirect(room_url, 302)
else:
return jsonify({"room_url": room_url, "token": token})
except BaseException as e:
return f"There was a problem starting the bot: {e}", 500
@app.route("/healthz")
def health_check():
return "ok", 200

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.1 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.1 MiB

Binary file not shown.

Binary file not shown.

After

Width:  |  Height:  |  Size: 759 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 884 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 876 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 881 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 874 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 882 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 885 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 888 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 890 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 898 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 836 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 905 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 849 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Some files were not shown because too many files have changed in this diff Show More