Compare commits

...

366 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
6ac012a82b Merge pull request #158 from pipecat-ai/use-pyloudnorm-loudness
interruptions: introduce pyloudnorm to compute loudness
2024-05-23 05:24:38 +08:00
Aleix Conchillo Flaqué
075194cb54 update CHANGELOG for 0.0.20 2024-05-22 14:21:13 -07:00
Aleix Conchillo Flaqué
269f070051 audio: no need for compute_rms 2024-05-22 14:09:24 -07:00
Aleix Conchillo Flaqué
3342c9d7c2 services(stt): use calculate_audio_volume 2024-05-22 13:05:20 -07:00
Aleix Conchillo Flaqué
b468b2f926 audio: clamp normalized volume 2024-05-22 13:04:09 -07:00
Aleix Conchillo Flaqué
af1c7d0023 interruptions: introduce pyloudnorm to compute loudness
https://github.com/csteinmetz1/pyloudnorm
2024-05-22 11:52:07 -07:00
Aleix Conchillo Flaqué
34670eef79 Merge pull request #162 from pipecat-ai/reset-before-pushing
processors: reset aggergator before pushing
2024-05-23 02:51:55 +08:00
Aleix Conchillo Flaqué
979739c1b7 processors: reset aggergator before pushing 2024-05-22 11:26:08 -07:00
Aleix Conchillo Flaqué
83ed6870b9 Merge pull request #161 from pipecat-ai/only-interrupt-assistant
processors: only interrupt asssisstant
2024-05-23 02:02:43 +08:00
Aleix Conchillo Flaqué
57a568986a processors: only interrupt asssisstant
We were pushing interruption frames in the audio task. This was caussing the
LLMUserResponseAggregator to push the accumulated text and then casuing the LLM
to respond.
2024-05-22 10:15:35 -07:00
Aleix Conchillo Flaqué
e828e26b5b Merge pull request #159 from pipecat-ai/create-pool-executor
transports: run threads in their own ThreadPoolExecutor
2024-05-22 15:49:03 +08:00
Aleix Conchillo Flaqué
825738440e transports: run threads in their own ThreadPoolExecutor 2024-05-21 18:52:27 -07:00
Aleix Conchillo Flaqué
147bd1a075 Merge pull request #156 from pipecat-ai/pipecat-0.0.19
update CHANGELOG.md for 0.0.19
2024-05-21 12:36:48 +08:00
Aleix Conchillo Flaqué
209e97f372 update CHANGELOG.md for 0.0.19 2024-05-20 21:33:15 -07:00
Aleix Conchillo Flaqué
47f8627432 Merge pull request #155 from pipecat-ai/llm-accumlate-full-response
aggregators: accumulate full responses and take interruptions into ac…
2024-05-21 11:34:39 +08:00
Aleix Conchillo Flaqué
cc6713837a github: publish test to pypi again. simply always use PRs 2024-05-20 12:19:39 -07:00
Aleix Conchillo Flaqué
728fe0ad88 github: don't publish to test pypi twice 2024-05-20 12:15:54 -07:00
Aleix Conchillo Flaqué
dbba45349f github: don't run publish_test on main branch 2024-05-20 12:14:00 -07:00
Aleix Conchillo Flaqué
40ccf46b4b aggregators: accumulate full responses and take interruptions into account 2024-05-20 11:40:57 -07:00
Aleix Conchillo Flaqué
077bb9f20a Merge pull request #153 from pipecat-ai/expose-llm-messages
aggregators: expose LLM messages
2024-05-21 02:40:26 +08:00
Aleix Conchillo Flaqué
e4c990c677 aggregators: expose LLM messages 2024-05-20 10:51:37 -07:00
Aleix Conchillo Flaqué
1c8b9d813a examples: minot updates to storytelling-chatbot instructions 2024-05-20 10:31:33 -07:00
Aleix Conchillo Flaqué
83812f2671 transports(daily): implement DailyOutputTransport.send_message 2024-05-20 10:30:59 -07:00
Aleix Conchillo Flaqué
4053c33899 update CHANGELOG for 0.0.17 2024-05-19 19:27:20 -07:00
Aleix Conchillo Flaqué
03978b63bc update linux-py3.10-requirements.txt 2024-05-19 19:27:04 -07:00
Aleix Conchillo Flaqué
bf036be6b8 Merge pull request #150 from pipecat-ai/khk-gemini
Initial commit of Google Gemini LLM service.
2024-05-20 10:24:31 +08:00
Kwindla Hultman Kramer
7ffb10d7f5 add to CHANGELOG.md 2024-05-19 12:44:45 -07:00
Kwindla Hultman Kramer
66377954cb fix up openai vision and gemini implementation 2024-05-19 12:33:57 -07:00
Kwindla Hultman Kramer
e507686cef oops, fix openai.py 2024-05-19 11:13:39 -07:00
Kwindla Hultman Kramer
e5ddaf14f4 add google and deepgram to README.md 2024-05-19 11:09:30 -07:00
Kwindla Hultman Kramer
cf597a2f6b add back in debug log line in openai.py 2024-05-19 11:08:38 -07:00
Kwindla Hultman Kramer
d83f0aabca generate macos-py3.10-requirements.txt with Python 3.10 2024-05-19 10:53:50 -07:00
Kwindla Hultman Kramer
b337e984b3 Initial commit of Google Gemini LLM service.
Gemini text input works. We translate from OpenAILLMContext format
on the fly in the GoogleLLMService implementation. This commit also
implements image input (vision) in both the GoogleLLMService and in
the OpenAILLMService. Image input is a hack and needs to be revisited.
OpenAI expects images to be uploaded as base64-encoded JPEGs. Google
does not require the base64 encoding. Other than for images, we use
the OpenAI format as our standard, but base64-encoding the images
and then unencoding them in the GoogleLLMService feels wasteful.
2024-05-19 10:35:20 -07:00
Aleix Conchillo Flaqué
6366ee072e Merge pull request #144 from pipecat-ai/initial-interruptions
intial basic interruptions support
2024-05-20 01:33:15 +08:00
Aleix Conchillo Flaqué
c3bfcbd562 aggregators: clear accumulated responses if interruption happens 2024-05-19 10:21:45 -07:00
Aleix Conchillo Flaqué
c0d5054798 examples: some prompt tweaking 2024-05-19 09:41:36 -07:00
Aleix Conchillo Flaqué
810dc30d3d examples: fix examples to use LLMFullResponseEndFrame 2024-05-19 09:39:34 -07:00
Aleix Conchillo Flaqué
36dd4933e9 example: add assistant responses to simple chatbot 2024-05-18 10:01:46 -07:00
Aleix Conchillo Flaqué
435fffe1b0 add LLMFullResponseStartFrame/LLMFullResponseEndFrame 2024-05-18 09:49:38 -07:00
Aleix Conchillo Flaqué
2b8f1c4cda services(openai): send LLMResponseStartFrame for each completion 2024-05-17 17:47:33 -07:00
Aleix Conchillo Flaqué
0e8c7a9b28 transports(output): create an downstream push frame task 2024-05-17 17:47:24 -07:00
Aleix Conchillo Flaqué
3e13678f23 vad: use exponential smoothed volume to improve speech detection 2024-05-17 17:13:31 -07:00
Aleix Conchillo Flaqué
455ec4f1fd services(tts): always send received TextFrame downstream 2024-05-17 17:11:11 -07:00
Aleix Conchillo Flaqué
8dc81042c3 examples: use DailyTranscriptionSettings in translation-chatbot 2024-05-17 15:37:29 -07:00
Aleix Conchillo Flaqué
c77db79447 examples: pipelines readability and add LLM assistants after transport 2024-05-17 14:52:51 -07:00
Aleix Conchillo Flaqué
de65028061 vad: reduce default confidence back to 0.5 2024-05-17 14:39:40 -07:00
Aleix Conchillo Flaqué
d66a795413 examples: use SileroVADAnalyzer instead of SileroVAD 2024-05-17 14:18:55 -07:00
Aleix Conchillo Flaqué
34762bf604 transports: allows update allow_interruptinos when receiving StartFrame 2024-05-17 14:15:37 -07:00
Aleix Conchillo Flaqué
57121338b1 pipeline(task): cleanup processors only if we need to 2024-05-17 13:53:33 -07:00
Aleix Conchillo Flaqué
a5d246ec0c vad: use exponential smoothing to avoid sudden changes 2024-05-17 13:53:33 -07:00
Aleix Conchillo Flaqué
f2cefeeedc utils: move exp_smoothing to utils module 2024-05-17 13:52:18 -07:00
Aleix Conchillo Flaqué
537e72a05f vad: introduce VADParams so you can tweak things 2024-05-17 13:52:18 -07:00
Aleix Conchillo Flaqué
efa5a061d7 silero: simplify int16 -> float32 conversion 2024-05-17 13:51:06 -07:00
Aleix Conchillo Flaqué
0bef44c2ff introduce StartInterruptionFrame and StopInterruptionFrame 2024-05-17 13:51:06 -07:00
Aleix Conchillo Flaqué
f62fe059b1 fix issues with Ctrl-C tasks cancellation 2024-05-17 13:51:04 -07:00
Aleix Conchillo Flaqué
f432e2b17e transports: allow adding a vad analyzer to BaseInputTransport 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
8c877d7d8e examples: update 07-interruptible 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
dc9377fb92 add missing queue task_done() 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
7384b63b1d initial interruptions support 2024-05-17 13:50:45 -07:00
Aleix Conchillo Flaqué
ba6ecf541f update CHANGELOG.md for 0.0.16 2024-05-16 18:15:07 -07:00
Aleix Conchillo Flaqué
94e5709d58 Merge pull request #149 from pipecat-ai/transports-push-task
transport: create input transports push frame task
2024-05-17 09:14:35 +08:00
Aleix Conchillo Flaqué
add8d3cbaf transport: create input transports push frame task 2024-05-16 16:54:39 -07:00
Aleix Conchillo Flaqué
1a42188bce Merge pull request #146 from pipecat-ai/daily-dont-send-tracks-if-not-enabled
transports(daily): don't send camera/audio tracks if not enabled
2024-05-17 01:24:39 +08:00
Aleix Conchillo Flaqué
0da427e127 transports(daily): don't send camera/audio tracks if not enabled 2024-05-16 08:16:39 -07:00
Aleix Conchillo Flaqué
9447b32f3e transports(daily): on_app_message doesn't need to be event handler 2024-05-15 17:06:47 -07:00
Aleix Conchillo Flaqué
af10adb7fe some minor event loop updates 2024-05-15 17:00:43 -07:00
Aleix Conchillo Flaqué
129acf886f transports(daily): hot fix for receiving transport messages 2024-05-15 17:00:04 -07:00
Aleix Conchillo Flaqué
9af3e1efac update CHANGELOG.md for 0.0.14 2024-05-15 15:59:38 -07:00
Aleix Conchillo Flaqué
9e22a8b4ff transports(daily): add receiving transport messages 2024-05-15 15:59:08 -07:00
Aleix Conchillo Flaqué
28da747f19 transports(daily): fix on_participant_left event 2024-05-15 15:40:31 -07:00
Aleix Conchillo Flaqué
3d6783ddb0 transports: resize output image if it doesn't match camera 2024-05-15 15:36:20 -07:00
Aleix Conchillo Flaqué
349fc526d7 transports(daily): avoid locking if no participant has joined yet 2024-05-15 15:24:58 -07:00
Aleix Conchillo Flaqué
acf6dc0a30 transports: more start and stop fixes 2024-05-15 15:23:03 -07:00
Aleix Conchillo Flaqué
3563e66ff6 transports(daily): add on_participant_left event 2024-05-15 15:20:37 -07:00
Aleix Conchillo Flaqué
8965ff27ec examples: use DEBUG in 09-mirror.py 2024-05-14 19:25:31 -07:00
Aleix Conchillo Flaqué
86feb1e104 services: fix DailyTransport stop/cleanup ordering 2024-05-14 19:24:55 -07:00
Aleix Conchillo Flaqué
f6257a86d3 examples: re-enable audio in 09-mirror.py 2024-05-14 19:23:35 -07:00
Aleix Conchillo Flaqué
bd04ea8aca examples: simplify 09-mirror.py 2024-05-14 19:07:19 -07:00
Aleix Conchillo Flaqué
754c1c6775 services: fixed DailyTransport output camera and audio 2024-05-14 19:07:19 -07:00
Aleix Conchillo Flaqué
0b01eb5a11 services: pass **kwargs to TTService 2024-05-14 18:46:03 -07:00
Aleix Conchillo Flaqué
6247b9df39 services: fix STTService and WhisperSTTService 2024-05-14 18:45:40 -07:00
Aleix Conchillo Flaqué
bd5344c892 services: MoondreamService model_id argument is now model 2024-05-14 18:34:10 -07:00
Aleix Conchillo Flaqué
e4fe54cd7f vad: rename VADAnalyzer arguments 2024-05-14 18:33:17 -07:00
Aleix Conchillo Flaqué
97f9e9b042 examples: update simple-chatbot prompt 2024-05-14 15:30:31 -07:00
Aleix Conchillo Flaqué
3668eb1606 update CHANGELOG for 0.0.12 2024-05-14 14:52:08 -07:00
Aleix Conchillo Flaqué
e23addcc02 examples: update simple-chatbot with Spanish 2024-05-14 14:51:44 -07:00
Aleix Conchillo Flaqué
5147f4086e transports(daily): add DailyTranscriptionSettings to update settings easier 2024-05-14 14:49:30 -07:00
Aleix Conchillo Flaqué
fb3c2de83f Merge pull request #141 from pipecat-ai/add-changelog
add CHANGELOG.md
2024-05-15 04:47:45 +08:00
Aleix Conchillo Flaqué
107817317c add CHANGELOG.md 2024-05-14 13:45:01 -07:00
Aleix Conchillo Flaqué
663ff3417c examples: add missing requirements 2024-05-14 08:03:51 -07:00
Aleix Conchillo Flaqué
2b19d6bbac examples: remove commented out silero from storytelling 2024-05-14 00:57:21 -07:00
Aleix Conchillo Flaqué
7c41246e55 examples: fix storytelling example 2024-05-14 00:32:37 -07:00
Aleix Conchillo Flaqué
11aa9dc803 pipeline: allow stopping tasks with StopTaskFrame 2024-05-14 00:30:32 -07:00
Aleix Conchillo Flaqué
922cdefee5 services: run_* now return async generators 2024-05-14 00:30:07 -07:00
Aleix Conchillo Flaqué
e018d5b47a transports(daily): always allow capturing transcriptions 2024-05-14 00:29:02 -07:00
Aleix Conchillo Flaqué
20c679988c transports: allow base transports to be reused 2024-05-14 00:28:43 -07:00
Aleix Conchillo Flaqué
a344101cff README.md: s/Twitter/X/ 2024-05-13 18:24:06 -07:00
Aleix Conchillo Flaqué
2cefc40a77 README.md: use http urls for images 2024-05-13 18:20:57 -07:00
Aleix Conchillo Flaqué
68f0da26b6 examples: more translation-chatbot fixes 2024-05-13 17:57:11 -07:00
Aleix Conchillo Flaqué
9aea8e951c aggregators/sentence: ignore interim transcriptions 2024-05-13 17:56:19 -07:00
Aleix Conchillo Flaqué
12ff6d08fe examples: fix translation-chatbot 2024-05-13 16:22:11 -07:00
Aleix Conchillo Flaqué
1b21867a6f transports: add support for sending transport messages 2024-05-13 16:22:11 -07:00
Aleix Conchillo Flaqué
d28d0fa218 processors: add FrameProcessor.push_error 2024-05-13 16:12:35 -07:00
Aleix Conchillo Flaqué
01381f6dcd frames: add TransportMessageFrame 2024-05-13 16:12:30 -07:00
Aleix Conchillo Flaqué
c111fff0f7 services: update azure services 2024-05-13 16:12:26 -07:00
Aleix Conchillo Flaqué
50677e6085 Merge pull request #138 from pipecat-ai/moondream-chatbot-fixes
examples: fix moondream-chatbot
2024-05-14 06:29:13 +08:00
Aleix Conchillo Flaqué
22cd1ac5f2 examples: fix moondream-chatbot 2024-05-13 15:28:11 -07:00
Kwindla Hultman Kramer
fdfcfd1d5e Merge pull request #137 from rahulunair/intel_gpu
(feat): adding intel gpus support
2024-05-13 14:52:34 -07:00
Aleix Conchillo Flaqué
b6385be6c6 Merge pull request #136 from pipecat-ai/simple-chatbot-fixes
examples: fix simple-chatbot
2024-05-14 05:41:52 +08:00
rahulunair
6be88fa81b (feat): adding intel gpus support 2024-05-13 21:21:05 +00:00
Aleix Conchillo Flaqué
ed31c7924e examples: fix simple-chatbot 2024-05-13 13:19:11 -07:00
Jon Taylor
4898084645 Update LICENSE 2024-05-13 20:49:51 +01:00
chadbailey59
6be0751a52 Delete CNAME 2024-05-13 14:42:46 -05:00
Aleix Conchillo Flaqué
7ce1206ed4 Create CNAME 2024-05-13 12:05:08 -07:00
Jon Taylor
1b5130694a Update README.md 2024-05-13 19:36:39 +01:00
Jon Taylor
7c6199e93e Merge pull request #135 from pipecat-ai/jpt/devrel-edits-2
Jpt/devrel edits 2
2024-05-13 18:19:33 +01:00
Jon Taylor
3be742479d removed space 2024-05-13 18:17:00 +01:00
Aleix Conchillo Flaqué
d380b02a44 README: improve code reading 2024-05-13 10:12:19 -07:00
Aleix Conchillo Flaqué
5600fc49f1 README: fix code indentation 2024-05-13 10:08:09 -07:00
Jon Taylor
5f0d8b8d9f removed docs badge 2024-05-13 17:42:01 +01:00
Jon Taylor
8204e5c2d4 removed images 2024-05-13 17:41:03 +01:00
Jon Taylor
29b98c0326 removed images from examples readme 2024-05-13 17:40:07 +01:00
Jon Taylor
3502ef4745 Merge pull request #134 from pipecat-ai/jpt/devrel-edits
Added example apps to repo
2024-05-13 17:37:31 +01:00
Jon Taylor
0d28e84c59 addressed nitpicks 2024-05-13 17:37:01 +01:00
Jon Taylor
062fbf4ce3 fixed header for VAD 2024-05-13 17:20:50 +01:00
Jon Taylor
af8471b370 changed daily_url to daily_room 2024-05-13 17:20:10 +01:00
Jon Taylor
f756027333 updated text for simple example 2024-05-13 17:17:41 +01:00
Jon Taylor
65c4c0b21f fixed typo in readme 2024-05-13 17:14:17 +01:00
Jon Taylor
f1c02f8554 added examples back 2024-05-13 17:09:46 +01:00
Jon Taylor
27ba50cbbf updated README with sample code 2024-05-13 14:51:10 +01:00
Aleix Conchillo Flaqué
b254525d3c go back to using @dataclass since they can be inspected 2024-05-12 22:35:43 -07:00
Aleix Conchillo Flaqué
6c06fb8169 README: update pypi badge 2024-05-12 19:28:00 -07:00
Aleix Conchillo Flaqué
721cd11d62 Merge pull request #133 from pipecat-ai/aleix/readme
rebased jpt/readme branch
2024-05-13 10:26:45 +08:00
Aleix Conchillo Flaqué
bfbcb9d531 fix autopep8 linting 2024-05-12 19:25:17 -07:00
Aleix Conchillo Flaqué
724e78c5be renamed image.png to pipecat.png 2024-05-12 17:44:10 -07:00
Jon Taylor
d3c3d78855 added discord badge 2024-05-12 17:41:36 -07:00
Jon Taylor
8fa9fdcd5a Reworked readme to have more pipes and cats 2024-05-12 17:41:30 -07:00
Aleix Conchillo Flaqué
7856d20a38 Merge pull request #132 from pipecat-ai/pypi-repo-change
change pypi repo to pipecat-ai
2024-05-13 03:14:40 +08:00
Aleix Conchillo Flaqué
6d10027f2d change pypi repo to pipecat-ai 2024-05-12 12:08:43 -07:00
Aleix Conchillo Flaqué
bea31215dc Merge pull request #129 from daily-co/wip-proposal
pipecat proposal
2024-05-13 01:13:18 +08:00
Aleix Conchillo Flaqué
083480ca1e update macos-py3.10-requirements.txt 2024-05-12 10:10:35 -07:00
Aleix Conchillo Flaqué
65846330cf update linux-py3.10-requirements.txt 2024-05-12 10:09:04 -07:00
Aleix Conchillo Flaqué
29f48266f7 README: install dev-requirements.txt first 2024-05-12 10:07:54 -07:00
Aleix Conchillo Flaqué
bfd583211c examples: use LocalAudioTransport 2024-05-12 10:07:54 -07:00
Aleix Conchillo Flaqué
b026915d19 initial commit for new pipecat architecture 2024-05-12 10:07:25 -07:00
Aleix Conchillo Flaqué
4a0836dc8f Merge pull request #130 from daily-co/dependabot-05-06-24
dependabot: update packages 05-06-24
2024-05-07 08:14:38 +08:00
Aleix Conchillo Flaqué
2729c6bf5b dependabot: update packages 05-06-24 2024-05-06 15:33:33 -07:00
Aleix Conchillo Flaqué
712a889121 Merge pull request #128 from daily-co/pillow-security-fixes
pyproject: pillow security fixes
2024-04-23 01:51:49 +08:00
Aleix Conchillo Flaqué
2f341e4fb0 pyproject: pillow security fixes 2024-04-22 10:28:42 -07:00
Kwindla Hultman Kramer
24198ecf45 Merge pull request #126 from daily-co/jptaylor-patch-3
Update README.md
2024-04-12 23:10:30 -07:00
Jon Taylor
7e4fefe958 Update README.md 2024-04-12 22:45:30 -07:00
Jon Taylor
e9af39b85f Merge pull request #125 from daily-co/jptaylor-patch-2
Update README.md
2024-04-12 22:44:14 -07:00
Jon Taylor
38aa3cebb4 Update README.md 2024-04-12 22:42:11 -07:00
Jon Taylor
72724365a0 Merge pull request #124 from daily-co/jptaylor-patch-1
Update README.md
2024-04-12 22:40:29 -07:00
Jon Taylor
5368462e41 Update README.md 2024-04-12 22:28:40 -07:00
Jon Taylor
1b2b29dd18 Merge pull request #123 from daily-co/jpt/pypi-badge
added pypi badge
2024-04-12 07:33:26 -07:00
Kwindla Hultman Kramer
d2b2b6f619 Merge pull request #122 from daily-co/kwindla-patch-1
Update README.md
2024-04-11 21:34:37 -07:00
Jon Taylor
54bcb52129 added pypi badge 2024-04-11 21:34:27 -07:00
Kwindla Hultman Kramer
3dc7438bc8 Update README.md 2024-04-11 21:05:27 -07:00
Aleix Conchillo Flaqué
523bb9f2a2 Merge pull request #120 from daily-co/small-fireworks-fixes
minor fireworks updates
2024-04-12 06:35:57 +08:00
Aleix Conchillo Flaqué
0c2b3f8b65 minor fireworks updates 2024-04-11 15:34:23 -07:00
chadbailey59
0b7578056d added fireworks adapter (#118) 2024-04-11 17:15:02 -05:00
Aleix Conchillo Flaqué
f1b6b9f8e5 Merge pull request #119 from daily-co/use-new-fal-client-library
services: FalImageGenService now uses fal-client library
2024-04-12 05:59:58 +08:00
Aleix Conchillo Flaqué
cbc51babbe services: use asyncio to_thread in moondreamservice 2024-04-11 14:22:44 -07:00
Aleix Conchillo Flaqué
b0faafc184 update macos-py3.10 requirements 2024-04-11 14:16:19 -07:00
Aleix Conchillo Flaqué
103092dbb2 update linux-py3.10 requirements 2024-04-11 14:13:59 -07:00
Aleix Conchillo Flaqué
7b49c9ade3 services: FalImageGenService now uses fal-client library 2024-04-11 14:09:01 -07:00
Aleix Conchillo Flaqué
1e83a405c0 Merge pull request #117 from daily-co/llm-use-aggregator-pass-through-fix
aggregators: fix LLMUserResponseAggregator passs-through
2024-04-12 04:24:56 +08:00
Aleix Conchillo Flaqué
7336866a1c examples: rely on new daily default transcription settings 2024-04-11 11:22:58 -07:00
Aleix Conchillo Flaqué
0f23282e30 transport: enable interim results in daily transport 2024-04-11 11:22:05 -07:00
Aleix Conchillo Flaqué
eb3bf117b1 use InterimTranscriptionFrame in LLMUserResponseAggregator 2024-04-11 11:21:42 -07:00
Aleix Conchillo Flaqué
e288aa047b examples: use LLMUserResponseAggregator with VAD 2024-04-11 08:10:56 -07:00
Aleix Conchillo Flaqué
9a9df35d7b aggregators: allow TranscriptionFrame after an end frame threshold 2024-04-10 23:35:31 -07:00
Aleix Conchillo Flaqué
af8663e95d aggregators: fix LLMUserResponseAggregator passs-through 2024-04-10 21:46:16 -07:00
Aleix Conchillo Flaqué
db05a9b29b Merge pull request #116 from daily-co/moondream-use-cpu
moondream: allow passing use_cpu
2024-04-11 09:08:11 +08:00
Aleix Conchillo Flaqué
130e418800 moondream: allow passing use_cpu 2024-04-10 17:43:44 -07:00
Aleix Conchillo Flaqué
1a0a66e503 Merge pull request #114 from daily-co/jpt/fal-updates
Updated Fal.ai service to take a params model and allow for model string param
2024-04-11 00:47:33 +08:00
Aleix Conchillo Flaqué
e22babbae2 examples: update with new FalImageGenService parameters 2024-04-10 09:45:08 -07:00
Aleix Conchillo Flaqué
bfe2e0f36e services: don't use image_size in ImageGenService 2024-04-10 09:44:42 -07:00
Aleix Conchillo Flaqué
26d401e5de Merge pull request #115 from daily-co/add-vision-and-moondream-service
add vision and moondream service
2024-04-11 00:22:26 +08:00
Aleix Conchillo Flaqué
3c20f9153d added VisionImageFrame and VisionImageFrameAggregator 2024-04-10 09:19:34 -07:00
Aleix Conchillo Flaqué
2f9899af5a update macos-py3.10 requirements 2024-04-09 22:39:04 -07:00
Aleix Conchillo Flaqué
5ef5cf30f4 update linux-py3.10 requirements 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
34a6c5691b examples: added 12-describe-video 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
18bf09c704 services: added MoondreamService 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
84cfa7cc95 services: added VisionService 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
a5eba0106b transport: allow requesting a user frame 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
b117a185e3 frames: added UserImageRequestFrame 2024-04-09 22:14:54 -07:00
Aleix Conchillo Flaqué
0219230827 Merge pull request #113 from daily-co/aleix/only-subcribe-to-participant
only subcribe to participant
2024-04-10 10:47:29 +08:00
Aleix Conchillo Flaqué
9fcbb36997 examples: add 14a-local-render-remote-participant 2024-04-09 19:46:10 -07:00
Aleix Conchillo Flaqué
0bf15fd6eb daily: only subscribe to participant video source 2024-04-09 19:46:10 -07:00
Aleix Conchillo Flaqué
989252bb52 daily: always check camera/mic/speaker enabled 2024-04-09 19:46:10 -07:00
Jon Taylor
7b44a79a5b added params and model attribute to fal service 2024-04-09 17:43:27 -07:00
Aleix Conchillo Flaqué
4bd29b0080 Merge pull request #110 from daily-co/compatible-versions
pyproject: use compatible version
2024-04-10 00:41:22 +08:00
Aleix Conchillo Flaqué
ebb76fdae9 update macos-py3.10 requirements 2024-04-09 08:52:37 -07:00
Aleix Conchillo Flaqué
5d52def0fe update linux-py3.10 requirements 2024-04-09 08:49:41 -07:00
Aleix Conchillo Flaqué
9ada56d0b0 pyproject: use compatible version 2024-04-09 08:41:54 -07:00
Aleix Conchillo Flaqué
8d73cdb2ee Merge pull request #111 from daily-co/user-transcription-aggregator
pipeline: add UserTranscriptionAggregator
2024-04-09 23:34:52 +08:00
Aleix Conchillo Flaqué
4f04b10202 Merge pull request #112 from daily-co/user-image-frame
user image frames and other updates
2024-04-09 23:34:32 +08:00
Aleix Conchillo Flaqué
97b923e37e llm user and assistant aggregator renames 2024-04-09 08:31:48 -07:00
Aleix Conchillo Flaqué
57aabea0a3 examples: added 14-render-remote-participant 2024-04-09 08:01:14 -07:00
Aleix Conchillo Flaqué
319b8e7816 updated ImageFrame and added URLImageFrame and UserImageFrame 2024-04-08 23:23:33 -07:00
Aleix Conchillo Flaqué
96950ca6df daily: on_first_other_participant_joined now gets the participant 2024-04-08 23:23:33 -07:00
Aleix Conchillo Flaqué
d7b2e67c35 pipeline: add UserTranscriptionAggregator 2024-04-08 17:15:14 -07:00
Aleix Conchillo Flaqué
53930b47a5 github: just some rewording 2024-04-06 18:03:53 -07:00
Aleix Conchillo Flaqué
86c8ab02cc github: also publish stables releases to test pypi 2024-04-06 17:58:13 -07:00
Aleix Conchillo Flaqué
b678097f6d Merge pull request #109 from daily-co/only-use-fps
transport: only use fps to set maxFramerate
2024-04-07 07:02:44 +08:00
Aleix Conchillo Flaqué
eb455043c4 transport: use camera_bitrate and camera_framerate 2024-04-06 12:27:05 -07:00
Aleix Conchillo Flaqué
dd696be04c Merge pull request #108 from daily-co/add-camera-max-framerate
transport: add camera_max_framerate argument
2024-04-06 11:18:42 +08:00
Aleix Conchillo Flaqué
96b2337183 transport: add camera_max_framerate argument 2024-04-05 20:16:03 -07:00
Aleix Conchillo Flaqué
ea52e73f57 Merge pull request #107 from daily-co/increase-max-framerate
transport: increase daily maxFramerate to 30
2024-04-06 11:08:21 +08:00
Aleix Conchillo Flaqué
88404e4739 Merge pull request #106 from daily-co/updated-to-be-updated-examples
examples: updated to_be_updated examples
2024-04-06 11:06:30 +08:00
Aleix Conchillo Flaqué
0fd323714e transport: add camera_max_bitrate argument 2024-04-05 20:05:58 -07:00
Aleix Conchillo Flaqué
a362ca4d3d transport: increase daily maxFramerate to 30 2024-04-05 19:44:25 -07:00
Aleix Conchillo Flaqué
02b5c3dd5f update dot-env.template 2024-04-05 16:16:56 -07:00
Aleix Conchillo Flaqué
497a09cbc8 examples: updated to_be_updated examples 2024-04-05 16:01:23 -07:00
Aleix Conchillo Flaqué
172a14245d Merge pull request #104 from daily-co/threaded-transport-allow-sink-override
examples: fix whisper examples
2024-04-06 04:46:12 +08:00
Aleix Conchillo Flaqué
302246399b Merge pull request #105 from daily-co/local-tranport-read-audio-frames
transports: fix local transport read_audio_frames
2024-04-06 04:44:37 +08:00
Aleix Conchillo Flaqué
9590cc2fbc examples: fix whisper examples 2024-04-05 13:43:51 -07:00
Aleix Conchillo Flaqué
09e4044c72 transports: fix local transport read_audio_frames 2024-04-05 13:34:01 -07:00
Aleix Conchillo Flaqué
efdfb74dc3 github: increase fetch-depth to 100 for test publish 2024-04-05 08:32:29 -07:00
Aleix Conchillo Flaqué
158de6f20b github: fetch-tags and increase fetch-depth for test publish 2024-04-05 08:25:37 -07:00
Aleix Conchillo Flaqué
47f68b742d pyproject: user proper environment for test pypi 2024-04-05 08:02:45 -07:00
Aleix Conchillo Flaqué
2654ca1f62 pyproject: don't use local version for test pypi 2024-04-05 07:51:52 -07:00
Aleix Conchillo Flaqué
4263827ee8 README: use double-quotes with optional dependencies 2024-04-04 17:47:16 -07:00
Aleix Conchillo Flaqué
97fe529b0e github: update test publish workflow 2024-04-04 17:41:31 -07:00
Aleix Conchillo Flaqué
86025723e7 github: one more publish workflow fix 2024-04-04 17:36:20 -07:00
Aleix Conchillo Flaqué
6f4270a552 github: avoid caching in publish workflow 2024-04-04 17:32:50 -07:00
Aleix Conchillo Flaqué
31f050c02b github: more publish workflows fixes 2024-04-04 17:31:59 -07:00
Aleix Conchillo Flaqué
a0fe57721b github: fix publish workflows 2024-04-04 17:17:15 -07:00
Aleix Conchillo Flaqué
abf5e57319 Merge pull request #103 from daily-co/aleix/fix-github-cache-name
github: fix github cache name
2024-04-05 08:03:15 +08:00
Aleix Conchillo Flaqué
44de9007c3 Merge pull request #102 from daily-co/examples-cleanup
examples cleanup
2024-04-05 08:02:57 +08:00
Aleix Conchillo Flaqué
46d265514e pyproject: update github url 2024-04-04 15:52:28 -07:00
Aleix Conchillo Flaqué
9e64de8606 Merge pull request #101 from daily-co/cb/bot-exit
Allow transport exit to end a running pipeline
2024-04-05 06:51:06 +08:00
Aleix Conchillo Flaqué
1ea503c1e6 examples: fix 03a-image-local 2024-04-04 15:35:58 -07:00
Aleix Conchillo Flaqué
d0aeeccb68 github: fix github cache name 2024-04-04 14:36:04 -07:00
Aleix Conchillo Flaqué
d687c8cdeb transports: updated silero vad not found message 2024-04-04 14:05:40 -07:00
Aleix Conchillo Flaqué
951f20c788 transports: don't write/read if microphone/speaker not enabled 2024-04-04 14:05:15 -07:00
Aleix Conchillo Flaqué
982c0a0749 examples: move non-working examples to to_be_updated 2024-04-04 14:04:53 -07:00
Chad Bailey
27cef7cd70 add endframe to transport receive queue 2024-04-04 20:45:23 +00:00
chadbailey59
03ea208361 VAD fallback (#97)
* Silero VAD preferred with webrtc fallback

* webrtc VAD neds a different sample size

* fixup

* fixup
2024-04-04 13:31:07 -05:00
Aleix Conchillo Flaqué
385b51ac83 Merge pull request #98 from daily-co/use-pip-features
use pip optional dependencies
2024-04-05 01:00:21 +08:00
Aleix Conchillo Flaqué
a37e4fabad github: only run publish-test on main 2024-04-04 09:58:42 -07:00
Aleix Conchillo Flaqué
8bc3c03a69 add a requirements.txt per platform 2024-04-03 21:39:10 -07:00
Aleix Conchillo Flaqué
1fc800754b github: no need to install dependencies when building/deploying 2024-04-03 16:26:58 -07:00
Aleix Conchillo Flaqué
18c4bccc13 github: rename deploy to publish 2024-04-03 16:22:23 -07:00
Aleix Conchillo Flaqué
d57d473c13 pyproject.toml: use setuptools_scm to auto manage versions 2024-04-03 16:13:07 -07:00
Aleix Conchillo Flaqué
48bb3c6955 github: add publish to pypi workflows 2024-04-03 15:57:59 -07:00
Aleix Conchillo Flaqué
e3ee3f9cc6 github(lint): use requirements-dev.txt 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
3528f5d735 use conditional imports and show help errors if modules not found 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
23735cb3a3 dot-env.example: cleanup and add missing environment variables 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
6918dc69f0 github: separate build and test workflows 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
128d350abc pyproject.toml: use project optional dependencies and pin them 2024-04-03 15:27:31 -07:00
chadbailey59
2f59e38a7a Modularize tricky dependencies (#95)
* removed pyaudio from threaded transport

* modularized torch and torchaudio

* modularized local transport

* Working Dockerfile as well

* docker updates for fly.io
2024-04-03 10:48:11 -05:00
chadbailey59
c21014860f Added app messages to translator example (#94) 2024-04-01 14:25:20 -05:00
chadbailey59
d4e3e1710f Server updates (#90)
* updated server readme

* fixup

* Refactored server

* fixup
2024-03-28 15:03:08 -05:00
Moishe Lettvin
e7f9296b5a Merge pull request #93 from daily-co/frame-name-cleanup
Cleanup the last few badly-named Frame types
2024-03-28 14:25:59 -04:00
Moishe Lettvin
27322108b7 Cleanup the last few badly-named Frame types 2024-03-28 12:36:24 -04:00
Moishe Lettvin
22bbedec93 Merge pull request #92 from daily-co/remove-bad-print
Remove mistakenly-added print statement
2024-03-28 11:54:49 -04:00
Moishe Lettvin
ed91bc0f66 Remove mistakenly-added print statement 2024-03-28 11:47:11 -04:00
Moishe Lettvin
565acfa9c9 Merge pull request #86 from daily-co/transport-refactor
Starting refactor of transports into their own directory
2024-03-28 11:17:32 -04:00
Moishe Lettvin
a2295b6b1d Merge pull request #91 from daily-co/pipeline-logging
Add logging for pipeline
2024-03-28 11:16:26 -04:00
Moishe Lettvin
fef1366c84 Merge pull request #88 from daily-co/frame-progress-diagram
Frame progress diagram
2024-03-28 11:13:21 -04:00
Moishe Lettvin
5c0ba1b6f0 Fix off by one errors, add tests and comment 2024-03-28 08:34:34 -04:00
Moishe Lettvin
05c77bce25 Add logging for pipeline 2024-03-27 18:48:30 -04:00
Moishe Lettvin
4ce140bf84 Move some things to AbstractTransport class 2024-03-27 12:59:08 -04:00
James Hush
a3293c6d7a fix: force overriding environment variables from .env files (#89) 2024-03-27 23:38:55 +08:00
Moishe Lettvin
ce04d4a54a Add text to md 2024-03-27 08:10:14 -04:00
Moishe Lettvin
758ed2d895 Frame progress images 2024-03-26 20:40:10 -04:00
Moishe Lettvin
85cd795b2b fix image 2024-03-26 20:36:18 -04:00
Moishe Lettvin
6c36d5f686 Testing 2024-03-26 20:33:09 -04:00
Moishe Lettvin
b2425d6dcd Testing 2024-03-26 20:32:32 -04:00
Moishe Lettvin
e8a6560ac1 Merge forgotten files 2024-03-26 16:24:47 -04:00
Moishe Lettvin
78c80d8941 some more renames 2024-03-26 15:57:19 -04:00
Moishe Lettvin
2fc5de6afe Starting refactor of transports into their own directory 2024-03-26 08:35:04 -04:00
Moishe Lettvin
24fb7c5a05 Merge pull request #81 from daily-co/websocket-transport
Websocket transport
2024-03-25 14:40:34 -04:00
Moishe Lettvin
5761e23af1 remove unnecessary checks 2024-03-25 14:00:08 -04:00
Moishe Lettvin
960c659d5a Remove duplicated constant 2024-03-25 13:59:03 -04:00
Moishe Lettvin
2bda4c3307 Websocket transport 2024-03-25 13:54:34 -04:00
Aleix Conchillo Flaqué
2c5628a621 Merge pull request #85 from daily-co/minor-readme-update
README: minor fixes
2024-03-22 04:33:42 +08:00
Aleix Conchillo Flaqué
9b4cfd9a6c README: minor fixes 2024-03-21 13:16:50 -07:00
Aleix Conchillo Flaqué
8f9aeb0751 Merge pull request #82 from daily-co/remove-unused-imports
remove unused imports
2024-03-22 03:02:07 +08:00
Aleix Conchillo Flaqué
e8a9d43287 Merge pull request #84 from daily-co/use-openai-api-key
use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
2024-03-21 21:57:40 +08:00
Aleix Conchillo Flaqué
cf5d516d51 use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
Fixes #77
2024-03-20 15:26:32 -07:00
Aleix Conchillo Flaqué
0666dd1194 remove unused imports 2024-03-20 14:52:19 -07:00
Aleix Conchillo Flaqué
42e25ccd13 create missing __init__.py 2024-03-20 14:41:39 -07:00
Aleix Conchillo Flaqué
520cee273f Merge pull request #80 from daily-co/move-src-daily-tests-to-tests
move src/dailyai/tests to tests
2024-03-21 00:27:07 +08:00
Aleix Conchillo Flaqué
a189e2618f github: source venv in every step 2024-03-19 15:31:03 -07:00
Aleix Conchillo Flaqué
ae2dcf88ed github: use virtual environment 2024-03-19 15:23:09 -07:00
Aleix Conchillo Flaqué
5cdb82ad3c README: one more autopep8 emacs update 2024-03-19 15:18:29 -07:00
Aleix Conchillo Flaqué
593513c84a github: add venv caching 2024-03-19 15:17:48 -07:00
Aleix Conchillo Flaqué
16257f8ec0 move src/dailyai/tests to tests 2024-03-19 14:59:48 -07:00
Aleix Conchillo Flaqué
5fc21a7508 Merge pull request #73 from daily-co/github-unittests-workflow
github: add workflow for unit tests
2024-03-20 03:01:03 +08:00
Aleix Conchillo Flaqué
cc05429135 github: add workflow for unit tests 2024-03-19 11:51:14 -07:00
Aleix Conchillo Flaqué
85e66dddbe Merge pull request #79 from daily-co/readme-emacs-autopep8-update
README: emacs autopep8 update
2024-03-20 02:17:44 +08:00
Aleix Conchillo Flaqué
03ea559839 README: emacs autopep8 update 2024-03-19 10:28:11 -07:00
Aleix Conchillo Flaqué
b6c9859e34 Merge pull request #78 from daily-co/readme-editor-setup
README: add editor setup
2024-03-20 01:10:57 +08:00
Aleix Conchillo Flaqué
bc47c909a3 README: add editor setup 2024-03-19 10:10:14 -07:00
Aleix Conchillo Flaqué
428659730d Merge pull request #70 from daily-co/move-src-example-to-examples
move src/examples to examples
2024-03-20 01:09:13 +08:00
Aleix Conchillo Flaqué
a573277a10 examples: copy runner.py and auth.py where needed 2024-03-18 17:10:23 -07:00
Aleix Conchillo Flaqué
69c2637a25 README.md: update examples 2024-03-18 14:53:53 -07:00
Aleix Conchillo Flaqué
90c34d278f move src/examples to examples 2024-03-18 11:51:38 -07:00
Aleix Conchillo Flaqué
2f4e31d1b2 Merge pull request #69 from daily-co/add-github-linting-workflow
github: add linting workflow
2024-03-19 02:46:50 +08:00
Aleix Conchillo Flaqué
9385270775 autopep8 formatting 2024-03-18 11:28:32 -07:00
Aleix Conchillo Flaqué
2914e43350 github: add linting workflow 2024-03-18 11:28:06 -07:00
chadbailey59
78638d2dba Live translation (#61)
* added translator

* fixup
2024-03-18 13:26:05 -05:00
Aleix Conchillo Flaqué
141a5bb548 Merge pull request #68 from daily-co/log-transcription-errors
daily: log transcription errors
2024-03-19 01:53:40 +08:00
Aleix Conchillo Flaqué
3957813202 Merge pull request #67 from daily-co/add-dot-env-template
add dot-env.template
2024-03-19 01:49:21 +08:00
Aleix Conchillo Flaqué
549862ef99 daily: log transcription errors 2024-03-18 10:47:20 -07:00
Aleix Conchillo Flaqué
1000ca5b55 add dot-env.template 2024-03-18 10:43:57 -07:00
Moishe Lettvin
91dbfef4c3 Merge pull request #64 from daily-co/docs
Some docs
2024-03-18 13:38:32 -04:00
Moishe Lettvin
3b61d0b41a fix typos 2024-03-18 13:38:00 -04:00
Moishe Lettvin
bf3ae091b9 Merge pull request #62 from daily-co/anthropic-support
Anthropic LLM service
2024-03-18 13:36:39 -04:00
Aleix Conchillo Flaqué
34ac796607 Merge pull request #66 from daily-co/daily-transport-release-client
services: release daily client after leave
2024-03-19 01:36:22 +08:00
Aleix Conchillo Flaqué
e0551e9d85 services: release daily client after leave 2024-03-18 10:32:46 -07:00
Moishe Lettvin
b1ab6f91b9 Merge pull request #65 from daily-co/app-messages
Support for app messages
2024-03-18 11:37:10 -04:00
Moishe Lettvin
58726dc20d clean up imports 2024-03-18 10:14:51 -04:00
Moishe Lettvin
8e61fe8e36 Support for app messages 2024-03-18 10:08:41 -04:00
Moishe Lettvin
99b836c227 added docstrings to frames. 2024-03-18 09:08:12 -04:00
Moishe Lettvin
1c27f77f1a drafty architecture doc 2024-03-18 08:39:50 -04:00
Moishe Lettvin
c91fa39a99 Remove testing code 2024-03-15 19:42:46 -04:00
Moishe Lettvin
eacaea7db4 Anthropic LLM service 2024-03-15 19:40:37 -04:00
Moishe Lettvin
c6dfcb6f7a Merge pull request #60 from daily-co/remove-ai-service-methods
Remove run_to_queue and run from AIService class
2024-03-15 15:28:28 -04:00
Moishe Lettvin
18bf26de14 Update apps 2024-03-15 13:39:33 -04:00
Moishe Lettvin
b8b35db89c Remove run_to_queue and run from AIService class 2024-03-15 11:04:22 -04:00
Moishe Lettvin
358166f347 Merge pull request #59 from daily-co/remove-requirements
Remove unused requirements file
2024-03-13 16:23:42 -04:00
Moishe Lettvin
c006c123b2 Remove unused requirements file 2024-03-13 16:19:03 -04:00
chadbailey59
cf302fb765 Storybot and Chatbot examples (#58)
* storybot

* storybot

* added pipeline.queue_frames

* fixup
2024-03-13 15:12:59 -05:00
Moishe Lettvin
e33820fe36 Merge pull request #56 from daily-co/fal-redux
Use other model in FAL
2024-03-12 15:14:57 -04:00
Moishe Lettvin
b84b3d59f3 Use other model in FAL 2024-03-12 14:47:00 -04:00
Moishe Lettvin
7b5b88b99b Merge pull request #55 from daily-co/fix-fal
set FAL param correctly
2024-03-12 14:12:16 -04:00
Moishe Lettvin
e87196cce7 set FAL param correctly 2024-03-12 14:03:43 -04:00
chadbailey59
bbfc9e703b intake cleanup (#54) 2024-03-12 13:01:39 -05:00
Moishe Lettvin
c21a63d48b Merge pull request #49 from daily-co/openai-base-llm
Base OpenAI LLM service
2024-03-12 12:58:31 -04:00
Moishe Lettvin
f546bb32da Make 08- work again 2024-03-12 10:34:52 -04:00
Moishe Lettvin
d9378e23ba Base OpenAI LLM service 2024-03-11 16:52:41 -04:00
Moishe Lettvin
c75a3fb0d0 Merge pull request #53 from daily-co/fix_other_joined_event
Don't do time-consuming processing in `on_other_joined_event`
2024-03-11 13:27:13 -04:00
Moishe Lettvin
f8ae264957 remove unnecessary print 2024-03-11 13:20:28 -04:00
Moishe Lettvin
977c12d530 undo fal change 2024-03-11 13:19:47 -04:00
Moishe Lettvin
61c55d2f47 Fix up other examples 2024-03-11 13:17:31 -04:00
Moishe Lettvin
fd2fa23e9c Fix example 2 2024-03-11 13:00:29 -04:00
Moishe Lettvin
de026ccc8a Merge pull request #50 from daily-co/khk/launch-samples
Khk/launch samples
2024-03-11 12:50:38 -04:00
Moishe Lettvin
c5bb0e14ab Merge pull request #51 from daily-co/khk/readme
updated README
2024-03-11 12:50:22 -04:00
chadbailey59
a4f3c51184 the smallest commit in history 2024-03-11 09:47:00 -05:00
Moishe Lettvin
7786e685cc Merge pull request #52 from daily-co/pypi-updates
updates to pyproject.toml
2024-03-11 10:34:35 -04:00
Moishe Lettvin
33793ca9f8 update description 2024-03-11 07:31:39 -04:00
Moishe Lettvin
d26aede667 updates to pyproject.toml 2024-03-11 07:25:20 -04:00
Moishe Lettvin
ad993056d8 rename to dailyai 2024-03-11 07:16:20 -04:00
Kwindla Hultman Kramer
5b1f26aacb updated README 2024-03-10 22:06:23 -07:00
Kwindla Hultman Kramer
4e16e514dd attempting to change tts to deepgram in example 04 2024-03-10 19:43:06 -07:00
Kwindla Hultman Kramer
959ffa9d36 small streamlining of example 03 2024-03-10 19:42:19 -07:00
Kwindla Hultman Kramer
4396b1018a small streamlining of example 02 2024-03-10 19:41:32 -07:00
Kwindla Hultman Kramer
37e904ce68 changed fal to a maybe slightly faster model 2024-03-10 19:40:51 -07:00
Kwindla Hultman Kramer
ef39d842a5 custom processor in example 05 2024-03-10 19:18:37 -07:00
Kwindla Hultman Kramer
72f631a066 working on foundational examples 2024-03-10 17:21:46 -07:00
chadbailey59
5d46302b9e changed default services (#47) 2024-03-08 15:36:30 -06:00
chadbailey59
8241dc0bed cleaned up example logging (#46) 2024-03-08 15:25:17 -06:00
Moishe Lettvin
95a1efbe75 Merge pull request #45 from daily-co/exception_handling_callbacks
Wait for the callback's result, so exceptions get raised
2024-03-08 15:04:15 -05:00
Moishe Lettvin
e59df8476e Wait for the callback's result, so exceptions get raised 2024-03-08 15:02:15 -05:00
chadbailey59
824df8ca7c moved patient intake and example runner (#44) 2024-03-08 12:07:51 -06:00
chadbailey59
0db8a51b27 cleaned up function calling frames (#43) 2024-03-08 10:13:28 -06:00
chadbailey59
ce9c6ede66 function allowlist (#42) 2024-03-08 08:49:09 -06:00
Moishe Lettvin
192b46bbab Merge pull request #41 from daily-co/optimize-pipeline
Optimize pipeline processing
2024-03-07 21:01:03 -05:00
Moishe Lettvin
196279e342 Add endframe to sample 4 2024-03-07 19:24:27 -05:00
Moishe Lettvin
edd93bc4cb remove errant print statement 2024-03-07 19:05:03 -05:00
Moishe Lettvin
d0076dd4ee Optimize pipeline processing so we don't wait for the completion of one generator to move onto the next. 2024-03-07 18:59:47 -05:00
332 changed files with 17557 additions and 4740 deletions

30
.dockerignore Normal file
View File

@@ -0,0 +1,30 @@
# flyctl launch added from .gitignore
**/.vscode
**/env
**/__pycache__
**/*~
**/venv
#*#
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
**/.DS_Store
**/.env
fly.toml

44
.github/workflows/build.yaml vendored Normal file
View File

@@ -0,0 +1,44 @@
name: build
on:
workflow_dispatch:
push:
branches:
- main
pull_request:
branches:
- "**"
paths-ignore:
- "docs/**"
concurrency:
group: build-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build:
name: "Build and Install"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: Build project
run: |
source .venv/bin/activate
python -m build
- name: Install project and other Python dependencies
run: |
source .venv/bin/activate
pip install --editable .

44
.github/workflows/lint.yaml vendored Normal file
View File

@@ -0,0 +1,44 @@
name: lint
on:
workflow_dispatch:
push:
branches:
- main
pull_request:
branches:
- "**"
paths-ignore:
- "docs/**"
concurrency:
group: build-lint-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
autopep8:
name: "Formatting lints"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install development Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: autopep8
id: autopep8
run: |
source .venv/bin/activate
autopep8 --max-line-length 100 --exit-code -r -d --exclude "*_pb2.py" -a -a src/
- name: Fail if autopep8 requires changes
if: steps.autopep8.outputs.exit-code == 2
run: exit 1

84
.github/workflows/publish.yaml vendored Normal file
View File

@@ -0,0 +1,84 @@
name: publish
on:
workflow_dispatch:
inputs:
gitref:
type: string
description: "what git ref to build"
required: true
jobs:
build:
name: "Build and upload wheels"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.gitref }}
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: Build project
run: |
source .venv/bin/activate
python -m build
- name: Upload wheels
uses: actions/upload-artifact@v4
with:
name: wheels
path: ./dist
publish-to-pypi:
name: "Publish to PyPI"
runs-on: ubuntu-latest
needs: [ build ]
environment:
name: pypi
url: https://pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:
- name: Download wheels
uses: actions/download-artifact@v4
with:
name: wheels
path: ./dist
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true
print-hash: true
publish-to-test-pypi:
name: "Publish to Test PyPI"
runs-on: ubuntu-latest
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:
- name: Download wheels
uses: actions/download-artifact@v4
with:
name: wheels
path: ./dist
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true
print-hash: true
repository-url: https://test.pypi.org/legacy/

63
.github/workflows/publish_test.yaml vendored Normal file
View File

@@ -0,0 +1,63 @@
name: publish-test
on:
workflow_dispatch:
push:
branches:
- main
jobs:
build:
name: "Build and upload wheels"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.gitref }}
fetch-tags: true
fetch-depth: 100
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: Build project
run: |
source .venv/bin/activate
python -m build
- name: Upload wheels
uses: actions/upload-artifact@v4
with:
name: wheels
path: ./dist
publish-to-test-pypi:
name: "Publish to Test PyPI"
runs-on: ubuntu-latest
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:
- name: Download wheels
uses: actions/download-artifact@v4
with:
name: wheels
path: ./dist
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true
print-hash: true
repository-url: https://test.pypi.org/legacy/

49
.github/workflows/tests.yaml vendored Normal file
View File

@@ -0,0 +1,49 @@
name: test
on:
workflow_dispatch:
push:
branches:
- main
pull_request:
branches:
- "**"
paths-ignore:
- "docs/**"
concurrency:
group: build-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
test:
name: "Unit and Integration Tests"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Cache virtual environment
uses: actions/cache@v3
with:
# We are hashing requirements-dev.txt and requirements-extra.txt which
# contain all dependencies needed to run the tests and examples.
key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('linux-py3.10-requirements.txt') }}-${{ hashFiles('dev-requirements.txt') }}
path: .venv
- name: Install system packages
run: sudo apt-get install -y portaudio19-dev
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r linux-py3.10-requirements.txt -r dev-requirements.txt
- name: Test with pytest
run: |
source .venv/bin/activate
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests

2
.gitignore vendored
View File

@@ -3,6 +3,7 @@ env/
__pycache__/
*~
venv
.venv
#*#
# Distribution / packaging
@@ -26,3 +27,4 @@ share/python-wheels/
MANIFEST
.DS_Store
.env
fly.toml

313
CHANGELOG.md Normal file
View File

@@ -0,0 +1,313 @@
# Changelog
All notable changes to **pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.20] - 2024-05-22
### Added
- In order to improve interruptions we now compute a loudness level using
[pyloudnorm](https://github.com/csteinmetz1/pyloudnorm). The audio coming
WebRTC transports (e.g. Daily) have an Automatic Gain Control (AGC) algorithm
applied to the signal, however we don't do that on our local PyAudio
signals. This means that currently incoming audio from PyAudio is kind of
broken. We will fix it in future releases.
### Fixed
- Fixed an issue where `StartInterruptionFrame` would cause
`LLMUserResponseAggregator` to push the accumulated text causing the LLM
respond in the wrong task. The `StartInterruptionFrame` should not trigger any
new LLM response because that would be spoken in a different task.
- Fixed an issue where tasks and threads could be paused because the executor
didn't have more tasks available. This was causing issues when cancelling and
recreating tasks during interruptions.
## [0.0.19] - 2024-05-20
### Changed
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal
messages are now exposed through the `messages` property.
### Fixed
- Fixed an issue where `LLMAssistantResponseAggregator` was not accumulating the
full response but short sentences instead. If there's an interruption we only
accumulate what the bot has spoken until now in a long response as well.
## [0.0.18] - 2024-05-20
### Fixed
- Fixed an issue in `DailyOuputTransport` where transport messages were not
being sent.
## [0.0.17] - 2024-05-19
### Added
- Added `google.generativeai` model support, including vision. This new `google`
service defaults to using `gemini-1.5-flash-latest`. Example in
`examples/foundational/12a-describe-video-gemini-flash.py`.
- Added vision support to `openai` service. Example in
`examples/foundational/12a-describe-video-gemini-flash.py`.
- Added initial interruptions support. The assistant contexts (or aggregators)
should now be placed after the output transport. This way, only the completed
spoken context is added to the assistant context.
- Added `VADParams` so you can control voice confidence level and others.
- `VADAnalyzer` now uses an exponential smoothed volume to improve speech
detection. This is useful when voice confidence is high (because there's
someone talking near you) but volume is low.
### Fixed
- Fixed an issue where TTSService was not pushing TextFrames downstream.
- Fixed issues with Ctrl-C program termination.
- Fixed an issue that was causing `StopTaskFrame` to actually not exit the
`PipelineTask`.
## [0.0.16] - 2024-05-16
### Fixed
- `DailyTransport`: don't publish camera and audio tracks if not enabled.
- Fixed an issue in `BaseInputTransport` that was causing frames pushed
downstream not pushed in the right order.
## [0.0.15] - 2024-05-15
### Fixed
- Quick hot fix for receiving `DailyTransportMessage`.
## [0.0.14] - 2024-05-15
### Added
- Added `DailyTransport` event `on_participant_left`.
- Added support for receiving `DailyTransportMessage`.
### Fixed
- Images are now resized to the size of the output camera. This was causing
images not being displayed.
- Fixed an issue in `DailyTransport` that would not allow the input processor to
shutdown if no participant ever joined the room.
- Fixed base transports start and stop. In some situation processors would halt
or not shutdown properly.
## [0.0.13] - 2024-05-14
### Changed
- `MoondreamService` argument `model_id` is now `model`.
- `VADAnalyzer` arguments have been renamed for more clarity.
### Fixed
- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that
could cause some threads to not start properly.
- Fixed `STTService`. Add `max_silence_secs` and `max_buffer_secs` to handle
better what's being passed to the STT service. Also add exponential smoothing
to the RMS.
- Fixed `WhisperSTTService`. Add `no_speech_prob` to avoid garbage output text.
## [0.0.12] - 2024-05-14
### Added
- Added `DailyTranscriptionSettings` to be able to specify transcription
settings much easier (e.g. language).
### Other
- Updated `simple-chatbot` with Spanish.
- Add missing dependencies in some of the examples.
## [0.0.11] - 2024-05-13
### Added
- Allow stopping pipeline tasks with new `StopTaskFrame`.
### Changed
- TTS, STT and image generation service now use `AsyncGenerator`.
### Fixed
- `DailyTransport`: allow registering for participant transcriptions even if
input transport is not initialized yet.
### Other
- Updated `storytelling-chatbot`.
## [0.0.10] - 2024-05-13
### Added
- Added Intel GPU support to `MoondreamService`.
- Added support for sending transport messages (e.g. to communicate with an app
at the other end of the transport).
- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.
### Fixed
- Fixed Azure services (TTS and image generation).
### Other
- Updated `simple-chatbot`, `moondream-chatbot` and `translation-chatbot`
examples.
## [0.0.9] - 2024-05-12
### Changed
Many things have changed in this version. Many of the main ideas such as frames,
processors, services and transports are still there but some things have changed
a bit.
- `Frame`s describe the basic units for processing. For example, text, image or
audio frames. Or control frames to indicate a user has started or stopped
speaking.
- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an
`ImageRawFrame`) and push new frames downstream or upstream to their linked
peers.
- `FrameProcessor`s can be linked together. The easiest wait is to use the
`Pipeline` which is a container for processors. Linking processors allow
frames to travel upstream or downstream easily.
- `Transport`s are a way to send or receive frames. There can be local
transports (e.g. local audio or native apps), network transports
(e.g. websocket) or service transports (e.g. https://daily.co).
- `Pipeline`s are just a processor container for other processors.
- A `PipelineTask` know how to run a pipeline.
- A `PipelineRunner` can run one or more tasks and it is also used, for example,
to capture Ctrl-C from the user.
## [0.0.8] - 2024-04-11
### Added
- Added `FireworksLLMService`.
- Added `InterimTranscriptionFrame` and enable interim results in
`DailyTransport` transcriptions.
### Changed
- `FalImageGenService` now uses new `fal_client` package.
### Fixed
- `FalImageGenService`: use `asyncio.to_thread` to not block main loop when
generating images.
- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed
and received after `UserStoppedSpeakingFrame`).
## [0.0.7] - 2024-04-10
### Added
- Add `use_cpu` argument to `MoondreamService`.
## [0.0.6] - 2024-04-10
### Added
- Added `FalImageGenService.InputParams`.
- Added `URLImageFrame` and `UserImageFrame`.
- Added `UserImageRequestFrame` and allow requesting an image from a participant.
- Added base `VisionService` and `MoondreamService`
### Changed
- Don't pass `image_size` to `ImageGenService`, images should have their own size.
- `ImageFrame` now receives a tuple`(width,height)` to specify the size.
- `on_first_other_participant_joined` now gets a participant argument.
### Fixed
- Check if camera, speaker and microphone are enabled before writing to them.
### Performance
- `DailyTransport` only subscribe to desired participant video track.
## [0.0.5] - 2024-04-06
### Changed
- Use `camera_bitrate` and `camera_framerate`.
- Increase `camera_framerate` to 30 by default.
### Fixed
- Fixed `LocalTransport.read_audio_frames`.
## [0.0.4] - 2024-04-04
### Added
- Added project optional dependencies `[silero,openai,...]`.
### Changed
- Moved thransports to its own directory.
- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.
### Fixed
- Don't write to microphone/speaker if not enabled.
### Other
- Added live translation example.
- Fix foundational examples.
## [0.0.3] - 2024-03-13
### Other
- Added `storybot` and `chatbot` examples.
## [0.0.2] - 2024-03-12
Initial public release.

62
CHANGELOG.md.template Normal file
View File

@@ -0,0 +1,62 @@
# Changelog
All notable changes to the **<project name>** SDK will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Please make sure to add your changes to the appropriate categories:
## [Unreleased]
### Added
<!-- for new functionality -->
- n/a
### Changed
<!-- for changed functionality -->
- n/a
### Deprecated
<!-- for soon-to-be removed functionality -->
- n/a
### Removed
<!-- for removed functionality -->
- n/a
### Fixed
<!-- for fixed bugs -->
- n/a
### Performance
<!-- for performance-relevant changes -->
- n/a
### Security
<!-- for security-relevant changes -->
- n/a
### Other
<!-- for everything else -->
- n/a
## [0.1.0] - YYYY-MM-DD
Initial release.

View File

@@ -7,13 +7,14 @@ COPY *.py /app
COPY pyproject.toml /app
COPY src/ /app/src/
COPY examples/ /app/examples/
WORKDIR /app
RUN ls --recursive /app/
RUN pip3 install --upgrade -r requirements.txt
RUN python -m build .
RUN pip3 install .
RUN pip3 install gunicorn
# If running on Ubuntu, Azure TTS requires some extra config
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
@@ -36,4 +37,4 @@ WORKDIR /app
EXPOSE 8000
# run
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--chdir", "examples/server", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]

View File

@@ -1,6 +1,6 @@
BSD 2-Clause License
Copyright (c) 2024, Daily
Copyright (c) 2024, Kwindla Hultman Kramer
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

300
README.md
View File

@@ -1,159 +1,221 @@
# Daily AI SDK
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div>
Build conversational, multi-modal AI apps with real-time voice and video, like this:
# Pipecat
_Demo Video to come_
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021
)](https://discord.gg/pipecat)
With built-in support for many of the best AI platforms (or [add your own](/docs)):
`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, [story-telling toys for kids](https://storytelling-chatbot.fly.dev/), customer support bots, [intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0), and snarky social companions.
- Azure - DALL-E, ChatGPT, and Azure AI Text-to-Speech
- Deepgram - Speech-to-text, and Aura text-to-speech
- Eleven Labs text-to-speech
- Fal.ai image generation
- OpenAI DALL-E and ChatGPT
- Whisper local speech-to-text
Take a look at some example apps:
## Step 1: Get Started
<p float="left">
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="280" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="280" /></a>
</p>
## Build/Install
## Getting started with voice agents
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when youre ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
```shell
# install the module
pip install pipecat-ai
# set up an .env file with API keys
cp dot-env.template .env
```
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:
```shell
pip install "pipecat-ai[option,...]"
```
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- **AI services**: `anthropic`, `azure`, `deepgram`, `google`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `local`, `websocket`, `daily`
## Code examples
- [foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [example apps](https://github.com/pipecat-ai/pipecat/tree/main/examples/) — complete applications that you can use as starting points for development
## A simple voice agent running locally
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use [Daily](https://daily.co) for real-time media transport, and [ElevenLabs](https://elevenlabs.io/) for text-to-speech.
```python
#app.py
import asyncio
import aiohttp
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
async def main():
async with aiohttp.ClientSession() as session:
# Use Daily as a real-time media transport (WebRTC)
transport = DailyTransport(
room_url=...,
token=...,
"Bot Name",
DailyParams(audio_out_enabled=True))
# Use Eleven Labs for Text-to-Speech
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=...,
voice_id=...,
)
# Simple pipeline that will process text to speech and output the result
pipeline = Pipeline([tts, transport.output()])
# Create Pipecat processor that can run one or more pipelines tasks
runner = PipelineRunner()
# Assign the task callable to run the pipeline
task = PipelineTask(pipeline)
# Register an event handler to play audio when a
# participant joins the transport WebRTC session
@transport.event_handler("on_participant_joined")
async def on_new_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
# Queue a TextFrame that will get spoken by the TTS service (Eleven Labs)
await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])
# Run the pipeline task
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())
```
Run it with:
```shell
python app.py
```
Daily provides a prebuilt WebRTC user interface. Whilst the app is running, you can visit at `https://<yourdomain>.daily.co/<room_url>` and listen to the bot say hello!
## WebRTC for production use
WebSockets are fine for server-to-server communication or for initial development. But for production use, youll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post.](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/#webrtc))
One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard.
## What is VAD?
Voice Activity Detection &mdash; very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk, and want Pipecat to detect when the user has finished talking, VAD is an essential component for a natural feeling conversation.
Pipecast makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
```shell
pip install pipecat-ai[silero]
```
The first time your run your bot with Silero, startup may take a while whilst it downloads and caches the model in the background. You can check the progress of this in the console.
## Hacking on the framework itself
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
```
python3 -m venv env
source env/bin/activate
```shell
python3 -m venv venv
source venv/bin/activate
```
From the root of this repo, run the following:
```
pip install -r requirements.txt
```shell
pip install -r dev-requirements.txt -r {env}-requirements.txt
python -m build
```
This builds the package. To use the package locally (eg to run sample files), run
```
```shell
pip install --editable .
```
If you want to use this package from another directory, you can run:
```
```shell
pip install path_to_this_repo
```
## Running the samples
### Running tests
Tou can run the simple sample like so:
From the root directory, run:
```
python src/examples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
```
## Overview
The Daily AI SDK allows you to build applications that can participate in WebRTC sessions and interact with AI Services. Some examples of what you can build with this:
- conversational bots that interact 1:1 with a user, using voice recognition and text-to-speech
- assistant bots that aggregate transcriptions from multiple participants in a meeting and provide realtime summaries or other AI-generated output.
- image-recognition bots
- etc
## Concepts
### Transport Service
The SDK provides one “transport service”, which is a wrapper around Dailys `daily-python` client (tk add link). You can use this service to listen for events related to a WebRTC session, such as “a participant joined the meeting”.
The transport service also exposes a send queue, and a receive queue. You can use the send queue to send audio and video to the WebRTC session, and you can listen to the receive queue to see audio, video and transcription data from the WebRTC session.
### AI Services
The AI Service classes provide wrappers around various AI providers, and allow you to query LLMs, convert text to speech and make images from text. The audio and images can then be placed on the transport services send queue, where theyll be sent to the WebRTC session.
### Queue Frames
Communication between the transport service and AI services, and between various AI services, takes place in Queue Frames. These frames contain an indication of the type of data as well as the data itself.
## Using Transports, AI Services and Frames
AI Services all define a `.run` method. This method consumes and generates `QueueFrame` frames. The kind of frames that can be consumed and generated depend on the kind of service. For instance, an LLM AI Service consumes `LLM_MESSAGE` frames (which define a history of interaction with an LLM) and emit `TEXT` frames (the response from the LLM).
The `.run` method is an `AsyncIterable`, and it takes an `iterable`, `AsyncIterable` or `asyncio.Queue` that produces QueueFrames as a parameter. This makes it easy to chain AI Services, and consume input from the Transports `receive_queue` .
AI Services also have a `.run_to_queue` method. This method is not an AsyncIterable, but instead sends processed QueueFrames to a queue. This makes it easy to send the output of an AI Service to the Transports `send_queue`.
AI Services also define convenience functions that let you bypass creating QueueFrames for some simple cases (eg. using the TTS service to convert a string to audio output and send that audio to the transports `send_queue`). See below for examples.
## Examples
### Say Something
The base TTS AI service exposes a `.say` method. After creating a transport and TTS service, you can use this method like so:
```
transport = DailyTransportService(...)
tts = AzureTTSService()
await tts.say("hello world", transport.send_queue)
```shell
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests
```
This will call the TTS service to render the text to audio frames, then put the audio frames on the transports send queue. The transport will then send those frames along to the WebRTC session.
## Setting up your editor
### Speak an LLM response
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting.
Given a system prompt contained in a `messages` array, you can emit the LLMs response as audio with a chain like this:
### Emacs
```
transport = DailyTransportService(...) # setup parameters omitted
tts = AzureTTSService()
llm = AzureLLMService()
messages = [...] # system prompt omitted for brevity
You can use [use-package](https://github.com/jwiegley/use-package) to install [py-autopep8](https://codeberg.org/ideasman42/emacs-py-autopep8) package and configure `autopep8` arguments:
await tts.run_to_queue(
transport.send_queue,
llm.run([QueueFrame.LLM_MESSAGES, messages])
)
```elisp
(use-package py-autopep8
:ensure t
:defer t
:hook ((python-mode . py-autopep8-mode))
:config
(setq py-autopep8-options '("-a" "-a", "--max-line-length=100")))
```
In this code, the LLM service object sends the messages to Azures OpenAI implementation, which streams chunks back asynchronously. Those chunks are aggregated by the TTS Service to ensure the best audio response (TTS works best when it gets complete sentence, so it can inflect correctly), then sent to Azures TTS service, converted to audio frames, and sent to the WebRTC session via the Daily transport.
`autopep8` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
### Pre-cache an LLM response
Sometimes LLMs can be slower than wed like for natural-feeling communication. Heres an example where we take advantage of the time it takes to speak some pre-defined text to get a head start on the LLM response:
(TK link to 04- sample)
In this sample, we set up a buffer queue to receive the audio frames from the LLM response before while we are joining the call and start an asynchronous task to start filling this buffer:
```
buffer_queue = asyncio.Queue()
llm_response_task = asyncio.create_task(
elevenlabs_tts.run_to_queue(
buffer_queue,
llm.run([QueueFrame(FrameType.LLM_MESSAGE, messages)]),
True,
)
)
```
Then, when weve joined the call, we speak the static text:
```
await azure_tts.say("My friend...", transport.send_queue)
```
As that text is being spoken, the asynchronous LLM task continues in the background. When the text is done, we pull the frames off the buffer queue and put them in the transports `send_queue`:
```
async def buffer_to_send_queue():
while True:
frame = await buffer_queue.get()
await transport.send_queue.put(frame)
buffer_queue.task_done()
if frame.frame_type == FrameType.END_STREAM:
break
await asyncio.gather(llm_response_task, buffer_to_send_queue())
```elisp
(use-package pyvenv-auto
:ensure t
:defer t
:hook ((python-mode . pyvenv-auto-run)))
```
One thing to note here is the last parameter to `run_to_queue` in the first code clause above: this causes the `run_to_queue` method to send an `END_STREAM` frame when its done rendering. This lets us know when to stop our `buffer_to_send_queue` task above.
### Visual Studio Code
Install the
[autopep8](https://marketplace.visualstudio.com/items?itemName=ms-python.autopep8) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, enable formatting on save and configure `autopep8` arguments:
```json
"[python]": {
"editor.defaultFormatter": "ms-python.autopep8",
"editor.formatOnSave": true
},
"autopep8.args": [
"-a",
"-a",
"--max-line-length=100"
],
```
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Reach us on X](https://x.com/pipecat_ai)

6
dev-requirements.txt Normal file
View File

@@ -0,0 +1,6 @@
autopep8~=2.1.0
build~=1.2.1
pip-tools~=7.4.1
pytest~=8.2.0
setuptools~=69.5.1
setuptools_scm~=8.1.0

View File

@@ -1,13 +1,10 @@
# Daily AI SDK Docs
# Pipecat Docs
## [Architecture Overview](architecture.md)
Learn about the thinking behind the SDK's design.
Learn about the thinking behind the framework's design.
## [Example Code](examples/)
## [A Frame's Progress](frame-progress.md)
The repo includes several example apps in the `src/examples` directory. The docs explain how they work.
See how a Frame is processed through a Transport, a Pipeline, and a series of Frame Processors.
## [API Reference](api/)
Complete documentation of the available classes and methods in the SDK.

View File

@@ -1,2 +1,17 @@
# Daily AI SDK Architecture Guide
# Pipecat architecture guide
## Frames
Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used to as control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion.
## FrameProcessors
Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input for an AI Service, and emit chat completions based on message arrays or transform text into audio or images.
## Pipelines
Pipelines are lists of frame processors linked together. Frame processors can push frames upstream or downstream to their peers. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport as an output.
## Transports
Transports provide input and output frame processors to receive or send frames respectively. For example, the `DailyTransport` does this with a WebRTC session joined to a Daily.co room.

View File

@@ -1,119 +0,0 @@
# 01: Say One Thing
_video here - youtube?_
This example uses a text-to-speech (TTS) service to say one predefined sentence. But first, a quick overview of the general structure of these examples.
## Running the demos
All of the demos have something like this at the bottom of the file:
```python
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
```
### `configure()`
The `configure()` function comes from `src/examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:
```bash
python 01-say-one-thing.py -u https://YOUR_DOMAIN.daily.co/YOUR_ROOM -k YOUR_API_KEY
# or
DAILY_ROOM_URL=https://YOUR_DOMAIN.daily.co/YOUR_ROOM DAILY_API_KEY=YOUR_API_KEY python 01-say-one-thing.py
# or set DAILY_ROOM_URL and DAILY_API_KEY in a .env file
python 01-say-one-thing.py
```
You'll need a Daily account to run these demos. You can sign up for free at [daily.co](https://daily.co). Once you've signed up you can create a room from the [Dashboard](https://dashboard.daily.co/rooms), and grab [your API key](https://dashboard.daily.co/developers) while you're there.
Some functionality (such as transcription) requires the bot to have owner privileges in the room. `runner.py` uses the Daily REST API to create a meeting token with owner privileges. You can learn more about meeting tokens in the [Daily docs](https://docs.daily.co/reference/rest-api/meeting-tokens).
### `asyncio.run()`
The AI SDK makes heavy use of Python's `asyncio` module. [This is a reasonable intro to the topic](https://builtin.com/data-science/asyncio) if you haven't worked with `asyncio` and coroutines before.
You can learn a bit more about the specifics of how the Daily AI SDK uses coroutines in the [Architecture Guide](../architecture.md).
## The `main()` function
All of the examples have a `main()` function with a similar structure:
- Configure the transport
- Configure the AI service(s) used in the demo
- Configure any event listeners
- Define a processing pipeline
- Run the example's coroutine(s)
### Configuring the transport
The first section of the `main()` function configures the transport object:
```python
meeting_duration_minutes = 5
transport = DailyTransportService(
room_url,
None,
"Say One Thing",
meeting_duration_minutes,
)
transport.mic_enabled = True
```
The [Architecture Guide](../architecture.md) explains the transport object in more detail. In this case, we're configuring a Daily transport object and enabling the virtual microphone, so our bot can play audio.
### Configuring the services
As described in the [Architecture Guide](../architecture.md), 'a 'Service' is a class that processes 'Frames' as part of a 'Pipeline'. In this demo app, we'll only need one service: a text-to-speech generator. We can create an instance of the `ElevenLabsTTSService` class with this line of code:
```python
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
```
You'll need to make sure and set those environment variables somewhere. The easiest way to do that is to copy the `example.env` file in the repo and rename it to `.env`, and then add your credentials to that file. `runner.py` loads the `python-dotenv` module and initializes it, making the values in that file available in the environment.
### Configuring event listeners
This part isn't strictly necessary for an app like this. You could include the contents of the `on_participant_joined` function directly in the body of the `main()` function, and it would run as soon as you started the script from the command line.
Instead, we can use an event handler to wait to run that code until someone else joins the meeting. We'll define a function called `greet_user()`, and use the `@transport.event_handler("on_participant_joined")` decorator to tell the SDK that we want to run that function whenever a user joins the room.
```python
@transport.event_handler("on_participant_joined")
async def greet_user(transport, participant):
if participant["info"]["isLocal"]:
return
await tts.say(
"Hello there, " + participant["info"]["userName"] + "!",
transport.send_queue,
)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
```
### Defining a processing pipeline
In this example, we don't actually have much of a processing pipeline! In fact, we're doing the whole thing inside the `greet_user()` function already.
Pipelines usually look like a bunch of nested calls to the `run()` or `run_to_queue()` function from different Services. In this example, we're using the `say()` function from the TTS service. This is effectively a convenience wrapper around the `run_to_queue()` function, which we'll discuss more later. It's important to `await` this function to ensure that the speech frames are queued for playback before the next line of code, because of the `stop_when_done()` function being called immediately afterward.
The output of the `say()` function goes to the transport's `send_queue`. This queue is the all-important connection between the world of the Services pipeline that's generating frames asynchronously and the ordered playback of audio and visual media in the WebRTC call.
### Running the coroutines
In this example, we don't actually have any separate processing pipelines—everything happens as a result of an event from the transport. So we only need to run the transport's coroutine, and await its completion:
```python
await transport.run()
```
In future examples, we'll run more processes in parallel. For now, this script can run until the transport exits—which will happen based on calling `stop_when_done()` in the `greet_user()` function.
## Next Steps
Next, we'll start connecting multiple AI services together by building a service pipeline.
## [02 - LLM Say One Thing »](02-llm-say-one-thing.md)

View File

@@ -1,5 +0,0 @@
# Daily AI SDK Examples
The docs in this folder pair with the example apps located in `src/examples/foundational`. They are designed to serve as a quick references for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.
To start, you can learn about the overall structure of the examples in [01 - Say One Thing](01-say-one-thing.md).

46
docs/frame-progress.md Normal file
View File

@@ -0,0 +1,46 @@
# A Frame's Progress
1. A user says “Hello, LLM” and the cloud transcription service delivers a transcription to the Transport.
![A transcript frame arrives](images/frame-progress-01.png)
2. The Transport places a Transcription frame in the Pipelines source queue.
![Frame in source queue](images/frame-progress-02.png)
3. The Pipeline passes the Transcription frame to the first Frame Processor in its list, the LLM User Message Aggregator.
![To UMA](images/frame-progress-03.png)
4. The LLM User Message Aggregator updates the LLM Context with a `{“user”: “Hello LLM”}` message.
![Update context](images/frame-progress-04.png)
5. The LLM User Message Aggregator yields an LLM Message Frame, containing the updated LLM Context. The Pipeline passes this frame to the LLM Frame Processor.
![Update context](images/frame-progress-05.png)
6. The LLM Frame Processor creates a streaming chat completion based on the LLM context and yields the first chunk of a response, Text Frame with the value “Hi, “. The Pipeline passes this frame to the TTS Frame Processor. The TTS Frame Processor aggregates this response but doesnt yield anything, yet, because its waiting for a full sentence.
![LLM yields Text](images/frame-progress-06.png)
7. The LLM Frame Processor yields another Text Frame with the value “there.”. The Pipeline passes this frame to the TTS Frame Processor.
![LLM yields more Text](images/frame-progress-07.png)
8. The TTS Frame Processor now has a full sentence, so it starts streaming audio based on “Hi, there.” It yields the first chunk of streaming audio as an Audio frame, which the Pipeline passes to the LLM Assistant Message Aggregator.
![TTS yields Audio](images/frame-progress-08.png)
9. The LLM Assistant Message Aggregator doesnt do anything with Audio frames, so it immediately yields the frame, unchanged. This is the convention for all Frame Processors: frames that the processor doesnt process should be immediately yielded.
![pass-through](images/frame-progress-09.png)
10. The Pipeline places the first Audio frame in its sink queue, which is being watched by the Transport. Since the frame is now in a queue, the Pipeline can continue processing other frames. Note that the source and sink queues form a sort of “boundary of concurrent processing” between a Pipeline and the outside world. In a Pipeline, Frames are processed sequentially; once a Frame is on a queue it can be processed in parallel with the frames being processed by the Pipeline. TODO: link to a more in-depth section about this.
![sink queue](images/frame-progress-10.png)
11. The TTS Frame Processor yields another Audio frame as the Transport transmits the first Audio frame.
![parallel audio](images/frame-progress-11.png)
12. As before, the LLM Assistant Message Aggregator immediately yields the Audio frame and the Pipeline places the Audio frame in the sink queue.
![sink queue 2](images/frame-progress-12.png)
13. The TTS Frame Processor has no more frames to yield. The LLM Frame Processor emits an LLM Response End Frame, which the Pipeline passes to the TTS Frame Processor.
![response end](images/frame-progress-13.png)
14. The TTS Frame Processor immediately yields the LLM Response End Frame, so the Pipeline passes it along to the LLM Assistant Message Aggregator. The LLM Assistant Message Aggregator updates the LLM Context with the full response from the LLM. TODO TODO: I realized I forgot that the TSS Frame Processor also yields the Text frames that the LLM emitted so that the LLM Assistant Message Aggregator could accumulate them, arrggh.
![response end](images/frame-progress-14.png)
15. The system is quiet, and waiting for the next message from the Transport.
![response end](images/frame-progress-15.png)

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 91 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 95 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 96 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 110 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 111 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 117 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

35
dot-env.template Normal file
View File

@@ -0,0 +1,35 @@
# Anthropic
ANTHROPIC_API_KEY=...
# Azure
AZURE_SPEECH_REGION=...
AZURE_SPEECH_API_KEY=...
AZURE_CHATGPT_API_KEY=...
AZURE_CHATGPT_ENDPOINT=https://...
AZURE_CHATGPT_MODEL=...
AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
# ElevenLabs
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
# Fal
FAL_KEY=...
# Fireworks
FIREWORKS_API_KEY=...
# PlayHT
PLAY_HT_USER_ID=...
PLAY_HT_API_KEY=...
# OpenAI
OPENAI_API_KEY=...

84
examples/README.md Normal file
View File

@@ -0,0 +1,84 @@
# Pipecat &mdash; Examples
## Foundational snippets
Small snippets that build on each other, introducing one or two concepts at a time.
➡️ [Take a look](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational)
## Chatbot examples
Collection of self-contained real-time voice and video AI demo applications built with Pipecat.
### Quickstart
Each project has its own set of dependencies and configuration variables. They intentionally avoids shared code across projects &mdash; you can grab whichever demo folder you want to work with as a starting point.
We recommend you start with a virtual environment:
```shell
cd pipecat-ai/examples/simple-chatbot
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Next, follow the steps in the README for each demo.
Make sure you `pip install -r requirements.txt` for each demo project, so you can be sure to have the necessary service dependencies that extend the functionality of Pipecat. You can read more about the framework architecture [here](https://github.com/pipecat-ai/pipecat/tree/main/docs).
## Projects:
| Project | Description | Services |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, Open AI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| Function-calling Chatbot (TBC) | A chatbot that can call functions in response to user input | Deepgram, OpenAI, Fireworks, Daily, Daily Prebuilt UI |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
> It provides a quick way to join a real-time session with your bot and test your ideas without building any frontend code. If you'd like to see an example of a custom UI, try Storybot.
## FAQ
### Deployment
For each of these demos we've included a `Dockerfile`. Out of the box, this should provide everything needed to get the respective demo running on a VM:
```shell
docker build username/app:tag .
docker run -p 7860:7860 --env-file ./.env username/app:tag
docker push ...
```
### SSL
If you're working with a custom UI (such as with the Storytelling Chatbot), it's important to ensure your deployment platform supports HTTPS, as accessing user devices such as mics and webcams requires SSL.
If you try to run a custom UI without SSL, you may see an error in the console telling you that `navigator` is undefined, or no devices are available.
### Are these examples production ready?
Yes, kind of.
These demos attempt to keep things simple and are unopinionated regarding environment or scalability.
We're using FastAPI to spawn a subprocess for the bots / agents &mdash; useful for small tests, but not so great for production grade apps with many concurrent users. You can see how this works in each project's `start` endpoint in `server.py`.
Creating virtualized worker pools and on-demand instances is out of scope for these examples, but we hope to add some examples to this repo soon!
For projects that have CUDA as a requirement, such as Moondream Chatbot, be sure to deploy to a GPU-powered platform (such as [fly.io](https://fly.io) or [Runpod](https://runpod.io).)
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Reach us on Twitter](https://x.com/pipecat_ai)

View File

@@ -0,0 +1,56 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_participant_joined")
async def on_new_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,53 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([tts, transport.output()])
task = PipelineTask(pipeline)
async def say_something():
await asyncio.sleep(1)
await task.queue_frames([TextFrame("Hello there!"), EndFrame()])
runner = PipelineRunner()
await asyncio.gather(runner.run(task), say_something())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,68 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Say One Thing From an LLM",
DailyParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
messages = [
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}]
runner = PipelineRunner()
task = PipelineTask(Pipeline([llm, tts, transport.output()]))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,68 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Show a still frame image",
DailyParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
runner = PipelineRunner()
task = PipelineTask(Pipeline([imagegen, transport.output()]))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Note that we do not put an EndFrame() item in the pipeline for this demo.
# This means that the bot will stay in the channel until it times out.
# An EndFrame() in the pipeline would cause the transport to shut
# down.
await task.queue_frames([TextFrame("a cat in the style of picasso")])
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,68 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import tkinter as tk
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Picasso Cat")
transport = TkLocalTransport(
tk_root,
TransportParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024))
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
pipeline = Pipeline([imagegen, transport.output()])
task = PipelineTask(pipeline)
await task.queue_frames([TextFrame("a cat in the style of picasso")])
runner = PipelineRunner()
async def run_tk():
while runner.is_active():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,86 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.pipeline.merge_pipeline import SequentialMergePipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.frames.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.task import PipelineTask
from pipecat.services.azure import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.transport_services import TransportServiceOutput
from pipecat.services.transports.daily_transport import DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(room_url, None, "Static And Dynamic Speech")
meeting = TransportServiceOutput(transport, mic_enabled=True)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
azure_tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
elevenlabs_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
messages = [{"role": "system",
"content": "tell the user a joke about llamas"}]
# Start a task to run the LLM to create a joke, and convert the LLM
# output to audio frames. This task will run in parallel with generating
# and speaking the audio for static text, so there's no delay to speak
# the LLM response.
llm_pipeline = Pipeline([llm, elevenlabs_tts])
llm_task = PipelineTask(llm_pipeline)
await llm_task.queue_frames([LLMMessagesFrame(messages), EndPipeFrame()])
simple_tts_pipeline = Pipeline([azure_tts])
await simple_tts_pipeline.queue_frames(
[
TextFrame("My friend the LLM is going to tell a joke about llamas."),
EndPipeFrame(),
]
)
merge_pipeline = SequentialMergePipeline(
[simple_tts_pipeline, llm_pipeline])
await asyncio.gather(
transport.run(merge_pipeline),
simple_tts_pipeline.run_pipeline(),
llm_pipeline.run_pipeline(),
)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,164 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from dataclasses import dataclass
from pipecat.frames.frames import (
AppFrame,
EndFrame,
Frame,
ImageRawFrame,
LLMFullResponseStartFrame,
LLMMessagesFrame,
TextFrame
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.aggregators.gated import GatedAggregator
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.aggregators.parallel_task import ParallelTask
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
@dataclass
class MonthFrame(AppFrame):
month: str
def __str__(self):
return f"{self.name}(month: {self.month})"
class MonthPrepender(FrameProcessor):
def __init__(self):
super().__init__()
self.most_recent_month = "Placeholder, month frame not yet received"
self.prepend_to_next_text_frame = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, MonthFrame):
self.most_recent_month = frame.month
elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):
await self.push_frame(TextFrame(f"{self.most_recent_month}: {frame.text}"))
self.prepend_to_next_text_frame = False
elif isinstance(frame, LLMFullResponseStartFrame):
self.prepend_to_next_text_frame = True
await self.push_frame(frame)
else:
await self.push_frame(frame, direction)
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Month Narration Bot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
gated_aggregator = GatedAggregator(
gate_open_fn=lambda frame: isinstance(frame, ImageRawFrame),
gate_close_fn=lambda frame: isinstance(frame, LLMFullResponseStartFrame),
start_open=False
)
sentence_aggregator = SentenceAggregator()
month_prepender = MonthPrepender()
llm_full_response_aggregator = LLMFullResponseAggregator()
pipeline = Pipeline([
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
ParallelTask( # Run pipelines in parallel aggregating the result
[month_prepender, tts], # Create "Month: sentence" and output audio
[llm_full_response_aggregator, imagegen] # Aggregate full LLM response
),
gated_aggregator, # Queues everything until an image is available
transport.output() # Transport output
])
frames = []
for month in [
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
]:
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
frames.append(MonthFrame(month=month))
frames.append(LLMMessagesFrame(messages))
frames.append(EndFrame())
runner = PipelineRunner()
task = PipelineTask(pipeline)
await task.queue_frames(frames)
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,168 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import tkinter as tk
from pipecat.frames.frames import AudioRawFrame, Frame, URLImageRawFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Calendar")
runner = PipelineRunner()
async def get_month_data(month):
messages = [{"role": "system", "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.", }]
class ImageDescription(FrameProcessor):
def __init__(self):
super().__init__()
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
class AudioGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.audio = bytearray()
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, AudioRawFrame):
self.audio.extend(frame.audio)
self.frame = AudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels)
class ImageGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, URLImageRawFrame):
self.frame = frame
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"))
aggregator = LLMFullResponseAggregator()
description = ImageDescription()
audio_grabber = AudioGrabber()
image_grabber = ImageGrabber()
pipeline = Pipeline([
llm,
aggregator,
description,
ParallelPipeline([tts, audio_grabber],
[imagegen, image_grabber])
])
task = PipelineTask(pipeline)
await task.queue_frame(LLMMessagesFrame(messages))
await task.stop_when_done()
await runner.run(task)
return {
"month": month,
"text": description.text,
"image": image_grabber.frame,
"audio": audio_grabber.frame,
}
transport = TkLocalTransport(
tk_root,
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024))
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only specify 5 months as we create tasks all at once and we might
# get rate limited otherwise.
months: list[str] = [
"January",
"February",
# "March",
# "April",
# "May",
]
# We create one task per month. This will be executed concurrently.
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
# Now we wait for each month task in the order they're completed. The
# benefit is we'll have as little delay as possible before the first
# month, and likely no delay between months, but the months won't
# display in order.
async def show_images(month_tasks):
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await task.queue_frames([data["image"], data["audio"]])
await runner.stop_when_done()
async def run_tk():
while True:
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,101 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
fl_in = FrameLogger("Inner")
fl_out = FrameLogger("Outer")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
fl_in,
transport.input(),
tma_in,
llm,
fl_out,
tts,
transport.output(),
tma_out
])
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,123 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from PIL import Image
from pipecat.frames.frames import ImageRawFrame, Frame, SystemFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyTransport
from pipecat.transports.services.daily import DailyParams
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class ImageSyncAggregator(FrameProcessor):
def __init__(self, speaking_path: str, waiting_path: str):
super().__init__()
self._speaking_image = Image.open(speaking_path)
self._speaking_image_format = self._speaking_image.format
self._speaking_image_bytes = self._speaking_image.tobytes()
self._waiting_image = Image.open(waiting_path)
self._waiting_image_format = self._waiting_image.format
self._waiting_image_bytes = self._waiting_image.tobytes()
async def process_frame(self, frame: Frame, direction: FrameDirection):
if not isinstance(frame, SystemFrame):
await self.push_frame(ImageRawFrame(image=self._speaking_image_bytes, size=(1024, 1024), format=self._speaking_image_format))
await self.push_frame(frame)
await self.push_frame(ImageRawFrame(image=self._waiting_image_bytes, size=(1024, 1024), format=self._waiting_image_format))
else:
await self.push_frame(frame)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
transcription_enabled=True
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(messages)
tma_out = LLMAssistantContextAggregator(messages)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
pipeline = Pipeline([
transport.input(),
image_sync_aggregator,
tma_in,
llm,
tts,
transport.output(),
tma_out
])
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([TextFrame(f"Hi, this is {participant_name}.")])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,94 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, allow_interruptions=True)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,148 @@
from typing import Tuple
import aiohttp
import asyncio
import logging
import os
from pipecat.pipeline.aggregators import SentenceAggregator
from pipecat.pipeline.pipeline import Pipeline
from pipecat.transports.daily_transport import DailyTransport
from pipecat.services.azure_ai_services import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
from pipecat.services.fal_ai_services import FalImageGenService
from pipecat.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("pipecat")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Respond bot",
duration_minutes=10,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
tts1 = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts2 = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
dalle = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="1024x1024"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
bot1_messages = [
{
"role": "system",
"content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long.",
},
]
bot2_messages = [
{
"role": "system",
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich.",
},
]
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
"""This function streams text from the LLM and uses the TTS service to convert
that text to speech as it's received. """
source_queue = asyncio.Queue()
sink_queue = asyncio.Queue()
sentence_aggregator = SentenceAggregator()
pipeline = Pipeline(
[llm, sentence_aggregator, tts1], source_queue, sink_queue
)
await source_queue.put(LLMMessagesFrame(messages))
await source_queue.put(EndFrame())
await pipeline.run_pipeline()
message = ""
all_audio = bytearray()
while sink_queue.qsize():
frame = sink_queue.get_nowait()
if isinstance(frame, TextFrame):
message += frame.text
elif isinstance(frame, AudioFrame):
all_audio.extend(frame.audio)
return (message, all_audio)
async def get_bot1_statement():
message, audio = await get_text_and_audio(bot1_messages)
bot1_messages.append({"role": "assistant", "content": message})
bot2_messages.append({"role": "user", "content": message})
return audio
async def get_bot2_statement():
message, audio = await get_text_and_audio(bot2_messages)
bot2_messages.append({"role": "assistant", "content": message})
bot1_messages.append({"role": "user", "content": message})
return audio
async def argue():
for i in range(100):
print(f"In iteration {i}")
bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
(audio1, image_data1) = await asyncio.gather(
get_bot1_statement(), dalle.run_image_gen(bot1_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data1[1], image_data1[2]),
AudioFrame(audio1),
]
)
bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
(audio2, image_data2) = await asyncio.gather(
get_bot2_statement(), dalle.run_image_gen(bot2_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data2[1], image_data2[2]),
AudioFrame(audio2),
]
)
await asyncio.gather(transport.run(), argue())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,53 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyTransport, DailyParams
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
transport = DailyTransport(
room_url, token, "Test",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1280,
camera_out_height=720
)
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
pipeline = Pipeline([transport.input(), transport.output()])
runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,65 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
import tkinter as tk
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
tk_root = tk.Tk()
tk_root.title("Local Mirror")
daily_transport = DailyTransport(room_url, token, "Test", DailyParams(audio_in_enabled=True))
tk_transport = TkLocalTransport(
tk_root,
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1280,
camera_out_height=720))
@daily_transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
pipeline = Pipeline([daily_transport.input(), tk_transport.output()])
runner = PipelineRunner()
async def run_tk():
while runner.is_active():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
task = PipelineTask(pipeline)
await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,189 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import random
import sys
from PIL import Image
from pipecat.frames.frames import (
Frame,
SystemFrame,
TextFrame,
ImageRawFrame,
SpriteFrame,
TranscriptionFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sprites = {}
image_files = [
"sc-default.png",
"sc-talk.png",
"sc-listen-1.png",
"sc-think-1.png",
"sc-think-2.png",
"sc-think-3.png",
"sc-think-4.png",
]
script_dir = os.path.dirname(__file__)
for file in image_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites[file] = ImageRawFrame(image=img.tobytes(), size=img.size, format=img.format)
# When the bot isn't talking, show a static image of the cat listening
quiet_frame = sprites["sc-listen-1.png"]
# When the bot is talking, build an animation from two sprites
talking_list = [sprites["sc-default.png"], sprites["sc-talk.png"]]
talking = [random.choice(talking_list) for x in range(30)]
talking_frame = SpriteFrame(talking)
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM
# is processing
thinking_list = [
sprites["sc-think-1.png"],
sprites["sc-think-2.png"],
sprites["sc-think-3.png"],
sprites["sc-think-4.png"],
]
thinking_frame = SpriteFrame(thinking_list)
class NameCheckFilter(FrameProcessor):
def __init__(self, names: list[str]):
super().__init__()
self._names = names
self._sentence = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, SystemFrame):
await self.push_frame(frame, direction)
return
content: str = ""
# TODO: split up transcription by participant
if isinstance(frame, TranscriptionFrame):
content = frame.text
self._sentence += content
if self._sentence.endswith((".", "?", "!")):
if any(name in self._sentence for name in self._names):
await self.push_frame(TextFrame(self._sentence))
self._sentence = ""
else:
self._sentence = ""
else:
await self.push_frame(frame, direction)
class ImageSyncAggregator(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await self.push_frame(talking_frame)
await self.push_frame(frame)
await self.push_frame(quiet_frame)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Santa Cat",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=720,
camera_out_height=1280,
camera_out_framerate=10,
transcription_enabled=True
)
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
isa = ImageSyncAggregator()
messages = [
{
"role": "system",
"content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long.",
},
]
tma_in = LLMUserContextAggregator(messages)
tma_out = LLMAssistantContextAggregator(messages)
ncf = NameCheckFilter(["Santa Cat", "Santa"])
pipeline = Pipeline([
transport.input(),
isa,
ncf,
tma_in,
llm,
tts,
transport.output(),
tma_out
])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Send some greeting at the beginning.
await tts.say("Hi! If you want to talk to me, just say 'hey Santa Cat'.")
transport.capture_participant_transcription(participant["id"])
async def starting_image():
await transport.send_image(quiet_frame)
runner = PipelineRunner()
task = PipelineTask(pipeline)
await asyncio.gather(runner.run(task), starting_image())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,142 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import wave
from pipecat.frames.frames import (
Frame,
AudioRawFrame,
LLMFullResponseEndFrame,
LLMMessagesFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sounds = {}
sound_files = ["ding1.wav", "ding2.wav"]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = AudioRawFrame(audio_file.readframes(-1),
audio_file.getframerate(), audio_file.getnchannels())
class OutboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, LLMFullResponseEndFrame):
await self.push_frame(sounds["ding1.wav"])
# In case anything else downstream needs it
await self.push_frame(frame, direction)
else:
await self.push_frame(frame, direction)
class InboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, LLMMessagesFrame):
await self.push_frame(sounds["ding2.wav"])
# In case anything else downstream needs it
await self.push_frame(frame, direction)
else:
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(audio_out_enabled=True, transcription_enabled=True)
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(messages)
tma_out = LLMAssistantContextAggregator(messages)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
pipeline = Pipeline([
transport.input(),
tma_in,
in_sound,
fl2,
llm,
fl,
tts,
out_sound,
transport.output(),
tma_out
])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await tts.say("Hi, I'm listening!")
await transport.send_audio(sounds["ding1.wav"])
runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,110 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.moondream import MoondreamService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
# If you run into weird description, try with use_cpu=True
moondream = MoondreamService()
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
moondream,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,110 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.google import GoogleLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_in_enabled=True, # This is so Silero VAD can get audio data
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
google = GoogleLLMService(model="gemini-1.5-flash-latest")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
google,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,112 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
openai = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o"
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
openai,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,55 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
transport = DailyTransport(room_url, None, "Transcription bot",
DailyParams(audio_in_enabled=True))
stt = WhisperSTTService()
tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -0,0 +1,55 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))
stt = WhisperSTTService()
tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

Binary file not shown.

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 870 KiB

After

Width:  |  Height:  |  Size: 870 KiB

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 872 KiB

After

Width:  |  Height:  |  Size: 872 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 33 KiB

After

Width:  |  Height:  |  Size: 33 KiB

View File

Before

Width:  |  Height:  |  Size: 30 KiB

After

Width:  |  Height:  |  Size: 30 KiB

View File

@@ -4,15 +4,15 @@ import time
import urllib
import requests
from dotenv import load_dotenv
load_dotenv()
def configure():
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
"-u",
"--url",
type=str,
required=False,
help="URL of the Daily room to join")
parser.add_argument(
"-k",
"--apikey",
@@ -33,20 +33,25 @@ def configure():
if not key:
raise Exception("No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
# Create a meeting token for the given room with an expiration 1 hour in the future.
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={"Authorization": f"Bearer {key}"},
headers={
"Authorization": f"Bearer {key}"},
json={
"properties": {"room_name": room_name, "is_owner": True, "exp": expiration}
},
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(f"Failed to create meeting token: {res.status_code} {res.text}")
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]

View File

@@ -0,0 +1,25 @@
syntax = "proto3";
package pipecat_proto;
message TextFrame {
string text = 1;
}
message AudioFrame {
bytes audio = 1;
}
message TranscriptionFrame {
string text = 1;
string participant_id = 2;
string timestamp = 3;
}
message Frame {
oneof frame {
TextFrame text = 1;
AudioFrame audio = 2;
TranscriptionFrame transcription = 3;
}
}

View File

@@ -0,0 +1,134 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="//cdn.jsdelivr.net/npm/protobufjs@7.X.X/dist/protobuf.min.js"></script>
<title>WebSocket Audio Stream</title>
</head>
<body>
<h1>WebSocket Audio Stream</h1>
<button id="startAudioBtn">Start Audio</button>
<button id="stopAudioBtn">Stop Audio</button>
<script>
const SAMPLE_RATE = 16000;
const BUFFER_SIZE = 8192;
const MIN_AUDIO_SIZE = 6400;
let audioContext;
let microphoneStream;
let scriptProcessor;
let source;
let frame;
let audioChunks = [];
let isPlaying = false;
let ws;
const proto = protobuf.load("frames.proto", (err, root) => {
if (err) throw err;
frame = root.lookupType("pipecat_proto.Frame");
});
function initWebSocket() {
ws = new WebSocket('ws://localhost:8765');
ws.addEventListener('open', () => console.log('WebSocket connection established.'));
ws.addEventListener('message', handleWebSocketMessage);
ws.addEventListener('close', (event) => console.log("WebSocket connection closed.", event.code, event.reason));
ws.addEventListener('error', (event) => console.error('WebSocket error:', event));
}
async function handleWebSocketMessage(event) {
const arrayBuffer = await event.data.arrayBuffer();
enqueueAudioFromProto(arrayBuffer);
}
function enqueueAudioFromProto(arrayBuffer) {
const parsedFrame = frame.decode(new Uint8Array(arrayBuffer));
if (!parsedFrame?.audio) return false;
const frameCount = parsedFrame.audio.data.length / 2;
const audioOutBuffer = audioContext.createBuffer(1, frameCount, SAMPLE_RATE);
const nowBuffering = audioOutBuffer.getChannelData(0);
const view = new Int16Array(parsedFrame.audio.data.buffer);
for (let i = 0; i < frameCount; i++) {
const word = view[i];
nowBuffering[i] = ((word + 32768) % 65536 - 32768) / 32768.0;
}
audioChunks.push(audioOutBuffer);
if (!isPlaying) playNextChunk();
}
function playNextChunk() {
if (audioChunks.length === 0) {
isPlaying = false;
return;
}
isPlaying = true;
const audioOutBuffer = audioChunks.shift();
const source = audioContext.createBufferSource();
source.buffer = audioOutBuffer;
source.connect(audioContext.destination);
source.onended = playNextChunk;
source.start();
}
function startAudio() {
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
alert('getUserMedia is not supported in your browser.');
return;
}
navigator.mediaDevices.getUserMedia({ audio: true })
.then((stream) => {
microphoneStream = stream;
audioContext = new (window.AudioContext || window.webkitAudioContext)();
scriptProcessor = audioContext.createScriptProcessor(BUFFER_SIZE, 1, 1);
source = audioContext.createMediaStreamSource(stream);
source.connect(scriptProcessor);
scriptProcessor.connect(audioContext.destination);
const audioBuffer = [];
const skipRatio = Math.floor(audioContext.sampleRate / (SAMPLE_RATE * 2));
scriptProcessor.onaudioprocess = (event) => {
const rawLeftChannelData = event.inputBuffer.getChannelData(0);
for (let i = 0; i < rawLeftChannelData.length; i += skipRatio) {
const normalized = ((rawLeftChannelData[i] * 32768.0) + 32768) % 65536 - 32768;
const swappedBytes = ((normalized & 0xff) << 8) | ((normalized >> 8) & 0xff);
audioBuffer.push(swappedBytes);
}
if (audioBuffer.length >= MIN_AUDIO_SIZE) {
const audioFrame = frame.create({ audio: { audio: audioBuffer.slice(0, MIN_AUDIO_SIZE) } });
const encodedFrame = new Uint8Array(frame.encode(audioFrame).finish());
ws.send(encodedFrame);
audioBuffer.splice(0, MIN_AUDIO_SIZE);
}
};
initWebSocket();
})
.catch((error) => console.error('Error accessing microphone:', error));
}
function stopAudio() {
if (ws) {
ws.close();
scriptProcessor.disconnect();
source.disconnect();
ws = undefined;
}
}
document.getElementById('startAudioBtn').addEventListener('click', startAudio);
document.getElementById('stopAudioBtn').addEventListener('click', stopAudio);
</script>
</body>
</html>

View File

@@ -0,0 +1,50 @@
import asyncio
import aiohttp
import logging
import os
from pipecat.pipeline.frame_processor import FrameProcessor
from pipecat.pipeline.frames import TextFrame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
from pipecat.transports.websocket_transport import WebsocketTransport
from pipecat.services.whisper_ai_services import WhisperSTTService
logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("pipecat")
logger.setLevel(logging.DEBUG)
class WhisperTranscriber(FrameProcessor):
async def process_frame(self, frame):
if isinstance(frame, TranscriptionFrame):
print(f"Transcribed: {frame.text}")
else:
yield frame
async def main():
async with aiohttp.ClientSession() as session:
transport = WebsocketTransport(
mic_enabled=True,
speaker_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([
WhisperSTTService(),
WhisperTranscriber(),
tts,
])
@transport.on_connection
async def queue_frame():
await pipeline.queue_frames([TextFrame("Hello there!")])
await transport.run(pipeline)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,163 @@
# flyctl launch added from .gitignore
# Byte-compiled / optimized / DLL files
**/__pycache__
**/*.py[cod]
**/*$py.class
# C extensions
**/*.so
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
**/*.manifest
**/*.spec
# Installer logs
**/pip-log.txt
**/pip-delete-this-directory.txt
# Unit test / coverage reports
**/htmlcov
**/.tox
**/.nox
**/.coverage
**/.coverage.*
**/.cache
**/nosetests.xml
**/coverage.xml
**/*.cover
**/*.py,cover
**/.hypothesis
**/.pytest_cache
**/cover
# Translations
**/*.mo
**/*.pot
# Django stuff:
**/*.log
**/local_settings.py
**/db.sqlite3
**/db.sqlite3-journal
# Flask stuff:
**/instance
**/.webassets-cache
# Scrapy stuff:
**/.scrapy
# Sphinx documentation
**/docs/_build
# PyBuilder
**/.pybuilder
**/target
# Jupyter Notebook
**/.ipynb_checkpoints
# IPython
**/profile_default
**/ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
**/.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
**/__pypackages__
# Celery stuff
**/celerybeat-schedule
**/celerybeat.pid
# SageMath parsed files
**/*.sage.py
# Environments
**/.env
**/.venv
**/env
**/venv
**/ENV
**/env.bak
**/venv.bak
# Spyder project settings
**/.spyderproject
**/.spyproject
# Rope project settings
**/.ropeproject
# mkdocs documentation
site
# mypy
**/.mypy_cache
**/.dmypy.json
**/dmypy.json
# Pyre type checker
**/.pyre
# pytype static type analyzer
**/.pytype
# Cython debug symbols
**/cython_debug
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
**/runpod.toml
fly.toml

161
examples/moondream-chatbot/.gitignore vendored Normal file
View File

@@ -0,0 +1,161 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml

View File

@@ -0,0 +1,25 @@
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y wget
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
RUN dpkg -i cuda-keyring_1.1-1_all.deb
RUN echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" > /etc/apt/sources.list.d/cuda-ubuntu2204-x86_64.list
RUN apt-get update && apt-get install -y python3 python3-pip
RUN apt-get install -y cuda-nvcc-12-4 libcublas-12-4 libcudnn8
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
copy assets/* /app/assets/
copy utils/* /app/utils/
WORKDIR /app
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "server.py"]

View File

@@ -0,0 +1,76 @@
FROM ubuntu:22.04
# environment variables for Intel OneAPI components
ENV DPCPPROOT=/opt/intel/oneapi/compiler/latest
ENV MKLROOT=/opt/intel/oneapi/mkl/latest
ENV CCLROOT=/opt/intel/oneapi/ccl/latest
ENV MPIROOT=/opt/intel/oneapi/mpi/latest
# Install necessary dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
wget \
lsb-release \
pciutils \
gnupg2 \
python3-pip
# Add Intel OneAPI repository and GPG key
# Intel GPU repository and GPG key
# Install Intel OneAPI components and source the environment scripts
RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null && \
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list && \
/bin/bash -c ' \
. /etc/os-release && \
if [[ " jammy " =~ " ${VERSION_CODENAME} " ]]; then \
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg && \
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${VERSION_CODENAME}/lts/2350 unified" | \
tee /etc/apt/sources.list.d/intel-gpu-${VERSION_CODENAME}.list && \
apt-get update && \
apt-get install -y --no-install-recommends intel-opencl-icd \
intel-level-zero-gpu level-zero intel-media-va-driver-non-free \
libmfx1 libmfxgen1 libvpl2 libegl-mesa0 libegl1-mesa \
libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 \
libxatracker2 mesa-va-drivers mesa-vdpau-drivers \
mesa-vulkan-drivers va-driver-all; \
else \
echo "Ubuntu version ${VERSION_CODENAME} not supported. Exiting..."; \
exit 1; \
fi' && \
apt-get update && apt-get install -y --no-install-recommends \
intel-oneapi-dpcpp-cpp-2024.1=2024.1.0-963 intel-oneapi-mkl-devel=2024.1.0-691 \
intel-oneapi-ccl-devel=2021.12.0-309 && \
apt-get clean && rm -rf /var/lib/apt/lists/* && \
groupadd -r render && usermod -aG render root && \
echo "source ${DPCPPROOT}/env/vars.sh" >> ~/.bashrc && \
echo "source ${MKLROOT}/env/vars.sh" >> ~/.bashrc && \
echo "source ${CCLROOT}/env/vars.sh" >> ~/.bashrc && \
echo "source ${MPIROOT}/env/vars.sh" >> ~/.bashrc && \
echo "export LD_LIBRARY_PATH=${MKLROOT}/lib:${DPCPPROOT}/linux/compiler/lib/intel64_lin:$LD_LIBRARY_PATH" >> ~/.bashrc
WORKDIR /app
COPY . /app
RUN mkdir -p /app /app/assets /app/utils
COPY *.py requirements.txt assets/* utils/* /app/
# Install the Intel-specific versions of torch
RUN python3 -m pip install --no-cache-dir -r requirements.txt && \
pip uninstall -y torch && \
pip freeze | grep 'nvidia-' | xargs pip uninstall -y && \
pip install --no-cache-dir --force-reinstall torch==2.1.0.post2 torchvision==0.16.0.post2 torchaudio==2.1.0.post2 \
intel-extension-for-pytorch==2.1.30+xpu oneccl_bind_pt==2.1.300+xpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
RUN echo '#!/bin/bash\n\
source ${DPCPPROOT}/env/vars.sh\n\
source ${MKLROOT}/env/vars.sh\n\
source ${CCLROOT}/env/vars.sh\n\
source ${MPIROOT}/env/vars.sh\n\
export LD_LIBRARY_PATH=${MKLROOT}/lib:${DPCPPROOT}/linux/compiler/lib/intel64_lin:$LD_LIBRARY_PATH\n\
python3 server.py' > /usr/local/bin/run_app.sh && \
chmod +x /usr/local/bin/run_app.sh && \
find / -type d -name "__pycache__" -exec rm -rf {} +
EXPOSE 7860
ENTRYPOINT ["/usr/local/bin/run_app.sh"]

View File

@@ -0,0 +1,44 @@
# Moondream Chatbot
<img src="image.png" width="420px">
This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion. The chatbot also has vision powers thanks to [Moondream](https://moondream.ai) so you can ask it, for example, "what do you see?".
The first time, things might take some time to get started since VAD (Voice Activity Detection) and vision models need to be downloaded.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/start` in your browser to start a chatbot
session.
## Build and test the Docker image
```
docker build -t moonbot .
docker run --env-file .env -p 7860:7860 moonbot
```
### For Intel GPUs (Arc, Max and Flex series)
```
docker build -t moonbot -f Dockerfile.intel .
docker run --env-file .env -p 7860:7860 --device /dev/dri moonbot
```
You can try to visit `http://localhost:7860/start` again.

Binary file not shown.

After

Width:  |  Height:  |  Size: 759 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 884 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 876 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 881 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 874 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 882 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 885 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 888 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 890 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 898 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 836 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 905 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 849 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 864 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 858 KiB

Some files were not shown because too many files have changed in this diff Show More