Compare commits

304 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
2f4467b5a5 Merge pull request #213 from pipecat-ai/aleix/pipecat-0.0.26
update CHANGELOG for 0.0.26
2024-06-06 01:10:01 +08:00
Aleix Conchillo Flaqué
e91ab54a69 update CHANGELOG for 0.0.26 2024-06-05 10:07:45 -07:00
Aleix Conchillo Flaqué
6a33432c82 Merge pull request #212 from pipecat-ai/aleix/make-pinlesscallupdate-public
transports(daily): move pinlessCallUpdate to public api
2024-06-05 23:14:14 +08:00
Aleix Conchillo Flaqué
135654a080 transports(daily): move pinlessCallUpdate to public api 2024-06-05 08:08:56 -07:00
Aleix Conchillo Flaqué
7b708a2bee Merge pull request #211 from pipecat-ai/aleix/base-transport-async
various fixes and improvements
2024-06-05 22:57:35 +08:00
Aleix Conchillo Flaqué
b515c28417 services(cartesia): allow output_format and model_id 2024-06-04 19:24:33 -07:00
Aleix Conchillo Flaqué
854ffb0323 update CHANGELOG for DailyRESTHelper 2024-06-04 15:45:17 -07:00
Aleix Conchillo Flaqué
891b7b22ea transports: push EndFrame/CancelFrame before stopping push task 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
c8d37a7227 pipeline(runner): add support for SIGTERM 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
489060881d update macos-py3.10-requirements 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
d56a4cce1b update CHANGELOG with latest changes 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
7eb9dfde38 pyproject: include langchain-community and langchain-openai 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
571e10f83e services(anthropic): fix interruptions with anthropic 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
af202d4fe5 pipeline(task): introduce has_finished() 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
4057fbbcfd transports(tk): fix pyaudio output stream cleanup 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
5cdb8a79a1 examples: use camera_out_is_live for live video 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
a674b43243 transport: remove redundant camera thread and switch audio pull for push 2024-06-04 15:43:54 -07:00
Jon Taylor
ac41f13b7c Merge pull request #205 from pipecat-ai/daily_rest_helpers
Created REST helpers for Daily covering commonly used methods for running / deployment
2024-06-04 22:26:39 +02:00
Jon Taylor
003b9887b1 made sip and sipuri optional and None 2024-06-04 19:03:58 +02:00
Jon Taylor
ba45c2ab5b addressed review (urllib import and linting) 2024-06-04 18:39:35 +02:00
Aleix Conchillo Flaqué
9d36a48a80 Merge pull request #208 from pipecat-ai/aleix/cartesia-voice-load-startup
services(cartesia): load voices on startup
2024-06-04 22:54:25 +08:00
Aleix Conchillo Flaqué
20a525635e Merge pull request #201 from TomTom101/TomTom101/openai_tts
Added OpenAI TTS (#196)
2024-06-04 22:53:56 +08:00
Aleix Conchillo Flaqué
659eceea95 services(cartesia): load voices on startup 2024-06-03 14:08:04 -07:00
TomTom101
d462c03d00 chore: Review comments 2024-06-03 20:13:15 +02:00
Jon Taylor
6591e07eb4 removed hardcoded 'https' from API url 2024-06-03 19:32:14 +02:00
Aleix Conchillo Flaqué
fe71825954 Merge pull request #206 from pipecat-ai/aleix/fix-deepgram-tts
services(deepgram): fixed DeepgramTTSService
2024-06-04 00:28:53 +08:00
Aleix Conchillo Flaqué
43516f84fe services(deepgram): fixed DeepgramTTSService 2024-06-03 07:53:46 -07:00
Jon Taylor
0849edb00b added Daily REST helpers file for common methods used in Pipecat bots 2024-06-03 16:38:13 +02:00
Aleix Conchillo Flaqué
dd3b4083eb Merge pull request #204 from TomTom101/TomTom101/langchain
fix: Fixed imports, support new PipelineParams
2024-06-03 03:16:30 +08:00
TomTom101
89673a4040 test(langchain): Use new PipelineParams in test 2024-06-02 20:19:55 +02:00
TomTom101
410dbd3dfc fix: Fixed imports, support new PipelineParams 2024-06-02 20:16:11 +02:00
TomTom101
7085b1ea3f doc(openai): Added hint re the 24kHz sample rate 2024-06-01 20:35:46 +02:00
TomTom101
8683cae719 feat: OpenAITTS 2024-06-01 10:13:28 +02:00
Aleix Conchillo Flaqué
0197efa524 Merge pull request #200 from pipecat-ai/aleix/changelog-0.0.25
update CHANGELOG.md for version 0.0.25
2024-06-01 07:48:42 +08:00
Aleix Conchillo Flaqué
16e76caa33 update CHANGELOG.md for version 0.0.25 2024-05-31 16:48:03 -07:00
Aleix Conchillo Flaqué
1f5240694d Merge pull request #199 from pipecat-ai/aleix/langchain-changelog
move LangchainProcessor to processors/frameworks and update CHANGELOG
2024-06-01 07:46:51 +08:00
Aleix Conchillo Flaqué
f087151db7 move LangchainProcessor to processors/frameworks and update CHANGELOG 2024-05-31 16:45:39 -07:00
Aleix Conchillo Flaqué
0b691ff597 Merge pull request #198 from pipecat-ai/aleix/websocket-transport
websocket transport support
2024-06-01 04:40:39 +08:00
TomTom101
ae049961b7 wip: untested 2024-05-31 22:30:52 +02:00
Aleix Conchillo Flaqué
0d6eee705f Merge pull request #190 from TomTom101/TomTom101/langchain
Langchain service
2024-06-01 04:21:12 +08:00
Aleix Conchillo Flaqué
58d20ec9dc transport(websocket-server): add on_client_disconnected 2024-05-31 12:52:43 -07:00
Aleix Conchillo Flaqué
38befe1dc1 examples(websocket): rename server.py to bot.py 2024-05-31 12:09:54 -07:00
Aleix Conchillo Flaqué
2f335100a5 remove storage folder 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
3fef818843 examples(websocket-server): use VAD analyzer from transport 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
428c8af77e transports(websocket): base class from BaseInputTransport 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
54fccd2e25 pipeline: cleanup processors one by one 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
66c6a5dc0f transports(websocket): base class from BaseOutputTransport 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
92561ae19d some event loop parameter updates 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
b85e93410b transports(daily): fix event handlers callback 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
593993ba97 transports(base_input): remove unnecessary task 2024-05-31 11:37:41 -07:00
Aleix Conchillo Flaqué
7b8b606278 update CHANGELOG and create websocket-server instructions 2024-05-31 11:37:19 -07:00
Aleix Conchillo Flaqué
7116ad0607 examples: fix websocket-client audio playback 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
c507044277 examples: use gpt-4o model by default 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
5f45a9d90f examples: websocket-server updates 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e31e87aabd transport(websocket): update audio_frame_size 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
2957416d90 serializers(protobuf): support id and name fields 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
b9b761b67a added sample_rate and num_channels to protobuf AudioRawFrame 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
a7539e9317 transports: simplify and fix async and nested decorators 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
75575c0c68 use get_event_loop() and move event handlers to BaseTransport 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
77b3e08214 examples: add and update websocket examples 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
956b783c1a transports: added new WebsocketServerTransport 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e90c080470 serializers: added BaseSerializer 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
37aabaa03a frames: generate protobuf pb2 file for pipecat package 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
3e289a7bef pyproject: add protobuf dependency 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
6dd5e3fdf5 dev-requirements: add grpcio-tools 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e60df3c7c0 Merge pull request #195 from pipecat-ai/aleix/function-calling-move-to-llmservice
function calling move to LLMService
2024-06-01 02:36:29 +08:00
Aleix Conchillo Flaqué
42f772beed examples: some function calling examples cleanup 2024-05-31 11:36:04 -07:00
Aleix Conchillo Flaqué
3655c4a0fc services: move function calling registration to LLMService 2024-05-31 11:36:04 -07:00
Aleix Conchillo Flaqué
012dbffd94 update CHANGELOG.md for function calling 2024-05-31 11:36:03 -07:00
TomTom101
4b39efeee3 fix(langchain): try/catch langchain import in service; Only langchain is installed with the [langchain] extra (#190) 2024-05-31 10:19:27 +02:00
Kwindla Hultman Kramer
19caf750fd Merge pull request #194 from pipecat-ai/khk-cartesia-changelog
Added cartesia line to CHANGELOG.md
2024-05-30 14:18:41 -07:00
Kwindla Hultman Kramer
296611714f added cartesia line to CHANGELOG.md 2024-05-30 10:41:00 -07:00
chadbailey59
4c3d19cc8b Function calling (#175)
* added function calling code back

* removed old llm_context file

* added integration testing for openai

* added function calling example

* added function callbacks

* added function start callback

* fixup

* fixup

* added different return type support for function calling

* intake example working

* added frame loggers

* cleanup

* fixup

* Update openai.py

* removed function call frame types

* fixup

* re-added example

* renumbered wake phrase

* fixup for autopep8

* remove unused imports
2024-05-30 12:25:39 -05:00
Aleix Conchillo Flaqué
a3ba07c7a3 Merge pull request #193 from pipecat-ai/aleix/fix-camera-out-enabled-cpu
transport(output): fix high CPU usage with camera_out_enabled and no …
2024-05-31 01:25:06 +08:00
Kwindla Hultman Kramer
a1579808b2 Merge pull request #189 from pipecat-ai/khk-cartesia-etc
Cartesia TTS
2024-05-30 10:24:45 -07:00
Aleix Conchillo Flaqué
aecb9f5816 transport(output): fix high CPU usage with camera_out_enabled and no images 2024-05-30 10:18:43 -07:00
Aleix Conchillo Flaqué
a5d42a526c Merge pull request #191 from pipecat-ai/aleix/fix-silero-vad
vad: fix silero vad frame processor
2024-05-30 23:25:52 +08:00
Aleix Conchillo Flaqué
a9472f8116 vad: fix silero vad frame processor 2024-05-30 07:50:58 -07:00
TomTom101
b19243ab75 fix: corrected hint to install Langchain libs 2024-05-30 10:53:42 +02:00
TomTom101
2bf094b950 test(langchain): Rewrite to unittest, make it meaningful 2024-05-30 10:43:33 +02:00
Kwindla Hultman Kramer
d5f106ae19 pr fixes 2024-05-29 23:41:35 -07:00
Kwindla Hultman Kramer
920745345a cartesia tts support 2024-05-29 23:35:35 -07:00
TomTom101
143033d7db fix: install langchain-community with the langchain extra 2024-05-30 03:15:14 +02:00
TomTom101
335990c145 wip: hint to install langchain_community 2024-05-30 03:15:14 +02:00
TomTom101
6d24e836b0 wip: Example using LC message history 2024-05-30 03:15:14 +02:00
TomTom101
278a2fed56 wip: First stab at langchain support
Is this a service or processor?
How to deal with conversation history? LC has sophisticated means of this, but might get in the way of `LLMResponseAggregator`
2024-05-30 03:15:14 +02:00
Aleix Conchillo Flaqué
c444004eec Merge pull request #186 from pipecat-ai/aleix/update-changelog-0.0.24
update CHANGELOG.md 0.0.24
2024-05-29 23:23:06 +08:00
Aleix Conchillo Flaqué
72cf7896d7 update CHANGELOG.md 0.0.24 2024-05-29 08:22:33 -07:00
Aleix Conchillo Flaqué
31af5f8177 Merge pull request #182 from pipecat-ai/aleix/expo-se-dialin-ready
transports(daily): expose dialin-ready and handle timeouts
2024-05-29 23:05:47 +08:00
Aleix Conchillo Flaqué
6a68d9a57e pyproject: update daily-python to 0.9.0 2024-05-28 18:30:43 -07:00
Aleix Conchillo Flaqué
39f41ab25e transports(daily): expose dialin-ready and handle timeouts 2024-05-28 18:00:09 -07:00
Aleix Conchillo Flaqué
624cc1e987 Merge pull request #185 from pipecat-ai/aleix/add-start-recording
transport(daily): add start_recording, stop_recording and stop_dialout
2024-05-29 08:24:59 +08:00
Aleix Conchillo Flaqué
08a15e5cdd transports(daily): expose on_app_message 2024-05-28 17:23:34 -07:00
Aleix Conchillo Flaqué
4cd4787e4d transports(daily): added on_call_state_updated 2024-05-28 17:23:34 -07:00
Aleix Conchillo Flaqué
65afee2808 transport(daily): add start_recording, stop_recording and stop_dialout 2024-05-28 17:16:39 -07:00
Aleix Conchillo Flaqué
00ece864ec Merge pull request #184 from pipecat-ai/aleix/introduce-pipelineparams
introduce PipelineParams
2024-05-29 08:14:58 +08:00
Aleix Conchillo Flaqué
6d6d9bea5a introduce PipelineParams 2024-05-28 17:14:14 -07:00
Kwindla Hultman Kramer
7c213f8533 Merge pull request #183 from pipecat-ai/khk-deepgram-fix
moving Deepgram TTS base_url from beta to prod
2024-05-28 17:04:03 -07:00
Kwindla Hultman Kramer
3685c19b2d moving Deepgram TTS base_url from beta to prod 2024-05-28 15:59:26 -07:00
Aleix Conchillo Flaqué
650a2b4da4 Merge pull request #174 from pipecat-ai/fix-azure-llm-service
services(azure): fix AzureLLMService
2024-05-25 00:27:51 +08:00
Aleix Conchillo Flaqué
afea6f38f6 examples: no need to define tts twice 2024-05-24 09:23:00 -07:00
Aleix Conchillo Flaqué
c45d428551 services(google): make api_key argument mandatory 2024-05-24 09:23:00 -07:00
Aleix Conchillo Flaqué
4e594aa9b0 services: BaseOpenAILLMService.create_client() now returns the client 2024-05-24 09:04:15 -07:00
Aleix Conchillo Flaqué
32f91c5f31 services(azure): fix AzureLLMService
Fixes #160
2024-05-23 16:51:04 -07:00
Aleix Conchillo Flaqué
a32ece897a Merge pull request #179 from pipecat-ai/aleix/aiohttp-response-text
fix aiohttp response text
2024-05-24 07:42:05 +08:00
Aleix Conchillo Flaqué
88f6436aaa fix aiohttp response text 2024-05-23 15:51:00 -07:00
Aleix Conchillo Flaqué
fac43cea06 Merge pull request #178 from pipecat-ai/aleix/daily-python-0.8.0-deps
update linux/macos requirements
2024-05-24 05:50:10 +08:00
Aleix Conchillo Flaqué
a9e6aeed54 update linux/macos requirements 2024-05-23 14:49:34 -07:00
Aleix Conchillo Flaqué
fa9f49f5bb Merge pull request #177 from pipecat-ai/aleix/dialin-ready-missing-sipuri
transports(daily): fix dialin-ready event handling
2024-05-24 05:39:31 +08:00
Aleix Conchillo Flaqué
2a6183aba5 transports(daily): fix dialin-ready event handling 2024-05-23 14:38:37 -07:00
Aleix Conchillo Flaqué
b1a622971b Merge pull request #176 from pipecat-ai/aleix/handle-dialin-ready
transport(daily): add support for dial-in use cases
2024-05-24 04:58:10 +08:00
Aleix Conchillo Flaqué
5b72faccb4 update CHANGELOG.md for release 0.0.22 2024-05-23 13:57:28 -07:00
Aleix Conchillo Flaqué
c8732544c7 transport(daily): add support for dial-in use cases 2024-05-23 13:56:50 -07:00
Aleix Conchillo Flaqué
d4219b16b8 Merge pull request #170 from pipecat-ai/add-daily-transport-dialout-support
transport(daily): add dialout support
2024-05-24 04:19:51 +08:00
Aleix Conchillo Flaqué
0c33432f64 transport(daily): update CHANGELOG.md with dialout/dialin updates 2024-05-23 13:14:34 -07:00
Aleix Conchillo Flaqué
95bd58cced pyproject: depend on daily-python 0.8.0 2024-05-23 13:10:48 -07:00
Aleix Conchillo Flaqué
8d7d1a7e24 transport(daily): add dialin-ready event 2024-05-23 07:12:31 -07:00
Aleix Conchillo Flaqué
3768cb2f2c transport(daily): add dialout support 2024-05-22 22:44:01 -07:00
Aleix Conchillo Flaqué
d4b2741608 Merge pull request #169 from pipecat-ai/update-changelog-0.0.21
update CHANGELOG.md for 0.0.21
2024-05-23 12:42:41 +08:00
Aleix Conchillo Flaqué
aef2152dcc update CHANGELOG.md for 0.0.21 2024-05-22 21:40:29 -07:00
Aleix Conchillo Flaqué
d0b0221b97 Merge pull request #167 from pipecat-ai/khk-bump-anthropic
add new response frame types and vision support for anthropic
2024-05-23 12:16:55 +08:00
Kwindla Hultman Kramer
b4758cd989 update CHANGELOG.md 2024-05-22 21:14:11 -07:00
Kwindla Hultman Kramer
681250f114 add new response frame types and vision support for anthropic 2024-05-22 21:12:30 -07:00
Aleix Conchillo Flaqué
fd13d3c50e Merge pull request #168 from pipecat-ai/transcription-logging
transports(daily): add transcription logging
2024-05-23 11:42:51 +08:00
Aleix Conchillo Flaqué
674b8bb0cd transports(daily): add transcription logging 2024-05-22 20:41:34 -07:00
Aleix Conchillo Flaqué
5d9a962146 Merge pull request #166 from pipecat-ai/fix-llm-response-wake-check
fix llm response wake check
2024-05-23 11:35:11 +08:00
Aleix Conchillo Flaqué
e130aada72 filters(WakeCheckFilter): increase timeout to 3 2024-05-22 19:41:14 -07:00
Aleix Conchillo Flaqué
76709a9a39 enclose text between brackets when logging 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
acd2d55b84 examples(14): remove commented code 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
fcec0eb812 transports(base): log when user is speaking 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
e9965347b5 processors(WakeCheckFilter): log what frame we are pushing 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
5a83f75e0d processors: fix user response processors 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
91c706a201 Merge pull request #165 from pipecat-ai/clear-audio-output-buffer-when-interrupted
transport(base): clear audio output buffer if interrupted
2024-05-23 07:31:33 +08:00
Aleix Conchillo Flaqué
34384881bc transport(base): clear audio output buffer if interrupted 2024-05-22 16:30:43 -07:00
Aleix Conchillo Flaqué
71ba28753e Merge pull request #157 from pipecat-ai/khk-improved-wake-word
Improved wake word filter
2024-05-23 06:47:59 +08:00
Aleix Conchillo Flaqué
32d2f0db66 update CHANGELOG.md with filters updates 2024-05-22 15:46:13 -07:00
Aleix Conchillo Flaqué
e1169a4e82 processors(WakeCheckFilter): push error 2024-05-22 15:44:44 -07:00
Aleix Conchillo Flaqué
0e5711e62d examples: update 10-wake-work.py to use WakeCheckFilter 2024-05-22 15:44:44 -07:00
Aleix Conchillo Flaqué
0ddfa3de5b move WakeCheckFilter to processors/filters 2024-05-22 15:44:43 -07:00
Kwindla Hultman Kramer
661aa79b7c fix user_id str field name in TranscriptionFrame 2024-05-22 15:44:43 -07:00
Kwindla Hultman Kramer
2c32cc2f27 improved wake word filter 2024-05-22 15:44:43 -07:00
Aleix Conchillo Flaqué
d7bb0bc5cb Merge pull request #164 from pipecat-ai/readd-vad-exp-smoothing
vad: re-add volume exponential smoothing
2024-05-23 06:44:27 +08:00
Aleix Conchillo Flaqué
d5644c3ab9 vad: re-add volume exponential smoothing 2024-05-22 15:26:32 -07:00
Aleix Conchillo Flaqué
09ab8e3efd Merge pull request #163 from pipecat-ai/update-0.0.20-deps
update requirements files
2024-05-23 05:40:12 +08:00
Aleix Conchillo Flaqué
2f683529ec update requirements files 2024-05-22 14:39:26 -07:00
Aleix Conchillo Flaqué
6ac012a82b Merge pull request #158 from pipecat-ai/use-pyloudnorm-loudness
interruptions: introduce pyloudnorm to compute loudness
2024-05-23 05:24:38 +08:00
Aleix Conchillo Flaqué
075194cb54 update CHANGELOG for 0.0.20 2024-05-22 14:21:13 -07:00
Aleix Conchillo Flaqué
269f070051 audio: no need for compute_rms 2024-05-22 14:09:24 -07:00
Aleix Conchillo Flaqué
3342c9d7c2 services(stt): use calculate_audio_volume 2024-05-22 13:05:20 -07:00
Aleix Conchillo Flaqué
b468b2f926 audio: clamp normalized volume 2024-05-22 13:04:09 -07:00
Aleix Conchillo Flaqué
af1c7d0023 interruptions: introduce pyloudnorm to compute loudness
https://github.com/csteinmetz1/pyloudnorm
2024-05-22 11:52:07 -07:00
Aleix Conchillo Flaqué
34670eef79 Merge pull request #162 from pipecat-ai/reset-before-pushing
processors: reset aggregator before pushing
2024-05-23 02:51:55 +08:00
Aleix Conchillo Flaqué
979739c1b7 processors: reset aggregator before pushing 2024-05-22 11:26:08 -07:00
Aleix Conchillo Flaqué
83ed6870b9 Merge pull request #161 from pipecat-ai/only-interrupt-assistant
processors: only interrupt assistant
2024-05-23 02:02:43 +08:00
Aleix Conchillo Flaqué
57a568986a processors: only interrupt assistant
We were pushing interruption frames in the audio task. This was causing the
LLMUserResponseAggregator to push the accumulated text and then causing the LLM
to respond.
2024-05-22 10:15:35 -07:00
Aleix Conchillo Flaqué
e828e26b5b Merge pull request #159 from pipecat-ai/create-pool-executor
transports: run threads in their own ThreadPoolExecutor
2024-05-22 15:49:03 +08:00
Aleix Conchillo Flaqué
825738440e transports: run threads in their own ThreadPoolExecutor 2024-05-21 18:52:27 -07:00
Aleix Conchillo Flaqué
147bd1a075 Merge pull request #156 from pipecat-ai/pipecat-0.0.19
update CHANGELOG.md for 0.0.19
2024-05-21 12:36:48 +08:00
Aleix Conchillo Flaqué
209e97f372 update CHANGELOG.md for 0.0.19 2024-05-20 21:33:15 -07:00
Aleix Conchillo Flaqué
47f8627432 Merge pull request #155 from pipecat-ai/llm-accumlate-full-response
aggregators: accumulate full responses and take interruptions into ac…
2024-05-21 11:34:39 +08:00
Aleix Conchillo Flaqué
cc6713837a github: publish test to pypi again. simply always use PRs 2024-05-20 12:19:39 -07:00
Aleix Conchillo Flaqué
728fe0ad88 github: don't publish to test pypi twice 2024-05-20 12:15:54 -07:00
Aleix Conchillo Flaqué
dbba45349f github: don't run publish_test on main branch 2024-05-20 12:14:00 -07:00
Aleix Conchillo Flaqué
40ccf46b4b aggregators: accumulate full responses and take interruptions into account 2024-05-20 11:40:57 -07:00
Aleix Conchillo Flaqué
077bb9f20a Merge pull request #153 from pipecat-ai/expose-llm-messages
aggregators: expose LLM messages
2024-05-21 02:40:26 +08:00
Aleix Conchillo Flaqué
e4c990c677 aggregators: expose LLM messages 2024-05-20 10:51:37 -07:00
Aleix Conchillo Flaqué
1c8b9d813a examples: minor updates to storytelling-chatbot instructions 2024-05-20 10:31:33 -07:00
Aleix Conchillo Flaqué
83812f2671 transports(daily): implement DailyOutputTransport.send_message 2024-05-20 10:30:59 -07:00
Aleix Conchillo Flaqué
4053c33899 update CHANGELOG for 0.0.17 2024-05-19 19:27:20 -07:00
Aleix Conchillo Flaqué
03978b63bc update linux-py3.10-requirements.txt 2024-05-19 19:27:04 -07:00
Aleix Conchillo Flaqué
bf036be6b8 Merge pull request #150 from pipecat-ai/khk-gemini
Initial commit of Google Gemini LLM service.
2024-05-20 10:24:31 +08:00
Kwindla Hultman Kramer
7ffb10d7f5 add to CHANGELOG.md 2024-05-19 12:44:45 -07:00
Kwindla Hultman Kramer
66377954cb fix up openai vision and gemini implementation 2024-05-19 12:33:57 -07:00
Kwindla Hultman Kramer
e507686cef oops, fix openai.py 2024-05-19 11:13:39 -07:00
Kwindla Hultman Kramer
e5ddaf14f4 add google and deepgram to README.md 2024-05-19 11:09:30 -07:00
Kwindla Hultman Kramer
cf597a2f6b add back in debug log line in openai.py 2024-05-19 11:08:38 -07:00
Kwindla Hultman Kramer
d83f0aabca generate macos-py3.10-requirements.txt with Python 3.10 2024-05-19 10:53:50 -07:00
Kwindla Hultman Kramer
b337e984b3 Initial commit of Google Gemini LLM service.
Gemini text input works. We translate from OpenAILLMContext format
on the fly in the GoogleLLMService implementation. This commit also
implements image input (vision) in both the GoogleLLMService and in
the OpenAILLMService. Image input is a hack and needs to be revisited.
OpenAI expects images to be uploaded as base64-encoded JPEGs. Google
does not require the base64 encoding. Other than for images, we use
the OpenAI format as our standard, but base64-encoding the images
and then unencoding them in the GoogleLLMService feels wasteful.
2024-05-19 10:35:20 -07:00
Aleix Conchillo Flaqué
6366ee072e Merge pull request #144 from pipecat-ai/initial-interruptions
initial basic interruptions support
2024-05-20 01:33:15 +08:00
Aleix Conchillo Flaqué
c3bfcbd562 aggregators: clear accumulated responses if interruption happens 2024-05-19 10:21:45 -07:00
Aleix Conchillo Flaqué
c0d5054798 examples: some prompt tweaking 2024-05-19 09:41:36 -07:00
Aleix Conchillo Flaqué
810dc30d3d examples: fix examples to use LLMFullResponseEndFrame 2024-05-19 09:39:34 -07:00
Aleix Conchillo Flaqué
36dd4933e9 example: add assistant responses to simple chatbot 2024-05-18 10:01:46 -07:00
Aleix Conchillo Flaqué
435fffe1b0 add LLMFullResponseStartFrame/LLMFullResponseEndFrame 2024-05-18 09:49:38 -07:00
Aleix Conchillo Flaqué
2b8f1c4cda services(openai): send LLMResponseStartFrame for each completion 2024-05-17 17:47:33 -07:00
Aleix Conchillo Flaqué
0e8c7a9b28 transports(output): create a downstream push frame task 2024-05-17 17:47:24 -07:00
Aleix Conchillo Flaqué
3e13678f23 vad: use exponential smoothed volume to improve speech detection 2024-05-17 17:13:31 -07:00
Aleix Conchillo Flaqué
455ec4f1fd services(tts): always send received TextFrame downstream 2024-05-17 17:11:11 -07:00
Aleix Conchillo Flaqué
8dc81042c3 examples: use DailyTranscriptionSettings in translation-chatbot 2024-05-17 15:37:29 -07:00
Aleix Conchillo Flaqué
c77db79447 examples: pipelines readability and add LLM assistants after transport 2024-05-17 14:52:51 -07:00
Aleix Conchillo Flaqué
de65028061 vad: reduce default confidence back to 0.5 2024-05-17 14:39:40 -07:00
Aleix Conchillo Flaqué
d66a795413 examples: use SileroVADAnalyzer instead of SileroVAD 2024-05-17 14:18:55 -07:00
Aleix Conchillo Flaqué
34762bf604 transports: allow updating allow_interruptions when receiving StartFrame 2024-05-17 14:15:37 -07:00
Aleix Conchillo Flaqué
57121338b1 pipeline(task): cleanup processors only if we need to 2024-05-17 13:53:33 -07:00
Aleix Conchillo Flaqué
a5d246ec0c vad: use exponential smoothing to avoid sudden changes 2024-05-17 13:53:33 -07:00
Aleix Conchillo Flaqué
f2cefeeedc utils: move exp_smoothing to utils module 2024-05-17 13:52:18 -07:00
Aleix Conchillo Flaqué
537e72a05f vad: introduce VADParams so you can tweak things 2024-05-17 13:52:18 -07:00
Aleix Conchillo Flaqué
efa5a061d7 silero: simplify int16 -> float32 conversion 2024-05-17 13:51:06 -07:00
Aleix Conchillo Flaqué
0bef44c2ff introduce StartInterruptionFrame and StopInterruptionFrame 2024-05-17 13:51:06 -07:00
Aleix Conchillo Flaqué
f62fe059b1 fix issues with Ctrl-C tasks cancellation 2024-05-17 13:51:04 -07:00
Aleix Conchillo Flaqué
f432e2b17e transports: allow adding a vad analyzer to BaseInputTransport 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
8c877d7d8e examples: update 07-interruptible 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
dc9377fb92 add missing queue task_done() 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
7384b63b1d initial interruptions support 2024-05-17 13:50:45 -07:00
Aleix Conchillo Flaqué
ba6ecf541f update CHANGELOG.md for 0.0.16 2024-05-16 18:15:07 -07:00
Aleix Conchillo Flaqué
94e5709d58 Merge pull request #149 from pipecat-ai/transports-push-task
transport: create input transports push frame task
2024-05-17 09:14:35 +08:00
Aleix Conchillo Flaqué
add8d3cbaf transport: create input transports push frame task 2024-05-16 16:54:39 -07:00
Aleix Conchillo Flaqué
1a42188bce Merge pull request #146 from pipecat-ai/daily-dont-send-tracks-if-not-enabled
transports(daily): don't send camera/audio tracks if not enabled
2024-05-17 01:24:39 +08:00
Aleix Conchillo Flaqué
0da427e127 transports(daily): don't send camera/audio tracks if not enabled 2024-05-16 08:16:39 -07:00
Aleix Conchillo Flaqué
9447b32f3e transports(daily): on_app_message doesn't need to be event handler 2024-05-15 17:06:47 -07:00
Aleix Conchillo Flaqué
af10adb7fe some minor event loop updates 2024-05-15 17:00:43 -07:00
Aleix Conchillo Flaqué
129acf886f transports(daily): hot fix for receiving transport messages 2024-05-15 17:00:04 -07:00
Aleix Conchillo Flaqué
9af3e1efac update CHANGELOG.md for 0.0.14 2024-05-15 15:59:38 -07:00
Aleix Conchillo Flaqué
9e22a8b4ff transports(daily): add receiving transport messages 2024-05-15 15:59:08 -07:00
Aleix Conchillo Flaqué
28da747f19 transports(daily): fix on_participant_left event 2024-05-15 15:40:31 -07:00
Aleix Conchillo Flaqué
3d6783ddb0 transports: resize output image if it doesn't match camera 2024-05-15 15:36:20 -07:00
Aleix Conchillo Flaqué
349fc526d7 transports(daily): avoid locking if no participant has joined yet 2024-05-15 15:24:58 -07:00
Aleix Conchillo Flaqué
acf6dc0a30 transports: more start and stop fixes 2024-05-15 15:23:03 -07:00
Aleix Conchillo Flaqué
3563e66ff6 transports(daily): add on_participant_left event 2024-05-15 15:20:37 -07:00
Aleix Conchillo Flaqué
8965ff27ec examples: use DEBUG in 09-mirror.py 2024-05-14 19:25:31 -07:00
Aleix Conchillo Flaqué
86feb1e104 services: fix DailyTransport stop/cleanup ordering 2024-05-14 19:24:55 -07:00
Aleix Conchillo Flaqué
f6257a86d3 examples: re-enable audio in 09-mirror.py 2024-05-14 19:23:35 -07:00
Aleix Conchillo Flaqué
bd04ea8aca examples: simplify 09-mirror.py 2024-05-14 19:07:19 -07:00
Aleix Conchillo Flaqué
754c1c6775 services: fixed DailyTransport output camera and audio 2024-05-14 19:07:19 -07:00
Aleix Conchillo Flaqué
0b01eb5a11 services: pass **kwargs to TTSService 2024-05-14 18:46:03 -07:00
Aleix Conchillo Flaqué
6247b9df39 services: fix STTService and WhisperSTTService 2024-05-14 18:45:40 -07:00
Aleix Conchillo Flaqué
bd5344c892 services: MoondreamService model_id argument is now model 2024-05-14 18:34:10 -07:00
Aleix Conchillo Flaqué
e4fe54cd7f vad: rename VADAnalyzer arguments 2024-05-14 18:33:17 -07:00
Aleix Conchillo Flaqué
97f9e9b042 examples: update simple-chatbot prompt 2024-05-14 15:30:31 -07:00
Aleix Conchillo Flaqué
3668eb1606 update CHANGELOG for 0.0.12 2024-05-14 14:52:08 -07:00
Aleix Conchillo Flaqué
e23addcc02 examples: update simple-chatbot with Spanish 2024-05-14 14:51:44 -07:00
Aleix Conchillo Flaqué
5147f4086e transports(daily): add DailyTranscriptionSettings to update settings easier 2024-05-14 14:49:30 -07:00
Aleix Conchillo Flaqué
fb3c2de83f Merge pull request #141 from pipecat-ai/add-changelog
add CHANGELOG.md
2024-05-15 04:47:45 +08:00
Aleix Conchillo Flaqué
107817317c add CHANGELOG.md 2024-05-14 13:45:01 -07:00
Aleix Conchillo Flaqué
663ff3417c examples: add missing requirements 2024-05-14 08:03:51 -07:00
Aleix Conchillo Flaqué
2b19d6bbac examples: remove commented out silero from storytelling 2024-05-14 00:57:21 -07:00
Aleix Conchillo Flaqué
7c41246e55 examples: fix storytelling example 2024-05-14 00:32:37 -07:00
Aleix Conchillo Flaqué
11aa9dc803 pipeline: allow stopping tasks with StopTaskFrame 2024-05-14 00:30:32 -07:00
Aleix Conchillo Flaqué
922cdefee5 services: run_* now return async generators 2024-05-14 00:30:07 -07:00
Aleix Conchillo Flaqué
e018d5b47a transports(daily): always allow capturing transcriptions 2024-05-14 00:29:02 -07:00
Aleix Conchillo Flaqué
20c679988c transports: allow base transports to be reused 2024-05-14 00:28:43 -07:00
Aleix Conchillo Flaqué
a344101cff README.md: s/Twitter/X/ 2024-05-13 18:24:06 -07:00
Aleix Conchillo Flaqué
2cefc40a77 README.md: use http urls for images 2024-05-13 18:20:57 -07:00
Aleix Conchillo Flaqué
68f0da26b6 examples: more translation-chatbot fixes 2024-05-13 17:57:11 -07:00
Aleix Conchillo Flaqué
9aea8e951c aggregators/sentence: ignore interim transcriptions 2024-05-13 17:56:19 -07:00
Aleix Conchillo Flaqué
12ff6d08fe examples: fix translation-chatbot 2024-05-13 16:22:11 -07:00
Aleix Conchillo Flaqué
1b21867a6f transports: add support for sending transport messages 2024-05-13 16:22:11 -07:00
Aleix Conchillo Flaqué
d28d0fa218 processors: add FrameProcessor.push_error 2024-05-13 16:12:35 -07:00
Aleix Conchillo Flaqué
01381f6dcd frames: add TransportMessageFrame 2024-05-13 16:12:30 -07:00
Aleix Conchillo Flaqué
c111fff0f7 services: update azure services 2024-05-13 16:12:26 -07:00
Aleix Conchillo Flaqué
50677e6085 Merge pull request #138 from pipecat-ai/moondream-chatbot-fixes
examples: fix moondream-chatbot
2024-05-14 06:29:13 +08:00
Aleix Conchillo Flaqué
22cd1ac5f2 examples: fix moondream-chatbot 2024-05-13 15:28:11 -07:00
Kwindla Hultman Kramer
fdfcfd1d5e Merge pull request #137 from rahulunair/intel_gpu
(feat): adding intel gpus support
2024-05-13 14:52:34 -07:00
Aleix Conchillo Flaqué
b6385be6c6 Merge pull request #136 from pipecat-ai/simple-chatbot-fixes
examples: fix simple-chatbot
2024-05-14 05:41:52 +08:00
rahulunair
6be88fa81b (feat): adding intel gpus support 2024-05-13 21:21:05 +00:00
Aleix Conchillo Flaqué
ed31c7924e examples: fix simple-chatbot 2024-05-13 13:19:11 -07:00
Jon Taylor
4898084645 Update LICENSE 2024-05-13 20:49:51 +01:00
chadbailey59
6be0751a52 Delete CNAME 2024-05-13 14:42:46 -05:00
Aleix Conchillo Flaqué
7ce1206ed4 Create CNAME 2024-05-13 12:05:08 -07:00
Jon Taylor
1b5130694a Update README.md 2024-05-13 19:36:39 +01:00
Jon Taylor
7c6199e93e Merge pull request #135 from pipecat-ai/jpt/devrel-edits-2
Jpt/devrel edits 2
2024-05-13 18:19:33 +01:00
Jon Taylor
3be742479d removed space 2024-05-13 18:17:00 +01:00
Aleix Conchillo Flaqué
d380b02a44 README: improve code reading 2024-05-13 10:12:19 -07:00
Aleix Conchillo Flaqué
5600fc49f1 README: fix code indentation 2024-05-13 10:08:09 -07:00
Jon Taylor
5f0d8b8d9f removed docs badge 2024-05-13 17:42:01 +01:00
Jon Taylor
8204e5c2d4 removed images 2024-05-13 17:41:03 +01:00
Jon Taylor
29b98c0326 removed images from examples readme 2024-05-13 17:40:07 +01:00
Jon Taylor
3502ef4745 Merge pull request #134 from pipecat-ai/jpt/devrel-edits
Added example apps to repo
2024-05-13 17:37:31 +01:00
Jon Taylor
0d28e84c59 addressed nitpicks 2024-05-13 17:37:01 +01:00
Jon Taylor
062fbf4ce3 fixed header for VAD 2024-05-13 17:20:50 +01:00
Jon Taylor
af8471b370 changed daily_url to daily_room 2024-05-13 17:20:10 +01:00
Jon Taylor
f756027333 updated text for simple example 2024-05-13 17:17:41 +01:00
Jon Taylor
65c4c0b21f fixed typo in readme 2024-05-13 17:14:17 +01:00
Jon Taylor
f1c02f8554 added examples back 2024-05-13 17:09:46 +01:00
Jon Taylor
27ba50cbbf updated README with sample code 2024-05-13 14:51:10 +01:00
Aleix Conchillo Flaqué
b254525d3c go back to using @dataclass since they can be inspected 2024-05-12 22:35:43 -07:00
Aleix Conchillo Flaqué
6c06fb8169 README: update pypi badge 2024-05-12 19:28:00 -07:00
Aleix Conchillo Flaqué
721cd11d62 Merge pull request #133 from pipecat-ai/aleix/readme
rebased jpt/readme branch
2024-05-13 10:26:45 +08:00
Aleix Conchillo Flaqué
bfbcb9d531 fix autopep8 linting 2024-05-12 19:25:17 -07:00
Aleix Conchillo Flaqué
724e78c5be renamed image.png to pipecat.png 2024-05-12 17:44:10 -07:00
Jon Taylor
d3c3d78855 added discord badge 2024-05-12 17:41:36 -07:00
Jon Taylor
8fa9fdcd5a Reworked readme to have more pipes and cats 2024-05-12 17:41:30 -07:00
Aleix Conchillo Flaqué
7856d20a38 Merge pull request #132 from pipecat-ai/pypi-repo-change
change pypi repo to pipecat-ai
2024-05-13 03:14:40 +08:00
Aleix Conchillo Flaqué
6d10027f2d change pypi repo to pipecat-ai 2024-05-12 12:08:43 -07:00
Aleix Conchillo Flaqué
bea31215dc Merge pull request #129 from daily-co/wip-proposal
pipecat proposal
2024-05-13 01:13:18 +08:00
Aleix Conchillo Flaqué
083480ca1e update macos-py3.10-requirements.txt 2024-05-12 10:10:35 -07:00
Aleix Conchillo Flaqué
65846330cf update linux-py3.10-requirements.txt 2024-05-12 10:09:04 -07:00
Aleix Conchillo Flaqué
29f48266f7 README: install dev-requirements.txt first 2024-05-12 10:07:54 -07:00
Aleix Conchillo Flaqué
bfd583211c examples: use LocalAudioTransport 2024-05-12 10:07:54 -07:00
Aleix Conchillo Flaqué
b026915d19 initial commit for new pipecat architecture 2024-05-12 10:07:25 -07:00
Aleix Conchillo Flaqué
4a0836dc8f Merge pull request #130 from daily-co/dependabot-05-06-24
dependabot: update packages 05-06-24
2024-05-07 08:14:38 +08:00
Aleix Conchillo Flaqué
2729c6bf5b dependabot: update packages 05-06-24 2024-05-06 15:33:33 -07:00
Aleix Conchillo Flaqué
712a889121 Merge pull request #128 from daily-co/pillow-security-fixes
pyproject: pillow security fixes
2024-04-23 01:51:49 +08:00
Aleix Conchillo Flaqué
2f341e4fb0 pyproject: pillow security fixes 2024-04-22 10:28:42 -07:00
Kwindla Hultman Kramer
24198ecf45 Merge pull request #126 from daily-co/jptaylor-patch-3
Update README.md
2024-04-12 23:10:30 -07:00
Jon Taylor
7e4fefe958 Update README.md 2024-04-12 22:45:30 -07:00
Jon Taylor
e9af39b85f Merge pull request #125 from daily-co/jptaylor-patch-2
Update README.md
2024-04-12 22:44:14 -07:00
Jon Taylor
38aa3cebb4 Update README.md 2024-04-12 22:42:11 -07:00
Jon Taylor
72724365a0 Merge pull request #124 from daily-co/jptaylor-patch-1
Update README.md
2024-04-12 22:40:29 -07:00
Jon Taylor
5368462e41 Update README.md 2024-04-12 22:28:40 -07:00
Jon Taylor
1b2b29dd18 Merge pull request #123 from daily-co/jpt/pypi-badge
added pypi badge
2024-04-12 07:33:26 -07:00
Kwindla Hultman Kramer
d2b2b6f619 Merge pull request #122 from daily-co/kwindla-patch-1
Update README.md
2024-04-11 21:34:37 -07:00
Jon Taylor
54bcb52129 added pypi badge 2024-04-11 21:34:27 -07:00
Kwindla Hultman Kramer
3dc7438bc8 Update README.md 2024-04-11 21:05:27 -07:00
329 changed files with 18219 additions and 6226 deletions


@@ -46,7 +46,7 @@ jobs:
needs: [ build ]
environment:
name: pypi
url: https://pypi.org/p/dailyai
url: https://pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:
@@ -67,7 +67,7 @@ jobs:
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/dailyai
url: https://pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:


@@ -40,13 +40,13 @@ jobs:
name: wheels
path: ./dist
publish-to-pypi:
publish-to-test-pypi:
name: "Publish to Test PyPI"
runs-on: ubuntu-latest
needs: [ build ]
environment:
name: testpypi
url: https://pypi.org/p/dailyai
url: https://pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:

461
CHANGELOG.md Normal file

@@ -0,0 +1,461 @@
# Changelog
All notable changes to **pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.26] - 2024-06-05
### Added
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change
audio sample format and the model to use.
- Added `DailyRESTHelper` which helps you create Daily rooms and tokens in an
easy way.
- `PipelineTask` now has a `has_finished()` method to indicate if the task has
completed. If a task is never run, `has_finished()` will return False.
- `PipelineRunner` now supports SIGTERM. If received, the runner will be
canceled.
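As a rough illustration of the SIGTERM behaviour described above, the following standalone `asyncio` sketch cancels a running task when the process receives SIGTERM. It is not `PipelineRunner`'s actual implementation: `worker` and `run_until_signalled` are hypothetical names, and `loop.add_signal_handler` is only available on Unix event loops.

```python
import asyncio
import signal


async def worker() -> str:
    # Stand-in for a long-running pipeline task (hypothetical).
    try:
        await asyncio.sleep(3600)
        return "finished"
    except asyncio.CancelledError:
        # Clean-up would go here before the task exits.
        return "cancelled"


async def run_until_signalled() -> str:
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(worker())
    # Cancel the task when SIGTERM arrives (Unix only).
    loop.add_signal_handler(signal.SIGTERM, task.cancel)
    try:
        return await task
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
```

Registering the handler on the event loop, rather than with `signal.signal`, means the cancellation runs inside the loop and can safely touch tasks.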
### Fixed
- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` were
stopping push tasks before pushing `EndFrame` frames, which could cause the
bots to get stuck.
- Fixed an error closing local audio transports.
- Fixed an issue with Deepgram TTS that was introduced in the previous release.
- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a
`user` message could be appended after the previous `user` message. Anthropic
does not allow that because it requires alternating `user` and `assistant`
messages.
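One possible way to keep a message list alternating is to merge adjacent messages that share a role. The sketch below shows that idea only. It is not necessarily how `AnthropicLLMService` was fixed, and `merge_consecutive_roles` is a hypothetical helper.

```python
from typing import Dict, List

Message = Dict[str, str]


def merge_consecutive_roles(messages: List[Message]) -> List[Message]:
    """Collapse adjacent messages with the same role into a single message."""
    merged: List[Message] = []
    for message in messages:
        if merged and merged[-1]["role"] == message["role"]:
            # Same speaker twice in a row: join the text instead of appending.
            merged[-1]["content"] += " " + message["content"]
        else:
            merged.append(dict(message))  # copy so the input is not mutated
    return merged
```

With this approach, an interrupted turn followed by a new `user` message results in one longer `user` message rather than two in a row.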
### Performance
- The `BaseInputTransport` does not pull audio frames from sub-classes any
more. Instead, sub-classes now push audio frames into a queue in the base
class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead
of 10ms.
- Remove redundant camera input thread from `DailyInputTransport`. This should
improve performance a little bit when processing participant videos.
- Load Cartesia voice on startup.
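The switch from pulling to pushing audio frames can be pictured with a small queue-based sketch. The class and method names below (`PushAudioInput`, `push_audio_frame`) are illustrative assumptions, not the real `BaseInputTransport` API.

```python
import asyncio
from typing import List, Optional


class PushAudioInput:
    """Toy base class: sub-classes push frames, the base class consumes them."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
        self.received: List[bytes] = []

    def push_audio_frame(self, frame: Optional[bytes]) -> None:
        # Called by the sub-class whenever it has new audio; None means "stop".
        self._queue.put_nowait(frame)

    async def run(self) -> None:
        # The base class waits on the queue instead of polling the sub-class.
        while True:
            frame = await self._queue.get()
            if frame is None:
                break
            self.received.append(frame)
```

With a push model the consumer sleeps until data arrives, so no loop has to wake up on a fixed interval just to ask whether audio is available.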
## [0.0.25] - 2024-05-31
### Added
- Added WebsocketServerTransport. This will create a websocket server and will
read messages coming from a client. The messages are serialized/deserialized
with protobufs. See `examples/websocket-server` for a detailed example.
- Added function calling (LLMService.register_function()). This will allow the
LLM to call functions you have registered when needed. For example, if you
register a function to get the weather in Los Angeles and ask the LLM about
the weather in Los Angeles, the LLM will call your function.
See https://platform.openai.com/docs/guides/function-calling
- Added new `LangchainProcessor`.
- Added Cartesia TTS support (https://cartesia.ai/)
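To make the function calling entry above more concrete, here is a minimal sketch of the general pattern: handlers are registered by name, and when the model asks for a function the matching handler is called with the decoded arguments. `FunctionRegistry` and the handler signature are assumptions for illustration and do not reproduce the real `LLMService.register_function()` signature.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict

Handler = Callable[[Dict[str, Any]], Awaitable[Any]]


class FunctionRegistry:
    """Hypothetical stand-in for an LLM service's function table."""

    def __init__(self) -> None:
        self._functions: Dict[str, Handler] = {}

    def register_function(self, name: str, handler: Handler) -> None:
        self._functions[name] = handler

    async def call(self, name: str, arguments_json: str) -> Any:
        # The model supplies the function name and JSON-encoded arguments.
        handler = self._functions[name]
        return await handler(json.loads(arguments_json))


async def get_weather(args: Dict[str, Any]) -> Dict[str, Any]:
    # A real handler would query a weather API; this returns canned data.
    return {"location": args["location"], "conditions": "sunny"}
```

The result of the handler would normally be sent back to the model so it can phrase an answer for the user.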
### Fixed
- Fixed SileroVAD frame processor.
- Fixed an issue where `camera_out_enabled` would cause high CPU usage if
no image was provided.
### Performance
- Removed unnecessary audio input tasks.
## [0.0.24] - 2024-05-29
### Added
- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This
notifies when the Daily room SIP endpoints are ready. This allows integrating
with third-party services like Twilio.
- Exposed Daily transport `on_app_message` event.
- Added Daily transport `on_call_state_updated` event.
- Added Daily transport `start_recording()`, `stop_recording()` and
`stop_dialout()`.
### Changed
- Added `PipelineParams`. This replaces the `allow_interruptions` argument in
`PipelineTask` and will allow additional parameters in the future.
- Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.
- GoogleLLMService `api_key` argument is now mandatory.
### Fixed
- Daily transport `dialin-ready` doesn't block anymore and it now handles
timeouts.
- Fixed AzureLLMService.
## [0.0.23] - 2024-05-23
### Fixed
- Fixed an issue handling Daily transport `dialin-ready` event.
## [0.0.22] - 2024-05-23
### Added
- Added Daily transport `start_dialout()` to be able to make phone or SIP calls.
See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout
- Added Daily transport support for dial-in use cases.
- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`,
`on_dialout_error` and `on_dialout_warning`. See
https://reference-python.daily.co/api_reference.html#daily.EventHandler
## [0.0.21] - 2024-05-22
### Added
- Added vision support to Anthropic service.
- Added `WakeCheckFilter` which allows you to pass information downstream only
if you say a certain phrase/word.
### Changed
- `Filter` has been renamed to `FrameFilter` and it's now under
`processors/filters`.
### Fixed
- Fixed Anthropic service to use new frame types.
- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator`
that would cause frames after a brief pause to not be pushed to the LLM.
- Clear the audio output buffer if we are interrupted.
- Re-add exponential smoothing after volume calculation. This makes sure the
volume value being used doesn't fluctuate so much.
## [0.0.20] - 2024-05-22
### Added
- In order to improve interruptions we now compute a loudness level using
[pyloudnorm](https://github.com/csteinmetz1/pyloudnorm). The audio coming
from WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC) algorithm
applied to the signal, however we don't do that on our local PyAudio
signals. This means that currently incoming audio from PyAudio is kind of
broken. We will fix it in future releases.
### Fixed
- Fixed an issue where `StartInterruptionFrame` would cause
`LLMUserResponseAggregator` to push the accumulated text, causing the LLM
to respond in the wrong task. The `StartInterruptionFrame` should not trigger any
new LLM response because that would be spoken in a different task.
- Fixed an issue where tasks and threads could be paused because the executor
didn't have more tasks available. This was causing issues when cancelling and
recreating tasks during interruptions.
## [0.0.19] - 2024-05-20
### Changed
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal
messages are now exposed through the `messages` property.
### Fixed
- Fixed an issue where `LLMAssistantResponseAggregator` was not accumulating the
full response but short sentences instead. If there's an interruption we only
accumulate what the bot has spoken until now in a long response as well.
## [0.0.18] - 2024-05-20
### Fixed
- Fixed an issue in `DailyOutputTransport` where transport messages were not
being sent.
## [0.0.17] - 2024-05-19
### Added
- Added `google.generativeai` model support, including vision. This new `google`
service defaults to using `gemini-1.5-flash-latest`. Example in
`examples/foundational/12a-describe-video-gemini-flash.py`.
- Added vision support to `openai` service. Example in
`examples/foundational/12a-describe-video-gemini-flash.py`.
- Added initial interruptions support. The assistant contexts (or aggregators)
should now be placed after the output transport. This way, only the completed
spoken context is added to the assistant context.
- Added `VADParams` so you can control voice confidence level and others.
- `VADAnalyzer` now uses an exponential smoothed volume to improve speech
detection. This is useful when voice confidence is high (because there's
someone talking near you) but volume is low.
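The exponential smoothing mentioned for `VADAnalyzer` can be sketched in a few lines. The function names and the smoothing factor used here are assumptions for illustration, not values taken from the library.

```python
from typing import Iterable, List


def exp_smoothing(value: float, prev_value: float, factor: float) -> float:
    """Move prev_value towards value by `factor` (0 < factor <= 1)."""
    return prev_value + factor * (value - prev_value)


def smooth_volumes(volumes: Iterable[float], factor: float = 0.2) -> List[float]:
    """Apply exponential smoothing to a sequence of volume readings."""
    smoothed: List[float] = []
    prev = 0.0
    for volume in volumes:
        prev = exp_smoothing(volume, prev, factor)
        smoothed.append(prev)
    return smoothed
```

A small factor makes the smoothed volume react slowly, so a single loud or quiet frame does not flip the speech decision on its own.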
### Fixed
- Fixed an issue where TTSService was not pushing TextFrames downstream.
- Fixed issues with Ctrl-C program termination.
- Fixed an issue that was causing `StopTaskFrame` to not actually exit the
`PipelineTask`.
## [0.0.16] - 2024-05-16
### Fixed
- `DailyTransport`: don't publish camera and audio tracks if not enabled.
- Fixed an issue in `BaseInputTransport` that was causing frames pushed
downstream to not be pushed in the right order.
## [0.0.15] - 2024-05-15
### Fixed
- Quick hot fix for receiving `DailyTransportMessage`.
## [0.0.14] - 2024-05-15
### Added
- Added `DailyTransport` event `on_participant_left`.
- Added support for receiving `DailyTransportMessage`.
### Fixed
- Images are now resized to the size of the output camera. Previously, a size
mismatch caused images not to be displayed.
- Fixed an issue in `DailyTransport` that would not allow the input processor to
shutdown if no participant ever joined the room.
- Fixed base transports start and stop. In some situations processors would halt
or not shut down properly.
## [0.0.13] - 2024-05-14
### Changed
- `MoondreamService` argument `model_id` is now `model`.
- `VADAnalyzer` arguments have been renamed for more clarity.
### Fixed
- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that
could cause some threads to not start properly.
- Fixed `STTService`. Add `max_silence_secs` and `max_buffer_secs` to better
handle what's being passed to the STT service. Also add exponential smoothing
to the RMS.
- Fixed `WhisperSTTService`. Add `no_speech_prob` to avoid garbage output text.
## [0.0.12] - 2024-05-14
### Added
- Added `DailyTranscriptionSettings` to be able to specify transcription
settings much easier (e.g. language).
### Other
- Updated `simple-chatbot` with Spanish.
- Add missing dependencies in some of the examples.
## [0.0.11] - 2024-05-13
### Added
- Allow stopping pipeline tasks with new `StopTaskFrame`.
### Changed
- TTS, STT and image generation service now use `AsyncGenerator`.
### Fixed
- `DailyTransport`: allow registering for participant transcriptions even if
input transport is not initialized yet.
### Other
- Updated `storytelling-chatbot`.
## [0.0.10] - 2024-05-13
### Added
- Added Intel GPU support to `MoondreamService`.
- Added support for sending transport messages (e.g. to communicate with an app
at the other end of the transport).
- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.
### Fixed
- Fixed Azure services (TTS and image generation).
### Other
- Updated `simple-chatbot`, `moondream-chatbot` and `translation-chatbot`
examples.
## [0.0.9] - 2024-05-12
### Changed
Many things have changed in this version. Many of the main ideas such as frames,
processors, services and transports are still there but some things have changed
a bit.
- `Frame`s describe the basic units for processing. For example, text, image or
audio frames. Or control frames to indicate a user has started or stopped
speaking.
- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an
`ImageRawFrame`) and push new frames downstream or upstream to their linked
peers.
- `FrameProcessor`s can be linked together. The easiest way is to use the
`Pipeline`, which is a container for processors. Linking processors allows
frames to travel upstream or downstream easily.
- `Transport`s are a way to send or receive frames. There can be local
transports (e.g. local audio or native apps), network transports
(e.g. websocket) or service transports (e.g. https://daily.co).
- `Pipeline`s are just a processor container for other processors.
- A `PipelineTask` knows how to run a pipeline.
- A `PipelineRunner` can run one or more tasks and it is also used, for example,
to capture Ctrl-C from the user.
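The relationship between frames, processors and pipelines described in this list can be sketched with a few toy classes. These `Toy*` classes are simplified stand-ins written for illustration. They only model downstream flow and are not the real `Frame`, `FrameProcessor` or `Pipeline` implementations.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTextFrame:
    text: str


class ToyProcessor:
    """Toy processor: handles a frame, then pushes the result downstream."""

    def __init__(self) -> None:
        self._next: Optional["ToyProcessor"] = None

    def link(self, processor: "ToyProcessor") -> None:
        self._next = processor

    async def push_frame(self, frame: ToyTextFrame) -> None:
        if self._next is not None:
            await self._next.process_frame(frame)

    async def process_frame(self, frame: ToyTextFrame) -> None:
        await self.push_frame(frame)


class Uppercase(ToyProcessor):
    async def process_frame(self, frame: ToyTextFrame) -> None:
        await self.push_frame(ToyTextFrame(frame.text.upper()))


class Collector(ToyProcessor):
    def __init__(self) -> None:
        super().__init__()
        self.frames: List[ToyTextFrame] = []

    async def process_frame(self, frame: ToyTextFrame) -> None:
        self.frames.append(frame)


class ToyPipeline(ToyProcessor):
    """Container that links its processors in order."""

    def __init__(self, processors: List[ToyProcessor]) -> None:
        super().__init__()
        self._processors = processors
        for current, following in zip(processors, processors[1:]):
            current.link(following)

    async def process_frame(self, frame: ToyTextFrame) -> None:
        await self._processors[0].process_frame(frame)
```

Feeding `ToyTextFrame("hello")` into `ToyPipeline([Uppercase(), collector])` leaves an upper-cased frame in `collector.frames`, which mirrors how a text frame might travel from a service to an output transport.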
## [0.0.8] - 2024-04-11
### Added
- Added `FireworksLLMService`.
- Added `InterimTranscriptionFrame` and enable interim results in
`DailyTransport` transcriptions.
### Changed
- `FalImageGenService` now uses new `fal_client` package.
### Fixed
- `FalImageGenService`: use `asyncio.to_thread` to not block main loop when
generating images.
- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed
and received after `UserStoppedSpeakingFrame`).
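The `asyncio.to_thread` fix noted above follows a common pattern: a blocking call is moved to a worker thread so that other coroutines keep running. The sketch below demonstrates the pattern with a hypothetical `generate_image_blocking` function standing in for the real image generation call.

```python
import asyncio
import time


def generate_image_blocking(prompt: str) -> str:
    # Stand-in for a blocking SDK call (hypothetical); sleeps to simulate work.
    time.sleep(0.1)
    return f"image for {prompt!r}"


async def heartbeat(ticks: list) -> None:
    # Keeps running while the blocking call executes in a worker thread.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main() -> tuple:
    ticks: list = []
    image, _ = await asyncio.gather(
        asyncio.to_thread(generate_image_blocking, "a cat"),
        heartbeat(ticks),
    )
    return image, len(ticks)
```

If `generate_image_blocking` were called directly inside a coroutine, the heartbeat would stall until it returned. Note that `asyncio.to_thread` requires Python 3.9 or later.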
## [0.0.7] - 2024-04-10
### Added
- Add `use_cpu` argument to `MoondreamService`.
## [0.0.6] - 2024-04-10
### Added
- Added `FalImageGenService.InputParams`.
- Added `URLImageFrame` and `UserImageFrame`.
- Added `UserImageRequestFrame` and allow requesting an image from a participant.
- Added base `VisionService` and `MoondreamService`
### Changed
- Don't pass `image_size` to `ImageGenService`, images should have their own size.
- `ImageFrame` now receives a tuple `(width, height)` to specify the size.
- `on_first_other_participant_joined` now gets a participant argument.
### Fixed
- Check if camera, speaker and microphone are enabled before writing to them.
### Performance
- `DailyTransport` now only subscribes to the desired participant video track.
## [0.0.5] - 2024-04-06
### Changed
- Use `camera_bitrate` and `camera_framerate`.
- Increase `camera_framerate` to 30 by default.
### Fixed
- Fixed `LocalTransport.read_audio_frames`.
## [0.0.4] - 2024-04-04
### Added
- Added project optional dependencies `[silero,openai,...]`.
### Changed
- Moved transports to their own directory.
- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.
### Fixed
- Don't write to microphone/speaker if not enabled.
### Other
- Added live translation example.
- Fix foundational examples.
## [0.0.3] - 2024-03-13
### Other
- Added `storybot` and `chatbot` examples.
## [0.0.2] - 2024-03-12
Initial public release.

62
CHANGELOG.md.template Normal file

@@ -0,0 +1,62 @@
# Changelog
All notable changes to the **<project name>** SDK will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Please make sure to add your changes to the appropriate categories:
## [Unreleased]
### Added
<!-- for new functionality -->
- n/a
### Changed
<!-- for changed functionality -->
- n/a
### Deprecated
<!-- for soon-to-be removed functionality -->
- n/a
### Removed
<!-- for removed functionality -->
- n/a
### Fixed
<!-- for fixed bugs -->
- n/a
### Performance
<!-- for performance-relevant changes -->
- n/a
### Security
<!-- for security-relevant changes -->
- n/a
### Other
<!-- for everything else -->
- n/a
## [0.1.0] - YYYY-MM-DD
Initial release.

183
README.md

@@ -1,119 +1,164 @@
# dailyai — an open source framework for real-time, multi-modal, conversational AI applications
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div>
Build things like this:
# Pipecat
[![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)
**`dailyai` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.
`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, [story-telling toys for kids](https://storytelling-chatbot.fly.dev/), customer support bots, [intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0), and snarky social companions.
In 2023 a *lot* of us got excited about the possibility of having open-ended conversations with LLMs. It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/):
- low-latency, reliable audio transport
- echo cancellation
- phrase endpointing (knowing when the bot should respond to human speech)
- interruptibility
- writing clean code to stream data through "pipelines" of speech-to-text, LLM inference, and text-to-speech models
Take a look at some example apps:
As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like.
<p float="left">
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="280" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="280" /></a>
</p>
Today, `dailyai` is:
## Getting started with voice agents
1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services
2. transport services that move audio, video, and events across the Internet
3. implementations of specific generative AI services
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you're ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
Currently implemented services:
- Speech-to-text
- Deepgram
- Whisper
- LLMs
- Azure
- Fireworks
- OpenAI
- Image generation
- Azure
- Fal
- OpenAI
- Text-to-speech
- Azure
- Deepgram
- ElevenLabs
- Transport
- Daily
- Local (in progress, intended as a quick start example service)
- Vision
- Moondream
If you'd like to [implement a service](https://github.com/daily-co/daily-ai-sdk/tree/main/src/dailyai/services), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.
## Getting started
Today, the easiest way to get started with `dailyai` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations. (The [transport base class](https://github.com/daily-co/daily-ai-sdk/blob/main/src/dailyai/transports/abstract_transport.py) is easy to extend, though, so feel free to submit PRs if you'd like to implement another transport service.)
```
```shell
# install the module
pip install dailyai
pip install pipecat-ai
# set up an .env file with API keys
cp dot-env.template .env
```
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional
dependencies that you can install with:
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:
```
pip install "dailyai[option,...]"
```shell
pip install "pipecat-ai[option,...]"
```
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `daily`, `local`, `websocket`
- **AI services**: `anthropic`, `azure`, `deepgram`, `google`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `local`, `websocket`, `daily`
## Code examples
There are two directories of examples:
- [foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [example apps](https://github.com/pipecat-ai/pipecat/tree/main/examples/) — complete applications that you can use as starting points for development
- [foundational](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/foundational) — demos that build on each other, introducing one or two concepts at a time
- [starter apps](https://github.com/daily-co/daily-ai-sdk/tree/main/examples/starter-apps) — complete applications that you can use as starting points for development
## A simple voice agent running locally
Before running the examples you need to install the dependencies (which will install all the dependencies to run all of the examples):
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use [Daily](https://daily.co) for real-time media transport, and [ElevenLabs](https://elevenlabs.io/) for text-to-speech.
```
pip install -r {env}-requirements.txt
```python
# app.py

import asyncio
import aiohttp

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main():
    async with aiohttp.ClientSession() as session:
        # Use Daily as a real-time media transport (WebRTC)
        transport = DailyTransport(
            room_url=...,
            token=...,
            bot_name="Bot Name",
            params=DailyParams(audio_out_enabled=True))

        # Use Eleven Labs for Text-to-Speech
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=...,
            voice_id=...,
        )

        # Simple pipeline that will process text to speech and output the result
        pipeline = Pipeline([tts, transport.output()])

        # Create a runner that can run one or more pipeline tasks
        runner = PipelineRunner()

        # Create the task that will run the pipeline
        task = PipelineTask(pipeline)

        # Register an event handler to play audio when a
        # participant joins the transport WebRTC session
        @transport.event_handler("on_participant_joined")
        async def on_new_participant_joined(transport, participant):
            participant_name = participant["info"]["userName"] or ''
            # Queue a TextFrame that will get spoken by the TTS service (Eleven Labs)
            await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])

        # Run the pipeline task
        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())
```
To run the example below you need to sign up for a [free Daily account](https://dashboard.daily.co/u/signup) and create a Daily room (so you can hear the LLM talking). After that, join the room's URL directly from a browser tab and run:
Run it with:
```shell
python app.py
```
python examples/foundational/02-llm-say-one-thing.py
Daily provides a prebuilt WebRTC user interface. Whilst the app is running, you can visit `https://<yourdomain>.daily.co/<room_url>` and listen to the bot say hello!
## WebRTC for production use
WebSockets are fine for server-to-server communication or for initial development. But for production use, you'll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post.](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/#webrtc))
One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard.
## What is VAD?
Voice Activity Detection &mdash; very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk, and want Pipecat to detect when the user has finished talking, VAD is an essential component for a natural feeling conversation.
Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
```shell
pip install "pipecat-ai[silero]"
```
The first time you run your bot with Silero, startup may take a while whilst it downloads and caches the model in the background. You can check the progress of this in the console.
## Hacking on the framework itself
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
```
```shell
python3 -m venv venv
source venv/bin/activate
```
From the root of this repo, run the following:
```
pip install -r {env}-requirements.txt -r dev-requirements.txt
```shell
pip install -r dev-requirements.txt -r {env}-requirements.txt
python -m build
```
This builds the package. To use the package locally (e.g. to run sample files), run:
```
```shell
pip install --editable .
```
If you want to use this package from another directory, you can run:
```
```shell
pip install path_to_this_repo
```
@@ -121,7 +166,7 @@ pip install path_to_this_repo
From the root directory, run:
```
```shell
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests
```
@@ -168,3 +213,9 @@ Install the
"--max-line-length=100"
],
```
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Reach us on X](https://x.com/pipecat_ai)


@@ -1,6 +1,7 @@
autopep8==2.0.4
build==1.0.3
pip-tools==7.4.1
pytest==8.1.1
setuptools==69.2.0
setuptools_scm==8.0.4
autopep8~=2.1.0
build~=1.2.1
grpcio-tools~=1.62.2
pip-tools~=7.4.1
pytest~=8.2.0
setuptools~=69.5.1
setuptools_scm~=8.1.0


@@ -1,17 +1,10 @@
# Daily AI SDK Docs
# Pipecat Docs
## [Architecture Overview](architecture.md)
Learn about the thinking behind the SDK's design.
Learn about the thinking behind the framework's design.
## [A Frame's Progress](frame-progress.md)
See how a Frame is processed through a Transport, a Pipeline, and a series of Frame Processors.
## [Example Code](examples/)
The repo includes several example apps in the `examples` directory. The docs explain how they work.
## [API Reference](api/)
Complete documentation of the available classes and methods in the SDK.


@@ -1,4 +1,4 @@
# Daily AI SDK Architecture Guide
# Pipecat architecture guide
## Frames
@@ -10,8 +10,8 @@ Frame processors operate on frames. Every frame processor implements a `process_
## Pipelines
Pipelines are lists of frame processors that read from a source queue and send the processed frames to a sink queue. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport's send queue as its sink. Placing LLM message frames on the pipeline's source queue will cause the LLM's response to be spoken. See example #2 for an implementation of this.
Pipelines are lists of frame processors linked together. Frame processors can push frames upstream or downstream to their peers. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport as an output.
## Transports
Transports provide a receive queue, which is input from "the outside world", and a sink queue, which is data that will be sent "to the outside world". The `LocalTransportService` does this with the local camera, mic, display and speaker. The `DailyTransportService` does this with a WebRTC session joined to a Daily.co room.
Transports provide input and output frame processors to receive or send frames respectively. For example, the `DailyTransport` does this with a WebRTC session joined to a Daily.co room.


@@ -1,119 +0,0 @@
# 01: Say One Thing
_video here - youtube?_
This example uses a text-to-speech (TTS) service to say one predefined sentence. But first, a quick overview of the general structure of these examples.
## Running the demos
All of the demos have something like this at the bottom of the file:
```python
if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))
```
### `configure()`
The `configure()` function comes from `examples/foundational/support/runner.py`, and it allows you to configure the examples from the command line directly, or using environment variables:
```bash
python 01-say-one-thing.py -u https://YOUR_DOMAIN.daily.co/YOUR_ROOM -k YOUR_API_KEY
# or
DAILY_ROOM_URL=https://YOUR_DOMAIN.daily.co/YOUR_ROOM DAILY_API_KEY=YOUR_API_KEY python 01-say-one-thing.py
# or set DAILY_ROOM_URL and DAILY_API_KEY in a .env file
python 01-say-one-thing.py
```
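The lookup order described above (command-line flags first, environment variables as a fallback) can be sketched with the standard library alone. This is not the code in `runner.py`: `configure_sketch` and the long option names `--url` and `--apikey` are assumptions made for illustration; only the `-u` and `-k` flags and the two environment variable names come from the commands shown above.

```python
import argparse
import os


def configure_sketch(argv=None, env=None):
    """Hypothetical helper: resolve the room URL and API key.

    Command-line flags win; environment variables are the fallback.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Example bot configuration")
    parser.add_argument("-u", "--url", help="Daily room URL")
    parser.add_argument("-k", "--apikey", help="Daily API key")
    args = parser.parse_args(argv)

    url = args.url or env.get("DAILY_ROOM_URL")
    key = args.apikey or env.get("DAILY_API_KEY")
    if not url:
        raise ValueError("Pass -u or set DAILY_ROOM_URL")
    if not key:
        raise ValueError("Pass -k or set DAILY_API_KEY")
    return url, key
```

Passing `argv` and `env` explicitly keeps the helper easy to test; leaving both as `None` falls back to the real command line and environment.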
You'll need a Daily account to run these demos. You can sign up for free at [daily.co](https://daily.co). Once you've signed up you can create a room from the [Dashboard](https://dashboard.daily.co/rooms), and grab [your API key](https://dashboard.daily.co/developers) while you're there.
Some functionality (such as transcription) requires the bot to have owner privileges in the room. `runner.py` uses the Daily REST API to create a meeting token with owner privileges. You can learn more about meeting tokens in the [Daily docs](https://docs.daily.co/reference/rest-api/meeting-tokens).
### `asyncio.run()`
The AI SDK makes heavy use of Python's `asyncio` module. [This is a reasonable intro to the topic](https://builtin.com/data-science/asyncio) if you haven't worked with `asyncio` and coroutines before.
You can learn a bit more about the specifics of how the Daily AI SDK uses coroutines in the [Architecture Guide](../architecture.md).
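If coroutines are new to you, the short sketch below may help. It uses only the standard library, and `say` and `run_sketch` are illustrative names rather than SDK functions. The point it shows is that calling an `async def` function only creates a coroutine object; nothing runs until an event loop, started here by `asyncio.run()`, drives it.

```python
import asyncio
import inspect


async def say(text):
    # `await` hands control back to the event loop until the sleep completes.
    await asyncio.sleep(0)
    return f"spoken: {text}"


def run_sketch():
    coro = say("hi")                      # creates a coroutine object, runs nothing yet
    is_coro = inspect.iscoroutine(coro)   # confirms we only hold a coroutine
    result = asyncio.run(coro)            # starts an event loop and runs it to completion
    return is_coro, result
```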
## The `main()` function
All of the examples have a `main()` function with a similar structure:
- Configure the transport
- Configure the AI service(s) used in the demo
- Configure any event listeners
- Define a processing pipeline
- Run the example's coroutine(s)
### Configuring the transport
The first section of the `main()` function configures the transport object:
```python
meeting_duration_minutes = 5
transport = DailyTransportService(
    room_url,
    None,
    "Say One Thing",
    meeting_duration_minutes,
)
transport.mic_enabled = True
```
The [Architecture Guide](../architecture.md) explains the transport object in more detail. In this case, we're configuring a Daily transport object and enabling the virtual microphone, so our bot can play audio.
### Configuring the services
As described in the [Architecture Guide](../architecture.md), a 'Service' is a class that processes 'Frames' as part of a 'Pipeline'. In this demo app, we'll only need one service: a text-to-speech generator. We can create an instance of the `ElevenLabsTTSService` class with this line of code:
```python
tts = ElevenLabsTTSService(aiohttp_session=session, api_key=os.getenv("ELEVENLABS_API_KEY"), voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
```
You'll need to set those environment variables somewhere. The easiest way to do that is to copy the `example.env` file in the repo, rename it to `.env`, and then add your credentials to that file. `runner.py` loads the `python-dotenv` module and initializes it, making the values in that file available in the environment.
### Configuring event listeners
This part isn't strictly necessary for an app like this. You could include the contents of the event handler directly in the body of the `main()` function, and it would run as soon as you started the script from the command line.
Instead, we can use an event handler to wait to run that code until someone else joins the meeting. We'll define a function called `greet_user()`, and use the `@transport.event_handler("on_participant_joined")` decorator to tell the SDK that we want to run that function whenever a user joins the room.
```python
@transport.event_handler("on_participant_joined")
async def greet_user(transport, participant):
    if participant["info"]["isLocal"]:
        return
    await tts.say(
        "Hello there, " + participant["info"]["userName"] + "!",
        transport.send_queue,
    )
    # wait for the output queue to be empty, then leave the meeting
    await transport.stop_when_done()
```
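One way a decorator like `@transport.event_handler(...)` can work is sketched below. This is an illustration under stated assumptions, not the SDK's implementation: `SketchTransport` and its `fire()` method are hypothetical, and the participant dictionaries are hand-written to match the keys used in the handler above.

```python
import asyncio


class SketchTransport:
    """Hypothetical transport that only registers and fires event handlers."""

    def __init__(self):
        self._handlers = {}

    def event_handler(self, event_name):
        # Returns a decorator that stores the coroutine function under event_name.
        def decorator(handler):
            self._handlers.setdefault(event_name, []).append(handler)
            return handler
        return decorator

    async def fire(self, event_name, *args):
        # Call every handler registered for this event, in registration order.
        for handler in self._handlers.get(event_name, []):
            await handler(self, *args)


async def demo():
    transport = SketchTransport()
    greetings = []

    @transport.event_handler("on_participant_joined")
    async def greet_user(transport, participant):
        if participant["info"]["isLocal"]:
            return  # ignore the bot itself
        greetings.append("Hello there, " + participant["info"]["userName"] + "!")

    await transport.fire("on_participant_joined", {"info": {"isLocal": True, "userName": "bot"}})
    await transport.fire("on_participant_joined", {"info": {"isLocal": False, "userName": "Ada"}})
    return greetings
```

Running `asyncio.run(demo())` fires the event twice; only the non-local participant produces a greeting.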
### Defining a processing pipeline
In this example, we don't actually have much of a processing pipeline! In fact, we're doing the whole thing inside the `greet_user()` function already.
Pipelines usually look like a bunch of nested calls to the `run()` or `run_to_queue()` function from different Services. In this example, we're using the `say()` function from the TTS service. This is effectively a convenience wrapper around the `run_to_queue()` function, which we'll discuss more later. It's important to `await` this function so that the speech frames are queued for playback before the next line runs, because `stop_when_done()` is called immediately afterward.
The output of the `say()` function goes to the transport's `send_queue`. This queue is the all-important connection between the world of the Services pipeline that's generating frames asynchronously and the ordered playback of audio and visual media in the WebRTC call.
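The role of the queue can be illustrated with a plain `asyncio.Queue`. In this sketch, which assumes nothing about the SDK's internals, `produce` stands in for a service generating frames asynchronously, `play` stands in for the transport, and the `None` sentinel is an invented convention for "nothing more to play".

```python
import asyncio


async def produce(queue, sentence):
    # Stand-in for a TTS service: emit one "audio frame" per word, in order.
    for word in sentence.split():
        await asyncio.sleep(0)  # pretend generation takes time
        await queue.put(word)
    await queue.put(None)  # sentinel: nothing more to play


async def play(queue):
    # Stand-in for the transport: consume frames strictly in FIFO order.
    played = []
    while True:
        frame = await queue.get()
        if frame is None:
            break
        played.append(frame)
    return played


async def demo_queue():
    send_queue = asyncio.Queue()
    _, played = await asyncio.gather(
        produce(send_queue, "Hello there, Ada!"),
        play(send_queue),
    )
    return played
```

Because the queue is first-in, first-out, the consumer sees frames in the order they were queued, even though producer and consumer run concurrently.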
### Running the coroutines
In this example, we don't actually have any separate processing pipelines—everything happens as a result of an event from the transport. So we only need to run the transport's coroutine, and await its completion:
```python
await transport.run()
```
In future examples, we'll run more processes in parallel. For now, this script can run until the transport exits—which will happen based on calling `stop_when_done()` in the `greet_user()` function.
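The "run until told to stop" pattern might look roughly like the sketch below. It is a simplification, not the transport's real code: `SketchTransport`, its sentinel-based shutdown, and `main_sketch` are all assumptions made for illustration.

```python
import asyncio


class SketchTransport:
    """Hypothetical transport with a run loop and a graceful stop."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self.played = []

    async def run(self):
        # Keep playing queued items until the stop sentinel is dequeued.
        while True:
            item = await self._queue.get()
            if item is None:
                break
            self.played.append(item)

    async def say(self, text):
        await self._queue.put(text)

    async def stop_when_done(self):
        # Queue the sentinel behind any pending items so they play first.
        await self._queue.put(None)


async def main_sketch():
    transport = SketchTransport()

    async def say_something():
        await transport.say("Hello there.")
        await transport.stop_when_done()

    # run() returns only after everything queued before the stop has played.
    await asyncio.gather(transport.run(), say_something())
    return transport.played
```

Queuing the stop signal behind the pending frames is what lets the script exit without cutting off speech that has already been queued.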
## Next Steps
Next, we'll start connecting multiple AI services together by building a service pipeline.
## [02 - LLM Say One Thing »](02-llm-say-one-thing.md)

View File

@@ -1,5 +0,0 @@
# Daily AI SDK Examples
The docs in this folder pair with the example apps located in `examples/foundational`. They are designed to serve as quick references for building different kinds of AI apps. But the examples also build on one another, so it can be really helpful to walk through them in order.
To start, you can learn about the overall structure of the examples in [01 - Say One Thing](01-say-one-thing.md).

84
examples/README.md Normal file
View File

@@ -0,0 +1,84 @@
# Pipecat &mdash; Examples
## Foundational snippets
Small snippets that build on each other, introducing one or two concepts at a time.
➡️ [Take a look](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational)
## Chatbot examples
Collection of self-contained real-time voice and video AI demo applications built with Pipecat.
### Quickstart
Each project has its own set of dependencies and configuration variables. The projects intentionally avoid shared code &mdash; you can grab whichever demo folder you want to work with as a starting point.
We recommend you start with a virtual environment:
```shell
cd pipecat-ai/examples/simple-chatbot
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Next, follow the steps in the README for each demo.
Make sure you `pip install -r requirements.txt` for each demo project, so you can be sure to have the necessary service dependencies that extend the functionality of Pipecat. You can read more about the framework architecture [here](https://github.com/pipecat-ai/pipecat/tree/main/docs).
## Projects:
| Project | Description | Services |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, OpenAI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| Function-calling Chatbot (TBC) | A chatbot that can call functions in response to user input | Deepgram, OpenAI, Fireworks, Daily, Daily Prebuilt UI |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
> It provides a quick way to join a real-time session with your bot and test your ideas without building any frontend code. If you'd like to see an example of a custom UI, try the Storytelling Chatbot.
## FAQ
### Deployment
For each of these demos we've included a `Dockerfile`. Out of the box, this should provide everything needed to get the respective demo running on a VM:
```shell
docker build -t username/app:tag .
docker run -p 7860:7860 --env-file ./.env username/app:tag
docker push ...
```
### SSL
If you're working with a custom UI (such as with the Storytelling Chatbot), it's important to ensure your deployment platform supports HTTPS, as accessing user devices such as mics and webcams requires SSL.
If you try to run a custom UI without SSL, you may see an error in the console telling you that `navigator` is undefined, or no devices are available.
### Are these examples production ready?
Yes, kind of.
These demos attempt to keep things simple and are unopinionated regarding environment or scalability.
We're using FastAPI to spawn a subprocess for the bots / agents &mdash; useful for small tests, but not so great for production grade apps with many concurrent users. You can see how this works in each project's `start` endpoint in `server.py`.
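The one-process-per-bot approach can be sketched with the standard library alone. This is not the code in `server.py` and it leaves FastAPI out entirely: `start_bot`, the `bots` registry, and the inline child script are hypothetical stand-ins for what a `start` endpoint might do.

```python
import subprocess
import sys


def start_bot(room_url, bots):
    """Hypothetical handler body: launch one bot process for room_url."""
    # The child here only echoes the room URL; a real bot script would join it.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; print('bot joined', sys.argv[1])", room_url],
        stdout=subprocess.PIPE,
        text=True,
    )
    bots[proc.pid] = proc  # keep a handle so the server can monitor or stop it
    return proc.pid
```

Every request starts a full Python process on the same machine, which is fine for a handful of test sessions but is the reason this pattern does not scale to many concurrent users.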
Creating virtualized worker pools and on-demand instances is out of scope for these examples, but we hope to add some examples to this repo soon!
For projects that have CUDA as a requirement, such as Moondream Chatbot, be sure to deploy to a GPU-powered platform (such as [fly.io](https://fly.io) or [Runpod](https://runpod.io)).
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Reach us on Twitter](https://x.com/pipecat_ai)

View File

@@ -1,31 +1,36 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frames import EndFrame, TextFrame
from dailyai.pipeline.pipeline import Pipeline
import sys
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Say One Thing",
mic_enabled=True,
)
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
@@ -33,21 +38,18 @@ async def main(room_url):
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([tts])
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
if participant["info"]["isLocal"]:
return
async def on_new_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
await pipeline.queue_frames([TextFrame("Hello there, " + participant_name + "!"), EndFrame()])
await transport.run(pipeline)
del tts
await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()

View File

@@ -0,0 +1,53 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
    async with aiohttp.ClientSession() as session:
        transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )
        pipeline = Pipeline([tts, transport.output()])
        task = PipelineTask(pipeline)
        async def say_something():
            await asyncio.sleep(1)
            await task.queue_frames([TextFrame("Hello there!"), EndFrame()])
        runner = PipelineRunner()
        await asyncio.gather(runner.run(task), say_something())
if __name__ == "__main__":
    asyncio.run(main())

View File

@@ -1,38 +0,0 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 1
transport = LocalTransport(
duration_minutes=meeting_duration_minutes, mic_enabled=True
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
async def say_something():
await asyncio.sleep(1)
await transport.say("Hello there.", tts)
await transport.stop_when_done()
await asyncio.gather(transport.run(), say_something())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,23 +1,31 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import logging
import aiohttp
import os
import sys
from dailyai.pipeline.frames import EndFrame, LLMMessagesFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
@@ -26,8 +34,7 @@ async def main(room_url):
room_url,
None,
"Say One Thing From an LLM",
mic_enabled=True,
)
DailyParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
@@ -37,7 +44,7 @@ async def main(room_url):
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
model="gpt-4o")
messages = [
{
@@ -45,13 +52,15 @@ async def main(room_url):
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}]
pipeline = Pipeline([llm, tts])
runner = PipelineRunner()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await pipeline.queue_frames([LLMMessagesFrame(messages), EndFrame()])
task = PipelineTask(Pipeline([llm, tts, transport.output()]))
await transport.run(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
await runner.run(task)
if __name__ == "__main__":

View File

@@ -1,21 +1,30 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import logging
import os
import sys
from dailyai.pipeline.frames import TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.fal_ai_services import FalImageGenService
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
@@ -24,10 +33,11 @@ async def main(room_url):
room_url,
None,
"Show a still frame image",
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=1
DailyParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
)
imagegen = FalImageGenService(
@@ -38,19 +48,19 @@ async def main(room_url):
key=os.getenv("FAL_KEY"),
)
pipeline = Pipeline([imagegen])
runner = PipelineRunner()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
task = PipelineTask(Pipeline([imagegen, transport.output()]))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Note that we do not put an EndFrame() item in the pipeline for this demo.
# This means that the bot will stay in the channel until it times out.
# An EndFrame() in the pipeline would cause the transport to shut
# down.
await pipeline.queue_frames(
[TextFrame("a cat in the style of picasso")]
)
await task.queue_frames([TextFrame("a cat in the style of picasso")])
await transport.run(pipeline)
await runner.run(task)
if __name__ == "__main__":

View File

@@ -1,58 +0,0 @@
import asyncio
import aiohttp
import logging
import os
import tkinter as tk
from dailyai.pipeline.frames import TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 2
tk_root = tk.Tk()
tk_root.title("dailyai")
transport = LocalTransport(
tk_root=tk_root,
mic_enabled=False,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=meeting_duration_minutes,
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
pipeline = Pipeline([imagegen])
await pipeline.queue_frames([TextFrame("a cat in the style of picasso")])
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(transport.run(pipeline, override_pipeline_source_queue=False), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,68 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import tkinter as tk
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
    async with aiohttp.ClientSession() as session:
        tk_root = tk.Tk()
        tk_root.title("Picasso Cat")
        transport = TkLocalTransport(
            tk_root,
            TransportParams(
                camera_out_enabled=True,
                camera_out_width=1024,
                camera_out_height=1024))
        imagegen = FalImageGenService(
            params=FalImageGenService.InputParams(
                image_size="square_hd"
            ),
            aiohttp_session=session,
            key=os.getenv("FAL_KEY"),
        )
        pipeline = Pipeline([imagegen, transport.output()])
        task = PipelineTask(pipeline)
        await task.queue_frames([TextFrame("a cat in the style of picasso")])
        runner = PipelineRunner()
        async def run_tk():
            while runner.is_active():
                tk_root.update()
                tk_root.update_idletasks()
                await asyncio.sleep(0.1)
        await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__":
    asyncio.run(main())

View File

@@ -1,37 +1,40 @@
import asyncio
import logging
import os
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
from dailyai.pipeline.merge_pipeline import SequentialMergePipeline
from dailyai.pipeline.pipeline import Pipeline
import asyncio
import os
import sys
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.deepgram_ai_services import DeepgramTTSService
from dailyai.pipeline.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from pipecat.pipeline.merge_pipeline import SequentialMergePipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.frames.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.task import PipelineTask
from pipecat.services.azure import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.transport_services import TransportServiceOutput
from pipecat.services.transports.daily_transport import DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Static And Dynamic Speech",
duration_minutes=1,
mic_enabled=True,
mic_sample_rate=16000,
)
transport = DailyTransport(room_url, None, "Static And Dynamic Speech")
meeting = TransportServiceOutput(transport, mic_enabled=True)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
@@ -43,10 +46,6 @@ async def main(room_url: str):
region=os.getenv("AZURE_SPEECH_REGION"),
)
deepgram_tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
)
elevenlabs_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
@@ -56,11 +55,13 @@ async def main(room_url: str):
messages = [{"role": "system",
"content": "tell the user a joke about llamas"}]
# Start a task to run the LLM to create a joke, and convert the LLM output to audio frames. This task
# will run in parallel with generating and speaking the audio for static text, so there's no delay to
# speak the LLM response.
# Start a task to run the LLM to create a joke, and convert the LLM
# output to audio frames. This task will run in parallel with generating
# and speaking the audio for static text, so there's no delay to speak
# the LLM response.
llm_pipeline = Pipeline([llm, elevenlabs_tts])
await llm_pipeline.queue_frames([LLMMessagesFrame(messages), EndPipeFrame()])
llm_task = PipelineTask(llm_pipeline)
await llm_task.queue_frames([LLMMessagesFrame(messages), EndPipeFrame()])
simple_tts_pipeline = Pipeline([azure_tts])
await simple_tts_pipeline.queue_frames(

View File

@@ -1,64 +1,74 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import logging
import sys
from dataclasses import dataclass
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import (
GatedAggregator,
LLMFullResponseAggregator,
ParallelPipeline,
SentenceAggregator,
)
from dailyai.pipeline.frames import (
Frame,
TextFrame,
from pipecat.frames.frames import (
AppFrame,
EndFrame,
ImageFrame,
Frame,
ImageRawFrame,
LLMFullResponseStartFrame,
LLMMessagesFrame,
LLMResponseStartFrame,
TextFrame
)
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.aggregators.gated import GatedAggregator
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.aggregators.parallel_task import ParallelTask
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
@dataclass
class MonthFrame(Frame):
class MonthFrame(AppFrame):
month: str
def __str__(self):
return f"{self.name}(month: {self.month})"
class MonthPrepender(FrameProcessor):
def __init__(self):
super().__init__()
self.most_recent_month = "Placeholder, month frame not yet received"
self.prepend_to_next_text_frame = False
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, MonthFrame):
self.most_recent_month = frame.month
elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):
yield TextFrame(f"{self.most_recent_month}: {frame.text}")
await self.push_frame(TextFrame(f"{self.most_recent_month}: {frame.text}"))
self.prepend_to_next_text_frame = False
elif isinstance(frame, LLMResponseStartFrame):
elif isinstance(frame, LLMFullResponseStartFrame):
self.prepend_to_next_text_frame = True
yield frame
await self.push_frame(frame)
else:
yield frame
await self.push_frame(frame, direction)
async def main(room_url):
@@ -67,11 +77,12 @@ async def main(room_url):
room_url,
None,
"Month Narration Bot",
mic_enabled=True,
camera_enabled=True,
mic_sample_rate=16000,
camera_width=1024,
camera_height=1024,
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
)
tts = ElevenLabsTTSService(
@@ -82,7 +93,7 @@ async def main(room_url):
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
model="gpt-4o")
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
@@ -93,24 +104,25 @@ async def main(room_url):
)
gated_aggregator = GatedAggregator(
gate_open_fn=lambda frame: isinstance(
frame, ImageFrame), gate_close_fn=lambda frame: isinstance(
frame, LLMResponseStartFrame), start_open=False, )
gate_open_fn=lambda frame: isinstance(frame, ImageRawFrame),
gate_close_fn=lambda frame: isinstance(frame, LLMFullResponseStartFrame),
start_open=False
)
sentence_aggregator = SentenceAggregator()
month_prepender = MonthPrepender()
llm_full_response_aggregator = LLMFullResponseAggregator()
pipeline = Pipeline(
processors=[
llm,
sentence_aggregator,
ParallelPipeline(
[[month_prepender, tts], [llm_full_response_aggregator, imagegen]]
),
gated_aggregator,
],
)
pipeline = Pipeline([
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
ParallelTask( # Run pipelines in parallel aggregating the result
[month_prepender, tts], # Create "Month: sentence" and output audio
[llm_full_response_aggregator, imagegen] # Aggregate full LLM response
),
gated_aggregator, # Queues everything until an image is available
transport.output() # Transport output
])
frames = []
for month in [
@@ -133,13 +145,18 @@ async def main(room_url):
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
frames.append(MonthFrame(month))
frames.append(MonthFrame(month=month))
frames.append(LLMMessagesFrame(messages))
frames.append(EndFrame())
await pipeline.queue_frames(frames)
await transport.run(pipeline, override_pipeline_source_queue=False)
runner = PipelineRunner()
task = PipelineTask(pipeline)
await task.queue_frames(frames)
await runner.run(task)
if __name__ == "__main__":

View File

@@ -0,0 +1,168 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import tkinter as tk
from pipecat.frames.frames import AudioRawFrame, Frame, URLImageRawFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Calendar")
runner = PipelineRunner()
async def get_month_data(month):
messages = [{"role": "system", "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.", }]
class ImageDescription(FrameProcessor):
def __init__(self):
super().__init__()
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
class AudioGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.audio = bytearray()
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, AudioRawFrame):
self.audio.extend(frame.audio)
self.frame = AudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels)
class ImageGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, URLImageRawFrame):
self.frame = frame
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"))
aggregator = LLMFullResponseAggregator()
description = ImageDescription()
audio_grabber = AudioGrabber()
image_grabber = ImageGrabber()
pipeline = Pipeline([
llm,
aggregator,
description,
ParallelPipeline([tts, audio_grabber],
[imagegen, image_grabber])
])
task = PipelineTask(pipeline)
await task.queue_frame(LLMMessagesFrame(messages))
await task.stop_when_done()
await runner.run(task)
return {
"month": month,
"text": description.text,
"image": image_grabber.frame,
"audio": audio_grabber.frame,
}
transport = TkLocalTransport(
tk_root,
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024))
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only enable a couple of months, as we create all the tasks at once and
# might get rate limited otherwise.
months: list[str] = [
"January",
"February",
# "March",
# "April",
# "May",
]
# We create one task per month. The tasks run concurrently.
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
# Now we wait for each month task in the order they're completed. The
# benefit is we'll have as little delay as possible before the first
# month, and likely no delay between months, but the months won't
# display in order.
async def show_images(month_tasks):
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await task.queue_frames([data["image"], data["audio"]])
await runner.stop_when_done()
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
if __name__ == "__main__":
asyncio.run(main())
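
The comment in `show_images` above notes that months are shown in the order their tasks complete, not in list order. Below is a minimal standalone sketch of that `asyncio.as_completed` behaviour. It assumes a hypothetical `fetch_month` coroutine that just sleeps in place of the real LLM, TTS and image generation calls; the delays are made up for illustration and nothing here is part of the pipecat API.

```python
import asyncio


async def fetch_month(month: str, delay: float) -> dict:
    # Hypothetical stand-in for get_month_data(): sleeps instead of calling services.
    await asyncio.sleep(delay)
    return {"month": month}


async def collect_in_completion_order() -> list[str]:
    # Illustrative delays only: January is deliberately the slowest.
    delays = {"January": 0.15, "February": 0.05, "March": 0.10}
    # Create every task up front so they all run concurrently.
    tasks = [asyncio.create_task(fetch_month(m, d)) for m, d in delays.items()]
    finished = []
    # as_completed yields awaitables in the order the tasks finish,
    # not in the order they were created.
    for next_done in asyncio.as_completed(tasks):
        data = await next_done
        finished.append(data["month"])
    return finished


order = asyncio.run(collect_in_completion_order())
print(order)
```

With these delays the shortest task is consumed first, which is the trade-off the example accepts: minimal wait before the first month, at the cost of calendar order.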

View File

@@ -1,146 +0,0 @@
import aiohttp
import asyncio
import logging
import tkinter as tk
import os
from dailyai.pipeline.aggregators import LLMFullResponseAggregator
from dailyai.pipeline.frames import AudioFrame, URLImageFrame, LLMMessagesFrame, TextFrame
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.transports.local_transport import LocalTransport
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
meeting_duration_minutes = 5
tk_root = tk.Tk()
tk_root.title("dailyai")
transport = LocalTransport(
mic_enabled=True,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
duration_minutes=meeting_duration_minutes,
tk_root=tk_root,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="1024x1024"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
# Get a complete audio chunk from the given text. Splitting this into its own
# coroutine lets us ensure proper ordering of the audio chunks on the
# send queue.
async def get_all_audio(text):
all_audio = bytearray()
async for audio in tts.run_tts(text):
all_audio.extend(audio)
return all_audio
async def get_month_description(aggregator, frame):
async for frame in aggregator.process_frame(frame):
if isinstance(frame, TextFrame):
return frame.text
async def get_month_data(month):
messages = [{"role": "system", "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.", }]
messages_frame = LLMMessagesFrame(messages)
llm_full_response_aggregator = LLMFullResponseAggregator()
image_description = None
async for frame in llm.process_frame(messages_frame):
result = await get_month_description(llm_full_response_aggregator, frame)
if result:
image_description = result
break
if not image_description:
return
to_speak = f"{month}: {image_description}"
audio_task = asyncio.create_task(get_all_audio(to_speak))
image_task = asyncio.create_task(
imagegen.run_image_gen(image_description))
(audio, image_data) = await asyncio.gather(audio_task, image_task)
return {
"month": month,
"text": image_description,
"image_url": image_data[0],
"image": image_data[1],
"image_size": image_data[2],
"audio": audio,
}
# We only specify 5 months as we create tasks all at once and we might
# get rate limited otherwise.
months: list[str] = [
"January",
"February",
"March",
"April",
"May",
]
async def show_images():
# This will play the months in the order they're completed. The benefit
# is we'll have as little delay as possible before the first month, and
# likely no delay between months, but the months won't display in
# order.
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
if data:
await transport.send_queue.put(
[
URLImageFrame(data["image_url"], data["image"], data["image_size"]),
AudioFrame(data["audio"]),
]
)
await asyncio.sleep(25)
# wait for the output queue to be empty, then leave the meeting
await transport.stop_when_done()
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
month_tasks = [
asyncio.create_task(
get_month_data(month)) for month in months]
await asyncio.gather(transport.run(), show_images(), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,26 +1,37 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frames import LLMMessagesFrame
from dailyai.pipeline.pipeline import Pipeline
import sys
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.ai_services import FrameLogger
from dailyai.pipeline.aggregators import (
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
@@ -29,12 +40,12 @@ async def main(room_url: str, token):
room_url,
token,
"Respond bot",
duration_minutes=5,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
vad_enabled=True,
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
@@ -45,38 +56,46 @@ async def main(room_url: str, token):
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
fl = FrameLogger("Inner")
fl2 = FrameLogger("Outer")
model="gpt-4o")
fl = FrameLogger("!!! after LLM", "red")
fltts = FrameLogger("@@@ out of tts", "green")
flend = FrameLogger("### out of the end", "magenta")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
processors=[
fl,
tma_in,
llm,
fl2,
tts,
tma_out,
],
)
pipeline = Pipeline([
transport.input(),
tma_in,
llm,
fl,
tts,
fltts,
transport.output(),
tma_out,
flend
])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await pipeline.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([LLMMessagesFrame(messages)])
await transport.run(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":

View File

@@ -1,43 +1,60 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import logging
from typing import AsyncGenerator
import aiohttp
import os
import sys
from PIL import Image
from dailyai.pipeline.frames import ImageFrame, Frame, TextFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.ai_services import AIService
from dailyai.pipeline.aggregators import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
from pipecat.frames.frames import ImageRawFrame, Frame, SystemFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class ImageSyncAggregator(AIService):
class ImageSyncAggregator(FrameProcessor):
def __init__(self, speaking_path: str, waiting_path: str):
super().__init__()
self._speaking_image = Image.open(speaking_path)
self._speaking_image_format = self._speaking_image.format
self._speaking_image_bytes = self._speaking_image.tobytes()
self._waiting_image = Image.open(waiting_path)
self._waiting_image_format = self._waiting_image.format
self._waiting_image_bytes = self._waiting_image.tobytes()
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
yield ImageFrame(self._speaking_image_bytes, (1024, 1024))
yield frame
yield ImageFrame(self._waiting_image_bytes, (1024, 1024))
async def process_frame(self, frame: Frame, direction: FrameDirection):
if not isinstance(frame, SystemFrame):
await self.push_frame(ImageRawFrame(image=self._speaking_image_bytes, size=(1024, 1024), format=self._speaking_image_format))
await self.push_frame(frame)
await self.push_frame(ImageRawFrame(image=self._waiting_image_bytes, size=(1024, 1024), format=self._waiting_image_format))
else:
await self.push_frame(frame)
async def main(room_url: str, token):
@@ -46,12 +63,14 @@ async def main(room_url: str, token):
room_url,
token,
"Respond bot",
5,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
mic_enabled=True,
mic_sample_rate=16000,
DailyParams(
audio_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
@@ -62,32 +81,44 @@ async def main(room_url: str, token):
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so it should not include any special characters. Respond to what the user said in a creative and helpful way.",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
pipeline = Pipeline([image_sync_aggregator, tma_in, llm, tma_out, tts])
pipeline = Pipeline([
transport.input(),
image_sync_aggregator,
tma_in,
llm,
tts,
transport.output(),
tma_out
])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await pipeline.queue_frames([TextFrame("Hi, I'm listening!")])
task = PipelineTask(pipeline)
await transport.run(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([TextFrame(f"Hi, this is {participant_name}.")])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":

View File

@@ -1,26 +1,34 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.aggregators import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
import sys
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.ai_services import FrameLogger
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
@@ -29,12 +37,12 @@ async def main(room_url: str, token):
room_url,
token,
"Respond bot",
duration_minutes=5,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
vad_enabled=True,
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
@@ -45,29 +53,40 @@ async def main(room_url: str, token):
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
model="gpt-4o")
pipeline = Pipeline([FrameLogger(), llm, FrameLogger(), tts])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say("Hi, I'm listening!", tts)
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
async def run_conversation():
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
await transport.run_interruptible_pipeline(
pipeline,
post_processor=LLMAssistantResponseAggregator(messages),
pre_processor=LLMUserResponseAggregator(messages),
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
await asyncio.gather(transport.run(), run_conversation())
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":

View File

@@ -0,0 +1,95 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-opus-20240229")
# todo: think more about how to handle system prompts in a more general way. OpenAI,
# Google, and Anthropic all have slightly different approaches to providing a system
# prompt.
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative, helpful, and brief way. Say hello.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,125 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.processors.frameworks.langchain import LangchainProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
message_store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
if session_id not in message_store:
message_store[session_id] = ChatMessageHistory()
return message_store[session_id]
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
prompt = ChatPromptTemplate.from_messages(
[
("system",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o", temperature=0.7)
history_chain = RunnableWithMessageHistory(
chain,
get_session_history,
history_messages_key="chat_history",
input_messages_key="input")
lc = LangchainProcessor(history_chain)
tma_in = LLMUserResponseAggregator()
tma_out = LLMAssistantResponseAggregator()
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
lc, # Langchain
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
lc.set_participant_id(participant["id"])
# Kick off the conversation.
# The LangchainProcessor picks up the `LLMMessagesFrame` and injects only
# the content of the last message into the prompt defined above, so no
# role is required here.
messages = [(
{
"content": "Please briefly introduce yourself to the user."
}
)]
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,94 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-helios-en"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,95 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=44100,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_name="British Lady",
output_format="pcm_44100"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -3,14 +3,14 @@ import aiohttp
import asyncio
import logging
import os
from dailyai.pipeline.aggregators import SentenceAggregator
from dailyai.pipeline.pipeline import Pipeline
from pipecat.pipeline.aggregators import SentenceAggregator
from pipecat.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from pipecat.transports.daily_transport import DailyTransport
from pipecat.services.azure_ai_services import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
from pipecat.services.fal_ai_services import FalImageGenService
from pipecat.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from runner import configure
@@ -18,7 +18,7 @@ from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger = logging.getLogger("pipecat")
logger.setLevel(logging.DEBUG)
@@ -92,7 +92,7 @@ async def main(room_url: str):
if isinstance(frame, TextFrame):
message += frame.text
elif isinstance(frame, AudioFrame):
all_audio.extend(frame.data)
all_audio.extend(frame.audio)
return (message, all_audio)

@@ -0,0 +1,54 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyTransport, DailyParams
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
transport = DailyTransport(
room_url, token, "Test",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
camera_out_width=1280,
camera_out_height=720
)
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
pipeline = Pipeline([transport.input(), transport.output()])
runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,66 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
import tkinter as tk
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
tk_root = tk.Tk()
tk_root.title("Local Mirror")
daily_transport = DailyTransport(room_url, token, "Test", DailyParams(audio_in_enabled=True))
tk_transport = TkLocalTransport(
tk_root,
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
camera_out_width=1280,
camera_out_height=720))
@daily_transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
pipeline = Pipeline([daily_transport.input(), tk_transport.output()])
task = PipelineTask(pipeline)
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
runner = PipelineRunner()
await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
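
Note on the example above: `run_tk()` keeps the Tk window responsive by polling from inside the asyncio event loop until `task.has_finished()` returns true, while `runner.run(task)` runs concurrently through `asyncio.gather`. The following is a minimal stdlib-only sketch of that polling pattern. `FakeTask`, `poll_ui` and `demo` are hypothetical names introduced for illustration; `FakeTask` only mimics the `has_finished()` method and is not the library's `PipelineTask`.

```python
import asyncio


class FakeTask:
    """Hypothetical stand-in for PipelineTask; only mimics has_finished()."""

    def __init__(self):
        self._finished = False

    def has_finished(self) -> bool:
        return self._finished

    async def run(self, duration: float):
        # Pretend to do pipeline work, then mark the task as finished.
        await asyncio.sleep(duration)
        self._finished = True


async def poll_ui(task: FakeTask, interval: float, on_tick) -> int:
    """Call on_tick() every `interval` seconds until the task has finished."""
    ticks = 0
    while not task.has_finished():
        on_tick()  # in the example above this is tk_root.update()
        ticks += 1
        await asyncio.sleep(interval)
    return ticks


async def demo() -> int:
    task = FakeTask()
    updates = []
    # Run the "pipeline" and the UI polling loop side by side.
    _, ticks = await asyncio.gather(
        task.run(0.05),
        poll_ui(task, 0.01, lambda: updates.append(1)),
    )
    return ticks


if __name__ == "__main__":
    print("UI ticks:", asyncio.run(demo()))
```

Because the polling loop yields with `asyncio.sleep()` between updates, the UI refresh never blocks the pipeline; the trade-off is that the window only redraws once per interval.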

@@ -0,0 +1,94 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Robot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
},
]
hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
hey_robot_filter, # Filter out speech not directed at the robot
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await tts.say("Hi! If you want to talk to me, just say 'Hey Robot'.")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
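
Note on the example above: `WakeCheckFilter` sits between `transport.input()` and the user aggregator, so only speech addressed to the bot reaches the LLM. The filter's internals are not part of this diff. The sketch below only illustrates the general idea of wake-phrase gating on plain strings; `WakePhraseGate` and `feed` are hypothetical names, and this is not the library's implementation.

```python
import re
from typing import List, Optional


class WakePhraseGate:
    """Hypothetical helper: lets text through only if it contains a wake phrase."""

    def __init__(self, wake_phrases: List[str]):
        # Case-insensitive match on any of the phrases, on word boundaries.
        alternatives = "|".join(re.escape(p) for p in wake_phrases)
        self._pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

    def feed(self, text: str) -> Optional[str]:
        """Return the text starting at the wake phrase, or None if absent."""
        match = self._pattern.search(text)
        if match is None:
            return None
        return text[match.start():]


if __name__ == "__main__":
    gate = WakePhraseGate(["hey robot", "hey, robot"])
    print(gate.feed("What time is it?"))
    print(gate.feed("So, Hey Robot what's up?"))
```

A real filter would probably also need per-participant state and a timeout after which the wake phrase is required again; both are left out of this sketch.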

@@ -1,176 +0,0 @@
import aiohttp
import asyncio
import logging
import os
import random
from typing import AsyncGenerator
from PIL import Image
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from dailyai.pipeline.frames import (
Frame,
TextFrame,
ImageFrame,
SpriteFrame,
TranscriptionFrame,
)
from dailyai.services.ai_services import AIService
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
sprites = {}
image_files = [
"sc-default.png",
"sc-talk.png",
"sc-listen-1.png",
"sc-think-1.png",
"sc-think-2.png",
"sc-think-3.png",
"sc-think-4.png",
]
script_dir = os.path.dirname(__file__)
for file in image_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites[file] = img.tobytes()
# When the bot isn't talking, show a static image of the cat listening
quiet_frame = ImageFrame(sprites["sc-listen-1.png"], (720, 1280))
# When the bot is talking, build an animation from two sprites
talking_list = [sprites["sc-default.png"], sprites["sc-talk.png"]]
talking = [random.choice(talking_list) for x in range(30)]
talking_frame = SpriteFrame(images=talking)
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM
# is processing
thinking_list = [
sprites["sc-think-1.png"],
sprites["sc-think-2.png"],
sprites["sc-think-3.png"],
sprites["sc-think-4.png"],
]
thinking_frame = SpriteFrame(images=thinking_list)
class TranscriptFilter(AIService):
def __init__(self, bot_participant_id=None):
self.bot_participant_id = bot_participant_id
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, TranscriptionFrame):
if frame.participantId != self.bot_participant_id:
yield frame
class NameCheckFilter(AIService):
def __init__(self, names: list[str]):
self.names = names
self.sentence = ""
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
content: str = ""
# TODO: split up transcription by participant
if isinstance(frame, TextFrame):
content = frame.text
self.sentence += content
if self.sentence.endswith((".", "?", "!")):
if any(name in self.sentence for name in self.names):
out = self.sentence
self.sentence = ""
yield TextFrame(out)
else:
out = self.sentence
self.sentence = ""
class ImageSyncAggregator(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
yield talking_frame
yield frame
yield quiet_frame
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Santa Cat",
duration_minutes=3,
start_transcription=True,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=720,
camera_height=1280,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
isa = ImageSyncAggregator()
messages = [
{
"role": "system",
"content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long.",
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
tf = TranscriptFilter(transport._my_participant_id)
ncf = NameCheckFilter(["Santa Cat", "Santa"])
pipeline = Pipeline([isa, tf, ncf, tma_in, llm, tma_out, tts])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say(
"Hi! If you want to talk to me, just say 'hey Santa Cat'.",
tts,
)
async def starting_image():
await transport.send_queue.put(quiet_frame)
await asyncio.gather(transport.run(pipeline), starting_image())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -1,34 +1,45 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import logging
import os
import sys
import wave
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.open_ai_services import OpenAILLMService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.pipeline.aggregators import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import (
from pipecat.frames.frames import (
Frame,
AudioFrame,
LLMResponseEndFrame,
AudioRawFrame,
LLMFullResponseEndFrame,
LLMMessagesFrame,
)
from typing import AsyncGenerator
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMUserResponseAggregator,
LLMAssistantResponseAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sounds = {}
sound_files = ["ding1.wav", "ding2.wav"]
@@ -42,33 +53,30 @@ for file in sound_files:
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the sound file and read its raw audio frames
with wave.open(full_path) as audio_file:
sounds[file] = audio_file.readframes(-1)
sounds[file] = AudioRawFrame(audio_file.readframes(-1),
audio_file.getframerate(), audio_file.getnchannels())
class OutboundSoundEffectWrapper(AIService):
def __init__(self):
pass
class OutboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMResponseEndFrame):
yield AudioFrame(sounds["ding1.wav"])
# In case anything else up the stack needs it
yield frame
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, LLMFullResponseEndFrame):
await self.push_frame(sounds["ding1.wav"])
# In case anything else downstream needs it
await self.push_frame(frame, direction)
else:
yield frame
await self.push_frame(frame, direction)
class InboundSoundEffectWrapper(AIService):
def __init__(self):
pass
class InboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, LLMMessagesFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
await self.push_frame(sounds["ding2.wav"])
# In case anything else downstream needs it
await self.push_frame(frame, direction)
else:
yield frame
await self.push_frame(frame, direction)
async def main(room_url: str, token):
@@ -77,15 +85,17 @@ async def main(room_url: str, token):
room_url,
token,
"Respond bot",
duration_minutes=5,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=False,
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
model="gpt-4o")
tts = ElevenLabsTTSService(
aiohttp_session=session,
@@ -100,24 +110,37 @@ async def main(room_url: str, token):
},
]
tma_in = LLMUserContextAggregator(
messages, transport._my_participant_id)
tma_out = LLMAssistantContextAggregator(
messages, transport._my_participant_id
)
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
pipeline = Pipeline([tma_in, in_sound, fl2, llm, tma_out, fl, tts, out_sound])
pipeline = Pipeline([
transport.input(),
tma_in,
in_sound,
fl2,
llm,
fl,
tts,
out_sound,
transport.output(),
tma_out
])
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say("Hi, I'm listening!", tts)
await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await tts.say("Hi, I'm listening!")
await transport.send_audio(sounds["ding1.wav"])
await asyncio.gather(transport.run(pipeline))
runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)
if __name__ == "__main__":
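
Note on the change above: the sound effects are now stored together with their sample rate and channel count (`AudioRawFrame(audio, sample_rate, num_channels)` in the diff) instead of as bare bytes. The following stdlib-only sketch shows the same loading step. `RawAudio`, `load_wav` and `make_silent_wav` are hypothetical names for illustration; `RawAudio` merely stands in for `AudioRawFrame`, and the in-memory WAV replaces the `ding1.wav` asset so the sketch runs without extra files.

```python
import io
import wave
from dataclasses import dataclass


@dataclass
class RawAudio:
    """Hypothetical stand-in for AudioRawFrame: PCM bytes plus format info."""
    audio: bytes
    sample_rate: int
    num_channels: int


def load_wav(source) -> RawAudio:
    """Read every frame from a WAV file path or file-like object."""
    with wave.open(source, "rb") as audio_file:
        return RawAudio(
            audio_file.readframes(audio_file.getnframes()),
            audio_file.getframerate(),
            audio_file.getnchannels(),
        )


def make_silent_wav(sample_rate: int = 16000, num_frames: int = 160) -> io.BytesIO:
    """Build a tiny mono 16-bit WAV in memory so no asset file is needed."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        out.setframerate(sample_rate)
        out.writeframes(b"\x00\x00" * num_frames)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    clip = load_wav(make_silent_wav())
    print(clip.sample_rate, clip.num_channels, len(clip.audio))
```

Keeping the format information next to the bytes means the output side does not have to assume a fixed sample rate for every clip.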

@@ -1,38 +1,50 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import logging
import os
import sys
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import FrameProcessor, UserResponseAggregator, VisionImageFrameAggregator
from dailyai.pipeline.frames import Frame, TextFrame, UserImageRequestFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.services.moondream_ai_service import MoondreamService
from dailyai.transports.daily_transport import DailyTransport
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.moondream import MoondreamService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
participant_id: str
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self.participant_id = participant_id
self._participant_id = participant_id
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if self.participant_id and isinstance(frame, TextFrame):
yield UserImageRequestFrame(self.participant_id)
yield frame
async def process_frame(self, frame: Frame, direction: FrameDirection):
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
@@ -41,12 +53,12 @@ async def main(room_url: str, token):
room_url,
token,
"Describe participant video",
duration_minutes=5,
mic_enabled=True,
mic_sample_rate=16000,
vad_enabled=True,
start_transcription=True,
video_rendering_enabled=True
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
@@ -70,15 +82,28 @@ async def main(room_url: str, token):
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await transport.say("Hi there! Feel free to ask me what I see.", tts)
transport.render_participant_video(participant["id"], framerate=0)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([user_response, image_requester, vision_aggregator, moondream, tts])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
moondream,
tts,
transport.output()
])
await transport.run(pipeline)
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
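
Note on the change above: processors no longer `yield` frames from an async generator; they call `push_frame(frame, direction)`, which lets `UserImageRequester` send its `UserImageRequestFrame` upstream while the text frame continues downstream. The sketch below is a much simplified, stdlib-only model of that idea. `MiniProcessor`, `ImageRequester`, `Direction` and `demo` are hypothetical names, frames are plain Python values, and processors call each other directly, which is an assumption made for brevity and not how the library schedules frames.

```python
import asyncio
from enum import Enum


class Direction(Enum):
    DOWNSTREAM = 1
    UPSTREAM = 2


class MiniProcessor:
    """Hypothetical, much simplified stand-in for a frame processor."""

    def __init__(self, name: str):
        self.name = name
        self.prev = None
        self.next = None
        self.seen = []

    def link(self, other: "MiniProcessor") -> "MiniProcessor":
        # Connect self -> other and return other so links can be chained.
        self.next = other
        other.prev = self
        return other

    async def process_frame(self, frame, direction: Direction):
        # Default behaviour: record the frame and pass it along unchanged.
        self.seen.append(frame)
        await self.push_frame(frame, direction)

    async def push_frame(self, frame, direction: Direction = Direction.DOWNSTREAM):
        target = self.next if direction is Direction.DOWNSTREAM else self.prev
        if target is not None:
            await target.process_frame(frame, direction)


class ImageRequester(MiniProcessor):
    """On text, ask upstream for an image, then forward the text downstream."""

    async def process_frame(self, frame, direction: Direction):
        if isinstance(frame, str):
            await self.push_frame(("image-request", frame), Direction.UPSTREAM)
        await self.push_frame(frame, direction)


async def demo():
    source = MiniProcessor("source")
    requester = ImageRequester("requester")
    sink = MiniProcessor("sink")
    source.link(requester).link(sink)
    await requester.process_frame("what do you see?", Direction.DOWNSTREAM)
    return source.seen, sink.seen


if __name__ == "__main__":
    upstream, downstream = asyncio.run(demo())
    print("upstream saw:", upstream)
    print("downstream saw:", downstream)
```

With a generator-style `process_frame`, every yielded frame can only travel one way; an explicit direction argument is what makes the upstream request possible.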

@@ -0,0 +1,106 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.google import GoogleLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_in_enabled=True, # This is so Silero VAD can get audio data
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
google = GoogleLLMService(
model="gemini-1.5-flash-latest",
api_key=os.getenv("GOOGLE_API_KEY"))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
google,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,106 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
openai = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o"
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
openai,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,106 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
anthropic = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-sonnet-20240229"
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
anthropic,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -1,56 +1,53 @@
import asyncio
import logging
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
from dailyai.pipeline.frames import EndFrame, TranscriptionFrame
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
from dailyai.pipeline.pipeline import Pipeline
import asyncio
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
transport = DailyTransport(
room_url,
None,
"Transcription bot",
start_transcription=False,
mic_enabled=False,
camera_enabled=False,
speaker_enabled=True,
)
transport = DailyTransport(room_url, None, "Transcription bot",
DailyParams(audio_in_enabled=True))
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
transport_done = asyncio.Event()
tl = TranscriptionLogger()
pipeline = Pipeline([stt], source=transport.receive_queue, sink=transcription_output_queue)
pipeline = Pipeline([transport.input(), stt, tl])
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while not transport_done.is_set():
item = await transcription_output_queue.get()
print("got item from queue", item)
if isinstance(item, TranscriptionFrame):
print(item.text)
elif isinstance(item, EndFrame):
break
print("handle_transcription done")
task = PipelineTask(pipeline)
async def run_until_done():
await transport.run()
transport_done.set()
print("run_until_done done")
runner = PipelineRunner()
await asyncio.gather(run_until_done(), pipeline.run_pipeline(), handle_transcription())
await runner.run(task)
if __name__ == "__main__":

@@ -1,50 +1,51 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import logging
import sys
from dailyai.pipeline.frames import EndFrame, TranscriptionFrame
from dailyai.transports.local_transport import LocalTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
from dailyai.pipeline.pipeline import Pipeline
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main():
meeting_duration_minutes = 1
transport = LocalTransport(
mic_enabled=True,
camera_enabled=False,
speaker_enabled=True,
duration_minutes=meeting_duration_minutes,
)
transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))
stt = WhisperSTTService()
transcription_output_queue = asyncio.Queue()
transport_done = asyncio.Event()
tl = TranscriptionLogger()
pipeline = Pipeline([stt], source=transport.receive_queue, sink=transcription_output_queue)
pipeline = Pipeline([transport.input(), stt, tl])
async def handle_transcription():
print("`````````TRANSCRIPTION`````````")
while not transport_done.is_set():
item = await transcription_output_queue.get()
print("got item from queue", item)
if isinstance(item, TranscriptionFrame):
print(item.text)
elif isinstance(item, EndFrame):
break
print("handle_transcription done")
task = PipelineTask(pipeline)
async def run_until_done():
await transport.run()
transport_done.set()
print("run_until_done done")
runner = PipelineRunner()
await asyncio.gather(run_until_done(), pipeline.run_pipeline(), handle_transcription())
await runner.run(task)
if __name__ == "__main__":
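
Reviewer note: this diff replaces the queue-polling `handle_transcription()` loop with a processor placed directly in the pipeline. The sketch below shows only that general pattern, with no dependency on pipecat. `SketchFrame`, `SketchText`, `SketchProcessor`, and `run_chain` are hypothetical names invented for illustration and are not the library's API.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class SketchFrame:
    """Hypothetical base frame type (not a pipecat class)."""


@dataclass
class SketchText(SketchFrame):
    text: str


class SketchProcessor:
    """Hypothetical processor: takes one frame, returns the frames to pass on."""

    async def process(self, frame: SketchFrame) -> List[SketchFrame]:
        return [frame]


class Upper(SketchProcessor):
    """Transforms text frames and passes everything else through unchanged."""

    async def process(self, frame: SketchFrame) -> List[SketchFrame]:
        if isinstance(frame, SketchText):
            return [SketchText(frame.text.upper())]
        return [frame]


class Collector(SketchProcessor):
    """Observes text frames in place of a separate queue-reading loop."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    async def process(self, frame: SketchFrame) -> List[SketchFrame]:
        if isinstance(frame, SketchText):
            self.seen.append(frame.text)
        return [frame]


async def run_chain(processors, frames):
    """Feed every frame through each processor in order."""
    pending = list(frames)
    for processor in processors:
        produced = []
        for frame in pending:
            produced.extend(await processor.process(frame))
        pending = produced
    return pending


if __name__ == "__main__":
    collector = Collector()
    asyncio.run(run_chain([Upper(), collector], [SketchText("hello")]))
    print(collector.seen)
```

The point of the design is that the observer sits inside the chain, so no separate task, output queue, or completion event is needed to read results.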

View File

@@ -0,0 +1,140 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
)
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def start_fetch_weather(llm):
await llm.push_frame(TextFrame("Let me think."))
async def fetch_weather_from_api(llm, args):
return ({"conditions": "nice", "temperature": "75"})
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm.register_function(
"get_current_weather",
fetch_weather_from_api,
start_callback=start_fetch_weather)
fl_in = FrameLogger("Inner")
fl_out = FrameLogger("Outer")
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": [
"celsius",
"fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": [
"location",
"format"],
},
})]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
tma_in = LLMUserContextAggregator(context)
tma_out = LLMAssistantContextAggregator(context)
pipeline = Pipeline([
fl_in,
transport.input(),
tma_in,
llm,
fl_out,
tts,
transport.output(),
tma_out
])
task = PipelineTask(pipeline)
@ transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await tts.say("Hi! Ask me about the weather in San Francisco.")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
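
Reviewer note: the file above registers `fetch_weather_from_api` under the name `get_current_weather` and declares the same name in `tools`, so the two must stay in sync. The sketch below illustrates how a name-to-handler registry of this kind can dispatch a tool call and fire a start callback first. It is independent of pipecat and OpenAI: `ToolRegistry`, `register`, and `call` are hypothetical names, and the handler signature is simplified for illustration.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict, Optional

Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]
StartCallback = Callable[[], Awaitable[None]]


class ToolRegistry:
    """Hypothetical stand-in for a service that maps tool names to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._start_callbacks: Dict[str, Optional[StartCallback]] = {}

    def register(self, name: str, handler: Handler,
                 start_callback: Optional[StartCallback] = None) -> None:
        self._handlers[name] = handler
        self._start_callbacks[name] = start_callback

    async def call(self, name: str, raw_arguments: str) -> Dict[str, Any]:
        # Tool arguments typically arrive as a JSON string.
        if name not in self._handlers:
            raise KeyError(f"no handler registered for tool {name!r}")
        start = self._start_callbacks[name]
        if start is not None:
            await start()  # e.g. say a filler phrase while the call runs
        return await self._handlers[name](json.loads(raw_arguments))


async def fetch_weather(args: Dict[str, Any]) -> Dict[str, Any]:
    # Canned values, mirroring the placeholder result in the example.
    return {"conditions": "nice", "temperature": "75", "location": args["location"]}


async def demo():
    events = []

    async def on_start() -> None:
        events.append("started")

    registry = ToolRegistry()
    registry.register("get_current_weather", fetch_weather, start_callback=on_start)
    result = await registry.call(
        "get_current_weather",
        '{"location": "San Francisco, CA", "format": "fahrenheit"}',
    )
    return events, result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Raising on an unknown name makes a mismatch between the declared tool and the registered handler visible immediately instead of silently dropping the call.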

View File

@@ -1,52 +0,0 @@
import asyncio
import logging
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import FrameProcessor
from dailyai.pipeline.frames import ImageFrame, Frame, UserImageFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class UserImageProcessor(FrameProcessor):
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, UserImageFrame):
yield ImageFrame(frame.image, frame.size)
else:
yield frame
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Render participant video",
camera_width=1280,
camera_height=720,
camera_enabled=True,
video_rendering_enabled=True
)
@ transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
transport.render_participant_video(participant["id"])
pipeline = Pipeline([UserImageProcessor()])
await asyncio.gather(transport.run(pipeline))
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -1,71 +0,0 @@
import asyncio
import logging
import tkinter as tk
from typing import AsyncGenerator
from dailyai.pipeline.aggregators import FrameProcessor
from dailyai.pipeline.frames import ImageFrame, Frame, UserImageFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.transports.daily_transport import DailyTransport
from dailyai.transports.local_transport import LocalTransport
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class UserImageProcessor(FrameProcessor):
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, UserImageFrame):
yield ImageFrame(frame.image, frame.size)
else:
yield frame
async def main(room_url: str, token):
tk_root = tk.Tk()
tk_root.title("dailyai")
local_transport = LocalTransport(
tk_root=tk_root,
camera_enabled=True,
camera_width=1280,
camera_height=720
)
transport = DailyTransport(
room_url,
token,
"Render participant video",
video_rendering_enabled=True
)
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
transport.render_participant_video(participant["id"])
async def run_tk():
while not transport._stop_threads.is_set():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
local_pipeline = Pipeline([UserImageProcessor()], source=transport.receive_queue)
await asyncio.gather(
transport.run(),
local_transport.run(local_pipeline, override_pipeline_source_queue=False),
run_tk()
)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -1,25 +0,0 @@
syntax = "proto3";
package dailyai_proto;
message TextFrame {
string text = 1;
}
message AudioFrame {
bytes audio = 1;
}
message TranscriptionFrame {
string text = 1;
string participant_id = 2;
string timestamp = 3;
}
message Frame {
oneof frame {
TextFrame text = 1;
AudioFrame audio = 2;
TranscriptionFrame transcription = 3;
}
}

View File

@@ -1,134 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="//cdn.jsdelivr.net/npm/protobufjs@7.X.X/dist/protobuf.min.js"></script>
<title>WebSocket Audio Stream</title>
</head>
<body>
<h1>WebSocket Audio Stream</h1>
<button id="startAudioBtn">Start Audio</button>
<button id="stopAudioBtn">Stop Audio</button>
<script>
const SAMPLE_RATE = 16000;
const BUFFER_SIZE = 8192;
const MIN_AUDIO_SIZE = 6400;
let audioContext;
let microphoneStream;
let scriptProcessor;
let source;
let frame;
let audioChunks = [];
let isPlaying = false;
let ws;
const proto = protobuf.load("frames.proto", (err, root) => {
if (err) throw err;
frame = root.lookupType("dailyai_proto.Frame");
});
function initWebSocket() {
ws = new WebSocket('ws://localhost:8765');
ws.addEventListener('open', () => console.log('WebSocket connection established.'));
ws.addEventListener('message', handleWebSocketMessage);
ws.addEventListener('close', (event) => console.log("WebSocket connection closed.", event.code, event.reason));
ws.addEventListener('error', (event) => console.error('WebSocket error:', event));
}
async function handleWebSocketMessage(event) {
const arrayBuffer = await event.data.arrayBuffer();
enqueueAudioFromProto(arrayBuffer);
}
function enqueueAudioFromProto(arrayBuffer) {
const parsedFrame = frame.decode(new Uint8Array(arrayBuffer));
if (!parsedFrame?.audio) return false;
const frameCount = parsedFrame.audio.data.length / 2;
const audioOutBuffer = audioContext.createBuffer(1, frameCount, SAMPLE_RATE);
const nowBuffering = audioOutBuffer.getChannelData(0);
const view = new Int16Array(parsedFrame.audio.data.buffer);
for (let i = 0; i < frameCount; i++) {
const word = view[i];
nowBuffering[i] = ((word + 32768) % 65536 - 32768) / 32768.0;
}
audioChunks.push(audioOutBuffer);
if (!isPlaying) playNextChunk();
}
function playNextChunk() {
if (audioChunks.length === 0) {
isPlaying = false;
return;
}
isPlaying = true;
const audioOutBuffer = audioChunks.shift();
const source = audioContext.createBufferSource();
source.buffer = audioOutBuffer;
source.connect(audioContext.destination);
source.onended = playNextChunk;
source.start();
}
function startAudio() {
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
alert('getUserMedia is not supported in your browser.');
return;
}
navigator.mediaDevices.getUserMedia({ audio: true })
.then((stream) => {
microphoneStream = stream;
audioContext = new (window.AudioContext || window.webkitAudioContext)();
scriptProcessor = audioContext.createScriptProcessor(BUFFER_SIZE, 1, 1);
source = audioContext.createMediaStreamSource(stream);
source.connect(scriptProcessor);
scriptProcessor.connect(audioContext.destination);
const audioBuffer = [];
const skipRatio = Math.floor(audioContext.sampleRate / (SAMPLE_RATE * 2));
scriptProcessor.onaudioprocess = (event) => {
const rawLeftChannelData = event.inputBuffer.getChannelData(0);
for (let i = 0; i < rawLeftChannelData.length; i += skipRatio) {
const normalized = ((rawLeftChannelData[i] * 32768.0) + 32768) % 65536 - 32768;
const swappedBytes = ((normalized & 0xff) << 8) | ((normalized >> 8) & 0xff);
audioBuffer.push(swappedBytes);
}
if (audioBuffer.length >= MIN_AUDIO_SIZE) {
const audioFrame = frame.create({ audio: { audio: audioBuffer.slice(0, MIN_AUDIO_SIZE) } });
const encodedFrame = new Uint8Array(frame.encode(audioFrame).finish());
ws.send(encodedFrame);
audioBuffer.splice(0, MIN_AUDIO_SIZE);
}
};
initWebSocket();
})
.catch((error) => console.error('Error accessing microphone:', error));
}
function stopAudio() {
if (ws) {
ws.close();
scriptProcessor.disconnect();
source.disconnect();
ws = undefined;
}
}
document.getElementById('startAudioBtn').addEventListener('click', startAudio);
document.getElementById('stopAudioBtn').addEventListener('click', stopAudio);
</script>
</body>
</html>
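
Reviewer note: the removed page converts between signed 16-bit PCM and Web Audio floats by scaling with 32768. The sketch below shows that conversion in Python for anyone porting the client. It assumes little-endian samples and clamps out-of-range values to the 16-bit limits, which is a substitution for the page's modulo wrap. `pcm16_to_float` and `float_to_pcm16` are hypothetical helper names.

```python
import struct
from typing import List, Sequence


def pcm16_to_float(data: bytes) -> List[float]:
    """Decode little-endian signed 16-bit PCM into floats in [-1.0, 1.0)."""
    count = len(data) // 2  # a trailing odd byte is ignored
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    return [sample / 32768.0 for sample in samples]


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Encode floats as little-endian signed 16-bit PCM, clamping to range."""
    ints = [max(-32768, min(32767, round(value * 32768.0))) for value in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    raw = float_to_pcm16([0.0, 0.5, -1.0, 1.0])
    print(pcm16_to_float(raw))
```

Clamping keeps a full-scale input of `1.0` at the maximum positive sample rather than wrapping it to the most negative one.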

View File

@@ -1,50 +0,0 @@
import asyncio
import aiohttp
import logging
import os
from dailyai.pipeline.frame_processor import FrameProcessor
from dailyai.pipeline.frames import TextFrame, TranscriptionFrame
from dailyai.pipeline.pipeline import Pipeline
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
from dailyai.transports.websocket_transport import WebsocketTransport
from dailyai.services.whisper_ai_services import WhisperSTTService
logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("dailyai")
logger.setLevel(logging.DEBUG)
class WhisperTranscriber(FrameProcessor):
async def process_frame(self, frame):
if isinstance(frame, TranscriptionFrame):
print(f"Transcribed: {frame.text}")
else:
yield frame
async def main():
async with aiohttp.ClientSession() as session:
transport = WebsocketTransport(
mic_enabled=True,
speaker_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([
WhisperSTTService(),
WhisperTranscriber(),
tts,
])
@transport.on_connection
async def queue_frame():
await pipeline.queue_frames([TextFrame("Hello there!")])
await transport.run(pipeline)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,123 +0,0 @@
import argparse
import asyncio
import requests
import time
import urllib.parse
import random
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.frames import Frame, FrameType
from dailyai.services.fal_ai_services import FalImageGenService
from dailyai.services.elevenlabs_ai_service import ElevenLabsTTSService
async def main(room_url: str, token):
global transport
global llm
global tts
transport = DailyTransport(
room_url,
token,
"Imagebot",
1,
)
transport._mic_enabled = True
transport._camera_enabled = True
transport._mic_sample_rate = 16000
transport._camera_width = 1024
transport._camera_height = 1024
llm = AzureLLMService()
tts = AzureTTSService()
img = FalImageGenService()
async def handle_transcriptions():
print("handle_transcriptions got called")
sentence = ""
async for message in transport.get_transcriptions():
print(f"transcription message: {message}")
if message["session_id"] == transport._my_participant_id:
continue
finder = message["text"].find("start over")
print(f"finder: {finder}")
if finder >= 0:
async for audio in tts.run_tts(f"Resetting."):
transport.output_queue.put(
Frame(FrameType.AUDIO_FRAME, audio))
sentence = ""
continue
# todo: we could differentiate between transcriptions from
# different participants
sentence += f" {message['text']}"
print(f"sentence is now: {sentence}")
# TODO: Cache this audio
phrase = random.choice(
["OK.", "Got it.", "Sure.", "You bet.", "Sure thing."])
async for audio in tts.run_tts(phrase):
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
img_result = img.run_image_gen(sentence, "1024x1024")
awaited_img = await asyncio.gather(img_result)
transport.output_queue.put(
[
Frame(FrameType.IMAGE_FRAME, awaited_img[0][1]),
]
)
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
print(f"participant joined: {participant['info']['userName']}")
if participant["info"]["isLocal"]:
return
async for audio in tts.run_tts("Describe an image, and I'll create it."):
audio_generator = tts.run_tts(
f"Hello, {participant['info']['userName']}! Describe an image and I'll create it. To start over, just say 'start over'.")
async for audio in audio_generator:
transport.output_queue.put(Frame(FrameType.AUDIO_FRAME, audio))
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Daily Bot Sample")
parser.add_argument(
"-u",
"--url",
type=str,
required=True,
help="URL of the Daily room to join")
parser.add_argument(
"-k",
"--apikey",
type=str,
required=True,
help="Daily API Key (needed to create token)",
)
args, unknown = parser.parse_known_args()
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(args.url).path[1:]
expiration: float = time.time() + 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={
"Authorization": f"Bearer {args.apikey}"},
json={
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
asyncio.run(main(args.url, token))

View File

@@ -1,135 +0,0 @@
import aiohttp
import asyncio
import os
import wave
from dailyai.transports.daily_transport import DailyTransport
from dailyai.services.azure_ai_services import AzureLLMService, AzureTTSService
from dailyai.pipeline.aggregators import LLMContextAggregator
from dailyai.services.ai_services import AIService, FrameLogger
from dailyai.pipeline.frames import Frame, AudioFrame, LLMResponseEndFrame, LLMMessagesFrame
from typing import AsyncGenerator
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
sounds = {}
sound_files = [
'ding1.wav',
'ding2.wav'
]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = audio_file.readframes(-1)
class OutboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMResponseEndFrame):
yield AudioFrame(sounds["ding1.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
class InboundSoundEffectWrapper(AIService):
def __init__(self):
pass
async def process_frame(self, frame: Frame) -> AsyncGenerator[Frame, None]:
if isinstance(frame, LLMMessagesFrame):
yield AudioFrame(sounds["ding2.wav"])
# In case anything else up the stack needs it
yield frame
else:
yield frame
async def main(room_url: str, token, phone):
async with aiohttp.ClientSession() as session:
global transport
global llm
global tts
transport = DailyTransport(
room_url,
token,
"Respond bot",
300,
)
transport._mic_enabled = True
transport._mic_sample_rate = 16000
transport._camera_enabled = False
llm = AzureLLMService()
tts = AzureTTSService()
@transport.event_handler("on_first_other_participant_joined")
async def on_first_other_participant_joined(transport, participant):
await tts.say("Hi, I'm listening!", transport.send_queue)
await transport.send_queue.put(AudioFrame(sounds["ding1.wav"]))
async def handle_transcriptions():
messages = [
{"role": "system", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way."},
]
tma_in = LLMContextAggregator(
messages, "user", transport._my_participant_id
)
tma_out = LLMContextAggregator(
messages, "assistant", transport._my_participant_id
)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
await out_sound.run_to_queue(
transport.send_queue,
tts.run(
tma_out.run(
llm.run(
fl2.run(
in_sound.run(
tma_in.run(
transport.get_receive_frames()
)
)
)
)
)
)
)
@transport.event_handler("on_participant_joined")
async def pax_joined(transport, pax):
print(f"PARTICIPANT JOINED: {pax}")
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if (state == "joined"):
if (phone):
transport.start_recording()
transport.dialout(phone)
await asyncio.gather(transport.run(), handle_transcriptions())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,163 @@
# flyctl launch added from .gitignore
# Byte-compiled / optimized / DLL files
**/__pycache__
**/*.py[cod]
**/*$py.class
# C extensions
**/*.so
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
**/*.manifest
**/*.spec
# Installer logs
**/pip-log.txt
**/pip-delete-this-directory.txt
# Unit test / coverage reports
**/htmlcov
**/.tox
**/.nox
**/.coverage
**/.coverage.*
**/.cache
**/nosetests.xml
**/coverage.xml
**/*.cover
**/*.py,cover
**/.hypothesis
**/.pytest_cache
**/cover
# Translations
**/*.mo
**/*.pot
# Django stuff:
**/*.log
**/local_settings.py
**/db.sqlite3
**/db.sqlite3-journal
# Flask stuff:
**/instance
**/.webassets-cache
# Scrapy stuff:
**/.scrapy
# Sphinx documentation
**/docs/_build
# PyBuilder
**/.pybuilder
**/target
# Jupyter Notebook
**/.ipynb_checkpoints
# IPython
**/profile_default
**/ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
**/.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
**/__pypackages__
# Celery stuff
**/celerybeat-schedule
**/celerybeat.pid
# SageMath parsed files
**/*.sage.py
# Environments
**/.env
**/.venv
**/env
**/venv
**/ENV
**/env.bak
**/venv.bak
# Spyder project settings
**/.spyderproject
**/.spyproject
# Rope project settings
**/.ropeproject
# mkdocs documentation
site
# mypy
**/.mypy_cache
**/.dmypy.json
**/dmypy.json
# Pyre type checker
**/.pyre
# pytype static type analyzer
**/.pytype
# Cython debug symbols
**/cython_debug
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
**/runpod.toml
fly.toml

161
examples/moondream-chatbot/.gitignore vendored Normal file
View File

@@ -0,0 +1,161 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml

View File

@@ -0,0 +1,25 @@
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y wget
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
RUN dpkg -i cuda-keyring_1.1-1_all.deb
RUN echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" > /etc/apt/sources.list.d/cuda-ubuntu2204-x86_64.list
RUN apt-get update && apt-get install -y python3 python3-pip
RUN apt-get install -y cuda-nvcc-12-4 libcublas-12-4 libcudnn8
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
COPY assets/* /app/assets/
COPY utils/* /app/utils/
WORKDIR /app
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "server.py"]

View File

@@ -0,0 +1,76 @@
FROM ubuntu:22.04
# environment variables for Intel OneAPI components
ENV DPCPPROOT=/opt/intel/oneapi/compiler/latest
ENV MKLROOT=/opt/intel/oneapi/mkl/latest
ENV CCLROOT=/opt/intel/oneapi/ccl/latest
ENV MPIROOT=/opt/intel/oneapi/mpi/latest
# Install necessary dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
wget \
lsb-release \
pciutils \
gnupg2 \
python3-pip
# Add Intel OneAPI repository and GPG key
# Intel GPU repository and GPG key
# Install Intel OneAPI components and source the environment scripts
RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null && \
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list && \
/bin/bash -c ' \
. /etc/os-release && \
if [[ " jammy " =~ " ${VERSION_CODENAME} " ]]; then \
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg && \
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${VERSION_CODENAME}/lts/2350 unified" | \
tee /etc/apt/sources.list.d/intel-gpu-${VERSION_CODENAME}.list && \
apt-get update && \
apt-get install -y --no-install-recommends intel-opencl-icd \
intel-level-zero-gpu level-zero intel-media-va-driver-non-free \
libmfx1 libmfxgen1 libvpl2 libegl-mesa0 libegl1-mesa \
libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 \
libxatracker2 mesa-va-drivers mesa-vdpau-drivers \
mesa-vulkan-drivers va-driver-all; \
else \
echo "Ubuntu version ${VERSION_CODENAME} not supported. Exiting..."; \
exit 1; \
fi' && \
apt-get update && apt-get install -y --no-install-recommends \
intel-oneapi-dpcpp-cpp-2024.1=2024.1.0-963 intel-oneapi-mkl-devel=2024.1.0-691 \
intel-oneapi-ccl-devel=2021.12.0-309 && \
apt-get clean && rm -rf /var/lib/apt/lists/* && \
groupadd -r render && usermod -aG render root && \
echo "source ${DPCPPROOT}/env/vars.sh" >> ~/.bashrc && \
echo "source ${MKLROOT}/env/vars.sh" >> ~/.bashrc && \
echo "source ${CCLROOT}/env/vars.sh" >> ~/.bashrc && \
echo "source ${MPIROOT}/env/vars.sh" >> ~/.bashrc && \
echo "export LD_LIBRARY_PATH=${MKLROOT}/lib:${DPCPPROOT}/linux/compiler/lib/intel64_lin:$LD_LIBRARY_PATH" >> ~/.bashrc
WORKDIR /app
COPY . /app
RUN mkdir -p /app /app/assets /app/utils
COPY *.py requirements.txt assets/* utils/* /app/
# Install the Intel-specific versions of torch
RUN python3 -m pip install --no-cache-dir -r requirements.txt && \
pip uninstall -y torch && \
pip freeze | grep 'nvidia-' | xargs pip uninstall -y && \
pip install --no-cache-dir --force-reinstall torch==2.1.0.post2 torchvision==0.16.0.post2 torchaudio==2.1.0.post2 \
intel-extension-for-pytorch==2.1.30+xpu oneccl_bind_pt==2.1.300+xpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
RUN echo '#!/bin/bash\n\
source ${DPCPPROOT}/env/vars.sh\n\
source ${MKLROOT}/env/vars.sh\n\
source ${CCLROOT}/env/vars.sh\n\
source ${MPIROOT}/env/vars.sh\n\
export LD_LIBRARY_PATH=${MKLROOT}/lib:${DPCPPROOT}/linux/compiler/lib/intel64_lin:$LD_LIBRARY_PATH\n\
python3 server.py' > /usr/local/bin/run_app.sh && \
chmod +x /usr/local/bin/run_app.sh && \
find / -type d -name "__pycache__" -exec rm -rf {} +
EXPOSE 7860
ENTRYPOINT ["/usr/local/bin/run_app.sh"]

View File

@@ -0,0 +1,44 @@
# Moondream Chatbot
<img src="image.png" width="420px">
This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion. The chatbot also has vision powers thanks to [Moondream](https://moondream.ai) so you can ask it, for example, "what do you see?".
The first run may take a while to start because the VAD (Voice Activity Detection) and vision models need to be downloaded.
## Get started
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/start` in your browser to start a chatbot
session.
## Build and test the Docker image
```
docker build -t moonbot .
docker run --env-file .env -p 7860:7860 moonbot
```
### For Intel GPUs (Arc, Max and Flex series)
```
docker build -t moonbot -f Dockerfile.intel .
docker run --env-file .env -p 7860:7860 --device /dev/dri moonbot
```
Once the container is running, visit `http://localhost:7860/start` again.

View File

25 binary image files not shown (before and after sizes are identical, ranging from 759 KiB to 908 KiB).

View File

@@ -0,0 +1,200 @@
import asyncio
import aiohttp
import os
import sys
from PIL import Image
from pipecat.frames.frames import (
ImageRawFrame,
SpriteFrame,
Frame,
LLMMessagesFrame,
AudioRawFrame,
TTSStoppedFrame,
TextFrame,
UserImageRawFrame,
UserImageRequestFrame,
)
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import LLMUserResponseAggregator
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.moondream import MoondreamService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
user_request_answer = "Let me take a look."
sprites = []
script_dir = os.path.dirname(__file__)
for i in range(1, 26):
    # Build the full path to the image file
    full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
    # Open the image and convert it to raw bytes
    with Image.open(full_path) as img:
        sprites.append(ImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Append the frames in reverse order so the animation plays forward, then backward
flipped = sprites[::-1]
sprites.extend(flipped)
# When the bot isn't talking, show a static image of the robot listening
quiet_frame = sprites[0]
talking_frame = SpriteFrame(images=sprites)
class TalkingAnimation(FrameProcessor):
    """
    This class starts a talking animation when it receives the first AudioRawFrame,
    and then returns to a "quiet" sprite when it sees a TTSStoppedFrame.
    """
    def __init__(self):
        super().__init__()
        self._is_talking = False
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if isinstance(frame, AudioRawFrame):
            if not self._is_talking:
                await self.push_frame(talking_frame)
                self._is_talking = True
        elif isinstance(frame, TTSStoppedFrame):
            await self.push_frame(quiet_frame)
            self._is_talking = False
        await self.push_frame(frame)
class UserImageRequester(FrameProcessor):
    def __init__(self):
        super().__init__()
        self.participant_id = None
    def set_participant_id(self, participant_id: str):
        self.participant_id = participant_id
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if self.participant_id and isinstance(frame, TextFrame):
            if frame.text == user_request_answer:
                await self.push_frame(UserImageRequestFrame(self.participant_id), FrameDirection.UPSTREAM)
                await self.push_frame(TextFrame("Describe the image in a short sentence."))
        elif isinstance(frame, UserImageRawFrame):
            await self.push_frame(frame)
class TextFilterProcessor(FrameProcessor):
    def __init__(self, text: str):
        super().__init__()
        self.text = text
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if isinstance(frame, TextFrame):
            if frame.text != self.text:
                await self.push_frame(frame)
        else:
            await self.push_frame(frame)
class ImageFilterProcessor(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if not isinstance(frame, ImageRawFrame):
            await self.push_frame(frame)
async def main(room_url: str, token):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Chatbot",
            DailyParams(
                audio_out_enabled=True,
                camera_out_enabled=True,
                camera_out_width=1024,
                camera_out_height=576,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer()
            )
        )
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id="pNInz6obpgDQGcFmaJgB",
        )
        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o")
        ta = TalkingAnimation()
        sa = SentenceAggregator()
        ir = UserImageRequester()
        va = VisionImageFrameAggregator()
        # If the image descriptions look wrong, try use_cpu=True
        moondream = MoondreamService()
        tf = TextFilterProcessor(user_request_answer)
        imgf = ImageFilterProcessor()
        messages = [
            {
                "role": "system",
                "content": f"You are Chatbot, a friendly, helpful robot. Let the user know that you are capable of chatting or describing what you see. Your goal is to demonstrate your capabilities in a succinct way. Reply with only '{user_request_answer}' if the user asks you to describe what you see. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself.",
            },
        ]
        ura = LLMUserResponseAggregator(messages)
        pipeline = Pipeline([
            transport.input(),
            ura,
            llm,
            ParallelPipeline(
                [sa, ir, va, moondream],
                [tf, imgf]),
            tts,
            ta,
            transport.output()
        ])
        task = PipelineTask(pipeline)
        await task.queue_frame(quiet_frame)
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            transport.capture_participant_transcription(participant["id"])
            transport.capture_participant_video(participant["id"], framerate=0)
            ir.set_participant_id(participant["id"])
            await task.queue_frames([LLMMessagesFrame(messages)])
        runner = PipelineRunner()
        await runner.run(task)
if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))
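
Review note: the sprite list near the top of this file is extended with its own reverse, so the talking animation plays forward and then backward. A minimal sketch of that ordering, using plain integers in place of `ImageRawFrame` objects. `ping_pong` is a hypothetical helper name for illustration and is not part of pipecat.

```python
def ping_pong(frames):
    """Return the frames followed by the same frames in reverse order."""
    forward = list(frames)
    return forward + forward[::-1]


if __name__ == "__main__":
    # Integers stand in for image frames in this sketch
    print(ping_pong([1, 2, 3]))
```

With 25 source images this yields 50 frames, and the last frame equals the first, so a loop restarts on the same picture it ended on.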

View File

@@ -0,0 +1,4 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
ELEVENLABS_API_KEY=aeb...

Binary file not shown (size: 1.1 MiB).

View File

@@ -0,0 +1,5 @@
python-dotenv
requests
fastapi[all]
uvicorn
pipecat-ai[daily,moondream,openai,silero]

View File

@@ -0,0 +1,124 @@
import os
import argparse
import subprocess
import atexit
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, RedirectResponse
from utils.daily_helpers import create_room as _create_room, get_token
MAX_BOTS_PER_ROOM = 1
# Bot sub-process dict for status reporting and concurrency control
bot_procs = {}
def cleanup():
    # Clean up function, just to be extra safe.
    # bot_procs maps pid -> (process, room_url), so unpack the tuple first.
    for proc, _room_url in bot_procs.values():
        proc.terminate()
        proc.wait()
atexit.register(cleanup)
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/start")
async def start_agent(request: Request):
    print("!!! Creating room")
    room_url, room_name = _create_room()
    print(f"!!! Room URL: {room_url}")
    # Ensure the room property is present
    if not room_url:
        raise HTTPException(
            status_code=500,
            detail="Missing 'room' property in request data. Cannot start agent without a target room!")
    # Check if there is already an existing process running in this room
    num_bots_in_room = sum(
        1 for proc in bot_procs.values() if proc[1] == room_url and proc[0].poll() is None)
    if num_bots_in_room >= MAX_BOTS_PER_ROOM:
        raise HTTPException(
            status_code=500, detail=f"Max bot limit reached for room: {room_url}")
    # Get the token for the room
    token = get_token(room_url)
    if not token:
        raise HTTPException(
            status_code=500, detail=f"Failed to get token for room: {room_url}")
    # Spawn a new agent, and join the user session
    # Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
    try:
        proc = subprocess.Popen(
            [
                f"python3 -m bot -u {room_url} -t {token}"
            ],
            shell=True,
            bufsize=1,
            cwd=os.path.dirname(os.path.abspath(__file__))
        )
        bot_procs[proc.pid] = (proc, room_url)
    except Exception as e:
        raise HTTPException(
            status_code=500, detail=f"Failed to start subprocess: {e}")
    return RedirectResponse(room_url)
@app.get("/status/{pid}")
def get_status(pid: int):
    # Look up the subprocess
    proc = bot_procs.get(pid)
    # If the subprocess doesn't exist, return an error
    if not proc:
        raise HTTPException(
            status_code=404, detail=f"Bot with process id: {pid} not found")
    # Check the status of the subprocess
    if proc[0].poll() is None:
        status = "running"
    else:
        status = "finished"
    return JSONResponse({"bot_id": pid, "status": status})
if __name__ == "__main__":
    import uvicorn
    default_host = os.getenv("HOST", "0.0.0.0")
    default_port = int(os.getenv("FAST_API_PORT", "7860"))
    parser = argparse.ArgumentParser(
        description="Daily Moondream FastAPI server")
    parser.add_argument("--host", type=str,
                        default=default_host, help="Host address")
    parser.add_argument("--port", type=int,
                        default=default_port, help="Port number")
    parser.add_argument("--reload", action="store_true",
                        help="Reload code on change")
    config = parser.parse_args()
    uvicorn.run(
        "server:app",
        host=config.host,
        port=config.port,
        reload=config.reload,
    )
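
Review note: the server tracks bots in a dict of `pid -> (process, room_url)` and refuses to start more than `MAX_BOTS_PER_ROOM` per room. The sketch below isolates that bookkeeping so it can be run without FastAPI or Daily. It is an illustration under assumptions, not the server's code: `running_bots` and `spawn_bot` are hypothetical helper names, the room URL is a placeholder, and the command is passed as an argument list without `shell=True`, which is a substitution for the shell string used in this diff.

```python
import subprocess
import sys

MAX_BOTS_PER_ROOM = 1
bot_procs = {}  # pid -> (Popen, room_url)


def running_bots(room_url):
    """Count tracked processes for this room that have not exited yet."""
    return sum(
        1 for proc, url in bot_procs.values()
        if url == room_url and proc.poll() is None
    )


def spawn_bot(room_url, command):
    """Start a process for the room unless the per-room limit is reached."""
    if running_bots(room_url) >= MAX_BOTS_PER_ROOM:
        raise RuntimeError(f"Max bot limit reached for room: {room_url}")
    # An argument list avoids passing room_url through a shell
    proc = subprocess.Popen(command)
    bot_procs[proc.pid] = (proc, room_url)
    return proc.pid


def cleanup():
    """Terminate every tracked process, unpacking the (proc, url) tuples."""
    for proc, _url in bot_procs.values():
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    # A sleeping child process stands in for "python3 -m bot ..."
    demo = [sys.executable, "-c", "import time; time.sleep(30)"]
    spawn_bot("https://example.invalid/room", demo)
    print(running_bots("https://example.invalid/room"))
    cleanup()
```

Passing an argument list means the room URL and token never go through shell parsing, which matters if either value could ever come from a request.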

View File

@@ -0,0 +1,109 @@
import os
import time
import urllib.parse
import requests
from dotenv import load_dotenv
load_dotenv()
daily_api_path = os.getenv("DAILY_API_URL") or "api.daily.co/v1"
daily_api_key = os.getenv("DAILY_API_KEY")
def create_room() -> tuple[str, str]:
    """
    Helper function to create a Daily room.
    # See: https://docs.daily.co/reference/rest-api/rooms
    Returns:
        tuple: A tuple containing the room URL and room name.
    Raises:
        Exception: If the request to create the room fails or if the response does not contain the room URL or room name.
    """
    room_props = {
        "exp": time.time() + 60 * 60,  # 1 hour
        "enable_chat": True,
        "enable_emoji_reactions": True,
        "eject_at_room_exp": True,
        "enable_prejoin_ui": False,  # Important for the bot to be able to join headlessly
    }
    res = requests.post(
        f"https://{daily_api_path}/rooms",
        headers={"Authorization": f"Bearer {daily_api_key}"},
        json={
            "properties": room_props
        },
    )
    if res.status_code != 200:
        raise Exception(f"Unable to create room: {res.text}")
    data = res.json()
    room_url: str = data.get("url")
    room_name: str = data.get("name")
    if room_url is None or room_name is None:
        raise Exception("Missing room URL or room name in response")
    return room_url, room_name
def get_name_from_url(room_url: str) -> str:
    """
    Extracts the name from a given room URL.
    Args:
        room_url (str): The URL of the room.
    Returns:
        str: The extracted name from the room URL.
    """
    return urllib.parse.urlparse(room_url).path[1:]
def get_token(room_url: str) -> str:
    """
    Retrieves a meeting token for the specified Daily room URL.
    # See: https://docs.daily.co/reference/rest-api/meeting-tokens
    Args:
        room_url (str): The URL of the Daily room.
    Returns:
        str: The meeting token.
    Raises:
        Exception: If no room URL is specified or if no Daily API key is specified.
        Exception: If there is an error creating the meeting token.
    """
    if not room_url:
        raise Exception(
            "No Daily room specified. You must specify a Daily room in order for a token to be generated.")
    if not daily_api_key:
        raise Exception(
            "No Daily API key specified. Set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
    expiration: float = time.time() + 60 * 60
    room_name = get_name_from_url(room_url)
    res: requests.Response = requests.post(
        f"https://{daily_api_path}/meeting-tokens",
        headers={
            "Authorization": f"Bearer {daily_api_key}"},
        json={
            "properties": {
                "room_name": room_name,
                "is_owner": True,  # Owner tokens required for transcription
                "exp": expiration}},
    )
    if res.status_code != 200:
        raise Exception(
            f"Failed to create meeting token: {res.status_code} {res.text}")
    token: str = res.json()["token"]
    return token
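
Review note: `get_token` derives the room name from the URL path and sends it with a one-hour expiry. The sketch below reproduces only the request body so it can be checked without calling the Daily API. `room_name_from_url` and `token_request_body` are hypothetical names, and the `now` parameter is an assumption added here to make the expiry deterministic in tests.

```python
import time
from urllib.parse import urlparse


def room_name_from_url(room_url):
    """Return the URL path without its leading slash."""
    return urlparse(room_url).path[1:]


def token_request_body(room_url, ttl_seconds=60 * 60, now=None):
    """Build the JSON body posted to the meeting-tokens endpoint in this diff."""
    now = time.time() if now is None else now
    return {
        "properties": {
            "room_name": room_name_from_url(room_url),
            "is_owner": True,
            "exp": now + ttl_seconds,
        }
    }


if __name__ == "__main__":
    print(token_request_body("https://yourdomain.daily.co/yourroom", now=1000.0))
```

Separating the body from the HTTP call makes the expiry arithmetic and the room-name parsing testable on their own.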

View File

@@ -0,0 +1,16 @@
FROM python:3.10-bullseye
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
COPY assets/* /app/assets/
COPY utils/* /app/utils/
WORKDIR /app
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "server.py"]

View File

@@ -0,0 +1,37 @@
# Simple Chatbot
<img src="image.png" width="420px">
This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion.
See a video of it in action: https://x.com/kwindla/status/1778628911817183509
And a quick video walkthrough of the code: https://www.loom.com/share/13df1967161f4d24ade054e7f8753416
The first run may take extra time to start, because the VAD (Voice Activity Detection) model needs to be downloaded.
## Get started
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/start` in your browser to start a chatbot session.
## Build and test the Docker image
```
docker build -t chatbot .
docker run --env-file .env -p 7860:7860 chatbot
```

View File

@@ -0,0 +1,355 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD-2-Clause
#
import asyncio
import aiohttp
import os
import sys
import wave
from typing import List
from openai._types import NotGiven, NOT_GIVEN
from openai.types.chat import (
ChatCompletionToolParam,
)
from pipecat.frames.frames import AudioRawFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMUserContextAggregator, LLMAssistantContextAggregator
from pipecat.processors.logger import FrameLogger
from pipecat.processors.frame_processor import FrameDirection
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame, OpenAILLMService
from pipecat.services.ai_services import AIService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sounds = {}
sound_files = [
"clack-short.wav",
"clack.wav",
"clack-short-quiet.wav",
"ding.wav",
"ding2.wav",
]
script_dir = os.path.dirname(__file__)
for file in sound_files:
    # Build the full path to the sound file
    full_path = os.path.join(script_dir, "assets", file)
    # Open the sound and convert it to bytes; the file name (with extension)
    # is the dictionary key, e.g. sounds["ding2.wav"]
    with wave.open(full_path) as audio_file:
        sounds[file] = AudioRawFrame(audio_file.readframes(-1),
                                     audio_file.getframerate(), audio_file.getnchannels())
class IntakeProcessor:
def __init__(
self,
context: OpenAILLMContext,
llm: AIService,
tools: List[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,
*args,
**kwargs,
):
super().__init__(*args, **kwargs)
self._context: OpenAILLMContext = context
self._llm = llm
print(f"Initializing context from IntakeProcessor")
self._context.add_message({"role": "system", "content": "You are Jessica, an agent for a company called Tri-County Health Services. Your job is to collect important information from the user before their doctor visit. You're talking to Chad Bailey. You should address the user by their first name and be polite and professional. You're not a medical professional, so you shouldn't provide any advice. Keep your responses short. Your job is to collect information to give to a doctor. Don't make assumptions about what values to plug into functions. Ask for clarification if a user response is ambiguous. Start by introducing yourself. Then, ask the user to confirm their identity by telling you their birthday, including the year. When they answer with their birthday, call the verify_birthday function."})
self._context.set_tools([
{
"type": "function",
"function": {
"name": "verify_birthday",
"description": "Use this function to verify the user has provided their correct birthday.",
"parameters": {
"type": "object",
"properties": {
"birthday": {
"type": "string",
"description": "The user's birthdate, including the year. The user can provide it in any format, but convert it to YYYY-MM-DD format to call this function.",
}},
},
},
}])
# Create an allowlist of functions that the LLM can call
self._functions = [
"verify_birthday",
"list_prescriptions",
"list_allergies",
"list_conditions",
"list_visit_reasons",
]
async def verify_birthday(self, llm, args):
if args["birthday"] == "1983-01-01":
self._context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_prescriptions",
"description": "Once the user has provided a list of their prescription medications, call this function.",
"parameters": {
"type": "object",
"properties": {
"prescriptions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"medication": {
"type": "string",
"description": "The medication's name",
},
"dosage": {
"type": "string",
"description": "The prescription's dosage",
},
},
},
}},
},
},
}])
# It's a bit weird to push this to the LLM, but it gets it into the pipeline
await llm.push_frame(sounds["ding2.wav"], FrameDirection.DOWNSTREAM)
# We don't need the function call in the context, so just return a new
# system message and let the framework re-prompt
return [{"role": "system", "content": "Next, thank the user for confirming their identity, then ask the user to list their current prescriptions. Each prescription needs to have a medication name and a dosage. Do not call the list_prescriptions function with any unknown dosages."}]
else:
# The user provided an incorrect birthday; ask them to try again
return [{"role": "system", "content": "The user provided an incorrect birthday. Ask them for their birthday again. When they answer, call the verify_birthday function."}]
async def start_prescriptions(self, llm):
print(f"!!! doing start prescriptions")
# Move on to allergies
self._context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_allergies",
"description": "Once the user has provided a list of their allergies, call this function.",
"parameters": {
"type": "object",
"properties": {
"allergies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "What the user is allergic to",
}},
},
}},
},
},
}])
self._context.add_message(
{
"role": "system",
"content": "Next, ask the user if they have any allergies. Once they have listed their allergies or confirmed they don't have any, call the list_allergies function."})
print(f"!!! about to await llm process frame in start prescriptions")
await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
print(f"!!! past await process frame in start prescriptions")
async def start_allergies(self, llm):
print("!!! doing start allergies")
# Move on to conditions
self._context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_conditions",
"description": "Once the user has provided a list of their medical conditions, call this function.",
"parameters": {
"type": "object",
"properties": {
"conditions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The user's medical condition",
}},
},
}},
},
},
},
])
self._context.add_message(
{
"role": "system",
"content": "Now ask the user if they have any medical conditions the doctor should know about. Once they've answered the question, call the list_conditions function."})
await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
async def start_conditions(self, llm):
print("!!! doing start conditions")
# Move on to visit reasons
self._context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_visit_reasons",
"description": "Once the user has provided a list of the reasons they are visiting a doctor today, call this function.",
"parameters": {
"type": "object",
"properties": {
"visit_reasons": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The user's reason for visiting the doctor",
}},
},
}},
},
},
}])
self._context.add_message(
{"role": "system", "content": "Finally, ask the user the reason for their doctor visit today. Once they answer, call the list_visit_reasons function."})
await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
async def start_visit_reasons(self, llm):
print("!!! doing start visit reasons")
# move to finish call
self._context.set_tools([])
self._context.add_message({"role": "system",
"content": "Now, thank the user and end the conversation."})
await llm.process_frame(OpenAILLMContextFrame(self._context), FrameDirection.DOWNSTREAM)
async def save_data(self, llm, args):
logger.info(f"!!! Saving data: {args}")
# Since this is supposed to be "async", returning None from the callback
# will prevent adding anything to context or re-prompting
return None
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=576,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="pNInz6obpgDQGcFmaJgB",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = []
context = OpenAILLMContext(messages=messages)
user_context = LLMUserContextAggregator(context)
assistant_context = LLMAssistantContextAggregator(context)
intake = IntakeProcessor(context, llm)
llm.register_function("verify_birthday", intake.verify_birthday)
llm.register_function(
"list_prescriptions",
intake.save_data,
start_callback=intake.start_prescriptions)
llm.register_function(
"list_allergies",
intake.save_data,
start_callback=intake.start_allergies)
llm.register_function(
"list_conditions",
intake.save_data,
start_callback=intake.start_conditions)
llm.register_function(
"list_visit_reasons",
intake.save_data,
start_callback=intake.start_visit_reasons)
fl = FrameLogger("LLM Output")
pipeline = Pipeline([
transport.input(), # Transport input
user_context, # User responses
llm, # LLM
fl, # Frame logger
tts, # TTS
transport.output(), # Transport output
assistant_context, # Assistant responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=False))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
print(f"Context is: {context}")
await task.queue_frames([OpenAILLMContextFrame(context)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
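
Review note: `IntakeProcessor` repeats the same tool schema four times (one array-of-objects argument per function) and advances through a fixed order of functions. The sketch below factors out both ideas. It is a possible refactoring, not what this diff does: `list_tool`, `NEXT_STAGE` and `stage_order` are hypothetical names, and the order is read off the `set_tools` calls in the file above.

```python
def list_tool(name, description, field, item_properties):
    """Build a function tool whose single argument is an array of objects."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    field: {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": item_properties,
                        },
                    }
                },
            },
        },
    }


# Which function becomes available after each one completes (None ends the call)
NEXT_STAGE = {
    "verify_birthday": "list_prescriptions",
    "list_prescriptions": "list_allergies",
    "list_allergies": "list_conditions",
    "list_conditions": "list_visit_reasons",
    "list_visit_reasons": None,
}


def stage_order(first="verify_birthday"):
    """Walk NEXT_STAGE from the first function to the end of the intake."""
    order = []
    current = first
    while current is not None:
        order.append(current)
        current = NEXT_STAGE[current]
    return order


if __name__ == "__main__":
    allergies = list_tool(
        "list_allergies",
        "Once the user has provided a list of their allergies, call this function.",
        "allergies",
        {"name": {"type": "string", "description": "What the user is allergic to"}},
    )
    print(allergies["function"]["name"])
    print(stage_order())
```

Keeping the order in one table makes it easier to see that each `start_*` callback installs the tool for the following stage rather than for its own.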

View File

@@ -0,0 +1,4 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
ELEVENLABS_API_KEY=aeb...

Binary file not shown (size: 733 KiB).

View File

@@ -0,0 +1,5 @@
python-dotenv
requests
fastapi[all]
uvicorn
pipecat-ai[daily,openai,silero]

View File

@@ -0,0 +1,58 @@
import argparse
import os
import time
import urllib.parse
import requests
def configure():
    parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
    parser.add_argument(
        "-u",
        "--url",
        type=str,
        required=False,
        help="URL of the Daily room to join")
    parser.add_argument(
        "-k",
        "--apikey",
        type=str,
        required=False,
        help="Daily API Key (needed to create an owner token for the room)",
    )
    args, unknown = parser.parse_known_args()
    url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
    key = args.apikey or os.getenv("DAILY_API_KEY")
    if not url:
        raise Exception(
            "No Daily room specified. Use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
    if not key:
        raise Exception("No Daily API key specified. Use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
    # Create a meeting token for the given room with an expiration 1 hour in
    # the future.
    room_name: str = urllib.parse.urlparse(url).path[1:]
    expiration: float = time.time() + 60 * 60
    res: requests.Response = requests.post(
        "https://api.daily.co/v1/meeting-tokens",
        headers={
            "Authorization": f"Bearer {key}"},
        json={
            "properties": {
                "room_name": room_name,
                "is_owner": True,
                "exp": expiration}},
    )
    if res.status_code != 200:
        raise Exception(
            f"Failed to create meeting token: {res.status_code} {res.text}")
    token: str = res.json()["token"]
    return (url, token)
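
Review note: `configure()` prefers command-line values and falls back to environment variables, and it uses `parse_known_args`, so options it does not define are collected rather than rejected. As far as this diff shows, that includes the `-t` flag the server passes when spawning the bot. A minimal sketch of the lookup, with `argv` and `env` passed in explicitly so it can run without touching the real environment. `resolve_room_and_key` is a hypothetical name.

```python
import argparse
import os


def resolve_room_and_key(argv, env=None):
    """Resolve the room URL and API key from argv, falling back to env."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="CLI/env fallback sketch")
    parser.add_argument("-u", "--url", type=str, required=False)
    parser.add_argument("-k", "--apikey", type=str, required=False)
    # parse_known_args tolerates options this parser does not define
    args, unknown = parser.parse_known_args(argv)
    url = args.url or env.get("DAILY_SAMPLE_ROOM_URL")
    key = args.apikey or env.get("DAILY_API_KEY")
    if not url:
        raise ValueError("No Daily room specified.")
    if not key:
        raise ValueError("No Daily API key specified.")
    return url, key, unknown


if __name__ == "__main__":
    print(resolve_room_and_key(
        ["-u", "https://yourdomain.daily.co/yourroom", "-t", "abc"],
        {"DAILY_API_KEY": "k"},
    ))
```

Returning the unknown arguments makes it visible when a caller passes a flag that is silently ignored.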

View File

@@ -0,0 +1,124 @@
import os
import argparse
import subprocess
import atexit
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, RedirectResponse
from utils.daily_helpers import create_room as _create_room, get_token
MAX_BOTS_PER_ROOM = 1
# Bot sub-process dict for status reporting and concurrency control
bot_procs = {}
def cleanup():
    # Clean up function, just to be extra safe.
    # bot_procs maps pid -> (process, room_url), so unpack the tuple first.
    for proc, _room_url in bot_procs.values():
        proc.terminate()
        proc.wait()
atexit.register(cleanup)
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/start")
async def start_agent(request: Request):
    print("!!! Creating room")
    room_url, room_name = _create_room()
    print(f"!!! Room URL: {room_url}")
    # Ensure the room property is present
    if not room_url:
        raise HTTPException(
            status_code=500,
            detail="Missing 'room' property in request data. Cannot start agent without a target room!")
    # Check if there is already an existing process running in this room
    num_bots_in_room = sum(
        1 for proc in bot_procs.values() if proc[1] == room_url and proc[0].poll() is None)
    if num_bots_in_room >= MAX_BOTS_PER_ROOM:
        raise HTTPException(
            status_code=500, detail=f"Max bot limit reached for room: {room_url}")
    # Get the token for the room
    token = get_token(room_url)
    if not token:
        raise HTTPException(
            status_code=500, detail=f"Failed to get token for room: {room_url}")
    # Spawn a new agent, and join the user session
    # Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
    try:
        proc = subprocess.Popen(
            [
                f"python3 -m bot -u {room_url} -t {token}"
            ],
            shell=True,
            bufsize=1,
            cwd=os.path.dirname(os.path.abspath(__file__))
        )
        bot_procs[proc.pid] = (proc, room_url)
    except Exception as e:
        raise HTTPException(
            status_code=500, detail=f"Failed to start subprocess: {e}")
    return RedirectResponse(room_url)
@app.get("/status/{pid}")
def get_status(pid: int):
    # Look up the subprocess
    proc = bot_procs.get(pid)
    # If the subprocess doesn't exist, return an error
    if not proc:
        raise HTTPException(
            status_code=404, detail=f"Bot with process id: {pid} not found")
    # Check the status of the subprocess
    if proc[0].poll() is None:
        status = "running"
    else:
        status = "finished"
    return JSONResponse({"bot_id": pid, "status": status})
if __name__ == "__main__":
    import uvicorn
    default_host = os.getenv("HOST", "0.0.0.0")
    default_port = int(os.getenv("FAST_API_PORT", "7860"))
    parser = argparse.ArgumentParser(
        description="Daily Storyteller FastAPI server")
    parser.add_argument("--host", type=str,
                        default=default_host, help="Host address")
    parser.add_argument("--port", type=int,
                        default=default_port, help="Port number")
    parser.add_argument("--reload", action="store_true",
                        help="Reload code on change")
    config = parser.parse_args()
    print(f"to join a test room, visit http://localhost:{config.port}/start")
    uvicorn.run(
        "server:app",
        host=config.host,
        port=config.port,
        reload=config.reload,
    )
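
Review note: the `/status/{pid}` endpoint reports a bot as running while `Popen.poll()` returns `None` and as finished afterwards. A minimal sketch of that mapping outside FastAPI, with a short-lived child process standing in for the bot. `bot_status` is a hypothetical helper name.

```python
import subprocess
import sys


def bot_status(proc):
    """Map Popen.poll() to the status strings used by the status endpoint."""
    return "running" if proc.poll() is None else "finished"


if __name__ == "__main__":
    # A child process that exits immediately stands in for the bot
    proc = subprocess.Popen([sys.executable, "-c", "pass"])
    proc.wait()
    print(bot_status(proc))
```

Note that "finished" does not distinguish a clean exit from a crash. Checking `proc.returncode` would be one way to report that, if the endpoint ever needs it.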

Some files were not shown because too many files have changed in this diff.