Compare commits

869 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
060a22f395 github: only run publish_test manually
We need to run this manually to avoid test.pypi.org project size limits.
2024-07-23 14:19:24 -07:00
Aleix Conchillo Flaqué
d3e85355f1 Merge pull request #318 from pipecat-ai/aleix/prepare-0.0.38
update CHANGELOG for 0.0.38
2024-07-23 14:12:01 -07:00
Aleix Conchillo Flaqué
83e730b768 update CHANGELOG for 0.0.38 2024-07-23 14:10:10 -07:00
Aleix Conchillo Flaqué
5fcc96446c Merge pull request #317 from pipecat-ai/aleix/silero-repo-params
vad(silero): expose cache and repo parameters
2024-07-23 12:13:20 -07:00
Aleix Conchillo Flaqué
ad88925154 vad(silero): expose cache and repo parameters 2024-07-23 12:12:28 -07:00
Aleix Conchillo Flaqué
0a6ddbf15c Merge pull request #316 from pipecat-ai/aleix/metrics-improvements
metrics improvements
2024-07-23 11:23:57 -07:00
Aleix Conchillo Flaqué
08e0722d97 fix initial metrics format 2024-07-23 11:23:03 -07:00
Aleix Conchillo Flaqué
05d4fba551 processors(rtvi): send initial empty metrics 2024-07-23 11:22:41 -07:00
Aleix Conchillo Flaqué
f41c2b3c9f transports(daily): don't send empty metrics 2024-07-23 11:22:41 -07:00
Aleix Conchillo Flaqué
69f64899fe pipeline: add send_initial_empty_metrics flag 2024-07-23 11:22:41 -07:00
Aleix Conchillo Flaqué
33f0865430 Merge pull request #315 from pipecat-ai/aleix/stop-transcription-error
transports(daily): wait until start|stop_transcription are finished
2024-07-23 11:18:59 -07:00
Aleix Conchillo Flaqué
ad5b9202ab transports(daily): wait until start|stop_transcription are finished
Fixes #305
2024-07-22 22:59:30 -07:00
Aleix Conchillo Flaqué
1676693091 Merge pull request #314 from pipecat-ai/aleix/transcription-timestamps
services: transcription timestamp should use ISO8601 format
2024-07-22 22:43:01 -07:00
Aleix Conchillo Flaqué
0852b50b8f services: transcription timestamp should use ISO8601 format 2024-07-22 22:40:28 -07:00
Aleix Conchillo Flaqué
eb998aa502 Merge pull request #312 from pipecat-ai/aleix/rtvi-support
RTVI support
2024-07-22 16:58:40 -07:00
Aleix Conchillo Flaqué
6dab0e9de7 update CHANGELOG for 0.0.37 2024-07-22 16:00:30 -07:00
Aleix Conchillo Flaqué
95ff1d141c update CHANGELOG with RTVIProcessor 2024-07-22 16:00:26 -07:00
Aleix Conchillo Flaqué
87bc8a9da6 examples: remove RTVI since there are full demos elsewhere 2024-07-22 15:53:39 -07:00
Aleix Conchillo Flaqué
087fe9a537 services(cartesia): fix TTFB 2024-07-22 15:30:16 -07:00
Aleix Conchillo Flaqué
c1170260b5 processors(rtvi): use generic LLM and TTS names 2024-07-22 15:27:33 -07:00
Aleix Conchillo Flaqué
65cdf50774 processors(rtvi): fix task cleanup 2024-07-22 15:01:45 -07:00
Aleix Conchillo Flaqué
9233bb490c processors(rtvi): add support for "tts-text" messages 2024-07-22 11:40:17 -07:00
Aleix Conchillo Flaqué
43932220f7 processors(rtvi): use only user-transcription 2024-07-22 09:40:16 -07:00
Aleix Conchillo Flaqué
cea4d1894e processors(rtvi): change voice before LLM updates 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
80baa0358d processors(rtvi): label is now rtvi 2024-07-22 09:32:18 -07:00
Chad Bailey
5d73db53a0 initial pseudo function calling 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
302ea90dce processors(rtvi): messages now require an id 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
37b04ed283 processors(rtvi): send a type=response as command responses 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
be6995cfdf processors(rtvi): renamed realtime-ai to rtvi 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
dfbc11300c processors(realtime-ai): use label instead of tag 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
82d539d174 processors(realtime-ai): add support for interrupting the bot 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
6e00f31014 updated CHANGELOG with new frames and realtime-ai changes 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
a46ac3cc92 examples: moved 18-realtime-ai.py to examples/realtime-ai 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
6fbf98d8e2 processors(realtime-ai): llm-context now uses a data field 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
f094c42728 processors(realtime-ai): add transcription messages 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
13827e1282 processors(realtime-ai): send a successful response for every command 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
32170b47d9 processors(realtime-ai): add user-[start|stopped]-speaking messages 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
09c05354c2 processors(realtime-ai): fix voice initialization 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
b0b1475563 processors(realtime-ai): add support for making the TTS speak 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
b85dd7283a processors(realtime-ai): add support for appending to the LLM context 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
846ae765e5 services(TTSService): fix sentence cleanup 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
4c629e538e processors(realtime-ai): add assistant before output transport
Cartesia can do word-to-word output instead of full sentences. This means that
for properly adding things into the context we need to add it before the
transport, otherwise some words might be lost.
2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
f6e22bb3b9 processors(realtime-ai): add silero vad to the transport 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
46a048d7f6 processors(realtime-ai): allow default setup to be None 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
bd9f4eea06 processors(realtime-ai): provide default values 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
0a672e61e2 processors(realtime-ai): update it to use groq by default 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
29a8530221 processors(realtime-ai): add support for updating config (model, voice...) 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
3e738642a7 processors(realtime-ai): add support for getting/updating LLM context 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
f551f55f03 examples: add new foundational/18-realtime-ai.py 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
9f012c8002 processors: add new RealtimeAIProcessor 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
0a69a9e5ef transport(daily): also accept TransportMessageFrame 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
194790183a processor: add support for setting a processor parent 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
2227721173 update CHANGELOG with StatelessTextTransformer fix (update) 2024-07-22 09:30:45 -07:00
Aleix Conchillo Flaqué
77a53da5f5 update CHANGELOG with StatelessTextTransformer fix 2024-07-22 09:28:38 -07:00
Aleix Conchillo Flaqué
ab63ff275d Merge pull request #310 from weedge/fix/StatelessTextTransformer
fix: push_frame use TextFrame
2024-07-22 09:25:27 -07:00
weedge
e5363f65f0 fix: push_frame use TextFrame
Signed-off-by: weedge <weege007@gmail.com>
2024-07-22 17:29:06 +08:00
Lewis Wolfgang
ffc157de65 Merge pull request #307 from pipecat-ai/lewis/increase_openai_keepalive_expiry
Allow openai http connections to remain open in the pool indefinitely.
2024-07-19 07:09:17 -04:00
Lewis Wolfgang
f9fdadb4c0 Allow openai http connections to remain open in the pool indefinitely.
Rather than expiring in 5 seconds.
2024-07-18 11:18:21 -04:00
Aleix Conchillo Flaqué
4efccb79f2 Merge pull request #306 from pipecat-ai/aleix/remove-llm-response-start-end-frame
remove LLMResponseStartFrame and LLMResponseEndFrame
2024-07-17 21:51:02 -07:00
Aleix Conchillo Flaqué
337968199a update CHANGELOG with CartesiaTTSService and TTSService updates 2024-07-17 20:58:10 -07:00
Aleix Conchillo Flaqué
37027f68cb remove LLMResponseStartFrame and LLMResponseEndFrame
This was added in the past to properly handle interruptions for the
LLMAssistantContextAggregator. But this is not necessary anymore since we can
handle interruptions by just processing the StartInterruptionFrame, so there's
no need for these extra frames.
2024-07-17 20:53:35 -07:00
Kwindla Hultman Kramer
d1b62c5495 Merge pull request #304 from pipecat-ai/khk/cartesia-continue
Cartesia streaming (WebSocket) and word-level timestamps support
2024-07-17 20:29:15 -07:00
Kwindla Hultman Kramer
355fe01cb7 fixed forgotten renames 2024-07-17 20:28:27 -07:00
Kwindla Hultman Kramer
9d050a16c7 committing an uncommitted file 2024-07-17 20:23:41 -07:00
Kwindla Hultman Kramer
fa53c67606 comments re fixes 2024-07-17 18:30:45 -07:00
Kwindla Hultman Kramer
5006376fe6 undo changes to 02-llm-say-one-thing.py 2024-07-17 15:18:47 -07:00
Kwindla Hultman Kramer
2204b8e205 cartesia streaming and context management via word-level timestamps 2024-07-17 15:17:00 -07:00
Kwindla Hultman Kramer
270007b17c wip - using cartesia word timestamps for context management 2024-07-17 14:13:52 -07:00
Kwindla Hultman Kramer
568eb2ef4c cartesia websockets and streaming 2024-07-17 14:13:52 -07:00
Kwindla Hultman Kramer
73ca9184a8 wip cartesia continuation (not working yet) 2024-07-17 14:13:52 -07:00
Aleix Conchillo Flaqué
5e8e11e16e pyproject: require python >= 3.10 2024-07-17 09:52:42 -07:00
Aleix Conchillo Flaqué
029bbc16f2 Merge pull request #286 from TomTom101/feat/regex_endofsentence
fix: No more falsely detect a sentence end on "U.S.A", "3:00 a.m."
2024-07-17 09:49:21 -07:00
Aleix Conchillo Flaqué
9e3d87e4f6 Merge pull request #291 from adidoit/main
Fix error with readme example - SyntaxError: positional argument follows keyword argument
2024-07-15 13:10:17 -04:00
Aleix Conchillo Flaqué
f1410a1127 Merge pull request #297 from wtlow003/main
fix: minor typo
2024-07-15 13:08:23 -04:00
wtlow003
2b980d16c3 fix: minor typo 2024-07-12 18:27:57 +08:00
Adi Pradhan
b2b97aafb8 fix error with readme example - SyntaxError: positional argument follows keyword argument 2024-07-10 09:50:20 -04:00
TomTom101
da2082b025 chore: Combined combinable lookaheads 2024-07-06 11:11:40 +02:00
TomTom101
327ea9d547 chore: Make it a const 2024-07-06 11:08:51 +02:00
TomTom101
b23db4a202 chore: commented regex 2024-07-06 11:06:52 +02:00
TomTom101
d1a36004ab fix: No more falsely detect a sentence end on "U.S.A", "3:00 a.m." and more 2024-07-06 11:01:32 +02:00
Jon Taylor
6071920c45 Merge pull request #284 from pipecat-ai/jpt/storybot-load-balance
Update storybot demo
2024-07-03 19:48:32 +01:00
Jon Taylor
5f539e1fba fixed teardown 2024-07-03 17:02:54 +01:00
Jon Taylor
8e1539c360 virtualized deployment and added room-based balancing 2024-07-03 16:48:14 +01:00
Aleix Conchillo Flaqué
065cfb2aca Merge pull request #280 from pipecat-ai/aleix/library-updates-070224
library updates 070224 and pipecat 0.0.36
2024-07-02 10:14:03 -07:00
Aleix Conchillo Flaqué
3147534e86 update CHANGELOG for 0.0.36 2024-07-02 10:13:26 -07:00
Aleix Conchillo Flaqué
be5603bf16 examples: fix 06a-image-sync.py 2024-07-02 10:11:50 -07:00
Aleix Conchillo Flaqué
b9b0bcdcbd services(azure): close the audio stream on exit 2024-07-02 10:11:35 -07:00
Aleix Conchillo Flaqué
5bcece56f3 services(cartesia): make sure we close the client on exit 2024-07-02 10:11:16 -07:00
Aleix Conchillo Flaqué
d67faef88c pyproject: multiple library updates 2024-07-02 09:05:37 -07:00
Aleix Conchillo Flaqué
8f6db5e905 Merge pull request #279 from pipecat-ai/aleix/gladia-stt-support
add Gladia STT support
2024-07-02 08:07:35 -07:00
Aleix Conchillo Flaqué
82e93a0560 use exclude_none=True when dumping BaseModels 2024-07-02 08:03:31 -07:00
Aleix Conchillo Flaqué
a9a82c083b services: add GladiaSTTService support 2024-07-02 08:03:29 -07:00
Aleix Conchillo Flaqué
974d9c33ed Merge pull request #278 from pipecat-ai/aleix/detect-user-idle
add support for detecting user idle
2024-07-02 08:01:27 -07:00
Jon Taylor
c1957ab694 Merge pull request #274 from pipecat-ai/jpt/deployment-examples
Example deployment pattern for fly.io
2024-07-02 10:17:13 +01:00
Jon Taylor
b20a10a4bc fixed double fly 2024-07-02 10:17:01 +01:00
Aleix Conchillo Flaqué
be14ce465d transports(daily): make sure we don't send data if client is closed 2024-07-01 18:26:13 -07:00
Aleix Conchillo Flaqué
d1ca0c5614 examples: added new 17-detect-user-idle.py 2024-07-01 18:17:43 -07:00
Aleix Conchillo Flaqué
535514f506 processors: added new UserIdleProcessor 2024-07-01 18:17:43 -07:00
Aleix Conchillo Flaqué
933b63cf13 processors: added new IdleFrameProcessor 2024-07-01 14:57:42 -07:00
Aleix Conchillo Flaqué
d7c3e380a5 added BotSpeakingFrame 2024-07-01 14:57:18 -07:00
Aleix Conchillo Flaqué
c5298f78cb add more missing keyword-only arguments 2024-07-01 12:34:53 -07:00
Jon Taylor
4f8f7b8d1d added on_call_state event to prevent idle vms 2024-07-01 19:21:16 +01:00
Aleix Conchillo Flaqué
d7d46919ac update macos-py3.10-requirements.txt 2024-07-01 11:00:59 -07:00
Aleix Conchillo Flaqué
e5d73d2e2e update linux-py3.10-requirements.txt 2024-07-01 10:58:49 -07:00
Aleix Conchillo Flaqué
b145e8ec90 update README with XTTS 2024-07-01 10:49:43 -07:00
Aleix Conchillo Flaqué
97ff4a1fb8 Merge pull request #275 from pipecat-ai/aleix/add-missing-keyword-separators
add missing keyword separators
2024-07-01 10:45:31 -07:00
Aleix Conchillo Flaqué
5018a552c1 services(xtts): no need for the WAV header 2024-07-01 10:44:32 -07:00
Aleix Conchillo Flaqué
7f9fd9ffce examples: added 07i-interruptible-xtts 2024-07-01 10:41:34 -07:00
Aleix Conchillo Flaqué
ddd0ca6a8f update CHANGELOG 2024-07-01 10:27:26 -07:00
Aleix Conchillo Flaqué
06f817c7e3 transport(websocket): don't send if serializer returns None 2024-07-01 10:27:26 -07:00
Aleix Conchillo Flaqué
df4c3e56c4 services: add missing * keyword separator 2024-07-01 10:27:26 -07:00
Aleix Conchillo Flaqué
9d5c2b9656 Merge pull request #276 from eddieoz/feature/xtts
Added service XTTS
2024-07-01 10:26:53 -07:00
eddieoz
7ce59c5e2e added service xtts 2024-07-01 20:17:19 +03:00
Aleix Conchillo Flaqué
1c9631fc78 Merge pull request #271 from pipecat-ai/aleix/silero-vad-version
vad(silero): allow specifying a Silero VAD version
2024-07-01 09:39:59 -07:00
Aleix Conchillo Flaqué
efbe7297f7 vad(silero): allow specifying a Silero VAD version 2024-07-01 09:38:43 -07:00
Aleix Conchillo Flaqué
1b45946a61 Merge pull request #270 from pipecat-ai/aleix/async-frame-processor
add new AsyncFrameProcessor and AsyncAIService
2024-07-01 09:37:51 -07:00
Aleix Conchillo Flaqué
cbf5a6362c add new AsyncFrameProcessor and AsyncAIService 2024-07-01 09:37:02 -07:00
Aleix Conchillo Flaqué
583b96c341 Merge pull request #269 from pipecat-ai/aleix/improve-error-handling
improve error handling and don't swallow exceptions
2024-07-01 09:36:00 -07:00
Aleix Conchillo Flaqué
fc0920504d improve error handling and don't swallow exceptions 2024-07-01 09:35:45 -07:00
Aleix Conchillo Flaqué
abd65a93b2 Merge pull request #268 from pipecat-ai/aleix/websocket-dont-send-if-closed
transports(websocket): don't send data if websocket closed
2024-07-01 09:33:45 -07:00
Aleix Conchillo Flaqué
c3244fdd7a transports(websocket): don't send data if websocket closed 2024-07-01 09:31:58 -07:00
Aleix Conchillo Flaqué
e8f58938b0 Merge pull request #267 from pipecat-ai/aleix/processing-metrics
add support for processing metrics
2024-07-01 09:31:05 -07:00
Jon Taylor
602b4f34b1 added example fly.toml 2024-07-01 16:50:53 +01:00
Jon Taylor
0399c84dfa added flyio deployment example 2024-07-01 16:46:38 +01:00
Aleix Conchillo Flaqué
fd5d879bf5 add support for processing metrics
Processing metrics indicate how much time a processor takes to generate all of
its output.
2024-06-28 14:26:57 -07:00
Aleix Conchillo Flaqué
8dff460307 Merge pull request #266 from pipecat-ai/aleix/silero-num-frames-fixes
vad: fix Silero VAD required number of frames
2024-06-28 11:25:55 -07:00
Aleix Conchillo Flaqué
cce1ddb183 vad: fix Silero VAD required number of frames 2024-06-28 10:45:48 -07:00
Aleix Conchillo Flaqué
8691d14289 Merge pull request #255 from Viking5274/main
Fix twilio error
2024-06-26 10:17:03 -07:00
daniil5701133
dd402da9e5 added handling streamSid after first wss connect
fix name
2024-06-26 18:56:30 +03:00
Aleix Conchillo Flaqué
2fd04248f1 examples(storytelling-chatbot): upgrade npm vulnerabilities 2024-06-25 22:04:55 -07:00
Aleix Conchillo Flaqué
0ac42006f8 Merge pull request #260 from pipecat-ai/aleix/more-interruption-fixes
more interruption fixes
2024-06-25 21:52:02 -07:00
Aleix Conchillo Flaqué
66e331248d update CHANGELOG for 0.0.34 2024-06-25 21:43:23 -07:00
Aleix Conchillo Flaqué
4be3e8c87d aggregators: revert using intermediate results 2024-06-25 21:33:17 -07:00
Aleix Conchillo Flaqué
dac033fe61 services(azure): allow transcriptions during interruptions
If the user interrupts we can't just discard transcriptions because the user is
actually interrupting and talking.
2024-06-25 21:33:06 -07:00
Aleix Conchillo Flaqué
d302cbb114 services(deepgram): allow transcriptions during interruptions
If the user interrupts we can't just discard transcriptions because the user is
actually interrupting and talking.
2024-06-25 21:32:21 -07:00
Aleix Conchillo Flaqué
e3b407db28 Merge pull request #259 from pipecat-ai/aleix/prepare-0.0.33
update CHANGELOG for 0.0.33
2024-06-25 12:05:07 -07:00
Aleix Conchillo Flaqué
4ef623f09e update CHANGELOG for 0.0.33 2024-06-25 11:53:07 -07:00
Aleix Conchillo Flaqué
253530a63d Merge pull request #258 from pipecat-ai/aleix/upgrade-cartesia-1.0.0
services(cartesia): upgrade to new cartesia 1.0.0
2024-06-25 11:52:04 -07:00
Aleix Conchillo Flaqué
4f38d989f5 services(cartesia): upgrade to new cartesia 1.0.0 2024-06-25 11:51:34 -07:00
Aleix Conchillo Flaqué
84074e90ee Merge pull request #257 from pipecat-ai/aleix/cancel-all-tasks-when-interrutpted
cancel all tasks when interrupted
2024-06-25 11:16:00 -07:00
Aleix Conchillo Flaqué
38aee7d8f2 services(azure): cancel tasks when interrupted and ignore incoming transcriptions 2024-06-25 11:15:26 -07:00
Aleix Conchillo Flaqué
64198313c6 services(deepgram): cancel tasks when interrupted and ignore incoming transcriptions 2024-06-25 11:15:07 -07:00
Aleix Conchillo Flaqué
d61b6c301c transports(base_input): create push tasks after pushing interruption 2024-06-25 11:15:07 -07:00
Aleix Conchillo Flaqué
83d1931266 Merge pull request #256 from pipecat-ai/aleix/tts-cleanup-when-interrupted
services(tts): strip before TTS and cleanup when interrupted
2024-06-25 11:14:32 -07:00
Aleix Conchillo Flaqué
c31f2ab285 services(tts): strip before TTS and cleanup when interrupted 2024-06-25 11:13:19 -07:00
Aleix Conchillo Flaqué
0ddc5721b4 Merge pull request #252 from pipecat-ai/aleix/daily-check-size-read-audio-frames
transports(daily): always check size of read audio frames
2024-06-25 09:45:05 -07:00
Aleix Conchillo Flaqué
98bd183bc4 pyproject: fix cartesia version and update requirements files 2024-06-25 09:43:54 -07:00
Aleix Conchillo Flaqué
aaa154524c Merge pull request #253 from pipecat-ai/aleix/llm-response-use-intermediate-results
aggregators: uses intermediate results for LLMAssistantResponseAggreg…
2024-06-24 19:21:14 -07:00
Aleix Conchillo Flaqué
beced68337 aggregators: uses intermediate results for LLMAssistantResponseAggregator 2024-06-24 17:33:45 -07:00
Aleix Conchillo Flaqué
94823ab952 transports(daily): always check size of read audio frames 2024-06-24 14:56:24 -07:00
Kwindla Hultman Kramer
0b6a19802f Merge pull request #250 from pipecat-ai/lewis/flush-tts-on-llm-response-end
Flush output from TTSService on LLMFullResponseEndFrame
2024-06-22 20:37:45 -04:00
Lewis Wolfgang
c4a2d2197c Flush output from TTSService on LLMFullResponseEndFrame
To cover cases when the LLM response does not end in punctuation.
2024-06-22 14:57:44 -04:00
Aleix Conchillo Flaqué
269d06aa15 Merge pull request #249 from pipecat-ai/aleix/pipecat-0.0.32
update CHANGELOG.md for 0.0.32
2024-06-22 09:21:21 -07:00
Aleix Conchillo Flaqué
dfef1f2c54 update CHANGELOG.md for 0.0.32 2024-06-22 09:19:22 -07:00
Aleix Conchillo Flaqué
b62beaba0b Merge pull request #248 from pipecat-ai/aleix/deepgramstt-url
services(deepgram): add url to DeepgramSTTService
2024-06-21 22:26:23 -07:00
Aleix Conchillo Flaqué
adf414e40f services(deepgram): add url to DeepgramSTTService 2024-06-21 16:52:28 -07:00
Aleix Conchillo Flaqué
dc64e57f63 Merge pull request #241 from pipecat-ai/aleix/transports-async
transports: fully use asyncio in all read/write operations
2024-06-21 16:00:08 -07:00
Aleix Conchillo Flaqué
d3e410b2ac transports: fully use asyncio in all read/write operations 2024-06-21 15:55:15 -07:00
Aleix Conchillo Flaqué
c544b2474b update linux-py3.10-requirements with fastapi and new daily-python 2024-06-21 15:44:01 -07:00
Aleix Conchillo Flaqué
18243de358 add fastapi and update macos-py3.10-requirements.txt 2024-06-21 13:16:47 -07:00
Aleix Conchillo Flaqué
6625895d1f update macos-py3.10-requirements.txt 2024-06-21 13:13:02 -07:00
Aleix Conchillo Flaqué
f9ecce739e Merge pull request #247 from pipecat-ai/aleix/twilio-updates
some twilio updates
2024-06-21 10:14:40 -07:00
Aleix Conchillo Flaqué
0075dd8386 update linux/macos-py3.10-requirements.txt 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
eef1cde816 updated CHANGELOG.md with fastapi and twilio updates 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
8d867c30c6 transports(websocket): verify websockets module 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
42c668b7ae examples(twilio-chatbot): update instructions and renames 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
b62227b4ae serializers(twilio): formatting and allow str | bytes | None 2024-06-21 09:47:17 -07:00
Aleix Conchillo Flaqué
25ef0cb87b serializers: allow str | bytes | None 2024-06-21 09:42:43 -07:00
Aleix Conchillo Flaqué
e195941aa5 Merge pull request #246 from pipecat-ai/aleix/daily-dialout-answered
transports(daily): added dialout_answered event
2024-06-20 18:37:24 -07:00
Aleix Conchillo Flaqué
e09eef1dd7 Merge pull request #243 from Viking5274/main
Add twilio_websocket_service with example
2024-06-20 14:09:48 -07:00
Aleix Conchillo Flaqué
7c13663a4e transports(daily): added dialout_answered event 2024-06-20 13:01:25 -07:00
daniil5701133
5753869e5e add twilio-chatbot example with README.md info how to start app
created twilio_websocket_service.py, TwilioFrameSerializer.py

moved pcm_16000_to_ulaw_8000 and ulaw_8000_to_pcm_16000 to src/pipecat/utils/audio.py
fixed callback on disconnect
2024-06-20 23:00:01 +03:00
chadbailey59
ba878a19f4 fixed "Dr." interruption (#245) 2024-06-19 20:53:04 -05:00
Aleix Conchillo Flaqué
55a9de78cd Merge pull request #239 from pipecat-ai/aleix/azure-stt
azure stt support
2024-06-14 14:07:07 +08:00
Aleix Conchillo Flaqué
ff51fc9091 updated CHANGELOG and README 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
a4f857ee34 examples: use new AzureSTTService in 07f-interruptible-azure 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
3250d74bef services(azure): new AzureSTTService 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
c086160239 examples: cleanup some 07 interruptible examples 2024-06-13 16:36:10 -07:00
Aleix Conchillo Flaqué
6cdccaff53 Merge pull request #238 from pipecat-ai/aleix/pipecat-0.0.31
pipecat 0.0.31
2024-06-14 06:31:41 +08:00
Aleix Conchillo Flaqué
a9ab8de25d update CHANGELOG for 0.0.31 2024-06-13 15:31:03 -07:00
Aleix Conchillo Flaqué
2a29cb18a5 transports(base_output): chunk audio into 20ms instead of 10ms 2024-06-13 15:30:41 -07:00
Aleix Conchillo Flaqué
4193a4f415 Merge pull request #237 from pipecat-ai/aleix/pipecat-0.0.30
update CHANGELOG for 0.0.30
2024-06-14 05:28:14 +08:00
Aleix Conchillo Flaqué
0226ec450a update CHANGELOG for 0.0.30 2024-06-13 14:27:37 -07:00
Aleix Conchillo Flaqué
020b8ebb35 Merge pull request #236 from pipecat-ai/aleix/report-only-initial-ttfb
report only initial ttfb
2024-06-14 05:24:52 +08:00
Aleix Conchillo Flaqué
1170b30c1b aggregator(user_response): also handle small VADParams.stop_secs 2024-06-13 13:30:31 -07:00
Aleix Conchillo Flaqué
0004d4a906 vad: reduce smoothing factor and increase confidence 2024-06-13 13:30:11 -07:00
Aleix Conchillo Flaqué
cb27e86266 metrics: allow sending only initial TTFB metrics 2024-06-13 13:30:00 -07:00
Aleix Conchillo Flaqué
77a3b2ea5c Merge pull request #235 from pipecat-ai/aleix/openpipe-refactoring
openpipe refactoring
2024-06-14 01:28:50 +08:00
Aleix Conchillo Flaqué
099e65f3b6 report processor name in error logs 2024-06-13 10:20:45 -07:00
Aleix Conchillo Flaqué
befb8db120 update pyproject and requirements 2024-06-13 10:20:45 -07:00
Aleix Conchillo Flaqué
9992d826b1 examples: renamed 06b-listen... to 07h-inte... 2024-06-13 10:18:20 -07:00
Aleix Conchillo Flaqué
18604e1a39 re-add removed CHANGELOG lines 2024-06-13 10:18:20 -07:00
Aleix Conchillo Flaqué
312c569182 services(openpipe): refactored so it's based on BaseOpenAILLMService 2024-06-13 09:30:50 -07:00
Aleix Conchillo Flaqué
b43e0ed130 Merge pull request #233 from KwalAI/openpipe-integration
OpenPipe Integration
2024-06-13 22:41:57 +08:00
Aleix Conchillo Flaqué
289debea34 Merge pull request #234 from pipecat-ai/aleix/fix-daily-room-properties-exp
transports(helpers): fix DailyRoomProperties.exp
2024-06-13 22:38:41 +08:00
Aleix Conchillo Flaqué
ccd6af7016 transports(helpers): fix DailyRoomProperties.exp 2024-06-12 23:15:22 -07:00
Ankur Duggal
effc69e4e4 formatting 2024-06-12 15:01:19 -07:00
Ankur Duggal
c7a0d0db64 OpenPipe Integration 2024-06-12 14:23:56 -07:00
Aleix Conchillo Flaqué
50d69a1ca4 Merge pull request #231 from pipecat-ai/aleix/websocket-deserializer-none
serializer: allow deserialize() to return None
2024-06-13 04:36:03 +08:00
Aleix Conchillo Flaqué
8a6b8fe70a Merge pull request #232 from pipecat-ai/aleix/pyproject-deepgram
pyproject: add deepgram-sdk
2024-06-13 03:53:08 +08:00
Aleix Conchillo Flaqué
c4e53aea71 update macos-py3.10-requirements with deepgram 2024-06-12 12:52:20 -07:00
Aleix Conchillo Flaqué
ad5125e93f pyproject: add deepgram-sdk 2024-06-12 12:50:18 -07:00
Aleix Conchillo Flaqué
8d92cbac93 Merge pull request #230 from pipecat-ai/aleix/processor-names
processor names
2024-06-13 03:16:07 +08:00
Aleix Conchillo Flaqué
0225443ec8 transports(base): always send MetricsFrame 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
71e1d0a334 pipeline: send initial TTFB metrics from PipelineTask 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
83f69e02fd allow specifying frame processor names 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
e1b2da1ff0 serializer: allow deserialize() to return None 2024-06-12 12:11:36 -07:00
Kwindla Hultman Kramer
5eb1b90a4b Merge pull request #229 from pipecat-ai/khk-deepgram-url-configurable
Deepgram TTS service improvements
2024-06-12 14:52:04 -04:00
Kwindla Hultman Kramer
9c4ee74b91 bot to test for demo 2024-06-12 10:41:49 -07:00
Aleix Conchillo Flaqué
f65f566829 re-add transports/services/helpers/__init__.py 2024-06-12 10:37:28 -07:00
Aleix Conchillo Flaqué
c8ad3123b7 Merge pull request #207 from pipecat-ai/dialin-example
New example: Dialin bot (call your Pipecat via phone)
2024-06-13 01:36:00 +08:00
Jon Taylor
8cefce28cf added example fly toml 2024-06-12 10:35:03 -07:00
Jon Taylor
a834d26885 removed https from daily boy 2024-06-12 10:35:03 -07:00
Jon Taylor
810e3cd551 added fly.example.toml due to gitignore 2024-06-12 10:35:03 -07:00
Jon Taylor
f258fa96cd added env to dockerignore 2024-06-12 10:35:03 -07:00
Jon Taylor
757ec61f14 added deepgram to readme 2024-06-12 10:35:03 -07:00
Jon Taylor
2c933f43d8 linting errors and removed unused sip url 2024-06-12 10:35:03 -07:00
Jon Taylor
cc5bfa8af8 removed helps and fixed linting 2024-06-12 10:35:03 -07:00
Jon Taylor
de9f3e55f1 new example: dialin 2024-06-12 10:35:03 -07:00
Aleix Conchillo Flaqué
ed0c986218 Merge pull request #228 from pipecat-ai/aleix/websocket-fixes
websocket fixes
2024-06-13 01:30:21 +08:00
Aleix Conchillo Flaqué
72c27215b6 transports(websocket): use push_audio_frame() 2024-06-12 10:29:39 -07:00
Aleix Conchillo Flaqué
c23b14f768 examples: use DeepgramSTTService in websocket-server 2024-06-12 10:29:22 -07:00
Aleix Conchillo Flaqué
81282f9c4d services(deepgram): keep connection alive 2024-06-12 10:29:22 -07:00
Aleix Conchillo Flaqué
2b324f6f81 Merge pull request #227 from pipecat-ai/aleix/daily-room-properties-extra
transports(daily): DailyRoomProperties now allow extra unknown parame…
2024-06-13 00:25:07 +08:00
Kwindla Hultman Kramer
049f110344 PipelineTask should not exit when Deepgram TTS returns a Bad Request "unutterable" 2024-06-12 09:24:09 -07:00
Kwindla Hultman Kramer
448a0307a8 rebasing 2024-06-12 07:54:18 -07:00
Aleix Conchillo Flaqué
7390e42f5c transports(daily): DailyRoomProperties now allow extra unknown parameters 2024-06-11 22:31:32 -07:00
Aleix Conchillo Flaqué
ee880d229f Merge pull request #223 from pipecat-ai/aleix/fix-lower-vad-stop-secs
processors: fix LLMResponseAggregator with lower VAD values
2024-06-12 13:30:34 +08:00
Aleix Conchillo Flaqué
9cd07d81f8 processors: fix LLMResponseAggregator with lower VAD values 2024-06-11 22:30:06 -07:00
Aleix Conchillo Flaqué
b453d089c3 Merge pull request #226 from pipecat-ai/aleix/chunk-audio-output
transport: chunk longer audio frames
2024-06-12 13:28:28 +08:00
Aleix Conchillo Flaqué
7410fe1d1e transport: chunk longer audio frames 2024-06-11 17:50:51 -07:00
Aleix Conchillo Flaqué
6323a77431 Merge pull request #224 from pipecat-ai/aleix/deepgram-stt-simple
deepgram stt simple
2024-06-12 08:48:19 +08:00
Aleix Conchillo Flaqué
0aedaa8553 services(deepgram): abstract StartFrame/EndFrame/CancelFrame 2024-06-10 21:18:42 -07:00
Aleix Conchillo Flaqué
6554479d39 transports: don't queue system frames 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
ce2ebd3198 examples: updated 07c-interruptible-deepgram to use DeepgramSTTService 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
13ea1efc96 examples: add new 13b-deepgram-transcription 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
ef380321cf services: added new DeepgramSTTService 2024-06-10 21:00:01 -07:00
Kwindla Hultman Kramer
294b037730 configurable deepgram base url 2024-06-08 09:38:48 -04:00
Aleix Conchillo Flaqué
7603996612 Merge pull request #220 from pipecat-ai/aleix/pipecat-0.0.29
update CHANGELOG for 0.0.29
2024-06-08 04:43:52 +08:00
Aleix Conchillo Flaqué
3048d2b0b1 update CHANGELOG for 0.0.29 2024-06-07 13:43:00 -07:00
Aleix Conchillo Flaqué
0bb47a09d2 Merge pull request #218 from pipecat-ai/aleix/send-inital-metrics-mapping
send initial metrics mapping
2024-06-08 04:41:59 +08:00
Aleix Conchillo Flaqué
1afe6901d9 processors: add processors_with_metrics() and can_generate_metrics() 2024-06-07 13:38:21 -07:00
Aleix Conchillo Flaqué
3e019fb512 services(openai): remove unused _chat_completions 2024-06-07 13:18:11 -07:00
Aleix Conchillo Flaqué
e069aa9608 updated CHANGELOG with BasePipeline 2024-06-07 13:18:09 -07:00
Aleix Conchillo Flaqué
0b32e42d25 transports(daily): fix extra super().process_frame() 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
8d18be5069 services(anthropic): fix metrics 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
e715d99d0c pipeline: send initial ttfb metrics mapping 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
dc28590247 moved ParallelTask to pipecat.pipeline.parallel_task 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
139f158ea1 Merge pull request #219 from pipecat-ai/aleix/switch-voices
switch voices and languages
2024-06-08 04:13:25 +08:00
Aleix Conchillo Flaqué
4b2a18837f services(whisper): add text logging 2024-06-07 13:12:51 -07:00
Aleix Conchillo Flaqué
b4340d0185 services(whisper): increase no speech probability to 0.4 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
90d11398e6 examples: add 15a-switch-languages 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
bf8c73b25b examples: add 15-switch-voices 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
21cd21de1b processors(filters): add FunctionFilter 2024-06-07 13:12:18 -07:00
Aleix Conchillo Flaqué
c25f6e56e7 Merge pull request #217 from pipecat-ai/khk-tts-timings
Added TTFB timings for all TTS services
2024-06-07 05:42:52 +08:00
Aleix Conchillo Flaqué
a1f1d1995c transports: allow sending metrics 2024-06-06 14:35:34 -07:00
Aleix Conchillo Flaqué
390582d7f3 services: use start/stop_ttfb_metrics to report TTFB metrics 2024-06-06 14:00:10 -07:00
Aleix Conchillo Flaqué
e765a29ca2 processors: implement base process_frame(). all subclasses should call it 2024-06-06 10:54:21 -07:00
Kwindla Hultman Kramer
cf5c244487 Merge branch 'main' into khk-tts-timings 2024-06-06 13:05:42 -04:00
Kwindla Hultman Kramer
a5eb30a93d changelog 2024-06-06 11:49:05 -04:00
Kwindla Hultman Kramer
ac7bc35944 azure tts ttfb 2024-06-06 11:45:48 -04:00
Kwindla Hultman Kramer
ddfd721f6e openai tts ttfb 2024-06-06 11:32:47 -04:00
Kwindla Hultman Kramer
aee3916cd1 cartesia async fixed 2024-06-06 11:24:26 -04:00
Kwindla Hultman Kramer
3eff1e559b pipecat async working, but maybe needs a threaded implementation 2024-06-06 11:11:06 -04:00
Kwindla Hultman Kramer
1a542c91fa temp commit, working on playht 2024-06-06 10:48:22 -04:00
Aleix Conchillo Flaqué
cd60a84f8a Merge pull request #215 from pipecat-ai/aleix/silero-vad-memory-fix
vad(silero): fix memory issue
2024-06-06 05:50:47 +08:00
Aleix Conchillo Flaqué
3dd4bac6e6 vad(silero): fix memory issue 2024-06-05 14:50:28 -07:00
Kwindla Hultman Kramer
06ff9cfede added timing logs for cartesia, deepgram, elevenlabs 2024-06-05 16:12:10 -04:00
Aleix Conchillo Flaqué
2d1ed9a304 Merge pull request #214 from pipecat-ai/aleix/pipecat-0.0.27
transports(daily): added participants() and participant_counts()
2024-06-06 03:15:34 +08:00
Aleix Conchillo Flaqué
50b51c05f6 transports(daily): added participants() and participant_counts() 2024-06-05 12:14:00 -07:00
Aleix Conchillo Flaqué
5ce4b8dd5b update CHANGELOG with OpenAITTSService 2024-06-05 11:44:24 -07:00
Aleix Conchillo Flaqué
2f4467b5a5 Merge pull request #213 from pipecat-ai/aleix/pipecat-0.0.26
update CHANGELOG for 0.0.26
2024-06-06 01:10:01 +08:00
Aleix Conchillo Flaqué
e91ab54a69 update CHANGELOG for 0.0.26 2024-06-05 10:07:45 -07:00
Aleix Conchillo Flaqué
6a33432c82 Merge pull request #212 from pipecat-ai/aleix/make-pinlesscallupdate-public
transports(daily): move pinlessCallUpdate to public api
2024-06-05 23:14:14 +08:00
Aleix Conchillo Flaqué
135654a080 transports(daily): move pinlessCallUpdate to public api 2024-06-05 08:08:56 -07:00
Aleix Conchillo Flaqué
7b708a2bee Merge pull request #211 from pipecat-ai/aleix/base-transport-async
various fixes and improvements
2024-06-05 22:57:35 +08:00
Aleix Conchillo Flaqué
b515c28417 services(cartesia): allow output_format and model_id 2024-06-04 19:24:33 -07:00
Aleix Conchillo Flaqué
854ffb0323 update CHANGELOG for DailyRESTHelper 2024-06-04 15:45:17 -07:00
Aleix Conchillo Flaqué
891b7b22ea transports: push EndFrame/CancelFrame before stopping push task 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
c8d37a7227 pipeline(runner): add support for SIGTERM 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
489060881d update macos-py3.10-requirements 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
d56a4cce1b update CHANGELOG with latest changes 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
7eb9dfde38 pyproject: include langchain-community and langchain-openai 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
571e10f83e services(anthropic): fix interruptions with anthropic 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
af202d4fe5 pipeline(task): introduce has_finished() 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
4057fbbcfd transports(tk): fix pyaudio output stream cleanup 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
5cdb8a79a1 examples: use camera_out_is_live for live video 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
a674b43243 transport: remove redundant camera thread and switch audio pull for push 2024-06-04 15:43:54 -07:00
Jon Taylor
ac41f13b7c Merge pull request #205 from pipecat-ai/daily_rest_helpers
Created REST helpers for Daily covering commonly used methods for running / deployment
2024-06-04 22:26:39 +02:00
Jon Taylor
003b9887b1 made sip and sipuri optional and None 2024-06-04 19:03:58 +02:00
Jon Taylor
ba45c2ab5b addressed review (urllib import and linting) 2024-06-04 18:39:35 +02:00
Aleix Conchillo Flaqué
9d36a48a80 Merge pull request #208 from pipecat-ai/aleix/cartesia-voice-load-startup
services(cartesia): load voices on startup
2024-06-04 22:54:25 +08:00
Aleix Conchillo Flaqué
20a525635e Merge pull request #201 from TomTom101/TomTom101/openai_tts
Added OpenAI TTS (#196)
2024-06-04 22:53:56 +08:00
Aleix Conchillo Flaqué
659eceea95 services(cartesia): load voices on startup 2024-06-03 14:08:04 -07:00
TomTom101
d462c03d00 chore: Review comments 2024-06-03 20:13:15 +02:00
Jon Taylor
6591e07eb4 removed hardcoded 'https' from API url 2024-06-03 19:32:14 +02:00
Aleix Conchillo Flaqué
fe71825954 Merge pull request #206 from pipecat-ai/aleix/fix-deepgram-tts
services(deepgram): fixed DeepgramTTSService
2024-06-04 00:28:53 +08:00
Aleix Conchillo Flaqué
43516f84fe services(deepgram): fixed DeepgramTTSService 2024-06-03 07:53:46 -07:00
Jon Taylor
0849edb00b added Daily REST helpers file for common methods used in Pipecat bots 2024-06-03 16:38:13 +02:00
Aleix Conchillo Flaqué
dd3b4083eb Merge pull request #204 from TomTom101/TomTom101/langchain
fix: Fixed imports, support new PipelineParams
2024-06-03 03:16:30 +08:00
TomTom101
89673a4040 test(langchain): Use new PipelineParams in test 2024-06-02 20:19:55 +02:00
TomTom101
410dbd3dfc fix: Fixed imports, support new PipelineParams 2024-06-02 20:16:11 +02:00
TomTom101
7085b1ea3f doc(openai): Added hint re the 24kHz sample rate 2024-06-01 20:35:46 +02:00
TomTom101
8683cae719 feat: OpenAITTS 2024-06-01 10:13:28 +02:00
Aleix Conchillo Flaqué
0197efa524 Merge pull request #200 from pipecat-ai/aleix/changelog-0.0.25
update CHANGELOG.md for version 0.0.25
2024-06-01 07:48:42 +08:00
Aleix Conchillo Flaqué
16e76caa33 update CHANGELOG.md for version 0.0.25 2024-05-31 16:48:03 -07:00
Aleix Conchillo Flaqué
1f5240694d Merge pull request #199 from pipecat-ai/aleix/langchain-changelog
move LangchainProcessor to processors/frameworks and update CHANGELOG
2024-06-01 07:46:51 +08:00
Aleix Conchillo Flaqué
f087151db7 move LangchainProcessor to processors/frameworks and update CHANGELOG 2024-05-31 16:45:39 -07:00
Aleix Conchillo Flaqué
0b691ff597 Merge pull request #198 from pipecat-ai/aleix/websocket-transport
websocket transport support
2024-06-01 04:40:39 +08:00
TomTom101
ae049961b7 wip: untested 2024-05-31 22:30:52 +02:00
Aleix Conchillo Flaqué
0d6eee705f Merge pull request #190 from TomTom101/TomTom101/langchain
Langchain service
2024-06-01 04:21:12 +08:00
Aleix Conchillo Flaqué
58d20ec9dc transport(websocket-server): add on_client_disconnected 2024-05-31 12:52:43 -07:00
Aleix Conchillo Flaqué
38befe1dc1 examples(websocket): rename server.py to bot.py 2024-05-31 12:09:54 -07:00
Aleix Conchillo Flaqué
2f335100a5 remove storage folder 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
3fef818843 examples(websocket-server): use VAD analyzer from transport 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
428c8af77e transports(websocket): base class from BaseInputTransport 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
54fccd2e25 pipeline: cleanup processors one by one 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
66c6a5dc0f transports(websocket): base class from BaseOutputTransport 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
92561ae19d some event loop parameter updates 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
b85e93410b transports(daily): fix event handlers callback 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
593993ba97 transports(base_input): remove unnecessary task 2024-05-31 11:37:41 -07:00
Aleix Conchillo Flaqué
7b8b606278 update CHANGELOG and create websocket-server instructions 2024-05-31 11:37:19 -07:00
Aleix Conchillo Flaqué
7116ad0607 examples: fix websocket-client audio playback 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
c507044277 examples: use gpt-4o model by default 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
5f45a9d90f examples: websocket-server updates 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e31e87aabd transport(websocket): update audio_frame_size 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
2957416d90 serializers(protobuf): support id and name fields 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
b9b761b67a added sample_rate and num_channels to protobuf AudioRawFrame 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
a7539e9317 transports: simplify and fix async and nested decorators 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
75575c0c68 use get_event_loop() and move event handlers to BaseTransport 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
77b3e08214 examples: add and update websocket examples 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
956b783c1a transports: added new WebsocketServerTransport 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e90c080470 serializers: added BaseSerializer 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
37aabaa03a frames: generate protobuf pb2 file for pipecat package 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
3e289a7bef pyproject: add protobuf dependency 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
6dd5e3fdf5 dev-requirements: add grpcio-tools 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e60df3c7c0 Merge pull request #195 from pipecat-ai/aleix/function-calling-move-to-llmservice
function calling move to LLMService
2024-06-01 02:36:29 +08:00
Aleix Conchillo Flaqué
42f772beed examples: some function calling examples cleanup 2024-05-31 11:36:04 -07:00
Aleix Conchillo Flaqué
3655c4a0fc services: move function calling registration to LLMService 2024-05-31 11:36:04 -07:00
Aleix Conchillo Flaqué
012dbffd94 update CHANGELOG.md for function calling 2024-05-31 11:36:03 -07:00
TomTom101
4b39efeee3 fix(langchain): try/catch langchain import in service; Only langchain is installed with the [langchain] extra (#190) 2024-05-31 10:19:27 +02:00
Kwindla Hultman Kramer
19caf750fd Merge pull request #194 from pipecat-ai/khk-cartesia-changelog
Added cartesia line to CHANGELOG.md
2024-05-30 14:18:41 -07:00
Kwindla Hultman Kramer
296611714f added cartesia line to CHANGELOG.md 2024-05-30 10:41:00 -07:00
chadbailey59
4c3d19cc8b Function calling (#175)
* added function calling code back

* removed old llm_context file

* added integration testing for openai

* added function calling example

* added function callbacks

* added function start callback

* fixup

* fixup

* added different return type support for function calling

* intake example working

* added frame loggers

* cleanup

* fixup

* Update openai.py

* removed function call frame types

* fixup

* re-added example

* renumbered wake phrase

* fixup for autopep8

* remove unused imports
2024-05-30 12:25:39 -05:00
Aleix Conchillo Flaqué
a3ba07c7a3 Merge pull request #193 from pipecat-ai/aleix/fix-camera-out-enabled-cpu
transport(output): fix high CPU usage with camera_out_enabled and no …
2024-05-31 01:25:06 +08:00
Kwindla Hultman Kramer
a1579808b2 Merge pull request #189 from pipecat-ai/khk-cartesia-etc
Cartesia TTS
2024-05-30 10:24:45 -07:00
Aleix Conchillo Flaqué
aecb9f5816 transport(output): fix high CPU usage with camera_out_enabled and no images 2024-05-30 10:18:43 -07:00
Aleix Conchillo Flaqué
a5d42a526c Merge pull request #191 from pipecat-ai/aleix/fix-silero-vad
vad: fix silero vad frame processor
2024-05-30 23:25:52 +08:00
Aleix Conchillo Flaqué
a9472f8116 vad: fix silero vad frame processor 2024-05-30 07:50:58 -07:00
TomTom101
b19243ab75 fix: corrected hint to install Langchain libs 2024-05-30 10:53:42 +02:00
TomTom101
2bf094b950 test(langchain): Rewrite to unittest, make it meaningful 2024-05-30 10:43:33 +02:00
Kwindla Hultman Kramer
d5f106ae19 pr fixes 2024-05-29 23:41:35 -07:00
Kwindla Hultman Kramer
920745345a cartesia tts support 2024-05-29 23:35:35 -07:00
TomTom101
143033d7db fix: install langchain-community with the langchain extra 2024-05-30 03:15:14 +02:00
TomTom101
335990c145 wip: hint to install langchain_community 2024-05-30 03:15:14 +02:00
TomTom101
6d24e836b0 wip: Example using LC message history 2024-05-30 03:15:14 +02:00
TomTom101
278a2fed56 wip: First stab at langchain support
Is this a service or processor?
How to deal with conversation history? LC has sophisticated means of this, but might get in the way of `LLMResponseAggregator`
2024-05-30 03:15:14 +02:00
Aleix Conchillo Flaqué
c444004eec Merge pull request #186 from pipecat-ai/aleix/update-changelog-0.0.24
update CHANGELOG.md 0.0.24
2024-05-29 23:23:06 +08:00
Aleix Conchillo Flaqué
72cf7896d7 update CHANGELOG.md 0.0.24 2024-05-29 08:22:33 -07:00
Aleix Conchillo Flaqué
31af5f8177 Merge pull request #182 from pipecat-ai/aleix/expo-se-dialin-ready
transports(daily): expose dialin-ready and handle timeouts
2024-05-29 23:05:47 +08:00
Aleix Conchillo Flaqué
6a68d9a57e pyproject: update daily-python to 0.9.0 2024-05-28 18:30:43 -07:00
Aleix Conchillo Flaqué
39f41ab25e transports(daily): expose dialin-ready and handle timeouts 2024-05-28 18:00:09 -07:00
Aleix Conchillo Flaqué
624cc1e987 Merge pull request #185 from pipecat-ai/aleix/add-start-recording
transport(daily): add start_recording, stop_recording and stop_dialout
2024-05-29 08:24:59 +08:00
Aleix Conchillo Flaqué
08a15e5cdd transports(daily): expose on_app_message 2024-05-28 17:23:34 -07:00
Aleix Conchillo Flaqué
4cd4787e4d transports(daily): added on_call_state_updated 2024-05-28 17:23:34 -07:00
Aleix Conchillo Flaqué
65afee2808 transport(daily): add start_recording, stop_recording and stop_dialout 2024-05-28 17:16:39 -07:00
Aleix Conchillo Flaqué
00ece864ec Merge pull request #184 from pipecat-ai/aleix/introduce-pipelineparams
introduce PipelineParams
2024-05-29 08:14:58 +08:00
Aleix Conchillo Flaqué
6d6d9bea5a introduce PipelineParams 2024-05-28 17:14:14 -07:00
Kwindla Hultman Kramer
7c213f8533 Merge pull request #183 from pipecat-ai/khk-deepgram-fix
moving Deepgram TTS base_url from beta to prod
2024-05-28 17:04:03 -07:00
Kwindla Hultman Kramer
3685c19b2d moving Deepgram TTS base_url from beta to prod 2024-05-28 15:59:26 -07:00
Aleix Conchillo Flaqué
650a2b4da4 Merge pull request #174 from pipecat-ai/fix-azure-llm-service
services(azure): fix AzureLLMService
2024-05-25 00:27:51 +08:00
Aleix Conchillo Flaqué
afea6f38f6 examples: no need to define tts twice 2024-05-24 09:23:00 -07:00
Aleix Conchillo Flaqué
c45d428551 services(google): make api_key argument mandatory 2024-05-24 09:23:00 -07:00
Aleix Conchillo Flaqué
4e594aa9b0 services: BaseOpenAILLMService.create_client() now returns the client 2024-05-24 09:04:15 -07:00
Aleix Conchillo Flaqué
32f91c5f31 services(azure): fix AzureLLMService
Fixes #160
2024-05-23 16:51:04 -07:00
Aleix Conchillo Flaqué
a32ece897a Merge pull request #179 from pipecat-ai/aleix/aiohttp-response-text
fix aiohttp response text
2024-05-24 07:42:05 +08:00
Aleix Conchillo Flaqué
88f6436aaa fix aiohttp response text 2024-05-23 15:51:00 -07:00
Aleix Conchillo Flaqué
fac43cea06 Merge pull request #178 from pipecat-ai/aleix/daily-python-0.8.0-deps
update linux/macos requirements
2024-05-24 05:50:10 +08:00
Aleix Conchillo Flaqué
a9e6aeed54 update linux/macos requirements 2024-05-23 14:49:34 -07:00
Aleix Conchillo Flaqué
fa9f49f5bb Merge pull request #177 from pipecat-ai/aleix/dialin-ready-missing-sipuri
transports(daily): fix dialin-ready event handling
2024-05-24 05:39:31 +08:00
Aleix Conchillo Flaqué
2a6183aba5 transports(daily): fix dialin-ready event handling 2024-05-23 14:38:37 -07:00
Aleix Conchillo Flaqué
b1a622971b Merge pull request #176 from pipecat-ai/aleix/handle-dialin-ready
transport(daily): add support for dial-in use cases
2024-05-24 04:58:10 +08:00
Aleix Conchillo Flaqué
5b72faccb4 update CHANGELOG.md for release 0.0.22 2024-05-23 13:57:28 -07:00
Aleix Conchillo Flaqué
c8732544c7 transport(daily): add support for dial-in use cases 2024-05-23 13:56:50 -07:00
Aleix Conchillo Flaqué
d4219b16b8 Merge pull request #170 from pipecat-ai/add-daily-transport-dialout-support
transport(daily): add dialout support
2024-05-24 04:19:51 +08:00
Aleix Conchillo Flaqué
0c33432f64 transport(daily): update CHANGELOG.md with dialout/dialin updates 2024-05-23 13:14:34 -07:00
Aleix Conchillo Flaqué
95bd58cced pyproject: depend on daily-python 0.8.0 2024-05-23 13:10:48 -07:00
Aleix Conchillo Flaqué
8d7d1a7e24 transport(daily): add dialin-ready event 2024-05-23 07:12:31 -07:00
Aleix Conchillo Flaqué
3768cb2f2c transport(daily): add dialout support 2024-05-22 22:44:01 -07:00
Aleix Conchillo Flaqué
d4b2741608 Merge pull request #169 from pipecat-ai/update-changelog-0.0.21
update CHANGELOG.md for 0.0.21
2024-05-23 12:42:41 +08:00
Aleix Conchillo Flaqué
aef2152dcc update CHANGELOG.md for 0.0.21 2024-05-22 21:40:29 -07:00
Aleix Conchillo Flaqué
d0b0221b97 Merge pull request #167 from pipecat-ai/khk-bump-anthropic
add new response frame types and vision support for anthropic
2024-05-23 12:16:55 +08:00
Kwindla Hultman Kramer
b4758cd989 update CHANGELOG.md 2024-05-22 21:14:11 -07:00
Kwindla Hultman Kramer
681250f114 add new response frame types and vision support for anthropic 2024-05-22 21:12:30 -07:00
Aleix Conchillo Flaqué
fd13d3c50e Merge pull request #168 from pipecat-ai/transcription-logging
transports(daily): add transcription logging
2024-05-23 11:42:51 +08:00
Aleix Conchillo Flaqué
674b8bb0cd transports(daily): add transcription logging 2024-05-22 20:41:34 -07:00
Aleix Conchillo Flaqué
5d9a962146 Merge pull request #166 from pipecat-ai/fix-llm-response-wake-check
fix llm response wake check
2024-05-23 11:35:11 +08:00
Aleix Conchillo Flaqué
e130aada72 filters(WakeCheckFilter): increase timeout to 3 2024-05-22 19:41:14 -07:00
Aleix Conchillo Flaqué
76709a9a39 enclose text between brackets when logging 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
acd2d55b84 examples(14): remove commented code 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
fcec0eb812 transports(base): log when user is speaking 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
e9965347b5 processors(WakeCheckFilter): log what frame we are pushing 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
5a83f75e0d processors: fix user response processors 2024-05-22 19:05:18 -07:00
Aleix Conchillo Flaqué
91c706a201 Merge pull request #165 from pipecat-ai/clear-audio-output-buffer-when-interrupted
transport(base): clear audio output buffer if interrupted
2024-05-23 07:31:33 +08:00
Aleix Conchillo Flaqué
34384881bc transport(base): clear audio output buffer if interrupted 2024-05-22 16:30:43 -07:00
Aleix Conchillo Flaqué
71ba28753e Merge pull request #157 from pipecat-ai/khk-improved-wake-word
Improved wake word filter
2024-05-23 06:47:59 +08:00
Aleix Conchillo Flaqué
32d2f0db66 update CHANGELOG.md with filters updates 2024-05-22 15:46:13 -07:00
Aleix Conchillo Flaqué
e1169a4e82 processors(WakeCheckFilter): push error 2024-05-22 15:44:44 -07:00
Aleix Conchillo Flaqué
0e5711e62d examples: update 10-wake-work.py to use WakeCheckFilter 2024-05-22 15:44:44 -07:00
Aleix Conchillo Flaqué
0ddfa3de5b move WakeCheckFilter to processors/filters 2024-05-22 15:44:43 -07:00
Kwindla Hultman Kramer
661aa79b7c fix user_id str field name in TranscriptionFrame 2024-05-22 15:44:43 -07:00
Kwindla Hultman Kramer
2c32cc2f27 improved wake word filter 2024-05-22 15:44:43 -07:00
Aleix Conchillo Flaqué
d7bb0bc5cb Merge pull request #164 from pipecat-ai/readd-vad-exp-smoothing
vad: re-add volume exponential smoothing
2024-05-23 06:44:27 +08:00
Aleix Conchillo Flaqué
d5644c3ab9 vad: re-add volume exponential smoothing 2024-05-22 15:26:32 -07:00
Aleix Conchillo Flaqué
09ab8e3efd Merge pull request #163 from pipecat-ai/update-0.0.20-deps
update requirements files
2024-05-23 05:40:12 +08:00
Aleix Conchillo Flaqué
2f683529ec update requirements files 2024-05-22 14:39:26 -07:00
Aleix Conchillo Flaqué
6ac012a82b Merge pull request #158 from pipecat-ai/use-pyloudnorm-loudness
interruptions: introduce pyloudnorm to compute loudness
2024-05-23 05:24:38 +08:00
Aleix Conchillo Flaqué
075194cb54 update CHANGELOG for 0.0.20 2024-05-22 14:21:13 -07:00
Aleix Conchillo Flaqué
269f070051 audio: no need for compute_rms 2024-05-22 14:09:24 -07:00
Aleix Conchillo Flaqué
3342c9d7c2 services(stt): use calculate_audio_volume 2024-05-22 13:05:20 -07:00
Aleix Conchillo Flaqué
b468b2f926 audio: clamp normalized volume 2024-05-22 13:04:09 -07:00
Aleix Conchillo Flaqué
af1c7d0023 interruptions: introduce pyloudnorm to compute loudness
https://github.com/csteinmetz1/pyloudnorm
2024-05-22 11:52:07 -07:00
Aleix Conchillo Flaqué
34670eef79 Merge pull request #162 from pipecat-ai/reset-before-pushing
processors: reset aggregator before pushing
2024-05-23 02:51:55 +08:00
Aleix Conchillo Flaqué
979739c1b7 processors: reset aggregator before pushing 2024-05-22 11:26:08 -07:00
Aleix Conchillo Flaqué
83ed6870b9 Merge pull request #161 from pipecat-ai/only-interrupt-assistant
processors: only interrupt assistant
2024-05-23 02:02:43 +08:00
Aleix Conchillo Flaqué
57a568986a processors: only interrupt assistant
We were pushing interruption frames in the audio task. This was causing the
LLMUserResponseAggregator to push the accumulated text and then causing the LLM
to respond.
2024-05-22 10:15:35 -07:00
Aleix Conchillo Flaqué
e828e26b5b Merge pull request #159 from pipecat-ai/create-pool-executor
transports: run threads in their own ThreadPoolExecutor
2024-05-22 15:49:03 +08:00
Aleix Conchillo Flaqué
825738440e transports: run threads in their own ThreadPoolExecutor 2024-05-21 18:52:27 -07:00
Aleix Conchillo Flaqué
147bd1a075 Merge pull request #156 from pipecat-ai/pipecat-0.0.19
update CHANGELOG.md for 0.0.19
2024-05-21 12:36:48 +08:00
Aleix Conchillo Flaqué
209e97f372 update CHANGELOG.md for 0.0.19 2024-05-20 21:33:15 -07:00
Aleix Conchillo Flaqué
47f8627432 Merge pull request #155 from pipecat-ai/llm-accumlate-full-response
aggregators: accumulate full responses and take interruptions into ac…
2024-05-21 11:34:39 +08:00
Aleix Conchillo Flaqué
cc6713837a github: publish test to pypi again. simply always use PRs 2024-05-20 12:19:39 -07:00
Aleix Conchillo Flaqué
728fe0ad88 github: don't publish to test pypi twice 2024-05-20 12:15:54 -07:00
Aleix Conchillo Flaqué
dbba45349f github: don't run publish_test on main branch 2024-05-20 12:14:00 -07:00
Aleix Conchillo Flaqué
40ccf46b4b aggregators: accumulate full responses and take interruptions into account 2024-05-20 11:40:57 -07:00
Aleix Conchillo Flaqué
077bb9f20a Merge pull request #153 from pipecat-ai/expose-llm-messages
aggregators: expose LLM messages
2024-05-21 02:40:26 +08:00
Aleix Conchillo Flaqué
e4c990c677 aggregators: expose LLM messages 2024-05-20 10:51:37 -07:00
Aleix Conchillo Flaqué
1c8b9d813a examples: minor updates to storytelling-chatbot instructions 2024-05-20 10:31:33 -07:00
Aleix Conchillo Flaqué
83812f2671 transports(daily): implement DailyOutputTransport.send_message 2024-05-20 10:30:59 -07:00
Aleix Conchillo Flaqué
4053c33899 update CHANGELOG for 0.0.17 2024-05-19 19:27:20 -07:00
Aleix Conchillo Flaqué
03978b63bc update linux-py3.10-requirements.txt 2024-05-19 19:27:04 -07:00
Aleix Conchillo Flaqué
bf036be6b8 Merge pull request #150 from pipecat-ai/khk-gemini
Initial commit of Google Gemini LLM service.
2024-05-20 10:24:31 +08:00
Kwindla Hultman Kramer
7ffb10d7f5 add to CHANGELOG.md 2024-05-19 12:44:45 -07:00
Kwindla Hultman Kramer
66377954cb fix up openai vision and gemini implementation 2024-05-19 12:33:57 -07:00
Kwindla Hultman Kramer
e507686cef oops, fix openai.py 2024-05-19 11:13:39 -07:00
Kwindla Hultman Kramer
e5ddaf14f4 add google and deepgram to README.md 2024-05-19 11:09:30 -07:00
Kwindla Hultman Kramer
cf597a2f6b add back in debug log line in openai.py 2024-05-19 11:08:38 -07:00
Kwindla Hultman Kramer
d83f0aabca generate macos-py3.10-requirements.txt with Python 3.10 2024-05-19 10:53:50 -07:00
Kwindla Hultman Kramer
b337e984b3 Initial commit of Google Gemini LLM service.
Gemini text input works. We translate from OpenAILLMContext format
on the fly in the GoogleLLMService implementation. This commit also
implements image input (vision) in both the GoogleLLMService and in
the OpenAILLMService. Image input is a hack and needs to be revisited.
OpenAI expects images to be uploaded as base64-encoded JPEGs. Google
does not require the base64 encoding. Other than for images, we use
the OpenAI format as our standard, but base64-encoding the images
and then unencoding them in the GoogleLLMService feels wasteful.
2024-05-19 10:35:20 -07:00
Aleix Conchillo Flaqué
6366ee072e Merge pull request #144 from pipecat-ai/initial-interruptions
initial basic interruptions support
2024-05-20 01:33:15 +08:00
Aleix Conchillo Flaqué
c3bfcbd562 aggregators: clear accumulated responses if interruption happens 2024-05-19 10:21:45 -07:00
Aleix Conchillo Flaqué
c0d5054798 examples: some prompt tweaking 2024-05-19 09:41:36 -07:00
Aleix Conchillo Flaqué
810dc30d3d examples: fix examples to use LLMFullResponseEndFrame 2024-05-19 09:39:34 -07:00
Aleix Conchillo Flaqué
36dd4933e9 example: add assistant responses to simple chatbot 2024-05-18 10:01:46 -07:00
Aleix Conchillo Flaqué
435fffe1b0 add LLMFullResponseStartFrame/LLMFullResponseEndFrame 2024-05-18 09:49:38 -07:00
Aleix Conchillo Flaqué
2b8f1c4cda services(openai): send LLMResponseStartFrame for each completion 2024-05-17 17:47:33 -07:00
Aleix Conchillo Flaqué
0e8c7a9b28 transports(output): create a downstream push frame task 2024-05-17 17:47:24 -07:00
Aleix Conchillo Flaqué
3e13678f23 vad: use exponential smoothed volume to improve speech detection 2024-05-17 17:13:31 -07:00
Aleix Conchillo Flaqué
455ec4f1fd services(tts): always send received TextFrame downstream 2024-05-17 17:11:11 -07:00
Aleix Conchillo Flaqué
8dc81042c3 examples: use DailyTranscriptionSettings in translation-chatbot 2024-05-17 15:37:29 -07:00
Aleix Conchillo Flaqué
c77db79447 examples: pipelines readability and add LLM assistants after transport 2024-05-17 14:52:51 -07:00
Aleix Conchillo Flaqué
de65028061 vad: reduce default confidence back to 0.5 2024-05-17 14:39:40 -07:00
Aleix Conchillo Flaqué
d66a795413 examples: use SileroVADAnalyzer instead of SileroVAD 2024-05-17 14:18:55 -07:00
Aleix Conchillo Flaqué
34762bf604 transports: allow updating allow_interruptions when receiving StartFrame 2024-05-17 14:15:37 -07:00
Aleix Conchillo Flaqué
57121338b1 pipeline(task): cleanup processors only if we need to 2024-05-17 13:53:33 -07:00
Aleix Conchillo Flaqué
a5d246ec0c vad: use exponential smoothing to avoid sudden changes 2024-05-17 13:53:33 -07:00
Aleix Conchillo Flaqué
f2cefeeedc utils: move exp_smoothing to utils module 2024-05-17 13:52:18 -07:00
Aleix Conchillo Flaqué
537e72a05f vad: introduce VADParams so you can tweak things 2024-05-17 13:52:18 -07:00
Aleix Conchillo Flaqué
efa5a061d7 silero: simplify int16 -> float32 conversion 2024-05-17 13:51:06 -07:00
Aleix Conchillo Flaqué
0bef44c2ff introduce StartInterruptionFrame and StopInterruptionFrame 2024-05-17 13:51:06 -07:00
Aleix Conchillo Flaqué
f62fe059b1 fix issues with Ctrl-C tasks cancellation 2024-05-17 13:51:04 -07:00
Aleix Conchillo Flaqué
f432e2b17e transports: allow adding a vad analyzer to BaseInputTransport 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
8c877d7d8e examples: update 07-interruptible 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
dc9377fb92 add missing queue task_done() 2024-05-17 13:50:48 -07:00
Aleix Conchillo Flaqué
7384b63b1d initial interruptions support 2024-05-17 13:50:45 -07:00
Aleix Conchillo Flaqué
ba6ecf541f update CHANGELOG.md for 0.0.16 2024-05-16 18:15:07 -07:00
Aleix Conchillo Flaqué
94e5709d58 Merge pull request #149 from pipecat-ai/transports-push-task
transport: create input transports push frame task
2024-05-17 09:14:35 +08:00
Aleix Conchillo Flaqué
add8d3cbaf transport: create input transports push frame task 2024-05-16 16:54:39 -07:00
Aleix Conchillo Flaqué
1a42188bce Merge pull request #146 from pipecat-ai/daily-dont-send-tracks-if-not-enabled
transports(daily): don't send camera/audio tracks if not enabled
2024-05-17 01:24:39 +08:00
Aleix Conchillo Flaqué
0da427e127 transports(daily): don't send camera/audio tracks if not enabled 2024-05-16 08:16:39 -07:00
Aleix Conchillo Flaqué
9447b32f3e transports(daily): on_app_message doesn't need to be event handler 2024-05-15 17:06:47 -07:00
Aleix Conchillo Flaqué
af10adb7fe some minor event loop updates 2024-05-15 17:00:43 -07:00
Aleix Conchillo Flaqué
129acf886f transports(daily): hot fix for receiving transport messages 2024-05-15 17:00:04 -07:00
Aleix Conchillo Flaqué
9af3e1efac update CHANGELOG.md for 0.0.14 2024-05-15 15:59:38 -07:00
Aleix Conchillo Flaqué
9e22a8b4ff transports(daily): add receiving transport messages 2024-05-15 15:59:08 -07:00
Aleix Conchillo Flaqué
28da747f19 transports(daily): fix on_participant_left event 2024-05-15 15:40:31 -07:00
Aleix Conchillo Flaqué
3d6783ddb0 transports: resize output image if it doesn't match camera 2024-05-15 15:36:20 -07:00
Aleix Conchillo Flaqué
349fc526d7 transports(daily): avoid locking if no participant has joined yet 2024-05-15 15:24:58 -07:00
Aleix Conchillo Flaqué
acf6dc0a30 transports: more start and stop fixes 2024-05-15 15:23:03 -07:00
Aleix Conchillo Flaqué
3563e66ff6 transports(daily): add on_participant_left event 2024-05-15 15:20:37 -07:00
Aleix Conchillo Flaqué
8965ff27ec examples: use DEBUG in 09-mirror.py 2024-05-14 19:25:31 -07:00
Aleix Conchillo Flaqué
86feb1e104 services: fix DailyTransport stop/cleanup ordering 2024-05-14 19:24:55 -07:00
Aleix Conchillo Flaqué
f6257a86d3 examples: re-enable audio in 09-mirror.py 2024-05-14 19:23:35 -07:00
Aleix Conchillo Flaqué
bd04ea8aca examples: simplify 09-mirror.py 2024-05-14 19:07:19 -07:00
Aleix Conchillo Flaqué
754c1c6775 services: fixed DailyTransport output camera and audio 2024-05-14 19:07:19 -07:00
Aleix Conchillo Flaqué
0b01eb5a11 services: pass **kwargs to TTSService 2024-05-14 18:46:03 -07:00
Aleix Conchillo Flaqué
6247b9df39 services: fix STTService and WhisperSTTService 2024-05-14 18:45:40 -07:00
Aleix Conchillo Flaqué
bd5344c892 services: MoondreamService model_id argument is now model 2024-05-14 18:34:10 -07:00
Aleix Conchillo Flaqué
e4fe54cd7f vad: rename VADAnalyzer arguments 2024-05-14 18:33:17 -07:00
Aleix Conchillo Flaqué
97f9e9b042 examples: update simple-chatbot prompt 2024-05-14 15:30:31 -07:00
Aleix Conchillo Flaqué
3668eb1606 update CHANGELOG for 0.0.12 2024-05-14 14:52:08 -07:00
Aleix Conchillo Flaqué
e23addcc02 examples: update simple-chatbot with Spanish 2024-05-14 14:51:44 -07:00
Aleix Conchillo Flaqué
5147f4086e transports(daily): add DailyTranscriptionSettings to update settings easier 2024-05-14 14:49:30 -07:00
Aleix Conchillo Flaqué
fb3c2de83f Merge pull request #141 from pipecat-ai/add-changelog
add CHANGELOG.md
2024-05-15 04:47:45 +08:00
Aleix Conchillo Flaqué
107817317c add CHANGELOG.md 2024-05-14 13:45:01 -07:00
Aleix Conchillo Flaqué
663ff3417c examples: add missing requirements 2024-05-14 08:03:51 -07:00
Aleix Conchillo Flaqué
2b19d6bbac examples: remove commented out silero from storytelling 2024-05-14 00:57:21 -07:00
Aleix Conchillo Flaqué
7c41246e55 examples: fix storytelling example 2024-05-14 00:32:37 -07:00
Aleix Conchillo Flaqué
11aa9dc803 pipeline: allow stopping tasks with StopTaskFrame 2024-05-14 00:30:32 -07:00
Aleix Conchillo Flaqué
922cdefee5 services: run_* now return async generators 2024-05-14 00:30:07 -07:00
Aleix Conchillo Flaqué
e018d5b47a transports(daily): always allow capturing transcriptions 2024-05-14 00:29:02 -07:00
Aleix Conchillo Flaqué
20c679988c transports: allow base transports to be reused 2024-05-14 00:28:43 -07:00
Aleix Conchillo Flaqué
a344101cff README.md: s/Twitter/X/ 2024-05-13 18:24:06 -07:00
Aleix Conchillo Flaqué
2cefc40a77 README.md: use http urls for images 2024-05-13 18:20:57 -07:00
Aleix Conchillo Flaqué
68f0da26b6 examples: more translation-chatbot fixes 2024-05-13 17:57:11 -07:00
Aleix Conchillo Flaqué
9aea8e951c aggregators/sentence: ignore interim transcriptions 2024-05-13 17:56:19 -07:00
Aleix Conchillo Flaqué
12ff6d08fe examples: fix translation-chatbot 2024-05-13 16:22:11 -07:00
Aleix Conchillo Flaqué
1b21867a6f transports: add support for sending transport messages 2024-05-13 16:22:11 -07:00
Aleix Conchillo Flaqué
d28d0fa218 processors: add FrameProcessor.push_error 2024-05-13 16:12:35 -07:00
Aleix Conchillo Flaqué
01381f6dcd frames: add TransportMessageFrame 2024-05-13 16:12:30 -07:00
Aleix Conchillo Flaqué
c111fff0f7 services: update azure services 2024-05-13 16:12:26 -07:00
Aleix Conchillo Flaqué
50677e6085 Merge pull request #138 from pipecat-ai/moondream-chatbot-fixes
examples: fix moondream-chatbot
2024-05-14 06:29:13 +08:00
Aleix Conchillo Flaqué
22cd1ac5f2 examples: fix moondream-chatbot 2024-05-13 15:28:11 -07:00
Kwindla Hultman Kramer
fdfcfd1d5e Merge pull request #137 from rahulunair/intel_gpu
(feat): adding intel gpus support
2024-05-13 14:52:34 -07:00
Aleix Conchillo Flaqué
b6385be6c6 Merge pull request #136 from pipecat-ai/simple-chatbot-fixes
examples: fix simple-chatbot
2024-05-14 05:41:52 +08:00
rahulunair
6be88fa81b (feat): adding intel gpus support 2024-05-13 21:21:05 +00:00
Aleix Conchillo Flaqué
ed31c7924e examples: fix simple-chatbot 2024-05-13 13:19:11 -07:00
Jon Taylor
4898084645 Update LICENSE 2024-05-13 20:49:51 +01:00
chadbailey59
6be0751a52 Delete CNAME 2024-05-13 14:42:46 -05:00
Aleix Conchillo Flaqué
7ce1206ed4 Create CNAME 2024-05-13 12:05:08 -07:00
Jon Taylor
1b5130694a Update README.md 2024-05-13 19:36:39 +01:00
Jon Taylor
7c6199e93e Merge pull request #135 from pipecat-ai/jpt/devrel-edits-2
Jpt/devrel edits 2
2024-05-13 18:19:33 +01:00
Jon Taylor
3be742479d removed space 2024-05-13 18:17:00 +01:00
Aleix Conchillo Flaqué
d380b02a44 README: improve code reading 2024-05-13 10:12:19 -07:00
Aleix Conchillo Flaqué
5600fc49f1 README: fix code indentation 2024-05-13 10:08:09 -07:00
Jon Taylor
5f0d8b8d9f removed docs badge 2024-05-13 17:42:01 +01:00
Jon Taylor
8204e5c2d4 removed images 2024-05-13 17:41:03 +01:00
Jon Taylor
29b98c0326 removed images from examples readme 2024-05-13 17:40:07 +01:00
Jon Taylor
3502ef4745 Merge pull request #134 from pipecat-ai/jpt/devrel-edits
Added example apps to repo
2024-05-13 17:37:31 +01:00
Jon Taylor
0d28e84c59 addressed nitpicks 2024-05-13 17:37:01 +01:00
Jon Taylor
062fbf4ce3 fixed header for VAD 2024-05-13 17:20:50 +01:00
Jon Taylor
af8471b370 changed daily_url to daily_room 2024-05-13 17:20:10 +01:00
Jon Taylor
f756027333 updated text for simple example 2024-05-13 17:17:41 +01:00
Jon Taylor
65c4c0b21f fixed typo in readme 2024-05-13 17:14:17 +01:00
Jon Taylor
f1c02f8554 added examples back 2024-05-13 17:09:46 +01:00
Jon Taylor
27ba50cbbf updated README with sample code 2024-05-13 14:51:10 +01:00
Aleix Conchillo Flaqué
b254525d3c go back to using @dataclass since they can be inspected 2024-05-12 22:35:43 -07:00
Aleix Conchillo Flaqué
6c06fb8169 README: update pypi badge 2024-05-12 19:28:00 -07:00
Aleix Conchillo Flaqué
721cd11d62 Merge pull request #133 from pipecat-ai/aleix/readme
rebased jpt/readme branch
2024-05-13 10:26:45 +08:00
Aleix Conchillo Flaqué
bfbcb9d531 fix autopep8 linting 2024-05-12 19:25:17 -07:00
Aleix Conchillo Flaqué
724e78c5be renamed image.png to pipecat.png 2024-05-12 17:44:10 -07:00
Jon Taylor
d3c3d78855 added discord badge 2024-05-12 17:41:36 -07:00
Jon Taylor
8fa9fdcd5a Reworked readme to have more pipes and cats 2024-05-12 17:41:30 -07:00
Aleix Conchillo Flaqué
7856d20a38 Merge pull request #132 from pipecat-ai/pypi-repo-change
change pypi repo to pipecat-ai
2024-05-13 03:14:40 +08:00
Aleix Conchillo Flaqué
6d10027f2d change pypi repo to pipecat-ai 2024-05-12 12:08:43 -07:00
Aleix Conchillo Flaqué
bea31215dc Merge pull request #129 from daily-co/wip-proposal
pipecat proposal
2024-05-13 01:13:18 +08:00
Aleix Conchillo Flaqué
083480ca1e update macos-py3.10-requirements.txt 2024-05-12 10:10:35 -07:00
Aleix Conchillo Flaqué
65846330cf update linux-py3.10-requirements.txt 2024-05-12 10:09:04 -07:00
Aleix Conchillo Flaqué
29f48266f7 README: install dev-requirements.txt first 2024-05-12 10:07:54 -07:00
Aleix Conchillo Flaqué
bfd583211c examples: use LocalAudioTransport 2024-05-12 10:07:54 -07:00
Aleix Conchillo Flaqué
b026915d19 initial commit for new pipecat architecture 2024-05-12 10:07:25 -07:00
Aleix Conchillo Flaqué
4a0836dc8f Merge pull request #130 from daily-co/dependabot-05-06-24
dependabot: update packages 05-06-24
2024-05-07 08:14:38 +08:00
Aleix Conchillo Flaqué
2729c6bf5b dependabot: update packages 05-06-24 2024-05-06 15:33:33 -07:00
Aleix Conchillo Flaqué
712a889121 Merge pull request #128 from daily-co/pillow-security-fixes
pyproject: pillow security fixes
2024-04-23 01:51:49 +08:00
Aleix Conchillo Flaqué
2f341e4fb0 pyproject: pillow security fixes 2024-04-22 10:28:42 -07:00
Kwindla Hultman Kramer
24198ecf45 Merge pull request #126 from daily-co/jptaylor-patch-3
Update README.md
2024-04-12 23:10:30 -07:00
Jon Taylor
7e4fefe958 Update README.md 2024-04-12 22:45:30 -07:00
Jon Taylor
e9af39b85f Merge pull request #125 from daily-co/jptaylor-patch-2
Update README.md
2024-04-12 22:44:14 -07:00
Jon Taylor
38aa3cebb4 Update README.md 2024-04-12 22:42:11 -07:00
Jon Taylor
72724365a0 Merge pull request #124 from daily-co/jptaylor-patch-1
Update README.md
2024-04-12 22:40:29 -07:00
Jon Taylor
5368462e41 Update README.md 2024-04-12 22:28:40 -07:00
Jon Taylor
1b2b29dd18 Merge pull request #123 from daily-co/jpt/pypi-badge
added pypi badge
2024-04-12 07:33:26 -07:00
Kwindla Hultman Kramer
d2b2b6f619 Merge pull request #122 from daily-co/kwindla-patch-1
Update README.md
2024-04-11 21:34:37 -07:00
Jon Taylor
54bcb52129 added pypi badge 2024-04-11 21:34:27 -07:00
Kwindla Hultman Kramer
3dc7438bc8 Update README.md 2024-04-11 21:05:27 -07:00
Aleix Conchillo Flaqué
523bb9f2a2 Merge pull request #120 from daily-co/small-fireworks-fixes
minor fireworks updates
2024-04-12 06:35:57 +08:00
Aleix Conchillo Flaqué
0c2b3f8b65 minor fireworks updates 2024-04-11 15:34:23 -07:00
chadbailey59
0b7578056d added fireworks adapter (#118) 2024-04-11 17:15:02 -05:00
Aleix Conchillo Flaqué
f1b6b9f8e5 Merge pull request #119 from daily-co/use-new-fal-client-library
services: FalImageGenService now uses fal-client library
2024-04-12 05:59:58 +08:00
Aleix Conchillo Flaqué
cbc51babbe services: use asyncio to_thread in moondreamservice 2024-04-11 14:22:44 -07:00
Aleix Conchillo Flaqué
b0faafc184 update macos-py3.10 requirements 2024-04-11 14:16:19 -07:00
Aleix Conchillo Flaqué
103092dbb2 update linux-py3.10 requirements 2024-04-11 14:13:59 -07:00
Aleix Conchillo Flaqué
7b49c9ade3 services: FalImageGenService now uses fal-client library 2024-04-11 14:09:01 -07:00
Aleix Conchillo Flaqué
1e83a405c0 Merge pull request #117 from daily-co/llm-use-aggregator-pass-through-fix
aggregators: fix LLMUserResponseAggregator passs-through
2024-04-12 04:24:56 +08:00
Aleix Conchillo Flaqué
7336866a1c examples: rely on new daily default transcription settings 2024-04-11 11:22:58 -07:00
Aleix Conchillo Flaqué
0f23282e30 transport: enable interim results in daily transport 2024-04-11 11:22:05 -07:00
Aleix Conchillo Flaqué
eb3bf117b1 use InterimTranscriptionFrame in LLMUserResponseAggregator 2024-04-11 11:21:42 -07:00
Aleix Conchillo Flaqué
e288aa047b examples: use LLMUserResponseAggregator with VAD 2024-04-11 08:10:56 -07:00
Aleix Conchillo Flaqué
9a9df35d7b aggregators: allow TranscriptionFrame after an end frame threshold 2024-04-10 23:35:31 -07:00
Aleix Conchillo Flaqué
af8663e95d aggregators: fix LLMUserResponseAggregator passs-through 2024-04-10 21:46:16 -07:00
Aleix Conchillo Flaqué
db05a9b29b Merge pull request #116 from daily-co/moondream-use-cpu
moondream: allow passing use_cpu
2024-04-11 09:08:11 +08:00
Aleix Conchillo Flaqué
130e418800 moondream: allow passing use_cpu 2024-04-10 17:43:44 -07:00
Aleix Conchillo Flaqué
1a0a66e503 Merge pull request #114 from daily-co/jpt/fal-updates
Updated Fal.ai service to take a params model and allow for model string param
2024-04-11 00:47:33 +08:00
Aleix Conchillo Flaqué
e22babbae2 examples: update with new FalImageGenService parameters 2024-04-10 09:45:08 -07:00
Aleix Conchillo Flaqué
bfe2e0f36e services: don't use image_size in ImageGenService 2024-04-10 09:44:42 -07:00
Aleix Conchillo Flaqué
26d401e5de Merge pull request #115 from daily-co/add-vision-and-moondream-service
add vision and moondream service
2024-04-11 00:22:26 +08:00
Aleix Conchillo Flaqué
3c20f9153d added VisionImageFrame and VisionImageFrameAggregator 2024-04-10 09:19:34 -07:00
Aleix Conchillo Flaqué
2f9899af5a update macos-py3.10 requirements 2024-04-09 22:39:04 -07:00
Aleix Conchillo Flaqué
5ef5cf30f4 update linux-py3.10 requirements 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
34a6c5691b examples: added 12-describe-video 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
18bf09c704 services: added MoondreamService 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
84cfa7cc95 services: added VisionService 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
a5eba0106b transport: allow requesting a user frame 2024-04-09 22:36:35 -07:00
Aleix Conchillo Flaqué
b117a185e3 frames: added UserImageRequestFrame 2024-04-09 22:14:54 -07:00
Aleix Conchillo Flaqué
0219230827 Merge pull request #113 from daily-co/aleix/only-subcribe-to-participant
only subcribe to participant
2024-04-10 10:47:29 +08:00
Aleix Conchillo Flaqué
9fcbb36997 examples: add 14a-local-render-remote-participant 2024-04-09 19:46:10 -07:00
Aleix Conchillo Flaqué
0bf15fd6eb daily: only subscribe to participant video source 2024-04-09 19:46:10 -07:00
Aleix Conchillo Flaqué
989252bb52 daily: always check camera/mic/speaker enabled 2024-04-09 19:46:10 -07:00
Jon Taylor
7b44a79a5b added params and model attribute to fal service 2024-04-09 17:43:27 -07:00
Aleix Conchillo Flaqué
4bd29b0080 Merge pull request #110 from daily-co/compatible-versions
pyproject: use compatible version
2024-04-10 00:41:22 +08:00
Aleix Conchillo Flaqué
ebb76fdae9 update macos-py3.10 requirements 2024-04-09 08:52:37 -07:00
Aleix Conchillo Flaqué
5d52def0fe update linux-py3.10 requirements 2024-04-09 08:49:41 -07:00
Aleix Conchillo Flaqué
9ada56d0b0 pyproject: use compatible version 2024-04-09 08:41:54 -07:00
Aleix Conchillo Flaqué
8d73cdb2ee Merge pull request #111 from daily-co/user-transcription-aggregator
pipeline: add UserTranscriptionAggregator
2024-04-09 23:34:52 +08:00
Aleix Conchillo Flaqué
4f04b10202 Merge pull request #112 from daily-co/user-image-frame
user image frames and other updates
2024-04-09 23:34:32 +08:00
Aleix Conchillo Flaqué
97b923e37e llm user and assistant aggregator renames 2024-04-09 08:31:48 -07:00
Aleix Conchillo Flaqué
57aabea0a3 examples: added 14-render-remote-participant 2024-04-09 08:01:14 -07:00
Aleix Conchillo Flaqué
319b8e7816 updated ImageFrame and added URLImageFrame and UserImageFrame 2024-04-08 23:23:33 -07:00
Aleix Conchillo Flaqué
96950ca6df daily: on_first_other_participant_joined now gets the participant 2024-04-08 23:23:33 -07:00
Aleix Conchillo Flaqué
d7b2e67c35 pipeline: add UserTranscriptionAggregator 2024-04-08 17:15:14 -07:00
Aleix Conchillo Flaqué
53930b47a5 github: just some rewording 2024-04-06 18:03:53 -07:00
Aleix Conchillo Flaqué
86c8ab02cc github: also publish stables releases to test pypi 2024-04-06 17:58:13 -07:00
Aleix Conchillo Flaqué
b678097f6d Merge pull request #109 from daily-co/only-use-fps
transport: only use fps to set maxFramerate
2024-04-07 07:02:44 +08:00
Aleix Conchillo Flaqué
eb455043c4 transport: use camera_bitrate and camera_framerate 2024-04-06 12:27:05 -07:00
Aleix Conchillo Flaqué
dd696be04c Merge pull request #108 from daily-co/add-camera-max-framerate
transport: add camera_max_framerate argument
2024-04-06 11:18:42 +08:00
Aleix Conchillo Flaqué
96b2337183 transport: add camera_max_framerate argument 2024-04-05 20:16:03 -07:00
Aleix Conchillo Flaqué
ea52e73f57 Merge pull request #107 from daily-co/increase-max-framerate
transport: increase daily maxFramerate to 30
2024-04-06 11:08:21 +08:00
Aleix Conchillo Flaqué
88404e4739 Merge pull request #106 from daily-co/updated-to-be-updated-examples
examples: updated to_be_updated examples
2024-04-06 11:06:30 +08:00
Aleix Conchillo Flaqué
0fd323714e transport: add camera_max_bitrate argument 2024-04-05 20:05:58 -07:00
Aleix Conchillo Flaqué
a362ca4d3d transport: increase daily maxFramerate to 30 2024-04-05 19:44:25 -07:00
Aleix Conchillo Flaqué
02b5c3dd5f update dot-env.template 2024-04-05 16:16:56 -07:00
Aleix Conchillo Flaqué
497a09cbc8 examples: updated to_be_updated examples 2024-04-05 16:01:23 -07:00
Aleix Conchillo Flaqué
172a14245d Merge pull request #104 from daily-co/threaded-transport-allow-sink-override
examples: fix whisper examples
2024-04-06 04:46:12 +08:00
Aleix Conchillo Flaqué
302246399b Merge pull request #105 from daily-co/local-tranport-read-audio-frames
transports: fix local transport read_audio_frames
2024-04-06 04:44:37 +08:00
Aleix Conchillo Flaqué
9590cc2fbc examples: fix whisper examples 2024-04-05 13:43:51 -07:00
Aleix Conchillo Flaqué
09e4044c72 transports: fix local transport read_audio_frames 2024-04-05 13:34:01 -07:00
Aleix Conchillo Flaqué
efdfb74dc3 github: increase fetch-depth to 100 for test publish 2024-04-05 08:32:29 -07:00
Aleix Conchillo Flaqué
158de6f20b github: fetch-tags and increase fetch-depth for test publish 2024-04-05 08:25:37 -07:00
Aleix Conchillo Flaqué
47f68b742d pyproject: user proper environment for test pypi 2024-04-05 08:02:45 -07:00
Aleix Conchillo Flaqué
2654ca1f62 pyproject: don't use local version for test pypi 2024-04-05 07:51:52 -07:00
Aleix Conchillo Flaqué
4263827ee8 README: use double-quotes with optional dependencies 2024-04-04 17:47:16 -07:00
Aleix Conchillo Flaqué
97fe529b0e github: update test publish workflow 2024-04-04 17:41:31 -07:00
Aleix Conchillo Flaqué
86025723e7 github: one more publish workflow fix 2024-04-04 17:36:20 -07:00
Aleix Conchillo Flaqué
6f4270a552 github: avoid caching in publish workflow 2024-04-04 17:32:50 -07:00
Aleix Conchillo Flaqué
31f050c02b github: more publish workflows fixes 2024-04-04 17:31:59 -07:00
Aleix Conchillo Flaqué
a0fe57721b github: fix publish workflows 2024-04-04 17:17:15 -07:00
Aleix Conchillo Flaqué
abf5e57319 Merge pull request #103 from daily-co/aleix/fix-github-cache-name
github: fix github cache name
2024-04-05 08:03:15 +08:00
Aleix Conchillo Flaqué
44de9007c3 Merge pull request #102 from daily-co/examples-cleanup
examples cleanup
2024-04-05 08:02:57 +08:00
Aleix Conchillo Flaqué
46d265514e pyproject: update github url 2024-04-04 15:52:28 -07:00
Aleix Conchillo Flaqué
9e64de8606 Merge pull request #101 from daily-co/cb/bot-exit
Allow transport exit to end a running pipeline
2024-04-05 06:51:06 +08:00
Aleix Conchillo Flaqué
1ea503c1e6 examples: fix 03a-image-local 2024-04-04 15:35:58 -07:00
Aleix Conchillo Flaqué
d0aeeccb68 github: fix github cache name 2024-04-04 14:36:04 -07:00
Aleix Conchillo Flaqué
d687c8cdeb transports: updated silero vad not found message 2024-04-04 14:05:40 -07:00
Aleix Conchillo Flaqué
951f20c788 transports: don't write/read if microphone/speaker not enabled 2024-04-04 14:05:15 -07:00
Aleix Conchillo Flaqué
982c0a0749 examples: move non-working examples to to_be_updated 2024-04-04 14:04:53 -07:00
Chad Bailey
27cef7cd70 add endframe to transport receive queue 2024-04-04 20:45:23 +00:00
chadbailey59
03ea208361 VAD fallback (#97)
* Silero VAD preferred with webrtc fallback

* webrtc VAD neds a different sample size

* fixup

* fixup
2024-04-04 13:31:07 -05:00
Aleix Conchillo Flaqué
385b51ac83 Merge pull request #98 from daily-co/use-pip-features
use pip optional dependencies
2024-04-05 01:00:21 +08:00
Aleix Conchillo Flaqué
a37e4fabad github: only run publish-test on main 2024-04-04 09:58:42 -07:00
Aleix Conchillo Flaqué
8bc3c03a69 add a requirements.txt per platform 2024-04-03 21:39:10 -07:00
Aleix Conchillo Flaqué
1fc800754b github: no need to install dependencies when building/deploying 2024-04-03 16:26:58 -07:00
Aleix Conchillo Flaqué
18c4bccc13 github: rename deploy to publish 2024-04-03 16:22:23 -07:00
Aleix Conchillo Flaqué
d57d473c13 pyproject.toml: use setuptools_scm to auto manage versions 2024-04-03 16:13:07 -07:00
Aleix Conchillo Flaqué
48bb3c6955 github: add publish to pypi workflows 2024-04-03 15:57:59 -07:00
Aleix Conchillo Flaqué
e3ee3f9cc6 github(lint): use requirements-dev.txt 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
3528f5d735 use conditional imports and show help errors if modules not found 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
23735cb3a3 dot-env.example: cleanup and add missing environment variables 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
6918dc69f0 github: separate build and test workflows 2024-04-03 15:27:31 -07:00
Aleix Conchillo Flaqué
128d350abc pyproject.toml: use project optional dependencies and pin them 2024-04-03 15:27:31 -07:00
chadbailey59
2f59e38a7a Modularize tricky dependencies (#95)
* removed pyaudio from threaded transport

* modularized torch and torchaudio

* modularized local transport

* Working Dockerfile as well

* docker updates for fly.io
2024-04-03 10:48:11 -05:00
chadbailey59
c21014860f Added app messages to translator example (#94) 2024-04-01 14:25:20 -05:00
chadbailey59
d4e3e1710f Server updates (#90)
* updated server readme

* fixup

* Refactored server

* fixup
2024-03-28 15:03:08 -05:00
Moishe Lettvin
e7f9296b5a Merge pull request #93 from daily-co/frame-name-cleanup
Cleanup the last few badly-named Frame types
2024-03-28 14:25:59 -04:00
Moishe Lettvin
27322108b7 Cleanup the last few badly-named Frame types 2024-03-28 12:36:24 -04:00
Moishe Lettvin
22bbedec93 Merge pull request #92 from daily-co/remove-bad-print
Remove mistakenly-added print statement
2024-03-28 11:54:49 -04:00
Moishe Lettvin
ed91bc0f66 Remove mistakenly-added print statement 2024-03-28 11:47:11 -04:00
Moishe Lettvin
565acfa9c9 Merge pull request #86 from daily-co/transport-refactor
Starting refactor of transports into their own directory
2024-03-28 11:17:32 -04:00
Moishe Lettvin
a2295b6b1d Merge pull request #91 from daily-co/pipeline-logging
Add logging for pipeline
2024-03-28 11:16:26 -04:00
Moishe Lettvin
fef1366c84 Merge pull request #88 from daily-co/frame-progress-diagram
Frame progress diagram
2024-03-28 11:13:21 -04:00
Moishe Lettvin
5c0ba1b6f0 Fix off by one errors, add tests and comment 2024-03-28 08:34:34 -04:00
Moishe Lettvin
05c77bce25 Add logging for pipeline 2024-03-27 18:48:30 -04:00
Moishe Lettvin
4ce140bf84 Move some things to AbstractTransport class 2024-03-27 12:59:08 -04:00
James Hush
a3293c6d7a fix: force overriding environment variables from .env files (#89) 2024-03-27 23:38:55 +08:00
Moishe Lettvin
ce04d4a54a Add text to md 2024-03-27 08:10:14 -04:00
Moishe Lettvin
758ed2d895 Frame progress images 2024-03-26 20:40:10 -04:00
Moishe Lettvin
85cd795b2b fix image 2024-03-26 20:36:18 -04:00
Moishe Lettvin
6c36d5f686 Testing 2024-03-26 20:33:09 -04:00
Moishe Lettvin
b2425d6dcd Testing 2024-03-26 20:32:32 -04:00
Moishe Lettvin
e8a6560ac1 Merge forgotten files 2024-03-26 16:24:47 -04:00
Moishe Lettvin
78c80d8941 some more renames 2024-03-26 15:57:19 -04:00
Moishe Lettvin
2fc5de6afe Starting refactor of transports into their own directory 2024-03-26 08:35:04 -04:00
Moishe Lettvin
24fb7c5a05 Merge pull request #81 from daily-co/websocket-transport
Websocket transport
2024-03-25 14:40:34 -04:00
Moishe Lettvin
5761e23af1 remove unnecessary checks 2024-03-25 14:00:08 -04:00
Moishe Lettvin
960c659d5a Remove duplicated constant 2024-03-25 13:59:03 -04:00
Moishe Lettvin
2bda4c3307 Websocket transport 2024-03-25 13:54:34 -04:00
Aleix Conchillo Flaqué
2c5628a621 Merge pull request #85 from daily-co/minor-readme-update
README: minor fixes
2024-03-22 04:33:42 +08:00
Aleix Conchillo Flaqué
9b4cfd9a6c README: minor fixes 2024-03-21 13:16:50 -07:00
Aleix Conchillo Flaqué
8f9aeb0751 Merge pull request #82 from daily-co/remove-unused-imports
remove unused imports
2024-03-22 03:02:07 +08:00
Aleix Conchillo Flaqué
e8a9d43287 Merge pull request #84 from daily-co/use-openai-api-key
use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
2024-03-21 21:57:40 +08:00
Aleix Conchillo Flaqué
cf5d516d51 use OPENAI_API_KEY instead of OPENAI_CHATGPT_API_KEY
Fixes #77
2024-03-20 15:26:32 -07:00
Aleix Conchillo Flaqué
0666dd1194 remove unused imports 2024-03-20 14:52:19 -07:00
Aleix Conchillo Flaqué
42e25ccd13 create missing __init__.py 2024-03-20 14:41:39 -07:00
Aleix Conchillo Flaqué
520cee273f Merge pull request #80 from daily-co/move-src-daily-tests-to-tests
move src/dailyai/tests to tests
2024-03-21 00:27:07 +08:00
Aleix Conchillo Flaqué
a189e2618f github: source venv in every step 2024-03-19 15:31:03 -07:00
Aleix Conchillo Flaqué
ae2dcf88ed github: use virtual environment 2024-03-19 15:23:09 -07:00
Aleix Conchillo Flaqué
5cdb82ad3c README: one more autopep8 emacs update 2024-03-19 15:18:29 -07:00
Aleix Conchillo Flaqué
593513c84a github: add venv caching 2024-03-19 15:17:48 -07:00
Aleix Conchillo Flaqué
16257f8ec0 move src/dailyai/tests to tests 2024-03-19 14:59:48 -07:00
Aleix Conchillo Flaqué
5fc21a7508 Merge pull request #73 from daily-co/github-unittests-workflow
github: add workflow for unit tests
2024-03-20 03:01:03 +08:00
Aleix Conchillo Flaqué
cc05429135 github: add workflow for unit tests 2024-03-19 11:51:14 -07:00
Aleix Conchillo Flaqué
85e66dddbe Merge pull request #79 from daily-co/readme-emacs-autopep8-update
README: emacs autopep8 update
2024-03-20 02:17:44 +08:00
Aleix Conchillo Flaqué
03ea559839 README: emacs autopep8 update 2024-03-19 10:28:11 -07:00
Aleix Conchillo Flaqué
b6c9859e34 Merge pull request #78 from daily-co/readme-editor-setup
README: add editor setup
2024-03-20 01:10:57 +08:00
Aleix Conchillo Flaqué
bc47c909a3 README: add editor setup 2024-03-19 10:10:14 -07:00
Aleix Conchillo Flaqué
428659730d Merge pull request #70 from daily-co/move-src-example-to-examples
move src/examples to examples
2024-03-20 01:09:13 +08:00
Aleix Conchillo Flaqué
a573277a10 examples: copy runner.py and auth.py where needed 2024-03-18 17:10:23 -07:00
Aleix Conchillo Flaqué
69c2637a25 README.md: update examples 2024-03-18 14:53:53 -07:00
Aleix Conchillo Flaqué
90c34d278f move src/examples to examples 2024-03-18 11:51:38 -07:00
Aleix Conchillo Flaqué
2f4e31d1b2 Merge pull request #69 from daily-co/add-github-linting-workflow
github: add linting workflow
2024-03-19 02:46:50 +08:00
Aleix Conchillo Flaqué
9385270775 autopep8 formatting 2024-03-18 11:28:32 -07:00
Aleix Conchillo Flaqué
2914e43350 github: add linting workflow 2024-03-18 11:28:06 -07:00
chadbailey59
78638d2dba Live translation (#61)
* added translator

* fixup
2024-03-18 13:26:05 -05:00
Aleix Conchillo Flaqué
141a5bb548 Merge pull request #68 from daily-co/log-transcription-errors
daily: log transcription errors
2024-03-19 01:53:40 +08:00
Aleix Conchillo Flaqué
3957813202 Merge pull request #67 from daily-co/add-dot-env-template
add dot-env.template
2024-03-19 01:49:21 +08:00
Aleix Conchillo Flaqué
549862ef99 daily: log transcription errors 2024-03-18 10:47:20 -07:00
Aleix Conchillo Flaqué
1000ca5b55 add dot-env.template 2024-03-18 10:43:57 -07:00
Moishe Lettvin
91dbfef4c3 Merge pull request #64 from daily-co/docs
Some docs
2024-03-18 13:38:32 -04:00
Moishe Lettvin
3b61d0b41a fix typos 2024-03-18 13:38:00 -04:00
Moishe Lettvin
bf3ae091b9 Merge pull request #62 from daily-co/anthropic-support
Anthropic LLM service
2024-03-18 13:36:39 -04:00
Aleix Conchillo Flaqué
34ac796607 Merge pull request #66 from daily-co/daily-transport-release-client
services: release daily client after leave
2024-03-19 01:36:22 +08:00
Aleix Conchillo Flaqué
e0551e9d85 services: release daily client after leave 2024-03-18 10:32:46 -07:00
Moishe Lettvin
b1ab6f91b9 Merge pull request #65 from daily-co/app-messages
Support for app messages
2024-03-18 11:37:10 -04:00
Moishe Lettvin
58726dc20d clean up imports 2024-03-18 10:14:51 -04:00
Moishe Lettvin
8e61fe8e36 Support for app messages 2024-03-18 10:08:41 -04:00
Moishe Lettvin
99b836c227 added docstrings to frames. 2024-03-18 09:08:12 -04:00
Moishe Lettvin
1c27f77f1a drafty architecture doc 2024-03-18 08:39:50 -04:00
Moishe Lettvin
c91fa39a99 Remove testing code 2024-03-15 19:42:46 -04:00
Moishe Lettvin
eacaea7db4 Anthropic LLM service 2024-03-15 19:40:37 -04:00
Moishe Lettvin
c6dfcb6f7a Merge pull request #60 from daily-co/remove-ai-service-methods
Remove run_to_queue and run from AIService class
2024-03-15 15:28:28 -04:00
Moishe Lettvin
18bf26de14 Update apps 2024-03-15 13:39:33 -04:00
Moishe Lettvin
b8b35db89c Remove run_to_queue and run from AIService class 2024-03-15 11:04:22 -04:00
Moishe Lettvin
358166f347 Merge pull request #59 from daily-co/remove-requirements
Remove unused requirements file
2024-03-13 16:23:42 -04:00
Moishe Lettvin
c006c123b2 Remove unused requirements file 2024-03-13 16:19:03 -04:00
chadbailey59
cf302fb765 Storybot and Chatbot examples (#58)
* storybot

* storybot

* added pipeline.queue_frames

* fixup
2024-03-13 15:12:59 -05:00
Moishe Lettvin
e33820fe36 Merge pull request #56 from daily-co/fal-redux
Use other model in FAL
2024-03-12 15:14:57 -04:00
Moishe Lettvin
b84b3d59f3 Use other model in FAL 2024-03-12 14:47:00 -04:00
Moishe Lettvin
7b5b88b99b Merge pull request #55 from daily-co/fix-fal
set FAL param correctly
2024-03-12 14:12:16 -04:00
Moishe Lettvin
e87196cce7 set FAL param correctly 2024-03-12 14:03:43 -04:00
chadbailey59
bbfc9e703b intake cleanup (#54) 2024-03-12 13:01:39 -05:00
Moishe Lettvin
c21a63d48b Merge pull request #49 from daily-co/openai-base-llm
Base OpenAI LLM service
2024-03-12 12:58:31 -04:00
Moishe Lettvin
f546bb32da Make 08- work again 2024-03-12 10:34:52 -04:00
Moishe Lettvin
d9378e23ba Base OpenAI LLM service 2024-03-11 16:52:41 -04:00
Moishe Lettvin
c75a3fb0d0 Merge pull request #53 from daily-co/fix_other_joined_event
Don't do time-consuming processing in `on_other_joined_event`
2024-03-11 13:27:13 -04:00
Moishe Lettvin
f8ae264957 remove unnecessary print 2024-03-11 13:20:28 -04:00
Moishe Lettvin
977c12d530 undo fal change 2024-03-11 13:19:47 -04:00
Moishe Lettvin
61c55d2f47 Fix up other examples 2024-03-11 13:17:31 -04:00
Moishe Lettvin
fd2fa23e9c Fix example 2 2024-03-11 13:00:29 -04:00
Moishe Lettvin
de026ccc8a Merge pull request #50 from daily-co/khk/launch-samples
Khk/launch samples
2024-03-11 12:50:38 -04:00
Moishe Lettvin
c5bb0e14ab Merge pull request #51 from daily-co/khk/readme
updated README
2024-03-11 12:50:22 -04:00
chadbailey59
a4f3c51184 the smallest commit in history 2024-03-11 09:47:00 -05:00
Moishe Lettvin
7786e685cc Merge pull request #52 from daily-co/pypi-updates
updates to pyproject.toml
2024-03-11 10:34:35 -04:00
Moishe Lettvin
33793ca9f8 update description 2024-03-11 07:31:39 -04:00
Moishe Lettvin
d26aede667 updates to pyproject.toml 2024-03-11 07:25:20 -04:00
Moishe Lettvin
ad993056d8 rename to dailyai 2024-03-11 07:16:20 -04:00
Kwindla Hultman Kramer
5b1f26aacb updated README 2024-03-10 22:06:23 -07:00
Kwindla Hultman Kramer
4e16e514dd attempting to change tts to deepgram in example 04 2024-03-10 19:43:06 -07:00
Kwindla Hultman Kramer
959ffa9d36 small streamlining of example 03 2024-03-10 19:42:19 -07:00
Kwindla Hultman Kramer
4396b1018a small streamlining of example 02 2024-03-10 19:41:32 -07:00
Kwindla Hultman Kramer
37e904ce68 changed fal to a maybe slightly faster model 2024-03-10 19:40:51 -07:00
Kwindla Hultman Kramer
ef39d842a5 custom processor in example 05 2024-03-10 19:18:37 -07:00
Kwindla Hultman Kramer
72f631a066 working on foundational examples 2024-03-10 17:21:46 -07:00
chadbailey59
5d46302b9e changed default services (#47) 2024-03-08 15:36:30 -06:00
chadbailey59
8241dc0bed cleaned up example logging (#46) 2024-03-08 15:25:17 -06:00
Moishe Lettvin
95a1efbe75 Merge pull request #45 from daily-co/exception_handling_callbacks
Wait for the callback's result, so exceptions get raised
2024-03-08 15:04:15 -05:00
Moishe Lettvin
e59df8476e Wait for the callback's result, so exceptions get raised 2024-03-08 15:02:15 -05:00
chadbailey59
824df8ca7c moved patient intake and example runner (#44) 2024-03-08 12:07:51 -06:00
chadbailey59
0db8a51b27 cleaned up function calling frames (#43) 2024-03-08 10:13:28 -06:00
chadbailey59
ce9c6ede66 function allowlist (#42) 2024-03-08 08:49:09 -06:00
Moishe Lettvin
192b46bbab Merge pull request #41 from daily-co/optimize-pipeline
Optimize pipeline processing
2024-03-07 21:01:03 -05:00
Moishe Lettvin
196279e342 Add endframe to sample 4 2024-03-07 19:24:27 -05:00
Moishe Lettvin
edd93bc4cb remove errant print statement 2024-03-07 19:05:03 -05:00
Moishe Lettvin
d0076dd4ee Optimize pipeline processing so we don't wait for the completion of one generator to move onto the next. 2024-03-07 18:59:47 -05:00
chadbailey59
3c5f4800d4 Chad's big patient intake PR (#40)
* at least it runs, kind of

* wip

* wip with user response aggregator

* frame and pipeline docstrings

* Getting started on docstrings

* finish docstrings for aggregators

* patient intake is working!

* cleanup

* cleanup

---------

Co-authored-by: Moishe Lettvin <moishel@gmail.com>
2024-03-07 17:41:32 -06:00
Moishe Lettvin
2bcb4966d3 Merge pull request #39 from daily-co/docstrings
Docstrings
2024-03-07 15:39:50 -05:00
Moishe Lettvin
b14f08a7d5 finish docstrings for aggregators 2024-03-07 15:16:23 -05:00
Moishe Lettvin
8fb92e3fd7 Getting started on docstrings 2024-03-07 12:51:19 -05:00
Moishe Lettvin
337ca7f581 frame and pipeline docstrings 2024-03-07 10:16:27 -05:00
Moishe Lettvin
eb430621f1 Merge pull request #37 from daily-co/fix-interruptible
Fix interruptible pipeline runner and aggregator.
2024-03-07 09:09:41 -05:00
Moishe Lettvin
d5683c4f24 Fix interruptible pipeline runner and aggregator. 2024-03-07 09:05:49 -05:00
chadbailey59
b4505b7eff added audio chunking for better interruption support (#35) 2024-03-06 18:20:04 -06:00
Moishe Lettvin
3e46d28aff Add start frame to interrupt loop 2024-03-06 15:58:19 -05:00
Moishe Lettvin
d3e76c4fd6 Merge pull request #34 from daily-co/rename-frames
Remove Queue in frame names
2024-03-06 14:10:56 -05:00
Moishe Lettvin
62fd371b97 Remove Queue in frame names 2024-03-06 14:09:06 -05:00
Moishe Lettvin
b9556716dd Merge pull request #33 from daily-co/pipeline-instead-of-nest
Pipeline instead of nest
2024-03-05 11:04:20 -05:00
Moishe Lettvin
2708dcf7b5 Remove conversation wrapper 2024-03-04 14:07:49 -05:00
Moishe Lettvin
d3f86dab2e starting on interruptions 2024-03-04 13:41:28 -05:00
Moishe Lettvin
18e7626b9f Getting started on interruptible transport pipeline runner 2024-03-04 07:51:22 -05:00
Moishe Lettvin
763a50f8ec First cut at sample 6 rewrite with pipelines 2024-03-04 07:28:10 -05:00
Moishe Lettvin
3b282cc921 some comments 2024-03-03 20:17:48 -05:00
Moishe Lettvin
434772dc23 Update sample 5! 2024-03-03 19:50:13 -05:00
Moishe Lettvin
15df4a9d58 cleanup, make sample 4 work with new stuff 2024-03-03 19:37:30 -05:00
Moishe Lettvin
643be238f9 getting started 2024-03-03 16:31:31 -05:00
chadbailey59
d90fdb1cae Isolated changes to add VAD (#32)
* added VAD

* added separate 'vad enabled' property
2024-02-28 15:16:44 -06:00
Moishe Lettvin
f710aeae95 Merge pull request #30 from daily-co/unsub-video
cleanup client properties and unsubscribe from camera
2024-02-27 13:16:20 -05:00
Moishe Lettvin
20091d91c9 cleanup client properties and unsubscribe from camera 2024-02-27 13:09:55 -05:00
Moishe Lettvin
92ec5641d4 update deepgram tts to new service structure 2024-02-14 13:44:59 -05:00
Moishe Lettvin
53e97bd872 Merge pull request #28 from daily-co/update-playht-service
Update playht service
2024-02-14 12:54:34 -05:00
Moishe Lettvin
dcbd79333a make destructor call client.close in PlayHT service 2024-02-14 12:53:20 -05:00
Moishe Lettvin
97a4cb8b7f Update playht tts service 2024-02-14 12:40:13 -05:00
Moishe Lettvin
cc7877f626 Merge pull request #26 from daily-co/fix-sigint
fix sigint handling
2024-02-14 12:11:44 -05:00
Moishe Lettvin
1992b7e79e fix sigint handling 2024-02-14 12:10:47 -05:00
Moishe Lettvin
2516670874 Merge pull request #25 from daily-co/keyboard-interrupt
Call client.leave on keyboard interrupt
2024-02-13 14:18:42 -05:00
Moishe Lettvin
4fecc10808 Call client.leave on keyboard interrupt 2024-02-13 14:17:09 -05:00
Moishe Lettvin
08144fc560 Merge pull request #24 from daily-co/another-formatting-pass
Another autopep8 formatting pass
2024-02-10 09:39:51 -05:00
Moishe Lettvin
815aa2bc3e Another autopep8 formatting pass 2024-02-10 09:29:08 -05:00
Moishe Lettvin
560c98f2fa Merge pull request #23 from daily-co/ollama-service
Ollama LLM service
2024-02-10 09:27:17 -05:00
Moishe Lettvin
0e0c992f59 Ollama LLM service 2024-02-10 09:22:52 -05:00
Moishe Lettvin
d76139ac1a Merge pull request #22 from daily-co/temp-readme-patch
Make the README okay-enough for limited public release
2024-02-09 11:57:39 -05:00
Moishe Lettvin
444418d94c Make the README okay-enough for limited public release 2024-02-09 10:26:39 -05:00
Moishe Lettvin
d27122e35e Create LICENSE 2024-02-09 09:10:28 -06:00
Chad Bailey
0ae83577c6 renamed samples to examples 2024-02-08 16:34:48 +00:00
Chad Bailey
5c402eee81 started adding docs 2024-02-08 16:31:17 +00:00
Moishe Lettvin
80750fe022 Remove old/deprecated/broken samples 2024-02-08 09:56:22 -05:00
Moishe Lettvin
ccfba04ea2 Remove mistakenly-added file 2024-02-08 09:55:28 -05:00
Moishe Lettvin
5b8198cf9e Merge pull request #21 from daily-co/cleanup_constructor_args
Cleanup constructor args in examples
2024-02-08 09:44:51 -05:00
Moishe Lettvin
3fa00c4db8 Cleanup constructor args in examples 2024-02-08 09:41:51 -05:00
Moishe Lettvin
4ce36f8c63 Merge pull request #20 from daily-co/base_transport
Add a "Local Transport" as a proof of concept
2024-02-08 08:25:03 -05:00
Moishe Lettvin
9620080cc5 A little example cleanup 2024-02-08 08:24:25 -05:00
Moishe Lettvin
ee1ce8f288 Abstract base transport class & local transport class 2024-02-08 08:15:28 -05:00
chadbailey59
70d07b6ea2 WIP: environment cleanup (#19)
* removed env var usage from SDK services

* started consolidating configure.py

* 1–3 work

* cleaned up the rest

* more cleanup

* cleanup and 05 tinkering

* made fal keys optional
2024-02-06 15:07:16 -06:00
Moishe Lettvin
9d5ad5675c Fix 06- demo and also fix bugs where dangling sentences wouldn't be spoken 2024-02-01 12:54:23 -05:00
chadbailey59
0d96f91cde Added sound effect example (#18)
* added sound effect example

* added dialout to this branch too

* fixup

* fixup for more dialout testing

* cleanup
2024-02-01 10:26:50 -06:00
Moishe Lettvin
4e9586595d minor cleanup 2024-01-29 15:06:39 -05:00
Moishe Lettvin
d0bcddfd70 Fix 06a-image-sync.py 2024-01-29 14:29:32 -05:00
Chad Bailey
065a213ebb example renaming 2024-01-29 17:42:45 +00:00
Chad Bailey
7d6c94d604 added 09 examples 2024-01-29 17:39:28 +00:00
Chad Bailey
0859b57b00 Added 09 examples 2024-01-29 17:39:14 +00:00
Moishe Lettvin
09838c9b1f Merge pull request #17 from daily-co/start_tests
Add some basic daily_transport tests
2024-01-29 07:57:33 -05:00
Moishe Lettvin
c39920132c Add some basic daily_transport tests 2024-01-29 07:56:12 -05:00
Moishe Lettvin
860129a4be Merge pull request #16 from daily-co/image_tweaks
Minor Cleanup
2024-01-27 19:10:52 -05:00
Moishe Lettvin
4416f36ae9 some minor cleanup, and coalesce image/images into one thing, and use itertools.cycle 2024-01-27 19:07:29 -05:00
chadbailey59
86af896150 Wake word and animation sprites (#15)
* WIP: golden kitty

* added web server

* added health check

* added flask to module build

* trying requirements.txt

* added dotenv

* flask_cors

* gunicorn

* requirements cleanup

* Dockerfile

* WOOF

* basic wake word

* removed otel

* basic animation kind of works

* i think animation defeated me

* added santa cat assets

* cleanup

* cleanup

* server example and cleanup

* more cleanup

* fix up some class variable names

* minor cleanup, remove mistakenly-added print and logger stuff

* cleanup

* cleanup

---------

Co-authored-by: Moishe Lettvin <moishel@gmail.com>
2024-01-26 15:37:39 -06:00
Moishe Lettvin
5cbac4701b minor cleanup, remove mistakenly-added print and logger stuff 2024-01-26 15:27:12 -05:00
Moishe Lettvin
5d9aa530e2 fix up some class variable names 2024-01-26 15:15:44 -05:00
Moishe Lettvin
d4c4d49035 Merge pull request #14 from daily-co/aiosessions
Don't create aiohttp sessions inside services
2024-01-26 14:01:24 -05:00
Moishe Lettvin
e81f247845 Don't create aiohttp sessions inside services 2024-01-26 12:30:37 -05:00
Liza
8baf137511 prefix suspected private members (#13) 2024-01-26 18:28:54 +01:00
Moishe Lettvin
fcceb32bd7 Merge pull request #12 from daily-co/frame_sync
Speaking / waiting images
2024-01-26 10:17:01 -05:00
Moishe Lettvin
ead655fe23 some more fixup 2024-01-26 10:07:16 -05:00
Moishe Lettvin
bab102f197 little more cleanup 2024-01-26 09:54:51 -05:00
Moishe Lettvin
95fc802607 Speaking / waiting images 2024-01-26 09:15:29 -05:00
Moishe Lettvin
2886997693 Merge pull request #11 from daily-co/autopep
Autopep linter fixes
2024-01-25 12:17:26 -05:00
Moishe Lettvin
5fdda43bed Autopep linter fixes 2024-01-25 12:12:46 -05:00
Moishe Lettvin
f0d9b0613e Add faster_whisper to module dependencies; remove unneeded import 2024-01-25 11:27:00 -05:00
Moishe Lettvin
a661905d7f Merge pull request #9 from daily-co/interruptions
Interruptable conversation wrapper
2024-01-25 11:24:57 -05:00
Moishe Lettvin
c9c2e5f561 Remove unnecessary try/except 2024-01-25 11:18:55 -05:00
Moishe Lettvin
795a339542 Add InterruptibleConversationWrapper 2024-01-25 11:15:04 -05:00
Liza
31db156dfc Local Whisper transcription (#10)
* First pass at Whisper transcription

* deletions

* Revise based on feedback, add autopep8
2024-01-25 13:43:25 +01:00
Moishe Lettvin
690cf2e47d Merge pull request #8 from daily-co/queueframe-refactor
Refactor QueueFrame
2024-01-23 13:15:11 -05:00
Moishe Lettvin
ba89e41c5b remove commented-out code 2024-01-23 09:37:15 -05:00
Moishe Lettvin
c134598a77 Refactor QueueFrame 2024-01-23 09:33:51 -05:00
Liza
b51abd2969 facilitate manual call management (#7) 2024-01-23 14:33:27 +01:00
Moishe Lettvin
3fda9b0ecb Use more flexibile aggregator 2024-01-22 16:02:35 -05:00
Moishe Lettvin
95c92e5304 Aggregators for LLM messages 2024-01-22 10:59:13 -05:00
Moishe Lettvin
b443fbdb60 Very rough draft at intro/overview in README 2024-01-19 16:20:08 -05:00
Moishe Lettvin
ccd2fa31e5 Rename 'theoretical-to-real' samples to 'foundational' 2024-01-19 13:57:52 -05:00
Moishe Lettvin
9b65286216 Merge pull request #6 from daily-co/rm-sentence-aggregator
Cleanup: no more sentence aggregator
2024-01-19 13:42:27 -05:00
Moishe Lettvin
6ae733ebfe Cleanup: no more sentence aggregator, let the TTS service deal with that; also removed the queue typing stuff from ai_services 2024-01-19 13:06:15 -05:00
Liza
1071dede1a Only initialize Daily once (#5) 2024-01-19 14:59:48 +01:00
397 changed files with 26506 additions and 3302 deletions

30
.dockerignore Normal file

@@ -0,0 +1,30 @@
# flyctl launch added from .gitignore
**/.vscode
**/env
**/__pycache__
**/*~
**/venv
#*#
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
**/.DS_Store
**/.env
fly.toml

44
.github/workflows/build.yaml vendored Normal file

@@ -0,0 +1,44 @@
name: build

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"
    paths-ignore:
      - "docs/**"

concurrency:
  group: build-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  build:
    name: "Build and Install"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: Build project
        run: |
          source .venv/bin/activate
          python -m build
      - name: Install project and other Python dependencies
        run: |
          source .venv/bin/activate
          pip install --editable .

44
.github/workflows/lint.yaml vendored Normal file

@@ -0,0 +1,44 @@
name: lint

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"
    paths-ignore:
      - "docs/**"

concurrency:
  group: build-lint-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  autopep8:
    name: "Formatting lints"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install development Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: autopep8
        id: autopep8
        run: |
          source .venv/bin/activate
          autopep8 --max-line-length 100 --exit-code -r -d --exclude "*_pb2.py" -a -a src/
      - name: Fail if autopep8 requires changes
        if: steps.autopep8.outputs.exit-code == 2
        run: exit 1

84
.github/workflows/publish.yaml vendored Normal file

@@ -0,0 +1,84 @@
name: publish

on:
  workflow_dispatch:
    inputs:
      gitref:
        type: string
        description: "what git ref to build"
        required: true

jobs:
  build:
    name: "Build and upload wheels"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.gitref }}
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: Build project
        run: |
          source .venv/bin/activate
          python -m build
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels
          path: ./dist

  publish-to-pypi:
    name: "Publish to PyPI"
    runs-on: ubuntu-latest
    needs: [ build ]
    environment:
      name: pypi
      url: https://pypi.org/p/pipecat-ai
    permissions:
      id-token: write
    steps:
      - name: Download wheels
        uses: actions/download-artifact@v4
        with:
          name: wheels
          path: ./dist
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          verbose: true
          print-hash: true

  publish-to-test-pypi:
    name: "Publish to Test PyPI"
    runs-on: ubuntu-latest
    needs: [ build ]
    environment:
      name: testpypi
      url: https://pypi.org/p/pipecat-ai
    permissions:
      id-token: write
    steps:
      - name: Download wheels
        uses: actions/download-artifact@v4
        with:
          name: wheels
          path: ./dist
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          verbose: true
          print-hash: true
          repository-url: https://test.pypi.org/legacy/

55
.github/workflows/publish_test.yaml vendored Normal file

@@ -0,0 +1,55 @@
name: publish-test

on: workflow_dispatch

jobs:
  build:
    name: "Build and upload wheels"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt
      - name: Build project
        run: |
          source .venv/bin/activate
          python -m build
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels
          path: ./dist

  publish-to-test-pypi:
    name: "Publish to Test PyPI"
    runs-on: ubuntu-latest
    needs: [ build ]
    environment:
      name: testpypi
      url: https://pypi.org/p/pipecat-ai
    permissions:
      id-token: write
    steps:
      - name: Download wheels
        uses: actions/download-artifact@v4
        with:
          name: wheels
          path: ./dist
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          verbose: true
          print-hash: true
          repository-url: https://test.pypi.org/legacy/

49
.github/workflows/tests.yaml vendored Normal file

@@ -0,0 +1,49 @@
name: test

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"
    paths-ignore:
      - "docs/**"

concurrency:
  group: build-test-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  test:
    name: "Unit and Integration Tests"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Cache virtual environment
        uses: actions/cache@v3
        with:
          # We are hashing linux-py3.10-requirements.txt and dev-requirements.txt,
          # which contain all dependencies needed to run the tests and examples.
          key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('linux-py3.10-requirements.txt') }}-${{ hashFiles('dev-requirements.txt') }}
          path: .venv
      - name: Install system packages
        run: sudo apt-get install -y portaudio19-dev
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r linux-py3.10-requirements.txt -r dev-requirements.txt
      - name: Test with pytest
        run: |
          source .venv/bin/activate
          pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests

3
.gitignore vendored

@@ -2,6 +2,8 @@
env/
__pycache__/
*~
venv
.venv
#*#
# Distribution / packaging
@@ -25,3 +27,4 @@ share/python-wheels/
MANIFEST
.DS_Store
.env
fly.toml

768
CHANGELOG.md Normal file

@@ -0,0 +1,768 @@
# Changelog
All notable changes to **pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.38] - 2024-07-23
### Added
- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and
`SileroVADAnalyzer`. This allows caching and various GitHub repo validations.
- Added `send_initial_empty_metrics` flag to `PipelineParams` to request
initial empty metrics (zero values). True by default.
### Fixed
- Fixed initial metrics format. It was using the wrong keys name/time instead of
processor/value.
- STT services should be using ISO 8601 time format for transcription frames.
- Fixed an issue that would cause Daily transport to show a stop transcription
error when actually none occurred.
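
For reference, a minimal sketch of producing an ISO 8601 timestamp with the Python standard library. The helper name `iso8601_now` is hypothetical and is not part of pipecat; the services may build their timestamps differently.

```python
from datetime import datetime, timezone


def iso8601_now() -> str:
    """Return the current UTC time as an ISO 8601 string (hypothetical helper)."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")


stamp = iso8601_now()
# The string round-trips through the standard library parser.
parsed = datetime.fromisoformat(stamp)
```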
## [0.0.37] - 2024-07-22
### Added
- Added `RTVIProcessor` which implements the RTVI-AI standard.
See https://github.com/rtvi-ai
- Added `BotInterruptionFrame` which allows interrupting the bot while talking.
- Added `LLMMessagesAppendFrame` which allows appending messages to the current
LLM context.
- Added `LLMMessagesUpdateFrame` which allows changing the LLM context for the
one provided in this new frame.
- Added `LLMModelUpdateFrame` which allows updating the LLM model.
- Added `TTSSpeakFrame` which causes the bot to say some text. This text will not
be part of the LLM context.
- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.
### Removed
- We remove the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These
were added in the past to properly handle interruptions for the
`LLMAssistantContextAggregator`. But the `LLMContextAggregator` is now based
on `LLMResponseAggregator` which handles interruptions properly by just
processing the `StartInterruptionFrame`, so there's no need for these extra
frames any more.
### Fixed
- Fixed an issue with `StatelessTextTransformer` where it was pushing a string
instead of a `TextFrame`.
- `TTSService` end of sentence detection has been improved. It now works with
acronyms, numbers, hours and others.
- Fixed an issue in `TTSService` that would not properly flush the current
aggregated sentence if an `LLMFullResponseEndFrame` was found.
### Performance
- `CartesiaTTSService` now uses websockets which improves speed. It also
leverages the new Cartesia contexts which maintain generated audio prosody
when multiple inputs are sent, therefore improving audio quality a lot.
## [0.0.36] - 2024-07-02
### Added
- Added `GladiaSTTService`.
See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition
- Added `XTTSService`. This is a local Text-To-Speech service.
See https://github.com/coqui-ai/TTS
- Added `UserIdleProcessor`. This processor can be used to wait for any
interaction with the user. If the user doesn't say anything within a given
timeout a provided callback is called.
- Added `IdleFrameProcessor`. This processor can be used to wait for frames
within a given timeout. If no frame is received within the timeout a provided
callback is called.
- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
upstream while the bot is talking.
- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer`
or `SileroVAD`.
- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services like
`DeepgramSTTService` need to process things asynchronously. For example, audio
is sent to Deepgram but transcriptions are not returned immediately. In these
cases we still require all frames (except system frames) to be pushed
downstream from a single task. That's what `AsyncFrameProcessor` is for. It
creates a task and all frames should be pushed from that task. So, whenever a
new Deepgram transcription is ready that transcription will also be pushed
from this internal task.
- The `MetricsFrame` now includes processing metrics if metrics are enabled. The
processing metrics indicate the time a processor needs to generate all its
output. Note that not all processors generate this kind of metrics.
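
The idle-timeout behaviour described for `UserIdleProcessor` and `IdleFrameProcessor` can be sketched with plain `asyncio`. This is only an illustration of the idea (wait for activity, call a callback on timeout); `wait_for_activity` is a hypothetical helper, not the processors' actual implementation.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_activity(
    activity: asyncio.Event,
    timeout: float,
    on_idle: Callable[[], Awaitable[None]],
) -> bool:
    """Wait for `activity` to be set; call `on_idle` if the timeout expires.

    Returns True if activity was seen in time, False otherwise.
    """
    try:
        await asyncio.wait_for(activity.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        await on_idle()
        return False
```

A caller would set the event whenever the user speaks (or a frame arrives) and clear it before waiting again.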
### Changed
- `WhisperSTTService` model can now also be a string.
- Added missing * keyword separators in services.
### Fixed
- `WebsocketServerTransport` doesn't try to send frames anymore if the serializer
returns `None`.
- Fixed an issue where exceptions that occurred inside frame processors were
being swallowed and not displayed.
- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
data to the websocket after being closed.
### Other
- Added Fly.io deployment example in `examples/deployment/flyio-example`.
- Added new `17-detect-user-idle.py` example that shows how to use the new
`UserIdleProcessor`.
## [0.0.35] - 2024-06-28
### Changed
- `FastAPIWebsocketParams` now requires a serializer.
- `TwilioFrameSerializer` now requires a `streamSid`.
### Fixed
- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for
8000 sample rate.
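
Both sizes correspond to the same window duration, which a short calculation makes explicit. The function names below are hypothetical and only illustrate the arithmetic.

```python
# Frame counts taken from the changelog entry above.
SILERO_WINDOW_SAMPLES = {16000: 512, 8000: 256}


def silero_window_samples(sample_rate: int) -> int:
    """Return the number of frames per VAD window for a sample rate."""
    if sample_rate not in SILERO_WINDOW_SAMPLES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return SILERO_WINDOW_SAMPLES[sample_rate]


def silero_window_ms(sample_rate: int) -> float:
    """Return the window duration in milliseconds."""
    return 1000 * silero_window_samples(sample_rate) / sample_rate
```

With these numbers, `silero_window_ms(16000)` and `silero_window_ms(8000)` both evaluate to 32 ms.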
## [0.0.34] - 2024-06-25
### Fixed
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
cause interruptions to ignore transcriptions.
- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate
shorter output.
## [0.0.33] - 2024-06-25
### Changed
- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now
expects a voice ID instead of a voice name (you can get the voice ID from
Cartesia's playground). You can also specify the audio `sample_rate` and
`encoding` instead of the previous `output_format`.
### Fixed
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
cause static audio issues and interruptions to not work properly when dealing
with multiple LLM sentences.
- Fixed an issue that could mix new LLM responses with previous ones when
handling interruptions.
- Fixed a Daily transport blocking situation that occurred while reading audio
frames after a participant left the room. Needs daily-python >= 0.10.1.
## [0.0.32] - 2024-06-22
### Added
- Allow specifying a `DeepgramSTTService` url which allows using on-prem
Deepgram.
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that
can be integrated with FastAPI websockets.
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to
serialize and deserialize audio frames from Twilio.
- Added Daily transport event: `on_dialout_answered`. See
https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.
### Performance
- Convert `BaseInputTransport` and `BaseOutputTransport` to fully use asyncio
and remove the use of threads.
### Other
- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio
phone numbers with a Pipecat bot.
- Updated `07f-interruptible-azure.py` to use `AzureLLMService`,
`AzureSTTService` and `AzureTTSService`.
## [0.0.31] - 2024-06-13
### Performance
- Break long audio frames into 20ms chunks instead of 10ms.
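
As an illustration of the chunking above, here is a sketch that splits raw PCM audio into 20ms pieces. It assumes 16-bit samples by default; `split_audio` is a hypothetical helper, not the transport's actual code.

```python
def split_audio(
    audio: bytes,
    sample_rate: int,
    num_channels: int = 1,
    chunk_ms: int = 20,
    sample_width: int = 2,  # bytes per sample; 16-bit PCM is an assumption
) -> list[bytes]:
    """Split raw PCM audio into chunks of `chunk_ms` milliseconds.

    The final chunk may be shorter if the audio length is not a multiple
    of the chunk size.
    """
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * num_channels * sample_width
    return [audio[i:i + chunk_bytes] for i in range(0, len(audio), chunk_bytes)]
```

At 16000 Hz mono, a 20ms chunk is 320 samples, or 640 bytes, so 100ms of audio yields five chunks.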
## [0.0.30] - 2024-06-13
### Added
- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so
only the initial TTFB metrics after the user stops talking are reported.
- Added `OpenPipeLLMService`. This service will let you run OpenAI through
OpenPipe's SDK.
- Allow specifying frame processors' name through a new `name` constructor
argument.
- Added `DeepgramSTTService`. This service has an ongoing websocket
connection. To handle this, it subclasses `AIService` instead of
`STTService`. The output of this service will be pushed from the same task,
except system frames like `StartFrame`, `CancelFrame` or
`StartInterruptionFrame`.
### Changed
- `FrameSerializer.deserialize()` can now return `None` in case it is not
possible to deserialize the given data.
- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.
### Fixed
- Fixed an issue where `DailyRoomProperties.exp` always had the same old
timestamp unless set by the user.
- Fixed a couple of issues with `WebsocketServerTransport`. It needed to use
`push_audio_frame()` and also VAD was not working properly.
- Fixed an issue that would cause LLM aggregator to fail with small
`VADParams.stop_secs` values.
- Fixed an issue where `BaseOutputTransport` would send longer audio frames
preventing interruptions.
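
A stale default timestamp like the `exp` one above is a common Python pitfall: a default value is evaluated once, when the class is defined, not each time an instance is created. Whether that was the exact cause here is an assumption; the sketch below uses hypothetical dataclasses (not `DailyRoomProperties` itself) to show the pattern and one way around it.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StaleRoomProperties:
    # Evaluated once at class-definition time: every instance shares it.
    exp: float = time.time() + 300


@dataclass
class FreshRoomProperties:
    # Evaluated on every instantiation.
    exp: float = field(default_factory=lambda: time.time() + 300)
```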
### Other
- Added new `07h-interruptible-openpipe.py` example. This example shows how to
use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.
- Added new `dialin-chatbot` example. This example shows how to call the bot
using a phone number.
## [0.0.29] - 2024-06-07
### Added
- Added a new `FunctionFilter`. This filter will let you filter frames based on
a given function, except system messages which should never be filtered.
- Added `FrameProcessor.can_generate_metrics()` method to indicate if a
processor can generate metrics. In the future this might get an extra argument
to ask for a specific type of metric.
- Added `BasePipeline`. All pipeline classes should be based on this class. All
subclasses should implement a `processors_with_metrics()` method that returns
a list of all `FrameProcessor`s in the pipeline that can generate metrics.
- Added `enable_metrics` to `PipelineParams`.
- Added `MetricsFrame`. The `MetricsFrame` will report different metrics in the
system. Right now, it can report TTFB (Time To First Byte) values for
different services, that is the time spent between the arrival of a `Frame` to
the processor/service until the first `DataFrame` is pushed downstream. If
metrics are enabled an initial `MetricsFrame` with all the services in the
pipeline will be sent.
- Added TTFB metrics and debug logging for TTS services.
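
The TTFB definition above (time from a frame's arrival until the first output is pushed) can be sketched as a small timer. `TTFBTimer` and its method names are hypothetical and do not mirror the `FrameProcessor` API.

```python
import time
from typing import Callable, Optional


class TTFBTimer:
    """Measure the time between an input arriving and the first output."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None

    def frame_arrived(self) -> None:
        # Start (or restart) the measurement for a new input frame.
        self._start = self._clock()
        self.ttfb = None

    def output_pushed(self) -> None:
        # Only the first output after an arrival counts towards TTFB.
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start
```

Injecting the clock keeps the timer easy to test without real delays.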
### Changed
- Moved `ParallelTask` to `pipecat.pipeline.parallel_task`.
### Fixed
- Fixed PlayHT TTS service to work properly async.
## [0.0.28] - 2024-06-05
### Fixed
- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep
growing indefinitely.
## [0.0.27] - 2024-06-05
### Added
- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.
## [0.0.26] - 2024-06-05
### Added
- Added `OpenAITTSService`.
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change
audio sample format and the model to use.
- Added `DailyRESTHelper` which helps you create Daily rooms and tokens in an
easy way.
- `PipelineTask` now has a `has_finished()` method to indicate if the task has
completed. If a task is never run `has_finished()` will return False.
- `PipelineRunner` now supports SIGTERM. If received, the runner will be
canceled.
### Fixed
- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` were
stopping push tasks before pushing `EndFrame` frames, which could cause the
bots to get stuck.
- Fixed an error closing local audio transports.
- Fixed an issue with Deepgram TTS that was introduced in the previous release.
- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a
`user` message could be appended after the previous `user` message. Anthropic
does not allow that because it requires alternate `user` and `assistant`
messages.
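
One way to keep a message list alternating between roles is to merge consecutive messages from the same role. This is a sketch of that idea only; it is not necessarily how `AnthropicLLMService` resolves the issue, and `merge_consecutive_roles` is a hypothetical helper.

```python
def merge_consecutive_roles(messages: list[dict]) -> list[dict]:
    """Merge adjacent messages that share a role so roles alternate."""
    merged: list[dict] = []
    for message in messages:
        if merged and merged[-1]["role"] == message["role"]:
            # Same role as the previous message: append the text to it.
            merged[-1] = {
                "role": message["role"],
                "content": merged[-1]["content"] + " " + message["content"],
            }
        else:
            merged.append(dict(message))
    return merged
```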
### Performance
- The `BaseInputTransport` does not pull audio frames from sub-classes any
more. Instead, sub-classes now push audio frames into a queue in the base
class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead
of 10ms.
- Remove redundant camera input thread from `DailyInputTransport`. This should
improve performance a little bit when processing participant videos.
- Load Cartesia voice on startup.
## [0.0.25] - 2024-05-31
### Added
- Added WebsocketServerTransport. This will create a websocket server and will
read messages coming from a client. The messages are serialized/deserialized
with protobufs. See `examples/websocket-server` for a detailed example.
- Added function calling (LLMService.register_function()). This will allow the
LLM to call functions you have registered when needed. For example, if you
register a function to get the weather in Los Angeles and ask the LLM about
the weather in Los Angeles, the LLM will call your function.
See https://platform.openai.com/docs/guides/function-calling
- Added new `LangchainProcessor`.
- Added Cartesia TTS support (https://cartesia.ai/)
### Fixed
- Fixed SileroVAD frame processor.
- Fixed an issue where `camera_out_enabled` would cause high CPU usage if
no image was provided.
### Performance
- Removed unnecessary audio input tasks.
## [0.0.24] - 2024-05-29
### Added
- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This
notifies when the Daily room SIP endpoints are ready. This allows integrating
with third-party services like Twilio.
- Exposed Daily transport `on_app_message` event.
- Added Daily transport `on_call_state_updated` event.
- Added Daily transport `start_recording()`, `stop_recording()` and
`stop_dialout()`.
### Changed
- Added `PipelineParams`. This replaces the `allow_interruptions` argument in
`PipelineTask` and will allow additional parameters in the future.
- Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.
- GoogleLLMService `api_key` argument is now mandatory.
### Fixed
- Daily transport `dialin-ready` doesn't block anymore and it now handles
timeouts.
- Fixed AzureLLMService.
## [0.0.23] - 2024-05-23
### Fixed
- Fixed an issue handling Daily transport `dialin-ready` event.
## [0.0.22] - 2024-05-23
### Added
- Added Daily transport `start_dialout()` to be able to make phone or SIP calls.
See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout
- Added Daily transport support for dial-in use cases.
- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`,
`on_dialout_error` and `on_dialout_warning`. See
https://reference-python.daily.co/api_reference.html#daily.EventHandler
## [0.0.21] - 2024-05-22
### Added
- Added vision support to Anthropic service.
- Added `WakeCheckFilter` which allows you to pass information downstream only
if you say a certain phrase/word.
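
The wake-phrase idea can be sketched as a text filter that drops everything until a phrase is heard. `ToyWakeFilter` is a hypothetical stand-in for illustration and does not reproduce `WakeCheckFilter`'s interface or behaviour.

```python
import re
from typing import Optional


class ToyWakeFilter:
    """Pass text downstream only once a wake phrase appears in it."""

    def __init__(self, wake_phrases: list[str]):
        # Match phrases case-insensitively, tolerating extra whitespace.
        self._patterns = [
            re.compile(
                r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b",
                re.IGNORECASE,
            )
            for phrase in wake_phrases
        ]

    def filter(self, text: str) -> Optional[str]:
        """Return the text from the wake phrase onwards, or None."""
        for pattern in self._patterns:
            match = pattern.search(text)
            if match:
                return text[match.start():]
        return None
```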
### Changed
- `Filter` has been renamed to `FrameFilter` and it's now under
`processors/filters`.
### Fixed
- Fixed Anthropic service to use new frame types.
- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator`
that would cause frames after a brief pause to not be pushed to the LLM.
- Clear the audio output buffer if we are interrupted.
- Re-add exponential smoothing after volume calculation. This makes sure the
volume value being used doesn't fluctuate so much.
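
Exponential smoothing blends each new measurement with the previous smoothed value. The sketch below shows the standard formula; the function names and the factor are illustrative assumptions, not the values used in the library.

```python
def exp_smoothing(value: float, prev_value: float, factor: float) -> float:
    """Return prev_value moved towards value by `factor` (0 < factor <= 1)."""
    return prev_value + factor * (value - prev_value)


def smooth_series(values: list[float], factor: float, initial: float = 0.0) -> list[float]:
    """Apply exponential smoothing to a sequence of volume readings."""
    smoothed = []
    prev = initial
    for value in values:
        prev = exp_smoothing(value, prev, factor)
        smoothed.append(prev)
    return smoothed
```

A smaller factor reacts more slowly to changes, so brief spikes or dips in volume have less effect on the value used for detection.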
## [0.0.20] - 2024-05-22
### Added
- In order to improve interruptions we now compute a loudness level using
[pyloudnorm](https://github.com/csteinmetz1/pyloudnorm). The audio coming from
WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC) algorithm
applied to the signal, however we don't do that on our local PyAudio
signals. This means that currently incoming audio from PyAudio is kind of
broken. We will fix it in future releases.
### Fixed
- Fixed an issue where `StartInterruptionFrame` would cause
`LLMUserResponseAggregator` to push the accumulated text causing the LLM
to respond in the wrong task. The `StartInterruptionFrame` should not trigger any
new LLM response because that would be spoken in a different task.
- Fixed an issue where tasks and threads could be paused because the executor
didn't have more tasks available. This was causing issues when cancelling and
recreating tasks during interruptions.
## [0.0.19] - 2024-05-20
### Changed
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal
messages are now exposed through the `messages` property.
### Fixed
- Fixed an issue where `LLMAssistantResponseAggregator` was not accumulating the
full response but short sentences instead. If there's an interruption we only
accumulate what the bot has spoken until now in a long response as well.
## [0.0.18] - 2024-05-20
### Fixed
- Fixed an issue in `DailyOutputTransport` where transport messages were not
being sent.
## [0.0.17] - 2024-05-19
### Added
- Added `google.generativeai` model support, including vision. This new `google`
service defaults to using `gemini-1.5-flash-latest`. Example in
`examples/foundational/12a-describe-video-gemini-flash.py`.
- Added vision support to `openai` service. Example in
`examples/foundational/12a-describe-video-gemini-flash.py`.
- Added initial interruptions support. The assistant contexts (or aggregators)
should now be placed after the output transport. This way, only the completed
spoken context is added to the assistant context.
- Added `VADParams` so you can control voice confidence level and others.
- `VADAnalyzer` now uses an exponential smoothed volume to improve speech
detection. This is useful when voice confidence is high (because there's
someone talking near you) but volume is low.
### Fixed
- Fixed an issue where TTSService was not pushing TextFrames downstream.
- Fixed issues with Ctrl-C program termination.
- Fixed an issue that was causing `StopTaskFrame` to actually not exit the
`PipelineTask`.
## [0.0.16] - 2024-05-16
### Fixed
- `DailyTransport`: don't publish camera and audio tracks if not enabled.
- Fixed an issue in `BaseInputTransport` that was causing frames pushed
downstream to not be pushed in the right order.
## [0.0.15] - 2024-05-15
### Fixed
- Quick hot fix for receiving `DailyTransportMessage`.
## [0.0.14] - 2024-05-15
### Added
- Added `DailyTransport` event `on_participant_left`.
- Added support for receiving `DailyTransportMessage`.
### Fixed
- Images are now resized to the size of the output camera. Without resizing,
images were not being displayed.
- Fixed an issue in `DailyTransport` that would not allow the input processor to
shutdown if no participant ever joined the room.
- Fixed base transports start and stop. In some situations processors would halt
or not shutdown properly.
## [0.0.13] - 2024-05-14
### Changed
- `MoondreamService` argument `model_id` is now `model`.
- `VADAnalyzer` arguments have been renamed for more clarity.
### Fixed
- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that
could cause some threads to not start properly.
- Fixed `STTService`. Add `max_silence_secs` and `max_buffer_secs` to handle
better what's being passed to the STT service. Also add exponential smoothing
to the RMS.
- Fixed `WhisperSTTService`. Add `no_speech_prob` to avoid garbage output text.
## [0.0.12] - 2024-05-14
### Added
- Added `DailyTranscriptionSettings` to be able to specify transcription
settings much easier (e.g. language).
### Other
- Updated `simple-chatbot` with Spanish.
- Add missing dependencies in some of the examples.
## [0.0.11] - 2024-05-13
### Added
- Allow stopping pipeline tasks with new `StopTaskFrame`.
### Changed
- TTS, STT and image generation service now use `AsyncGenerator`.
### Fixed
- `DailyTransport`: allow registering for participant transcriptions even if
input transport is not initialized yet.
### Other
- Updated `storytelling-chatbot`.
## [0.0.10] - 2024-05-13
### Added
- Added Intel GPU support to `MoondreamService`.
- Added support for sending transport messages (e.g. to communicate with an app
at the other end of the transport).
- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.
### Fixed
- Fixed Azure services (TTS and image generation).
### Other
- Updated `simple-chatbot`, `moondream-chatbot` and `translation-chatbot`
examples.
## [0.0.9] - 2024-05-12
### Changed
Many things have changed in this version. Many of the main ideas such as frames,
processors, services and transports are still there but some things have changed
a bit.
- `Frame`s describe the basic units for processing. For example, text, image or
audio frames. Or control frames to indicate a user has started or stopped
speaking.
- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an
`ImageRawFrame`) and push new frames downstream or upstream to their linked
peers.
- `FrameProcessor`s can be linked together. The easiest way is to use the
`Pipeline` which is a container for processors. Linking processors allows
frames to travel upstream or downstream easily.
- `Transport`s are a way to send or receive frames. There can be local
transports (e.g. local audio or native apps), network transports
(e.g. websocket) or service transports (e.g. https://daily.co).
- `Pipeline`s are just a processor container for other processors.
- A `PipelineTask` knows how to run a pipeline.
- A `PipelineRunner` can run one or more tasks and it is also used, for example,
to capture Ctrl-C from the user.
## [0.0.8] - 2024-04-11
### Added
- Added `FireworksLLMService`.
- Added `InterimTranscriptionFrame` and enable interim results in
`DailyTransport` transcriptions.
### Changed
- `FalImageGenService` now uses new `fal_client` package.
### Fixed
- `FalImageGenService`: use `asyncio.to_thread` to not block main loop when
generating images.
- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed
and received after `UserStoppedSpeakingFrame`).
## [0.0.7] - 2024-04-10
### Added
- Add `use_cpu` argument to `MoondreamService`.
## [0.0.6] - 2024-04-10
### Added
- Added `FalImageGenService.InputParams`.
- Added `URLImageFrame` and `UserImageFrame`.
- Added `UserImageRequestFrame` and allow requesting an image from a participant.
- Added base `VisionService` and `MoondreamService`.
### Changed
- Don't pass `image_size` to `ImageGenService`, images should have their own size.
- `ImageFrame` now receives a tuple `(width, height)` to specify the size.
- `on_first_other_participant_joined` now gets a participant argument.
### Fixed
- Check if camera, speaker and microphone are enabled before writing to them.
### Performance
- `DailyTransport` only subscribes to the desired participant video track.
## [0.0.5] - 2024-04-06
### Changed
- Use `camera_bitrate` and `camera_framerate`.
- Increase `camera_framerate` to 30 by default.
### Fixed
- Fixed `LocalTransport.read_audio_frames`.
## [0.0.4] - 2024-04-04
### Added
- Added project optional dependencies `[silero,openai,...]`.
### Changed
- Moved transports to their own directory.
- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.
### Fixed
- Don't write to microphone/speaker if not enabled.
### Other
- Added live translation example.
- Fix foundational examples.
## [0.0.3] - 2024-03-13
### Other
- Added `storybot` and `chatbot` examples.
## [0.0.2] - 2024-03-12
Initial public release.

62
CHANGELOG.md.template Normal file

@@ -0,0 +1,62 @@
# Changelog
All notable changes to the **<project name>** SDK will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Please make sure to add your changes to the appropriate categories:
## [Unreleased]
### Added
<!-- for new functionality -->
- n/a
### Changed
<!-- for changed functionality -->
- n/a
### Deprecated
<!-- for soon-to-be removed functionality -->
- n/a
### Removed
<!-- for removed functionality -->
- n/a
### Fixed
<!-- for fixed bugs -->
- n/a
### Performance
<!-- for performance-relevant changes -->
- n/a
### Security
<!-- for security-relevant changes -->
- n/a
### Other
<!-- for everything else -->
- n/a
## [0.1.0] - YYYY-MM-DD
Initial release.

40
Dockerfile Normal file

@@ -0,0 +1,40 @@
# setup
FROM python:3.11.5
WORKDIR /app
COPY requirements.txt /app
COPY *.py /app
COPY pyproject.toml /app
COPY src/ /app/src/
COPY examples/ /app/examples/
WORKDIR /app
RUN ls --recursive /app/
RUN pip3 install --upgrade -r requirements.txt
RUN python -m build .
RUN pip3 install .
RUN pip3 install gunicorn
# If running on Ubuntu, Azure TTS requires some extra config
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
RUN wget -O - https://www.openssl.org/source/openssl-1.1.1w.tar.gz | tar zxf -
WORKDIR openssl-1.1.1w
RUN ./config --prefix=/usr/local
RUN make -j $(nproc)
RUN make install_sw install_ssldirs
RUN ldconfig -v
ENV SSL_CERT_DIR=/etc/ssl/certs
#ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
RUN apt clean
RUN apt-get update
RUN apt-get -y install build-essential libssl-dev ca-certificates libasound2 wget
ENV PYTHONUNBUFFERED=1
WORKDIR /app
EXPOSE 8000
# run
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--chdir", "examples/server", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]

24
LICENSE Normal file

@@ -0,0 +1,24 @@
BSD 2-Clause License
Copyright (c) 2024, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

214
README.md

@@ -1,55 +1,221 @@
# dailyai SDK
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div>
This SDK can help you build applications that participate in WebRTC meetings and use various AI services to interact with other participants.
# Pipecat
## Build/Install
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)
`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, [story-telling toys for kids](https://storytelling-chatbot.fly.dev/), customer support bots, [intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0), and snarky social companions.
Take a look at some example apps:
<p float="left">
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="280" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="280" /></a>
</p>
## Getting started with voice agents
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you're ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
```shell
# install the module
pip install pipecat-ai
# set up an .env file with API keys
cp dot-env.template .env
```
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:
```shell
pip install "pipecat-ai[option,...]"
```
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- **AI services**: `anthropic`, `azure`, `deepgram`, `gladia`, `google`, `fal`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`, `xtts`
- **Transports**: `local`, `websocket`, `daily`
## Code examples
- [foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [example apps](https://github.com/pipecat-ai/pipecat/tree/main/examples/) — complete applications that you can use as starting points for development
## A simple voice agent running locally
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use [Daily](https://daily.co) for real-time media transport, and [ElevenLabs](https://elevenlabs.io/) for text-to-speech.
```python
#app.py
import asyncio
import aiohttp
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
async def main():
    async with aiohttp.ClientSession() as session:
        # Use Daily as a real-time media transport (WebRTC)
        transport = DailyTransport(
            room_url=...,
            token=...,
            bot_name="Bot Name",
            params=DailyParams(audio_out_enabled=True))
        # Use Eleven Labs for Text-to-Speech
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=...,
            voice_id=...,
        )
        # Simple pipeline that will process text to speech and output the result
        pipeline = Pipeline([tts, transport.output()])
        # Create Pipecat processor that can run one or more pipeline tasks
        runner = PipelineRunner()
        # Assign the task callable to run the pipeline
        task = PipelineTask(pipeline)
        # Register an event handler to play audio when a
        # participant joins the transport WebRTC session
        @transport.event_handler("on_participant_joined")
        async def on_new_participant_joined(transport, participant):
            participant_name = participant["info"]["userName"] or ''
            # Queue a TextFrame that will get spoken by the TTS service (Eleven Labs)
            await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])
        # Run the pipeline task
        await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())
```
Run it with:
```shell
python app.py
```
Daily provides a prebuilt WebRTC user interface. Whilst the app is running, you can visit `https://<yourdomain>.daily.co/<room_url>` and listen to the bot say hello!
## WebRTC for production use
WebSockets are fine for server-to-server communication or for initial development. But for production use, you'll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/#webrtc).)
One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard.
## What is VAD?
Voice Activity Detection &mdash; very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk, and want Pipecat to detect when the user has finished talking, VAD is an essential component for a natural feeling conversation.
Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
```shell
pip install pipecat-ai[silero]
```
The first time you run your bot with Silero, startup may take a while whilst it downloads and caches the model in the background. You can check the progress of this in the console.
## Hacking on the framework itself
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
```
python3 -m venv env
source env/bin/activate
```shell
python3 -m venv venv
source venv/bin/activate
```
From the root of this repo, run the following:
```
pip install -r requirements.txt
```shell
pip install -r dev-requirements.txt -r {env}-requirements.txt
python -m build
```
This builds the package. To use the package locally (e.g. to run sample files), run:
```
```shell
pip install --editable .
```
If you want to use this package from another directory, you can run:
```
```shell
pip install path_to_this_repo
```
## Running the samples
### Running tests
You can run the simple sample like so:
From the root directory, run:
```
python src/samples/theoretical-to-real/01-say-one-thing.py -u <url of your Daily meeting> -k <your Daily API Key>
```shell
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests
```
Note that the sample uses Azure's TTS and LLM services. You'll need to set the following environment variables for the sample to work:
## Setting up your editor
```
AZURE_SPEECH_SERVICE_KEY
AZURE_SPEECH_SERVICE_REGION
AZURE_CHATGPT_KEY
AZURE_CHATGPT_ENDPOINT
AZURE_CHATGPT_DEPLOYMENT_ID
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting.
### Emacs
You can use [use-package](https://github.com/jwiegley/use-package) to install [py-autopep8](https://codeberg.org/ideasman42/emacs-py-autopep8) package and configure `autopep8` arguments:
```elisp
(use-package py-autopep8
  :ensure t
  :defer t
  :hook ((python-mode . py-autopep8-mode))
  :config
  (setq py-autopep8-options '("-a" "-a" "--max-line-length=100")))
```
If you have those environment variables stored in an .env file, you can quickly load them into your terminal's environment by running this:
`autopep8` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
```elisp
(use-package pyvenv-auto
  :ensure t
  :defer t
  :hook ((python-mode . pyvenv-auto-run)))
```bash
export $(grep -v '^#' .env | xargs)
```
### Visual Studio Code
Install the
[autopep8](https://marketplace.visualstudio.com/items?itemName=ms-python.autopep8) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, enable formatting on save and configure `autopep8` arguments:
```json
"[python]": {
    "editor.defaultFormatter": "ms-python.autopep8",
    "editor.formatOnSave": true
},
"autopep8.args": [
    "-a",
    "-a",
    "--max-line-length=100"
],
```
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Reach us on X](https://x.com/pipecat_ai)

8
dev-requirements.txt Normal file

@@ -0,0 +1,8 @@
autopep8~=2.1.0
build~=1.2.1
grpcio-tools~=1.62.2
pip-tools~=7.4.1
pyright~=1.1.367
pytest~=8.2.0
setuptools~=69.5.1
setuptools_scm~=8.1.0

10
docs/README.md Normal file

@@ -0,0 +1,10 @@
# Pipecat Docs
## [Architecture Overview](architecture.md)
Learn about the thinking behind the framework's design.
## [A Frame's Progress](frame-progress.md)
See how a Frame is processed through a Transport, a Pipeline, and a series of Frame Processors.

17
docs/architecture.md Normal file

@@ -0,0 +1,17 @@
# Pipecat architecture guide
## Frames
Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used for control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion.
## FrameProcessors
Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input for an AI Service, and emit chat completions based on message arrays or transform text into audio or images.
## Pipelines
Pipelines are lists of frame processors linked together. Frame processors can push frames upstream or downstream to their peers. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport as an output.
## Transports
Transports provide input and output frame processors to receive or send frames respectively. For example, the `DailyTransport` does this with a WebRTC session joined to a Daily.co room.
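
The relationship between frames, frame processors and pipelines described above can be illustrated with a small standalone sketch. This is not Pipecat's API: the `TextFrame` and `EndFrame` classes here are toy stand-ins, `UppercaseProcessor`, `DropEmptyTextProcessor` and `run_pipeline` are hypothetical names, and the sketch processes frames in a batch rather than streaming them asynchronously as a real pipeline would.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TextFrame:
    """Toy stand-in for a frame carrying a chunk of text."""
    text: str


@dataclass
class EndFrame:
    """Toy stand-in for a control frame: no more data is available."""


class UppercaseProcessor:
    """Transforms text frames; passes every other frame through unchanged."""

    def process_frame(self, frame) -> Iterable[object]:
        if isinstance(frame, TextFrame):
            yield TextFrame(frame.text.upper())
        else:
            yield frame


class DropEmptyTextProcessor:
    """Consumes blank text frames, producing zero frames for them."""

    def process_frame(self, frame) -> Iterable[object]:
        if isinstance(frame, TextFrame) and not frame.text.strip():
            return
        yield frame


def run_pipeline(processors, frames) -> List[object]:
    """Feed every frame through the processors in order and collect the output."""
    current = list(frames)
    for processor in processors:
        next_frames = []
        for frame in current:
            next_frames.extend(processor.process_frame(frame))
        current = next_frames
    return current


output = run_pipeline(
    [DropEmptyTextProcessor(), UppercaseProcessor()],
    [TextFrame("hello"), TextFrame("  "), EndFrame()],
)
print(output)
```

Each processor consumes one frame and produces zero or more frames, and frames a processor does not handle are passed along untouched, which is why the `EndFrame` reaches the end of the list of processors.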

46
docs/frame-progress.md Normal file

@@ -0,0 +1,46 @@
# A Frame's Progress
1. A user says “Hello, LLM” and the cloud transcription service delivers a transcription to the Transport.
![A transcript frame arrives](images/frame-progress-01.png)
2. The Transport places a Transcription frame in the Pipeline's source queue.
![Frame in source queue](images/frame-progress-02.png)
3. The Pipeline passes the Transcription frame to the first Frame Processor in its list, the LLM User Message Aggregator.
![To UMA](images/frame-progress-03.png)
4. The LLM User Message Aggregator updates the LLM Context with a `{"user": "Hello LLM"}` message.
![Update context](images/frame-progress-04.png)
5. The LLM User Message Aggregator yields an LLM Message Frame, containing the updated LLM Context. The Pipeline passes this frame to the LLM Frame Processor.
![Update context](images/frame-progress-05.png)
6. The LLM Frame Processor creates a streaming chat completion based on the LLM context and yields the first chunk of a response, a Text Frame with the value “Hi, ”. The Pipeline passes this frame to the TTS Frame Processor. The TTS Frame Processor aggregates this response but doesn't yield anything yet, because it's waiting for a full sentence.
![LLM yields Text](images/frame-progress-06.png)
7. The LLM Frame Processor yields another Text Frame with the value “there.”. The Pipeline passes this frame to the TTS Frame Processor.
![LLM yields more Text](images/frame-progress-07.png)
8. The TTS Frame Processor now has a full sentence, so it starts streaming audio based on “Hi, there.” It yields the first chunk of streaming audio as an Audio frame, which the Pipeline passes to the LLM Assistant Message Aggregator.
![TTS yields Audio](images/frame-progress-08.png)
9. The LLM Assistant Message Aggregator doesn't do anything with Audio frames, so it immediately yields the frame, unchanged. This is the convention for all Frame Processors: frames that the processor doesn't process should be immediately yielded.
![pass-through](images/frame-progress-09.png)
10. The Pipeline places the first Audio frame in its sink queue, which is being watched by the Transport. Since the frame is now in a queue, the Pipeline can continue processing other frames. Note that the source and sink queues form a sort of “boundary of concurrent processing” between a Pipeline and the outside world. In a Pipeline, Frames are processed sequentially; once a Frame is on a queue it can be processed in parallel with the frames being processed by the Pipeline. TODO: link to a more in-depth section about this.
![sink queue](images/frame-progress-10.png)
11. The TTS Frame Processor yields another Audio frame as the Transport transmits the first Audio frame.
![parallel audio](images/frame-progress-11.png)
12. As before, the LLM Assistant Message Aggregator immediately yields the Audio frame and the Pipeline places the Audio frame in the sink queue.
![sink queue 2](images/frame-progress-12.png)
13. The TTS Frame Processor has no more frames to yield. The LLM Frame Processor emits an LLM Response End Frame, which the Pipeline passes to the TTS Frame Processor.
![response end](images/frame-progress-13.png)
14. The TTS Frame Processor immediately yields the LLM Response End Frame, so the Pipeline passes it along to the LLM Assistant Message Aggregator. The LLM Assistant Message Aggregator updates the LLM Context with the full response from the LLM. TODO: I realized I forgot that the TTS Frame Processor also yields the Text frames that the LLM emitted so that the LLM Assistant Message Aggregator could accumulate them, arrggh.
![response end](images/frame-progress-14.png)
15. The system is quiet, and waiting for the next message from the Transport.
![response end](images/frame-progress-15.png)
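
The sentence aggregation in steps 6 to 8 can be sketched in a few lines of standalone Python. This is only an illustration of the idea, not the TTS Frame Processor's actual code: `SentenceAggregator`, its `add` method and the set of sentence endings are assumptions made for the example.

```python
from typing import Optional

# Assumed sentence terminators; a real implementation may use different rules.
SENTENCE_ENDINGS = (".", "!", "?")


class SentenceAggregator:
    """Buffers streamed text chunks until they form a full sentence."""

    def __init__(self) -> None:
        self._buffer = ""

    def add(self, chunk: str) -> Optional[str]:
        """Append a chunk; return the sentence once complete, else None."""
        self._buffer += chunk
        if self._buffer.rstrip().endswith(SENTENCE_ENDINGS):
            sentence, self._buffer = self._buffer.strip(), ""
            return sentence
        return None


aggregator = SentenceAggregator()
results = [aggregator.add(chunk) for chunk in ["Hi, ", "there."]]
print(results)  # [None, 'Hi, there.']
```

The first chunk yields nothing because no sentence is complete yet; the second chunk completes the sentence, which is the point at which text-to-speech could start.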

41
dot-env.template Normal file

@@ -0,0 +1,41 @@
# Anthropic
ANTHROPIC_API_KEY=...
# Azure
AZURE_SPEECH_REGION=...
AZURE_SPEECH_API_KEY=...
AZURE_CHATGPT_API_KEY=...
AZURE_CHATGPT_ENDPOINT=https://...
AZURE_CHATGPT_MODEL=...
AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
# ElevenLabs
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
# Fal
FAL_KEY=...
# Fireworks
FIREWORKS_API_KEY=...
# Gladia
GLADIA_API_KEY=...
# PlayHT
PLAY_HT_USER_ID=...
PLAY_HT_API_KEY=...
# OpenAI
OPENAI_API_KEY=...
#OpenPipe
OPENPIPE_API_KEY=...

86
examples/README.md Normal file

@@ -0,0 +1,86 @@
# Pipecat &mdash; Examples
## Foundational snippets
Small snippets that build on each other, introducing one or two concepts at a time.
➡️ [Take a look](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational)
## Chatbot examples
Collection of self-contained real-time voice and video AI demo applications built with Pipecat.
### Quickstart
Each project has its own set of dependencies and configuration variables. They intentionally avoid shared code across projects &mdash; you can grab whichever demo folder you want to work with as a starting point.
We recommend you start with a virtual environment:
```shell
cd pipecat-ai/examples/simple-chatbot
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Next, follow the steps in the README for each demo.
Make sure you `pip install -r requirements.txt` for each demo project, so you can be sure to have the necessary service dependencies that extend the functionality of Pipecat. You can read more about the framework architecture [here](https://github.com/pipecat-ai/pipecat/tree/main/docs).
## Projects:
| Project | Description | Services |
|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, OpenAI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
> It provides a quick way to join a real-time session with your bot and test your ideas without building any frontend code. If you'd like to see an example of a custom UI, try Storybot.
## FAQ
### Deployment
For each of these demos we've included a `Dockerfile`. Out of the box, this should provide everything needed to get the respective demo running on a VM:
```shell
docker build -t username/app:tag .
docker run -p 7860:7860 --env-file ./.env username/app:tag
docker push ...
```
### SSL
If you're working with a custom UI (such as with the Storytelling Chatbot), it's important to ensure your deployment platform supports HTTPS, as accessing user devices such as mics and webcams requires SSL.
If you try to run a custom UI without SSL, you may see an error in the console telling you that `navigator` is undefined, or no devices are available.
### Are these examples production ready?
Yes, kind of.
These demos attempt to keep things simple and are unopinionated regarding environment or scalability.
We're using FastAPI to spawn a subprocess for the bots / agents &mdash; useful for small tests, but not so great for production grade apps with many concurrent users. You can see how this works in each project's `start` endpoint in `server.py`.
Creating virtualized worker pools and on-demand instances is out of scope for these examples, but we hope to add some examples to this repo soon!
For projects that have CUDA as a requirement, such as Moondream Chatbot, be sure to deploy to a GPU-powered platform (such as [fly.io](https://fly.io) or [Runpod](https://runpod.io).)
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Reach us on Twitter](https://x.com/pipecat_ai)


@@ -0,0 +1,16 @@
FROM python:3.11-bullseye
# Open port 7860 for http service
ENV FAST_API_PORT=7860
EXPOSE 7860
# Install Python dependencies
COPY *.py .
COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
# Install models
RUN python3 install_deps.py
# Start the FastAPI server
CMD python3 bot_runner.py --port ${FAST_API_PORT}


@@ -0,0 +1,43 @@
# Fly.io deployment example
This project modifies the `bot_runner.py` server to launch a new machine for each user session. This is the recommended approach for production, as opposed to running shell processes, because a deployment that spawns processes will quickly run out of system resources under load.
To speed up machine boot times, we also download and cache Silero VAD as part of the Dockerfile (`install_deps.py`). If you are using other custom models, you can add them here too.
For this example, we are using Daily as a WebRTC transport and provisioning a new room and token for each session. You can use another transport, such as WebSockets, by modifying the `bot.py` and `bot_runner.py` files accordingly.
## Setting up your fly.io deployment
### Create your fly.toml file
You can copy the `example-fly.toml` as a reference. Be sure to change the app name to something unique.
### Create your .env file
Copy the base `env.example` to `.env` and enter the necessary API keys.
`FLY_APP_NAME` should match that in the `fly.toml` file.
### Launch a new fly.io project
`fly launch` or `fly launch --org your-org-name`
### Set the necessary app secrets from your .env
Note: you can do this manually via the fly.io dashboard under the "secrets" sub-section of your deployment (e.g. "https://fly.io/apps/fly-app-name/secrets") or run the following terminal command:
`cat .env | tr '\n' ' ' | xargs flyctl secrets set`
### Deploy your machine
`fly deploy`
## Connecting to your bot
Send a post request to your running fly.io instance:
`curl --location --request POST 'https://YOUR_FLY_APP_NAME/start_bot'`
This request will wait until the machine enters a `starting` state before returning a room URL and token to join.
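
The same request can be made from Python using only the standard library. This is a rough sketch under assumptions: `build_start_request` and `start_bot` are hypothetical helper names, and the response is assumed to be a JSON body, which this README does not specify.

```python
import json
import urllib.request


def build_start_request(app_host: str) -> urllib.request.Request:
    """Build a POST request for the /start_bot endpoint on the given host."""
    return urllib.request.Request(f"https://{app_host}/start_bot", method="POST")


def start_bot(app_host: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    request = build_start_request(app_host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network call, so it is left commented out):
# print(start_bot("YOUR_FLY_APP_NAME"))
```

A generous timeout is used because the request blocks until the machine has started.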


@@ -0,0 +1,103 @@
import asyncio
import aiohttp
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
async def main(room_url: str, token: str):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Chatbot",
            DailyParams(
                api_url=daily_api_url,
                api_key=daily_api_key,
                audio_in_enabled=True,
                audio_out_enabled=True,
                camera_out_enabled=False,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                transcription_enabled=True,
            )
        )
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY", ""),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
        )
        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o")
        messages = [
            {
                "role": "system",
                "content": "You are Chatbot, a friendly, helpful robot. Your output will be converted to audio so don't include special characters other than '!' or '?' in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying hello.",
            },
        ]
        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)
        pipeline = Pipeline([
            transport.input(),
            tma_in,
            llm,
            tts,
            transport.output(),
            tma_out,
        ])
        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            transport.capture_participant_transcription(participant["id"])
            await task.queue_frames([LLMMessagesFrame(messages)])
        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())
        @transport.event_handler("on_call_state_updated")
        async def on_call_state_updated(transport, state):
            if state == "left":
                await task.queue_frame(EndFrame())
        runner = PipelineRunner()
        await runner.run(task)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Pipecat Bot")
    parser.add_argument("-u", type=str, help="Room URL")
    parser.add_argument("-t", type=str, help="Token")
    config = parser.parse_args()
    asyncio.run(main(config.u, config.t))

@@ -0,0 +1,199 @@
import os
import argparse
import subprocess
import requests
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from dotenv import load_dotenv
load_dotenv(override=True)
# ------------ Configuration ------------ #
MAX_SESSION_TIME = 5 * 60 # 5 minutes
REQUIRED_ENV_VARS = [
'DAILY_API_KEY',
'OPENAI_API_KEY',
'ELEVENLABS_API_KEY',
'ELEVENLABS_VOICE_ID',
'FLY_API_KEY',
'FLY_APP_NAME',]
FLY_API_HOST = os.getenv("FLY_API_HOST", "https://api.machines.dev/v1")
FLY_APP_NAME = os.getenv("FLY_APP_NAME", "pipecat-fly-example")
FLY_API_KEY = os.getenv("FLY_API_KEY", "")
FLY_HEADERS = {
'Authorization': f"Bearer {FLY_API_KEY}",
'Content-Type': 'application/json'
}
daily_rest_helper = DailyRESTHelper(
os.getenv("DAILY_API_KEY", ""),
os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))
# ----------------- API ----------------- #
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"]
)
# ----------------- Main ----------------- #
def spawn_fly_machine(room_url: str, token: str):
# Use the same image as the bot runner
res = requests.get(f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS)
if res.status_code != 200:
raise Exception(f"Unable to get machine info from Fly: {res.text}")
image = res.json()[0]['config']['image']
# Machine configuration
cmd = f"python3 bot.py -u {room_url} -t {token}"
cmd = cmd.split()
worker_props = {
"config": {
"image": image,
"auto_destroy": True,
"init": {
"cmd": cmd
},
"restart": {
"policy": "no"
},
"guest": {
"cpu_kind": "shared",
"cpus": 1,
"memory_mb": 1024
}
},
}
# Spawn a new machine instance
res = requests.post(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines",
headers=FLY_HEADERS,
json=worker_props)
if res.status_code != 200:
raise Exception(f"Problem starting a bot worker: {res.text}")
# Wait for the machine to enter the started state
vm_id = res.json()['id']
res = requests.get(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines/{vm_id}/wait?state=started",
headers=FLY_HEADERS)
if res.status_code != 200:
raise Exception(f"Bot was unable to enter started state: {res.text}")
print(f"Machine joined room: {room_url}")
@app.post("/start_bot")
async def start_bot(request: Request) -> JSONResponse:
try:
data = await request.json()
# Is this a webhook creation request?
if "test" in data:
return JSONResponse({"test": True})
except Exception as e:
pass
# Use specified room URL, or create a new one if not specified
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")
if not room_url:
params = DailyRoomParams(
properties=DailyRoomProperties()
)
try:
room: DailyRoomObject = daily_rest_helper.create_room(params=params)
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Unable to provision room {e}")
else:
# Check passed room URL exists, we should assume that it already has a sip set up
try:
room: DailyRoomObject = daily_rest_helper.get_room_from_url(room_url)
except Exception:
raise HTTPException(
status_code=500, detail=f"Room not found: {room_url}")
# Give the agent a token to join the session
token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
if not room or not token:
raise HTTPException(
status_code=500, detail=f"Failed to get token for room: {room_url}")
# Launch a new fly.io machine, or run as a shell process (not recommended)
run_as_process = os.getenv("RUN_AS_PROCESS", "").strip().lower() in ("1", "true", "yes")
if run_as_process:
try:
subprocess.Popen(
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)))
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Failed to start subprocess: {e}")
else:
try:
spawn_fly_machine(room.url, token)
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Failed to spawn VM: {e}")
# Grab a token for the user to join with
user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
return JSONResponse({
"room_url": room.url,
"token": user_token,
})
if __name__ == "__main__":
# Check environment variables
for env_var in REQUIRED_ENV_VARS:
if env_var not in os.environ:
raise Exception(f"Missing environment variable: {env_var}.")
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument("--host", type=str,
default=os.getenv("HOST", "0.0.0.0"), help="Host address")
parser.add_argument("--port", type=int,
default=os.getenv("PORT", 7860), help="Port number")
parser.add_argument("--reload", action="store_true",
default=False, help="Reload code on change")
config = parser.parse_args()
try:
import uvicorn
uvicorn.run(
"bot_runner:app",
host=config.host,
port=config.port,
reload=config.reload
)
except KeyboardInterrupt:
print("Pipecat runner shutting down...")

@@ -0,0 +1,8 @@
DAILY_API_KEY=
DAILY_SAMPLE_ROOM_URL= # Enter a Daily room URL to use a set room URL each time (useful for local testing)
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
FLY_API_KEY=
FLY_APP_NAME=
RUN_AS_PROCESS= # Spawn fly.io machine for each session or run as local process

@@ -0,0 +1,25 @@
# fly.toml app configuration file generated for pipecat-fly-example on 2024-07-01T15:04:53+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = 'pipecat-fly-example'
primary_region = 'sjc'
[build]
[env]
FLY_APP_NAME = 'pipecat-fly-example'
[http_service]
internal_port = 7860
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
processes = ['app']
[[vm]]
memory = 512
cpu_kind = 'shared'
cpus = 1

@@ -0,0 +1,4 @@
import torch
# Download (cache) the Silero VAD model
torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)

@@ -0,0 +1,6 @@
pipecat-ai[daily,openai,silero]
fastapi
uvicorn
requests
python-dotenv
loguru

@@ -0,0 +1,3 @@
**/.DS_Store
.env
.env.*

examples/dialin-chatbot/.gitignore

@@ -0,0 +1,165 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml
# custom script to recursively upgrade items in requirements.py
upgrade_requirements.py
.DS_Store

@@ -0,0 +1,40 @@
FROM python:3.11-bullseye
ARG DEBIAN_FRONTEND=noninteractive
ARG USE_PERSISTENT_DATA
ENV PYTHONUNBUFFERED=1
# Expose FastAPI port
ENV FAST_API_PORT=7860
EXPOSE 7860
# Install system dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
build-essential \
git \
ffmpeg \
google-perftools \
ca-certificates curl gnupg \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# Set up a new user named "user" with user ID 1000
RUN useradd -m -u 1000 user
# Set home to the user's home directory
ENV HOME=/home/user \
PATH=/home/user/.local/bin:$PATH \
PYTHONPATH=$HOME/app \
PYTHONUNBUFFERED=1
# Switch to the "user" user
USER user
# Set the working directory to the user's home directory
WORKDIR $HOME/app
# Install Python dependencies
COPY *.py .
COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
# Start the FastAPI server
CMD python3 bot_runner.py --host "0.0.0.0" --port ${FAST_API_PORT}

@@ -0,0 +1,85 @@
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="image.png">
</div>
# Dialin example
Example project that demonstrates how to add phone number dialin to your Pipecat bots. We include examples for both Daily (`bot_daily.py`) and Twilio (`bot_twilio.py`), depending on who you want to use as a phone vendor.
- 🔁 Transport: Daily WebRTC
- 💬 Speech-to-Text: Deepgram via Daily transport
- 🤖 LLM: GPT-4o / OpenAI
- 🔉 Text-to-Speech: ElevenLabs
#### Should I use Daily or Twilio as a vendor?
If you're starting from scratch, using Daily to provision phone numbers alongside Daily as a transport offers some convenience, such as automatic call forwarding.
If you already have Twilio numbers and workflows that you want to connect to your Pipecat bots, some additional configuration is required: you'll need to register an `on_dialin_ready` event handler and use the Twilio client to trigger the forward.
You can read more about this, and find a walkthrough for each vendor, in our docs.
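For the Twilio path, the forward amounts to updating the live call with TwiML that dials the Daily room's SIP URI. The sketch below only builds that TwiML string. It mirrors the markup used in `bot_twilio.py`, while the helper name and the XML escaping are additions made here for illustration.

```python
from xml.sax.saxutils import escape


def build_forward_twiml(sip_uri: str) -> str:
    """Return TwiML that dials the given SIP URI, escaping XML-special characters."""
    return f"<Response><Dial><Sip>{escape(sip_uri)}</Sip></Dial></Response>"
```

The returned string is what would be passed as the `twiml` argument when updating the call through the Twilio client inside the `on_dialin_ready` handler.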
## Setup
```shell
# Install the requirements
pip install -r requirements.txt
# Set up your env
mv env.example .env
```
## Using Daily numbers
Run `bot_runner.py` to handle incoming HTTP requests:
`python bot_runner.py --host localhost`
Then target the following URL:
`POST /daily_start_bot`
For more configuration options, please consult Daily's API documentation.
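As a rough illustration of what this endpoint exchanges: the handler in `bot_runner.py` reads `callId` and `callDomain` from the JSON body and answers with `room_url` and `sipUri`. The helper names below are hypothetical, and the sketch covers only encoding and decoding, not the HTTP call itself.

```python
import json


def build_daily_dialin_payload(call_id: str, call_domain: str) -> bytes:
    """Encode the JSON body that /daily_start_bot expects."""
    return json.dumps({"callId": call_id, "callDomain": call_domain}).encode("utf-8")


def parse_daily_start_response(body: bytes) -> tuple[str, str]:
    """Return (room_url, sip_uri) from the endpoint's JSON response."""
    data = json.loads(body)
    return data["room_url"], data["sipUri"]
```

Posting a body that contains a `test` key instead short-circuits the handler, which is how webhook verification requests are passed through.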
## Using Twilio numbers
As above, but target the following URL:
`POST /twilio_start_bot`
For more configuration options, please consult Twilio's API documentation.
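Unlike the Daily endpoint, this one reads form data rather than JSON and requires a `CallSid` field. To exercise it locally without a real call, a form-encoded body can be built as in this sketch. The helper name is hypothetical, and the body should be sent with the `application/x-www-form-urlencoded` content type.

```python
from urllib.parse import parse_qs, urlencode


def build_twilio_webhook_body(call_sid: str) -> bytes:
    """Form-encode the minimal field that /twilio_start_bot requires."""
    return urlencode({"CallSid": call_sid}).encode("ascii")
```

The endpoint replies with TwiML that plays hold music until the bot forwards the call.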
## Deployment example
A Dockerfile is included in this demo for convenience. Here is an example of how to build and deploy your bot to [fly.io](https://fly.io).
*Please note: This demo spawns agents as subprocesses for convenience / demonstration purposes. You would likely not want to do this in production as it would limit concurrency to available system resources. For more information on how to deploy your bots using VMs, refer to the Pipecat documentation.*
### Build the docker image
`docker build -t tag:project .`
### Launch the fly project
`mv fly.example.toml fly.toml`
`fly launch` (using the included fly.toml)
### Set up your secrets on Fly
Set the necessary secrets (found in `env.example`):
`fly secrets set DAILY_API_KEY=... OPENAI_API_KEY=... ELEVENLABS_API_KEY=... ELEVENLABS_VOICE_ID=...`
If you're using Twilio as a number vendor:
`fly secrets set TWILIO_ACCOUNT_SID=... TWILIO_AUTH_TOKEN=...`
### Deploy!
`fly deploy`
## Need to do something more advanced?
This demo covers the basics of bot telephony. If you want to know more about working with PSTN / SIP, please ping us on [Discord](https://discord.gg/pipecat).

@@ -0,0 +1,111 @@
import asyncio
import aiohttp
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
from pipecat.frames.frames import (
LLMMessagesFrame,
EndFrame
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyDialinSettings
from pipecat.vad.silero import SileroVADAnalyzer
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
async def main(room_url: str, token: str, callId: str, callDomain: str):
async with aiohttp.ClientSession() as session:
# diallin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
diallin_settings = DailyDialinSettings(
call_id=callId,
call_domain=callDomain
)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
dialin_settings=diallin_settings,
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Oh, hello! Who dares dial me at this hour?!'.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(),
tma_in,
llm,
tts,
transport.output(),
tma_out,
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-d", type=str, help="Call Domain")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.d))

@@ -0,0 +1,220 @@
"""
bot_runner.py
HTTP service that listens for incoming calls from either Daily or Twilio,
provisioning a room and starting a Pipecat bot in response.
Refer to README for more information.
"""
import os
import argparse
import subprocess
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomSipParams, DailyRoomParams
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, PlainTextResponse
from twilio.twiml.voice_response import VoiceResponse
from dotenv import load_dotenv
load_dotenv(override=True)
# ------------ Configuration ------------ #
MAX_SESSION_TIME = 5 * 60 # 5 minutes
REQUIRED_ENV_VARS = ['OPENAI_API_KEY', 'DAILY_API_KEY',
'ELEVENLABS_API_KEY', 'ELEVENLABS_VOICE_ID']
daily_rest_helper = DailyRESTHelper(
os.getenv("DAILY_API_KEY", ""),
os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))
# ----------------- API ----------------- #
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"]
)
"""
Create Daily room, tell the bot if the room is created for Twilio's SIP or Daily's SIP (vendor).
When the vendor is Daily, the bot handles the call forwarding automatically,
i.e, forwards the call from the "hold music state" to the Daily Room's SIP URI.
Alternatively, when the vendor is Twilio (not Daily), the bot is responsible for
updating the state on Twilio. So when `dialin-ready` fires, it takes appropriate
action using the Twilio Client library.
"""
def _create_daily_room(room_url, callId, callDomain=None, vendor="daily"):
if not room_url:
params = DailyRoomParams(
properties=DailyRoomProperties(
# Note: these are the default values, except for the display name
sip=DailyRoomSipParams(
display_name="dialin-user",
video=False,
sip_mode="dial-in",
num_endpoints=1
)
)
)
print(f"Creating new room...")
room: DailyRoomObject = daily_rest_helper.create_room(params=params)
else:
# Check that the passed room URL exists (we assume it already has SIP set up!)
try:
print(f"Joining existing room: {room_url}")
room: DailyRoomObject = daily_rest_helper.get_room_from_url(
room_url)
except Exception:
raise HTTPException(
status_code=500, detail=f"Room not found: {room_url}")
print(f"Daily room: {room.url} {room.config.sip_endpoint}")
# Give the agent a token to join the session
token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
if not room or not token:
raise HTTPException(
status_code=500, detail="Failed to get room or token")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in docs)
if vendor == "daily":
bot_proc = f"python3 -m bot_daily -u {room.url} -t {token} -i {callId} -d {callDomain}"
else:
bot_proc = f"python3 -m bot_twilio -u {room.url} -t {token} -i {callId} -s {room.config.sip_endpoint}"
try:
subprocess.Popen(
[bot_proc],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__))
)
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Failed to start subprocess: {e}")
return room
@app.post("/twilio_start_bot", response_class=PlainTextResponse)
async def twilio_start_bot(request: Request):
print("POST /twilio_start_bot")
# twilio_start_bot is invoked directly by Twilio (as a web hook).
# On Twilio, under Active Numbers, pick the phone number
# Click Configure and under Voice Configuration,
# "a call comes in" choose webhook and point the URL to
# where this code is hosted.
data = {}
try:
# Twilio sends form data, not JSON
form_data = await request.form()
data = dict(form_data)
except Exception:
pass
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
callId = data.get('CallSid')
if not callId:
raise HTTPException(
status_code=500, detail="Missing 'CallSid' in request")
print("CallId: %s" % callId)
# create room and tell the bot to join the created room
# note: Twilio does not require a callDomain
room: DailyRoomObject = _create_daily_room(
room_url, callId, None, "twilio")
print(f"Put Twilio on hold...")
# We have the room and the SIP URI,
# but we do not know if the Daily SIP Worker and the Bot have joined the call
# put the call on hold until the 'on_dialin_ready' fires.
# Then, the bot will update the called sid with the sip uri.
# http://com.twilio.music.classical.s3.amazonaws.com/BusyStrings.mp3
resp = VoiceResponse()
resp.play(
url="http://com.twilio.sounds.music.s3.amazonaws.com/MARKOVICHAMP-Borghestral.mp3", loop=10)
return str(resp)
@app.post("/daily_start_bot")
async def daily_start_bot(request: Request) -> JSONResponse:
# The /daily_start_bot is invoked when a call is received on Daily's SIP URI
# daily_start_bot will create the room, put the call on hold until
# the bot and sip worker are ready. Daily will automatically
# forward the call to the SIP URI when dialin_ready fires.
# Use specified room URL, or create a new one if not specified
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
# Get the dial-in properties from the request
try:
data = await request.json()
if "test" in data:
# Pass through any webhook checks
return JSONResponse({"test": True})
callId = data.get("callId", None)
callDomain = data.get("callDomain", None)
except Exception:
raise HTTPException(
status_code=500,
detail="Missing properties 'callId' or 'callDomain'")
print(f"CallId: {callId}, CallDomain: {callDomain}")
room: DailyRoomObject = _create_daily_room(
room_url, callId, callDomain, "daily")
# Return the room URL and SIP URI to the caller
return JSONResponse({
"room_url": room.url,
"sipUri": room.config.sip_endpoint
})
# ----------------- Main ----------------- #
if __name__ == "__main__":
# Check environment variables
for env_var in REQUIRED_ENV_VARS:
if env_var not in os.environ:
raise Exception(f"Missing environment variable: {env_var}.")
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument("--host", type=str,
default=os.getenv("HOST", "0.0.0.0"), help="Host address")
parser.add_argument("--port", type=int,
default=os.getenv("PORT", 7860), help="Port number")
parser.add_argument("--reload", action="store_true",
default=False, help="Reload code on change")
config = parser.parse_args()
try:
import uvicorn
uvicorn.run(
"bot_runner:app",
host=config.host,
port=config.port,
reload=config.reload
)
except KeyboardInterrupt:
print("Pipecat runner shutting down...")

@@ -0,0 +1,125 @@
import asyncio
import aiohttp
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
from pipecat.frames.frames import (
LLMMessagesFrame,
EndFrame
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from twilio.rest import Client
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
twilio_account_sid = os.getenv('TWILIO_ACCOUNT_SID')
twilio_auth_token = os.getenv('TWILIO_AUTH_TOKEN')
twilioclient = Client(twilio_account_sid, twilio_auth_token)
daily_api_key = os.getenv("DAILY_API_KEY", "")
async def main(room_url: str, token: str, callId: str, sipUri: str):
async with aiohttp.ClientSession() as session:
# diallin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_key=daily_api_key,
dialin_settings=None, # Not required for Twilio
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Hello! Who dares dial me at this hour?!'.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(),
tma_in,
llm,
tts,
transport.output(),
tma_out,
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, cdata):
# For Twilio, Telnyx, etc., you need to update the state of the call
# and forward it to the SIP URI.
print(f"Forwarding call: {callId} {sipUri}")
try:
# The TwiML is updated using Twilio's client library
call = twilioclient.calls(callId).update(
twiml=f'<Response><Dial><Sip>{sipUri}</Sip></Dial></Response>'
)
except Exception as e:
raise Exception(f"Failed to forward call: {str(e)}")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-s", type=str, help="SIP URI")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.s))

@@ -0,0 +1,8 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (optional: for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=
DAILY_API_URL=https://api.daily.co/v1
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=

@@ -0,0 +1,19 @@
# fly.toml app configuration file generated for pipecat-dialin-demo on 2024-06-03T15:57:57+02:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = 'pipecat-dialin-demo'
primary_region = 'sjc'
[build]
[http_service]
internal_port = 7860
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[vm]]
size = 'performance-1x'

Binary file not shown (image, 19 KiB).

@@ -0,0 +1,7 @@
pipecat-ai[daily,openai,silero]
fastapi
uvicorn
requests
python-dotenv
loguru
twilio

@@ -0,0 +1,56 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_participant_joined")
async def on_new_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

@@ -0,0 +1,53 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([tts, transport.output()])
task = PipelineTask(pipeline)
async def say_something():
await asyncio.sleep(1)
await task.queue_frames([TextFrame("Hello there!"), EndFrame()])
runner = PipelineRunner()
await asyncio.gather(runner.run(task), say_something())
if __name__ == "__main__":
asyncio.run(main())

@@ -0,0 +1,68 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import os
import sys

from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport

from runner import configure

from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main(room_url):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            None,
            "Say One Thing From an LLM",
            DailyParams(audio_out_enabled=True))

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
            }]

        runner = PipelineRunner()

        task = PipelineTask(Pipeline([llm, tts, transport.output()]))

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])

        await runner.run(task)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url))


@@ -0,0 +1,68 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Show a still frame image",
DailyParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
runner = PipelineRunner()
task = PipelineTask(Pipeline([imagegen, transport.output()]))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Note that we do not put an EndFrame() item in the pipeline for this demo.
# This means that the bot will stay in the channel until it times out.
# An EndFrame() in the pipeline would cause the transport to shut
# down.
await task.queue_frames([TextFrame("a cat in the style of picasso")])
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))


@@ -0,0 +1,68 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import tkinter as tk
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Picasso Cat")
transport = TkLocalTransport(
tk_root,
TransportParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024))
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
pipeline = Pipeline([imagegen, transport.output()])
task = PipelineTask(pipeline)
await task.queue_frames([TextFrame("a cat in the style of picasso")])
runner = PipelineRunner()
async def run_tk():
while runner.is_active():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__":
asyncio.run(main())
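
The `run_tk()` coroutine in the file above keeps the Tk window responsive by calling `tk_root.update()` between short sleeps for as long as the runner is active, while `asyncio.gather()` runs it next to the pipeline. A minimal sketch of that polling pattern, without Tk or pipecat, might look like the following. The names `run_pipeline`, `pump_ui`, and the `state` dict are hypothetical stand-ins for illustration, not pipecat APIs.

```python
import asyncio


async def run_pipeline(state: dict) -> None:
    # Hypothetical stand-in for runner.run(task): do some work, then finish.
    await asyncio.sleep(0.05)
    state["active"] = False


async def pump_ui(state: dict, update, interval: float = 0.01) -> int:
    # Call the synchronous UI update function until the work is done,
    # yielding to the event loop between calls so the pipeline can progress.
    ticks = 0
    while state["active"]:
        update()
        ticks += 1
        await asyncio.sleep(interval)
    return ticks


async def demo() -> int:
    state = {"active": True}
    # Both coroutines share one event loop; neither blocks the other.
    _, ticks = await asyncio.gather(
        run_pipeline(state), pump_ui(state, lambda: None)
    )
    return ticks
```

The sleep interval trades UI responsiveness against CPU use; the example file uses 0.1 seconds.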


@@ -0,0 +1,86 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.pipeline.merge_pipeline import SequentialMergePipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.frames.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.task import PipelineTask
from pipecat.services.azure import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.transport_services import TransportServiceOutput
from pipecat.services.transports.daily_transport import DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(room_url, None, "Static And Dynamic Speech")
meeting = TransportServiceOutput(transport, mic_enabled=True)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
azure_tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
elevenlabs_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
messages = [{"role": "system",
"content": "tell the user a joke about llamas"}]
# Start a task to run the LLM to create a joke, and convert the LLM
# output to audio frames. This task will run in parallel with generating
# and speaking the audio for static text, so there's no delay to speak
# the LLM response.
llm_pipeline = Pipeline([llm, elevenlabs_tts])
llm_task = PipelineTask(llm_pipeline)
await llm_task.queue_frames([LLMMessagesFrame(messages), EndPipeFrame()])
simple_tts_pipeline = Pipeline([azure_tts])
await simple_tts_pipeline.queue_frames(
[
TextFrame("My friend the LLM is going to tell a joke about llamas."),
EndPipeFrame(),
]
)
merge_pipeline = SequentialMergePipeline(
[simple_tts_pipeline, llm_pipeline])
await asyncio.gather(
transport.run(merge_pipeline),
simple_tts_pipeline.run_pipeline(),
llm_pipeline.run_pipeline(),
)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))


@@ -0,0 +1,166 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from dataclasses import dataclass
from pipecat.frames.frames import (
AppFrame,
EndFrame,
Frame,
ImageRawFrame,
LLMFullResponseStartFrame,
LLMMessagesFrame,
TextFrame
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.parallel_task import ParallelTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.aggregators.gated import GatedAggregator
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
@dataclass
class MonthFrame(AppFrame):
    month: str

    def __str__(self):
        return f"{self.name}(month: {self.month})"


class MonthPrepender(FrameProcessor):
    def __init__(self):
        super().__init__()
        self.most_recent_month = "Placeholder, month frame not yet received"
        self.prepend_to_next_text_frame = False

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, MonthFrame):
            self.most_recent_month = frame.month
        elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):
            await self.push_frame(TextFrame(f"{self.most_recent_month}: {frame.text}"))
            self.prepend_to_next_text_frame = False
        elif isinstance(frame, LLMFullResponseStartFrame):
            self.prepend_to_next_text_frame = True
            await self.push_frame(frame)
        else:
            await self.push_frame(frame, direction)
async def main(room_url):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Month Narration Bot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
gated_aggregator = GatedAggregator(
gate_open_fn=lambda frame: isinstance(frame, ImageRawFrame),
gate_close_fn=lambda frame: isinstance(frame, LLMFullResponseStartFrame),
start_open=False
)
sentence_aggregator = SentenceAggregator()
month_prepender = MonthPrepender()
llm_full_response_aggregator = LLMFullResponseAggregator()
pipeline = Pipeline([
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
ParallelTask( # Run pipelines in parallel aggregating the result
[month_prepender, tts], # Create "Month: sentence" and output audio
[llm_full_response_aggregator, imagegen] # Aggregate full LLM response
),
gated_aggregator, # Queues everything until an image is available
transport.output() # Transport output
])
frames = []
for month in [
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
]:
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
frames.append(MonthFrame(month=month))
frames.append(LLMMessagesFrame(messages))
frames.append(EndFrame())
runner = PipelineRunner()
task = PipelineTask(pipeline)
await task.queue_frames(frames)
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
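
The pipeline in the file above uses a `GatedAggregator` that holds frames back until an image frame arrives and closes again when the next LLM response starts. The sketch below is a simplified model of that gating idea in plain Python, under the assumption that the opening frame is released first, followed by everything that was held. `SimpleGate` and its `push()` method are hypothetical names, and this is not pipecat's implementation.

```python
from typing import Callable, List


class SimpleGate:
    """Hold frames while closed; release them when an opening frame arrives."""

    def __init__(
        self,
        gate_open_fn: Callable[[object], bool],
        gate_close_fn: Callable[[object], bool],
        start_open: bool = False,
    ) -> None:
        self.gate_open_fn = gate_open_fn
        self.gate_close_fn = gate_close_fn
        self.is_open = start_open
        self.held: List[object] = []

    def push(self, frame: object) -> List[object]:
        """Return the frames released downstream as a result of this frame."""
        if self.is_open and self.gate_close_fn(frame):
            self.is_open = False
        elif not self.is_open and self.gate_open_fn(frame):
            self.is_open = True

        if self.is_open:
            # The current frame goes first, then anything held while closed.
            released = [frame] + self.held
            self.held = []
            return released

        self.held.append(frame)
        return []


# Strings stand in for frames: "image" opens the gate, "start" closes it.
gate = SimpleGate(lambda f: f == "image", lambda f: f == "start")
outputs = [gate.push(f) for f in ["start", "audio1", "image", "audio2", "start"]]
```

With this ordering the image always reaches the output before the audio generated for the same sentence, which is the effect the `# Queues everything until an image is available` comment describes.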


@@ -0,0 +1,174 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import tkinter as tk
from pipecat.frames.frames import AudioRawFrame, Frame, URLImageRawFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Calendar")
runner = PipelineRunner()
async def get_month_data(month):
messages = [{"role": "system", "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.", }]
class ImageDescription(FrameProcessor):
def __init__(self):
super().__init__()
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
class AudioGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.audio = bytearray()
# Stays None until the first audio frame arrives.
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, AudioRawFrame):
self.audio.extend(frame.audio)
self.frame = AudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels)
class ImageGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, URLImageRawFrame):
self.frame = frame
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"))
aggregator = LLMFullResponseAggregator()
description = ImageDescription()
audio_grabber = AudioGrabber()
image_grabber = ImageGrabber()
pipeline = Pipeline([
llm,
aggregator,
description,
ParallelPipeline([tts, audio_grabber],
[imagegen, image_grabber])
])
task = PipelineTask(pipeline)
await task.queue_frame(LLMMessagesFrame(messages))
await task.stop_when_done()
await runner.run(task)
return {
"month": month,
"text": description.text,
"image": image_grabber.frame,
"audio": audio_grabber.frame,
}
transport = TkLocalTransport(
tk_root,
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024))
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only enable a couple of months here (the rest are commented out)
# because all tasks are created at once and we might get rate limited
# otherwise.
months: list[str] = [
"January",
"February",
# "March",
# "April",
# "May",
]
# We create one task per month. These will be executed concurrently.
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
# Now we wait for each month task in the order they're completed. The
# benefit is we'll have as little delay as possible before the first
# month, and likely no delay between months, but the months won't
# display in order.
async def show_images(month_tasks):
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await task.queue_frames([data["image"], data["audio"]])
await runner.stop_when_done()
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
if __name__ == "__main__":
asyncio.run(main())
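
The `show_images()` coroutine in the file above relies on `asyncio.as_completed()`, which yields awaitables in the order they finish rather than the order they were created. A small self-contained sketch of that behaviour follows; `fetch` and its delays are hypothetical stand-ins for `get_month_data()`.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for get_month_data(): finishes after `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def completion_order() -> list:
    # All tasks start immediately and run concurrently.
    tasks = [
        asyncio.create_task(fetch("January", 0.3)),
        asyncio.create_task(fetch("February", 0.05)),
        asyncio.create_task(fetch("March", 0.15)),
    ]
    order = []
    # as_completed() hands back results as soon as each task finishes.
    for next_done in asyncio.as_completed(tasks):
        order.append(await next_done)
    return order
```

If display order matters more than latency, `asyncio.gather(*tasks)` returns results in the order the tasks were passed in, at the cost of waiting for all of them.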


@@ -0,0 +1,103 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
fl = FrameLogger("!!! after LLM", "red")
fltts = FrameLogger("@@@ out of tts", "green")
flend = FrameLogger("### out of the end", "magenta")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(),
tma_in,
llm,
fl,
tts,
fltts,
transport.output(),
tma_out,
flend
])
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,129 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from PIL import Image
from pipecat.frames.frames import ImageRawFrame, Frame, SystemFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class ImageSyncAggregator(FrameProcessor):
    def __init__(self, speaking_path: str, waiting_path: str):
        super().__init__()
        self._speaking_image = Image.open(speaking_path)
        self._speaking_image_format = self._speaking_image.format
        self._speaking_image_bytes = self._speaking_image.tobytes()

        self._waiting_image = Image.open(waiting_path)
        self._waiting_image_format = self._waiting_image.format
        self._waiting_image_bytes = self._waiting_image.tobytes()

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if not isinstance(frame, SystemFrame):
            await self.push_frame(ImageRawFrame(image=self._speaking_image_bytes, size=(1024, 1024), format=self._speaking_image_format))
            await self.push_frame(frame)
            await self.push_frame(ImageRawFrame(image=self._waiting_image_bytes, size=(1024, 1024), format=self._waiting_image_format))
        else:
            await self.push_frame(frame)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
pipeline = Pipeline([
transport.input(),
image_sync_aggregator,
tma_in,
llm,
tts,
transport.output(),
tma_out
])
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,98 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(
allow_interruptions=True,
enable_metrics=True,
report_only_initial_ttfb=True,
))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,95 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-opus-20240229")
# TODO: think more about how to handle system prompts in a more general way. OpenAI,
# Google, and Anthropic all have slightly different approaches to providing a system
# prompt.
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative, helpful, and brief way. Say hello.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,125 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.processors.frameworks.langchain import LangchainProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
message_store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
if session_id not in message_store:
message_store[session_id] = ChatMessageHistory()
return message_store[session_id]
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
prompt = ChatPromptTemplate.from_messages(
[
("system",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o", temperature=0.7)
history_chain = RunnableWithMessageHistory(
chain,
get_session_history,
history_messages_key="chat_history",
input_messages_key="input")
lc = LangchainProcessor(history_chain)
tma_in = LLMUserResponseAggregator()
tma_out = LLMAssistantResponseAggregator()
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
lc, # Langchain
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
lc.set_participant_id(participant["id"])
# Kick off the conversation.
# The `LLMMessagesFrame` is picked up by the LangchainProcessor, which
# uses only the content of the last message and injects it into the
# prompt defined above, so no role is required here.
messages = [(
{
"content": "Please briefly introduce yourself to the user."
}
)]
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
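
The `get_session_history()` helper in the file above creates a history object the first time a session ID is seen and returns the same object on every later call, which is what lets the chat history accumulate per participant. A minimal sketch of that lookup pattern with plain lists instead of LangChain's `ChatMessageHistory` could look like this; `get_history` and `history_store` are hypothetical names used only for illustration.

```python
from typing import Dict, List

# One history list per session ID, created lazily on first access.
history_store: Dict[str, List[str]] = {}


def get_history(session_id: str) -> List[str]:
    # setdefault() inserts an empty list only if the key is missing,
    # then returns the stored list either way.
    return history_store.setdefault(session_id, [])


get_history("participant-a").append("Hello!")
get_history("participant-a").append("How are you?")
```

Because the same list object is returned for a given ID, messages appended through one call are visible to the next, while other session IDs start with an empty history.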


@@ -0,0 +1,97 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True
)
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-helios-en"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,94 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_sample_rate=44100,
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="a0e99841-438c-4a64-b679-ae501e7d6091", # Barbershop Man
sample_rate=44100,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
tma_out, # Goes before the transport because cartesia has word-level timestamps!
transport.output(), # Transport bot output
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,93 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.playht import PlayHTTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=16000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = PlayHTTTSService(
user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/801a663f-efd0-4254-98d0-5c175514c3e8/jennifer/manifest.json",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,100 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.azure import AzureLLMService, AzureSTTService, AzureTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=16000,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
)
)
stt = AzureSTTService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,92 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.openai import OpenAITTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
voice="alloy"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,102 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openpipe import OpenPipeLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
import time
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
timestamp = int(time.time())
llm = OpenPipeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
model="gpt-4o",
tags={
"conversation_id": f"pipecat-{timestamp}"
}
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,96 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.xtts import XTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
)
)
tts = XTTSService(
aiohttp_session=session,
voice_id="Claribel Dervla",
language="en",
base_url="http://localhost:8000"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,101 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.gladia import GladiaSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.xtts import XTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
)
)
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY"),
)
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-helios-en"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

@@ -0,0 +1,148 @@
from typing import Tuple
import aiohttp
import asyncio
import logging
import os
from pipecat.pipeline.aggregators import SentenceAggregator
from pipecat.pipeline.pipeline import Pipeline
from pipecat.transports.daily_transport import DailyTransport
from pipecat.services.azure_ai_services import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
from pipecat.services.fal_ai_services import FalImageGenService
from pipecat.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("pipecat")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Respond bot",
duration_minutes=10,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
tts1 = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts2 = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
dalle = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="1024x1024"
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
bot1_messages = [
{
"role": "system",
"content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long.",
},
]
bot2_messages = [
{
"role": "system",
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich.",
},
]
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
"""This function streams text from the LLM and uses the TTS service to convert
that text to speech as it's received. """
source_queue = asyncio.Queue()
sink_queue = asyncio.Queue()
sentence_aggregator = SentenceAggregator()
pipeline = Pipeline(
[llm, sentence_aggregator, tts1], source_queue, sink_queue
)
await source_queue.put(LLMMessagesFrame(messages))
await source_queue.put(EndFrame())
await pipeline.run_pipeline()
message = ""
all_audio = bytearray()
while sink_queue.qsize():
frame = sink_queue.get_nowait()
if isinstance(frame, TextFrame):
message += frame.text
elif isinstance(frame, AudioFrame):
all_audio.extend(frame.audio)
return (message, all_audio)
async def get_bot1_statement():
message, audio = await get_text_and_audio(bot1_messages)
bot1_messages.append({"role": "assistant", "content": message})
bot2_messages.append({"role": "user", "content": message})
return audio
async def get_bot2_statement():
message, audio = await get_text_and_audio(bot2_messages)
bot2_messages.append({"role": "assistant", "content": message})
bot1_messages.append({"role": "user", "content": message})
return audio
async def argue():
for i in range(100):
print(f"In iteration {i}")
bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
(audio1, image_data1) = await asyncio.gather(
get_bot1_statement(), dalle.run_image_gen(bot1_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data1[1], image_data1[2]),
AudioFrame(audio1),
]
)
bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
(audio2, image_data2) = await asyncio.gather(
get_bot2_statement(), dalle.run_image_gen(bot2_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data2[1], image_data2[2]),
AudioFrame(audio2),
]
)
await asyncio.gather(transport.run(), argue())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
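
The `get_text_and_audio` helper above drains the sink queue after the pipeline finishes, concatenating text frames and audio frames separately. A self-contained sketch of that draining step, assuming simplified stand-in frame classes (the `TextFrame` and `AudioFrame` dataclasses below are hypothetical and only mimic the attributes used in the example):

```python
import asyncio
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TextFrame:
    """Hypothetical stand-in carrying a chunk of generated text."""
    text: str


@dataclass
class AudioFrame:
    """Hypothetical stand-in carrying a chunk of raw audio bytes."""
    audio: bytes


def drain(sink_queue: asyncio.Queue) -> Tuple[str, bytearray]:
    """Empty the queue without blocking, joining text and audio separately."""
    message = ""
    all_audio = bytearray()
    while not sink_queue.empty():
        frame = sink_queue.get_nowait()
        if isinstance(frame, TextFrame):
            message += frame.text
        elif isinstance(frame, AudioFrame):
            all_audio.extend(frame.audio)
    return message, all_audio


async def demo() -> Tuple[str, bytearray]:
    queue: asyncio.Queue = asyncio.Queue()
    frames = (
        TextFrame("Hot dogs "),
        AudioFrame(b"\x00\x01"),
        TextFrame("are sandwiches."),
        AudioFrame(b"\x02"),
    )
    for frame in frames:
        await queue.put(frame)
    return drain(queue)


text, audio = asyncio.run(demo())
```

Using `get_nowait()` is safe here only because the producer has already finished; draining a queue that is still being filled would need `await queue.get()` and an explicit end marker.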

@@ -0,0 +1,54 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.services.daily import DailyTransport, DailyParams
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
    transport = DailyTransport(
        room_url, token, "Test",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            camera_out_enabled=True,
            camera_out_is_live=True,
            camera_out_width=1280,
            camera_out_height=720
        )
    )

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        transport.capture_participant_video(participant["id"])

    pipeline = Pipeline([transport.input(), transport.output()])

    runner = PipelineRunner()

    task = PipelineTask(pipeline)

    await runner.run(task)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))

@@ -0,0 +1,66 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
import tkinter as tk
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
    tk_root = tk.Tk()
    tk_root.title("Local Mirror")

    daily_transport = DailyTransport(room_url, token, "Test", DailyParams(audio_in_enabled=True))

    tk_transport = TkLocalTransport(
        tk_root,
        TransportParams(
            audio_out_enabled=True,
            camera_out_enabled=True,
            camera_out_is_live=True,
            camera_out_width=1280,
            camera_out_height=720))

    @daily_transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        transport.capture_participant_video(participant["id"])

    pipeline = Pipeline([daily_transport.input(), tk_transport.output()])

    task = PipelineTask(pipeline)

    async def run_tk():
        while not task.has_finished():
            tk_root.update()
            tk_root.update_idletasks()
            await asyncio.sleep(0.1)

    runner = PipelineRunner()

    await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
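
The example above keeps the Tk window responsive by polling it from a coroutine that runs alongside the pipeline task until the task reports it has finished. A minimal sketch of that pattern without Tk or pipecat; `FakeTask` and `poll_ui` are hypothetical names used only for illustration:

```python
import asyncio


class FakeTask:
    """Hypothetical stand-in for a pipeline task that finishes after a delay."""

    def __init__(self) -> None:
        self._finished = False

    def has_finished(self) -> bool:
        return self._finished

    async def run(self) -> None:
        await asyncio.sleep(0.05)
        self._finished = True


async def poll_ui(task: FakeTask, interval: float = 0.01) -> int:
    """Tick until the task is done, yielding to the event loop between ticks."""
    ticks = 0
    while not task.has_finished():
        ticks += 1  # a real UI would call its update function here
        await asyncio.sleep(interval)
    return ticks


async def main() -> int:
    task = FakeTask()
    # Run the task and the polling loop concurrently on the same event loop.
    _, ticks = await asyncio.gather(task.run(), poll_ui(task))
    return ticks


ticks = asyncio.run(main())
```

The `await asyncio.sleep(...)` inside the loop is what lets the task make progress; a polling loop without an await would starve it.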

@@ -0,0 +1,94 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Robot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
},
]
hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
hey_robot_filter, # Filter out speech not directed at the robot
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await tts.say("Hi! If you want to talk to me, just say 'Hey Robot'.")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
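
The pipeline above places a wake-phrase filter before the user aggregator so that speech not addressed to the robot never reaches the LLM. A simplified sketch of such a check, under the assumption that the filter matches phrases case-insensitively and keeps the text from the phrase onward; `text_after_wake_phrase` is a hypothetical helper and does not reproduce `WakeCheckFilter` itself:

```python
import re
from typing import List, Optional


def text_after_wake_phrase(text: str, phrases: List[str]) -> Optional[str]:
    """Return the text starting at the first wake phrase found, else None."""
    for phrase in phrases:
        match = re.search(re.escape(phrase), text, re.IGNORECASE)
        if match:
            return text[match.start():]
    return None


wake_phrases = ["hey robot", "hey, robot"]
heard = text_after_wake_phrase("Well, hey robot what time is it?", wake_phrases)
ignored = text_after_wake_phrase("hello there", wake_phrases)
```

Listing both "hey robot" and "hey, robot" covers transcriptions that insert a comma after "hey".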

@@ -0,0 +1,152 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import wave
from pipecat.frames.frames import (
Frame,
AudioRawFrame,
LLMFullResponseEndFrame,
LLMMessagesFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMUserResponseAggregator,
LLMAssistantResponseAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sounds = {}
sound_files = ["ding1.wav", "ding2.wav"]

script_dir = os.path.dirname(__file__)

for file in sound_files:
    # Build the full path to the sound file
    full_path = os.path.join(script_dir, "assets", file)
    # Read the whole WAV file into a raw audio frame, keyed by its file name
    with wave.open(full_path) as audio_file:
        sounds[file] = AudioRawFrame(audio_file.readframes(-1),
                                     audio_file.getframerate(), audio_file.getnchannels())


class OutboundSoundEffectWrapper(FrameProcessor):

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMFullResponseEndFrame):
            await self.push_frame(sounds["ding1.wav"])
            # In case anything else downstream needs it
            await self.push_frame(frame, direction)
        else:
            await self.push_frame(frame, direction)


class InboundSoundEffectWrapper(FrameProcessor):

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMMessagesFrame):
            await self.push_frame(sounds["ding2.wav"])
            # In case anything else downstream needs it
            await self.push_frame(frame, direction)
        else:
            await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
pipeline = Pipeline([
transport.input(),
tma_in,
in_sound,
fl2,
llm,
fl,
tts,
out_sound,
transport.output(),
tma_out
])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await tts.say("Hi, I'm listening!")
await transport.send_audio(sounds["ding1.wav"])
runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
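
The sound-loading loop at the top of this example reads each WAV file once and keeps its raw frames, sample rate, and channel count in a dictionary keyed by file name. A minimal standalone sketch of that step, using only the standard-library `wave` module, might look like the following. The helper names `write_silence` and `load_sounds` are hypothetical, and a plain tuple stands in for `AudioRawFrame` so the snippet runs without pipecat installed.

```python
import os
import tempfile
import wave


def write_silence(path: str, sample_rate: int = 16000, channels: int = 1, n_frames: int = 160) -> None:
    """Write a short 16-bit PCM WAV file containing silence (fixture for this sketch only)."""
    with wave.open(path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        out.setframerate(sample_rate)
        out.writeframes(b"\x00\x00" * n_frames * channels)


def load_sounds(directory: str, names: list[str]) -> dict[str, tuple[bytes, int, int]]:
    """Return {file name: (raw PCM bytes, sample rate, channel count)}."""
    loaded = {}
    for name in names:
        with wave.open(os.path.join(directory, name), "rb") as audio_file:
            loaded[name] = (
                audio_file.readframes(audio_file.getnframes()),
                audio_file.getframerate(),
                audio_file.getnchannels(),
            )
    return loaded


# Create a throwaway WAV file, then load it the same way the example loads its assets.
with tempfile.TemporaryDirectory() as tmp:
    write_silence(os.path.join(tmp, "ding1.wav"))
    demo_sounds = load_sounds(tmp, ["ding1.wav"])

pcm, rate, channels = demo_sounds["ding1.wav"]
print(len(pcm), rate, channels)
```

Loading the files once at import time keeps disk reads out of `process_frame`, so pushing a sound effect only hands an already-built frame to the pipeline.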


@@ -0,0 +1,112 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.moondream import MoondreamService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
# If you get strange descriptions, try passing use_cpu=True
moondream = MoondreamService()
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
moondream,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,108 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.google import GoogleLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_in_enabled=True, # This is so Silero VAD can get audio data
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
google = GoogleLLMService(
model="gemini-1.5-flash-latest",
api_key=os.getenv("GOOGLE_API_KEY"))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
google,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,108 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
openai = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o"
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
openai,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,108 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
def set_participant_id(self, participant_id: str):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
user_response = UserResponseAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator()
anthropic = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-sonnet-20240229"
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
transport.capture_participant_video(participant["id"], framerate=0)
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
anthropic,
tts,
transport.output()
])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,57 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
transport = DailyTransport(room_url, None, "Transcription bot",
DailyParams(audio_in_enabled=True))
stt = WhisperSTTService()
tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))


@@ -0,0 +1,54 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main():
transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))
stt = WhisperSTTService()
tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,58 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
transport = DailyTransport(room_url, None, "Transcription bot",
DailyParams(audio_in_enabled=True))
stt = DeepgramSTTService(os.getenv("DEEPGRAM_API_KEY"))
tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))


@@ -0,0 +1,140 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
)
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def start_fetch_weather(llm):
await llm.push_frame(TextFrame("Let me think."))
async def fetch_weather_from_api(llm, args):
return {"conditions": "nice", "temperature": "75"}
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm.register_function(
"get_current_weather",
fetch_weather_from_api,
start_callback=start_fetch_weather)
fl_in = FrameLogger("Inner")
fl_out = FrameLogger("Outer")
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": [
"celsius",
"fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
"required": [
"location",
"format"],
},
})]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
tma_in = LLMUserContextAggregator(context)
tma_out = LLMAssistantContextAggregator(context)
pipeline = Pipeline([
fl_in,
transport.input(),
tma_in,
llm,
fl_out,
tts,
transport.output(),
tma_out
])
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await tts.say("Hi! Ask me about the weather in San Francisco.")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
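
The `tools` list in this example declares a JSON-schema-style `parameters` object with two required arguments, one of which is restricted to an `enum`. As a rough illustration of what that schema constrains, the sketch below checks an argument dictionary against the same shape before a handler such as `fetch_weather_from_api` would use it. The `check_arguments` helper is hypothetical and is not part of pipecat or the OpenAI client; it only covers `required` and `enum`, not full JSON Schema validation.

```python
# Same shape as the "parameters" object declared for get_current_weather above.
weather_parameters = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location", "format"],
}


def check_arguments(parameters: dict, args: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the args look fine."""
    problems = []
    # Every name listed under "required" must be present.
    for name in parameters.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    # Any argument whose property declares an "enum" must use one of its values.
    for name, value in args.items():
        allowed = parameters.get("properties", {}).get(name, {}).get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


ok = check_arguments(weather_parameters, {"location": "San Francisco, CA", "format": "celsius"})
bad_enum = check_arguments(weather_parameters, {"location": "San Francisco, CA", "format": "kelvin"})
missing = check_arguments(weather_parameters, {"format": "celsius"})
print(ok, bad_enum, missing)
```

A check like this is optional; it can be useful when the model occasionally omits an argument and the handler would otherwise fail with a `KeyError`.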


@@ -0,0 +1,155 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantContextAggregator,
LLMUserContextAggregator
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_voice = "News Lady"
async def switch_voice(llm, args):
global current_voice
current_voice = args["voice"]
return {"voice": f"You are now using your {current_voice} voice. Your responses should now be as if you were a {current_voice}."}
async def news_lady_filter(frame) -> bool:
return current_voice == "News Lady"
async def british_lady_filter(frame) -> bool:
return current_voice == "British Lady"
async def barbershop_man_filter(frame) -> bool:
return current_voice == "Barbershop Man"
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Pipecat",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
news_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="bf991597-6c13-47e4-8411-91ec2de5c466", # Newslady
)
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
barbershop_man = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="a0e99841-438c-4a64-b679-ae501e7d6091", # Barbershop Man
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm.register_function("switch_voice", switch_voice)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_voice",
"description": "Switch your voice only when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"voice": {
"type": "string",
"description": "The voice the user wants you to use",
},
},
"required": ["voice"],
},
})]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can do the following voices: 'News Lady', 'British Lady' and 'Barbershop Man'.",
},
]
context = OpenAILLMContext(messages, tools)
tma_in = LLMUserContextAggregator(context)
tma_out = LLMAssistantContextAggregator(context)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
ParallelPipeline( # TTS (one of the following voices)
[FunctionFilter(news_lady_filter), news_lady], # News Lady voice
[FunctionFilter(british_lady_filter), british_lady], # British Lady voice
[FunctionFilter(barbershop_man_filter), barbershop_man], # Barbershop Man voice
),
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the voices you can do. Your initial responses should be as if you were a {current_voice}."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
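
The voice switching in this example relies on one piece of shared state (`current_voice`) and one async predicate per branch: every branch of the `ParallelPipeline` sees each frame, but only the branch whose filter returns `True` lets it through to its TTS service. The sketch below imitates that gating with plain `asyncio` and no pipecat classes. `make_voice_filter` and `route` are hypothetical stand-ins, not the behavior of `FunctionFilter` itself.

```python
import asyncio

current_voice = "News Lady"  # shared state, as in the example above


def make_voice_filter(voice: str):
    """Build an async predicate that passes frames only while `voice` is selected."""
    async def voice_filter(frame) -> bool:
        return current_voice == voice
    return voice_filter


async def route(frame, branches):
    """Offer the frame to every branch; return the names of branches whose filter passes."""
    accepted = []
    for name, predicate in branches:
        if await predicate(frame):
            accepted.append(name)
    return accepted


branches = [(v, make_voice_filter(v)) for v in ("News Lady", "British Lady", "Barbershop Man")]

first = asyncio.run(route("hello", branches))
current_voice = "British Lady"  # what switch_voice does when the LLM calls the tool
second = asyncio.run(route("hello", branches))
print(first, second)
```

Because the predicates read the shared variable at call time, changing `current_voice` takes effect on the next frame without rebuilding the pipeline.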


@@ -0,0 +1,153 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantContextAggregator,
LLMUserContextAggregator
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.whisper import Model, WhisperSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_language = "English"
async def switch_language(llm, args):
global current_language
current_language = args["language"]
return {"language": f"Your answers from now on should be in {current_language}."}
async def english_filter(frame) -> bool:
return current_language == "English"
async def spanish_filter(frame) -> bool:
return current_language == "Spanish"
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Pipecat",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True
)
)
stt = WhisperSTTService(model=Model.LARGE)
english_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="pNInz6obpgDQGcFmaJgB",
)
spanish_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
model="eleven_multilingual_v2",
voice_id="9F4C8ztpNUmXkdDDbz3J",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm.register_function("switch_language", switch_language)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_language",
"description": "Switch to another language when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"language": {
"type": "string",
"description": "The language the user wants you to speak",
},
},
"required": ["language"],
},
})]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can speak the following languages: 'English' and 'Spanish'.",
},
]
context = OpenAILLMContext(messages, tools)
tma_in = LLMUserContextAggregator(context)
tma_out = LLMAssistantContextAggregator(context)
pipeline = Pipeline([
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
ParallelPipeline( # TTS (bot will speak the chosen language)
[FunctionFilter(english_filter), english_tts], # English
[FunctionFilter(spanish_filter), spanish_tts], # Spanish
),
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the languages you speak. Your initial responses should be in {current_language}."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))


@@ -0,0 +1,130 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import json
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyTransportMessageFrame
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-asteria-en",
base_url="http://0.0.0.0:8080/v1/speak"
)
llm = OpenAILLMService(
# To use OpenAI
# api_key=os.getenv("OPENAI_API_KEY"),
# model="gpt-4o"
# Or, to use a local vLLM (or similar) api server
model="meta-llama/Meta-Llama-3-8B-Instruct",
            base_url="http://0.0.0.0:8000/v1"
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)

        pipeline = Pipeline([
            transport.input(),   # Transport user input
            tma_in,              # User responses
            llm,                 # LLM
            tts,                 # TTS
            transport.output(),  # Transport bot output
            tma_out              # Assistant spoken responses
        ])

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))

        # When a participant joins, start transcription for that participant so the
        # bot can "hear" and respond to them.
        @transport.event_handler("on_participant_joined")
        async def on_participant_joined(transport, participant):
            transport.capture_participant_transcription(participant["id"])

        # When the first participant joins, the bot should introduce itself.
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            messages.append(
                {"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])

        # Handle "latency-ping" messages. The client will send app messages that look like
        # this:
        #   { "latency-ping": { ts: <client-side timestamp> }}
        #
        # We want to send an immediate pong back to the client from this handler function.
        # Also, we will push a frame into the top of the pipeline and send it after the
        #
        @transport.event_handler("on_app_message")
        async def on_app_message(transport, message, sender):
            try:
                if "latency-ping" in message:
                    logger.debug(f"Received latency ping app message: {message}")
                    ts = message["latency-ping"]["ts"]
                    # Send immediately
                    transport.output().send_message(DailyTransportMessageFrame(
                        message={"latency-pong-msg-handler": {"ts": ts}},
                        participant_id=sender))
                    # And push to the pipeline for the Daily transport.output to send
                    await tma_in.push_frame(
                        DailyTransportMessageFrame(
                            message={"latency-pong-pipeline-delivery": {"ts": ts}},
                            participant_id=sender))
            except Exception as e:
                logger.debug(f"message handling error: {e} - {message}")

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))
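
The "latency-ping" handler above answers each ping twice: once directly from the handler and once through the pipeline, echoing the client's timestamp both times. The message shapes can be sketched in isolation as below. This is a minimal illustration, not part of the example itself: `build_latency_pongs` and `round_trip_ms` are hypothetical helper names, and treating `ts` as a millisecond timestamp is an assumption, since the handler only echoes the value back.

```python
from typing import Any, Dict, Optional, Tuple


def build_latency_pongs(
    message: Dict[str, Any],
) -> Optional[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Return the two pong payloads for a latency ping, or None for other messages.

    Hypothetical helper: it mirrors the dictionaries built in the handler above
    without touching any transport.
    """
    ping = message.get("latency-ping")
    if not isinstance(ping, dict) or "ts" not in ping:
        return None
    ts = ping["ts"]
    return (
        {"latency-pong-msg-handler": {"ts": ts}},        # sent immediately
        {"latency-pong-pipeline-delivery": {"ts": ts}},  # sent via the pipeline
    )


def round_trip_ms(pong: Dict[str, Any], now_ms: int) -> int:
    """Client-side view: elapsed time since the echoed timestamp.

    Assumes `ts` is a millisecond timestamp taken from the same clock as `now_ms`.
    """
    (payload,) = pong.values()  # each pong has exactly one top-level key
    return now_ms - payload["ts"]


if __name__ == "__main__":
    immediate, via_pipeline = build_latency_pongs({"latency-ping": {"ts": 1000}})
    print(round_trip_ms(immediate, now_ms=1042))
    print(round_trip_ms(via_pipeline, now_ms=1310))
```

Comparing the two round-trip values on the client gives a rough measure of how much delay the pipeline adds on top of the transport itself.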

@@ -0,0 +1,108 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import os
import sys

from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.processors.frame_processor import FrameDirection
from pipecat.processors.user_idle_processor import UserIdleProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer

from runner import configure

from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main(room_url: str, token):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer()
            )
        )

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)

        async def user_idle_callback(user_idle: UserIdleProcessor):
            messages.append(
                {"role": "system", "content": "Ask the user if they are still there and try to prompt for some input, but be short."})
            await user_idle.queue_frame(LLMMessagesFrame(messages))

        user_idle = UserIdleProcessor(callback=user_idle_callback, timeout=5.0)

        pipeline = Pipeline([
            transport.input(),   # Transport user input
            user_idle,           # Idle user check-in
            tma_in,              # User responses
            llm,                 # LLM
            tts,                 # TTS
            transport.output(),  # Transport bot output
            tma_out              # Assistant spoken responses
        ])

        task = PipelineTask(pipeline, PipelineParams(
            allow_interruptions=True,
            enable_metrics=True,
            report_only_initial_ttfb=True,
        ))

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append(
                {"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    (url, token) = configure()
    asyncio.run(main(url, token))
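
The example above wires an idle callback with a 5 second timeout into the pipeline. The general pattern (reset a timer on every sign of activity, fire a callback when the timer expires) can be sketched with plain asyncio as below. This is only an illustration of the idea under stated assumptions: `IdleWatchdog` and `notify_activity` are hypothetical names, and nothing here reflects how `UserIdleProcessor` is actually implemented.

```python
import asyncio
from typing import Awaitable, Callable


class IdleWatchdog:
    """Hypothetical idle detector: invokes `callback` after `timeout` quiet seconds."""

    def __init__(self, callback: Callable[[], Awaitable[None]], timeout: float):
        self._callback = callback
        self._timeout = timeout
        self._activity = asyncio.Event()

    def notify_activity(self) -> None:
        # Call this whenever the user does something; it restarts the idle timer.
        self._activity.set()

    async def run(self) -> None:
        while True:
            try:
                # Wait for activity, but give up after `timeout` seconds.
                await asyncio.wait_for(self._activity.wait(), timeout=self._timeout)
                self._activity.clear()
            except asyncio.TimeoutError:
                # No activity within the window: the user is considered idle.
                await self._callback()


async def demo() -> int:
    calls = 0

    async def on_idle() -> None:
        nonlocal calls
        calls += 1

    watchdog = IdleWatchdog(on_idle, timeout=0.05)
    runner = asyncio.create_task(watchdog.run())
    await asyncio.sleep(0.2)  # stay silent long enough for the timer to expire
    runner.cancel()
    try:
        await runner
    except asyncio.CancelledError:
        pass
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a voice bot, the callback would typically queue a short prompt for the LLM, as `user_idle_callback` does in the example.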

Binary file not shown.

Some files were not shown because too many files have changed in this diff.