Compare commits

..

177 Commits

Author SHA1 Message Date
James Hush
5e6979cf95 Remove logs 2025-03-20 14:28:53 +08:00
James Hush
b1e9dc5bb4 Remove extra imports 2025-03-20 14:27:58 +08:00
James Hush
2d06cd2109 Send message when user is muted 2025-03-20 14:27:03 +08:00
James Hush
1a237ddae8 This is working but RTVI still gets the transcript
FEAT: Example of muting LLMMessages to LLM.
2025-03-20 13:46:13 +08:00
Aleix Conchillo Flaqué
f31e77c4f6 pyproject: added empty tavus dependencies 2025-03-19 18:43:07 -07:00
Aleix Conchillo Flaqué
afb26be0ad Merge pull request #1396 from pipecat-ai/aleix/stt-service-audio-passthrough
SegmentedSTTService: allow audio to pass-through downstream
2025-03-19 11:16:40 -07:00
Aleix Conchillo Flaqué
48d73a2636 SegmentedSTTService: allow audio to pass-through downstream 2025-03-19 11:06:12 -07:00
Aleix Conchillo Flaqué
da531dabfd Merge pull request #1304 from pipecat-ai/aleix/handle-emails-user-email-gathering
add skip tags aggregator to support TTS service spelling out tags
2025-03-19 11:05:10 -07:00
Aleix Conchillo Flaqué
336e2f1579 TTSServices: for now just specify a single text aggregator 2025-03-19 11:02:29 -07:00
Aleix Conchillo Flaqué
fc0f404d26 examples: add new 36-user-email-gathering.py 2025-03-19 10:57:29 -07:00
Aleix Conchillo Flaqué
54620133d4 services: add spelling out support to CartesiaTTSService and RimeTTSService 2025-03-19 10:57:29 -07:00
Aleix Conchillo Flaqué
e7224473f2 utils(text): add new SkipTagsAggregator 2025-03-19 10:57:29 -07:00
Aleix Conchillo Flaqué
1a3a268c9d utils(string): add new function parse_start_end_tags() 2025-03-19 10:57:29 -07:00
Aleix Conchillo Flaqué
11984b89b7 utils(string): add support for floating point numbers 2025-03-19 10:57:29 -07:00
Aleix Conchillo Flaqué
1dbad2326a utils(string): support email addresses in end of sentence matching 2025-03-19 10:57:27 -07:00
Mark Backman
2e0c6c2bd1 Merge pull request #1397 from pipecat-ai/mb/disconnect-bot
Fix: RTVI message disconnect-bot now pushes EndTaskFrame
2025-03-19 10:45:24 -04:00
Mark Backman
7f1ccab445 Fix: RTVI message disconnect-bot now pushes EndTaskFrame 2025-03-19 07:07:45 -04:00
Aleix Conchillo Flaqué
7ddac4eb88 Merge pull request #1395 from pipecat-ai/aleix/multiple-text-filters-and-aggregators
TTSService: allow passing multiple text filters and aggregators
2025-03-18 21:25:29 -07:00
Aleix Conchillo Flaqué
514ecda755 TTSService: allow passing multiple text filters and aggregators 2025-03-18 17:31:01 -07:00
Aleix Conchillo Flaqué
71a38a120e Merge pull request #1376 from pipecat-ai/aleix/event-handlers-as-tasks
event handlers are now executed in separate tasks
2025-03-18 12:10:34 -07:00
Mark Backman
79616de7a4 Merge pull request #1392 from pipecat-ai/mb/fix-google-stt-timeout
Fix an issue where GoogleSTTService would timeout due to stream inact…
2025-03-18 14:17:44 -04:00
Mark Backman
6368fbe0dd Merge pull request #1318 from Vaibhav159/vl_google_vertex_llm
adding vertex google llm
2025-03-18 14:17:21 -04:00
Mark Backman
5dc8b48fbe Fix an issue where GoogleSTTService would timeout due to stream inactivity 2025-03-18 14:06:32 -04:00
Aleix Conchillo Flaqué
9112ff114f Merge pull request #1359 from lucasrothman/tavus-output-sample-rate
Tavus support for custom output rate
2025-03-18 10:16:34 -07:00
Aleix Conchillo Flaqué
32609b1132 event handlers are now executed in separate tasks 2025-03-18 09:25:39 -07:00
Vaibhav159
4303ed4991 rename service 2025-03-18 20:58:21 +05:30
Mark Backman
4677c34663 Merge pull request #1387 from pipecat-ai/mb/pattern-aggregator
Add PatternPairAggregator
2025-03-18 08:46:42 -04:00
Mark Backman
b28276446d Code review feedback 2025-03-18 07:49:54 -04:00
Mark Backman
2dee882710 Add unit tests 2025-03-18 07:30:37 -04:00
Mark Backman
6ec4052f29 Add CHANGELOG entries 2025-03-18 07:30:36 -04:00
Mark Backman
ddcc1fbb2f Add foundational example 35 2025-03-18 07:30:11 -04:00
Mark Backman
e731a0d41f Add PairPatternAggregator 2025-03-18 07:30:11 -04:00
Mark Backman
4918eab4e8 Merge pull request #1371 from pipecat-ai/mb/openai-realtime-transcription
Add TranscriptProcessor support for OpenAIRealtimeBetaLLMService
2025-03-18 07:28:07 -04:00
Mark Backman
11987765d8 Merge pull request #1381 from pipecat-ai/mb/recording-example-stt
Update the 34-audio-recording.py example to include an STT processor
2025-03-18 07:20:42 -04:00
Mark Backman
6f09ee25b8 Merge pull request #1385 from pipecat-ai/mb/add-neuphonic-readme
Add Google Imagen and Neuphonic TTS to README
2025-03-18 07:20:15 -04:00
Mark Backman
83dda8a759 Merge pull request #1390 from adnansiddiquei/add-neuphonic-languages
Added 5 new languages for Neuphonic: FR, PT, RU, ZH, HI.
2025-03-18 07:18:27 -04:00
Adnan Siddiquei
188677e601 Added 4 new languages: FR, PT, RU, ZH, HI. 2025-03-18 10:35:22 +00:00
Lucas Rothman
c57fa93a70 Renamed to sample_rate 2025-03-17 16:22:36 -07:00
Mark Backman
6885d07e88 Simplify the TranscriptProcessor _emit_aggregated_text logic 2025-03-17 16:36:03 -04:00
Mark Backman
acd0660f66 Update GeminiMultimodalLiveLLMService to work with the TranscriptProcessor 2025-03-17 16:36:03 -04:00
Mark Backman
3f002f8ffb Remove unnecessary TranscriptProcessor examples 2025-03-17 16:36:02 -04:00
Mark Backman
d5776c27f4 Update 19-openai-realtime-beta 2025-03-17 16:35:35 -04:00
Mark Backman
6e6905405b Update CHANGELOG 2025-03-17 16:35:35 -04:00
Mark Backman
571c10403f tests: Add additional coverage to test_transcript_processor 2025-03-17 16:35:35 -04:00
Mark Backman
5b6b700214 OpenAIRealtimeBetaLLMService outputs a TTSTextFrame 2025-03-17 16:35:35 -04:00
Mark Backman
1ad8e28025 Update TranscriptProcessor to more robustly handle different TTSTextFrame outputs 2025-03-17 16:35:35 -04:00
Mark Backman
3458f1b6de Add Google Imagen to README 2025-03-17 11:43:40 -04:00
Mark Backman
02dbef8f5a Add Neuphonic TTS to README 2025-03-17 11:28:51 -04:00
Mark Backman
c1382b0691 Update the 34-audio-recording.py example to include an STT processor 2025-03-15 20:30:35 -04:00
Vaibhav159
5f000efc61 adding example 2025-03-15 10:36:26 +05:30
Vaibhav159
fa7da8f5f6 adding vertex llm 2025-03-15 10:21:40 +05:30
Mark Backman
8b86f6991d Merge pull request #1343 from pipecat-ai/mb/pipecat-cloud-example
Add a Pipecat Cloud deployment example
2025-03-14 20:49:45 -04:00
Mark Backman
d3cd1a6c59 Update with latest starter 2025-03-14 20:40:33 -04:00
Mark Backman
24220f38f0 Add a Pipecat Cloud deployment example 2025-03-14 20:40:29 -04:00
Aleix Conchillo Flaqué
1f8752ab03 Merge pull request #1378 from pipecat-ai/aleix/remove-deprecations
removed most deprecations
2025-03-14 14:42:34 -07:00
Aleix Conchillo Flaqué
16d7df1c9f removed most deprecations 2025-03-14 14:37:08 -07:00
Aleix Conchillo Flaqué
2474211291 Merge pull request #1379 from pipecat-ai/aleix/introduce-text-aggregators
introduce text aggregators
2025-03-14 13:03:49 -07:00
Aleix Conchillo Flaqué
b632d71465 TTSService: flush_audio() should be in the base class 2025-03-14 10:48:25 -07:00
Aleix Conchillo Flaqué
f8610a69a5 introduce text aggregators 2025-03-14 10:48:25 -07:00
Aleix Conchillo Flaqué
624a454f8b Merge pull request #1366 from adnansiddiquei/neuphonic-tts-plugin
Add integration for Neuphonic TTS
2025-03-14 10:27:24 -07:00
Aleix Conchillo Flaqué
11ba08b7ba Merge pull request #1377 from pipecat-ai/aleix/task-upstream-downstream-filters
PipelineTask: only call event handlers if a filter is matched
2025-03-14 08:49:24 -07:00
Adnan Siddiquei
11b13d053b Fixed a bug from previous commit. Removed the concept of model from Neuphonic. 2025-03-14 11:17:22 +00:00
Adnan Siddiquei
7dec8431e1 Review comments by aconchillo. 2025-03-14 10:52:13 +00:00
Aleix Conchillo Flaqué
ce3f3b2edb Merge pull request #1372 from pipecat-ai/khk-fix-multimodal-live-example
fix for 26-gemini-multimodal-live.py
2025-03-13 20:22:07 -07:00
Aleix Conchillo Flaqué
1b3b4ee04a PipelineTask: only call event handlers if a filter is matched 2025-03-13 18:44:30 -07:00
Mark Backman
676c5d9ba7 Merge pull request #1374 from pipecat-ai/mb/add-riva-to-readme 2025-03-13 20:41:05 -04:00
Mark Backman
6eb3a8409f README: Add Parakeet and FastPitch 2025-03-13 18:42:19 -04:00
Kwindla Hultman Kramer
c9a31ea513 fix for 26-gemini-multimodal-live.py 2025-03-13 14:35:47 -07:00
Aleix Conchillo Flaqué
c0c7c5d600 Merge pull request #1370 from pipecat-ai/aleix/minor-ultravox-updates
services(ultravox): CHANGELOG, formatting and minor changes
2025-03-13 12:05:13 -07:00
Aleix Conchillo Flaqué
87004937be services(ultravox): CHANGELOG, formatting and minor changes 2025-03-13 11:49:18 -07:00
Aleix Conchillo Flaqué
b426be3067 Merge pull request #1331 from CerebriumAI/feature/ultravox
Added ultravox service
2025-03-13 10:40:00 -07:00
Aleix Conchillo Flaqué
b71e2b97ff Merge pull request #1368 from pipecat-ai/aleix/pipelinetask-frame-event-handlers
PipelineTask: add on_frame_reached_upstream and on_frame_reached_downstream
2025-03-13 10:31:33 -07:00
Aleix Conchillo Flaqué
25dcf7def6 PipelineTask: add on_frame_reached_upstream/on_frame_reached_downstream 2025-03-13 10:26:11 -07:00
Adnan Siddiquei
1bf964a667 Added two examples on how to use Neuphonic as a TTS (07u). 2025-03-13 14:42:42 +00:00
Adnan Siddiquei
08fb931ef6 Swapped NEUPHONIC_API_TOKEN for NEUPHONIC_API_KEY. 2025-03-13 12:10:03 +00:00
Aleix Conchillo Flaqué
c5aa931096 Merge pull request #1358 from pipecat-ai/aleix/abstractmethod-fixes
ai_services: fix abstractmethod issues
2025-03-12 17:26:48 -07:00
Mark Backman
b084a3e9e7 Merge pull request #1367 from MaCaki/macaki/rime/send_msg_in_flush_audio
[rime client] Sending over trailing space to help indicate end of utt…
2025-03-12 14:25:18 -04:00
macaki
5c9e33bc7a formatting 2025-03-12 12:20:18 -06:00
Adnan Siddiquei
0b9c4b2255 Fixed a couple of small bugs. 2025-03-12 18:04:48 +00:00
macaki
effb5f6cd8 added changelog 2025-03-12 11:57:25 -06:00
Adnan Siddiquei
ead555eb4b Corrected versions on pyproject.toml. 2025-03-12 17:39:04 +00:00
macaki
f843482968 [rime client] Sending over trailing space to help indicate end of utterance after a punctuation. 2025-03-12 11:26:43 -06:00
Adnan Siddiquei
23a4933af9 Initial implementation of Neuphonic service. A TTS provider. 2025-03-12 17:15:31 +00:00
Michael Louis
d9ef19233a Added foundational example for ultravox 2025-03-12 10:30:23 -04:00
Mark Backman
357334e3c9 Merge pull request #1341 from pipecat-ai/mb/fix-google-typo
Add a set_language convenience method for GoogleSTTService
2025-03-12 09:05:52 -04:00
Mark Backman
59ea94af86 Merge pull request #1360 from pipecat-ai/mb/update-cartesia-voice
Update Cartesia voice for demos
2025-03-12 08:02:26 -04:00
Mark Backman
4a363bebf0 Add a set_language convenience method for GoogleSTTService 2025-03-12 07:58:29 -04:00
Mark Backman
c196fb5f98 Merge pull request #1342 from pipecat-ai/mb/lmnt-flush-audio 2025-03-11 22:22:38 -04:00
Mark Backman
5f97f6ff94 Add flush_audio() to LmntTTSService 2025-03-11 21:57:54 -04:00
Mark Backman
5860fe5319 Merge pull request #1340 from pipecat-ai/mb/fish-flush
Add flush_audio to FishTTSService
2025-03-11 21:56:44 -04:00
Mark Backman
3522bbb533 tmp 2025-03-11 21:55:18 -04:00
Mark Backman
cfca7269f4 Update the Cartesia voice in all demos with one built for sonic-2 2025-03-11 21:53:03 -04:00
Mark Backman
e6f269a903 Add flush_audio to FishTTSService 2025-03-11 21:48:41 -04:00
Mark Backman
468e936a5f Merge pull request #1356 from pipecat-ai/mb/add-chirp-tts-support
Add support for Chirp voices in GoogleTTSService
2025-03-11 20:12:52 -04:00
Lucas Rothman
ecc4411128 Tavus support for custom output rate 2025-03-11 16:02:33 -07:00
Aleix Conchillo Flaqué
740ba4e759 ai_services: fix abstractmethod issues 2025-03-11 14:29:03 -07:00
Mark Backman
a62741df94 Add support for Chirp voices in GoogleTTSService 2025-03-11 07:56:27 -04:00
Mark Backman
5bd359ada9 Merge pull request #1354 from pipecat-ai/mb/cartesia-changelog
Changelog entry for Cartesia model update
2025-03-11 07:20:04 -04:00
Mark Backman
40562402a2 Changelog entry for Cartesia model update 2025-03-10 21:10:11 -04:00
Mark Backman
98e5089fbe Merge pull request #1353 from kunal-cai/main
[Cartesia] Update the default alias for Cartesia TTS Service
2025-03-10 21:07:19 -04:00
Kunal Shah
e1c8a09b60 [Cartesia] Update the default alias for Cartesia TTS Service 2025-03-10 14:43:58 -07:00
Filipi da Silva Fuchter
154fe65011 Merge pull request #1336 from pipecat-ai/fixing_function_calling_examples
Pipecat small fixes and refactored function calling examples
2025-03-07 16:10:27 -03:00
Mark Backman
61f534ca34 Merge pull request #1334 from pipecat-ai/aleix/user-and-bot-turn-audio
add support for user and bot turn audio
2025-03-06 18:35:56 -05:00
Mark Backman
a91c26785f Store recording in a folder 2025-03-06 18:31:48 -05:00
Aleix Conchillo Flaqué
d7e93551d2 examples(chatbot-audio-recording): add support for user/bot turn audio 2025-03-06 11:49:01 -08:00
Aleix Conchillo Flaqué
06c742a2ad AudioBufferProcessor: add on_user_turn_audio_data and on_bot_turn_audio_data 2025-03-06 11:49:01 -08:00
Filipi Fuchter
55b0797fd5 Removing the extra examples inside the unified-format-function-calling folder 2025-03-06 12:00:22 -03:00
Filipi Fuchter
21443b9a08 Refactored gemini multimodal example to use the unified format for function calling. 2025-03-06 11:59:08 -03:00
Filipi Fuchter
4b167a3c3d Fixing the ruff format. 2025-03-06 10:38:45 -03:00
Filipi Fuchter
2df77430aa Refactoring the 14 series examples to use the unified format for function calling. 2025-03-06 10:35:26 -03:00
Filipi Fuchter
2d114b15f9 Adding missing flush_audio method to AzureTTSService. 2025-03-06 10:34:25 -03:00
Filipi Fuchter
26000b616d Fixing the base_whisper services to implement set_language. 2025-03-06 10:15:04 -03:00
Aleix Conchillo Flaqué
710eebab09 Merge pull request #1332 from pipecat-ai/aleix/base-object-and-event-handlers
introduce BaseObject class
2025-03-05 13:41:27 -08:00
Dominic Stewart
532423eb4c Updated example to switch pipelines per the original request (#1320) 2025-03-05 13:40:36 -08:00
Aleix Conchillo Flaqué
bb29e50adb introduce BaseObject class 2025-03-05 13:38:53 -08:00
Filipi da Silva Fuchter
4048d6782b Merge pull request #1211 from pipecat-ai/function_calling_unified_format
Unified format for function calling
2025-03-05 18:30:22 -03:00
Filipi Fuchter
76d36a312b Adding the unified format function calling to the changelog. 2025-03-05 14:18:37 -03:00
Filipi Fuchter
2a75373c04 Created examples for unified format function calling. 2025-03-05 14:12:30 -03:00
Filipi Fuchter
a840b0e815 Prevents pytest from collecting TestFrameProcessor. 2025-03-05 14:11:52 -03:00
Filipi Fuchter
ebcde719a6 Integration test for function calling. 2025-03-05 14:11:16 -03:00
Filipi Fuchter
5c912927bb Unit tests for function calling adapters. 2025-03-05 14:11:02 -03:00
Filipi Fuchter
0e55db054e Created script to fix ruff format issues. 2025-03-05 14:10:47 -03:00
Filipi Fuchter
5967ac0d4f Implementing unified format for function calling. 2025-03-05 14:10:32 -03:00
Aleix Conchillo Flaqué
1451483cf7 Merge pull request #1330 from pipecat-ai/aleix/playht-update-0.1.12
pyproject: update pyht to 0.1.12
2025-03-04 18:35:03 -08:00
Michael Louis
3fe7c1d730 Added ultravox service 2025-03-04 13:59:03 -05:00
Aleix Conchillo Flaqué
c14b85c12b pyproject: update pyht to 0.1.12
Fixes #1309
2025-03-04 10:26:11 -08:00
kompfner
9f3c0219d7 Merge pull request #1329 from pipecat-ai/add-permissions-to-daily-meeting-token-properties
Add the `permissions` property to `DailyMeetingTokenProperties`
2025-03-03 14:44:10 -05:00
Aleix Conchillo Flaqué
ec36fef26e updated CHANGELOG and fix GladiaSTTService formatting 2025-03-03 09:53:03 -08:00
allenmylath
5f1848d24b Update gladia.py (#1317)
* Update gladia.py

According to gladia docs 
https://docs.gladia.io/api-reference/v2/live/init
speech threshould value close to 1 enables gladia to better isolate speeech from noise.
2025-03-03 09:51:11 -08:00
Aleix Conchillo Flaqué
d6867bd12f Merge pull request #1321 from pipecat-ai/aleix/allow-setting-context-aggregator-parameters
LLMService: add user/assistant args to create_context_aggregator()
2025-03-03 09:48:31 -08:00
Aleix Conchillo Flaqué
17a1f30572 LLMService: add user/assistant args to create_context_aggregator() 2025-03-03 09:46:37 -08:00
Paul Kompfner
8e0dc1f256 Add the permissions property to DailyMeetingTokenProperties 2025-03-03 10:13:25 -05:00
Kwindla Hultman Kramer
b9100beee3 Merge pull request #1327 from pipecat-ai/azure-realtime-changelog
CHANGELOG.md entry for AzureRealtimeBetaLLMService
2025-03-02 20:30:40 -08:00
Mark Backman
b8bc3d2565 Merge pull request #1326 from pipecat-ai/mb/11labs-speed
Add speed as InputParam to ElevenLabs TTS services
2025-03-02 15:20:01 -05:00
Kwindla Hultman Kramer
3213e85b7d CHANGELOG.md entry for AzureRealtimeBetaLLMService 2025-03-02 12:16:50 -08:00
Kwindla Hultman Kramer
de3bcd64c4 Merge pull request #1324 from pipecat-ai/azure-realtime
Support for Azure OpenAI Realtime API
2025-03-02 12:13:29 -08:00
Mark Backman
ad7f1eec12 Create a function to build voice_settings dictionary 2025-03-02 08:27:29 -05:00
Mark Backman
29310b4e92 Add speed as InputParam to ElevenLabs TTS services 2025-03-02 08:19:44 -05:00
Kwindla Hultman Kramer
2f4d36a146 docstring fixup 2025-03-01 15:44:10 -08:00
Kwindla Hultman Kramer
6c9bb782b1 add __init__.py 2025-03-01 15:42:20 -08:00
Kwindla Hultman Kramer
010d9103d4 support for Azure OpenAI Realtime API 2025-03-01 15:39:19 -08:00
Aleix Conchillo Flaqué
12131eb7c5 Merge pull request #1313 from Vaibhav159/vl_add_automated_formatting
using ruff automated formatting to avoid action failures.
2025-02-28 13:12:31 -08:00
Aleix Conchillo Flaqué
80b830322a Merge pull request #1311 from pipecat-ai/aleix/llm-full-response-aggregator
add new LLMFullResponseAggregator
2025-02-28 13:08:06 -08:00
Aleix Conchillo Flaqué
8db9d16174 add new LLMFullResponseAggregator 2025-02-28 13:05:21 -08:00
Aleix Conchillo Flaqué
1c92fab1fb Merge pull request #1308 from Vaibhav159/vl_google_openai_format
adding GoogleLLMOpenAIBetaService
2025-02-28 12:04:37 -08:00
Vaibhav159
974717d1b9 sync with main 2025-03-01 01:16:21 +05:30
Vaibhav159
59fb631390 fixing function calling and adding example 2025-03-01 01:14:37 +05:30
Vaibhav159
4824220260 adding GoogleLLMOpenAIBetaService 2025-03-01 01:14:26 +05:30
Mark Backman
55a338614d Merge pull request #1312 from pipecat-ai/mb/move-server-message-frame
Rename ServerMessageFrame to RTVIServerMessageFrame and move to rtvi.py
2025-02-28 13:59:31 -05:00
Vaibhav159
f033046963 using ruff automated formatting to avoid repeated failures 2025-02-28 08:25:15 +05:30
Mark Backman
6018fc068c Rename ServerMessageFrame to RTVIServerMessageFrame and move to rtvi.py 2025-02-27 20:07:07 -05:00
Aleix Conchillo Flaqué
d5b634301f Merge pull request #1302 from pipecat-ai/aleix/cleanup-llm-tts-logging
services: minor LLM and TTS logging improvements
2025-02-27 13:51:04 -08:00
Aleix Conchillo Flaqué
a37eb1049d Merge pull request #1310 from Canonical-AI-Inc/without-audio
Optional Recording
2025-02-27 13:37:39 -08:00
Adrian Cowham
803ea9d8bc update the canonical client so that the audio recording is optional as long as there is a transcript 2025-02-27 12:31:02 -08:00
Mark Backman
499bc25217 Merge pull request #1303 from pipecat-ai/mb/add-server-to-client-msg
Add a new generic server to client message and frame type
2025-02-27 12:56:57 -05:00
Mark Backman
53d403af4b Remove the RTVIServerMessage logic from the RTVIProcessor 2025-02-27 12:50:43 -05:00
Aleix Conchillo Flaqué
a0a8ea1641 Merge pull request #1301 from pipecat-ai/aleix/example-22d-fix-llm-aggregator 2025-02-26 22:39:48 -08:00
Mark Backman
26c68ccd7c Add a new generic server to client message and frame type 2025-02-26 18:59:06 -05:00
Aleix Conchillo Flaqué
fa010c8644 services: minor LLM and TTS logging improvements 2025-02-26 15:36:25 -08:00
Aleix Conchillo Flaqué
d58f398bc4 examples: fix for 22d-natural-conversation-gemini-audio.py 2025-02-26 13:15:07 -08:00
Aleix Conchillo Flaqué
11383a86a1 Merge pull request #1300 from pipecat-ai/aleix/prepare-0.0.58
update CHANGELOG for 0.0.58
2025-02-26 11:31:24 -08:00
Aleix Conchillo Flaqué
daa52ff8df update CHANGELOG for 0.0.58 2025-02-26 11:29:04 -08:00
Mark Backman
a5f41e22f7 Merge pull request #1299 from pipecat-ai/mb/add-track-level-recording
Added on_track_audio_data callback to AudioBufferProcessor for track level recording
2025-02-26 13:49:36 -05:00
Mark Backman
530bb5233d example: Added a foundational example (34) for audio recording 2025-02-26 13:44:32 -05:00
Aleix Conchillo Flaqué
4a64e09f6c Merge pull request #1297 from pipecat-ai/aleix/daily-python-0.15.0
pyproject: update daily-python, aiohttp and pydantic
2025-02-26 10:26:59 -08:00
Aleix Conchillo Flaqué
74582bb8d5 pyproject: update daily-python, aiohttp and pydantic 2025-02-26 10:22:34 -08:00
Mark Backman
1ca2101e3a Added on_track_audio_data callback to AudioBufferProcessor for track level recording 2025-02-26 10:48:56 -05:00
Aleix Conchillo Flaqué
e80311c323 Merge pull request #1296 from pipecat-ai/aleix/google-always-send-text-with-audio
GoogleLLMService: always send text with audio
2025-02-26 07:47:56 -08:00
Aleix Conchillo Flaqué
2f24c422b6 Merge pull request #1289 from pipecat-ai/aleix/tts-http-improvements
small TTS http improvements
2025-02-26 07:47:26 -08:00
Mark Backman
0d0b9fddef Merge pull request #1291 from pipecat-ai/mb/playht-http-protocol
PlayHTHttpTTSService now takes a separate protocol input
2025-02-26 08:09:49 -05:00
Mark Backman
1753cc99f4 PlayHTHttpTTSService now takes a separate protocol input 2025-02-26 08:01:54 -05:00
Aleix Conchillo Flaqué
4f8b036abe pyproject: remote httpx old dependency and upgrade anthropic/google-genai 2025-02-25 22:28:21 -08:00
Aleix Conchillo Flaqué
f83c89c202 examples: update google examples 2025-02-25 22:28:02 -08:00
Aleix Conchillo Flaqué
bb89a036e5 google: always send text part when sending inline audio 2025-02-25 22:27:38 -08:00
Aleix Conchillo Flaqué
b994a03466 examples: add more HTTP TTS services examples 2025-02-25 21:40:41 -08:00
Aleix Conchillo Flaqué
27161f8e3b BaseOutputTransport: cleanup audio buffer after bot stops talking 2025-02-25 21:39:47 -08:00
Aleix Conchillo Flaqué
8acf9a488b tts: some small HTTP-based services improvements 2025-02-25 21:39:47 -08:00
174 changed files with 6559 additions and 1714 deletions

View File

@@ -1,7 +1,8 @@
repos:
- repo: local
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.9.7
hooks:
- id: ruff-format-hook
name: Check ruff formatting
entry: sh scripts/pre-commit.sh
language: system
- id: ruff
language_version: python3
args: [ --select, I, ]
- id: ruff-format

View File

@@ -9,6 +9,191 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Added new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate
text and skips end of sentence matching if aggregated text is between
start/end tags.
- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to
identify content between matching pattern pairs in streamed text. This allows
for detection and processing of structured content like XML-style tags that
may span across multiple text chunks or sentence boundaries.
- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service
to aggregate LLM tokens and decide when the aggregated text should be pushed
to the TTS service. They also allow for the text to be manipulated while it's
being aggregated. A text aggregator can be passed via `text_aggregator` to the
TTS service.
- Added new `UltravoxSTTService`.
(see https://github.com/fixie-ai/ultravox)
- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event
handlers to `PipelineTask`. Those events will be called when a frame reaches
the beginning or end of the pipeline respectively. Note that by default, the
event handlers will not be called unless a filter is set with
`PipelineTask.set_reached_upstream_filter()` or
`PipelineTask.set_reached_downstream_filter()`.
- Added support for Chirp voices in `GoogleTTSService`.
- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.
- Added a `set_language` convenience method for `GoogleSTTService`, allowing
you to set a single language. This is in addition to the `set_languages`
method which allows you to set a list of languages.
- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to
`AudioBufferProcessor`. This gives the ability to grab the audio of only that
turn for both the user and the bot.
- Added new base class `BaseObject` which is now the base class of
`FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The
new `BaseObject` adds supports for event handlers.
- Added support for a unified format for specifying function calling across all
LLM services.
```python
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
```
- Added `speech_threshold` parameter to `GladiaSTTService`.
- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context
aggregator parameters when using `create_context_aggregator()`. The values are
passed as a mapping that will then be converted to arguments.
- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and
`ElevenLabsHttpTTSService`.
- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At
every completion the `on_completion` event handler is triggered.
- Added a new frame, `RTVIServerMessageFrame`, and RTVI message
`RTVIServerMessage` which provides a generic mechanism for sending custom
messages from server to client. The `RTVIServerMessageFrame` is processed by
the `RTVIObserver` and will be delivered to the client's `onServerMessage`
callback or `ServerMessage` event.
- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an
OpenAI-compatible interface. Added foundational example
`14o-function-calling-gemini-openai-format.py`.
- Added `AzureRealtimeBetaLLMService` to support Azure's OpeanAI Realtime API. Added
foundational example `19a-azure-realtime-beta.py`.
- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI
Gemini models. Added foundational example
`14p-function-calling-gemini-vertex-ai.py`.
### Changed
- All event handlers are now executed in separate tasks in order to prevent
blocking the pipeline. It is possible that event handlers take some time to
execute in which case the pipeline would be blocked waiting for the event
handler to complete.
- Updated `TranscriptProcessor` to support text output from
`OpenAIRealtimeBetaLLMService`.
- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push
a `TTSTextFrame`.
- Updated the default mode for `CartesiaTTSService` and
`CartesiaHttpTTSService` to `sonic-2`.
### Deprecated
- `TTSService` parameter `text_filter` is now deprecated, use `text_filters`
instead which is now a list. This allows passing multiple filters that will be
executed in order.
### Removed
- Removed deprecated `audio.resample_audio()`, use `create_default_resampler()`
instead.
- Removed deprecated`stt_service` parameter from `STTMuteFilter`.
- Removed deprecated RTVI processors, use an `RTVIObserver` instead.
- Removed deprecated `AWSTTSService`, use `PollyTTSService` instead.
- Removed deprecated field `tier` from `DailyTranscriptionSettings`, use `model`
instead.
- Removed deprecated `pipecat.vad` package, use `pipecat.audio.vad` instead.
### Fixed
- Fixed an issue with `SegmentedSTTService` based services
(e.g. `GroqSTTService`) that was not allow audio to pass-through downstream.
- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would consider
text between spelling out tags end of sentence.
- Fixed a `match_endofsentence` issue that would result in floating point
numbers to be considered an end of sentence.
- Fixed a `match_endofsentence` issue that would result in emails to be
considered an end of sentence.
- Fixed an issue where the RTVI message `disconnect-bot` was pushing an
`EndFrame`, resulting in the pipeline not shutting down. It now pushes an
`EndTaskFrame` upstream to shutdown the pipeline.
- Fixed an issue with the `GoogleSTTService` where stream timeouts during
periods of inactivity were causing connection failures. The service now
properly detects timeout errors and handles reconnection gracefully,
ensuring continuous operation even after periods of silence or when using an
`STTMuteFilter`.
- Fixed an issue in `RimeTTSService` where the last line of text sent didn't
result in an audio output being generated.
### Other
- Added a new example `examples/foundational/36-user-email-gathering.py` to show
how to gather user emails. The example uses's Cartesia's `<spell></spell>`
tags and Rime `spell()` function to spell out the emails for confirmation.
- Update the `34-audio-recording.py` example to include an STT processor.
- Added foundational example `35-voice-switching.py` showing how to use the new
`PatternPairAggregator`. This example shows how to encode information for the
LLM to instruct TTS voice changes, but this can be used to encode any
information into the LLM response, which you want to parse and use in other
parts of your application.
- Added a Pipecat Cloud deployment example to the `examples` directory.
- Removed foundational examples 28b and 28c as the TranscriptProcessor no
longer has an LLM depedency. Renamed foundational example 28a to
`28-transcript-processor.py`.
## [0.0.58] - 2025-02-26
### Added
- Added track-specific audio event `on_track_audio_data` to
`AudioBufferProcessor` for accessing separate input and output audio tracks.
- Pipecat version will now be logged on every application startup. This will
help us identify what version we are running in case of any issues.
@@ -45,6 +230,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one for
the pipeline).
- Updated `PlayHTHttpTTSService` to take a `voice_engine` and `protocol` input
in the constructor. The previous method of providing a `voice_engine` input
that contains the engine and protocol is deprecated by PlayHT.
- The base `TTSService` class now strips leading newlines before sending text
to the TTS provider. This change is to solve issues where some TTS providers,
like Azure, would not output text due to newlines.
@@ -78,6 +267,9 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
### Fixed
- Fixed a `GoogleLLMService` that was causing an exception when sending inline
audio in some cases.
- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to
disconnect from the TTS service before audio from all the contexts was
received. This affected services like Cartesia and Rime.
@@ -124,6 +316,9 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
- Added Gemini support to `examples/phone-chatbot`.
- Added foundational example `34-audio-recording.py` showing how to use the
AudioBufferProcessor callbacks to save merged and track recordings.
## [0.0.57] - 2025-02-14
### Added

View File

@@ -57,13 +57,13 @@ pip install "pipecat-ai[option,...]"
| Category | Services | Install Command Example |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[google]"` |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
| Vision & Image | [Moondream](https://docs.pipecat.ai/server/services/vision/moondream), [fal](https://docs.pipecat.ai/server/services/image-generation/fal) | `pip install "pipecat-ai[moondream]"` |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) | `pip install "pipecat-ai[moondream]"` |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |

View File

@@ -29,6 +29,9 @@ DAILY_SAMPLE_ROOM_URL=https://...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
# Neuphonic
NEUPHONIC_API_KEY=...
# Fal
FAL_KEY=...

View File

@@ -64,7 +64,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
runner = PipelineRunner()

View File

@@ -113,8 +113,8 @@ async def main():
llm,
tts,
transport.output(),
audio_buffer_processor, # captures audio into a buffer
canonical, # uploads audio buffer to Canonical AI for metrics
audio_buffer_processor, # captures audio into a buffer
context_aggregator.assistant(),
]
)

View File

@@ -32,10 +32,16 @@ load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Create the recordings directory if it doesn't exist
os.makedirs("recordings", exist_ok=True)
async def save_audio(audio: bytes, sample_rate: int, num_channels: int):
async def save_audio(audio: bytes, sample_rate: int, num_channels: int, name: str):
if len(audio) > 0:
filename = f"conversation_recording{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.wav"
filename = os.path.join(
"recordings",
f"{name}_conversation_recording{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.wav",
)
with io.BytesIO() as buffer:
with wave.open(buffer, "wb") as wf:
wf.setsampwidth(2)
@@ -110,7 +116,7 @@ async def main():
# NOTE: Watch out! This will save all the conversation in memory. You
# can pass `buffer_size` to get periodic callbacks.
audiobuffer = AudioBufferProcessor()
audiobuffer = AudioBufferProcessor(enable_turn_audio=True)
pipeline = Pipeline(
[
@@ -128,7 +134,15 @@ async def main():
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels)
await save_audio(audio, sample_rate, num_channels, "full")
@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels, "user")
@audiobuffer.event_handler("on_bot_turn_audio_data")
async def on_bot_turn_audio_data(buffer, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels, "bot")
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -34,7 +34,7 @@ async def main(room_url: str, token: str):
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -0,0 +1,94 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
*.egg
.installed.cfg
.eggs/
downloads/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
MANIFEST
# Virtual Environments
venv/
env/
.env
.venv/
ENV/
env.bak/
venv.bak/
# IDE
.idea/
.vscode/
.spyderproject
.spyproject
.ropeproject
# Testing and Coverage
.coverage
.coverage.*
htmlcov/
.pytest_cache/
.tox/
.nox/
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
cover/
# Logs and Databases
*.log
*.db
db.sqlite3
db.sqlite3-journal
pip-log.txt
# System Files
.DS_Store
Thumbs.db
desktop.ini
*.swp
*.swo
*.bak
*.tmp
*~
# Build and Documentation
docs/_build/
.pybuilder/
target/
instance/
.webassets-cache
.pdm.toml
.pdm-python
.pdm-build/
__pypackages__/
# Other
*.mo
*.pot
*.sage.py
.mypy_cache/
.dmypy.json
dmypy.json
.pyre/
.pytype/
cython_debug/
.ipynb_checkpoints
# Pipecat cloud
.pcc-deploy.toml

View File

@@ -0,0 +1,7 @@
FROM dailyco/pipecat-base:latest
COPY ./requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade -r requirements.txt
COPY ./bot.py bot.py

View File

@@ -0,0 +1,196 @@
# Pipecat Cloud Starter Project
[![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.daily.co) [![Discord](https://img.shields.io/discord/1217145424381743145)](https://discord.gg/dailyco)
A template voice agent for [Pipecat Cloud](https://www.daily.co/products/pipecat-cloud/) that demonstrates building and deploying a conversational AI agent.
> **For a detailed step-by-step guide, see our [Quickstart Documentation](https://docs.pipecat.daily.co/quickstart).**
## Prerequisites
- Python 3.10+
- Linux, MacOS, or Windows Subsystem for Linux (WSL)
- [Docker](https://www.docker.com) and a Docker repository (e.g., [Docker Hub](https://hub.docker.com))
- A Docker Hub account (or other container registry account)
- [Pipecat Cloud](https://pipecat.daily.co) account
> **Note**: If you haven't installed Docker yet, follow the official installation guides for your platform ([Linux](https://docs.docker.com/engine/install/), [Mac](https://docs.docker.com/desktop/setup/install/mac-install/), [Windows](https://docs.docker.com/desktop/setup/install/windows-install/)). For Docker Hub, [create a free account](https://hub.docker.com/signup) and log in via terminal with `docker login`.
## Get Started
### 1. Get the starter project
Clone the starter project from GitHub:
```bash
git clone https://github.com/daily-co/pipecat-cloud-starter
cd pipecat-cloud-starter
```
### 2. Set up your Python environment
We recommend using a virtual environment to manage your Python dependencies.
```bash
# Create a virtual environment
python -m venv .venv
# Activate it
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the Pipecat Cloud CLI
pip install pipecatcloud
```
### 3. Authenticate with Pipecat Cloud
```bash
pcc auth login
```
### 4. Acquire required API keys
This starter requires the following API keys:
- **OpenAI API Key**: Get from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- **Cartesia API Key**: Get from [play.cartesia.ai/keys](https://play.cartesia.ai/keys)
- **Daily API Key**: Automatically provided through your Pipecat Cloud account
### 5. Configure to run locally (optional)
You can test your agent locally before deploying to Pipecat Cloud:
```bash
# Set environment variables with your API keys
export CARTESIA_API_KEY="your_cartesia_key"
export DAILY_API_KEY="your_daily_key"
export OPENAI_API_KEY="your_openai_key"
```
> Your `DAILY_API_KEY` can be found at [https://pipecat.daily.co](https://pipecat.daily.co) under the `Settings` in the `Daily (WebRTC)` tab.
First install requirements:
```bash
pip install -r requirements.txt
```
Then, launch the bot.py script locally:
```bash
LOCAL_RUN=1 python bot.py
```
## Deploy & Run
### 1. Build and push your Docker image
```bash
# Build the image (targeting ARM architecture for cloud deployment)
docker build --platform=linux/arm64 -t my-first-agent:latest .
# Tag with your Docker username and version
docker tag my-first-agent:latest your-username/my-first-agent:0.1
# Push to Docker Hub
docker push your-username/my-first-agent:0.1
```
### 2. Create a secret set for your API keys
The starter project requires API keys for OpenAI and Cartesia:
```bash
# Copy the example env file
cp env.example .env
# Edit .env to add your API keys:
# CARTESIA_API_KEY=your_cartesia_key
# OPENAI_API_KEY=your_openai_key
# Create a secret set from your .env file
pcc secrets set my-first-agent-secrets --file .env
```
Alternatively, you can create secrets directly via CLI:
```bash
pcc secrets set my-first-agent-secrets \
CARTESIA_API_KEY=your_cartesia_key \
OPENAI_API_KEY=your_openai_key
```
### 3. Deploy to Pipecat Cloud
```bash
pcc deploy my-first-agent your-username/my-first-agent:0.1 --secrets my-first-agent-secrets
```
> **Note (Optional)**: For a more maintainable approach, you can use the included `pcc-deploy.toml` file:
>
> ```toml
> agent_name = "my-first-agent"
> image = "your-username/my-first-agent:0.1"
> secret_set = "my-first-agent-secrets"
>
> [scaling]
> min_instances = 0
> ```
>
> Then simply run `pcc deploy` without additional arguments.
> **Note**: If your repository is private, you'll need to add credentials:
>
> ```bash
> # Create pull secret (youll be prompted for credentials)
> pcc secrets image-pull-secret pull-secret https://index.docker.io/v1/
>
> # Deploy with credentials
> pcc deploy my-first-agent your-username/my-first-agent:0.1 --credentials pull-secret
> ```
### 4. Check deployment and scaling (optional)
By default, your agent will use "scale-to-zero" configuration, which means it may have a cold start of around 10 seconds when first used. By default, idle instances are maintained for 5 minutes before being terminated when using scale-to-zero.
For more responsive testing, you can scale your deployment to keep a minimum of one instance warm:
```bash
# Ensure at least one warm instance is always available
pcc deploy my-first-agent your-username/my-first-agent:0.1 --min-instances 1
# Check the status of your deployment
pcc agent status my-first-agent
```
By default, idle instances are maintained for 5 minutes before being terminated when using scale-to-zero.
### 5. Create an API key
```bash
# Create a public API key for accessing your agent
pcc organizations keys create
# Set it as the default key to use with your agent
pcc organizations keys use
```
### 6. Start your agent
```bash
# Start a session with your agent in a Daily room
pcc agent start my-first-agent --use-daily
```
This will return a URL, which you can use to connect to your running agent.
## Documentation
For more details on Pipecat Cloud and its capabilities:
- [Pipecat Cloud Documentation](https://docs.pipecat.daily.co)
- [Pipecat Project Documentation](https://docs.pipecat.ai)
## Support
Join our [Discord community](https://discord.gg/dailyco) for help and discussions.

View File

@@ -0,0 +1,161 @@
#
# Copyright (c) 2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecatcloud.agent import DailySessionArguments
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
if LOCAL_RUN:
import asyncio
import webbrowser
try:
from local_runner import configure
except ImportError:
logger.error("Could not import local_runner module. Local development mode may not work.")
# Load environment variables
load_dotenv(override=True)
async def main(room_url: str, token: str):
"""Main pipeline setup and execution function.
Args:
room_url: The Daily room URL
token: The Daily room token
"""
logger.debug("Starting bot in room: {}", room_url)
transport = DailyTransport(
room_url,
token,
"bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
logger.info("First participant joined: {}", participant["id"])
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{
"role": "system",
"content": "Please start with 'Hello World' and introduce yourself to the user.",
}
)
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
logger.info("Participant left: {}", participant)
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
async def bot(args: DailySessionArguments):
"""Main bot entry point compatible with the FastAPI route handler.
Args:
room_url: The Daily room URL
token: The Daily room token
body: The configuration object from the request body
session_id: The session ID for logging
"""
logger.info(f"Bot process initialized {args.room_url} {args.token}")
try:
await main(args.room_url, args.token)
logger.info("Bot process completed")
except Exception as e:
logger.exception(f"Error in bot process: {str(e)}")
raise
# Local development functions
async def local_main():
"""Function for local development testing."""
try:
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
logger.warning("_")
logger.warning("_")
logger.warning(f"Talk to your voice agent here: {room_url}")
logger.warning("_")
logger.warning("_")
webbrowser.open(room_url)
await main(room_url, token)
except Exception as e:
logger.exception(f"Error in local development mode: {e}")
# Local development entry point
if LOCAL_RUN and __name__ == "__main__":
try:
asyncio.run(local_main())
except Exception as e:
logger.exception(f"Failed to run in local mode: {e}")

View File

@@ -0,0 +1,2 @@
CARTESIA_API_KEY=
OPENAI_API_KEY=

View File

@@ -0,0 +1,46 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
async def configure(aiohttp_session: aiohttp.ClientSession):
(url, token) = await configure_with_args(aiohttp_session)
return (url, token)
async def configure_with_args(aiohttp_session: aiohttp.ClientSession = None):
key = os.getenv("DAILY_API_KEY")
if not key:
raise Exception(
"No Daily API key specified. set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
room = await daily_rest_helper.create_room(
DailyRoomParams(properties={"enable_prejoin_ui": False})
)
if not room.url:
raise HTTPException(status_code=500, detail="Failed to create room")
url = room.url
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -0,0 +1,6 @@
agent_name = "my-first-agent"
image = "your-username/my-first-agent:0.1"
secret_set = "my-first-agent-secrets"
[scaling]
min_instances = 0

View File

@@ -0,0 +1,3 @@
pipecatcloud
pipecat-ai[cartesia,daily,openai,silero]>=0.0.58
python-dotenv~=1.0.1

View File

@@ -36,7 +36,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
runner = PipelineRunner()

View File

@@ -29,7 +29,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
pipeline = Pipeline([tts, transport.output()])

View File

@@ -83,7 +83,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
runner = PipelineRunner()

View File

@@ -37,7 +37,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -87,7 +87,7 @@ async def main():
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
imagegen = FalImageGenService(

View File

@@ -97,7 +97,7 @@ async def main():
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
imagegen = FalImageGenService(

View File

@@ -74,7 +74,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -93,7 +93,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -47,7 +47,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -46,7 +46,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -46,7 +46,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(

View File

@@ -64,7 +64,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
prompt = ChatPromptTemplate.from_messages(

View File

@@ -0,0 +1,103 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs import ElevenLabsHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -47,7 +47,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
timestamp = int(time.time())

View File

@@ -51,7 +51,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -46,7 +46,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = TogetherLLMService(

View File

@@ -51,7 +51,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -0,0 +1,103 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.services.rime import RimeHttpTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -213,7 +213,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")

View File

@@ -0,0 +1,102 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.neuphonic import NeuphonicHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = NeuphonicHttpTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"),
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,102 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.neuphonic import NeuphonicTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = NeuphonicTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"),
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,90 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.ultravox import UltravoxSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
# NOTE: This example requires GPU resources to run efficiently.
# The Ultravox model is compute-intensive and performs best with GPU acceleration.
# This can be deployed on cloud GPU providers like Cerebrium.ai for optimal performance.
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Want to initialize the ultravox processor since it takes time to load the model and dont
# want to load it every time the pipeline is run
ultravox_processor = UltravoxSTTService(
model_size="fixie-ai/ultravox-v0_4_1-llama-3_1-8b",
hf_token=os.getenv("HF_TOKEN"),
)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
vad_audio_passthrough=True,
),
)
tts = CartesiaTTSService(
api_key=os.environ.get("CARTESIA_API_KEY"),
voice_id="97f4b8fb-f2fe-444b-bb9a-c109783a857a",
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
ultravox_processor,
tts, # TTS
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
),
)
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -47,7 +47,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -100,7 +100,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [

View File

@@ -77,7 +77,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
@transport.event_handler("on_first_participant_joined")

View File

@@ -77,7 +77,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
@transport.event_handler("on_first_participant_joined")

View File

@@ -76,7 +76,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
@transport.event_handler("on_first_participant_joined")

View File

@@ -76,7 +76,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
@transport.event_handler("on_first_participant_joined")

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -57,7 +58,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
@@ -65,30 +66,24 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -13,6 +13,8 @@ from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -51,7 +53,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(
@@ -59,22 +61,18 @@ async def main():
)
llm.register_function("get_weather", get_weather)
tools = [
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
}
]
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
# todo: test with very short initial user message

View File

@@ -13,6 +13,8 @@ from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -60,7 +62,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(
@@ -72,36 +74,29 @@ async def main():
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
tools = [
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
{
"name": "get_image",
"description": "Get an image from the video stream.",
"input_schema": {
"type": "object",
"properties": {
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
},
"required": ["question"],
},
required=["location"],
)
get_image_function = FunctionSchema(
name="get_image",
description="Get an image from the video stream.",
properties={
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
},
]
required=["question"],
)
tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
# todo: test with very short initial user message

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -58,7 +59,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = TogetherLLMService(
@@ -69,30 +70,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -59,54 +60,41 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
),
ChatCompletionToolParam(
type="function",
function={
"name": "get_image",
"description": "Get an image from the video stream.",
"parameters": {
"type": "object",
"properties": {
"question": {
"type": "string",
"description": "The question to ask the AI to generate an image of",
},
},
"required": ["question"],
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
),
]
},
required=["location"],
)
get_image_function = FunctionSchema(
name="get_image",
description="Get an image from the video stream.",
properties={
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
},
required=["question"],
)
tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
system_prompt = """\
You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.

View File

@@ -13,6 +13,8 @@ from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -66,52 +68,41 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
llm.register_function("get_weather", get_weather, start_fetch_weather)
llm.register_function("get_image", get_image)
tools = [
{
"function_declarations": [
{
"name": "get_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
},
{
"name": "get_image",
"description": "Get and image from the camera or video stream.",
"parameters": {
"type": "object",
"properties": {
"question": {
"type": "string",
"description": "The question to to use when running inference on the acquired image.",
},
},
"required": ["question"],
},
},
]
}
]
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
get_image_function = FunctionSchema(
name="get_image",
description="Get an image from the video stream.",
properties={
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
},
required=["question"],
)
tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
system_prompt = """\
You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -60,7 +61,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
@@ -68,30 +69,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -58,7 +59,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GrokLLMService(api_key=os.getenv("GROK_API_KEY"))
@@ -66,30 +67,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -58,7 +59,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AzureLLMService(
@@ -70,30 +71,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -58,7 +59,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = FireworksLLMService(
@@ -69,30 +70,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -58,8 +59,8 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
# text_filter=MarkdownTextFilter(),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
# text_filters=[MarkdownTextFilter()],
)
llm = NimLLMService(
@@ -69,30 +70,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Returns the current weather at a location, if one is specified, and defaults to the user's location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The location to find the weather of, or if not provided, it's the default location.",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Whether to use SI or USCS units (celsius or fahrenheit).",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -58,7 +59,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = CerebrasLLMService(api_key=os.getenv("CEREBRAS_API_KEY"), model="llama-3.3-70b")
@@ -66,30 +67,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather for a specific location. You MUST use this function whenever asked about weather.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Use fahrenheit for US locations, celsius for others.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -58,7 +59,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = DeepSeekLLMService(api_key=os.getenv("DEEPSEEK_API_KEY"), model="deepseek-chat")
@@ -66,30 +67,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather for a specific location. You MUST use this function whenever asked about weather.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Use fahrenheit for US locations, celsius for others.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -11,9 +11,10 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -70,30 +71,23 @@ async def main():
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
)
]
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",

View File

@@ -55,7 +55,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = PerplexityLLMService(api_key=os.getenv("PERPLEXITY_API_KEY"), model="sonar")

View File

@@ -0,0 +1,131 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.google import GoogleLLMOpenAIBetaService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def start_fetch_weather(function_name, llm, context):
"""Push a frame to the LLM; this is handy when the LLM response might take a while."""
await llm.push_frame(TTSSpeakFrame("Let me check on that."))
logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await result_callback({"conditions": "nice", "temperature": "75"})
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = GoogleLLMOpenAIBetaService(api_key=os.getenv("GEMINI_API_KEY"))
# Register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function(
"get_current_weather", fetch_weather_from_api, start_callback=start_fetch_weather
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "user",
"content": "Start a conversation with 'Hey there' to get the current weather.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,137 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.google import GoogleVertexLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def start_fetch_weather(function_name, llm, context):
"""Push a frame to the LLM; this is handy when the LLM response might take a while."""
await llm.push_frame(TTSSpeakFrame("Let me check on that."))
logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await result_callback({"conditions": "nice", "temperature": "75"})
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = GoogleVertexLLMService(
# credentials="<json-credentials>",
params=GoogleVertexLLMService.InputParams(
project_id="<google-project-id>",
)
)
# Register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function(
"get_current_weather", fetch_weather_from_api, start_callback=start_fetch_weather
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "user",
"content": "Start a conversation with 'Hey there' to get the current weather.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -78,7 +78,7 @@ async def main():
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
barbershop_man = CartesiaTTSService(
@@ -125,7 +125,10 @@ async def main():
llm, # LLM
ParallelPipeline( # TTS (one of the following vocies)
[FunctionFilter(news_lady_filter), news_lady], # News Lady voice
[FunctionFilter(british_lady_filter), british_lady], # British Lady voice
[
FunctionFilter(british_lady_filter),
british_lady,
], # British Reading Lady voice
[FunctionFilter(barbershop_man_filter), barbershop_man], # Barbershop Man voice
),
transport.output(), # Transport bot output

View File

@@ -71,7 +71,7 @@ async def main():
english_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
spanish_tts = CartesiaTTSService(

View File

@@ -48,7 +48,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -16,13 +16,10 @@ from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import TranscriptionMessage, TranscriptionUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.transcript_processor import TranscriptProcessor
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai_realtime_beta import (
InputAudioTranscription,
OpenAIRealtimeBetaLLMService,
@@ -143,23 +140,15 @@ Remember, your responses should be short. Just one or two sentences, usually."""
tools,
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
# Create transcript processor and handler
transcript = TranscriptProcessor()
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
transcript.user(), # User transcripts
context_aggregator.user(),
llm, # LLM
context_aggregator.assistant(),
transcript.assistant(), # Assistant transcripts
transport.output(), # Transport bot output
context_aggregator.assistant(),
]
)
@@ -173,16 +162,9 @@ Remember, your responses should be short. Just one or two sentences, usually."""
),
)
# Register event handler for transcript updates
@transcript.event_handler("on_transcript_update")
async def on_transcript_update(processor, frame):
logger.debug(f"Received transcript update with {len(frame.messages)} new messages")
for msg in frame.messages:
logger.debug(msg)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# await transport.capture_participant_transcription(participant["id"])
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])

View File

@@ -0,0 +1,179 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from datetime import datetime
import aiohttp
import websockets
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai_realtime_beta import (
AzureRealtimeBetaLLMService,
InputAudioTranscription,
SessionProperties,
TurnDetection,
)
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
temperature = 75 if args["format"] == "fahrenheit" else 24
await result_callback(
{
"conditions": "nice",
"temperature": temperature,
"format": args["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
tools = [
{
"type": "function",
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
}
]
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.8)),
vad_audio_passthrough=True,
),
)
session_properties = SessionProperties(
input_audio_transcription=InputAudioTranscription(),
# Set openai TurnDetection parameters. Not setting this at all will turn it
# on by default
# turn_detection=TurnDetection(silence_duration_ms=1000),
# Or set to False to disable openai turn detection and use transport VAD
# turn_detection=False,
# tools=tools,
instructions="""Your knowledge cutoff is 2023-10. You are a helpful and friendly AI.
Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.
If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
even if you're asked about them.
-
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.
Remember, your responses should be short. Just one or two sentences, usually.""",
)
llm = AzureRealtimeBetaLLMService(
api_key=os.getenv("AZURE_REALTIME_API_KEY"),
base_url=os.getenv("AZURE_REALTIME_BASE_URL"),
session_properties=session_properties,
start_audio_paused=False,
)
# you can either register a single function for all function calls, or specific functions
# llm.register_function(None, fetch_weather_from_api)
llm.register_function("get_current_weather", fetch_weather_from_api)
# Create a standard OpenAI LLM context object using the normal messages format. The
# OpenAIRealtimeBetaLLMService will convert this internally to messages that the
# openai WebSocket API can understand.
context = OpenAILLMContext(
[{"role": "user", "content": "Say hello!"}],
# [{"role": "user", "content": [{"type": "text", "text": "Say hello!"}]}],
# [
# {
# "role": "user",
# "content": [
# {"type": "text", "text": "Say"},
# {"type": "text", "text": "yo what's up!"},
# ],
# }
# ],
tools,
)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(),
llm, # LLM
context_aggregator.assistant(),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
# report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -184,7 +184,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -179,7 +179,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(

View File

@@ -234,7 +234,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))

View File

@@ -56,7 +56,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# This is the LLM that will be used to detect if the user has finished a

View File

@@ -229,7 +229,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# This is the LLM that will be used to detect if the user has finished a

View File

@@ -433,7 +433,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# This is the LLM that will be used to detect if the user has finished a

View File

@@ -23,7 +23,6 @@ from pipecat.frames.frames import (
FunctionCallInProgressFrame,
FunctionCallResultFrame,
InputAudioRawFrame,
LLMFullResponseEndFrame,
LLMFullResponseStartFrame,
StartFrame,
StartInterruptionFrame,
@@ -37,7 +36,7 @@ from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMResponseAggregator
from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
OpenAILLMContextFrame,
@@ -389,7 +388,7 @@ class AudioAccumulator(FrameProcessor):
)
self._user_speaking = False
context = GoogleLLMContext()
context.add_audio_frames_message(text="Audio follows", audio_frames=self._audio_frames)
context.add_audio_frames_message(audio_frames=self._audio_frames)
await self.push_frame(OpenAILLMContextFrame(context=context))
elif isinstance(frame, InputAudioRawFrame):
# Append the audio frame to our buffer. Treat the buffer as a ring buffer, dropping the oldest
@@ -432,7 +431,11 @@ class CompletenessCheck(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, UserStartedSpeakingFrame):
if isinstance(frame, (EndFrame, CancelFrame)):
if self._idle_task:
await self.cancel_task(self._idle_task)
self._idle_task = None
elif isinstance(frame, UserStartedSpeakingFrame):
if self._idle_task:
await self.cancel_task(self._idle_task)
elif isinstance(frame, TextFrame) and frame.text.startswith("YES"):
@@ -474,19 +477,11 @@ class CompletenessCheck(FrameProcessor):
self._idle_task = None
class UserAggregatorBuffer(LLMResponseAggregator):
class LLMAggregatorBuffer(LLMAssistantResponseAggregator):
"""Buffers the output of the transcription LLM. Used by the bot output gate."""
def __init__(self, **kwargs):
super().__init__(
messages=None,
role=None,
start_frame=LLMFullResponseStartFrame,
end_frame=LLMFullResponseEndFrame,
accumulator_frame=TextFrame,
handle_interruptions=True,
expect_stripped_words=False,
)
super().__init__(expect_stripped_words=False)
self._transcription = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -544,7 +539,7 @@ class OutputGate(FrameProcessor):
self,
notifier: BaseNotifier,
context: OpenAILLMContext,
user_transcription_buffer: "UserAggregatorBuffer",
llm_transcription_buffer: LLMAggregatorBuffer,
**kwargs,
):
super().__init__(**kwargs)
@@ -552,7 +547,7 @@ class OutputGate(FrameProcessor):
self._frames_buffer = []
self._notifier = notifier
self._context = context
self._transcription_buffer = user_transcription_buffer
self._transcription_buffer = llm_transcription_buffer
self._gate_task = None
def close_gate(self):
@@ -649,7 +644,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# This is the LLM that will transcribe user speech.
@@ -699,10 +694,10 @@ async def main():
conversation_audio_context_assembler = ConversationAudioContextAssembler(context=context)
user_aggregator_buffer = UserAggregatorBuffer()
llm_aggregator_buffer = LLMAggregatorBuffer()
bot_output_gate = OutputGate(
notifier=notifier, context=context, user_transcription_buffer=user_aggregator_buffer
notifier=notifier, context=context, llm_transcription_buffer=llm_aggregator_buffer
)
pipeline = Pipeline(
@@ -723,7 +718,7 @@ async def main():
],
[
tx_llm,
user_aggregator_buffer,
llm_aggregator_buffer,
],
)
],

View File

@@ -59,7 +59,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -294,7 +294,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
conversation_llm = GoogleLLMService(

View File

@@ -77,7 +77,7 @@ async def main():
LLMMessagesAppendFrame(
messages=[
{
"role": "assistant",
"role": "user",
"content": "Greet the user.",
}
]

View File

@@ -14,6 +14,8 @@ from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.pipeline.pipeline import Pipeline
@@ -41,32 +43,6 @@ async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context
)
tools = [
{
"function_declarations": [
{
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
},
]
}
]
system_instruction = """
You are a helpful assistant who can answer questions and use tools.
@@ -95,6 +71,27 @@ async def main():
),
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
search_tool = {"google_search": {}}
tools = ToolsSchema(
standard_tools=[weather_function], custom_tools={AdapterType.GEMINI: [search_tool]}
)
llm = GeminiMultimodalLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system_instruction,

View File

@@ -78,7 +78,7 @@ async def main():
# )
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
messages = [

View File

@@ -113,7 +113,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(

View File

@@ -1,177 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from typing import List, Optional
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TranscriptionMessage, TranscriptionUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.transcript_processor import TranscriptProcessor
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptHandler:
"""Handles real-time transcript processing and output.
Maintains a list of conversation messages and outputs them either to a log
or to a file as they are received. Each message includes its timestamp and role.
Attributes:
messages: List of all processed transcript messages
output_file: Optional path to file where transcript is saved. If None, outputs to log only.
"""
def __init__(self, output_file: Optional[str] = None):
"""Initialize handler with optional file output.
Args:
output_file: Path to output file. If None, outputs to log only.
"""
self.messages: List[TranscriptionMessage] = []
self.output_file: Optional[str] = output_file
logger.debug(
f"TranscriptHandler initialized {'with output_file=' + output_file if output_file else 'with log output only'}"
)
async def save_message(self, message: TranscriptionMessage):
"""Save a single transcript message.
Outputs the message to the log and optionally to a file.
Args:
message: The message to save
"""
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
line = f"{timestamp}{message.role}: {message.content}"
# Always log the message
logger.info(f"Transcript: {line}")
# Optionally write to file
if self.output_file:
try:
with open(self.output_file, "a", encoding="utf-8") as f:
f.write(line + "\n")
except Exception as e:
logger.error(f"Error saving transcript message to file: {e}")
async def on_transcript_update(
self, processor: TranscriptProcessor, frame: TranscriptionUpdateFrame
):
"""Handle new transcript messages.
Args:
processor: The TranscriptProcessor that emitted the update
frame: TranscriptionUpdateFrame containing new messages
"""
logger.debug(f"Received transcript update with {len(frame.messages)} new messages")
for msg in frame.messages:
self.messages.append(msg)
await self.save_message(msg)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-5-sonnet-20241022"
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative, helpful, and brief way.",
},
{"role": "user", "content": "Say hello."},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# Create transcript processor and handler
transcript = TranscriptProcessor()
transcript_handler = TranscriptHandler() # Output to log only
# transcript_handler = TranscriptHandler(output_file="transcript.txt") # Output to file and log
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
transcript.user(), # User transcripts
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
transcript.assistant(), # Assistant transcripts
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
# Register event handler for transcript updates
@transcript.event_handler("on_transcript_update")
async def on_transcript_update(processor, frame):
await transcript_handler.on_transcript_update(processor, frame)
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
# Stop the pipeline immediately when the participant leaves
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,210 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sqlite3
import sys
from typing import List, Optional
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TranscriptionMessage, TranscriptionUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.transcript_processor import TranscriptProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.google import GoogleLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptHandler:
"""Handles real-time transcript processing and output.
Maintains a list of conversation messages and outputs them either to a log
or to a file as they are received. Each message includes its timestamp and role.
Attributes:
messages: List of all processed transcript messages
output_file: Optional path to file where transcript is saved. If None, outputs to log only.
"""
def __init__(self, output_file: Optional[str] = None, output_db: Optional[str] = None):
"""Initialize handler with optional file or database output.
Args:
output_file: Path to output file. If None, outputs to log only.
"""
self.messages: List[TranscriptionMessage] = []
self.output_file: Optional[str] = output_file
self.output_db: Optional[str] = output_db
if self.output_db:
self.con = sqlite3.connect("example.db")
self.db = self.con.cursor()
table = self.db.execute("SELECT name FROM sqlite_master WHERE name='messages'")
if not (table.fetchone()):
self.db.execute(
"CREATE TABLE messages(role TEXT, content TEXT, timestamp DATETIME DEFAULT CURRENT_TIMESTAMP )"
)
logger.debug(
f"TranscriptHandler initialized; output file: {output_file}, output DB: {output_db}"
)
async def save_message(self, message: TranscriptionMessage):
"""Save a single transcript message.
Outputs the message to the log and optionally to a SQLite database or file.
Args:
message: The message to save
"""
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
line = f"{timestamp}{message.role}: {message.content}"
# Always log the message
logger.info(f"Transcript: {line}")
# Optionally write to file
if self.output_file:
try:
with open(self.output_file, "a", encoding="utf-8") as f:
f.write(line + "\n")
except Exception as e:
logger.error(f"Error saving transcript message to file: {e}")
# and/or to a SQLite database
if self.output_db:
self.db.execute(
"INSERT INTO messages VALUES (?, ?, ?)",
(message.role, message.content, message.timestamp),
)
self.con.commit()
async def on_transcript_update(
self, processor: TranscriptProcessor, frame: TranscriptionUpdateFrame
):
"""Handle new transcript messages.
Args:
processor: The TranscriptProcessor that emitted the update
frame: TranscriptionUpdateFrame containing new messages
"""
logger.debug(f"Received transcript update with {len(frame.messages)} new messages")
for msg in frame.messages:
self.messages.append(msg)
await self.save_message(msg)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = GoogleLLMService(
model="models/gemini-2.0-flash-exp",
# model="gemini-exp-1114",
api_key=os.getenv("GOOGLE_API_KEY"),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative, helpful, and brief way.",
},
{"role": "user", "content": "Say hello."},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# Create transcript processor and handler
transcript = TranscriptProcessor()
# Select a TranscriptHandler output method
# Uncomment out only one of the following lines:
transcript_handler = TranscriptHandler() # Output to log only
# transcript_handler = TranscriptHandler(output_file="transcript.txt") # Output to file and log
# transcript_handler = TranscriptHandler(output_db="example.db") # Output to SQLite DB and log
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
transcript.user(), # User transcripts
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
transcript.assistant(), # Assistant transcripts
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
# Register event handler for transcript updates
@transcript.event_handler("on_transcript_update")
async def on_transcript_update(processor, frame):
await transcript_handler.on_transcript_update(processor, frame)
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
# Stop the pipeline immediately when the participant leaves
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -131,7 +131,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [

View File

@@ -89,7 +89,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -81,7 +81,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# Initialize the Gemini Multimodal Live model

View File

@@ -0,0 +1,190 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Audio Recording Example with Pipecat.
This example demonstrates how to record audio from a conversation between a user and an AI assistant,
saving both merged and individual audio tracks. It showcases the AudioBufferProcessor's capabilities
to handle both combined and separate audio streams.
The example:
1. Sets up a basic conversation with an AI assistant
2. Records the entire conversation
3. Saves three separate WAV files:
- A merged recording of both participants
- Individual recording of user audio
- Individual recording of assistant audio
Example usage (run from pipecat root directory):
$ pip install "pipecat-ai[daily,openai,cartesia,silero]"
$ pip install -r dev-requirements.txt
$ python examples/foundational/34-audio-recording.py
Requirements:
- OpenAI API key (for GPT-4)
- Cartesia API key (for text-to-speech)
- Daily API key (for video/audio transport)
Environment variables (.env file):
OPENAI_API_KEY=your_openai_key
CARTESIA_API_KEY=your_cartesia_key
DAILY_API_KEY=your_daily_key
DEEPGRAM_API_KEY=your_deepgram_key
The recordings will be saved in a 'recordings' directory with timestamps:
recordings/
merged_20240315_123456.wav (Combined audio)
user_20240315_123456.wav (User audio only)
bot_20240315_123456.wav (Bot audio only)
Note:
This example requires the AudioBufferProcessor with track-specific audio support,
which provides both 'on_audio_data' and 'on_track_audio_data' events for
handling merged and separate audio tracks respectively.
"""
import asyncio
import datetime
import io
import os
import sys
import wave
import aiofiles
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def save_audio_file(audio: bytes, filename: str, sample_rate: int, num_channels: int):
"""Save audio data to a WAV file."""
if len(audio) > 0:
with io.BytesIO() as buffer:
with wave.open(buffer, "wb") as wf:
wf.setsampwidth(2)
wf.setnchannels(num_channels)
wf.setframerate(sample_rate)
wf.writeframes(audio)
async with aiofiles.open(filename, "wb") as file:
await file.write(buffer.getvalue())
logger.info(f"Audio saved to {filename}")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Recording bot",
DailyParams(
# audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4")
# Create audio buffer processor
audiobuffer = AudioBufferProcessor()
messages = [
{
"role": "system",
"content": "You are a helpful assistant demonstrating audio recording capabilities. Keep your responses brief and clear.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
audiobuffer, # Add audio buffer to pipeline
context_aggregator.assistant(),
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await audiobuffer.start_recording()
messages.append(
{
"role": "system",
"content": "Greet the user and explain that this conversation will be recorded.",
}
)
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await audiobuffer.stop_recording()
await task.cancel()
# Handler for merged audio
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"recordings/merged_{timestamp}.wav"
os.makedirs("recordings", exist_ok=True)
await save_audio_file(audio, filename, sample_rate, num_channels)
# Handler for separate tracks
@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(buffer, user_audio, bot_audio, sample_rate, num_channels):
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
os.makedirs("recordings", exist_ok=True)
# Save user audio
user_filename = f"recordings/user_{timestamp}.wav"
await save_audio_file(user_audio, user_filename, sample_rate, 1)
# Save bot audio
bot_filename = f"recordings/bot_{timestamp}.wav"
await save_audio_file(bot_audio, bot_filename, sample_rate, 1)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,230 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Pattern Pair Voice Switching Example with Pipecat.
This example demonstrates how to use the PatternPairAggregator to dynamically switch
between different voices in a storytelling application. It showcases how pattern matching
can be used to control TTS behavior in streaming text from an LLM.
The example:
1. Sets up a storytelling bot with three distinct voices (narrator, male, female)
2. Uses pattern pairs (<voice>name</voice>) to trigger voice switching
3. Processes the patterns in real-time as text streams from the LLM
4. Removes the pattern tags before sending text to TTS
The PatternPairAggregator:
- Buffers text until complete patterns are detected
- Identifies content between start/end pattern pairs
- Triggers callbacks when patterns are matched
- Processes patterns that may span across multiple text chunks
- Returns processed text at sentence boundaries
Example usage (run from pipecat root directory):
$ pip install "pipecat-ai[daily,openai,cartesia,silero]"
$ pip install -r dev-requirements.txt
$ python examples/foundational/35-pattern-pair-voice-switching.py
Requirements:
- OpenAI API key (for GPT-4o)
- Cartesia API key (for text-to-speech)
- Daily API key (for video/audio transport)
Environment variables (.env file):
OPENAI_API_KEY=your_openai_key
CARTESIA_API_KEY=your_cartesia_key
DAILY_API_KEY=your_daily_key
Note:
This example shows one application of PatternPairAggregator (voice switching),
but the same approach can be used for various pattern-based text processing needs,
such as formatting instructions, command recognition, or structured data extraction.
"""
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch, PatternPairAggregator
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Define voice IDs
VOICE_IDS = {
"narrator": "c45bc5ec-dc68-4feb-8829-6e6b2748095d", # Narrator voice
"female": "71a7ad14-091c-4e8e-a314-022ece01c121", # Female character voice
"male": "7cf0e2b1-8daf-4fe4-89ad-f6039398f359", # Male character voice
}
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Multi-voice storyteller",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
# Create pattern pair aggregator for voice switching
pattern_aggregator = PatternPairAggregator()
# Add pattern for voice switching
pattern_aggregator.add_pattern_pair(
pattern_id="voice_tag",
start_pattern="<voice>",
end_pattern="</voice>",
remove_match=True,
)
# Register handler for voice switching
def on_voice_tag(match: PatternMatch):
voice_name = match.content.strip().lower()
if voice_name in VOICE_IDS:
voice_id = VOICE_IDS[voice_name]
tts.set_voice(voice_id)
logger.info(f"Switched to {voice_name} voice")
else:
logger.warning(f"Unknown voice: {voice_name}")
pattern_aggregator.on_pattern_match("voice_tag", on_voice_tag)
# Initialize TTS with narrator voice as default
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id=VOICE_IDS["narrator"],
text_aggregator=pattern_aggregator,
)
# Initialize LLM
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
# System prompt for storytelling with voice switching
system_prompt = """You are an engaging storyteller that uses different voices to bring stories to life.
You have three voices to use, but each has a specific purpose:
<voice>narrator</voice>
This is the default narrator voice. Use this for all narration, descriptions, and non-dialogue text.
<voice>female</voice>
Use this ONLY for direct speech by female characters (just the quoted text).
<voice>male</voice>
Use this ONLY for direct speech by male characters (just the quoted text).
IMPORTANT: Switch back to narrator voice immediately after character dialogue.
Here's an EXAMPLE of correct voice usage:
<voice>narrator</voice>
Sarah spotted her old friend across the café. She couldn't believe her eyes.
<voice>female</voice>
"Jacob! It's been so long!"
<voice>narrator</voice>
Sarah exclaimed, jumping up from her seat with a radiant smile.
<voice>male</voice>
"Sarah, is it really you? I can't believe it!"
<voice>narrator</voice>
Jacob replied, grinning widely as he walked over to her. The two friends embraced warmly, as if trying to make up for all the years spent apart.
<voice>female</voice>
"What are you doing in town? Last I heard you were in Seattle."
<voice>narrator</voice>
She asked, gesturing for him to join her at the table.
FOLLOW THESE RULES:
1. Always begin with the narrator voice
2. Only use character voices for the EXACT words they speak (in quotes)
3. SWITCH BACK to narrator voice for speech tags and all other text
4. Begin by asking what kind of story the user would like to hear
5. Create engaging dialogue with distinct characters
Remember: Use narrator voice for EVERYTHING except the actual quoted dialogue."""
# Set up LLM context
messages = [
{
"role": "system",
"content": system_prompt,
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# Create pipeline
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts, # TTS with pattern aggregator
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
logger.info(f"First participant joined: {participant['id']}")
await transport.capture_participant_transcription(participant["id"])
# Start conversation - empty prompt to let LLM follow system instructions
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
logger.info(f"Participant left: {participant['id']}")
await task.cancel()
logger.info(f"Starting storytelling bot at: {room_url}")
logger.info("Join the room to interact with the bot!")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,141 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
from pipecat.services.rime import RimeHttpTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def store_user_emails(function_name, tool_call_id, args, llm, context, result_callback):
print(f"User emails: {args}")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
# Cartesia offers a `<spell></spell>` tags that we can use to ask the user
# to confirm the emails.
# (see https://docs.cartesia.ai/build-with-sonic/formatting-text-for-sonic/spelling-out-input-text)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
aiohttp_session=session,
)
# Rime offers a function `spell()` that we can use to ask the user
# to confirm the emails.
# (see https://docs.rime.ai/api-reference/spell)
# tts = RimeHttpTTSService(
# api_key=os.getenv("RIME_API_KEY", ""),
# voice_id="eva",
# aiohttp_session=session,
# )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
# You can aslo register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("store_user_emails", store_user_emails)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "store_user_emails",
"description": "Store user emails when confirmed",
"parameters": {
"type": "object",
"properties": {
"emails": {
"type": "array",
"description": "The list of user emails",
"items": {"type": "string"},
},
},
"required": ["emails"],
},
},
)
]
messages = [
{
"role": "system",
# Cartesia <spell></spell>
"content": "You need to gather a valid email or emails from the user. Your output will be converted to audio so don't include special characters in your answers. If the user provides one or more email addresses confirm them with the user. Enclose all emails with <spell> tags, for example <spell>a@a.com</spell>.",
# Rime spell()
# "content": "You need to gather a valid email or emails from the user. Your output will be converted to audio so don't include special characters in your answers. If the user provides one or more email addresses confirm them with the user. Enclose all emails with spell(), for example spell(a@a.com).",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -154,7 +154,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

View File

@@ -96,8 +96,8 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
text_filter=MarkdownTextFilter(),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
text_filters=[MarkdownTextFilter()],
)
llm = GoogleLLMService(

View File

@@ -303,7 +303,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# tts = CartesiaTTSService(

View File

@@ -7,20 +7,17 @@ import argparse
import asyncio
import os
import sys
from dataclasses import dataclass
from typing import Optional
import google.ai.generativelanguage as glm
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
BotStoppedSpeakingFrame,
EndFrame,
EndTaskFrame,
Frame,
InputAudioRawFrame,
SystemFrame,
StopTaskFrame,
TranscriptionFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
@@ -28,12 +25,17 @@ from pipecat.frames.frames import (
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.ai_services import LLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.google import GoogleLLMContext, GoogleLLMService
from pipecat.transports.services.daily import DailyDialinSettings, DailyParams, DailyTransport
from pipecat.services.google import GoogleLLMService
from pipecat.services.google.google import GoogleLLMContext
from pipecat.transports.services.daily import (
DailyDialinSettings,
DailyParams,
DailyTransport,
)
load_dotenv(override=True)
@@ -44,6 +46,8 @@ logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
system_message = None
class UserAudioCollector(FrameProcessor):
"""This FrameProcessor collects audio frames in a buffer, then adds them to the
@@ -117,7 +121,13 @@ class FunctionHandlers:
self.context_switcher = context_switcher
async def voicemail_response(
self, function_name, tool_call_id, args, llm, context, result_callback
self,
function_name,
tool_call_id,
args,
llm: LLMService,
context,
result_callback,
):
"""Function the bot can call to leave a voicemail message."""
message = """You are Chatbot leaving a voicemail message. Say EXACTLY this message and nothing else:
@@ -127,62 +137,48 @@ class FunctionHandlers:
After saying this message, call the terminate_call function."""
await self.context_switcher.switch_context(system_instruction=message)
await result_callback("Leaving a voicemail message")
async def human_conversation(
self, function_name, tool_call_id, args, llm, context, result_callback
self,
function_name,
tool_call_id,
args,
llm: LLMService,
context,
result_callback,
):
"""Function the bot can when it detects it's talking to a human."""
message = """You are Chatbot talking to a human. Be friendly and helpful.
Start with: "Hello! I'm a friendly chatbot. How can I help you today?"
Keep your responses brief and to the point. Listen to what the person says.
When the person indicates they're done with the conversation by saying something like:
- "Goodbye"
- "That's all"
- "I'm done"
- "Thank you, that's all I needed"
THEN say: "Thank you for chatting. Goodbye!" and call the terminate_call function."""
await self.context_switcher.switch_context(system_instruction=message)
await result_callback("Talking to the customer")
await llm.push_frame(StopTaskFrame(), FrameDirection.UPSTREAM)
async def terminate_call(
function_name, tool_call_id, args, llm: LLMService, context, result_callback
function_name,
tool_call_id,
args,
llm: LLMService,
context,
result_callback,
call_state=None,
):
"""Function the bot can call to terminate the call upon completion of the call."""
await llm.queue_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
if call_state:
call_state.bot_terminated_call = True
await llm.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
async def main(
room_url: str,
token: str,
callId: str,
callDomain: str,
callId: Optional[str],
callDomain: Optional[str],
detect_voicemail: bool,
dialout_number: Optional[str],
):
# dialin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
# We don't want to specify dial-in settings if we're not dialing in
dialin_settings = None
if callId and callDomain:
dialin_settings = DailyDialinSettings(call_id=callId, call_domain=callDomain)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
transport_params = DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
dialin_settings=dialin_settings,
@@ -192,8 +188,30 @@ async def main(
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
# transcription_enabled=True,
),
)
else:
transport_params = DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
)
class CallState:
participant_left_early = False
bot_terminated_call = False
call_state = CallState()
transport = DailyTransport(
room_url,
token,
"Chatbot",
transport_params,
)
tts = ElevenLabsTTSService(
@@ -201,6 +219,10 @@ async def main(
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
### VOICEMAIL PIPELINE
tools = [
{
"function_declarations": [
@@ -222,55 +244,67 @@ async def main(
system_instruction = """You are Chatbot trying to determine if this is a voicemail system or a human.
If you hear any of these phrases (or very similar ones):
- "Please leave a message after the beep"
- "No one is available to take your call"
- "Record your message after the tone"
- "You have reached voicemail for..."
- "You have reached [phone number]"
- "[phone number] is unavailable"
- "The person you are trying to reach..."
- "The number you have dialed..."
- "Your call has been forwarded to an automated voice messaging system"
If you hear any of these phrases (or very similar ones):
- "Please leave a message after the beep"
- "No one is available to take your call"
- "Record your message after the tone"
- "You have reached voicemail for..."
- "You have reached [phone number]"
- "[phone number] is unavailable"
- "The person you are trying to reach..."
- "The number you have dialed..."
- "Your call has been forwarded to an automated voice messaging system"
Then call the function switch_to_voicemail_response.
Then call the function switch_to_voicemail_response.
If it sounds like a human (saying hello, asking questions, etc.), call the function switch_to_human_conversation.
If it sounds like a human (saying hello, asking questions, etc.), call the function switch_to_human_conversation.
DO NOT say anything until you've determined if this is a voicemail or human."""
DO NOT say anything until you've determined if this is a voicemail or human."""
llm = GoogleLLMService(
model="models/gemini-2.0-flash-lite-preview-02-05",
voicemail_detection_llm = GoogleLLMService(
model="models/gemini-2.0-flash-lite",
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system_instruction,
tools=tools,
)
context = GoogleLLMContext()
context_aggregator = llm.create_context_aggregator(context)
audio_collector = UserAudioCollector(context, context_aggregator.user())
context_switcher = ContextSwitcher(llm, context_aggregator.user())
voicemail_detection_context = GoogleLLMContext()
voicemail_detection_context_aggregator = voicemail_detection_llm.create_context_aggregator(
voicemail_detection_context
)
context_switcher = ContextSwitcher(
voicemail_detection_llm, voicemail_detection_context_aggregator.user()
)
handlers = FunctionHandlers(context_switcher)
llm.register_function("switch_to_voicemail_response", handlers.voicemail_response)
llm.register_function("switch_to_human_conversation", handlers.human_conversation)
llm.register_function("terminate_call", terminate_call)
pipeline = Pipeline(
[
transport.input(), # Transport user input
audio_collector, # Collect audio frames
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
voicemail_detection_llm.register_function(
"switch_to_voicemail_response", handlers.voicemail_response
)
voicemail_detection_llm.register_function(
"switch_to_human_conversation", handlers.human_conversation
)
voicemail_detection_llm.register_function(
"terminate_call",
lambda *args, **kwargs: terminate_call(*args, **kwargs, call_state=call_state),
)
task = PipelineTask(
pipeline,
voicemail_detection_audio_collector = UserAudioCollector(
voicemail_detection_context, voicemail_detection_context_aggregator.user()
)
voicemail_detection_pipeline = Pipeline(
[
transport.input(), # Transport user input
voicemail_detection_audio_collector, # Collect audio frames
voicemail_detection_context_aggregator.user(), # User responses
voicemail_detection_llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
voicemail_detection_context_aggregator.assistant(), # Assistant spoken responses
]
)
voicemail_detection_pipeline_task = PipelineTask(
voicemail_detection_pipeline,
params=PipelineParams(allow_interruptions=True),
)
@@ -305,25 +339,116 @@ DO NOT say anything until you've determined if this is a voicemail or human."""
# machine to say something like 'Leave a message after the beep', or for the user to say 'Hello?'.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
logger.debug("Detect voicemail; capturing participant transcription")
await transport.capture_participant_transcription(participant["id"])
else:
logger.debug("no dialout number; assuming dialin")
logger.debug("+++++ No dialout number; assuming dialin")
# Different handlers for dialin
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# This event is not firing for some reason
await transport.capture_participant_transcription(participant["id"])
# For the dialin case, we want the bot to answer the phone and greet the user. We
# can prompt the bot to speak by putting the context into the pipeline.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
dialin_instructions = """Always call the function switch_to_human_conversation"""
messages = [
{
"role": "system",
"content": dialin_instructions,
}
]
voicemail_detection_context_aggregator.user().set_messages(messages)
await voicemail_detection_pipeline_task.queue_frames(
[voicemail_detection_context_aggregator.user().get_context_frame()]
)
runner = PipelineRunner()
await runner.run(task)
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
call_state.participant_left_early = True
await voicemail_detection_pipeline_task.queue_frame(EndFrame())
print("!!! starting voicemail detection pipeline")
await runner.run(voicemail_detection_pipeline_task)
print("!!! Done with voicemail detection pipeline")
if call_state.participant_left_early or call_state.bot_terminated_call:
if call_state.participant_left_early:
print("!!! Participant left early; terminating call")
elif call_state.bot_terminated_call:
print("!!! Bot terminated call; not proceeding to human conversation")
return
### HUMAN CONVERSATION PIPELINE
human_conversation_system_instruction = """You are Chatbot talking to a human. Be friendly and helpful.
Start with: "Hello! I'm a friendly chatbot. How can I help you today?"
Keep your responses brief and to the point. Listen to what the person says.
When the person indicates they're done with the conversation by saying something like:
- "Goodbye"
- "That's all"
- "I'm done"
- "Thank you, that's all I needed"
THEN say: "Thank you for chatting. Goodbye!" and call the terminate_call function."""
human_conversation_llm = GoogleLLMService(
model="models/gemini-2.0-flash-001",
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=human_conversation_system_instruction,
tools=tools,
)
human_conversation_context = GoogleLLMContext()
human_conversation_context_aggregator = human_conversation_llm.create_context_aggregator(
human_conversation_context
)
human_conversation_llm.register_function(
"terminate_call",
lambda *args, **kwargs: terminate_call(*args, **kwargs, call_state=call_state),
)
human_conversation_pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
human_conversation_context_aggregator.user(), # User responses
human_conversation_llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
human_conversation_context_aggregator.assistant(), # Assistant spoken responses
]
)
human_conversation_pipeline_task = PipelineTask(
human_conversation_pipeline,
params=PipelineParams(allow_interruptions=True),
)
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await voicemail_detection_pipeline_task.queue_frame(EndFrame())
await human_conversation_pipeline_task.queue_frame(EndFrame())
print("!!! starting human conversation pipeline")
human_conversation_context_aggregator.user().set_messages(
[
{
"role": "system",
"content": human_conversation_system_instruction,
}
]
)
await human_conversation_pipeline_task.queue_frames(
[human_conversation_context_aggregator.user().get_context_frame()]
)
await runner.run(human_conversation_pipeline_task)
print("!!! Done with human conversation pipeline")
if __name__ == "__main__":

View File

@@ -29,18 +29,30 @@ from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
BotSpeakingFrame,
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
InterimTranscriptionFrame,
LLMMessagesFrame,
OutputImageRawFrame,
SpriteFrame,
STTMuteFrame,
TranscriptionFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.stt_mute_filter import STTMuteConfig, STTMuteFilter, STTMuteStrategy
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.processors.frameworks.rtvi import (
RTVIConfig,
RTVIObserver,
RTVIProcessor,
RTVIServerMessageFrame,
)
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -49,6 +61,51 @@ load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionMuteProcessor(FrameProcessor):
"""Takes in STTMuteFrame and mutes TranscriptionFrame based on its content."""
def __init__(self, **kwargs):
super().__init__(**kwargs)
self._is_muted = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and mute TranscriptionFrame based on STTMuteFrame content.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
if isinstance(frame, STTMuteFrame):
self._is_muted = frame.mute
frame = RTVIServerMessageFrame(
data={"type": "user-muted-event", "payload": {"is_muted": self._is_muted}}
)
self.push_frame(frame)
if isinstance(
frame,
(
TranscriptionFrame,
InterimTranscriptionFrame,
LLMMessagesFrame,
),
):
# Only pass frames when not muted
if not self._is_muted:
await self.push_frame(frame, direction)
else:
logger.trace(
f"{frame.__class__.__name__} suppressed - Transcription STT currently muted"
)
else:
await self.push_frame(frame, direction)
sprites = []
script_dir = os.path.dirname(__file__)
@@ -128,7 +185,8 @@ async def main():
camera_out_height=576,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
vad_audio_passthrough=True,
# transcription_enabled=True,
#
# Spanish
#
@@ -183,9 +241,20 @@ async def main():
#
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt_mute_processor = STTMuteFilter(
config=STTMuteConfig(strategies={STTMuteStrategy.ALWAYS}),
)
transcription_mute_processor = TranscriptionMuteProcessor()
pipeline = Pipeline(
[
transport.input(),
stt,
stt_mute_processor,
transcription_mute_processor,
rtvi,
context_aggregator.user(),
llm,
@@ -213,7 +282,7 @@ async def main():
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")

View File

@@ -54,7 +54,7 @@ async def run_bot(
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [

View File

@@ -144,7 +144,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="34dbb662-8e98-413c-a1ef-1a3407675fe7", # Spanish Narrator Man
model="sonic-multilingual",
model="sonic-2",
)
in_language = "English"

View File

@@ -74,7 +74,7 @@ async def run_bot(websocket_client: WebSocket, stream_sid: str, testing: bool):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
push_silence_after_stop=testing,
)

View File

@@ -97,7 +97,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [

View File

@@ -20,17 +20,14 @@ classifiers = [
"Topic :: Scientific/Engineering :: Artificial Intelligence"
]
dependencies = [
"aiohttp~=3.11.11",
"aiohttp~=3.11.13",
"audioop-lts~=0.2.1; python_version>='3.13'",
# We need an older version of `httpx` that doesn't remove the deprecated
# `proxies` argument. This is necessary for Azure and Anthropic clients.
"httpx~=0.27.2",
"loguru~=0.7.3",
"Markdown~=3.7",
"numpy~=1.26.4",
"Pillow~=11.1.0",
"protobuf~=5.29.3",
"pydantic~=2.10.5",
"pydantic~=2.10.6",
"pyloudnorm~=0.1.1",
"resampy~=0.4.3",
"soxr~=0.5.0",
@@ -42,21 +39,22 @@ Source = "https://github.com/pipecat-ai/pipecat"
Website = "https://pipecat.ai"
[project.optional-dependencies]
anthropic = [ "anthropic~=0.45.2" ]
anthropic = [ "anthropic~=0.47.2" ]
assemblyai = [ "assemblyai~=0.36.0" ]
aws = [ "boto3~=1.35.99" ]
azure = [ "azure-cognitiveservices-speech~=1.42.0"]
canonical = [ "aiofiles~=24.1.0" ]
cartesia = [ "cartesia~=1.3.1", "websockets~=13.1" ]
neuphonic = [ "pyneuphonic~=1.5.13", "websockets~=13.1" ]
cerebras = []
deepseek = []
daily = [ "daily-python~=0.14.2" ]
daily = [ "daily-python~=0.15.0" ]
deepgram = [ "deepgram-sdk~=3.8.0" ]
elevenlabs = [ "websockets~=13.1" ]
fal = [ "fal-client~=0.5.6" ]
fish = [ "ormsgpack~=1.7.0", "websockets~=13.1" ]
gladia = [ "websockets~=13.1" ]
google = [ "google-cloud-speech~=2.31.0", "google-cloud-texttospeech~=2.25.0", "google-genai~=1.2.0", "google-generativeai~=0.8.4" ]
google = [ "google-cloud-speech~=2.31.0", "google-cloud-texttospeech~=2.25.0", "google-genai~=1.3.0", "google-generativeai~=0.8.4" ]
grok = []
groq = []
gstreamer = [ "pygobject~=3.50.0" ]
@@ -73,14 +71,16 @@ noisereduce = [ "noisereduce~=3.0.3" ]
openai = [ "websockets~=13.1" ]
openpipe = [ "openpipe~=4.45.0" ]
perplexity = []
playht = [ "pyht~=0.1.6", "websockets~=13.1" ]
playht = [ "pyht~=0.1.12", "websockets~=13.1" ]
rime = [ "websockets~=13.1" ]
riva = [ "nvidia-riva-client~=2.18.0" ]
sentry = [ "sentry-sdk~=2.20.0" ]
silero = [ "onnxruntime~=1.20.1" ]
simli = [ "simli-ai~=0.1.10"]
soundfile = [ "soundfile~=0.13.0" ]
tavus=[]
together = []
ultravox = [ "transformers~=4.48.0", "vllm~=0.7.3" ]
websocket = [ "websockets~=13.1", "fastapi~=0.115.6" ]
whisper = [ "faster-whisper~=1.1.1" ]
openrouter = []

4
scripts/fix-ruff.sh Executable file
View File

@@ -0,0 +1,4 @@
ruff format src
ruff format examples
ruff format tests
ruff check --select I --fix

View File

@@ -0,0 +1,22 @@
from abc import ABC, abstractmethod
from typing import Any, List, Union, cast
from loguru import logger
from pipecat.adapters.schemas.tools_schema import ToolsSchema
class BaseLLMAdapter(ABC):
@abstractmethod
def to_provider_tools_format(self, tools_schema: ToolsSchema) -> List[Any]:
"""Converts tools to the provider's format."""
pass
def from_standard_tools(self, tools: Any) -> List[Any]:
if isinstance(tools, ToolsSchema):
logger.debug(f"Retrieving the tools using the adapter: {type(self)}")
return self.to_provider_tools_format(tools)
# Fallback to return the same tools in case they are not in a standard format
return tools
# TODO: we can move the logic to also handle the Messages here

Some files were not shown because too many files have changed in this diff Show More