Compare commits

...

565 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
d340d044e6 services(playht): ignore message from a different request id 2025-01-14 12:11:04 -08:00
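Commit d340d044e6 ignores PlayHT messages whose request id differs from the current request. The sketch below shows that filtering idea in isolation. It assumes each incoming message carries a `request_id` field; `TTSMessage` and `RequestFilter` are hypothetical names, not the PlayHT or pipecat API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTSMessage:
    # Hypothetical shape of an incoming TTS websocket message.
    request_id: str
    audio: bytes


class RequestFilter:
    """Drops messages that belong to a request other than the active one."""

    def __init__(self) -> None:
        self.active_request_id: Optional[str] = None

    def start_request(self, request_id: str) -> None:
        # Called when a new synthesis request is sent.
        self.active_request_id = request_id

    def accept(self, message: TTSMessage) -> bool:
        # Late audio from a previous (e.g. interrupted) request is ignored.
        return message.request_id == self.active_request_id


f = RequestFilter()
f.start_request("req-2")
messages = [TTSMessage("req-1", b"old"), TTSMessage("req-2", b"new")]
kept = [m.audio for m in messages if f.accept(m)]
print(kept)  # [b'new']
```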
Aleix Conchillo Flaqué
5b632de04a Merge pull request #982 from pipecat-ai/aleix/pipelinetask-cleanup-sink
pipeline(task): cleanup Sink processor
2025-01-14 09:14:03 -08:00
Mark Backman
6bcc196489 Merge pull request #969 from pipecat-ai/mb/deepseek
Add support for DeepSeek LLM
2025-01-14 09:40:06 -05:00
Mark Backman
66375e9dff Update dot-env.template API keys 2025-01-14 09:34:34 -05:00
Mark Backman
bc839492b6 Add support for DeepSeek LLM 2025-01-14 09:34:33 -05:00
Filipi da Silva Fuchter
4854645637 Merge pull request #960 from pipecat-ai/example_gemini_with_goolge_search
Example with Gemini using google search to retrieve news.
2025-01-14 10:07:15 -03:00
Mark Backman
98e80b7d4a Merge pull request #970 from pipecat-ai/mb/user-controlled-run-llm
Add an override_run_llm option to optionally defer function call completion
2025-01-13 18:48:00 -05:00
Mark Backman
8c0ecb89de Refactor for new on_context_updated callback and new frame properties 2025-01-13 17:20:41 -05:00
Aleix Conchillo Flaqué
4c8fcb2cfc pipeline(task): cleanup Sink processor
Fixes #953
2025-01-13 13:29:44 -08:00
Aleix Conchillo Flaqué
92313d6ce7 Merge pull request #972 from pipecat-ai/aleix/simple-chatbot-android-workflow-update
github: only run android simple-chatbot workflow if android example modified
2025-01-13 13:26:12 -08:00
Mark Backman
1ca6ecc46e Update CHANGELOG 2025-01-13 09:49:09 -05:00
Mark Backman
f1947d7d38 Update Anthropic and Gemini to allow overriding run_llm 2025-01-13 09:48:43 -05:00
Mark Backman
0852570212 Update Grok for function call override 2025-01-13 09:48:43 -05:00
Mark Backman
874b8bb136 Allow for an override of running a completion after a function call completes, OpenAI 2025-01-13 09:48:43 -05:00
Mark Backman
da1878537b Merge pull request #974 from pipecat-ai/mb/26d-example
Align 26d example with foundation norms
2025-01-12 19:44:31 -05:00
Mark Backman
f406d93b0f Align 26d example with foundation norms 2025-01-12 19:19:16 -05:00
Aleix Conchillo Flaqué
3cd2b90177 Merge pull request #971 from pipecat-ai/aleix/update-copyright-keep-original-year
update copyright keeping original year (2024)
2025-01-12 11:37:15 -08:00
Aleix Conchillo Flaqué
c4f0c7bcfd github: only run android simple-chatbot workflow if android example modified 2025-01-12 11:35:34 -08:00
Aleix Conchillo Flaqué
95e69597f3 update copyright keeping original year (2024) 2025-01-12 11:34:00 -08:00
Aleix Conchillo Flaqué
710baa5e17 Merge pull request #973 from pipecat-ai/aleix/simple-chatbot-clients
examples/simple-chatbot: move clients to client directory
2025-01-12 11:28:21 -08:00
Mark Backman
8c953bac41 Merge pull request #966 from imsakg/main
fix(services): handle TranscriptionFrame separately in TTSService
2025-01-12 11:33:38 -05:00
Mark Backman
4c0861ce39 Some additional links and README changes 2025-01-12 09:27:23 -05:00
Mark Backman
12b1e1db9d Merge pull request #965 from pipecat-ai/mb/aws-add-session-token
Add optional aws_session_token for PollyTTSService
2025-01-12 09:13:03 -05:00
Mark Backman
53bfdfd83f Merge pull request #963 from pipecat-ai/mb/cleanup-examples
Update examples to align with latest best practices
2025-01-12 09:12:34 -05:00
Mark Backman
2a5593afea Merge pull request #968 from pipecat-ai/mb/readme-websocket
Update README.md
2025-01-12 09:12:19 -05:00
Aleix Conchillo Flaqué
a04a920e54 examples/simple-chatbot: move clients to client directory 2025-01-11 19:16:05 -08:00
Aleix Conchillo Flaqué
2ce6d92455 Merge pull request #959 from KevGTL/fix-livekit-transport
fix: push input audio frame only via push_audio_frame()
2025-01-11 19:03:35 -08:00
Mark Backman
1ecd5da219 Update README.md
Add websocket docs links to README.
2025-01-11 08:37:17 -05:00
Mert Sefa AKGUN
7ec351813c style(ai_services): fix import order with ruff 2025-01-11 13:04:26 +03:00
Mert Sefa AKGUN
df6c2fc403 fix(services): handle TranscriptionFrame separately in TTSService
Exclude TranscriptionFrame from text frame processing in TTSService by updating the type check condition. This resolves unintended processing behavior when handling different frame types.
2025-01-11 13:00:38 +03:00
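Commit 735 further down states that transcriptions are `TextFrame`s, so a plain `isinstance(frame, TextFrame)` check also matches `TranscriptionFrame`. A minimal sketch of the narrowed type check described in df6c2fc403, using hypothetical stand-in classes rather than pipecat's real frame definitions (the `user_id` field is an assumption for illustration):

```python
from dataclasses import dataclass


@dataclass
class TextFrame:
    text: str


@dataclass
class TranscriptionFrame(TextFrame):
    # Assumed extra field, only for illustration.
    user_id: str = ""


def should_synthesize(frame: object) -> bool:
    # TranscriptionFrame subclasses TextFrame, so it must be excluded
    # explicitly or the TTS service would speak the user's own words.
    return isinstance(frame, TextFrame) and not isinstance(frame, TranscriptionFrame)


frames = [TextFrame("Hello!"), TranscriptionFrame("what the user said", user_id="u1")]
print([should_synthesize(fr) for fr in frames])  # [True, False]
```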
Mark Backman
71e107725c Add optional aws_session_token for PollyTTSService 2025-01-10 19:33:47 -05:00
Mark Backman
4d0c11fcab Update examples to align with latest best practices 2025-01-10 15:07:06 -05:00
Mark Backman
a8ae79831e Merge pull request #921 from pipecat-ai/mb/playht-http
PlayHTHttpTTSService fixes
2025-01-10 13:26:45 -05:00
Mark Backman
86516d2415 PlayHTHttpTTSService fixes 2025-01-10 13:21:27 -05:00
Vanessa Pyne
5cd9dab14b Merge pull request #949 from imsakg/main
fix(examples): correct TTS service import and setup
2025-01-10 10:58:50 -06:00
Kwindla Hultman Kramer
a3e2e06975 Merge pull request #961 from pipecat-ai/khk/tiny-chatbot-readme-fix
fixed 404 in SimpleChatbot iOS example README
2025-01-10 08:45:05 -08:00
Kwindla Hultman Kramer
e7107b99c5 fixed 404 in SimpleChatbot iOS example README 2025-01-10 08:37:13 -08:00
Filipi Fuchter
aa1b8879ee Fixing ruff format 2025-01-10 13:21:51 -03:00
Mark Backman
6802459165 Merge pull request #956 from pipecat-ai/mb/tavus
Update the Tavus example and comment about using the PERSONA_ID
2025-01-10 11:18:05 -05:00
Filipi Fuchter
6719d1fddc Example with Gemini using google search to retrieve news. 2025-01-10 13:13:59 -03:00
kompfner
a798bf18f2 Merge pull request #955 from pipecat-ai/ios-simple-chatbot-mainactor-fixes
iOS SimpleChatbot @MainActor fixes
2025-01-10 09:37:02 -05:00
Kevin Oury
f9d0cca60f fix: push input audio frame only via push_audio_frame() 2025-01-10 15:02:38 +01:00
Mark Backman
cb22de0d13 Update the Tavus example and comment about using the PERSONA_ID 2025-01-10 08:01:00 -05:00
marcus-daily
7d161cc53b Setting target SDK to 35 2025-01-10 09:50:37 +00:00
marcus-daily
255abf46ef Updating Gradle and AGP 2025-01-10 09:50:37 +00:00
marcus-daily
27579bcb70 Fixing imports 2025-01-10 09:50:37 +00:00
marcus-daily
1295b64879 Updating library dependencies 2025-01-10 09:50:37 +00:00
marcus-daily
ca57670f65 Removing unnecessary drawables 2025-01-10 09:50:37 +00:00
marcus-daily
06d0a231b9 Android demo app for simple-chatbot example 2025-01-10 09:50:37 +00:00
Mert Sefa AKGUN
67af4e619b style(examples): fix ruff formatting in Gemini text example
Refactor `CartesiaTTSService` instantiation to comply with line
length requirements from the ruff linter.
2025-01-10 12:32:53 +03:00
Mert Sefa AKGUN
21c274944e Update examples/foundational/26d-gemini-multimodal-live-text.py
Co-authored-by: Vanessa Pyne <vipyne@gmail.com>
2025-01-10 12:28:13 +03:00
Paul Kompfner
3239249feb In the iOS SimpleChatbot, fix @MainActor-related warnings (which would be errors in Swift 6). The delegate methods aren't contractually guaranteed to run on the main thread, so we can't mark them as @MainActor. 2025-01-09 17:35:44 -05:00
Paul Kompfner
216979c377 Bump iOS SimpleChatbot's pipecat-client-ios-daily dependency to version 0.3.1 2025-01-09 16:22:26 -05:00
Filipi da Silva Fuchter
b9db53d3cd Merge pull request #952 from pipecat-ai/fixing_gemini_function_calling
Fixing GeminiMultimodalLiveLLMService function calling to work with pipecat-flows
2025-01-09 17:50:25 -03:00
Filipi Fuchter
58bfcc8370 Fixing GeminiMultimodalLiveLLMService function calling when using with pipecat-flows. 2025-01-09 12:22:37 -03:00
Mert Sefa AKGUN
6664c492ac feat(gemini): enable audio transcription in live text example
Add options to transcribe both user and model audio during the GeminiMultimodalLiveLLMService setup in the 26d-gemini-multimodal-live-text.py example.
2025-01-09 15:38:33 +03:00
Mert Sefa AKGUN
7634058f97 fix(examples): correct TTS service import and setup
- Update import to use CartesiaTTSService instead of CartesiaMultiLingualTTSService.
- Adjust GeminiMultimodalLiveLLMService setup to use set_model_modalities with TEXT modality.
2025-01-09 02:19:08 +03:00
Mark Backman
39c6446bdc Merge pull request #947 from pipecat-ai/mb/add-rime-set-voices
Add setters for model and voice to RimeHttpTTSService
2025-01-08 14:25:24 -05:00
Filipi da Silva Fuchter
2df7dfcc91 Merge pull request #943 from pipecat-ai/simple_chat_bot_ios
SimpleChatbot iOS app.
2025-01-08 16:17:39 -03:00
Mark Backman
c23c9e046c Add setters for model and voice to RimeHttpTTSService 2025-01-08 14:17:32 -05:00
Mark Backman
9dae753e8c Merge pull request #926 from imsakg/main
feat(gemini): add text handling to GeminiMultimodalLive
2025-01-08 13:42:17 -05:00
Mert Sefa AKGUN
40e9ee6d63 fix(examples): correct import order in Gemini example
- Move `CartesiaMultiLingualTTSService` import to maintain proper order.
- Reorganize `enum` import to adhere to styling standards.
2025-01-08 21:14:29 +03:00
Mert Sefa AKGUN
a342fe732e docs: update CHANGELOG with Gemini modalities and examples 2025-01-08 19:34:42 +03:00
Mert Sefa AKGUN
a729834482 refactor(gemini): reposition WebSocket connection code
Move WebSocket connection setup earlier in the function for better
organization and to prepare for subsequent configuration steps.
2025-01-08 19:29:36 +03:00
Mert Sefa AKGUN
94a6f1086e feat(gemini): change default modality to AUDIO
Modify the default modality in the `InputParams` class from TEXT to AUDIO
to better align with the intended use case for GeminiMultimodalLive
service.
2025-01-08 19:29:36 +03:00
Mert Sefa AKGUN
b42d3a8257 feat(gemini): add modality configuration for GeminiMultimodalLive
- Introduce `GeminiMultimodalModalities` enum for modality options.
- Add modality field to `InputParams`, defaulting to text.
- Simplify modality setup with `set_model_modalities` method.
- Refactor WebSocket configuration to support dynamic response modalities.
2025-01-08 19:29:36 +03:00
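A rough sketch of how a modality enum can drive the response modalities sent when configuring the session. `GeminiMultimodalModalities` and `InputParams` are named in the commit message, but the enum values, the `modalities` field, `build_generation_config`, and the `response_modalities` dict shape are assumptions, not pipecat's actual code. The default follows the later change to AUDIO in 94a6f1086e.

```python
from dataclasses import dataclass
from enum import Enum


class GeminiMultimodalModalities(Enum):
    TEXT = "TEXT"
    AUDIO = "AUDIO"


@dataclass
class InputParams:
    # Default reflects commit 94a6f1086e (TEXT -> AUDIO).
    modalities: GeminiMultimodalModalities = GeminiMultimodalModalities.AUDIO


def build_generation_config(params: InputParams) -> dict:
    # Hypothetical helper; assumed config shape: a list of modality strings.
    return {"response_modalities": [params.modalities.value]}


print(build_generation_config(InputParams()))
print(build_generation_config(InputParams(modalities=GeminiMultimodalModalities.TEXT)))
```

Keeping the modality in the input parameters, rather than in separate `set_model_only_audio` / `set_model_only_text` methods, means one code path builds the configuration for either mode.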
Mert Sefa AKGUN
12ae980abe feat(gemini): handle full text response in GeminiMultimodalLive
- Add a buffer to store bot text responses.
- Push an `LLMFullResponseStartFrame` when text begins.
- Clear the text buffer and send `LLMFullResponseEndFrame` after processing.
2025-01-08 19:29:36 +03:00
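The three steps in 12ae980abe can be sketched as a small collector that brackets streamed text with start and end frames. The frame classes below are minimal stand-ins and `TextResponseCollector`, `on_text`, and `on_turn_complete` are hypothetical names, not the GeminiMultimodalLiveLLMService internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LLMFullResponseStartFrame:
    pass


@dataclass
class TextFrame:
    text: str


@dataclass
class LLMFullResponseEndFrame:
    pass


class TextResponseCollector:
    """Hypothetical helper: buffers bot text and brackets it with start/end frames."""

    def __init__(self) -> None:
        self.buffer = ""
        self.pushed: List[object] = []

    def on_text(self, text: str) -> None:
        if not text:
            return
        if not self.buffer:
            # First chunk of a new response.
            self.pushed.append(LLMFullResponseStartFrame())
        self.buffer += text
        self.pushed.append(TextFrame(text))

    def on_turn_complete(self) -> None:
        if self.buffer:
            # Clear the buffer and close the response.
            self.buffer = ""
            self.pushed.append(LLMFullResponseEndFrame())


collector = TextResponseCollector()
collector.on_text("Hello, ")
collector.on_text("world")
collector.on_turn_complete()
print([type(fr).__name__ for fr in collector.pushed])
# ['LLMFullResponseStartFrame', 'TextFrame', 'TextFrame', 'LLMFullResponseEndFrame']
```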
Mert Sefa AKGUN
cdb909958c feat(examples): add Gemini multimodal live text example
Introduce a new example `26d-gemini-multimodal-live-text.py` to
demonstrate the use of GeminiMultimodalLiveLLMService with text-only
responses. This example sets up a pipeline for audio input via DailyTransport,
processing with Gemini, and output via Cartesia TTS.
2025-01-08 19:29:35 +03:00
Mert Sefa AKGUN
c72c3025f6 feat(gemini): add configuration methods for response modalities
- Introduce `set_model_only_audio` and `set_model_only_text` methods
  to toggle between audio-only and text-only response modes in
  `GeminiMultimodalLiveLLMService`.
- Refactor configuration setup to a class attribute for improved
  reusability and maintenance.
- Remove redundant configuration instantiation in the WebSocket
  connection setup process.
2025-01-08 19:29:35 +03:00
Mert Sefa AKGUN
5cbd719780 feat(gemini): add text handling to GeminiMultimodalLive
- Introduce text attribute in Part class for handling string data.
- Incorporate text processing in GeminiMultimodalLiveLLMService to push TextFrame if text is present.
2025-01-08 19:29:35 +03:00
Filipi Fuchter
23d6290672 Removing not used class. 2025-01-08 12:05:04 -03:00
Filipi Fuchter
d4e7e11981 SimpleChatbot iOS app. 2025-01-08 12:00:11 -03:00
Mark Backman
8057fe3fcf Merge pull request #742 from Vaibhav159/vl_feature_websocket_fastapi_timeout
adding session_timeout param
2025-01-08 09:05:41 -05:00
Vaibhav159
3b446234a7 fix hyperlink 2025-01-08 10:54:27 +05:30
Vaibhav159
768487ffb3 final changelog 2025-01-08 10:53:32 +05:30
Vaibhav159
2da5620d10 adding changelog 2025-01-08 10:50:09 +05:30
Vaibhav159
af90d65b3b adding session timeout example in websocket-server example 2025-01-08 10:43:10 +05:30
Vaibhav159
c8569a7b67 Merge remote-tracking branch 'upstream/main' into vl_feature_websocket_fastapi_timeout 2025-01-08 10:21:36 +05:30
Vaibhav159
0ecd98c873 Merge branch 'main' into vl_feature_websocket_fastapi_timeout 2025-01-08 10:20:55 +05:30
Mark Backman
6f863ba2c6 Merge pull request #938 from jcbjoe/jg/optional-authentication-polly
Changed Polly authentication params to be optional
2025-01-07 15:37:23 -05:00
Mark Backman
602ca5ebe6 Merge pull request #939 from Vaibhav159/vl_adding_daily_room_properties
adding more daily room params
2025-01-07 14:33:59 -05:00
Vaibhav159
787ade41f3 adding missing doc string 2025-01-08 00:58:01 +05:30
Joe Garlick
bb767831d5 Added: Changelog entry 2025-01-07 19:05:02 +00:00
Mark Backman
bc25a771dc Merge pull request #935 from pipecat-ai/hush/modalUpdate
docs: update dependencies for modal demo
2025-01-07 13:57:46 -05:00
Vaibhav159
f37626f81d adding more daily room params 2025-01-07 21:38:05 +05:30
Mark Backman
9d54578e65 Merge pull request #934 from pipecat-ai/mb/bump-open-ai-version
Bump openai version to 1.59.0 for realtime and model updates
2025-01-07 08:29:45 -05:00
Joe Garlick
79afe7ec2a Changed: Polly authentication information to be optional 2025-01-07 11:43:57 +00:00
James Hush
2c1fd3c3cc docs: update dependencies for modal demo 2025-01-07 15:45:55 +08:00
Mark Backman
b0dd8e03a6 Bump openai version to 1.59.0 for realtime and model updates 2025-01-06 17:05:22 -05:00
Mark Backman
ee20e48ef8 Merge pull request #931 from pipecat-ai/mb/fix-openai-realtime-
Fix truncation timing of OpenAIRealtimeBetaLLMService
2025-01-06 16:25:09 -05:00
Mark Backman
12b5c5a646 Fix truncation timing of OpenAIRealtimeBetaLLMService 2025-01-06 15:37:58 -05:00
Mark Backman
7a021cc82d Merge pull request #929 from pipecat-ai/mb/add-google-journey-support
Added support for Google Journey TTS voices
2025-01-06 15:13:00 -05:00
Mark Backman
3e1ec4a8ee Added support for Google Journey TTS voices 2025-01-06 14:54:34 -05:00
Mark Backman
a1377b7f1a Merge pull request #924 from xtreme-sameer-vohra/patch-1
Update frames.py
2025-01-06 14:13:10 -05:00
Mark Backman
d6335886e2 Merge pull request #848 from Vaibhav159/vl_add_audio_and_chat_livekit_example
adding example for livekit audio and chat version
2025-01-06 13:27:38 -05:00
Vaibhav159
b3b7a5f023 adding 2025 license 2025-01-06 22:10:46 +05:30
Vaibhav159
5138017b57 ruff changes 2025-01-06 22:07:59 +05:30
Vaibhav159
87670067d7 adding changelog 2025-01-06 22:03:11 +05:30
Vaibhav159
656cd2859e Merge branch 'main' into vl_add_audio_and_chat_livekit_example 2025-01-06 21:57:43 +05:30
Mark Backman
15b2cc210c Merge pull request #927 from pipecat-ai/mb/update-copyright
Update copyright to 2025
2025-01-06 10:33:04 -05:00
Mark Backman
4667624b60 Update copyright to 2025 2025-01-06 10:19:37 -05:00
Sameer Vohra
d07ba80572 Update frames.py
fix minor typo in docs
2025-01-05 22:57:54 -05:00
Aleix Conchillo Flaqué
386ba61483 Merge pull request #909 from pipecat-ai/aleix/pipecat-0.0.52
update CHANGELOG for 0.0.52
2024-12-24 08:16:05 -08:00
Aleix Conchillo Flaqué
e9d275f270 update CHANGELOG for 0.0.52 2024-12-23 19:52:34 -08:00
Aleix Conchillo Flaqué
3a4994370c update README 2024-12-23 19:20:23 -08:00
Aleix Conchillo Flaqué
6125ea882d update README 2024-12-23 19:19:39 -08:00
Aleix Conchillo Flaqué
0a1ce1bb63 update CHANGELOG 2024-12-23 19:13:59 -08:00
Kwindla Hultman Kramer
ab3bcde5f7 Merge pull request #907 from pipecat-ai/khk/gemini-20241221
Gemini unary API fixes and natural conversation demo
2024-12-23 17:34:57 -08:00
Kwindla Hultman Kramer
1368d3db5c revert elevenlabs example changes 2024-12-23 17:33:59 -08:00
Aleix Conchillo Flaqué
cd7dec7391 Merge pull request #906 from pipecat-ai/aleix/fix-duplicate-base-output-frames
transports(base_output): fix duplicate push_frame()
2024-12-23 06:12:31 -08:00
Kwindla Hultman Kramer
a5e985094b remove stray line 2024-12-22 19:45:57 -08:00
Aleix Conchillo Flaqué
c04c69df95 transports(base_output): fix duplicate push_frame() 2024-12-22 14:43:38 -08:00
Aleix Conchillo Flaqué
9c105e25ac Merge pull request #905 from pipecat-ai/aleix/daily-python-0.14.2
pyproject: update daily-python to 0.14.2
2024-12-22 13:03:25 -08:00
Aleix Conchillo Flaqué
6901c4fa57 pyproject: update daily-python to 0.14.2 2024-12-22 12:30:17 -08:00
Mark Backman
469c13c07e Merge pull request #903 from pipecat-ai/mb/send-prebuilt-chat
Add the ability to send_prebuilt_chat_message when using the DailyTra…
2024-12-22 14:33:50 -05:00
Mark Backman
46871ae686 Merge pull request #899 from pipecat-ai/mb/add-fish-audio
Add Fish Audio TTS service
2024-12-22 14:26:59 -05:00
Kwindla Hultman Kramer
ab5df1a236 feature complete gemini audio, transcription, and phrase endpointing demo 2024-12-22 11:19:02 -08:00
Kwindla Hultman Kramer
f5f0de00e4 still some cleanup to do 2024-12-21 23:04:00 -08:00
Kwindla Hultman Kramer
f3dd35bfd9 working but needs cleanup 2024-12-21 22:18:56 -08:00
Kwindla Hultman Kramer
53a5e63990 function calling dead-end 2024-12-21 18:10:25 -08:00
Kwindla Hultman Kramer
d435a6a6d6 fixes to audio buffer 2024-12-21 16:22:53 -08:00
Kwindla Hultman Kramer
59240c7b96 delay gemini multimodal live websocket connect 2024-12-21 14:36:37 -08:00
Mark Backman
6c11753985 Add the ability to send_prebuilt_chat_message when using the DailyTransport 2024-12-21 14:04:46 -05:00
Mark Backman
6fabb7e7d5 Fix metrics calculations 2024-12-21 13:25:43 -05:00
Mark Backman
bce218915e Add Fish to the README 2024-12-21 12:54:07 -05:00
Mark Backman
627c91f4a6 Flush the audio 2024-12-21 12:52:28 -05:00
Mark Backman
dac4468ca1 Add Fish Audio TTS service 2024-12-21 12:42:56 -05:00
Mark Backman
503eddf7d6 Merge pull request #897 from pipecat-ai/mb/update-playht
Update PlayHT to use the latest Websocket connection endpoint
2024-12-20 20:31:41 -05:00
Aleix Conchillo Flaqué
1a0f6f2a21 Merge pull request #898 from pipecat-ai/aleix/reset-input-queue-flag-if-interruption
frame_processor: reset input queue flag with interruptions
2024-12-20 13:58:12 -08:00
Aleix Conchillo Flaqué
43759295cc frame_processor: reset input queue flag with interruptions 2024-12-20 09:33:20 -08:00
Mark Backman
900b95eb92 Update PlayHT to use the latest Websocket connection endpoint 2024-12-20 10:44:47 -05:00
marcus-daily
41d07692ca Fix import order 2024-12-20 14:30:38 +00:00
marcus-daily
dcf6b6e120 Add an RTVIProcessor to the simple-chatbot pipeline 2024-12-20 14:30:38 +00:00
Mark Backman
99dba3b6b9 Merge pull request #893 from pipecat-ai/mb/changelog-11L
Added an `auto_mode` input parameter to `ElevenLabsTTSService`
2024-12-19 21:38:06 -05:00
Aleix Conchillo Flaqué
4547609ffb examples(01a): remove unused import 2024-12-19 17:49:27 -08:00
Mark Backman
9554804a49 Update 11L default model, allow language to be used by more models 2024-12-19 20:33:58 -05:00
Mark Backman
656cbc35e1 Make auto_mode an input parameter for ElevenLabsTTSService; add changelog entry 2024-12-19 20:33:56 -05:00
Aleix Conchillo Flaqué
6f7c4dd998 Merge pull request #894 from pipecat-ai/aleix/daily-python-0.14.0
transports(daily): update to daily-python 0.14.0
2024-12-19 17:14:31 -08:00
Aleix Conchillo Flaqué
8b496f8c6f transports(daily): daily-python 0.14.0 (SIP transfer/refer, DTMF) 2024-12-19 17:08:29 -08:00
Aleix Conchillo Flaqué
15047f5f0a Merge pull request #885 from pipecat-ai/aleix/parallelpipeline-wait-for-slowest-endframe
pipeline(parallel): wait for slowest endframe
2024-12-19 15:18:22 -08:00
Aleix Conchillo Flaqué
e08c24dc41 Merge pull request #883 from pipecat-ai/aleix/base-output-transport-avoid-pushing-endframe
transport(base output): avoid pushing EndFrame twice
2024-12-19 11:26:31 -08:00
Aleix Conchillo Flaqué
5341739ece transport(base output): avoid pushing EndFrame twice 2024-12-19 11:19:49 -08:00
Mark Backman
5b0fc3fa15 Merge pull request #891 from louisjoecodes/louis/flush-shorter-messages-elevenlabs
feat: set auto_mode=true - ElevenLabs tts WSS
2024-12-19 12:08:04 -05:00
Louis Jordan
b7b8e59e9e feat: set auto_mode=true - ElevenLabs tts WSS 2024-12-19 16:57:17 +00:00
Mark Backman
6e0d3aef32 Merge pull request #860 from pipecat-ai/mb/transcription
Add a TranscriptProcessor and new frames
2024-12-19 08:15:53 -05:00
Mark Backman
1ccc84dd7a Merge pull request #888 from pipecat-ai/mb/add-cerebras
Add CerebrasLLMService and foundational example
2024-12-19 08:14:53 -05:00
Mark Backman
c9dd906057 Tailor chat completion inputs to Cerebras API 2024-12-19 08:10:33 -05:00
Mark Backman
4f093f11db Add CerebrasLLMService and foundational example 2024-12-19 08:10:31 -05:00
Mark Backman
887a9170b2 Merge pull request #889 from pipecat-ai/mb/openai-realtime-model
Add model parameter to OpenAI realtime service constructor, update de…
2024-12-19 08:08:52 -05:00
Aleix Conchillo Flaqué
f2e191855a Merge pull request #881 from pipecat-ai/aleix/langchain-updates
pyproject: update langchain to 0.3.12
2024-12-18 19:42:39 -08:00
Aleix Conchillo Flaqué
78b90e9591 Merge pull request #884 from pipecat-ai/aleix/filters-handle-endframe
processors(filters): allow passing EndFrame
2024-12-18 19:35:56 -08:00
Aleix Conchillo Flaqué
17decee788 Merge pull request #882 from pipecat-ai/aleix/stop-transport-parent-first
transports: call parent stop() before disconnecting
2024-12-18 19:35:39 -08:00
Aleix Conchillo Flaqué
f89014d100 pyproject: update langchain to 0.3.12 2024-12-18 19:34:49 -08:00
Mark Backman
3b3e22fe7c Add model parameter to OpenAI realtime service constructor, update default model 2024-12-18 18:12:51 -05:00
Aleix Conchillo Flaqué
0df0194cc1 Merge pull request #886 from pipecat-ai/aleix/koala-noise-suppression
audio(koala): add new audio filter KoalaFilter
2024-12-18 14:02:04 -08:00
Mark Backman
8a7a61914e Code review feedback 2024-12-17 22:35:13 -05:00
Mark Backman
1117c21483 Refactor TranscriptProcessor into user and assistant processors 2024-12-17 22:34:22 -05:00
Mark Backman
4211664a77 TranscriptProcessor to handle simple and list content 2024-12-17 22:34:03 -05:00
Mark Backman
1f8a217cd1 Code review changes 2024-12-17 22:34:02 -05:00
Mark Backman
b5bd662fe1 Add changelog and rename examples 2024-12-17 22:33:39 -05:00
Mark Backman
dd2703317a Add timestamp frames and include timestamps in the transcription event and frame 2024-12-17 22:31:15 -05:00
Mark Backman
77aeda36eb Update OpenAI's from_standard_message to convert back to OpenAI's simple format 2024-12-17 22:31:15 -05:00
Mark Backman
51b235df4b Add docstrings for Google and Anthropic's to_standard_messages and from_standard_message functions 2024-12-17 22:31:15 -05:00
Mark Backman
4f2aee5fba Update OpenAI's to_standard_messages to return the verbose message format 2024-12-17 22:31:15 -05:00
Mark Backman
55879bf365 Add TranscriptionProcessor 2024-12-17 22:31:15 -05:00
Aleix Conchillo Flaqué
7322badbe7 audio(koala): add new audio filter KoalaFilter 2024-12-17 18:45:10 -08:00
Aleix Conchillo Flaqué
42bea578e8 pipeline(parallel): wait for slowest endframe
If we are sending an EndFrame and a ParallelPipeline has multiple pipelines we
want to wait before pushing the EndFrame downstream until the slowest pipeline
is finished. Otherwise, we could be disconnecting from the transport too early.
2024-12-17 17:05:11 -08:00
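The reasoning in 42bea578e8 (hold the EndFrame until the slowest parallel branch is done) can be illustrated with plain asyncio. `run_branch` and `parallel_end` are hypothetical, and `asyncio.gather` stands in for however ParallelPipeline actually tracks branch completion.

```python
import asyncio
from typing import List


async def run_branch(name: str, delay: float, log: List[str]) -> None:
    # Simulates a branch that needs some time to finish processing the EndFrame.
    await asyncio.sleep(delay)
    log.append(f"{name} finished")


async def parallel_end(log: List[str]) -> None:
    # Wait for every branch, i.e. for the slowest one, before forwarding EndFrame.
    await asyncio.gather(
        run_branch("fast", 0.01, log),
        run_branch("slow", 0.05, log),
    )
    log.append("EndFrame pushed downstream")


log: List[str] = []
asyncio.run(parallel_end(log))
print(log)  # ['fast finished', 'slow finished', 'EndFrame pushed downstream']
```

If the EndFrame were forwarded as soon as the fast branch finished, a transport downstream could disconnect while the slow branch still had output to deliver.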
Aleix Conchillo Flaqué
2dfdceb9e6 processors(filters): allow passing EndFrame 2024-12-17 16:22:19 -08:00
Aleix Conchillo Flaqué
5bfcac1f5c transports: call parent stop() before disconnecting
This rolls back a previous change https://github.com/pipecat-ai/pipecat/pull/855
which was trying to fix an issue in the wrong way.

The reasoning behind this fix is that the parent class might be sending audio or
messages (through the subclass) and if we disconnect before all the data is sent
we will run into incomplete audio or even errors. Therefore, we first make sure
the parent tasks stop and then it will be safe to disconnect.
2024-12-17 16:02:33 -08:00
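The ordering argued for in 5bfcac1f5c (stop the parent's tasks first, disconnect afterwards) is sketched below. `BaseOutput`, `ClientOutput`, `write`, and the sentinel-terminated queue are hypothetical stand-ins, not pipecat's transport classes.

```python
import asyncio
from typing import List, Optional


class BaseOutput:
    """Hypothetical base class that owns a send queue drained by a task."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.queue: asyncio.Queue = asyncio.Queue()
        self.task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self.task = asyncio.create_task(self._sender())

    async def _sender(self) -> None:
        while True:
            item = await self.queue.get()
            if item is None:  # sentinel: nothing left to send
                break
            await self.write(item)  # sends through the subclass

    async def write(self, item: str) -> None:
        raise NotImplementedError

    async def stop(self) -> None:
        # Let the sender drain everything queued before returning.
        await self.queue.put(None)
        if self.task is not None:
            await self.task


class ClientOutput(BaseOutput):
    async def write(self, item: str) -> None:
        await asyncio.sleep(0)  # simulate network I/O
        self.log.append(f"sent {item}")

    async def stop(self) -> None:
        # Parent first: queued audio/messages are flushed through write()...
        await super().stop()
        # ...and only then is it safe to drop the connection.
        self.log.append("disconnected")


async def main() -> List[str]:
    log: List[str] = []
    out = ClientOutput(log)
    await out.start()
    for chunk in ("audio-1", "audio-2"):
        await out.queue.put(chunk)
    await out.stop()
    return log


events = asyncio.run(main())
print(events)  # ['sent audio-1', 'sent audio-2', 'disconnected']
```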
Aleix Conchillo Flaqué
fb9f72d38b Merge pull request #880 from pipecat-ai/aleix/ruff-check-import-linter
ruff check import linter
2024-12-17 14:14:47 -08:00
Aleix Conchillo Flaqué
146a341a38 Merge pull request #879 from Vaibhav159/vl_add_readme_for_ruff_formatter_in_pycharm
updating readme to support auto-formatting of ruff in pycharm
2024-12-17 11:49:01 -08:00
Aleix Conchillo Flaqué
b9ca667d31 pyproject: use tool.ruff.lint sections 2024-12-17 11:40:43 -08:00
Aleix Conchillo Flaqué
5c57cccea3 github: run ruff check import linter 2024-12-17 11:29:28 -08:00
Aleix Conchillo Flaqué
17162258a2 fix ruff linter import organization 2024-12-17 11:28:58 -08:00
Aleix Conchillo Flaqué
da3fb98101 examples(storytelling-chatbot): update dependencies 2024-12-17 11:24:50 -08:00
Aleix Conchillo Flaqué
6244124d14 README: added Emacs import re-organization with Ruff 2024-12-17 11:20:18 -08:00
Vaibhav159
53049adeea removing --config flag 2024-12-18 00:47:00 +05:30
Vaibhav159
4208d2d7c4 updating readme to support auto-formatting of ruff in pycharm 2024-12-17 23:38:36 +05:30
Mark Backman
9f7f74e4d8 Merge pull request #869 from Vaibhav159/vl_fixing_deepgram_language_bug_#868
fixing [#868] bug where deepgram client fails due to language
2024-12-17 12:50:57 -05:00
Vaibhav159
f14d32d09e fixing ruff issue 2024-12-17 23:11:18 +05:30
Vaibhav159
7351e281e2 ruff change 2024-12-17 22:21:56 +05:30
Vaibhav159
b94b10f7d6 added change log 2024-12-17 22:11:52 +05:30
Vaibhav159
1cc90eb1a3 Merge branch 'main' into vl_fixing_deepgram_language_bug_#868 2024-12-17 22:09:30 +05:30
Vaibhav159
5f7d28bb05 adding type check and value check 2024-12-17 22:07:35 +05:30
Mark Backman
204a08ab8f Merge pull request #877 from pipecat-ai/mb/grok-function-calling-fix
Add custom assistant context aggregator for Grok due to content requi…
2024-12-17 10:51:19 -05:00
Aleix Conchillo Flaqué
141b0a6560 sentry: fix formatting 2024-12-17 07:14:31 -08:00
Mark Backman
ca086a856f Add custom assistant context aggregator for Grok due to content requirement in function calling 2024-12-17 09:11:21 -05:00
Aleix Conchillo Flaqué
fe0a7d07bd update CHANGELOG 2024-12-16 21:02:38 -08:00
Aleix Conchillo Flaqué
79eb29d614 Merge pull request #875 from pipecat-ai/aleix/update-dependencies
update dependencies
2024-12-16 20:58:30 -08:00
Aleix Conchillo Flaqué
da15c83bab fix ruff formatting 2024-12-16 20:52:40 -08:00
Aleix Conchillo Flaqué
d6bac77b3c pyproject: add audioop-lts for python 3.13 2024-12-16 20:50:25 -08:00
Aleix Conchillo Flaqué
7faa4eb295 update dev-requirements 2024-12-16 20:50:25 -08:00
Aleix Conchillo Flaqué
0e31413851 pyproject: update numpy, pydantic, loguru 2024-12-16 19:20:34 -08:00
Aleix Conchillo Flaqué
16948b251d services: fix infinite websocket-based TTS services retries
Fixes #871
2024-12-16 16:36:44 -08:00
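Commit 16948b251d fixes unbounded reconnection retries in websocket-based TTS services (#871) but does not describe the mechanism. The sketch below shows one common way to bound retries, a capped attempt count with exponential backoff; `connect_with_retries` is a hypothetical helper, not the code from that commit.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retries(
    connect: Callable[[], Awaitable[None]],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> bool:
    """Try to connect at most max_attempts times with exponential backoff.

    Returns True on success, False once the attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            await connect()
            return True
        except ConnectionError:
            if attempt < max_attempts - 1:
                await asyncio.sleep(base_delay * 2**attempt)
    return False


attempts = 0


async def always_fails() -> None:
    global attempts
    attempts += 1
    raise ConnectionError("refused")


ok = asyncio.run(connect_with_retries(always_fails))
print(ok, attempts)  # False 3
```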
Mark Backman
f3112a8638 Merge pull request #866 from pipecat-ai/mb/readme-links
Fix a bunch of README docs links
2024-12-16 10:51:01 -05:00
Mark Backman
0293d40e4e Merge pull request #870 from pipecat-ai/mb/dotenv
Add python-dotenv to dev-requirements.txt
2024-12-16 10:50:46 -05:00
Mark Backman
64038442ed Add python-dotenv to dev-requirements.txt 2024-12-16 09:23:12 -05:00
Vaibhav159
facc280599 fixing [#868] bug where deepgram client fails due to language 2024-12-16 17:47:50 +05:30
Mark Backman
f90cbe8086 Fix a bunch of README docs links 2024-12-15 14:30:20 -05:00
Mark Backman
09a611d44b Merge pull request #856 from pipecat-ai/mb/daily-rest-helpers
Remove default 5 min exp time for created rooms, add docstrings
2024-12-13 12:08:58 -05:00
Mark Backman
16d7fb2c4a Remove default 5 min exp time for created rooms, add docstrings 2024-12-13 12:02:26 -05:00
Aleix Conchillo Flaqué
643160c960 Merge pull request #858 from pipecat-ai/aleix/fastpitch-timeout
riva: make sure we don't block on fastpitch
2024-12-13 08:20:38 -08:00
Aleix Conchillo Flaqué
aac907aadb riva: make sure we don't block on fastpitch 2024-12-13 07:32:51 -08:00
Aleix Conchillo Flaqué
8f24ca4e58 Merge pull request #857 from pipecat-ai/aleix/fix-riva-tts-audio-stuttering
riva: fix FastPitchTTSService audio stuttering
2024-12-12 22:20:00 -08:00
Aleix Conchillo Flaqué
420ce16807 riva: fix FastPitchTTSService audio stuttering 2024-12-12 22:15:44 -08:00
Aleix Conchillo Flaqué
2b8c35c681 Merge pull request #855 from pipecat-ai/aleix/transport-services-disconnect-fixes
transports(services): disconnect client first
2024-12-12 19:40:03 -08:00
Mark Backman
3d96369193 Merge pull request #852 from pipecat-ai/mb/readme-docs-badge
Add docs badge to README
2024-12-12 22:21:41 -05:00
Aleix Conchillo Flaqué
d44b36a07c Merge pull request #854 from pipecat-ai/aleix/aiservice-add-missing-process-frame
AIService: add missing super().process_frame()
2024-12-12 19:10:21 -08:00
Aleix Conchillo Flaqué
ccc96994e9 pyproject: update livekit 2024-12-12 19:09:36 -08:00
Aleix Conchillo Flaqué
337d421338 transports: disconnect client first 2024-12-12 19:09:06 -08:00
Aleix Conchillo Flaqué
752720b4d5 AIService: add missing super().process_frame() 2024-12-12 17:25:38 -08:00
Aleix Conchillo Flaqué
f8e69cfa00 Merge pull request #853 from pipecat-ai/revert-849-aleix/no-need-for-super-process-frame
Revert "no longer necessary to call super().process_frame(frame, direction)"
2024-12-12 17:21:20 -08:00
Aleix Conchillo Flaqué
6d11911d83 Revert "no longer necessary to call super().process_frame(frame, direction)" 2024-12-12 17:03:40 -08:00
Mark Backman
ec6e71c8ea Add docs badge to README 2024-12-12 18:08:24 -05:00
Aleix Conchillo Flaqué
10f854aeba Merge pull request #846 from pipecat-ai/aleix/base-output-transport-audio-sync
transport(output): fix non-audio frames sync after audio frames
2024-12-12 14:29:42 -08:00
Aleix Conchillo Flaqué
d8caf007b0 Merge pull request #849 from pipecat-ai/aleix/no-need-for-super-process-frame
no longer necessary to call super().process_frame(frame, direction)
2024-12-12 14:29:10 -08:00
Mark Backman
26ea64ef12 Merge pull request #850 from pipecat-ai/mb/fix-docs-builds
Fix docs generation build issues
2024-12-12 17:27:00 -05:00
Mark Backman
19c178ebc7 Fix docs generation build issues 2024-12-12 17:18:04 -05:00
Aleix Conchillo Flaqué
3c3fd67d96 no longer necessary to call super().process_frame(frame, direction) 2024-12-12 13:03:41 -08:00
Mark Backman
7bbc0ee8df Merge pull request #845 from pipecat-ai/mb/more-docs-updates
Docs auto-gen improvements
2024-12-12 15:42:34 -05:00
Mark Backman
67804edce6 Remove formats from .readthedocs.yaml 2024-12-12 15:41:11 -05:00
Mark Backman
ec082d0888 Remove deprecated VAD module 2024-12-12 15:32:38 -05:00
Mark Backman
8631d71d5a Fix more missing docs 2024-12-12 15:16:37 -05:00
Vaibhav159
62fc95300b adding livekit audio and chat version 2024-12-13 01:09:47 +05:30
Aleix Conchillo Flaqué
db7eaed980 transport(output): fix non-audio frames sync after audio frames 2024-12-12 10:56:02 -08:00
Mark Backman
44c5220104 Update README 2024-12-12 13:28:05 -05:00
Mark Backman
276fd86ecb More fixes for missing packages 2024-12-12 13:25:13 -05:00
Mark Backman
2de0737056 Merge pull request #844 from pipecat-ai/cb-gemini-example-fix
Update requirements.txt for simple-chatbot
2024-12-12 11:18:58 -05:00
Mark Backman
b5d5a0e923 Add special cases for displaying some names 2024-12-12 11:15:36 -05:00
Mark Backman
f3ed12c30b Clean up module and package display names 2024-12-12 11:11:53 -05:00
Mark Backman
e14399727b Add README and build script for local testing 2024-12-12 11:06:53 -05:00
Mark Backman
414dcf9810 Improve TOC in sidebar, fix missing services 2024-12-12 11:06:09 -05:00
chadbailey59
88d530e840 Update requirements.txt for simple-chatbot
The gemini example doesn't actually work from a fresh install, because the requirements.txt file doesn't include google :)
2024-12-12 09:31:15 -06:00
Aleix Conchillo Flaqué
af821d8e95 Merge pull request #841 from pipecat-ai/aleix/aws-to-polly
polly: renamed AWSTTSService to PollyTTSService
2024-12-11 18:13:02 -08:00
Aleix Conchillo Flaqué
133e1aff6c polly: renamed AWSTTSService to PollyTTSService 2024-12-11 17:56:43 -08:00
Aleix Conchillo Flaqué
def415f476 Merge pull request #840 from pipecat-ai/aleix/11labs-playht-more-languages
tts: support more languages in playht and elevenlabs
2024-12-11 14:58:03 -08:00
Aleix Conchillo Flaqué
a34d16dabe tts: support more languages in playht and elevenlabs 2024-12-11 14:53:24 -08:00
Mark Backman
ec7260b237 Merge pull request #839 from pipecat-ai/mb/bump-versions
Bump openai and aiohttp package versions
2024-12-11 17:06:15 -05:00
Mark Backman
96c6c71d5b Bump openai and aiohttp package versions 2024-12-11 16:48:36 -05:00
Aleix Conchillo Flaqué
8e140b2be6 Merge pull request #838 from pipecat-ai/aleix/prepare-0.0.50
update CHANGELOG for 0.0.50
2024-12-11 11:49:15 -08:00
Aleix Conchillo Flaqué
a70c785b2e update CHANGELOG for 0.0.50 2024-12-11 11:33:13 -08:00
Aleix Conchillo Flaqué
f1d3c5e9ad Merge pull request #837 from pipecat-ai/aleix/update-protobuf-to-5.29.1
pyproject: update protobuf to 5.29.1
2024-12-11 11:31:49 -08:00
Aleix Conchillo Flaqué
346329ba73 pyproject: update protobuf to 5.29.1 2024-12-11 11:29:48 -08:00
Aleix Conchillo Flaqué
6089d4255c Merge pull request #836 from pipecat-ai/aleix/moondream-studypal-fixes
examples: fixes for moondream-chatbot and studypal
2024-12-11 11:16:09 -08:00
Aleix Conchillo Flaqué
cff9bb6068 Merge pull request #835 from pipecat-ai/aleix/even-more-parallel-pipeline-fixes
parallel_pipeline: fix system frames and parallel pipelines again
2024-12-11 11:15:59 -08:00
Aleix Conchillo Flaqué
fdefdc9d68 Merge pull request #834 from pipecat-ai/aleix/transcription-are-text
frames: transcriptions should be TextFrames as before
2024-12-11 11:15:43 -08:00
Aleix Conchillo Flaqué
2dd418a38d parallel_pipeline: fix system frames and parallel pipelines again
The previous fixes didn't take into account that system frames can be generated
inside the internal pipelines.
2024-12-11 10:55:04 -08:00
Aleix Conchillo Flaqué
42f5ec20f6 examples: fixes for moondream-chatbot and studypal 2024-12-11 10:46:38 -08:00
Aleix Conchillo Flaqué
5b5125b74c frames: transcriptions should be TextFrames as before 2024-12-11 10:42:38 -08:00
Mark Backman
be4df5f713 Merge pull request #833 from pipecat-ai/mb/update-changelog-for-gemini
Update the CHANGELOG and README for Gemini Multimodal Live
2024-12-11 11:41:42 -05:00
Mark Backman
5418cdc4d1 Update the CHANGELOG and README for Gemini Multimodal Live 2024-12-11 11:40:16 -05:00
Mark Backman
6c9f5a81dc Merge pull request #832 from pipecat-ai/khk/gemini-live-function-calling
Gemini Multimodal Live function calling example
2024-12-11 11:39:19 -05:00
Mark Backman
027e360436 Fix demo numbering and prompt the bot to say hi in 26b 2024-12-11 11:36:38 -05:00
Kwindla Hultman Kramer
c219172266 Gemini Multimodal Live function calling example 2024-12-11 08:29:09 -08:00
Mark Backman
7b040be209 Merge pull request #830 from pipecat-ai/khk/gemini-multimodal-live
Gemini Multimodal Live API service
2024-12-11 11:25:55 -05:00
Mark Backman
0d74531f36 Minor changes to demos 2024-12-11 11:23:59 -05:00
Mark Backman
3341c4f608 Merge pull request #831 from pipecat-ai/mb/gemini-simple-chatbot
Gemini updates to the simple-chatbot demo
2024-12-11 11:15:15 -05:00
Mark Backman
1e45e55528 Add copyright block to audio_transcriber 2024-12-11 11:06:48 -05:00
Mark Backman
8086a94e49 Renumber foundational demos 2024-12-11 10:56:51 -05:00
Kwindla Hultman Kramer
81895f4a5c Gemini Multimodal Live API service 2024-12-11 07:38:23 -08:00
Mark Backman
2846d6f461 Update READMEs and comment files 2024-12-11 00:06:35 -05:00
Mark Backman
14f309ce2b Add Gemini Live bot file 2024-12-10 22:25:17 -05:00
Aleix Conchillo Flaqué
62ec2f5d1e Merge pull request #814 from pipecat-ai/aleix/simli-updates
minor simli updates
2024-12-10 18:48:29 -08:00
Aleix Conchillo Flaqué
4f9a4ebce2 Merge pull request #820 from pipecat-ai/aleix/more-parallelpipeline-fixes
parallel_pipeline: fix system frames again
2024-12-10 18:43:34 -08:00
Aleix Conchillo Flaqué
5b478a5c7a add SimliVideoService to CHANGELOG 2024-12-10 18:42:26 -08:00
Aleix Conchillo Flaqué
87c1f2bcce services(simli): remove ready flag, events vs sleep, handle CancelledError 2024-12-10 18:42:12 -08:00
Aleix Conchillo Flaqué
b85072637f examples(26-simli-layer): use room returned by configure() 2024-12-10 18:42:12 -08:00
Aleix Conchillo Flaqué
ffe1e023e7 Merge pull request #819 from pipecat-ai/aleix/fix-openaillmcontext-from-image-frame
fix OpenAILLMContext from image frame
2024-12-10 18:39:55 -08:00
Aleix Conchillo Flaqué
9a358b2e86 Merge pull request #824 from pipecat-ai/aleix/openpipe-use-openai-base-service
services(openpipe): use OpenAILLMService to get access to aggregators
2024-12-10 18:34:46 -08:00
Aleix Conchillo Flaqué
b034c6e247 Merge pull request #821 from pipecat-ai/aleix/update-pyproject
pyproject: update onnxruntime, whisper and azure
2024-12-10 18:34:27 -08:00
Aleix Conchillo Flaqué
c7ca0eea0f Merge pull request #823 from pipecat-ai/aleix/fix-15a-switch-languages
examples: fix 15a-switch-languages pipeline
2024-12-10 18:34:13 -08:00
Aleix Conchillo Flaqué
29d931cdcd Merge pull request #822 from pipecat-ai/aleix/fix-11-sound-effects
examples: fix 11-sound-effects
2024-12-10 18:33:53 -08:00
Aleix Conchillo Flaqué
ecf0c61af9 services(openpipe): use OpenAILLMService to get access to aggregators 2024-12-10 18:29:03 -08:00
Aleix Conchillo Flaqué
67e8252d76 examples: fix 15a-switch-languages pipeline 2024-12-10 18:27:49 -08:00
Aleix Conchillo Flaqué
775aa9493e examples: fix 11-sound-effects 2024-12-10 18:25:43 -08:00
Aleix Conchillo Flaqué
c446f91d4a pyproject: update onnxruntime, whisper and azure 2024-12-10 18:16:27 -08:00
Aleix Conchillo Flaqué
7b6bbc29ed parallel_pipeline: fix system frames again 2024-12-10 18:12:33 -08:00
Aleix Conchillo Flaqué
9e7ecccf1e google: fix VisionImageRawFrame context 2024-12-10 17:39:52 -08:00
Aleix Conchillo Flaqué
a618bd3fa6 openai: remove from_image_frame() and use add_image_frame_message() 2024-12-10 17:39:52 -08:00
Aleix Conchillo Flaqué
246c825a82 examples: rename 07p-interruptible-google-audio-in to 07s 2024-12-10 17:07:17 -08:00
Aleix Conchillo Flaqué
9e6fabf110 Merge pull request #818 from pipecat-ai/aleix/fastpitch-rename
riva: rename FastpitchTTSService to FastPitchTTSService
2024-12-10 13:36:38 -08:00
Aleix Conchillo Flaqué
d2dabe4358 riva: rename FastpitchTTSService to FastPitchTTSService 2024-12-10 13:30:43 -08:00
Vanessa Pyne
1db624575f Merge pull request #795 from pipecat-ai/vp-nvidia-riva
[WIP] add nvidia riva
2024-12-10 15:17:26 -06:00
vipyne
a49b4e450b services(riva): check service config before running tts 2024-12-10 15:15:46 -06:00
vipyne
9211a37efc services(riva): convention tweaks 2024-12-10 15:15:46 -06:00
vipyne
3f9d39329c services(riva): model -> function_id 2024-12-10 15:15:46 -06:00
vipyne
5a98ae6380 chore: update test-requirements 2024-12-10 15:15:46 -06:00
vipyne
8caad15e9b examples trivial update 2024-12-10 15:15:46 -06:00
vipyne
9222d9f721 services(riva): cleanup 2024-12-10 15:15:46 -06:00
vipyne
5a467a30a3 add nvidia riva - fastpitch 2024-12-10 15:15:46 -06:00
Aleix Conchillo Flaqué
d74e728332 pyproject: update google-cloud-texttospeech to 2.21.1 2024-12-10 15:15:46 -06:00
vipyne
8a9fdaf441 services(riva): cleanup 2024-12-10 15:15:46 -06:00
Aleix Conchillo Flaqué
4b55c73fbe services(riva): make FastpitchTTSService asyncio 2024-12-10 15:15:46 -06:00
Aleix Conchillo Flaqué
7e407e5548 services(riva): first working version of ParakeetSTTService 2024-12-10 15:15:46 -06:00
Aleix Conchillo Flaqué
ce94421c90 pyproject: add riva option and update protobuf and playht 2024-12-10 15:15:46 -06:00
vipyne
49ce3dcb27 add nvidia riva - fastpitch 2024-12-10 15:15:46 -06:00
Aleix Conchillo Flaqué
6ba2dea6f0 Merge pull request #812 from zzz-heygen/zzz/fix_serializer_backward_compat
fix: make ProtobufFrameSerializer backwards compatible
2024-12-10 13:11:09 -08:00
Aleix Conchillo Flaqué
9ac34ac371 Merge pull request #816 from pipecat-ai/aleix/rtvi-version-update
rtvi: update protocol version to 0.3.0
2024-12-10 11:52:28 -08:00
Aleix Conchillo Flaqué
a8644d2129 Merge pull request #815 from pipecat-ai/aleix/identity-filter
processors(filters): add IdentityFilter
2024-12-10 11:09:20 -08:00
Aleix Conchillo Flaqué
3bf15476a4 processors(filters): add IdentityFilter 2024-12-10 11:01:59 -08:00
Aleix Conchillo Flaqué
acb3e21432 rtvi: update protocol version to 0.3.0 2024-12-10 10:57:42 -08:00
Mark Backman
8c9c81d84b Merge pull request #810 from pipecat-ai/mb/read-the-docs
Changes for Read the Docs hosting
2024-12-10 12:48:26 -05:00
Aleix Conchillo Flaqué
e51e2f781d Merge pull request #765 from simliai/simli
Add Simli Service
2024-12-10 09:23:06 -08:00
Dan Goodman
af6f5ecc86 customize Anthropic client via kwargs, also bumps default model version (#813)
* customize Anthropic client via kwargs

* bump default model
2024-12-10 09:13:44 -08:00
antonyesk601
81a18633ca Remove duplicate frame push if simli connection isn't ready 2024-12-10 10:18:31 +00:00
antonyesk601
397342d0b9 Initialize simli_client on StartFrame; Follow variable naming scheme; Use logger instead of print statements; 2024-12-10 10:11:07 +00:00
zzz
d6b3a50108 x 2024-12-10 07:50:50 +00:00
Mark Backman
66b08161f1 Changes for Read the Docs hosting 2024-12-10 00:54:21 -05:00
Mark Backman
e7fa1cacce Merge pull request #800 from pipecat-ai/mb/autogen-docs
Auto-generate API reference docs
2024-12-09 22:05:08 -05:00
Mark Backman
2d3864ee09 Move API docs generation to docs/api 2024-12-09 20:44:10 -05:00
Aleix Conchillo Flaqué
0287f06379 Merge pull request #809 from pipecat-ai/aleix/parallel-pipeline-fix-system-frames
fix system frames parallel pipeline
2024-12-09 15:48:27 -08:00
Mark Backman
681c8ffb1d Merge pull request #807 from pipecat-ai/mb/stt-mute-strategy
Add new STT mute strategy, accept a set of strategies
2024-12-09 18:34:30 -05:00
Mark Backman
676643d558 Code review fixes 2024-12-09 18:27:07 -05:00
Mark Backman
0c4cbc2615 Push FunctionCall Frames upstream and downstream; update example 2024-12-09 18:27:07 -05:00
Aleix Conchillo Flaqué
e690c98230 transports(daily): no need for joining flag
This was put back because of an issue in ParallelPipeline but that issue is now
fixed so the joining check is not really necessary.
2024-12-09 09:38:30 -08:00
Aleix Conchillo Flaqué
e0a6c6871c parallel_pipeline: don't queue system frames 2024-12-09 09:38:30 -08:00
Mark Backman
29a042a101 Add changelog entry 2024-12-09 10:52:32 -05:00
Mark Backman
1cc2da571e Add new STT mute strategy, accept a set of strategies 2024-12-09 10:50:08 -05:00
Kwindla Hultman Kramer
c6b401b5d1 Merge pull request #805 from pipecat-ai/khk/parallel-pipeline-fix
Check to avoid double-join in ParallelPipeline case
2024-12-07 21:49:16 -08:00
Kwindla Hultman Kramer
315b7fcc34 check to avoid double-join 2024-12-07 21:22:36 -08:00
Mark Backman
e9f5fe0f37 Merge pull request #802 from Allenmylath/patch-22
Update README.md
2024-12-07 10:14:44 -05:00
allenmylath
64faf2218e Update examples/patient-intake/README.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2024-12-07 19:08:00 +05:30
allenmylath
e77a785a7d Update README.md 2024-12-07 13:36:50 +05:30
Mark Backman
03a269fb87 Merge pull request #801 from pipecat-ai/aleix/rtvi-handle-transport-urgent-frames
rtvi: handle transport urgent frames
2024-12-06 21:33:18 -05:00
Aleix Conchillo Flaqué
d1a55c6063 rtvi: handle transport urgent frames 2024-12-06 17:51:09 -08:00
Mark Backman
61d0fa42f1 Add a workflow to generate the docs 2024-12-06 20:32:33 -05:00
Mark Backman
16de1fca9b Add Read the Docs config 2024-12-06 20:15:17 -05:00
Mark Backman
2ad83f23c8 Initial reference docs commit 2024-12-06 19:44:44 -05:00
Aleix Conchillo Flaqué
422ee98db0 Merge pull request #798 from pipecat-ai/aleix/functioncall-data-frames
frames: FunctionCallResultFrame should be a DataFrame as before
2024-12-06 16:38:23 -08:00
Aleix Conchillo Flaqué
3d4620cf95 frames: FunctionCallResultFrame should be a DataFrame as before 2024-12-06 11:54:50 -08:00
Aleix Conchillo Flaqué
752a6f02b5 Merge pull request #799 from pipecat-ai/aleix/cartesia-interruptions-fix
cartesia: fix broken interruptions
2024-12-06 11:52:22 -08:00
Aleix Conchillo Flaqué
7e41809ec2 cartesia: fix broken interruptions 2024-12-06 11:49:03 -08:00
Aleix Conchillo Flaqué
e344a73d14 Merge pull request #797 from pipecat-ai/aleix/xtts-default-language
services(xtts): default language to Language.EN
2024-12-06 11:00:53 -08:00
Aleix Conchillo Flaqué
d6f480fa50 Merge pull request #791 from pipecat-ai/aleix/fastapi-generic-websocket
FastAPIWebsocketTransport: fix to work with text and binary
2024-12-06 10:46:16 -08:00
Aleix Conchillo Flaqué
423d6485f8 services(xtts): default language to Language.EN 2024-12-06 10:45:20 -08:00
Aleix Conchillo Flaqué
842b3de7f5 FastAPIWebsocketTransport: fix to work with text and binary 2024-12-06 10:31:42 -08:00
Aleix Conchillo Flaqué
3cb7829624 update CHANGELOG 2024-12-06 10:31:11 -08:00
Aleix Conchillo Flaqué
4292507616 Merge pull request #793 from balalofernandez/send-interruption-to-cartesia
fix: Send interruption to cartesia
2024-12-06 10:26:34 -08:00
Aleix Conchillo Flaqué
98c9759f41 Merge pull request #796 from pipecat-ai/aleix/improve-tts-reconnection
services: improve Cartesia, 11Labs, PlayHT and LMNT TTS reconnection
2024-12-06 10:22:54 -08:00
Aleix Conchillo Flaqué
bafb867ffc services: improve Cartesia, 11Labs, PlayHT and LMNT TTS reconnection 2024-12-06 10:11:59 -08:00
Mark Backman
b05809be2e Merge pull request #794 from pipecat-ai/mb/upgrade-anthropic
Upgrade Anthropic to the latest to avoid collision with aiohttp 3.11.9
2024-12-06 12:01:51 -05:00
Mark Backman
57d346ce13 Upgrade Anthropic to the latest to avoid collision with aiohttp 3.11.9 2024-12-06 11:59:19 -05:00
balalo
9001cb17ce Fix interruption frame to avoid issues with sending None 2024-12-06 17:42:46 +01:00
Mark Backman
40cfd9776f Merge pull request #792 from pipecat-ai/mb/cartesia-languages
Add additional languages for Cartesia
2024-12-06 09:57:38 -05:00
Mark Backman
d68b3ad1b2 Add additional languages for Cartesia 2024-12-06 09:22:05 -05:00
Kwindla Hultman Kramer
9b51588b92 Merge pull request #782 from pipecat-ai/khk/flash-transcription
Async Google LLM + Gemini Flash transcription example
2024-12-05 12:50:18 -08:00
Aleix Conchillo Flaqué
9a36a4ca32 Merge pull request #790 from pipecat-ai/aleix/base-output-transport-wait-for-output-tasks
transports(base_output): wait for output tasks on EndFrame
2024-12-05 11:30:55 -08:00
Aleix Conchillo Flaqué
f80a97b545 transports(base_output): wait for output tasks on EndFrame 2024-12-05 11:26:18 -08:00
Mark Backman
274278e229 Merge pull request #789 from pipecat-ai/mb/update-simple-chatbot-demo
Add RTVI transcripts, align styling
2024-12-05 11:56:07 -05:00
Mark Backman
6b94bcac03 Add RTVI transcripts, align styling 2024-12-05 11:12:48 -05:00
Aleix Conchillo Flaqué
969b87dee9 update aiohttp version to 3.11.9 2024-12-05 07:35:21 -08:00
balalo
bc699735a3 Send interruption message to cartesia 2024-12-05 16:23:40 +01:00
Mark Backman
00fd381808 Merge pull request #745 from pipecat-ai/mb/user-idle
Only run the UserIdleProcessor while pipeline is running
2024-12-05 10:12:02 -05:00
Mark Backman
672b1c6d73 Merge pull request #786 from Allenmylath/patch-21
Update README.md
2024-12-05 09:15:24 -05:00
Mark Backman
f455eb171b Merge pull request #784 from pipecat-ai/mb/simple-bot-client
Update the simple-chatbot demo to have JS and React clients
2024-12-05 08:34:33 -05:00
allenmylath
62c8c90e17 Update README.md 2024-12-05 13:23:05 +05:30
Aleix Conchillo Flaqué
28bb448605 Merge pull request #783 from pipecat-ai/aleix/deepgram-vad-event-handlers
deepgram: add VAD event handlers
2024-12-04 19:35:22 -08:00
Aleix Conchillo Flaqué
3d76b30a7c deepgram: add VAD event handlers 2024-12-04 19:31:09 -08:00
Aleix Conchillo Flaqué
0ae8ca0813 Merge pull request #781 from pipecat-ai/aleix/websocket-transports-mixer-fixes
websocket transports mixer fixes
2024-12-04 19:12:20 -08:00
Aleix Conchillo Flaqué
0935d773f5 transport(websockets): fix initial busy loop when using audio mixers 2024-12-04 19:10:39 -08:00
Aleix Conchillo Flaqué
e0f7a8a9f4 audio(mixer): SoundfileMixer doesn't resample files anymore 2024-12-04 19:09:50 -08:00
Aleix Conchillo Flaqué
2a0e01898f Merge pull request #780 from pipecat-ai/aleix/gstreamer-default-sample-rate
gstreamer: update default sample rate to 24000
2024-12-04 19:09:02 -08:00
Aleix Conchillo Flaqué
9d25e325dd Merge pull request #779 from pipecat-ai/aleix/websocket-server-audio-mixins-fix
frames: fix AudioRawFrame mixin
2024-12-04 19:08:41 -08:00
Aleix Conchillo Flaqué
37c21426bf Merge pull request #778 from pipecat-ai/aleix/transports-disconnect-on-last-transport
transports: fix premature input transport closing
2024-12-04 19:08:23 -08:00
Mark Backman
c467ec8ded Merge pull request #772 from pipecat-ai/mb/nim-llm
Add a NIM LLM service
2024-12-04 21:41:09 -05:00
Kwindla Hultman Kramer
a367a038f1 fix for finally clause 2024-12-04 18:31:30 -08:00
Mark Backman
e45a123eab Add image to README 2024-12-04 21:29:22 -05:00
Mark Backman
2ecc0e2b13 Remove node modules 2024-12-04 21:28:17 -05:00
Mark Backman
d532e924cd Add .gitignore 2024-12-04 21:28:17 -05:00
Mark Backman
36208049dc Update changelog 2024-12-04 21:28:17 -05:00
Mark Backman
1d11419691 Update the simple-chatbot demo to have JS and React clients 2024-12-04 21:13:14 -05:00
Mark Backman
05451f882d Merge pull request #777 from pipecat-ai/mb/twilio-example
Improve twilio-chatbot README
2024-12-04 20:26:45 -05:00
Kwindla Hultman Kramer
9c22f5b81b async google llm 2024-12-04 15:52:52 -08:00
Aleix Conchillo Flaqué
891f261191 gstreamer: update default sample rate to 24000 2024-12-04 14:41:44 -08:00
Aleix Conchillo Flaqué
13c27eaa1d frames: fix AudioRawFrame mixin 2024-12-04 13:25:37 -08:00
Mark Backman
c395d1a234 Merge pull request #773 from Allenmylath/patch-20
Update README.md
2024-12-04 14:45:38 -05:00
Mark Backman
49639c8631 Improve the twilio-chatbot README 2024-12-04 14:42:05 -05:00
Mark Backman
695a98a1f7 Remove streams.xml from version control 2024-12-04 14:26:10 -05:00
Mark Backman
5cbc37472c Update .gitignore to exclude streams.xml 2024-12-04 14:25:10 -05:00
Aleix Conchillo Flaqué
5b6d9a1050 transports: fix premature input transport closing 2024-12-04 10:56:57 -08:00
allenmylath
332d36475b Update examples/patient-intake/README.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2024-12-04 23:27:25 +05:30
Mark Backman
29b67578e3 Update README 2024-12-04 12:52:09 -05:00
Mark Backman
9db3743901 Update pyproject.toml with a nim optional dep 2024-12-04 12:52:09 -05:00
Mark Backman
496aded031 Update changelog 2024-12-04 12:38:05 -05:00
Mark Backman
1c1fa0db65 Add a NIM LLM service 2024-12-04 12:35:24 -05:00
Mark Backman
a2ad40d7e0 Merge pull request #775 from pipecat-ai/mb/llm-stubs
Added LLM services for GroqLLMService and GrokLLMService
2024-12-04 12:26:19 -05:00
Mark Backman
2bb3682d88 Update README 2024-12-04 12:24:39 -05:00
Kwindla Hultman Kramer
f33f08d667 partially working audio+transcription parallel pipelines 2024-12-04 08:51:35 -08:00
Mark Backman
d9bc2b618f Update FireworksLLMService to use OpenAILLMService 2024-12-04 11:51:05 -05:00
Mark Backman
d5a50e2cad Update AzureLLMService to use OpenAILLMService 2024-12-04 11:01:56 -05:00
Mark Backman
7013343bf0 Update the changelog 2024-12-04 10:10:55 -05:00
Mark Backman
728acba8a5 Add LLMService stubs for Grok and Groq, add examples 2024-12-04 10:08:28 -05:00
allenmylath
3b2c78747c Update README.md 2024-12-04 10:24:17 +05:30
allenmylath
44a0acffc8 Update README.md 2024-12-04 10:21:17 +05:30
Aleix Conchillo Flaqué
c31d5a4f1a Merge pull request #771 from pipecat-ai/aleix/daily-execute-callbacks-from-task
transports(daily): use a task to execute callbacks
2024-12-03 19:55:38 -08:00
Aleix Conchillo Flaqué
52caaa4afb transports(daily): use a task to execute callbacks
This commit fixes an issue where we were not waiting for
`asyncio.run_coroutine_threadsafe` to complete, which can cause a series of
undesired issues (e.g. not actually executing the coroutine).
2024-12-03 18:58:54 -08:00
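The callback fix above can be illustrated with a minimal sketch, assuming the transport's SDK invokes callbacks from its own (non-asyncio) thread. `CallbackDispatcher` and its methods are hypothetical names, not Pipecat's actual `DailyTransport` code. Each callback is handed to the event loop with `loop.call_soon_threadsafe`, and a single long-lived task executes the callbacks in order. Every posted coroutine is therefore actually run, unlike a fire-and-forget `run_coroutine_threadsafe` call whose future nobody waits on.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

Callback = Callable[..., Awaitable[Any]]


class CallbackDispatcher:
    """Hypothetical helper: executes async callbacks, in order, on one asyncio task."""

    def __init__(self, loop: asyncio.AbstractEventLoop):
        self._loop = loop
        self._queue: asyncio.Queue = asyncio.Queue()
        # Must be constructed from inside the running event loop.
        self._task = loop.create_task(self._run())

    def post(self, callback: Callback, *args: Any) -> None:
        """Thread-safe: schedule callback(*args) to run on the event loop."""
        self._loop.call_soon_threadsafe(self._queue.put_nowait, (callback, args))

    async def _run(self) -> None:
        while True:
            callback, args = await self._queue.get()
            try:
                await callback(*args)
            except Exception:
                # Keep the dispatcher alive if one callback fails.
                logging.exception("callback failed")
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every callback queued so far has finished."""
        await self._queue.join()

    async def stop(self) -> None:
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def main() -> list:
    dispatcher = CallbackDispatcher(asyncio.get_running_loop())
    received = []

    async def on_app_message(message: int) -> None:
        received.append(message)

    def sdk_thread() -> None:
        # Simulates a native SDK delivering events on its own thread.
        for i in range(3):
            dispatcher.post(on_app_message, i)

    # Run the simulated SDK thread without blocking the event loop.
    await asyncio.to_thread(sdk_thread)
    await dispatcher.drain()
    await dispatcher.stop()
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 1, 2]
```

Using one consumer task keeps callbacks ordered and gives a single place to wait for them or cancel them at shutdown.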
Aleix Conchillo Flaqué
115e75d808 Merge pull request #770 from pipecat-ai/aleix/system-input-frames-and-audio-buffer-processor
system input frames and audio buffer processor fixes
2024-12-03 18:58:13 -08:00
Mark Backman
897e024dd8 Only run the UserIdleProcessor while pipeline is running 2024-12-03 21:09:03 -05:00
Aleix Conchillo Flaqué
1cf93f1dcb FrameProcessor: ignore other frames during CancelFrame 2024-12-03 16:26:29 -08:00
Aleix Conchillo Flaqué
d278996d5b updated CHANGELOG 2024-12-03 16:12:40 -08:00
Aleix Conchillo Flaqué
322dd0cea1 AudioBufferProcessor: use on_audio_data event handler to retrieve audio 2024-12-03 16:12:40 -08:00
Aleix Conchillo Flaqué
a6a4910931 transports(services): incoming transport messages should be urgent 2024-12-03 14:30:15 -08:00
Aleix Conchillo Flaqué
52cefaa9d6 frames: remove AppFrame 2024-12-03 14:30:15 -08:00
Aleix Conchillo Flaqué
42658ecd92 frames: use mixins for audio and image data 2024-12-03 14:30:15 -08:00
Aleix Conchillo Flaqué
a6606a4040 transports(base_output): remove unused code 2024-12-03 14:30:15 -08:00
Aleix Conchillo Flaqué
d6c944cdc1 processors(audio): fix AudioBufferProcessor interruptions 2024-12-03 14:30:15 -08:00
Aleix Conchillo Flaqué
a5c7b02a73 frames: input frames are now system frames
Input frames from a transport should be processed fast and there's no need for
them to be queued internally in each element.
2024-12-03 14:30:15 -08:00
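The idea behind turning transport input frames into system frames can be sketched as follows, assuming a processor that normally queues frames for a background task. `Frame`, `SystemFrame`, `DataFrame` and `QueuedProcessor` are simplified stand-ins, not Pipecat's real classes. System frames take a fast path and are handled immediately, while other frames wait their turn in the internal queue.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    name: str


class SystemFrame(Frame):
    """Handled immediately (e.g. transport input, interruptions)."""


class DataFrame(Frame):
    """Queued and processed in order by a background task."""


class QueuedProcessor:
    def __init__(self) -> None:
        self.processed: list = []
        self._queue: asyncio.Queue = asyncio.Queue()
        self._task = asyncio.get_running_loop().create_task(self._worker())

    async def queue_frame(self, frame: Frame) -> None:
        if isinstance(frame, SystemFrame):
            # Fast path: no internal queueing for system frames.
            await self.process_frame(frame)
        else:
            await self._queue.put(frame)

    async def process_frame(self, frame: Frame) -> None:
        self.processed.append(frame.name)

    async def _worker(self) -> None:
        while True:
            frame = await self._queue.get()
            await self.process_frame(frame)
            self._queue.task_done()

    async def stop(self) -> None:
        # Let queued frames finish, then stop the worker task.
        await self._queue.join()
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def main() -> list:
    processor = QueuedProcessor()
    await processor.queue_frame(DataFrame("text-1"))
    await processor.queue_frame(DataFrame("text-2"))
    await processor.queue_frame(SystemFrame("audio-in"))
    await processor.stop()
    return processor.processed


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['audio-in', 'text-1', 'text-2']
```

In this toy, the system frame overtakes the data frames that were queued before it. That trade of strict ordering for lower latency is what makes the fast path useful for high-rate input such as audio.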
Aleix Conchillo Flaqué
6b9223d87e Merge pull request #768 from pipecat-ai/aleix/websocket-server-interruptions
transports(websockets): use frame serializers during interruptions
2024-12-02 19:18:20 -08:00
Aleix Conchillo Flaqué
c2135cbe11 transports(websockets): use frame serializers during interruptions 2024-12-02 19:17:17 -08:00
Aleix Conchillo Flaqué
32495ddd0b Merge pull request #769 from pipecat-ai/aleix/daily-subscribe-video-source
transports(daily): subscribe to the desired video source
2024-12-02 19:16:14 -08:00
Aleix Conchillo Flaqué
4301f0abf7 Merge pull request #767 from pipecat-ai/aleix/warn-transcription-no-token
transports(daily): warn if transcription enabled but no token provided
2024-12-02 15:06:35 -08:00
Aleix Conchillo Flaqué
5e854c4d03 transports(daily): subscribe to the desired video source 2024-12-02 12:13:23 -08:00
Aleix Conchillo Flaqué
bec46a87ae Merge pull request #766 from Allenmylath/patch-20
Update requirements.txt
2024-12-02 10:32:36 -08:00
Aleix Conchillo Flaqué
71cf94e936 transports(daily): warn if transcription enabled but no token provided 2024-12-02 09:55:17 -08:00
allenmylath
acbecf1c4c Update requirements.txt
daily is not used here. The transport is a FastAPI websocket.
2024-12-02 21:36:29 +05:30
Mark Backman
6095fd342e Merge pull request #763 from Allenmylath/patch-19
Update README.md
2024-12-02 09:30:36 -05:00
Waleed
bf40b4936b updated env template; added simli variables 2024-12-02 12:05:55 +01:00
Waleed
c60dd8d4d2 updated environment variable name for cartesia 2024-12-02 12:05:32 +01:00
Waleed
d472aaf391 updated readme. Added simli 2024-12-02 11:50:51 +01:00
Waleed
6cc0b74e6c integrated simli 2024-12-02 11:35:46 +01:00
allenmylath
23316fbcf9 Update README.md 2024-12-02 13:35:44 +05:30
James Hush
5e22ef251d fix: add logging and error handling for issue #721 (#755) 2024-11-29 13:06:45 +08:00
Mark Backman
c5324df807 Merge pull request #752 from pipecat-ai/mb/google-context-message-conversion
Use Google Gemini message format when adding message to the LLM context
2024-11-27 14:13:17 -05:00
Mark Backman
3c19a7ae3d Use Google Gemini message format when adding message to the LLM context 2024-11-27 12:46:51 -05:00
Mark Backman
98c0a6e047 Merge pull request #749 from pipecat-ai/mb/pipecat-flows-standalone
Make Pipecat Flows an independent package
2024-11-25 17:09:11 -05:00
Mark Backman
f599e160de Make Pipecat Flows an independent package 2024-11-25 13:42:08 -05:00
Mark Backman
11c5d822f9 Merge pull request #746 from pipecat-ai/mb/update-flows
Bumping pipecat-ai-flows version
2024-11-22 11:25:03 -05:00
Mark Backman
c3e22f0931 Bumping pipecat-ai-flows version 2024-11-22 11:21:40 -05:00
Kwindla Hultman Kramer
9409546f90 Merge pull request #743 from pipecat-ai/khk/gemini-exp
Empty text content bug fix for Gemini
2024-11-21 14:04:28 -08:00
Kwindla Hultman Kramer
8ddac0ccd8 Testing with gemini-exp-1114. Bug fix 2024-11-21 10:33:12 -08:00
Vaibhav159
6e8e7fa19a adding session_timeout in fastapi 2024-11-21 14:56:42 +05:30
Vaibhav159
7dfa886669 moving logic to WebsocketServerInputTransport 2024-11-21 14:45:24 +05:30
Vaibhav159
da254c5143 correcting _monitor_websocket 2024-11-21 12:36:51 +05:30
Vaibhav159
e11f128110 adding on_session_timeout 2024-11-21 12:34:32 +05:30
Vaibhav-Lodha
3aa89fb13a adding session_timeout param 2024-11-21 12:20:51 +05:30
Mark Backman
f938960d50 Merge pull request #736 from pipecat-ai/mb/language-support
Make language support more robust
2024-11-20 13:03:47 -05:00
Mark Backman
2981d87bc1 Update changelog 2024-11-20 12:56:35 -05:00
Mark Backman
106042bbb2 Make language support more robust 2024-11-20 12:56:11 -05:00
Filipi da Silva Fuchter
d25ddeb962 Merge pull request #739 from pipecat-ai/krisp_v7
bumping krisp to support v7
2024-11-20 11:39:39 -03:00
Filipi Fuchter
c441baa692 bumping krisp to support v7 2024-11-20 11:37:45 -03:00
Mark Backman
676ff14913 Merge pull request #735 from pipecat-ai/vp-internal-push-frame-fix
internal push frame fix
2024-11-20 06:34:40 -05:00
Vanessa Pyne
14893ade92 Update src/pipecat/processors/frame_processor.py
Co-authored-by: Mark Backman <mark@daily.co>
2024-11-19 22:37:58 -06:00
Mark Backman
2a39ff69d6 Merge pull request #720 from pipecat-ai/mb/conversation-flow 2024-11-19 21:46:20 -05:00
Mark Backman
e79289454a Merge pull request #734 from pipecat-ai/mb/fix-cartesia 2024-11-19 21:27:52 -05:00
Mark Backman
25d02da1b2 Merge pull request #738 from pipecat-ai/mb/natural-conversation-demo 2024-11-19 21:27:38 -05:00
Mark Backman
a36fc370fa Improve the 22c foundational example 2024-11-19 15:49:40 -05:00
Mark Backman
e4c2f6d4c2 Update changelog 2024-11-18 21:32:53 -05:00
Mark Backman
97659ca3f0 Use the new pipecat-ai-flows module 2024-11-18 21:29:35 -05:00
vipyne
e00c75ce3f fix: raise exception in internal_push_frame 2024-11-18 16:01:04 -06:00
Mark Backman
cf62167f54 Revert: services(cartesia): generated TTSStoppedFrame after no more audio 2024-11-18 12:25:04 -05:00
Mark Backman
b3dfeb61c4 Add CHANGELOG entry 2024-11-18 12:18:20 -05:00
Mark Backman
bd020320cd Support a list of messages 2024-11-18 12:18:20 -05:00
Mark Backman
7a55d2d7db Add end session handler and update example 2024-11-18 12:18:20 -05:00
Mark Backman
b7308dca5d Fix issue where actions would execute on terminating nodes 2024-11-18 12:18:20 -05:00
Mark Backman
5301f44b3b Add pre- and post-actions 2024-11-18 12:18:20 -05:00
Mark Backman
686165b95a Add ability to register actions 2024-11-18 12:18:20 -05:00
Mark Backman
4e0ecdd673 Class name updates and remove FrameProcessor base class 2024-11-18 12:18:20 -05:00
Mark Backman
1b74560f9d Move function registration into the ConversationFlowProcessor class 2024-11-18 12:18:20 -05:00
Mark Backman
0c1070433f Clean up and commenting 2024-11-18 12:18:20 -05:00
Mark Backman
ece2c08cde debugging 2024-11-18 12:18:20 -05:00
Mark Backman
0b9742da9e Add a conversation flow processor 2024-11-18 12:18:20 -05:00
Aleix Conchillo Flaqué
635aa6eb5b Merge pull request #729 from pipecat-ai/aleix/fastapi-websocket-dont-close
transports(fastapi): don't try to close socket
2024-11-18 16:01:41 +01:00
Mark Backman
1ff17cc2b6 Merge pull request #733 from pipecat-ai/aleix/add-missing-init-files
processors: add missing __init__.py
2024-11-18 09:44:56 -05:00
Mark Backman
41ce9e9087 Merge pull request #697 from pipecat-ai/cst/leave-message
add handler for disconnect-bot message
2024-11-18 09:38:11 -05:00
Mark Backman
4803c54ecf Update CHANGELOG 2024-11-18 09:36:19 -05:00
Christian Stuff
5d7b3f2b38 add handler for disconnect-bot message 2024-11-18 09:33:30 -05:00
Aleix Conchillo Flaqué
23e5b1ec4d processors: add missing __init__.py 2024-11-18 11:32:20 +01:00
Aleix Conchillo Flaqué
7f5a8928b8 transports(fastapi): don't try to close socket
The websocket is passed from outside (in the transport constructor) so we should
not be trying to close it. FastAPI does actually close it later. We didn't see
any issue because these functions were not implemented properly. The value to
check was `application_state` instead of `client_state`. But in any case,
Pipecat should not be responsible for closing things passed from outside.
2024-11-18 01:15:19 +01:00
Aleix Conchillo Flaqué
53f675f5cf Merge pull request #727 from pipecat-ai/aleix/pipecat-0.0.49
update CHANGELOG for 0.0.49
2024-11-18 06:27:12 +08:00
Aleix Conchillo Flaqué
8173e4ce55 update CHANGELOG for 0.0.49 2024-11-17 23:26:09 +01:00
Aleix Conchillo Flaqué
5445bb0363 rtvi: add on_bot_started event 2024-11-17 22:40:00 +01:00
Mark Backman
a2a94724e5 Merge pull request #725 from pipecat-ai/mb/fix-simple-chatbot
Fix simple-chatbot example
2024-11-16 12:10:05 -05:00
Aleix Conchillo Flaqué
a8f9b0635a Merge pull request #722 from pipecat-ai/aleix/more-dailin-events
transports(daily): add more dial-in events
2024-11-17 01:09:01 +08:00
Mark Backman
4273a31fd5 Fix simple-chatbot example 2024-11-16 07:48:42 -05:00
Aleix Conchillo Flaqué
67f975a2c8 transports(daily): add more dial-in events 2024-11-16 01:22:50 +01:00
Mark Backman
d0bca67666 Merge pull request #716 from pipecat-ai/mb/mute-stt-service
Add STTMuteFilter to un/mute the STT
2024-11-14 19:55:00 -05:00
Mark Backman
966974bfc6 Change STTMuteProcessor to STTMuteFilter 2024-11-14 19:47:37 -05:00
Mark Backman
f807f233bd Suppress UserStartedSpeakingFrame and UserStoppedSpeakingFrame when muted 2024-11-14 17:11:51 -05:00
Mark Backman
33108f5798 Code review feedback 2024-11-14 17:05:08 -05:00
Mark Backman
52de825af8 Update CHANGELOG 2024-11-14 13:47:08 -05:00
Mark Backman
5fe679039c Add STTMuteProcessor to un/mute the STT 2024-11-14 13:35:02 -05:00
Kwindla Hultman Kramer
534f710f5d Merge pull request #688 from pipecat-ai/khk/natural-conversation
More work on llm-as-judge phrase endpointing
2024-11-14 09:15:16 -08:00
Mark Backman
53a11744a8 Merge pull request #712 from pipecat-ai/aleix/some-languages-tweaks
some languages tweaks
2024-11-14 09:33:26 -05:00
Mark Backman
72412cc0c4 Code review feedback 2024-11-14 09:31:04 -05:00
Mark Backman
b77ac07bc6 Merge pull request #715 from pipecat-ai/mb/update-readme-2
Add visual divider below Pipecat README image
2024-11-14 08:54:25 -05:00
Mark Backman
eb6926e0ce Add visual divider below Pipecat README image 2024-11-14 08:51:07 -05:00
Mark Backman
3b2c9de944 Merge pull request #713 from pipecat-ai/mb/update-readme
Update README
2024-11-14 08:45:28 -05:00
Mark Backman
27ff868e5a Move CONTRIBUTING to top directory 2024-11-14 08:43:03 -05:00
Mark Backman
57ef525a8e Update README 2024-11-14 08:43:03 -05:00
Aleix Conchillo Flaqué
d1db54d5fe examples(playht): use a 2.0 engine 2024-11-13 17:19:23 +01:00
Aleix Conchillo Flaqué
4f88fc0eb8 services(tts): initialize language to the proper language code 2024-11-13 17:19:23 +01:00
Aleix Conchillo Flaqué
37d1f4c4e1 services(tts): some language to service language cleanup 2024-11-13 17:19:23 +01:00
Aleix Conchillo Flaqué
ef9e86d997 services(playht): make sure we only skip wav header no matter the size 2024-11-13 17:19:23 +01:00
Aleix Conchillo Flaqué
2d2ef5a417 services(playht): voice engine is Play3.0-mini 2024-11-13 17:19:23 +01:00
Aleix Conchillo Flaqué
c1fff00586 services(playht): fix language codes 2024-11-13 17:19:23 +01:00
Mark Backman
0af2196f50 Merge pull request #708 from pipecat-ai/mb/add-rime-ai
Add RimeTTSService
2024-11-12 18:29:53 -05:00
Mark Backman
cd42320788 Update changelog 2024-11-12 18:28:04 -05:00
Mark Backman
70fce52499 Merge pull request #710 from pipecat-ai/mb/update-readme-krisp
Update Krisp README instructions
2024-11-12 11:15:25 -05:00
Mark Backman
70b60c0593 Update Krisp README instructions 2024-11-12 10:26:12 -05:00
Jon Taylor
2d8aa03f31 Merge pull request #706 from pipecat-ai/jpt/modal-example
barebones modal.com deployment example
2024-11-12 11:41:00 +00:00
Kwindla Hultman Kramer
581ff26704 Merge pull request #707 from pipecat-ai/khk/clean-up
tiny PR to remove old comment lines
2024-11-11 21:14:16 -08:00
Kwindla Hultman Kramer
335178ff06 some gemini audio input examples 2024-11-11 21:04:50 -08:00
Kwindla Hultman Kramer
ee53535f41 gemini audio-in with no transcription 2024-11-11 21:04:50 -08:00
Kwindla Hultman Kramer
91ac40307e small fix and more prompt examples 2024-11-11 21:04:50 -08:00
Kwindla Hultman Kramer
b6c2c1f730 anthropic natural conversation example using claude haiku 2024-11-11 21:04:50 -08:00
Kwindla Hultman Kramer
b56c789ae4 fixes for proposed judge pipeline 2024-11-11 21:04:50 -08:00
Kwindla Hultman Kramer
bd435d9e62 missing commit 2024-11-11 21:04:50 -08:00
Kwindla Hultman Kramer
55a81df84f contributing to llm-as-judge phrase endpointing work 2024-11-11 21:04:50 -08:00
Kwindla Hultman Kramer
87434460f5 temp hacking 2024-11-11 21:04:50 -08:00
Mark Backman
958ec42e8d Add Rime.ai TTS service 2024-11-11 21:58:09 -05:00
Jon Taylor
d1fff60d1d barebones modal.com deployment example 2024-11-11 22:30:07 +00:00
Kwindla Hultman Kramer
1438e5654a remove old comment 2024-11-10 16:08:10 -08:00
Aleix Conchillo Flaqué
1d4be0139a Merge pull request #705 from pipecat-ai/aleix/prepare-0.0.48
update CHANGELOG for 0.0.48
2024-11-10 14:08:33 -08:00
Aleix Conchillo Flaqué
f58c3ee322 update CHANGELOG for 0.0.48 2024-11-10 23:01:03 +01:00
Aleix Conchillo Flaqué
379750df91 Merge pull request #704 from pipecat-ai/aleix/cartesia-tts-stopped-frame
services(cartesia): generated TTSStoppedFrame after no more audio
2024-11-10 05:17:36 -08:00
Aleix Conchillo Flaqué
d125a38737 services(cartesia): generated TTSStoppedFrame after no more audio
The TTSStoppedFrame should be generated when the TTS service stops generating
audio, not when the bot stops speaking.
2024-11-10 09:55:45 +01:00
Mark Backman
446bb0aeaf Merge pull request #702 from pipecat-ai/mb/azure-websocket
Add an Azure TTS websocket service
2024-11-09 17:41:53 -05:00
Aleix Conchillo Flaqué
d839080834 Merge pull request #642 from pipecat-ai/aleix/input-queues-block-frames
introduce frame processor input queues block frames
2024-11-09 14:30:17 -08:00
Mark Backman
9b85d0642b Add a changelog entry 2024-11-09 12:37:29 -05:00
Mark Backman
230b51a117 Add an Azure TTS websocket service 2024-11-09 12:37:29 -05:00
Mark Backman
3a965ca396 Merge pull request #701 from pipecat-ai/khk/anthropic-function-calling-fix
fixes for anthropic function calling
2024-11-09 06:39:34 -05:00
Kwindla Hultman Kramer
33fc5bf990 improved 20c-persistent-context-anthropic.py 2024-11-08 16:42:30 -08:00
Kwindla Hultman Kramer
a54ca08405 fixes for anthropic function calling 2024-11-08 16:33:02 -08:00
Filipi da Silva Fuchter
4379db43ed Merge pull request #689 from pipecat-ai/filipi/krisp
Making pipecat work with Krisp
2024-11-08 16:22:52 -03:00
Filipi Fuchter
e915c676aa Added support for Krisp audio filter 2024-11-08 16:18:10 -03:00
Mark Backman
e0a003afa1 Merge pull request #695 from pipecat-ai/mb/initialize-azure-lang
Initialize the speech_recognition_language for Azure TTS
2024-11-08 06:40:40 -05:00
James Hush
d5666727ce feat: toggle looping with soundfile mixer (#693)
* feat: toggle looping with soundfile mixer

* Implement PR changes
2024-11-07 21:08:37 -08:00
Mark Backman
f6d7402530 Update changelog 2024-11-07 15:16:03 -05:00
Mark Backman
aefe190c9f Initialize the speech_recognition_language for Azure TTS 2024-11-07 15:14:05 -05:00
Vanessa Pyne
29925a8f21 Merge pull request #551 from Allenmylath/patch-3
Frame types and short descriptionCreate Frames.md
2024-11-07 10:05:32 -06:00
Aleix Conchillo Flaqué
beb3271168 services(tts): make sure word timestamp is reset properly 2024-11-06 18:54:12 -08:00
Aleix Conchillo Flaqué
b959ac6e1e Merge pull request #694 from pipecat-ai/aleix/daily-add-on-transcription-message
transports(daily): call on_transcription_message event handler
2024-11-06 15:21:17 -08:00
Aleix Conchillo Flaqué
17f4286942 transports(daily): call on_transcription_message event handler 2024-11-06 15:10:58 -08:00
Aleix Conchillo Flaqué
ce89bbb16e tts(elevenlabs): support pausing and resuming frames while speaking 2024-11-06 14:38:33 -08:00
Aleix Conchillo Flaqué
865768039b processors: remove block_on_frames and add pause_processing_frames() instead 2024-11-06 14:20:25 -08:00
Aleix Conchillo Flaqué
7071482583 try to use queue_frame() instead of process_frame() 2024-11-06 14:18:21 -08:00
Aleix Conchillo Flaqué
5353d13151 update CHANGELOG 2024-11-06 13:16:58 -08:00
Aleix Conchillo Flaqué
a9e565f355 processors: fix input queue interruptions 2024-11-06 13:12:24 -08:00
Aleix Conchillo Flaqué
b6f0c16591 examples: restore EndFrame() on 01 and 02 foundational 2024-11-06 13:05:03 -08:00
Aleix Conchillo Flaqué
49005d02f5 services(tts): use TTSSpeakFrame in say() method 2024-11-06 13:05:03 -08:00
Aleix Conchillo Flaqué
6d8b885071 transports(base_output): push bot started/stopped frames downstream 2024-11-06 13:04:37 -08:00
Aleix Conchillo Flaqué
2eccb33e73 processors: allow passing a callback when queued frame is processed 2024-11-06 13:04:37 -08:00
Aleix Conchillo Flaqué
22ca4c5a02 processors: cancel input task and empty queue with interruptions 2024-11-06 13:04:37 -08:00
Aleix Conchillo Flaqué
84f26ac1ca processors: introduce input queues
Frame processors can now decide if they should continue processing frames or
not, and if so also decide when to continue processing frames. For example,
asynchronous TTS services will stop processing frames until they have generated
all the audio for an LLM response.
2024-11-06 12:13:49 -08:00
Aleix Conchillo Flaqué
74937411e6 Merge pull request #691 from pipecat-ai/aleix/rtvi-manual-bot-ready
rtvi: bot-ready message needs to be sent manual
2024-11-06 10:53:25 -08:00
Aleix Conchillo Flaqué
8aab068ffd rtvi: bot-ready message needs to be sent manual 2024-11-05 10:52:54 -08:00
Aleix Conchillo Flaqué
bd50201ce4 transports(daily): just make it clear we subscribe to camera 2024-11-04 17:32:46 -08:00
Aleix Conchillo Flaqué
6082da284e Merge pull request #611 from pipecat-ai/aleix/audio-filters
introduce audio filters
2024-11-04 16:34:47 -08:00
Aleix Conchillo Flaqué
358c458265 transports(base_input): handle filter contorl frames 2024-11-04 16:19:52 -08:00
Aleix Conchillo Flaqué
807dbbe326 audio(noisereduce): allow enabling/disabling filter 2024-11-04 16:13:29 -08:00
Aleix Conchillo Flaqué
3c116b291d audio(mixers): some cosmetics 2024-11-04 15:37:08 -08:00
Aleix Conchillo Flaqué
0dd413ee90 audio(filters): add noisereduce filter 2024-11-04 15:37:08 -08:00
Aleix Conchillo Flaqué
abc8ede3d7 introduce audio filters 2024-11-04 15:37:08 -08:00
Aleix Conchillo Flaqué
126324ca1b Merge pull request #687 from pipecat-ai/aleix/transport-audio-mixers
introduce transport audio mixers
2024-11-04 13:14:36 -08:00
Aleix Conchillo Flaqué
602915ae18 examples(websocket-server): allow interruptions 2024-11-04 13:05:02 -08:00
Aleix Conchillo Flaqué
0ac9e2dd3f transports(network): synchronize with time before sending data 2024-11-04 13:04:18 -08:00
Aleix Conchillo Flaqué
a9ef5ca95d examples: add bot background sound example 2024-11-03 11:13:02 -08:00
Aleix Conchillo Flaqué
81c476dd4c introduce output transport audio mixers 2024-11-03 11:13:02 -08:00
Aleix Conchillo Flaqué
4455b2a428 rtvi: create queues before tasks 2024-11-01 23:06:50 -07:00
Aleix Conchillo Flaqué
94062592ef base_output: generate smaller audio frames of the same incoming type 2024-11-01 23:06:50 -07:00
Aleix Conchillo Flaqué
d2401a76c8 base_output: only generate bot speaking with TTS audio frames 2024-11-01 23:06:50 -07:00
Aleix Conchillo Flaqué
e2b1b56e86 examples: don't require room token if using an STT 2024-11-01 23:06:50 -07:00
allenmylath
0e69625a01 Rename frames.md to frame.md
edited again to frame.md
2024-10-14 10:07:47 +05:30
allenmylath
4e0823fced Rename Frames.md to frames.md
file name changed as requested
2024-10-14 10:05:26 +05:30
Allenmylath
40af3571f0 Create Frames.md
Made a small explanation for different types of frames in Pipecat
2024-10-05 22:04:03 +05:30
439 changed files with 26098 additions and 3035 deletions

48
.github/workflows/android.yaml vendored Normal file

@@ -0,0 +1,48 @@
name: android
on:
  push:
    branches:
      - main
    paths:
      - "examples/simple-chatbot/client/android/**"
  pull_request:
    branches:
      - "**"
    paths:
      - "examples/simple-chatbot/client/android/**"
  workflow_dispatch:
    inputs:
      sdk_git_ref:
        type: string
        description: "Which git ref of the app to build"
concurrency:
  group: build-android-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
jobs:
  sdk:
    name: "Simple chatbot demo"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.sdk_git_ref || github.ref }}
      - name: "Install Java"
        uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '17'
      - name: Build demo app
        working-directory: examples/simple-chatbot/client/android
        run: ./gradlew :simple-chatbot-client:assembleDebug
      - name: Upload demo APK
        uses: actions/upload-artifact@v4
        with:
          name: Simple Chatbot Android Client
          path: examples/simple-chatbot/client/android/simple-chatbot-client/build/outputs/apk/debug/simple-chatbot-client-debug.apk


@@ -35,7 +35,12 @@ jobs:
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: Ruff formatter
id: ruff
id: ruff-format
run: |
source .venv/bin/activate
ruff format --diff
- name: Ruff import linter
id: ruff-check
run: |
source .venv/bin/activate
ruff check --select I

9
.gitignore vendored

@@ -28,4 +28,11 @@ share/python-wheels/
MANIFEST
.DS_Store
.env
fly.toml
fly.toml
# Example files
pipecat/examples/twilio-chatbot/templates/streams.xml
# Documentation
docs/api/_build/
docs/api/api

36
.readthedocs.yaml Normal file

@@ -0,0 +1,36 @@
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: '3.12'
  apt_packages:
    - portaudio19-dev
    - python3-dev
    - libasound2-dev
  jobs:
    pre_build:
      - python -m pip install --upgrade pip
      - pip install wheel setuptools
    post_build:
      - echo "Build completed"
sphinx:
  configuration: docs/api/conf.py
  fail_on_warning: false
python:
  install:
    - requirements: docs/api/requirements.txt
    - method: pip
      path: .
search:
  ranking:
    api/*: 5
    getting-started/*: 4
    guides/*: 3
submodules:
  include: all
  recursive: true


@@ -9,6 +9,367 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Added `DeepSeekLLMService` for DeepSeek integration with an OpenAI-compatible
interface. Added foundational example `14l-function-calling-deepseek.py`.
- Added `FunctionCallResultProperties` dataclass to provide a structured way to
control function call behavior, including:
  - `run_llm`: Controls whether to trigger LLM completion
  - `on_context_updated`: Optional callback triggered after context update
- Added a new foundational example `07e-interruptible-playht-http.py` for easy
testing of `PlayHTHttpTTSService`.
- Added support for Google TTS Journey voices in `GoogleTTSService`.
- Added `29-livekit-audio-chat.py`, a new foundational example for
`LiveKitTransportLayer`.
- Added `enable_prejoin_ui`, `max_participants` and `start_video_off` params
to `DailyRoomProperties`.
- Added `session_timeout` to `FastAPIWebsocketTransport` and
`WebsocketServerTransport` for configuring session timeouts (in
seconds). Triggers `on_session_timeout` for custom timeout handling.
See [examples/websocket-server/bot.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/websocket-server/bot.py).
- Added the new modalities option and helper function to set Gemini output
modalities.
- Added `examples/foundational/26d-gemini-multimodal-live-text.py`, which uses
Gemini with the TEXT modality and a separate TTS provider to generate speech.
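The `session_timeout` behavior above can be pictured as a timer wrapped around the session that fires a handler when it expires. The following is a minimal asyncio sketch, assuming the timeout counts from the start of the session; `run_with_session_timeout` is a hypothetical helper, not the transports' actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_with_session_timeout(
    session: Awaitable[None],
    session_timeout: Optional[float],
    on_session_timeout: Callable[[], Awaitable[None]],
) -> bool:
    """Run `session`; if it exceeds `session_timeout` seconds, call the handler.

    Returns True if the session finished on its own and False if it timed out.
    A timeout of None means the session may run indefinitely.
    """
    try:
        await asyncio.wait_for(session, timeout=session_timeout)
        return True
    except asyncio.TimeoutError:
        # wait_for has already cancelled the session at this point; the handler
        # decides what happens next (e.g. say goodbye and end the pipeline).
        await on_session_timeout()
        return False


async def _demo() -> None:
    async def long_session() -> None:
        await asyncio.sleep(10)  # stands in for a conversation that never ends

    async def on_session_timeout() -> None:
        print("session timed out")

    finished = await run_with_session_timeout(long_session(), 0.05, on_session_timeout)
    print("finished normally:", finished)


if __name__ == "__main__":
    asyncio.run(_demo())
```

Using `asyncio.wait_for` keeps cancellation of the timed-out session in one place, so the handler only has to deal with the application-level reaction.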
### Changed
- Modified `OpenAIAssistantContextAggregator` to support controlled completions
and to emit context update callbacks via `FunctionCallResultProperties`.
- Added `aws_session_token` to the `PollyTTSService`.
- Changed the default model for `PlayHTHttpTTSService` to `Play3.0-mini-http`.
- `api_key`, `aws_access_key_id` and `region` are no longer required parameters
for `PollyTTSService` (`AWSTTSService`).
- Added `session_timeout` example in `examples/websocket-server/bot.py` to
handle session timeout event.
- Changed `InputParams` in
`src/pipecat/services/gemini_multimodal_live/gemini.py` to support different
modalities.
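The `run_llm` and `on_context_updated` properties mentioned above control what happens after a function call result is added to the context. The sketch below models that decision in plain Python; `ResultPropertiesSketch` and `apply_function_result` are hypothetical names, and falling back to a default of running the LLM when `run_llm` is unset is an assumption, not Pipecat's documented behavior.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ResultPropertiesSketch:
    # None means "use the aggregator's default behavior".
    run_llm: Optional[bool] = None
    # Awaited after the function result has been appended to the context.
    on_context_updated: Optional[Callable[[], Awaitable[None]]] = None


async def apply_function_result(
    messages: List[dict],
    result: str,
    properties: Optional[ResultPropertiesSketch] = None,
    default_run_llm: bool = True,
) -> bool:
    """Append a function result to the context and return whether to run the LLM."""
    # Simplified: real OpenAI tool messages also carry a tool_call_id.
    messages.append({"role": "tool", "content": result})
    props = properties or ResultPropertiesSketch()
    if props.on_context_updated is not None:
        await props.on_context_updated()
    return default_run_llm if props.run_llm is None else props.run_llm
```

Passing `ResultPropertiesSketch(run_llm=False)` defers the completion, for example while waiting for further function calls to finish before asking the LLM to respond.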
### Fixed
- Fixed a `PipelineTask` issue that would cause a dangling task after stopping
the pipeline with an `EndFrame`.
- Fixed an import issue for `PlayHTHttpTTSService`.
- Fixed an issue where languages couldn't be used with the `PlayHTHttpTTSService`.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` audio chunks were hitting
an error when truncating audio content.
- Fixed an issue where setting the voice and model for `RimeHttpTTSService`
wasn't working.
## [0.0.52] - 2024-12-24
### Added
- Added constructor arguments to `GoogleLLMService` to directly set `tools` and
`tool_config`.
- Added a smart turn detection example (`22d-natural-conversation-gemini-audio.py`)
that leverages Gemini 2.0 capabilities
(see https://x.com/kwindla/status/1870974144831275410).
- Added `DailyTransport.send_dtmf()` to send dial-out DTMF tones.
- Added `DailyTransport.sip_call_transfer()` to forward SIP and PSTN calls to
another address or number. For example, transfer a SIP call to a different
SIP address or transfer a PSTN phone number to a different PSTN phone number.
- Added `DailyTransport.sip_refer()` to transfer incoming SIP/PSTN calls from
outside Daily to another SIP/PSTN address.
- Added an `auto_mode` input parameter to `ElevenLabsTTSService`. `auto_mode`
is set to `True` by default. Enabling this setting disables the chunk
schedule and all buffers, which reduces latency.
- Added `KoalaFilter`, which implements on-device noise reduction using Koala
Noise Suppression.
(see https://picovoice.ai/platform/koala/)
- Added `CerebrasLLMService` for Cerebras integration with an OpenAI-compatible
interface. Added foundational example `14k-function-calling-cerebras.py`.
- Pipecat now supports Python 3.13. We had a dependency on the `audioop` package
which was deprecated and now removed on Python 3.13. We are now using
`audioop-lts` (https://github.com/AbstractUmbra/audioop) to provide the same
functionality.
- Added timestamped conversation transcript support:
  - New `TranscriptProcessor` factory provides access to user and assistant
    transcript processors.
  - `UserTranscriptProcessor` processes user speech with timestamps from
    transcription.
  - `AssistantTranscriptProcessor` processes assistant responses with LLM
    context timestamps.
  - Messages are emitted with ISO 8601 timestamps indicating when they were spoken.
  - Supports all LLM formats (OpenAI, Anthropic, Google) via standard message
    format.
  - New examples: `28a-transcription-processor-openai.py`,
    `28b-transcription-processor-anthropic.py`, and
    `28c-transcription-processor-gemini.py`.
- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino,
Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian,
Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).
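To illustrate the "ISO 8601 timestamps indicating when they were spoken" part of the transcript support above, here is a small sketch of stamping a transcript entry. `TranscriptEntry` and `make_entry` are hypothetical names; Pipecat's own message class may carry different fields.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TranscriptEntry:
    role: str                        # "user" or "assistant"
    content: str
    timestamp: Optional[str] = None  # ISO 8601, e.g. "2024-12-24T10:30:00+00:00"


def make_entry(role: str, content: str, spoken_at: datetime) -> TranscriptEntry:
    """Create a transcript entry stamped with when the text was spoken."""
    if spoken_at.tzinfo is None:
        # Treat naive datetimes as UTC so the resulting timestamp is unambiguous.
        spoken_at = spoken_at.replace(tzinfo=timezone.utc)
    stamp = spoken_at.astimezone(timezone.utc).isoformat()
    return TranscriptEntry(role=role, content=content, timestamp=stamp)


entry = make_entry("user", "Hello!", datetime(2024, 12, 24, 10, 30, tzinfo=timezone.utc))
print(entry.timestamp)  # 2024-12-24T10:30:00+00:00
```

Normalizing to UTC before formatting keeps user and assistant entries directly comparable, whichever clock produced them.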
### Changed
- `PlayHTTTSService` uses the new v4 websocket API, which also fixes an issue
where text inputted to the TTS didn't return audio.
- The default model for `ElevenLabsTTSService` is now `eleven_flash_v2_5`.
- `OpenAIRealtimeBetaLLMService` now takes a `model` parameter in the
constructor.
- Updated the default model for the `OpenAIRealtimeBetaLLMService`.
- Room expiration (`exp`) in `DailyRoomProperties` is now optional (`None`) by
default instead of automatically setting a 5-minute expiration time. You must
explicitly set expiration time if desired.
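Because rooms no longer get a 5-minute expiration by default, callers who relied on it now need to compute `exp` themselves. A small sketch, assuming `exp` takes a Unix timestamp in seconds (which is how Daily's REST API defines room expiration); `room_expiration` is a hypothetical helper.

```python
import time
from typing import Optional


def room_expiration(seconds_from_now: int, now: Optional[float] = None) -> int:
    """Return a Unix timestamp (whole seconds) `seconds_from_now` in the future.

    `now` can be passed explicitly to make the result deterministic in tests.
    """
    if seconds_from_now <= 0:
        raise ValueError("seconds_from_now must be positive")
    current = time.time() if now is None else now
    return int(current) + seconds_from_now


# Explicitly restore the previous 5-minute default.
five_minutes = room_expiration(5 * 60)
```

The result could then be passed as `exp` when creating the room properties, assuming the constructor accepts it as a keyword argument.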
### Deprecated
- `AWSTTSService` is now deprecated, use `PollyTTSService` instead.
### Fixed
- Fixed token counting in `GoogleLLMService`. Tokens were summed incorrectly
(double-counted in many cases).
- Fixed an issue that could cause the bot to stop talking if there was a user
interruption before getting any audio from the TTS service.
- Fixed an issue that would cause `ParallelPipeline` to handle `EndFrame`
incorrectly causing the main pipeline to not terminate or terminate too early.
- Fixed an audio stuttering issue in `FastPitchTTSService`.
- Fixed a `BaseOutputTransport` issue that was causing non-audio frames to be
processed before the previous audio frames were played. This allows, for
example, sending a frame `A` after a `TTSSpeakFrame`: frame `A` will only be
pushed downstream after the audio generated from the `TTSSpeakFrame` has been
spoken.
- Fixed a `DeepgramSTTService` issue that was causing language to be passed as
an object instead of a string, which made the connection fail.
## [0.0.51] - 2024-12-16
### Fixed
- Fixed an issue in websocket-based TTS services that was causing infinite
reconnections (Cartesia, ElevenLabs, PlayHT and LMNT).
## [0.0.50] - 2024-12-11
### Added
- Added `GeminiMultimodalLiveLLMService`. This is an integration for Google's
Gemini Multimodal Live API, supporting:
  - Real-time audio and video input processing
  - Streaming text responses with TTS
  - Audio transcription for both user and bot speech
  - Function calling
  - System instructions and context management
  - Dynamic parameter updates (temperature, top_p, etc.)
- Added `AudioTranscriber` utility class for handling audio transcription with
Gemini models.
- Added new context classes for Gemini:
  - `GeminiMultimodalLiveContext`
  - `GeminiMultimodalLiveUserContextAggregator`
  - `GeminiMultimodalLiveAssistantContextAggregator`
  - `GeminiMultimodalLiveContextAggregatorPair`
- Added new foundational examples for `GeminiMultimodalLiveLLMService`:
  - `26-gemini-multimodal-live.py`
  - `26a-gemini-multimodal-live-transcription.py`
  - `26b-gemini-multimodal-live-video.py`
  - `26c-gemini-multimodal-live-video.py`
- Added `SimliVideoService`. This is an integration for Simli AI avatars.
(see https://www.simli.com)
- Added NVIDIA Riva's `FastPitchTTSService` and `ParakeetSTTService`.
(see https://www.nvidia.com/en-us/ai-data-science/products/riva/)
- Added `IdentityFilter`. This is the simplest frame filter that lets through
all incoming frames.
- New `STTMuteStrategy` called `FUNCTION_CALL` which mutes the STT service
during LLM function calls.
- `DeepgramSTTService` now exposes two event handlers `on_speech_started` and
`on_utterance_end` that could be used to implement interruptions. See new
example `examples/foundational/07c-interruptible-deepgram-vad.py`.
- Added `GroqLLMService`, `GrokLLMService`, and `NimLLMService` for Groq, Grok,
and NVIDIA NIM API integration, with an OpenAI-compatible interface.
- New examples demonstrating function calling with Groq, Grok, Azure OpenAI,
Fireworks, and NVIDIA NIM: `14f-function-calling-groq.py`,
`14g-function-calling-grok.py`, `14h-function-calling-azure.py`,
`14i-function-calling-fireworks.py`, and `14j-function-calling-nvidia.py`.
- To obtain the audio stored by the `AudioBufferProcessor`, you can now also
register an `on_audio_data` event handler. The `on_audio_data` handler will be
called every time `buffer_size` (a new constructor argument) is reached. If
`buffer_size` is 0 (the default), you need to get the audio manually, as
before, using `AudioBufferProcessor.merge_audio_buffers()`.

  ```python
  # `audiobuffer` is the AudioBufferProcessor instance in your pipeline.
  @audiobuffer.event_handler("on_audio_data")
  async def on_audio_data(processor, audio, sample_rate, num_channels):
      # save_audio() stands for your own coroutine (it is not part of Pipecat).
      await save_audio(audio, sample_rate, num_channels)
  ```
- Added a new RTVI message called `disconnect-bot`, which when handled pushes
an `EndFrame` to trigger the pipeline to stop.
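The `buffer_size` behavior of `AudioBufferProcessor` described above (automatic flushes to `on_audio_data`, or manual retrieval when `buffer_size` is 0) can be sketched as follows. `AudioBufferSketch` is a hypothetical class; it flushes exact `buffer_size`-byte chunks, whereas the real processor may flush differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional

AudioHandler = Callable[[bytes, int, int], Awaitable[None]]


class AudioBufferSketch:
    """Accumulate PCM audio and flush it to a handler every `buffer_size` bytes.

    With buffer_size == 0 nothing is flushed automatically; the caller reads
    the accumulated audio with get_audio() instead.
    """

    def __init__(self, sample_rate: int, num_channels: int, buffer_size: int = 0):
        self._sample_rate = sample_rate
        self._num_channels = num_channels
        self._buffer_size = buffer_size
        self._buffer = bytearray()
        self._handler: Optional[AudioHandler] = None

    def on_audio_data(self, handler: AudioHandler) -> AudioHandler:
        # Decorator-style registration, loosely modeled on event handlers.
        self._handler = handler
        return handler

    async def add_audio(self, chunk: bytes) -> None:
        self._buffer.extend(chunk)
        if self._buffer_size > 0 and self._handler is not None:
            # Flush as many full buffers as are available.
            while len(self._buffer) >= self._buffer_size:
                data = bytes(self._buffer[: self._buffer_size])
                del self._buffer[: self._buffer_size]
                await self._handler(data, self._sample_rate, self._num_channels)

    def get_audio(self) -> bytes:
        return bytes(self._buffer)
```

Keeping any remainder in the buffer means no audio is lost between flushes; whatever is left can still be read with `get_audio()`.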
### Changed
- `STTMuteFilter` now supports multiple simultaneous muting strategies.
- `XTTSService` language now defaults to `Language.EN`.
- `SoundfileMixer` no longer resamples input files, to avoid startup
delays. The sample rate of the provided sound files now needs to match the
sample rate of the output transport.
- Input frames (audio, image and transport messages) are now system frames. This
means they are processed immediately by all processors instead of being queued
internally.
- Expanded the transcriptions.language module to support a superset of
languages.
- Updated STT and TTS services with language options that match the supported
languages for each service.
- Updated the `AzureLLMService` to use the `OpenAILLMService`. Updated the
`api_version` to `2024-09-01-preview`.
- Updated the `FireworksLLMService` to use the `OpenAILLMService`. Updated the
default model to `accounts/fireworks/models/firefunction-v2`.
- Updated the `simple-chatbot` example to include JavaScript and React client
examples, using RTVI JS and React.
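Since `SoundfileMixer` no longer resamples, a mismatched sound file is easier to catch up front than at runtime. Below is a sketch that checks a WAV file's sample rate with the standard-library `wave` module. It handles WAV only (the mixer itself may accept other formats), and `check_sound_sample_rate` and `make_silent_wav` are hypothetical helpers.

```python
import io
import wave


def check_sound_sample_rate(wav_bytes: bytes, output_sample_rate: int) -> int:
    """Raise if a WAV file's sample rate differs from the output transport's.

    Returns the file's sample rate when it matches.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
    if rate != output_sample_rate:
        raise ValueError(
            f"sound file is {rate} Hz but the output transport expects "
            f"{output_sample_rate} Hz; resample the file offline first"
        )
    return rate


def make_silent_wav(sample_rate: int, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit silent WAV in memory (for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buf.getvalue()
```

For example, `check_sound_sample_rate(make_silent_wav(24000), 24000)` passes, while the same file checked against 16000 Hz raises `ValueError`.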
### Removed
- Removed `AppFrame`. This was used as a special user custom frame, but there's
actually no use case for that.
### Fixed
- Fixed a `ParallelPipeline` issue that would cause system frames to be queued.
- Fixed `FastAPIWebsocketTransport` so it can work with binary data (e.g. using
the protobuf serializer).
- Fixed an issue in `CartesiaTTSService` that could cause previous audio to be
received after an interruption.
- Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket
reconnection. Previously, no reconnection was attempted if an error occurred.
- Fixed a `BaseOutputTransport` issue that was causing audio to be discarded
after an `EndFrame` was received.
- Fixed an issue in `WebsocketServerTransport` and `FastAPIWebsocketTransport`
that would cause a busy loop when using audio mixer.
- Fixed a `DailyTransport` and `LiveKitTransport` issue where connections were
being closed in the input transport prematurely. This was causing frames
queued inside the pipeline to be discarded.
- Fixed an issue in `DailyTransport` that would cause some internal callbacks to
not be executed.
- Fixed an issue where other frames were being processed while a `CancelFrame`
was being pushed down the pipeline.
- `AudioBufferProcessor` now handles interruptions properly.
- Fixed a `WebsocketServerTransport` issue that would prevent interruptions with
`TwilioSerializer` from working.
- `DailyTransport.capture_participant_video` now allows capturing user's screen
share by simply passing `video_source="screenVideo"`.
- Fixed Google Gemini message handling to properly convert appended messages to
Gemini's required format.
- Fixed an issue with `FireworksLLMService` where chat completions were failing
by removing the `stream_options` from the chat completion options.
## [0.0.49] - 2024-11-17
### Added
- Added RTVI `on_bot_started` event, which is useful in single-turn
interactions.
- Added `DailyTransport` events `dialin-connected`, `dialin-stopped`,
`dialin-error` and `dialin-warning`. Needs daily-python >= 0.13.0.
- Added `RimeHttpTTSService` and the `07q-interruptible-rime.py` foundational
example.
- Added `STTMuteFilter`, a general-purpose processor that combines STT
muting and interruption control. When active, it prevents both transcription
and interruptions during bot speech. The processor supports multiple
strategies: `FIRST_SPEECH` (mute only during the bot's first
speech), `ALWAYS` (mute during all bot speech), or `CUSTOM` (using a provided
callback).
- Added `STTMuteFrame`, a control frame that enables/disables speech
transcription in STT services.
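The three `STTMuteFilter` strategies above differ only in when muting applies while the bot is speaking. A minimal sketch of that decision follows; `MuteStrategy` and `STTMuteDecider` are hypothetical names, and consulting the `CUSTOM` callback only during bot speech is an assumption of this sketch.

```python
from enum import Enum
from typing import Callable, Optional


class MuteStrategy(Enum):
    FIRST_SPEECH = "first_speech"  # mute only during the bot's first speech
    ALWAYS = "always"              # mute whenever the bot is speaking
    CUSTOM = "custom"              # defer to a user-supplied callback


class STTMuteDecider:
    """Decide whether transcription and interruptions should be muted right now."""

    def __init__(self, strategy: MuteStrategy, callback: Optional[Callable[[], bool]] = None):
        if strategy is MuteStrategy.CUSTOM and callback is None:
            raise ValueError("the CUSTOM strategy requires a callback")
        self._strategy = strategy
        self._callback = callback
        self._bot_speaking = False
        self._first_speech_done = False

    def bot_started_speaking(self) -> None:
        self._bot_speaking = True

    def bot_stopped_speaking(self) -> None:
        self._bot_speaking = False
        self._first_speech_done = True

    def should_mute(self) -> bool:
        if not self._bot_speaking:
            return False  # muting only ever applies during bot speech
        if self._strategy is MuteStrategy.ALWAYS:
            return True
        if self._strategy is MuteStrategy.FIRST_SPEECH:
            return not self._first_speech_done
        return bool(self._callback())
```

`FIRST_SPEECH` is handy for greetings: the user cannot cut off the bot's introduction, but can interrupt freely afterwards.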
## [0.0.48] - 2024-11-10 "Antonio release"
### Added
- There's now an input queue in each frame processor. When you call
`FrameProcessor.push_frame()` this will internally call
`FrameProcessor.queue_frame()` on the next processor (upstream or downstream)
and the frame will be internally queued (except system frames). Then, the
queued frames will get processed. With this input queue it is also possible
for FrameProcessors to block processing more frames by calling
`FrameProcessor.pause_processing_frames()`. The way to resume processing
frames is by calling `FrameProcessor.resume_processing_frames()`.
- Added audio filter `NoisereduceFilter`.
- Introduced input transport audio filters (`BaseAudioFilter`). Audio filters can
be used to remove background noise before audio is sent to VAD.
- Introduced output transport audio mixers (`BaseAudioMixer`). Output transport
audio mixers can be used, for example, to add background sounds or any other
audio mixing functionality before the output audio is actually written to the
transport.
- Added `GatedOpenAILLMContextAggregator`. This aggregator keeps the last
received OpenAI LLM context frame and it doesn't let it through until the
notifier is notified.
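The per-processor input queue described above (system frames bypass the queue; other frames wait while processing is paused) can be sketched with an `asyncio.Queue` and an `asyncio.Event`. The method names mirror the CHANGELOG, but `QueuedProcessorSketch` and `Frame` here are hypothetical stand-ins, not Pipecat's `FrameProcessor`.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    name: str
    system: bool = False  # system frames skip the input queue


class QueuedProcessorSketch:
    """Processor with an input queue whose processing can be paused and resumed."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._running = asyncio.Event()
        self._running.set()
        self._task: Optional[asyncio.Task] = None
        self.processed: List[str] = []

    def start(self) -> None:
        self._task = asyncio.create_task(self._input_loop())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass

    async def queue_frame(self, frame: Frame) -> None:
        if frame.system:
            await self.process_frame(frame)  # handled immediately
        else:
            await self._queue.put(frame)

    def pause_processing_frames(self) -> None:
        self._running.clear()

    def resume_processing_frames(self) -> None:
        self._running.set()

    async def process_frame(self, frame: Frame) -> None:
        self.processed.append(frame.name)

    async def _input_loop(self) -> None:
        while True:
            frame = await self._queue.get()
            await self._running.wait()  # blocks here while paused
            await self.process_frame(frame)
            self._queue.task_done()
```

A TTS service could call `pause_processing_frames()` when it starts synthesizing an LLM response and `resume_processing_frames()` once all the audio has been generated, which matches the use case given in the commit message for input queues.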
@@ -31,6 +392,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
grained control of what media subscriptions you want for each participant in a
room.
- Added audio filter `KrispFilter`.
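The output transport audio mixers above mix extra audio, such as a background sound, into the bot's audio before it is written to the transport. The following stdlib-only sketch shows the core arithmetic for 16-bit PCM. `mix_background` is a hypothetical function, not Pipecat's `SoundfileMixer`, and it assumes little-endian mono audio.

```python
import array
import sys


def mix_background(voice: bytes, background: bytes, volume: float = 0.4) -> bytes:
    """Mix 16-bit little-endian mono PCM `background` into `voice`.

    The background is scaled by `volume`, looped if it is shorter than the
    voice chunk, and the sum is clipped to the int16 range.
    """
    v = array.array("h", voice)
    b = array.array("h", background)
    if sys.byteorder == "big":
        # array uses native byte order; the PCM here is little-endian.
        v.byteswap()
        b.byteswap()
    mixed = array.array("h")
    for i, sample in enumerate(v):
        bg = b[i % len(b)] if b else 0
        mixed.append(max(-32768, min(32767, int(sample + bg * volume))))
    if sys.byteorder == "big":
        mixed.byteswap()
    return mixed.tobytes()
```

Clipping instead of wrapping keeps loud passages from turning into harsh artifacts; a production mixer would also resample or reject mismatched sample rates.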
### Changed
- The following `DailyTransport` functions are now `async` which means they need
@@ -42,8 +405,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
output to 24000 and also the default output transport sample rate. This
improves audio quality at the cost of some extra bandwidth.
- `AzureTTSService` now uses Azure websockets instead of HTTP requests.
- The previous `AzureTTSService` HTTP implementation is now
`AzureHttpTTSService`.
### Fixed
- Websocket transports (FastAPI and Websocket) now synchronize with time before
sending data. This allows for interruptions to just work out of the box.
- Improved bot speaking detection for all TTS services by using actual bot
audio.
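One way to read "synchronize with time before sending data" is pacing: audio is sent no faster than real time, so the client never holds much buffered speech and an interruption silences the bot quickly. The sketch below shows that interpretation; it is an assumption about the mechanism, and `send_paced` is a hypothetical helper.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable


async def send_paced(
    chunks: Iterable[bytes],
    send: Callable[[bytes], Awaitable[None]],
    sample_rate: int,
    num_channels: int = 1,
    sample_width: int = 2,
) -> float:
    """Send PCM chunks no faster than real time and return the elapsed seconds."""
    bytes_per_second = sample_rate * num_channels * sample_width
    start = time.monotonic()
    sent_seconds = 0.0
    for chunk in chunks:
        # Wait until the wall clock has caught up with the audio already sent.
        delay = start + sent_seconds - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        await send(chunk)
        sent_seconds += len(chunk) / bytes_per_second
    return time.monotonic() - start
```

With 16 kHz mono 16-bit audio, a 320-byte chunk is 10 ms, so five chunks take roughly 40 ms to send instead of arriving all at once.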
@@ -55,9 +426,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fixed an issue with `PlayHTTTSService`, where the TTFB metrics were reporting
very small time values.
- Fixed an issue where `AzureTTSService` wasn't initializing the specified
language.
### Other
- Added a new foundational example 22-natural-conversation.py. This examples
- Add `23-bot-background-sound.py` foundational example.
- Added a new foundational example `22-natural-conversation.py`. This example
shows how to achieve a more natural conversation by detecting when the user
ends a statement.


@@ -1,6 +1,6 @@
BSD 2-Clause License
Copyright (c) 2024, Daily
Copyright (c) 2024–2025, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

100
README.md

@@ -1,14 +1,21 @@
<div align="center">
<h1><div align="center">
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div>
</div></h1>
# Pipecat
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) <a href="https://app.commanddash.io/agent/github_pipecat-ai_pipecat"><img src="https://img.shields.io/badge/AI-Code%20Agent-EB9FDA"></a>
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) <a href="https://app.commanddash.io/agent/github_pipecat-ai_pipecat"><img src="https://img.shields.io/badge/AI-Code%20Agent-EB9FDA"></a>
Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.
`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, [story-telling toys for kids](https://storytelling-chatbot.fly.dev/), customer support bots, [intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0), and snarky social companions.
## What you can build
Take a look at some example apps:
- **Voice Assistants**: [Natural, real-time conversations with AI](https://demo.dailybots.ai/)
- **Interactive Agents**: Personal coaches and meeting assistants
- **Multimodal Apps**: Combine voice, video, images, and text
- **Creative Tools**: [Story-telling experiences](https://storytelling-chatbot.fly.dev/) and social companions
- **Business Solutions**: [Customer intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0) and support bots
- **Complex conversational flows**: [Refer to Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) to learn more
## See it in action
<p float="left">
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="280" /></a>&nbsp;
@@ -18,33 +25,54 @@ Take a look at some example apps:
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="280" /></a>
</p>
## Getting started with voice agents
## Key features
- **Voice-first Design**: Built-in speech recognition, TTS, and conversation handling
- **Flexible Integration**: Works with popular AI services (OpenAI, ElevenLabs, etc.)
- **Pipeline Architecture**: Build complex apps from simple, reusable components
- **Real-time Processing**: Frame-based pipeline architecture for fluid interactions
- **Production Ready**: Enterprise-grade WebRTC and Websocket support
💡 Looking to build structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) for managing complex conversational states and transitions.
## Getting started
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you're ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
```shell
# install the module
# Install the module
pip install pipecat-ai
# set up an .env file with API keys
# Set up your environment
cp dot-env.template .env
```
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:
To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:
```shell
pip install "pipecat-ai[option,...]"
```
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
Available options include:
- **AI services**: `anthropic`, `assemblyai`, `aws`, `azure`, `deepgram`, `gladia`, `google`, `fal`, `lmnt`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`, `xtts`
- **Transports**: `local`, `websocket`, `daily`
| Category | Services | Install Command Example |
| ------------------- | --- | --------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[openai]"` |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
| Vision & Image | [Moondream](https://docs.pipecat.ai/server/services/vision/moondream), [fal](https://docs.pipecat.ai/server/services/image-generation/fal) | `pip install "pipecat-ai[moondream]"` |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
## Code examples
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [Example apps](https://github.com/pipecat-ai/pipecat/tree/main/examples/) — complete applications that you can use as starting points for development
## A simple voice agent running locally
@@ -109,7 +137,7 @@ Run it with:
python app.py
```
Daily provides a prebuilt WebRTC user interface. While the app is running, you can visit `https://<yourdomain>.daily.co/<room_url>` and listen to the bot say hello!
## WebRTC for production use
@@ -119,16 +147,6 @@ One way to get up and running quickly with WebRTC is to sign up for a Daily deve
Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard.
## What is VAD?
Voice Activity Detection (VAD) is very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk and want Pipecat to detect when the user has finished talking, VAD is an essential component of a natural-feeling conversation.
Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
```shell
pip install "pipecat-ai[silero]"
```
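To make the idea concrete, here is a deliberately simplified VAD that uses an energy threshold, written in plain Python. It is *not* how Silero VAD or Pipecat's `SileroVADAnalyzer` works; Silero uses a trained neural network. The example only illustrates the per-frame speech/non-speech decision that any VAD makes. The 16-bit little-endian mono PCM format, the threshold of 500, and all function names are assumptions chosen for this sketch.

```python
import math
import struct


def frame_rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(pcm: bytes, threshold: float = 500.0) -> bool:
    """Classify one frame as speech if its energy exceeds a fixed threshold.

    The threshold (on a 16-bit scale of 0..32767) is an arbitrary assumption;
    real VADs adapt to background noise and use trained models.
    """
    return frame_rms(pcm) > threshold


def speech_segments(frames: list[bytes], threshold: float = 500.0) -> list[tuple[int, int]]:
    """Group consecutive speech frames into (start, end_exclusive) index ranges."""
    segments = []
    start = None
    for i, frame in enumerate(frames):
        if is_speech(frame, threshold):
            if start is None:
                start = i  # speech begins
        elif start is not None:
            segments.append((start, i))  # speech ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(frames)))
    return segments


if __name__ == "__main__":
    silence = struct.pack("<160h", *([0] * 160))
    tone = struct.pack("<160h", *([3000, -3000] * 80))
    print(speech_segments([silence, tone, tone, silence, tone]))  # [(1, 3), (4, 5)]
```

The end of a segment (a transition from speech to silence) is the moment a conversational bot cares about. Production VADs usually also wait for a short period of continuous silence before declaring that the user has stopped.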
## Hacking on the framework itself
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
@@ -179,9 +197,7 @@ You can use [use-package](https://github.com/jwiegley/use-package) to install [e
:hook ((python-mode . lazy-ruff-mode))
:config
(setq lazy-ruff-format-command "ruff format")
(setq lazy-ruff-check-command "ruff check --select I"))
```
`ruff` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
@@ -191,7 +207,6 @@ You can use [use-package](https://github.com/jwiegley/use-package) to install [e
:ensure t
:defer t
:hook ((python-mode . pyvenv-auto-run)))
```
### Visual Studio Code
@@ -206,8 +221,33 @@ Install the
}
```
### PyCharm
`ruff` was installed in the `venv` environment described before. To enable autoformatting on save, go to `File` -> `Settings` -> `Tools` -> `File Watchers` and add a new watcher with the following settings:
1. **Name**: `Ruff formatter`
2. **File type**: `Python`
3. **Working directory**: `$ContentRoot$`
4. **Arguments**: `format $FilePath$`
5. **Program**: `$PyInterpreterDirectory$/ruff`
## Contributing
We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:
- **Found a bug?** Open an [issue](https://github.com/pipecat-ai/pipecat/issues)
- **Have a feature idea?** Start a [discussion](https://discord.gg/pipecat)
- **Want to contribute code?** Check our [CONTRIBUTING.md](CONTRIBUTING.md) guide
- **Documentation improvements?** [Docs](https://github.com/pipecat-ai/docs) PRs are always welcome
Before submitting a pull request, please check existing issues and PRs to avoid duplicates.
We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Read the docs](https://docs.pipecat.ai)
➡️ [Reach us on X](https://x.com/pipecat_ai)


@@ -1,8 +1,9 @@
build~=1.2.2
grpcio-tools~=1.68.1
pip-tools~=7.4.1
pyright~=1.1.390
pytest~=8.3.4
ruff~=0.8.3
setuptools~=75.6.0
setuptools_scm~=8.1.0
python-dotenv~=1.0.1
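These pins use PEP 440's compatible-release operator `~=`. For example, `~=1.2.2` accepts any version `>=1.2.2` within the `1.2.*` series. The helper below is a simplified sketch that handles only plain numeric release versions, with no pre-releases, epochs, or local versions. For real checks, `packaging.specifiers.SpecifierSet` from the `packaging` library is the standard tool. The function names here are hypothetical.

```python
def _parse(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version like '1.2.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, pinned: str) -> bool:
    """Return True if `candidate` satisfies `~=pinned` (numeric versions only).

    Per PEP 440, `~=X.Y.Z` means `>=X.Y.Z, ==X.Y.*`: the candidate must be at
    least the pinned version and share every release component except the last.
    """
    pin = _parse(pinned)
    if len(pin) < 2:
        raise ValueError("~= requires at least two release components")
    cand = _parse(candidate)
    # Pad the candidate with zeros so that '1.2' compares like '1.2.0'.
    padded = cand + (0,) * max(0, len(pin) - len(cand))
    prefix = pin[:-1]
    return padded >= pin and padded[: len(prefix)] == prefix


print(satisfies_compatible_release("0.8.6", "0.8.3"))  # True
print(satisfies_compatible_release("0.9.0", "0.8.3"))  # False
```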

docs/api/Makefile Normal file

@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
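The catch-all `%: Makefile` rule forwards any target name to Sphinx's "make mode" (`sphinx-build -M <builder> <sourcedir> <builddir>`). The sketch below reproduces that expansion in Python, so you can see, for example, what `make html SPHINXOPTS="-W --keep-going"` ends up running. The function name is hypothetical. It only builds the argument list and does not invoke Sphinx.

```python
import shlex


def sphinx_make_mode_argv(
    target: str,
    sourcedir: str = ".",
    builddir: str = "_build",
    sphinxopts: str = "",
    o: str = "",
    sphinxbuild: str = "sphinx-build",
) -> list[str]:
    """Mirror `$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)`."""
    # SPHINXBUILD, SPHINXOPTS and O are unquoted in the Makefile, so the shell
    # splits them into separate words; SOURCEDIR and BUILDDIR are quoted.
    return (
        shlex.split(sphinxbuild)
        + ["-M", target, sourcedir, builddir]
        + shlex.split(sphinxopts)
        + shlex.split(o)
    )


print(sphinx_make_mode_argv("html", sphinxopts="-W --keep-going"))
# ['sphinx-build', '-M', 'html', '.', '_build', '-W', '--keep-going']
```

Passing the resulting list to `subprocess.run(argv, check=True)` would run the same build the Makefile does.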

docs/api/README.md Normal file

@@ -0,0 +1,109 @@
# Pipecat Documentation
This directory contains the source files for auto-generating Pipecat's server API reference documentation.
## Setup
1. Install documentation dependencies:
```bash
pip install -r requirements.txt
```
2. Make the build scripts executable:
```bash
chmod +x build-docs.sh rtd-test.sh
```
## Building Documentation
From this directory, you can build the documentation in several ways:
### Local Build
```bash
# Using the build script (automatically opens docs when done)
./build-docs.sh
# Or directly with sphinx-build
sphinx-build -b html . _build/html -W --keep-going
```
### ReadTheDocs Test Build
To test the documentation build process exactly as it would run on ReadTheDocs:
```bash
./rtd-test.sh
```
This script:
- Creates a fresh virtual environment
- Installs all dependencies as specified in requirements files
- Handles conflicting dependencies (like grpcio versions for Riva and PlayHT)
- Builds the documentation in an isolated environment
- Provides detailed logging of the build process
Use this script to verify your documentation will build correctly on ReadTheDocs before pushing changes.
## Viewing Documentation
The built documentation will be available at `_build/html/index.html`. To open:
```bash
# On macOS
open _build/html/index.html
# On Linux
xdg-open _build/html/index.html
# On Windows
start _build/html/index.html
```
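If you'd rather not remember the per-OS command, Python's standard-library `webbrowser` module can open the built page on any of these platforms. This is a small sketch and not a script that ships with the repo. It assumes you run it from this directory after a build, and the function names are hypothetical.

```python
import webbrowser
from pathlib import Path


def docs_index_uri(build_dir: str = "_build/html") -> str:
    """Return a file:// URI for the built index page (as_uri needs an absolute path)."""
    return (Path(build_dir) / "index.html").resolve().as_uri()


def open_docs(build_dir: str = "_build/html") -> bool:
    """Open the built docs in the default browser; returns False if no browser was found."""
    index = Path(build_dir) / "index.html"
    if not index.exists():
        raise FileNotFoundError(f"{index} not found; build the docs first")
    return webbrowser.open(docs_index_uri(build_dir))
```

Calling `open_docs()` after `./build-docs.sh` or `sphinx-build` has finished opens `_build/html/index.html` in the default browser.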
## Directory Structure
```
.
├── api/ # Auto-generated API documentation
├── _build/ # Built documentation
├── _static/ # Static files (images, css, etc.)
├── conf.py # Sphinx configuration
├── index.rst # Main documentation entry point
├── requirements-base.txt # Base documentation dependencies
├── requirements-riva.txt # Riva-specific dependencies
├── requirements-playht.txt # PlayHT-specific dependencies
├── build-docs.sh # Local build script
└── rtd-test.sh # ReadTheDocs test build script
```
## Notes
- Documentation is auto-generated from Python docstrings
- Service modules are automatically detected and included
- The build process matches our ReadTheDocs configuration
- Warnings are treated as errors (-W flag) to maintain consistency
- The --keep-going flag ensures all errors are reported
- Dependencies are split into multiple requirements files to handle version conflicts
## Troubleshooting
If you encounter missing service modules:
1. Verify the service is installed with its extras: `pip install pipecat-ai[service-name]`
2. Check the build logs for import errors
3. Ensure the service module is properly initialized in the package
4. Run `./rtd-test.sh` to test in an isolated environment matching ReadTheDocs
For dependency conflicts:
1. Check the requirements files for version specifications
2. Use `rtd-test.sh` to verify dependency resolution
3. Consider adding service-specific requirements files if needed
For more information:
- [ReadTheDocs Configuration](.readthedocs.yaml)
- [Sphinx Documentation](https://www.sphinx-doc.org/)

docs/api/build-docs.sh Executable file

@@ -0,0 +1,10 @@
#!/bin/bash
# Clean previous build
rm -rf _build
# Build docs matching ReadTheDocs configuration
sphinx-build -b html -d _build/doctrees . _build/html -W --keep-going
# Open docs (MacOS)
open _build/html/index.html

docs/api/conf.py Normal file

@@ -0,0 +1,252 @@
import logging
import sys
from pathlib import Path
# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger("sphinx-build")
# Add source directory to path
docs_dir = Path(__file__).parent
project_root = docs_dir.parent.parent
sys.path.insert(0, str(project_root / "src"))
# Project information
project = "pipecat-ai"
copyright = "2024, Daily"
author = "Daily"
# General configuration
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx.ext.intersphinx",
]
# Napoleon settings
napoleon_google_docstring = True
napoleon_numpy_docstring = False
napoleon_include_init_with_doc = True
# AutoDoc settings
autodoc_default_options = {
    "members": True,
    "member-order": "bysource",
    "special-members": "__init__",
    "undoc-members": True,
    "exclude-members": "__weakref__",
    "no-index": True,
    "show-inheritance": True,
}
# Mock imports for optional dependencies
autodoc_mock_imports = [
    "riva",
    "livekit",
    "pyht",  # Base PlayHT package
    "pyht.async_client",  # PlayHT specific imports
    "pyht.client",
    "pyht.protos",
    "pyht.protos.api_pb2",
    "pipecat_ai_playht",  # PlayHT wrapper
    "anthropic",
    "assemblyai",
    "boto3",
    "azure",
    "cartesia",
    "deepgram",
    "elevenlabs",
    "fal",
    "gladia",
    "google",
    "krisp",
    "langchain",
    "lmnt",
    "noisereduce",
    "openai",
    "openpipe",
    "simli",
    "soundfile",
    # Existing mocks
    "pipecat_ai_krisp",
    "pyaudio",
    "_tkinter",
    "tkinter",
    "daily",
    "daily_python",
    "pydantic.BaseModel",
    "pydantic.Field",
    "pydantic._internal._model_construction",
    "pydantic._internal._fields",
]
# HTML output settings
html_theme = "sphinx_rtd_theme"
html_static_path = ["_static"]
autodoc_typehints = "description"
html_show_sphinx = False
def verify_modules():
    """Log which optional pipecat modules can be imported; missing ones only produce warnings."""
    required_modules = {
        "services": [
            "assemblyai",
            "aws",
            "cartesia",
            "deepgram",
            "google",
            "lmnt",
            "riva",
            "simli",
        ],
        "serializers": ["livekit"],
        "vad": ["silero", "vad_analyzer"],
        "transports": {
            "services": ["daily", "livekit"],
            "local": ["audio", "tk"],
            "network": ["fastapi_websocket", "websocket_server"],
        },
    }
    missing = []
    for category, modules in required_modules.items():
        if isinstance(modules, dict):
            # Handle nested structure
            for subcategory, submodules in modules.items():
                for module in submodules:
                    try:
                        __import__(f"pipecat.{category}.{subcategory}.{module}")
                        logger.info(
                            f"Successfully imported pipecat.{category}.{subcategory}.{module}"
                        )
                    except (ImportError, TypeError, NameError) as e:
                        missing.append(f"pipecat.{category}.{subcategory}.{module}")
                        logger.warning(
                            f"Optional module not available: pipecat.{category}.{subcategory}.{module} - {str(e)}"
                        )
        else:
            # Handle flat structure
            for module in modules:
                try:
                    __import__(f"pipecat.{category}.{module}")
                    logger.info(f"Successfully imported pipecat.{category}.{module}")
                except (ImportError, TypeError, NameError) as e:
                    missing.append(f"pipecat.{category}.{module}")
                    logger.warning(
                        f"Optional module not available: pipecat.{category}.{module} - {str(e)}"
                    )
    if missing:
        logger.warning(f"Some optional modules are not available: {missing}")
def clean_title(title: str) -> str:
    """Automatically clean module titles."""
    # Remove everything after space (like 'module', 'processor', etc.)
    title = title.split(" ")[0]
    # Get the last part of the dot-separated path
    parts = title.split(".")
    title = parts[-1]
    # Special cases for service names and common acronyms
    special_cases = {
        "ai": "AI",
        "aws": "AWS",
        "api": "API",
        "vad": "VAD",
        "assemblyai": "AssemblyAI",
        "deepgram": "Deepgram",
        "elevenlabs": "ElevenLabs",
        "openai": "OpenAI",
        "openpipe": "OpenPipe",
        "playht": "PlayHT",
        "xtts": "XTTS",
        "lmnt": "LMNT",
    }
    # Check if the entire title is a special case
    if title.lower() in special_cases:
        return special_cases[title.lower()]
    # Otherwise, capitalize each word
    words = title.split("_")
    cleaned_words = []
    for word in words:
        if word.lower() in special_cases:
            cleaned_words.append(special_cases[word.lower()])
        else:
            cleaned_words.append(word.capitalize())
    return " ".join(cleaned_words)
def setup(app):
    """Generate API documentation during Sphinx build."""
    from sphinx.ext.apidoc import main
    docs_dir = Path(__file__).parent
    project_root = docs_dir.parent.parent
    output_dir = str(docs_dir / "api")
    source_dir = str(project_root / "src" / "pipecat")
    # Clean existing files
    if Path(output_dir).exists():
        import shutil
        shutil.rmtree(output_dir)
        logger.info(f"Cleaned existing documentation in {output_dir}")
    logger.info("Generating API documentation...")
    logger.info(f"Output directory: {output_dir}")
    logger.info(f"Source directory: {source_dir}")
    excludes = [
        str(project_root / "src/pipecat/pipeline/to_be_updated"),
        str(project_root / "src/pipecat/processors/gstreamer"),
        str(project_root / "src/pipecat/services/to_be_updated"),
        str(project_root / "src/pipecat/vad"),  # deprecated
        "**/test_*.py",
        "**/tests/*.py",
    ]
    try:
        main(
            [
                "-f",  # Force overwriting of existing files
                "-e",  # Put documentation for each module on its own page (--separate)
                "-M",  # Put module documentation before submodule documentation (--module-first)
                "--no-toc",  # Don't create a table of contents file
                "--implicit-namespaces",  # Handle implicit namespace packages
                "-o",
                output_dir,
                source_dir,
            ]
            + excludes
        )
        logger.info("API documentation generated successfully!")
        # Process generated RST files (recursively) to update titles
        for rst_file in Path(output_dir).glob("**/*.rst"):
            content = rst_file.read_text()
            lines = content.split("\n")
            # The title is on the first line and its "=" underline on the second
            if len(lines) > 1 and "=" in lines[1]:
                old_title = lines[0]
                new_title = clean_title(old_title)
                content = content.replace(old_title, new_title)
                rst_file.write_text(content)
                logger.info(f"Updated title: {old_title} -> {new_title}")
    except Exception as e:
        logger.error(f"Error generating API documentation: {e}", exc_info=True)
# Run module verification
verify_modules()
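`verify_modules()` actually imports each module, which also surfaces failures at import time; that is why it catches `TypeError` and `NameError`. When you only need to know whether a module is *present*, `importlib.util.find_spec` is a lighter check. The sketch below is an alternative under that assumption, not what `conf.py` does. Note that `find_spec` still imports parent packages, and it will not catch errors raised while the module body itself executes. The helper names are hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located without executing it.

    For dotted names, find_spec imports the parent packages first; a missing
    parent raises ModuleNotFoundError, which is treated as 'not available'.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


def missing_modules(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be found."""
    return [name for name in names if not module_available(name)]
```

For example, `missing_modules(["pipecat.services.openai", "pipecat.services.riva"])` would list whichever service modules are absent from the docs environment, without running their import-time code.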

docs/api/index.rst Normal file

@@ -0,0 +1,77 @@
Pipecat API Reference Docs
==========================

Welcome to Pipecat's API reference documentation!

Pipecat is an open source framework for building voice and multimodal assistants.
It provides a flexible pipeline architecture for connecting various AI services,
audio processing, and transport layers.

Quick Links
-----------

* `GitHub Repository <https://github.com/pipecat-ai/pipecat>`_
* `Website <https://pipecat.ai>`_

API Reference
-------------

Core Components
~~~~~~~~~~~~~~~

* :mod:`Frames <pipecat.frames>`
* :mod:`Processors <pipecat.processors>`
* :mod:`Pipeline <pipecat.pipeline>`

Audio Processing
~~~~~~~~~~~~~~~~

* :mod:`Audio <pipecat.audio>`

Services
~~~~~~~~

* :mod:`Services <pipecat.services>`

Transport & Serialization
~~~~~~~~~~~~~~~~~~~~~~~~~

* :mod:`Transports <pipecat.transports>`
* :mod:`Local <pipecat.transports.local>`
* :mod:`Network <pipecat.transports.network>`
* :mod:`Services <pipecat.transports.services>`
* :mod:`Serializers <pipecat.serializers>`

Utilities
~~~~~~~~~

* :mod:`Clocks <pipecat.clocks>`
* :mod:`Metrics <pipecat.metrics>`
* :mod:`Sync <pipecat.sync>`
* :mod:`Transcriptions <pipecat.transcriptions>`
* :mod:`Utils <pipecat.utils>`

.. toctree::
   :maxdepth: 3
   :caption: API Reference
   :hidden:

   Audio <api/pipecat.audio>
   Clocks <api/pipecat.clocks>
   Frames <api/pipecat.frames>
   Metrics <api/pipecat.metrics>
   Pipeline <api/pipecat.pipeline>
   Processors <api/pipecat.processors>
   Serializers <api/pipecat.serializers>
   Services <api/pipecat.services>
   Sync <api/pipecat.sync>
   Transcriptions <api/pipecat.transcriptions>
   Transports <api/pipecat.transports>
   Utils <api/pipecat.utils>

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

docs/api/make.bat Normal file

@@ -0,0 +1,35 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd

docs/api/requirements.txt Normal file

@@ -0,0 +1,40 @@
# Sphinx dependencies
sphinx>=8.1.3
sphinx-rtd-theme
sphinx-markdown-builder
sphinx-autodoc-typehints
toml
# Install all extras individually to ensure they're properly resolved
pipecat-ai[anthropic]
pipecat-ai[assemblyai]
pipecat-ai[aws]
pipecat-ai[azure]
pipecat-ai[canonical]
pipecat-ai[cartesia]
pipecat-ai[daily]
pipecat-ai[deepgram]
pipecat-ai[elevenlabs]
pipecat-ai[fal]
pipecat-ai[fireworks]
pipecat-ai[gladia]
pipecat-ai[google]
pipecat-ai[grok]
pipecat-ai[groq]
# pipecat-ai[krisp] # Mocked instead
pipecat-ai[langchain]
pipecat-ai[livekit]
pipecat-ai[lmnt]
pipecat-ai[local]
pipecat-ai[moondream]
pipecat-ai[nim]
pipecat-ai[noisereduce]
pipecat-ai[openai]
# pipecat-ai[openpipe]
# pipecat-ai[playht] # Mocked due to grpcio conflict with riva
pipecat-ai[riva]
pipecat-ai[silero]
pipecat-ai[simli]
pipecat-ai[soundfile]
pipecat-ai[websocket]
pipecat-ai[whisper]

docs/api/rtd-test.sh Executable file

@@ -0,0 +1,38 @@
#!/bin/bash
set -e
# Configuration
DOCS_DIR=$(pwd)
PROJECT_ROOT=$(cd ../../ && pwd)
TEST_DIR="/tmp/rtd-test-$(date +%Y%m%d_%H%M%S)"
echo "Creating test directory: $TEST_DIR"
mkdir -p "$TEST_DIR"
cd "$TEST_DIR"
# Create virtual environment
python -m venv venv
source venv/bin/activate
echo "Installing build dependencies..."
pip install --upgrade pip wheel setuptools
echo "Installing documentation dependencies..."
pip install -r "$DOCS_DIR/requirements.txt"
echo "Building documentation..."
cd "$DOCS_DIR"
sphinx-build -b html . "_build/html"
echo "Build complete. Check _build/html directory for output."
# Print summary
echo -e "\n=== Build Summary ==="
echo "Documentation: $DOCS_DIR/_build/html"
echo "Test environment: $TEST_DIR"
echo -e "\nTo view the documentation:"
echo "open $DOCS_DIR/_build/html/index.html"
# Print installed packages for verification
echo -e "\n=== Installed Packages ==="
pip freeze | grep -E "sphinx|pipecat"

docs/frame.md Normal file

@@ -0,0 +1,110 @@
# Understanding Different Frame Types in the Pipecat System
In the Pipecat system, frames are used to represent different types of data and control signals that flow through the pipeline. Understanding these frame types is crucial for working with the system effectively. This tutorial will cover the main categories of frames and their specific uses.
## 1. Base Frame Classes
### Frame
The `Frame` class is the base class for all frames. It includes:
- `id`: A unique identifier
- `name`: A descriptive name
- `pts`: Presentation timestamp (optional)
### DataFrame
`DataFrame` is a subclass of `Frame` and serves as a base for most data-carrying frames.
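Here is a simplified sketch of what such a base class can look like, using Python dataclasses. The shared id counter, the `ClassName#N` naming scheme, and the `TextFrame` default are illustrative assumptions, not Pipecat's actual source. Check `pipecat.frames.frames` for the real definitions.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_ids = itertools.count(1)  # process-wide counter for unique frame ids (sketch only)


@dataclass
class Frame:
    id: int = field(init=False)
    name: str = field(init=False)
    pts: Optional[int] = field(init=False, default=None)  # presentation timestamp, if any

    def __post_init__(self):
        # Every frame gets a unique id and a readable name when it is created.
        self.id = next(_ids)
        self.name = f"{type(self).__name__}#{self.id}"


@dataclass
class DataFrame(Frame):
    """Base for frames that carry data through the pipeline."""


@dataclass
class TextFrame(DataFrame):
    text: str = ""
```

Because `id`, `name`, and `pts` are declared with `init=False`, subclasses add their own constructor fields (like `text`) without repeating the base bookkeeping.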
## 2. Audio Frames
### AudioRawFrame
Represents a chunk of audio with properties:
- `audio`: Raw audio data
- `sample_rate`: Audio sample rate
- `num_channels`: Number of audio channels
Subclasses include:
- `InputAudioRawFrame`: For audio from input sources
- `OutputAudioRawFrame`: For audio to be played by output devices
- `TTSAudioRawFrame`: For audio generated by Text-to-Speech services
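An audio frame carries raw samples plus `sample_rate` and `num_channels`, so its size and duration follow directly once you know the sample width. The sketch assumes interleaved 16-bit (2-byte) PCM, which is an assumption for this example, and its function names are hypothetical. It splits a buffer into fixed-duration chunks of the kind that typically travel through a pipeline as successive audio frames.

```python
def bytes_per_chunk(sample_rate: int, num_channels: int, chunk_ms: int, sample_width: int = 2) -> int:
    """Bytes in one chunk of `chunk_ms` milliseconds of interleaved PCM."""
    samples_per_channel = sample_rate * chunk_ms // 1000
    return samples_per_channel * num_channels * sample_width


def chunk_audio(
    audio: bytes, sample_rate: int, num_channels: int, chunk_ms: int = 20, sample_width: int = 2
) -> list[bytes]:
    """Split raw PCM into fixed-duration chunks; the last chunk may be shorter."""
    size = bytes_per_chunk(sample_rate, num_channels, chunk_ms, sample_width)
    return [audio[i : i + size] for i in range(0, len(audio), size)]


def duration_seconds(audio: bytes, sample_rate: int, num_channels: int, sample_width: int = 2) -> float:
    """Duration implied by byte length, sample rate, channel count and sample width."""
    return len(audio) / (sample_rate * num_channels * sample_width)
```

For instance, 20 ms of 16 kHz mono 16-bit audio is 16000 × 0.02 × 1 × 2 = 640 bytes, so one second of such audio splits into 50 chunks.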
## 3. Image Frames
### ImageRawFrame
Represents an image with properties:
- `image`: Raw image data
- `size`: Image dimensions
- `format`: Image format (e.g., JPEG, PNG)
Subclasses include:
- `InputImageRawFrame`: For images from input sources
- `OutputImageRawFrame`: For images to be displayed
- `UserImageRawFrame`: For images associated with a specific user
- `VisionImageRawFrame`: For images with associated text for description
- `URLImageRawFrame`: For images with an associated URL
### SpriteFrame
Represents an animated sprite, containing a list of `ImageRawFrame` objects.
## 4. Text and Transcription Frames
### TextFrame
Represents a chunk of text, used for various purposes in the pipeline.
### TranscriptionFrame
A specialized `TextFrame` for speech transcriptions, including:
- `user_id`: ID of the speaking user
- `timestamp`: When the transcription was generated
- `language`: Detected language of the speech
### InterimTranscriptionFrame
Similar to `TranscriptionFrame`, but for interim (not final) transcriptions.
## 5. LLM (Large Language Model) Frames
### LLMMessagesFrame
Contains a list of messages for an LLM service to process.
### LLMMessagesAppendFrame and LLMMessagesUpdateFrame
Used to modify the current context of LLM messages.
### LLMSetToolsFrame
Specifies tools (functions) available for the LLM to use.
### LLMEnablePromptCachingFrame
Controls prompt caching in certain LLMs.
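The message-list frames differ mainly in what they do to the existing context. The sketch below assumes the OpenAI-style `{"role": ..., "content": ...}` dictionaries used in Pipecat's OpenAI examples. It contrasts appending to the history, as `LLMMessagesAppendFrame` does, with replacing it, as `LLMMessagesUpdateFrame` does. The `Context` class is a hypothetical stand-in, not Pipecat's `OpenAILLMContext`.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical stand-in for an LLM context holding chat messages."""

    messages: list[dict] = field(default_factory=list)

    def append(self, new_messages: list[dict]) -> None:
        """Append semantics: keep the history and add the new messages."""
        self.messages.extend(new_messages)

    def update(self, new_messages: list[dict]) -> None:
        """Update semantics: replace the whole history (copying the list)."""
        self.messages = list(new_messages)


ctx = Context([{"role": "system", "content": "You are a helpful voice assistant."}])
ctx.append([{"role": "user", "content": "Hello!"}])
print(len(ctx.messages))  # 2
ctx.update([{"role": "system", "content": "Answer in one sentence."}])
print(len(ctx.messages))  # 1
```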
## 6. System and Control Frames
### SystemFrame
Base class for system-level frames.
Important system frames include:
- `StartFrame`: Initiates a pipeline
- `CancelFrame`: Stops a pipeline immediately
- `ErrorFrame`: Notifies of errors (with `FatalErrorFrame` for unrecoverable errors)
- `EndTaskFrame` and `CancelTaskFrame`: Control pipeline tasks
- `StartInterruptionFrame` and `StopInterruptionFrame`: Indicate user speech for interruptions
### ControlFrame
Base class for control-flow frames.
Notable control frames:
- `EndFrame`: Signals the end of a pipeline
- `LLMFullResponseStartFrame` and `LLMFullResponseEndFrame`: Bracket LLM responses
- `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame`: Indicate user speech activity
- `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame`: Indicate bot speech activity
- `TTSStartedFrame` and `TTSStoppedFrame`: Bracket Text-to-Speech responses
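One way to picture these families is as separate branches of a class hierarchy that a processor can dispatch on. The sketch uses a tiny hypothetical hierarchy that mirrors the names above; these are not Pipecat's real classes. The rule that system frames are handled immediately while other frames wait in order is a simplifying assumption for illustration.

```python
from collections import deque


class Frame: ...
class SystemFrame(Frame): ...
class ControlFrame(Frame): ...
class DataFrame(Frame): ...


class StartFrame(SystemFrame): ...
class CancelFrame(SystemFrame): ...
class EndFrame(ControlFrame): ...
class TextFrame(DataFrame): ...


def category(frame: Frame) -> str:
    """Name the frame family by checking its base classes."""
    if isinstance(frame, SystemFrame):
        return "system"
    if isinstance(frame, ControlFrame):
        return "control"
    if isinstance(frame, DataFrame):
        return "data"
    return "unknown"


class ToyProcessor:
    """Handles system frames right away and queues everything else in arrival order."""

    def __init__(self) -> None:
        self.handled: list[str] = []
        self.queue: deque = deque()

    def push(self, frame: Frame) -> None:
        if isinstance(frame, SystemFrame):
            self.handled.append(type(frame).__name__)  # bypasses the queue
        else:
            self.queue.append(frame)

    def drain(self) -> None:
        while self.queue:
            self.handled.append(type(self.queue.popleft()).__name__)
```

Under this assumption, a `CancelFrame` pushed after queued text is still handled first, which is why cancellation can stop a pipeline "immediately".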
## 7. Special Purpose Frames
### MetricsFrame
Contains performance metrics data.
### FunctionCallInProgressFrame and FunctionCallResultFrame
Used for handling LLM function (tool) calls.
### ServiceUpdateSettingsFrame
Base class for updating service settings, with specific subclasses for LLM, TTS, and STT services.
## Conclusion
Understanding these frame types is essential for working with the Pipecat system. Each frame type serves a specific purpose in the pipeline, whether it's carrying data (like audio or images), controlling the flow of the pipeline, or managing system-level operations. By using the appropriate frame types, you can effectively process and transmit various kinds of information through your pipeline.


@@ -52,4 +52,32 @@ OPENPIPE_API_KEY=...
# Tavus
TAVUS_API_KEY=...
TAVUS_REPLICA_ID=...
TAVUS_PERSONA_ID=...
# Simli
SIMLI_API_KEY=...
SIMLI_FACE_ID=...
# Krisp
KRISP_MODEL_PATH=...
# DeepSeek
DEEPSEEK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Grok
GROK_API_KEY=...
# Together.ai
TOGETHER_API_KEY=...
# Cerebras
CEREBRAS_API_KEY=...
# Fish Audio
FISH_API_KEY=...
# AssemblyAI
ASSEMBLYAI_API_KEY=...
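Only the keys for the services you actually use need values. Below is a minimal sketch for failing fast when one is missing, using only `os.environ`. In the examples, `load_dotenv(override=True)` from `python-dotenv` first populates the environment from `.env`. The service-to-variable mapping is an assumption assembled from the template entries above, and `missing_env`/`require_env` are hypothetical helpers.

```python
import os

# Hypothetical mapping from service to the variables it reads (taken from the template above).
REQUIRED_KEYS = {
    "tavus": ["TAVUS_API_KEY", "TAVUS_REPLICA_ID", "TAVUS_PERSONA_ID"],
    "simli": ["SIMLI_API_KEY", "SIMLI_FACE_ID"],
    "deepseek": ["DEEPSEEK_API_KEY"],
    "assemblyai": ["ASSEMBLYAI_API_KEY"],
}


def missing_env(services: list[str], environ=os.environ) -> list[str]:
    """Return required variables that are unset, empty, or still the '...' placeholder."""
    missing = []
    for service in services:
        for key in REQUIRED_KEYS.get(service, []):
            value = environ.get(key, "").strip()
            if not value or value == "...":
                missing.append(key)
    return missing


def require_env(services: list[str]) -> None:
    """Raise early with a clear message instead of failing mid-conversation."""
    missing = missing_env(services)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```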


@@ -42,6 +42,7 @@ Next, follow the steps in the README for each demo.
| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [studypal](studypal) | A chatbot to have a conversation about any article on the web | |
| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities | `python-websockets`, `openai`, `deepgram`, `silero-tts`, `numpy` |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -15,7 +15,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -102,7 +102,6 @@ async def main():
audio_buffer_processor=audio_buffer_processor,
aiohttp_session=session,
api_key=os.getenv("CANONICAL_API_KEY"),
call_id=str(uuid.uuid4()),
assistant="pipecat-chatbot",
assistant_speaks_first=True,
@@ -125,7 +124,7 @@ async def main():
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,22 +1,24 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import datetime
import io
import os
import sys
import wave
import aiofiles
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -32,15 +34,17 @@ logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def save_audio(audio: bytes, sample_rate: int, num_channels: int):
    if len(audio) > 0:
        filename = f"conversation_recording{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.wav"
        # Build the WAV file in memory, then write it asynchronously.
        with io.BytesIO() as buffer:
            with wave.open(buffer, "wb") as wf:
                wf.setsampwidth(2)
                wf.setnchannels(num_channels)
                wf.setframerate(sample_rate)
                wf.writeframes(audio)
            async with aiofiles.open(filename, "wb") as file:
                await file.write(buffer.getvalue())
        print(f"Merged audio saved to {filename}")
    else:
        print("No audio data to save")
@@ -106,7 +110,9 @@ async def main():
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# Save audio every 10 seconds.
audiobuffer = AudioBufferProcessor(buffer_size=480000)
pipeline = Pipeline(
[
transport.input(), # microphone
@@ -121,16 +127,19 @@ async def main():
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.queue_frame(EndFrame())
runner = PipelineRunner()
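The `save_audio` coroutine in this example wraps raw PCM in a WAV header in memory before writing it with `aiofiles`. The synchronous sketch below isolates that step and reads the result back, which is a quick way to sanity-check a recording's parameters. The function names are hypothetical. The 24 kHz mono figures used in the checks are assumptions: `buffer_size=480000` bytes equals 10 seconds only when the audio is 16-bit PCM at 48,000 bytes/s (for example 24,000 Hz × 1 channel × 2 bytes). At other sample rates or channel counts the save interval changes.

```python
import io
import wave


def pcm_to_wav_bytes(audio: bytes, sample_rate: int, num_channels: int, sample_width: int = 2) -> bytes:
    """Wrap raw interleaved PCM in a WAV container, entirely in memory."""
    with io.BytesIO() as buffer:
        with wave.open(buffer, "wb") as wf:
            wf.setsampwidth(sample_width)
            wf.setnchannels(num_channels)
            wf.setframerate(sample_rate)
            wf.writeframes(audio)
        # wave does not close a file object it did not open, so the buffer is still readable.
        return buffer.getvalue()


def wav_info(wav_bytes: bytes) -> tuple[int, int, float]:
    """Read back (sample_rate, num_channels, duration_seconds) from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnchannels(), wf.getnframes() / rate
```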


@@ -1,3 +1,4 @@
aiofiles
python-dotenv
fastapi[all]
uvicorn


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,22 +1,21 @@
import argparse
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -76,7 +75,7 @@ async def main(room_url: str, token: str):
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):

@@ -1,29 +1,27 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import argparse
import subprocess
import os
import subprocess
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, HTTPException
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomObject,
DailyRoomProperties,
DailyRoomParams,
DailyRoomProperties,
)
from dotenv import load_dotenv
load_dotenv(override=True)

@@ -0,0 +1,91 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
*.egg
.installed.cfg
.eggs/
downloads/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
MANIFEST
# Virtual Environments
venv/
env/
.env
.venv/
ENV/
env.bak/
venv.bak/
# IDE
.idea/
.vscode/
.spyderproject
.spyproject
.ropeproject
# Testing and Coverage
.coverage
.coverage.*
htmlcov/
.pytest_cache/
.tox/
.nox/
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
cover/
# Logs and Databases
*.log
*.db
db.sqlite3
db.sqlite3-journal
pip-log.txt
# System Files
.DS_Store
Thumbs.db
desktop.ini
*.swp
*.swo
*.bak
*.tmp
*~
# Build and Documentation
docs/_build/
.pybuilder/
target/
instance/
.webassets-cache
.pdm.toml
.pdm-python
.pdm-build/
__pypackages__/
# Other
*.mo
*.pot
*.sage.py
.mypy_cache/
.dmypy.json
dmypy.json
.pyre/
.pytype/
cython_debug/
.ipynb_checkpoints

@@ -0,0 +1,37 @@
# Deploying Pipecat to Modal.com
Barebones deployment example for [modal.com](https://www.modal.com)
1. Install dependencies
```bash
python -m venv venv
source venv/bin/activate # or OS equivalent
pip install -r requirements.txt
```
2. Set up .env
```bash
cp env.example .env
```
Alternatively, you can configure your Modal app to use [secrets](https://modal.com/docs/guide/secrets)
3. Test the app in development mode (`modal serve` runs an ephemeral app that reloads when you edit local files)
```bash
modal serve app.py
```
4. Deploy to production
```bash
modal deploy app.py
```
## Configuration options
This app sets some sensible defaults for reducing cold starts, such as `keep_warm=1`, which keeps at least one warm instance ready for your bot function.
It also sets `max_inputs=1`, so each container handles a single request and is never reused, because each user requires their own running function.

@@ -0,0 +1,74 @@
import os

import aiohttp
import modal
from bot import _voice_bot_process
from fastapi import HTTPException
from fastapi.responses import JSONResponse
from loguru import logger

MAX_SESSION_TIME = 15 * 60  # 15 minutes

app = modal.App("pipecat-modal")

image = modal.Image.debian_slim(python_version="3.12").pip_install_from_requirements(
    "requirements.txt"
)


@app.function(
    image=image,
    cpu=1.0,
    secrets=[modal.Secret.from_dotenv()],
    keep_warm=1,
    enable_memory_snapshot=True,
    max_inputs=1,  # Do not reuse instances across requests
    retries=0,
)
def launch_bot_process(room_url: str, token: str):
    _voice_bot_process(room_url, token)


@app.function(
    image=image,
    secrets=[modal.Secret.from_dotenv()],
)
@modal.web_endpoint(method="POST")
async def start():
    from pipecat.transports.services.helpers.daily_rest import (
        DailyRESTHelper,
        DailyRoomParams,
    )

    logger.info("Request received")

    async with aiohttp.ClientSession() as session:
        daily_rest_helper = DailyRESTHelper(
            daily_api_key=os.getenv("DAILY_API_KEY", ""),
            daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
            aiohttp_session=session,
        )

        # Create new Daily room
        room = await daily_rest_helper.create_room(DailyRoomParams())
        if not room.url:
            raise HTTPException(
                status_code=500,
                detail="Unable to create room",
            )
        logger.info(f"Created room: {room.url}")

        # Create bot token for room
        token = await daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
        if not token:
            raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
        logger.info(f"Bot token created: {token}")

        # Spawn a new bot process
        launch_bot_process.spawn(room_url=room.url, token=token)

        # Return room URL to the user to join
        # Note: in production, you would want to return a token to the user
        return JSONResponse(content={"room_url": room.url, "token": token})
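In a Python dict literal, an unquoted key is an expression. A literal like `{"room_url": room.url, token: token}` therefore stores the entry under the token's value, and a client looking up the `"token"` key won't find it. Below is a minimal sketch of the corrected payload. `build_start_response` is a hypothetical helper, and the URL and token are placeholders.

```python
def build_start_response(room_url: str, token: str) -> dict[str, str]:
    # The key must be the string "token"; a bare `token:` key would use the token's value as the key.
    return {"room_url": room_url, "token": token}


if __name__ == "__main__":
    token = "abc123"
    buggy = {"room_url": "https://example.daily.co/room", token: token}
    print(buggy)  # {'room_url': 'https://example.daily.co/room', 'abc123': 'abc123'}
    fixed = build_start_response("https://example.daily.co/room", token)
    print(fixed["token"])  # abc123
```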

@@ -0,0 +1,90 @@
import asyncio
import os
import sys

from dotenv import load_dotenv
from loguru import logger

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main(room_url: str, token: str):
    from pipecat.audio.vad.silero import SileroVADAnalyzer
    from pipecat.frames.frames import EndFrame
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineParams, PipelineTask
    from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
    from pipecat.services.cartesia import CartesiaTTSService
    from pipecat.services.openai import OpenAILLMService
    from pipecat.transports.services.daily import DailyParams, DailyTransport

    transport = DailyTransport(
        room_url,
        token,
        "bot",
        DailyParams(
            audio_out_enabled=True,
            transcription_enabled=True,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
    )

    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

    messages = [
        {
            "role": "system",
            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
        },
    ]

    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(
        pipeline,
        PipelineParams(
            allow_interruptions=True,
            enable_metrics=True,
            enable_usage_metrics=True,
            report_only_initial_ttfb=True,
        ),
    )

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await transport.capture_participant_transcription(participant["id"])
        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        await task.queue_frame(EndFrame())

    runner = PipelineRunner()

    await runner.run(task)


def _voice_bot_process(room_url: str, token: str):
    asyncio.run(main(room_url, token))
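`launch_bot_process` in app.py is a plain `def`, so bot.py bridges into asyncio through `_voice_bot_process`, which hands `main()` to `asyncio.run`. Below is a minimal sketch of that sync-to-async bridge with a stand-in coroutine. `fake_main` and `voice_bot_process` are hypothetical, and unlike the original, this version returns the coroutine's result so it can be checked.

```python
import asyncio


async def fake_main(room_url: str, token: str) -> str:
    # Stand-in for bot.py's main(); the real one builds and runs a Pipecat pipeline.
    await asyncio.sleep(0)
    return f"bot finished in {room_url}"


def voice_bot_process(room_url: str, token: str) -> str:
    # asyncio.run() creates a fresh event loop, runs the coroutine to completion,
    # then closes the loop. It raises RuntimeError if a loop is already running in this thread.
    return asyncio.run(fake_main(room_url, token))


if __name__ == "__main__":
    print(voice_bot_process("https://example.daily.co/room", "placeholder-token"))
```

Creating and closing one event loop per call fits a setup where each container runs a single bot (`max_inputs=1`).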

@@ -0,0 +1,3 @@
DAILY_API_KEY=
OPENAI_API_KEY=
CARTESIA_API_KEY=

@@ -0,0 +1,5 @@
python-dotenv==1.0.1
modal==0.71.3
pipecat-ai[daily,silero,cartesia,openai]==0.0.52
fastapi==0.115.6
aiohttp==3.11.11

@@ -1,21 +1,20 @@
import argparse
import asyncio
import os
import sys
import argparse
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyDialinSettings
from loguru import logger
from dotenv import load_dotenv
from pipecat.transports.services.daily import DailyDialinSettings, DailyParams, DailyTransport
load_dotenv(override=True)
@@ -82,7 +81,7 @@ async def main(room_url: str, token: str, callId: str, callDomain: str):
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):

@@ -7,14 +7,14 @@ provisioning a room and starting a Pipecat bot in response.
Refer to README for more information.
"""
import aiohttp
import os
import argparse
import os
import subprocess
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, HTTPException
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, PlainTextResponse
from twilio.twiml.voice_response import VoiceResponse
@@ -22,13 +22,11 @@ from twilio.twiml.voice_response import VoiceResponse
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomObject,
DailyRoomParams,
DailyRoomProperties,
DailyRoomSipParams,
DailyRoomParams,
)
from dotenv import load_dotenv
load_dotenv(override=True)

@@ -1,24 +1,22 @@
import argparse
import asyncio
import os
import sys
import argparse
from dotenv import load_dotenv
from loguru import logger
from twilio.rest import Client
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from twilio.rest import Client
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -85,7 +83,7 @@ async def main(room_url: str, token: str, callId: str, sipUri: str):
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):

@@ -1,26 +1,24 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from loguru import logger
from dotenv import load_dotenv
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -36,7 +34,7 @@ async def main():
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
)
tts = CartesiaHttpTTSService(
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
@@ -50,12 +48,9 @@ async def main():
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant.get("info", {}).get("userName", "")
await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))
# Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await task.queue_frames(
[TTSSpeakFrame(f"Hello there, {participant_name}!"), EndFrame()]
)
await runner.run(task)
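Both say-one-thing examples now queue `EndFrame()` right after the `TTSSpeakFrame` instead of waiting for `on_participant_left`. This lets the task shut down once the greeting has been spoken; Pipecat's `EndFrame` docstring describes it as a control frame received in the order it was sent. The local-audio variant runs the pipeline and the producer concurrently with `asyncio.gather`. Below is a toy model of that producer/consumer shutdown. `SpeakFrame`, `EndFrame`, and `run_pipeline` are hypothetical stand-ins, not Pipecat's frames or runner.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SpeakFrame:
    text: str


class EndFrame:
    pass


async def run_pipeline(queue: asyncio.Queue, spoken: list[str]) -> None:
    # Consume frames in order until an EndFrame arrives, then return (the "task" ends).
    while True:
        frame = await queue.get()
        if isinstance(frame, EndFrame):
            return
        spoken.append(frame.text)


async def say_something(queue: asyncio.Queue) -> None:
    await asyncio.sleep(0.01)  # the example waits 1 s; shortened here
    for frame in (SpeakFrame("Hello there, how is it going!"), EndFrame()):
        await queue.put(frame)


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    spoken: list[str] = []
    # gather() returns once both the consumer and the producer have finished.
    await asyncio.gather(run_pipeline(queue, spoken), say_something(queue))
    return spoken


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the trailing `EndFrame`, `run_pipeline` would wait on `queue.get()` forever and `gather` would never return. The earlier versions of these examples avoided that by queueing `EndFrame` from the participant-left handler instead.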

@@ -1,15 +1,17 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import TextFrame
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
@@ -17,10 +19,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -28,25 +26,24 @@ logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
pipeline = Pipeline([tts, transport.output()])
pipeline = Pipeline([tts, transport.output()])
task = PipelineTask(pipeline)
task = PipelineTask(pipeline)
async def say_something():
await asyncio.sleep(1)
await task.queue_frame(TextFrame("Hello there!"))
async def say_something():
await asyncio.sleep(1)
await task.queue_frames([TTSSpeakFrame("Hello there, how is it going!"), EndFrame()])
runner = PipelineRunner()
runner = PipelineRunner()
await asyncio.gather(runner.run(task), say_something())
await asyncio.gather(runner.run(task), say_something())
if __name__ == "__main__":

@@ -4,6 +4,9 @@ import os
import sys
import aiohttp
from dotenv import load_dotenv
from livekit import api
from loguru import logger
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -12,12 +15,6 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.livekit import LiveKitParams, LiveKitTransport
from livekit import api
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -0,0 +1,54 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.riva import FastPitchTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, _) = await configure(session)

        transport = DailyTransport(
            room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
        )

        tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

        runner = PipelineRunner()

        task = PipelineTask(Pipeline([tts, transport.output()]))

        # Register an event handler so we can play the audio when the
        # participant joins.
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            participant_name = participant.get("info", {}).get("userName", "")
            await task.queue_frames([TTSSpeakFrame(f"Aloha, {participant_name}!"), EndFrame()])

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

@@ -1,28 +1,26 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -37,7 +35,7 @@ async def main():
room_url, None, "Say One Thing From an LLM", DailyParams(audio_out_enabled=True)
)
tts = CartesiaHttpTTSService(
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
@@ -57,11 +55,7 @@ async def main():
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await task.queue_frame(LLMMessagesFrame(messages))
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
await runner.run(task)

@@ -1,14 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -16,12 +20,6 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,16 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import tkinter as tk
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -19,10 +21,6 @@ from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -8,27 +8,24 @@
# This example is broken on latest pipecat and needs updating.
#
import aiohttp
import asyncio
import os
import sys
from pipecat.pipeline.merge_pipeline import SequentialMergePipeline
from pipecat.pipeline.pipeline import Pipeline
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.merge_pipeline import SequentialMergePipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.services.azure import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.transport_services import TransportServiceOutput
from pipecat.services.transports.daily_transport import DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,18 +1,21 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from dataclasses import dataclass
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import (
AppFrame,
DataFrame,
Frame,
LLMFullResponseStartFrame,
LLMMessagesFrame,
@@ -22,19 +25,13 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.fal import FalImageGenService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -42,7 +39,7 @@ logger.add(sys.stderr, level="DEBUG")
@dataclass
class MonthFrame(AppFrame):
class MonthFrame(DataFrame):
month: str
def __str__(self):
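This hunk changes `MonthFrame`'s base class from `AppFrame` to `DataFrame`; the diff doesn't give the reason. Below is a minimal sketch with hypothetical stand-in classes, not Pipecat's definitions. It shows the dataclass pattern involved: the subclass adds a field, overrides `__str__`, and still passes `isinstance` checks against every base class.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Stand-in base class (assumed; not Pipecat's Frame)."""


@dataclass
class DataFrame(Frame):
    """Stand-in intermediate class (assumed; not Pipecat's DataFrame)."""


@dataclass
class MonthFrame(DataFrame):
    month: str

    # @dataclass generates __init__ and __repr__ but leaves this custom __str__ in place.
    def __str__(self) -> str:
        return f"{type(self).__name__}(month: {self.month})"


if __name__ == "__main__":
    frame = MonthFrame("January")
    print(frame)  # MonthFrame(month: January)
    print(isinstance(frame, DataFrame), isinstance(frame, Frame))  # True True
```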

@@ -1,23 +1,25 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import tkinter as tk
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
Frame,
LLMMessagesFrame,
OutputAudioRawFrame,
TextFrame,
TTSAudioRawFrame,
URLImageRawFrame,
LLMMessagesFrame,
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -26,15 +28,11 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.fal import FalImageGenService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport, TkOutputTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, LLMMessagesFrame, MetricsFrame
from pipecat.frames.frames import EndFrame, Frame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
ProcessingMetricsData,
@@ -113,7 +113,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()
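The handlers in these examples are registered by event name with `@transport.event_handler(...)` and receive the transport as their first argument. Below is a toy registry illustrating that decorator pattern. The `EventEmitter` class and its `emit` method are hypothetical, not Pipecat's implementation. The demo replays the join/leave sequence these hunks add: joining queues the context frame, and leaving queues an `EndFrame`.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class EventEmitter:
    """Toy stand-in for a transport's event-handler registry."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def event_handler(self, event_name: str) -> Callable[[Handler], Handler]:
        # Decorator factory: registers the coroutine under event_name and returns it unchanged.
        def decorator(handler: Handler) -> Handler:
            self._handlers[event_name].append(handler)
            return handler

        return decorator

    async def emit(self, event_name: str, *args: Any) -> None:
        # Handlers get the emitter itself as their first argument, like `transport` above.
        for handler in self._handlers[event_name]:
            await handler(self, *args)


async def demo() -> list[str]:
    transport = EventEmitter()
    queued: list[str] = []

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        queued.append(f"context frame for {participant['id']}")

    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        queued.append("EndFrame")

    await transport.emit("on_first_participant_joined", {"id": "p1"})
    await transport.emit("on_participant_left", {"id": "p1"}, "leftCall")
    return queued


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['context frame for p1', 'EndFrame']
```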

@@ -1,18 +1,21 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, OutputImageRawFrame, SystemFrame, TextFrame
from pipecat.frames.frames import EndFrame, Frame, OutputImageRawFrame, SystemFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
@@ -20,14 +23,7 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyTransport
from pipecat.transports.services.daily import DailyParams
from runner import configure
from loguru import logger
from dotenv import load_dotenv
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -130,6 +126,10 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)

@@ -1,30 +1,28 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.audio.vad.silero import SileroVAD
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.vad.silero import SileroVAD
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -92,7 +90,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,16 +1,20 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -19,12 +23,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -90,7 +88,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -78,13 +78,25 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -9,9 +9,17 @@ import os
import sys
import aiohttp
from dotenv import load_dotenv
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -23,18 +31,6 @@ from pipecat.processors.frameworks.langchain import LangchainProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
@@ -105,7 +101,15 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
@@ -118,6 +122,10 @@ async def main():
messages = [({"content": "Please briefly introduce yourself to the user."})]
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)

@@ -0,0 +1,117 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from deepgram import LiveOptions
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.frames.frames import (
    BotInterruptionFrame,
    EndFrame,
    StopInterruptionFrame,
    UserStartedSpeakingFrame,
    UserStoppedSpeakingFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, _) = await configure(session)

        transport = DailyTransport(
            room_url,
            None,
            "Respond bot",
            DailyParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
            ),
        )

        stt = DeepgramSTTService(
            api_key=os.getenv("DEEPGRAM_API_KEY"),
            live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
        )

        tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")

        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                stt,  # STT
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

        @stt.event_handler("on_speech_started")
        async def on_speech_started(stt, *args, **kwargs):
            await task.queue_frames([BotInterruptionFrame(), UserStartedSpeakingFrame()])

        @stt.event_handler("on_utterance_end")
        async def on_utterance_end(stt, *args, **kwargs):
            await task.queue_frames([StopInterruptionFrame(), UserStoppedSpeakingFrame()])

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -31,11 +31,11 @@ logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
token,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
@@ -73,13 +73,25 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,14 +11,14 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -88,7 +88,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -0,0 +1,105 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.services.playht import PlayHTHttpTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = PlayHTHttpTTSService(
            user_id=os.getenv("PLAYHT_USER_ID"),
            api_key=os.getenv("PLAYHT_API_KEY"),
            voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
        )

        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -49,7 +49,7 @@ async def main():
tts = PlayHTTTSService(
user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/801a663f-efd0-4254-98d0-5c175514c3e8/jennifer/manifest.json",
voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
params=PlayHTTTSService.InputParams(language=Language.EN),
)
@@ -91,7 +91,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,16 +1,20 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -18,13 +22,6 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.azure import AzureLLMService, AzureSTTService, AzureTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -85,14 +82,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,14 +11,14 @@ import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService, OpenAITTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -70,14 +70,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,16 +1,21 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import time
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -19,13 +24,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openpipe import OpenPipeLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
import time
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -82,14 +80,26 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,16 +1,20 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -19,12 +23,6 @@ from pipecat.services.openai import OpenAILLMService
from pipecat.services.xtts import XTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -50,7 +48,6 @@ async def main():
tts = XTTSService(
aiohttp_session=session,
voice_id="Claribel Dervla",
language="en",
base_url="http://localhost:8000",
)
@@ -77,14 +74,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -79,14 +79,22 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
# Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")

@@ -1,16 +1,20 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -19,12 +23,6 @@ from pipecat.services.lmnt import LmntTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -73,14 +71,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -90,7 +90,10 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True, enable_metrics=True, enable_usage_metrics=True
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@@ -98,7 +101,11 @@ async def main():
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,12 +14,12 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.aws import AWSTTSService
from pipecat.services.aws import PollyTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -32,11 +32,11 @@ logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
token,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
@@ -48,12 +48,12 @@ async def main():
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = AWSTTSService(
tts = PollyTTSService(
api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region=os.getenv("AWS_REGION"),
voice_id="Amy",
params=AWSTTSService.InputParams(engine="neural", language="en-GB", rate="1.05"),
params=PollyTTSService.InputParams(engine="neural", language="en-GB", rate="1.05"),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
@@ -80,14 +80,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -22,6 +22,7 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.google import GoogleTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -32,11 +33,11 @@ logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
token,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
@@ -50,8 +51,8 @@ async def main():
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = GoogleTTSService(
voice_id="en-US-Neural2-J",
params=GoogleTTSService.InputParams(language="en-US", rate="1.05"),
voice_id="en-US-Journey-F",
params=GoogleTTSService.InputParams(language=Language.EN_US),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
@@ -78,14 +79,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -79,14 +79,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()

@@ -0,0 +1,104 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.filters.krisp_filter import KrispFilter
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
                audio_in_filter=KrispFilter(),
            ),
        )

        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

        tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")

        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                stt,  # STT
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

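The `@transport.event_handler(...)` decorators in these examples register coroutines that the transport awaits when a participant joins or leaves. The sketch below is a minimal illustration of that registration pattern. `EventEmitter` is a hypothetical stand-in, not pipecat's transport implementation.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List


class EventEmitter:
    """Hypothetical stand-in for a transport's event-handler registry."""

    def __init__(self) -> None:
        # Map each event name to the coroutine functions registered for it.
        self._handlers: Dict[str, List[Callable[..., Awaitable[None]]]] = defaultdict(list)

    def event_handler(self, name: str):
        # Decorator factory: @emitter.event_handler("on_x") registers the function.
        def decorator(func):
            self._handlers[name].append(func)
            return func

        return decorator

    async def emit(self, name: str, *args) -> None:
        # Await every handler for this event, passing the emitter first,
        # mirroring the (transport, participant, ...) handler signature.
        for handler in self._handlers[name]:
            await handler(self, *args)


async def demo() -> List[str]:
    transport = EventEmitter()
    log: List[str] = []

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        log.append(f"joined:{participant['id']}")

    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        log.append(f"left:{participant['id']}:{reason}")

    await transport.emit("on_first_participant_joined", {"id": "p1"})
    await transport.emit("on_participant_left", {"id": "p1"}, "leftCall")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch, emitting an event with no registered handlers is a no-op.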
@@ -0,0 +1,104 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.services.rime import RimeHttpTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = RimeHttpTTSService(
            api_key=os.getenv("RIME_API_KEY", ""),
            voice_id="rex",
            params=RimeHttpTTSService.InputParams(reduce_latency=True),
        )

        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

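Each `Pipeline([...])` in these examples lists processors in the order frames travel downstream: transport input, context aggregation, LLM, TTS, then transport output. The toy sketch below illustrates why that order matters. It uses hypothetical `Processor` classes, not pipecat's `FrameProcessor`, and plain strings stand in for frames.

```python
import asyncio
from typing import List, Optional


class Processor:
    """Hypothetical minimal processor: receives a frame and pushes results downstream."""

    def __init__(self) -> None:
        self._next: Optional["Processor"] = None

    def link(self, nxt: "Processor") -> "Processor":
        self._next = nxt
        return nxt

    async def process_frame(self, frame: str) -> None:
        # Default behaviour: pass the frame through unchanged.
        await self.push_frame(frame)

    async def push_frame(self, frame: str) -> None:
        if self._next is not None:
            await self._next.process_frame(frame)


class FakeLLM(Processor):
    async def process_frame(self, frame: str) -> None:
        # Stand-in for an LLM: turn user text into a reply.
        await self.push_frame(f"reply to {frame!r}")


class FakeTTS(Processor):
    async def process_frame(self, frame: str) -> None:
        # Stand-in for TTS: mark the text as "audio".
        await self.push_frame(f"<audio:{frame}>")


class Sink(Processor):
    def __init__(self) -> None:
        super().__init__()
        self.received: List[str] = []

    async def process_frame(self, frame: str) -> None:
        self.received.append(frame)


def build_pipeline(processors: List[Processor]) -> Processor:
    # Link processors in list order, like Pipeline([...]); return the head.
    for upstream, downstream in zip(processors, processors[1:]):
        upstream.link(downstream)
    return processors[0]


async def run(text: str) -> List[str]:
    sink = Sink()
    head = build_pipeline([Processor(), FakeLLM(), FakeTTS(), sink])
    await head.process_frame(text)
    return sink.received
```

Swapping `FakeLLM()` and `FakeTTS()` in the list would "speak" the user's own text before replying. This is the same reason `tts` must sit after `llm` in the real pipelines.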
@@ -0,0 +1,96 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.nim import NimLLMService
from pipecat.services.riva import FastPitchTTSService, ParakeetSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, _) = await configure(session)

        transport = DailyTransport(
            room_url,
            None,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
            ),
        )

        stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY"))

        llm = NimLLMService(
            api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct"
        )

        tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                stt,  # STT
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

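`os.getenv` returns `None` when a variable is unset. A missing `NVIDIA_API_KEY` in the example above would therefore only surface later, as an authentication failure inside each service. A small fail-fast helper is one option. `require_env` is a hypothetical name, not part of pipecat or the example `runner` module.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Used as `api_key = require_env("NVIDIA_API_KEY")` once at startup, the failure points at the missing variable instead of at a downstream service call.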
@@ -0,0 +1,282 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys
from dataclasses import dataclass

import aiohttp
import google.ai.generativelanguage as glm
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
    EndFrame,
    Frame,
    InputAudioRawFrame,
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
    StartInterruptionFrame,
    TextFrame,
    TranscriptionFrame,
    UserStartedSpeakingFrame,
    UserStoppedSpeakingFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.google import GoogleLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


marker = "|----|"

system_message = f"""
You are a helpful LLM in a WebRTC call. Your goals are to be helpful and brief in your responses.
You are expert at transcribing audio to text. You will receive a mixture of audio and text input. When
asked to transcribe what the user said, output an exact, word-for-word transcription.
Your output will be converted to audio so don't include special characters in your answers.
Each time you answer, you should respond in three parts.
1. Transcribe exactly what the user said.
2. Output the separator field '{marker}'.
3. Respond to the user's input in a helpful, creative way using only simple text and punctuation.
Example:
User: How many ounces are in a pound?
You: How many ounces are in a pound?
{marker}
There are 16 ounces in a pound.
"""


@dataclass
class MagicDemoTranscriptionFrame(Frame):
    text: str


class UserAudioCollector(FrameProcessor):
    def __init__(self, context, user_context_aggregator):
        super().__init__()
        self._context = context
        self._user_context_aggregator = user_context_aggregator
        self._audio_frames = []
        self._start_secs = 0.2  # this should match VAD start_secs (hardcoding for now)
        self._user_speaking = False

    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            # We could gracefully handle both audio input and text/transcription input ...
            # but let's leave that as an exercise to the reader. :-)
            return
        if isinstance(frame, UserStartedSpeakingFrame):
            self._user_speaking = True
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self._user_speaking = False
            self._context.add_audio_frames_message(audio_frames=self._audio_frames)
            await self._user_context_aggregator.push_frame(
                self._user_context_aggregator.get_context_frame()
            )
        elif isinstance(frame, InputAudioRawFrame):
            if self._user_speaking:
                self._audio_frames.append(frame)
            else:
                # Append the audio frame to our buffer. Treat the buffer as a ring buffer, dropping the oldest
                # frames as necessary. Assume all audio frames have the same duration.
                self._audio_frames.append(frame)
                frame_duration = len(frame.audio) / 16 * frame.num_channels / frame.sample_rate
                buffer_duration = frame_duration * len(self._audio_frames)
                while buffer_duration > self._start_secs:
                    self._audio_frames.pop(0)
                    buffer_duration -= frame_duration

        await self.push_frame(frame, direction)


class TranscriptExtractor(FrameProcessor):
    def __init__(self, context):
        super().__init__()
        self._context = context
        self._accumulator = ""
        self._processing_llm_response = False
        self._accumulating_transcript = False

    def reset(self):
        self._accumulator = ""
        self._processing_llm_response = False
        self._accumulating_transcript = False

    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMFullResponseStartFrame):
            self._processing_llm_response = True
            self._accumulating_transcript = True
        elif isinstance(frame, TextFrame) and self._processing_llm_response:
            if self._accumulating_transcript:
                text = frame.text
                split_index = text.find(marker)
                if split_index < 0:
                    self._accumulator += frame.text
                    # do not push this frame
                    return
                else:
                    self._accumulating_transcript = False
                    self._accumulator += text[:split_index]
                    frame.text = text[split_index + len(marker) :]
                    await self.push_frame(frame)
                    return
        elif isinstance(frame, LLMFullResponseEndFrame):
            await self.push_frame(MagicDemoTranscriptionFrame(text=self._accumulator.strip()))
            self.reset()

        await self.push_frame(frame, direction)


class TanscriptionContextFixup(FrameProcessor):
    def __init__(self, context):
        super().__init__()
        self._context = context
        self._transcript = "THIS IS A TRANSCRIPT"

    def swap_user_audio(self):
        if not self._transcript:
            return
        message = self._context.messages[-2]
        last_part = message.parts[-1]
        if (
            message.role == "user"
            and last_part.inline_data
            and last_part.inline_data.mime_type == "audio/wav"
        ):
            self._context.messages[-2] = glm.Content(
                role="user", parts=[glm.Part(text=self._transcript)]
            )

    def add_transcript_back_to_inference_output(self):
        if not self._transcript:
            return
        message = self._context.messages[-1]
        last_part = message.parts[-1]
        if message.role == "model" and last_part.text:
            self._context.messages[-1].parts[-1].text += f"\n\n{marker}\n{self._transcript}\n"

    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)

        if isinstance(frame, MagicDemoTranscriptionFrame):
            self._transcript = frame.text
        elif isinstance(frame, LLMFullResponseEndFrame) or isinstance(
            frame, StartInterruptionFrame
        ):
            self.swap_user_audio()
            self.add_transcript_back_to_inference_output()
            self._transcript = ""

        await self.push_frame(frame, direction)


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                # No transcription at all, just audio input to Gemini!
                # transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
            ),
        )

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        )

        llm = GoogleLLMService(
            model="gemini-1.5-flash-latest",
            # model="gemini-exp-1114",
            api_key=os.getenv("GOOGLE_API_KEY"),
        )

        messages = [
            {
                "role": "system",
                "content": system_message,
            },
            {
                "role": "user",
                "content": "Start by saying hello.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)
        audio_collector = UserAudioCollector(context, context_aggregator.user())
        pull_transcript_out_of_llm_output = TranscriptExtractor(context)
        fixup_context_messages = TanscriptionContextFixup(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                audio_collector,
                context_aggregator.user(),  # User responses
                llm,  # LLM
                pull_transcript_out_of_llm_output,
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
                fixup_context_messages,
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

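`UserAudioCollector` keeps roughly `start_secs` of audio from before the VAD fires, so the start of the utterance is not clipped. For 16-bit PCM, a frame lasts `len(audio) / (2 * num_channels * sample_rate)` seconds, because each sample on each channel takes 2 bytes. The expression in the code above instead divides by 16 and multiplies by `num_channels`. For mono audio, that yields one eighth of the true duration, so the buffer would retain about eight times more audio than `start_secs`.

The sketch below shows a pre-roll buffer that sums each frame's actual duration, using `collections.deque`. It assumes 16-bit PCM frames. `AudioFrame` and `PreRollBuffer` are hypothetical names, not pipecat classes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AudioFrame:
    """Hypothetical stand-in for an input audio frame (16-bit PCM)."""

    audio: bytes
    sample_rate: int
    num_channels: int


def frame_duration_secs(frame: AudioFrame, sample_width: int = 2) -> float:
    # bytes / (bytes per sample * channels) = samples per channel; divide by rate for seconds.
    return len(frame.audio) / (sample_width * frame.num_channels * frame.sample_rate)


class PreRollBuffer:
    """Keep only the most recent `max_secs` of audio while the user is silent."""

    def __init__(self, max_secs: float) -> None:
        self.max_secs = max_secs
        self._frames: deque = deque()
        self._duration = 0.0

    def append(self, frame: AudioFrame) -> None:
        self._frames.append(frame)
        self._duration += frame_duration_secs(frame)
        # Drop the oldest frames until the buffered audio fits in max_secs.
        while self._frames and self._duration > self.max_secs:
            oldest = self._frames.popleft()
            self._duration -= frame_duration_secs(oldest)

    @property
    def duration(self) -> float:
        return self._duration

    def __len__(self) -> int:
        return len(self._frames)
```

Summing per-frame durations also drops the original code's assumption that every frame has the same length. `deque.popleft()` avoids the O(n) cost of `list.pop(0)`.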

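`TranscriptExtractor` looks for `marker` inside each individual `TextFrame`. With streamed LLM output, the `|----|` separator can arrive split across two chunks (for example `|--` followed by `--|`). In that case `text.find(marker)` never matches, so the entire reply is accumulated as transcript and nothing reaches TTS. The sketch below shows a splitter that holds back a possible partial marker between chunks. `MarkerSplitter` is a hypothetical helper, not pipecat code.

```python
class MarkerSplitter:
    """Split a streamed response at the first occurrence of `marker`.

    Text before the marker is accumulated in `prefix`; text after it is
    returned from `feed` for forwarding. A trailing fragment that could be
    the start of the marker is held back until the next chunk arrives.
    """

    def __init__(self, marker: str) -> None:
        self.marker = marker
        self.prefix = ""  # text seen before the marker
        self.found = False
        self._pending = ""  # held-back text that may begin a split marker

    def feed(self, chunk: str) -> str:
        """Consume a chunk; return any text that belongs after the marker."""
        if self.found:
            return chunk
        buf = self._pending + chunk
        idx = buf.find(self.marker)
        if idx >= 0:
            self.found = True
            self.prefix += buf[:idx]
            self._pending = ""
            return buf[idx + len(self.marker) :]
        # Hold back the longest suffix of buf that is a proper prefix of the marker.
        keep = 0
        for k in range(min(len(buf), len(self.marker) - 1), 0, -1):
            if self.marker.startswith(buf[-k:]):
                keep = k
                break
        self.prefix += buf[: len(buf) - keep]
        self._pending = buf[len(buf) - keep :]
        return ""

    def finish(self) -> str:
        """Flush held-back text into the prefix when the stream ends."""
        self.prefix += self._pending
        self._pending = ""
        return self.prefix.strip()
```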
@@ -0,0 +1,103 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.fish import FishAudioTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = FishAudioTTSService(
            api_key=os.getenv("FISH_API_KEY"),
            model="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
        )

        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

@@ -1,20 +1,19 @@
from typing import Tuple
import aiohttp
import asyncio
import logging
import os
from pipecat.processors.aggregators import SentenceAggregator
from pipecat.pipeline.pipeline import Pipeline
from typing import Tuple
from pipecat.transports.services.daily import DailyTransport
import aiohttp
from dotenv import load_dotenv
from runner import configure
from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators import SentenceAggregator
from pipecat.services.azure import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from runner import configure
from dotenv import load_dotenv
from pipecat.transports.services.daily import DailyTransport
load_dotenv(override=True)

@@ -1,13 +1,17 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import (
Frame,
InputAudioRawFrame,
@@ -19,13 +23,7 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyTransport, DailyParams
from runner import configure
from loguru import logger
from dotenv import load_dotenv
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)

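Most hunks in this section replace the `# Copyright (c) 2024, Daily` header with a 2024–2025 range. The sketch below is an idempotent rewrite of that header. It assumes the exact `# Copyright (c) YEAR, Daily` form seen here. `bump_copyright` is a hypothetical helper, not a script from the repository.

```python
import re

# Matches "# Copyright (c) 2024, Daily" or an existing range such as "2024–2025" / "2024-2025".
_HEADER = re.compile(r"^# Copyright \(c\) (\d{4})(?:[\u2013-](\d{4}))?, Daily$", re.MULTILINE)


def bump_copyright(text: str, year: int) -> str:
    """Rewrite the header so its year range ends at `year` (en dash separator)."""

    def _replace(match: re.Match) -> str:
        start = int(match.group(1))
        years = str(start) if start >= year else f"{start}\u2013{year}"
        return f"# Copyright (c) {years}, Daily"

    return _HEADER.sub(_replace, text)
```

Running the function a second time leaves an already-updated header unchanged. This makes it safe to apply across a whole tree.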
@@ -1,15 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import sys
import tkinter as tk
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import (
Frame,
InputAudioRawFrame,
@@ -25,12 +28,6 @@ from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,14 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -19,12 +23,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,38 +1,38 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import wave
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
Frame,
LLMFullResponseEndFrame,
LLMMessagesFrame,
OutputAudioRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
OpenAILLMContextFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.logger import FrameLogger
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -72,7 +72,7 @@ class InboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, LLMMessagesFrame):
if isinstance(frame, OpenAILLMContextFrame):
await self.push_frame(sounds["ding2.wav"])
# In case anything else downstream needs it
await self.push_frame(frame, direction)
@@ -98,7 +98,7 @@ async def main():
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = CartesiaHttpTTSService(
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)

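After this change, `InboundSoundEffectWrapper` plays `ding2.wav` when an `OpenAILLMContextFrame` passes through, instead of an `LLMMessagesFrame`, and then forwards the original frame. The self-contained sketch below shows that type-based "inject, then forward" pattern. It uses hypothetical frame classes rather than pipecat's.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    pass


@dataclass
class ContextFrame(Frame):
    """Hypothetical stand-in for OpenAILLMContextFrame."""

    messages: list


@dataclass
class SoundFrame(Frame):
    """Hypothetical stand-in for an OutputAudioRawFrame holding a sound effect."""

    name: str


class SoundEffectWrapper:
    def __init__(self, trigger: type, sound: SoundFrame) -> None:
        self.trigger = trigger
        self.sound = sound
        self.downstream: List[Frame] = []  # records what would be pushed onward

    async def push_frame(self, frame: Frame) -> None:
        self.downstream.append(frame)

    async def process_frame(self, frame: Frame) -> None:
        if isinstance(frame, self.trigger):
            # Play the cue first...
            await self.push_frame(self.sound)
        # ...then forward the original frame so later processors still see it.
        await self.push_frame(frame)
```

Because the trigger is an `isinstance` check, a wrapper keyed on the wrong frame type silently never fires. That would explain why the trigger type had to change once the examples started queuing context frames instead of message frames.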
@@ -1,14 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -21,12 +25,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.moondream import MoondreamService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,14 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -21,12 +25,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.google import GoogleLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,14 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -21,12 +25,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,14 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -17,16 +21,10 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,13 +1,17 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -16,12 +20,6 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -7,6 +7,9 @@
import asyncio
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -16,10 +19,6 @@ from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,28 +1,26 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.deepgram import DeepgramSTTService, LiveOptions, Language
from pipecat.services.deepgram import DeepgramSTTService, Language, LiveOptions
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

@@ -1,14 +1,19 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -17,14 +22,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,29 +1,27 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,29 +1,27 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -67,7 +65,8 @@ async def main():
        llm = AnthropicLLMService(
            api_key=os.getenv("ANTHROPIC_API_KEY"),
            model="claude-3-5-sonnet-20240620",
            # model="claude-3-5-sonnet-20240620",
            model="claude-3-5-sonnet-latest",
            enable_prompt_caching_beta=True,
        )
        llm.register_function("get_weather", get_weather)

@@ -1,14 +1,19 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -18,14 +23,6 @@ from pipecat.services.openai import OpenAILLMContext
from pipecat.services.together import TogetherLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -125,7 +122,7 @@ async def main():
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            # await tts.say("Hi! Ask me about the weather in San Francisco.")
            await task.queue_frames([context_aggregator.user().get_context_frame()])
        runner = PipelineRunner()

@@ -1,14 +1,19 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -17,14 +22,6 @@ from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)

@@ -1,14 +1,18 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -18,12 +22,6 @@ from pipecat.services.google import GoogleLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -64,7 +62,11 @@ async def main():
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        )
        llm = GoogleLLMService(model="gemini-1.5-flash-latest", api_key=os.getenv("GOOGLE_API_KEY"))
        llm = GoogleLLMService(
            model="gemini-1.5-flash-latest",
            # model="gemini-exp-1114",
            api_key=os.getenv("GOOGLE_API_KEY"),
        )
        llm.register_function("get_weather", get_weather)
        llm.register_function("get_image", get_image)
@@ -151,7 +153,6 @@ indicate you should use the get_image tool are:
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

@@ -0,0 +1,139 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.groq import GroqLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def start_fetch_weather(function_name, llm, context):
    # note: we can't push a frame to the LLM here. the bot
    # can interrupt itself and/or cause audio overlapping glitches.
    # possible question for Aleix and Chad about what the right way
    # to trigger speech is, now, with the new queues/async/sync refactors.
    # await llm.push_frame(TextFrame("Let me check on that."))
    logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        )

        llm = GroqLLMService(
            api_key=os.getenv("GROQ_API_KEY"), model="llama3-groq-70b-8192-tool-use-preview"
        )
        # Register a function_name of None to get all functions
        # sent to the same callback with an additional function_name parameter.
        llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)

        tools = [
            ChatCompletionToolParam(
                type="function",
                function={
                    "name": "get_current_weather",
                    "description": "Get the current weather",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "unit": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location"],
                    },
                },
            )
        ]
        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages, tools)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),
                context_aggregator.user(),
                llm,
                tts,
                transport.output(),
                context_aggregator.assistant(),
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())
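
Each example registers its handler with `llm.register_function(None, ...)`, which, per the inline comment, routes every tool call to one callback and passes the called function's name as an extra argument. The following is a minimal, standalone model of that catch-all rule in plain asyncio, not Pipecat's implementation: `FunctionRegistry`, its methods, and the simplified three-argument handler signature are hypothetical, and the exact-name-before-catch-all lookup order is an assumption of this sketch.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical, simplified handler signature: (function_name, args, result_callback).
ResultCallback = Callable[[Any], Awaitable[None]]
Handler = Callable[[str, Dict[str, Any], ResultCallback], Awaitable[None]]


class FunctionRegistry:
    """Illustrative registry: an exact name wins, a None entry acts as a catch-all."""

    def __init__(self) -> None:
        self._handlers: Dict[Optional[str], Handler] = {}

    def register(self, name: Optional[str], handler: Handler) -> None:
        self._handlers[name] = handler

    def lookup(self, name: str) -> Optional[Handler]:
        # Assumed rule: prefer the exact name, otherwise fall back to the None entry.
        return self._handlers.get(name, self._handlers.get(None))

    async def call(self, name: str, args: Dict[str, Any]) -> Any:
        handler = self.lookup(name)
        if handler is None:
            raise KeyError(f"no handler registered for {name!r}")
        results = []

        async def result_callback(result: Any) -> None:
            results.append(result)

        # The catch-all handler learns which function was requested via `name`.
        await handler(name, args, result_callback)
        return results[0] if results else None


async def fetch_weather(function_name: str, args: Dict[str, Any], result_callback: ResultCallback) -> None:
    # Mirrors the example's stub: every call gets the same canned weather.
    await result_callback({"function": function_name, "conditions": "nice", "temperature": "75"})


async def demo() -> Any:
    registry = FunctionRegistry()
    registry.register(None, fetch_weather)  # catch-all, analogous to register_function(None, ...)
    return await registry.call("get_current_weather", {"location": "San Francisco, CA"})


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Under this sketch's lookup rule, registering a specific name next to the `None` entry would let that one function bypass the catch-all.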

@@ -0,0 +1,137 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.grok import GrokLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def start_fetch_weather(function_name, llm, context):
    # note: we can't push a frame to the LLM here. the bot
    # can interrupt itself and/or cause audio overlapping glitches.
    # possible question for Aleix and Chad about what the right way
    # to trigger speech is, now, with the new queues/async/sync refactors.
    # await llm.push_frame(TextFrame("Let me check on that."))
    logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        )

        llm = GrokLLMService(api_key=os.getenv("GROK_API_KEY"))
        # Register a function_name of None to get all functions
        # sent to the same callback with an additional function_name parameter.
        llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)

        tools = [
            ChatCompletionToolParam(
                type="function",
                function={
                    "name": "get_current_weather",
                    "description": "Get the current weather",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                },
            )
        ]
        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages, tools)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),
                context_aggregator.user(),
                llm,
                tts,
                transport.output(),
                context_aggregator.assistant(),
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())
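
The Grok schema above declares both `location` and `format` as required and restricts `format` to an enum, whereas the Groq schema names the field `unit` and requires only `location`. In the OpenAI-style Chat Completions format, a model's tool-call arguments arrive as a JSON-encoded string, and nothing guarantees the model honors the schema, so a handler may want to check them before use. A minimal sketch, assuming only the `required` and `enum` keywords matter; `check_tool_args` is a hypothetical helper, and full JSON Schema validation would need a dedicated library.

```python
import json
from typing import Any, Dict, List

# The parameters block from the Grok example, trimmed to the keywords checked below.
WEATHER_PARAMS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location", "format"],
}


def check_tool_args(raw_args: str, schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look usable."""
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    # Every name listed under "required" must be present.
    problems = [
        f"missing required field {name!r}"
        for name in schema.get("required", [])
        if name not in args
    ]
    # Any present field with an "enum" must use one of the allowed values.
    for name, spec in schema.get("properties", {}).items():
        allowed = spec.get("enum")
        if allowed is not None and name in args and args[name] not in allowed:
            problems.append(f"{name!r} must be one of {allowed}, got {args[name]!r}")
    return problems


if __name__ == "__main__":
    print(check_tool_args('{"location": "San Francisco, CA", "format": "celsius"}', WEATHER_PARAMS))
    print(check_tool_args('{"location": "Paris", "format": "kelvin"}', WEATHER_PARAMS))
    print(check_tool_args('{"format": "celsius"}', WEATHER_PARAMS))
```

Returning a list of messages rather than raising lets a handler pass the problems back to the model as the tool result, so it can retry with corrected arguments.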

@@ -0,0 +1,141 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.azure import AzureLLMService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def start_fetch_weather(function_name, llm, context):
    # note: we can't push a frame to the LLM here. the bot
    # can interrupt itself and/or cause audio overlapping glitches.
    # possible question for Aleix and Chad about what the right way
    # to trigger speech is, now, with the new queues/async/sync refactors.
    # await llm.push_frame(TextFrame("Let me check on that."))
    logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        )

        llm = AzureLLMService(
            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
            model=os.getenv("AZURE_CHATGPT_MODEL"),
        )
        # Register a function_name of None to get all functions
        # sent to the same callback with an additional function_name parameter.
        llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)

        tools = [
            ChatCompletionToolParam(
                type="function",
                function={
                    "name": "get_current_weather",
                    "description": "Get the current weather",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                },
            )
        ]
        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages, tools)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),
                context_aggregator.user(),
                llm,
                tts,
                transport.output(),
                context_aggregator.assistant(),
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())
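
In each `Pipeline([...])` list, frames move downstream in list order. That is presumably why `context_aggregator.assistant()` sits after `transport.output()`: it records the bot's reply only once that reply has passed through TTS and output. The toy model below illustrates the ordering with plain async functions. `Frame`, `ToyPipeline`, and the processors are hypothetical stand-ins, not Pipecat's `FrameProcessor` API. Unlike a streaming pipeline, each stage here handles its whole batch of frames before the next stage runs.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple


@dataclass
class Frame:
    kind: str  # "transcript", "llm_text", or "audio" in this toy model
    data: str


# Hypothetical processor shape: take one frame, return the frames to pass downstream.
Processor = Callable[[Frame], Awaitable[List[Frame]]]


class ToyPipeline:
    """Pushes frames through the processors strictly in list order."""

    def __init__(self, processors: List[Processor]) -> None:
        self.processors = processors

    async def push(self, frame: Frame) -> List[Frame]:
        frames = [frame]
        for processor in self.processors:
            next_frames: List[Frame] = []
            for f in frames:
                next_frames.extend(await processor(f))
            frames = next_frames
        return frames


def make_demo() -> Tuple[ToyPipeline, List[str]]:
    log: List[str] = []

    async def user_aggregator(f: Frame) -> List[Frame]:
        log.append(f"user:{f.data}")
        return [f]

    async def llm(f: Frame) -> List[Frame]:
        # Turn a user transcript into reply text.
        if f.kind == "transcript":
            return [Frame("llm_text", f"reply to {f.data}")]
        return [f]

    async def tts(f: Frame) -> List[Frame]:
        # Keep the text frame and add synthesized "audio" after it.
        if f.kind == "llm_text":
            return [f, Frame("audio", f.data)]
        return [f]

    async def output(f: Frame) -> List[Frame]:
        if f.kind == "audio":
            log.append(f"play:{f.data}")
            return []  # audio is consumed by the output stage
        return [f]

    async def assistant_aggregator(f: Frame) -> List[Frame]:
        # Sees the reply text only after it has gone through output.
        if f.kind == "llm_text":
            log.append(f"assistant:{f.data}")
        return [f]

    pipeline = ToyPipeline([user_aggregator, llm, tts, output, assistant_aggregator])
    return pipeline, log


async def demo() -> List[str]:
    pipeline, log = make_demo()
    await pipeline.push(Frame("transcript", "weather?"))
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The log order (user turn, playback, then assistant turn) is a direct consequence of the processor order in the list.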

@@ -0,0 +1,140 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.fireworks import FireworksLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def start_fetch_weather(function_name, llm, context):
    # note: we can't push a frame to the LLM here. the bot
    # can interrupt itself and/or cause audio overlapping glitches.
    # possible question for Aleix and Chad about what the right way
    # to trigger speech is, now, with the new queues/async/sync refactors.
    # await llm.push_frame(TextFrame("Let me check on that."))
    logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        )

        llm = FireworksLLMService(
            api_key=os.getenv("FIREWORKS_API_KEY"),
            model="accounts/fireworks/models/firefunction-v2",
        )
        # Register a function_name of None to get all functions
        # sent to the same callback with an additional function_name parameter.
        llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)

        tools = [
            ChatCompletionToolParam(
                type="function",
                function={
                    "name": "get_current_weather",
                    "description": "Get the current weather",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                },
            )
        ]
        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages, tools)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),
                context_aggregator.user(),
                llm,
                tts,
                transport.output(),
                context_aggregator.assistant(),
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())
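
`start_fetch_weather` is passed as `start_callback`, and its comments warn against pushing speech frames from it. A plausible reading is that the start callback runs before the handler and should be limited to side effects such as logging. The sketch below models that lifecycle: it awaits an optional start callback, then runs the handler, which reports its result through `result_callback`. `invoke_with_start` is hypothetical, the handler signature is simplified, and the timeout is an addition of this sketch rather than something the example configures.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

# Hypothetical, simplified signatures for illustration only.
StartCallback = Callable[[str], Awaitable[None]]
ResultCallback = Callable[[Any], Awaitable[None]]
Handler = Callable[[str, Dict[str, Any], ResultCallback], Awaitable[None]]


async def invoke_with_start(
    name: str,
    args: Dict[str, Any],
    handler: Handler,
    start_callback: Optional[StartCallback] = None,
    timeout: float = 5.0,
) -> Any:
    """Run the start callback, then the handler; return the first reported result."""
    if start_callback is not None:
        # Side effects only (logging, metrics); no speech is triggered here.
        await start_callback(name)

    results: List[Any] = []

    async def result_callback(result: Any) -> None:
        results.append(result)

    # Bound the handler so a stuck tool call cannot stall the caller forever.
    await asyncio.wait_for(handler(name, args, result_callback), timeout)
    return results[0] if results else None


async def demo() -> List[str]:
    events: List[str] = []

    async def on_start(name: str) -> None:
        events.append(f"start:{name}")

    async def fetch_weather(name: str, args: Dict[str, Any], result_callback: ResultCallback) -> None:
        events.append(f"handler:{args['location']}")
        await result_callback({"conditions": "nice", "temperature": "75"})

    result = await invoke_with_start(
        "get_current_weather", {"location": "Austin, TX"}, fetch_weather, on_start
    )
    events.append(f"result:{result['temperature']}")
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the handler never calls `result_callback`, this sketch returns `None`; if it exceeds `timeout`, `asyncio.wait_for` cancels it and raises a timeout error to the caller.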

@@ -0,0 +1,140 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.nim import NimLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def start_fetch_weather(function_name, llm, context):
    # note: we can't push a frame to the LLM here. the bot
    # can interrupt itself and/or cause audio overlapping glitches.
    # possible question for Aleix and Chad about what the right way
    # to trigger speech is, now, with the new queues/async/sync refactors.
    # await llm.push_frame(TextFrame("Let me check on that."))
    logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
            # text_filter=MarkdownTextFilter(),
        )

        llm = NimLLMService(
            api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.3-70b-instruct"
        )
        # Register a function_name of None to get all functions
        # sent to the same callback with an additional function_name parameter.
        llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)

        tools = [
            ChatCompletionToolParam(
                type="function",
                function={
                    "name": "get_current_weather",
                    "description": "Returns the current weather at a location, if one is specified, and defaults to the user's location.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The location to find the weather of, or if not provided, it's the default location.",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "Whether to use SI or USCS units (celsius or fahrenheit).",
                            },
                        },
                        "required": ["location", "format"],
                    },
                },
            )
        ]
        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages, tools)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),
                context_aggregator.user(),
                llm,
                tts,
                transport.output(),
                context_aggregator.assistant(),
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Some files were not shown because too many files have changed in this diff.