Compare commits

..

1249 Commits

Author SHA1 Message Date
Mark Backman
54e8d29615 Merge pull request #3022 from pipecat-ai/mb/changelog-0.0.94
Prep for 0.0.94 hotfix
2025-11-10 16:52:38 -05:00
Mark Backman
ee494918a9 Prep for 0.0.94 hotfix 2025-11-10 16:50:58 -05:00
Mark Backman
aa8a50bc61 Merge pull request #3015 from pipecat-ai/mb/deprecate-krisp
Deprecate KrispFilter
2025-11-10 16:38:06 -05:00
Mark Backman
20857ac19a Merge pull request #3010 from pipecat-ai/mb/gemini-live-ar-xa
Add ar-XA language code for Gemini Live
2025-11-10 16:36:33 -05:00
Mark Backman
421a1b5389 Merge pull request #3021 from pipecat-ai/mb/add-sarvam-stt-readme
Add Sarvam STT to README list
2025-11-10 16:36:03 -05:00
Mark Backman
8dd45af5b7 Deprecate KrispFilter 2025-11-10 16:35:11 -05:00
kompfner
66c903276a Merge pull request #3020 from pipecat-ai/pk/make-explicit-adding-spaces-when-concatenating-tts-text
Make the mechanism of adding spaces when concatenating TTS (or speech…
2025-11-10 14:34:10 -05:00
Mark Backman
588dcf2ab9 Add Sarvam STT to README list 2025-11-10 14:29:54 -05:00
Paul Kompfner
913194844e Make the mechanism of adding spaces when concatenating TTS (or speech-to-speech LLM) output text explicit and deterministic, rather than heuristic-based.
This fixes a bug where spaces were sometimes missing from assistant messages in context.
2025-11-10 14:22:32 -05:00
Vanessa Pyne
c2ce143e6c Merge pull request #3017 from pipecat-ai/vp-rm-livekit-serializer
remove LivekitFrameSerializer
2025-11-10 11:56:47 -06:00
vipyne
c1c7a561ed remove LivekitFrameSerializer 2025-11-10 11:06:12 -06:00
kompfner
05311dcfbf Merge pull request #3016 from pipecat-ai/mb/revert-concat-aggregated-text
Revert "Merge pull request #3004 from pipecat-ai/mb/improve-concat-ag…
2025-11-10 10:49:38 -05:00
Mark Backman
2300941bb8 Revert "Merge pull request #3004 from pipecat-ai/mb/improve-concat-aggregated-text"
This reverts commit 5e7f59a0b0, reversing
changes made to 2ad4122b77.
2025-11-10 09:58:19 -05:00
Mark Backman
0df75b0915 Add ar-XA language code for Gemini Live 2025-11-08 08:24:55 -05:00
Aleix Conchillo Flaqué
16e2d5b998 Merge pull request #3007 from pipecat-ai/aleix/pipecat-0.0.93
update CHANGELOG for 0.0.93
2025-11-07 13:25:25 -08:00
Aleix Conchillo Flaqué
4cf9e1409e update CHANGELOG for 0.0.93 2025-11-07 13:17:44 -08:00
Aleix Conchillo Flaqué
0ed430e7e2 examples(foundational): use DeepgramSTTService in 07 2025-11-07 11:34:11 -08:00
Aleix Conchillo Flaqué
342a8b121b pyproject: update simli to 0.1.25 2025-11-07 11:30:41 -08:00
Aleix Conchillo Flaqué
5729722dcd SimliVideoService: check exception initializing simli client 2025-11-07 11:30:41 -08:00
Aleix Conchillo Flaqué
38aac44a1e scripts(evals): 26c should be a camera eval 2025-11-07 11:30:41 -08:00
Aleix Conchillo Flaqué
4f1468e0fa scripts(evals): improve eval prompt 2025-11-07 10:05:46 -08:00
Aleix Conchillo Flaqué
9b1192ca9b Merge pull request #3001 from pipecat-ai/pk/openai-realtime-toolsschema-support
Added support for passing in a `ToolsSchema` in lieu of a list of pro…
2025-11-07 09:37:43 -08:00
Mark Backman
5e7f59a0b0 Merge pull request #3004 from pipecat-ai/mb/improve-concat-aggregated-text
Improve concatenate_aggregated_text string utility
2025-11-07 12:37:12 -05:00
Aleix Conchillo Flaqué
2ad4122b77 Merge pull request #3006 from pipecat-ai/aleix/vision-image-backwards-compatibility
restore vision/image backwards compatibility
2025-11-07 09:19:38 -08:00
Mark Backman
5950f734f5 Merge pull request #3002 from pipecat-ai/mb/clarify-model-openai-realtime
Clarify how to set model in OpenAIRealtimeLLMService
2025-11-07 12:05:10 -05:00
Aleix Conchillo Flaqué
8d0364b630 restore vision/image backwards compatibility 2025-11-07 08:53:58 -08:00
kompfner
bfe031604a Merge pull request #3005 from pipecat-ai/pk/add-missing-comments
Add missing explanatory comments to AWS and Anthropic that are presen…
2025-11-07 11:50:41 -05:00
kompfner
9bfde61183 Merge pull request #3003 from pipecat-ai/pk/fix-deprecation-warning-always-printed-on-set-bot-ready
Fix a deprecation warning printed every time `RTVIProcessor.set_bot_r…
2025-11-07 11:50:30 -05:00
Paul Kompfner
cb40a39a01 Add missing explanatory comments to AWS and Anthropic that are present in the other LLM services 2025-11-07 11:44:44 -05:00
Mark Backman
03001f8047 Update TranscriptProcessor unit tests 2025-11-07 11:44:04 -05:00
Paul Kompfner
10f1c314b6 Fix a deprecation warning printed every time RTVIProcessor.set_bot_ready() is called 2025-11-07 11:27:58 -05:00
Mark Backman
4d1d6465fc Support OpenAI Realtime and Gemini Live single word edge cases in concatenate_aggregated_text 2025-11-07 11:26:38 -05:00
Paul Kompfner
359d220162 Document a OpenAIRealtimeLLMService gotcha in an example. 2025-11-07 10:32:27 -05:00
Mark Backman
6feecf05f7 Merge pull request #2994 from Toprak2/patch-1
Fix incorrect docstring in FrameProcessorQueue.__init__
2025-11-07 10:21:11 -05:00
Paul Kompfner
c3306bb4f2 Support for passing in a ToolsSchema in lieu of a list of provider-specific dicts when updating OpenAIRealtimeLLMService using LLMUpdateSettingsFrame. 2025-11-07 10:18:29 -05:00
Mark Backman
07a4aae248 Clarify how to set model in OpenAIRealtimeLLMService 2025-11-07 09:58:12 -05:00
Paul Kompfner
925a6cc2ef Added support for passing in a ToolsSchema in lieu of a list of provider-specific dicts when initializing OpenAIRealtimeLLMService.
I chose to go the somewhat hacky route of adding the `ToolsSchema` support into the `events.SessionProperties` model itself—even though we should never serialize that type when creating events—because the alternative seemed to be to create a new type for `OpenAIRealtimeLLMService` initialization parameters and then we'd have to contend with backward compatibility, which seemed like a bigger headache.
2025-11-07 09:50:26 -05:00
Mark Backman
613ad74103 Merge pull request #3000 from pipecat-ai/mb/clarify-openai-realtime-model-docs 2025-11-07 06:36:30 -05:00
Muhammed Ali Toprak
2ab6b71890 Shorten docstring for clarity 2025-11-07 11:24:06 +03:00
Toprak2
c2bd8d22a0 Merge branch 'pipecat-ai:main' into patch-1 2025-11-07 11:19:08 +03:00
Mark Backman
eda12f56e6 Add clarifying documentation about OpenAI Realtime model use 2025-11-06 19:42:35 -05:00
Aleix Conchillo Flaqué
3daa1b7850 Merge pull request #2998 from pipecat-ai/aleix/transport-params-audio-out-end-silence-secs
BaseOutputTransport: send silence when EndFrame is received
2025-11-06 12:18:33 -08:00
Aleix Conchillo Flaqué
4c8c44ecc3 BaseOutputTransport: send silence when EndFrame is received 2025-11-06 12:16:05 -08:00
Aleix Conchillo Flaqué
8c34e1efba Merge pull request #2996 from pipecat-ai/aleix/broadcast-frame
FrameProcessor: add new broadcast_frame() method
2025-11-06 12:13:15 -08:00
Aleix Conchillo Flaqué
f6916428b1 FrameProcessor: add new broadcast_frame() method 2025-11-06 12:11:48 -08:00
Marcus
a14d00b806 Improved LocalSmartTurnAnalyzerV3 performance on systems with a low CPU count (#2982) 2025-11-06 14:42:05 -05:00
Mark Backman
927cf751c0 Merge pull request #2997 from pipecat-ai/mb/google-stt-409
GoogleSTTService: Add more robust handling of 409 errors
2025-11-06 14:39:51 -05:00
Mark Backman
1fb6d6bd23 GoogleSTTService: Add more robust handling of 409 errors 2025-11-06 14:35:53 -05:00
Mark Backman
94a3306679 Merge pull request #2995 from pipecat-ai/mb/fix-stt-mute-filter-stt-muting
fix: STTMuteFilter no longer sends STTMuteFrame
2025-11-06 14:35:07 -05:00
Mark Backman
16bd1fe32d Merge pull request #2984 from pipecat-ai/mb/11labs-pronunciation-dictionary
Add ElevenLabs pronunciation dictionary support
2025-11-06 14:23:49 -05:00
Mark Backman
58b552171d Add pronunciation_dictionary_locators to ElevenLabs TTS Services 2025-11-06 14:13:51 -05:00
Mark Backman
4732a442d4 Merge pull request #2992 from pipecat-ai/mb/metrics-log-observer
Add MetricsLogObserver
2025-11-06 14:04:55 -05:00
Mark Backman
accdddce95 Add MetricsLogObserver 2025-11-06 14:01:20 -05:00
Mark Backman
daf9da823c Merge pull request #2993 from pipecat-ai/mb/fix-gemini-token-counting
fix: correct GoogleLLMService token counting
2025-11-06 13:47:51 -05:00
Mark Backman
f6b6aa8766 fix: STTMuteFilter no longer sends STTMuteFrame 2025-11-06 11:53:32 -05:00
Toprak2
935eb58951 Update docstring for FrameProcessorQueue
Clarify the docstring for FrameProcessorQueue initialization.
2025-11-06 19:18:15 +03:00
Mark Backman
9f2ddcc5f4 Merge pull request #2927 from pipecat-ai/marcus/2025-10-28_sample_rtvi_fix
Add RTVIProcessor to foundational example 38b
2025-11-06 10:19:10 -05:00
Mark Backman
961e28517e Remove arg from RTVIProcessor 2025-11-06 10:16:31 -05:00
Mark Backman
34d6f3fa00 fix: correct GoogleLLMService token counting 2025-11-06 10:01:37 -05:00
Filipi da Silva Fuchter
616abfd96c Merge pull request #2987 from pipecat-ai/filipi/fix_start_endpoint
Fixing the runner start endpoint when enableDefaultIceServers is enabled.
2025-11-06 09:32:01 -03:00
Mark Backman
d7774ac599 Merge pull request #2991 from pipecat-ai/mb/fix-deepgram-is-connected 2025-11-06 06:35:51 -05:00
Mark Backman
c8c13ecee2 fix: DeepgramSTTService await is_connected() 2025-11-05 21:42:15 -05:00
Vanessa Pyne
314acc104e Merge pull request #2990 from pipecat-ai/vp-fix-riva-default-voice
fix default riva tts voice_id
2025-11-05 18:40:41 -06:00
vipyne
1dfa59257d fix default riva tts voice_id 2025-11-05 18:30:05 -06:00
Mark Backman
376dcc34f7 Merge pull request #2986 from pipecat-ai/mb/docs-0.0.93
Fix docstrings for 0.0.93 release, fix classmethod placement in Reque…
2025-11-05 15:49:09 -05:00
Filipi Fuchter
5ee8c56899 Fixing the runner start endpoint when enableDefaultIceServers is enabled. 2025-11-05 17:36:24 -03:00
kompfner
4397deddc7 Merge pull request #2970 from pipecat-ai/pk/tool-registration-improvements
Assorted tool registration improvements
2025-11-05 15:31:15 -05:00
Paul Kompfner
13d6078ea0 Minor tweak to an example for clarity. 2025-11-05 15:30:01 -05:00
Paul Kompfner
61aec08794 CHANGELOG item ordering tweak 2025-11-05 15:29:58 -05:00
Paul Kompfner
0f69d4aea3 Fixed an issue where GeminiLiveLLMService wasn't consistent in what it would do if if it received an LLMContextFrame (triggered by an LLMRunFrame, say) and there were no user messages in the initial context:
- If the context contained a system message, that message would be converted to a user message and the LLM would respond
- If the system message was provided as a constructor argument, though, no user messages would be sent to the LLM, and the LLM would therefore not respond

Not adding this fix to the CHANGELOG since `GeminiLiveLLMService`'s ability to properly handle context-provided tools and system instruction hasn't been published yet.
2025-11-05 15:29:04 -05:00
Paul Kompfner
84ba628dfb Fix a bug in GeminiLiveLLMService where if only *one* of tools or system instruction was provided in the context, the other wouldn't fall back to using the value provided in the constructor.
Not adding this fix to the CHANGELOG since `GeminiLiveLLMService`'s ability to properly handle context-provided tools and system instruction hasn't been published yet.
2025-11-05 15:29:04 -05:00
Paul Kompfner
9ce33f23b9 Add an example demonstrating MCP usage with a speech-to-speech service (GeminiLiveLLMService) using the pattern of passing in tools in the constructor 2025-11-05 15:29:04 -05:00
Paul Kompfner
75245e1daa Fix a bug in GeminiLiveLLMService where in some circumstances it wouldn't respond after a tool call 2025-11-05 15:29:04 -05:00
Paul Kompfner
24365aeefe CHANGELOG wording fix 2025-11-05 15:29:04 -05:00
Paul Kompfner
29ef0f419f Add typing formalizing MCPClient support for registering tools on an LLMSwitcher in addition to an LLMService. 2025-11-05 15:29:01 -05:00
Paul Kompfner
a9d78bd956 Make it possible to get a ToolsSchema out of an MCPClient without passing in an LLM service.
This allows folks to use `MCPClient` alongside the pattern of passing in tools at LLM init time, a pattern supported by speech-to-speech services such as `GeminiLiveLLMService`.
2025-11-05 15:28:23 -05:00
Paul Kompfner
e6f881bb08 Remove the "needs alternate schema" mechanism in MCPClient, moving the necessary schema massaging into GeminiLLMAdapter instead.
This does a couple of things:
- Makes the `MCPClient` LLM agnostic, setting us up for some upcoming improvements (like making it possible to use with `LLMSwitcher`)
- Makes `GeminiLLMAdapter` more robust, as the schema massaging that was previously only done in `MCPClient` is useful for all tools, not just for MCP-provided ones
2025-11-05 15:28:23 -05:00
Paul Kompfner
bee4165ba4 Add LLMSwitcher.register_direct_function() 2025-11-05 15:28:19 -05:00
Mark Backman
e2f6ce1078 Fix docstrings for 0.0.93 release, fix classmethod placement in RequestHandler 2025-11-05 15:27:16 -05:00
Paul Kompfner
0184493711 Update the service switcher example to illustrate registering tools on all LLMs in a switcher 2025-11-05 15:27:00 -05:00
Aleix Conchillo Flaqué
eb3c4c59fc Merge pull request #2985 from pipecat-ai/revert-2976-aleix/interruption-task-frame-finished-event
Revert "fix interruption task frame context ordering"
2025-11-05 12:25:45 -08:00
Aleix Conchillo Flaqué
d844829538 Revert "fix interruption task frame context ordering" 2025-11-05 12:14:03 -08:00
Mark Backman
11b101e8a6 Merge pull request #2974 from pipecat-ai/mb/language-mapping-improvements
Improve language checking in STT and TTS services
2025-11-05 14:59:41 -05:00
Mark Backman
3db5ab9f23 Merge pull request #2983 from pipecat-ai/mb/bump-fastapi-0.122.0
Bumped the fastapi dependency to <0.122.0
2025-11-05 13:24:23 -05:00
Mark Backman
9a96e4060c Bumped the fastapi dependency to <0.122.0 2025-11-05 13:13:47 -05:00
Aleix Conchillo Flaqué
d826279946 Merge pull request #2976 from pipecat-ai/aleix/interruption-task-frame-finished-event
fix interruption task frame context ordering
2025-11-05 09:53:26 -08:00
Aleix Conchillo Flaqué
e4212fb3c0 tests: add interruption strategies context ordering tests 2025-11-05 09:38:18 -08:00
Aleix Conchillo Flaqué
234aae3091 FrameProcessor: use finished_event for push_interruption_task_frame_and_wait 2025-11-05 09:38:17 -08:00
Aleix Conchillo Flaqué
c33b81bb92 PipelineTask: set finished_event InterruptionFrame is received 2025-11-05 09:37:55 -08:00
Aleix Conchillo Flaqué
a1c07039ee frames: added finished_event to InterruptionFrame/InterruptionTaskFrame 2025-11-05 09:37:55 -08:00
Mark Backman
33be73692f Merge pull request #2979 from thsunkid/feature/whisper-stt-probability-metrics
Feat: Access prob metrics for Whisper STT services using include_prob_metrics
2025-11-05 12:33:24 -05:00
Mark Backman
f6d7b6ae5f Fix SpeechmaticsSTTService: use language checking for language and output_locale 2025-11-05 12:26:52 -05:00
Mark Backman
2ee54c985f Improve language checking in STT and TTS services 2025-11-05 12:26:52 -05:00
Thu Nguyen
76c336644a Merge branch 'main' into feature/whisper-stt-probability-metrics 2025-11-06 00:24:34 +07:00
Thu Nguyen
dd8711dee1 Added changelog 2025-11-06 00:23:42 +07:00
Thu Nguyen
c26c27fe21 Update util with new docs and extract_deepgram_probability 2025-11-06 00:23:20 +07:00
Mark Backman
159dbd078d Merge pull request #2980 from pipecat-ai/mb/gemini-vertex-update
Refactor GoogleVertexLLMService to use GoogleLLMService as a base class
2025-11-05 11:35:50 -05:00
Mark Backman
c18ff999a5 Update GoogleVertexLLMService default model to gemini-2.5-flash 2025-11-05 11:28:41 -05:00
Mark Backman
80d127aaa4 Refactor GoogleVertexLLMService to use GoogleLLMService as a base class 2025-11-05 09:33:02 -05:00
Mark Backman
bbc7d3e2fb Merge pull request #2977 from pipecat-ai/mb/request-handler-smallwebrtc
Fix: support request data in SmallWebRTC
2025-11-05 08:50:31 -05:00
Thu Nguyen
3486d63ef6 Add docs 2025-11-05 13:30:49 +07:00
Thu Nguyen
842c4a3485 Update base_stt 2025-11-05 13:26:59 +07:00
Thu Nguyen
0b779a880b Feat: allow accessing prob metrics for Whisper STT services with include_prob_metrics param 2025-11-05 13:24:49 +07:00
Mark Backman
01f3421052 Fix: support request data in SmallWebRTC 2025-11-04 17:14:29 -05:00
Aleix Conchillo Flaqué
c20aa78648 Merge pull request #2969 from pipecat-ai/aleix/pipecat-observer-files
PipelineTask: load observers from PIPECAT_OBSERVER_FILES
2025-11-04 12:34:37 -08:00
Aleix Conchillo Flaqué
38f27ad991 PipelineTask: load observers from PIPECAT_OBSERVER_FILES 2025-11-04 12:10:53 -08:00
Mark Backman
0c38585034 Merge pull request #2973 from pipecat-ai/mb/cartesia-sonic-3-languages
Add sonic-3 languages to Cartesia TTS services
2025-11-04 14:43:06 -05:00
Mark Backman
8a09bbbf0e Merge pull request #2972 from akash-dutta-dev/hotfix/addCustomParamForExotel
Add customer parameter in Call Data for Exotel
2025-11-04 14:29:58 -05:00
Vanessa Pyne
fb737ff671 Merge pull request #2967 from pipecat-ai/vp-39-bork-83
update example 39-mcp-stdio.py to use different mcp server
2025-11-04 09:02:29 -06:00
vipyne
b7a4d7371c wrap tools = await mcp.register_tools(llm) in try in examples 2025-11-04 09:01:12 -06:00
vipyne
ef88d6a2ea update example 39-mcp-stdio.py to use different mcp server
https://www.loom.com/share/a9f0a270261d4c6cb054ab2b4dcd6084

SO to Rijksmuseum MCP
https://github.com/r-huijts/rijksmuseum-mcp
2025-11-04 09:01:12 -06:00
kompfner
5c1bd8cda2 Merge pull request #2961 from pipecat-ai/pk/gemini-live-fix-session-resumption
Fix Gemini Live session resumption. The problem was that we weren't p…
2025-11-04 09:19:17 -05:00
Paul Kompfner
a82158045a Fix Gemini Live session resumption. The problem was that we weren't properly ignoring send errors during reconnection. 2025-11-04 09:18:40 -05:00
Mark Backman
b1533ddfc4 Add sonic-3 languages to Cartesia TTS services 2025-11-04 07:57:04 -05:00
Mark Backman
0abc699f24 Merge pull request #2964 from pipecat-ai/mb/14j-nim-updates
Fix 14j foundational example
2025-11-04 07:24:53 -05:00
Akash Dutta
09018071e8 Add customer parameter in Call Data for Exotel 2025-11-04 16:57:28 +05:30
Mark Backman
1c53a5fd01 Fix 14j foundational example 2025-11-03 14:57:44 -05:00
kompfner
05d4753d3e Merge pull request #2956 from pipecat-ai/pk/gemini-honor-context-provided-instructions-and-tools
`GeminiLiveLLMService` supports context-provided system instruction a…
2025-11-03 10:38:26 -05:00
Paul Kompfner
87131850bc GeminiLiveLLMService supports context-provided system instruction and tools 2025-11-03 10:30:46 -05:00
Aleix Conchillo Flaqué
af83f45a49 Merge pull request #2959 from pipecat-ai/aleix/cancel-frame-reason
CancelFrame: add reason field to indicate why pipeline is being cancelled
2025-11-02 11:06:58 -08:00
Aleix Conchillo Flaqué
62e45f466a EndFrame: add reason field to indicate why pipeline is being ended 2025-11-02 00:45:27 -07:00
Aleix Conchillo Flaqué
e85e93b9b1 CancelFrame: add reason field to indicate why pipeline is being cancelled 2025-11-02 00:44:47 -07:00
Mark Backman
074d3ff162 Merge pull request #2821 from shreyas-sarvam/sarvam/stt
Sarvam STT/STTT WS implementation
2025-10-31 13:47:27 -04:00
shreyas-sarvam
d680ec2e69 Merge branch 'main' into sarvam/stt 2025-10-31 23:09:47 +05:30
shreyas-sarvam
d905b21f72 fix: Pass input_audio_codec as an __init__ parameter 2025-10-31 23:07:48 +05:30
shreyas-sarvam
6c5d84ca4c fix: Fixes for sample_rate being passed by PipelineParams 2025-10-31 23:03:25 +05:30
Aleix Conchillo Flaqué
334167e3d7 Merge pull request #2953 from pipecat-ai/aleix/pipecat-0.0.92
update CHANGELOG for 0.0.92. 🎃 "The Haunted Edition" 👻
2025-10-31 09:47:25 -07:00
Aleix Conchillo Flaqué
e3531a5f25 update CHANGELOG for 0.0.92. 🎃 "The Haunted Edition" 👻 2025-10-31 09:17:03 -07:00
Mark Backman
343e97666a Merge pull request #2954 from pipecat-ai/mb/runner-meeting-token-properties
Add support for token properties in Daily util and development runner
2025-10-31 12:12:14 -04:00
Mark Backman
653e84321b Add support for token properties in Daily util and development runner 2025-10-31 12:08:53 -04:00
Mark Backman
3585f724c4 Merge pull request #2952 from pipecat-ai/mb/add-daily-room-properties-to-runner
Add support for dailyRoomProperties when calling /start using the dev…
2025-10-31 12:04:42 -04:00
Mark Backman
5fe597d355 Add support for dailyRoomProperties when calling /start using the development runner 2025-10-31 12:01:03 -04:00
Aleix Conchillo Flaqué
67ab3773f6 Merge pull request #2949 from pipecat-ai/aleix/idle-timeout-observer
PipelineTask: add IdleFrameObserver to detect idle pipelines
2025-10-31 08:51:09 -07:00
Mark Backman
c6e12b9358 Merge pull request #2943 from pipecat-ai/mb/deepgram-http
Add DeepgramHttpTTSService
2025-10-31 11:51:06 -04:00
Aleix Conchillo Flaqué
0f5030bafa tests: add unit test to check for idle timeout on swallowed frames 2025-10-31 08:45:56 -07:00
Aleix Conchillo Flaqué
ed93e29850 PipelineTask: add IdleFrameObserver to detect idle pipelines 2025-10-31 08:45:56 -07:00
Mark Backman
7eb880c5e8 Add DeepgramHttpTTSService 2025-10-31 11:39:32 -04:00
Aleix Conchillo Flaqué
4fa0de6660 Merge pull request #2947 from pipecat-ai/aleix/rename-add-to-context
UserImageRawFrame: rename add_to_context to append_to_context
2025-10-31 08:29:49 -07:00
kompfner
396c1bcc13 Merge pull request #2946 from pipecat-ai/pk/deprecate-expect-stripped-words
Deprecate `expect_stripped_words` option from `LLMAssistantAggregatorParams`…
2025-10-31 09:57:20 -04:00
shreyas-sarvam
57f6ae9e50 Merge branch 'main' into sarvam/stt 2025-10-31 17:36:52 +05:30
shreyas-sarvam
2d03e51109 fix: Remove unused imports, use sample_rate from base class 2025-10-31 17:31:59 +05:30
Mark Backman
1e7143e5f3 Merge pull request #2942 from pipecat-ai/mb/speechmatics-tts-changelog
Add SpeechmaticsTTSService, Soniox changes to changelog
2025-10-31 07:43:58 -04:00
Mark Backman
f820c20fa2 Add SpeechmaticsTTSService and SonioxSTTService changes to changelog 2025-10-31 07:41:17 -04:00
Mark Backman
83f395ff8f Merge pull request #2940 from thsunkid/feature/google-tts-chirp-speaking-rate
Add dynamic speaking_rate control for Google TTS Chirp voices
2025-10-31 07:39:05 -04:00
shreyas-sarvam
09a7e08cbf Merge branch 'main' into sarvam/stt 2025-10-31 15:21:09 +05:30
shreyas-sarvam
6f172bba8f feat: Make input parameters accessible to users 2025-10-31 15:17:06 +05:30
shreyas-sarvam
1433df4de2 fix: Fix language param and include suggested way of handling STT response 2025-10-31 13:23:08 +05:30
Thu Nguyen
6ade5617fb addressed comments 2025-10-31 09:53:47 +07:00
Aleix Conchillo Flaqué
685d440206 UserImageRawFrame: rename add_to_context to append_to_context 2025-10-30 15:18:27 -07:00
Paul Kompfner
ac5734d0ed Deprecate expect_stripped_words option from LLMAssistantAggregatorParams, when used with the newer LLMAssistantAggregator, which now handles word spacing automatically.
This commit does not change how it works in the older `LLMAssistantContextAggregator`.
2025-10-30 17:22:47 -04:00
Aleix Conchillo Flaqué
5e00133e64 Merge pull request #2935 from pipecat-ai/aleix/improve-image-and-vision-support
improve image and vision support
2025-10-30 14:09:01 -07:00
Aleix Conchillo Flaqué
42f0490414 examples(foundational): 14-* show how to tell the LLM we are capturing an image 2025-10-30 14:02:17 -07:00
Aleix Conchillo Flaqué
19f046a338 examples(foundational): add 12d-describe-image-moondream 2025-10-30 14:02:17 -07:00
Aleix Conchillo Flaqué
ec95618b94 don't tie UserImageRawFrame with function calls 2025-10-30 14:02:17 -07:00
Aleix Conchillo Flaqué
74fb6e7676 scripts(evals): improve eval prompting 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
8fa6cbac51 examples(foundational): added 14d docstrings 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
a997655eac scripts(evals): simplify eval configuration and allow RunnerArgs body 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
3b3a215155 examples(foundational): re-add 12-* but load image from file 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
e458d3edfe scripts(evals): update 12-* for 14-*-video 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
d7d409df60 examples(foundational): move 12-* to 14-*-video 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
5174b18176 LLMAssistantAggregator: don't mark function calls as completed when receiving user image
Before, when requesting a user image from a function call we had to wait for a
random time before we could indicate the function call was done. This was to
given time to the aggregator to process the image before marking the function
call as completed.

To avoid this, we now wait for the requested image to be received by the LLM
assistant agrgegator (using an asyncio event). Then, we can successfully mark
the function call as completed.
2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
9c5690d670 LLMContext: added support for image messages with URLs 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
e0933e20d2 deprecated UserResponseAggregator 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
ce13155d26 vision(moondream): process VisionImageRawFrame 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
817a485d94 frames: added VisionImageRawFrame 2025-10-30 13:08:15 -07:00
Aleix Conchillo Flaqué
b094418d1e LLMContext: add create_image_message and create_audio_message 2025-10-30 13:08:13 -07:00
Filipi da Silva Fuchter
08a1e09020 Merge pull request #2944 from pipecat-ai/filipi/flux_handlers
New event handlers for the DeepgramFluxSTTService.
2025-10-30 16:40:41 -03:00
Filipi Fuchter
52b33e5106 New event handlers for the DeepgramFluxSTTService. 2025-10-30 16:09:07 -03:00
Mark Backman
5db0871a20 Merge pull request #2873 from matejmarinko-soniox/main
Update model params for Soniox STT
2025-10-30 12:50:30 -04:00
Mark Backman
222c362fa4 Merge pull request #2937 from aaronng91/speechmatics-tts
Add Speechmatics TTS
2025-10-30 12:30:27 -04:00
Aaron Ng
9d509bb409 address changes 2025-10-30 16:25:10 +00:00
shreyas-sarvam
8d0e7e5e16 chore: Add changelog entry, update foundational examples 2025-10-30 19:22:14 +05:30
shreyas-sarvam
e7b8da7a83 feat: Refactor code to include language parameter, model_name and use _handle_transcription method 2025-10-30 19:01:04 +05:30
shreyas-sarvam
35c48a45cf fix: Ruff format 2025-10-30 18:51:18 +05:30
shreyas-sarvam
14a365aa16 fix: Use message handler to handle responses 2025-10-30 17:54:32 +05:30
shreyas-sarvam
779fc0419d Merge branch 'main' into sarvam/stt 2025-10-30 15:50:53 +05:30
Thu Nguyen
057e0c3973 Lint 2025-10-30 17:12:36 +07:00
Thu Nguyen
8a6abdd44b feat: Add dynamic speaking_rate control for Google TTS Chirp voices 2025-10-30 17:09:41 +07:00
Mark Backman
7872fa2e88 Merge pull request #2934 from roshie548/add-cartesia-generation-config
feat: add generation_config support for Cartesia Sonic-3
2025-10-29 23:10:48 -04:00
Roshan
e86c546a1a Merge branch 'main' into add-cartesia-generation-config 2025-10-29 18:31:09 -07:00
Roshan
abf34bcccf address pr comments 2025-10-29 18:29:51 -07:00
Aleix Conchillo Flaqué
56eb633390 Merge pull request #2911 from pipecat-ai/aleix/daily-transport-improve-error-handling
DailyTransport: update start_dialout/start_recording return values
2025-10-29 16:28:10 -07:00
Aleix Conchillo Flaqué
6299b9db87 DailyTransport: trigger "on_error" if transcription fails to start/stop 2025-10-29 16:25:13 -07:00
Aleix Conchillo Flaqué
bcffa590a3 DailyTransport: update start_dialout/start_recording return values 2025-10-29 16:25:13 -07:00
kompfner
8b739aa444 Merge pull request #2889 from pipecat-ai/pk/openai-realtime-universal-llmcontext-2
Support new `LLMContext` pattern with `OpenAIRealtimeLLMService`
2025-10-29 16:54:37 -04:00
Paul Kompfner
8f15980c67 Get rid of unnecessary new task in example file 2025-10-29 16:23:50 -04:00
Paul Kompfner
89e9acf0e1 CHANGELOG and code comment tweaks 2025-10-29 16:21:04 -04:00
Paul Kompfner
ddac24e6c9 Fix a missing space in a warning message 2025-10-29 16:17:05 -04:00
Paul Kompfner
d0f52feba3 OpenAI Realtime needs the assistant context aggregator to have expect_stripped_words=False 2025-10-29 16:15:16 -04:00
Paul Kompfner
8894db4290 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Add warning about no longer pushing `TTSTextFrame`s.
2025-10-29 15:45:06 -04:00
Paul Kompfner
1f96cdf970 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Make `LLMUserAggregator` push the `LLMSetToolsFrame`s, in case a speech-to-speech service that needs to handle the frame itself—like `OpenAIRealtimeLLMService`—is downstream. As far as I can tell, pushing `LLMSetToolsFrame` should otherwise have no unwanted side effects.
2025-10-29 15:43:51 -04:00
Paul Kompfner
0282033208 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Add `LLMContext.get_messages_for_persistent_storage()` for compatibility with `OpenAILLMContext`, to avoid tripping up users who we're unknowingly migrating to using `LLMContext`.
2025-10-29 15:43:51 -04:00
Paul Kompfner
917ea27352 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Update `AzureRealtimeLLMService` example (19a) to use new `LLMContext` pattern.
2025-10-29 15:43:51 -04:00
Paul Kompfner
8c03df1463 Update some docstring arg descriptions to be a bit more current or accurate 2025-10-29 15:43:51 -04:00
Paul Kompfner
15aa76efba Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Maintain backward compatibility with functions specified in dict format.
2025-10-29 15:43:51 -04:00
Paul Kompfner
8ac421f8fd Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Remove unused imports.
2025-10-29 15:43:51 -04:00
Paul Kompfner
75b3ea9c96 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Fix tracing.
2025-10-29 15:43:51 -04:00
Paul Kompfner
95be1510ac Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Improve `OpenAIRealtimeLLMAdapter.get_messages_for_logging()`.
2025-10-29 15:43:51 -04:00
Paul Kompfner
df19011080 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Improve warning about transcription frame direction change.
2025-10-29 15:43:51 -04:00
Paul Kompfner
e42cf78e79 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Update deprecation versions.
2025-10-29 15:43:51 -04:00
Paul Kompfner
0495de52b6 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Log warning about transcription frame direction change.
2025-10-29 15:43:51 -04:00
Paul Kompfner
9bc02afd0d Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
CHANGELOG tweak.
2025-10-29 15:43:51 -04:00
Paul Kompfner
6140fdb2c9 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
In anticipation of `messages` property being added to `LLMContext` (in another PR), remove warnings about the need to use `get_messages()` instead.
2025-10-29 15:43:51 -04:00
Paul Kompfner
b6a1886dae Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd). 2025-10-29 15:43:51 -04:00
Paul Kompfner
42d0a097c5 Tweaks to 20b example 2025-10-29 15:43:51 -04:00
Paul Kompfner
3761804146 Make OpenAIRealtimeLLMService's websocket send method more resilient. Previously, it was possible for a websocket send attempt to occur during a disconnect. 2025-10-29 15:43:51 -04:00
Paul Kompfner
46e97c57c2 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Update 20b example to use new `LLMContext` pattern.
2025-10-29 15:43:51 -04:00
Paul Kompfner
19770b76b4 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Add back file that was removed, when it should've just been deprecated.

Also, fix version numbers in deprecation messages to match the next expected release.
2025-10-29 15:43:51 -04:00
Paul Kompfner
b34461bf93 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd). 2025-10-29 15:43:47 -04:00
Paul Kompfner
bab0aaf585 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Update `create_context_aggregator()` (which we're keeping around for backward compatibility) to create a `LLMContextAggregatorPair` rather than OpenAI-Realtime-specific aggregators.
2025-10-29 15:36:58 -04:00
Paul Kompfner
61944d22ef Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Implement sending tool call results to the OpenAI server based on reading context updates. This lets us use the normal assistant context aggregator and not a special OpenAI Realtime subclass that pushes up a special frame for function call results.
2025-10-29 15:36:58 -04:00
Paul Kompfner
47756319be Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Receiving a new context (via a context frame) no longer serves as a signal to reset the conversation. That’s because we’re now receiving new contexts from the user aggregator every time new messages are added, and from the assistant aggregator when function call results come in. The code pattern we're heading towards, of “diffing” each new context with the previous on, sets us up for doing more sophisticated things in the future, like sending specific messages to OpenAI to edit its internally-tracked context.

Also, remove code that was directly modifying context.
2025-10-29 15:36:58 -04:00
Paul Kompfner
5fa56df014 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Update 19b example with new pattern.
2025-10-29 15:36:58 -04:00
Paul Kompfner
8a151235c3 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Deprecate `send_transcription_frames`—transcription frames are now always sent.
2025-10-29 15:36:57 -04:00
Paul Kompfner
ec42f8c24e Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Push `TranscriptionFrame`s upstream, to be handled by the user context aggregator. This will require at least a couple of other changes:
- Updating examples to put transcript processors upstream from `OpenAIRealtimeLLMService`
- Maybe figuring out a way to preserve backward compatibility with existing pipelines that put transcript processors downstream from `OpenAIRealtimeLLMService`
- Updating `OpenAIRealtimeLLMService` to ignore new received context frames, since the upstream user context aggregator will generate those after each newly-added user message; hopefully nobody was reliant on the old behavior of resetting the conversation upon receiving a new context!
2025-10-29 15:36:57 -04:00
Paul Kompfner
29fd17b9ff Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (cont'd).
Avoid pushing `LLMTextFrame` when `OpenAIRealtimeLLMService` is configured to output audio. This avoids duplicate text in assistant messages in context. Conceptually, a speech-to-speech service encapsulates TTS behavior; in a "traditional" pipeline, `LLMTextFrames` are swallowed by the TTS service, so they should similarly not be pushed by a speech-to-speech service. Only. `TTSTextFrame`s should be pushed.
2025-10-29 15:36:57 -04:00
Paul Kompfner
3ea1e357f2 Update OpenAIRealtimeLLMService to work with LLMContext and LLMContextAggregatorPair (initial part of work) 2025-10-29 15:36:57 -04:00
kompfner
351ef617ae Merge pull request #2932 from pipecat-ai/pk/gemini-live-universal-llmcontext
Update `GeminiLLMService` to work with `LLMContext` and `LLMContextAg…
2025-10-29 15:35:13 -04:00
Paul Kompfner
9dafb715c4 Update some deprecation versions 2025-10-29 15:30:43 -04:00
Paul Kompfner
82d494d3d4 Fix a bug in GeminiLiveLLMService related to ending gracefully—i.e. waiting for the bot to stop responding before ending the pipeline—when the service is configured with the TEXT modality 2025-10-29 14:34:02 -04:00
Mark Backman
e893aaa620 Merge pull request #2931 from ivaaan/hume-bugfix
Hume: use Octave v1 if description provided
2025-10-29 13:40:21 -04:00
Paul Kompfner
65c17a698e Whoops - fix a bug in GeminiLiveLLMService where we weren't checking if a tool call result was already handled before reporting it to the LLM 2025-10-29 12:44:00 -04:00
Paul Kompfner
615aae5b95 Fix GeminiLiveLLMService's sending of LLMFullResponseStartFrame and LLMFullResponseEndFrame so that they properly bookend responses.
Properly bookended responses now work with:
- AUDIO modality (validated with 26b example)
- TEXT modality (validated with 26d example)
- AUDIO modality with Vertex AI (validated with 26h example)

It doesn't seem that TEXT modality is supported with Vertex AI, hence the missing "quadrant" of validation.
2025-10-29 12:33:37 -04:00
Aaron Ng
b0acbeffb9 add sm-app param 2025-10-29 16:33:18 +00:00
Ivan A
2f1061f300 Merge branch 'main' into hume-bugfix 2025-10-29 17:06:50 +01:00
ivaaan
9307079af2 upd changelog 2025-10-29 17:05:41 +01:00
Mark Backman
efa64642a4 Merge pull request #2930 from pipecat-ai/mb/simli-constructor-update
Update Simli to align with Pipecat constructor norms
2025-10-29 11:50:11 -04:00
Mark Backman
ede6c32149 Update Simli to align with Pipecat constructor norms 2025-10-29 11:47:23 -04:00
Aaron Ng
4050e8b7dc add speechmatics tts 2025-10-29 14:53:20 +00:00
Roshan
b0f5fc02c4 refactor: use Pydantic BaseModel for GenerationConfig and simplify model_dump()
- Change GenerationConfig from dataclass to Pydantic BaseModel for consistency
- Simplify _build_msg() to use model_dump(exclude_none=True) instead of manual field extraction
- Simplify HTTP run_tts() to use model_dump(exclude_none=True) instead of manual field extraction

This addresses feedback from code review and reduces code duplication.
2025-10-28 18:41:58 -07:00
Aleix Conchillo Flaqué
493d6bf91e Merge pull request #2936 from pipecat-ai/aleix/daily-python-0.21.0
pyproject: update daily-python to 0.21.0
2025-10-28 18:25:25 -07:00
Aleix Conchillo Flaqué
aaebcae2e8 pyproject: update daily-python to 0.21.0 2025-10-28 17:23:37 -07:00
Roshan
408264a0fd docs: update CHANGELOG.md for generation_config feature 2025-10-28 15:16:49 -07:00
Roshan
df8aa3e4b0 feat: add generation_config support for Cartesia Sonic-3
Add GenerationConfig dataclass with volume, speed, and emotion parameters
for Cartesia Sonic-3 TTS models. This enables fine-grained control over
speech generation including volume (0.5-2.0), speed (0.6-1.5), and
emotion (60+ options).

Changes:
- Add GenerationConfig dataclass with proper Google-style docstrings
- Update CartesiaTTSService.InputParams to include generation_config
- Update CartesiaHttpTTSService.InputParams to include generation_config
- Modify _build_msg() to include generation_config in WebSocket messages
- Modify run_tts() to include generation_config in HTTP requests
- Maintain backward compatibility with existing speed and emotion parameters

The legacy speed (literal strings) and emotion (list) parameters remain
available for non-Sonic-3 models.
2025-10-28 15:10:46 -07:00
Mark Backman
4d82a1260b Merge pull request #2933 from pipecat-ai/mb/remove-aiohttp-session-sarvam
Remove aiohttp_session arg from SarvamTTSService
2025-10-28 16:54:56 -04:00
Paul Kompfner
f974c66e12 Update GeminiLLMService to work with LLMContext and LLMContextAggregatorPair 2025-10-28 15:46:28 -04:00
Mark Backman
533372ed37 Remove aiohttp_session arg from SarvamTTSService 2025-10-28 15:39:14 -04:00
ivaaan
a9118eb2cd use Octave 1 if description provided 2025-10-28 20:36:34 +01:00
Aleix Conchillo Flaqué
84ed2468e5 Merge pull request #2924 from pipecat-ai/aleix/daily-transport-remove-join-timeout
DailyTransport: don't timeout prematurely on join
2025-10-28 10:43:28 -07:00
Aleix Conchillo Flaqué
d82d855c20 DailyTransport: don't timeout prematurely on leave 2025-10-28 10:41:19 -07:00
Mark Backman
412ff2a4a1 Merge pull request #2929 from pipecat-ai/mb/cartesia-sonic-3
Update Cartesia's default model to sonic-3
2025-10-28 13:07:28 -04:00
Mark Backman
82ccc160fb Merge pull request #2923 from pipecat-ai/mb/runner-no-proxy-required
Remove development runner requirement for proxy
2025-10-28 11:59:38 -04:00
Mark Backman
9ef60bd468 Update Cartesia's default model to sonic-3 2025-10-28 11:49:54 -04:00
marcus-daily
06e86cc107 Add RTVIProcessor to foundational example 38b 2025-10-28 12:14:23 +00:00
Aleix Conchillo Flaqué
f3c4bf08dd DailyTransport: don't timeout prematurely on join 2025-10-27 17:52:19 -07:00
Mark Backman
f2cfbee3c3 Remove development runner requirement for proxy 2025-10-27 16:18:31 -04:00
Vanessa Pyne
8b063116ab Merge pull request #2921 from pipecat-ai/vp-azure-ex-cleanup
cleanup logger message
2025-10-27 12:59:08 -05:00
vipyne
8096e62b34 cleanup logger message 2025-10-27 11:27:30 -05:00
kompfner
20f4b0e8ff Merge pull request #2914 from pipecat-ai/pk/gemini-function-calling-fixes
Gemini function calling fixes
2025-10-27 09:45:29 -04:00
Paul Kompfner
6feaf91789 Fix a bug in GeminiLLMAdapter's handling of Gemini-specific context messages 2025-10-27 09:42:24 -04:00
Mark Backman
91d3ae07b3 Merge pull request #2915 from Rickaym/fix--rounding-the-edges-of-observer-function-method-deprecation
fix: use correct  property names
2025-10-24 19:42:34 -04:00
Pyae Sone Myo
71841f71ef fix: use correct property names 2025-10-25 00:47:46 +06:30
Paul Kompfner
949b807023 Close genai client more gracefully to avoid printed warnings. We're now following the genai library guidance: https://github.com/googleapis/python-genai?tab=readme-ov-file#close-a-client 2025-10-24 11:36:25 -04:00
Paul Kompfner
4ad15f9a01 Update Gemini service to include function name when sending function responses in context 2025-10-24 11:04:52 -04:00
Paul Kompfner
99d94fc625 Update Gemini service to use "user" role for function responses, as shown in the Gemini docs 2025-10-24 10:05:14 -04:00
Mark Backman
a3d630c0d1 Merge pull request #2908 from pipecat-ai/mb/runner-daily-start-route
fix: add support for DAILY_SAMPLE_ROOM_URL when calling /start for Da…
2025-10-23 14:15:42 -04:00
Mark Backman
04b482c445 Merge branch 'main' into mb/runner-daily-start-route 2025-10-23 14:11:38 -04:00
Mark Backman
b2bce4916f Merge pull request #2900 from pipecat-ai/mb/quickstart-pipecat-cli
Quickstart to use Pipecat CLI
2025-10-23 10:55:42 -04:00
Mark Backman
60e9817f16 fix: add support for DAILY_SAMPLE_ROOM_URL when calling /start for DailyTransport 2025-10-22 16:48:30 -04:00
kompfner
c655d0d313 Merge pull request #2907 from pipecat-ai/mb/service-switcher-updates
ServiceSwitcher updates
2025-10-22 11:23:48 -04:00
Paul Kompfner
ea6e146f2d Update TestServiceSwitcher to exercise targeting system frames only to the active service 2025-10-22 11:14:27 -04:00
Mark Backman
ec890a834f Rename to filter_system_frames 2025-10-22 11:01:33 -04:00
Mark Backman
5b921fc054 fix: FunctionFilter adds block_system_frame arg 2025-10-22 10:53:01 -04:00
Mark Backman
f1040100f4 Update ServiceSwitcher and LLMSwitcher docstrings 2025-10-22 10:51:03 -04:00
Mark Backman
54691ee781 Merge pull request #2904 from pipecat-ai/mb/bump-aws-sdk-bedrock-runtime
Upgrade aws_sdk_bedrock_runtime to v0.1.1
2025-10-22 08:58:48 -04:00
Mark Backman
49239a23c6 Upgrade aws_sdk_bedrock_runtime to v0.1.1 2025-10-21 23:27:38 -04:00
Aleix Conchillo Flaqué
e0c43de13f Merge pull request #2903 from pipecat-ai/aleix/pipecat-0.0.91
update CHANGELOG for 0.0.91
2025-10-21 19:09:23 -07:00
Aleix Conchillo Flaqué
cc4c96d099 update CHANGELOG for 0.0.91 2025-10-21 19:00:51 -07:00
Aleix Conchillo Flaqué
788465cb04 Merge pull request #2901 from pipecat-ai/pk/llmcontext-messages
Add `messages` property to `LLMContext` for usage parity with `OpenAI…
2025-10-21 18:00:33 -07:00
Aleix Conchillo Flaqué
db934eade0 Merge pull request #2891 from pipecat-ai/aleix/daily-pipecat-runner-args
runner: allow starting a bot from Daily's /start endpoint
2025-10-21 17:59:13 -07:00
Mark Backman
0b8c966a11 Merge pull request #2892 from pipecat-ai/mb/aws-llm-claude-fix
fix: AWSBedrockLLMService compatibility for newer Claude models
2025-10-21 20:50:20 -04:00
Mark Backman
5849485bc6 fix: AWSBedrockLLMService compatibility for newer Claude models 2025-10-21 19:47:27 -04:00
Aleix Conchillo Flaqué
459af58540 runner: allow starting a bot from Daily's /start endpoint 2025-10-21 16:28:11 -07:00
Aleix Conchillo Flaqué
576bd67e85 runner: add body field to RunnerArguments 2025-10-21 16:27:48 -07:00
Aleix Conchillo Flaqué
1e8629bf96 runner: allow passing an api_key to configure 2025-10-21 16:27:48 -07:00
Paul Kompfner
776a3526f9 Add messages property to LLMContext for usage parity with OpenAILLMContext.
This wasn't really an issue before, when folks were *knowingly* migrating from `OpenAILLMContext` to `LLMContext`. But in the latest AWS Nova Sonic change, we're swapping it out from under folks, so this kind of compatibility is more important.

For context, the reason we *didn't* offer the `messages` property earlier was to aid in the development of `LLMContext`—we wanted to draw attention to all the places where messages were being read from context, so we could find the places where we might need to pass an argument to the read.
2025-10-21 17:38:39 -04:00
kompfner
2ced044418 Merge pull request #2896 from pipecat-ai/pk/add-back-types-that-were-meant-to-be-deprecated-not-removed
Add back types that were removed, when they should only have been dep…
2025-10-21 17:33:17 -04:00
Mark Backman
151f187837 Merge pull request #2895 from pipecat-ai/mb/update-env-example
Organize the env.example file
2025-10-21 17:15:22 -04:00
Mark Backman
67afa718d0 Merge pull request #2898 from pipecat-ai/mb/ellipses-changelog
Changelog entry for PR #2877
2025-10-21 17:02:08 -04:00
Mark Backman
52ab0eccc0 Quickstart to use Pipecat CLI 2025-10-21 15:57:45 -04:00
Vanessa Pyne
d1f1b68b71 Merge pull request #2863 from pipecat-ai/vp-custom-frame-processor-ex
add 08-custom-frame-processor.py to foundational examples
2025-10-21 14:15:38 -05:00
Mark Backman
a479c32665 Merge pull request #2894 from pipecat-ai/mb/cli-readme
Add Pipecat CLI to README's ecosystem section
2025-10-21 13:20:12 -04:00
Mark Backman
9f66b0ba41 Add Pipecat CLI to README's ecosystem section 2025-10-21 13:17:37 -04:00
vipyne
23385ca3d2 replace foundational example 08-bots-arguing.py with 08-custom-frame-processor.py 2025-10-21 11:56:35 -05:00
vipyne
8b24bae9c5 pr notes 2025-10-21 11:42:06 -05:00
Mark Backman
0502ec6c44 Changelog entry for PR #2877 2025-10-21 11:58:27 -04:00
Mark Backman
81645910e0 Merge pull request #2877 from nimobeeren/patch-1
Add ellipsis character to sentence ending punctuation list
2025-10-21 11:53:17 -04:00
Filipi da Silva Fuchter
d6ab4c41b0 Merge pull request #2897 from pipecat-ai/filipi/fix_proxy_route
Fixed an issue in the runner's proxy_request
2025-10-21 12:28:04 -03:00
Filipi Fuchter
2f92cb8781 Fixed an issue in the runner's proxy_request where a session that exists but has empty data was being treated as invalid. 2025-10-21 11:41:52 -03:00
Paul Kompfner
fbf274374c Add back types that were removed, when they should only have been deprecated 2025-10-21 09:56:31 -04:00
Mark Backman
427efecf5b Organize the env.example file 2025-10-21 09:43:46 -04:00
Filipi da Silva Fuchter
b3e54546ac Merge pull request #2888 from pipecat-ai/filipi/rtvi_duplicated_frames
Fixed an issue where the RTVIProcessor was sending duplicate UserStartedSpeakingFrame and UserStoppedSpeakingFrame messages.
2025-10-21 08:57:32 -03:00
Filipi Fuchter
de46631bac Fixed an issue where the RTVIProcessor was sending duplicate UserStartedSpeakingFrame and UserStoppedSpeakingFrame messages. 2025-10-20 18:39:00 -03:00
vipyne
abf0150261 add 47-custom-frame-processor.py to foundational examples 2025-10-20 12:11:40 -05:00
Aleix Conchillo Flaqué
a0c93ab6de update CHANGELOG cosmetics 2025-10-20 09:07:50 -07:00
Aleix Conchillo Flaqué
4bec566bbf Merge pull request #2885 from pipecat-ai/aleix/daily-python-0.20.0
pyproject: update daily-python to 0.20.0
2025-10-20 08:04:52 -07:00
Aleix Conchillo Flaqué
ec3cd24182 pyproject: update daily-python to 0.20.0 2025-10-20 08:04:34 -07:00
kompfner
e36e64c2e8 Merge pull request #2750 from pipecat-ai/pk/aws-nova-sonic-universal-llmcontext-1
Support new `LLMContext` pattern with `AWSNovaSonicLLMService`
2025-10-20 10:12:53 -04:00
Paul Kompfner
02a88022dd Add a bit more detail to CHANGELOG related to AWSNovaSonicLLMService's support for LLMContext 2025-10-20 10:06:09 -04:00
Paul Kompfner
6cae61f2cc Add a bit more detail to CHANGELOG entry about AWSNovaSonicLLMService's support for LLMContext 2025-10-20 09:50:23 -04:00
Paul Kompfner
3b40079120 Add a detailed warning when trying to import things from pipecat.services.aws_nova_sonic.context or pipecat.services.aws.nova_sonic.context 2025-10-20 09:49:05 -04:00
Paul Kompfner
ff0b38859b Remove AWS Nova Sonic's context.py, which was always concerned with types for internal use only. Now those types are either gone or moved elsewhere. 2025-10-20 09:49:05 -04:00
Paul Kompfner
4d499324d1 Re-apply a change to AWSNovaSonicLLMService that was lost in a rebase 2025-10-20 09:49:05 -04:00
Paul Kompfner
f13e006db2 Bump version in deprecation message in docstring 2025-10-20 09:49:05 -04:00
Paul Kompfner
87d9e8c9cd Re-apply a couple of recent changes to AWSNovaSonicLLMService that were lost in a rebase 2025-10-20 09:49:05 -04:00
Paul Kompfner
4820f1c059 Address some AWSNovaSonicLLMService context-recording edge cases 2025-10-20 09:49:05 -04:00
Paul Kompfner
860c39d1b1 Get rid of LLMContext.get_messages_for_persistent_storage().
The reason for its `system_instruction` argument was to support usage with LLMs where you might pass the system instruction as a parameter to the `LLMService` rather than specifying it in the context.

But as I thought about it more I became unconvinced that the `system_instruction` argument was really beneficial:

- If you specified your system instruction in your context in the first place, it'll still be there when you read messages for persistent storage
- If you didn't specify your system instruction in the context and instead passed it in as an `LLMService` parameter, you most likely *don't* want it to be in the context when you read messages for persistent storage
- ...and if you really really do need to inject it at the start of the context, it's quite easy to do anyway

And if we remove the `system_instruction` argument from `get_messages_for_persistent_storage()`, then it's essentially just `get_messages()`.
2025-10-20 09:49:05 -04:00
Paul Kompfner
ae5c5ed7f6 Update AWSNovaSonicLLMService to work with LLMContext and LLMContextAggregatorPair 2025-10-20 09:49:00 -04:00
shreyas-sarvam
5052da8ce6 Merge branch 'main' into sarvam/stt 2025-10-20 13:45:24 +05:30
Aleix Conchillo Flaqué
7aa01c1ca8 Merge pull request #2882 from pipecat-ai/aleix/base-transport-output-cleanup
base output transport cleanup
2025-10-18 07:38:13 -07:00
Mark Backman
4d6356748f Merge pull request #2819 from shreyas-sarvam/sarvam/tts-v3
feat: Add support for bulbul:v3
2025-10-18 09:36:57 -04:00
Mark Backman
5b1a182421 Merge branch 'main' into sarvam/tts-v3 2025-10-18 09:34:10 -04:00
Mark Backman
6ac0c34413 Merge pull request #2879 from sam-s10s/fix/smx-vocab
Fix for SpeechmaticsSTTService AdditionVocabEntry entries
2025-10-18 09:27:23 -04:00
Mark Backman
c115422dbf Merge pull request #2857 from dan-ince-aai/main
feat: add keyterms_prompt to AssemblyAI service
2025-10-18 09:20:43 -04:00
Mark Backman
a2a973be27 Merge pull request #2842 from nbyers-altira/fix-riva-segmented
Fix NVIDIA Riva Segmented STT by adding missing is_final parameter to _handle_transcription
2025-10-18 09:11:11 -04:00
Aleix Conchillo Flaqué
0407744950 BaseOutputTransport: simplify process_frame 2025-10-17 21:55:20 -07:00
Aleix Conchillo Flaqué
7ce370ccc6 BaseOutputTransport: simplify bot speaking logic 2025-10-17 15:13:20 -07:00
nbyers-altira
a4867f61aa be a tad more precise in changelog 2025-10-17 13:51:49 -04:00
nbyers-altira
a67a765783 add changelog, run linter 2025-10-17 13:49:52 -04:00
nbyers-altira
81221668b1 Merge remote-tracking branch 'upstream/main' into fix-riva-segmented 2025-10-17 13:45:59 -04:00
Daniel Ince
cc9c264940 Merge branch 'main' into main 2025-10-17 15:15:36 +01:00
Sam Sykes
f2c61ac9fd Fix for AdditionVocabEntry without sounds_like items. 2025-10-17 14:34:37 +01:00
Filipi da Silva Fuchter
88f8c10f63 Merge pull request #2875 from pipecat-ai/filipi/rtvi_routes
Creating the WebRTC routes that mimic the ones provided by Pipecat Cloud.
2025-10-17 10:13:45 -03:00
Filipi Fuchter
855f4842dd Creating the WebRTC routes that mimic the ones provided by Pipecat Cloud. 2025-10-17 10:10:19 -03:00
Filipi da Silva Fuchter
2bf44fe2af Merge pull request #2853 from pipecat-ai/filipi/trickle_ice
Adding support for trickle ice.
2025-10-17 09:00:32 -03:00
Filipi Fuchter
3e8a7cc254 Adding support for trickle ICE to the SmallWebRTCTransport. 2025-10-17 08:57:45 -03:00
Daniel Ince
a600c05570 Merge branch 'main' into main 2025-10-17 11:43:38 +01:00
dan-ince-aai
3ba6b55659 feat: multilingual + changelog updates 2025-10-17 11:38:03 +01:00
dan-ince-aai
d5f2dcfac0 lint 2025-10-17 11:32:06 +01:00
Nimo Beeren
d1d74c571c add ellipsis character to sentence ending punctuation list 2025-10-17 10:38:06 +02:00
shreyas-sarvam
d12134038b chore: Update CHANGELOG 2025-10-17 10:07:58 +05:30
shreyas-sarvam
a22af3a7e0 Merge branch 'main' into sarvam/stt 2025-10-17 10:00:49 +05:30
Aleix Conchillo Flaqué
76e07c6c48 Merge pull request #2870 from pipecat-ai/aleix/openaitts-update-settings
OpenAITTSService: allow updating instructions and speed
2025-10-16 13:21:12 -07:00
Aleix Conchillo Flaqué
8d8503bca7 OpenAITTSService: allow updating instructions and speed 2025-10-16 13:20:49 -07:00
Aleix Conchillo Flaqué
a444097060 Merge pull request #2872 from pipecat-ai/aleix/pipeline-task-cancellation-fixes
PipelineTask: fix task cancellation issues
2025-10-16 13:18:13 -07:00
Aleix Conchillo Flaqué
1b9e96c016 PipelineTask: fix task cancellation issues 2025-10-16 13:16:19 -07:00
Vanessa Pyne
7967bc53c3 Merge pull request #2868 from pipecat-ai/vp-whatsapp-dep-mv
only import whatsapp deps if using whatsapp runner
2025-10-16 14:16:28 -05:00
vipyne
6381335346 Add --whatsapp flag to runner 2025-10-16 14:15:26 -05:00
vipyne
0fd5d26104 add WHATSAPP_APP_SECRET to required whatsapp env vars 2025-10-16 10:37:56 -05:00
vipyne
41f817bf04 only import whatsapp deps if using whatsapp runner 2025-10-16 10:37:56 -05:00
Matej Marinko
9acc36c58e Update model params for Soniox STT
- remove deprecated parameters and add new ones
- add support for v3 context
2025-10-16 08:51:40 +02:00
shreyas-sarvam
27115e6565 Merge branch 'main' into sarvam/tts-v3 2025-10-16 12:09:50 +05:30
shreyas-sarvam
1ecf6e05fe Merge branch 'main' into sarvam/stt 2025-10-16 12:08:32 +05:30
Mark Backman
3c4807d7d4 Merge pull request #2859 from pipecat-ai/mb/openai-package-upgrade
Bump openai, openpipe versions, add 14x foundational example
2025-10-15 15:41:32 -04:00
Mark Backman
8902f1dc94 Bump openai, openpipe versions, add 14x foundational example 2025-10-15 15:17:22 -04:00
Mark Backman
a25333ee51 Merge pull request #2856 from pipecat-ai/mb/pr-2840-cleanup
Fix an issue in ElevenLabsHttpTTSService where the last word is not e…
2025-10-15 15:16:43 -04:00
Mark Backman
82c7d7ad83 Merge pull request #2867 from pipecat-ai/mb/update-moondream-readme
Update moondream chatbot README link
2025-10-15 15:16:19 -04:00
Mark Backman
ba2ab51ef7 Merge pull request #2866 from pipecat-ai/mb/add-sentry-foundational
Add foundation 47-sentry-metrics.py
2025-10-15 15:15:52 -04:00
Mark Backman
22557fa668 Fix an issue in ElevenLabsHttpTTSService where the last word is not emitted 2025-10-15 15:13:54 -04:00
Vanessa Pyne
3fbf59e7c6 Merge pull request #2864 from pipecat-ai/vp-trace-log
WhatsApp transport debug log -> trace log
2025-10-15 13:03:58 -05:00
vipyne
129ab5ea0e WhatsApp transport debug log -> trace log 2025-10-15 13:02:57 -05:00
Aleix Conchillo Flaqué
dc917523d0 Merge pull request #2855 from pipecat-ai/aleix/stt-tts-connected-disconnected-events
services: added on_connected/on_disconnected events
2025-10-15 10:41:38 -07:00
Aleix Conchillo Flaqué
5ea7cc9d32 services: added on_connected/on_disconnected events 2025-10-15 10:39:31 -07:00
Mark Backman
e11ede475b Update moondream chatbot README link 2025-10-15 13:22:56 -04:00
Mark Backman
90d29e04af Merge pull request #2861 from pipecat-ai/mb/11labs-http-apply-text-normalization-fix
fix: set apply_text_normalization as request parameter in ElevenLabsH…
2025-10-15 12:59:36 -04:00
Mark Backman
4c67136a8d Merge pull request #2858 from pipecat-ai/mb/daily-runner-room-properties
Add room_properties to the Daily runner configure() method
2025-10-15 12:58:18 -04:00
Mark Backman
9d78402a33 fix: set apply_text_normalization as request parameter in ElevenLabsHttpTTSService 2025-10-15 12:56:42 -04:00
Mark Backman
73877218e9 Add room_properties to the Daily runner configure() method 2025-10-15 12:55:19 -04:00
Mark Backman
6a1be90cbb Merge pull request #2862 from pipecat-ai/mb/11labs-http-aggregate-sentences
Add aggregate_sentences arg to ElevenLabsHttpTTSService
2025-10-15 12:54:23 -04:00
Aleix Conchillo Flaqué
fbac959ecb Merge pull request #2865 from pipecat-ai/aleix/stop-audio-filter-also-on-cancel
BaseInputTransport: stop audio filter on cancel
2025-10-15 09:53:24 -07:00
Aleix Conchillo Flaqué
18dd85431c Merge pull request #2854 from pipecat-ai/aleix/cartesia-stt-service-websocket
CartesiaSTTService to inherit from WebsocketSTTService
2025-10-15 09:51:42 -07:00
Aleix Conchillo Flaqué
abc569b3d2 examples(foundational/07): use CartesiaSTTService 2025-10-15 09:46:57 -07:00
Mark Backman
fa5d4ecf86 Add foundation 47-sentry-metrics.py 2025-10-15 12:45:07 -04:00
Aleix Conchillo Flaqué
83b0dc39f7 BaseInputTransport: stop audio filter on cancel 2025-10-15 09:22:48 -07:00
Mark Backman
0c31b5ef19 Add aggregate_sentences arg to ElevenLabsHttpTTSService 2025-10-15 11:49:26 -04:00
dan-ince-aai
d16c36c56d feat: add keyterms_prompt to AssemblyAI service 2025-10-15 14:27:52 +01:00
Mark Backman
8fe3bcd484 Merge pull request #2840 from Rickaym/fix--excess-space-in-elevelabs-word-timestamp-joins
fix: handle ElevenLabs partial word concatenation across alignment chunks gracefully
2025-10-15 09:01:05 -04:00
Aleix Conchillo Flaqué
be2858bfbb CartesiaSTTService: inherit from WebsocketSTTService 2025-10-14 14:14:57 -07:00
Pyae Sone Myo
b6b0997553 fix: add support for partial words 2025-10-14 23:06:13 +06:30
Pyae Sone Myo
3b751322d3 fix: add interruption reset for partial word states 2025-10-14 23:04:09 +06:30
Aleix Conchillo Flaqué
fce6f55ddb Merge pull request #2844 from pipecat-ai/aleix/runner-files-path
runner: allow subdirectories in --folder
2025-10-14 08:38:38 -07:00
Aleix Conchillo Flaqué
d9580f72a9 runner: allow subdirectories in --folder 2025-10-13 18:29:19 -07:00
nbyers-altira
cc66ac14f1 add is_final to segmented func. sig. instead so tracing is consistent 2025-10-13 10:48:41 -04:00
nbyers-altira
9ddec0f8b4 is_final is not part of the segmented _handle_transcription function signature 2025-10-13 10:44:25 -04:00
shreyas-sarvam
5cc1d8a024 refactor: Update dependencies and improve logging 2025-10-13 10:18:15 +05:30
shreyas-sarvam
9babfe9fd9 refactor: Improve code reability and replace deprecated interruption frames 2025-10-13 08:54:29 +05:30
Pyae Sone Myo
21d8d148b8 fix: handle partial words across alignment chunks gracefully 2025-10-12 22:10:11 +06:30
Aleix Conchillo Flaqué
0588c82bbf Merge pull request #2838 from makosst/manta_graph_readme
Added Manta Graph to README
2025-10-11 09:31:21 -07:00
makosst
16e9093d5a Added Manta Graph to README 2025-10-11 09:20:17 -07:00
Aleix Conchillo Flaqué
91a5d580fd Merge pull request #2835 from pipecat-ai/aleix/tts-http-aligned-audio-frames
tts: fix RimeHttpTTSService/PiperTTSService 16-bit audio frames alignment
2025-10-10 14:20:44 -07:00
Aleix Conchillo Flaqué
0473556992 tts: fix RimeHttpTTSService/PiperTTSService 16-bit audio frames alignment 2025-10-10 14:19:22 -07:00
Aleix Conchillo Flaqué
fdaa4e476e Merge pull request #2830 from pipecat-ai/aleix/pipecat-0.0.90
update CHANGELOG for 0.0.90
2025-10-10 10:22:26 -07:00
Aleix Conchillo Flaqué
502e7e42a7 update CHANGELOG for 0.0.90 2025-10-10 10:20:19 -07:00
kompfner
2ab3d4fb42 Merge pull request #2834 from pipecat-ai/pk/update-vertex-disclaimer
Update a Google Vertex disclaimer for accuracy
2025-10-10 13:19:51 -04:00
Paul Kompfner
55014bdd77 Update a Google Vertex disclaimer for accuracy 2025-10-10 13:18:03 -04:00
kompfner
334796bd65 Merge pull request #2833 from pipecat-ai/pk/vertex-non-optional-location
`location` should not be optional when using Google Vertex.
2025-10-10 13:02:40 -04:00
Paul Kompfner
1c25b6fb72 location should not be optional when using Google Vertex.
Also, update `GoogleVertexLLMService` initialization pattern in the example file.
2025-10-10 12:58:45 -04:00
Mark Backman
91b29de7ca Merge pull request #2832 from pipecat-ai/mb/docs-fixes-0.0.90
Docs fixes for 0.0.90 release
2025-10-10 12:46:40 -04:00
Mark Backman
21d610cd30 Docs fixes for 0.0.90 release 2025-10-10 12:43:31 -04:00
Mark Backman
f7fe673ad1 Merge pull request #2831 from pipecat-ai/mb/update-evals
Update release evals for OpenAI Realtime, Gemini Live Vertex; shorten…
2025-10-10 12:34:27 -04:00
Mark Backman
4b415721e2 Update release evals for OpenAI Realtime, Gemini Live Vertex; shorten 26 foundational names 2025-10-10 12:26:23 -04:00
kompfner
8d2a98e0e7 Merge pull request #2829 from pipecat-ai/pk/fix-gemini-live-deprecation-messages
Fix deprecation messages pointing users to the new import paths for G…
2025-10-10 10:42:15 -04:00
Paul Kompfner
523e890c8c Fix deprecation messages pointing users to the new import paths for Gemini Live 2025-10-10 10:30:38 -04:00
kompfner
3c748fe772 Merge pull request #2823 from pipecat-ai/pk/vertex-init-args-fixup
Move `location` and `project_id` out of `InputParams` in `GoogleVerte…
2025-10-10 10:18:51 -04:00
kompfner
d293cee372 Merge pull request #2822 from pipecat-ai/pk/make-pause-processing-frames-more-robust
Make `pause_processing_frames()` and `pause_processing_system_frames(…
2025-10-10 10:16:27 -04:00
Paul Kompfner
8b62a96878 Improve how we're deprecating location and project_id in GoogleVertexLLMService, allowing user code to (correctly) continue referring to GoogleVertexLLMService.InputParams.
Also fix the slightly wrong (but so far harmless) pattern of initializing `OpenAILLMService.InputParams()` in the `GoogleVertexLLMService` if `params` wasn't provided—we should be letting the superclass decide what to do if the argument isn't specified.
2025-10-10 10:12:00 -04:00
Mark Backman
0c102ce70b Merge pull request #2826 from pipecat-ai/mb/deprecate-livekit-frame-serializer
Deprecate LivekitFrameSerializer
2025-10-10 10:01:45 -04:00
Mark Backman
3894d2a4b9 Deprecate LivekitFrameSerializer 2025-10-10 09:51:57 -04:00
Aleix Conchillo Flaqué
1f6b61c0db Merge pull request #2828 from pipecat-ai/aleix/gemini-live-gemini-to-llm
google: rename google.gemini_live.gemini to google.gemini_live.llm
2025-10-10 06:42:51 -07:00
Aleix Conchillo Flaqué
8ee28b37cd google: rename google.gemini_live.vertext to google.gemini_live.llm_vertex 2025-10-10 06:41:19 -07:00
Filipi da Silva Fuchter
e85e7e4d84 Merge pull request #2773 from pipecat-ai/filipi/krisp_viva
Added audio filter `KrispVivaFilter` using the Krisp VIVA SDK.
2025-10-10 09:51:15 -03:00
Filipi Fuchter
1b3afb5511 Added audio filter KrispVivaFilter using the Krisp VIVA SDK 2025-10-10 09:44:47 -03:00
Aleix Conchillo Flaqué
7cec013666 google: rename google.gemini_live.gemini to google.gemini_live.llm 2025-10-09 22:20:09 -07:00
Aleix Conchillo Flaqué
86127167fb Merge pull request #2827 from pipecat-ai/aleix/openai-realtime-move
move openai_realtime to openai.realtime
2025-10-09 22:18:04 -07:00
Aleix Conchillo Flaqué
9935a68018 examples(19b): fix deprecations 2025-10-09 22:14:52 -07:00
Aleix Conchillo Flaqué
5679dde70f ai_service: use openai.realtime.events instead of openai_realtime_beta.events 2025-10-09 22:14:46 -07:00
Aleix Conchillo Flaqué
d81b0f6368 update CHANGELOG with openai_realtime deprecation 2025-10-09 22:14:46 -07:00
Aleix Conchillo Flaqué
9698b008da deprecate openai_realtime 2025-10-09 22:14:46 -07:00
Aleix Conchillo Flaqué
7b05c9283b move openai.realtime.azure to azure.realtime.llm 2025-10-09 22:14:46 -07:00
Aleix Conchillo Flaqué
303dd2ec35 move openai.realtime.openai to openai.realtime.llm 2025-10-09 22:14:46 -07:00
Aleix Conchillo Flaqué
aa6e81648a move openai_realtime to openai.realtime 2025-10-09 22:14:46 -07:00
Aleix Conchillo Flaqué
1a87870ef3 Merge pull request #2825 from pipecat-ai/aleix/aws-nova-sonic-move
move aws_nova_sonic to aws.nova_sonic
2025-10-09 18:37:46 -07:00
Aleix Conchillo Flaqué
aac4ce2d12 update CHANGELOG with aws_nova_sonic deprecation 2025-10-09 18:32:26 -07:00
Aleix Conchillo Flaqué
2a79b2c853 aws: deprecate aws_nova_sonic 2025-10-09 17:44:29 -07:00
Aleix Conchillo Flaqué
15bf5b1533 aws: move aws_nova_sonic to aws.nova_sonic 2025-10-09 17:35:47 -07:00
Aleix Conchillo Flaqué
cdc86db8ce update CHANGELOG with GoogleVertexLLMService token fix 2025-10-09 16:58:22 -07:00
Aleix Conchillo Flaqué
9d2ad750b5 Merge pull request #2779 from LucasStringPay/patch-1
Ignore None value for 'completion_tokens' or similar for Gemini
2025-10-09 16:55:33 -07:00
Aleix Conchillo Flaqué
19ceb1a48f Merge pull request #2817 from pipecat-ai/aleix/runner-download-folder
runner: add --folder argument to allow file downloads
2025-10-09 16:55:17 -07:00
Aleix Conchillo Flaqué
59217eae38 runner: add --folder argument to allow file downloads 2025-10-09 16:49:51 -07:00
Aleix Conchillo Flaqué
bea0aee835 Merge pull request #2824 from pipecat-ai/aleix/gemini-under-google
google: move gemini_live inside google service
2025-10-09 16:40:15 -07:00
Aleix Conchillo Flaqué
aeace9b9be google: move gemini_live inside google service 2025-10-09 16:06:42 -07:00
Paul Kompfner
2994640f47 Move location and project_id out of InputParams in GoogleVertexLLMService, making them top-level init args instead. We do this for two reasons:
- Conceptually, these args comprise project-level setup, akin to credentials, whereas everything in `InputParams` is concerned with model configuration
- Providing a `project_id` when initializing `GoogleVertexLLMService` should not be optional, but prior to the change in this commit it was (erroneously) treated as optional by dint of `InputParams` being optional

This improvement was discussed [in this comment](https://github.com/pipecat-ai/pipecat/pull/2795#discussion_r2408279142).
2025-10-09 16:53:21 -04:00
Paul Kompfner
10069719e4 Make pause_processing_frames() and pause_processing_system_frames() more robust in FrameProcessor.
To understand this fix, let's look exclusively at `pause_processing_frames()` (`pause_processing_system_frames()` works the same way).

`pause_processing_frames()` works by setting a `__should_block_frames` flag, which is then read each time through the loop in the long-running `__process_frame_task_handler`. if `__should_block_frames` is `True`, it pauses processing frames until it's resumed.

Prior to this fix, the check for `__should_block_frames` was before `await self.__process_queue.get()`. The problem is that a lot of the time spent in the loop is waiting for a frame from the process queue. So if `pause_processing_frames()` is set at any time other than within `process_frame()` itself, it actually won't have an effect by the next frame, only on the frame *after* the next, which is later than intended.

Because thus far in the Pipecat codebase we've only ever called `pause_processing_frames()` and `pause_processing_system_frames()` from within `process_frame()`, this change should have no behavioral effect. But it will be helpful if we ever need to call it from anywhere else. I noticed this issue while developing a feature that did exactly that (though I later abandoned that code).
2025-10-09 15:57:31 -04:00
shreyas-sarvam
1e31fc7f9b fix: Format errors 2025-10-09 22:09:25 +05:30
kompfner
046b76df60 Merge pull request #2820 from pipecat-ai/pk/gemini-live-vertex-support
Support Gemini Live + Vertex AI
2025-10-09 11:53:41 -04:00
Paul Kompfner
f2d9063984 Renames: remove "multimodal" from Gemini Live types 2025-10-09 10:58:36 -04:00
shreyas-sarvam
7c1e2793c5 feat: Add support for bulbul:v3 and bulbul:v3-beta 2025-10-09 18:26:22 +05:30
Paul Kompfner
99f008e927 Make a note in our examples that there's an issue with Gemini Live + Vertex around specifying a modality other than AUDIO 2025-10-08 21:03:07 -04:00
Paul Kompfner
2699f0c2a6 Fix tool calls when using Gemini Live + Vertex AI 2025-10-08 21:03:07 -04:00
Paul Kompfner
0b6dd98000 Make a note in our examples that there's an issue with Gemini Live + Vertex around using "google_search" alongside other tools 2025-10-08 21:03:07 -04:00
Paul Kompfner
a14fb20d15 Fix Gemini Live w/Vertex AI not being able to handle an empty list provided for "function_declarations" 2025-10-08 21:03:07 -04:00
Paul Kompfner
728361a6a7 Add GeminiVertexMultimodalLiveLLMService 2025-10-08 21:03:01 -04:00
kompfner
106db69e8e Merge pull request #2816 from pipecat-ai/pk/gemini-live-await-ongoing-response-after-endframe
Implement ending `GeminiMultimodalLiveLLMService` gracefully (i.e. af…
2025-10-08 17:20:14 -04:00
Paul Kompfner
cf90071926 Format fix 2025-10-08 17:19:46 -04:00
Paul Kompfner
deaeb75a1f Fix changelog after rebase (and add a missing item) 2025-10-08 17:16:31 -04:00
Paul Kompfner
a666327d70 Implement ending GeminiMultimodalLiveLLMService gracefully (i.e. after the bot is finished) 2025-10-08 17:13:04 -04:00
kompfner
13a0522546 Merge pull request #2804 from pipecat-ai/pk/gemini-live-session-resumption
Add (relatively spartan) reconnection logic to `GeminiMultimodalLiveLLMService`
2025-10-08 17:10:45 -04:00
Paul Kompfner
7da37a0d1f Pull _connection_established_threshold and _max_consecutive_failures into file-level constants 2025-10-08 17:04:05 -04:00
Paul Kompfner
7efb22a323 Add (relatively spartan) reconnection logic to GeminiMultimodalLiveLLMService, leveraging the Gemini Live session resumption mechanism 2025-10-08 16:53:21 -04:00
kompfner
8084e2f909 Merge pull request #2776 from pipecat-ai/pk/gemini-live-gen-ai-library
Gemini Live service uses the `genai` library rather than WebSockets directly
2025-10-08 16:50:16 -04:00
Paul Kompfner
86127c6a6e Add to the changelog the GeminiMultimodalLiveLLMService update to use google-genai 2025-10-08 16:46:41 -04:00
Paul Kompfner
402e019ae2 Make a bit of code clearer 2025-10-08 16:45:55 -04:00
Paul Kompfner
f09e4e238b Fix some mishandling of enum values 2025-10-08 16:45:55 -04:00
Paul Kompfner
2921162b3b Add deprecation warning around importing StartSensitivity and EndSensitivity from pipecat.services.gemini_multimodal_live.events 2025-10-08 16:45:55 -04:00
Paul Kompfner
ac1582c906 Let users directly use google-genai types rather than aliased re-exported types 2025-10-08 16:45:55 -04:00
Paul Kompfner
e4b01a5844 Bumping deprecation version of GeminiMultimodalLiveLLMService's base_url arg 2025-10-08 16:45:55 -04:00
Paul Kompfner
fa663abbbc Add CHANGELOG entry for new GeminiMultimodalLiveLLMService configuration options 2025-10-08 16:45:55 -04:00
Paul Kompfner
d19e6111c3 Bumping deprecation version of GeminiMultimodalLiveLLMService's base_url arg 2025-10-08 16:45:55 -04:00
Paul Kompfner
8a6d504a7e Add enable_affective_dialog and proactivity settings to GeminiMultimodalLiveLLMService 2025-10-08 16:45:55 -04:00
Paul Kompfner
43915937f2 Update how we import and re-export some types in GeminiMultimodalLiveLLMService 2025-10-08 16:45:55 -04:00
Paul Kompfner
48e92a22fe Add thinking settings to GeminiMultimodalLiveLLMService 2025-10-08 16:45:55 -04:00
Paul Kompfner
566af6b0b8 Minor comment improvement 2025-10-08 16:45:55 -04:00
Paul Kompfner
12e7613d5f Deprecate the base_url argument to GeminiMultimodalLiveLLMService.
It expected a WebSocket URL, but we're no longer (directly) using WebSockets to talk to Gemini. Instead of trying to (potentially erroneously) map a given custom WebSocket URL to an `HttpOptions` object (the new preferred way of customizing requests made by the Gemini API client), we're simply deprecating `base_url` and pointing users to the `http_options` argument instead.
2025-10-08 16:45:55 -04:00
Paul Kompfner
04a68f2c57 Fix tracing in GeminiMultimodalLiveLLMService 2025-10-08 16:45:55 -04:00
Paul Kompfner
9b4ca12f49 Revert to only supporting providing a single modality - looks like specifying a list of modalities results in an API error.
Also, fix some missing `await`s in error handling.
2025-10-08 16:45:55 -04:00
Paul Kompfner
453ce715a6 Add some error handling to GeminiMultimodalLiveLLMService 2025-10-08 16:45:55 -04:00
Paul Kompfner
d87b6189ba Update GeminiMultimodalLiveLLMService to use the google-genai library, which is the new recommended approach for interfacing with Gemini Live. 2025-10-08 16:45:55 -04:00
Mark Backman
8293347b77 Merge pull request #2814 from pipecat-ai/mb/openai-service-tier
Add service_tier to BaseOpenAILLMService
2025-10-08 16:44:27 -04:00
Mark Backman
c85a3f0b94 Add service_tier to BaseOpenAILLMService 2025-10-08 16:33:36 -04:00
Aleix Conchillo Flaqué
233fb25e6c Merge pull request #2810 from pipecat-ai/aleix/on-pipeline-error
PipelineTask: add on_pipeline_error event
2025-10-08 11:26:46 -07:00
Aleix Conchillo Flaqué
080978daa6 Merge pull request #2808 from pipecat-ai/aleix/readme-pipecat-tv
README: add Pipecat TV reference
2025-10-08 11:26:17 -07:00
Aleix Conchillo Flaqué
62b7c3d3b2 PipelineTask: add on_pipeline_error event 2025-10-07 18:36:38 -07:00
Mark Backman
4b2379cba8 Merge pull request #2798 from ivaaan/hume-rtvi
Hume add RTVI
2025-10-07 21:20:50 -04:00
Aleix Conchillo Flaqué
92087bdfa8 update CHANGELOG 2025-10-07 18:08:18 -07:00
Aleix Conchillo Flaqué
617919ac09 Merge pull request #2809 from pipecat-ai/aleix/revert-interruption-strategies-ordering
revert interruption strategies ordering
2025-10-07 18:07:07 -07:00
Aleix Conchillo Flaqué
0669daec3d update CHANGELOG for 0.0.89 2025-10-07 17:44:10 -07:00
Aleix Conchillo Flaqué
7c15a8c800 Revert "fix context order when using interruption strategies"
This reverts commit de8ee96927.
2025-10-07 17:42:35 -07:00
Aleix Conchillo Flaqué
066b77fba0 README: add Pipecat TV reference 2025-10-07 15:01:28 -07:00
Aleix Conchillo Flaqué
d9aef5f916 some last release CHANGELOG updates 2025-10-07 14:29:27 -07:00
Aleix Conchillo Flaqué
91ae3f8a9b Merge pull request #2807 from pipecat-ai/aleix/pipecat-0.0.88
update CHANGELOG for 0.0.88
2025-10-07 14:16:05 -07:00
Aleix Conchillo Flaqué
36da623352 update CHANGELOG for 0.0.88 2025-10-07 14:12:12 -07:00
Filipi da Silva Fuchter
31b9087ea6 Merge pull request #2805 from pipecat-ai/filipi/allowing_update_smallwebrtc_properties
Allowing to update smallwebrtc and whatsapp properties.
2025-10-07 17:57:26 -03:00
Mark Backman
1851fed22e Merge pull request #2806 from pipecat-ai/mb/deprecate-play-ht
Deprecate PlayHT TTS services
2025-10-07 16:44:53 -04:00
Mark Backman
eddce460da Deprecate PlayHT TTS services 2025-10-07 16:40:01 -04:00
Filipi Fuchter
da4f30cb6d Allowing to update smallwebrtc and whatsapp properties. 2025-10-07 17:28:14 -03:00
Mark Backman
250cf2d8f1 Merge pull request #2803 from pipecat-ai/mb/fix-11labs-stt-deprecation
Remove deprecation warning for ElevenLabsSTTService
2025-10-07 13:04:12 -04:00
Mark Backman
7bbdb4f991 Remove deprecation warning for ElevenLabsSTTService 2025-10-07 12:32:32 -04:00
Mark Backman
051c4782fb Merge pull request #2802 from pipecat-ai/mb/fix-aws-nova-sonic
Fix AWS Nova Sonic authentication
2025-10-07 10:46:03 -04:00
Mark Backman
b1ccec74b2 Fix AWS Nova Sonic authentication 2025-10-07 09:48:18 -04:00
Filipi da Silva Fuchter
92bf0d9eda Merge pull request #2794 from pipecat-ai/filipi/verifying_whatsapp_signature
Verifying WhatsApp signature to ensure the webhook request is from WhatsApp.
2025-10-07 08:57:47 -03:00
Aleix Conchillo Flaqué
f985550441 Merge pull request #2796 from pipecat-ai/aleix/fix-interruption-strategies-context-order
fix context order when using interruption strategies
2025-10-06 22:46:31 -07:00
Aleix Conchillo Flaqué
de8ee96927 fix context order when using interruption strategies 2025-10-06 22:43:01 -07:00
Aleix Conchillo Flaqué
2576d0f340 Merge pull request #2792 from pipecat-ai/aleix/google-nano-banana
GoogleLLMService: added support for image generation
2025-10-06 22:42:14 -07:00
ivaaan
f38f4711ac wip 2025-10-06 20:24:27 -07:00
ivaaan
c2f3ddd329 add RTVI to Hume 2025-10-06 19:41:31 -07:00
ivaaan
73ffe96228 add RTVI to Hume 2025-10-06 19:37:05 -07:00
Aleix Conchillo Flaqué
bd13a80da7 pyproject: update google dependencies 2025-10-06 17:38:08 -07:00
Aleix Conchillo Flaqué
312959f97e GoogleLLMService: update default model to gemini-2.5-flash 2025-10-06 17:38:08 -07:00
Aleix Conchillo Flaqué
fe168e3c68 GoogleLLMService: added support for image generation 2025-10-06 17:38:08 -07:00
Filipi Fuchter
28929a47f7 Verifying WhatsApp signature to ensure the webhook request is from WhatsApp. 2025-10-06 16:16:59 -03:00
Mark Backman
03f5defbc3 Merge pull request #2793 from pipecat-ai/mb/fix-flux-deprecation
Fix: Resolve Flux deprecation warning
2025-10-06 12:07:27 -04:00
Mark Backman
b216648315 Fix: Resolve Flux deprecation warning 2025-10-06 09:55:02 -04:00
Mark Backman
084b133a01 Merge pull request #2790 from pipecat-ai/add-security-md
Add SECURITY.md
2025-10-06 09:45:02 -04:00
Mark Backman
e589876176 Merge pull request #2786 from pipecat-ai/mb/nltk-download-error
Catch PermissionError when NLTK data can't be downloaded
2025-10-06 09:27:22 -04:00
Vanessa Pyne
a826313bf9 Add SECURITY.md 2025-10-05 13:24:47 -05:00
Mark Backman
49f44aa7c8 Catch PermissionError when NLTK data can't be downloaded 2025-10-04 08:41:32 -04:00
Mark Backman
64ceef9cf0 Merge pull request #2783 from pipecat-ai/mb/community-integrations-submission
Update to Community Integrations submission process
2025-10-03 12:41:13 -04:00
Mark Backman
cd6567c1f1 Update to Community Integrations submission process 2025-10-03 12:15:48 -04:00
Mark Backman
ac67ca1555 Merge pull request #2778 from pipecat-ai/mb/hume-cleanup
Tidying up the Hume example and service
2025-10-03 11:09:18 -04:00
mattie ruth backman
8d38994756 Transports now send InputTransportMessageFrames (not Urgent Frames) 2025-10-03 09:47:44 -04:00
LucasStringPay
607e3040d4 Ignore None 'completion_tokens' or similar
Similar as 144ea36c81 , reported in https://github.com/pipecat-ai/pipecat/issues/2207
2025-10-02 15:16:11 -07:00
Mark Backman
60604a9449 Tidying up the Hume example and service 2025-10-02 17:34:40 -04:00
Aleix Conchillo Flaqué
4abe4a6253 Merge pull request #2777 from pipecat-ai/aleix/readme-mention-tail
README: add tail terminal dashboard
2025-10-02 14:31:26 -07:00
Aleix Conchillo Flaqué
4c054af17b README: remove setup editor instructions 2025-10-02 14:30:31 -07:00
Aleix Conchillo Flaqué
dcba940d42 README: add tail terminal dashboard 2025-10-02 14:27:55 -07:00
Mark Backman
ad2adb0c58 Merge pull request #2518 from zgreathouse/hume-tts-service
Hume tts service
2025-10-02 17:26:39 -04:00
ivaaan
76923010b5 upd Hume version to 2 2025-10-02 13:57:07 -07:00
ivaaan
1b511557b2 upd evals 2025-10-02 13:48:30 -07:00
ivaaan
fdadb12933 upd Changelog 2025-10-02 13:46:22 -07:00
ivaaan
f1bbb7ba22 Regenerate uv.lock after resolving merge conflicts 2025-10-02 13:44:07 -07:00
ivaaan
c1492c5275 fixes based on markbackman review 2025-10-02 13:38:36 -07:00
ivaaan
4ffdabcfde add Hume example, small fixes 2025-10-02 13:38:36 -07:00
zach
b489de2fc3 adds hume tts service 2025-10-02 13:38:05 -07:00
zach
d9656cbb1a add hume sdk for hume tts service 2025-10-02 13:38:05 -07:00
zach
05fb223985 Add hume to .env.example 2025-10-02 13:34:37 -07:00
Mark Backman
62a5f07ad2 Merge pull request #2701 from pipecat-ai/mb/third-party-integrations
Add a third-party integrations guide
2025-10-02 15:59:38 -04:00
Mark Backman
b669e3a481 Update name to Community Integrations and streamline guide 2025-10-02 15:54:04 -04:00
Mark Backman
99f1041a47 More review fixes 2025-10-02 14:48:12 -04:00
Mark Backman
37b1345bfa Changes from review feedback 2025-10-02 14:48:12 -04:00
Mark Backman
8994ac17eb Add a third-party integrations guide 2025-10-02 14:48:12 -04:00
Mark Backman
63bc825008 Merge pull request #2771 from pipecat-ai/mb/update-publish-workflows
Updates to publish workflows
2025-10-02 12:35:43 -04:00
Mark Backman
e7ffde1c4c Merge pull request #2774 from pipecat-ai/mb/docs-fixes-0.0.87
Fix: Resolve docstring build issues before 0.0.87 release
2025-10-02 12:34:27 -04:00
Mark Backman
1c88565725 Merge pull request #2772 from pipecat-ai/mb/fix-openai-realtime-import
Fix: Change import for OpenAIRealtimeLLMContext in OpenAIRealtimeLLMS…
2025-10-02 12:34:16 -04:00
Aleix Conchillo Flaqué
07a6c2fb0e Merge pull request #2775 from pipecat-ai/aleix/pipecat-0.0.87
update CHANGELOG for 0.0.87
2025-10-02 09:12:41 -07:00
Aleix Conchillo Flaqué
e99f3bf75a update CHANGELOG for 0.0.87 2025-10-02 09:11:30 -07:00
Mark Backman
f09d780413 Fix: Resolve docstring build issues before 0.0.87 release 2025-10-02 10:09:25 -04:00
Mark Backman
e370d23374 Fix: Change import for OpenAIRealtimeLLMContext in OpenAIRealtimeLLMService 2025-10-02 09:39:44 -04:00
Mark Backman
b68ec14146 Updates to publish workflows 2025-10-02 08:25:35 -04:00
Filipi da Silva Fuchter
c567fd71b1 Merge pull request #2747 from pipecat-ai/filipi/whatsapp_runner
Creating the whatsapp routes inside the runner.
2025-10-01 21:21:34 -03:00
Filipi da Silva Fuchter
2ca1b2d6f8 Merge pull request #2612 from pipecat-ai/filipi/deepgram_flux
Integrating the new Deepgram model (Flux) with Pipecat
2025-10-01 21:20:47 -03:00
Mark Backman
04041a9a9a Merge pull request #2757 from pipecat-ai/hush/retryTimeout
Fix AWS Bedrock timeout exception handling
2025-10-01 19:08:09 -04:00
Aleix Conchillo Flaqué
6c498dc70f Merge pull request #2745 from pipecat-ai/aleix/transport-message-frames-deprecations
transport message frames deprecations
2025-10-01 16:05:55 -07:00
James Hush
32b07c1720 Fix AWS Bedrock timeout exception handling
- Use ReadTimeoutError and asyncio.TimeoutError which are the actual exceptions thrown by boto3
2025-10-01 19:04:35 -04:00
Aleix Conchillo Flaqué
ad507ce23d FrameLogger: it's fine to print transport messages 2025-10-01 16:00:42 -07:00
Aleix Conchillo Flaqué
be562cedfc DailyTransport: deprecate DailyTransportMessage(Urgent)Frame 2025-10-01 16:00:42 -07:00
Aleix Conchillo Flaqué
089e703e1f LiveKitTransport: deprecate LiveKitTransportMessage(Urgent)Frame 2025-10-01 16:00:42 -07:00
Aleix Conchillo Flaqué
4dc1e15a99 frames: use OutputTransportMessage(Urgent)Frame instead of TransportMessage(Urgent)Frame 2025-10-01 16:00:42 -07:00
Aleix Conchillo Flaqué
c7dc2e886f frames: use InputTransportMessageFrame instead of InputTransportMessageUrgentFrame
By default, input frames are already urgent.
2025-10-01 15:30:45 -07:00
Filipi Fuchter
11bc4ea854 Adding deepgram flux to release evals. 2025-10-01 19:24:58 -03:00
Mark Backman
029d76033d Merge pull request #2765 from pipecat-ai/mb/remove-daily-logging-04a
Remove DailyLogLevel from 04a example
2025-10-01 17:52:33 -04:00
Aleix Conchillo Flaqué
924d7dea9a Merge pull request #2766 from pipecat-ai/aleix/rtvi-properly-deprecate-errors-enabled
RTVIParams: properly deprecate errors_enabled
2025-10-01 14:49:12 -07:00
Aleix Conchillo Flaqué
244e94f3ce RTVIParams: properly deprecate errors_enabled 2025-10-01 14:30:41 -07:00
Mark Backman
af1f51d49e Remove DailyLogLevel from 04a example 2025-10-01 17:06:35 -04:00
Filipi da Silva Fuchter
9ba3c168b8 Merge pull request #2756 from pipecat-ai/filipi/esp32
SDP munging fixes.
2025-10-01 16:05:47 -03:00
Filipi Fuchter
e6ee8f7a16 New example using DeepgramFluxSTTService. 2025-10-01 15:43:25 -03:00
Filipi Fuchter
2ea2bd99e0 Deepgram Flux speech-to-text service implementation. 2025-10-01 15:43:09 -03:00
Filipi Fuchter
0c2ced7c52 Created WebsocketSTTService base class. 2025-10-01 15:42:56 -03:00
Filipi Fuchter
fb160646b8 Fixing the SDP munging to keep it working on Chrome. 2025-10-01 14:18:39 -03:00
Filipi da Silva Fuchter
89fed57af2 Merge pull request #2748 from pipecat-ai/filipi/remove_smallwebrtc_queue
Removing the message queue inside the SmallWebRTCConnection.
2025-10-01 08:07:47 -03:00
Aleix Conchillo Flaqué
feae3b6d2d Merge pull request #2742 from pipecat-ai/aleix/deprecate-daily-update-remote-participants-frame
DailyTransport: deprecated DailyUpdateRemoteParticipantsFrame
2025-09-30 16:27:34 -07:00
Aleix Conchillo Flaqué
92d3be8975 DailyTransport: deprecated DailyUpdateRemoteParticipantsFrame 2025-09-30 16:26:48 -07:00
Aleix Conchillo Flaqué
0f53e1db2c Merge pull request #2759 from pipecat-ai/aleix/dont-cancel-if-finished
PipelineTask: avoid cancellation if application is finished
2025-09-30 16:21:16 -07:00
Aleix Conchillo Flaqué
d398e8cc10 Merge pull request #2761 from pipecat-ai/aleix/rtvi-tail-updates
RTVI updates: audio levels and system logs
2025-09-30 13:55:17 -07:00
Aleix Conchillo Flaqué
e5f263d380 update CHANGELOG 2025-09-30 13:51:35 -07:00
Aleix Conchillo Flaqué
3a4c303c54 RTVIParams: add errors_enabled deprecation warnings 2025-09-30 13:49:51 -07:00
Mark Backman
54a1ef47d0 Merge pull request #2758 from pipecat-ai/mb/claude-sonnet-4.5
Update AnthropicLLMService to use claude-sonnet-4-5-20250929
2025-09-30 16:42:47 -04:00
Aleix Conchillo Flaqué
149ffa4f3c RTVIObserver: add support system logs 2025-09-30 13:42:40 -07:00
Aleix Conchillo Flaqué
e5465034d9 RTVIObserver: add support for user/bot audio levels 2025-09-30 13:41:26 -07:00
Aleix Conchillo Flaqué
568c7c782d rtvi: allow None RTVIProcessor and rename to send_rtvi_message() 2025-09-30 13:35:27 -07:00
Aleix Conchillo Flaqué
9851334221 rtvi: deprecate errors_enabled and always send errors 2025-09-30 13:31:30 -07:00
Aleix Conchillo Flaqué
e79c4fc99d PipelineTask: avoid cancellation if application is finished 2025-09-30 13:18:25 -07:00
Aleix Conchillo Flaqué
55c321f4ff Merge pull request #2751 from pipecat-ai/aleix/nova-sonic-disconnect-fix
AWSNovaSonicLLMService: add missing await
2025-09-30 13:12:22 -07:00
kompfner
a14a53a005 Merge pull request #2735 from pipecat-ai/pk/remove-openaillmcontext-usage
Remove remaining usage of `OpenAILLMContext` throughout the codebase …
2025-09-30 10:09:25 -04:00
Mark Backman
a71f937e8f Update AnthropicLLMService to use claude-sonnet-4-5-20250929 2025-09-30 08:49:30 -04:00
Filipi Fuchter
032032df65 Only remove ESP32 ICE candidates if host is defined. 2025-09-29 15:42:23 -03:00
Mark Backman
d0178edad0 Merge pull request #2753 from pipecat-ai/mb/quickstart-0.0.86
Quickstart: Update to 0.0.86, removing pytorch requirements
2025-09-29 09:43:33 -04:00
Mark Backman
795c5e55d9 Quickstart: Update to 0.0.86, removing pytorch requirements 2025-09-27 08:30:37 -04:00
Aleix Conchillo Flaqué
8f8d8ae0d8 AWSNovaSonicLLMService: add missing await 2025-09-26 15:58:05 -07:00
Vanessa Pyne
741f192d04 Merge pull request #2096 from pipecat-ai/vp-mcp-ex-nit
mcp examples: check for env vars needed for examples
2025-09-26 10:21:22 -05:00
Filipi Fuchter
a5595b82ea removing the message queue inside the SmallWebRTCConnection. 2025-09-26 11:02:17 -03:00
Filipi Fuchter
4d1915eb41 Fixing ruff format. 2025-09-26 10:49:52 -03:00
Filipi Fuchter
b3a84fc772 Refactoring how we are handling the lifespan inside the runner. 2025-09-26 10:47:04 -03:00
Filipi Fuchter
403d22e62c Creating the whatsapp routes inside the runner. 2025-09-26 10:28:19 -03:00
Aleix Conchillo Flaqué
ee00ee5c57 Merge pull request #2744 from pipecat-ai/aleix/vad-analyzer-thread-executor
BaseInputTransport: create VAD thread in VADAnalyzer
2025-09-25 13:43:34 -07:00
Aleix Conchillo Flaqué
f53fd880dc BaseInputTransport: create VAD thread in VADAnalyzer
We move the thread creation to the VADAnalyzer instead of the input
transport. This can potentially be useful if we need to analyze multiple audio
streams.
2025-09-25 13:41:20 -07:00
Aleix Conchillo Flaqué
de3461e4cc Merge pull request #2743 from pipecat-ai/aleix/turn-analyzer-fixes
turn analyzer fixes
2025-09-25 13:40:43 -07:00
Aleix Conchillo Flaqué
7bafc3a1bb BaseSmartTurn: process speech in a separate thread 2025-09-25 13:37:28 -07:00
Aleix Conchillo Flaqué
22ef61fe8d BaseTurnAnalyzer: add BaseTurnParams base class for parameters 2025-09-25 13:37:09 -07:00
Aleix Conchillo Flaqué
7078fb53bd Merge pull request #2738 from pipecat-ai/aleix/openai-cached-tokens-metrics
BaseOpenAILLMService: include cached tokens to metrics frame
2025-09-25 13:36:03 -07:00
Aleix Conchillo Flaqué
33447ad6f2 BaseOpenAILLMService: include cached tokens to metrics frame 2025-09-24 19:32:16 -07:00
Paul Kompfner
6faa50ae5b Remove remaining usage of OpenAILLMContext throughout the codebase in favor of LLMContext, except for:
- Usage in classes that are already deprecated
- Usage related to realtime LLMs, which don't yet support `LLMContext`
- Usage in (soon-to-be-deprecated) code paths related to `OpenAILLMContext` itself and associated machinery
2025-09-24 16:35:03 -04:00
Aleix Conchillo Flaqué
3797f41c8c Merge pull request #2734 from pipecat-ai/aleix/pipecat-0.0.86
update CHANGELOG for 0.0.86
2025-09-24 12:19:16 -07:00
Aleix Conchillo Flaqué
ff919b8c15 update CHANGELOG for 0.0.86 2025-09-24 11:28:14 -07:00
Aleix Conchillo Flaqué
cb048d6c7e Merge pull request #2733 from pipecat-ai/aleix/tavus-missing-daily-callback
TavusTransport: add missing on_before_leave callback
2025-09-24 11:10:32 -07:00
Aleix Conchillo Flaqué
6c2c43ade0 Merge pull request #2724 from pipecat-ai/pk/update-natural-conversation-examples-with-universal-context
Update natural conversation examples with universal context
2025-09-24 11:07:50 -07:00
Aleix Conchillo Flaqué
f899c15b03 Merge pull request #2731 from pipecat-ai/pk/update-example-25-to-use-universal-context
Update example 25 to use universal `LLMContext`
2025-09-24 11:05:36 -07:00
Aleix Conchillo Flaqué
d10ef08775 Merge pull request #2727 from pipecat-ai/pk/strands-agents-needs-to-support-openaillmcontextframe-for-now
`StrandsAgentsProcessor` should still support `OpenAILLMContextFrame`…
2025-09-24 11:05:07 -07:00
Aleix Conchillo Flaqué
27a5af6fa1 Merge pull request #2728 from pipecat-ai/pk/fix-playht-in-env-example
Fix PlayHT env variable names in env.example
2025-09-24 11:04:55 -07:00
Aleix Conchillo Flaqué
4bff0a7c49 Merge pull request #2732 from pipecat-ai/pk/update-voicemail-detector-to-use-llm-context
Update `VoicemailDetector` to use universal `LLMContext`
2025-09-24 11:04:42 -07:00
Aleix Conchillo Flaqué
508f7d203d Merge pull request #2729 from pipecat-ai/aleix/frame-processor-cancel-default-timeout
FrameProcessor: timeout when cancelling tasks
2025-09-24 10:55:52 -07:00
Filipi da Silva Fuchter
0f87d5342c Merge pull request #2266 from pipecat-ai/filipi/hey_gen_transport
HeyGen implementation for Pipecat - HeyGenTransport
2025-09-24 14:35:50 -03:00
Filipi Fuchter
f6164e3bde HeyGen implementation for Pipecat - HeyGenTransport. 2025-09-24 14:33:56 -03:00
Aleix Conchillo Flaqué
1a0fb55d0f TavusTransport: add missing on_before_leave callback 2025-09-24 10:18:56 -07:00
Paul Kompfner
6d0beef944 Update VoicemailDetector to use universal LLMContext 2025-09-24 12:58:42 -04:00
Paul Kompfner
b9fd6b873b Update comment in example 07b to reference LLMContext rather than OpenAILLMContext 2025-09-24 12:49:34 -04:00
Paul Kompfner
dea0f1791f Update OpenAILLMAdapter.get_messages_for_logging() to truncate "input_audio" message data 2025-09-24 12:41:41 -04:00
Paul Kompfner
da66c38795 Update example 25 to use universal LLMContext 2025-09-24 12:37:29 -04:00
Paul Kompfner
912f8b96f0 Fix PlayHT env variable names in env.example 2025-09-24 11:24:35 -04:00
Aleix Conchillo Flaqué
f9eb447d82 FrameProcessor: timeout when cancelling tasks 2025-09-24 08:24:28 -07:00
Paul Kompfner
65f5fe8588 StrandsAgentsProcessor should still support OpenAILLMContextFrame until that frame has been deprecated 2025-09-24 11:05:27 -04:00
Paul Kompfner
817c77f3fe Update SmallWebRTCTransport to pass a sender ID in the "on_app_message" event 2025-09-24 10:24:59 -04:00
Paul Kompfner
8896179b00 Update another "natural conversation" example to use universal LLMContext. Note that this one had to also be fixed in various ways, as it wasn't working. 2025-09-24 09:55:11 -04:00
Filipi da Silva Fuchter
463752360b Merge pull request #2726 from pipecat-ai/mb/twilio-serializer-cleanup
Fixup for TwilioFrameSerializer
2025-09-24 10:45:38 -03:00
Paul Kompfner
66b7977a62 Make SmallWebRTCTransport adhere to the expected "on_app_message" event signature 2025-09-24 09:34:28 -04:00
Mark Backman
468de68aec Fixup for TwilioFrameSerializer 2025-09-24 09:32:46 -04:00
Mark Backman
c4762c1a92 Merge pull request #2627 from jessieweiyi/main
Support TwilioFrameSerializer region/edge settings. Close #2625.
2025-09-24 09:13:10 -04:00
Aleix Conchillo Flaqué
7f4d3a2f02 pyproject: updated sentry to 2.38.0 2025-09-23 19:12:03 -07:00
Jessie Wei
88614b312f Merge branch 'pipecat-ai:main' into main 2025-09-24 10:23:52 +10:00
Jessie Wei
5b4655f45a chore: Update per review comments 2025-09-24 00:22:56 +00:00
Aleix Conchillo Flaqué
d7c8f8df53 update CHANGELOG with AudioBufferProcessor fixes 2025-09-23 15:42:01 -07:00
Aleix Conchillo Flaqué
2571cb2e69 tests: fix formatting 2025-09-23 15:28:07 -07:00
Aleix Conchillo Flaqué
15782be27c Merge pull request #2676 from golbin/main
Fix audio buffer flush and silence handling
2025-09-23 15:27:31 -07:00
Aleix Conchillo Flaqué
997e4b66c6 Merge pull request #2722 from pipecat-ai/aleix/strands-agents-update-and-evals
examples: update Strands Agents with universal context and add evals
2025-09-23 15:21:41 -07:00
Paul Kompfner
6ccbfd9b57 Update "natural conversation" examples to use universal LLMContext 2025-09-23 16:20:16 -04:00
Paul Kompfner
677f69971c Add filters in 22b and 22c examples to prevent function call results triggering the "statement judge" LLM from running unnecessarily, and with the wrong system prompt, which would result in garbled output statements comprised of both LLMs outputs combined 2025-09-23 16:14:58 -04:00
Paul Kompfner
678dd22b8e Add missing sender argument to a few "on_app_message" handlers in examples 2025-09-23 15:29:20 -04:00
kompfner
0bba02028d Merge pull request #2721 from pipecat-ai/pk/update-persistent-storage-examples-to-use-universal-llmcontext
Update persistent conversation storage examples to use universal `LLM…
2025-09-23 15:18:21 -04:00
Aleix Conchillo Flaqué
620b1f785c examples: update Strands Agents with universal context and add evals 2025-09-23 11:37:57 -07:00
Paul Kompfner
780e91eb91 Update persistent conversation storage examples to use universal LLMContext.
Note that `LLMContext` doesn't have a `get_messages_for_persistent_storage()`; the messages are already in the "standard" format so they can be used directly for storage.
2025-09-23 14:16:35 -04:00
Aleix Conchillo Flaqué
667569ef47 Merge pull request #2708 from pipecat-ai/aleix/base-output-transport-only-push-if-send-successful
BaseOutputTransport: only push downstream if transport write successful
2025-09-23 11:10:29 -07:00
Aleix Conchillo Flaqué
17ea0afa6f StrandsAgentsProcessor: more formatting fixes 2025-09-23 11:05:14 -07:00
Aleix Conchillo Flaqué
3fc5214c15 BaseOutputTransport: only push downstream if transport write successful
Fixes #2589
2025-09-23 11:04:37 -07:00
kompfner
1636c48ab9 Merge pull request #2720 from pipecat-ai/pk/update-changelog-with-additional-llmcontext-support
Update CHANGELOG with additional `LLMContext` support
2025-09-23 13:46:20 -04:00
Paul Kompfner
c3a2fa100c Update CHANGELOG with additional LLMContext support 2025-09-23 13:45:49 -04:00
kompfner
8649368337 Merge pull request #2719 from pipecat-ai/pk/update-more-examples-to-use-universal-llmcontext
Update more examples to use universal `LLMContext`. Specifically, upd…
2025-09-23 13:44:20 -04:00
Aleix Conchillo Flaqué
781366627c updated CHANGELOG with Strands Agents 2025-09-23 10:41:35 -07:00
Aleix Conchillo Flaqué
f6b4db42ef pyproject: add strands-agents 2025-09-23 10:39:06 -07:00
Aleix Conchillo Flaqué
ed64716219 StrandsAgentsProcessor: fix formatting 2025-09-23 10:38:31 -07:00
Aleix Conchillo Flaqué
5c22b2e1de Merge pull request #2610 from adithyaxx/add-strands-processor
Add native Strands Agents support to Pipecat
2025-09-23 10:33:22 -07:00
Paul Kompfner
d4b1e1ab41 Update more examples to use universal LLMContext. Specifically, update examples we didn't update before because they weren't using ToolsSchema for their tool definitions, which is a requirement for using LLMContext.
NOTE: oops! Turns out some of these files had *already* been updated to use universal `LLMContext` even though they weren't yet using `ToolsSchema`. This commit should fix those examples.
2025-09-23 12:41:35 -04:00
Mark Backman
fafe0cc4a3 Merge pull request #2718 from lshaun/fix-openai-deprecation-warning
update imports to avoid deprecated module
2025-09-23 12:29:16 -04:00
Mark Backman
40c82a8530 Merge pull request #2716 from pipecat-ai/mb/add-11labs-stt
Add ElevenLabsSTTService
2025-09-23 12:21:08 -04:00
lshaun
98d3686861 update imports to avoid deprecated module 2025-09-23 15:58:09 +00:00
kompfner
88337fc21f Merge pull request #2717 from pipecat-ai/pk/mem0-support-univeral-context
Add support for universal `LLMContext` to `Mem0MemoryService`
2025-09-23 11:54:55 -04:00
kompfner
928c0ef1b4 Merge pull request #2715 from pipecat-ai/pk/langchain-processor-support-univeral-context
Add support for universal `LLMContext` to `LangchainProcessor`
2025-09-23 11:54:44 -04:00
kompfner
1f005e7075 Merge pull request #2714 from pipecat-ai/pk/gated-openai-llm-context-aggregator-support-univeral-context
Add support for universal `LLMContext` to `GatedOpenAILLMContextAggre…
2025-09-23 11:54:26 -04:00
kompfner
2cee6229ae Merge pull request #2713 from pipecat-ai/pk/log-observer-support-univeral-context
Add support for universal `LLMContext` to `LLMLogObserver`
2025-09-23 11:54:11 -04:00
Aleix Conchillo Flaqué
10e9371f49 Merge pull request #2712 from pipecat-ai/aleix/pipeline-runner-signals-windows
PipelineRunner: use signal.signal() on Windows
2025-09-23 08:28:26 -07:00
Paul Kompfner
e21ab89509 Add support for universal LLMContext to Mem0MemoryService 2025-09-23 10:35:54 -04:00
Mark Backman
cbce2075eb Add ElevenLabsSTTService 2025-09-23 10:27:31 -04:00
Paul Kompfner
97868175e6 Add support for universal LLMContext to LangchainProcessor 2025-09-23 10:00:48 -04:00
Paul Kompfner
99731ca40a Add support for universal LLMContext to GatedOpenAILLMContextAggregator, renaming it to GatedLLMContextAggregator in the process 2025-09-23 09:52:13 -04:00
Paul Kompfner
f96cbcce22 Add support for universal LLMContext to LLMLogObserver 2025-09-23 09:30:14 -04:00
kompfner
a19b9f70c0 Merge pull request #2706 from pipecat-ai/pk/update-examples-to-use-universal-llm-context
Update examples, wherever possible, to use `LLMContext` and associate…
2025-09-23 09:20:39 -04:00
Filipi da Silva Fuchter
9b4f1bdf39 Merge pull request #2705 from tzookb/tzookb/latency-logging
update UserBotLatencyLogObserver to have logging in functions that can be overidden
2025-09-23 09:46:20 -03:00
Filipi Fuchter
6b2bf8de64 Fixing the ruff format and making the methods sync. 2025-09-23 09:41:50 -03:00
Filipi da Silva Fuchter
33481c6614 Merge pull request #2672 from pipecat-ai/filipi/pcc_small_webrtc_2
Monitoring the peer connection while it is in the *connecting* state.
2025-09-23 08:32:16 -03:00
Filipi Fuchter
a5776b20ad Monitoring the peer connection while it is in the *connecting* state. 2025-09-23 08:30:17 -03:00
Filipi da Silva Fuchter
e286e015cf Merge pull request #2687 from pipecat-ai/memory_leak
Improving memory cleanup
2025-09-23 08:13:05 -03:00
Filipi Fuchter
a7bfac8d68 Mentioning the memory cleanups in the changelog. 2025-09-23 08:09:31 -03:00
Filipi Fuchter
1647b5b665 Created a new example using the video processor to make it easier to investigate memory leaks. 2025-09-23 08:05:28 -03:00
Filipi Fuchter
eaecefe675 Refactoring how we are reading the image bytes inside the base llm. 2025-09-23 08:05:15 -03:00
Filipi Fuchter
7c569b3863 Calling task_done when reading the audio from the queue. 2025-09-23 08:04:06 -03:00
Filipi Fuchter
8bf6a4c66f Improving memory cleanup inside the SmallWebRTCTransport. 2025-09-23 08:03:55 -03:00
Filipi Fuchter
1df3660186 Not storing anymore the last frames received to display them in the idle processor. 2025-09-23 08:03:35 -03:00
Aleix Conchillo Flaqué
75c0b089e0 PipelineRunner: use signal.signal() on Windows 2025-09-22 22:38:12 -07:00
Aleix Conchillo Flaqué
d8f3d4dd32 Merge pull request #1874 from nischalj10/patch-1
added deepwiki badge for weekly repo refresh
2025-09-22 17:44:08 -07:00
Aleix Conchillo Flaqué
c5e53bb84f Merge pull request #2707 from pipecat-ai/aleix/daily-transport-deprecated-mistake
DailyTransport: remove deprecated note and double registration
2025-09-22 17:01:09 -07:00
Aleix Conchillo Flaqué
b04e494373 DailyTransport: remove deprecated note and double registration 2025-09-22 16:45:58 -07:00
Jessie Wei
392293d55f Merge branch 'pipecat-ai:main' into main 2025-09-23 07:48:45 +10:00
Paul Kompfner
272532a3ea Update examples, wherever possible, to use LLMContext and associated machinery instead of OpenAILLMContext and associated machinery.
With all these examples updated, we no longer need dedicated examples illustrating `LLMContext`, so they're removed.

Here’s where we *don’t* yet use `LLMContext` and associated machinery:
- Realtime services: OpenAI Realtime, Gemini Live, and AWS Nova Sonic (support coming soon)
- `GoogleLLMOpenAIBetaService` (it’s deprecated, so we didn’t bother adding support)
- `LLMLogObserver` (support coming soon)
- `GatedOpenAILLMContextAggregator` (support coming soon)
- `LangchainProcessor` (support coming soon)
- `Mem0MemoryService` (support coming soon)
- Examples that use LLM-specific tools definitions as opposed to `ToolsSchema` (these will be updated soon)
- Examples that rely `GoogleLLMContext.upgrade_to_google` (TBD what to do with these)

Examples that use `LLMLogObserver`:
- 30-

Examples that use `GatedOpenAILLMContextAggregator`:
- 22-

Examples that use `LangchainProcessor`:
- 07b-

Examples that use `Mem0MemoryService`:
- 37-

Examples that need updating to use `ToolsSchema`:
- 15-
- 15a-
- 20a-
- 20c-
- 20d-
- 22b-
- 22c-
- 33-
- 36-

Examples that use `GoogleLLMContext.upgrade_to_google`:
- 22d-
- 25-
2025-09-22 16:21:35 -04:00
Tzook Bar Noy
3d04f565ec update latency observer logging to be in unique funcs, so others could extend and overwrite 2025-09-22 15:38:29 -04:00
Filipi da Silva Fuchter
d0477edb6a Merge pull request #2696 from pipecat-ai/filipi/inworld_default_temperature
Changing InworldTTSService default temperature to 1.1
2025-09-22 09:33:52 -03:00
Filipi Fuchter
326bfe4239 Removing the temperature from InworldTTSService example. 2025-09-22 09:30:53 -03:00
Aleix Conchillo Flaqué
3cb78d839d examples(foundational): update comment in 45-before-and-afet-events.py 2025-09-20 10:33:52 -07:00
Aleix Conchillo Flaqué
9129e44c05 Merge pull request #2697 from pipecat-ai/aleix/frame-processor-before-after-events
FrameProcessor: add before/after events for processed/pushed frames
2025-09-20 10:26:37 -07:00
Aleix Conchillo Flaqué
ec664e2d33 examples(foundational): added 45-before-and-afet-events.py 2025-09-20 10:23:49 -07:00
Aleix Conchillo Flaqué
3d88b42e0b FrameProcessor: add before/after events for processed/pushed frames 2025-09-19 20:47:21 -07:00
Aleix Conchillo Flaqué
2289409b4c Merge pull request #2699 from pipecat-ai/aleix/daily-on-before-leave
DailyTransport: rename on_before_leave to on_before_disconnect
2025-09-19 20:38:46 -07:00
Aleix Conchillo Flaqué
b1551b0d6b DailyTransport: rename on_before_leave to on_before_disconnect 2025-09-19 19:21:31 -07:00
Mark Backman
7cb5c951f4 Merge pull request #2689 from pipecat-ai/mb/foundational-smart-turn
Update quickstart and foundational examples to use smart-turn v3
2025-09-19 15:39:14 -07:00
mattie ruth backman
a2cb5ab8e1 Add Changelong entry 2025-09-19 17:46:29 -04:00
mattie ruth backman
cad3104d56 Fix _handle_send_text to use default options if not provided 2025-09-19 17:46:29 -04:00
mattie ruth backman
d8a2a917a2 Deprecate append-to-text handling in favor of new send-text with support for toggling skip-tts while handling the text 2025-09-19 17:46:29 -04:00
chadbailey59
3984cb58a2 cleared input and process events when pausing (#2698) 2025-09-19 16:44:43 -05:00
chadbailey59
9027a96a07 Add remote participant updates to DailyTransport (#2694)
* add remote participant updates to DailyTransport

* cleanup

* cleanup

* ruff cleanup again
2025-09-19 16:03:34 -05:00
Aleix Conchillo Flaqué
6abac3e3e5 Merge pull request #2673 from pipecat-ai/aleix/sync-event-handlers
introduce synchronous event handlers
2025-09-19 13:40:43 -07:00
Aleix Conchillo Flaqué
077b949bb2 BaseObject: run each handler for the same event in a separate task 2025-09-19 13:32:39 -07:00
Aleix Conchillo Flaqué
3a6c9786e8 LiveKitTransport: added synchronous before_disconnect event 2025-09-19 13:32:39 -07:00
Aleix Conchillo Flaqué
492da16cc9 DailyTransport: added synchronous on_before_disconnect event 2025-09-19 13:32:39 -07:00
Aleix Conchillo Flaqué
a698c4064b BaseObject: allow synchronous event handlers 2025-09-19 13:32:39 -07:00
Filipi Fuchter
9e098b5f79 Changing InworldTTSService default temperature to 1.1 2025-09-19 17:03:53 -03:00
Mark Backman
7df7395dd1 Merge pull request #2692 from pipecat-ai/mb/lazy-load-smallwebrtc-request
Lazy load SmallWebRTCRequest, SmallWebRTCRequestHandler in runner
2025-09-19 10:43:43 -07:00
Mark Backman
0885bc9cdf Lazy load SmallWebRTCRequest, SmallWebRTCRequestHandler in runner 2025-09-19 13:28:01 -04:00
vipyne
889dc19a27 mcp examples: check for env vars needed for examples 2025-09-19 12:09:50 -05:00
Mark Backman
4bc41466b7 Bump quickstart pipecat-ai version to require smart-turn v3, add local smart-turn dep, update quickstart 2025-09-19 08:03:06 -04:00
Mark Backman
9ab8ddee79 Update quickstart and foundational examples to use smart-turn v3 2025-09-18 23:54:18 -04:00
Aleix Conchillo Flaqué
0204f6a95d Merge pull request #2686 from pipecat-ai/aleix/silero-vad-v6
audio(vad): update Silero VAD model to v6
2025-09-18 20:31:10 -07:00
Mark Backman
b0bf653f04 Merge pull request #2679 from pipecat-ai/mb/gladia-remove-confidence
GladiaSTTService: deprecate confidence arg
2025-09-18 17:41:33 -07:00
Mark Backman
e8a676eb36 GladiaSTTService: deprecate confidence arg 2025-09-18 20:38:53 -04:00
Mark Backman
ca96eef1f3 Merge pull request #2680 from pipecat-ai/mb/dial-in-session-id
DailyTransport sip_call_transfer now automatically receives session_id
2025-09-18 17:36:51 -07:00
Mark Backman
8e1637d6c7 DailyTransport sip_call_transfer now automatically receives session_id 2025-09-18 20:34:14 -04:00
Filipi da Silva Fuchter
367200c0ad Merge pull request #2682 from pipecat-ai/filipi/smallwebrtc_leak
Smallwebrtc memory leak
2025-09-18 18:56:08 -03:00
Filipi Fuchter
766e1948a6 Mentioning the fix in the changelog. 2025-09-18 18:43:33 -03:00
Aleix Conchillo Flaqué
f369683b8b audio(vad): update Silero VAD model to v6 2025-09-18 14:06:37 -07:00
Aleix Conchillo Flaqué
461025d1cc Merge pull request #2684 from pipecat-ai/aleix/readme-whisker
README: add whisker debugger
2025-09-18 13:27:35 -07:00
Aleix Conchillo Flaqué
ac88706f38 README: add whisker debugger 2025-09-18 13:22:54 -07:00
Filipi Fuchter
93a89449b8 Adding warnings in case queue grows. 2025-09-18 16:43:57 -03:00
Filipi Fuchter
199bf72945 Preventing memory growth if we are not consuming the track. 2025-09-18 16:16:10 -03:00
Filipi Fuchter
d20e4125f6 Updating aiortc to the latest version. 2025-09-18 15:22:46 -03:00
Filipi Fuchter
c1baed642e Script to monitor memory usage. 2025-09-18 14:43:42 -03:00
Mark Backman
33ef68573f Merge pull request #2662 from pelguetat/fix-vertex-ai-global-location-support
feat: add support for global location in Vertex AI base URL
2025-09-18 10:25:10 -07:00
Pablo Elgueta
3c1b41df13 docs: add changelog entry for global location support
- Document the new global location support in GoogleVertexLLMService
- Explain the difference between regional and global API hosts
- Follow Keep a Changelog format
2025-09-18 17:39:03 +01:00
Jin Kim
58f70e7e0d Add tests for audio buffer processor flush alignment 2025-09-18 22:19:32 +09:00
kompfner
fca4ecc73c Merge pull request #2675 from pipecat-ai/pk/service-switcher-logic-simplification
Simplify a bit of logic in `ServiceSwitcher`
2025-09-18 09:17:22 -04:00
Jin Kim
d0b573e44f Fix audio buffer flush and silence handling 2025-09-18 19:40:45 +09:00
Paul Kompfner
cfa333508b Simplify a bit of logic in ServiceSwitcher 2025-09-17 21:03:38 -04:00
Mark Backman
9e7260393a Merge pull request #2671 from pipecat-ai/mb/fix-asyncai-ttstextframe
fix: AsyncAITTSService wasn't pushing TTSTextFrames
2025-09-17 14:06:41 -07:00
Mark Backman
073b585c52 fix: AsyncAITTSService wasn't pushing TTSTextFrames 2025-09-17 16:54:18 -04:00
Aleix Conchillo Flaqué
81c2e51bec Merge pull request #2669 from pipecat-ai/aleix/interruption-task-frame-wait-fixes
interruption task frame wait fixes
2025-09-17 13:47:57 -07:00
Aleix Conchillo Flaqué
42344125b1 tests: add unit tests for push_interruption_task_frame_and_wait() 2025-09-17 13:38:22 -07:00
Aleix Conchillo Flaqué
db5bcfaa51 FrameProcessor: fix push_interruption_task_frame_and_wait() 2025-09-17 13:38:21 -07:00
kompfner
615239b7d2 Merge pull request #2646 from pipecat-ai/pk/service-switcher-unit-tests
`ServiceSwitcher` unit tests (ended up being much more than that)
2025-09-17 16:30:18 -04:00
Paul Kompfner
27f1e9dd69 Update CHANGELOG with a description of the recently-fixed ServiceSwitcher bugs 2025-09-17 16:27:12 -04:00
Paul Kompfner
bd760deff2 Update comment with more detail for posterity 2025-09-17 16:19:31 -04:00
Paul Kompfner
8bc3c89140 Fix a bug preventing usage of multiple ServiceSwitchers in a pipeline 2025-09-17 16:09:18 -04:00
Paul Kompfner
2cd2567a37 Add a unit tests validating that multiple ServiceSwitchers can be used in the same pipeline (currently failing) 2025-09-17 16:04:30 -04:00
Paul Kompfner
5b55988846 Denote a couple of variables are private with a leading underscore 2025-09-17 15:38:28 -04:00
Paul Kompfner
a12392182c Simplify, undoing the change allowing controlling ServiceSwitcher with immediate frames (SystemFrames). Service switcher frames are ControlFrames, which are easier to reason about. We can always build the immediate option later if needed (i.e. if there's sufficient user pull for it) 2025-09-17 15:35:02 -04:00
Paul Kompfner
b814b70e1e Allow controlling ServiceSwitcher with either immediate frames (SystemFrames) or in-order frames (ControlFrames).
Immediate is the "default", i.e. has the more obvious name (e.g. `ManuallySwitchServiceFrame` v `ManuallySwitchServiceControlFrame`), since that's *probably* what users will want to reach for. Also, the immediate frames are more likely to behave like what we had before the last few commits, where the service switch would always "jump the queue" by having an immediate effect once it hit the `ServiceSwitcher` in the pipeline, jumping ahead of frames in front of it destined for the service.
2025-09-17 15:35:02 -04:00
Paul Kompfner
a1f84e1b50 Remove extraneous unit tests 2025-09-17 15:35:02 -04:00
Paul Kompfner
0839b48da8 Fix an issue where the upstream ServiceSwitcherFilter wouldn't get updated with the current active service 2025-09-17 15:35:02 -04:00
Paul Kompfner
de51637b77 Update ServiceSwitcher so that ServiceSwitcherFrames (which might update the currently active service) are processed and have an effect at the expected time. We should be able to, for example, queue:
- A text frame
- A `ManuallySwitchServiceFrame` (which is a `ServiceSwitcherFrame`)
- Another text frame

And expect that the first text frame be handled by the initially active service and the second text frame be handled by the newly active one.

Previously, the `ManuallySwitchServiceFrame` would have an effect too early, causing both text frames to be handled by the newly active service. Why? Because the frame filtering condition was being updated *directly* by the `ServiceSwitcher`, which is upstream from the services it's switching between. It could therefore update the filters *before* the services received the prior frames.
2025-09-17 15:35:02 -04:00
Paul Kompfner
e1b1dc16ec Add unit tests for ServiceSwitcher 2025-09-17 15:35:02 -04:00
Mark Backman
1fe27eb0a2 Merge pull request #2660 from pipecat-ai/mb/fix-user-idle-processor-cancel-task
fix: clean up how UserIdleProcessor handles return False
2025-09-16 14:48:59 -07:00
Mark Backman
d7e1389497 fix: clean up how UserIdleProcessor handles return False 2025-09-16 17:44:06 -04:00
Aleix Conchillo Flaqué
8c7230aa8f Merge pull request #2668 from pipecat-ai/aleix/livekit-update
livekit package update
2025-09-16 14:43:18 -07:00
Aleix Conchillo Flaqué
2cf71239b0 examples(01b): use TTSSpeakFrame instead of TextFrame 2025-09-16 17:18:45 -04:00
Aleix Conchillo Flaqué
ec2c62e32b pyproject: update to livekit 1.0.13
Fixes #2643
2025-09-16 17:18:44 -04:00
Mark Backman
38ce85e9a0 Merge pull request #2667 from zytegalaxy/mcp-serverparameters-typefix
fix: replace `Tuple` type with `TypeAlias` for server params in MCP client
2025-09-16 14:14:59 -07:00
Mark Backman
2279e5a899 Merge pull request #2663 from pipecat-ai/mb/websockets-15
Add support for websockets 15.0
2025-09-16 14:08:36 -07:00
Mark Backman
cce6eb5d87 Merge pull request #2666 from pipecat-ai/mb/update-38b-local-turn-model
38b: Update bundled ONNX smart-turn model
2025-09-16 14:05:12 -07:00
mehrdad
c2b98ae557 fix(lint): fix space format issue 2025-09-16 13:44:15 -07:00
Filipi da Silva Fuchter
727eb12b16 Merge pull request #2648 from pipecat-ai/filipi/pcc_small_webrtc
Creating SmallWebRTCRequestHandler for managing peer connections.
2025-09-16 16:37:04 -03:00
mehrdad
ba96bd05d3 fix: replace Tuple type with TypeAlias for server params in MCP client 2025-09-16 11:44:25 -07:00
Mark Backman
8ead309f8d 38b: Update bundled ONNX smart-turn model 2025-09-16 13:17:14 -04:00
Mark Backman
fad0e55c64 Add websockets-base optional dependency and use for DRY pyproject.toml 2025-09-16 11:24:38 -04:00
Mark Backman
74b1af56a0 Update uv.lock 2025-09-16 11:21:49 -04:00
Mark Backman
6924850ec4 Add support for websockets 15.0 2025-09-16 11:21:49 -04:00
marcus-daily
dfe7815dc5 Smart Turn v3: removing torch and torchaudio deps 2025-09-16 16:02:41 +01:00
Pablo Elgueta
69f0a75882 feat: add support for global location in Vertex AI base URL
- Update _get_base_url method to handle 'global' location case
- Use 'aiplatform.googleapis.com' for global locations
- Use '{location}-aiplatform.googleapis.com' for regional locations
- Maintains backward compatibility with existing regional endpoints
2025-09-16 10:28:22 -03:00
Mark Backman
cca90791c4 Merge pull request #2652 from pipecat-ai/mb/fix-audio-buffer-processor-has-audio
fix: AudioBufferProcessor has_audio returns based on user or bot audi…
2025-09-15 18:43:59 -07:00
Mark Backman
f2a5d408de fix: AudioBufferProcessor has_audio returns based on user or bot audio existing 2025-09-15 21:35:35 -04:00
Aleix Conchillo Flaqué
044c6eba46 Merge pull request #2655 from pipecat-ai/aleix/add-on-pipeline-finalized
PipelineTask: add on_pipeline_finished event
2025-09-15 15:32:04 -07:00
Aleix Conchillo Flaqué
db71089f5e PipelineTask: add on_pipeline_finished event
This deprecates `on_pipeline_stopped`, `on_pipeline_ended` and
`on_pipeline_cancelled`.
2025-09-15 15:28:33 -07:00
Aleix Conchillo Flaqué
f861f5066f Merge pull request #2654 from pipecat-ai/aleix/unify-on-client-disconnected
transports: on_client_disconnected only if remote client disconnects
2025-09-15 15:18:57 -07:00
kompfner
81cede2c60 Merge pull request #2653 from pipecat-ai/pk/llm-context-adapting-tests
`LLMContext`-adapting unit tests
2025-09-15 16:38:46 -04:00
kompfner
7603203230 Merge pull request #2644 from pipecat-ai/pk/run-inference-unit-tests
`run_inference` unit tests
2025-09-15 16:26:10 -04:00
Aleix Conchillo Flaqué
8569b61598 transports: on_client_disconnected only if remote client disconnects 2025-09-15 11:35:40 -07:00
Paul Kompfner
fe42187dc1 Implement LLMService.create_llm_specific_message() so that users don't need to just know what value of llm to provide to the LLMSpecificMessage constructor 2025-09-15 14:15:22 -04:00
Paul Kompfner
999e88c942 Add unit tests for AWSBedrockLLMAdapter.get_llm_invocation_params(), focusing on messages specifically 2025-09-15 12:08:21 -04:00
Paul Kompfner
c04df2f28b Add unit tests for AnthropicLLMAdapter.get_llm_invocation_params(), focusing on messages specifically 2025-09-15 11:55:48 -04:00
Paul Kompfner
100ef0ab5c Add unit tests for GeminiLLMAdapter.get_llm_invocation_params(), focusing on messages specifically 2025-09-15 11:38:23 -04:00
Paul Kompfner
42886d7105 Add unit tests for OpenAILLMAdapter.get_llm_invocation_params(), focusing on messages specifically. Also, fix a bug in OpenAILLMAdapter (found thanks to the unit tests) where it wasn't "unwrapping" LLMSpecificMessages. 2025-09-15 11:17:11 -04:00
Mark Backman
22cbba002a Merge pull request #2651 from pipecat-ai/mb/heygen-bot-speaking-frame
fix: push BotStartedSpeakingFrame in HeyGenVideoService
2025-09-15 08:02:25 -07:00
Filipi Fuchter
0a043154f2 Removing not used import. 2025-09-15 10:46:43 -03:00
Filipi Fuchter
5e322eba9e Supporting both single and multiple connection modes. 2025-09-15 10:43:46 -03:00
Filipi Fuchter
11d0c3d46d Refactoring SmallWebRTCRequestHandler. 2025-09-15 09:58:44 -03:00
Mark Backman
c873798ce5 fix: push BotStartedSpeakingFrame in HeyGenVideoService 2025-09-14 08:12:44 -04:00
Filipi Fuchter
95f72f6dce Creating SmallWebRTCRequestHandler for managing peer connections. 2025-09-12 18:15:24 -03:00
Aleix Conchillo Flaqué
d8cd28bb8b Merge pull request #2640 from pipecat-ai/aleix/pipecat-0.0.85
update CHANGELOG for 0.0.85
2025-09-12 11:06:41 -07:00
Aleix Conchillo Flaqué
c2df6c8aee update CHANGELOG for 0.0.85 2025-09-12 11:03:32 -07:00
Aleix Conchillo Flaqué
82478be861 scripts(evals): add 19b-openai-realtime-text 2025-09-12 11:03:32 -07:00
Aleix Conchillo Flaqué
0f2b7bc01b examples(foundational): fix 19b-openai-realtime-beta-text 2025-09-12 11:03:32 -07:00
Aleix Conchillo Flaqué
1b2a5df017 Merge pull request #2622 from pipecat-ai/mb/call-data-runner
Add to, from phone info and custom data to the development runner
2025-09-12 10:28:17 -07:00
Mark Backman
2f496ac74f Add optional body parameter to WebsocketRunnerArguments 2025-09-12 11:28:12 -04:00
Mark Backman
22633a63b0 Update changelog 2025-09-12 11:15:03 -04:00
Mark Backman
e5ed0424e4 Remove to/from data from Plivo, as it will rely on body information 2025-09-12 11:10:03 -04:00
Paul Kompfner
786387722a Fix an issue in AWSBedrockLLMService.run_inference—exceptions should propagate, just like with other LLM services 2025-09-12 11:09:32 -04:00
Paul Kompfner
9f82c6b4a4 Add unit tests for run_inference 2025-09-12 11:07:11 -04:00
Mark Backman
99cfcb1d4e Parsed custom data from Plivo extraHeaders 2025-09-12 08:11:30 -04:00
Mark Backman
d595676436 Add custom data handling for Twilio 2025-09-12 08:11:30 -04:00
Aleix Conchillo Flaqué
0190812ee8 Merge pull request #2639 from pipecat-ai/aleix/min-words-interruption-unit-test
MinWordsInterruptionStrategy unit test
2025-09-11 18:52:39 -07:00
Aleix Conchillo Flaqué
2a24061bbb examples(07ad): remove deprecated user_continuous_stream 2025-09-11 18:50:00 -07:00
Aleix Conchillo Flaqué
89f7e7d199 update CHANGELOG with BaseOutputTransport fix 2025-09-11 16:58:44 -07:00
Aleix Conchillo Flaqué
384814e640 Merge pull request #2456 from a6kme/patch-1
Only set last_frame_time when handling OutputAudioRawFrame
2025-09-11 16:56:25 -07:00
Aleix Conchillo Flaqué
ab4364b833 update CHANGELOG and fix formatting 2025-09-11 15:34:47 -07:00
Aleix Conchillo Flaqué
fafdadad3c Merge pull request #2473 from TheNotary/adds-interim-transcription-frame-support
adds support to Azure STT for creating InterimTranscriptFrames
2025-09-11 15:33:38 -07:00
Aleix Conchillo Flaqué
05dc2fa916 updated CHANGELOG.md with GoogleTTSService updates 2025-09-11 14:36:21 -07:00
Aleix Conchillo Flaqué
0c30cc6ea6 Merge pull request #2547 from manishkjs/feat/google-tts-voice-cloning
feat: add voice cloning and speaking rate to GoogleTTSService
2025-09-11 14:32:21 -07:00
Aleix Conchillo Flaqué
c26d336e34 Merge pull request #2545 from pipecat-ai/aleix/aws-nova-sonic-pre-load-cue
AWSNovaSonicLLMService: pre-load audio cue in the constructor
2025-09-11 14:31:26 -07:00
Mark Backman
37b6198787 Merge pull request #2635 from pipecat-ai/mb/openai-tts-speed 2025-09-11 14:22:51 -07:00
kompfner
3c271da94c Merge pull request #2633 from pipecat-ai/pk/uv-readme-updates
Updating the README to reflect that:
2025-09-11 17:07:41 -04:00
kompfner
be28d3f93b Merge pull request #2637 from pipecat-ai/pk/llm-context-evals-and-bug-fix
`LLMContext` evals and bug fix
2025-09-11 17:00:07 -04:00
marcus-daily
d2f210e960 Bundle Smart Turn v3 with Pipecat 2025-09-11 21:37:16 +01:00
Aleix Conchillo Flaqué
57add41971 tests: add unit test for MinWordsInterruptionStrategy 2025-09-11 13:07:30 -07:00
Aleix Conchillo Flaqué
74b38b59d6 tests(utils): allow passing PipelineParams to run_test() 2025-09-11 13:02:21 -07:00
kompfner
dac58deffc Merge pull request #2636 from pipecat-ai/pk/uv-lock-update-for-smart-turn-v3
uv.lock update for Smart Turn v3
2025-09-11 14:35:36 -04:00
Paul Kompfner
aff11f5121 Fix missing import in llm_response_universal.py 2025-09-11 14:33:17 -04:00
Paul Kompfner
a4023d3915 Update evals to include examples that exercise the universal LLMContext 2025-09-11 14:32:56 -04:00
Paul Kompfner
d6543d244d uv.lock update for Smart Turn v3 2025-09-11 14:07:17 -04:00
Mark Backman
fafcd79870 OpenAITTSService: add speed arg 2025-09-11 13:53:52 -04:00
Paul Kompfner
6a717fbbd1 Updating the README to reflect that:
- various dependencies that previously didn't work with Python 3.13 now seem to
- ultravox isn't fully supported on macOS
2025-09-11 12:27:43 -04:00
Aleix Conchillo Flaqué
9b3f6927c2 Merge pull request #2621 from pipecat-ai/aleix/interruption-task-frame
interruption task frame
2025-09-11 09:22:35 -07:00
Aleix Conchillo Flaqué
0b21f8a6bd FrameProcessor: add push_interruption_task_frame_and_wait() 2025-09-11 09:19:44 -07:00
Aleix Conchillo Flaqué
8249b014f0 frames: BotInterruptionFrame is deprecated, use InterruptionTaskFrame 2025-09-11 09:01:54 -07:00
Aleix Conchillo Flaqué
9d9f10ae0e frames: StartInterruptionFrame is deprecated, use InterruptionFrame 2025-09-11 09:01:54 -07:00
Aleix Conchillo Flaqué
e27b23694d frames: add new TaskFrame
TaskFrame is a base class for other frames that are meant to be sent to the
pipeline task.
2025-09-11 09:01:52 -07:00
marcus-daily
66ce5fe6bd Ruff fixes 2025-09-11 16:04:56 +01:00
marcus-daily
a9b53dc800 Update inference session options 2025-09-11 16:04:56 +01:00
marcus-daily
818352a300 Formatting 2025-09-11 16:04:56 +01:00
marcus-daily
3e9fc7be19 Update onnxruntime version 2025-09-11 16:04:56 +01:00
marcus-daily
a2e76bcad8 Smart Turn V3 support 2025-09-11 16:04:56 +01:00
Mark Backman
8e8e42717b Add to and from phone information to the development runner 2025-09-11 10:44:21 -04:00
kompfner
b31322e38e Merge pull request #2619 from pipecat-ai/pk/aws-universal-context
Expand universal `LLMContext` support to AWS Bedrock
2025-09-11 09:33:08 -04:00
Jessie Wei
305108be9a [TwilioFrameSerializer]: Add parameter validation 2025-09-10 23:00:15 +00:00
Aleix Conchillo Flaqué
908325484d Merge pull request #2614 from pipecat-ai/aleix/readme-client-sdks-table
README: update clients' table
2025-09-10 10:21:18 -07:00
Mark Backman
dd6ff789c7 Merge pull request #2628 from pipecat-ai/mb/fix-13-push-frame
fix: 13 foundational examples now push frames from TranscriptionLogger
2025-09-10 09:13:04 -07:00
Mark Backman
f4938e0fad fix: 13 foundational examples now push frames from TranscriptionLogger 2025-09-10 10:40:10 -04:00
Jessie Wei
2e1f397d17 Support TwilioFrameSerializer region/edge settings to that the call is terminated successfully for regions other than default us1 region 2025-09-10 14:30:10 +00:00
James Hush
e8f60c7c6f Handle missing rawResponse in transcription messages (#2623)
* Handle missing rawResponse in transcription messages

- Use message.get('rawResponse', {}) to safely access rawResponse field
- Default is_final to False when rawResponse is missing
- Add proper type annotations for better code clarity
- Minor import formatting cleanup

This prevents KeyError crashes when transcription messages from Daily's API
don't include the rawResponse field in edge cases.

* docs: add changelog line
2025-09-10 15:03:23 +08:00
Paul Kompfner
fedb8a201f Update 12d example to use LLMContext, now that AWS Bedrock supports it 2025-09-09 16:24:13 -04:00
Paul Kompfner
8ccd220a60 Add universal LLMContext support to AWSBedrockLLMService.run_inference() 2025-09-09 16:00:32 -04:00
Paul Kompfner
fe79de8f27 When converting universal LLMContext messages to AWS Bedrock expected format, automatically update non-initial "system"-role messages to "user"-role messages, as we do in other non-OpenAI LLM services 2025-09-09 15:50:03 -04:00
Paul Kompfner
176573c342 Add to CHANGELOG AWS Bedrock's support for universal LLMContext 2025-09-09 15:31:56 -04:00
Paul Kompfner
75f9914f49 Add support for universal LLMContext to AWS Bedrock LLM service 2025-09-09 15:25:04 -04:00
Paul Kompfner
f4d6715e32 Add foundational example using AWS Bedrock with universal LLMContext 2025-09-09 10:49:51 -04:00
kompfner
38f6e33f97 Merge pull request #2598 from pipecat-ai/pk/deprecate-vision-image-raw-frame
Remove `VisionImageRawFrame`, which was previously being handled dire…
2025-09-08 17:13:28 -04:00
Paul Kompfner
1c3e4e34e5 Minor fix to AWS Bedrock console logging to handle image messages in the context 2025-09-08 17:10:11 -04:00
Paul Kompfner
623c660027 Remove debugging comment 2025-09-08 17:01:51 -04:00
Paul Kompfner
a3e65ab3b5 The VisionImageRawFrame removal and corresponding VisionImageFrameAggregator deprecation will now happen in version 0.0.85 2025-09-08 17:01:47 -04:00
Paul Kompfner
f3a4b416df Remove VisionImageRawFrame, which was previously being handled directly by the LLM services, and deprecate the associated VisionImageFrameAggregator.
Removing `VisionImageRawFrame` lets us simplify LLM services' logic, getting us closer to the idealized architecture where all they care about is handling context frames.

This change is in service of getting us closer to ready to deprecate usage of `OpenAILLMContext` and subclasses in favor of the universal `LLMContext`, at least for the traditional text-to-text LLMs.

Why remove `VisionImageRawFrame` rather than deprecate? It's "internal"—only created by `VisionImageFrameAggregator`—and never intended to be used directly by users (it would be difficult to use directly anyway).

Move the logic that was once in `VisionImageFrameAggregator` directly into the examples. Reasoning:
- If `UserImageRequester` is defined in the examples, it makes sense for `UserImageProcessor` to be too, as it’s the flip side of the same coin, so to speak
- The logic is now pretty trivial
- This kind of one-shot, history-less image-describing pipeline shouldn't be common at all; it's ok for it to live in examples rather than as a dedicated class
- In the short term, this enables us to create `LLMContext`s for services that support it and `OpenAILLMContext`s for services that don't yet (AWS)

This commit also adds missing translation from OpenAI-format image context messages to AWS format. Note that this isn't a wasted effort in the face of the upcoming migration to universal `LLMContext`—this work will be reused as it has to be implemented there too.
2025-09-08 17:00:08 -04:00
Aleix Conchillo Flaqué
aa471a4ef5 update CHANGELOG with LiveKitTransport updates 2025-09-08 13:53:21 -07:00
Aleix Conchillo Flaqué
d55133a44f Merge pull request #2604 from alexyzhou/feature/livekit_video_and_bug_fix
Feature: Add support for livekit video stream and minor bug fixes
2025-09-08 13:51:14 -07:00
Aleix Conchillo Flaqué
0f1cf81691 README: update clients' table 2025-09-08 12:08:32 -07:00
kompfner
ac4d335799 Merge pull request #2613 from pipecat-ai/pk/mistral-message-fixups
Apply additional fixups to context messages to meet Mistral-specific …
2025-09-08 13:59:54 -04:00
Paul Kompfner
e65385c151 Tweak the Mistral-specific context messages fixup logic to handle the (mostly academic) possibility of a "tool" message appearing at the end 2025-09-08 13:55:09 -04:00
Paul Kompfner
0bb7df7a6b Remove stray debugging message 2025-09-08 13:38:26 -04:00
Paul Kompfner
daee1ddf3b Apply additional fixups to context messages to meet Mistral-specific requirements 2025-09-08 11:26:58 -04:00
Adithya Suresh
d1f72c1c0b Added usage metrics 2025-09-08 14:44:25 +10:00
Aleix Conchillo Flaqué
1cccb97ccf Merge pull request #2608 from pipecat-ai/aleix/deprecate-noisereducefilter
audio(filters): deprecate NoisereduceFilter
2025-09-07 20:54:09 -07:00
Aleix Conchillo Flaqué
d7794abf21 audio(filters): deprecate NoisereduceFilter 2025-09-07 20:52:17 -07:00
Aleix Conchillo Flaqué
6a6a63a532 Merge pull request #2607 from pipecat-ai/aleix/scripts-evals-improve-eval-prompt
scripts(evals): allow user to talk and only eval when needed
2025-09-07 20:49:43 -07:00
Mark Backman
6edb6fed41 Merge pull request #2606 from pipecat-ai/mb/quickstart-lockfile
Remove uv.lock from quickstart
2025-09-07 06:10:14 -07:00
Mark Backman
a537382816 Add OpenAIRealtimeLLMService, AzureRealtimeLLMService (#2596)
* Add OpenAI Realtime module

* Add foundational examples for OpenAI Realtime

* Add deprecation warning to OpenAIRealtimeBetaLLMService

* Add deprecation warning to AzureRealtimeBetaLLMService

* Update Changelog
2025-09-07 09:09:57 -04:00
Aleix Conchillo Flaqué
46deaada70 scripts(evals): allow user to talk and only eval when needed 2025-09-06 19:19:08 -07:00
TheNotary
7366b1aee0 adds missing InterimTranscriptionFrame import 2025-09-06 14:40:19 -05:00
Mark Backman
dbc52bc6b0 Remove uv.lock from quickstart 2025-09-06 11:13:50 -04:00
Alex Zhou
d6432589f6 fix: fix format and lint by ruff 2025-09-06 10:50:47 +08:00
Alex Zhou
13b73d4406 feat: Add support for pipecat video stream; fix the bug of duplicate participants when connecting; fix the bug of RTVI messages sent via livekit messages; 2025-09-06 10:41:33 +08:00
Aleix Conchillo Flaqué
85d8282f7e Merge pull request #2602 from pipecat-ai/aleix/pipecat-0.0.84
update CHANGELOG for 0.0.84
2025-09-05 19:35:26 -07:00
Aleix Conchillo Flaqué
070690ec64 update CHANGELOG for 0.0.84 2025-09-05 18:22:50 -07:00
Aleix Conchillo Flaqué
b9c96fd623 Merge pull request #2601 from pipecat-ai/aleix/daily-python-0.19.9
pyproject: update daily-python to 0.19.9
2025-09-05 18:21:49 -07:00
Aleix Conchillo Flaqué
f8b2ab6331 pyproject: update daily-python to 0.19.9 2025-09-05 18:14:57 -07:00
Mark Backman
ea3f7e3c34 Merge pull request #2600 from pipecat-ai/mb/livekit-dtmf
LiveKitTransport: Add support to send DTMF
2025-09-05 15:25:32 -07:00
Mark Backman
2f44f88b08 LiveKitTransport: Add support to send DTMF 2025-09-05 18:23:04 -04:00
Mark Backman
25747a001b Merge pull request #2599 from pipecat-ai/mb/fix-daily-dtmf
DTMF: Add support for native DTMF implementation where available
2025-09-05 15:20:05 -07:00
Mark Backman
fbe4338440 DTMF: Add support for native DTMF implementation where available 2025-09-05 18:16:56 -04:00
Filipi da Silva Fuchter
64b4c65728 Merge pull request #2595 from pipecat-ai/filipi/heygen_quality
Improving HeyGen example video quality.
2025-09-05 17:19:25 -03:00
kompfner
29442969a9 Merge pull request #2597 from pipecat-ai/pk/fix-anthropic-tool-less-usage
Fix Anthropic tool-less usage
2025-09-05 15:30:29 -04:00
Paul Kompfner
dc2e1d4ad3 Fix Anthropic tool-less usage 2025-09-05 11:47:31 -04:00
Filipi Fuchter
5477dfcbea Improving HeyGen example video quality. 2025-09-05 11:30:01 -03:00
kompfner
516f0e08ab Merge pull request #2590 from pipecat-ai/pk/gemini-multimodal-live-doesnt-support-llm-context
Raise an error when attempting to use Gemini Multimodal Live with uni…
2025-09-05 09:22:33 -04:00
Paul Kompfner
246f9f3325 Raise an error when attempting to use Gemini Multimodal Live with universal LLMContext. This is exactly the same error we already have for the other s2s models, AWS Nova Sonic and OpenAI Realtime, it was just missing from this service. 2025-09-04 16:47:08 -04:00
Manish Kumar
4699ee8d86 docs: add docstring for voice_cloning_key and update CHANGELOG 2025-09-04 22:45:51 +05:30
kompfner
3d850e8cc5 Merge pull request #2574 from pipecat-ai/pk/expand-universal-llm-context-support-to-anthropic
Expand universal `LLMContext` support to Anthropic
2025-09-04 13:09:44 -04:00
Paul Kompfner
6e734a37f9 Fix a bug in AWSBedrockLLMService.run_inference(); it was expecting the wrong format for the system instruction 2025-09-04 13:04:15 -04:00
Paul Kompfner
f72ca2fd7d Remove unnecessary system_instruction argument from run_inference() methods 2025-09-04 13:04:15 -04:00
Paul Kompfner
0826d72f74 Add deprecation warning for using enable_prompt_caching_beta param 2025-09-04 13:04:15 -04:00
Paul Kompfner
ba5ebfa0ec Fixed subtle CHANGELOG conflict after release of 0.0.83: universal LLMContext support for Anthropic didn't make that release. Also, some automatic Prettier fixes. 2025-09-04 13:04:11 -04:00
Paul Kompfner
dc3412b2df Bump a deprecation to 0.0.84, as 0.0.83 just shipped 2025-09-04 13:03:06 -04:00
Paul Kompfner
b2e9fd9341 Rename Anthropic enable_prompt_caching_beta parameter to just enable_prompt_caching 2025-09-04 13:03:06 -04:00
Paul Kompfner
c11b207c97 Add Anthropic to CHANGELOG list of services newly supporting runtime LLM switching 2025-09-04 13:03:06 -04:00
Paul Kompfner
d6205027cf Trivial cleanup 2025-09-04 13:03:06 -04:00
Paul Kompfner
986160c077 Fix a bug where the Anthropic adapter's merge-consecutive-messages-with-the-same-role logic was unintentionally affecting the source LLMContext's messages, resulting in more and more duplication of text with each inference 2025-09-04 13:03:06 -04:00
Paul Kompfner
b56ff86fee Minor refactor of AnthropicLLMAdapter cache-control-marker-adding logic (without really changing its behavior) 2025-09-04 13:03:06 -04:00
Paul Kompfner
5c574eaad9 Add support for universal LLMContext to Anthropic LLM service 2025-09-04 13:03:06 -04:00
Paul Kompfner
2df231143a Add foundational example using Anthropic with universal LLMContext 2025-09-04 13:03:06 -04:00
Aleix Conchillo Flaqué
e3597801d4 AWSNovaSonicLLMService: pre-load audio cue in the constructor 2025-09-04 09:31:39 -07:00
Aleix Conchillo Flaqué
65298ab792 update CHANGELOG with AWSBedrockLLMService fix 2025-09-04 09:24:55 -07:00
Aleix Conchillo Flaqué
b609b02614 Merge pull request #2568 from ezisezis/fix-bedrock-timeouts
fix timeout handling in AWSBedrockLLMService
2025-09-04 09:23:28 -07:00
Aleix Conchillo Flaqué
f2b50c14d2 Merge pull request #2573 from pipecat-ai/vp-minor-fixes-07s
example 07s: minor typo updates
2025-09-04 09:21:32 -07:00
Aleix Conchillo Flaqué
ee3b023986 update CHANGELOG with OpenAIImageGenService fix 2025-09-04 09:20:02 -07:00
Aleix Conchillo Flaqué
0d9e1190d7 Merge pull request #2583 from sassanh/main
fix: openai image generator now initiates URLImageRawFrame with correct order of arguments
2025-09-04 09:17:51 -07:00
Mark Backman
595a7c7fbe Merge pull request #2587 from pipecat-ai/mb/update-quickstart-0.0.83
Update quickstart pyproject to use 0.0.83
2025-09-04 07:42:56 -07:00
Mark Backman
586586f743 Update quickstart pyproject to use 0.0.83 2025-09-04 10:36:58 -04:00
Mark Backman
a1c6ad539d Merge pull request #2585 from ashotbagh/feat/asyncai-multilingual-support
feat(asyncai): add multilingual TTS support
2025-09-04 05:03:45 -07:00
Ashot
daf7fed8b3 feat(asyncai): add multilingual TTS support 2025-09-04 13:58:50 +04:00
Adithya Suresh
446d99d194 Bug fixes 2025-09-04 17:08:16 +10:00
Adithya Suresh
cbdbdee4c0 Initial StrandsAgentsProcessor implementation 2025-09-04 16:58:58 +10:00
Sassan Haradji
a26647c433 fix: openai image generator now initiates URLImageRawFrame with correct order of arguments 2025-09-04 06:09:57 +03:30
Aleix Conchillo Flaqué
0fab56fc13 Merge pull request #2577 from pipecat-ai/aleix/pipecat-0.0.83
update CHANGELOG for 0.0.83
2025-09-03 16:49:24 -07:00
Aleix Conchillo Flaqué
f0baff94b2 update CHANGELOG for 0.0.83 2025-09-03 16:47:43 -07:00
Aleix Conchillo Flaqué
d146170fd6 Merge pull request #2580 from pipecat-ai/mb/add-cerebras-evals
Add 14k (CerebrasLLMService) to release evals
2025-09-03 15:08:28 -07:00
Filipi da Silva Fuchter
001a2d36e5 Merge pull request #2579 from pipecat-ai/filipi/input_message
Creating InputTransportMessageUrgentFrame
2025-09-03 19:01:07 -03:00
Filipi Fuchter
99e237b1e2 Fixed an issue where messages received from the transport were always being resent. 2025-09-03 18:58:34 -03:00
Aleix Conchillo Flaqué
978f644f19 Merge pull request #2578 from pipecat-ai/aleix/user-speaking-frame
add UserSpeakingFrame and UserStartedSpeakingFrame/UserStopeedSpeakingFrame updates
2025-09-03 14:55:45 -07:00
Aleix Conchillo Flaqué
5a4c6b9618 BaseInputTransport: push UserStartedSpeakingFrame/UserStoppedSpeakingFrame upstream 2025-09-03 14:32:32 -07:00
Mark Backman
977a57c8fb Add 14k (CerebrasLLMService) to release evals 2025-09-03 17:11:38 -04:00
Mark Backman
c64bc5a636 Merge pull request #2576 from joyceerhl/joyce/cerebras-default
fix: update default Cerebras model to GPT-OSS-120B
2025-09-03 14:10:28 -07:00
Joyce Er
eba006d39c Fix nits 2025-09-03 14:07:49 -07:00
Joyce Er
a001f6f193 Switch to GPT-OSS-120B 2025-09-03 14:00:27 -07:00
Aleix Conchillo Flaqué
09d6ec1098 introduce and push UserSpeakingFrame upstream/downstream 2025-09-03 13:56:01 -07:00
Aleix Conchillo Flaqué
f56be9315a Merge pull request #2572 from pipecat-ai/aleix/deepgram-disconnect-task
ParallePipeline: wait for CancelFrame in all branches
2025-09-03 13:10:55 -07:00
Mark Backman
8e5880b2e7 Merge pull request #2575 from pipecat-ai/mb/fix-foundational-frame-direction
fix: Specify frame direction in 06a push_frame
2025-09-03 12:46:10 -07:00
Joyce Er
d8ac6f2c1a fix: update default Cerebras model to Qwen 3 32B 2025-09-03 12:23:36 -07:00
Mark Backman
052ffe8712 fix: Specify frame direction in 06a push_frame 2025-09-03 15:07:05 -04:00
Aleix Conchillo Flaqué
b52296450c DeepgramSTTService: remove raising CancelledError 2025-09-03 11:24:02 -07:00
Aleix Conchillo Flaqué
c71cec04d3 ParallelPipeline: wait for CancelFrame in all branches 2025-09-03 11:23:37 -07:00
vipyne
83f64ecd3b example 07s: minor typo updates 2025-09-03 12:11:07 -05:00
Aleix Conchillo Flaqué
d19170d8b1 Merge pull request #2565 from pipecat-ai/aleix/reorganize-transports
transports: reorganize module
2025-09-03 08:52:49 -07:00
Mark Backman
8b95d74193 Merge pull request #2571 from pipecat-ai/mb/fix-docs-0.0.83
Fix docs generation before 0.0.83 release
2025-09-03 08:52:20 -07:00
Mark Backman
3c4694a8f1 Fix docs generation before 0.0.83 release 2025-09-03 11:31:14 -04:00
kompfner
b9748b1228 Merge pull request #2563 from pipecat-ai/pk/expand-universal-llm-context-support-to-more-llms
Expand universal `LLMContext` support to more LLMs
2025-09-03 11:20:26 -04:00
Paul Kompfner
def1cf1548 Update CHANGELOG with entry about expanded support among LLM services for new universal LLMContext 2025-09-03 09:07:57 -04:00
Paul Kompfner
9b216116f1 Remove supports_universal_context gate from OpenAI-library-based LLM services. It's no longer needed, as they all now support universal LLMContext. 2025-09-03 09:07:18 -04:00
Paul Kompfner
7cb372ebb9 Add support for universal LLMContext to Together.ai LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
6838bc1e51 Add a few missing LLMContext type hints 2025-09-03 09:07:18 -04:00
Paul Kompfner
e04f42167e Add support for universal LLMContext to SambaNova LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
91a3f63e28 Add support for universal LLMContext to Qwen LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
b24eb76559 Add QWEN_API_KEY to env.example 2025-09-03 09:07:18 -04:00
Paul Kompfner
d9ea02595b Add support for universal LLMContext to Ollama LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
5bc0e49baa Add support for universal LLMContext to Perplexity LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
ec138b97d9 Add support for universal LLMContext to OpenRouter LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
0c32cc29a7 Add support for universal LLMContext to OpenPipe LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
d740bab99e Add support for universal LLMContext to NVIDIA NIM LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
ac62183eb6 Add NVIDIA_API_KEY to env.example 2025-09-03 09:07:18 -04:00
Paul Kompfner
34f823bcac Add support for universal LLMContext to Groq LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
b4e1051066 Add support for universal LLMContext to Grok LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
d8882bc381 Add support for universal LLMContext to Google Vertex AI LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
da18d0a562 Add support for universal LLMContext to Fireworks AI LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
f8e13a82cf Fix Fireworks AI function calling example 2025-09-03 09:07:18 -04:00
Paul Kompfner
2b00d37e94 Add support for universal LLMContext to DeepSeek LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
2dbd17da4d Fix Cerebras function calling example 2025-09-03 09:07:18 -04:00
Paul Kompfner
d45fbd5455 Add support for universal LLMContext to Cerebras LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
b22bdff6d0 Add support for universal LLMContext to Azure LLM service 2025-09-03 09:07:18 -04:00
Paul Kompfner
2b286365e0 Add MISTRAL_API_KEY to env.example 2025-09-03 09:07:18 -04:00
Eduards Klavins
0a3e98857e fix timeout handling in AWSBedrockLLMService 2025-09-03 11:52:30 +03:00
Aleix Conchillo Flaqué
aeb9f1ffca transports: reorganize module 2025-09-02 17:31:39 -07:00
Filipi da Silva Fuchter
7f1100bd4c Merge pull request #2513 from pipecat-ai/filipi/whatsapp
Support for the new WhatsApp Cloud API
2025-09-02 18:06:59 -03:00
Filipi Fuchter
8fbd9b5af7 Added support for WhatsApp User-initiated Calls. 2025-09-02 18:05:00 -03:00
Filipi Fuchter
49c1f0bd08 Fixed SmallWebRTCTransport to not use mid to decide if the transceiver should be sendrecv or not. 2025-09-02 18:04:51 -03:00
Aleix Conchillo Flaqué
ce7a0512f9 Merge pull request #2562 from pipecat-ai/aleix/ai-coustics-speech-enhancement
add ai-coustics speech enhancement filter
2025-09-02 13:13:28 -07:00
Aleix Conchillo Flaqué
fdcd14dd21 updated CHANGELOG with AICFilter and fix deprecations 2025-09-02 13:10:10 -07:00
Mark Backman
0386599163 Added Daily SIP room creation utility (#2560)
* Added Daily SIP room creation utility to configure() function

* Add sip_codecs to the DailyRoomSipParams
2025-09-02 14:12:04 -04:00
Corvin Jaedicke
c1ce3d7d2b bumped aic-sdk version to v1.0.1 with minor changes 2025-09-02 11:11:29 -07:00
Corvin Jaedicke
8ecece2d9c Add AIC SDK audio filter 2025-09-02 11:11:29 -07:00
Filipi da Silva Fuchter
0d8ab7abca Merge pull request #2552 from pipecat-ai/filipi/freeze_issues
Fixed an issue where the pipeline could freeze.
2025-09-02 14:43:08 -03:00
Filipi Fuchter
dea7c22020 Fixed an issue where the pipeline could freeze. 2025-09-02 13:58:41 -03:00
Mark Backman
cfe11267f4 Merge pull request #2546 from pipecat-ai/mb/update-changelog-mem0
Add mem0 changelog entry
2025-09-02 07:22:52 -07:00
Mark Backman
d0c97d3602 Add mem0 changelog entry 2025-09-02 10:03:17 -04:00
Mark Backman
37e1551abc Merge pull request #2555 from pipecat-ai/mb/update-uv-lock-quickstart 2025-09-02 04:19:15 -07:00
Mark Backman
e1477e79f0 Merge pull request #2538 from rimelabs/rime-flush-audio-update
Use Rime’s official {"operation": "flush"} command in flush_audio() for proper text buffer flushing
2025-09-01 18:01:20 -07:00
Mark Backman
547b126d98 Update required pipecat-ai version for quickstart 2025-09-01 20:52:44 -04:00
Mark Backman
447e3b28eb Update uv.lock for quickstart 2025-09-01 20:52:12 -04:00
gokuljs
472efa2971 ruff format fix 2025-09-02 04:25:28 +05:30
Mark Backman
64486ef50b Merge pull request #2536 from gladiaio/PLA-38-missing-config-parameters
Gladia - add missing config parameters
2025-09-01 12:42:10 -07:00
gokuljs
5f801743d0 Add changelog entry for RimeTTSService flush_audio API update 2025-09-01 22:16:02 +05:30
Fabrice Lamant
802c5d04f4 update changelog 2025-09-01 10:21:11 +02:00
Aleix Conchillo Flaqué
83b90da53a Merge pull request #2537 from pipecat-ai/aleix/pipeline-task-cleanup-observers
PipelineTask: cleanup observers
2025-08-31 13:44:38 -07:00
Aleix Conchillo Flaqué
1f49de5cdf Merge pull request #2542 from pipecat-ai/aleix/remove-stop-interruption-frame
frames: remove StopInterruptionFrame
2025-08-31 13:44:22 -07:00
Manish Kumar
2ee481d541 feat: add voice cloning and speaking rate to GoogleTTSService 2025-08-30 23:04:59 +05:30
Mark Backman
7cf099eae7 Merge pull request #2541 from parshvadaftari/user/parshva/update_mem0
Update mem0 integration
2025-08-30 05:11:31 -07:00
Mark Backman
93a8ea3cb2 Merge pull request #2543 from pipecat-ai/mb/docs-extensions
Add Extensions to ref docs generation
2025-08-30 04:20:03 -07:00
Aleix Conchillo Flaqué
776aafddfb Merge pull request #2534 from pipecat-ai/aleix/pyright-1.1.404
pyproject: update pyright and ruff
2025-08-29 19:55:54 -07:00
Mark Backman
d56762262a Fix docs build errors 2025-08-29 20:24:35 -04:00
Mark Backman
bbcf35d657 Add Extensions to reference docs generation 2025-08-29 20:17:34 -04:00
Mark Backman
972546b24f Add IVR navigation (#2529) 2025-08-29 20:08:17 -04:00
Aleix Conchillo Flaqué
8b351f5bec pyproject: update pyright and ruff 2025-08-29 17:02:13 -07:00
Aleix Conchillo Flaqué
bd7d9346b7 frames: remove StopInterruptionFrame 2025-08-29 16:40:01 -07:00
Aleix Conchillo Flaqué
81325be4f3 Merge pull request #2540 from pipecat-ai/aleix/dtmf-tones-slower
audio(dtmf): use longer tones and longer gaps
2025-08-29 15:15:01 -07:00
Aleix Conchillo Flaqué
399f8de6ef audio(dtmf): use longer tones and longer gaps 2025-08-29 15:10:20 -07:00
parshvadaftari
60c070e077 update mem0 integration for reduced latency and better performance 2025-08-30 02:27:36 +05:30
gokuljs
e3f2faabf7 Merge branch 'main' of github.com:rimelabs/pipecat into rime-flush-audio-update 2025-08-30 01:18:50 +05:30
Aleix Conchillo Flaqué
b5a644dd6f PipelineTask: cleanup observers 2025-08-29 10:54:36 -07:00
gokuljs
e06bd6049e update flush operation in flush audio function under rime tts 2025-08-29 21:27:14 +05:30
Fabrice Lamant
25b595e125 add suggestions 2025-08-29 14:51:20 +02:00
Fabrice Lamant
edc8cc1e69 remove sample_rate from GladiaInputParams 2025-08-29 14:00:00 +02:00
Fabrice Lamant
633dd69dee feat: add logging for pipecat version and session url 2025-08-29 13:47:16 +02:00
Fabrice Lamant
1a1d5a1081 feat: add missing config params 2025-08-29 13:46:44 +02:00
Aleix Conchillo Flaqué
c1b8d2acab Merge pull request #2532 from pipecat-ai/aleix/universal-dtmf-support
Universal DTMF support
2025-08-28 21:04:13 -07:00
Aleix Conchillo Flaqué
ea368e4c5f scripts(dtmf): added generate_dtmf.sh to generate DTMF wav files 2025-08-28 21:01:41 -07:00
Aleix Conchillo Flaqué
f03deb6ecc DailyTransport: remove send_dtmf() and write_dtmf() 2025-08-28 21:01:41 -07:00
Aleix Conchillo Flaqué
0e01ac8ef6 BaseOutputTransport: implement generic write_dtmf() 2025-08-28 21:01:41 -07:00
Aleix Conchillo Flaqué
5787743ab3 audio(dtmf): added DTMF audio files and load_dtmf_audio() 2025-08-28 21:01:41 -07:00
Aleix Conchillo Flaqué
79be0695dd make sure warnings are always displayed 2025-08-28 17:43:29 -07:00
Aleix Conchillo Flaqué
a5c5e069ba move pipecat.frames.frames.KeypadEntry to pipecat.audio.dtmf.types.KeypadEntry 2025-08-28 17:43:29 -07:00
Aleix Conchillo Flaqué
77c34076f7 Merge pull request #2531 from pipecat-ai/aleix/pipecat-0.0.82
update CHANGELOG for 0.0.82
2025-08-28 13:04:41 -07:00
Aleix Conchillo Flaqué
d67cece356 update CHANGELOG for 0.0.82 2025-08-28 13:02:47 -07:00
Aleix Conchillo Flaqué
275c8b59c5 MistralLLMService: fix build_chat_completion_params() 2025-08-28 12:04:14 -07:00
Aleix Conchillo Flaqué
5ebcea2a3b scripts(eval): change "result" function call parameter 2025-08-28 11:38:59 -07:00
Aleix Conchillo Flaqué
64f2135ddc examples(14f): use default models 2025-08-28 11:38:59 -07:00
kompfner
a74231f036 Merge pull request #2515 from pipecat-ai/pk/llm-run-frame
Add `LLMRunFrame` to trigger an LLM response, replacing `context_aggr…
2025-08-28 10:01:00 -04:00
Paul Kompfner
189749b579 Add LLMRunFrame to trigger an LLM response, replacing context_aggregator.user().get_context_frame() 2025-08-28 09:53:33 -04:00
Aleix Conchillo Flaqué
e384ca949e Merge pull request #2512 from pipecat-ai/aleix/textframe-skip-tts
TextFrame: add skip_tts field
2025-08-27 16:26:03 -07:00
Aleix Conchillo Flaqué
eb248fedc1 add skip_tts to LLMFullResponseStartFrame/LLMFullResponseEndFrame 2025-08-27 16:23:27 -07:00
Aleix Conchillo Flaqué
16f57be72c LLMConfigureOutputFrame: allow configuring LLM output 2025-08-27 16:23:27 -07:00
Aleix Conchillo Flaqué
5803936838 TextFrame: add skip_tts field
This lets a text frame bypass TTS while still being included in the LLM
context. Useful for cases like structured text that isn’t meant to be spoken but
should still contribute to context.
2025-08-27 16:23:27 -07:00
Aleix Conchillo Flaqué
d9837dd1e5 Merge pull request #2527 from pipecat-ai/aleix/daily-python-0.19.8
pyproject: update daily-python to 0.19.8
2025-08-27 16:22:49 -07:00
Aleix Conchillo Flaqué
e48c9fc3e2 pyproject: update daily-python to 0.19.8 2025-08-27 16:00:36 -07:00
Aleix Conchillo Flaqué
3c4454a33e Merge pull request #2526 from pipecat-ai/aleix/pipeline-task-wait-for-startframe
PipelineTask: wait for StartFrame to reach end of pipeline
2025-08-27 15:57:10 -07:00
Aleix Conchillo Flaqué
2a0780e6ef PipelineTask: wait for StartFrame to reach end of pipeline
Fixes #2498
2025-08-27 14:23:09 -07:00
Aleix Conchillo Flaqué
5e121346fb Merge pull request #2516 from pipecat-ai/aleix/rtvi-client-version-check
RTVIProcessor: make check sure client version is set
2025-08-27 14:02:14 -07:00
Aleix Conchillo Flaqué
2bdca8d22c RTVIProcessor: make check sure client version is set 2025-08-27 13:36:11 -07:00
Aleix Conchillo Flaqué
1f5888bcf7 Merge pull request #2517 from pipecat-ai/aleix/unify-get-messages-for-logging
unify get_messages_for_logging()
2025-08-27 12:49:36 -07:00
Mark Backman
3d09f9a2af Merge pull request #2524 from pipecat-ai/mb/cartesia-speed
Cartesia: update speed InputParam
2025-08-27 12:47:29 -07:00
Aleix Conchillo Flaqué
cd3563bb16 unify get_messages_for_logging()
Some implementations were returing a list and some were returning a JSON
string. They should all return a list and the user would decide if it wants to
transform that into JSON.
2025-08-27 12:45:24 -07:00
Aleix Conchillo Flaqué
3e79ef4118 Merge pull request #2525 from pipecat-ai/aleix/daily-fix-send-dtmf
DailyTransport: fix sending DTMF tones
2025-08-27 12:44:27 -07:00
Aleix Conchillo Flaqué
2613da1a1f PipelineTask: increase CANCEL_TIMEOUT_SECS to 20 2025-08-27 11:50:48 -07:00
Aleix Conchillo Flaqué
41d40f9a11 DailyTransport: make sure we have a client before joining/leaving 2025-08-27 11:50:48 -07:00
Aleix Conchillo Flaqué
74af2b6aa4 DailyTransport: fix sending DTMF tones 2025-08-27 11:50:48 -07:00
Mark Backman
f7d9f32b0f Cartesia: update speed InputParam 2025-08-27 13:34:28 -04:00
Mark Backman
6074af60ef Merge pull request #2521 from pipecat-ai/mb/update-quickstart-pcc-docker
Update quickstart to use pcc docker command
2025-08-27 08:13:31 -07:00
Mark Backman
7ef6893c0d Merge pull request #2523 from sam-s10s/fix/connection-none
Speechmatics TTS connection issue
2025-08-27 08:09:46 -07:00
Sam Sykes
cc5557e051 changelog 2025-08-27 16:07:31 +01:00
Sam Sykes
06f7a92c99 fix to finally statement 2025-08-27 14:43:07 +01:00
Mark Backman
61a333ccae Update quickstart to use pcc docker command 2025-08-26 21:29:13 -04:00
Mark Backman
fc3d84dff7 Merge pull request #2501 from pipecat-ai/mb/aws-tts-more-flexible-auth
Support additional authentication mechanisms for AWS services
2025-08-26 18:05:37 -07:00
Mark Backman
86a37d8cea Add changelog entry for SentryMetrics missing import fix 2025-08-26 21:00:16 -04:00
Mark Backman
3f66acf9f1 Merge pull request #2520 from geluso/bugfix-missing-asyncio-import
add missing import asyncio
2025-08-26 17:59:25 -07:00
Mark Backman
facfaa2dd4 AWSBedrockLLMService: Allow setting auth credentials via env vars 2025-08-26 20:59:12 -04:00
Mark Backman
8250c381d1 AWSPollyTTSService: allow setting auth credentials through provider chain 2025-08-26 20:58:02 -04:00
Steve Geluso
32f9e48865 add missing import asyncio 2025-08-26 17:40:11 -07:00
Filipi Fuchter
76eef837b6 Removing watchdog from SarvamTTSService. 2025-08-26 18:44:58 -03:00
Filipi Fuchter
c9aaa463b7 Mentioning the recent SarvamTTSService changes in the changelog. 2025-08-26 18:44:58 -03:00
pratham-sarvam
6d582e41b7 Added Sarvam TTS Websocket Implementation (#2356)
* Added Sarvam TTS Websocket Implementation

* Addressed some of the comments on PR

* added change voice logic

* added changes from main

* pushing text frames and added flush audio

* updated docs string for better docs

* Addressed comments and added some improvements

* pushed optional args down

* removed new line

* made aiohttp session mandatory in http service

* added push frame and removed unused function

* removed pong message

* added disconnecting logic

---------

Co-authored-by: vinayak-sarvam <vinayak@sarvam.ai>
2025-08-26 18:10:26 -03:00
kompfner
ca29f62bff Merge pull request #2510 from pipecat-ai/pk/fix-set-tools-types
Update types for tools in `LLMSetToolsFrame` and `LLMContextAggregato…
2025-08-26 14:12:21 -04:00
Aleix Conchillo Flaqué
0dced68c3c Merge pull request #2511 from pipecat-ai/aleix/end-of-pipline-warning
PipelineTask: warn if CancelFrame doesn't reach the end
2025-08-26 11:02:26 -07:00
Aleix Conchillo Flaqué
8ab81d289a PipelineTask: warn if CancelFrame doesn't reach the end 2025-08-26 10:36:33 -07:00
Paul Kompfner
f457d00760 Update types for tools in LLMSetToolsFrame and LLMContextAggregator.set_tools(), for two reason:
1. `ToolsSchema` has been supported in `LLMSetToolsFrame` for a while but wasn't properly reflected in these type hints
2. The new universal `LLMContext` expects tools to be either a `ToolsSchema` or `NOT_GIVEN`.
2025-08-26 11:32:21 -04:00
kompfner
f5118c4412 Merge pull request #2440 from pipecat-ai/pk/prototype-llm-failover-attempt-4
Support for runtime LLM switching
2025-08-26 09:55:03 -04:00
Paul Kompfner
a79fe40162 Fix a typo in the CHANGELOG 2025-08-26 09:51:48 -04:00
Paul Kompfner
dcb4949e20 Move ServiceSwitcherFrame and ManuallySwitchServiceFrame to frames.py 2025-08-26 09:47:37 -04:00
Paul Kompfner
8b543e558d Add CHANGELOG entry describing LLMService.run_inference() 2025-08-26 09:47:32 -04:00
Paul Kompfner
8181962236 Add CHANGELOG entry describing LLM switcher 2025-08-26 09:46:51 -04:00
Paul Kompfner
98dc891640 Move CHANGELOG log entry from 0.0.81 to Unreleased 2025-08-26 09:45:49 -04:00
Paul Kompfner
71de0da570 ServiceSwitchers are now controlled using frames rather than with direct method calls 2025-08-26 09:44:15 -04:00
Paul Kompfner
b40c8bb81d Refactor LLMSwitcher into a base ServiceSwitcher and an LLMSwitcher that subclasses it 2025-08-26 09:44:15 -04:00
Paul Kompfner
43f1b59b86 Convert LLM generate_summary() methods to the more generic run_inference() 2025-08-26 09:44:15 -04:00
Paul Kompfner
a0a2bb3aa4 In GeminiLLMAdapter, when translating from the universal LLMContext format, only pull out the first "system" message as the system instruction, and convert subsequent ones into "user" messages. This is a more correct thing to do than simply drop subsequent "system" messages, especially when potentially sharing a context between multiple LLMs. 2025-08-26 09:44:15 -04:00
Paul Kompfner
04a50df3d5 Add LLMSwitcher, with LLMSwitcherStrategyManual as the first supported switching strategy 2025-08-26 09:44:15 -04:00
Paul Kompfner
8c0edffaff Fix bug in AWS Bedrock conversation summarization. It was using an out-of-date pattern (the _client property no longer exists) 2025-08-26 09:44:15 -04:00
Paul Kompfner
fe6063fdbe Introduce an affordance to LLMService for generating a summary of a conversation directly (i.e. without going through the pipeline).
This abstraction will allow us to update Pipecat Flows to avoid reaching into LLM service internals to generate summaries.

In addition to being a helpful refactor to remove a fragile part of Pipecat Flows, this change helps set the stage for supporting the upcoming `LLMSwitcher`, where the “active” LLM will only be determined at runtime—today, Pipecat Flows needs to know ahead of time what type of LLM it’s working with, to load an LLM-specific “adapter” that does the work of generating summaries, among other things.
2025-08-26 09:44:15 -04:00
Paul Kompfner
195146adb2 Bump deprecation warning version, as this commit is not expected to ship until version 0.0.82. 2025-08-26 09:44:15 -04:00
Paul Kompfner
cab9e18cc9 Port recent change to LLMAssistantContextAggregator to universal LLMAssistantAggregator 2025-08-26 09:44:15 -04:00
Paul Kompfner
baef688e4e Port recent changes to LLMUserContextAggregator to universal LLMUserAggregator 2025-08-26 09:44:15 -04:00
Paul Kompfner
f1f43fe500 After a rebase, rename foundational examples showing usage of universal context to avoid naming conflict with a recently-added example. 2025-08-26 09:44:15 -04:00
Paul Kompfner
73b63f8d35 Remove unnecessary import 2025-08-26 09:44:15 -04:00
Paul Kompfner
0c14b33e92 Deprecate GoogleLLMOpenAIBetaService 2025-08-26 09:44:15 -04:00
Paul Kompfner
09beaccaf0 Assorted minor improvements after code review 2025-08-26 09:44:15 -04:00
Paul Kompfner
40557a1aae Remove TODO comment 2025-08-26 09:44:15 -04:00
Paul Kompfner
ecc4cc4a79 Add support for universal LLMContext to RTVIObserver 2025-08-26 09:44:15 -04:00
Paul Kompfner
37be8805f4 ruff 2025-08-26 09:44:15 -04:00
Paul Kompfner
93c7e64995 Add missing PERPLEXITY_API_KEY in env.example 2025-08-26 09:44:15 -04:00
Paul Kompfner
9de2bd61a9 Add supports_universal_context for OpenAILLMService subclasses so that we can gradually roll out support for universal LLMContext in a controlled manner.
Also update `get_chat_completions()` implementations with the new argument type.
2025-08-26 09:44:15 -04:00
Paul Kompfner
566af71862 Add CHANGELOG entry for the universal LLMContext machinery 2025-08-26 09:44:15 -04:00
Paul Kompfner
12064bd6e6 Add a bit of helpful info in an error message 2025-08-26 09:44:15 -04:00
Paul Kompfner
a962459151 Change LLMContextAggregatorPair.create(context) to LLMContextAggregatorPair(context) 2025-08-26 09:44:15 -04:00
Paul Kompfner
8fc76a29bc Raise errors when trying to use universal LLMContext with LLM services that don't yet support it 2025-08-26 09:44:15 -04:00
Paul Kompfner
e3019261a5 Fix classes that subclass BaseLLMAdapter by adding placeholder stuff until support for universal LLMContext machinery comes to all LLM services 2025-08-26 09:44:15 -04:00
Paul Kompfner
fa1f6f1c51 In LLMContext, normalize an empty provided ToolsSchema to NOT_GIVEN 2025-08-26 09:44:15 -04:00
Paul Kompfner
337f00c16c Minor fix: add a type annotation 2025-08-26 09:44:15 -04:00
Paul Kompfner
d50922cdcd Update Google adapter to handle possibility of system message in standard format being provided as a list of text parts rather than just a string. 2025-08-26 09:44:15 -04:00
Paul Kompfner
47f5ca6265 Update Gemini adapter to be able to handle LLMSpecificMessages containing Google-formatted messages 2025-08-26 09:44:15 -04:00
Paul Kompfner
2eddb6ffda [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Remove outdated comment
2025-08-26 09:44:15 -04:00
Paul Kompfner
560a6f2247 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Make `LLMContext.add_audio_frames_message()` respect the OpenAI standard format
2025-08-26 09:44:15 -04:00
Paul Kompfner
59ecb19000 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Add support for LLM-specific messages in the universal `LLMContext`, to enable using LLM-specific functionality while still using the universal LLM context
2025-08-26 09:44:15 -04:00
Paul Kompfner
cfb094b3c8 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Make it so that tools in `LLMContext` are guaranteed to be either a `ToolsSchema` or `NOT_GIVEN`
2025-08-26 09:44:15 -04:00
Paul Kompfner
1f7e8e001b [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Update some types to also allow for universal `LLMContext`
2025-08-26 09:44:15 -04:00
Paul Kompfner
688b136141 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Add to Google LLM service support for universal LLM context
2025-08-26 09:44:15 -04:00
Paul Kompfner
809c4c1bc5 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Add to OpenAI LLM service support for universal LLM context
2025-08-26 09:44:15 -04:00
Paul Kompfner
81ca5e6601 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Formatting fix + dead import cleanup
2025-08-26 09:44:15 -04:00
Paul Kompfner
ebc49d2252 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Add a "universal" alias for `OpenAILLMContextAssistantTimestampFrame`: `LLMContextAssistantTimestampFrame`
2025-08-26 09:44:15 -04:00
Paul Kompfner
ff8d158e18 [WIP] Universal (LLM-agnostic) context machinery to support runtime LLM switching.
- Added universal `LLMContext` and associated context aggregators.
2025-08-26 09:44:15 -04:00
Aleix Conchillo Flaqué
37980b0854 Merge pull request #2504 from pipecat-ai/aleix/cartesia-fix-timeout-reconnection
CartesiaTTSService: reconnect on Cartesia's timeout
2025-08-25 15:24:31 -07:00
Aleix Conchillo Flaqué
39ebc2c9c1 CartesiaTTSService: reconnect on Cartesia's timeout 2025-08-25 14:09:03 -07:00
Aleix Conchillo Flaqué
ab61d09ec1 Merge pull request #2502 from pipecat-ai/aleix/pipecat-0.0.81
update CHANGELOG for 0.0.81
2025-08-25 09:28:21 -07:00
Aleix Conchillo Flaqué
e4afc0a13c update CHANGELOG for 0.0.81 2025-08-25 08:22:28 -07:00
Mark Backman
dde3d2395b Merge pull request #2491 from pipecat-ai/mb/update-quickstart 2025-08-23 06:34:37 -07:00
Aleix Conchillo Flaqué
30b36c3d6e Merge pull request #2497 from pipecat-ai/aleix/pipeline-task-fix-cancellation
PipelineTask: handle cancellations gracefully
2025-08-22 22:37:12 -07:00
Mark Backman
de4dfc3ed4 Update deployment steps 2025-08-23 00:19:26 -04:00
Aleix Conchillo Flaqué
a0128516ff PipelineTask: handle cancellations gracefully 2025-08-22 19:04:31 -07:00
Aleix Conchillo Flaqué
db3b8c7325 Merge pull request #2496 from pipecat-ai/aleix/release-evals-always-provide-eval-prompt
scripts(evals): always require an eval prompt
2025-08-22 18:11:33 -07:00
Aleix Conchillo Flaqué
9273ec0f25 scripts(evals): always require an eval prompt 2025-08-22 16:57:47 -07:00
Mark Backman
8dfa1187be Merge pull request #2402 from pipecat-ai/mb/voicemail-detection
Add voicemail detection
2025-08-22 14:51:13 -07:00
Mark Backman
e17fd580c6 Update README 2025-08-22 15:56:56 -04:00
mattie ruth backman
3e3d50a855 Fix issue with request images from the camera introduced in smallwebrtctransport 2025-08-22 15:02:33 -04:00
Mark Backman
402661ae03 Prevent user speaking frames from entering the classifier branch after a conversation is detected 2025-08-22 14:09:45 -04:00
Mark Backman
69c6a95b8a Simplify frames in the NotifierGate 2025-08-22 14:09:45 -04:00
Mark Backman
4d49210a73 Rename system_prompt to custom_system_prompt; improve dev ex for classification prompt requirements 2025-08-22 14:09:45 -04:00
Aleix Conchillo Flaqué
5f8a22ef2f Merge pull request #2493 from pipecat-ai/aleix/runner-task-asyncio-cancellation
PipelineRunner/PipelineTask: fix asyncio task cancellation
2025-08-22 09:13:58 -07:00
Aleix Conchillo Flaqué
606ad0826a Merge pull request #2492 from pipecat-ai/aleix/wait-for-task-deprecated
FrameProcessor: wait_for_task is now deprecated
2025-08-22 09:13:34 -07:00
Mark Backman
57028255ee Update changelog, mention text LLMs only 2025-08-22 12:12:17 -04:00
Mark Backman
87ebbab758 Only set/clear voicemail_event when voicemail is detected 2025-08-22 12:12:17 -04:00
Mark Backman
bd401e8d6f Rename TTSBuffer to TTSGate 2025-08-22 12:12:17 -04:00
Mark Backman
f0dfab23e7 Cleanup 2025-08-22 12:12:17 -04:00
Mark Backman
fbc907c371 Change path to extensions 2025-08-22 12:12:17 -04:00
Mark Backman
40b5ef485d Add base NotifierGate class and ClassifierGate, ConversationGate subclasses 2025-08-22 12:12:17 -04:00
Mark Backman
b30af3e155 Tests specify USER_SPEAKS_FIRST or BOT_SPEAKS_FIRST 2025-08-22 12:12:17 -04:00
Mark Backman
446bb5cddf Refactor callback to event 2025-08-22 12:12:17 -04:00
Mark Backman
1c1ee94074 Add 44 to evals, update evals to support user speaking first 2025-08-22 12:12:17 -04:00
Mark Backman
ac30083b45 Add CHANGELOG entry 2025-08-22 12:12:17 -04:00
Mark Backman
ce579d4266 Make on_voicemail_detected callback required, cleanup logging 2025-08-22 12:12:17 -04:00
Mark Backman
5a07b30c7a Class name changes, add TTSStarted/StoppedFrame to the TTSBuffer 2025-08-22 12:12:17 -04:00
Mark Backman
9da33f3897 Handle multiple user inputs from the user when a voicemail is detected; add a configurable timeout to emitting the callback 2025-08-22 12:12:17 -04:00
Mark Backman
5ca82ec61e Final docstrings, comments, and cleanup 2025-08-22 12:12:17 -04:00
Mark Backman
0067c7df47 Add aggregation to classifier LLM output and validate prompt 2025-08-22 12:12:17 -04:00
Mark Backman
ab03db5b0c Updated prompt, add custom system_prompt input 2025-08-22 12:12:17 -04:00
Mark Backman
238d6bf9ab Add buffering logic 2025-08-22 12:12:17 -04:00
Mark Backman
90ae85bab2 More updates—added new voicemail module 2025-08-22 12:12:17 -04:00
Mark Backman
29e09b2053 POC demo in progress 2025-08-22 12:12:17 -04:00
mattie ruth backman
bad9977e8c PR feedback and more explicit about only supporting exporting 1 video 2025-08-22 11:24:22 -04:00
mattie ruth backman
b987579d54 update smallWebRTC screen support to support the utils format for listening to screenshares 2025-08-22 11:24:22 -04:00
mattie ruth backman
40f1f4ff11 Add support to smallWebRTCTransport for receiving screenshare videos 2025-08-22 11:24:22 -04:00
Aleix Conchillo Flaqué
a3ad31d0f6 README: recommended python version is 3.12 2025-08-21 23:50:00 -07:00
Aleix Conchillo Flaqué
8044c4170d PipelineRunner/PipelineTask: fix asyncio task cancellation 2025-08-21 23:50:00 -07:00
Aleix Conchillo Flaqué
bc51e7abc6 FrameProcessor: wait_for_task is now deprecated 2025-08-21 21:17:47 -07:00
Aleix Conchillo Flaqué
256ecf4d71 Merge pull request #2490 from pipecat-ai/aleix/speechmatics-exceptions
Speechmatics exception handling
2025-08-21 19:48:43 -07:00
Aleix Conchillo Flaqué
c16969c4f5 Merge pull request #2489 from pipecat-ai/aleix/daily-python-0.19.7
pyproject: update daily-python to 0.19.7
2025-08-21 19:48:31 -07:00
Mark Backman
8ef64d8c8d Update quickstart, make it deployable 2025-08-21 22:32:34 -04:00
Aleix Conchillo Flaqué
4947d08733 GladiaSTTService: update loggin levels 2025-08-21 18:42:23 -07:00
Aleix Conchillo Flaqué
b61846534d SpeechmaticsSTTService: improve exception handling and loggin 2025-08-21 18:42:23 -07:00
Aleix Conchillo Flaqué
8f01cd220a pyproject: update daily-python to 0.19.7 2025-08-21 18:40:01 -07:00
Aleix Conchillo Flaqué
3abaaf80e0 Merge pull request #2487 from pipecat-ai/aleix/watchdog-timers-removal
remove watchdog timers and specific asyncio implementations
2025-08-21 18:37:35 -07:00
Aleix Conchillo Flaqué
13890fa021 github(tests): use python 3.12 to run unit tests/coverage 2025-08-21 18:09:56 -07:00
Aleix Conchillo Flaqué
802af28888 update pytest-asyncio to 1.1.0 2025-08-21 18:09:56 -07:00
Aleix Conchillo Flaqué
24a628c85e remove watchdog timers and specific asyncio implementations
Watchdog timers have been removed. They were introduced in 0.0.72 to help
diagnose pipeline freezes. Unfortunately, they proved ineffective since they
required developers to use Pipecat-specific queues, iterators, and events to
correctly reset the timer, which limited their usefulness and added friction.
2025-08-21 18:09:56 -07:00
Mark Backman
ddab95835b Merge pull request #2474 from pipecat-ai/mb/add-frames-pipeline-idle
Add UserStarted/StoppedSpeakingFrames to idle_timeout_frames
2025-08-21 03:45:46 -07:00
Mark Backman
cb13f4b4cb Add user speaking and transcription frames to idle_timeout_frames 2025-08-21 06:43:10 -04:00
Aleix Conchillo Flaqué
4793277d34 Merge pull request #2480 from pipecat-ai/aleix/replace-asyncio-waitfor
replace asyncio.wait_for for wait_for2.wait_for
2025-08-20 17:43:32 -07:00
Aleix Conchillo Flaqué
28c729cc36 replace asyncio.wait_for for wait_for2.wait_for 2025-08-20 15:26:57 -07:00
Aleix Conchillo Flaqué
4d07c7b77c Merge pull request #2479 from pipecat-ai/aleix/simplify-dtmf-aggregator
DTMFAggregator: no need for interruption task
2025-08-20 15:15:35 -07:00
Aleix Conchillo Flaqué
4ff0567025 BaseObject: allow keyword arguments 2025-08-20 15:14:31 -07:00
Aleix Conchillo Flaqué
1377dec01b DTMFAggregator: no need for interruption task
Now that system frames are queued there's no need to have an additional task to
push a `BotInterruptionFrame`.
2025-08-20 14:35:04 -07:00
Aleix Conchillo Flaqué
42f4d73a63 Merge pull request #2478 from pipecat-ai/aleix/fix-wait-for2-import
timeout: fix wait_for2 import
2025-08-20 14:29:19 -07:00
Aleix Conchillo Flaqué
f1c1ebf852 timeout: fix wait_for2 import 2025-08-20 14:24:16 -07:00
Aleix Conchillo Flaqué
eb6d43f6cb Merge pull request #2476 from pipecat-ai/aleix/add-asyncio-timeout
implement custom asyncio.wait_for()
2025-08-20 14:20:22 -07:00
Aleix Conchillo Flaqué
f387776985 add custom asyncio.wait_for()
This patch uses `wait_for2` package to implement `asyncio.wait_for()` for
Python < 3.12.

In Python 3.12, `asyncio.wait_for()` is implemented in terms of
`asyncio.timeout()` which fixed a bunch of issues. However, this was never
backported (because of the lack of `async.timeout()`) and there are still many
remainig issues, specially in Python 3.10, in `async.wait_for()`.

See https://github.com/python/cpython/pull/98518
2025-08-20 14:09:05 -07:00
Aleix Conchillo Flaqué
5286591826 Merge pull request #2464 from pipecat-ai/aleix/frame-processor-updates
various frame processor updates
2025-08-20 10:11:49 -07:00
Aleix Conchillo Flaqué
6831e63ec9 PipelineTask: use PipelineSource/PipelineSink and remove tasks 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
12bcb7db64 ParallelPipeline: use PipelineSource/PipelineSink and remove tasks 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
1b48b1d860 Pipeline: allow passing user source and sink processors 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
d161e2767f FrameProcessor: allow pausing/resuming system frames 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
4e3af00b6d tests: try to use default SleepFrame time 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
4015aedb86 tests: fix unit tests 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
75a6ee839b BaseObserver: added new on_process_frame 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
13ce02c896 FrameProcessor: add new entry_processors() method 2025-08-20 10:08:54 -07:00
Aleix Conchillo Flaqué
2fd5885dc3 pipeline: implement processors property 2025-08-20 07:40:21 -07:00
Aleix Conchillo Flaqué
d743586bfb BasePipeline: move processors_with_metrics() to FrameProcessor 2025-08-20 07:40:21 -07:00
Aleix Conchillo Flaqué
8051017895 pipeline: wrap with pipelines, use direct mode and reduce tasks 2025-08-20 07:40:21 -07:00
Aleix Conchillo Flaqué
dc7bf98ce5 Pipeline: improve performance by using direct mode 2025-08-20 07:40:21 -07:00
Aleix Conchillo Flaqué
609a43a191 FrameProcessor: added processors/next/previous properties 2025-08-20 07:40:19 -07:00
Aleix Conchillo Flaqué
4fb04422d9 FrameProcessor: remove unused set_parent/get_parent 2025-08-20 07:40:02 -07:00
Mark Backman
2f74a7e674 Merge pull request #2469 from pipecat-ai/mb/11labs-text-normalization
Add apply_text_normalization to ElevenLabs TTS services
2025-08-19 18:21:33 -07:00
Mark Backman
5205f56087 Add apply_text_normalization to ElevenLabs TTS services 2025-08-19 21:19:00 -04:00
Mark Backman
694c792af3 Merge pull request #2470 from pipecat-ai/mb/11labs-settings-reconnect
Update ElevenLabsTTSService: update runtime configuration
2025-08-19 18:18:14 -07:00
TheNotary
48b3ad8f8f adds support for creating InterimTranscriptFrames for Azure speech services 2025-08-19 17:00:42 -05:00
Mark Backman
406e82a842 Merge pull request #2438 from pipecat-ai/mb/delete-old-docs
Remove stale docs
2025-08-19 12:22:54 -07:00
Mark Backman
837de5f893 Merge pull request #2468 from pipecat-ai/mb/fix-mistral-docs-errors
Fix Mistral docstrings build errors
2025-08-19 12:22:26 -07:00
Mark Backman
10b9b1da2f Merge pull request #2471 from pipecat-ai/mb/add-13j
Add foundational 13j for Azure STT
2025-08-19 12:10:03 -07:00
Mark Backman
7854a2ec83 Add foundational 13j for Azure STT 2025-08-19 14:36:31 -04:00
Mark Backman
ac7c69078f Merge pull request #2442 from pipecat-ai/mb/retry-completion
retry_on_timeout: Anthropic, AWS Bedrock
2025-08-19 11:23:43 -07:00
Mark Backman
c9b4356ea6 Update changelog 2025-08-19 14:21:18 -04:00
Mark Backman
b3e4421191 Add retry_on_timeout to AWSBedrockLLMService 2025-08-19 14:20:35 -04:00
Mark Backman
84058c3948 Add retry_on_timeout to AnthropicLLMService 2025-08-19 14:20:35 -04:00
Mark Backman
aebc781419 Update ElevenLabsTTSService to update when voice_settings change 2025-08-19 13:51:10 -04:00
Mark Backman
4160446f4c Update ElevenLabsTTSService: reconnect on model and language changes 2025-08-19 11:32:54 -04:00
Mark Backman
05a14af184 Fix Mistral docstrings build errors 2025-08-19 10:31:03 -04:00
Filipi da Silva Fuchter
89d2ef2bde Merge pull request #2465 from pipecat-ai/filipi/heygen_changing_log_level
Changing heygen log level to trace.
2025-08-19 07:50:11 -03:00
Filipi Fuchter
f550015efb Changing heygen log level to trace. 2025-08-18 18:00:25 -03:00
Abhishek
8bbdc7c8d1 Only set last_frame_time when handling OutputAudioRawFrame
We don't want to set `last_frame_time` on other frames like `HeartBeatFrame`, `LLMGeneratedTextFrame`, `InterruptionFrames` so that we can calculate `diff_time` and compare it against `vad_stop_secs` properly
2025-08-16 16:25:14 +05:30
Mark Backman
8fa44863fb Merge pull request #2455 from pipecat-ai/vp-log-line
log: add Disconnected from ElevenLabs debug log
2025-08-15 14:12:28 -07:00
vipyne
088cb56922 log: add Disconnected from ElevenLabs debug log 2025-08-15 15:05:07 -05:00
Aleix Conchillo Flaqué
a789e5feea Merge pull request #2451 from pipecat-ai/aleix/audio-buffer-processor-overlap
AudioBufferProcessor: fix overlap when buffer size is set
2025-08-14 15:31:50 -07:00
Aleix Conchillo Flaqué
16ca44131c Merge pull request #2452 from pipecat-ai/aleix/runner-daily-direct-handlesigint
Runner: set handle_sigint to True for Daily direct
2025-08-14 15:25:05 -07:00
Mark Backman
418860cf26 Merge pull request #2450 from pipecat-ai/mb/fix-openai-changelog-entry
fix: Move OpenAI retry changelog entry to the correct release
2025-08-14 15:23:00 -07:00
Aleix Conchillo Flaqué
e2fc8b3dce Runner: set handle_sigint to True for Daily direct 2025-08-14 14:55:52 -07:00
Aleix Conchillo Flaqué
8b641089f8 AudioBufferProcessor: fix overlap when buffer size is set 2025-08-14 14:44:08 -07:00
Mark Backman
d36ed755ce fix: Move OpenAI retry changelog entry to the correct release 2025-08-14 17:34:35 -04:00
Mark Backman
7aaf64fe55 Merge pull request #2447 from pipecat-ai/mb/update-foundational-readme
Improve the foundational example README
2025-08-14 09:51:01 -07:00
Mark Backman
5f52008974 Improve the foundational example README 2025-08-14 11:29:04 -04:00
Mark Backman
d520677b23 Merge pull request #2408 from pipecat-ai/mb/add-mistral-llm
Add MistralLLMService
2025-08-14 08:19:18 -07:00
Mark Backman
42bd1e9d40 Add Mistral to README and pyproject.toml 2025-08-14 11:15:52 -04:00
Mark Backman
7f0494aa04 Override build_chat_completion_params for Mistral 2025-08-14 10:32:18 -04:00
Mark Backman
b7ae2989ac Add foundational 14w-function-calling.py 2025-08-14 10:00:46 -04:00
Mark Backman
2b2b0f8121 Add MistralLLMService 2025-08-14 09:57:14 -04:00
Mark Backman
5ca33a2b00 Merge pull request #2445 from pipecat-ai/mb/fix-changelog-asyncai
fix: Changelog for Async AI bugfix
2025-08-14 06:48:08 -07:00
Mark Backman
938dcb613d fix: Changelog for Async AI bugfix 2025-08-14 09:13:03 -04:00
Mark Backman
bc748cf9d0 Merge pull request #2444 from ashotbagh/fix/asyncai-force-flush
fix(asyncai): force flush WS TTS to eliminate stalls
2025-08-14 06:10:16 -07:00
Ashot
3b55d16a49 fix(asyncai): force flush WS TTS to eliminate stalls 2025-08-14 16:34:34 +04:00
Mark Backman
d7f31e0cbd Merge pull request #2387 from pipecat-ai/mb/retry-chat-completion
Retry chat completions for OpenAILLMService and its subclasses
2025-08-13 14:39:40 -07:00
Mark Backman
c662a2d820 Merge pull request #2437 from pipecat-ai/mb/19-english
Foundational 19: Respond in English
2025-08-13 11:57:24 -07:00
Mark Backman
2c220ca54e Remove stale docs 2025-08-13 14:11:41 -04:00
Mark Backman
89f0ff17c0 Merge pull request #2430 from pipecat-ai/aleix/pipecat-0.0.80
update CHANGELOG for 0.0.80
2025-08-13 09:41:43 -07:00
Mark Backman
b5465364fa Foundational 19: Respond in English 2025-08-13 12:37:13 -04:00
Aleix Conchillo Flaqué
c024eb7b8c update CHANGELOG for 0.0.80 2025-08-13 11:46:24 -04:00
Mark Backman
608570e89d Merge pull request #2433 from pipecat-ai/mb/openai-realtime-text-modality
fix: Add text support to OpenAIRealtimeBetaLLMService
2025-08-13 08:41:33 -07:00
Mark Backman
3ad61a8a04 Remove stray - in changelog 2025-08-13 11:39:59 -04:00
Mark Backman
4c4bae2db6 Remove unnessecary messages from 19 and 19b examples 2025-08-13 11:39:59 -04:00
Mark Backman
901b6b5913 Add foundational 19b 2025-08-13 11:37:38 -04:00
Mark Backman
71cd0f1c87 fix: Add text support to OpenAIRealtimeBetaLLMService 2025-08-13 11:37:36 -04:00
Filipi da Silva Fuchter
a2a419e6db Merge pull request #2435 from pipecat-ai/filipi/small_webrtc_end_pipeline
Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.
2025-08-13 11:58:33 -03:00
Filipi Fuchter
bbbbdc459a Fixed an issue where SmallWebRTCTransport ended before TTS finished. 2025-08-13 11:46:51 -03:00
Mark Backman
d203528dad Merge pull request #2333 from yohan-altrium/fix/2277-azure-tts-ssml-reserved-characters
Fixes 2277 - SSML reserved characters causes Azure TTS to fail
2025-08-13 06:27:30 -07:00
Yohan Liyanage
4bcca7956e Refactors the code based on PR comments and adds the relevant changelog entry. 2025-08-13 16:34:33 +05:30
Aleix Conchillo Flaqué
68a4cf4c68 Merge pull request #2427 from pipecat-ai/aleix/base-watchdog-priority-queue
WatchdogPriorityQueue: this is now a base class
2025-08-12 18:25:59 -07:00
Aleix Conchillo Flaqué
0508ddddfb WatchdogPriorityQueue: fix watchdog sentinel insertion
We now force each inserted item in the priority queue to be a tuple and the
actual value to be last in the tuple. All the previous values in the tuple also
need to be numeric.
2025-08-12 17:40:58 -07:00
Mark Backman
8714c9137f Code review fixes 2025-08-12 17:49:13 -04:00
Mark Backman
4c029fcfa7 Update OpenAILLMService subclasses to use the new build_chat_completion_params function 2025-08-12 17:48:51 -04:00
Mark Backman
5c86f8e687 Add timeout/retry logic and refactor parameter building in BaseOpenAILLMService
- Add timeout (default 5.0s) and retry_on_timeout parameters to constructor
- Implement timeout/retry logic in get_chat_completions using asyncio.wait_for
- Extract build_chat_completion_params() as public method for subclass customization
2025-08-12 17:48:51 -04:00
Mark Backman
54a4d8a9f8 Merge pull request #2422 from thsunkid/thu/fix-set-lang-in-base-whisper
Fix: assigns string code instead of Language enum to BaseWhisperSTTService._language
2025-08-12 11:57:46 -07:00
Mark Backman
38af514d95 Merge pull request #2407 from pipecat-ai/mb/add-gemini-tts
Add GeminiTTSService
2025-08-12 11:56:45 -07:00
Aleix Conchillo Flaqué
6aa80c0b8e Merge pull request #2424 from pipecat-ai/aleix/system-frame-queues-fix
FrameProcessor: fix race condition on FrameProcessorQueue
2025-08-12 11:56:00 -07:00
Mark Backman
e720573e60 Added 07n-interruptible-gemini 2025-08-12 14:54:49 -04:00
Mark Backman
541a43905b Add GeminiTTSService 2025-08-12 14:52:20 -04:00
Aleix Conchillo Flaqué
707df913cd FrameProcessor: fix race condition on FrameProcessorQueue
We need to increment the counters before the await otherwise we could go to a
different task that could add an item with the same counter.

Also, we need to handle non-frame items as well.
2025-08-12 11:48:22 -07:00
Aleix Conchillo Flaqué
3f3d757581 tests: added WatchdogQueue and WatchdogPriorityQueue unit tests 2025-08-12 11:48:22 -07:00
Aleix Conchillo Flaqué
7c781ce816 WatchdogPriorityQueue: make WatchdogPriorityCancelSentinel public 2025-08-12 11:34:31 -07:00
Aleix Conchillo Flaqué
f3efc9da00 WatchdogQueue: make WatchdogQueueCancelSentinel public 2025-08-12 11:34:31 -07:00
Mark Backman
827a70104d Merge pull request #2425 from pipecat-ai/mb/runner-add-exotel
Add Exotel support to the development runner
2025-08-12 10:36:54 -07:00
Mark Backman
a40327305c Add Exotel support to the development runner 2025-08-12 13:21:18 -04:00
Thu Nguyen
168af44429 Fix: assigns string code instead of Language enum to _language attr of BaseWhisperSTTService 2025-08-12 20:27:26 +07:00
Mark Backman
5f8433476c Merge pull request #2397 from gladiaio/PLA-37-GladiaSTTService-minor-tweaks
feat: add minor tweaks to GladiaSTTService
2025-08-12 04:59:40 -07:00
Fabrice Lamant
6a6fea74f5 fix: set default region to none 2025-08-12 13:31:51 +02:00
Mark Backman
91b557ecbf Merge pull request #2419 from pipecat-ai/mb/fix-lockfile-workflow 2025-08-12 03:39:54 -07:00
Mark Backman
be85291414 Merge pull request #2420 from pipecat-ai/mb/runner-handle-sigint-default 2025-08-12 03:39:29 -07:00
Fabrice Lamant
09f171b69d fix: only pass region if set 2025-08-12 12:05:38 +02:00
Aleix Conchillo Flaqué
929fd98958 Merge pull request #2416 from pipecat-ai/aleix/release-evals-vision
scripts(evals): add vision support
2025-08-11 20:08:08 -07:00
Aleix Conchillo Flaqué
1cfbfcaf11 scripts(evals): add vision support 2025-08-11 20:06:24 -07:00
Mark Backman
cd5a3c13bd Development runner: handle_sigint defaults to False 2025-08-11 22:06:56 -04:00
Mark Backman
9b871b0cc5 Update uv.lock, remove lockfile workflow, update CONTRIBUTING with dependency guidance 2025-08-11 21:39:25 -04:00
Mark Backman
0d499a8aa3 Merge pull request #2409 from pipecat-ai/mb/refactor-playht-http
Refactor PlayHTHttpTTSService to use aiohttp
2025-08-11 18:20:58 -07:00
Mark Backman
45292ab13d Merge pull request #2411 from pipecat-ai/mb/fix-websocket-service-retry
fix: WebsocketService retry logic incorrectly handling ConnectionClos…
2025-08-11 18:17:50 -07:00
Mark Backman
be6ea0dbf6 Code review feedback 2025-08-11 21:17:04 -04:00
Aleix Conchillo Flaqué
fb18ae174e Merge pull request #2417 from pipecat-ai/aleix/release-evals-15-series
scripts(evals): add multilinguag support and 15 series
2025-08-11 17:14:47 -07:00
Mark Backman
c4506523ab Refactor PlayHTHttpTTSService to use aiohttp 2025-08-11 19:58:25 -04:00
Aleix Conchillo Flaqué
b360cb31dc scripts(evals): add multilinguag support and 15 series 2025-08-11 15:21:14 -07:00
Aleix Conchillo Flaqué
07f104199c Merge pull request #2415 from pipecat-ai/aleix/moondream-2025-01-09
MoondreamService: update to revision 2025-01-09
2025-08-11 15:10:35 -07:00
Aleix Conchillo Flaqué
bc1949b4bf MoondreamService: update to revision 2025-01-09 2025-08-11 14:54:04 -07:00
Aleix Conchillo Flaqué
2035dd8b39 Merge pull request #2403 from pipecat-ai/aleix/system-frame-queue-priority-fix
FrameProcessor: fix system frame higher priorty and use a PriortyQueue
2025-08-11 13:57:57 -07:00
Aleix Conchillo Flaqué
24c8189327 Merge pull request #2405 from pipecat-ai/aleix/frame-processor-direct-mode
FrameProcessor: introduce direct mode
2025-08-11 13:57:34 -07:00
Mark Backman
998ac32627 Merge pull request #2413 from captaincaius/fix-stt-mute-filter-vad-frames-20250810
Add VADUserStartSpeakingFrame VADUserStopSpeakingFrame to STTMuteFilter (fix #2412)
2025-08-11 13:54:34 -07:00
Aleix Conchillo Flaqué
50645c1c4f README: recommend python 3.11-3.12
Python 3.11 has significant performance improvements compared to 3.10 which
makes Pipecat's asyncio heavy use  specially better.
2025-08-11 13:53:08 -07:00
Aleix Conchillo Flaqué
8ce29ee8f2 FrameProcessor: fix system frame higher priorty and use a PriortyQueue 2025-08-11 13:53:08 -07:00
Captain Caius
7b8aeef4cc update changelog 2025-08-11 12:45:54 -07:00
Aleix Conchillo Flaqué
6a24457f0e FrameProcessor: introduce direct mode
Direct mode avoids creating internal queues and tasks and processes frames right
away. This might be useful for some very simple processors.
2025-08-11 09:26:31 -07:00
Aleix Conchillo Flaqué
2c01c2b5b3 Merge pull request #2404 from pipecat-ai/aleix/examples-22-simplify-main-pipeline
examples(foundational): update 22 series with simple main pipelines
2025-08-11 09:14:39 -07:00
Aleix Conchillo Flaqué
1c2e114fa2 examples(foundational): update 22 series with simple main pipelines 2025-08-11 09:13:09 -07:00
Filipi da Silva Fuchter
0f137e36c2 Merge pull request #2399 from pipecat-ai/filipi/heygen_latency
Improving the latency of the `HeyGenVideoService`.
2025-08-11 09:13:10 -03:00
Filipi Fuchter
b7f12a96f1 Improving the latency of the HeyGenVideoService. 2025-08-11 09:11:17 -03:00
Filipi da Silva Fuchter
3331f71e17 Merge pull request #2398 from pipecat-ai/filipi/ttfb_metrics_video_services
Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.
2025-08-11 09:09:27 -03:00
Filipi Fuchter
55d200e2d1 Added TTFB metrics for HeyGenVideoService and TavusVideoService. 2025-08-11 09:07:21 -03:00
Captain Caius
3fae00e067 Add VADUserStartSpeakingFrame VADUserStopSpeakingFrame to STTMuteFilter 2025-08-10 19:35:04 -07:00
Mark Backman
78cdefd191 Merge pull request #2410 from smokyabdulrahman/issue-2373
Support endpoint_id for AzureSTTService
2025-08-10 16:43:29 -07:00
Mark Backman
42502a4f3b fix: WebsocketService retry logic incorrectly handling ConnectionClosedOK exception 2025-08-10 19:35:05 -04:00
Abdulrahman Alrahma
fc67cc3302 Support endpoint_id for AzureSTTService 2025-08-10 22:24:47 +01:00
Fabrice Lamant
e503ea7466 feat: add minor tweaks to GladiaSTTService 2025-08-08 10:21:52 +02:00
Yohan Liyanage
248206e234 Fixes 2277 - SSML reserved characters in LLM generated text causes Azure TTS to fail. 2025-08-02 12:49:29 +05:30
Nischal Jain
ffa16dd136 added deepwiki badge for weekly repo refresh 2025-05-22 20:08:48 -07:00
499 changed files with 47346 additions and 20788 deletions

View File

@@ -25,7 +25,7 @@ jobs:
version: "latest"
- name: Set up Python
run: uv python install 3.10
run: uv python install 3.12
- name: Install system packages
run: |

View File

@@ -5,25 +5,25 @@ on:
inputs:
gitref:
type: string
description: "what git tag to build (e.g. v0.0.74)"
description: 'what git tag to build (e.g. v0.0.74)'
required: true
jobs:
build:
name: "Build and upload wheels"
name: 'Build and upload wheels'
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.gitref }}
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
version: 'latest'
- name: Set up Python
run: uv python install 3.10
run: uv python install 3.12
- name: Install development dependencies
run: uv sync --group dev
- name: Build project
@@ -35,9 +35,9 @@ jobs:
path: ./dist
publish-to-pypi:
name: "Publish to PyPI"
name: 'Publish to PyPI'
runs-on: ubuntu-latest
needs: [ build ]
needs: [build]
environment:
name: pypi
url: https://pypi.org/p/pipecat-ai
@@ -56,12 +56,12 @@ jobs:
print-hash: true
publish-to-test-pypi:
name: "Publish to Test PyPI"
name: 'Publish to Test PyPI'
runs-on: ubuntu-latest
needs: [ build ]
needs: [build]
environment:
name: testpypi
url: https://pypi.org/p/pipecat-ai
url: https://test.pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:
@@ -70,7 +70,7 @@ jobs:
with:
name: wheels
path: ./dist
- name: Publish to PyPI
- name: Publish to Test PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true

View File

@@ -4,7 +4,7 @@ on: workflow_dispatch
jobs:
build:
name: "Build and upload wheels"
name: 'Build and upload wheels'
runs-on: ubuntu-latest
steps:
- name: Checkout repo
@@ -15,9 +15,9 @@ jobs:
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
version: 'latest'
- name: Set up Python
run: uv python install 3.10
run: uv python install 3.12
- name: Install development dependencies
run: uv sync --group dev
- name: Build project
@@ -29,12 +29,12 @@ jobs:
path: ./dist
publish-to-test-pypi:
name: "Publish to Test PyPI"
name: 'Publish to Test PyPI'
runs-on: ubuntu-latest
needs: [build]
environment:
name: testpypi
url: https://pypi.org/p/pipecat-ai
url: https://test.pypi.org/p/pipecat-ai
permissions:
id-token: write
steps:
@@ -43,7 +43,7 @@ jobs:
with:
name: wheels
path: ./dist
- name: Publish to PyPI
- name: Publish to Test PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
verbose: true

View File

@@ -9,14 +9,14 @@ on:
paths: ['pyproject.toml']
jobs:
test-dev-environment:
test-compatibility:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ['3.10.18', '3.11.13', '3.12.11', '3.13.5']
name: Dev Environment - Python ${{ matrix.python-version }}
name: Python ${{ matrix.python-version }}
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -55,69 +55,7 @@ jobs:
--no-extra moondream \
--no-extra mlx-whisper
- name: Verify dev installation
- name: Verify installation
run: |
uv run python --version
uv run python -c "import pipecat; print('✅ Dev environment - Pipecat imports successfully')"
test-user-experience:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ['3.10.18', '3.11.13', '3.12.11', '3.13.5']
name: User Experience - Python ${{ matrix.python-version }}
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y \
portaudio19-dev \
libcairo2-dev \
libgirepository1.0-dev \
pkg-config
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
version: 'latest'
- name: Set up Python ${{ matrix.python-version }}
run: |
uv python install ${{ matrix.python-version }}
- name: Build local package
run: |
uv build
- name: Create test project
run: |
mkdir test-project
cd test-project
uv init --python ${{ matrix.python-version }}
- name: Test comprehensive extras with uv add (Python 3.10-3.12)
if: "!startsWith(matrix.python-version, '3.13.')"
run: |
cd test-project
# Use uv add with built wheel to leverage dependency management
uv add "../dist/pipecat_ai-"*".whl[anthropic,assemblyai,asyncai,aws,aws-nova-sonic,azure,cartesia,cerebras,deepseek,daily,deepgram,elevenlabs,fal,fireworks,fish,gladia,google,grok,groq,gstreamer,heygen,inworld,koala,langchain,livekit,lmnt,local,mcp,mem0,mlx-whisper,moondream,nim,neuphonic,noisereduce,openai,openpipe,openrouter,perplexity,playht,qwen,rime,riva,runner,sambanova,sentry,local-smart-turn,remote-smart-turn,silero,simli,soniox,soundfile,speechmatics,tavus,together,tracing,ultravox,webrtc,websocket,whisper]"
- name: Test Python 3.13 compatible extras with uv add
if: startsWith(matrix.python-version, '3.13.')
run: |
cd test-project
# Use uv add with built wheel and Python 3.13 compatible extras
uv add "../dist/pipecat_ai-"*".whl[anthropic,assemblyai,asyncai,aws,aws-nova-sonic,azure,cartesia,cerebras,deepseek,daily,deepgram,elevenlabs,fal,fireworks,fish,gladia,google,grok,groq,gstreamer,heygen,inworld,koala,langchain,livekit,lmnt,local,mcp,mem0,nim,neuphonic,noisereduce,openai,openpipe,openrouter,perplexity,playht,qwen,rime,riva,runner,sambanova,sentry,remote-smart-turn,silero,simli,soniox,soundfile,speechmatics,tavus,together,tracing,webrtc,websocket,whisper]"
- name: Verify user installation
run: |
cd test-project
uv run python --version
uv run python -c "import pipecat; print('✅ User experience - Pipecat imports successfully')"
# Test that basic functionality works
uv run python -c "from pipecat.pipeline.pipeline import Pipeline; print('✅ Pipeline import works')"
uv run python -c "import pipecat; print('✅ Pipecat imports successfully')"

View File

@@ -23,17 +23,12 @@ jobs:
token: ${{ secrets.QUICKSTART_SYNC_TOKEN }}
path: quickstart-repo
- name: Sync files (excluding READMEs)
- name: Sync files (excluding uv.lock and README.md)
run: |
# Copy code files only, skip READMEs
cp examples/quickstart/bot.py quickstart-repo/
cp examples/quickstart/requirements.txt quickstart-repo/
cp examples/quickstart/env.example quickstart-repo/
# Copy any other files that aren't README.md
# Copy all files except uv.lock and README.md
find examples/quickstart -type f \
-not -name "README.md" \
-not -name "*.md" \
-not -name "uv.lock" \
-exec cp {} quickstart-repo/ \;
- name: Commit and push changes

View File

@@ -29,7 +29,7 @@ jobs:
version: "latest"
- name: Set up Python
run: uv python install 3.10
run: uv python install 3.12
- name: Install system packages
run: |

View File

@@ -1,42 +0,0 @@
name: Update lockfile
on:
push:
paths:
- 'pyproject.toml'
branches:
- main
workflow_dispatch: # Allows manual triggering from GitHub UI
jobs:
update-lockfile:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
# This gives the workflow permission to push back to the repo
token: ${{ secrets.GITHUB_TOKEN }}
- name: Install uv
uses: astral-sh/setup-uv@v1
- name: Update lockfile
run: uv lock
- name: Check for changes
id: verify-changed-files
run: |
if [ -n "$(git status --porcelain)" ]; then
echo "changed=true" >> $GITHUB_OUTPUT
else
echo "changed=false" >> $GITHUB_OUTPUT
fi
- name: Commit lockfile
if: steps.verify-changed-files.outputs.changed == 'true'
run: |
git config --local user.email "action@github.com"
git config --local user.name "GitHub Action"
git add uv.lock
git commit -m "chore: update uv.lock after dependency changes"
git push

File diff suppressed because it is too large Load Diff

336
COMMUNITY_INTEGRATIONS.md Normal file
View File

@@ -0,0 +1,336 @@
# Community Integrations Guide
Pipecat welcomes community-maintained integrations! As our ecosystem grows, we've established a process for any developer to create and maintain their own service integrations while ensuring discoverability for the Pipecat community.
## Overview
**What we support:** Community-maintained integrations that live in separate repositories and are maintained by their authors.
**What we don't do:** The Pipecat team does not code review, test, or maintain community integrations. We provide guidance and list approved integrations for discoverability.
**Why this approach:** This allows the community to move quickly while keeping the Pipecat core team focused on maintaining the framework itself.
## Submitting your Integration
To be listed as an official community integration, follow these steps:
### Step 1: Build Your Integration
Create your integration following the patterns and examples shown in the "Integration Patterns and Examples" section below.
### Step 2: Set Up Your Repository
Your repository must contain these components:
- **Source code** - Complete implementation following Pipecat patterns
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
- **README.md** - Must include:
- Introduction and explanation of your integration
- Installation instructions
- Usage instructions with Pipecat Pipeline
- How to run your example
- Pipecat version compatibility (e.g., "Tested with Pipecat v0.0.86")
- Company attribution: If you work for the company providing the service, please mention this in your README. This helps build confidence that the integration will be actively maintained.
- **LICENSE** - Permissive license (BSD-2 like Pipecat, or equivalent open source terms)
- **Code documentation** - Source code with docstrings (we recommend following [Pipecat's docstring conventions](https://github.com/pipecat-ai/pipecat/blob/main/CONTRIBUTING.md#docstring-conventions))
- **Changelog** - Maintain a changelog for version updates
### Step 3: Join Discord
Join our Discord: https://discord.gg/pipecat
### Step 4: Submit for Listing
Submit a pull request to add your integration to our [Community Integrations documentation page](https://docs.pipecat.ai/server/services/community-integrations).
**To submit:**
1. Fork the [Pipecat docs repository](https://github.com/pipecat-ai/docs)
2. Edit the file `server/services/community-integrations.mdx`
3. Add your integration to the appropriate service category table with:
- Service name
- Link to your repository
- Maintainer GitHub username(s)
4. Include a link to your demo video (approx 30-60 seconds) in your PR description showing:
- Core functionality of your integration
- Handling of an interruption (if applicable to service type)
5. Submit your pull request
Once your PR is submitted, post in the `#community-integrations` Discord channel to let us know.
## Integration Patterns and Examples
### STT (Speech-to-Text) Services
#### Websocket-based Services
**Base class:** `STTService`
**Examples:**
- [DeepgramSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/deepgram/stt.py)
- [SpeechmaticsSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/speechmatics/stt.py)
#### File-based Services
**Base class:** `SegmentedSTTService`
**Examples:**
- [RivaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/riva/stt.py)
- [FalSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/fal/stt.py)
#### Key requirements:
- STT services should push `InterimTranscriptionFrames` and `TranscriptionFrames`
- If confidence values are available, filter for values >50% confidence
### LLM (Large Language Model) Services
#### OpenAI-Compatible Services
**Base class:** `OpenAILLMService`
**Examples:**
- [AzureLLMService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/azure/llm.py)
- [GrokLLMService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/grok/llm.py) - Shows overriding the base class where needed
#### Non-OpenAI Compatible Services
**Requires:** Full implementation
**Examples:**
- [AnthropicLLMService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/anthropic/llm.py)
- [GoogleLLMService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/llm.py)
#### Key requirements:
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
- **Context aggregation:** Implement context aggregation to collect user and assistant content:
- Aggregators come in pairs with a `user()` instance and `assistant()` instance
- Context must adhere to the `LLMContext` universal format
- Aggregators should handle adding messages, function calls, and images to the context
### TTS (Text-to-Speech) Services
#### AudioContextWordTTSService
**Use for:** Websocket-based services supporting word/timestamp alignment
**Example:**
- [CartesiaTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/tts.py)
#### InterruptibleTTSService
**Use for:** Websocket-based services without word/timestamp alignment, requiring disconnection on interruption
**Example:**
- [SarvamTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/sarvam/tts.py)
#### WordTTSService
**Use for:** HTTP-based services supporting word/timestamp alignment
**Example:**
- [ElevenLabsHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
#### TTSService
**Use for:** HTTP-based services without word/timestamp alignment
**Example:**
- [GoogleHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/tts.py)
#### Key requirements:
- For websocket services, use asyncio WebSocket implementation (required for v13+ support)
- Handle idle service timeouts with keepalives
- TTSServices push both audio (`TTSRawAudioFrame`) and text (`TTSTextFrame`) frames
### Telephony Serializers
Pipecat supports telephony provider integration using websocket connections to exchange MediaStreams. These services use a FrameSerializer to serialize and deserialize inputs from the FastAPIWebsocketTransport.
**Examples:**
- [Twilio](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/serializers/twilio.py)
- [Telnyx](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/serializers/telnyx.py)
#### Key requirements:
- Include hang-up functionality using the provider's native API, ideally using `aiohttp`
- Support DTMF (dual-tone multi-frequency) events if the provider supports them:
- Deserialize DTMF events from the provider's protocol to `InputDTMFFrame`
- Use `KeypadEntry` enum for valid keypad entries (0-9, \*, #, A-D)
- Handle invalid DTMF digits gracefully by returning `None`
### Image Generation Services
**Base class:** `ImageGenService`
**Examples:**
- [FalImageGenService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/fal/image.py)
- [GoogleImageGenService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/image.py)
#### Key requirements:
- Must implement `run_image_gen` method returning an `AsyncGenerator`
### Vision Services
Vision services process images and provide analysis such as descriptions, object detection, or visual question answering.
**Base class:** `VisionService`
**Example:**
- [MoondreamVisionService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/moondream/vision.py)
#### Key requirements:
- Must implement `run_vision` method that takes an `LLMContext` and returns an `AsyncGenerator[Frame, None]`
- The method processes the latest image in the context and yields frames with analysis results
- Typically yields `TextFrame` objects containing descriptions or answers
## Implementation Guidelines
### Naming Conventions
- **STT:** `VendorSTTService`
- **LLM:** `VendorLLMService`
- **TTS:**
- Websocket: `VendorTTSService`
- HTTP: `VendorHttpTTSService`
- **Image:** `VendorImageGenService`
- **Vision:** `VendorVisionService`
- **Telephony:** `VendorFrameSerializer`
### Metrics Support
Enable metrics in your service:
```python
def can_generate_metrics(self) -> bool:
"""Check if this service can generate processing metrics.
Returns:
True, as this service supports metrics.
"""
return True
```
### Dynamic Settings Updates
STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
```python
async def set_language(self, language: Language):
"""Set the recognition language and reconnect.
Args:
language: The language to use for speech recognition.
"""
logger.info(f"Switching STT language to: [{language}]")
self._settings["language"] = language
await self._disconnect()
await self._connect()
```
Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
### Sample Rate Handling
Sample rates are set via PipelineParams and passed to each frame processor at initialization. The pattern is to _not_ set the sample rate value in the constructor of a given service. Instead, use the `start()` method to initialize sample rates from the frame:
```python
async def start(self, frame: StartFrame):
"""Start the service."""
await super().start(frame)
self._settings["output_format"]["sample_rate"] = self.sample_rate
await self._connect()
```
Note that `self.sample_rate` is a `@property` set in the TTSService base class, which provides access to the private sample rate value obtained from the StartFrame.
### Tracing Decorators
Use Pipecat's tracing decorators:
- **STT:** `@traced_stt` - decorate a function that handles `transcript`, `is_final`, `language` as args
- **LLM:** `@traced_llm` - decorate the `_process_context()` method
- **TTS:** `@traced_tts` - decorate the `run_tts()` method
## Best Practices
### Packaging and Distribution
- Use [uv](https://docs.astral.sh/uv/) for packaging (encouraged)
- Consider releasing to PyPI for easier installation
- Follow semantic versioning principles
- Maintain a changelog
### HTTP Communication
For REST-based communication, use aiohttp. Pipecat includes this as a required dependency, so using it prevents adding an additional dependency to your integration.
### Error Handling
- Wrap API calls in appropriate try/catch blocks
- Handle rate limits and network failures gracefully
- Provide meaningful error messages
- When errors occur, raise exceptions AND push `ErrorFrame`s to notify the pipeline:
```python
from pipecat.frames.frames import ErrorFrame
try:
# Your API call
result = await self._make_api_call()
except Exception as e:
# Push error frame to pipeline
await self.push_error(ErrorFrame(error=f"{self} error: {e}"))
# Raise or handle as appropriate
raise
```
### Testing
- Your foundational example serves as a valuable integration-level test
- Unit tests are nice to have. As the Pipecat teams provides better guidance, we will encourage unit testing more
## Disclaimer
Community integrations are community-maintained and not officially supported by the Pipecat team. Users should evaluate these integrations independently. The Pipecat team reserves the right to remove listings that become unmaintained or problematic.
## Staying Up to Date
Pipecat evolves rapidly to support the latest AI technologies and patterns. While we strive to minimize breaking changes, they do occur as the framework matures.
**We strongly recommend:**
- Join our Discord at https://discord.gg/pipecat and monitor the `#announcements` channel for release notifications
- Follow our changelog: https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md
- Test your integration against new Pipecat releases promptly
- Update your README with the last tested Pipecat version
This helps ensure your integration remains compatible and your users have clear expectations about version support.
## Questions?
Join our Discord community at https://discord.gg/pipecat and post in the `#community-integrations` channel for guidance and support.
For additional questions, you can also reach out to us at pipecat-ai@daily.co.

View File

@@ -1,5 +1,9 @@
## Contributing to Pipecat
**Want to add a new service integration?**
We encourage community-maintained integrations! Please see our [Community Integration Guide](COMMUNITY_INTEGRATIONS.md) for the process and requirements.
**Want to contribute to Pipecat core?**
We welcome contributions of all kinds! Your help is appreciated. Follow these steps to get involved:
1. **Fork this repository**: Start by forking the Pipecat Documentation repository to your GitHub account.
@@ -31,6 +35,23 @@ git push origin your-branch-name
Our maintainers will review your PR, and once everything is good, your contributions will be merged!
## Dependency Management
This project uses [uv](https://docs.astral.sh/uv/) for dependency management. The `uv.lock` file is committed to ensure reproducible builds.
### Adding or Updating Dependencies
1. Edit `pyproject.toml` to add/update dependencies
2. Run `uv lock` to update the lockfile with new dependency resolution
3. Run `uv sync` to install the updated dependencies locally
4. Always commit both files together:
```bash
git add pyproject.toml uv.lock
git commit -m "feat: add new dependency for feature X"
```
**Important:** Never manually edit `uv.lock`. It's auto-generated by `uv lock`.
## Code Style and Documentation
### Python Code Style

151
README.md
View File

@@ -2,7 +2,8 @@
<img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div></h1>
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/pipecat-ai/pipecat)
[![](https://getmanta.ai/api/badges?text=Manta%20Graph&link=manta)](https://getmanta.ai/pipecat)
# 🎙️ Pipecat: Real-Time Voice & Multimodal AI Agents
@@ -19,8 +20,6 @@
- **Business Agents** customer intake, support bots, guided flows
- **Complex Dialog Systems** design logic with structured conversations
🧭 Looking to build structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) for managing complex conversational states and transitions.
## 🧠 Why Pipecat?
- **Voice-first**: Integrates speech recognition, text-to-speech, and conversation handling
@@ -28,6 +27,39 @@
- **Composable Pipelines**: Build complex behavior from modular components
- **Real-Time**: Ultra-low latency interaction with different transports (e.g. WebSockets or WebRTC)
## 🌐 Pipecat Ecosystem
### 📱 Client SDKs
Building client applications? You can connect to Pipecat from any platform using our official SDKs:
<a href="https://docs.pipecat.ai/client/js/introduction">JavaScript</a> | <a href="https://docs.pipecat.ai/client/react/introduction">React</a> | <a href="https://docs.pipecat.ai/client/react-native/introduction">React Native</a> |
<a href="https://docs.pipecat.ai/client/ios/introduction">Swift</a> | <a href="https://docs.pipecat.ai/client/android/introduction">Kotlin</a> | <a href="https://docs.pipecat.ai/client/c++/introduction">C++</a> | <a href="https://github.com/pipecat-ai/pipecat-esp32">ESP32</a>
### 🧭 Structured conversations
Looking to build structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) for managing complex conversational states and transitions.
### 🪄 Beautiful UIs
Want to build beautiful and engaging experiences? Checkout the [Voice UI Kit](https://github.com/pipecat-ai/voice-ui-kit), a collection of components, hooks and templates for building voice AI applications quickly.
### 🛠️ Create and deploy projects
Create a new project in under a minute with the [Pipecat CLI](https://github.com/pipecat-ai/pipecat-cli). Then use the CLI to monitor and deploy your agent to production.
### 🔍 Debugging
Looking for help debugging your pipeline and processors? Check out [Whisker](https://github.com/pipecat-ai/whisker), a real-time Pipecat debugger.
### 🖥️ Terminal
Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.
### 📺️ Pipecat TV Channel
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
## 🎬 See it in action
<p float="left">
@@ -35,35 +67,24 @@
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/storytelling-chatbot/image.png" width="400" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/translation-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/moondream-chatbot/image.png" width="400" /></a>
<a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/12-describe-video.py"><img src="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/assets/moondream.png" width="400" /></a>
</p>
## 📱 Client SDKs
You can connect to Pipecat from any platform using our official SDKs:
| Platform | SDK Repo | Description |
| -------- | ------------------------------------------------------------------------------ | -------------------------------- |
| Web | [pipecat-client-web](https://github.com/pipecat-ai/pipecat-client-web) | JavaScript and React client SDKs |
| iOS | [pipecat-client-ios](https://github.com/pipecat-ai/pipecat-client-ios) | Swift SDK for iOS |
| Android | [pipecat-client-android](https://github.com/pipecat-ai/pipecat-client-android) | Kotlin SDK for Android |
| C++ | [pipecat-client-cxx](https://github.com/pipecat-ai/pipecat-client-cxx) | C++ client SDK |
## 🧩 Available services
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -114,7 +135,8 @@ You can get started with Pipecat running on your local machine, then move your a
### Prerequisites
**Python Version:** 3.10+
**Minimum Python Version:** 3.10
**Recommended Python Version:** 3.12
### Setup Steps
@@ -128,7 +150,11 @@ You can get started with Pipecat running on your local machine, then move your a
2. Install development and testing dependencies:
```bash
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp --no-extra local
uv sync --group dev --all-extras \
--no-extra gstreamer \
--no-extra krisp \
--no-extra local \
--no-extra ultravox # (ultravox not fully supported on macOS)
```
3. Install the git pre-commit hooks:
@@ -137,23 +163,6 @@ You can get started with Pipecat running on your local machine, then move your a
uv run pre-commit install
```
### Python 3.13+ Compatibility
Some features require PyTorch, which doesn't yet support Python 3.13+. Install using:
```bash
uv sync --group dev --all-extras \
--no-extra gstreamer \
--no-extra krisp \
--no-extra local \
--no-extra local-smart-turn \
--no-extra mlx-whisper \
--no-extra moondream \
--no-extra ultravox
```
> **Tip:** For full compatibility, use Python 3.12: `uv python pin 3.12`
> **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.
### Running tests
@@ -170,54 +179,6 @@ Run a specific test suite:
uv run pytest tests/test_name.py
```
### Setting up your editor
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting via [Ruff](https://github.com/astral-sh/ruff).
#### Emacs
You can use [use-package](https://github.com/jwiegley/use-package) to install [emacs-lazy-ruff](https://github.com/christophermadsen/emacs-lazy-ruff) package and configure `ruff` arguments:
```elisp
(use-package lazy-ruff
:ensure t
:hook ((python-mode . lazy-ruff-mode))
:config
(setq lazy-ruff-format-command "ruff format")
(setq lazy-ruff-check-command "ruff check --select I"))
```
`ruff` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
```elisp
(use-package pyvenv-auto
:ensure t
:defer t
:hook ((python-mode . pyvenv-auto-run)))
```
#### Visual Studio Code
Install the
[Ruff](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, and enable formatting on save:
```json
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true
}
```
#### PyCharm
`ruff` was installed in the `venv` environment described before, now to enable autoformatting on save, go to `File` -> `Settings` -> `Tools` -> `File Watchers` and add a new watcher with the following settings:
1. **Name**: `Ruff formatter`
2. **File type**: `Python`
3. **Working directory**: `$ContentRoot$`
4. **Arguments**: `format $FilePath$`
5. **Program**: `$PyInterpreterDirectory$/ruff`
## 🤝 Contributing
We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:

5
SECURITY.md Normal file
View File

@@ -0,0 +1,5 @@
# Security Policy
## Reporting a Vulnerability
Please email `disclosures@daily.co`.

View File

@@ -1,10 +0,0 @@
# Pipecat Docs
## [Architecture Overview](architecture.md)
Learn about the thinking behind the framework's design.
## [A Frame's Progress](frame-progress.md)
See how a Frame is processed through a Transport, a Pipeline, and a series of Frame Processors.

View File

@@ -50,6 +50,7 @@ autodoc_mock_imports = [
# Krisp - has build issues on some platforms
"pipecat_ai_krisp",
"krisp",
"krisp_audio",
# System-specific GUI libraries
"_tkinter",
"tkinter",

View File

@@ -21,6 +21,7 @@ Quick Links
Adapters <api/pipecat.adapters>
Audio <api/pipecat.audio>
Clocks <api/pipecat.clocks>
Extensions <api/pipecat.extensions>
Frames <api/pipecat.frames>
Metrics <api/pipecat.metrics>
Observers <api/pipecat.observers>

View File

@@ -1,17 +0,0 @@
# Pipecat architecture guide
## Frames
Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used to as control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion.
## FrameProcessors
Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input for an AI Service, and emit chat completions based on message arrays or transform text into audio or images.
## Pipelines
Pipelines are lists of frame processors linked together. Frame processors can push frames upstream or downstream to their peers. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport as an output.
## Transports
Transports provide input and output frame processors to receive or send frames respectively. For example, the `DailyTransport` does this with a WebRTC session joined to a Daily.co room.

View File

@@ -1,46 +0,0 @@
# A Frame's Progress
1. A user says “Hello, LLM” and the cloud transcription service delivers a transcription to the Transport.
![A transcript frame arrives](images/frame-progress-01.png)
2. The Transport places a Transcription frame in the Pipelines source queue.
![Frame in source queue](images/frame-progress-02.png)
3. The Pipeline passes the Transcription frame to the first Frame Processor in its list, the LLM User Message Aggregator.
![To UMA](images/frame-progress-03.png)
4. The LLM User Message Aggregator updates the LLM Context with a `{“user”: “Hello LLM”}` message.
![Update context](images/frame-progress-04.png)
5. The LLM User Message Aggregator yields an LLM Message Frame, containing the updated LLM Context. The Pipeline passes this frame to the LLM Frame Processor.
![Update context](images/frame-progress-05.png)
6. The LLM Frame Processor creates a streaming chat completion based on the LLM context and yields the first chunk of a response, Text Frame with the value “Hi, “. The Pipeline passes this frame to the TTS Frame Processor. The TTS Frame Processor aggregates this response but doesnt yield anything, yet, because its waiting for a full sentence.
![LLM yields Text](images/frame-progress-06.png)
7. The LLM Frame Processor yields another Text Frame with the value “there.”. The Pipeline passes this frame to the TTS Frame Processor.
![LLM yields more Text](images/frame-progress-07.png)
8. The TTS Frame Processor now has a full sentence, so it starts streaming audio based on “Hi, there.” It yields the first chunk of streaming audio as an Audio frame, which the Pipeline passes to the LLM Assistant Message Aggregator.
![TTS yields Audio](images/frame-progress-08.png)
9. The LLM Assistant Message Aggregator doesnt do anything with Audio frames, so it immediately yields the frame, unchanged. This is the convention for all Frame Processors: frames that the processor doesnt process should be immediately yielded.
![pass-through](images/frame-progress-09.png)
10. The Pipeline places the first Audio frame in its sink queue, which is being watched by the Transport. Since the frame is now in a queue, the Pipeline can continue processing other frames. Note that the source and sink queues form a sort of “boundary of concurrent processing” between a Pipeline and the outside world. In a Pipeline, Frames are processed sequentially; once a Frame is on a queue it can be processed in parallel with the frames being processed by the Pipeline. TODO: link to a more in-depth section about this.
![sink queue](images/frame-progress-10.png)
11. The TTS Frame Processor yields another Audio frame as the Transport transmits the first Audio frame.
![parallel audio](images/frame-progress-11.png)
12. As before, the LLM Assistant Message Aggregator immediately yields the Audio frame and the Pipeline places the Audio frame in the sink queue.
![sink queue 2](images/frame-progress-12.png)
13. The TTS Frame Processor has no more frames to yield. The LLM Frame Processor emits an LLM Response End Frame, which the Pipeline passes to the TTS Frame Processor.
![response end](images/frame-progress-13.png)
14. The TTS Frame Processor immediately yields the LLM Response End Frame, so the Pipeline passes it along to the LLM Assistant Message Aggregator. The LLM Assistant Message Aggregator updates the LLM Context with the full response from the LLM. TODO TODO: I realized I forgot that the TSS Frame Processor also yields the Text frames that the LLM emitted so that the LLM Assistant Message Aggregator could accumulate them, arrggh.
![response end](images/frame-progress-14.png)
15. The system is quiet, and waiting for the next message from the Transport.
![response end](images/frame-progress-15.png)

View File

@@ -1,110 +0,0 @@
# Understanding Different Frame Types in the Pipecat System
In the Pipecat system, frames are used to represent different types of data and control signals that flow through the pipeline. Understanding these frame types is crucial for working with the system effectively. This tutorial will cover the main categories of frames and their specific uses.
## 1. Base Frame Classes
### Frame
The `Frame` class is the base class for all frames. It includes:
- `id`: A unique identifier
- `name`: A descriptive name
- `pts`: Presentation timestamp (optional)
### DataFrame
`DataFrame` is a subclass of `Frame` and serves as a base for most data-carrying frames.
## 2. Audio Frames
### AudioRawFrame
Represents a chunk of audio with properties:
- `audio`: Raw audio data
- `sample_rate`: Audio sample rate
- `num_channels`: Number of audio channels
Subclasses include:
- `InputAudioRawFrame`: For audio from input sources
- `OutputAudioRawFrame`: For audio to be played by output devices
- `TTSAudioRawFrame`: For audio generated by Text-to-Speech services
## 3. Image Frames
### ImageRawFrame
Represents an image with properties:
- `image`: Raw image data
- `size`: Image dimensions
- `format`: Image format (e.g., JPEG, PNG)
Subclasses include:
- `InputImageRawFrame`: For images from input sources
- `OutputImageRawFrame`: For images to be displayed
- `UserImageRawFrame`: For images associated with a specific user
- `VisionImageRawFrame`: For images with associated text for description
- `URLImageRawFrame`: For images with an associated URL
### SpriteFrame
Represents an animated sprite, containing a list of `ImageRawFrame` objects.
## 4. Text and Transcription Frames
### TextFrame
Represents a chunk of text, used for various purposes in the pipeline.
### TranscriptionFrame
A specialized `TextFrame` for speech transcriptions, including:
- `user_id`: ID of the speaking user
- `timestamp`: When the transcription was generated
- `language`: Detected language of the speech
### InterimTranscriptionFrame
Similar to `TranscriptionFrame`, but for interim (not final) transcriptions.
## 5. LLM (Language Model) Frames
### LLMMessagesFrame
Contains a list of messages for an LLM service to process.
### LLMMessagesAppendFrame and LLMMessagesUpdateFrame
Used to modify the current context of LLM messages.
### LLMSetToolsFrame
Specifies tools (functions) available for the LLM to use.
### LLMEnablePromptCachingFrame
Controls prompt caching in certain LLMs.
## 6. System and Control Frames
### SystemFrame
Base class for system-level frames.
Important system frames include:
- `StartFrame`: Initiates a pipeline
- `CancelFrame`: Stops a pipeline immediately
- `ErrorFrame`: Notifies of errors (with `FatalErrorFrame` for unrecoverable errors)
- `EndTaskFrame` and `CancelTaskFrame`: Control pipeline tasks
- `StartInterruptionFrame` and `StopInterruptionFrame`: Indicate user speech for interruptions
### ControlFrame
Base class for control-flow frames.
Notable control frames:
- `EndFrame`: Signals the end of a pipeline
- `LLMFullResponseStartFrame` and `LLMFullResponseEndFrame`: Bracket LLM responses
- `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame`: Indicate user speech activity
- `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame`: Indicate bot speech activity
- `TTSStartedFrame` and `TTSStoppedFrame`: Bracket Text-to-Speech responses
## 7. Special Purpose Frames
### MetricsFrame
Contains performance metrics data.
### FunctionCallInProgressFrame and FunctionCallResultFrame
Used for handling LLM function (tool) calls.
### ServiceUpdateSettingsFrame
Base class for updating service settings, with specific subclasses for LLM, TTS, and STT services.
## Conclusion
Understanding these frame types is essential for working with the Pipecat system. Each frame type serves a specific purpose in the pipeline, whether it's carrying data (like audio or images), controlling the flow of the pipeline, or managing system-level operations. By using the appropriate frame types, you can effectively process and transmit various kinds of information through your pipeline.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 91 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 95 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 96 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 110 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 111 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 117 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 98 KiB

View File

@@ -1,6 +1,12 @@
# AI-COUSTICS
AICOUSTICS_LICENSE_KEY=...
# Anthropic
ANTHROPIC_API_KEY=...
# Assembly AI
ASSEMBLYAI_API_KEY=...
# Async
ASYNCAI_API_KEY=...
ASYNCAI_VOICE_ID=...
@@ -18,12 +24,19 @@ AZURE_CHATGPT_API_KEY=...
AZURE_CHATGPT_ENDPOINT=https://...
AZURE_CHATGPT_MODEL=...
AZURE_REALTIME_API_KEY=...
AZURE_REALTIME_BASE_URL=...
AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Cartesia
CARTESIA_API_KEY=...
CARTESIA_VOICE_ID=...
# Cerebras
CEREBRAS_API_KEY=...
# Daily
DAILY_API_KEY=...
@@ -32,36 +45,75 @@ DAILY_SAMPLE_ROOM_URL=https://...
# Deepgram
DEEPGRAM_API_KEY=...
# DeepSeek
DEEPSEEK_API_KEY=...
# ElevenLabs
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
# Neuphonic
NEUPHONIC_API_KEY=...
# Fal
FAL_KEY=...
# Fireworks
FIREWORKS_API_KEY=...
# Fish Audio
FISH_API_KEY=...
# Gladia
GLADIA_API_KEY=...
GLADIA_REGION=...
# Google
GOOGLE_API_KEY=...
GOOGLE_CLOUD_PROJECT_ID=...
GOOGLE_TEST_CREDENTIALS=...
GOOGLE_VERTEX_TEST_CREDENTIALS=...
GOOGLE_CLOUD_PROJECT_ID=...
GOOGLE_CLOUD_LOCATION=...
GOOGLE_TEST_CREDENTIALS=...
# Grok
GROK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Heygen
HEYGEN_API_KEY=...
# Hume
HUME_API_KEY=...
HUME_VOICE_ID=...
# Inworld
INWORLD_API_KEY=...
# Krisp
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_MODEL_PATH=...
# LiveKit
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
# LMNT
LMNT_API_KEY=...
LMNT_VOICE_ID=...
# PlayHT
PLAY_HT_USER_ID=...
PLAY_HT_API_KEY=...
# MiniMax
MINIMAX_API_KEY=...
MINIMAX_GROUP_ID=...
# Mistral
MISTRAL_API_KEY=...
# Neuphonic
NEUPHONIC_API_KEY=...
# NVIDIA
NVIDIA_API_KEY=...
# OpenAI
OPENAI_API_KEY=...
@@ -69,74 +121,73 @@ OPENAI_API_KEY=...
# OpenPipe
OPENPIPE_API_KEY=...
# Tavus
TAVUS_API_KEY=...
TAVUS_REPLICA_ID=...
TAVUS_PERSONA_ID=...
# OpenRouter
OPENROUTER_API_KEY=...
# Perplexity
PERPLEXITY_API_KEY=...
# Picovoice Koala
KOALA_ACCESS_KEY=...
# Piper
PIPER_BASE_URL=...
# PlayHT
PLAYHT_USER_ID=...
PLAYHT_API_KEY=...
# Plivo
PLIVO_AUTH_ID=...
PLIVO_AUTH_TOKEN=...
# Qwen
QWEN_API_KEY=...
# Rime
RIME_API_KEY=...
RIME_VOICE_ID=...
# SambaNova
SAMBANOVA_API_KEY=...
# Sarvam AI
SARVAM_API_KEY=...
# Sentry
SENTRY_DSN=...
# Simli
SIMLI_API_KEY=...
SIMLI_FACE_ID=...
# Krisp
KRISP_MODEL_PATH=...
# DeepSeek
DEEPSEEK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Grok
GROK_API_KEY=...
# Inworld
INWORLD_API_KEY=...
# Together.ai
TOGETHER_API_KEY=...
# Cerebras
CEREBRAS_API_KEY=...
# Fish Audio
FISH_API_KEY=...
# Assembly AI
ASSEMBLYAI_API_KEY=...
# OpenRouter
OPENROUTER_API_KEY=...
# Piper
PIPER_BASE_URL=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=...
FAL_SMART_TURN_API_KEY=...
# Soniox
SONIOX_API_KEY=...
# Speechmatics
SPEECHMATICS_API_KEY=...
# Tavus
TAVUS_API_KEY=...
TAVUS_REPLICA_ID=...
# Telnyx
TELNYX_API_KEY=...
TELNYX_ACCOUNT_SID=...
# Together.ai
TOGETHER_API_KEY=...
# Twilio
TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
# MiniMax
MINIMAX_API_KEY=...
MINIMAX_GROUP_ID=...
# Sarvam AI
SARVAM_API_KEY=...
# Soniox
SONIOX_API_KEY=
# Speechmatics
SPEECHMATICS_API_KEY=...
# SambaNova
SAMBANOVA_API_KEY=...
# Sentry
SENTRY_DSN=...
# Heygen
HEYGEN_API_KEY=...
# WhatsApp
WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_APP_SECRET=...

View File

@@ -18,8 +18,8 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)

View File

@@ -18,8 +18,8 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)

View File

@@ -17,8 +17,8 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)

View File

@@ -11,13 +11,13 @@ import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import TextFrame
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.livekit import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.services.livekit import LiveKitParams, LiveKitTransport
from pipecat.transports.livekit.transport import LiveKitParams, LiveKitTransport
load_dotenv(override=True)
@@ -50,7 +50,7 @@ async def main():
async def on_first_participant_joined(transport, participant_id):
await asyncio.sleep(1)
await task.queue_frame(
TextFrame(
TTSSpeakFrame(
"Hello there! How are you doing today? Would you like to talk about the weather?"
)
)

View File

@@ -17,8 +17,8 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.riva.tts import FastPitchTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)

View File

@@ -9,21 +9,18 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import EndFrame
from pipecat.frames.frames import EndFrame, LLMContextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
OpenAILLMContextFrame,
)
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -63,7 +60,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([OpenAILLMContextFrame(OpenAILLMContext(messages)), EndFrame()])
await task.queue_frames([LLMContextFrame(LLMContext(messages)), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

View File

@@ -18,7 +18,7 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.fal.image import FalImageGenService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)

View File

@@ -17,7 +17,7 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.image import GoogleImageGenService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)

View File

@@ -17,17 +17,22 @@ from fastapi.responses import RedirectResponse
from loguru import logger
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import IceServer, SmallWebRTCConnection
from pipecat.transports.smallwebrtc.connection import IceServer, SmallWebRTCConnection
from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport
load_dotenv(override=True)
@@ -55,7 +60,8 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
)
@@ -75,8 +81,8 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -103,7 +109,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -12,15 +12,20 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.daily import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyLogLevel, DailyParams, DailyTransport
from pipecat.transports.daily.transport import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -40,10 +45,10 @@ async def main():
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
)
transport.set_log_level(DailyLogLevel.Info)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
@@ -59,8 +64,8 @@ async def main():
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -86,7 +91,7 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):

View File

@@ -12,23 +12,27 @@ import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotInterruptionFrame,
TextFrame,
InterruptionFrame,
TranscriptionFrame,
TTSSpeakFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.livekit import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.livekit import LiveKitParams, LiveKitTransport
from pipecat.transports.livekit.transport import LiveKitParams, LiveKitTransport
load_dotenv(override=True)
@@ -46,7 +50,8 @@ async def main():
params=LiveKitParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
)
@@ -69,8 +74,8 @@ async def main():
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -98,7 +103,7 @@ async def main():
async def on_first_participant_joined(transport, participant_id):
await asyncio.sleep(1)
await task.queue_frame(
TextFrame(
TTSSpeakFrame(
"Hello there! How are you doing today? Would you like to talk about the weather?"
)
)
@@ -115,7 +120,7 @@ async def main():
await task.queue_frames(
[
BotInterruptionFrame(),
InterruptionFrame(),
UserStartedSpeakingFrame(),
TranscriptionFrame(
user_id=participant_id,

View File

@@ -14,6 +14,7 @@ from loguru import logger
from pipecat.frames.frames import (
DataFrame,
Frame,
LLMContextFrame,
LLMFullResponseStartFrame,
TextFrame,
)
@@ -21,10 +22,8 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
OpenAILLMContextFrame,
)
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
@@ -33,7 +32,7 @@ from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
@@ -156,7 +155,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
}
]
frames.append(MonthFrame(month=month))
frames.append(OpenAILLMContextFrame(OpenAILLMContext(messages)))
frames.append(LLMContextFrame(LLMContext(messages)))
task = PipelineTask(
pipeline,

View File

@@ -15,6 +15,7 @@ from loguru import logger
from pipecat.frames.frames import (
Frame,
LLMContextFrame,
OutputAudioRawFrame,
TextFrame,
TTSAudioRawFrame,
@@ -24,10 +25,8 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
OpenAILLMContextFrame,
)
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
@@ -140,7 +139,7 @@ async def main():
)
task = PipelineTask(pipeline)
await task.queue_frame(OpenAILLMContextFrame(OpenAILLMContext(messages)))
await task.queue_frame(LLMContextFrame(LLMContext(messages)))
await task.stop_when_done()
await runner.run(task)

View File

@@ -9,8 +9,11 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, MetricsFrame
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import Frame, LLMRunFrame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
ProcessingMetricsData,
@@ -20,7 +23,8 @@ from pipecat.metrics.metrics import (
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -28,8 +32,8 @@ from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -61,17 +65,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -97,8 +104,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -127,7 +134,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,17 +10,22 @@ from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
LLMRunFrame,
OutputImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -28,7 +33,7 @@ from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
@@ -65,7 +70,7 @@ class ImageSyncAggregator(FrameProcessor):
)
)
await self.push_frame(frame)
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -78,7 +83,8 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
@@ -86,7 +92,8 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -110,8 +117,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
@@ -144,7 +151,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -9,19 +9,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.stt import CartesiaSTTService
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -51,7 +59,7 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
@@ -67,8 +75,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -96,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -9,19 +9,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -32,17 +37,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -66,8 +74,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -95,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -6,26 +6,29 @@
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response import (
LLMUserAggregatorParams,
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -49,121 +52,127 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""Speechmatics STT Service Example
"""Speechmatics STT and TTS Service Example
This example demonstrates using Speechmatics Speech-to-Text service with speaker diarization and intelligent speaker management. Key features:
This example demonstrates using Speechmatics Speech-to-Text and Text-to-Speech services
with speaker diarization and intelligent speaker management. Key features:
1. Speaker Diarization
1. Speaker Diarization (STT)
- Automatically identifies and distinguishes between different speakers
- First speaker is identified as 'S1', others get subsequent IDs
- Uses `enable_diarization` parameter to manage speaker detection
2. Smart Speaker Control
2. Smart Speaker Control (STT)
- `focus_speakers` parameter lets you target specific speakers (e.g. ["S1"])
- Other speakers will be wrapped in PASSIVE tags
- Only processes speech from focused speakers
- Words from all speakers are wrapped with XML tags for clear speaker identification
- Other speakers' speech only sent when focused speaker is active
3. Voice Activity Detection
3. Voice Activity Detection (STT)
- Built-in VAD using `enable_vad` parameter
- Remove `vad_analyzer` from `transport` config to use module's VAD
- Emits speaker started/stopped events
4. Configuration Options
4. Text-to-Speech (TTS)
- Low latency streaming audio synthesis
- Multiple voice options available including `sarah`, `theo`, and `megan`
5. Configuration Options
- `operating_point` parameter defaults to `ENHANCED` for optimal accuracy
- Configurable `end_of_utterance_silence_trigger` (default 0.5s)
- Customizable speaker formatting
- Additional diarization settings available
For detailed information about operating points and configuration:
https://docs.speechmatics.com/rt-api-ref
For detailed information:
- STT: https://docs.speechmatics.com/rt-api-ref
- TTS: https://docs.speechmatics.com/text-to-speech/quickstart
"""
logger.info(f"Starting bot")
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
enable_vad=True,
enable_diarization=True,
focus_speakers=["S1"],
end_of_utterance_silence_trigger=0.5,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
speaker_passive_format="<PASSIVE><{speaker_id}>{text}</{speaker_id}></PASSIVE>",
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
model="eleven_turbo_v2_5",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Alfred. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be converted to audio so don't include special characters in your answers. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
"Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
enable_vad=True,
enable_diarization=True,
focus_speakers=["S1"],
end_of_utterance_silence_trigger=0.5,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
speaker_passive_format="<PASSIVE><{speaker_id}>{text}</{speaker_id}></PASSIVE>",
),
},
]
)
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(
context,
user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
)
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
aiohttp_session=session,
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be converted to audio so don't include special characters in your answers. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
"Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
),
},
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
await runner.run(task)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

View File

@@ -6,27 +6,33 @@
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response import (
LLMUserAggregatorParams,
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -37,116 +43,125 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""Run example using Speechmatics STT.
"""Run example using Speechmatics STT and TTS.
This example will use diarization within our STT service and output the words spoken by
each individual speaker and wrap them with XML tags for the LLM to process. Note the
instructions in the system context for the LLM. This greatly improves the conversation
experience by allowing the LLM to understand who is speaking in a multi-party call.
This example demonstrates a complete Speechmatics integration with both Speech-to-Text
and Text-to-Speech services:
By default, this example will use our ENHANCED operating point, which is optimized for
high accuracy. You can change this by setting the `operating_point` parameter to a different
value.
STT Features:
- Diarization to identify and distinguish between different speakers
- Words spoken by each speaker are wrapped with XML tags for LLM processing
- System context instructions help the LLM understand multi-party conversations
- ENHANCED operating point by default for optimal accuracy
For more information on operating points, see the Speechmatics documentation:
https://docs.speechmatics.com/rt-api-ref
TTS Features:
- Low latency streaming audio synthesis
- Multiple voice options available including `sarah`, `theo`, and `megan`
For more information:
- STT: https://docs.speechmatics.com/rt-api-ref
- TTS: https://docs.speechmatics.com/text-to-speech/quickstart
"""
logger.info(f"Starting bot")
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
enable_diarization=True,
end_of_utterance_silence_trigger=0.5,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
model="eleven_turbo_v2_5",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Alfred. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be converted to audio so don't include special characters in your answers. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
enable_diarization=True,
end_of_utterance_silence_trigger=0.5,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
},
]
)
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(
context,
user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
)
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
aiohttp_session=session,
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be converted to audio so don't include special characters in your answers. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
),
},
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
await runner.run(task)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -30,17 +35,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -66,8 +74,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -94,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -11,19 +11,24 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.inworld.tts import InworldTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -66,9 +74,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="Ashley",
model="inworld-tts-1",
streaming=streaming, # True: real-time chunks, False: complete audio then playback
params=InworldTTSService.InputParams(
temperature=0.8,
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
@@ -80,8 +85,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -109,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -11,19 +11,24 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.asyncai.tts import AsyncAIHttpTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -35,17 +40,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -72,8 +80,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -101,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.asyncai.tts import AsyncAITTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -68,8 +76,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -97,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -0,0 +1,169 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import datetime
import os
import wave
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.aic_filter import AICFilter
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# Create audio buffer processor so we can hear the audio fitler results.
audiobuffer = AudioBufferProcessor(
num_channels=2, # 1 for mono, 2 for stereo (user left, bot right)
enable_turn_audio=False, # Enable per-turn audio recording
)
def _create_aic_filter() -> AICFilter:
license_key = os.getenv("AICOUSTICS_LICENSE_KEY", "")
return AICFilter(
license_key=license_key,
enhancement_level=1.0,
)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=_create_aic_filter(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=_create_aic_filter(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=_create_aic_filter(),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
audiobuffer, # write audio data to a file
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
await audiobuffer.start_recording()
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
# Save or process the composite audio
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"./conversation_{timestamp}.wav"
# Create the WAV file
with wave.open(filename, "wb") as wf:
wf.setnchannels(num_channels)
wf.setsampwidth(2) # 16-bit audio
wf.setframerate(sample_rate)
wf.writeframes(audio)
logger.info(f"Saved recording to {filename}")
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -0,0 +1,138 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.hume.tts import HUME_SAMPLE_RATE, HumeTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = HumeTTSService(
api_key=os.getenv("HUME_API_KEY"),
# Replace with your Hume voice ID
voice_id="f898a92e-685f-43fa-985b-a46920f0650b",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(), # Transport user input
rtvi,
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
audio_out_sample_rate=HUME_SAMPLE_RATE,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
observers=[RTVIObserver(rtvi)],
)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -15,26 +15,24 @@ from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMMessagesUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
)
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
)
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.frameworks.langchain import LangchainProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -55,17 +53,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -100,19 +101,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
lc = LangchainProcessor(history_chain)
context = OpenAILLMContext()
tma_in = LLMUserContextAggregator(context=context)
tma_out = LLMAssistantContextAggregator(context=context)
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
tma_in, # User responses
context_aggregator.user(), # User responses
lc, # Langchain
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
context_aggregator.assistant(), # Assistant spoken responses
]
)
@@ -129,7 +129,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
# An `OpenAILLMContextFrame` will be picked up by the LangchainProcessor using
# An `LLMContextFrame` will be picked up by the LangchainProcessor using
# only the content of the last message to inject it in the prompt defined
# above. So no role is required here.
messages = [({"content": "Please briefly introduce yourself to the user."})]

View File

@@ -0,0 +1,122 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response_universal import (
LLMContext,
LLMContextAggregatorPair,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramFluxSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@stt.event_handler("on_update")
async def on_deepgram_flux_update(stt, transcript):
logger.debug(f"On deeggram flux update: {transcript}")
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -0,0 +1,132 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramHttpTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramHttpTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-2-andromeda-en",
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -12,23 +12,24 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
BotInterruptionFrame,
StopInterruptionFrame,
InterruptionFrame,
LLMRunFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -71,8 +72,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -97,18 +98,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@stt.event_handler("on_speech_started")
async def on_speech_started(stt, *args, **kwargs):
await task.queue_frames([BotInterruptionFrame(), UserStartedSpeakingFrame()])
await task.queue_frames([InterruptionFrame(), UserStartedSpeakingFrame()])
@stt.event_handler("on_utterance_end")
async def on_utterance_end(stt, *args, **kwargs):
await task.queue_frames([StopInterruptionFrame(), UserStoppedSpeakingFrame()])
await task.queue_frames([UserStoppedSpeakingFrame()])
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -65,8 +73,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -94,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -11,19 +11,24 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -35,17 +40,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -55,7 +63,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = ElevenLabsSTTService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
aiohttp_session=session,
)
tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
@@ -72,8 +83,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -101,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -68,8 +76,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -97,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.playht.tts import PlayHTHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -68,8 +76,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -97,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,11 +10,16 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -22,8 +27,8 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.playht.tts import PlayHTTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -70,8 +78,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -99,7 +107,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.azure.llm import AzureLLMService
from pipecat.services.azure.stt import AzureSTTService
from pipecat.services.azure.tts import AzureTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -74,8 +82,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -103,7 +111,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -68,8 +76,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -98,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -11,19 +11,24 @@ import time
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openpipe.llm import OpenPipeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -73,8 +81,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -102,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -11,19 +11,24 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.xtts.tts import XTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -71,8 +79,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -100,7 +108,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,11 +10,16 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -23,8 +28,8 @@ from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -35,17 +40,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -77,8 +85,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -106,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.lmnt.tts import LmntTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -64,8 +72,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -93,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,20 +10,25 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.groq.llm import GroqLLMService
from pipecat.services.groq.stt import GroqSTTService
from pipecat.services.groq.tts import GroqTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -67,8 +75,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
context, user_params=LLMUserAggregatorParams(aggregation_timeout=0.05)
)
@@ -98,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -0,0 +1,177 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesAppendFrame, LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.frameworks.strands_agents import StrandsAgentsProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.stt import AWSTranscribeSTTService
from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
# Strands agent setup
try:
from strands import Agent, tool
from strands.models import BedrockModel
except ImportError:
logger.warning("Strands not installed. Please install with: pip install strands-agents")
Agent = None
BedrockModel = None
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
def build_agent(model_id: str, max_tokens: int):
"""Create and configure a Strands agent for NAB customer service coaching.
Args:
model_id: The AWS Bedrock model ID to use
max_tokens: Maximum tokens for the model
Returns:
Configured Strands Agent
"""
@tool
def check_weather(location: str) -> str:
if location.lower() == "san francisco":
return "The weather in San Francisco is sunny and 30 degrees."
elif location.lower() == "sydney":
return "The weather in Sydney is cloudy and 20 degrees."
else:
return "I'm not sure about the weather in that location."
agent = Agent(
model=BedrockModel(
model_id=model_id,
max_tokens=max_tokens,
),
tools=[check_weather],
system_prompt="You are a helpful assistant that can check the weather in a given location.",
)
return agent
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = AWSTranscribeSTTService()
tts = AWSPollyTTSService(
region="us-west-2", # only specific regions support generative TTS
voice_id="Joanna",
params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
)
# Create Strands agent processor
try:
agent = build_agent(model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0", max_tokens=8000)
llm = StrandsAgentsProcessor(agent=agent)
logger.info("Successfully created Strands agent for NAB customer service coaching")
except Exception as e:
logger.error(f"Failed to create Strands agent: {e}")
raise ValueError(
"Unable to create Strands processor. Please ensure you have properly "
"installed strands-agents and configured your AWS credentials."
)
# Setup context aggregators for message handling
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # Speech-to-text
context_aggregator.user(), # User responses
llm, # Strands Agents processor
tts, # Text-to-speech
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames(
[
LLMMessagesAppendFrame(
messages=[
{
"role": "user",
"content": f"Greet the user and introduce yourself.",
}
],
run_llm=True,
)
]
)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -8,19 +8,24 @@
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.aws.stt import AWSTranscribeSTTService
from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -31,17 +36,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -59,8 +67,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AWSBedrockLLMService(
aws_region="us-west-2",
model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8, latency="optimized"),
model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
)
messages = [
@@ -70,8 +78,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -99,7 +107,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -0,0 +1,151 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""
A conversational AI bot using Gemini for both LLM, STT and TTS.
This example demonstrates how to use Gemini's image generation capabilities.
Features showcased:
- Gemini LLM for conversation and image generation
- Google TTS and STT
Run with:
python examples/foundational/07n-interruptible-gemini-image.py
Make sure to set your environment variables:
export GOOGLE_API_KEY=your_api_key_here
"""
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.google.stt import GoogleSTTService
from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
tts = GoogleTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash-image",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # Gemini TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation with a styled introduction
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -0,0 +1,171 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""
A conversational AI bot using Gemini for both LLM and TTS.
This example demonstrates how to use Gemini's TTS capabilities with the new
GeminiTTSService, which uses Gemini's TTS-specific models instead of Google Cloud TTS.
Features showcased:
- Gemini LLM for conversation
- Gemini TTS with natural voice control
- Support for different voice personalities
- Style and tone control through natural language prompts
Run with:
python examples/foundational/gemini-tts.py
Make sure to set your environment variables:
export GOOGLE_API_KEY=your_api_key_here
"""
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.google.stt import GoogleSTTService
from pipecat.services.google.tts import GeminiTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot with Gemini TTS")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
tts = GeminiTTSService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash-preview-tts", # TTS-specific model
voice_id="Charon",
params=GeminiTTSService.InputParams(language=Language.EN_US),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
)
# System message that instructs the AI on how to speak
messages = [
{
"role": "system",
"content": """You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
IMPORTANT: Since you're using Gemini TTS which supports natural voice control, you can include speaking instructions in your responses. For example:
- "Say cheerfully: Welcome to our conversation!"
- "Read this in a calm, professional tone: Here are the details you requested."
- "Speak in an excited whisper: I have some great news to share!"
- "Say slowly and clearly: Let me explain this step by step."
Feel free to use natural language instructions to control your voice style, tone, pace, and emotion. The TTS system will interpret these instructions and adjust the speech accordingly.
Your output will be converted to audio, so avoid special characters in your answers. Respond to what the user said in a creative and helpful way.""",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # Gemini TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation with a styled introduction
messages.append(
{
"role": "system",
"content": "Say cheerfully and warmly: Hello! I'm your AI assistant powered by Gemini's new TTS technology. I can speak with different voices, tones, and styles. How can I help you today?",
}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -10,11 +10,16 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.llm import GoogleLLMService
@@ -22,8 +27,8 @@ from pipecat.services.google.stt import GoogleSTTService
from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -53,8 +61,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
location="us",
)
tts = GoogleTTSService(
@@ -77,8 +86,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -106,7 +115,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -70,8 +78,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -99,7 +107,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -0,0 +1,129 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispVivaFilter(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispVivaFilter(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispVivaFilter(),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -11,19 +11,24 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_filter import KrispFilter
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,19 +39,22 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispFilter(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispFilter(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispFilter(),
),
}
@@ -68,8 +76,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -97,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -11,19 +11,24 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -35,17 +40,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -73,8 +81,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -102,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.rime.tts import RimeTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -67,8 +75,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -96,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.nim.llm import NimLLMService
from pipecat.services.riva.stt import RivaSTTService
from pipecat.services.riva.tts import RivaTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -64,8 +72,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -93,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -12,13 +12,17 @@ from dotenv import load_dotenv
from google.genai.types import Content, Part
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
Frame,
InputAudioRawFrame,
InterruptionFrame,
LLMFullResponseEndFrame,
LLMFullResponseStartFrame,
StartInterruptionFrame,
LLMRunFrame,
TextFrame,
TranscriptionFrame,
UserStartedSpeakingFrame,
@@ -27,7 +31,8 @@ from pipecat.frames.frames import (
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -35,8 +40,8 @@ from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -92,9 +97,8 @@ class UserAudioCollector(FrameProcessor):
elif isinstance(frame, UserStoppedSpeakingFrame):
self._user_speaking = False
self._context.add_audio_frames_message(audio_frames=self._audio_frames)
await self._user_context_aggregator.push_frame(
self._user_context_aggregator.get_context_frame()
)
await self._user_context_aggregator.push_frame(LLMRunFrame())
elif isinstance(frame, InputAudioRawFrame):
if self._user_speaking:
self._audio_frames.append(frame)
@@ -150,7 +154,7 @@ class TranscriptExtractor(FrameProcessor):
await self.push_frame(frame, direction)
class TanscriptionContextFixup(FrameProcessor):
class TranscriptionContextFixup(FrameProcessor):
def __init__(self, context):
super().__init__()
self._context = context
@@ -181,9 +185,7 @@ class TanscriptionContextFixup(FrameProcessor):
if isinstance(frame, MagicDemoTranscriptionFrame):
self._transcript = frame.text
elif isinstance(frame, LLMFullResponseEndFrame) or isinstance(
frame, StartInterruptionFrame
):
elif isinstance(frame, LLMFullResponseEndFrame) or isinstance(frame, InterruptionFrame):
self.swap_user_audio()
self.add_transcript_back_to_inference_output()
self._transcript = ""
@@ -198,17 +200,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -240,11 +245,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
audio_collector = UserAudioCollector(context, context_aggregator.user())
pull_transcript_out_of_llm_output = TranscriptExtractor(context)
fixup_context_messages = TanscriptionContextFixup(context)
fixup_context_messages = TranscriptionContextFixup(context)
pipeline = Pipeline(
[
@@ -274,7 +279,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.fish.tts import FishAudioTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -68,8 +76,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -97,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,7 +10,10 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -19,8 +22,8 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.ultravox.stt import UltravoxSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -44,17 +47,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}

View File

@@ -11,19 +11,24 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.neuphonic.tts import NeuphonicHttpTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -35,17 +40,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -72,8 +80,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -101,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.neuphonic.tts import NeuphonicTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -33,17 +38,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -67,8 +75,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -96,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -10,19 +10,24 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.fal.stt import FalSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -34,17 +39,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -70,8 +78,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -99,7 +107,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -11,11 +11,16 @@ import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -34,7 +39,8 @@ async def main():
LocalAudioTransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
)
)
@@ -54,8 +60,8 @@ async def main():
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -78,7 +84,7 @@ async def main():
)
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
runner = PipelineRunner()

View File

@@ -11,11 +11,16 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -23,8 +28,8 @@ from pipecat.services.minimax.tts import MiniMaxHttpTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -36,17 +41,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -74,8 +82,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
@@ -103,7 +111,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -0,0 +1,136 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamHttpTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
model="saarika:v2.5",
)
tts = SarvamHttpTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
aiohttp_session=session,
params=SarvamHttpTTSService.InputParams(language=Language.EN),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -5,26 +5,31 @@
#
import asyncio
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame, TTSUpdateSettingsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
@@ -36,17 +41,20 @@ transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -54,64 +62,67 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
model="saarika:v2.5",
)
tts = SarvamTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
aiohttp_session=session,
params=SarvamTTSService.InputParams(language=Language.EN),
)
tts = SarvamTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
model="bulbul:v2",
voice_id="manisha",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
# Optionally, you can wait for 30 seconds and then change the voice.
# await asyncio.sleep(30)
# await task.queue_frame(TTSUpdateSettingsFrame(settings={"voice": "anushka"}))
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

View File

@@ -1,149 +0,0 @@
import asyncio
import logging
import os
from typing import Tuple
import aiohttp
from dotenv import load_dotenv
from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators import SentenceAggregator
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
OpenAILLMContextFrame,
)
from pipecat.runner.daily import configure
from pipecat.services.azure import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyTransport
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("pipecat")
logger.setLevel(logging.DEBUG)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Respond bot",
duration_minutes=10,
mic_enabled=True,
mic_sample_rate=16000,
camera_enabled=True,
camera_width=1024,
camera_height=1024,
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
tts1 = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts2 = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
dalle = FalImageGenService(
params=FalImageGenService.InputParams(image_size="1024x1024"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
bot1_messages = [
{
"role": "system",
"content": "You are a stern librarian. You strongly believe that a hot dog is a sandwich. Start by stating this fact in a few sentences, then be prepared to debate this with the user. You shouldn't ever compromise on the fundamental truth that a hot dog is a sandwich. Your responses should only be a few sentences long.",
},
]
bot2_messages = [
{
"role": "system",
"content": "You are a silly cat, and you strongly believe that a hot dog is not a sandwich. Debate this with the user, only responding with a few sentences. Don't ever accept that a hot dog is a sandwich.",
},
]
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
"""This function streams text from the LLM and uses the TTS service to convert
that text to speech as it's received.
"""
source_queue = asyncio.Queue()
sink_queue = asyncio.Queue()
sentence_aggregator = SentenceAggregator()
pipeline = Pipeline([llm, sentence_aggregator, tts1], source_queue, sink_queue)
await source_queue.put(OpenAILLMContextFrame(OpenAILLMContext(messages)))
await source_queue.put(EndFrame())
await pipeline.run_pipeline()
message = ""
all_audio = bytearray()
while sink_queue.qsize():
frame = sink_queue.get_nowait()
if isinstance(frame, TextFrame):
message += frame.text
elif isinstance(frame, AudioFrame):
all_audio.extend(frame.audio)
return (message, all_audio)
async def get_bot1_statement():
message, audio = await get_text_and_audio(bot1_messages)
bot1_messages.append({"role": "assistant", "content": message})
bot2_messages.append({"role": "user", "content": message})
return audio
async def get_bot2_statement():
message, audio = await get_text_and_audio(bot2_messages)
bot2_messages.append({"role": "assistant", "content": message})
bot1_messages.append({"role": "user", "content": message})
return audio
async def argue():
for i in range(100):
print(f"In iteration {i}")
bot1_description = "A woman conservatively dressed as a librarian in a library surrounded by books, cartoon, serious, highly detailed"
(audio1, image_data1) = await asyncio.gather(
get_bot1_statement(), dalle.run_image_gen(bot1_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data1[1], image_data1[2]),
AudioFrame(audio1),
]
)
bot2_description = "A cat dressed in a hot dog costume, cartoon, bright colors, funny, highly detailed"
(audio2, image_data2) = await asyncio.gather(
get_bot2_statement(), dalle.run_image_gen(bot2_description)
)
await transport.send_queue.put(
[
ImageFrame(image_data2[1], image_data2[2]),
AudioFrame(audio2),
]
)
await asyncio.gather(transport.run(), argue())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,170 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import io
import os
import re
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
Frame,
LLMRunFrame,
MetricsFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
def format_metrics(metrics, indent=0):
lines = []
tab = "\t" * indent
for metric in metrics:
lines.append(tab + type(metric).__name__)
for field, value in vars(metric).items():
if hasattr(value, "__dict__") and not isinstance(
value, (str, int, float, bool, type(None))
):
lines.append(f"{tab}\t{field}={type(value).__name__}")
for k, v in vars(value).items():
lines.append(f"{tab}\t\t{k}={repr(v)}")
else:
lines.append(f"{tab}\t{field}={repr(value)}")
return "\n".join(lines)
class MetricsFrameLogger(FrameProcessor):
"""MetricsFrameLogger formats and logs all MetericsFrames"""
def __init__(self, **kwargs):
super().__init__(**kwargs)
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, MetricsFrame):
logger.info(f"{frame.name}\n {format_metrics(frame.data)}")
await self.push_frame(frame, direction)
# ALWAYS push all frames
else:
# SUPER IMPORTANT: always push every frame!
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
metrics_frame_processor = MetricsFrameLogger()
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
metrics_frame_processor, # pretty print metrics frames
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -22,7 +22,7 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)

View File

@@ -24,8 +24,8 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport, maybe_capture_participant_camera
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)

Some files were not shown because too many files have changed in this diff Show More