Compare commits

...

596 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
228afe01ed Merge pull request #1993 from pipecat-ai/aleix/pipecat-0.0.71
update CHANGELOG for 0.0.71
2025-06-10 14:42:09 -07:00
Aleix Conchillo Flaqué
61a5154e49 update CHANGELOG for 0.0.71 2025-06-10 14:34:30 -07:00
Sunah Suh
d3df75aaa0 Add additional_span_attributes param to PipelineTask for extra otel… (#1992) 2025-06-10 17:32:24 -04:00
Aleix Conchillo Flaqué
c59180dd6e udpate CHANGELOG 2025-06-10 14:23:02 -07:00
Mark Backman
e4c2310632 Merge pull request #1990 from pipecat-ai/mb/more-cartesia-stt
Add Cartesia STT docs link to README, fix set_model error
2025-06-10 17:19:11 -04:00
Aleix Conchillo Flaqué
e1735a2da1 Merge pull request #1991 from pipecat-ai/aleix/pipecat-0.0.70
update CHANGELOG for 0.0.70
2025-06-10 14:08:52 -07:00
Aleix Conchillo Flaqué
c101c9c8e1 update CHANGELOG for 0.0.70 2025-06-10 13:37:28 -07:00
Aleix Conchillo Flaqué
96dc162de5 Merge pull request #1988 from pipecat-ai/aleix/update-examples-22b
examples(22b): remove unnecessary parallel pipeline branch
2025-06-10 12:58:37 -07:00
Mark Backman
257dbe3104 Fix model param error 2025-06-10 15:14:47 -04:00
Mark Backman
cd98657e3c Add Cartesia STT docs link to README 2025-06-10 15:09:13 -04:00
Aleix Conchillo Flaqué
03eb22fe0a examples(22b): remove unnecessary parallel pipeline branch 2025-06-10 09:05:58 -07:00
Mark Backman
0bb61d72ab Merge pull request #1984 from pipecat-ai/mb/cartesia-stt-cleanup
CartesiaSTTService cleanup
2025-06-10 10:30:18 -04:00
Mark Backman
f758508a82 Merge pull request #1978 from pipecat-ai/mb/rime-languages
Add languages to RimeHttpTTSService, extend lang support to German an…
2025-06-10 10:27:15 -04:00
Mark Backman
69d0218d7e Add languages to RimeHttpTTSService, extend lang support to German and French 2025-06-10 10:20:41 -04:00
Aleix Conchillo Flaqué
eb5e5ab1df update CHANGELOG 2025-06-09 20:22:39 -07:00
Aleix Conchillo Flaqué
093697906c Merge pull request #1954 from WebinarGeek/wg/gladia-informal-translations
Gladia informal translations
2025-06-09 20:21:40 -07:00
Aleix Conchillo Flaqué
efe96b7ed1 Merge pull request #1986 from pipecat-ai/aleix/daily-python0.19.2
pyproject: update daily-python to 0.19.2
2025-06-09 19:46:14 -07:00
Aleix Conchillo Flaqué
7ecdd41ab9 pyproject: update daily-python to 0.19.2 2025-06-09 17:29:07 -07:00
Mark Backman
aec70d61e9 CartesiaSTTService cleanup 2025-06-09 15:20:57 -04:00
Mark Backman
2efac13344 Merge pull request #1983 from pipecat-ai/mb/exotel-resampling
Resample audio in ExotelFrameSerializer
2025-06-09 14:41:08 -04:00
Mark Backman
15aeb11c36 Resample audio in ExotelFrameSerializer 2025-06-09 14:02:25 -04:00
Mark Backman
e705f4d984 Merge pull request #1972 from Vaibhav159/vl_add_exotel_serializer
adding exotel serializer
2025-06-09 13:54:26 -04:00
Shrey Gupta
96fa62fdfe [Add] Support for Cartesia AI STT (#1982) 2025-06-09 14:51:01 -03:00
Mark Backman
845c70797a Merge pull request #1975 from pipecat-ai/mb/11labs-flush-context-reset
fix: ElevenLabsTTSService reset context when flushing audio
2025-06-09 13:21:25 -04:00
kompfner
16048956c3 Merge pull request #1956 from pipecat-ai/pk/make-add-observer-sync
Make `PipelineTask.add_observer()` synchronous. This allows callers t…
2025-06-09 13:19:34 -04:00
Mark Backman
cf2f4b5902 fix: ElevenLabsTTSService reset context when flushing audio 2025-06-09 13:17:55 -04:00
marcus-daily
db46f33f34 Update to Android transports 0.3.7 2025-06-09 17:09:59 +01:00
Aleix Conchillo Flaqué
25d1515daf Merge pull request #1979 from pipecat-ai/aleix/buffer-tts-before-playback
buffer audio from TTS service before pushing frames
2025-06-09 08:43:55 -07:00
Paul Kompfner
a3469cd59f Add CHANGELOG entry describing PipelineTask.add_observer() being made synchronous 2025-06-09 11:37:30 -04:00
Paul Kompfner
513ce26200 Add unit test exercising synchronous usage of PipelineTask.add_observer() right after initializing the PipelineTask (before anything else is done with it) 2025-06-09 11:30:24 -04:00
Paul Kompfner
1cd96f94ff Make PipelineTask.add_observer() synchronous. This allows callers to call it before run()ning the PipelineTask first. Without this change, if they tried to do that, they would get an error because the TaskManager's event loop hadn't been set yet. 2025-06-09 11:30:24 -04:00
Aleix Conchillo Flaqué
901dd041f0 buffer audio from TTS service before pushing frames 2025-06-09 07:29:09 -07:00
Vaibhav159
a2ee94651e removing resampling 2025-06-07 12:53:55 +05:30
Aleix Conchillo Flaqué
abdce063f1 Merge pull request #1973 from pipecat-ai/aleix/assemblyai-yield-none
AssemblyAISTTService: yield None instead of Frame()
2025-06-06 15:12:16 -07:00
Aleix Conchillo Flaqué
a33ce5e4bf AssemblyAISTTService: yield None instead of Frame() 2025-06-06 14:41:01 -07:00
Filipi da Silva Fuchter
c9575eaef9 Merge pull request #1911 from pipecat-ai/filipi/websocket_transport_example
Adding support to WebsocketTransport
2025-06-06 17:25:07 -03:00
Filipi Fuchter
1e74476a71 Refactoring to use the observer inside the pipelinetask, and moving to start the bot inside on_client_ready. 2025-06-06 17:22:50 -03:00
Filipi Fuchter
82935884c4 Mentioning the new websocket example in the changelog. 2025-06-06 17:17:11 -03:00
Filipi Fuchter
d774a23768 Improving the readme to mention that can choose which server websocket to use. 2025-06-06 17:12:05 -03:00
Filipi Fuchter
e9f041e170 Removing the old websocket-server example 2025-06-06 17:09:01 -03:00
Filipi Fuchter
1f51b6e4f1 A Pipecat example demonstrating how to use WebsocketTransport 2025-06-06 17:08:43 -03:00
Filipi Fuchter
028650249c Adding support in ProtobufFrameSerializer to deserialize MessageFrame. 2025-06-06 17:07:39 -03:00
Vaibhav159
534197239f updating changelog 2025-06-07 00:24:54 +05:30
Vaibhav159
d2f4bb574c adding exotel serializer 2025-06-07 00:22:41 +05:30
Aleix Conchillo Flaqué
07fb1a2c39 Merge pull request #1967 from counterleft/unused-http-client-session
Remove instantiation of unused http client session from certain examples
2025-06-05 12:59:01 -07:00
Aleix Conchillo Flaqué
581b800c43 Merge pull request #1961 from ken-kuro/patch-1
fix(piper-tts): typo
2025-06-05 12:57:58 -07:00
Aleix Conchillo Flaqué
30ca39287f Merge pull request #1962 from ken-kuro/patch-2
fix(fastapi_websocket): typo
2025-06-05 12:57:22 -07:00
Aleix Conchillo Flaqué
01fa9698de Merge pull request #1960 from pipecat-ai/aleix/disable-uvloop
disable uvloop by default and just let the user set it
2025-06-05 12:12:47 -07:00
Brian Mathiyakom
10bd969636 Remove instantiation of unused http client session
These examples don't make any HTTP requests with `session` so there
doesn't seem be a need to create one in the first place. Probably a
copy-paste from a previous example.
2025-06-05 11:45:13 -07:00
Kendrick Ha
f7761f2b61 fix(fastapi_websocket): typo 2025-06-05 13:55:28 +07:00
Kendrick Ha
49ff38a21f fix(piper-tts): typo 2025-06-05 13:50:56 +07:00
Aleix Conchillo Flaqué
8d161306c7 disable uvloop by default and just let the user set it 2025-06-04 21:25:06 -07:00
Vanessa Pyne
027a82dff1 Merge pull request #1958 from pipecat-ai/vp-livekit-fix
fix: transports/services/livekit.py typo
2025-06-04 12:27:47 -05:00
vipyne
cb409d58e0 fix: transports/services/livekit.py typo 2025-06-04 11:14:21 -05:00
Dan Berg
094e2f8151 Fix formatting 2025-06-03 17:21:51 +02:00
Dan Berg
71d121aeb9 Update CHANGELOG.md explaining informal on Gladia TranslationConfig 2025-06-03 17:15:29 +02:00
Dan Berg
b1a88af43c Add informal to Gladia TranslationConfig 2025-06-03 17:10:52 +02:00
Filipi da Silva Fuchter
f73eb4ebd9 Merge pull request #1949 from pipecat-ai/filipi/transport_destination_issue
Fixed transport destination issue
2025-06-03 08:41:34 -03:00
Filipi Fuchter
31ca9be299 Fixing missing await to self.reset. 2025-06-03 08:37:47 -03:00
Filipi Fuchter
1642c082d1 Describing the fix in the changelog. 2025-06-02 22:28:31 -03:00
Filipi Fuchter
892d213442 Fixing issue to keep the transport_destination. 2025-06-02 22:16:10 -03:00
Filipi Fuchter
fc24267e09 Waiting for the LLM response to reset. 2025-06-02 22:15:53 -03:00
Aleix Conchillo Flaqué
9b71bdc608 Merge pull request #1947 from pipecat-ai/aleix/pipecat-0.0.69
update CHANGELOG for 0.0.69
2025-06-02 12:51:51 -07:00
Aleix Conchillo Flaqué
310be89895 update CHANGELOG for 0.0.69 2025-06-02 12:07:50 -07:00
Aleix Conchillo Flaqué
71fbd57e12 Merge pull request #1938 from pipecat-ai/aleix/custom-interruption-strategies
allow custom interruption strategies
2025-06-02 12:05:50 -07:00
Aleix Conchillo Flaqué
ab4b48c823 examples(04a): fix daily_runner import 2025-06-02 12:01:26 -07:00
Aleix Conchillo Flaqué
532767cfa1 LLMUserContextAggregator: reset strategies when reseting the aggregator 2025-06-02 12:01:26 -07:00
Aleix Conchillo Flaqué
5512de3221 allow custom interruption strategies 2025-06-02 12:01:26 -07:00
Mark Backman
13546d5e8f Merge pull request #1946 from pipecat-ai/mb/fix-11labs-context
fix: Use AudioContextWordTTSService context methods in ElevenLabsTTSS…
2025-06-02 14:55:49 -04:00
Mark Backman
c6f1aa8086 fix: Use AudioContextWordTTSService context methods in ElevenLabsTTSService 2025-06-02 14:49:05 -04:00
Mark Backman
5606c47cb7 Merge pull request #1945 from pipecat-ai/mb/gemini-2.0
Reverting Gemini Live model back to gemini-2.0-flash-live-001
2025-06-02 14:25:30 -04:00
Filipi da Silva Fuchter
7f7cd96211 Merge pull request #1944 from pipecat-ai/fixing_tavus_transport
Adding the direction when pushing the frame.
2025-06-02 15:21:58 -03:00
Filipi Fuchter
b828bfd890 Adding the direction when pushing the BotStartedSpeaking and BotStoppedSpeaking frames. 2025-06-02 15:05:56 -03:00
Mark Backman
31d084eb78 Reverting Gemini Live model back to gemini-2.0-flash-live-001 2025-06-02 13:29:05 -04:00
Mark Backman
ab18b280e9 Merge pull request #1943 from pipecat-ai/mb/add-transcription-19-openai
Add a TranscriptProcessor to 19-openai-realtime-beta.py
2025-06-02 13:01:52 -04:00
Mark Backman
24e89c4081 Merge pull request #1936 from pipecat-ai/mb/fix-01-quickstart
Add daily to the foundational examples requirements.txt
2025-06-02 12:55:36 -04:00
Mark Backman
e129390f56 Add a TranscriptProcessor to 19-openai-realtime-beta.py 2025-06-02 11:38:49 -04:00
Mark Backman
4d7c87bb4c Merge pull request #1941 from pipecat-ai/vp-observer-up
chore: move observers arg in p2p-webrtc example; add deprecated to in…
2025-06-02 11:17:48 -04:00
Mark Backman
dac3f82a75 Merge pull request #1934 from counterleft/use-disconnect-on-small-webrtc-connection-example
Fix type checker error with missing function call in the small WebRTC transport example
2025-06-02 11:17:30 -04:00
Mark Backman
fd860921f1 Add daily to the foundational examples requirements.txt 2025-06-02 11:10:49 -04:00
vipyne
0482ccd48b chore: move observers arg in p2p-webrtc example; add deprecated to in line comments 2025-06-02 09:41:09 -05:00
Dominic Stewart
b8b1990617 Fix example env file (#1939)
* Fixed typo in the example env files
2025-06-02 15:12:43 +09:00
Dominic Stewart
70951b1198 Add simplified pstn examples (#1822)
* Add simplified pstn examples
* Add daily_twilio_sip_dial_out example
2025-06-02 14:50:21 +09:00
Mark Backman
6d24514ace Merge pull request #1937 from pipecat-ai/mb/fix-message-for-logging
fix: correctly display non-roman characters
2025-06-01 12:49:48 -04:00
Aleix Conchillo Flaqué
49915ceb84 Merge pull request #1683 from pipecat-ai/aleix/run-function-calls-sequentially
run function calls sequentially or in parallel
2025-06-01 09:47:35 -07:00
Mark Backman
925b13e337 fix: correctly display non-roman characters 2025-06-01 12:29:26 -04:00
Aleix Conchillo Flaqué
ef3143d558 LLMService: don't run function calls if none are given 2025-05-31 14:01:56 -07:00
Mark Backman
ed84637b55 Add additional function call for testing to 14e, 14r, 19, 19a, 26b 2025-05-31 13:57:52 -07:00
Aleix Conchillo Flaqué
897a944478 examples(14,14a): add restaurant recommendation function call 2025-05-31 13:57:52 -07:00
Aleix Conchillo Flaqué
d86343c38d examples: update to use on_function_calls_started event 2025-05-31 13:57:52 -07:00
Aleix Conchillo Flaqué
297afdd126 LLMService: add new FunctionCallsStartedFrame 2025-05-31 13:57:52 -07:00
Aleix Conchillo Flaqué
f0cbdc4e68 LLMService: add on_function_calls_started event 2025-05-31 13:57:52 -07:00
Aleix Conchillo Flaqué
40b52cadde LLMService: s/FunctionCallLLM/FunctionCallFromLLM/ s/FunctionCallRunner/FunctionCallRunnerItem/ 2025-05-31 13:57:52 -07:00
Aleix Conchillo Flaqué
04bf85ddfe LLMService: allow executing tasks sequentially and in parallel 2025-05-31 13:57:52 -07:00
Aleix Conchillo Flaqué
4809684a13 LLMService: cancel function calls on interruptions by default 2025-05-31 13:57:50 -07:00
Aleix Conchillo Flaqué
1eb50ad88f LLMService: pass LLM function calls all at once 2025-05-31 13:57:36 -07:00
Aleix Conchillo Flaqué
52569bcdb2 LLMService: don't allow running functions concurrently for now 2025-05-31 13:57:36 -07:00
Aleix Conchillo Flaqué
a50a407415 LLMService: run function calls sequentially 2025-05-31 13:57:35 -07:00
Aleix Conchillo Flaqué
9f223442c2 Merge pull request #1935 from pipecat-ai/aleix/script-evals
improve evals
2025-05-30 19:14:59 -07:00
Mark Backman
c647114bb9 Merge pull request #1927 from pipecat-ai/mb/gemini-tracing 2025-05-30 22:01:44 -04:00
Mark Backman
43719ec737 Update CHANGELOG 2025-05-30 21:36:39 -04:00
Mark Backman
8602557985 Refactor Gemini tracing to more closely match OpenAI Realtime, add TTFB metrics 2025-05-30 21:36:00 -04:00
Mark Backman
dd1f7d0875 Add tracking to OpenAI Realtime 2025-05-30 21:36:00 -04:00
Mark Backman
ec39e794d3 _handle_transcription 2025-05-30 21:36:00 -04:00
Mark Backman
7b1a937d4c Add tracing for Gemini Live 2025-05-30 21:36:00 -04:00
Mark Backman
0fd38d8115 Merge pull request #1931 from pipecat-ai/mb/num-words
Add support for interruption strategies
2025-05-30 21:14:02 -04:00
Mark Backman
7a4efc6212 Code review feedback 2025-05-30 21:09:15 -04:00
Brian Mathiyakom
2eb2c5a413 Use disconnect() because close() doesn't exist
SmallWebRTCConnection doesn't have a `close()`. There's a `_close()` but I assume that's private due to its naming. The closest function that uses `_close()` is `disconnect()`. I assume then, that the intended resource freeing function call should be to `disconnect()`.
2025-05-30 17:14:53 -07:00
Aleix Conchillo Flaqué
2fcfb0aa9f evals: don't use Deepgram's smart formatting 2025-05-30 16:55:55 -07:00
Aleix Conchillo Flaqué
f1df079512 evals: allow running a single eval 2025-05-30 16:55:55 -07:00
Aleix Conchillo Flaqué
d77bedbafb evals: move scripts/release to script/evals and add README 2025-05-30 15:04:05 -07:00
Mark Backman
b34c593c54 Add changelog entry 2025-05-30 16:48:42 -04:00
Mark Backman
62efbc3342 Add foundational example 42 2025-05-30 16:48:42 -04:00
Mark Backman
2d609a0bde Update LLmUserContextAggregator to conditionally push_aggregation 2025-05-30 16:48:42 -04:00
Mark Backman
6bc4b4a17f Update BaseInputTransport to not push StartInterruptionFrame when InterruptionConfig is set and bot is speaking 2025-05-30 16:48:42 -04:00
Mark Backman
b489e52080 Add InterruptionConfig 2025-05-30 16:15:20 -04:00
Aleix Conchillo Flaqué
a8aaeec52b Merge pull request #1926 from pipecat-ai/aleix/pause-base-input-transport
handle StopFrame in base input transport and stop pushing frames
2025-05-30 08:27:20 -07:00
Mark Backman
ad7eec181e Merge pull request #1923 from philipp-eisen/philipp/fix-ttfb_ms-implementation
Fix implementation of ttfb_ms metric
2025-05-30 11:03:42 -04:00
Aleix Conchillo Flaqué
b33897ffb9 SmallWebRTCTransport: don't initialize if a second StartFrame is received 2025-05-30 07:51:51 -07:00
Aleix Conchillo Flaqué
1c3d3f2f4b WebsocketClientTransport: don't initialize if a second StartFrame is received 2025-05-30 07:51:51 -07:00
Aleix Conchillo Flaqué
9a5a1edb6b WebsocketServerTransport: don't initialize if a second StartFrame is received 2025-05-30 07:51:51 -07:00
Aleix Conchillo Flaqué
f2eb869b02 FastAPIWebsocketTransport: don't initialize if a second StartFrame is received 2025-05-30 07:51:51 -07:00
Aleix Conchillo Flaqué
0c7e3cfcb2 LiveKitTransport: don't initialize if a second StartFrame is received 2025-05-30 07:51:51 -07:00
Aleix Conchillo Flaqué
24e19db29e TavusTransport: don't initialize if a second StartFrame is received 2025-05-30 07:51:51 -07:00
Aleix Conchillo Flaqué
bc6d7b7bbd examples(phone-chatbot): don't show dangling tasks in first pipeline 2025-05-30 07:51:51 -07:00
Aleix Conchillo Flaqué
cad271068e BaseInputTransport: handle StopFrame and pause pushing frames 2025-05-30 07:51:49 -07:00
Philipp Eisen
3425293115 Ensure correct formatting 2025-05-30 15:41:45 +02:00
Mark Backman
20dbfec3a9 Merge pull request #1876 from m-ods/m-ods/assemblyai-universal-streaming
Update AssemblyAI Streaming STT
2025-05-30 08:55:43 -04:00
marcus-daily
170057a75a Updating dependency version 2025-05-30 12:38:32 +01:00
marcus-daily
b86b761e0b Fixing Yaml syntax 2025-05-30 12:38:32 +01:00
marcus-daily
da0d2f0266 Small WebRTC transport demo app for Android 2025-05-30 12:38:32 +01:00
Martin Schweiger
321ea27c34 changelog entry 2025-05-30 17:15:58 +08:00
Philipp Eisen
b712e6b9aa Switch ttfb metric to use seconds instead 2025-05-30 11:13:26 +02:00
Martin Schweiger
b3652e6527 set vad_force_turn_endpoint to True by default 2025-05-30 17:07:54 +08:00
Philipp Eisen
bc97f397ef Switch to using float instead of int, but keep ms 2025-05-30 10:27:20 +02:00
Martin Schweiger
e5da3f6e68 add tracing 2025-05-30 10:55:19 +08:00
Martin Schweiger
8400539acf set formatted_finals true by default 2025-05-30 10:40:01 +08:00
Martin Schweiger
b5eac8dfed add message to TranscriptionFrame result 2025-05-30 10:39:07 +08:00
Martin Schweiger
ba312b5591 Merge branch 'main' into m-ods/assemblyai-universal-streaming 2025-05-30 10:29:49 +08:00
Martin Schweiger
f23572b318 Merge branch 'main' into m-ods/assemblyai-universal-streaming 2025-05-30 10:11:02 +08:00
Martin Schweiger
db838634e7 fix: double finals bug 2025-05-30 10:00:31 +08:00
Martin Schweiger
7f2e848a5c use FrameProcessor class methods 2025-05-30 09:40:46 +08:00
Martin Schweiger
096e854d50 remove .dict() 2025-05-30 09:31:20 +08:00
Martin Schweiger
3ffe8b3155 remove parse_obj 2025-05-30 09:29:51 +08:00
Mark Backman
a471f49b61 Merge pull request #1889 from pipecat-ai/mb/add-dtmf-aggregator
Add DTMFAggregator
2025-05-29 17:13:28 -04:00
Mark Backman
4d2a02f318 Refactor to create task on StartFrame, also cleanup 2025-05-29 17:10:54 -04:00
Mark Backman
0bec7db03b Emit a BotInterruptionFrame when the first keypress of a sequence is received 2025-05-29 17:07:18 -04:00
Mark Backman
74827f983f Add tests, improve frame timing 2025-05-29 17:07:18 -04:00
Mark Backman
0ed46f457e Add DTMFAggregator 2025-05-29 17:07:17 -04:00
Aleix Conchillo Flaqué
36b731be73 Merge pull request #1915 from pipecat-ai/aleix/uvloop-event-loop
use uvloop as the new event loop on Linux and macOS
2025-05-29 14:06:44 -07:00
Aleix Conchillo Flaqué
62fbdd4e81 Merge pull request #1922 from pipecat-ai/aleix/output-transport-cleanup
output transports cleanup
2025-05-29 14:06:17 -07:00
Aleix Conchillo Flaqué
ca7b0650c2 examples: capture camera or screen. allow setting framerate 2025-05-29 13:16:44 -07:00
Aleix Conchillo Flaqué
67dd146038 output transports cleanup 2025-05-29 13:16:44 -07:00
Aleix Conchillo Flaqué
fb66df2efd use uvloop as the new event loop on Linux and macOS 2025-05-29 11:24:21 -07:00
Aleix Conchillo Flaqué
2395ca0057 Merge pull request #1921 from pipecat-ai/aleix/transcription-frame-result
add STT result field to TranscriptionFrame/InterimTranscriptionFrame
2025-05-29 11:22:38 -07:00
Aleix Conchillo Flaqué
d203789490 add STT result field to TranscriptionFrame/InterimTranscriptionFrame 2025-05-29 11:21:44 -07:00
Aleix Conchillo Flaqué
7ea0e31cd4 Merge pull request #1924 from pipecat-ai/aleix/new-examples-package
add new pipecat.examples package and make runner public
2025-05-29 11:18:22 -07:00
Mark Backman
d3bf13a503 Merge pull request #1917 from pipecat-ai/mb/fix-twilio-chatbot-client
fix: update websocket_client to use FrameProcessorSetup.task_manager
2025-05-29 14:05:27 -04:00
Mark Backman
ea91970499 Update the twilio-chatbot client to work with the updated server, which requires call_sid 2025-05-29 14:02:49 -04:00
Mark Backman
803b3f2cc4 fix: update websocket_client to use FrameProcessorSetup 2025-05-29 14:02:47 -04:00
Aleix Conchillo Flaqué
1788ba6c5c add new pipecat.examples package and make runner public 2025-05-29 10:43:12 -07:00
Philipp Eisen
5209bd3d9f Fix implemetation of ttfb_ms metric 2025-05-29 19:24:04 +02:00
Aleix Conchillo Flaqué
cb9178f1ec Merge pull request #1914 from pipecat-ai/aleix/base-output-daily-handle-dtmf-frames
output transport can now handle DTMF keypress
2025-05-29 09:35:01 -07:00
Aleix Conchillo Flaqué
5676920a6a BaseOutputTransport/DailyTransport: allow sending DTMF keypress 2025-05-29 09:33:29 -07:00
Aleix Conchillo Flaqué
513221d9fd added OutputDTMFUrgentFrame 2025-05-29 09:32:57 -07:00
Mark Backman
a33d0b4b53 Merge pull request #1904 from pipecat-ai/mb/aws-bedrock-no-op-tool
Add no_op tool to AWSBedrockLLMService
2025-05-29 10:29:19 -04:00
Mark Backman
bee242b781 Safer check when using the no_operation tool 2025-05-29 10:25:20 -04:00
Mark Backman
fa1c98ff29 Add no_op tool to AWSBedrockLLMService 2025-05-29 10:25:19 -04:00
Mark Backman
ae3a7d9bed Merge pull request #1920 from alexflorensa/alexflorensa/set-deepgram-model-name
fix(deepgram-stt): set model name to Deepgram STT
2025-05-29 09:41:56 -04:00
alexflorensa
0c2efb312c fix(deepgram-stt) Set model name to Deepgram STT 2025-05-29 12:08:16 +02:00
Aleix Conchillo Flaqué
cf8eeaab0b Merge pull request #1909 from pipecat-ai/aleix/daily-expose-transcription-functions
DailyTransport: expose start_transcription/stop_transcription
2025-05-28 17:06:40 -07:00
Vanessa Pyne
2f8cb3ce76 Merge pull request #1804 from pipecat-ai/vp-text-webrtc-chatbot
add text chatbot example using small webrtc transport
2025-05-28 14:25:42 -05:00
vipyne
821da723c0 update SmallWebRTCTransport text examples with new run_example 2025-05-28 13:54:45 -05:00
Filipi Fuchter
575b97ba60 Some improvements and cleanups in the SmallWebRTCTransport text examples. 2025-05-28 13:54:45 -05:00
vipyne
cc0819b709 examples: add text and audio over webrtc transport
update filenames

add high level comments to 41* examples
2025-05-28 13:54:45 -05:00
vipyne
318d6f042b examples: add text chatbot example using small webrtc transport
examples: send small webrtc UI updates in RTVIServerMessageFrame

add explanation about RTVI server messages being specific to
small-webrtc-prebuilt UI
2025-05-28 13:54:45 -05:00
Aleix Conchillo Flaqué
05ae3a1703 DailyTransport: expose start_transcription/stop_transcription 2025-05-28 11:21:26 -07:00
Aleix Conchillo Flaqué
8e54805e62 Merge pull request #1908 from pipecat-ai/aleix/add-manifest-file-to-reduce-sdist
add MANIFEST.in to reduce sdist tarball size
2025-05-28 10:15:38 -07:00
Aleix Conchillo Flaqué
64399a72f3 add MANIFEST.in to reduce sdist tarball size 2025-05-28 10:09:39 -07:00
Aleix Conchillo Flaqué
6c33f0b0bd Merge pull request #1907 from pipecat-ai/aleix/pipecat-0.0.68
update CHANGELOG for pipecat 0.0.68
2025-05-28 09:41:34 -07:00
Aleix Conchillo Flaqué
aca304b395 update CHANGELOG for pipecat 0.0.68 2025-05-28 09:08:08 -07:00
Aleix Conchillo Flaqué
db5c9e67be Merge pull request #1906 from pipecat-ai/aleix/daily-python-0.19.1
pyproject: update daily-python to 0.19.1
2025-05-28 09:05:13 -07:00
Aleix Conchillo Flaqué
2313cec792 pyproject: update daily-python to 0.19.1 2025-05-28 00:36:40 -07:00
Aleix Conchillo Flaqué
83acaf692a Merge pull request #1890 from pipecat-ai/aleix/examples-multi-transport
add support for multiple transports to foundational examples
2025-05-28 00:27:01 -07:00
Aleix Conchillo Flaqué
e9aeb2662b scripts: allow specifying a name for the test run 2025-05-28 00:22:55 -07:00
Aleix Conchillo Flaqué
356f4039e4 scripts: allow storing logs for release evals 2025-05-27 21:10:22 -07:00
Aleix Conchillo Flaqué
736c7f1f30 scripts: allow storing audio for release evals 2025-05-27 18:09:25 -07:00
Aleix Conchillo Flaqué
2994448036 introduce release evals
This is an initial attempt to implement evals for all (or most) of our
foundational examples. Before we release, we want to make sure all of them work
and reply properly. Until now this has been done manually, hopefully this will
be useful to speed up our release process.
2025-05-27 17:42:52 -07:00
Aleix Conchillo Flaqué
d476d9ea05 examples: remove "on_client_closed"
This has been replaced for "on_client_disconnected" in SmallWebRTCTransport to
match other transports and therefore it is not necessary anymore.
2025-05-27 17:42:52 -07:00
Filipi Fuchter
bf31bce440 Updated SmallWebRTCTransport to align with how other transports handle on_client_disconnected 2025-05-27 17:42:52 -07:00
Aleix Conchillo Flaqué
6393e89022 examples(foundational): update handle_signint depending on transport 2025-05-27 17:42:52 -07:00
Aleix Conchillo Flaqué
884268fce3 examples(foundational): allow running examples with twilio 2025-05-27 17:42:52 -07:00
Aleix Conchillo Flaqué
071a9307c9 transport(websocket): do not require a frame serializer 2025-05-27 17:42:52 -07:00
Aleix Conchillo Flaqué
2cdfaa0a82 examples(foundational): support multiple transports 2025-05-27 17:42:52 -07:00
Aleix Conchillo Flaqué
ecf878e14d DailyTransport: allow requesting video frames with any framerate 2025-05-27 17:42:50 -07:00
Aleix Conchillo Flaqué
4eed335bc7 PipelineTask: check if pipeline has already been cancelled 2025-05-27 17:40:39 -07:00
Aleix Conchillo Flaqué
2e57bb74d2 BaseObject: do not raise exception if event handler not registered 2025-05-27 17:40:35 -07:00
Aleix Conchillo Flaqué
0a39769cd0 Merge pull request #1901 from pipecat-ai/aleix/deepgram-sdk-4.1.0
pyproject: update deepgram-sdk to 4.1.0
2025-05-27 17:20:54 -07:00
Aleix Conchillo Flaqué
bdb6a9e5d1 Merge pull request #1903 from pipecat-ai/aleix/openpipe-4.50.0
pyproject: update openpipe to 4.50.0
2025-05-27 17:20:30 -07:00
Aleix Conchillo Flaqué
f88e0eb96d pyproject: update openpipe to 4.50.0 2025-05-27 15:22:55 -07:00
Aleix Conchillo Flaqué
0099f60d29 deepgram: fix an issue with user provided LiveOptions 2025-05-27 15:19:17 -07:00
Filipi da Silva Fuchter
eaf9f20c56 Merge pull request #1898 from pipecat-ai/tavus_transport_fixes
Fixing TavusTransport with some TTS services.
2025-05-27 16:33:19 -03:00
Aleix Conchillo Flaqué
e987c4741a pyproject: update deepgram-sdk to 4.1.0 2025-05-27 10:32:48 -07:00
Aleix Conchillo Flaqué
6242278abd Merge pull request #1895 from pipecat-ai/aleix/more-avoid-mutable-default-values
more avoiding mutable default constructor values
2025-05-27 10:08:42 -07:00
Mark Backman
30850a431a Merge pull request #1886 from pipecat-ai/mb/add-otel-llm-output
Add LLM response tracing to OTel tracing
2025-05-27 11:58:57 -04:00
Mark Backman
1e7407c042 Merge pull request #1899 from pipecat-ai/mb/genai-to-standard-messages
fix: Update GoogleLLMContext to_standard_messages to be compatible with google-genai
2025-05-27 11:57:44 -04:00
Mark Backman
6d94f31ff2 Update foundational example 33 to work with google-genai 2025-05-27 11:20:21 -04:00
Mark Backman
ebb3d1cfd3 fix: Update GoogleLLMContext to_standard_messages to be compatible with google-genai 2025-05-27 11:20:21 -04:00
Filipi Fuchter
acce9489d7 Creating the silence based on the chunk size. 2025-05-27 11:26:34 -03:00
Filipi Fuchter
3d442620f9 Removing not used imports. 2025-05-27 10:42:06 -03:00
Martin Schweiger
f1d7eb8565 final touches 2025-05-27 21:29:50 +08:00
Filipi Fuchter
798b935ff6 The remaining audio should not be sent as done. 2025-05-27 10:28:40 -03:00
Filipi Fuchter
3039a1444e Refactoring the TavusVideoService to match the same the behavior of the bot started speaking and bot stopped speaking. 2025-05-27 10:26:41 -03:00
Mark Backman
aa7d15beb3 fix: move LLM call outside tracing try block to prevent double execution 2025-05-26 22:06:31 -04:00
Filipi Fuchter
2b3d2cb342 Fixing the issue when using the TavusVideoService with some TTS services. 2025-05-26 22:55:20 -03:00
Filipi Fuchter
5a58357429 Fixing the issue when using the TavusTransport with some TTS services. 2025-05-26 18:34:52 -03:00
Mark Backman
366add2536 Merge pull request #1878 from pipecat-ai/mb/add-plivo-serializer
Add PlivoFrameSerializer
2025-05-26 11:07:13 -04:00
Mark Backman
e13c9fd42e Add PlivoFrameSerializer 2025-05-26 11:00:01 -04:00
Mark Backman
2a6c01f634 Merge pull request #1885 from pipecat-ai/mb/update-langfuse-example 2025-05-26 10:14:02 -04:00
Mark Backman
bf29722e78 Merge pull request #1884 from pipecat-ai/mb/readme-otel-twilio-telnyx 2025-05-26 10:13:23 -04:00
Mark Backman
db227ad15f Merge pull request #1897 from pipecat-ai/mb/fix-websocket-client-example
Fix mismatched html tag in websocket client example
2025-05-26 09:26:53 -04:00
Mark Backman
514716042b Fix mismatched html tag in websocket client example 2025-05-26 08:25:03 -04:00
Aleix Conchillo Flaqué
7a767e680c more avoiding mutable default constructor values 2025-05-25 21:00:35 -07:00
Martin Schweiger
320b52eb1e Merge branch 'm-ods/assemblyai-universal-streaming' of https://github.com/m-ods/pipecat into m-ods/assemblyai-universal-streaming 2025-05-26 09:13:08 +08:00
Martin Schweiger
428cee75c5 Add User-Agent header to AssemblyAI websocket connection 2025-05-26 09:10:55 +08:00
Martin Schweiger
5479a55b2c Add websockets dependency to assemblyai extra 2025-05-26 09:08:56 +08:00
Mark Backman
d1f2a5d04f Merge pull request #1868 from aristid/google-streaming-tts
Add Google streaming TTS as a base TTS service
2025-05-24 12:42:47 -04:00
aristid
09ba319f3e Merge branch 'main' into google-streaming-tts 2025-05-24 17:16:22 +02:00
Mark Backman
6f524fb816 Add LLM response to OTel tracing 2025-05-24 09:15:39 -04:00
unknown
d3e2a9e5c0 Change default voice and fix formatting 2025-05-24 15:14:39 +02:00
Mark Backman
b4cd7d7941 Langfuse OTel env.example improvements 2025-05-24 08:17:10 -04:00
Mark Backman
cd03b91115 Update README: Add OTel, Add serializers 2025-05-24 07:23:59 -04:00
Aleix Conchillo Flaqué
f86d002ceb Merge pull request #1881 from pipecat-ai/aleix/daily-input-audio-and-video-task-fix
daily input audio and video task fix
2025-05-23 19:39:25 -07:00
Aleix Conchillo Flaqué
940926b5ec TavusVideoService: no need to enable audio/video outputs 2025-05-23 19:29:34 -07:00
Aleix Conchillo Flaqué
85c096df0b DailyTransport: create audio/video input tasks when input flag is enabled 2025-05-23 19:28:18 -07:00
Filipi da Silva Fuchter
76d93522ac Merge pull request #1820 from pipecat-ai/tavus_video_service
Tavus improvements
2025-05-23 23:11:00 -03:00
Filipi Fuchter
31492831cc Updating the changelog and readme to reflect the Tavus changes. 2025-05-23 23:04:04 -03:00
Filipi Fuchter
8221dd594e Creating a Tavus example using the DailyTransport. 2025-05-23 23:03:40 -03:00
Filipi Fuchter
6346ca1a84 Creating a Tavus example using the SmallWebRTCTransport. 2025-05-23 23:03:24 -03:00
Filipi Fuchter
4a3404883f Creating a Tavus example using the new TavusTransport. 2025-05-23 23:03:16 -03:00
Filipi Fuchter
1ebca35313 Queuing the app messages if the SmallWebrtcTransport is not connected yet. 2025-05-23 23:03:04 -03:00
Filipi Fuchter
e0d1381f87 Refactoring the TavusVideoService to allow to work with any transport. 2025-05-23 23:02:49 -03:00
Filipi Fuchter
86e6841569 Creating TavusTransport and TavusTransportClient. 2025-05-23 23:02:37 -03:00
Aleix Conchillo Flaqué
28b7a92a00 Merge pull request #1880 from pipecat-ai/aleix/daily-resize-event-loop-fix
BaseOutputTransport: don't block event loop during image resize
2025-05-23 18:32:00 -07:00
Aleix Conchillo Flaqué
4db5b18694 BaseOutputTransport: don't block event loop during image resize 2025-05-23 18:30:28 -07:00
Aleix Conchillo Flaqué
a628e921c0 Merge pull request #1879 from pipecat-ai/aleix/daily-fix-video-task
DailyTransport: fix video task variable
2025-05-23 17:56:08 -07:00
Aleix Conchillo Flaqué
6ca6ff37c9 DailyTransport: fix video task variable 2025-05-23 17:54:25 -07:00
Aleix Conchillo Flaqué
456db3710a Merge pull request #1828 from pipecat-ai/aleix/daily-use-audio-renderers
DailyTransport: replace virtual speaker and microphones
2025-05-23 13:31:51 -07:00
Aleix Conchillo Flaqué
50f024c6f9 LiveKitTransport: use UserAudioRawFrame instead of InputAudioRawFrame 2025-05-23 11:27:53 -07:00
Aleix Conchillo Flaqué
a4de75a8c0 Merge pull request #1867 from pipecat-ai/aleix/user-bot-latency-log-observer
observers: added UserBotLatencyLogObserver
2025-05-23 09:23:03 -07:00
Aleix Conchillo Flaqué
88e8fcdaca observers: added UserBotLatencyLogObserver 2025-05-23 09:17:53 -07:00
unknown
bfe9952c9a Remove sleep(0), add doc string etc. 2025-05-23 12:11:08 +02:00
Martin Schweiger
7f568e3e7e Merge branch 'main' into m-ods/assemblyai-universal-streaming 2025-05-23 17:39:00 +08:00
Martin Schweiger
9b8800ac1d update stt.py 2025-05-23 17:32:31 +08:00
Martin Schweiger
fd53712567 add models for new streaming service 2025-05-23 17:32:12 +08:00
Aleix Conchillo Flaqué
7f74c2465c SileroVADAnalyzer: improve non-matching sample rate log 2025-05-23 01:47:09 -07:00
Aleix Conchillo Flaqué
30d67a78eb examples(chatbot-audio-recording): use same sample rate to avoid downsampling 2025-05-23 01:47:09 -07:00
Aleix Conchillo Flaqué
c3cfd1f0ce DailyTransport: process audio, video and event callbacks in separate tasks 2025-05-23 01:47:09 -07:00
Aleix Conchillo Flaqué
69ac70eed8 DailyTransport: replace virtual microphone with custom microphone track 2025-05-23 01:47:09 -07:00
Aleix Conchillo Flaqué
fcf49e79cc DailyTransport: use participant audio renderers instead of virtual speaker 2025-05-23 01:47:09 -07:00
Aleix Conchillo Flaqué
8d4894846d pyproject: update to daily-python 0.19.0 2025-05-23 01:47:09 -07:00
Vanessa Pyne
a809b710c5 Merge pull request #1844 from pipecat-ai/vp-docsinreadme
add docs link at top of readme
2025-05-22 21:52:18 -05:00
vipyne
f6289e9db2 add docs link at top of readme 2025-05-22 21:51:29 -05:00
Mark Backman
26b4c4df22 Merge pull request #1870 from pipecat-ai/mb/gemini-2.5-flash-update
Update GeminiMultimodalLiveLLMService to use Gemini 2.5 Flash Native …
2025-05-22 18:19:55 -04:00
Mark Backman
f3a9844295 Merge pull request #1860 from pipecat-ai/mb/organize-otel-demos
Reorganize OpenTelemetry demos, add top-level README
2025-05-22 18:15:20 -04:00
Mark Backman
692821bdae Merge pull request #1873 from pipecat-ai/mb/readme-sarvam
Add SarvamTTSService to README
2025-05-22 18:14:40 -04:00
Mark Backman
ee143d5b3a Update GeminiMultimodalLiveLLMService to use Gemini 2.5 Flash Native Audio Dialog model 2025-05-22 18:13:41 -04:00
Mark Backman
7e178a634a Merge pull request #1871 from pipecat-ai/mb/claude-sonnet-4-update
Update default model for Anthropic to Claude Sonnet 4
2025-05-22 18:12:47 -04:00
Mark Backman
fe88a3d80b Add SarvamTTSService to README 2025-05-22 18:11:11 -04:00
Mark Backman
a196eac290 Merge pull request #1872 from pipecat-ai/mb/add-sarvam-tts
Add SarvamTTSService
2025-05-22 18:02:36 -04:00
Mark Backman
3c819955a2 Add SarvamTTSService 2025-05-22 16:23:08 -04:00
Mark Backman
ca0d7bbbed Update default model for Anthropic to Claude Sonnet 4 2025-05-22 15:13:33 -04:00
Mark Backman
f93bd1e817 Merge pull request #1864 from pipecat-ai/mb/fix-11lab-set-model-voice
Fix: ElevenLabsTTSService, change voice and model
2025-05-22 14:36:24 -04:00
Mark Backman
415bc6ca0a Fix: ElevenLabsTTSService, change voice and model 2025-05-22 14:28:50 -04:00
Mark Backman
8543c8d11d Merge pull request #1869 from pipecat-ai/mb/update-readme-nova-sonic
Add AWS Nova Sonic to README
2025-05-22 14:07:35 -04:00
Mark Backman
bf5ad64575 Add AWS Nova Sonic to README 2025-05-22 14:03:28 -04:00
unknown
d42d02d809 Add Google streaming TTS as a base TTS service. Rename non-streaming service to GoogleHttpTTSService. 2025-05-22 11:21:06 +02:00
Aleix Conchillo Flaqué
0718f79ff2 Merge pull request #1866 from pipecat-ai/aleix/base-observers-are-base-objects
BaseObserver: inherit from BaseObject so we can have events
2025-05-21 16:07:38 -07:00
Aleix Conchillo Flaqué
9bbce225ce BaseObserver: inherit from BaseObject so we can have events 2025-05-21 16:04:44 -07:00
Mark Backman
fb35fd6d71 Merge pull request #1859 from pipecat-ai/mb/otel-attribute-naming
Update OTel attribute names
2025-05-21 12:10:15 -04:00
Mark Backman
b4fd92aed6 Merge pull request #1862 from marctorsoc/clean-links-in-md-text-filter
Add link cleaning in MD text filter
2025-05-21 09:20:27 -04:00
Mark Backman
36931825b3 Merge pull request #1854 from sklinglernv/fix/elevenlab-tts
fix(elevenlabs tts): message parameter naming
2025-05-21 09:17:29 -04:00
marc.torsoc
ca35299dcd add link cleaning and a test for it 2025-05-21 12:08:53 +02:00
Severin Klingler
e74b900914 revert most of the changes except keyword naming fix 2025-05-21 09:24:03 +02:00
Mark Backman
25115668a7 Reorganize OpenTelemetry demos, add top-level README 2025-05-20 23:30:46 -04:00
Mark Backman
fb94db3e64 Update to use GenAI naming 2025-05-20 22:56:02 -04:00
Mark Backman
c4778e770e Merge pull request #1835 from marcklingen/langfuse-tracing
Add examples/open-telemetry-tracing-langfuse
2025-05-20 18:22:55 -04:00
Mark Backman
3860cdf97b Update OTel attribute names 2025-05-20 18:00:46 -04:00
Aleix Conchillo Flaqué
f3aec0c4ac Merge pull request #1829 from pipecat-ai/aleix/pipeline-task-add-observer
PipelineTask: add add_observer()
2025-05-20 13:18:24 -07:00
Aleix Conchillo Flaqué
d333094149 PipelineTask: add add_observer() and remove_observer() 2025-05-20 13:16:06 -07:00
Aleix Conchillo Flaqué
609ff4e66c Merge pull request #1841 from pipecat-ai/aleix/base-text-aggregator-async
make BaseTextAggregator and BaseTextFilter functions async
2025-05-20 13:13:54 -07:00
Aleix Conchillo Flaqué
cbccbcd9e7 BaseTextFilter: make functions async 2025-05-20 13:11:44 -07:00
Aleix Conchillo Flaqué
54b1d7fcc1 BaseTextAggregator: make functions async 2025-05-20 13:11:42 -07:00
Aleix Conchillo Flaqué
54388c0d9b Merge pull request #1850 from pipecat-ai/aleix/transcription-message-user-id
TranscriptionMessage: add user_id field
2025-05-20 13:10:42 -07:00
Aleix Conchillo Flaqué
228c866aaa Merge pull request #1857 from pipecat-ai/aleix/avoid-mutable-default-values
avoid mutable default constructor values
2025-05-20 13:10:24 -07:00
Aleix Conchillo Flaqué
a09bd648af avoid mutable default constructor values 2025-05-20 11:59:28 -07:00
Vanessa Pyne
3e4ae61c75 Merge pull request #1856 from pipecat-ai/vp-mcp-debug
mcp: fix typo in tool call response
2025-05-20 13:59:11 -05:00
vipyne
7655c432c2 mcp: fix typo in tool call response 2025-05-20 11:16:59 -05:00
Aleix Conchillo Flaqué
25dd651757 TranscriptionMessage: add user_id field 2025-05-19 15:47:54 -07:00
Mark Backman
462aecea3e Merge pull request #1839 from pipecat-ai/mb/cartesia-speed
Add support for Cartesia's speed parameter, update clients and APIs, deprecate emotion
2025-05-19 17:57:25 -04:00
Mark Backman
5f37df790b Merge pull request #1848 from pipecat-ai/mb/fix-word-wrangler-transport-params
Fix: Add audio_in_enabled to Word Wrangler TransportParams
2025-05-19 17:52:05 -04:00
Mark Backman
8e4e03541c Update CHANGELOG 2025-05-19 17:51:27 -04:00
Aleix Conchillo Flaqué
c1252fc7eb Merge pull request #1840 from pipecat-ai/aleix/base-object-dont-create-tasks
BaseObject: don't create event handler tasks if none is registered
2025-05-19 14:12:31 -07:00
Mark Backman
ed1077cc9a Fix: Add audio_in_enabled to Word Wrangler TransportParams 2025-05-19 15:53:29 -04:00
Mark Backman
4c761a7b22 Merge pull request #1847 from pipecat-ai/mb/update-otel
Keep span identifiers in attributes only
2025-05-19 14:37:42 -04:00
Mark Backman
9bc3df7803 Keep span identifiers in attributes only 2025-05-19 12:25:13 -04:00
Aleix Conchillo Flaqué
5e5060a6fe BaseObject: don't create event handler tasks if none is registered 2025-05-19 09:24:56 -07:00
Aleix Conchillo Flaqué
2b66eddaa1 Merge pull request #1830 from pipecat-ai/aleix/pipeline-task-frame-events
PipelineTask: add new started/stopped/ended/cancelled events
2025-05-19 08:32:28 -07:00
Mark Backman
916b9d6c6d Add an example for CartesiaHttpTTSService 2025-05-19 11:31:47 -04:00
Mark Backman
bd09ccd608 Update CartesiaHttpTTSService to work with the new cartesia 2.0 client 2025-05-19 11:31:28 -04:00
Mark Backman
682f8e4d45 Bump the cartesia_version for CartesiaTTSService, and cartesia package for CartesiaHttpTTSService 2025-05-19 11:10:03 -04:00
Mark Backman
c9d0af9ee0 Deprecate emotion, add new speed parameter 2025-05-19 09:43:24 -04:00
Severin Klingler
e1299d59bf fix(elevenlabs tts): Fix message paramter naming and make use of contexts to send out TTSStoppedFrames() 2025-05-19 15:22:13 +02:00
Mark Backman
61da6437ea Merge pull request #1834 from pipecat-ai/mb/gemini-live-tokens
Fix: Make LLMTokenUsage more robust
2025-05-19 09:04:07 -04:00
Marc Klingen
798705469b Update README.md 2025-05-18 21:11:20 +02:00
Marc Klingen
459a753de3 add reference to main otel example 2025-05-18 19:56:12 +02:00
Marc Klingen
1092ce70b3 add video of langfuse trace 2025-05-18 19:46:38 +02:00
Marc Klingen
9511c189bd revert original folder 2025-05-18 19:42:13 +02:00
Marc Klingen
66fea9e2ee create new example folder 2025-05-18 19:41:17 +02:00
Marc Klingen
69ae83516e use http exporter 2025-05-18 19:11:06 +02:00
Mark Backman
144ea36c81 Fix: Make LLMTokenUsage more robust 2025-05-18 07:41:16 -04:00
Mark Backman
7a8ab9a900 Merge pull request #1672 from golbin/main
Use "use_original_timestamps" only for sonic-2 model
2025-05-18 07:24:58 -04:00
Jin Kim
c4b35055b4 Update CHANGELOG.md 2025-05-18 16:54:29 +09:00
Jin Kim
a4c04e7c17 Opt out Sonic models from use_original_timestamps 2025-05-18 16:52:37 +09:00
Jin Kim
a6f7e7fc30 Merge branch 'pipecat-ai:main' into main 2025-05-18 16:48:24 +09:00
Aleix Conchillo Flaqué
d5ebc883b3 PipelineTask: add new started/stopped/ended/cancelled events 2025-05-17 22:46:22 -07:00
Mark Backman
deb43df0a4 Merge pull request #1824 from pipecat-ai/mb/gemini-live-transcribe-user-audio
Update GeminiMultimodalLiveLLMService to use Gemini's user transcription
2025-05-16 22:51:04 -04:00
Mattie Ruth
88e472b3f1 Update Modal Readme (#1825) 2025-05-16 17:40:57 -04:00
Mark Backman
f59fb8167d Merge pull request #1784 from thsunkid/thu/handle-transcript-gpt4o-audio
Handle audio transcript from gpt-4o-audio and clean up logs
2025-05-16 13:20:16 -04:00
Mark Backman
fac6f526f7 Add comments and docstrings 2025-05-16 10:54:50 -04:00
Mark Backman
2f78d74ce6 Change Gemini Live to use Gemini provided usage metrics 2025-05-16 10:53:01 -04:00
Mark Backman
d3942dda52 Gemini Live to transcribe user audio 2025-05-16 10:53:01 -04:00
Mark Backman
c00e9a8d3a Merge pull request #1819 from kaikato/lmnt-model-langs
LmntTTSService: add model param and additional languages
2025-05-16 08:49:55 -04:00
kaikato
c3b95767f3 LmntTTSService: add model param and additional languages 2025-05-16 04:24:57 +00:00
Mark Backman
90f27a3090 Merge pull request #1816 from pipecat-ai/mb/add-minimax-tts
Add MiniMax TTS
2025-05-15 18:05:13 -04:00
Mark Backman
b6f09defc9 Add MiniMax TTS 2025-05-15 18:02:29 -04:00
Aleix Conchillo Flaqué
172813bcfb Merge pull request #1815 from pipecat-ai/aleix/remove-silerovad-processor
remove SileroVAD() frame processor
2025-05-15 13:44:44 -07:00
Aleix Conchillo Flaqué
95c25efab7 remove SileroVAD() frame processor 2025-05-15 11:55:20 -07:00
Aleix Conchillo Flaqué
a51af35024 Merge pull request #1814 from pipecat-ai/aleix/examples-dependabot-05142025
examples: updates for dependabot 05/14/2025
2025-05-15 11:38:45 -07:00
Mark Backman
119fd5ba7d Merge pull request #1025 from fatwang2/main
added hailuo tts service
2025-05-15 14:29:24 -04:00
Aleix Conchillo Flaqué
0718a812bd examples: updates for dependabot 05/14/2025 2025-05-14 22:51:08 -07:00
Mark Backman
3814501b48 Merge pull request #1811 from pipecat-ai/mb/dont-require-tracing-dep
Fix: Resolve an issue where tracing imports were required
2025-05-14 12:35:47 -04:00
Mark Backman
7a5205dbda Fix: Resolve an issue where tracing imports were required 2025-05-14 12:29:08 -04:00
Thu Nguyen
15a5028d23 Revert log changes 2025-05-14 22:28:25 +08:00
Thu Nguyen
fee2648ac0 Handle audio transcript from gpt-4o-audio and clean up logs 2025-05-14 13:02:22 +07:00
Varun Singh
04c02c9a20 Merge pull request #1810 from pipecat-ai/vr000m-receiving-custom-sip-headers
added handling for sipHeaders
2025-05-13 23:02:14 -07:00
Varun Singh
0ff7195a83 Update README.md
updating docs
2025-05-13 19:08:43 -04:00
Varun Singh
3b91aa013a added handling for sipHeaders 2025-05-13 16:00:05 -07:00
Mark Backman
50f6235edb Add support for OpenTelemetry tracing (#1729)
* Also added TurnTrackingObserver, TurnTraceObserver, foundational 29, open-telemetry-example
2025-05-13 17:18:11 -04:00
Aleix Conchillo Flaqué
6f4d94f91b Merge pull request #1800 from pipecat-ai/aleix/frame-processors-setup
introduce frame processors setup
2025-05-13 13:18:06 -07:00
Aleix Conchillo Flaqué
83a4c7d443 RTVIProcessor: remove unused code 2025-05-13 11:26:37 -07:00
Aleix Conchillo Flaqué
8171fec925 SmallWebRTCConnection: complain if av package not found 2025-05-13 11:26:37 -07:00
Aleix Conchillo Flaqué
175f352ea7 add FrameProcessor.setup() to setup processors before StartFrame 2025-05-13 11:26:35 -07:00
Filipi da Silva Fuchter
5290161ac4 Merge pull request #1746 from pipecat-ai/simple_chatbot-react-native
Simple chatbot: React Native client
2025-05-13 10:48:09 -03:00
Filipi Fuchter
8762019ed7 Not setting the local audio level when the user stopped speaking. 2025-05-13 10:46:30 -03:00
Filipi Fuchter
61a59fa158 Fixing useNavigation typescript warning. 2025-05-13 10:36:39 -03:00
Filipi Fuchter
55eea20c8e Renaming expo environment variable 2025-05-13 10:32:27 -03:00
kompfner
9a621f0c54 Merge pull request #1805 from pipecat-ai/pk/aws-nova-sonic-aggregate-user-transcription-text
AWS Nova Sonic service - aggregate user transcription text; it was fr…
2025-05-13 09:13:58 -04:00
Paul Kompfner
55fc24e933 AWS Nova Sonic service - aggregate user transcription text; it was fragmented across many conversation history messages before 2025-05-13 09:13:28 -04:00
Filipi da Silva Fuchter
b14608f09b Merge pull request #1799 from pipecat-ai/daily_audio_source
Using audio source for capturing Daily's participant audio
2025-05-13 08:15:10 -03:00
Mark Backman
4a25c57337 Merge pull request #1806 from pipecat-ai/aleix/run-test-observers
tests: allow passing observers to run_test()
2025-05-12 22:10:44 -04:00
Aleix Conchillo Flaqué
f800e35ccb tests: allow passing observers to run_test() 2025-05-12 17:53:02 -07:00
Vanessa Pyne
12d49a9b9d Merge pull request #1801 from pipecat-ai/vp-fix-typo
update examples
2025-05-12 15:33:56 -05:00
vipyne
b25b251a44 update examples 2025-05-12 14:07:17 -05:00
Mattie Ruth
64b2a75a94 Update Modal App: (#1755)
* Update Modal App:

Updated Modal App to include:

1. Latest Modal API usage
2. Ability to launch different Pipecat pipelines, much like the
   simple chatbot example
3. Ability to choose which pipeline is launched via the
   /connect endpoint
4. Added a pipeline option for connecting to a self-hosted LLM
   on Modal
5. Improved READMEs
6. Added a web client for interacting with the Modal deployment

tmp

* Update README
2025-05-12 12:45:43 -05:00
Aleix Conchillo Flaqué
b33a60f3a5 Merge pull request #1793 from pipecat-ai/khk/deepgram-async-fix
Fix Deepgram TTS streaming
2025-05-12 09:59:46 -07:00
Filipi Fuchter
d22dbb1a6d Fixing ruff format. 2025-05-12 10:36:21 -03:00
Filipi Fuchter
983199a6cd New example capturing the audio from the participant using the custom audio source. 2025-05-12 10:18:43 -03:00
Filipi Fuchter
133d7ee33a Fixing the default audio source for capture_participant_audio 2025-05-12 10:16:32 -03:00
Mark Backman
0bd888afc7 Merge pull request #1796 from nikp06/patch-1
Wrong deprecation warning when importing ai_services.py
2025-05-12 09:12:48 -04:00
nikp06
537bd1c58d Update ai_services.py
fix: correct deprecation warning format in ai_services module
2025-05-12 12:01:13 +02:00
Kwindla Hultman Kramer
5ef519fe2c Fix Deepgram TTS to use stream_raw() 2025-05-11 15:40:31 -07:00
Mark Backman
20498fb47f Merge pull request #1790 from AngeloGiacco/angelo/fix-api-key
[elevenlabs tts ] fix api key
2025-05-10 19:16:27 -04:00
Angelo Giacco
b57dfb3b5d fix lint 2025-05-10 16:36:26 +01:00
Angelo Giacco
0355ed4aa1 move api key to ws header 2025-05-10 16:34:01 +01:00
Angelo Giacco
1e76cc7bdc fix: elevenlabs api key 2025-05-10 16:09:20 +01:00
Vanessa Pyne
18c0374126 Merge pull request #1785 from pipecat-ai/vp-small-filenmae-change
39-aws-nova-sonic.py -> 40-aws-nova-sonic.py
2025-05-09 12:19:09 -05:00
Aleix Conchillo Flaqué
7072fba7e7 Merge pull request #1780 from pipecat-ai/aleix/deprecate-google-generativeai
GoogleLLMService: deprecate google-generativeai
2025-05-09 09:18:30 -07:00
Aleix Conchillo Flaqué
3d702a5c39 minor examples cleanup 2025-05-09 09:16:10 -07:00
Aleix Conchillo Flaqué
f31efa42c9 GoogleLLMService: deprecate google-generativeai 2025-05-09 09:14:43 -07:00
vipyne
74b369ff20 39-aws-nova-sonic.py -> 40-aws-nova-sonic.py 2025-05-09 08:30:59 -05:00
Filipi Fuchter
46eed0a59a Bumping to use the latest version of @pipecat-ai/react-native-daily-transport, and removing code not needed. 2025-05-08 18:18:00 -03:00
kompfner
9643296e29 Merge pull request #1779 from pipecat-ai/pk/aws-nova-sonic-missing-params-export
Add missing `Params` export to AWS Nova Sonic module
2025-05-08 16:04:38 -04:00
Paul Kompfner
c83c5b5a34 Add missing Params export to AWS Nova Sonic module 2025-05-08 15:23:25 -04:00
Filipi Fuchter
277e2d7fc0 Merge branch 'main' into simple_chatbot-react-native 2025-05-08 09:03:16 -03:00
Mark Backman
7280e390d9 Merge pull request #1774 from pipecat-ai/mb/moondream-ex-server
Add load_dotenv to moondream example server
2025-05-07 19:02:30 -04:00
Mark Backman
4efc3f0a39 Merge pull request #1775 from pipecat-ai/mb/patient-ex-env
Add load_dotenv to patient-intake server file
2025-05-07 19:02:20 -04:00
Mark Backman
cb7e7a8aa3 Add load_dotenv to patient-intake server file 2025-05-07 18:40:04 -04:00
Mark Backman
9136402846 Add load_dotenv to moondream example server 2025-05-07 18:29:27 -04:00
Aleix Conchillo Flaqué
260fc76137 Merge pull request #1773 from pipecat-ai/aleix/pipecat-0.0.67
update CHANGELOG for 0.0.67
2025-05-07 15:05:55 -07:00
Aleix Conchillo Flaqué
7cfb9a4d15 update CHANGELOG for 0.0.67 2025-05-07 14:59:16 -07:00
Mark Backman
2089e0c974 Merge pull request #1768 from pipecat-ai/mb/update-observers
Add DebugLogObserver
2025-05-07 17:30:49 -04:00
Mark Backman
9e0b4fe5d1 Replace list with tuple 2025-05-07 17:21:09 -04:00
Mark Backman
75ce632f84 Add DebugLogObserver 2025-05-07 17:21:08 -04:00
Mark Backman
efeb96c4e8 Remove unused imports 2025-05-07 17:20:42 -04:00
kompfner
fb5438e9c2 Merge pull request #1770 from pipecat-ai/pk/amazon-nova-sonic-interruption-reliability
AWS Nova Sonic service - make interruption handling more reliable, in…
2025-05-07 17:16:06 -04:00
Mark Backman
7da9f66e1c Merge pull request #1761 from pipecat-ai/mb/elevenlabs-context-id
Update ElevenLabsTTSService to use the new websocket API
2025-05-07 17:12:06 -04:00
Mark Backman
9e16e3d614 Update ElevenLabsTTSService to use the new websocket API 2025-05-07 17:09:52 -04:00
Paul Kompfner
84d040c6d0 AWS Nova Sonic service - make interruption handling more reliable, in terms of:
- not getting the conversation into a "stuck" state
- not losing assistant text that should've made it into the context
2025-05-07 16:34:18 -04:00
Mark Backman
f3e0beb8f1 Merge pull request #1762 from pipecat-ai/iss-1734-rtvi-function-call-breakage
Revert breaking change in RTVI protocol for function calling
2025-05-07 15:25:22 -04:00
Aleix Conchillo Flaqué
e00a1196ef Merge pull request #1767 from pipecat-ai/aleix/daily-python-0.18.2
pyproject: update daily-python to 0.18.2
2025-05-07 12:19:59 -07:00
Aleix Conchillo Flaqué
3867c0f8e7 Merge pull request #1766 from pipecat-ai/aleix/daily-fix-multiple-audio-video-sources
fix multiple audio video sources
2025-05-07 12:19:46 -07:00
Aleix Conchillo Flaqué
cdf0953722 pyproject: update daily-python to 0.18.2 2025-05-07 11:56:36 -07:00
Aleix Conchillo Flaqué
ed00f7d071 add video_source field to UserImageRequestFrame 2025-05-07 11:50:21 -07:00
Aleix Conchillo Flaqué
a3038afa02 DailyTransport: fix multiple audio/video sources 2025-05-07 11:50:00 -07:00
kompfner
f9ca0b8cc6 Merge pull request #1704 from pipecat-ai/pk/amazon-nova-sonic
Amazon Nova Sonic LLM service
2025-05-07 14:45:28 -04:00
Paul Kompfner
2920aa5af4 [WIP] AWS Nova Sonic service - pull AWS Nova Sonic support out of the aws optional dependency in pyproject.toml and into its own aws-nova-sonic optional dependency. That's because it requires Python >= 3.12, a higher version than the base project's 3.10. This change allows anyone using any of the other AWS services (including our own unit tests) to continue using the lower Python version. 2025-05-07 14:32:32 -04:00
Paul Kompfner
93c9cc4a0e [WIP] AWS Nova Sonic service - minor fix 2025-05-07 13:54:06 -04:00
Paul Kompfner
b53f9235e4 [WIP] AWS Nova Sonic service - remove unnecessary _context_available state, instead just relying on the presence of _context 2025-05-07 13:54:06 -04:00
Paul Kompfner
1491462d15 [WIP] AWS Nova Sonic service - remove _handling_bot_stopped_speaking, which no longer seems to be necessary; I'm no longer observing back-to-back BotStoppedSpeaking frames 2025-05-07 13:54:06 -04:00
Paul Kompfner
c78f779800 [WIP] AWS Nova Sonic service - log an error message if you try to use AWS Nova Sonic without the proper dependency (e.g. without having done pip install pipecat-ai[aws]) 2025-05-07 13:54:06 -04:00
Paul Kompfner
b013e375fb [WIP] AWS Nova Sonic service - simplify a bit of logic (and do the same simplification in the OpenAI Realtime service) 2025-05-07 13:54:06 -04:00
Paul Kompfner
52036138c1 [WIP] AWS Nova Sonic service - remove unnecessary (no-op) code 2025-05-07 13:54:06 -04:00
Paul Kompfner
4ba9a42861 [WIP] AWS Nova Sonic service - add more accurate typing 2025-05-07 13:54:06 -04:00
Paul Kompfner
27bff7a759 [WIP] AWS Nova Sonic service - fix comment 2025-05-07 13:54:06 -04:00
Paul Kompfner
896f8d85f7 [WIP] AWS Nova Sonic service - remove out-of-date TODO comment 2025-05-07 13:54:06 -04:00
Paul Kompfner
ed06cdd2c7 [WIP] AWS Nova Sonic service - add CHANGELOG entry 2025-05-07 13:54:02 -04:00
Paul Kompfner
8473647269 [WIP] AWS Nova Sonic service - update persistent-context example to better avoid saving "transitional", as opposed to meaningful, context messages 2025-05-07 13:52:51 -04:00
Paul Kompfner
5579145a06 [WIP] AWS Nova Sonic service - post-rebase, update examples to play nicely with recent pipecat changes 2025-05-07 13:52:51 -04:00
Paul Kompfner
35848d10b3 [WIP] AWS Nova Sonic service - remove various TODO comments 2025-05-07 13:52:51 -04:00
Paul Kompfner
c7e223e85a [WIP] AWS Nova Sonic service - remove print statements in favor of logger 2025-05-07 13:52:51 -04:00
Paul Kompfner
885b2d1d2f [WIP] AWS Nova Sonic service - make parameters configurable 2025-05-07 13:52:51 -04:00
Paul Kompfner
73020be511 [WIP] AWS Nova Sonic service - minor fix: only try to read received JSON if we have it 2025-05-07 13:52:51 -04:00
Paul Kompfner
d388c057c0 [WIP] AWS Nova Sonic service - recover from unwanted disconnection due to an error 2025-05-07 13:52:51 -04:00
Paul Kompfner
c4d0f91a7f [WIP] AWS Nova Sonic service - remove some old code that was accidentally still there, possibly sending a duplicate system instruction 2025-05-07 13:52:51 -04:00
Paul Kompfner
467233be04 [WIP] AWS Nova Sonic service - support multi-line system prompt 2025-05-07 13:52:51 -04:00
Paul Kompfner
2b02d08f4c [WIP] AWS Nova Sonic service - add comments to examples pointing out the us-east-1 is the only supported region so far 2025-05-07 13:52:51 -04:00
Paul Kompfner
9fe265ea64 [WIP] AWS Nova Sonic service - implement ability to persist and load conversations 2025-05-07 13:52:51 -04:00
Paul Kompfner
cc1f4ba81c [WIP] AWS Nova Sonic service - add a hacky way of programmatically triggering an assistant response 2025-05-07 13:52:51 -04:00
Paul Kompfner
3784bdbd27 [WIP] AWS Nova Sonic service - in our hacky direct manipulation of the context, aggregate assistant text rather than recording every chunk as a separate message 2025-05-07 13:52:51 -04:00
Paul Kompfner
4ffdc3b77c [WIP] AWS Nova Sonic service - do hacky direct manipulation of the context for now, since I can't seem to get assistant context aggregation working properly with frames, grr 2025-05-07 13:52:51 -04:00
Paul Kompfner
38c9fa681a [WIP] AWS Nova Sonic service - Protect against back-to-back BotStoppedSpeaking calls, which I've observed 2025-05-07 13:52:51 -04:00
Paul Kompfner
c477039954 [WIP] AWS Nova Sonic service - just for safety, add a short delay after BotStoppedSpeaking before sending LLMFullResponseEndFrame + TTSStoppedFrame, to give a bit of leeway for the LLM to deliver the "FINAL" text block describing what was said 2025-05-07 13:52:51 -04:00
Paul Kompfner
d6ef3d64ac [WIP] AWS Nova Sonic service - fix context problems of double-counting LLM text, and mis-categorizing user text as LLM text 2025-05-07 13:52:51 -04:00
Paul Kompfner
6938152db6 [WIP] AWS Nova Sonic service - fix comment 2025-05-07 13:52:51 -04:00
Paul Kompfner
2154db07f0 [WIP] AWS Nova Sonic service - remove unnecessary error log 2025-05-07 13:52:51 -04:00
Paul Kompfner
5e0803479e [WIP] AWS Nova Sonic service - add send_transcription_frames option 2025-05-07 13:52:51 -04:00
Paul Kompfner
3960c604a4 [WIP] AWS Nova Sonic service - fix empty assistant conversation history item in the context after tool use 2025-05-07 13:52:51 -04:00
Paul Kompfner
394648f1c9 [WIP] AWS Nova Sonic service - fix user utterances not making it into the context 2025-05-07 13:52:51 -04:00
Paul Kompfner
da5c4953d5 [WIP] AWS Nova Sonic service - allow passing in tools into initializer 2025-05-07 13:52:51 -04:00
Paul Kompfner
2b7e1cb5b1 [WIP] AWS Nova Sonic service - add tool calling 2025-05-07 13:52:51 -04:00
Paul Kompfner
f182eafb40 [WIP] AWS Nova Sonic service - add ability to pass in OpenAILLMContext 2025-05-07 13:52:51 -04:00
Paul Kompfner
9f7f42e885 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
9b8bce1914 [WIP] AWS Nova Sonic service - add voice_id 2025-05-07 13:52:51 -04:00
Paul Kompfner
96d05e12fc [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
68c1069548 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
5b64613f65 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
1f9baefba8 [WIP] AWS Nova Sonic service - added stubs for handling interruption and user-started-speaking frames 2025-05-07 13:52:51 -04:00
Paul Kompfner
0c255d2618 [WIP] AWS Nova Sonic service - added TTSTextFrame and reworked/cleaned up some bookkeeping logic 2025-05-07 13:52:51 -04:00
Paul Kompfner
a38206de9c [WIP] AWS Nova Sonic service - added TranscriptionFrame 2025-05-07 13:52:51 -04:00
Paul Kompfner
260f7c9b85 [WIP] AWS Nova Sonic service - format 2025-05-07 13:52:51 -04:00
Paul Kompfner
de294caed9 [WIP] AWS Nova Sonic service - added LLMFullResponseStartFrame, LLMTextFrame, and LLMFullResponseEndFrame 2025-05-07 13:52:51 -04:00
Paul Kompfner
e40aa4f99a [WIP] AWS Nova Sonic service - added TTSStartedFrame and TTSStoppedFrame 2025-05-07 13:52:51 -04:00
Paul Kompfner
b1d413b9be [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
8cbad070ad [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
13569a5a5a [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
d789334a60 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
7668b27fc0 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
6d30f441e8 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
a9e395b366 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
5e5626f04f [WIP] AWS Nova Sonic service 2025-05-07 13:52:47 -04:00
Aleix Conchillo Flaqué
d80aa5b44e Merge pull request #1753 from pipecat-ai/aleix/add-bedrock-support
Add support for Amazon Bedrock LLMs
2025-05-07 09:31:48 -07:00
Aleix Conchillo Flaqué
80ef6dc4de update README with AWS Bedrock and Transcribe 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
458549f7df AWSBedrockLLMService: fix function calling 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
a8405649d0 aws: use AWS prefix for all services 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
ce1a72850b tests: add bedrock context aggregator tests 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
58de381746 AWS: add missing utils 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
bed2e894a2 BedrockLLMService: pull initial system frame from messages 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
b4de98cfb7 AWS: various cleanups (logs, imports...) 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
a4b9db9e07 fix formatting 2025-05-07 09:26:26 -07:00
Adithya Suresh
664111a3c9 Added cache related info to metrics 2025-05-07 09:26:26 -07:00
Adithya Suresh
aa964847f3 System param to be a list 2025-05-07 09:26:26 -07:00
Adithya Suresh
fa5cac7e0a Bug fix in content format 2025-05-07 09:26:26 -07:00
Adithya Suresh
b2b01861b2 Remove model restriction 2025-05-07 09:26:26 -07:00
Adithya Suresh
f014f718eb Restructured STT and enabled prosody tags for generative Polly 2025-05-07 09:26:26 -07:00
Adithya Suresh
05ae8d3ffa Removed OpenAI based context formatting 2025-05-07 09:26:26 -07:00
Adithya Suresh
88c9e08bd8 Updated tools parsing logic 2025-05-07 09:26:26 -07:00
Adithya Suresh
844f61dfea Initial implementation 2025-05-07 09:26:26 -07:00
Tico Ballagas
acb7d597cb Change example to use generative voices 2025-05-07 09:19:49 -07:00
Tico Ballagas
2b18f60261 Initial implementation of AWS Transcribe TTS 2025-05-07 09:19:49 -07:00
mattie ruth backman
5b66133a6c Revert breaking change in RTVI protocol for function calling 2025-05-07 12:08:28 -04:00
Mark Backman
0c5bc6a57a Merge pull request #1760 from WebinarGeek/wg/daily-active-speaker-event
DailyTransport: added on_active_speaker_changed event handler
2025-05-07 11:17:49 -04:00
Mark Backman
7981e00955 Merge pull request #1759 from pipecat-ai/mb/readme-nvidia-riva
Update README with Riva services
2025-05-07 11:13:51 -04:00
Dan Berg
5e39c0cfeb DailyTransport: added on_active_speaker_changed event handler 2025-05-07 15:22:30 +02:00
Mark Backman
a444701929 Update README with Riva services 2025-05-07 09:02:08 -04:00
Mark Backman
f6c1eb5d9d Merge pull request #1757 from pipecat-ai/mb/remove-canonical
Removing CanonicalMetricsService
2025-05-06 21:38:03 -04:00
Mark Backman
a1d46cb26b Removing CanonicalMetricsService 2025-05-06 21:23:23 -04:00
Aleix Conchillo Flaqué
99ab148d88 Merge pull request #1739 from pipecat-ai/aleix/observers-frame-pushed-class
BaseObserver: add FramePushed class and deprecate multiple arguments
2025-05-06 15:29:05 -07:00
Aleix Conchillo Flaqué
d69fa5dba5 update CHANGELOG with UltravoxSTTService fix 2025-05-06 15:26:25 -07:00
Aleix Conchillo Flaqué
0d30b000af BaseObserver: add FramePushed class and deprecated multiple arguments 2025-05-06 15:26:23 -07:00
Mark Backman
e7c0e742d2 Merge pull request #1752 from pipecat-ai/mb/deepgram-tts-aura-2
Update Deepgram TTS default voice to Aura 2 voice
2025-05-06 16:26:26 -04:00
Mark Backman
2aff2dcca3 Merge pull request #1751 from pipecat-ai/mb/11labs-enable_ssml_parsing
Add enable_ssml_parsing and enable_logging to ElevenLabsTTSService
2025-05-06 16:25:20 -04:00
Mark Backman
288f8865c8 Add enable_logging to ElevenLabsTTSService 2025-05-06 12:13:26 -04:00
Mark Backman
8691870bcb Update Deepgram TTS default voice to Aura 2 voice 2025-05-06 11:29:32 -04:00
Mark Backman
e06146c237 Add enable_ssml_parsing to ElevenLabsTTSService 2025-05-06 11:06:57 -04:00
Aleix Conchillo Flaqué
c68e990cda Merge pull request #1748 from pipecat-ai/aleix/task-manager-dictionary
task manager dictionary and cleanup PipelineTask
2025-05-06 07:57:53 -07:00
Aleix Conchillo Flaqué
4583905313 PipelineTask: cleanup if task is cancelled from outside Pipecat 2025-05-05 21:33:21 -07:00
Aleix Conchillo Flaqué
9cc498b1fa TaskManager: use a dictionary instead of a set to store tasks 2025-05-05 21:27:49 -07:00
Mark Backman
b3c5dc4045 Merge pull request #1443 from adithyaxx/anthropic-client-bug-fixes
Handle missing token counts in AsyncAnthropicBedrock client properly
2025-05-05 21:13:11 -04:00
Filipi Fuchter
56ca7360ae Fixing versions 2025-05-05 19:11:59 -03:00
Filipi Fuchter
d5ab3251f0 Bumping the dependencies, updating readme, adding .gitignore. 2025-05-05 18:43:04 -03:00
Filipi Fuchter
915c284420 Fixing readme 2025-05-05 18:32:04 -03:00
Aleix Conchillo Flaqué
3824da7261 Merge pull request #1745 from pipecat-ai/aleix/make-sure-transports-are-ready
only send data to transports after they are really ready
2025-05-05 14:21:36 -07:00
Filipi Fuchter
40154824e8 Creating a RN example for simple-chatbot 2025-05-05 18:17:39 -03:00
Aleix Conchillo Flaqué
855d567b1e only send data to transports after they are really ready 2025-05-05 14:06:58 -07:00
Mark Backman
b323a7bd88 Merge pull request #1742 from pipecat-ai/mb/pcc-krisp-filter
Update pipecat-cloud-example to use Krisp in PCC deployment only
2025-05-05 15:46:12 -04:00
Mark Backman
fa011d0018 Update pipecat-cloud-example to use Krisp in PCC deployment only 2025-05-05 15:09:29 -04:00
Aleix Conchillo Flaqué
e15fa8777a Merge pull request #1737 from CerebriumAI/kyle/fix-ultravox-spacing
[Fix] Ultravox frame spacing issue
2025-05-05 09:34:49 -07:00
Aleix Conchillo Flaqué
2143a6d927 Merge pull request #1732 from pipecat-ai/aleix/daily-remote-custom-tracks
DailyTransport: remove custom tracks before leaving
2025-05-05 08:44:11 -07:00
Aleix Conchillo Flaqué
044e2d3e73 DailyTransport: remove custom tracks before leaving 2025-05-05 08:35:35 -07:00
Kyle Gani
be112ec63f Merge branch 'kyle/fix-ultravox-performance' of github.com:CerebriumAI/pipecat into kyle/fix-ultravox-performance 2025-05-05 17:13:26 +02:00
Kyle Gani
d2f56c4e8f Fix: Spacing issue 2025-05-05 17:13:21 +02:00
Mark Backman
ddc6a9c695 Merge pull request #1670 from pipecat-ai/mb/daily-twilio-sip-example
Add standalone Daily + Twilio SIP example
2025-05-05 10:57:16 -04:00
Mark Backman
2bebdbc371 Merge pull request #1671 from pipecat-ai/khk/rime-arcana
support for rime arcana model
2025-05-05 10:54:50 -04:00
Mark Backman
8b9f1f0608 Add a changelog entry 2025-05-05 10:51:46 -04:00
Kwindla Hultman Kramer
b25f3b2ed2 support for rime arcana model 2025-05-05 10:50:46 -04:00
Mark Backman
a995cf81b6 Merge pull request #1724 from pipecat-ai/mb/demo-fixes
Demo fixes
2025-05-05 08:44:57 -04:00
Aleix Conchillo Flaqué
75d261639f Merge pull request #1726 from pipecat-ai/aleix/pipecat-0.0.66
update CHANGELOG for pipecat 0.0.66
2025-05-02 20:54:57 -07:00
Aleix Conchillo Flaqué
f720d795d0 update CHANGELOG for pipecat 0.0.66 2025-05-02 20:29:51 -07:00
Aleix Conchillo Flaqué
f6fe83e358 Merge pull request #1725 from pipecat-ai/aleix/update-daily-python-0.18.1
update to daily-python 0.18.1
2025-05-02 20:27:50 -07:00
Mark Backman
0513d0b6a8 Update README 2025-05-02 22:44:50 -04:00
Mark Backman
0679bb217d Remove Twilio from phone-chatbot directory 2025-05-02 22:18:50 -04:00
Mark Backman
38bd55e518 Update README 2025-05-02 22:18:50 -04:00
Mark Backman
65c7423280 Add other dial-in event handlers 2025-05-02 22:18:50 -04:00
Mark Backman
f24a85cc94 Add logic to only forward the first on_dialin_ready event 2025-05-02 22:18:50 -04:00
Mark Backman
53887b7c98 Display phone number in WebRTC call 2025-05-02 22:18:50 -04:00
Mark Backman
523c012c38 Use a Twilio asset to ring the phone throughout 2025-05-02 22:18:50 -04:00
Mark Backman
97c28989c1 Add standalone Daily + Twilio SIP example 2025-05-02 22:18:50 -04:00
Mark Backman
c19be6ebb2 Demo fixes 2025-05-02 20:58:10 -04:00
Aleix Conchillo Flaqué
54971a0735 update to daily-python 0.18.1 2025-05-02 17:47:44 -07:00
Mark Backman
4513e81e13 Merge pull request #1723 from pipecat-ai/mb/base-output-bot-speaking-log
Only display the destination in the bot started/stopped speaking log …
2025-05-02 17:32:47 -04:00
Mark Backman
872204b795 Only display the destination in the bot started/stopped speaking log when there is a desintation 2025-05-02 17:29:28 -04:00
Aleix Conchillo Flaqué
a94cbfe6f5 Merge pull request #1722 from pipecat-ai/aleix/base-output-transport-audio-task-fix
BaseOutputTransport: always initialize audio task
2025-05-02 14:26:30 -07:00
Aleix Conchillo Flaqué
7152faafb2 BaseOutputTransport: always initialize audio task
We also use the audio task to also send synchronized images with audio.
2025-05-02 14:23:15 -07:00
Mark Backman
e6aadaccd8 Merge pull request #1721 from pipecat-ai/mb/simli-silent-frames
Fix: SimliVideoService was continuously emitting audio, preventing Bo…
2025-05-02 16:44:39 -04:00
Mark Backman
3a73aa71b8 Merge pull request #1613 from pipecat-ai/mb/improve-storybot-readme
demo: Restructure storytelling-chatbot directory, update README steps…
2025-05-02 16:39:59 -04:00
Mark Backman
814e7509e1 demo: Restructure storytelling-chatbot directory, update README steps, link to vercel demo 2025-05-02 16:37:37 -04:00
Vanessa Pyne
e0cf5ec016 Merge pull request #1705 from pipecat-ai/vp-update-nvidia-models
Riva Service: add magpie-tts-multilingual model
2025-05-02 15:34:23 -05:00
vipyne
667bd32e6a Riva: remove deprecated lines in example 2025-05-02 15:33:10 -05:00
vipyne
b2ecd83706 update CHANGELOG with Riva details 2025-05-02 15:33:10 -05:00
vipyne
b2754117c8 Riva: refactor function_id and model_name 2025-05-02 15:33:10 -05:00
vipyne
6c428c303b update magpie voice 2025-05-02 15:33:10 -05:00
Mark Backman
e7d889a143 Update RivaSTTService to use by default 2025-05-02 15:33:10 -05:00
Mark Backman
da60e7069b Update pyproject.toml to use nvidia-riva-client 2.19.1 2025-05-02 15:33:10 -05:00
Mark Backman
c14406a3b9 Demos use the latest services 2025-05-02 15:33:10 -05:00
Mark Backman
725ab5ec21 Small fixes: No default api_key of None, ParakeetSTTService uses RivaSTTService.InputParams 2025-05-02 15:33:10 -05:00
Mark Backman
daf9d47e58 Update RivaSegmentedSTTService 2025-05-02 15:33:10 -05:00
vipyne
63a65627a2 Riva Service: add magpie-tts-multilingual model 2025-05-02 15:33:10 -05:00
Mark Backman
02c07755b0 Add Changelog entry for PR 1707 2025-05-02 15:33:10 -05:00
Matt Kim
15cbd18acc [Rime] Add phonemizeBetweenBrackets and pauseBetweenBrackets to RimeTTSService (ws)
There is a fix incoming in
2025-05-02 15:33:10 -05:00
Kwindla Hultman Kramer
93c40b87dc small groq updates 2025-05-02 15:33:10 -05:00
Mark Backman
eeaa9f67a1 Fix: SimliVideoService was continuously emitting audio, preventing BotStoppedSpeakingFrame from being sent 2025-05-02 16:32:42 -04:00
Mark Backman
b60691c7b2 Merge pull request #1720 from pipecat-ai/mb/changelog-pr-1707
Add Changelog entry for PR 1707
2025-05-02 16:13:40 -04:00
Mark Backman
2bb1b0b343 Add Changelog entry for PR 1707 2025-05-02 16:09:50 -04:00
Mark Backman
047ef9f86c Merge pull request #1707 from rimelabs/matt/rime/url_param_serialization
[Rime] Add new params to RimeTTSService
2025-05-02 16:08:01 -04:00
Kwindla Hultman Kramer
9a2c603c91 Merge pull request #1711 from pipecat-ai/khk/groq-updates 2025-05-02 12:21:15 -07:00
Filipi da Silva Fuchter
94c4169407 Merge pull request #1717 from pipecat-ai/local_smart_turn_torch
Local smart turn torch
2025-05-02 15:53:30 -03:00
Filipi Fuchter
cb8a551db8 Mentioning the new LocalSmartTurnAnalyzer in the changelog. 2025-05-02 14:32:18 -03:00
Filipi Fuchter
779f09af70 Fixing lint. 2025-05-02 14:22:38 -03:00
Filipi Fuchter
19dc0f2bfb New example using the local smart turn 2025-05-02 14:21:42 -03:00
Filipi Fuchter
f0709e22ba Creating a local smart turn using torch. 2025-05-02 14:21:29 -03:00
Mark Backman
8250736f5e Merge pull request #1708 from pipecat-ai/mb/gemini-user-context
Push GeminiMultimodalLiveLLMService TranscriptionFrame Upstream, remo…
2025-05-02 13:10:27 -04:00
Mark Backman
83348a9f93 Merge pull request #1714 from pipecat-ai/mb/fix-gemini-text-modality
Restore TEXT modalities support to GeminiMultimodalLiveLLMService
2025-05-02 10:41:05 -04:00
Mark Backman
96d40903a9 Only send TTSStoppedFrame from Gemini when in AUDIO mode, only send one LLMFullResponseEndFrame 2025-05-02 10:18:53 -04:00
Aleix Conchillo Flaqué
2560811805 Merge pull request #1697 from pipecat-ai/aleix/daily-custom-audio-tracks
add support for multiple transport destinations
2025-05-02 06:34:09 -07:00
Mark Backman
2b8c44c008 Merge pull request #1710 from pipecat-ai/mb/openai-context-aggregation
fix: OpenAIRealtimeBetaLLMService writes two assistant messages to th…
2025-05-02 07:43:35 -04:00
Mark Backman
38e2d37674 Restore TEXT modalities support to GeminiMultimodalLiveLLMService 2025-05-02 07:36:12 -04:00
Vanessa Pyne
6278561f88 Merge pull request #1709 from pipecat-ai/vp-fix-fastpitch-params-update
Riva TTS: update FastPitch params
2025-05-01 21:23:10 -05:00
Aleix Conchillo Flaqué
750e79c1ce DailyParams: rename to camera/microphone_out_enabled 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
71eb2963c5 examples: added daily-custom-tracks 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
f44e2c86ea BaseOutputTransport: compute sample_rate and audio_chunk_size in main class 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
afe1f0df8c DailyTransport: make sure we can write audio frames to destination 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
458fddfb48 update CHANGELOG with new Daily and Transport features 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
8d915c5ccb DailyParams: allow enabling/disabling camera/microphone tracks 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
304153dd03 TTSService: set transport destination to all TTS frames 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
a6781b7352 rename destination to transport_destination 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
5ad0058303 update CHANGELOG with frame source/destination support 2025-05-01 19:11:13 -07:00
Aleix Conchillo Flaqué
75c039de33 examples: add daily-multi-translation 2025-05-01 19:11:13 -07:00
Aleix Conchillo Flaqué
74e3c3677e DailyTransport: fix audio/video renderers registration 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
dc20327f10 DailyTransport: register audio destination and use custom tracks 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
e738affd29 BaseOutputTransport: allow sending audio/video to multiple destinations 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
ef3d732607 DailyTransport: allow capturing multiple simultaneous audio/video sources 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
6d63cff1bf DailyTransport: custom audio tracks support 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
12f42605a1 pyproject: update daily-python to 0.18.0 2025-05-01 18:58:44 -07:00
Kwindla Hultman Kramer
fac3337927 small groq updates 2025-05-01 17:09:15 -07:00
Mark Backman
76d198151c Push GeminiMultimodalLiveLLMService TranscriptionFrame Upstream, remove direct context addition 2025-05-01 15:41:04 -04:00
Mark Backman
6a907058de fix: OpenAIRealtimeBetaLLMService writes two assistant messages to the context 2025-05-01 15:37:39 -04:00
vipyne
6e1f531f64 Riva TTS: update FastPitch params
91138c3f66 (diff-ece228577b1d233ce600a948243f90cece53e3a9b89554a0b27a48bc4d6e0fdfR45)
2025-05-01 11:14:41 -05:00
Matt Kim
4232cca5b6 [Rime] Add phonemizeBetweenBrackets and pauseBetweenBrackets to RimeTTSService (ws)
There is a fix incoming in
2025-04-30 18:09:22 -07:00
Jin Kim
cf2f249f8a Use "use_original_timestamps" only for sonic-2 model 2025-04-27 19:33:14 +09:00
Prem Adithya
c510870736 Merge branch 'pipecat-ai:main' into anthropic-client-bug-fixes 2025-04-04 16:41:04 +11:00
Adithya Suresh
e8783f6a33 Handle cache token counts being none 2025-03-31 15:25:11 +11:00
fatwang2
8cda4512ad Merge branch 'pipecat-ai:main' into main 2025-02-06 10:50:25 +08:00
fatwang2
fc90bdc638 changed to HailuoHttpTTSService 2025-01-19 09:43:48 +08:00
fatwang2
5a88165a26 Merge branch 'pipecat-ai:main' into main 2025-01-19 09:40:08 +08:00
fatwang2
3466842cd4 add hailuo tts service 2025-01-17 12:46:05 +08:00
635 changed files with 39507 additions and 8881 deletions

View File

@@ -6,11 +6,13 @@ on:
- main
paths:
- "examples/simple-chatbot/client/android/**"
- "examples/p2p-webrtc/video-transform/client/android/**"
pull_request:
branches:
- "**"
paths:
- "examples/simple-chatbot/client/android/**"
- "examples/p2p-webrtc/video-transform/client/android/**"
workflow_dispatch:
inputs:
sdk_git_ref:
@@ -23,7 +25,7 @@ concurrency:
jobs:
sdk:
name: "Simple chatbot demo"
name: "Demo apps"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
@@ -37,12 +39,22 @@ jobs:
distribution: 'temurin'
java-version: '17'
- name: Build demo app
- name: "Example app: Simple Chatbot"
working-directory: examples/simple-chatbot/client/android
run: ./gradlew :simple-chatbot-client:assembleDebug
- name: Upload demo APK
- name: Upload Simple Chatbot APK
uses: actions/upload-artifact@v4
with:
name: Simple Chatbot Android Client
path: examples/simple-chatbot/client/android/simple-chatbot-client/build/outputs/apk/debug/simple-chatbot-client-debug.apk
- name: "Example app: Small WebRTC Client"
working-directory: examples/p2p-webrtc/video-transform/client/android
run: ./gradlew :small-webrtc-client:assembleDebug
- name: Upload Small WebRTC APK
uses: actions/upload-artifact@v4
with:
name: Small WebRTC Android Client
path: examples/p2p-webrtc/video-transform/client/android/small-webrtc-client/build/outputs/apk/debug/small-webrtc-client-debug.apk

View File

@@ -5,10 +5,481 @@ All notable changes to **Pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [0.0.71] - 2025-06-10
### Added
- Adds a parameter called `additional_span_attributes` to PipelineTask that
lets you add any additional attributes you'd like to the conversation span.
### Fixed
- Fixed an issue with `CartesiaSTTService` initialization.
## [0.0.70] - 2025-06-10
### Added
- Added `ExotelFrameSerializer` to handle telephony calls via Exotel.
- Added the option `informal` to `TranslationConfig` on Gladia config.
Allowing to force informal language forms when available.
- Added `CartesiaSTTService` which is a websocket based implementation to
transcribe audio. Added a foundational example in
`13f-cartesia-transcription.py`
- Added an `websocket` example, showing how to use the new Pipecat client
`WebsocketTransport` to connect with Pipecat `FastAPIWebsocketTransport` or
`WebsocketServerTransport`.
- Added language support to `RimeHttpTTSService`. Extended languages to include
German and French for both `RimeTTSService` and `RimeHttpTTSService`.
### Changed
- Upgraded `daily-python` to 0.19.2.
- Make `PipelineTask.add_observer()` synchronous. This allows callers to call it
before doing the work of running the `PipelineTask` (i.e. without invoking
`PipelineTask.set_event_loop()` first).
- Pipecat 0.0.69 forced `uvloop` event loop on Linux on macOS. Unfortunately,
this is causing issue in some systems. So, `uvloop` is not enabled by default
anymore. If you want to use `uvloop` you can just set the `asyncio` event
policy before starting your agent with:
```python
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
```
### Fixed
- Fixed an issue with various TTS services that would cause audio glitches at
the start of every bot turn.
- Fixed an `ElevenLabsTTSService` issue where a context warning was printed
when pushing a `TTSSpeakFrame`.
- Fixed an `AssemblyAISTTService` issue that could cause unexpected behavior
when yielding empty `Frame()`s.
- Fixed an issue where `OutputAudioRawFrame.transport_destination` was being
reset to `None` instead of retaining its intended value before sending the
audio frame to `write_audio_frame`.
- Fixed a typo in Livekit transport that prevented initialization.
## [0.0.69] - 2025-06-02 "AI Engineer World's Fair release" ✨
### Added
- Added a new frame `FunctionCallsStartedFrame`. This frame is pushed both
upstream and downstream from the LLM service to indicate that one or more
function calls are going to be executed.
- Added LLM services `on_function_calls_started` event. This event will be
triggered when the LLM service receives function calls from the model and is
going to start executing them.
- Function calls can now be executed sequentially (in the order received in the
completion) by passing `run_in_parallel=False` when creating your LLM
service. By default, if the LLM completion returns 2 or more function calls
they run concurrently. In both cases, concurrently and sequentially, a new LLM
completion will run when the last function call finishes.
- Added OpenTelemetry tracing for `GeminiMultimodalLiveLLMService` and
`OpenAIRealtimeBetaLLMService`.
- Added initial support for interruption strategies, which determine if the user
should interrupt the bot while the bot is speaking. Interruption strategies
can be based on factors such as audio volume or the number of words spoken by
the user. These can be specified via the new `interruption_strategies` field
in `PipelineParams`. A new `MinWordsInterruptionStrategy` strategy has been
introduced which triggers an interruption if the user has spoken a minimum
number of words. If no interruption strategies are specified, the normal
interruption behavior applies. If multiple strategies are provided, the first
one that evaluates to true will trigger the interruption.
- `BaseInputTransport` now handles `StopFrame`. When a `StopFrame` is received
the transport will pause sending frames downstream until a new `StartFrame` is
received. This allows the transport to be reused (keeping the same connection)
in a different pipeline.
- Updated AssemblyAI STT service to support their latest streaming
speech-to-text model with improved transcription latency and endpointing.
- You can now access STT service results through the new
`TranscriptionFrame.result` and `InterimTranscriptionFrame.result` field. This
is useful in case you use some specific settings for the STT and you want to
access the STT results.
- The examples runner is now public from the `pipecat.examples` package. This
allows everyone to build their own examples and run them easily.
- It is now possible to push `OutputDTMFFrame` or `OutputDTMFUrgentFrame` with
`DailyTransport`. This will be sent properly if a Daily dial-out connection
has been established.
- Added `OutputDTMFUrgentFrame` to send a DTMF keypress quickly. The previous
`OutputDTMFFrame` queues the keypress with the rest of data frames.
- Added `DTMFAggregator`, which aggregates keypad presses into
`TranscriptionFrame`s. Aggregation occurs after a timeout, termination key
press, or user interruption. You can specify the prefix of the
`TranscriptionFrame`.
- Added new functions `DailyTransport.start_transcription()` and
`DailyTransport.stop_transcription()` to be able to start and stop Daily
transcription dynamically (maybe with different settings).
### Changed
- Reverted the default model for `GeminiMultimodalLiveLLMService` back to
`models/gemini-2.0-flash-live-001`.
`gemini-2.5-flash-preview-native-audio-dialog` has inconsistent performance.
You can opt in to using this model by setting the `model` arg.
- Function calls are now cancelled by default if there's an interruption. To
disable this behavior you can set `cancel_on_interruption=False` when
registering the function call. Since function calls are executed as tasks you
can tell if a function call has been cancelled by catching the
`asyncio.CancelledError` exception (and don't forget to raise it again!).
- Updated OpenTelemetry tracing attribute `metrics.ttfb_ms` to `metrics.ttfb`.
The attribute reports TTFB in seconds.
### Deprecated
- `DailyTransport.send_dtmf()` is deprecated, push an `OutputDTMFFrame` or an
`OutputDTMFUrgentFrame` instead.
### Fixed
- Fixed an issue with `ElevenLabsTTSService` where long responses would
continue generating output even after an interruption.
- Fixed an issue with the `OpenAILLMContext` where non-Roman characters were
being incorrectly encoded as Unicode escape sequences. This was a logging
issue and did not impact the actual conversation.
- In `AWSBedrockLLMService`, worked around a possible bug in AWS Bedrock where
a `toolConfig` is required if there has been previous tool use in the
messages array. This workaround includes a no_op factory function call is
used to satisfy the requirement.
- Fixed `WebsocketClientTransport` to use `FrameProcessorSetup.task_manager`
instead of `StartFrame.task_manager`.
### Performance
- Use `uvloop` as the new event loop on Linux and macOS systems.
## [0.0.68] - 2025-05-28
### Added
- Added `GoogleHttpTTSService` which uses Google's HTTP TTS API.
- Added `TavusTransport`, a new transport implementation compatible with any
Pipecat pipeline. When using the `TavusTransport`the Pipecat bot will
connect in the same room as the Tavus Avatar and the user.
- Added `PlivoFrameSerializer` to support Plivo calls. A full running example
has also been added to `examples/plivo-chatbot`.
- Added `UserBotLatencyLogObserver`. This is an observer that logs the latency
between when the user stops speaking and when the bot starts speaking. This
gives you an initial idea on how quickly the AI services respond.
- Added `SarvamTTSService`, which implements Sarvam AI's TTS API:
https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert.
- Added `PipelineTask.add_observer()` and `PipelineTask.remove_observer()` to
allow mangaging observers at runtime. This is useful for cases where the task
is passed around to other code components that might want to observe the
pipeline dynamically.
- Added `user_id` field to `TranscriptionMessage`. This allows identifying the
user in a multi-user scenario. Note that this requires that
`TranscriptionFrame` has the `user_id` properly set.
- Added new `PipelineTask` event handlers `on_pipeline_started`,
`on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled`, which
correspond to the `StartFrame`, `StopFrame`, `EndFrame` and `CancelFrame`
respectively.
- Added additional languages to `LmntTTSService`. Languages include: `hi`,
`id`, `it`, `ja`, `nl`, `pl`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`.
- Added a `model` parameter to the `LmntTTSService` constructor, allowing
switching between LMNT models.
- Added `MiniMaxHttpTTSService`, which implements MiniMax's T2A API for TTS.
Learn more: https://www.minimax.io/platform_overview
- A new function `FrameProcessor.setup()` has been added to allow setting up
frame processors before receiving a `StartFrame`. This is what's happening
internally: `FrameProcessor.setup()` is called, `StartFrame` is pushed from
the beginning of the pipeline, your regular pipeline operations, `EndFrame`
or `CancelFrame` are pushed from the beginning of the pipeline and finally
`FrameProcessor.cleanup()` is called.
- Added support for OpenTelemetry tracing in Pipecat. This initial
implementation includes:
- A `setup_tracing` method where you can specify your OpenTelemetry exporter
- Service decorators for STT (`@traced_stt`), LLM (`@traced_llm`), and TTS
(`@traced_tts`) which trace the execution and collect properties and
metrics (TTFB, token usage, character counts, etc.)
- Class decorators that provide execution tracking; these are generic and can
be used for service tracking as needed
- Spans that help track traces on a per conversations and turn basis:
```
conversation-uuid
├── turn-1
│ ├── stt_deepgramsttservice
│ ├── llm_openaillmservice
│ └── tts_cartesiattsservice
...
└── turn-n
└── ...
```
By default, Pipecat has implemented service decorators to trace execution of
STT, LLM, and TTS services. You can enable tracing by setting
`enable_tracing` to `True` in the PipelineTask.
- Added `TurnTrackingObserver`, which tracks the start and end of a user/bot
turn pair and emits events `on_turn_started` and `on_turn_stopped`
corresponding to the start and end of a turn, respectively.
- Allow passing observers to `run_test()` while running unit tests.
### Changed
- Upgraded `daily-python` to 0.19.1.
- ⚠️ Updated `SmallWebRTCTransport` to align with how other transports handle
`on_client_disconnected`. Now, when the connection is closed and no reconnection
is attempted, `on_client_disconnected` is called instead of `on_client_close`. The
`on_client_close` callback is no longer used, use `on_client_disconnected` instead.
- Check if `PipelineTask` has already been cancelled.
- Don't raise an exception if event handler is not registered.
- Upgraded `deepgram-sdk` to 4.1.0.
- Updated `GoogleTTSService` to use Google's streaming TTS API. The default
voice also updated to `en-US-Chirp3-HD-Charon`.
- ⚠️ Refactored the `TavusVideoService`, so it acts like a proxy, sending audio
to Tavus and receiving both audio and video. This will make
`TavusVideoService` usable with any Pipecat pipeline and with any transport.
This is a **breaking change**, check the
`examples/foundational/21a-tavus-layer-small-webrtc.py` to see how to use it.
- `DailyTransport` now uses custom microphone audio tracks instead of virtual
microphones. Now, multiple Daily transports can be used in the same process.
- `DailyTransport` now captures audio from individual participants instead of
the whole room. This allows identifying audio frames per participant.
- Updated the default model for `AnthropicLLMService` to
`claude-sonnet-4-20250514`.
- Updated the default model for `GeminiMultimodalLiveLLMService` to
`models/gemini-2.5-flash-preview-native-audio-dialog`.
- `BaseTextFilter` methods `filter()`, `update_settings()`,
`handle_interruption()` and `reset_interruption()` are now async.
- `BaseTextAggregator` methods `aggregate()`, `handle_interruption()` and
`reset()` are now async.
- The API version for `CartesiaTTSService` and `CartesiaHttpTTSService` has
been updated. Also, the `cartesia` dependency has been updated to 2.x.
- `CartesiaTTSService` and `CartesiaHttpTTSService` now support Cartesia's new
`speed` parameter which accepts values of `slow`, `normal`, and `fast`.
- `GeminiMultimodalLiveLLMService` now uses the user transcription and usage
metrics provided by Gemini Live.
- `GoogleLLMService` has been updated to use `google-genai` instead of the
deprecated `google-generativeai`.
### Deprecated
- In `CartesiaTTSService` and `CartesiaHttpTTSService`, `emotion` has been
deprecated by Cartesia. Pipecat is following suit and deprecating `emotion`
as well.
### Removed
- Since `GeminiMultimodalLiveLLMService` now transcribes it's own audio, the
`transcribe_user_audio` arg has been removed. Audio is now transcribed
automatically.
- Removed `SileroVAD` frame processor, just use `SileroVADAnalyzer`
instead. Also removed, `07a-interruptible-vad.py` example.
### Fixed
- Fixed a `DailyTransport` issue that was not allow capturing video frames if
framerate was greater than zero.
- Fixed a `DeegramSTTService` connection issue when the user provided their own
`LiveOptions`.
- Fixed a `DailyTransport` issue that would cause images needing resize to block
the event loop.
- Fixed an issue with `ElevenLabsTTSService` where changing the model or voice
while the service is running wasn't working.
- Fixed an issue that would cause multiple instances of the same class to behave
incorrectly if any of the given constructor arguments defaulted to a mutable
value (e.g. lists, dictionaries, objects).
- Fixed an issue with `CartesiaTTSService` where `TTSTextFrame` messages weren't
being emitted when the model was set to `sonic`. This resulted in the
assistant context not being updated with assistant messages.
### Performance
- `DailyTransport`: process audio, video and events in separate tasks.
- Don't create event handler tasks if no user event handlers have been
registered.
### Other
- It is now possible to run all (or most) foundational example with multiple
transports. By default, they run with P2P (Peer-To-Peer) WebRTC so you can try
everything locally. You can also run them with Daily or even with a Twilio
phone number.
- Added foundation examples `07y-interruptible-minimax.py` and
`07z-interruptible-sarvam.py`to show how to use the `MiniMaxHttpTTSService`
and `SarvamTTSService`, respectively.
- Added an `open-telemetry-tracing` example, showing how to setup tracing. The
example also includes Jaeger as an open source OpenTelemetry client to review
traces from the example runs.
- Added foundational example `29-turn-tracking-observer.py` to show how to use
the `TurnTrackingObserver`.
## [0.0.67] - 2025-05-07
### Added
- Added `DebugLogObserver` for detailed frame logging with configurable
filtering by frame type and endpoint. This observer automatically extracts
and formats all frame data fields for debug logging.
- `UserImageRequestFrame.video_source` field has been added to request an image
from the desired video source.
- Added support for the AWS Nova Sonic speech-to-speech model with the new
`AWSNovaSonicLLMService`.
See https://docs.aws.amazon.com/nova/latest/userguide/speech.html.
Note that it requires Python >= 3.12 and `pip install pipecat-ai[aws-nova-sonic]`.
- Added new AWS services `AWSBedrockLLMService` and `AWSTranscribeSTTService`.
- Added `on_active_speaker_changed` event handler to the `DailyTransport` class.
- Added `enable_ssml_parsing` and `enable_logging` to `InputParams` in
`ElevenLabsTTSService`.
- Added support to `RimeHttpTTSService` for the `arcana` model.
### Changed
- Updated `ElevenLabsTTSService` to use the beta websocket API
(multi-stream-input). This new API supports context_ids and cancelling those
contexts, which greatly improves interruption handling.
- Observers `on_push_frame()` now take a single argument `FramePushed` instead
of multiple arguments.
- Updated the default voice for `DeepgramTTSService` to `aura-2-helena-en`.
### Deprecated
- `PollyTTSService` is now deprecated, use `AWSPollyTTSService` instead.
- Observer `on_push_frame(src, dst, frame, direction, timestamp)` is now
deprecated, use `on_push_frame(data: FramePushed)` instead.
### Fixed
- Fixed a `DailyTransport` issue that was causing issues when multiple audio or
video sources where being captured.
- Fixed a `UltravoxSTTService` issue that would cause the service to generate
all tokens as one word.
- Fixed a `PipelineTask` issue that would cause tasks to not be cancelled if
task was cancelled from outside of Pipecat.
- Fixed a `TaskManager` that was causing dangling tasks to be reported.
- Fixed an issue that could cause data to be sent to the transports when they
were still not ready.
- Remove custom audio tracks from `DailyTransport` before leaving.
### Removed
- Removed `CanonicalMetricsService` as it's no longer maintained.
## [0.0.66] - 2025-05-02
### Added
- Added two new input parameters to `RimeTTSService`: `pause_between_brackets`
and `phonemize_between_brackets`.
- Added support for cross-platform local smart turn detection. You can use
`LocalSmartTurnAnalyzer` for on-device inference using Torch.
- `BaseOutputTransport` now allows multiple destinations if the transport
implementation supports it (e.g. Daily's custom tracks). With multiple
destinations it is possible to send different audio or video tracks with a
single transport simultaneously. To do that, you need to set the new
`Frame.transport_destination` field with your desired transport destination
(e.g. custom track name), tell the transport you want a new destination with
`TransportParams.audio_out_destinations` or
`TransportParams.video_out_destinations` and the transport should take care of
the rest.
- Similar to the new `Frame.transport_destination`, there's a new
`Frame.transport_source` field which is set by the `BaseInputTransport` if the
incoming data comes from a non-default source (e.g. custom tracks).
- `TTSService` has a new `transport_destination` constructor parameter. This
parameter will be used to update the `Frame.transport_destination` field for
each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to
multiple destinations in the same pipeline.
- Added `DailyTransportParams.camera_out_enabled` and
`DailyTransportParams.microphone_out_enabled` which allows you to
enable/disable the main output camera or microphone tracks. This is useful if
you only want to use custom tracks and not send the main tracks. Note that you
still need `audio_out_enabled=True` or `video_out_enabled`.
- Added `DailyTransport.capture_participant_audio()` which allows you to capture
an audio source (e.g. "microphone", "screenAudio" or a custom track name) from
a remote participant.
- Added `DailyTransport.update_publishing()` which allows you to update the call
video and audio publishing settings (e.g. audio and video quality).
- Added `RTVIObserverParams` which allows you to configure what RTVI messages
are sent to the clients.
@@ -37,6 +508,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed
- `TransportParams.audio_mixer` now supports a string and also a dictionary to
provide a mixer per destination. For example:
```python
audio_out_mixer={
"track-1": SoundfileMixer(...),
"track-2": SoundfileMixer(...),
"track-N": SoundfileMixer(...),
},
```
- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and
`TranscriptionFrame` which allows the `STTMuteFilter` to be used in
conjunction with transports that generate transcripts, e.g. `DailyTransport`.
@@ -70,6 +552,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
case there's no need to push audio to the rest of the pipeline, but this is
not a very common case.
- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such
as to be "canary-1b-asr" used in Pipecat.
### Deprecated
- Function calls with parameters
@@ -85,8 +570,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `TransportParams.vad_audio_passthrough` parameter is now deprecated, use
`TransportParams.audio_in_passthrough` instead.
- `ParakeetSTTService` is now deprecated, use `RivaSTTService` instead, which uses
the model "parakeet-ctc-1.1b-asr" by default.
- `FastPitchTTSService` is now deprecated, use `RivaTTSService` instead, which uses
the model "magpie-tts-multilingual" by default.
### Fixed
- Fixed an issue with `SimliVideoService` where the bot was continuously outputting
audio, which prevents the `BotStoppedSpeakingFrame` from being emitted.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant
messages to the context.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context
contained tokens instead of words.
@@ -102,6 +599,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Other
- Added `examples/daily-custom-tracks` to show how to send and receive Daily
custom tracks.
- Added `examples/daily-multi-translation` to showcase how to send multiple
simulataneous translations with the same transport.
- Added 04 foundational examples for client/server transports. Also, renamed
`29-livekit-audio-chat.py` to `04b-transports-livekit.py`.

4
MANIFEST.in Normal file
View File

@@ -0,0 +1,4 @@
prune docs
prune examples
prune scripts
prune tests

View File

@@ -8,6 +8,8 @@
**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
> Want to dive right in? [Install Pipecat](https://docs.pipecat.ai/getting-started/installation) then try the [quickstart](https://docs.pipecat.ai/getting-started/quickstart).
## 🚀 What You Can Build
- **Voice Assistants** natural, streaming conversations with AI
@@ -49,18 +51,19 @@ You can connect to Pipecat from any platform using our official SDKs:
## 🧩 Available services
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

View File

@@ -10,7 +10,6 @@ pipecat-ai[anthropic]
pipecat-ai[assemblyai]
pipecat-ai[aws]
pipecat-ai[azure]
pipecat-ai[canonical]
pipecat-ai[cartesia]
pipecat-ai[cerebras]
pipecat-ai[deepseek]

View File

@@ -95,9 +95,16 @@ OPENROUTER_API_KEY=...
PIPER_BASE_URL=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=
LOCAL_SMART_TURN_MODEL_PATH=...
FAL_SMART_TURN_API_KEY=...
# Twilio
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
# MiniMax
MINIMAX_API_KEY=...
MINIMAX_GROUP_ID=...
# Sarvam AI
SARVAM_API_KEY=...

View File

@@ -12,7 +12,7 @@
"@daily-co/daily-js": "0.74.0"
},
"devDependencies": {
"vite": "^6.0.9"
"vite": "^6.3.5"
}
},
"node_modules/@babel/runtime": {
@@ -999,9 +999,9 @@
}
},
"node_modules/vite": {
"version": "6.3.3",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.3.tgz",
"integrity": "sha512-5nXH+QsELbFKhsEfWLkHrvgRpTdGJzqOZ+utSdmPTvwHmvU6ITTm3xx+mRusihkcI8GeC7lCDyn3kDtiki9scw==",
"version": "6.3.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.5.tgz",
"integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
"dev": true,
"dependencies": {
"esbuild": "^0.25.0",

View File

@@ -12,7 +12,7 @@
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.0.9"
"vite": "^6.3.5"
},
"dependencies": {
"@daily-co/daily-js": "0.74.0"

View File

@@ -1,66 +0,0 @@
# Chatbot with canonical-metrics
This project implements a chatbot using a pipeline architecture that integrates audio processing, transcription, and a language model for conversational interactions. The chatbot operates within a daily communication environment, utilizing various services for text-to-speech and language model responses.
## Features
- **Audio Input and Output**: Captures microphone input and plays back audio responses.
- **Voice Activity Detection**: Utilizes Silero VAD to manage audio input intelligently.
- **Text-to-Speech**: Integrates ElevenLabs TTS service to convert text responses into audio.
- **Language Model Interaction**: Uses OpenAI's GPT-4 model to generate responses based on user input.
- **Transcription Services**: Captures and transcribes participant speech for analytics.
- **Metrics Collection**: Sends audio data for analysis via Canonical Metrics Service.
## Requirements
- Python 3.10+
- `python-dotenv`
- Additional libraries from the `pipecat` package.
## Setup
1. Clone the repository.
2. Install the required packages.
3. Set up environment variables for API keys:
- `OPENAI_API_KEY`
- `ELEVENLABS_API_KEY`
- `CANONICAL_API_KEY`
- `CANONICAL_API_URL`
4. Run the script.
## Usage
The chatbot introduces itself and engages in conversations, providing brief and creative responses. Designed for flexibility, it can support multiple languages with appropriate configuration.
## Events
- Participants joining or leaving the call are handled dynamically, adjusting the chatbot's behavior accordingly.
The first time, things might take extra time to get started since VAD (Voice Activity Detection) model needs to be downloaded.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/` in your browser to start a chatbot session.
## Build and test the Docker image
```
docker build -t chatbot .
docker run --env-file .env -p 7860:7860 chatbot
```

View File

@@ -1,146 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import uuid
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.canonical.metrics import CanonicalMetricsService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
audio_in_enabled=True,
video_out_enabled=False,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="cgSgspJ2msm6clMCkdW9",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
#
# English
#
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your responses to 12 words or fewer.",
#
# Spanish
#
# "content": "Eres Chatbot, un amigable y útil robot. Tu objetivo es demostrar tus capacidades de una manera breve. Tus respuestas se convertiran a audio así que nunca no debes incluir caracteres especiales. Contesta a lo que el usuario pregunte de una manera creativa, útil y breve. Empieza por presentarte a ti mismo.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
"""
CanonicalMetrics uses AudioBufferProcessor under the hood to buffer the audio. On
call completion, CanonicalMetrics will send the audio buffer to Canonical for
analysis. Visit https://voice.canonical.chat to learn more.
"""
audio_buffer_processor = AudioBufferProcessor(num_channels=2)
canonical = CanonicalMetricsService(
audio_buffer_processor=audio_buffer_processor,
aiohttp_session=session,
api_key=os.getenv("CANONICAL_API_KEY"),
call_id=str(uuid.uuid4()),
assistant="pipecat-chatbot",
assistant_speaks_first=True,
context=context,
)
pipeline = Pipeline(
[
transport.input(), # microphone
context_aggregator.user(),
llm,
tts,
transport.output(),
canonical, # uploads audio buffer to Canonical AI for metrics
audio_buffer_processor, # captures audio into a buffer
context_aggregator.assistant(),
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await audio_buffer_processor.start_recording()
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
# Here we don't want to cancel, we just want to finish sending
# whatever is queued, so we use an EndFrame().
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,5 +0,0 @@
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,openai,silero,elevenlabs,canonical]

View File

@@ -128,7 +128,14 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
allow_interruptions=True,
),
)
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):

View File

@@ -53,4 +53,3 @@ async def configure(aiohttp_session: aiohttp.ClientSession):
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)
return (url, token)

View File

@@ -0,0 +1,39 @@
# Daily Custom Tracks
This example shows how to send and receive Daily custom tracks. We will run a simple `daily-python` application to send an audio file with a custom track (named "pipecat") to a room. Then, the Pipecat bot will mirror that custom track into another custom track (named "pipecat-mirror") in the same room.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
## Run the bot
Start the bot by giving it a Daily room URL.
```bash
python bot.py -u ROOM_URL
```
The bot will wait for the first participant to join. Then, it will mirror a custom track named "pipecat" into a new custom track named "pipecat-mirror".
## Run the sender
Now, run the custom track sender. This is a simple `daily-python` application that opens and audio file and sends it as a custom track to the same Daily room.
```bash
python custom_track_sender.py -u ROOM_URL -i office-ambience-mono-16000.mp3
```
## Open client
Finally, open the client so you can hear both custom tracks.
```bash
open index.html
```
Once the client is opened, copy the URL of the Daily room and join it. You should be able to select which custom track you want to hear.

View File

@@ -0,0 +1,87 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
import aiohttp
from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, InputAudioRawFrame, OutputAudioRawFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyParams, DailyTransport
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class CustomTrackMirrorProcessor(FrameProcessor):
def __init__(self, transport_destination: str, **kwargs):
super().__init__(**kwargs)
self._transport_destination = transport_destination
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, InputAudioRawFrame) and frame.transport_source:
output_frame = OutputAudioRawFrame(
audio=frame.audio,
sample_rate=frame.sample_rate,
num_channels=frame.num_channels,
)
output_frame.transport_destination = self._transport_destination
await self.push_frame(output_frame)
else:
await self.push_frame(frame, direction)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Custom tracks mirror",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
microphone_out_enabled=False, # Disable since we just use custom tracks
audio_out_destinations=["pipecat-mirror"],
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
CustomTrackMirrorProcessor("pipecat-mirror"),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_audio(participant["id"], audio_source="pipecat")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,74 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import time
from daily import CallClient, CustomAudioSource, Daily
from pydub import AudioSegment
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room to join")
parser.add_argument(
"-i", "--input", type=str, required=True, help="Input audio file (needs 16000 sample rate)"
)
args, _ = parser.parse_known_args()
audio = AudioSegment.from_mp3(args.input)
raw_bytes = audio.raw_data
sample_rate = audio.frame_rate
channels = audio.channels
print(f"Length: {len(raw_bytes)} bytes")
print(f"Sample rate: {sample_rate}, Channels: {channels}")
# Initialize the Daily context & create call client
Daily.init()
client = CallClient()
# Join the room and indicate we have a custom track named "pipecat".
client.join(
args.url,
client_settings={
"publishing": {
"camera": False,
"microphone": False,
"customAudio": {"pipecat": True},
},
},
)
# Just sleep for a couple of seconds. To do this well we should really use
# completions.
time.sleep(2)
# Create the custom audio source. This is where we will write our audio.
audio_source = CustomAudioSource(sample_rate, channels)
# Create an audio track and assign it our audio source.
client.add_custom_audio_track("pipecat", audio_source)
# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)
try:
# Just write one second of audio until we have read all the file.
chunk_size = sample_rate * channels * 2
while len(raw_bytes) > 0:
chunk = raw_bytes[:chunk_size]
raw_bytes = raw_bytes[chunk_size:]
audio_source.write_frames(chunk)
except KeyboardInterrupt:
client.leave()
# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)
client.release()

View File

@@ -0,0 +1,173 @@
<html>
<head>
<title>daily custom tracks</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: false,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
</body>
</html>

View File

@@ -0,0 +1,2 @@
pydub
pipecat-ai[daily]

View File

@@ -1,7 +1,12 @@
FROM python:3.10-bullseye
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
WORKDIR /app
RUN pip3 install -r requirements.txt

View File

@@ -0,0 +1,39 @@
# Daily Multi Translation
This example shows how to use Daily to stream multiple simultaneous translations using a single transport. Daily provides custom tracks and in this example we will simultaneously translate incoming audio in English to Spanish, French and German, each of them being sent to a custom track.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/` in your browser. This will open a Daily Prebuilt room where you will speak in English (make sure you are not muted).
## Open client
Next, you need to open the client that will listen to the translations.
```bash
open index.html
```
Once the client is opened, copy the URL of the Daily room created above and join it. You should be able to select which translation you want to hear.
## Build and test the Docker image
```
docker build -t daily-multi-translation .
docker run --env-file .env -p 7860:7860 daily-multi-translation
```

View File

@@ -0,0 +1,165 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.mixers.soundfile_mixer import SoundfileMixer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
BACKGROUND_SOUND_FILE = "office-ambience-mono-16000.mp3"
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Multi translation bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_out_mixer={
"spanish": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"french": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"german": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
},
audio_out_destinations=["spanish", "french", "german"],
microphone_out_enabled=False, # Disable since we just use custom tracks
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts_spanish = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="cefcb124-080b-4655-b31f-932f3ee743de",
transport_destination="spanish",
)
tts_french = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="8832a0b5-47b2-4751-bb22-6a8e2149303d",
transport_destination="french",
)
tts_german = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="38aabb6a-f52b-4fb0-a3d1-988518f4dc06",
transport_destination="german",
)
messages_spanish = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into Spanish.",
},
]
messages_french = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into French.",
},
]
messages_german = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into German.",
},
]
llm_spanish = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_french = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_german = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
context_spanish = OpenAILLMContext(messages_spanish)
context_aggregator_spanish = llm_spanish.create_context_aggregator(context_spanish)
context_french = OpenAILLMContext(messages_french)
context_aggregator_french = llm_french.create_context_aggregator(context_french)
context_german = OpenAILLMContext(messages_german)
context_aggregator_german = llm_german.create_context_aggregator(context_german)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
ParallelPipeline(
# Spanish pipeline.
[
context_aggregator_spanish.user(),
llm_spanish,
tts_spanish,
context_aggregator_spanish.assistant(),
],
# French pipeline.
[
context_aggregator_french.user(),
llm_french,
tts_french,
context_aggregator_french.assistant(),
],
# German pipeline.
[
context_aggregator_german.user(),
llm_german,
tts_german,
context_aggregator_german.assistant(),
],
),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
observers=[TranscriptionLogObserver()],
)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,6 +1,5 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
ELEVENLABS_API_KEY=aeb...
CANONICAL_API_KEY=can...
CANONICAL_API_URL=
DEEPGRAM_API_KEY=efb...
CARTESIA_API_KEY=aeb...

View File

@@ -0,0 +1,202 @@
<html>
<head>
<title>daily multi translation</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script
src="https://code.jquery.com/jquery-3.1.1.min.js"
integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8="
crossorigin="anonymous"
></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`video[data-participant-id="${participantId}"]`);
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildVideoPlayer(track, participantId) {
const videoContainer = document.getElementById("video-container");
const player = document.createElement("video");
player.dataset.participantId = participantId;
videoContainer.appendChild(player);
await startPlayer(player, track);
await player.play();
return player;
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: true,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "video") {
await buildVideoPlayer(e.track, e.participant.session_id);
} else if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const videoContainer = document.getElementById("video-container");
videoContainer.replaceChildren();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="video-container" class="ui segment"></div>
</div>
</div>
</body>
</html>

View File

@@ -0,0 +1,5 @@
aiofiles
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,deepgram,openai,silero,cartesia]

View File

@@ -0,0 +1,55 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -1,3 +1,6 @@
# Modal clone
modal-examples
# Python
__pycache__/
*.py[cod]

View File

@@ -1,37 +1,91 @@
# Deploying Pipecat to Modal.com
Barebones deployment example for [modal.com](https://www.modal.com)
Deployment example for [modal.com](https://www.modal.com). This example demonstrates how to deploy a FastAPI webapp to Modal with an RTVI compatible `/connect` endpoint that launches a Pipecat pipeline in a separate Modal container and returns a room/token for the client to join. This example also supports providing a parameter to the `/connect` endpoint for specifying which Pipecat pipeline to launch; openai, gemini, or vllm. The vllm pipeline points to a self-hosted OpenAI compatible LLM, using a llama model (neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16), deployed to Modal.
1. Install dependencies
![](diagram.jpg)
```bash
python -m venv venv
source venv/bin/active # or OS equivalent
pip install -r requirements.txt
```
# Running this Example
2. Setup .env
## Install the Modal CLI
```bash
cp env.example .env
```
Setup a Modal account and install it on your machine if you have not already, following their easy 3 steps in their [Getting Started Guide](https://modal.com/docs/guide#getting-started)
Alternatively, you can configure your Modal app to use [secrets](https://modal.com/docs/guide/secrets)
## Deploy a self-serve LLM
3. Test the app locally
1. Deploy Modal's OpenAI-compatible LLM service:
```bash
modal serve app.py
```
```bash
git clone https://github.com/modal-labs/modal-examples
cd modal-examples
modal deploy 06_gpu_and_ml/llm-serving/vllm_inference.py
```
Refer to Modal's guide and example for [Deploying an OpenAI-compatible LLM service with vLLM](https://modal.com/docs/examples/vllm_inference) for more details.
2. Take note of the endpoint URL from the previous step, which will look like:
```
https://{your-workspace}--example-vllm-openai-compatible-serve.modal.run
```
You'll need this for the `bot_vllm.py` file in the next section.
**Note:** The default Modal LLM example uses Llama-3.1 and will shut down after 15 minutes of inactivity. Cold starts take 5-10 minutes. To prepare the service, we recommend visiting the `/docs` endpoint (`https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run/docs`) for your deployed LLM and wait for it to fully load before connecting your client.
## Deploy FastAPI App and Pipecat pipeline to Modal
1. Setup environment variables
```bash
cd server
cp env.example .env
# Modify .env to provide your service API Keys
```
Alternatively, you can configure your Modal app to use [secrets](https://modal.com/docs/guide/secrets)
2. Update the `modal_url` in `server/src/bot_vllm.py` to point to the url produced from the self-serve llm deploy, mentioned above.
3. From within the `server` directory, test the app locally:
```bash
modal serve app.py
```
4. Deploy to production
```bash
modal deploy app.py
```
```bash
modal deploy app.py
```
## Configuration options
5. Note the endpoint URL produced from this deployment. It will look like:
This app sets some sensible defaults for reducing cold starts, such as `minkeep_warm=1`, which will keep at least 1 warm instance ready for your bot function.
```bash
https://{your-workspace}--pipecat-modal-fastapi-app.modal.run
```
It has been configured to only allow a concurrency of 1 (`max_inputs=1`) as each user will require their own running function.
You'll need this URL for the client's `app.js` configuration mentioned in its README.
## Launch your bots on Modal
### Option 1: Direct Link
Simply click on the url displayed after running the server or deploy step to launch an agent and be redirected to a Daily room to talk with the launched bot. This will use the OpenAI pipeline.
### Option 2: Connect via an RTVI Client
Follow the instructions provided in the [client folder's README](client/javascript/README.md) for building and running a custom client that connects to your Modal endpoint. The provided client provides a dropdown for choosing which bot pipeline to run.
# Navigating your llm, server, and Pipecat logs
In your [Modal dashboard](https://modal.com/apps), you should have two Apps listed under Live Apps:
1. `example-vllm-openai-compatible`: This App contains the containers and logs used to run your self-hosted LLM. There will be just one App Function listed: `serve`. Click on this function to view logs for your LLM.
2. `pipecat-modal`: This App contains the containers and logs used to run your `connect` endpoints and Pipecat pipelines. It will list two App Functions:
1. `fastapi_app`: This function is running the endpoints that your client will interact with and initiate starting a new pipeline (`/`, `/connect`, `/status`). Click on this function to see logs for each endpoint hit.
2. `bot_runner`: This function handles launching and running a bot pipeline. Click on this function to get a list of all pipeline runs and access each run's logs.
# Modal + Pipecat Tips
- In most other Pipecat examples, we use `Popen` to launch the pipeline process from the `/connect` endpoint. In this example, we use a Modal function instead. This allows us to run the pipelines using a separately defined Modal image as well as run each pipeline in an isolated container.
- For the FastAPI and most common Pipecat Pipeline containers, a default `debian_slim` CPU-only should be all that's required to run. GPU containers are needed for self-hosted services.
- To minimize cold starts of the pipeline and reduce latency for users, set `min_containers=1` on the Modal Function that launches the pipeline to ensure at least one warm instance of your function is always available.
- For next steps on running a self-hosted llm and reducing latency, check out all of [Modal's LLM examples](https://modal.com/docs/examples/vllm_inference).

View File

@@ -1,80 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
import modal
from bot import _voice_bot_process
from fastapi import HTTPException
from fastapi.responses import JSONResponse
from loguru import logger
MAX_SESSION_TIME = 15 * 60 # 15 minutes
app = modal.App("pipecat-modal")
image = modal.Image.debian_slim(python_version="3.12").pip_install_from_requirements(
"requirements.txt"
)
@app.function(
image=image,
cpu=1.0,
secrets=[modal.Secret.from_dotenv()],
keep_warm=1,
enable_memory_snapshot=True,
max_inputs=1, # Do not reuse instances across requests
retries=0,
)
def launch_bot_process(room_url: str, token: str):
_voice_bot_process(room_url, token)
@app.function(
image=image,
secrets=[modal.Secret.from_dotenv()],
)
@modal.web_endpoint(method="POST")
async def start():
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomParams,
)
logger.info("Request received")
async with aiohttp.ClientSession() as session:
daily_rest_helper = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=session,
)
# Create new Daily room
room = await daily_rest_helper.create_room(DailyRoomParams())
if not room.url:
raise HTTPException(
status_code=500,
detail="Unable to create room",
)
logger.info(f"Created room: {room.url}")
# Create bot token for room
token = await daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
logger.info(f"Bot token created: {token}")
# Spawn a new bot process
launch_bot_process.spawn(room_url=room.url, token=token)
# Return room URL to the user to join
# Note: in production, you would want to return a token to the user
return JSONResponse(content={"room_url": room.url, token: token})

View File

@@ -1,95 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token: str):
transport = DailyTransport(
room_url,
token,
"bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
def _voice_bot_process(room_url: str, token: str):
asyncio.run(main(room_url, token))

View File

@@ -0,0 +1 @@
node_modules

View File

@@ -0,0 +1,29 @@
# JavaScript Implementation
Basic implementation using the [Pipecat JavaScript SDK](https://docs.pipecat.ai/client/js/introduction).
## Setup
1. Deploy the Modal server. See the main [README](../../README).
2. Navigate to the `client/javascript` directory:
```bash
cd client/javascript
```
3. Modify the baseUrl in src/app.js to point to your deployed Modal endpoint
4. Install dependencies:
```bash
npm install
```
5. Run the client app:
```
npm run dev
```
6. Visit http://localhost:5173 in your browser.

View File

@@ -0,0 +1,49 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AI Chatbot</title>
</head>
<body>
<div class="container">
<div class="status-bar">
<div class="status">
Status: <span id="connection-status">Disconnected</span>
</div>
<div class="controls">
<select id="bot-selector">
<option value="openai">OpenAI</option>
<option value="gemini">Gemini</option>
<option value="vllm">Llama</option>
</select>
<button id="connect-btn">Connect</button>
<button id="disconnect-btn" disabled>Disconnect</button>
</div>
</div>
<div class="main-content">
<div class="bot-container">
<div id="bot-video-container"></div>
<audio id="bot-audio" autoplay></audio>
</div>
</div>
<div class="device-bar">
<div class="device-controls">
<select id="device-selector"></select>
<button id="mic-toggle-btn">Mute Mic</button>
</div>
</div>
<div class="debug-panel">
<h3>Debug Info</h3>
<div id="debug-log"></div>
</div>
</div>
<script type="module" src="/src/app.js"></script>
<link rel="stylesheet" href="/src/style.css" />
</body>
</html>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,21 @@
{
"name": "client",
"version": "1.0.0",
"main": "index.js",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"keywords": [],
"author": "",
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.10"
}
}

View File

@@ -0,0 +1,381 @@
/**
* Copyright (c) 20242025, Daily
*
* SPDX-License-Identifier: BSD 2-Clause License
*/
/**
* RTVI Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebRTC (via Daily).
* It handles audio/video streaming and manages the connection lifecycle.
*
* Requirements:
* - A running RTVI bot server (defaults to http://localhost:7860)
* - The server must implement the /connect endpoint that returns Daily.co room credentials
* - Browser with WebRTC support
*/
import { RTVIClient, RTVIEvent } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
/**
* ChatbotClient handles the connection and media management for a real-time
* voice and video interaction with an AI bot.
*/
class ChatbotClient {
constructor() {
// Initialize client state
this.rtviClient = null;
this.setupDOMElements();
this.initializeClientAndTransport();
this.setupEventListeners();
}
/**
* Set up references to DOM elements and create necessary media elements
*/
setupDOMElements() {
// Get references to UI control elements
this.connectBtn = document.getElementById('connect-btn');
this.disconnectBtn = document.getElementById('disconnect-btn');
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
this.botVideoContainer = document.getElementById('bot-video-container');
this.deviceSelector = document.getElementById('device-selector');
// Create an audio element for bot's voice output
this.botAudio = document.createElement('audio');
this.botAudio.autoplay = true;
this.botAudio.playsInline = true;
document.body.appendChild(this.botAudio);
}
/**
* Set up event listeners for connect/disconnect buttons
*/
setupEventListeners() {
this.connectBtn.addEventListener('click', () => this.connect());
this.disconnectBtn.addEventListener('click', () => this.disconnect());
// Populate device selector
this.rtviClient.getAllMics().then((mics) => {
console.log('Available mics:', mics);
mics.forEach((device) => {
const option = document.createElement('option');
option.value = device.deviceId;
option.textContent = device.label || `Microphone ${device.deviceId}`;
this.deviceSelector.appendChild(option);
});
});
this.deviceSelector.addEventListener('change', (event) => {
const selectedDeviceId = event.target.value;
console.log('Selected device ID:', selectedDeviceId);
this.rtviClient.updateMic(selectedDeviceId);
});
// Handle mic mute/unmute toggle
const micToggleBtn = document.getElementById('mic-toggle-btn');
micToggleBtn.addEventListener('click', () => {
let micEnabled = this.rtviClient.isMicEnabled;
micToggleBtn.textContent = micEnabled ? 'Unmute Mic' : 'Mute Mic';
this.rtviClient.enableMic(!micEnabled);
// Add logic to mute/unmute the mic
if (micEnabled) {
console.log('Mic muted');
// Add code to mute the mic
} else {
console.log('Mic unmuted');
// Add code to unmute the mic
}
});
}
/**
* Set up the RTVI client and Daily transport
*/
async initializeClientAndTransport() {
// Initialize the RTVI client with a DailyTransport and our configuration
this.rtviClient = new RTVIClient({
transport: new DailyTransport(),
params: {
// REPLACE WITH YOUR MODAL URL ENDPOINT
baseUrl:
'https://<Modal workspace>--pipecat-modal-bot-launcher.modal.run',
endpoints: {
connect: '/connect',
},
requestData: {
bot_name: 'openai',
},
},
enableMic: true, // Enable microphone for user input
enableCam: false,
callbacks: {
// Handle connection state changes
onConnected: () => {
this.updateStatus('Connected');
this.connectBtn.disabled = true;
this.disconnectBtn.disabled = false;
this.log('Client connected');
},
onDisconnected: () => {
this.updateStatus('Disconnected');
this.connectBtn.disabled = false;
this.disconnectBtn.disabled = true;
this.log('Client disconnected');
},
// Handle transport state changes
onTransportStateChanged: (state) => {
this.updateStatus(`Transport: ${state}`);
this.log(`Transport state changed: ${state}`);
if (state === 'connecting') {
window.startTime = Date.now();
}
if (state === 'ready') {
this.setupMediaTracks();
console.warn('TIME TO BOT READY:', Date.now() - window.startTime);
}
},
// Handle bot connection events
onBotConnected: (participant) => {
this.log(`Bot connected: ${JSON.stringify(participant)}`);
},
onBotDisconnected: (participant) => {
this.log(`Bot disconnected: ${JSON.stringify(participant)}`);
},
onBotReady: (data) => {
this.log(`Bot ready: ${JSON.stringify(data)}`);
this.setupMediaTracks();
},
// Transcript events
onUserTranscript: (data) => {
// Only log final transcripts
if (data.final) {
this.log(`User: ${data.text}`);
}
},
onBotTranscript: (data) => {
this.log(`Bot: ${data.text}`);
},
// Error handling
onMessageError: (error) => {
console.log('Message error:', error);
},
onMicUpdated: (data) => {
console.log('Mic updated:', data);
this.deviceSelector.value = data.deviceId;
},
onError: (error) => {
console.log('Error:', JSON.stringify(error));
},
},
});
// Set up listeners for media track events
this.setupTrackListeners();
await this.rtviClient.initDevices();
window.client = this.rtviClient;
}
/**
* Add a timestamped message to the debug log
*/
log(message) {
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
// Add styling based on message type
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3'; // blue for user
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50'; // green for bot
}
this.debugLog.appendChild(entry);
this.debugLog.scrollTop = this.debugLog.scrollHeight;
console.log(message);
}
/**
* Update the connection status display
*/
updateStatus(status) {
this.statusSpan.textContent = status;
this.log(`Status: ${status}`);
}
/**
* Check for available media tracks and set them up if present
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.rtviClient) return;
// Get current tracks from the client
const tracks = this.rtviClient.tracks();
// Set up any available bot tracks
if (tracks.bot?.audio) {
this.setupAudioTrack(tracks.bot.audio);
}
if (tracks.bot?.video) {
this.setupVideoTrack(tracks.bot.video);
}
}
/**
* Set up listeners for track events (start/stop)
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.rtviClient) return;
// Listen for new tracks starting
this.rtviClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
if (!participant?.local) {
if (track.kind === 'audio') {
this.setupAudioTrack(track);
} else if (track.kind === 'video') {
this.setupVideoTrack(track);
}
this.log(
`Track started event: ${track.kind} from ${
participant?.name || 'unknown'
}`
);
} else {
this.log('Local mic unmuted');
}
});
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
if (participant.local) {
this.log('Local mic muted');
return;
}
this.log(
`Track stopped event: ${track.kind} from ${
participant?.name || 'unknown'
}`
);
});
}
/**
* Set up an audio track for playback
* Handles both initial setup and track updates
*/
setupAudioTrack(track) {
this.log('Setting up audio track');
// Check if we're already playing this track
if (this.botAudio.srcObject) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the audio source
this.botAudio.srcObject = new MediaStream([track]);
}
/**
* Set up a video track for display
* Handles both initial setup and track updates
*/
setupVideoTrack(track) {
this.log('Setting up video track');
const videoEl = document.createElement('video');
videoEl.autoplay = true;
videoEl.playsInline = true;
videoEl.muted = true;
videoEl.style.width = '100%';
videoEl.style.height = '100%';
videoEl.style.objectFit = 'cover';
// Check if we're already displaying this track
if (this.botVideoContainer.querySelector('video')?.srcObject) {
const oldTrack = this.botVideoContainer
.querySelector('video')
.srcObject.getVideoTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the video source
videoEl.srcObject = new MediaStream([track]);
this.botVideoContainer.innerHTML = '';
this.botVideoContainer.appendChild(videoEl);
}
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
*/
async connect() {
try {
const botSelector = document.getElementById('bot-selector');
const selectedBot = botSelector.value;
this.rtviClient.params.requestData.bot_name = selectedBot;
// Initialize audio/video devices
this.log('Initializing devices...');
await this.rtviClient.initDevices();
// Connect to the bot
this.log(`Connecting to bot: ${selectedBot}`);
await this.rtviClient.connect();
this.log('Connection complete');
} catch (error) {
// Handle any errors during connection
console.error('Connection error:', error);
this.log(`Error connecting: ${JSON.stringify(error.message)}`);
this.log(`Error stack: ${error.stack}`);
this.updateStatus('Error');
// Clean up if there's an error
if (this.rtviClient) {
try {
await this.rtviClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
}
}
}
/**
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.rtviClient) {
try {
// Disconnect the RTVI client
await this.rtviClient.disconnect();
// Clean up audio
if (this.botAudio.srcObject) {
this.botAudio.srcObject.getTracks().forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
// Clean up video
if (this.botVideoContainer.querySelector('video')?.srcObject) {
const video = this.botVideoContainer.querySelector('video');
video.srcObject.getTracks().forEach((track) => track.stop());
video.srcObject = null;
}
this.botVideoContainer.innerHTML = '';
} catch (error) {
this.log(`Error disconnecting: ${error.message}`);
}
}
}
}
// Initialize the client when the page loads
window.addEventListener('DOMContentLoaded', () => {
new ChatbotClient();
});

View File

@@ -0,0 +1,135 @@
body {
margin: 0;
padding: 20px;
font-family: Arial, sans-serif;
background-color: #f0f0f0;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
.status-bar,
.device-bar {
display: flex;
justify-content: space-between;
align-items: center;
padding: 10px;
background-color: #fff;
border-radius: 8px;
margin-bottom: 20px;
}
.controls,
.device-controls {
display: flex;
align-items: center;
gap: 10px; /* Adds spacing between elements */
}
.device-controls {
margin-left: auto;
}
.controls button,
.device-controls button {
padding: 8px 16px;
margin-left: 10px;
border: none;
border-radius: 4px;
cursor: pointer;
}
#bot-selector,
#device-selector {
padding: 8px 16px;
padding-right: 40px;
border: none;
border-radius: 4px;
background-color: #6c757d; /* Gray background */
color: white; /* White text */
cursor: pointer;
appearance: none; /* Removes default browser styling for dropdowns */
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24' fill='white'%3E%3Cpath d='M7 10l5 5 5-5z'/%3E%3C/svg%3E"); /* Custom arrow */
background-repeat: no-repeat;
background-position: right 8px center; /* Position the arrow */
}
#bot-selector:focus,
#device-selector:focus {
outline: none;
box-shadow: 0 0 4px rgba(0, 0, 0, 0.3); /* Add a subtle focus effect */
}
#connect-btn {
background-color: #4caf50;
color: white;
}
#disconnect-btn {
background-color: #f44336;
color: white;
}
#mic-toggle-btn {
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.main-content {
background-color: #fff;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
}
.bot-container {
display: flex;
flex-direction: column;
align-items: center;
}
#bot-video-container {
width: 640px;
height: 360px;
background-color: #e0e0e0;
border-radius: 8px;
margin: 20px auto;
overflow: hidden;
display: flex;
align-items: center;
justify-content: center;
}
#bot-video-container video {
width: 100%;
height: 100%;
object-fit: cover;
}
.debug-panel {
background-color: #fff;
border-radius: 8px;
padding: 20px;
}
.debug-panel h3 {
margin: 0 0 10px 0;
font-size: 16px;
font-weight: bold;
}
#debug-log {
height: 200px;
overflow-y: auto;
background-color: #f8f8f8;
padding: 10px;
border-radius: 4px;
font-family: monospace;
font-size: 12px;
line-height: 1.4;
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 114 KiB

View File

@@ -1,3 +0,0 @@
DAILY_API_KEY=
OPENAI_API_KEY=
CARTESIA_API_KEY=

View File

@@ -1,4 +0,0 @@
python-dotenv==1.0.1
modal==0.71.3
pipecat-ai[daily,silero,cartesia,openai]
fastapi==0.115.6

View File

@@ -0,0 +1,307 @@
"""modal_example.
This module shows a simple example of how to deploy a bot using Modal and FastAPI.
It includes:
- FastAPI endpoints for starting agents and checking bot statuses.
- Dynamic loading of bot implementations.
- Use of a Daily transport for bot communication.
"""
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import importlib
import os
from contextlib import asynccontextmanager
from typing import Any, Dict, Literal
import aiohttp
import modal
from fastapi import APIRouter, FastAPI, HTTPException
from fastapi.responses import JSONResponse, RedirectResponse
from pydantic import BaseModel
# container specifications for the FastAPI web server
web_image = (
modal.Image.debian_slim(python_version="3.13")
.pip_install_from_requirements("requirements.txt")
.pip_install("pipecat-ai[daily]")
.add_local_dir("src", remote_path="/root/src")
)
# container specifications for the Pipecat pipeline
bot_image = (
modal.Image.debian_slim(python_version="3.13")
.apt_install("ffmpeg")
.pip_install_from_requirements("requirements.txt")
.pip_install("pipecat-ai[daily,elevenlabs,openai,silero,google]")
.add_local_dir("src", remote_path="/root/src")
)
app = modal.App("pipecat-modal", secrets=[modal.Secret.from_dotenv()])
router = APIRouter()
bot_jobs = {}
daily_helpers = {}
# Names of all supported bot implementations
# These correspond to the bot files in the src directory
BotName = Literal["openai", "gemini", "vllm"]
def cleanup():
"""Cleanup function to terminate all bot processes.
Called during server shutdown.
"""
for entry in bot_jobs.values():
func = modal.FunctionCall.from_id(entry[0])
if func:
func.cancel()
def get_bot_file(bot_name: BotName) -> str:
"""Retrieve the bot file name corresponding to the provided bot_name.
Args:
bot_name (BotName): The name of the bot (e.g., 'openai', 'gemini', 'vllm').
Returns:
str: The file name corresponding to the bot implementation.
Raises:
ValueError: If the bot name is invalid or not supported.
"""
# bot_implementation = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
bot_implementation = bot_name.lower().strip()
if not bot_implementation:
bot_implementation = "openai"
if bot_implementation not in ["openai", "gemini", "vllm"]:
raise ValueError(
f"Invalid BOT_IMPLEMENTATION: {bot_implementation}. Must be 'openai' or 'gemini' or 'vllm'"
)
return f"bot_{bot_implementation}"
def get_runner(path: str, bot_file: str) -> callable:
"""Dynamically import the run_bot function based on the bot name.
Args:
path (str): The path to the bot files (e.g., 'src').
bot_file (str): The file name of the bot implementation (e.g., 'openai', 'gemini', 'vllm').
Returns:
function: The run_bot function from the specified bot module.
Raises:
ImportError: If the specified bot module or run_bot function is not found.
"""
try:
# Dynamically construct the module name
module_name = f"{path}.{bot_file}"
# Import the module
module = importlib.import_module(module_name)
# Get the run_bot function from the module
return getattr(module, "run_bot")
except (ImportError, AttributeError) as e:
raise ImportError(f"Failed to import run_bot from {module_name}: {e}")
async def create_room_and_token() -> tuple[str, str]:
"""Create a Daily room and generate an authentication token.
This function checks for existing room URL and token in the environment variables.
If not found, it creates a new room using the Daily API and generates a token for it.
Returns:
tuple[str, str]: A tuple containing the room URL and the authentication token.
Raises:
HTTPException: If room creation or token generation fails.
"""
from pipecat.transports.services.helpers.daily_rest import DailyRoomParams
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
token = os.getenv("DAILY_SAMPLE_ROOM_TOKEN", None)
if not room_url:
room = await daily_helpers["rest"].create_room(DailyRoomParams())
if not room.url:
raise HTTPException(status_code=500, detail="Failed to create room")
room_url = room.url
token = await daily_helpers["rest"].get_token(room_url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room_url}")
return room_url, token
@app.function(image=bot_image, min_containers=1)
async def bot_runner(room_url, token, bot_name: BotName = "openai"):
"""Launch the provided bot process, providing the given room URL and token for the bot to join.
Args:
room_url (str): The URL of the Daily room where the bot and client will communicate.
token (str): The authentication token for the room.
bot_name (BotName): The name of the bot implementation to use. Defaults to "openai".
Raises:
HTTPException: If the bot pipeline fails to start.
"""
try:
path = "src"
bot_file = get_bot_file(bot_name)
run_bot = get_runner(path, bot_file)
print(f"Starting bot process: {bot_file} -u {room_url} -t {token}")
await run_bot(room_url, token)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start bot pipeline: {e}")
@asynccontextmanager
async def lifespan(app: FastAPI):
"""FastAPI lifespan manager that handles startup and shutdown tasks.
- Creates aiohttp session
- Initializes Daily API helper
- Cleans up resources on shutdown
"""
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
class ConnectData(BaseModel):
"""Data provided by client to specify the bot pipeline.
Attributes:
bot_name (BotName): The name of the bot to connect to. Defaults to "openai".
"""
bot_name: BotName = "openai"
async def start(data: ConnectData):
"""Internal method to start a bot agent and return the room URL and token.
Args:
data (ConnectData): The data containing the bot name to use.
Returns:
tuple[str, str]: A tuple containing the room URL and token.
"""
room_url, token = await create_room_and_token()
launch_bot_func = modal.Function.from_name("pipecat-modal", "bot_runner")
function_id = launch_bot_func.spawn(room_url, token, data.bot_name)
bot_jobs[function_id] = (function_id, room_url)
return room_url, token
@router.get("/")
async def start_agent():
"""A user endpoint for launching a bot agent and redirecting to the created room URL.
This function retrieves the bot implementation from the environment,
starts the bot agent, and redirects the user to the room URL to
interact with the bot through a Daily Prebuilt Interface.
Returns:
RedirectResponse: A response that redirects to the room URL.
"""
bot_name = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
print(f"Starting bot: {bot_name}")
room_url, token = await start(ConnectData(bot_name=bot_name))
return RedirectResponse(room_url)
@router.post("/connect")
async def rtvi_connect(data: ConnectData) -> Dict[Any, Any]:
"""A user endpoint for launching a bot agent and retrieving the room/token credentials.
This function retrieves the bot implementation from the request, if provided,
starts the bot agent, and returns the room URL and token for the bot. This allows the
client to then connect to the bot using their own RTVI interface.
Args:
data (ConnectData): Optional. The data containing the bot name to use.
Returns:
Dict[Any, Any]: A dictionary containing the room URL and token.
"""
print(f"Starting bot: {data.bot_name}")
if data is None or not data.bot_name:
data.bot_name = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
room_url, token = await start(data)
return {"room_url": room_url, "token": token}
@router.get("/status/{fid}")
def get_status(fid: str):
"""Retrieve the status of a bot process by its function ID.
Args:
fid (str): The function ID of the bot process.
Returns:
JSONResponse: A JSON response containing the bot's status and result code.
Raises:
HTTPException: If the bot process with the given ID is not found.
"""
func = modal.FunctionCall.from_id(fid)
if not func:
raise HTTPException(status_code=404, detail=f"Bot with process id: {fid} not found")
try:
result = func.get(timeout=0)
return JSONResponse({"bot_id": fid, "status": "finished", "code": result})
except modal.exception.OutputExpiredError:
return JSONResponse({"bot_id": fid, "status": "finished", "code": 404})
except TimeoutError:
return JSONResponse({"bot_id": fid, "status": "running", "code": 202})
@app.function(image=web_image, min_containers=1)
@modal.concurrent(max_inputs=1)
@modal.asgi_app()
def fastapi_app():
"""Create and configure the FastAPI application.
This function initializes the FastAPI app with middleware, routes, and lifespan management.
It is decorated to be used as a Modal ASGI app.
"""
from fastapi.middleware.cors import CORSMiddleware
# Initialize FastAPI app
web_app = FastAPI(lifespan=lifespan)
web_app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Include the endpoints from endpoints.py
web_app.include_router(router)
return web_app

View File

@@ -0,0 +1,14 @@
DAILY_API_KEY=
# determines which bot file to default to: 'openai', 'gemini', or 'vllm'
BOT_IMPLEMENTATION=openai
# needed for the openai bot pipeline
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
# needed for the gemini live bot pipeline
GOOGLE_API_KEY=
# needed if you modified the API Key for your self-hosted LLM
VLLM_API_KEY=

View File

@@ -0,0 +1,2 @@
python-dotenv==1.0.1
modal==0.71.3

Binary file not shown.

After

Width:  |  Height:  |  Size: 759 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 884 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 876 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 881 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 874 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 882 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 885 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 888 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 890 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 898 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 836 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 905 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 849 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 864 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 858 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 875 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 881 KiB

View File

@@ -0,0 +1,198 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Gemini Bot Implementation.
This module implements a chatbot using Google's Gemini Multimodal Live model.
It includes:
- Real-time audio/video interaction through Daily
- Animated robot avatar
- Speech-to-speech model
The bot runs as part of a pipeline that processes audio/video frames and manages
the conversation flow using Gemini's streaming capabilities.
"""
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
OutputImageRawFrame,
SpriteFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
try:
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
except ValueError:
# Handle the case where logger is already initialized
pass
sprites = []
script_dir = os.path.dirname(__file__)
for i in range(1, 26):
# Build the full path to the image file
full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Create a smooth animation by adding reversed frames
flipped = sprites[::-1]
sprites.extend(flipped)
# Define static and animated states
quiet_frame = sprites[0] # Static frame for when bot is listening
talking_frame = SpriteFrame(images=sprites) # Animation sequence for when bot is talking
class TalkingAnimation(FrameProcessor):
"""Manages the bot's visual animation states.
Switches between static (listening) and animated (talking) states based on
the bot's current speaking status.
"""
def __init__(self):
super().__init__()
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and update animation state.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Switch to talking animation when bot starts speaking
if isinstance(frame, BotStartedSpeakingFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
# Return to static frame when bot stops speaking
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(quiet_frame)
self._is_talking = False
await self.push_frame(frame, direction)
async def run_bot(room_url: str, token: str):
"""Main bot execution function.
Sets up and runs the bot pipeline including:
- Daily video transport with specific audio parameters
- Gemini Live multimodal model integration
- Voice activity detection
- Animation processing
- RTVI event handling
"""
# Set up Daily transport with specific audio/video parameters for Gemini
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=576,
vad_enabled=True,
vad_audio_passthrough=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
),
)
# Initialize the Gemini Multimodal Live model
llm = GeminiMultimodalLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
voice_id="Puck", # Aoede, Charon, Fenrir, Kore, Puck
transcribe_user_audio=True,
)
messages = [
{
"role": "user",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself.",
},
]
# Set up conversation context and management
# The context_aggregator will automatically collect conversation context
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
ta = TalkingAnimation()
#
# RTVI events for Pipecat client UI
#
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
context_aggregator.user(),
llm,
ta,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
await task.queue_frame(quiet_frame)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Kick off the conversation
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -0,0 +1,226 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""OpenAI Bot Implementation.
This module implements a chatbot using OpenAI's GPT-4 model for natural language
processing. It includes:
- Real-time audio/video interaction through Daily
- Animated robot avatar
- Text-to-speech using ElevenLabs
- Support for both English and Spanish
The bot runs as part of a pipeline that processes audio/video frames and manages
the conversation flow.
"""
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
OutputImageRawFrame,
SpriteFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
try:
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
except ValueError:
# Handle the case where logger is already initialized
pass
sprites = []
script_dir = os.path.dirname(__file__)
# Load sequential animation frames
for i in range(1, 26):
# Build the full path to the image file
full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Create a smooth animation by adding reversed frames
flipped = sprites[::-1]
sprites.extend(flipped)
# Define static and animated states
quiet_frame = sprites[0] # Static frame for when bot is listening
talking_frame = SpriteFrame(images=sprites) # Animation sequence for when bot is talking
class TalkingAnimation(FrameProcessor):
"""Manages the bot's visual animation states.
Switches between static (listening) and animated (talking) states based on
the bot's current speaking status.
"""
def __init__(self):
super().__init__()
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and update animation state.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Switch to talking animation when bot starts speaking
if isinstance(frame, BotStartedSpeakingFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
# Return to static frame when bot stops speaking
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(quiet_frame)
self._is_talking = False
await self.push_frame(frame, direction)
async def run_bot(room_url: str, token: str):
"""Main bot execution function.
Sets up and runs the bot pipeline including:
- Daily video transport
- Speech-to-text and text-to-speech services
- Language model integration
- Animation processing
- RTVI event handling
"""
# Set up Daily transport with video/audio parameters
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=576,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
# Initialize text-to-speech service
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="SAz9YHcvj6GT2YYXdXww",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
# Initialize LLM service
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
#
# English
#
"content": "You are an incessant one-upper. Start by asking the user how their day is going.",
#
# Spanish
#
# "content": "Eres Chatbot, un amigable y útil robot. Tu objetivo es demostrar tus capacidades de una manera breve. Tus respuestas se convertiran a audio así que nunca no debes incluir caracteres especiales. Contesta a lo que el usuario pregunte de una manera creativa, útil y breve. Empieza por presentarte a ti mismo.",
},
]
# Set up conversation context and management
# The context_aggregator will automatically collect conversation context
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
ta = TalkingAnimation()
#
# RTVI events for Pipecat client UI
#
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
context_aggregator.user(),
llm,
tts,
ta,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
await task.queue_frame(quiet_frame)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Kick off the conversation
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -0,0 +1,239 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""OpenAI Bot Implementation.
This module implements a chatbot using OpenAI's GPT-4 model for natural language
processing. It includes:
- Real-time audio/video interaction through Daily
- Animated robot avatar
- Text-to-speech using ElevenLabs
- Support for both English and Spanish
The bot runs as part of a pipeline that processes audio/video frames and manages
the conversation flow.
"""
import os
import sys
from typing import List
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionMessageParam
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
OutputImageRawFrame,
SpriteFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
try:
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
except ValueError:
# Handle the case where logger is already initialized
pass
# REPLACE WITH YOUR MODAL URL ENDPOINT
modal_url = "https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run"
api_key = os.getenv("VLLM_API_KEY", "super-secret-key")
sprites = []
script_dir = os.path.dirname(__file__)
# Load sequential animation frames
for i in range(1, 26):
# Build the full path to the image file
full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Create a smooth animation by adding reversed frames
flipped = sprites[::-1]
sprites.extend(flipped)
# Define static and animated states
quiet_frame = sprites[0] # Static frame for when bot is listening
talking_frame = SpriteFrame(images=sprites) # Animation sequence for when bot is talking
class TalkingAnimation(FrameProcessor):
"""Manages the bot's visual animation states.
Switches between static (listening) and animated (talking) states based on
the bot's current speaking status.
"""
def __init__(self):
super().__init__()
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and update animation state.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Switch to talking animation when bot starts speaking
if isinstance(frame, BotStartedSpeakingFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
# Return to static frame when bot stops speaking
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(quiet_frame)
self._is_talking = False
await self.push_frame(frame, direction)
async def run_bot(room_url: str, token: str):
"""Main bot execution function.
Sets up and runs the bot pipeline including:
- Daily video transport
- Speech-to-text and text-to-speech services
- Language model integration
- Animation processing
- RTVI event handling
"""
# Set up Daily transport with video/audio parameters
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=576,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
# Initialize text-to-speech service
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="D38z5RcWu1voky8WS1ja",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
# Initialize LLM service
llm = OpenAILLMService(
# To use OpenAI
api_key=api_key,
# Or, to use a local vLLM (or similar) api server
model="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16",
base_url=f"{modal_url}/v1",
)
messages = [
{
"role": "system",
#
# English
#
"content": "You are a salesman for Modal, the cloud-native serverless Python computing platform.",
#
# Spanish
#
# "content": "Eres Chatbot, un amigable y útil robot. Tu objetivo es demostrar tus capacidades de una manera breve. Tus respuestas se convertiran a audio así que nunca no debes incluir caracteres especiales. Contesta a lo que el usuario pregunte de una manera creativa, útil y breve. Empieza por presentarte a ti mismo.",
},
]
# Set up conversation context and management
# The context_aggregator will automatically collect conversation context
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
ta = TalkingAnimation()
#
# RTVI events for Pipecat client UI
#
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
context_aggregator.user(),
llm,
tts,
ta,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
await task.queue_frame(quiet_frame)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Kick off the conversation
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -0,0 +1,84 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import importlib
import os
def get_bot_file(arg_bot: str | None) -> str:
bot_implementation = arg_bot or os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
if not bot_implementation:
bot_implementation = "openai"
if bot_implementation not in ["openai", "gemini", "vllm"]:
raise ValueError(
f"Invalid BOT_IMPLEMENTATION: {bot_implementation}. Must be 'openai' or 'gemini'"
)
return f"bot_{bot_implementation}"
def get_runner(bot_file: str):
"""Dynamically import the run_bot function based on the bot name.
Args:
bot_name (str): The name of the bot implementation (e.g., 'openai', 'gemini').
Returns:
function: The run_bot function from the specified bot module.
Raises:
ImportError: If the specified bot module or run_bot function is not found.
"""
try:
# Dynamically construct the module name
module_name = f"{bot_file}"
# Import the module
module = importlib.import_module(module_name)
# Get the run_bot function from the module
return getattr(module, "run_bot")
except (ImportError, AttributeError) as e:
raise ImportError(f"Failed to import run_bot from {module_name}: {e}")
def main():
"""Parse the args to launch the appropriate bot using the given room/token."""
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-t",
"--token",
type=str,
required=False,
help="Daily room token",
)
parser.add_argument(
"-b",
"--bot",
type=str,
required=False,
help="Bot runner to use (e.g., openai, gemini)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
token = args.token or os.getenv("DAILY_SAMPLE_ROOM_TOKEN")
bot_file = get_bot_file(args.bot)
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
run_bot = get_runner(bot_file)
asyncio.run(run_bot(url, token))
if __name__ == "__main__":
main()

View File

@@ -100,7 +100,28 @@ phone numbers with valid values for your use case.
### Dialin Request
The server will receive a request when a call is received from Daily.
The server will receive a request when a call is received from Daily.
The payload that the webhook received is as follows:
```json
{
// for dial-in from webhook
"To": "+14152251493",
"From": "+14158483432",
"callId": "string-contains-uuid",
"callDomain": "string-contains-uuid",
"sipHeaders": {
"X-My-Custom-Header": "value",
"x-caller": "+1234567890",
"x-called": "+1987654321",
},
}
```
The `To`, `From`, `callId`, `callDomain` fields are converted to
`snake_case` and mapped to `dialin_settings`. In addition, `sipHeader`
contains any custom SIP headers received by Daily on the SIP
interconnect address (`sip_uri`). These are headers sent from
Twilio or other external SIP platforms, for example, to send the
caller's phone number.
### Dialout Request
@@ -158,6 +179,7 @@ curl -X POST http://localhost:3000/api/dial \
"From": "+1987654321",
"callId": "call-uuid-123",
"callDomain": "domain-uuid-456",
"sipHeader": {},
"dialout_settings": [
{
"phoneNumber": "+1234567890",

View File

@@ -39,6 +39,11 @@ class RoomRequest(BaseModel):
None, description="A flag to perform voicemail or answeing-machine detection"
)
call_transfer: Optional[Dict[str, Any]] = Field(None, description="to initiate a call transfer")
sipHeaders: Optional[Dict[str, Any]] = Field(
None,
alias="sip_headers",
description="Custom SIP headers received from the external SIP provider",
)
class Config:
populate_by_name = True
@@ -57,6 +62,14 @@ class RoomRequest(BaseModel):
"callDomain": "string-contains-uuid"
These need to be remapped to dialin_settings
In addition, we may receive in the body that can be
sent to the bot as a custom field, sip_headers
"sipHeaders": {
"X-My-Custom-Header": "value",
"x-caller": "+14158483432",
"x-called": "+14152251493",
},
"dialout_settings": [
{"phoneNumber": "+14158483432", "callerId": "+14152251493"},
{"sipUri": "sip:username@sip.hostname"}
@@ -157,6 +170,7 @@ async def dial(request: RoomRequest, raw_request: Request):
"dialout_settings": request.dialout_settings,
"voicemail_detection": request.voicemail_detection,
"call_transfer": request.call_transfer,
"sip_headers": request.sipHeaders, # passing the SIP headers to the bot
},
}

View File

@@ -65,6 +65,7 @@ export default async function handler(req, res) {
From,
callId,
callDomain,
sipHeaders,
dialout_settings,
voicemail_detection,
call_transfer
@@ -117,6 +118,7 @@ export default async function handler(req, res) {
dialout_settings,
voicemail_detection,
call_transfer,
sip_headers: sipHeaders,
},
};

View File

@@ -4,6 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import aiohttp
@@ -21,44 +22,23 @@ from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
if LOCAL_RUN:
import asyncio
import webbrowser
try:
from local_runner import configure
except ImportError:
logger.error("Could not import local_runner module. Local development mode may not work.")
# Load environment variables
load_dotenv(override=True)
# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
async def main(room_url: str, token: str):
async def main(transport: DailyTransport):
"""Main pipeline setup and execution function.
Args:
room_url: The Daily room URL
token: The Daily room token
transport: The DailyTransport object for the bot
"""
logger.debug("Starting bot in room: {}", room_url)
transport = DailyTransport(
room_url,
token,
"bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
logger.debug("Starting bot")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
@@ -126,10 +106,25 @@ async def bot(args: DailySessionArguments):
body: The configuration object from the request body
session_id: The session ID for logging
"""
from pipecat.audio.filters.krisp_filter import KrispFilter
logger.info(f"Bot process initialized {args.room_url} {args.token}")
transport = DailyTransport(
args.room_url,
args.token,
"Pipecat Bot",
DailyParams(
audio_in_enabled=True,
audio_in_filter=None if LOCAL_RUN else KrispFilter(),
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
try:
await main(args.room_url, args.token)
await main(transport)
logger.info("Bot process completed")
except Exception as e:
logger.exception(f"Error in bot process: {str(e)}")
@@ -137,18 +132,27 @@ async def bot(args: DailySessionArguments):
# Local development functions
async def local_main():
async def local_daily():
"""Function for local development testing."""
from local_runner import configure
try:
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
logger.warning("_")
logger.warning("_")
logger.warning(f"Talk to your voice agent here: {room_url}")
logger.warning("_")
logger.warning("_")
webbrowser.open(room_url)
await main(room_url, token)
transport = DailyTransport(
room_url,
token,
"Pipecat Bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
await main(transport)
except Exception as e:
logger.exception(f"Error in local development mode: {e}")
@@ -156,6 +160,6 @@ async def local_main():
# Local development entry point
if LOCAL_RUN and __name__ == "__main__":
try:
asyncio.run(local_main())
asyncio.run(local_daily())
except Exception as e:
logger.exception(f"Failed to run in local mode: {e}")

View File

@@ -1,2 +1,4 @@
CARTESIA_API_KEY=
OPENAI_API_KEY=
OPENAI_API_KEY=
# Local dev only
DAILY_API_KEY=

View File

@@ -7,6 +7,7 @@
import os
import aiohttp
from fastapi import HTTPException
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams

View File

@@ -1,6 +1,8 @@
agent_name = "my-first-agent"
image = "your-username/my-first-agent:0.1"
image_credentials = "your-dockerhub-creds"
secret_set = "my-first-agent-secrets"
enable_krisp = true
[scaling]
min_instances = 0

View File

@@ -16,23 +16,25 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
@@ -47,12 +49,12 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -16,24 +16,25 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
@@ -49,12 +50,12 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -15,23 +15,25 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
@@ -45,12 +47,12 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -77,37 +77,36 @@ async def configure_livekit():
async def main():
async with aiohttp.ClientSession() as session:
(url, token, room_name) = await configure_livekit()
(url, token, room_name) = await configure_livekit()
transport = LiveKitTransport(
url=url,
token=token,
room_name=room_name,
params=LiveKitParams(audio_out_enabled=True),
)
transport = LiveKitTransport(
url=url,
token=token,
room_name=room_name,
params=LiveKitParams(audio_out_enabled=True),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
runner = PipelineRunner()
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()]))
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant_id):
await asyncio.sleep(1)
await task.queue_frame(
TextFrame(
"Hello there! How are you doing today? Would you like to talk about the weather?"
)
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant_id):
await asyncio.sleep(1)
await task.queue_frame(
TextFrame(
"Hello there! How are you doing today? Would you like to talk about the weather?"
)
)
await runner.run(task)
await runner.run(task)
if __name__ == "__main__":

View File

@@ -15,23 +15,25 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.riva.tts import FastPitchTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
@@ -42,12 +44,12 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -16,23 +16,25 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
@@ -55,12 +57,12 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
async def on_client_connected(transport, client):
await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -16,25 +16,31 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal.image import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
"webrtc": lambda: TransportParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
}
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
@@ -54,18 +60,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -15,25 +15,31 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.google.image import GoogleImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
"webrtc": lambda: TransportParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
}
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
imagegen = GoogleImageGenService(
api_key=os.getenv("GOOGLE_API_KEY"),
@@ -54,18 +60,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -5,10 +5,17 @@
#
import argparse
import asyncio
import os
from contextlib import asynccontextmanager
from typing import Dict
import uvicorn
from dotenv import load_dotenv
from fastapi import BackgroundTasks, FastAPI
from fastapi.responses import RedirectResponse
from loguru import logger
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
@@ -20,14 +27,29 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.network.webrtc_connection import IceServer, SmallWebRTCConnection
load_dotenv(override=True)
app = FastAPI()
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
# Store connections by pc_id
pcs_map: Dict[str, SmallWebRTCConnection] = {}
ice_servers = [
IceServer(
urls="stun:stun.l.google.com:19302",
)
]
# Mount the frontend at /
app.mount("/client", SmallWebRTCPrebuiltUI)
async def run_example(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
@@ -88,10 +110,6 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
@@ -99,7 +117,58 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
await runner.run(task)
if __name__ == "__main__":
from run import main
@app.get("/", include_in_schema=False)
async def root_redirect():
return RedirectResponse(url="/client/")
main()
@app.post("/api/offer")
async def offer(request: dict, background_tasks: BackgroundTasks):
pc_id = request.get("pc_id")
if pc_id and pc_id in pcs_map:
pipecat_connection = pcs_map[pc_id]
logger.info(f"Reusing existing connection for pc_id: {pc_id}")
await pipecat_connection.renegotiate(
sdp=request["sdp"],
type=request["type"],
restart_pc=request.get("restart_pc", False),
)
else:
pipecat_connection = SmallWebRTCConnection(ice_servers)
await pipecat_connection.initialize(sdp=request["sdp"], type=request["type"])
@pipecat_connection.event_handler("closed")
async def handle_disconnected(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Discarding peer connection for pc_id: {webrtc_connection.pc_id}")
pcs_map.pop(webrtc_connection.pc_id, None)
# Run example function with SmallWebRTC transport arguments.
background_tasks.add_task(run_example, pipecat_connection)
answer = pipecat_connection.get_answer()
# Updating the peer connection inside the map
pcs_map[answer["pc_id"]] = pipecat_connection
return answer
@asynccontextmanager
async def lifespan(app: FastAPI):
yield # Run app
coros = [pc.disconnect() for pc in pcs_map.values()]
await asyncio.gather(*coros)
pcs_map.clear()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument(
"--host", default="localhost", help="Host for HTTP server (default: localhost)"
)
parser.add_argument(
"--port", type=int, default=7860, help="Port for HTTP server (default: 7860)"
)
args = parser.parse_args()
uvicorn.run(app, host=args.host, port=args.port)

View File

@@ -9,11 +9,11 @@ import os
import sys
import aiohttp
from daily_runner import configure
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.examples.daily_runner import configure
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -37,9 +37,9 @@ async def main():
token,
"Respond bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)

View File

@@ -10,7 +10,6 @@ import json
import os
import sys
import aiohttp
from deepgram import LiveOptions
from dotenv import load_dotenv
from livekit import api
@@ -104,101 +103,100 @@ async def configure_livekit():
async def main():
async with aiohttp.ClientSession() as session:
(url, token, room_name) = await configure_livekit()
(url, token, room_name) = await configure_livekit()
transport = LiveKitTransport(
url=url,
token=token,
room_name=room_name,
params=LiveKitParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
transport = LiveKitTransport(
url=url,
token=token,
room_name=room_name,
params=LiveKitParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(
vad_events=True,
),
)
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(
vad_events=True,
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be converted to audio so don't include special characters in your answers. "
"Respond to what the user said in a creative and helpful way.",
},
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be converted to audio so don't include special characters in your answers. "
"Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
runner = PipelineRunner()
runner = PipelineRunner()
task = PipelineTask(
Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
],
),
params=PipelineParams(
allow_interruptions=True, enable_metrics=True, enable_usage_metrics=True
),
)
task = PipelineTask(
Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
],
),
params=PipelineParams(
allow_interruptions=True, enable_metrics=True, enable_usage_metrics=True
),
)
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant_id):
await asyncio.sleep(1)
await task.queue_frame(
TextFrame(
"Hello there! How are you doing today? Would you like to talk about the weather?"
)
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant_id):
await asyncio.sleep(1)
await task.queue_frame(
TextFrame(
"Hello there! How are you doing today? Would you like to talk about the weather?"
)
)
# Register an event handler to receive data from the participant via text chat
# in the LiveKit room. This will be used to as transcription frames and
# interrupt the bot and pass it to llm for processing and
# then pass back to the participant as audio output.
@transport.event_handler("on_data_received")
async def on_data_received(transport, data, participant_id):
logger.info(f"Received data from participant {participant_id}: {data}")
# convert data from bytes to string
json_data = json.loads(data)
# Register an event handler to receive data from the participant via text chat
# in the LiveKit room. This will be used to as transcription frames and
# interrupt the bot and pass it to llm for processing and
# then pass back to the participant as audio output.
@transport.event_handler("on_data_received")
async def on_data_received(transport, data, participant_id):
logger.info(f"Received data from participant {participant_id}: {data}")
# convert data from bytes to string
json_data = json.loads(data)
await task.queue_frames(
[
BotInterruptionFrame(),
UserStartedSpeakingFrame(),
TranscriptionFrame(
user_id=participant_id,
timestamp=json_data["timestamp"],
text=json_data["message"],
),
UserStoppedSpeakingFrame(),
],
)
await task.queue_frames(
[
BotInterruptionFrame(),
UserStartedSpeakingFrame(),
TranscriptionFrame(
user_id=participant_id,
timestamp=json_data["timestamp"],
text=json_data["message"],
),
UserStoppedSpeakingFrame(),
],
)
await runner.run(task)
await runner.run(task)
if __name__ == "__main__":

View File

@@ -28,9 +28,8 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
@@ -64,7 +63,26 @@ class MonthPrepender(FrameProcessor):
await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
"webrtc": lambda: TransportParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
}
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
"""Run the Calendar Month Narration bot using WebRTC transport.
Args:
@@ -73,17 +91,6 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
"""
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
)
# Create an HTTP session for API calls
async with aiohttp.ClientSession() as session:
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
@@ -159,18 +166,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
# Run the pipeline
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -26,9 +26,9 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
@@ -53,17 +53,30 @@ class MetricsLogger(FrameProcessor):
await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
@@ -117,17 +130,13 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -26,9 +26,8 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
@@ -68,20 +67,31 @@ class ImageSyncAggregator(FrameProcessor):
await self.push_frame(frame)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(),
),
}
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(),
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
@@ -139,17 +149,13 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -15,36 +15,46 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.aws.tts import PollyTTSService
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = PollyTTSService(
api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region=os.getenv("AWS_REGION"),
voice_id="Amy",
params=PollyTTSService.InputParams(engine="neural", language="en-GB", rate="1.05"),
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
@@ -62,7 +72,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
@@ -91,18 +101,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

View File

@@ -18,25 +18,37 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
@@ -88,18 +100,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from run import main
from pipecat.examples.run import main
main()
main(run_example, transport_params=transport_params)

Some files were not shown because too many files have changed in this diff Show More