738 Commits

Author SHA1 Message Date
James Hush
230d92850a example: realtime with transcripts 2025-02-26 16:29:07 +08:00
Aleix Conchillo Flaqué
96c6aeaada Merge pull request #1295 from pipecat-ai/aleix/pipelinetask-keyword-arguments
PipelineTask: force constructor keyword arguments
2025-02-25 19:00:58 -08:00
Aleix Conchillo Flaqué
6722aae598 PipelineTask: force constructor keyword arguments 2025-02-25 18:58:47 -08:00
Aleix Conchillo Flaqué
66564392a6 Merge pull request #1293 from pipecat-ai/aleix/log-pipecat-version
log pipecat version on application startup
2025-02-25 18:57:52 -08:00
Aleix Conchillo Flaqué
f258f5ab66 Merge pull request #1292 from pipecat-ai/aleix/audiocontext-terminate-nicely
AudioContextWordTTSService: wait for all requested audio
2025-02-25 18:56:41 -08:00
Aleix Conchillo Flaqué
f8f0578c3d log pipecat version on application startup 2025-02-25 18:55:45 -08:00
Aleix Conchillo Flaqué
aa60a413f3 Merge pull request #1294 from pipecat-ai/aleix/improve-test-requirements
improve test-requirements.txt
2025-02-25 18:55:18 -08:00
Aleix Conchillo Flaqué
3e66f2378d improve test-requirements.txt 2025-02-25 17:34:33 -08:00
Aleix Conchillo Flaqué
9a50f33e36 AudioContextWordTTSService: wait for all requested audio 2025-02-25 15:35:47 -08:00
Aleix Conchillo Flaqué
4bd5e9c0a7 Merge pull request #1285 from pipecat-ai/aleix/handle-stop-task-gracefully
handle stop task gracefully
2025-02-25 11:25:38 -08:00
Mark Backman
12092c8715 Merge pull request #1288 from pipecat-ai/mb/clean-up-tts-text-input
TTSService: Remove newlines before sending text to TTS service to gen…
2025-02-25 14:00:43 -05:00
Mark Backman
92cc6d39f2 TTSService: Remove newlines before sending text to TTS service to generate 2025-02-25 13:37:25 -05:00
Aleix Conchillo Flaqué
34a50033cb tk: use TkTransportParams in examples 2025-02-25 10:24:24 -08:00
Aleix Conchillo Flaqué
e60b65228b allow multiple StartFrames 2025-02-25 10:24:04 -08:00
Mark Backman
e74864335b Merge pull request #1287 from pipecat-ai/mb/30-observer-pipeline-task
Example 30: Move observers to PipelineTask
2025-02-25 12:11:23 -05:00
Mark Backman
27a088a457 Merge pull request #1286 from pipecat-ai/mb/update-grok-2
Set grok-2 as default model for GrokLLMService
2025-02-25 12:11:09 -05:00
Mark Backman
cfe72143b8 Example 30: Move observers to PipelineTask 2025-02-25 10:54:25 -05:00
Mark Backman
36a729cbfe Set grok-2 as default model for GrokLLMService 2025-02-25 10:00:45 -05:00
Aleix Conchillo Flaqué
d2f006682c introduce new BaseTaskManager 2025-02-24 23:38:51 -08:00
Aleix Conchillo Flaqué
fb7fe540f5 tts: don't connect to websocket if already connected 2025-02-24 23:38:51 -08:00
Aleix Conchillo Flaqué
1ec68bd071 make sure we don't create tasks if already created 2025-02-24 23:38:51 -08:00
Aleix Conchillo Flaqué
4536d03e82 FrameProcessor: cancel input/push tasks on CancelFrame 2025-02-24 23:38:51 -08:00
Aleix Conchillo Flaqué
699704732c asyncio: re-raise CancelledError in wait_for_task() 2025-02-24 23:38:51 -08:00
Aleix Conchillo Flaqué
376d969a77 task: handle StopFrame and StopTaskFrame gracefully 2025-02-24 23:38:51 -08:00
Aleix Conchillo Flaqué
68789dfcf0 frames: add new StopFrame 2025-02-24 21:34:23 -08:00
Aleix Conchillo Flaqué
fe9fc61c4e Merge pull request #1282 from pipecat-ai/aleix/pipelinetask-observers-constructor
PipelineTask: pass observers in constructor parameter
2025-02-24 21:29:46 -08:00
Aleix Conchillo Flaqué
6028f0f23a PipelineTask: pass observers in constructor parameter 2025-02-24 21:29:17 -08:00
Aleix Conchillo Flaqué
e9a0959e28 Merge pull request #1283 from pipecat-ai/aleix/check-dangling-tasks
PipelineTask: add check_dangling_tasks parameter
2025-02-24 21:26:32 -08:00
Dominic Stewart
f66be2cfa7 Dom/gemini system prompt switching (#1260)
* Updated example to use Gemini

* Fixed typo

* Based on feedback, made the gemini file something that can be called separately

* Updated the readme

* Updated the readme

* Changed example to use gemini 2.0 flash lite

* This works

* Improvement

* I think this works

* Updated the code to use the correct prompt broken down into smaller pieces

* Added a few more things to detect in the prompt

* Fixed import ordering

* Updated prompt for non gemini bot to look for more voicemail examples, plus added logic to detect if we're doing dialin or not to avoid a non-fatal dialin related error

* moved terminate call to handlers class

* Simplified logic for dialin

* Forgot to use the same logic for the openai bot

* Starting to add logic for native audio input for flash lite

* Fixed logic

* Fixed some code based on suggestions
2025-02-24 22:29:55 -06:00
Aleix Conchillo Flaqué
f818bed58f Merge pull request #1281 from pipecat-ai/aleix/google-context-aggregator-upgrade-context
google: upgrade OpenAILLMContext to GoogleLLMContext
2025-02-24 17:37:26 -08:00
Aleix Conchillo Flaqué
07b9be5308 PipelineTask: add check_dangling_tasks parameter 2025-02-24 17:33:10 -08:00
Aleix Conchillo Flaqué
40c2452d6e google: upgrade OpenAILLMContext to GoogleLLMContext 2025-02-24 15:35:18 -08:00
Aleix Conchillo Flaqué
30cdd1b71a Merge pull request #1280 from pipecat-ai/aleix/add-completion-timeout
services(llm): add on_completion_timeout event
2025-02-24 15:07:20 -08:00
Aleix Conchillo Flaqué
2110b79507 services(llm): add on_completion_timeout event 2025-02-24 14:55:36 -08:00
Aleix Conchillo Flaqué
fc544fa61c Merge pull request #1272 from pipecat-ai/aleix/tts-websocket-interruptions
services: fix some TTS websocket service interruption handling
2025-02-24 14:54:41 -08:00
Mark Backman
976fe95304 Merge pull request #1279 from pipecat-ai/mb/remove-open-optional-dep
Remove `openai` optional dependency from services as it's now required
2025-02-24 17:42:53 -05:00
Aleix Conchillo Flaqué
408270b647 lmnt: don't send "eof" before closing the socket 2025-02-24 14:37:37 -08:00
Mark Backman
1dfb75bc9d Merge pull request #1278 from pipecat-ai/mb/claude-3-7
Update AnthropicLLMService to use claude-3-7-sonnet-20250219 by default
2025-02-24 15:41:28 -05:00
Mark Backman
cefc2a1088 Fix test-requirements.txt ordering 2025-02-24 15:06:13 -05:00
Mark Backman
3b9b9200ea Remove openai optional dependency from services as it's now required 2025-02-24 15:05:42 -05:00
Mark Backman
d6f29a0f4b Update AnthropicLLMService to use claude-3-7-sonnet-20250219 by default 2025-02-24 14:32:00 -05:00
Aleix Conchillo Flaqué
5b762d11ef Merge pull request #1228 from CarlKho-Minerva/main
Missing Cartesia~=1.3.1 → `test-requirements`
2025-02-24 08:47:41 -08:00
Aleix Conchillo Flaqué
2f3e2da6b9 Merge pull request #1259 from pipecat-ai/openai-not-optional
Since the `openai` package is used by pretty much everything in pipec…
2025-02-24 08:45:45 -08:00
allenmylath
45058d4a94 Update audio_buffer_processor.py (#1266) 2025-02-24 08:41:19 -08:00
Aleix Conchillo Flaqué
5b637bd826 services: fix some TTS websocket service interruption handling 2025-02-24 08:37:22 -08:00
Mark Backman
2d4fd7e903 Merge pull request #1274 from pipecat-ai/mb/add-ellipsis-test
Add one additional ellipsis test to test_utils_string
2025-02-23 11:26:20 -05:00
Mark Backman
b5662520aa Add one additional ellipsis test to test_utils_string 2025-02-23 11:04:24 -05:00
Aleix Conchillo Flaqué
af45c170b5 Merge pull request #1264 from pipecat-ai/aleix/add-log-observers
add initial log observers
2025-02-21 15:20:45 -08:00
Aleix Conchillo Flaqué
65f548b2ec examples(30-observer): update to use LLMLogObserver 2025-02-21 15:15:16 -08:00
Aleix Conchillo Flaqué
b29ab8c608 observers: add LLMLogObserver and TranscriptionLogObserver 2025-02-21 15:15:16 -08:00
Aleix Conchillo Flaqué
d6dc37f0b6 Merge pull request #1269 from pipecat-ai/aleix/endofsentence-support-ellipses
utils: add support for ellipses in match_endofsentence()
2025-02-21 15:08:22 -08:00
Aleix Conchillo Flaqué
12bce2e8c0 utils: add support for ellipses in match_endofsentence() 2025-02-21 15:05:50 -08:00
Aleix Conchillo Flaqué
4acf7296e0 Merge pull request #1261 from pipecat-ai/aleix/emualted-frames-being-triggered-prematurely
LLMUserContextAggregator: don't reset timer with interim transcription
2025-02-21 10:15:28 -08:00
Aleix Conchillo Flaqué
98706d429c LLMUserContextAggregator: make sure incoming transcription has text 2025-02-21 10:12:54 -08:00
Aleix Conchillo Flaqué
41720b1a13 LLMUserContextAggregator: don't reset timer with interim transcription
It turns out that in some cases we only get interim transcriptions (e.g. someone
is speaking very very softly or someone is talking in the background). In those
cases we don't want to interrupt the bot because there's really nothing to
interrupt the bot for.

We originally thought we should interrupt the bot right at the time we got an
interim frame, but this is causing too many false positives. It's actually
better to simply wait for a real transcription before interrupting (in case VAD
didn't interrupt).
2025-02-21 09:05:56 -08:00
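The behaviour described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not pipecat's actual aggregator: `Transcription` and `UserTurnTracker` are hypothetical names, and the real `LLMUserContextAggregator` works with frames and timers that are not modelled here. The sketch assumes that only a final transcription with non-empty text should count as a reason to interrupt the bot.

```python
from dataclasses import dataclass


@dataclass
class Transcription:
    """Hypothetical stand-in for a transcription event."""

    text: str
    interim: bool = False


class UserTurnTracker:
    """Hypothetical tracker that decides when a transcription may interrupt the bot."""

    def __init__(self):
        self.aggregation = ""
        self.interruptions = 0

    def on_transcription(self, transcription):
        # Interim results (soft speech, background voices) never interrupt.
        if transcription.interim:
            return False
        # A final transcription without text gives nothing to interrupt for.
        text = transcription.text.strip()
        if not text:
            return False
        # A real transcription arrived: aggregate it and allow an interruption.
        self.aggregation = f"{self.aggregation} {text}".strip()
        self.interruptions += 1
        return True
```

With this shape, a stream that only ever produces interim results leaves `interruptions` at zero, which matches the false-positive case the commit describes.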
Aleix Conchillo Flaqué
3ef4245166 Merge pull request #1265 from pipecat-ai/aleix/transport-remove-audio-out-is-live 2025-02-21 06:51:09 -08:00
Filipi da Silva Fuchter
3bb0797922 Merge pull request #1257 from pipecat-ai/fastapi_disconnect_issue
Fixed an issue where FastAPI was not triggering on_client_disconnected.
2025-02-21 09:15:15 -03:00
Filipi Fuchter
7c7b4c52af Fixed an issue where EndTaskFrame was not triggering on_client_disconnected or closing the WebSocket in FastAPI. 2025-02-21 09:11:58 -03:00
Aleix Conchillo Flaqué
01f083b7fc transports: remove TransportParams.audio_out_is_live 2025-02-20 23:33:06 -08:00
Aleix Conchillo Flaqué
91fcaebe25 Merge pull request #1263 from Vaibhav159/vl_fix_deepgram_sample_rate_mismatch
fixing deepgram mismatch
2025-02-20 22:39:06 -08:00
Vaibhav159
9c5fe5c85e fixing deepgram mismatch 2025-02-21 09:32:40 +05:30
Aleix Conchillo Flaqué
7e5e167a4b Merge pull request #1250 from pipecat-ai/aleix/context-aggregation-simulatenous-text-tools
AssistantContextAggregator: append aggregation and tools in the same turn
2025-02-20 17:32:57 -08:00
Aleix Conchillo Flaqué
d04c4b36f3 AssistantContextAggregator: append aggregation and tools in the same turn 2025-02-20 17:29:43 -08:00
Aleix Conchillo Flaqué
a811e53626 Merge pull request #1253 from pipecat-ai/aleix/http-tts-services-stopped-frame
HTTP TTS services stopped frame
2025-02-20 17:28:05 -08:00
Paul Kompfner
df57202a05 Since the openai package is used by pretty much everything in pipecat (due to OpenAILLMContext being the standard context representation), let's make it a non-optional dependency.
This change solves an issue faced by users who aren't intending to use OpenAI getting scary error messages saying that they need the `openai` optional dependency "in order to use OpenAI", along with an instruction to set the OPENAI_API_KEY environment variable.

Note that with this change we could theoretically remove from pyproject.toml a number of defined optional dependencies that list only the `openai` package as a dependency (like `deepseek`, for example), but I didn't want to "break the API" in terms of how users install/consume pipecat and its set of built-in services.

Finally, I removed the `python-deepcompare` dependency from the `openai` optional dependency, since it appears to me like it was added by mistake (my guess is it was used for debugging during development and then never removed).
2025-02-20 15:21:35 -05:00
Aleix Conchillo Flaqué
69e6f3fdb7 rime: pass aiohttp session to constructor 2025-02-20 07:36:24 -08:00
Aleix Conchillo Flaqué
6809254963 tts: fix metrics and TTSStoppedFrame frame in HTTP services
Fixes #1247
2025-02-20 07:36:21 -08:00
Aleix Conchillo Flaqué
81093d3bed Merge pull request #1252 from pipecat-ai/aleix/remove-vad-extra-logging
BaseInputTransport: remove VAD logging
2025-02-20 07:32:20 -08:00
Aleix Conchillo Flaqué
d9a67164f6 Merge pull request #1251 from pipecat-ai/aleix/fish-tts-service-push-stop-frame
FishAudioTTSService should push TTSStoppedFrame
2025-02-20 07:32:05 -08:00
Aleix Conchillo Flaqué
98259af54e update CHANGELOG 2025-02-19 22:05:48 -08:00
Dominic Stewart
039d144c79 examples(phone-bot): updated example to use Gemini (#1233) 2025-02-19 22:03:37 -08:00
Aleix Conchillo Flaqué
d0f67fc189 BaseInputTransport: remove VAD logging
These logs are very verbose. They were added to track down an issue that
turned out to be caused by low CPU/memory resources, but they did not help
determine that.
2025-02-19 21:55:11 -08:00
Aleix Conchillo Flaqué
6e3f96aa83 fish: automatically send TTSStoppedFrame after timeout 2025-02-19 21:41:18 -08:00
Aleix Conchillo Flaqué
293677588d tts: make push_stop_frames default to 2.0s 2025-02-19 21:39:00 -08:00
Filipi da Silva Fuchter
77e777b1ce Merge pull request #1249 from pipecat-ai/invoking_call_start_function
Fixed an issue where `start_callback` was not invoked for some LLM services
2025-02-19 18:09:00 -03:00
Filipi Fuchter
7e7926059c Fixed an issue where start_callback was not invoked for some LLM services. 2025-02-19 18:04:20 -03:00
Aleix Conchillo Flaqué
c948754eff Merge pull request #1248 from pipecat-ai/aleix/daily-transport-room-url
daily: add room_url property
2025-02-19 09:46:46 -08:00
Aleix Conchillo Flaqué
83f1a8830d daily: add room_url property 2025-02-19 09:29:53 -08:00
James Hush
80f8e05fcf docs: fix transcripts in translation chatbot example (#1199) 2025-02-19 16:07:22 +08:00
Aleix Conchillo Flaqué
afd1a1e80b Merge pull request #1245 from pipecat-ai/aleix/stt-mute-filter-trace-logging 2025-02-18 21:21:55 -08:00
Aleix Conchillo Flaqué
84ac88cad7 STTMuteFilter: change suppressed logging to trace 2025-02-18 18:03:37 -08:00
Aleix Conchillo Flaqué
211163e5c7 Merge pull request #1241 from pipecat-ai/aleix/deepgram-nova-3
deepgram: use the new nova-3 model as default
2025-02-18 17:53:04 -08:00
Aleix Conchillo Flaqué
1b0bcebef6 deepgram: use the new nova-3 model as default 2025-02-18 17:51:54 -08:00
Aleix Conchillo Flaqué
89736b03c4 Merge pull request #1243 from pipecat-ai/aleix/add-deepgram-addons
deepgram: add ability to provide custom addons
2025-02-18 17:47:48 -08:00
Aleix Conchillo Flaqué
4edda718ed deepgram: add ability to provide custom addons 2025-02-18 17:45:41 -08:00
Aleix Conchillo Flaqué
22a62edc9e Merge pull request #1242 from pipecat-ai/aleix/utils-network-exponential
network: added exponential_backoff_time() function
2025-02-18 17:44:21 -08:00
Aleix Conchillo Flaqué
50b6cc8135 network: added exponential_backoff_time() function 2025-02-18 17:42:43 -08:00
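For readers unfamiliar with the technique named in this commit, exponential backoff can be sketched as below. The commit does not show the signature of `exponential_backoff_time()`, so this uses a separate, hypothetical `backoff_delay()` helper; its parameter names and defaults are assumptions chosen for illustration only.

```python
def backoff_delay(attempt, min_wait=1.0, max_wait=30.0, multiplier=1.0):
    """Return the wait time in seconds before retry number `attempt` (1-based).

    Hypothetical helper: the delay doubles with every attempt and is clamped
    to the range [min_wait, max_wait].
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    delay = multiplier * (2 ** (attempt - 1))
    return max(min_wait, min(delay, max_wait))
```

A reconnect loop would typically sleep for `backoff_delay(attempt)` seconds between attempts, so that repeated failures put progressively less load on the remote service.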
Aleix Conchillo Flaqué
45cf36925a Merge pull request #1240 from pipecat-ai/aleix/handle-deepgram-on-error
deepgram: handle error event and reconnect
2025-02-18 17:41:29 -08:00
Filipi da Silva Fuchter
83a71e1fec Merge pull request #1112 from pipecat-ai/bot-ready-signalling-rn
React Native client for the bot ready example.
2025-02-18 15:17:38 -03:00
Filipi Fuchter
e809c8680e Upgrading to use the latest node stable version 2025-02-18 15:12:44 -03:00
Aleix Conchillo Flaqué
c926063d74 deepgram: handle error event and reconnect 2025-02-18 09:52:18 -08:00
Aleix Conchillo Flaqué
0334550356 Merge pull request #1238 from pipecat-ai/aleix/stt-mute-filter-ignore-input-audio-frames
STTMuteFilter: ignore audio frames so no transcriptions are generated
2025-02-18 09:48:13 -08:00
Aleix Conchillo Flaqué
90b9dce710 STTMuteFilter: ignore audio frames so no transcriptions are generated 2025-02-17 19:59:05 -08:00
Carl Kho
a5cdd5f1b8 Add Cartesia API key to dot-env.template 2025-02-14 21:29:37 -08:00
Carl Kho
5f937b8479 Update test requirements to include Cartesia version 1.3.1 2025-02-14 21:14:32 -08:00
Aleix Conchillo Flaqué
b45f7fee6f Merge pull request #1225 from pipecat-ai/aleix/prepare-0.0.57
update CHANGELOG for 0.0.57
2025-02-14 18:50:08 -08:00
Aleix Conchillo Flaqué
01c06c5cac update CHANGELOG for 0.0.57 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
329e89c1d9 TTSService: push BotStoppedSpeakingFrame 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
883410d8ac FrameProcessor: no need to create an input event every time 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
1f5b790dd0 TTSService: reset processing text during interruptions 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
a107b1cb4b examples(06a): use CartesiaTTSService 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
63950912f0 LLMAssistantContextAggregator: add missing variable initialization 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
2ce9402571 LLMAssistantResponseAggregator: initialize messages 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
f6912c0f9a utils: don't consider colon an end of sentence 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
633a4d4c58 FalImageGenService: load image async to not block the event loop 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
67da745bb3 tts: make frame pausing/resuming optional 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
5126d4de92 tts: handle incoming frames pausing/resuming from base TTSService class 2025-02-14 18:47:33 -08:00
Aleix Conchillo Flaqué
426d7ac213 transports: some local audio and tk updates 2025-02-14 18:47:33 -08:00
Mark Backman
9115692c72 Merge pull request #1227 from pipecat-ai/mb/fix-25-error
fix: ensure proper Google message format conversion in transcription …
2025-02-14 21:01:05 -05:00
Mark Backman
c26fe3f277 fix: ensure proper Google message format conversion in transcription filter 2025-02-14 20:28:26 -05:00
Mark Backman
47b059d387 Merge pull request #1226 from pipecat-ai/mb/add-transcript-processor-tests
tests: add tests for TranscriptProcessor
2025-02-14 19:50:38 -05:00
Mark Backman
a49d81e519 tests: add tests for TranscriptProcessor 2025-02-14 17:10:40 -05:00
Aleix Conchillo Flaqué
b3a575c7c7 Merge pull request #1212 from Vaibhav159/vl_fix_incorrect_has_regular_messages_check
fixing google llm service error
2025-02-14 13:16:37 -08:00
Aleix Conchillo Flaqué
790d0c1256 Merge pull request #1224 from M1ngXU/patch-1
Update openai.py
2025-02-14 13:13:00 -08:00
Aleix Conchillo Flaqué
ee7e0dc3f7 Merge pull request #1223 from pipecat-ai/aleix/audio-context-tts-service
audio context tts service and cartesia fixes
2025-02-14 12:12:42 -08:00
Aleix Conchillo Flaqué
f53ee79ddb RimeTTSService: use AudioContextWordTTSService 2025-02-14 11:55:54 -08:00
Aleix Conchillo Flaqué
aeadb40c3f CartesiaTTSService: use AudioContextWordTTSService
By supporting multiple audio requests we fix an issue that was causing audio
overlapping.
2025-02-14 11:55:54 -08:00
Aleix Conchillo Flaqué
cacb07f4c2 introduce AudioContextWordTTSService 2025-02-14 11:55:54 -08:00
M1ngXU
0b91d821fb Update openai.py
d
2025-02-14 20:27:08 +01:00
Aleix Conchillo Flaqué
af66a43056 Merge pull request #1222 from pipecat-ai/aleix/websocket-service-handle-clean-disconnection
WebsocketService: handle clean server disconnection
2025-02-14 10:33:54 -08:00
Aleix Conchillo Flaqué
e006dcf172 WebsocketService: handle clean server disconnection
The websocket async iterator doesn't raise an exception when the server
disconnects cleanly. We should handle that and raise an exception so we can
reconnect.
2025-02-14 10:11:56 -08:00
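The idea in the commit message above can be illustrated with a small, self-contained sketch. It does not use pipecat's `WebsocketService` or a real websocket library: `FakeWebsocket`, `CleanDisconnect`, `receive_messages()` and `receive_with_reconnect()` are all hypothetical names, and the fake socket only mimics the relevant behaviour, namely that its async iterator ends silently when the server closes the connection cleanly.

```python
import asyncio


class CleanDisconnect(Exception):
    """Hypothetical error raised when the server closes the connection cleanly."""


class FakeWebsocket:
    """Stand-in for a websocket: yields messages, then ends without raising."""

    def __init__(self, messages):
        self._messages = list(messages)

    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self._messages:
            raise StopAsyncIteration
        return self._messages.pop(0)


async def receive_messages(websocket, handle):
    async for message in websocket:
        handle(message)
    # The loop ended without an exception, so the server disconnected cleanly.
    # Raise explicitly so the caller treats it like any other disconnection.
    raise CleanDisconnect("server closed the connection")


async def receive_with_reconnect(connect, handle, max_attempts=3):
    """Keep receiving, reconnecting after every clean disconnect."""
    attempts = 0
    while attempts < max_attempts:
        websocket = connect()
        try:
            await receive_messages(websocket, handle)
        except CleanDisconnect:
            attempts += 1
    return attempts
```

Without the explicit `raise`, `receive_messages()` would simply return and the caller could not tell a finished session from a dropped one.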
Filipi da Silva Fuchter
8588f8b0d8 Merge pull request #1220 from pipecat-ai/instant_voice_demo_example
Instant voice example.
2025-02-14 14:24:13 -03:00
Filipi Fuchter
bff54547b0 Instant voice example. 2025-02-14 14:19:17 -03:00
Mark Backman
b2754bf208 Merge pull request #1219 from pipecat-ai/mb/markdown-text-filter-tests
Add MarkdownTextFilter tests
2025-02-13 21:10:52 -05:00
Mark Backman
9a4942b0d0 Merge pull request #1218 from pipecat-ai/mb/user-idle-tests
Add UserIdleProcessor tests
2025-02-13 18:53:22 -05:00
Mark Backman
ed6201910b Add MarkdownTextFilter tests 2025-02-13 18:51:46 -05:00
Mark Backman
ac5ebc587e Add tests for UserIdleProcessor 2025-02-13 18:47:29 -05:00
Aleix Conchillo Flaqué
dff4c54e57 Merge pull request #1209 from pipecat-ai/aleix/reimplement-llm-response-aggregators
reimplement LLM response aggregators
2025-02-13 15:30:40 -08:00
Aleix Conchillo Flaqué
c744409651 SegmentedSTTService: fix process_audio_frame() arguments 2025-02-13 15:25:22 -08:00
Aleix Conchillo Flaqué
7578fbeaef update google requirements 2025-02-13 15:25:22 -08:00
Aleix Conchillo Flaqué
5909dff423 LLMContextResponseAggregator: add VAD emulation support 2025-02-13 15:25:22 -08:00
Aleix Conchillo Flaqué
a6502df72c services: forgot to pass context instead of user aggregator 2025-02-13 13:50:33 -08:00
Aleix Conchillo Flaqué
e0d24d7fc0 update CHANGELOG 2025-02-13 13:21:32 -08:00
Aleix Conchillo Flaqué
99779046a8 services: use push_context_frame() 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
67cdc0063a BaseTransportOutput: allow pushing frames upstream 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
b28f752afa tests: add anthropic and google aggregator tests 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
463078e375 initialize assistant aggregators with context and push upstream instead 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
84510fd521 LLMUserContextAggregator: add space between transcriptions 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
9f6a1c093a LLMUserContextAggregator: reset user speaking time after bot interruption 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
b602e78625 tests: add OpenAI context aggregator tests 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
7c815121ea LLMContextResponseAggregator: add missing reset() implementation 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
16a107948b services: missing kwargs in anthropic/openai user context aggregator 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
839aa7d935 llm_response: add some initial docstrings to LLM aggregators 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
4cbcfe2b0b LLMUserContextAggregator: interrupt the bot if VAD happened a while back 2025-02-13 13:20:38 -08:00
Aleix Conchillo Flaqué
91a628d1ba UserResponseAggregator: implement on top of LLMUserResponseAggregator 2025-02-13 13:20:37 -08:00
Aleix Conchillo Flaqué
50288eeaaa tests: add LLM response aggregators tests 2025-02-13 13:20:37 -08:00
Aleix Conchillo Flaqué
e1f2bbceb3 reimplement LLM response aggregators 2025-02-13 13:20:37 -08:00
Aleix Conchillo Flaqué
8bdd7ed0ed tests: implement langchain tests with run_test() 2025-02-13 13:20:37 -08:00
Aleix Conchillo Flaqué
1b7dfe8126 tests: add a new SleepFrame
The new SleepFrame allows us to control when system frames are pushed to the
pipeline.
2025-02-13 13:20:37 -08:00
Aleix Conchillo Flaqué
d1ee851a65 tests: rename some variables to make things clearer 2025-02-13 13:20:37 -08:00
Filipi da Silva Fuchter
0358673b46 Merge pull request #1215 from pipecat-ai/instant_voice_demo
Instant voice demo improvements - part 02
2025-02-13 18:14:15 -03:00
Filipi Fuchter
16fe1b10e9 - Added support for the RTVIProcessor to handle buffered audio in base64 format, converting it into InputAudioRawFrame for transport.
- Added support for the `RTVIProcessor` to trigger `start_audio_in_streaming` only after the `client-ready` message.
2025-02-13 18:08:55 -03:00
Filipi Fuchter
f001819df8 - Added a new audio_in_stream_on_start field to TransportParams.
- Added a new method `start_audio_in_streaming` in the `BaseInputTransport`.
- Updated `DailyTransport` to respect the `audio_in_stream_on_start` field, ensuring it only starts receiving the audio input if it is enabled.
2025-02-13 18:08:36 -03:00
Filipi Fuchter
dceec60186 Updated FastAPIWebsocketOutputTransport to send TransportMessageFrame and TransportMessageUrgentFrame to the serializer. 2025-02-13 18:07:33 -03:00
Filipi Fuchter
b96979a4ed Update WebsocketServer to not wrap the message inside a text frame. 2025-02-13 18:07:04 -03:00
Mark Backman
745c40def4 Merge pull request #1214 from pipecat-ai/mb/stt-mute-tests
Improve STTMuteFilter, add tests
2025-02-13 09:50:43 -05:00
Mark Backman
42ab62716d Merge pull request #1198 from pipecat-ai/mb/more-whisper-params
Add prompt and temperature args to OpenAI and Groq hosted Whisper STT…
2025-02-13 09:16:38 -05:00
Mark Backman
16ba2010aa Refactor process_frame to be more consistent 2025-02-13 09:15:29 -05:00
Mark Backman
ec0ca46617 Fix temperature docstrings to reference optional 2025-02-13 09:04:20 -05:00
Mark Backman
6ff1f526ff Merge pull request #1216 from pipecat-ai/mb/google-cloud-speech
Add the google-cloud-speech package to the google dependency
2025-02-13 07:04:34 -05:00
Mark Backman
84143cc80c self._muted now returns from STT process_audio_frames 2025-02-13 07:00:44 -05:00
Mark Backman
229dccedc6 Add the google-cloud-speech package to the google dependency 2025-02-12 23:19:17 -05:00
Aleix Conchillo Flaqué
68aaa1f8f4 Merge pull request #1213 from pipecat-ai/aleix/base-transport-output-bot-vad-stop-secs
BaseOutputTransport: use specific VAD stop secs for the bot
2025-02-12 19:01:56 -08:00
Aleix Conchillo Flaqué
f110a45c85 BaseOutputTransport: use specific VAD stop secs for the bot 2025-02-12 19:01:39 -08:00
Mark Backman
1e8a86de63 Handle starting muted, add tests 2025-02-12 19:01:49 -05:00
Mark Backman
ee93e2a2b1 Reorder frame pushing for STTMuteFilter, update STTMuteFrame to SystemFrame 2025-02-12 15:51:18 -05:00
Mark Backman
2e87a019a8 Merge pull request #1208 from pipecat-ai/mb/stt-mute-first-bot-speech
Add new STTMuteStrategy: MUTE_UNTIL_FIRST_BOT_COMPLETE
2025-02-12 12:21:02 -05:00
Vaibhav159
687b3d9d4c fixing google llm service error 2025-02-12 22:22:04 +05:30
Mark Backman
397768d872 Add new STTMuteStrategy: MUTE_UNTIL_FIRST_BOT_COMPLETE 2025-02-12 10:59:28 -05:00
Mark Backman
24cdcd74e6 Merge pull request #1197 from pipecat-ai/mb/google-stt
Add GoogleSTTService
2025-02-12 10:16:18 -05:00
Mark Backman
5d6370690c Add _reconnect_if_needed to simplify reconnect logic 2025-02-12 10:11:18 -05:00
Mark Backman
9f728aa623 Add reconnect logic to handle Google's 5 min time limit 2025-02-12 10:11:18 -05:00
Mark Backman
32d8f6153f Update InputParams to languages: support str or List of Languages 2025-02-12 10:11:18 -05:00
Mark Backman
8c2071f248 Add ClientOptions for region selection 2025-02-12 10:11:18 -05:00
Mark Backman
a9c2197dc6 Add ability to update options 2025-02-12 10:11:18 -05:00
Mark Backman
ce0358804b Docstrings and cleanup 2025-02-12 10:11:18 -05:00
Mark Backman
66a6a6a295 Enable interim transcriptions, add VAD events option 2025-02-12 10:11:18 -05:00
Mark Backman
9f1732c390 Update CHANGELOG and README 2025-02-12 10:11:17 -05:00
Mark Backman
b44ddf2456 07n uses all Google services 2025-02-12 10:09:36 -05:00
Mark Backman
17420f4d0c Update language support 2025-02-12 10:09:36 -05:00
Mark Backman
6cb55ec2cb Add GoogleSTTService 2025-02-12 10:09:36 -05:00
Filipi da Silva Fuchter
e2b4554a54 Merge pull request #1129 from pipecat-ai/instant_voice_demo
Pipecat improvements for the instant voice demo
2025-02-12 11:53:40 -03:00
Mark Backman
fd68b82e48 Merge pull request #1163 from pipecat-ai/mb/rime-websocket
Add RimeTTSService
2025-02-12 09:51:56 -05:00
Filipi Fuchter
cc90f5ab9f Sending the RTVI messages to the websocket 2025-02-12 11:46:49 -03:00
Filipi Fuchter
08f40d9179 Adding support for DailyTransport to receive raw audio through appMessage 2025-02-12 11:46:37 -03:00
Aleix Conchillo Flaqué
80e1325621 include codecov.yml 2025-02-11 23:46:19 -08:00
Aleix Conchillo Flaqué
ed76a5bfa5 Merge pull request #1202 from pipecat-ai/aleix/fix-simli-audiolayout-error
simli: fix audio layout error
2025-02-11 22:24:22 -08:00
Mark Backman
69b0d9035f Mark end_time as unused 2025-02-11 17:44:52 -05:00
Mark Backman
dcc63dd648 Use the vendor default for temperature 2025-02-11 14:29:33 -05:00
Aleix Conchillo Flaqué
2d08f42870 Merge pull request #1204 from pipecat-ai/aleix/add-coverage-support
github: add coverage support
2025-02-11 11:09:25 -08:00
Mark Backman
0814c0bc82 Merge pull request #1203 from pipecat-ai/expose-update-remote-participants-on-daily-transport
Expose `update_remote_participants()` from `DailyTransport`
2025-02-11 13:57:08 -05:00
Paul Kompfner
28e233b195 Update CHANGELOG to reflect the addition of update_remote_participants() 2025-02-11 13:23:47 -05:00
Aleix Conchillo Flaqué
6e4d2d6ade examples: fix more dependabot warnings 2025-02-11 10:09:33 -08:00
Aleix Conchillo Flaqué
266135ec54 examples: fix dependabot warnings 2025-02-11 10:07:05 -08:00
Aleix Conchillo Flaqué
d81aa48262 test-requirements: update transformers to 4.48.0 2025-02-11 10:04:21 -08:00
Aleix Conchillo Flaqué
8c7752fbc2 github: add coverage support 2025-02-11 09:58:21 -08:00
Julien Le Bourg
77fb63372a fix: incorrectly changed the base type in my last pull request for L… (#1184)
* fix: incorrectly changed the base type in my last pull request for LocalAudioTransport

* update examples to use the new LocalTransportParams

* add local device select example
2025-02-11 08:35:57 -08:00
Paul Kompfner
5a8279d3c2 Expose update_remote_participants() from DailyTransport 2025-02-11 11:28:03 -05:00
Aleix Conchillo Flaqué
4db620198a simli: fix audio layout error
Fixes #1201
2025-02-11 07:05:35 -08:00
Mark Backman
d35f4c6b99 Add prompt and temperature args to OpenAI and Groq hosted Whisper STT services 2025-02-10 21:06:37 -05:00
Aleix Conchillo Flaqué
0a990b2aaa Merge pull request #1196 from pipecat-ai/aleix/audio-buffer-processor-continuous-intermittent-stream
AudioBufferProcessor: handle continuous and intermittent user audio
2025-02-10 16:07:12 -08:00
Mark Backman
97586b132d Simplify _calculate_word_times 2025-02-10 18:45:49 -05:00
Mark Backman
8020db350e Update RimeHttpTTSService to use mistv2 model by default 2025-02-10 18:45:48 -05:00
Mark Backman
54f64b8dad Code review feedback 2025-02-10 18:45:08 -05:00
Mark Backman
8f8a3ae7f9 Add RimeTTSService 2025-02-10 18:45:06 -05:00
Mark Backman
344aff5681 Merge pull request #1191 from pipecat-ai/mb/azure-tts-error-handling
Improve AzureTTSService error handling
2025-02-10 18:01:39 -05:00
Mark Backman
0d2e90cff1 Merge pull request #1190 from pipecat-ai/mb/languages-hosted-whisper
Add language support to OpenAI and Groq hosted Whisper
2025-02-10 17:49:38 -05:00
Mark Backman
1a8dd6b713 Improve AzureTTSService error handling 2025-02-10 17:48:55 -05:00
Mark Backman
2dc585aee0 Merge pull request #1185 from pipecat-ai/mb/update-readme-hacking
Add missing pip install -e . step to the README, and clarify steps
2025-02-10 17:45:58 -05:00
Mark Backman
a64fa44811 Merge pull request #1186 from pipecat-ai/mb/whisper-multilingual
Add language support to WhisperSTTService
2025-02-10 17:26:10 -05:00
Aleix Conchillo Flaqué
baeb83484d Merge pull request #1194 from Vaibhav159/vl_fix_elevenlabs_disconnect_issue
fixing disconnect issue
2025-02-10 13:41:59 -08:00
Vaibhav159
b0c3f80963 resolve merge conf 2025-02-11 03:03:32 +05:30
Aleix Conchillo Flaqué
eb3c9b1e75 AudioBufferProcessor: handle continuous and intermittent user audio
Fixes #1172
2025-02-10 11:26:31 -08:00
Mark Backman
ad4cbdb1ec Merge pull request #1159 from Canonical-AI-Inc/gemini-rag
Gemini 2.0 Flash Lite RAG example
2025-02-10 13:42:11 -05:00
Aleix Conchillo Flaqué
32baee924b RTVI: fix premature bot-tts-text messages (#1193) 2025-02-10 10:37:54 -08:00
Adrian Cowham
9cc53509d1 PR feedback: renamed file, added docstring, changed file read logic 2025-02-10 09:39:01 -08:00
Vaibhav159
2c62d3bf32 break once ConnectionClosed error 2025-02-10 23:04:05 +05:30
Vaibhav159
b06b16adb7 fixing disconnect issue 2025-02-10 22:55:20 +05:30
Mark Backman
cd52d73027 Add language support to OpenAI and Groq hosted Whisper 2025-02-10 10:18:00 -05:00
Mark Backman
c9d8c572c7 Add language support to WhisperSTTService 2025-02-09 10:51:23 -05:00
Mark Backman
d9439fd398 Add missing pip install -e . step to the README, and clarify steps 2025-02-09 09:15:10 -05:00
Mark Backman
081abcedb3 Merge pull request #1176 from pipecat-ai/mb/stt-mute-deprecate-stt-service
Deprecate stt_service parameter in STTMuteFilter
2025-02-09 08:35:22 -05:00
Mark Backman
1455e24ad1 Add keyword args, collocated warnings import with the deprecation 2025-02-09 08:29:20 -05:00
Mark Backman
4613cf4790 Merge pull request #1181 from pipecat-ai/mb/daily-docstrings
Add docstrings to daily.py
2025-02-09 08:05:59 -05:00
Mark Backman
7aa2e1209d Merge pull request #1177 from pipecat-ai/mb/perplexity
Add PerplexityLLMService
2025-02-09 08:05:46 -05:00
Mark Backman
76daaab6ca Add PerplexityLLMService 2025-02-09 08:00:31 -05:00
Mark Backman
37cfe870cc Merge pull request #1183 from pipecat-ai/mb/add-groq-stt
Add GroqSTTService, BaseWhisperSTTService, and refactor OpenAISTTService
2025-02-09 07:56:35 -05:00
Mark Backman
160167758b Add docstrings to daily.py 2025-02-09 07:53:51 -05:00
Mark Backman
4b634713a5 Merge pull request #1182 from pipecat-ai/mb/28c-optional-db
Update 28c option to output to log line only by default
2025-02-09 07:52:21 -05:00
Mark Backman
72954d5f15 Remove to base_whisper.py 2025-02-09 07:51:30 -05:00
Mark Backman
f2b07271c1 Update GroqLLMService to use llama-3.3-70b-versatile as the default model 2025-02-09 07:51:30 -05:00
Mark Backman
32b9de5f51 Add GroqSTTService, BaseWhisperSTTService, and refactor OpenAISTTService 2025-02-09 07:51:28 -05:00
Mark Backman
71ce8f9bcf Merge pull request #1179 from pipecat-ai/mb/remove-command-dash-badge
Remove CommandDash badge from README
2025-02-09 07:47:32 -05:00
Mark Backman
7d05728e2f Update 28c option to output to log line only by default 2025-02-08 10:00:45 -05:00
Mark Backman
dee5448b57 Merge pull request #1123 from pipecat-ai/cb/sqlite
Add SQLite storage to the Gemini persistent storage example
2025-02-08 09:07:52 -05:00
Mark Backman
d67861925a Merge pull request #1128 from golbin/whisper-api
Add Whisper STT service using OpenAI API
2025-02-08 08:35:26 -05:00
Mark Backman
0180619d44 Merge pull request #1173 from TheCodingLand/local-pyaudio-device-ids
adds configurable device ids for local audio transport
2025-02-08 08:04:00 -05:00
Mark Backman
f07e498612 Remove CommandDash badge from README 2025-02-08 07:59:39 -05:00
TheCodingLand
57964cb929 fix LocalAudioTransport param type 2025-02-08 12:32:20 +01:00
TheCodingLand
6840c77684 apply ruff formatting 2025-02-08 12:03:23 +01:00
Mark Backman
a1b58115ce Deprecate stt_service parameter in STTMuteFilter 2025-02-07 19:24:03 -05:00
chadbailey59
23eb6e3d46 storybot fixes (#1175)
* storybot fixes

* readme cleanup
2025-02-07 13:58:02 -06:00
Mark Backman
74a2c38c6c Merge pull request #1174 from pipecat-ai/mb/bump-google-genai-version
Bump google-genai version to 1.0.0
2025-02-07 14:53:44 -05:00
Mark Backman
90b217fda8 Bump google-genai version to 1.0.0 2025-02-07 14:32:37 -05:00
Aleix Conchillo Flaqué
6855bc0ada Merge pull request #1166 from pipecat-ai/aleix/google-rtvi-observer
rtvi: separate specific google RTVI into a GoogleRTVIObserver
2025-02-08 03:19:02 +08:00
TheCodingLand
a359434307 remove Doc and Annotated imports 2025-02-07 19:42:34 +01:00
TheCodingLand
856c8959c3 enhance doc 2025-02-07 19:38:26 +01:00
TheCodingLand
8da7a42137 adds configurable input and output device ids for local audio 2025-02-07 19:23:18 +01:00
Aleix Conchillo Flaqué
510a0f5ef5 rtvi: deprecate RTVI.observer() 2025-02-07 09:19:43 -08:00
Aleix Conchillo Flaqué
03ac744bcf rtvi: deprecate frame processors 2025-02-07 09:17:29 -08:00
Aleix Conchillo Flaqué
b058461a7d GoogleRTVIObserver: add explicit constructor 2025-02-07 09:15:32 -08:00
Mark Backman
abd9f16b90 Export .rtvi, update new-chatbot example, rename and update foundational 32 2025-02-07 09:15:32 -08:00
Aleix Conchillo Flaqué
d07732f2e8 rtvi: separate specific google RTVI into a GoogleRTVIObserver 2025-02-07 09:15:32 -08:00
Aleix Conchillo Flaqué
4d25582e16 dev-requirements: update pyright and ruff 2025-02-06 21:51:57 -08:00
Aleix Conchillo Flaqué
d4b2160f9c Merge pull request #1161 from pipecat-ai/aleix/prepare-0.0.56
update CHANGELOG for 0.0.56
2025-02-06 13:50:04 -08:00
Aleix Conchillo Flaqué
dd7926aab5 update CHANGELOG for 0.0.56 2025-02-06 13:45:13 -08:00
Aleix Conchillo Flaqué
070bf66980 transports: fix local transports audio cleanup 2025-02-06 13:45:13 -08:00
Aleix Conchillo Flaqué
962fc27dbd Merge pull request #1160 from pipecat-ai/aleix/fix-unit-test-logging
tests: remove logger from tests.utils
2025-02-06 13:26:37 -08:00
Mark Backman
3d4d6132fc Merge pull request #1158 from pipecat-ai/mb/update-22c
Update foundation examples 22b, 22c, and 22d to be ready for function…
2025-02-06 16:25:05 -05:00
Aleix Conchillo Flaqué
a96d9294b7 tests: remove logger from tests.utils 2025-02-06 13:18:28 -08:00
Aleix Conchillo Flaqué
a6e78550d5 Merge pull request #1156 from pipecat-ai/aleix/prefer-optional
prefer Optional over "| None"
2025-02-06 13:08:48 -08:00
Adrian Cowham
d9f6b7b93c added an example using Gemini's large context window for RAG 2025-02-06 12:49:29 -08:00
Mark Backman
969de92ad9 Update foundation examples 22b, 22c, and 22d to be ready for function calling 2025-02-06 15:36:16 -05:00
Aleix Conchillo Flaqué
c4dbe92b30 prefer Optional over "| None" 2025-02-06 11:11:37 -08:00
Aleix Conchillo Flaqué
684764fece Merge pull request #1155 from pipecat-ai/aleix/sentry-fixes-and-example
sentry fixes and example
2025-02-06 11:09:31 -08:00
Aleix Conchillo Flaqué
c4be07693f examples: added sentry-metrics example 2025-02-06 10:46:04 -08:00
Aleix Conchillo Flaqué
c5d5ca8232 SentryMetrics: use transactions and call parent methods 2025-02-06 10:44:38 -08:00
Mark Backman
428e763814 Merge pull request #1149 from pipecat-ai/mb/update-google-default-llm-model
Use gemini-2.0-flash-001 as the default model for GoogleLLMService
2025-02-06 12:41:13 -05:00
Mark Backman
0efa2711ff Merge pull request #1152 from pipecat-ai/mb/docstrings
Add docstrings for PipelineTask and related classes/functions
2025-02-06 12:30:12 -05:00
Mark Backman
4904f52cee Use gemini-2.0-flash-001 as the default model for GoogleLLMService 2025-02-06 12:29:15 -05:00
Aleix Conchillo Flaqué
dbcf14ddb4 Merge pull request #1154 from pipecat-ai/aleix/twilio-telnyx-sample-rates
serializers: don't update twilio/telnyx sample rates
2025-02-06 09:27:42 -08:00
Aleix Conchillo Flaqué
7c13ec10d9 examples: cleanup ElevenLabsTTSService constructor arguments 2025-02-06 09:25:52 -08:00
Aleix Conchillo Flaqué
29b9dccc53 serializers: don't update twilio/telnyx sample rates 2025-02-06 09:25:52 -08:00
Aleix Conchillo Flaqué
e8ce826473 Merge pull request #1151 from pipecat-ai/aleix/base-output-transport-resample
BaseOutputTransport: resample incoming audio if needed
2025-02-06 09:25:07 -08:00
Aleix Conchillo Flaqué
bbb991dfd8 Merge pull request #1153 from pipecat-ai/aleix/base-input-transport-show-vad
BaseInputTransport: show VAD results when interruptions not allowed
2025-02-06 09:24:12 -08:00
Mark Backman
4432e7e4f7 Add docstrings for PipelineTask and related classes/functions 2025-02-06 11:04:54 -05:00
Aleix Conchillo Flaqué
ee9cce64b2 BaseInputTransport: show VAD results when interruptions not allowed 2025-02-06 07:40:03 -08:00
Aleix Conchillo Flaqué
1ae4f0150d BaseOutputTransport: resample incoming audio if needed 2025-02-06 07:37:43 -08:00
Mark Backman
4c77c3ed34 Merge pull request #1148 from pipecat-ai/mb/fix-twilio-serializer
Fix sample rate handling in Twilio and Telnyx serializers
2025-02-06 10:25:13 -05:00
Aleix Conchillo Flaqué
975b97472a Merge pull request #1144 from pipecat-ai/aleix/frame-processor-missing-init-warning
FrameProcessor: add an error about missing super().process_frame(...)
2025-02-06 07:18:35 -08:00
Mark Backman
c8ccf13bc7 fix: Use audio_in_sample_rate to deserialize data for TelnyxFrameSerializer 2025-02-06 09:59:21 -05:00
Mark Backman
ba59736f87 fix: Use audio_in_sample_rate to deserialize data for TwilioFrameSerializer 2025-02-06 09:55:15 -05:00
Jin Kim
5989e1ed16 Merge branch 'main' into whisper-api 2025-02-06 13:14:36 +09:00
Aleix Conchillo Flaqué
bc21a0b817 FrameProcessor: add an error about missing super().process_frame(...) 2025-02-05 18:33:03 -08:00
Aleix Conchillo Flaqué
99d3227ff5 Merge pull request #1126 from pipecat-ai/aleix/prepare-0.0.55
update CHANGELOG for 0.0.55
2025-02-05 11:32:39 -08:00
Aleix Conchillo Flaqué
7730f59635 update CHANGELOG for 0.0.55 2025-02-05 11:30:40 -08:00
Aleix Conchillo Flaqué
ba31546c32 Merge pull request #1139 from pipecat-ai/aleix/task-start-metadata
pipeline task start metadata and unit test improvements
2025-02-05 10:51:51 -08:00
Aleix Conchillo Flaqué
a363d12d1f dev-requirements: fix conflicts because of nvidia-riva-client 2025-02-05 10:34:46 -08:00
Aleix Conchillo Flaqué
feab9c8fa2 tests: run_test() now uses PipelineTask 2025-02-05 10:34:38 -08:00
Aleix Conchillo Flaqué
61f6669926 task: allow passing StartFrame metadata via start_metadata param 2025-02-05 10:34:38 -08:00
Aleix Conchillo Flaqué
3be69908d2 Merge pull request #1131 from pipecat-ai/aleix/global-audio-sample-rates
introduce PipelineParams audio input/output sample rates
2025-02-05 08:11:25 -08:00
Aleix Conchillo Flaqué
fcb80ec330 playht: don't set sample_rate in _settings 2025-02-05 07:46:24 -08:00
Mark Backman
c9f5684e2f OpenAITTSService: Add warning about changing sample_rate 2025-02-05 10:13:46 -05:00
Mark Backman
c257fa1573 AzureTTSService, AzureHttpTTSService: add start() method 2025-02-05 10:05:19 -05:00
Mark Backman
97c55da29f PlayHTHttpTTSService: add start() method to set sample_rate 2025-02-05 09:54:41 -05:00
Aleix Conchillo Flaqué
49426aa9a1 transport(websocket): improve exception logging 2025-02-04 23:50:45 -08:00
Aleix Conchillo Flaqué
0a333c26da services(elevenlabs): warn if sample rate not supported 2025-02-04 23:50:21 -08:00
Aleix Conchillo Flaqué
75a29424ff examples(telnyx-chatbot): use cartesia so we can use 8khz 2025-02-04 23:49:50 -08:00
Filipi da Silva Fuchter
cd1b429308 Merge pull request #1133 from pipecat-ai/fixing_krisp_issue
Fixing the issue in Krisp when trying to create more than one
2025-02-04 20:44:29 -03:00
Filipi Fuchter
7f1ae4b8cc Fixing the issue in Krisp when trying to create more than one filter in the same process. 2025-02-04 20:10:56 -03:00
Aleix Conchillo Flaqué
af9fd811cd examples(moondream-chatbot): fix UserImageRequester 2025-02-04 14:37:53 -08:00
Aleix Conchillo Flaqué
69f5c9b9d3 update anthropic and openpipe versions 2025-02-04 14:37:36 -08:00
Aleix Conchillo Flaqué
ab45e481be introduce PipelineParams audio input/output sample rates 2025-02-04 14:12:56 -08:00
Jin Kim
ef1e4277d3 Add an example for Whisper using OpenAI API 2025-02-04 10:32:55 +09:00
Jin Kim
823b763b25 Change OpenAI example file name 2025-02-04 10:28:06 +09:00
Jin Kim
3cb189eb1f Add whisper STT service using OpenAI API 2025-02-04 10:27:28 +09:00
Aleix Conchillo Flaqué
cc54255c41 Merge pull request #1125 from pipecat-ai/aleix/twilio-chatbot-improvements 2025-02-03 11:10:33 -08:00
Aleix Conchillo Flaqué
1cdb66f889 examples(twilio-chatbot): create sample rate variable 2025-02-03 10:58:06 -08:00
Aleix Conchillo Flaqué
51a86a509c examples: multiple twilio-chatbot improvements 2025-02-03 10:36:24 -08:00
Aleix Conchillo Flaqué
824898f7b7 Merge pull request #1121 from pipecat-ai/aleix/audio-resamplers
introduce audio resamplers
2025-02-03 10:32:55 -08:00
Aleix Conchillo Flaqué
57dadb6359 audio(utils): some variable renames 2025-02-03 09:33:04 -08:00
Aleix Conchillo Flaqué
5dcdc68ef5 examples: fix 22 series initial gate state 2025-02-03 09:16:58 -08:00
Aleix Conchillo Flaqué
aafb2db620 GatedOpenAILLMContextAggregator: use keyword argument and add start_open 2025-02-03 09:16:44 -08:00
Aleix Conchillo Flaqué
f3f22cf61c AudioBufferProcessor: add start_recording()/stop_recording() 2025-02-01 11:06:58 -08:00
Aleix Conchillo Flaqué
371c2f3704 canonical: do not reset audio buffers 2025-02-01 11:06:58 -08:00
Aleix Conchillo Flaqué
1f14f62696 AudioBufferProcessor: fix audio buffer silence computation 2025-02-01 11:06:58 -08:00
Aleix Conchillo Flaqué
06449eff2c BaseAudioResampler: make resample() async 2025-02-01 11:06:58 -08:00
Aleix Conchillo Flaqué
dcfb86583d serializers: serialize()/deserialize() are now async 2025-02-01 11:06:58 -08:00
Aleix Conchillo Flaqué
cda34a1320 AudioBufferProcessor: fix user/bot audio buffers silence padding 2025-02-01 11:06:58 -08:00
Aleix Conchillo Flaqué
13611fd8e1 AudioBufferProcessor: call callback on CancelFrame 2025-02-01 11:06:58 -08:00
Aleix Conchillo Flaqué
fc89aad469 introduce audio resamplers 2025-02-01 11:06:55 -08:00
Aleix Conchillo Flaqué
6c7474e1a2 frames: add pass to DTMFFrames 2025-01-31 18:37:40 -08:00
Aleix Conchillo Flaqué
95f0dbf3f3 CHANGELOG.md: task.cancel() and EndFrame clarification 2025-01-31 18:35:35 -08:00
Aleix Conchillo Flaqué
11aeb68ddb frames: fix typo s/OuputDTMFFrame/OutputDTMFFrame/ 2025-01-31 18:28:38 -08:00
Aleix Conchillo Flaqué
a43c102fc8 Merge pull request #1064 from jcbjoe/jg/additional_dtmf_frames
Added: Additional DTMF frames
2025-01-31 18:25:08 -08:00
Chad Bailey
d236973c0f moved sqlite code back to a single example 2025-01-31 23:18:06 +00:00
Mark Backman
16b49bdce6 Merge pull request #1122 from pipecat-ai/mb/openai-org-id
Add organization and project level auth in OpenAILLMService
2025-01-31 14:35:26 -05:00
Mark Backman
41477c8f78 Add organization and project level auth in OpenAILLMService 2025-01-31 14:27:25 -05:00
Aleix Conchillo Flaqué
bb9a2560c3 Merge pull request #1118 from pipecat-ai/aleix/task-manager
introduce TaskManager
2025-01-31 10:24:52 -08:00
Aleix Conchillo Flaqué
002699f16c rtvi: delay creating tasks until we get StartFrame 2025-01-31 10:06:11 -08:00
chadbailey59
a17243bc1e More Storybot updates (#1116)
* initial changes for gemini storybot

* storybot updates for gemini

* more storybot updates

* interim interruptible commit

* cleanup

* cleanup

* cleanup

* first draft

* wip

* more storybot fixes

* more storybot updates WIP

* committing before changing the image prompting strategy

* wip

* prompt updating

* cleanup

* cleanup

* cleanup

* readme cleanup

* fixup
2025-01-30 20:13:18 -06:00
Aleix Conchillo Flaqué
d95819746a tests: make sure QueuedFrameProcessor pushes frames 2025-01-30 13:48:44 -08:00
Aleix Conchillo Flaqué
b65f32e8e1 task: start TaskObserver when tasks can be created
We have to start proxy observer tasks once we know the TaskManager has an event
loop.
2025-01-30 13:46:56 -08:00
Aleix Conchillo Flaqué
0131d0a531 examples: make sure unhandled frames are always pushed 2025-01-30 13:15:49 -08:00
Aleix Conchillo Flaqué
642affb2fe add missing super().process_frame() calls 2025-01-30 13:15:17 -08:00
Aleix Conchillo Flaqué
a145005498 SyncParallelPipeline: cleanup source/sink processors 2025-01-30 13:13:02 -08:00
Aleix Conchillo Flaqué
241f241ed9 SyncParallelPipeline: don't add source/sink processors inside pipeline 2025-01-30 13:12:37 -08:00
Aleix Conchillo Flaqué
85e572e2d8 gladia: cleanup receive messages task 2025-01-30 13:10:47 -08:00
Aleix Conchillo Flaqué
10716e8ec1 utils: protect obj_id() and obj_count() with a lock 2025-01-30 13:10:36 -08:00
Aleix Conchillo Flaqué
41d60a14cc introduce TaskManager and PipelineRunner event loop 2025-01-30 13:10:36 -08:00
Aleix Conchillo Flaqué
e69c065a86 update CHANGELOG and fix formatting 2025-01-30 08:55:29 -08:00
Aleix Conchillo Flaqué
f90c17ab30 Merge pull request #1083 from team-telnyx/creating_telnyx_chatbot
Creating telnyx chatbot
2025-01-30 08:49:20 -08:00
Aleix Conchillo Flaqué
bc4fdd587a Merge pull request #1103 from pipecat-ai/aleix/tts-service-push-silence-before-tts-stop-frame
services(tts): allow pushing silence audio before TTSStoppedFrame
2025-01-30 08:48:41 -08:00
Aleix Conchillo Flaqué
665a6017f9 services(tts): allow pushing silence audio before TTSStoppedFrame 2025-01-30 08:46:56 -08:00
Aleix Conchillo Flaqué
4119d7a115 Merge pull request #1104 from pipecat-ai/aleix/twilio-transport-message-frames
serializers(twilio): handle transport message frames
2025-01-30 08:45:55 -08:00
Aleix Conchillo Flaqué
2634b03ffa serializers(twilio): handle transport message frames 2025-01-30 08:30:09 -08:00
Aleix Conchillo Flaqué
6a50759b9f Merge pull request #1105 from pipecat-ai/aleix/websocket-client
added new websocket client transport
2025-01-30 08:28:26 -08:00
Mark Backman
7982faba67 Merge pull request #1115 from pipecat-ai/mb/elevenlabs-language-fixes
Improve ElevenLabs language checking logic
2025-01-30 10:03:22 -05:00
Mark Backman
2b4bf57c04 Improve ElevenLabs language checking logic 2025-01-30 09:52:36 -05:00
Filipi Fuchter
7e3e126730 Migrating the base API URL for the react native example to an .env file. 2025-01-30 10:42:16 -03:00
Filipi Fuchter
75ca0571bb Improving the layout from the bot ready react native demo. 2025-01-30 10:31:04 -03:00
Filipi Fuchter
a48e5d0714 Only sending the message when it is a remote audio track. 2025-01-30 10:14:37 -03:00
Filipi Fuchter
2b6a992207 Sending the app-message to start playing audio once the track has started. 2025-01-30 09:37:33 -03:00
Filipi Fuchter
24cf106ed2 Refactoring the code to ask for the room that it should connect. 2025-01-30 09:14:18 -03:00
Rafal Skorski
b93e4ab9cb Formatting adjusted and the encoding selection moved from TelnyxFrameSerializer to the websocket_endpoint function in server.py 2025-01-30 12:52:30 +01:00
Dominic Stewart
c140c04b9a Merge pull request #1080 from DominicStewart/dom/voicemail-detection-bot
Add voicemail detection example
2025-01-30 09:20:12 +09:00
Dominic
a7c8d2af8e Removed extra space too 2025-01-30 09:18:29 +09:00
Dominic
f3f520a76a Removed formatting that vs code automatically adds to readme file 2025-01-30 09:17:27 +09:00
Mark Backman
5e0f42a3e0 Merge pull request #1111 from pipecat-ai/mb/gemini-restructure-messages
GoogleLLMContext: Allow _restructure_from_openai_messages to handle c…
2025-01-29 19:06:47 -05:00
Filipi Fuchter
95c8346cb5 Starting to create a react native client for the bot ready example. 2025-01-29 19:00:42 -03:00
Mark Backman
220ce9fd0f GoogleLLMContext: Allow _restructure_from_openai_messages to handle context frames that contain function call data and / or messages 2025-01-29 16:01:39 -05:00
Filipi da Silva Fuchter
5d0486a26f Merge pull request #1008 from pipecat-ai/cutting_initial_words
Avoid cutting off the beginning of the audio
2025-01-29 17:02:40 -03:00
Chad Bailey
bc98c2e36c added sqlite storage example 2025-01-29 19:12:15 +00:00
Aleix Conchillo Flaqué
091258f617 improve create_task names 2025-01-29 11:11:40 -08:00
Aleix Conchillo Flaqué
2a1408eb2a transports(websocket server): remove unused variable 2025-01-29 11:11:40 -08:00
Aleix Conchillo Flaqué
6393b41d58 transports(websocket): added WebsocketClientTransport 2025-01-29 11:11:37 -08:00
Filipi Fuchter
2a5728264c Adding missing dependency to openai 2025-01-29 15:52:42 -03:00
Filipi Fuchter
2ef0735462 Adding readme to teach how to use. 2025-01-29 15:45:48 -03:00
Filipi Fuchter
80bbfff4be Merge branch 'main' into cutting_initial_words 2025-01-29 15:36:52 -03:00
Aleix Conchillo Flaqué
4ff68e66b9 Merge pull request #1110 from pipecat-ai/aleix/frame-metadata
frames: added metadata field to Frame class
2025-01-29 10:30:59 -08:00
Aleix Conchillo Flaqué
3a688840fc frames: added metadata field to Frame class 2025-01-29 09:53:21 -08:00
Aleix Conchillo Flaqué
2ca8b95bbf Merge pull request #1106 from Vaibhav159/vl_moving_test_utils_to_pipecat_package
moving test utils inside of package
2025-01-29 09:44:34 -08:00
Mark Backman
2aafc6bd1d Merge pull request #1107 from AngeloGiacco/angelo/increase-ws-connection
fix: elevenlabs tts increase websocket max message size limit to 16MB
2025-01-29 10:04:42 -05:00
Angelo Giacco
0ff9ef8707 fix: add changelog 2025-01-29 14:27:39 +00:00
Angelo Giacco
596cae994d fix: elevenlabs tts increase websocket max message size limit to 16MB 2025-01-29 13:55:27 +00:00
Dominic
9ad9cb1ff8 Cleaned up formatting 2025-01-29 17:36:08 +09:00
Dominic Stewart
60e800e9ba Merge branch 'main' into dom/voicemail-detection-bot 2025-01-29 17:30:56 +09:00
Dominic
1c8f0ed7da Finalised code and added a bit about this example to the README 2025-01-29 17:27:44 +09:00
Vaibhav159
8407a86532 moving test utils inside of package 2025-01-29 12:46:43 +05:30
Dominic
417d661d28 Updated bot_runner and bot_daily with adjustments necessary to run voicemail detection from bot_daily code 2025-01-29 16:11:45 +09:00
Aleix Conchillo Flaqué
8cd23c42fc Merge pull request #1100 from pipecat-ai/aleix/use-task-cancel-on-left-disconnected
use `task.cancel()` when participant leaves/disconnects
2025-01-28 16:02:02 -08:00
Aleix Conchillo Flaqué
0547a15695 task: allow queuing a CancelFrame to cancel the task 2025-01-28 15:59:36 -08:00
Aleix Conchillo Flaqué
3fe2124314 examples: use task.cancel() when participant leaves or disconnects 2025-01-28 15:46:20 -08:00
Aleix Conchillo Flaqué
ba358a4f0a task: cleanup processors after task finishes running 2025-01-28 15:02:25 -08:00
Aleix Conchillo Flaqué
79ef8c947d Merge pull request #1099 from pipecat-ai/aleix/daily-transport-queue-events
transports(daily): queue events until join completes
2025-01-28 14:38:25 -08:00
Aleix Conchillo Flaqué
f024476b08 transports(daily): queue events until join completes 2025-01-28 11:22:42 -08:00
Dominic
73690a13d9 Moved voicemail detection to phone-chatbot and working on that now 2025-01-28 22:31:08 +09:00
Dominic
6ebf06a6fb Removed start_terminate_call function as unnecessary 2025-01-28 10:39:10 +09:00
Dominic
2f4f779c91 Fixed a few things 2025-01-28 10:39:10 +09:00
Dominic
941ee6e5e8 Add voicemail detection example 2025-01-28 10:39:10 +09:00
Aleix Conchillo Flaqué
cd5075ed7a Merge pull request #1097 from pipecat-ai/aleix/pipecat-0.0.57
prepare CHANGELOG for 0.0.54
2025-01-27 14:56:51 -08:00
Aleix Conchillo Flaqué
6f41a667c8 prepare CHANGELOG for 0.0.54 2025-01-27 14:48:56 -08:00
Aleix Conchillo Flaqué
0b222a7eae Merge pull request #1085 from pipecat-ai/aleix/task-creation-and-cancellation
improve task creation and cancellation
2025-01-27 14:47:20 -08:00
Aleix Conchillo Flaqué
f09f4b8fc4 services(tavus): fix EndFrame and CancelFrame processing 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
cca241a2b7 examples(22c): fix cancel_task call 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
1489e44740 gemini(multimodal live): fix model audio queue variable 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
f55f78e70e update CHANGELOG.md 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
10202dc529 transports(websockets): cancel or wait for tasks to finish 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
498805a34c FrameProcessor: add wait_for_task() 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
509f143e1b update CHANGELOG.md 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
737e4fa3bd gemini(multimodal live): connect on StartFrame 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
8b5228a105 utils: move task functions to asyncio module 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
6cc01bc5b0 examples: update 14 series with TTSSpeakFrame 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
2a2928d96c gemini: create transcribe tasks only once 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
a3a6adbd17 user_idle_processor: add missing parent cleanup() 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
bf5ced18b2 fix parallel pipelines cleanup 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
2eccd1b1e9 utils: update some logging levels 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
9374bed878 tests: langchain fixes 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
c03d0352b1 utils/tasks: added new documentation 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
af90b8b4fa utils: add wait_for_task() 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
0a9daa2f56 task: avoid canceling tasks more than once 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
e48c0e52ef transports(daily): avoid canceling task more than once 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
6bca8396d3 utils: error if we try to cancel the same task multiple times 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
c2d8a45a07 runner: warn about remaining dangling tasks 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
80a7f1b1e7 runner: improve signal handler task cancellation 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
aff6e24560 pipeline: fix pipeline cleanup 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
cb93f6b368 utils: store created tasks and add current_tasks() 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
ff0bcec33a transports: improve task naming 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
5885fcc230 add id and name properties 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
57b186cde8 base_transport: add name and id fields 2025-01-27 14:42:23 -08:00
Aleix Conchillo Flaqué
d1a3f404a5 improve task creation and cancellation
If a FrameProcessor needs to create a task it should use
FrameProcessor.create_task() and FrameProcessor.cancel_task(). This gives
Pipecat more control over all the tasks that are created in Pipecat.

Both functions internally use the utils module: utils.create_task() and
utils.cancel_task() which should also be used outside of FrameProcessors. That
is, unless strictly necessary, we should avoid using asyncio.create_task().
2025-01-27 14:42:23 -08:00
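
The commit body above describes routing task creation and cancellation through shared helpers instead of calling asyncio.create_task() directly. A minimal sketch of that idea follows. It is not Pipecat's implementation: the module-level registry `_TASKS`, the helper signatures, and the `demo()` coroutine are assumptions made for illustration, and only the names create_task(), cancel_task() and current_tasks() come from the commit messages in this log.

```python
import asyncio
from typing import Coroutine, List, Set

# Hypothetical registry of live tasks (an assumption, not Pipecat's code).
_TASKS: Set[asyncio.Task] = set()


def create_task(coro: Coroutine, name: str) -> asyncio.Task:
    """Create a named task and track it until it finishes."""
    task = asyncio.get_running_loop().create_task(coro, name=name)
    _TASKS.add(task)
    # Drop the task from the registry as soon as it completes or is cancelled.
    task.add_done_callback(_TASKS.discard)
    return task


async def cancel_task(task: asyncio.Task) -> None:
    """Cancel a task and wait until the cancellation has been processed."""
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        # Expected: the task acknowledged the cancellation.
        pass


def current_tasks() -> Set[asyncio.Task]:
    """Return a snapshot of the tasks that are still tracked."""
    return set(_TASKS)


async def demo() -> List[object]:
    async def worker() -> None:
        await asyncio.sleep(3600)

    task = create_task(worker(), "demo::worker")
    await asyncio.sleep(0)  # give the worker a chance to start
    names_before = sorted(t.get_name() for t in current_tasks())
    await cancel_task(task)
    await asyncio.sleep(0)  # let pending done-callbacks run
    return [names_before, task.cancelled(), len(current_tasks())]
```

Running `asyncio.run(demo())` exercises the whole cycle. Because every task passes through one place, a runner can list or warn about tasks that are still alive at shutdown, which is the kind of control the commit body refers to.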
chadbailey59
179ddbea7d Add dialout to the Daily phone example (#998)
* added dialout to daily phone example

* cleanup

* cleanup

* pre-commit hook

* Fix typo

* More explicit README instructions

---------

Co-authored-by: Mark Backman <mark@daily.co>
2025-01-27 12:21:30 -06:00
Mark Backman
86c1e6a3bd Merge pull request #1081 from pipecat-ai/mb/user-idle-add-retry
Added retry functionality and a new callback to the UserIdleProcessor
2025-01-27 10:30:45 -05:00
Mark Backman
9e9822f17d Use inspect.signature to determine which callback to use 2025-01-27 10:24:58 -05:00
Mark Backman
5f9671e2ca Added retry functionality and a new callback to the UserIdleProcessor 2025-01-27 10:24:57 -05:00
Mark Backman
aac8961ae5 Merge pull request #1078 from pipecat-ai/mb/improve-error-handling-truncate-audio
Add better error handling for OpenAIRealtimeBetaLLMService truncate errors
2025-01-27 08:54:39 -05:00
Mark Backman
3e6377346a Merge pull request #1093 from pipecat-ai/mb/update-example-6a 2025-01-26 19:43:39 -05:00
Mark Backman
9d9a622b1a Merge pull request #1094 from pipecat-ai/mb/readme-service-section 2025-01-26 19:43:12 -05:00
Mark Backman
3e9a6b6262 Merge pull request #1095 from pipecat-ai/mb/elevenlabs-lang-codes 2025-01-26 12:21:28 -05:00
Mark Backman
fb3097560f Remove eleven_multilingual_v2 from language code list 2025-01-26 07:17:38 -05:00
Mark Backman
ff6368add0 Update README.md
Adding a section so that table can be linked to.
2025-01-25 16:12:53 -05:00
Mark Backman
89fd03d86f Merge pull request #1090 from vengad-arrowhead/main
Adding hindi danda symbol as end of sentence marker
2025-01-25 09:36:19 -05:00
Mark Backman
0672530d6b Fix foundational example 6a to switch images when the bot is speaking 2025-01-25 08:40:42 -05:00
vengadanathan srinivasan
7a0cfc8d3d Adding hindi danda symbol as end of sentence marker 2025-01-25 14:55:51 +05:30
Mark Backman
b881dd57b3 Merge pull request #1086 from pipecat-ai/mb/fix-expiry-time-type-mismatch 2025-01-24 17:31:08 -05:00
Mark Backman
abf0d0d053 Improve token parameter construction using DailyMeetingTokenProperties 2025-01-24 17:22:31 -05:00
Mark Backman
1acdf7aff7 Fix expiry_time type validation in get_token REST API helper 2025-01-24 17:21:50 -05:00
Mark Backman
96b90abda6 Merge pull request #1082 from pipecat-ai/mb/update-function-calling-examples
Update function calling examples to push a TextFrame in the start_cal…
2025-01-24 17:21:13 -05:00
Filipi da Silva Fuchter
202a844eeb Merge pull request #1051 from pipecat-ai/gemini_grounding_metadata_rtvi
Sending Search Response to RTVI
2025-01-24 19:20:50 -03:00
Filipi Fuchter
655d56f634 Fixing pydantic validation when creating meeting token. 2025-01-24 19:15:56 -03:00
Filipi Fuchter
07c84b733b Sending Search Response to RTVI 2025-01-24 18:59:46 -03:00
Filipi da Silva Fuchter
7c52736ff6 Merge pull request #1030 from pipecat-ai/gemini_grounding_metadata
Introduce support for extracting and processing grounding metadata from GoogleLLMService.
2025-01-24 15:41:54 -03:00
Mark Backman
48ce751602 Merge pull request #1075 from Vaibhav159/vl_add_daily_meeting_token_v2
adding models to DailyRestHelper
2025-01-24 13:21:52 -05:00
Vaibhav159
1f1e2dac2b wrapping things up 2025-01-24 23:44:23 +05:30
Vaibhav159
71c2dc3d05 minor typing change 2025-01-24 23:38:44 +05:30
Vaibhav159
ef02ece662 doc string 2025-01-24 22:47:40 +05:30
Vaibhav159
d5818fad5b addressing comments 2025-01-24 22:46:54 +05:30
Rafal Skorski
9c22bd8df1 Improving README and encoding support 2025-01-24 16:44:11 +01:00
Mark Backman
dbea86baae Update function calling examples to push a TextFrame in the start_callback 2025-01-24 10:21:08 -05:00
Vaibhav159
c5faac1cf8 adding RecordingsBucketConfig 2025-01-24 15:14:20 +05:30
Vaibhav159
e106d7a215 adding line space 2025-01-24 09:12:07 +05:30
Vaibhav159
40c1a8369a updated changelog 2025-01-24 09:11:15 +05:30
Vaibhav159
6ab2404a98 adding more properties to daily room 2025-01-24 09:10:25 +05:30
Mark Backman
e61c996a2e Merge pull request #1079 from ecdeng/patch-1
Update cartesia.py to use the new model pointer `sonic`
2025-01-23 22:15:30 -05:00
Eric Deng
2c81dc1f06 Update cartesia.py to use the new model pointer sonic instead of sonic-english
We are now using `sonic` as a pointer to the latest stable release (https://docs.cartesia.ai/build-with-sonic/models#continuous-updates). sonic-english will forever point to `sonic-2024-10-19`, which is already out of date.
2025-01-23 15:47:07 -08:00
Mark Backman
53251dcb88 Add better error handling for OpenAIRealtimeBetaLLMService truncate errors 2025-01-23 14:25:08 -05:00
Mark Backman
d4e4b12109 Merge pull request #1071 from porcelaincode/patch-1
Update runner.py
2025-01-23 13:19:22 -05:00
Mark Backman
466d26a4f2 Merge pull request #1077 from Vaibhav159/vl_fix_missing_leftover_audio
adding missing audio buffer fix
2025-01-23 13:16:41 -05:00
Vaibhav159
ef511d580d adding missing audio buffer fix 2025-01-23 23:17:49 +05:30
Vaibhav159
5957ddb038 adding missing audio buffer fix 2025-01-23 23:17:18 +05:30
Vaibhav159
799c2d14b8 adding meeting token v2 func 2025-01-23 21:40:42 +05:30
Rafal Skorski
8eef21db6e Adding telnyx serializer 2025-01-23 15:39:46 +01:00
vatsal
dee1224530 Update runner.py 2025-01-23 13:21:49 +05:30
Mark Backman
fc6aa6eae8 Merge pull request #1060 from chhao01/patch-1
[bug]TypeError: object of type 'NoneType' has no len()
2025-01-22 19:14:35 -05:00
Mark Backman
ddd5bf70ab Merge pull request #1061 from Allenmylath/patch-21
Update README.md
2025-01-22 19:13:15 -05:00
allenmylath
aa59744444 Update examples/README.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-01-23 05:38:37 +05:30
chadbailey59
067ddfe505 Storytelling chatbot updates (#1066)
* initial changes for gemini storybot

* storybot updates for gemini

* more storybot updates

* interim interruptible commit

* cleanup

* cleanup

* cleanup

* cleanup
2025-01-22 15:20:21 -06:00
Mark Backman
a64df978e7 Merge pull request #1046 from pipecat-ai/mb/transcript-tts
Modified `TranscriptProcessor` to use `TTSTextFrame`s
2025-01-22 15:46:01 -05:00
Mark Backman
7167719761 Emit a transcription callback when receiving a CancelFrame, update examples accordingly 2025-01-22 14:56:29 -05:00
Mark Backman
e1430be9f9 Code review fixes 2025-01-22 14:56:29 -05:00
Mark Backman
c2fe8e7fdb Updated CHANGELOG 2025-01-22 14:56:28 -05:00
Mark Backman
31c77d8e35 Update examples for the updated TranscriptProcessor 2025-01-22 14:56:00 -05:00
Mark Backman
2a60d54830 Update the AssistantTranscriptProcessor to use TTSTextFrames in place of OpenAILLMContextFrames 2025-01-22 14:56:00 -05:00
Aleix Conchillo Flaqué
b3c99887dc Merge pull request #1068 from Canonical-AI-Inc/import-fix
Fixing missing import
2025-01-22 11:37:49 -08:00
Mark Backman
38ad75cc17 Merge pull request #1065 from pipecat-ai/mb/fix-openai_realtime-function-calling
OpenAIRealtimeBetaLLMService: Fixed an error in function calling
2025-01-22 14:37:01 -05:00
Adrian Cowham
2debac314c fixing missing import 2025-01-22 11:06:53 -08:00
Mark Backman
e0c9a1a1a2 Merge pull request #1041 from Allenmylath/patch-20
Update bot.py
2025-01-22 09:18:19 -05:00
allenmylath
4cdcca588e Update examples/moondream-chatbot/bot.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-01-22 19:40:12 +05:30
allenmylath
a90e81e2eb Update examples/moondream-chatbot/bot.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-01-22 19:38:36 +05:30
Mark Backman
0ba60c9e28 Merge pull request #975 from imsakg/main
fix(gemini): prevent non-audio modality processing
2025-01-22 09:03:18 -05:00
Mark Backman
5ca5fbd825 OpenAIRealtimeBetaLLMService: Fixed an error in function calling 2025-01-22 08:54:03 -05:00
Joe Garlick
b72504f1cb Added: Additional DTMF frames 2025-01-22 13:47:23 +00:00
allenmylath
2b52e2c109 Update README.md
Silero-tts changed to VAD, also description regarding session handling added to websocket chatbot
2025-01-22 14:42:35 +05:30
Cheng Hao
7e8fc2e7e2 [bug]TypeError: object of type 'NoneType' has no len()
Sometimes chunk.choices is None, and I got an exception like:
```
TypeError: object of type 'NoneType' has no len()
```
2025-01-22 15:31:27 +08:00
Aleix Conchillo Flaqué
0d79a9eaa6 update CHANGELOG.md 2025-01-21 18:00:10 -08:00
Aleix Conchillo Flaqué
f89b9ec23f Merge pull request #1057 from pipecat-ai/aleix/replace-resampy-soxr
improve audio resampling by switching from resampy to soxr
2025-01-21 17:52:49 -08:00
Mark Backman
20d5824e56 Merge pull request #1058 from pipecat-ai/mb/fix-trace-log 2025-01-21 20:44:50 -05:00
Aleix Conchillo Flaqué
f23baa78d8 test-requirements: add soxr and remove resampy 2025-01-21 17:40:17 -08:00
Aleix Conchillo Flaqué
cacd6ba3fa improve audio resampling by switching from resampy to soxr 2025-01-21 17:40:17 -08:00
Aleix Conchillo Flaqué
f87ecd3a51 Merge pull request #1048 from pipecat-ai/aleix/add-unittest-utils
tests: add some initial run_test() utilities
2025-01-21 17:39:06 -08:00
Mark Backman
b96a922aa8 Fix trace log line for resume_processing_frames 2025-01-21 18:15:03 -05:00
Aleix Conchillo Flaqué
401d3ff267 tests: added PipelineTask tests 2025-01-21 11:45:43 -08:00
Aleix Conchillo Flaqué
ab4221a4db task: added BaseTask 2025-01-21 11:45:43 -08:00
Aleix Conchillo Flaqué
bd6f82cf94 task: allow specifying heartbeat period 2025-01-21 11:45:43 -08:00
Aleix Conchillo Flaqué
dd21b424d6 pyproject: ignore 'audioop' deprecation warning 2025-01-21 10:27:34 -08:00
Aleix Conchillo Flaqué
76884877dd tests: add pytest-asyncio dependency 2025-01-21 10:23:19 -08:00
Aleix Conchillo Flaqué
0d6c680133 README: add unit tests badge 2025-01-21 10:14:37 -08:00
Aleix Conchillo Flaqué
a27fe4bde2 tests: move test_ai_services to test_utils_string 2025-01-21 10:06:14 -08:00
Aleix Conchillo Flaqué
177cb2ca8b tests: initial pipeline and parallelpipeline tests 2025-01-21 09:57:54 -08:00
Aleix Conchillo Flaqué
3c970a3cee tests: add more filter tests 2025-01-21 09:43:57 -08:00
Aleix Conchillo Flaqué
af02f8f1cd filters(frame_filter): allow more than one frame 2025-01-21 09:43:33 -08:00
Aleix Conchillo Flaqué
2e0fb198bf frame_processor: allow pushing more frames after EndFrame
This can be useful for testing purposes. In real practice, there shouldn't be
any frames after an EndFrame is pushed.
2025-01-21 09:42:15 -08:00
Filipi da Silva Fuchter
4f758c5a3b Merge pull request #1050 from pipecat-ai/fix_rtvi_warning_msg
Ignoring transport messages that are not intended for RTVI.
2025-01-21 13:36:50 -03:00
Rafal Skorski
89b87289e2 elevenlabs key added to env.example 2025-01-21 17:12:27 +01:00
Rafal Skorski
e0e190a1a2 Create telnyx chat bot example application 2025-01-21 17:09:55 +01:00
Filipi Fuchter
3e0836b340 Ignoring transport messages that are not intended for RTVI. 2025-01-21 10:08:14 -03:00
Aleix Conchillo Flaqué
2f23693bf3 tests: fix test_protobuf_serializer.py 2025-01-20 18:39:59 -08:00
Aleix Conchillo Flaqué
b7dd9748cf serializers: fix special fix initialization 2025-01-20 18:39:41 -08:00
Aleix Conchillo Flaqué
d4d9c3b7ae tests: fix test_aggregators.py 2025-01-20 18:16:14 -08:00
Aleix Conchillo Flaqué
090bc81ec5 tests: add some initial run_test() utilities 2025-01-20 17:41:21 -08:00
Filipi Fuchter
9b61633aa0 Introduce support for extracting and processing grounding metadata from Google LLM responses. 2025-01-20 11:28:12 -03:00
Mark Backman
e3d53d3d9a Merge pull request #1044 from pipecat-ai/mb/elevenlabs-http-fix-voice-settings
Fixed a type error when using voice_settings in ElevenLabsHttpTTSService
2025-01-20 08:11:38 -05:00
Mark Backman
262d3a19c9 Fixed a type error when using voice_settings in ElevenLabsHttpTTSService 2025-01-20 07:57:02 -05:00
allenmylath
491feb691c Update bot.py
Quiet and talking frames are determined by BotStartedSpeakingFrame and BotStoppedSpeakingFrame, not by TTS frames
2025-01-20 14:00:17 +05:30
Aleix Conchillo Flaqué
e4f83b237e update CHANGELOG (remove 07d-interruptible-elevenlabs-http.py) 2025-01-19 11:36:18 -08:00
Aleix Conchillo Flaqué
a169e0cde9 Merge pull request #1035 from pipecat-ai/aleix/prepare-0.0.53
update CHANGELOG for 0.0.53
2025-01-18 14:50:35 -08:00
Aleix Conchillo Flaqué
c6d643d4ec update CHANGELOG for 0.0.53 2025-01-18 14:48:48 -08:00
Aleix Conchillo Flaqué
2abbd4bb27 Merge pull request #1039 from pipecat-ai/aleix/fish-audio-websocket-service
services(fish): FishAudioTTSService to use WebsocketService
2025-01-18 14:48:20 -08:00
Aleix Conchillo Flaqué
e0011a3996 services(fish): FishAudioTTSService to use WebsocketService 2025-01-18 14:29:45 -08:00
Aleix Conchillo Flaqué
ea44c59ddd Merge pull request #1037 from Vaibhav159/fixing_unused_11labs_package
removing unused 11labs package imports
2025-01-17 22:08:04 -08:00
Vaibhav159
a9c7dbbc05 removing unused code 2025-01-18 10:58:07 +05:30
Vaibhav159
8a87e92b2b adding missing 11labs package 2025-01-18 10:48:57 +05:30
Mark Backman
982f2becc6 Merge pull request #1002 from pipecat-ai/mb/add-on-error-callback
Register the on_error handler
2025-01-17 21:58:59 -05:00
Mark Backman
e049ae470d Register the on_error handler 2025-01-17 21:49:42 -05:00
Mark Backman
e159f2dce1 Merge pull request #1024 from pipecat-ai/mb/elevenlabs-http
Add ElevenLabsHttpTTSService
2025-01-17 21:30:31 -05:00
Aleix Conchillo Flaqué
e9162ae467 Merge pull request #1004 from Fluentsai/feature/dtmf_input
Twilio serializer reading dtmf websocket messages
2025-01-17 18:14:46 -08:00
Aleix Conchillo Flaqué
bb65512ff4 Merge pull request #1034 from pipecat-ai/aleix/ulaw-resample-update
ulaw resample update
2025-01-17 17:47:18 -08:00
Mark Backman
b81323d676 Code review fixes + docstrings 2025-01-17 20:12:43 -05:00
Aleix Conchillo Flaqué
65fa77dfa5 audio: use resample_audio to resample ulaw bytes 2025-01-17 15:24:41 -08:00
Aleix Conchillo Flaqué
9ddd9ae27c Merge pull request #1011 from Vaibhav159/vl_deepgram_metrics_without_vad
adding metric generation without deepgram VAD
2025-01-17 14:47:19 -08:00
Aleix Conchillo Flaqué
12fc6e17ef Merge pull request #1033 from pipecat-ai/aleix/observers-performance
task: add TaskObserver and avoid pipeline blocking
2025-01-17 14:43:26 -08:00
Aleix Conchillo Flaqué
3e4020cdba task: add TaskObserver and avoid pipeline blocking
Observers now process frames in separate tasks. This avoids blocking the
pipeline while the observer is processing the frame.
2025-01-17 11:15:52 -08:00
Aleix Conchillo Flaqué
4f883ee31f Merge pull request #1023 from pipecat-ai/aleix/introduce-heartbeat-frames
introduce heartbeat frames
2025-01-17 10:31:07 -08:00
Mark Backman
3ff360f042 Merge pull request #1032 from pipecat-ai/mb/user-idle-fixes
Start UserIdleProcessor on speaking frame, fix bug not pushing EndFrame
2025-01-17 13:18:09 -05:00
Aleix Conchillo Flaqué
45cbad5b3e task: add HEARTBEAT_MONITOR_SECONDS 2025-01-17 10:11:28 -08:00
Aleix Conchillo Flaqué
477d0d154b frame_processor: make sure clock is initialized 2025-01-17 10:05:23 -08:00
Aleix Conchillo Flaqué
4b3c776f58 task: don't use push queue to send a heartbeat
This is because we might be waiting for the EndFrame. Currently, if we push an
EndFrame to the task, the task will block until the EndFrame traverses all the
pipeline.
2025-01-17 10:04:24 -08:00
Aleix Conchillo Flaqué
da0c4cfd99 task: increase heartbeat monitoring to 5 seconds 2025-01-17 10:04:05 -08:00
Aleix Conchillo Flaqué
f22a00570d task: start heartbeats task when push task starts 2025-01-17 10:03:13 -08:00
Mark Backman
85f4663a41 Start UserIdleProcessor on speaking frame, fix bug not pushing EndFrame 2025-01-17 12:54:17 -05:00
Aleix Conchillo Flaqué
915e3bb3c7 Merge pull request #1029 from Vaibhav159/vl_fixing_idle_frame_processor_logic
fixing IdleFrameProcessor and UserIdleProcessor init logic
2025-01-17 06:48:13 -08:00
Vaibhav159
80779c48d6 sort fix 2025-01-17 20:07:25 +05:30
Vaibhav159
c444557965 fixing IdleFrameProcessor and UserIdleProcessor init logic 2025-01-17 19:50:53 +05:30
Mark Backman
d51893f61c Refactor for aiohttp, correct use of settings 2025-01-16 23:49:53 -05:00
Mark Backman
740d2743df Add TTFB metrics 2025-01-16 23:05:53 -05:00
Mark Backman
0dd22fb879 Merge pull request #1022 from pipecat-ai/mb/fix-abstractmethod
Remove @abstractmethod from set_model and set_model in TTSService class
2025-01-16 22:59:26 -05:00
Mark Backman
225b65c3d2 Add ElevenLabsHttpTTSService 2025-01-16 22:46:32 -05:00
Aleix Conchillo Flaqué
2503f76107 examples: add 31-heartbeats.py 2025-01-16 19:31:13 -08:00
Aleix Conchillo Flaqué
ff8aa68942 introduce heartbeat frames 2025-01-16 19:31:13 -08:00
Maxim Makatchev
c5edbf4b75 Made InputDTMFFrame a DataFrame and moved up to data frames 2025-01-17 12:27:04 +09:00
Aleix Conchillo Flaqué
799777774b Merge pull request #1018 from pipecat-ai/aleix/streamline-thread-pool-executors
transports: streamline max_workers for ThreadPoolExecutors
2025-01-16 19:05:41 -08:00
Mark Backman
fdef8a97e2 Remove @abstractmethod from set_model and set_model in TTSService class 2025-01-16 21:36:51 -05:00
Mark Backman
0163247410 Merge pull request #1021 from pipecat-ai/mb/improve-30
Add a second observer to the 30-observer.py example
2025-01-16 21:19:35 -05:00
James Hush
221e044046 demo: Update translator bot example (#1005)
* docs: Update translator bot example

Updates the translator bot to do the following:

- Allow you to specify the in and out languages
- Uses TranscriptionProcessor to handle transcriptions

* Simplify the example, improve performance

---------

Co-authored-by: Mark Backman <mark@daily.co>
2025-01-17 10:08:15 +08:00
Mark Backman
532fd31fd7 Add a second observer to the 30-observer.py example 2025-01-16 19:46:18 -05:00
Mark Backman
3e178fd46f Merge pull request #1020 from pipecat-ai/mb/observer-foundational
Add foundational example 30 to show how to use an Observer
2025-01-16 19:28:26 -05:00
Mark Backman
07cb8b7a89 Extend the example to include BotStartedSpeakingFrame and BotStoppedSpeakingFrame 2025-01-16 19:24:01 -05:00
Mark Backman
e805738d4c Merge pull request #1009 from pipecat-ai/mb/tts-ignore-interim-transcripts
TTSService should only process LLMTextFrames
2025-01-16 17:09:24 -05:00
Mark Backman
119bc7e35f Update check to exclude transcription frames 2025-01-16 16:43:46 -05:00
Mark Backman
b9b02845a3 Add foundational example 30 to show how to use an Observer 2025-01-16 16:37:32 -05:00
Aleix Conchillo Flaqué
3714f12edc Merge pull request #1019 from Canonical-AI-Inc/canonical-transcripts
Add transcript to Canonical Metrics Service
2025-01-16 13:36:55 -08:00
Aleix Conchillo Flaqué
d2b8171197 transports: streamline max_workers for ThreadPoolExecutors 2025-01-16 13:34:04 -08:00
Filipi Fuchter
c4c15eff39 Sending a silence frame to prevent the audio from clipping. 2025-01-16 18:30:19 -03:00
Adrian Cowham
d0b48c95bb updated the example to use stereo audio and pass in the context. also updated the service to send the transcripts if they're available 2025-01-16 13:12:38 -08:00
Aleix Conchillo Flaqué
73ed0c1ad7 Merge pull request #1017 from pipecat-ai/aleix/additional-trace-logging
additional trace logging
2025-01-16 12:38:47 -08:00
Vanessa Pyne
c211580fec Merge pull request #1016 from pipecat-ai/vp-1007-nonetype
services(gemini_multimodal_live): set content to [] if not present in messages
2025-01-16 14:14:50 -06:00
Aleix Conchillo Flaqué
359b55a85e additional trace logging 2025-01-16 11:19:42 -08:00
Filipi Fuchter
7efd00e0f7 Asking the bot to send audio only when the audio element is already in the playing state. 2025-01-16 16:00:56 -03:00
kompfner
8b602a3f62 Merge pull request #1010 from pipecat-ai/ios-simplechatbot-assorted-improvements
iOS SimpleChatbot assorted improvements
2025-01-16 13:59:45 -05:00
kompfner
485c231f69 Merge pull request #1012 from pipecat-ai/simplechatbot-readme-local-pipecat
Add to the SimpleChatbot server README a step for pointing to the loc…
2025-01-16 13:46:19 -05:00
vipyne
8ba3b150eb services(gemini_multimodal_live): set content to [] if not present in messages
... which it will be if the message is a tool call
2025-01-16 11:59:02 -06:00
Paul Kompfner
b5f72b4378 Add to the SimpleChatbot server README a step for pointing to the local version of pipecat 2025-01-16 11:59:44 -05:00
Vaibhav159
85e7d62f94 fixing log text 2025-01-16 21:36:51 +05:30
Vaibhav159
923d33eeff fixing ruff 2025-01-16 21:32:48 +05:30
Vaibhav159
7ee6e7193d adding metric generation without deepgram VAD 2025-01-16 21:23:56 +05:30
Paul Kompfner
156fffe6fc In iOS SimpleChatbot demo, add clarifying note to Audio Settings section header explaining that "(No selection = system default)".
Ideally we could add a row showing that the system default is selected, but this is OK as a short-term fix. Also, the presence of that row might suggest that "system default" is selectable, but it's not: this is currently a limitation in the Pipecat Client.
2025-01-16 10:32:55 -05:00
Paul Kompfner
c9834e2712 In iOS SimpleChatbot demo, remove unused LLMHelperDelegate protocol conformance 2025-01-16 10:31:17 -05:00
Paul Kompfner
1e7e307f69 In iOS SimpleChatbot demo, call release() when disconnecting the voice client, since we're not using it after disconnecting 2025-01-16 10:30:06 -05:00
Mark Backman
67e47a388d TTSService should only process LLMTextFrames 2025-01-16 10:03:24 -05:00
Filipi Fuchter
119c0da299 Configuring a proxy so we can test from mobile 2025-01-16 11:02:53 -03:00
Filipi Fuchter
ea1323723d Handling the signalling to play the audio 2025-01-16 10:42:22 -03:00
Filipi Fuchter
d2efe27350 Improving the logs and updating status 2025-01-16 10:36:45 -03:00
Filipi Fuchter
5dc7d2a378 Creating the bot when pressing to connect. 2025-01-16 10:28:39 -03:00
Filipi Fuchter
88c540f9bc Starting to create the example signalling through app message. 2025-01-16 10:14:38 -03:00
Maxim Makatchev
dcf317f2fa Twilio serializer reading dtmf websocket messages and generating InputDTMFFrame containing the corresponding value of KeypadEntry 2025-01-16 17:43:12 +09:00
Aleix Conchillo Flaqué
b8ffd7b16b Merge pull request #996 from pipecat-ai/aleix/introduce-observers
introduce pipeline frame observers
2025-01-15 18:05:33 -08:00
Aleix Conchillo Flaqué
08f1dda94e observers: add a timestamp to on_push_frame() 2025-01-15 17:45:00 -08:00
Aleix Conchillo Flaqué
45039e7cde update CHANGELOG.md 2025-01-15 17:45:00 -08:00
Aleix Conchillo Flaqué
e50c76d075 examples(simple-chatbot): use RTVIObserver for server-client messages 2025-01-15 17:45:00 -08:00
Aleix Conchillo Flaqué
dd9f9179cc rtvi(RTVIObserver): use observers for RTVI server->client messages 2025-01-15 17:45:00 -08:00
Aleix Conchillo Flaqué
c8da531402 pipeline(task): add support for pipeline frame observers 2025-01-15 17:43:59 -08:00
Aleix Conchillo Flaqué
25bcaf5c7c observers: introduce pipeline observers 2025-01-15 17:43:59 -08:00
Aleix Conchillo Flaqué
2d0f3341c3 frames: add LLMTextFrame and TTSTextFrame
This is to distinguish what type of service has generated the TextFrames.
2025-01-15 17:43:59 -08:00
Aleix Conchillo Flaqué
7626d7b04b Merge pull request #999 from pipecat-ai/aleix/add-pre-commit-hooks
add pre-commit hooks
2025-01-15 17:39:34 -08:00
Aleix Conchillo Flaqué
f78520f7d0 add pre-commit hooks
Fixes #945
2025-01-15 13:44:21 -08:00
Aleix Conchillo Flaqué
bb4766455d Merge pull request #997 from pipecat-ai/aleix/update-dependencies-01-15-25
update dependencies (go back to numpy1)
2025-01-15 13:35:46 -08:00
Aleix Conchillo Flaqué
9dacbbbbf4 fix ruff formatting 2025-01-15 13:02:13 -08:00
Aleix Conchillo Flaqué
4de192fbb0 update dependencies (go back to numpy1)
Fixes #911, #913
2025-01-15 12:04:28 -08:00
kompfner
80b6c28431 Merge pull request #992 from pipecat-ai/live-updates-to-selected-and-available-mics
In the iOS SimpleChatbot demo, wire up live updates to the selected m…
2025-01-15 15:00:14 -05:00
Mark Backman
f471744bca Merge pull request #995 from pipecat-ai/vp-riva-bump
deps(riva): bump to 2.18.0
2025-01-15 14:35:39 -05:00
Mark Backman
d5df4b064b Merge pull request #987 from pipecat-ai/mb/deepseek-typo
Fix error log in DeepSeekLLMService and CerebrasLLMService
2025-01-15 14:31:34 -05:00
Mark Backman
06a0e29920 Merge pull request #991 from pipecat-ai/mb/update-web-simple-chatbot
Update simple-chatbot example to use the latest client SDKs
2025-01-15 13:36:03 -05:00
Aleix Conchillo Flaqué
64eb8e7262 Merge pull request #994 from Vaibhav159/vl_deepgram_with_vad
finalize on DeepgramSTTService on VAD
2025-01-15 10:28:11 -08:00
Filipi da Silva Fuchter
d8386c12dc Merge pull request #990 from pipecat-ai/bumping_ios_example
Using PipecatClient version 0.3.2
2025-01-15 14:29:01 -03:00
vipyne
50e798bcd9 deps(riva): bump to 2.18.0 2025-01-15 10:24:57 -06:00
Vaibhav159
d1ac7751da finalize on DeepgramSTTService 2025-01-15 20:43:23 +05:30
Paul Kompfner
110ce27c91 In the iOS SimpleChatbot demo, wire up live updates to the selected mic and available mics list. This is beneficial for a few reasons:
- Live updates are nice! We can now more easily see what's going on when we connect or disconnect a mic.
- Resolves an issue where the initial selected mic was not shown.
- Lets us see when the Pipecat client automatically switches to a new mic, like when one is connected.
2025-01-15 09:56:27 -05:00
Mark Backman
8b657158ca Update React simple-chatbot client to use latest client SDKs 2025-01-15 09:50:43 -05:00
Mark Backman
cce14fca97 Update JS simple-chatbot client to use latest client SDKs 2025-01-15 09:47:20 -05:00
Filipi Fuchter
7c051516d8 Using PipecatClient version 0.3.2 2025-01-15 09:57:57 -03:00
Mark Backman
5f402ad741 Merge pull request #988 from pipecat-ai/mb/readme-openrouter
Update README.md
2025-01-14 18:38:35 -05:00
Mark Backman
a80b186cea Update README.md
Add OpenRouter to the README
2025-01-14 18:08:14 -05:00
Mark Backman
c65aaf3b2e Merge pull request #967 from sahilsuman933/openrouter-integration
feat(services): Add OpenRouter LLM Service Integration
2025-01-14 18:06:13 -05:00
Mark Backman
e815d7776f Fix error log in DeepSeekLLMService and CerebrasLLMService 2025-01-14 18:03:29 -05:00
sahil suman
11fc08ef24 fix changelog
Signed-off-by: sahil suman <sahilsuman933@gmail.com>
2025-01-15 02:57:09 +05:30
sahil suman
6f3b0fdf73 fix changelog
Signed-off-by: sahil suman <sahilsuman933@gmail.com>
2025-01-15 02:56:16 +05:30
sahil suman
885bc32827 added changes in changelog.
Signed-off-by: sahil suman <sahilsuman933@gmail.com>
2025-01-15 02:53:04 +05:30
sahil suman
7339cc7197 Merge remote-tracking branch 'origin/main' into openrouter-integration 2025-01-15 02:52:19 +05:30
sahil suman
62e9e6bc5a changed the file name.
Signed-off-by: sahil suman <sahilsuman933@gmail.com>
2025-01-15 02:21:58 +05:30
Sahil Suman
329da50338 Update src/pipecat/services/openrouter.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-01-15 02:20:22 +05:30
sahil suman
4d307d26d8 made the required changes.
Signed-off-by: sahil suman <sahilsuman933@gmail.com>
2025-01-15 02:19:05 +05:30
Mark Backman
a74b9354ec Merge pull request #962 from pipecat-ai/mb/improve-tts-reconnection-logic
Improve websocket based TTS service reconnection logic
2025-01-14 14:48:00 -05:00
sahil suman
11381a536f added example for function calling and made the required changes.
Signed-off-by: sahil suman <sahilsuman933@gmail.com>
2025-01-15 01:00:33 +05:30
Mark Backman
b53bc8a879 _calculate_wait_times as private, add and use WebsocketServiceException 2025-01-14 13:20:13 -05:00
Mark Backman
e3d8910814 Update CHANGELOG 2025-01-14 13:12:40 -05:00
Mark Backman
e60a59434f Refactor LMNTTTSService to make a websocket connection directly, then use the WebsocketService base class 2025-01-14 13:09:58 -05:00
Mark Backman
5e5de618f3 Update PlayHTTTSService to use the WebsocketService base class 2025-01-14 13:09:58 -05:00
Mark Backman
8af92f7923 Update ElevenLabsTTSService to use the WebsocketService base class 2025-01-14 13:09:58 -05:00
Mark Backman
f39e17857e Add a WebsocketService base class to retry, ensure that retries reset after a successful connection, update Cartesia to use the new WebsocketService 2025-01-14 13:09:58 -05:00
Aleix Conchillo Flaqué
5b632de04a Merge pull request #982 from pipecat-ai/aleix/pipelinetask-cleanup-sink
pipeline(task): cleanup Sink processor
2025-01-14 09:14:03 -08:00
Mark Backman
6bcc196489 Merge pull request #969 from pipecat-ai/mb/deepseek
Add support for DeepSeek LLM
2025-01-14 09:40:06 -05:00
Mark Backman
66375e9dff Update dot-env.template API keys 2025-01-14 09:34:34 -05:00
Mark Backman
bc839492b6 Add support for DeepSeek LLM 2025-01-14 09:34:33 -05:00
Filipi da Silva Fuchter
4854645637 Merge pull request #960 from pipecat-ai/example_gemini_with_goolge_search
Example with Gemini using google search to retrieve news.
2025-01-14 10:07:15 -03:00
Mark Backman
98e80b7d4a Merge pull request #970 from pipecat-ai/mb/user-controlled-run-llm
Add an override_run_llm option to optionally defer function call completion
2025-01-13 18:48:00 -05:00
Mark Backman
8c0ecb89de Refactor for new on_context_updated callback and new frame properties 2025-01-13 17:20:41 -05:00
Aleix Conchillo Flaqué
4c8fcb2cfc pipeline(task): cleanup Sink processor
Fixes #953
2025-01-13 13:29:44 -08:00
Aleix Conchillo Flaqué
92313d6ce7 Merge pull request #972 from pipecat-ai/aleix/simple-chatbot-android-workflow-update
github: only run android simple-chatbot workflow if android example modified
2025-01-13 13:26:12 -08:00
Mark Backman
1ca6ecc46e Update CHANGELOG 2025-01-13 09:49:09 -05:00
Mark Backman
f1947d7d38 Update Anthropic and Gemini to allow overriding run_llm 2025-01-13 09:48:43 -05:00
Mark Backman
0852570212 Update Grok for function call override 2025-01-13 09:48:43 -05:00
Mark Backman
874b8bb136 Allow for an override of running a completion after a function call completes, OpenAI 2025-01-13 09:48:43 -05:00
Mark Backman
da1878537b Merge pull request #974 from pipecat-ai/mb/26d-example
Align 26d example with foundation norms
2025-01-12 19:44:31 -05:00
Mark Backman
f406d93b0f Align 26d example with foundation norms 2025-01-12 19:19:16 -05:00
Aleix Conchillo Flaqué
3cd2b90177 Merge pull request #971 from pipecat-ai/aleix/update-copyright-keep-original-year
update copyright keeping original year (2024)
2025-01-12 11:37:15 -08:00
Aleix Conchillo Flaqué
c4f0c7bcfd github: only run android simple-chatbot workflow if android example modified 2025-01-12 11:35:34 -08:00
Aleix Conchillo Flaqué
95e69597f3 update copyright keeping original year (2024) 2025-01-12 11:34:00 -08:00
Aleix Conchillo Flaqué
710baa5e17 Merge pull request #973 from pipecat-ai/aleix/simple-chatbot-clients
examples/simple-chatbot: move clients to client directory
2025-01-12 11:28:21 -08:00
Mert Sefa AKGUN
14e5419913 fix(gemini): prevent non-audio modality processing
Add an early return in the _handle_transcribe_model_audio method to
prevent unnecessary processing when the modalities setting is not set
to audio. This change ensures that audio transcription only occurs
when appropriate.
2025-01-12 22:17:10 +03:00
Mark Backman
8c953bac41 Merge pull request #966 from imsakg/main
fix(services): handle TranscriptionFrame separately in TTSService
2025-01-12 11:33:38 -05:00
Mark Backman
4c0861ce39 Some additional links and README changes 2025-01-12 09:27:23 -05:00
Mark Backman
12b1e1db9d Merge pull request #965 from pipecat-ai/mb/aws-add-session-token
Add optional aws_session_token for PollyTTSService
2025-01-12 09:13:03 -05:00
Mark Backman
53bfdfd83f Merge pull request #963 from pipecat-ai/mb/cleanup-examples
Update examples to align with latest best practices
2025-01-12 09:12:34 -05:00
Mark Backman
2a5593afea Merge pull request #968 from pipecat-ai/mb/readme-websocket
Update README.md
2025-01-12 09:12:19 -05:00
Aleix Conchillo Flaqué
a04a920e54 examples/simple-chatbot: move clients to client directory 2025-01-11 19:16:05 -08:00
Aleix Conchillo Flaqué
2ce6d92455 Merge pull request #959 from KevGTL/fix-livekit-transport
fix: push input audio frame only via push_audio_frame()
2025-01-11 19:03:35 -08:00
Mark Backman
1ecd5da219 Update README.md
Add websocket docs links to README.
2025-01-11 08:37:17 -05:00
sahil suman
e04da334d7 add support for openrouter.
Signed-off-by: sahil suman <sahilsuman933@gmail.com>
2025-01-11 17:50:58 +05:30
Mert Sefa AKGUN
7ec351813c style(ai_services): fix import order with ruff 2025-01-11 13:04:26 +03:00
Mert Sefa AKGUN
df6c2fc403 fix(services): handle TranscriptionFrame separately in TTSService
Exclude TranscriptionFrame from text frame processing in TTSService by updating the type check condition. This resolves unintended processing behavior when handling different frame types.
2025-01-11 13:00:38 +03:00
Mark Backman
71e107725c Add optional aws_session_token for PollyTTSService 2025-01-10 19:33:47 -05:00
Mark Backman
4d0c11fcab Update examples to align with latest best practices 2025-01-10 15:07:06 -05:00
Mark Backman
a8ae79831e Merge pull request #921 from pipecat-ai/mb/playht-http
PlayHTHttpTTSService fixes
2025-01-10 13:26:45 -05:00
Mark Backman
86516d2415 PlayHTHttpTTSService fixes 2025-01-10 13:21:27 -05:00
Vanessa Pyne
5cd9dab14b Merge pull request #949 from imsakg/main
fix(examples): correct TTS service import and setup
2025-01-10 10:58:50 -06:00
Kwindla Hultman Kramer
a3e2e06975 Merge pull request #961 from pipecat-ai/khk/tiny-chatbot-readme-fix
fixed 404 in SimpleChatbot iOS example README
2025-01-10 08:45:05 -08:00
Kwindla Hultman Kramer
e7107b99c5 fixed 404 in SimpleChatbot iOS example README 2025-01-10 08:37:13 -08:00
Filipi Fuchter
aa1b8879ee Fixing ruff format 2025-01-10 13:21:51 -03:00
Mark Backman
6802459165 Merge pull request #956 from pipecat-ai/mb/tavus
Update the Tavus example and comment about using the PERSONA_ID
2025-01-10 11:18:05 -05:00
Filipi Fuchter
6719d1fddc Example with Gemini using google search to retrieve news. 2025-01-10 13:13:59 -03:00
kompfner
a798bf18f2 Merge pull request #955 from pipecat-ai/ios-simple-chatbot-mainactor-fixes
iOS SimpleChatbot @MainActor fixes
2025-01-10 09:37:02 -05:00
Kevin Oury
f9d0cca60f fix: push input audio frame only via push_audio_frame() 2025-01-10 15:02:38 +01:00
Mark Backman
cb22de0d13 Update the Tavus example and comment about using the PERSONA_ID 2025-01-10 08:01:00 -05:00
marcus-daily
7d161cc53b Setting target SDK to 35 2025-01-10 09:50:37 +00:00
marcus-daily
255abf46ef Updating Gradle and AGP 2025-01-10 09:50:37 +00:00
marcus-daily
27579bcb70 Fixing imports 2025-01-10 09:50:37 +00:00
marcus-daily
1295b64879 Updating library dependencies 2025-01-10 09:50:37 +00:00
marcus-daily
ca57670f65 Removing unnecessary drawables 2025-01-10 09:50:37 +00:00
marcus-daily
06d0a231b9 Android demo app for simple-chatbot example 2025-01-10 09:50:37 +00:00
Mert Sefa AKGUN
67af4e619b style(examples): fix ruff formatting in Gemini text example
Refactor `CartesiaTTSService` instantiation to comply with line
length requirements from the ruff linter.
2025-01-10 12:32:53 +03:00
Mert Sefa AKGUN
21c274944e Update examples/foundational/26d-gemini-multimodal-live-text.py
Co-authored-by: Vanessa Pyne <vipyne@gmail.com>
2025-01-10 12:28:13 +03:00
Paul Kompfner
3239249feb In the iOS SimpleChatbot, fix @MainActor-related warnings (which would be errors in Swift 6). The delegate methods aren't contractually guaranteed to run on the main thread, so we can't mark them as @MainActor. 2025-01-09 17:35:44 -05:00
Paul Kompfner
216979c377 Bump iOS SimpleChatbot's pipecat-client-ios-daily dependency to version 0.3.1 2025-01-09 16:22:26 -05:00
Filipi da Silva Fuchter
b9db53d3cd Merge pull request #952 from pipecat-ai/fixing_gemini_function_calling
Fixing GeminiMultimodalLiveLLMService function calling to work with pipecat-flows
2025-01-09 17:50:25 -03:00
Filipi Fuchter
58bfcc8370 Fixing GeminiMultimodalLiveLLMService function calling when using with pipecat-flows. 2025-01-09 12:22:37 -03:00
Mert Sefa AKGUN
6664c492ac feat(gemini): enable audio transcription in live text example
Add options to transcribe both user and model audio during the GeminiMultimodalLiveLLMService setup in the 26d-gemini-multimodal-live-text.py example.
2025-01-09 15:38:33 +03:00
Mert Sefa AKGUN
7634058f97 fix(examples): correct TTS service import and setup
- Update import to use CartesiaTTSService instead of CartesiaMultiLingualTTSService.
- Adjust GeminiMultimodalLiveLLMService setup to use set_model_modalities with TEXT modality.
2025-01-09 02:19:08 +03:00
Mark Backman
39c6446bdc Merge pull request #947 from pipecat-ai/mb/add-rime-set-voices
Add setters for model and voice to RimeHttpTTSService
2025-01-08 14:25:24 -05:00
Filipi da Silva Fuchter
2df7dfcc91 Merge pull request #943 from pipecat-ai/simple_chat_bot_ios
SimpleChatbot iOS app.
2025-01-08 16:17:39 -03:00
Mark Backman
c23c9e046c Add setters for model and voice to RimeHttpTTSService 2025-01-08 14:17:32 -05:00
Mark Backman
9dae753e8c Merge pull request #926 from imsakg/main
feat(gemini): add text handling to GeminiMultimodalLive
2025-01-08 13:42:17 -05:00
Mert Sefa AKGUN
40e9ee6d63 fix(examples): correct import order in Gemini example
- Move `CartesiaMultiLingualTTSService` import to maintain proper order.
- Reorganize `enum` import to adhere to styling standards.
2025-01-08 21:14:29 +03:00
Mert Sefa AKGUN
a342fe732e docs: update CHANGELOG with Gemini modalities and examples 2025-01-08 19:34:42 +03:00
Mert Sefa AKGUN
a729834482 refactor(gemini): reposition WebSocket connection code
Move WebSocket connection setup earlier in the function for better
organization and to prepare for subsequent configuration steps.
2025-01-08 19:29:36 +03:00
Mert Sefa AKGUN
94a6f1086e feat(gemini): change default modality to AUDIO
Modify the default modality in the `InputParams` class from TEXT to AUDIO
to better align with the intended use case for GeminiMultimodalLive
service.
2025-01-08 19:29:36 +03:00
Mert Sefa AKGUN
b42d3a8257 feat(gemini): add modality configuration for GeminiMultimodalLive
- Introduce `GeminiMultimodalModalities` enum for modality options.
- Add modality field to `InputParams`, defaulting to text.
- Simplify modality setup with `set_model_modalities` method.
- Refactor WebSocket configuration to support dynamic response modalities.
2025-01-08 19:29:36 +03:00
Mert Sefa AKGUN
12ae980abe feat(gemini): handle full text response in GeminiMultimodalLive
- Add a buffer to store bot text responses.
- Push a `LLMFullResponseStartFrame` when text begins.
- Clear the text buffer and send `LLMFullResponseEndFrame` after processing.
2025-01-08 19:29:36 +03:00
Mert Sefa AKGUN
cdb909958c feat(examples): add Gemini multimodal live text example
Introduce a new example `26d-gemini-multimodal-live-text.py` to
demonstrate the use of GeminiMultimodalLiveLLMService with text-only
responses. This example sets up a pipeline for audio input via DailyTransport,
processing with Gemini, and output via Cartesia TTS.
2025-01-08 19:29:35 +03:00
Mert Sefa AKGUN
c72c3025f6 feat(gemini): add configuration methods for response modalities
- Introduce `set_model_only_audio` and `set_model_only_text` methods
  to toggle between audio-only and text-only response modes in
  `GeminiMultimodalLiveLLMService`.
- Refactor configuration setup to a class attribute for improved
  reusability and maintenance.
- Remove redundant configuration instantiation in the WebSocket
  connection setup process.
2025-01-08 19:29:35 +03:00
Mert Sefa AKGUN
5cbd719780 feat(gemini): add text handling to GeminiMultimodalLive
- Introduce text attribute in Part class for handling string data.
- Incorporate text processing in GeminiMultimodalLiveLLMService to push TextFrame if text is present.
2025-01-08 19:29:35 +03:00
Filipi Fuchter
23d6290672 Removing not used class. 2025-01-08 12:05:04 -03:00
Filipi Fuchter
d4e7e11981 SimpleChatbot iOS app. 2025-01-08 12:00:11 -03:00
Mark Backman
8057fe3fcf Merge pull request #742 from Vaibhav159/vl_feature_websocket_fastapi_timeout
adding session_timeout param
2025-01-08 09:05:41 -05:00
Vaibhav159
3b446234a7 fix hyperlink 2025-01-08 10:54:27 +05:30
Vaibhav159
768487ffb3 final changelog 2025-01-08 10:53:32 +05:30
Vaibhav159
2da5620d10 adding changelog 2025-01-08 10:50:09 +05:30
Vaibhav159
af90d65b3b adding session timeout example in websocket-server example 2025-01-08 10:43:10 +05:30
Vaibhav159
c8569a7b67 Merge remote-tracking branch 'upstream/main' into vl_feature_websocket_fastapi_timeout 2025-01-08 10:21:36 +05:30
Vaibhav159
0ecd98c873 Merge branch 'main' into vl_feature_websocket_fastapi_timeout 2025-01-08 10:20:55 +05:30
Mark Backman
6f863ba2c6 Merge pull request #938 from jcbjoe/jg/optional-authentication-polly
Changed Polly authentication params to be optional
2025-01-07 15:37:23 -05:00
Mark Backman
602ca5ebe6 Merge pull request #939 from Vaibhav159/vl_adding_daily_room_properties
adding more daily room params
2025-01-07 14:33:59 -05:00
Vaibhav159
787ade41f3 adding missing doc string 2025-01-08 00:58:01 +05:30
Joe Garlick
bb767831d5 Added: Changelog entry 2025-01-07 19:05:02 +00:00
Mark Backman
bc25a771dc Merge pull request #935 from pipecat-ai/hush/modalUpdate
docs: update dependencies for modal demo
2025-01-07 13:57:46 -05:00
Vaibhav159
f37626f81d adding more daily room params 2025-01-07 21:38:05 +05:30
Mark Backman
9d54578e65 Merge pull request #934 from pipecat-ai/mb/bump-open-ai-version
Bump openai version to 1.59.0 for realtime and model updates
2025-01-07 08:29:45 -05:00
Joe Garlick
79afe7ec2a Changed: Polly authentication information to be optional 2025-01-07 11:43:57 +00:00
James Hush
2c1fd3c3cc docs: update dependencies for modal demo 2025-01-07 15:45:55 +08:00
Mark Backman
b0dd8e03a6 Bump openai version to 1.59.0 for realtime and model updates 2025-01-06 17:05:22 -05:00
Mark Backman
ee20e48ef8 Merge pull request #931 from pipecat-ai/mb/fix-openai-realtime-
Fix truncation timing of OpenAIRealtimeBetaLLMService
2025-01-06 16:25:09 -05:00
Mark Backman
12b5c5a646 Fix truncation timing of OpenAIRealtimeBetaLLMService 2025-01-06 15:37:58 -05:00
Mark Backman
7a021cc82d Merge pull request #929 from pipecat-ai/mb/add-google-journey-support
Added support for Google Journey TTS voices
2025-01-06 15:13:00 -05:00
Mark Backman
3e1ec4a8ee Added support for Google Journey TTS voices 2025-01-06 14:54:34 -05:00
Mark Backman
a1377b7f1a Merge pull request #924 from xtreme-sameer-vohra/patch-1
Update frames.py
2025-01-06 14:13:10 -05:00
Mark Backman
d6335886e2 Merge pull request #848 from Vaibhav159/vl_add_audio_and_chat_livekit_example
adding example for livekit audio and chat version
2025-01-06 13:27:38 -05:00
Vaibhav159
b3b7a5f023 adding 2025 license 2025-01-06 22:10:46 +05:30
Vaibhav159
5138017b57 ruff changes 2025-01-06 22:07:59 +05:30
Vaibhav159
87670067d7 adding changelog 2025-01-06 22:03:11 +05:30
Vaibhav159
656cd2859e Merge branch 'main' into vl_add_audio_and_chat_livekit_example 2025-01-06 21:57:43 +05:30
Mark Backman
15b2cc210c Merge pull request #927 from pipecat-ai/mb/update-copyright
Update copyright to 2025
2025-01-06 10:33:04 -05:00
Mark Backman
4667624b60 Update copyright to 2025 2025-01-06 10:19:37 -05:00
Sameer Vohra
d07ba80572 Update frames.py
fix minor typo in docs
2025-01-05 22:57:54 -05:00
Vaibhav159
62fc95300b adding livekit audio and chat version 2024-12-13 01:09:47 +05:30
Vaibhav159
6e8e7fa19a adding session_timeout in fastapi 2024-11-21 14:56:42 +05:30
Vaibhav159
7dfa886669 moving logic to WebsocketServerInputTransport 2024-11-21 14:45:24 +05:30
Vaibhav159
da254c5143 correcting _monitor_websocket 2024-11-21 12:36:51 +05:30
Vaibhav159
e11f128110 adding on_session_timeout 2024-11-21 12:34:32 +05:30
Vaibhav-Lodha
3aa89fb13a adding session_timeout param 2024-11-21 12:20:51 +05:30
525 changed files with 42509 additions and 5491 deletions

48
.github/workflows/android.yaml vendored Normal file

@@ -0,0 +1,48 @@
name: android
on:
  push:
    branches:
      - main
    paths:
      - "examples/simple-chatbot/client/android/**"
  pull_request:
    branches:
      - "**"
    paths:
      - "examples/simple-chatbot/client/android/**"
  workflow_dispatch:
    inputs:
      sdk_git_ref:
        type: string
        description: "Which git ref of the app to build"
concurrency:
  group: build-android-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
jobs:
  sdk:
    name: "Simple chatbot demo"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.sdk_git_ref || github.ref }}
      - name: "Install Java"
        uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '17'
      - name: Build demo app
        working-directory: examples/simple-chatbot/client/android
        run: ./gradlew :simple-chatbot-client:assembleDebug
      - name: Upload demo APK
        uses: actions/upload-artifact@v4
        with:
          name: Simple Chatbot Android Client
          path: examples/simple-chatbot/client/android/simple-chatbot-client/build/outputs/apk/debug/simple-chatbot-client-debug.apk

54
.github/workflows/coverage.yaml vendored Normal file

@@ -0,0 +1,54 @@
name: coverage
on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"
    paths-ignore:
      - "docs/**"
jobs:
  coverage:
    name: "Coverage"
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Set up Python
        id: setup_python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Cache virtual environment
        uses: actions/cache@v3
        with:
          # We are hashing dev-requirements.txt and test-requirements.txt which
          # contain all dependencies needed to run the tests.
          key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('dev-requirements.txt') }}-${{ hashFiles('test-requirements.txt') }}
          path: .venv
      - name: Install system packages
        id: install_system_packages
        run: |
          sudo apt-get install -y portaudio19-dev
      - name: Setup virtual environment
        run: |
          python -m venv .venv
      - name: Install basic Python dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r dev-requirements.txt -r test-requirements.txt
      - name: Run tests with coverage
        run: |
          source .venv/bin/activate
          coverage run
          coverage xml
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          slug: pipecat-ai/pipecat


@@ -1,4 +1,4 @@
name: test
name: tests
on:
workflow_dispatch:
@@ -49,4 +49,4 @@ jobs:
- name: Test with pytest
run: |
source .venv/bin/activate
pytest --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
pytest

15
.gitignore vendored

@@ -32,6 +32,21 @@ fly.toml
# Example files
pipecat/examples/twilio-chatbot/templates/streams.xml
pipecat/examples/bot-ready-signalling/client/react-native/node_modules/
pipecat/examples/bot-ready-signalling/client/react-native/.expo/
pipecat/examples/bot-ready-signalling/client/react-native/dist/
pipecat/examples/bot-ready-signalling/client/react-native/npm-debug.*
pipecat/examples/bot-ready-signalling/client/react-native/*.jks
pipecat/examples/bot-ready-signalling/client/react-native/*.p8
pipecat/examples/bot-ready-signalling/client/react-native/*.p12
pipecat/examples/bot-ready-signalling/client/react-native/*.key
pipecat/examples/bot-ready-signalling/client/react-native/*.mobileprovision
pipecat/examples/bot-ready-signalling/client/react-native/*.orig.*
pipecat/examples/bot-ready-signalling/client/react-native/web-build/
# macOS
.DS_Store
# Documentation
docs/api/_build/

7
.pre-commit-config.yaml Normal file

@@ -0,0 +1,7 @@
repos:
  - repo: local
    hooks:
      - id: ruff-format-hook
        name: Check ruff formatting
        entry: sh scripts/pre-commit.sh
        language: system


@@ -5,6 +5,617 @@ All notable changes to **Pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Pipecat version will now be logged on every application startup. This will
help us identify what version we are running in case of any issues.
- Added a new `StopFrame` which can be used to stop a pipeline task while
keeping the frame processors running. The frame processors could then be used
in a different pipeline. The difference between a `StopFrame` and a
`StopTaskFrame` is that, as with `EndFrame` and `EndTaskFrame`, the
`StopFrame` is pushed from the task and the `StopTaskFrame` is pushed upstream
inside the pipeline by any processor.
- Added a new `PipelineTask` parameter `observers` that replaces the previous
`PipelineParams.observers`.
- Added a new `PipelineTask` parameter `check_dangling_tasks` to enable or
disable checking for frame processors' dangling tasks when the Pipeline
finishes running.
- Added new `on_completion_timeout` event for LLM services (all OpenAI-based
services, Anthropic and Google). Note that this event will only get triggered
if LLM timeouts are set up and the timeout was reached. It can be useful to
retrigger another completion and see if the timeout was just a blip.
- Added new log observers `LLMLogObserver` and `TranscriptionLogObserver` that
can be useful for debugging your pipelines.
- Added `room_url` property to `DailyTransport`.
- Added `addons` argument to `DeepgramSTTService`.
- Added `exponential_backoff_time()` to `utils.network` module.
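To illustrate the kind of helper the last entry refers to, here is a minimal sketch of an exponential backoff calculation. The function name `backoff_time` and its parameters (`attempt`, `min_wait`, `max_wait`, `multiplier`) are assumptions made for this example and are not taken from `utils.network`.

```python
def backoff_time(
    attempt: int,
    min_wait: float = 4.0,
    max_wait: float = 10.0,
    multiplier: float = 1.0,
) -> float:
    """Hypothetical helper: seconds to wait before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Double the wait on every attempt, then clamp it to [min_wait, max_wait].
    exponential = multiplier * (2 ** (attempt - 1))
    return max(min_wait, min(max_wait, exponential))
```

A reconnect loop could sleep for `backoff_time(attempt)` seconds between attempts, so that retries slow down without ever waiting longer than `max_wait`.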
### Changed
- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one for
the pipeline).
- The base `TTSService` class now strips leading newlines before sending text
to the TTS provider. This change is to solve issues where some TTS providers,
like Azure, would not output text due to newlines.
- `GrokLLMService` now uses `grok-2` as the default model.
- `AnthropicLLMService` now uses `claude-3-7-sonnet-20250219` as the default
model.
- `RimeHttpTTSService` needs an `aiohttp.ClientSession` to be passed to the
constructor, like all the other HTTP-based services.
- `RimeHttpTTSService` doesn't use a default voice anymore.
- `DeepgramSTTService` now uses the new `nova-3` model by default. If you want
to use the previous model you can pass `LiveOptions(model="nova-2-general")`.
(see https://deepgram.com/learn/introducing-nova-3-speech-to-text-api)
```python
stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
```
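The keyword-argument requirement noted at the top of this list can be pictured with plain Python. The sketch below uses a hypothetical stand-in class `Task` (not Pipecat's `PipelineTask`), and the parameter names `params` and `observers` are only illustrative.

```python
class Task:
    """Hypothetical stand-in for a task class; not Pipecat's `PipelineTask`."""

    def __init__(self, pipeline, *, params=None, observers=None):
        # Everything after the bare `*` must be passed by keyword.
        self.pipeline = pipeline
        self.params = params
        self.observers = observers or []


# Passing the extra arguments by keyword works.
task = Task("my-pipeline", params={"allow_interruptions": True})

# Passing them positionally raises a TypeError.
try:
    Task("my-pipeline", {"allow_interruptions": True})
except TypeError as exc:
    positional_error = str(exc)
```

Code that previously passed arguments positionally after the pipeline would need to be updated to name them.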
### Deprecated
- `PipelineParams.observers` is now deprecated; use the new `PipelineTask`
parameter `observers` instead.
### Removed
- Remove `TransportParams.audio_out_is_live` since it was not being used at all.
### Fixed
- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to
disconnect from the TTS service before audio from all the contexts was
received. This affected services like Cartesia and Rime.
- Fixed an issue that prevented passing an `OpenAILLMContext` to create
`GoogleLLMService`'s context aggregators.
- Fixed an `ElevenLabsTTSService`, `FishAudioTTSService`, `LMNTTTSService` and
`PlayHTTTSService` issue that was resulting in audio requested before an
interruption being played after an interruption.
- Fixed `match_endofsentence` support for ellipses.
- Fixed an issue that would cause undesired interruptions via
`EmulateUserStartedSpeakingFrame` when only interim transcriptions (i.e. no
final transcriptions) were received.
- Fixed an issue where `EndTaskFrame` was not triggering
`on_client_disconnected` or closing the WebSocket in FastAPI.
- Fixed an issue in `DeepgramSTTService` where the `sample_rate` passed to the
`LiveOptions` was not being used, causing the service to use the default
sample rate of the pipeline.
- Fixed a context aggregator issue that would not append the LLM text response
to the context if a function call happened in the same LLM turn.
- Fixed an issue that was causing HTTP TTS services to push `TTSStoppedFrame`
more than once.
- Fixed a `FishAudioTTSService` issue where `TTSStoppedFrame` was not being
pushed.
- Fixed an issue where `start_callback` was not invoked for some LLM services.
- Fixed an issue that would cause `DeepgramSTTService` to stop working after an
error occurred (e.g. sudden network loss). If the network recovered we would
not reconnect.
- Fixed a `STTMuteFilter` issue that would not mute user audio frames causing
transcriptions to be generated by the STT service.
### Other
- Added Gemini support to `examples/phone-chatbot`.
## [0.0.57] - 2025-02-14
### Added
- Added new `AudioContextWordTTSService`. This is a TTS base class for TTS
services that handle multiple separate audio requests.
- Added new frames `EmulateUserStartedSpeakingFrame` and
`EmulateUserStoppedSpeakingFrame` which can be used to emulate VAD behavior
when VAD is not present or is not being triggered.
- Added a new `audio_in_stream_on_start` field to `TransportParams`.
- Added a new method `start_audio_in_streaming` in the `BaseInputTransport`.
- This method should be used to start receiving the input audio in case the
field `audio_in_stream_on_start` is set to `false`.
- Added support for the `RTVIProcessor` to handle buffered audio in `base64`
format, converting it into InputAudioRawFrame for transport.
- Added support for the `RTVIProcessor` to trigger `start_audio_in_streaming`
only after the `client-ready` message.
- Added new `MUTE_UNTIL_FIRST_BOT_COMPLETE` strategy to `STTMuteStrategy`. This
strategy starts muted and remains muted until the first bot speech completes,
ensuring the bot's first response cannot be interrupted. This complements the
existing `FIRST_SPEECH` strategy which only mutes during the first detected
bot speech.
- Added support for Google Cloud Speech-to-Text V2 through `GoogleSTTService`.
- Added `RimeTTSService`, a new `WordTTSService`. Updated the foundational
example `07q-interruptible-rime.py` to use `RimeTTSService`.
- Added support for Groq's Whisper API through the new `GroqSTTService` and
OpenAI's Whisper API through the new `OpenAISTTService`. Introduced a new
base class `BaseWhisperSTTService` to handle common Whisper API
functionality.
- Added `PerplexityLLMService` for Perplexity NIM API integration, with an
OpenAI-compatible interface. Also, added foundational example
`14n-function-calling-perplexity.py`.
- Added `DailyTransport.update_remote_participants()`. This allows you to update
remote participant's settings, like their permissions or which of their
devices are enabled. Requires that the local participant have participant
admin permission.
### Changed
- We don't consider a colon `:` an end of sentence any more.
- Updated `DailyTransport` to respect the `audio_in_stream_on_start` field,
ensuring it only starts receiving the audio input if it is enabled.
- Updated `FastAPIWebsocketOutputTransport` to send `TransportMessageFrame` and
`TransportMessageUrgentFrame` to the serializer.
- Updated `WebsocketServerOutputTransport` to send `TransportMessageFrame` and
`TransportMessageUrgentFrame` to the serializer.
- Enhanced `STTMuteConfig` to validate strategy combinations, preventing
`MUTE_UNTIL_FIRST_BOT_COMPLETE` and `FIRST_SPEECH` from being used together
as they handle first bot speech differently.
- Updated foundational example `07n-interruptible-google.py` to use all Google
services.
- `RimeHttpTTSService` now uses the `mistv2` model by default.
- Improved error handling in `AzureTTSService` to properly detect and log
synthesis cancellation errors.
- Enhanced `WhisperSTTService` with full language support and improved model
documentation.
- Updated foundational example `14f-function-calling-groq.py` to use
`GroqSTTService` for transcription.
- Updated `GroqLLMService` to use `llama-3.3-70b-versatile` as the default
model.
- `RTVIObserver` doesn't handle `LLMSearchResponseFrame` frames anymore. For
now, to handle those frames you need to create a `GoogleRTVIObserver`
instead.
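The `STTMuteConfig` validation mentioned in this list can be sketched as follows. This is an illustration under assumptions, not the library's code: `MuteStrategy` and `MuteConfig` are hypothetical stand-ins, and `FUNCTION_CALL` is included only as an example of a third, non-conflicting option.

```python
from dataclasses import dataclass, field
from enum import Enum


class MuteStrategy(Enum):
    """Hypothetical stand-in for the mute strategies named in the changelog."""

    FIRST_SPEECH = "first_speech"
    MUTE_UNTIL_FIRST_BOT_COMPLETE = "mute_until_first_bot_complete"
    FUNCTION_CALL = "function_call"  # illustrative extra option


@dataclass
class MuteConfig:
    """Hypothetical config that rejects conflicting strategy combinations."""

    strategies: set = field(default_factory=set)

    def __post_init__(self):
        conflicting = {
            MuteStrategy.FIRST_SPEECH,
            MuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE,
        }
        # Both strategies handle the first bot speech, so using them together is ambiguous.
        if conflicting <= self.strategies:
            raise ValueError(
                "FIRST_SPEECH and MUTE_UNTIL_FIRST_BOT_COMPLETE cannot be used together"
            )
```

Validating at construction time means a misconfiguration fails immediately rather than showing up later as unexpected muting behavior.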
### Deprecated
- `STTMuteFilter` constructor's `stt_service` parameter is now deprecated and
will be removed in a future version. The filter now manages mute state
internally instead of querying the STT service.
- `RTVI.observer()` is now deprecated, instantiate an `RTVIObserver` directly
instead.
- All RTVI frame processors (e.g. `RTVISpeakingProcessor`,
`RTVIBotLLMProcessor`) are now deprecated, instantiate an `RTVIObserver`
instead.
### Fixed
- Fixed a `FalImageGenService` issue that was causing the event loop to be
blocked while loading the downloaded image.
- Fixed a `CartesiaTTSService` service issue that would cause audio overlapping
in some cases.
- Fixed a websocket-based service issue (e.g. `CartesiaTTSService`) that was
preventing a reconnection after the server disconnected cleanly, which was
causing an infinite loop instead.
- Fixed a `BaseOutputTransport` issue that was causing upstream frames to not be
pushed upstream.
- Fixed multiple issues where user transcriptions were not being handled
properly. It was possible for short utterances to not trigger VAD which would
cause user transcriptions to be ignored. It was also possible for one or more
transcriptions to be generated after VAD in which case they would also be
ignored.
- Fixed an issue that was causing `BotStoppedSpeakingFrame` to be generated too
late. This could then cause issues unblocking `STTMuteFilter` later than
desired.
- Fixed an issue that was causing `AudioBufferProcessor` to not record
synchronized audio.
- Fixed an `RTVI` issue that was causing `bot-tts-text` messages to be sent
before being processed by the output transport.
- Fixed an issue (#1192) in ElevenLabs where we were trying to reconnect/disconnect
the websocket connection even when the connection was already closed.
- Fixed an issue where `has_regular_messages` condition was always true in
`GoogleLLMContext` due to `Part` having `function_call` & `function_response`
with `None` values.
### Other
- Added new `instant-voice` example. This example showcases how to enable
instant voice communication as soon as a user connects.
- Added new `local-input-select-stt` example. This example allows you to play
with local audio inputs by selecting them through a nice text interface.
## [0.0.56] - 2025-02-06
### Changed
- Use `gemini-2.0-flash-001` as the default model for `GoogleLLMService`.
- Improved foundational examples 22b, 22c, and 22d to support function calling.
With these base examples, `FunctionCallInProgressFrame` and
`FunctionCallResultFrame` will no longer be blocked by the gates.
### Fixed
- Fixed `TkLocalTransport` and `LocalAudioTransport` issues that were causing
errors on cleanup.
- Fixed an issue that was causing `tests.utils` import to fail because of
logging setup.
- Fixed a `SentryMetrics` issue that was preventing any metrics from being sent to
Sentry and was also preventing metrics frames from being pushed to the
pipeline.
- Fixed an issue in `BaseOutputTransport` where incoming audio would not be
resampled to the desired output sample rate.
- Fixed an issue with the `TwilioFrameSerializer` and `TelnyxFrameSerializer`
where `twilio_sample_rate` and `telnyx_sample_rate` were incorrectly
initialized to `audio_in_sample_rate`. Those values currently default to 8000
and should be set manually from the serializer constructor if a different
value is needed.
### Other
- Added a new `sentry-metrics` example.
## [0.0.55] - 2025-02-05
### Added
- Added a new `start_metadata` field to `PipelineParams`. The provided metadata
will be set to the initial `StartFrame` being pushed from the `PipelineTask`.
- Added new fields to `PipelineParams` to control audio input and output sample
rates for the whole pipeline. This allows controlling sample rates from a
single place instead of having to specify sample rates in each
service. Setting a sample rate to a service is still possible and will
override the value from `PipelineParams`.
- Introduce audio resamplers (`BaseAudioResampler`). This is just a base class
to implement audio resamplers. Currently, two implementations are provided
`SOXRAudioResampler` and `ResampyResampler`. A new
`create_default_resampler()` has been added (replacing the now deprecated
`resample_audio()`).
- It is now possible to specify the asyncio event loop that a `PipelineTask` and
all the processors should run on by passing it as a new argument to the
`PipelineRunner`. This could allow running pipelines in multiple threads each
one with its own event loop.
- Added a new `utils.TaskManager`. Instead of a global task manager we now have
a task manager per `PipelineTask`. In the previous version the task manager
was global, so running multiple simultaneous `PipelineTask`s could result in
dangling task warnings which were not actually true. In order for all the
processors to know about the task manager, we pass it through the
`StartFrame`. This means that processors should create tasks when they receive
a `StartFrame` but not before (because they don't have a task manager yet).
- Added `TelnyxFrameSerializer` to support Telnyx calls. A full running example
has also been added to `examples/telnyx-chatbot`.
- Allow pushing silence audio frames before `TTSStoppedFrame`. This might be
useful for testing purposes, for example, passing bot audio to an STT service
which usually needs additional audio data to detect the utterance stopped.
- `TwilioSerializer` now supports transport message frames. With this we can
create Twilio emulators.
- Added a new transport: `WebsocketClientTransport`.
- Added a `metadata` field to `Frame` which makes it possible to pass custom
data to all frames.
- Added `test/utils.py` inside of pipecat package.
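The sample-rate entry in this list describes a precedence rule: a rate set on a service overrides the pipeline-wide value. A minimal sketch of that rule, using a hypothetical helper `resolve_sample_rate` that is not part of the library:

```python
from typing import Optional


def resolve_sample_rate(service_rate: Optional[int], pipeline_rate: int) -> int:
    """Hypothetical helper: pick the sample rate a service should use."""
    # A rate configured on the service wins; otherwise fall back to the pipeline-wide rate.
    return service_rate if service_rate is not None else pipeline_rate
```

With this rule, most services can leave their own rate unset and follow the pipeline, while a single service can still opt out.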
### Changed
- `GatedOpenAILLMContextAggregator` now requires keyword arguments. Also, a new
`start_open` argument has been added to set the initial state of the gate.
- Added `organization` and `project` level authentication to
`OpenAILLMService`.
- Improved the language checking logic in `ElevenLabsTTSService` and
`ElevenLabsHttpTTSService` to properly handle language codes based on model
compatibility, with appropriate warnings when language codes cannot be
applied.
- Updated `GoogleLLMContext` to support pushing `LLMMessagesUpdateFrame`s that
contain a combination of function calls, function call responses, system
messages, or just messages.
- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new
`OutputDTMFFrame` frame.
### Deprecated
- `resample_audio()` is now deprecated, use `create_default_resampler()`
instead.
### Removed
- `AudioBufferProcessor.reset_audio_buffers()` has been removed, use
`AudioBufferProcessor.start_recording()` and
`AudioBufferProcessor.stop_recording()` instead.
### Fixed
- Fixed an `AudioBufferProcessor` issue that would cause crackling in some recordings.
- Fixed an issue in `AudioBufferProcessor` where user callback would not be
called on task cancellation.
- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence
padding in some cases.
- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009
websocket error by increasing the max message size limit to 16MB.
- Fixed a `DailyTransport` issue that would cause events to be triggered before
join finished.
- Fixed a `PipelineTask` issue that was preventing processors from being cleaned up
after cancelling the task.
- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not
cause the task to finish. However, using `PipelineTask.cancel()` is still the
recommended way to cancel a task.
### Other
- Improved Unit Test `run_test()` to use `PipelineTask` and
`PipelineRunner`. There's now also some control around `StartFrame` and
`EndFrame`. The `EndTaskFrame` has been removed since it doesn't seem
necessary with this new approach.
- Updated `twilio-chatbot` with a few new features: use 8000 sample rate and
avoid resampling, a new client useful for stress testing and testing locally
without the need to make phone calls. Also, added audio recording on both the
client and the server to make sure the audio sounds good.
- Updated examples to use `task.cancel()` to immediately exit the example when a
participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing
an `EndFrame` causes the bot to run through everything that is internally
queued (which could take some seconds). Note that using `task.cancel()` might
not always be the best option and pushing an `EndFrame` could still be
desirable to make sure all the pipeline is flushed.
## [0.0.54] - 2025-01-27
### Added
- In order to create tasks in Pipecat frame processors it is now recommended to
use `FrameProcessor.create_task()` (which uses the new
`utils.asyncio.create_task()`). It takes care of uncaught exceptions, task
cancellation handling and task management. To cancel or wait for a task there
is `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All of
Pipecat processors have been updated accordingly. Also, when a pipeline runner
finishes, a warning about dangling tasks might appear, which indicates if any
of the created tasks was never cancelled or awaited for (using these new
functions).
- It is now possible to specify the period of the `PipelineTask` heartbeat
frames with `heartbeats_period_secs`.
- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic models
for meeting token creation in `get_token` method of `DailyRESTHelper`.
- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.
- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings
to a custom AWS bucket.
### Changed
- Enhanced `UserIdleProcessor` with retry functionality and control over idle
monitoring via new callback signature `(processor, retry_count) -> bool`.
Updated the `17-detect-user-idle.py` to show how to use the `retry_count`.
- Add defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio
truncation. Audio truncation errors during interruptions now log a warning
and allow the session to continue instead of throwing an exception.
- Modified `TranscriptProcessor` to use TTS text frames for more accurate assistant
transcripts. Assistant messages are now aggregated based on bot speaking boundaries
rather than LLM context, providing better handling of interruptions and partial
utterances.
- Updated foundational examples `28a-transcription-processor-openai.py`,
`28b-transcript-processor-anthropic.py`, and
`28c-transcription-processor-gemini.py` to use the updated
`TranscriptProcessor`.
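The `UserIdleProcessor` callback signature `(processor, retry_count) -> bool` described in this list might be used along these lines. This is a sketch under assumptions: the callback is written as a coroutine, returning `True` is taken to mean "keep monitoring" and `False` "stop", and `MAX_RETRIES` and `handle_user_idle` are hypothetical names.

```python
import asyncio

MAX_RETRIES = 3  # hypothetical limit chosen for this example


async def handle_user_idle(processor, retry_count: int) -> bool:
    """Hypothetical idle callback: True keeps monitoring, False stops it (assumed)."""
    if retry_count < MAX_RETRIES:
        # A real callback might prompt the user again here.
        return True
    # Give up after too many idle periods in a row.
    return False


# Exercise the callback directly, without any processor instance.
first_attempt = asyncio.run(handle_user_idle(None, 1))
last_attempt = asyncio.run(handle_user_idle(None, MAX_RETRIES))
```

The `retry_count` argument lets the callback escalate, for example by reminding the user first and ending the session only after repeated silence.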
### Fixed
- Fixed a `GeminiMultimodalLiveLLMService` issue that was preventing the user
from pushing initial LLM assistant messages (using `LLMMessagesAppendFrame`).
- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`,
`ParallelPipeline` and `UserIdleProcessor`.
- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted
in an error.
- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not
being processed, in cases where the `_user_audio_buffer` was smaller than the
buffer size.
### Performance
- Replaced audio resampling library `resampy` with `soxr`. Resampling a 2:21s
audio file from 24KHz to 16KHz took 1.41s with `resampy` and 0.031s with
`soxr` with similar audio quality.
### Other
- Added initial unit test infrastructure.
## [0.0.53] - 2025-01-18
### Added
- Added `ElevenLabsHttpTTSService` which uses ElevenLabs' HTTP API instead of the
websocket one.
- Introduced pipeline frame observers. Observers can view all the frames that go
through the pipeline without the need to inject processors in the
pipeline. This can be useful, for example, to implement frame loggers or
debuggers among other things. The example
`examples/foundational/30-observer.py` shows how to add an observer to a
pipeline for debugging.
- Introduced heartbeat frames. The pipeline task can now push periodic
heartbeats down the pipeline when `enable_heartbeats=True`. Heartbeats are
system frames that are supposed to make it all the way to the end of the
pipeline. When a heartbeat frame is received the traversing time (i.e. the
time it took to go through the whole pipeline) will be displayed (with TRACE
logging) otherwise a warning will be shown. The example
`examples/foundational/31-heartbeats.py` shows how to enable heartbeats and
forces warnings to be displayed.
- Added `LLMTextFrame` and `TTSTextFrame` which should be pushed by LLM and TTS
services respectively instead of `TextFrame`s.
- Added `OpenRouter` for OpenRouter integration with an OpenAI-compatible
interface. Added foundational example `14m-function-calling-openrouter.py`.
- Added a new `WebsocketService` based class for TTS services, containing
base functions and retry logic.
- Added `DeepSeekLLMService` for DeepSeek integration with an OpenAI-compatible
interface. Added foundational example `14l-function-calling-deepseek.py`.
- Added `FunctionCallResultProperties` dataclass to provide a structured way to
control function call behavior, including:
- `run_llm`: Controls whether to trigger LLM completion
- `on_context_updated`: Optional callback triggered after context update
- Added a new foundational example `07e-interruptible-playht-http.py` for easy
testing of `PlayHTHttpTTSService`.
- Added support for Google TTS Journey voices in `GoogleTTSService`.
- Added `29-livekit-audio-chat.py`, as a new foundational example for
`LiveKitTransportLayer`.
- Added `enable_prejoin_ui`, `max_participants` and `start_video_off` params
to `DailyRoomProperties`.
- Added `session_timeout` to `FastAPIWebsocketTransport` and
`WebsocketServerTransport` for configuring session timeouts (in
seconds). Triggers `on_session_timeout` for custom timeout handling.
See [examples/websocket-server/bot.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/websocket-server/bot.py).
- Added the new modalities option and helper function to set Gemini output
modalities.
- Added `examples/foundational/26d-gemini-multimodal-live-text.py`, which
uses Gemini with the TEXT modality and a separate TTS provider for speech output.
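The `FunctionCallResultProperties` entry in this list names two fields, `run_llm` and `on_context_updated`. The sketch below shows how such properties could steer what happens after a function call result is stored. It is an illustration only: `ResultProperties` and `apply_result` are hypothetical stand-ins, and the default of running the LLM when `run_llm` is unset is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ResultProperties:
    """Hypothetical stand-in mirroring the two fields named in the changelog."""

    run_llm: Optional[bool] = None
    on_context_updated: Optional[Callable[[], Awaitable[None]]] = None


async def apply_result(context: list, result: str, properties: ResultProperties) -> bool:
    """Store a result in the context; return whether an LLM completion should run."""
    context.append(result)
    if properties.on_context_updated is not None:
        # Notify the caller once the context holds the new result.
        await properties.on_context_updated()
    # Assumption: when `run_llm` is left unset, default to running the LLM.
    return True if properties.run_llm is None else properties.run_llm


events = []


async def on_updated() -> None:
    events.append("updated")


context = []
should_run = asyncio.run(
    apply_result(context, "42", ResultProperties(run_llm=False, on_context_updated=on_updated))
)
```

Here the result is added to the context and the callback fires, but no new completion is requested because `run_llm` is `False`.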
### Changed
- Modified `UserIdleProcessor` to start monitoring only after first
conversation activity (`UserStartedSpeakingFrame` or
`BotStartedSpeakingFrame`) instead of immediately.
- Modified `OpenAIAssistantContextAggregator` to support controlled completions
and to emit context update callbacks via `FunctionCallResultProperties`.
- Added `aws_session_token` to the `PollyTTSService`.
- Changed the default model for `PlayHTHttpTTSService` to `Play3.0-mini-http`.
- `api_key`, `aws_access_key_id` and `region` are no longer required parameters
for the PollyTTSService (AWSTTSService)
- Added `session_timeout` example in `examples/websocket-server/bot.py` to
handle session timeout event.
- Changed `InputParams` in
`src/pipecat/services/gemini_multimodal_live/gemini.py` to support different
modalities.
- Changed `DeepgramSTTService` to send `finalize` event whenever VAD detects
`UserStoppedSpeakingFrame`. This helps in faster transcriptions and clearing
the `Deepgram` audio buffer.
### Fixed
- Fixed an issue where `DeepgramSTTService` was not generating metrics using
pipeline's VAD.
- Fixed `UserIdleProcessor` not properly propagating `EndFrame`s through the
pipeline.
- Fixed an issue where websocket based TTS services could incorrectly terminate
their connection due to a retry counter not resetting.
- Fixed a `PipelineTask` issue that would cause a dangling task after stopping
the pipeline with an `EndFrame`.
- Fixed an import issue for `PlayHTHttpTTSService`.
- Fixed an issue where languages couldn't be used with the `PlayHTHttpTTSService`.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` audio chunks were hitting
an error when truncating audio content.
- Fixed an issue where setting the voice and model for `RimeHttpTTSService`
wasn't working.
- Fixed an issue where `IdleFrameProcessor` and `UserIdleProcessor` were getting
initialized before the start of the pipeline.
## [0.0.52] - 2024-12-24
### Added
@@ -1264,6 +1875,9 @@ async def on_connected(processor):
### Changed
- `FrameSerializer.serialize()` and `FrameSerializer.deserialize()` are now
`async`.
- `Filter` has been renamed to `FrameFilter` and it's now under
`processors/filters`.

View File

@@ -1,6 +1,6 @@
BSD 2-Clause License
Copyright (c) 2024, Daily
Copyright (c) 2024–2025, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

View File

@@ -2,7 +2,7 @@
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div></h1>
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) <a href="https://app.commanddash.io/agent/github_pipecat-ai_pipecat"><img src="https://img.shields.io/badge/AI-Code%20Agent-EB9FDA"></a>
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)
Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.
@@ -53,19 +53,19 @@ To keep things lightweight, only the core framework is included by default. If y
pip install "pipecat-ai[option,...]"
```
Available options include:
### Available services
| Category | Services | Install Command Example |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[openai]"` |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), WebSocket, Local | `pip install "pipecat-ai[daily]"` |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
| Vision & Image | [Moondream](https://docs.pipecat.ai/server/services/vision/moondream), [fal](https://docs.pipecat.ai/server/services/image-generation/fal) | `pip install "pipecat-ai[moondream]"` |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
| Category | Services | Install Command Example |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[google]"` |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
| Vision & Image | [Moondream](https://docs.pipecat.ai/server/services/vision/moondream), [fal](https://docs.pipecat.ai/server/services/image-generation/fal) | `pip install "pipecat-ai[moondream]"` |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -81,7 +81,7 @@ Here is a very basic Pipecat bot that greets a user when they join a real-time s
```python
import asyncio
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
@@ -122,7 +122,7 @@ async def main():
# Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await task.cancel()
# Run the pipeline task
await runner.run(task)
@@ -149,27 +149,40 @@ Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://
## Hacking on the framework itself
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
_Note: You may need to set up a virtual environment before following these instructions. From the root of the repo:_
```shell
python3 -m venv venv
source venv/bin/activate
```
From the root of this repo, run the following:
Install the development dependencies:
```shell
pip install -r dev-requirements.txt
python -m build
```
This builds the package. To use the package locally (e.g. to run sample files), run
Install the git pre-commit hooks (these help ensure your code follows project rules):
```shell
pip install --editable ".[option,...]"
pre-commit install
```
If you want to use this package from another directory, you can run:
Install the `pipecat-ai` package locally in editable mode:
```shell
pip install -e .
```
The `-e` or `--editable` option allows you to modify the code without reinstalling.
To include optional dependencies, add them to the install command. For example:
```shell
pip install -e ".[daily,deepgram,cartesia,openai,silero]" # Updated for the services you're using
```
If you want to use this package from another directory:
```shell
pip install "path_to_this_repo[option,...]"
@@ -180,7 +193,7 @@ pip install "path_to_this_repo[option,...]"
From the root directory, run:
```shell
pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
pytest
```
## Setting up your editor

11
codecov.yml Normal file
View File

@@ -0,0 +1,11 @@
coverage:
range: 50..90 # coverage below 50 is red, above 90 is green, with a color gradient in between
status:
project:
default:
target: auto # auto % coverage target
threshold: 5% # allow for 5% reduction of coverage without failing
# do not run coverage on patch nor changes
patch: false

View File

@@ -1,9 +1,12 @@
build~=1.2.2
grpcio-tools~=1.68.1
coverage~=7.6.12
grpcio-tools~=1.67.1
pip-tools~=7.4.1
pyright~=1.1.390
pre-commit~=4.0.1
pyright~=1.1.394
pytest~=8.3.4
ruff~=0.8.3
setuptools~=75.6.0
pytest-asyncio~=0.25.3
ruff~=0.9.7
setuptools~=70.0.0
setuptools_scm~=8.1.0
python-dotenv~=1.0.1

View File

@@ -18,6 +18,9 @@ AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Cartesia
CARTESIA_API_KEY=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
@@ -60,3 +63,27 @@ SIMLI_FACE_ID=...
# Krisp
KRISP_MODEL_PATH=...
# DeepSeek
DEEPSEEK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Grok
GROK_API_KEY=...
# Together.ai
TOGETHER_API_KEY=...
# Cerebras
CEREBRAS_API_KEY=...
# Fish Audio
FISH_API_KEY=...
# Assembly AI
ASSEMBLYAI_API_KEY=...
# OpenRouter
OPENROUTER_API_KEY=...

View File

@@ -39,10 +39,10 @@ Next, follow the steps in the README for each demo.
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Phone Chatbot](phone-chatbot) | A chatbot that connects to PSTN/SIP phone calls, powered by Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [studypal](studypal) | A chatbot to have a conversation about any article on the web | |
| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities | `python-websockets`, `openai`, `deepgram`, `silero-tts`, `numpy` |
| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities. | Cartesia, Deepgram, OpenAI, Websockets |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.

View File

@@ -0,0 +1,45 @@
# Bot ready signaling
A simple Pipecat example demonstrating how to handle signaling between the client and the bot,
ensuring that the bot starts sending audio only when the client is available,
thereby avoiding the risk of cutting off the beginning of the audio.
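
The handshake can be pictured with a small, self-contained `asyncio` sketch. It only models the ordering (the bot waits, the client signals, the bot speaks) and does not use Pipecat or Daily. The function names and the `timeline` list are hypothetical; the `"playable"` label mirrors the app message the example clients send.

```python
import asyncio


async def bot(client_ready: asyncio.Event, timeline: list) -> None:
    # Hold the greeting until the client reports that playback is possible.
    await client_ready.wait()
    timeline.append("greeting audio")


async def client(client_ready: asyncio.Event, timeline: list) -> None:
    # Simulate the delay between joining and the audio element starting to play.
    await asyncio.sleep(0.01)
    timeline.append("playable")
    client_ready.set()


async def run_handshake() -> list:
    client_ready = asyncio.Event()
    timeline: list = []
    await asyncio.gather(bot(client_ready, timeline), client(client_ready, timeline))
    return timeline


timeline = asyncio.run(run_handshake())
print(timeline)  # ['playable', 'greeting audio']
```

Because the bot blocks on the event, the greeting is never produced before the client has signaled, however long the client takes to become ready.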
## Quick Start
### First, start the bot server:
1. Navigate to the server directory:
```bash
cd server
```
2. Create and activate a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install requirements:
```bash
pip install -r requirements.txt
```
4. Copy env.example to .env and configure:
- Add your API keys
5. Start the server:
```bash
python server.py
```
### Next, connect using the client app:
For client-side setup, refer to the [JavaScript Guide](client/javascript/README.md).
## Important Note
Ensure the bot server is running before using any client implementations.
## Requirements
- Python 3.10+
- Node.js 16+ (for JavaScript)
- Daily API key
- Cartesia API key
- Modern web browser with WebRTC support

View File

@@ -6,10 +6,10 @@ Basic implementation using the [Pipecat JavaScript SDK](https://docs.pipecat.ai/
1. Run the bot server. See the [server README](../../README).
2. Navigate to the `examples/javascript` directory:
2. Navigate to the `client/javascript` directory:
```bash
cd examples/javascript
cd client/javascript
```
3. Install dependencies:

View File

@@ -0,0 +1,34 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chatbot</title>
</head>
<body>
<div class="container">
<div class="status-bar">
<div class="status">
Status: <span id="connection-status">Disconnected</span>
</div>
<div class="controls">
<button id="connect-btn">Connect</button>
<button id="disconnect-btn" disabled>Disconnect</button>
</div>
</div>
<audio id="bot-audio" autoplay></audio>
<div class="debug-panel">
<h3>Debug Info</h3>
<div id="debug-log"></div>
</div>
</div>
<script type="module" src="/src/app.js"></script>
<link rel="stylesheet" href="/src/style.css">
</body>
</html>

File diff suppressed because it is too large

View File

@@ -11,11 +11,10 @@
"author": "",
"license": "ISC",
"description": "",
"dependencies": {
"@daily-co/realtime-ai-daily": "^0.2.1",
"realtime-ai": "^0.2.1"
},
"devDependencies": {
"vite": "^6.0.2"
"vite": "^6.0.9"
},
"dependencies": {
"@daily-co/daily-js": "0.74.0"
}
}

View File

@@ -0,0 +1,216 @@
/**
 * Copyright (c) 2024–2025, Daily
*
* SPDX-License-Identifier: BSD 2-Clause License
*/
import Daily from "@daily-co/daily-js";
/**
* ChatbotClient handles the connection and media management for a real-time
* voice interaction with an AI bot.
*/
class ChatbotClient {
constructor() {
// Initialize client state
this.dailyCallObject = null;
this.setupDOMElements();
this.setupEventListeners();
}
/**
* Set up references to DOM elements and create necessary media elements
*/
setupDOMElements() {
// Get references to UI control elements
this.connectBtn = document.getElementById('connect-btn');
this.disconnectBtn = document.getElementById('disconnect-btn');
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
// Create an audio element for bot's voice output
this.botAudio = document.createElement('audio');
this.botAudio.autoplay = true;
this.botAudio.playsInline = true;
document.body.appendChild(this.botAudio);
}
/**
* Set up event listeners for connect/disconnect buttons
*/
setupEventListeners() {
this.connectBtn.addEventListener('click', () => this.connect());
this.disconnectBtn.addEventListener('click', () => this.disconnect());
}
/**
* Add a timestamped message to the debug log
*/
log(message) {
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
// Add styling based on message type
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3'; // blue for user
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50'; // green for bot
}
this.debugLog.appendChild(entry);
this.debugLog.scrollTop = this.debugLog.scrollHeight;
console.log(message);
}
/**
* Update the connection status display
*/
updateStatus(status) {
this.statusSpan.textContent = status;
this.log(`Status: ${status}`);
}
handleEventToConsole (evt) {
this.log(`Received event: ${evt.action}`);
};
/**
* Set up listeners for track events (start/stop)
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.dailyCallObject) return;
this.dailyCallObject.on("joined-meeting", () => {
this.updateStatus('Connected');
this.connectBtn.disabled = true;
this.disconnectBtn.disabled = false;
this.log('Client connected');
});
this.dailyCallObject.on("track-started", (evt) => {
if (evt.track.kind === "audio" && evt.participant.local === false) {
this.log("Audio track started.")
this.setupAudioTrack(evt.track);
}
});
this.dailyCallObject.on("track-stopped", this.handleEventToConsole.bind(this));
this.dailyCallObject.on("participant-joined", this.handleEventToConsole.bind(this));
this.dailyCallObject.on("participant-updated", this.handleEventToConsole.bind(this));
this.dailyCallObject.on("participant-left", () => {
// When the bot leaves, we are also disconnecting from the call
this.disconnect()
});
this.dailyCallObject.on("left-meeting", () => {
this.updateStatus('Disconnected');
this.connectBtn.disabled = false;
this.disconnectBtn.disabled = true;
this.log('Client disconnected');
});
this.dailyCallObject.on("error", this.handleEventToConsole.bind(this));
}
/**
* Set up an audio track for playback
* Handles both initial setup and track updates
*/
setupAudioTrack(track) {
this.log(`Setting up audio track, track state: ${track.readyState}, muted: ${track.muted}`);
// Check if we're already playing this track
if (this.botAudio.srcObject) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the audio source
this.botAudio.srcObject = new MediaStream([track]);
this.botAudio.onplaying = async (event) => {
this.log("onplaying")
this.log("Will send the audio message to play the audio at the next tick")
this.dailyCallObject.sendAppMessage("playable")
}
}
async fetchRoomInfo() {
let connectUrl = '/connect'
let res = await fetch(connectUrl, {
method: "POST",
mode: "cors",
headers: new Headers({
"Content-Type": "application/json"
}),
})
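// Fail loudly instead of returning undefined, which would break connect()
if (!res.ok) {
throw new Error(`Failed to fetch room info: ${res.status}`);
}
return res.json();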
}
/**
* Initialize and connect to the bot
* This creates the Daily call object, registers listeners, fetches the room URL, and joins the call
*/
async connect() {
try {
// Initialize the client
this.dailyCallObject = Daily.createCallObject({
subscribeToTracksAutomatically: true,
});
// Set up listeners for media track events
this.setupTrackListeners();
this.log('Creating the bot...');
let roomInfo = await this.fetchRoomInfo()
// Connect to the bot
this.log('Connecting to bot...');
// Exposed on window only to make debugging easier
window.callObject = this.dailyCallObject;
await this.dailyCallObject.join({
url: roomInfo.room_url,
});
this.log('Connection complete');
} catch (error) {
// Handle any errors during connection
this.log(`Error connecting: ${error.message}`);
this.log(`Error stack: ${error.stack}`);
this.updateStatus('Error');
// Clean up if there's an error
if (this.dailyCallObject) {
try {
await this.dailyCallObject.leave();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
}
}
}
/**
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.dailyCallObject) {
try {
// Leave the call and destroy the Daily call object
await this.dailyCallObject.leave();
await this.dailyCallObject.destroy();
this.dailyCallObject = null;
// Clean up audio
if (this.botAudio.srcObject) {
this.botAudio.srcObject.getTracks().forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
} catch (error) {
this.log(`Error disconnecting: ${error.message}`);
}
}
}
}
// Initialize the client when the page loads
window.addEventListener('DOMContentLoaded', () => {
new ChatbotClient();
});

View File

@@ -0,0 +1,13 @@
import { defineConfig } from 'vite';
export default defineConfig({
server: {
proxy: {
// Proxy /connect requests to the backend server
'/connect': {
target: 'http://0.0.0.0:7860', // Replace with your backend URL
changeOrigin: true,
},
},
},
});

View File

@@ -0,0 +1 @@
22.14

View File

@@ -0,0 +1,60 @@
# React Native Implementation
Basic implementation using the [Pipecat React Native SDK](https://docs.pipecat.ai/client/react-native/introduction).
## Usage
### Expo requirements
This project cannot be used with an [Expo Go](https://docs.expo.dev/workflow/expo-go/) app because [it requires custom native code](https://docs.expo.io/workflow/customizing/).
When a project requires custom native code or a config plugin, we need to transition from using [Expo Go](https://docs.expo.dev/workflow/expo-go/)
to a [development build](https://docs.expo.dev/development/introduction/).
More details about the custom native code used by this demo can be found in [rn-daily-js-expo-config-plugin](https://github.com/daily-co/rn-daily-js-expo-config-plugin).
### Building remotely
If you do not have experience with Xcode and Android Studio builds or do not have them installed locally on your computer, you will need to follow [this guide from Expo to use EAS Build](https://docs.expo.dev/development/create-development-builds/#create-and-install-eas-build).
### Building locally
You will need to have installed locally on your computer:
- [Xcode](https://developer.apple.com/xcode/) to build for iOS;
- [Android Studio](https://developer.android.com/studio) to build for Android;
#### Install the demo dependencies
```bash
# Use the version of node specified in .nvmrc
nvm i
# Install dependencies
npm i
# Before a native app can be compiled, the native source code must be generated.
npx expo prebuild
# Configure the environment variable to connect to the local server
cp env.example .env
# edit .env and add your local ip address, for example: http://192.168.1.16:7860
```
#### Running on Android
After plugging in an Android device [configured for debugging](https://developer.android.com/studio/debug/dev-options), run the following command:
```
npm run android
```
#### Running on iOS
Run the following command:
```
npm run ios
```
#### Connect to the server
Use http://localhost:5173 in your app.

View File

@@ -0,0 +1,75 @@
{
"expo": {
"name": "bot-ready-rn",
"slug": "bot-ready-rn",
"version": "1.0.0",
"orientation": "portrait",
"icon": "./assets/icon.png",
"userInterfaceStyle": "light",
"splash": {
"image": "./assets/splash.png",
"resizeMode": "contain",
"backgroundColor": "#ffffff"
},
"updates": {
"fallbackToCacheTimeout": 0
},
"assetBundlePatterns": [
"**/*"
],
"ios": {
"supportsTablet": true,
"bitcode": false,
"bundleIdentifier": "co.daily.expo.BotReady",
"infoPlist": {
"UIBackgroundModes": [
"voip"
]
},
"appleTeamId": "EEBGKV9N3N"
},
"android": {
"adaptiveIcon": {
"foregroundImage": "./assets/adaptive-icon.png",
"backgroundColor": "#FFFFFF"
},
"package": "co.daily.expo.BotReady",
"permissions": [
"android.permission.ACCESS_NETWORK_STATE",
"android.permission.BLUETOOTH",
"android.permission.CAMERA",
"android.permission.INTERNET",
"android.permission.MODIFY_AUDIO_SETTINGS",
"android.permission.RECORD_AUDIO",
"android.permission.SYSTEM_ALERT_WINDOW",
"android.permission.WAKE_LOCK",
"android.permission.FOREGROUND_SERVICE",
"android.permission.FOREGROUND_SERVICE_CAMERA",
"android.permission.FOREGROUND_SERVICE_MICROPHONE",
"android.permission.FOREGROUND_SERVICE_MEDIA_PROJECTION",
"android.permission.POST_NOTIFICATIONS"
]
},
"web": {
"favicon": "./assets/favicon.png"
},
"plugins": [
"@config-plugins/react-native-webrtc",
"@daily-co/config-plugin-rn-daily-js",
[
"expo-build-properties",
{
"android": {
"minSdkVersion": 24,
"compileSdkVersion": 35,
"targetSdkVersion": 34,
"buildToolsVersion": "35.0.0"
},
"ios": {
"deploymentTarget": "15.1"
}
}
]
]
}
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 17 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 46 KiB

View File

@@ -0,0 +1,7 @@
module.exports = function(api) {
api.cache(true);
return {
presets: ['babel-preset-expo'],
plugins: [["module:react-native-dotenv"]],
};
};

View File

@@ -0,0 +1 @@
API_BASE_URL=http://YOUR_LOCAL_IP:7860

View File

@@ -0,0 +1,7 @@
import { registerRootComponent } from "expo";
import App from "./src/App";
// registerRootComponent calls AppRegistry.registerComponent('main', () => App);
// It also ensures that the environment is set up appropriately
registerRootComponent(App);

View File

@@ -0,0 +1,4 @@
// Learn more https://docs.expo.io/guides/customizing-metro
const { getDefaultConfig } = require('expo/metro-config');
module.exports = getDefaultConfig(__dirname);

File diff suppressed because it is too large

View File

@@ -0,0 +1,31 @@
{
"name": "bot-ready-rn",
"version": "1.0.0",
"scripts": {
"start": "expo start --dev-client",
"android": "expo run:android --device",
"ios": "expo run:ios --device",
"web": "expo start --web"
},
"dependencies": {
"@config-plugins/react-native-webrtc": "^10.0.0",
"@daily-co/config-plugin-rn-daily-js": "0.0.7",
"@daily-co/react-native-daily-js": "^0.70.0",
"@daily-co/react-native-webrtc": "^118.0.3-daily.2",
"@react-native-async-storage/async-storage": "1.23.1",
"expo": "^52.0.0",
"expo-build-properties": "~0.13.1",
"expo-dev-client": "~5.0.5",
"expo-splash-screen": "~0.29.16",
"expo-status-bar": "~2.0.0",
"react": "18.3.1",
"react-native": "0.76.3",
"react-native-background-timer": "^2.4.1",
"react-native-dotenv": "^3.4.11",
"react-native-get-random-values": "^1.11.0"
},
"devDependencies": {
"@babel/core": "^7.12.9"
},
"private": true
}

View File

@@ -0,0 +1,121 @@
import React, { useState, useEffect } from 'react';
import {SafeAreaView, View, Text, Button, StyleSheet, ScrollView} from 'react-native';
import Daily from "@daily-co/react-native-daily-js";
import { API_BASE_URL } from "@env";
const CallScreen = () => {
const [connectionStatus, setConnectionStatus] = useState('Disconnected');
const [isConnected, setIsConnected] = useState(false);
const [callObject, setCallObject] = useState(null);
const [logs, setLogs] = useState([]);
useEffect(() => {
if (callObject) {
setupTrackListeners(callObject);
}
}, [callObject]);
const log = (message) => {
setLogs((prevLogs) => [...prevLogs, `${new Date().toISOString()} - ${message}`]);
console.log(message);
};
const setupTrackListeners = (callObject) => {
callObject.on("joined-meeting", () => {
setConnectionStatus('Connected');
setIsConnected(true);
log('Client connected');
});
callObject.on("left-meeting", () => {
setConnectionStatus('Disconnected');
setIsConnected(false);
log('Client disconnected');
});
callObject.on("participant-left", () => {
// When the bot leaves, we are also disconnecting from the call
disconnect().catch((err) => {
log(`Failed to disconnect ${err}`);
})
});
// Trigger so the bot can start sending audio
callObject.on("track-started", (evt) => {
if (evt.track.kind === "audio" && evt.participant.local === false) {
handleEventToConsole(evt)
log("Sending the message that will trigger the bot to play the audio.")
callObject.sendAppMessage("playable")
}
});
callObject.on("error", (evt) => log(`Error: ${evt.error}`));
// Other events just for awareness
callObject.on("track-stopped", handleEventToConsole);
callObject.on("participant-joined", handleEventToConsole);
callObject.on("participant-updated", handleEventToConsole);
};
const handleEventToConsole = (evt) => {
log(`Received event: ${evt.action}`);
};
const connect = async () => {
try {
const callObject = Daily.createCallObject({ subscribeToTracksAutomatically: true });
setCallObject(callObject);
const connectionUrl = `${API_BASE_URL}/connect`
const res = await fetch(connectionUrl, { method: "POST", headers: { "Content-Type": "application/json" } });
const roomInfo = await res.json();
await callObject.join({ url: roomInfo.room_url });
} catch (error) {
log(`Error connecting: ${error.message}`);
}
};
const disconnect = async () => {
if (callObject) {
try {
await callObject.leave();
await callObject.destroy();
setCallObject(null);
} catch (error) {
log(`Error disconnecting: ${error.message}`);
}
}
};
return (
<SafeAreaView style={styles.safeArea}>
<View style={styles.container}>
<View style={styles.statusBar}>
<Text>Status: <Text style={styles.status}>{connectionStatus}</Text></Text>
<View style={styles.controls}>
<Button
title={isConnected ? "Disconnect" : "Connect"}
onPress={isConnected ? disconnect : connect}
/>
</View>
</View>
<View style={styles.debugPanel}>
<Text style={styles.debugTitle}>Debug Info</Text>
<ScrollView style={styles.debugLog}>
{logs.map((logEntry, index) => (
<Text key={index} style={styles.logText}>{logEntry}</Text>
))}
</ScrollView>
</View>
</View>
</SafeAreaView>
);
};
const styles = StyleSheet.create({
safeArea: { flex: 1, backgroundColor: '#f0f0f0', padding: 20 },
container: { flex: 1, margin: 20 },
statusBar: { flexDirection: 'row', justifyContent: 'space-between', alignItems: 'center', padding: 10, backgroundColor: '#fff', borderRadius: 8, marginBottom: 20 },
status: { fontWeight: 'bold' },
controls: { flexDirection: 'row', gap: 10 },
debugPanel: { height: '80%', backgroundColor: '#fff', borderRadius: 8, padding: 20},
debugTitle: { fontSize: 16, fontWeight: 'bold' },
debugLog: { height: '100%', overflow: 'scroll', backgroundColor: '#f8f8f8', padding: 10, borderRadius: 4, fontFamily: 'monospace', fontSize: 12, lineHeight: 1.4 },
});
export default CallScreen;


@@ -0,0 +1,50 @@
# Bot ready signaling Server
A FastAPI server that manages bot instances and provides an endpoint for Pipecat client connections.
## Endpoints
- `POST /connect` - Pipecat client connection endpoint
## Environment Variables
Copy `env.example` to `.env` and configure:
```ini
# Required API Keys
DAILY_API_KEY= # Your Daily API key
CARTESIA_API_KEY= # Your Cartesia API key
# Optional Configuration
DAILY_API_URL= # Optional: Daily API URL (defaults to https://api.daily.co/v1)
DAILY_SAMPLE_ROOM_URL= # Optional: Fixed room URL for development
HOST= # Optional: Host address (defaults to 0.0.0.0)
FAST_API_PORT= # Optional: Port number (defaults to 7860)
```
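
The defaults listed above could be resolved in code roughly as follows. This is a minimal sketch, not the server's actual startup code: `load_server_config` is a hypothetical helper, and only the variable names and their documented defaults come from this README.

```python
import os
from typing import Mapping, Optional


def load_server_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve server settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Required keys: fail early with a clear message if any is missing or empty.
    missing = [name for name in ("DAILY_API_KEY", "CARTESIA_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "daily_api_key": env["DAILY_API_KEY"],
        "cartesia_api_key": env["CARTESIA_API_KEY"],
        # Optional values fall back to the defaults documented above.
        "daily_api_url": env.get("DAILY_API_URL") or "https://api.daily.co/v1",
        "daily_sample_room_url": env.get("DAILY_SAMPLE_ROOM_URL") or None,
        "host": env.get("HOST") or "0.0.0.0",
        "port": int(env.get("FAST_API_PORT") or "7860"),
    }
```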
## Running the Server
Set up and activate your virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
Install dependencies:
```bash
pip install -r requirements.txt
```
If you want to use the local version of `pipecat` in this repo rather than the last published version, also run:
```bash
pip install --editable "../../../[daily,cartesia,openai]"
```
Run the server:
```bash
python server.py
```
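
Once the server is running, a client can exercise `POST /connect` with a few lines of Python. The sketch below assumes the server is reachable at `http://localhost:7860` (the default port above) and that the response is a JSON object with `room_url` and `token` fields, as `server.py` in this change returns. `parse_connect_response` and `request_connection` are hypothetical helper names.

```python
import json
import urllib.request


def parse_connect_response(body: bytes) -> tuple[str, str]:
    """Extract (room_url, token) from the JSON body of POST /connect."""
    payload = json.loads(body)
    return payload["room_url"], payload["token"]


def request_connection(base_url: str = "http://localhost:7860") -> tuple[str, str]:
    """POST to /connect and return the room URL and token (base_url is an assumption)."""
    req = urllib.request.Request(
        f"{base_url}/connect",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_connect_response(resp.read())
```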


@@ -0,0 +1,3 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=
CARTESIA_API_KEY=


@@ -0,0 +1,4 @@
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,cartesia,openai]


@@ -0,0 +1,64 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
from typing import Optional
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
(url, token, _) = await configure_with_args(aiohttp_session)
return (url, token)
async def configure_with_args(
aiohttp_session: aiohttp.ClientSession, parser: Optional[argparse.ArgumentParser] = None
):
if not parser:
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. Use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. Use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token, args)
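
The lookup order used by `configure_with_args` (command-line flag first, then environment variable, then an error) can be isolated as below. This is a simplified sketch: `resolve_room_and_key` is a hypothetical function that takes `argv` and `env` explicitly so the behavior is easy to check, and it raises `ValueError` instead of a bare `Exception`.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_room_and_key(
    argv: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> tuple[str, str]:
    """Return (url, key), preferring CLI flags over environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="CLI/env fallback sketch")
    parser.add_argument("-u", "--url", type=str, required=False)
    parser.add_argument("-k", "--apikey", type=str, required=False)
    # parse_known_args tolerates flags this parser does not define,
    # so callers can pass extra options without causing an error.
    args, _unknown = parser.parse_known_args(list(argv))
    url = args.url or env.get("DAILY_SAMPLE_ROOM_URL")
    key = args.apikey or env.get("DAILY_API_KEY")
    if not url:
        raise ValueError("No Daily room specified")
    if not key:
        raise ValueError("No Daily API key specified")
    return url, key
```

Because unknown flags are tolerated, a caller that also passes `-t <token>` (as the server in this change does) would not break this parser.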


@@ -0,0 +1,147 @@
#
# Copyright (c) 2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import subprocess
from contextlib import asynccontextmanager
from typing import Any, Dict
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
# Load environment variables from .env file
load_dotenv(override=True)
# Dictionary to track bot processes: {pid: (process, room_url)}
bot_procs = {}
# Store Daily API helpers
daily_helpers = {}
def cleanup():
"""Cleanup function to terminate all bot processes.
Called during server shutdown.
"""
for entry in bot_procs.values():
proc = entry[0]
proc.terminate()
proc.wait()
@asynccontextmanager
async def lifespan(app: FastAPI):
"""FastAPI lifespan manager that handles startup and shutdown tasks.
- Creates aiohttp session
- Initializes Daily API helper
- Cleans up resources on shutdown
"""
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
# Initialize FastAPI app with lifespan manager
app = FastAPI(lifespan=lifespan)
# Configure CORS to allow requests from any origin
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
async def create_room_and_token() -> tuple[str, str]:
"""Helper function to create a Daily room and generate an access token.
Returns:
tuple[str, str]: A tuple containing (room_url, token)
Raises:
HTTPException: If room creation or token generation fails
"""
room = await daily_helpers["rest"].create_room(DailyRoomParams())
if not room.url:
raise HTTPException(status_code=500, detail="Failed to create room")
token = await daily_helpers["rest"].get_token(room.url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
return room.url, token
@app.post("/connect")
async def bot_connect(request: Request) -> Dict[Any, Any]:
"""Connect endpoint that creates a room and returns connection credentials.
This endpoint is called by the client to establish a connection.
Returns:
Dict[Any, Any]: Authentication bundle containing room_url and token
Raises:
HTTPException: If room creation, token generation, or bot startup fails
"""
print("Creating room for RTVI connection")
room_url, token = await create_room_and_token()
print(f"Room URL: {room_url}")
# Start the bot process
try:
bot_file = "signalling_bot"
proc = subprocess.Popen(
[f"python3 -m {bot_file} -u {room_url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
bot_procs[proc.pid] = (proc, room_url)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
# Return the authentication bundle in format expected by DailyTransport
return {"room_url": room_url, "token": token}
if __name__ == "__main__":
import uvicorn
# Parse command line arguments for server configuration
default_host = os.getenv("HOST", "0.0.0.0")
default_port = int(os.getenv("FAST_API_PORT", "7860"))
parser = argparse.ArgumentParser(description="Daily Travel Companion FastAPI server")
parser.add_argument("--host", type=str, default=default_host, help="Host address")
parser.add_argument("--port", type=int, default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true", help="Reload code on change")
config = parser.parse_args()
# Start the FastAPI server
uvicorn.run(
"server:app",
host=config.host,
port=config.port,
reload=config.reload,
)
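
The `/connect` handler above launches the bot through a shell with the room URL and token interpolated into one command string. One possible alternative, shown here only as a sketch, is to pass an argument list so no shell parsing is involved. `build_bot_command` and `start_bot` are hypothetical helpers, and using `sys.executable` instead of `python3` is an assumption about the desired interpreter.

```python
import subprocess
import sys


def build_bot_command(bot_module: str, room_url: str, token: str) -> list[str]:
    """Build the argv list for launching the bot module (hypothetical helper)."""
    # Each value is its own argv entry, so no shell quoting is involved.
    return [sys.executable, "-m", bot_module, "-u", room_url, "-t", token]


def start_bot(bot_module: str, room_url: str, token: str, cwd: str) -> subprocess.Popen:
    """Start the bot as a child process without going through a shell."""
    return subprocess.Popen(build_bot_command(bot_module, room_url, token), cwd=cwd)
```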


@@ -0,0 +1,95 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from dataclasses import dataclass
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import AudioRawFrame, EndFrame, OutputAudioRawFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
@dataclass
class SilenceFrame(OutputAudioRawFrame):
def __init__(
self,
*,
sample_rate: int,
duration: float,
):
# Initialize the parent class with the silent frame's data
super().__init__(
audio=self.create_silent_audio_frame(sample_rate, 1, duration).audio,
sample_rate=sample_rate,
num_channels=1,
)
@staticmethod
def create_silent_audio_frame(
sample_rate: int, num_channels: int, duration: float
) -> AudioRawFrame:
"""Create an AudioRawFrame containing silence."""
frame_size = num_channels * 2 # 2 bytes per sample for 16-bit audio
total_frames = int(sample_rate * duration)
total_bytes = total_frames * frame_size
silent_audio = bytes(total_bytes) # Create a byte array filled with zeros
return AudioRawFrame(audio=silent_audio, sample_rate=sample_rate, num_channels=num_channels)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when we receive a specific message
@transport.event_handler("on_app_message")
async def on_app_message(transport, message, sender):
logger.debug(f"Received app message: {message} - {sender}")
if "playable" not in message:
return
await task.queue_frames(
[
SilenceFrame(
sample_rate=task.params.audio_out_sample_rate,
duration=0.5,
),
TTSSpeakFrame("Hello there, how are you doing today?"),
EndFrame(),
]
)
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())
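
The byte arithmetic inside `create_silent_audio_frame` can be checked in isolation. The sketch below reproduces only that calculation for 16-bit PCM; `silence_bytes` is a hypothetical standalone function and does not depend on Pipecat.

```python
def silence_bytes(sample_rate: int, duration: float, num_channels: int = 1) -> bytes:
    """Return zero-filled 16-bit PCM audio of the given duration in seconds."""
    bytes_per_frame = num_channels * 2  # 2 bytes per sample for 16-bit audio
    total_frames = int(sample_rate * duration)
    return bytes(total_frames * bytes_per_frame)
```

For example, half a second of mono audio at 24000 Hz is 12000 frames, which is 24000 bytes of zeros.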


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -15,7 +15,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -65,7 +65,6 @@ async def main():
# English
#
voice_id="cgSgspJ2msm6clMCkdW9",
aiohttp_session=session,
#
# Spanish
#
@@ -97,7 +96,7 @@ async def main():
call completion, CanonicalMetrics will send the audio buffer to Canonical for
analysis. Visit https://voice.canonical.chat to learn more.
"""
audio_buffer_processor = AudioBufferProcessor()
audio_buffer_processor = AudioBufferProcessor(num_channels=2)
canonical = CanonicalMetricsService(
audio_buffer_processor=audio_buffer_processor,
aiohttp_session=session,
@@ -105,6 +104,7 @@ async def main():
call_id=str(uuid.uuid4()),
assistant="pipecat-chatbot",
assistant_speaks_first=True,
context=context,
)
pipeline = Pipeline(
[
@@ -119,21 +119,24 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await audio_buffer_processor.start_recording()
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.queue_frame(EndFrame())
await task.cancel()
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
# Here we don't want to cancel, we just want to finish sending
# whatever is queued, so we use an EndFrame().
await task.queue_frame(EndFrame())
runner = PipelineRunner()
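
The handlers above distinguish between cancelling the task and queueing an `EndFrame()` so that already queued work finishes first. The difference can be illustrated with plain `asyncio`. This is a toy analogy under stated assumptions, not Pipecat's implementation: `END`, `worker`, `graceful_stop`, and `immediate_stop` are hypothetical names.

```python
import asyncio

END = object()  # sentinel playing the role of an end-of-stream marker


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Process items in order until the END sentinel is reached."""
    while True:
        item = await queue.get()
        if item is END:
            return
        await asyncio.sleep(0)  # yield control, as real I/O would
        processed.append(item)


async def graceful_stop() -> list:
    """Queue END behind pending work: everything queued is still processed."""
    queue, processed = asyncio.Queue(), []
    for item in ("a", "b", "c", END):
        queue.put_nowait(item)
    await asyncio.create_task(worker(queue, processed))
    return processed


async def immediate_stop() -> list:
    """Cancel the worker task: pending items are dropped."""
    queue, processed = asyncio.Queue(), []
    for item in ("a", "b", "c"):
        queue.put_nowait(item)
    task = asyncio.create_task(worker(queue, processed))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return processed
```

In this analogy the sentinel corresponds to ending after the queue drains, while `task.cancel()` corresponds to stopping right away when the participant has already left.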


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -53,4 +53,3 @@ async def configure(aiohttp_session: aiohttp.ClientSession):
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)
return (url, token)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -18,7 +18,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -83,7 +82,6 @@ async def main():
# English
#
voice_id="cgSgspJ2msm6clMCkdW9",
aiohttp_session=session,
#
# Spanish
#
@@ -110,8 +108,9 @@ async def main():
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# Save audio every 10 seconds.
audiobuffer = AudioBufferProcessor(buffer_size=480000)
# NOTE: Watch out! This will save all the conversation in memory. You
# can pass `buffer_size` to get periodic callbacks.
audiobuffer = AudioBufferProcessor()
pipeline = Pipeline(
[
@@ -125,7 +124,7 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
@@ -133,13 +132,14 @@ async def main():
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await audiobuffer.start_recording()
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.queue_frame(EndFrame())
await task.cancel()
runner = PipelineRunner()
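
The removed `buffer_size=480000` and its "every 10 seconds" comment are consistent if the buffer is counted in bytes of 16-bit mono audio at 24 kHz. Both the unit and the sample rate are assumptions here, since the diff does not state them. Under those assumptions, the relationship is:

```python
def buffer_duration_seconds(
    buffer_size: int, sample_rate: int, num_channels: int = 1, bytes_per_sample: int = 2
) -> float:
    """Seconds of PCM audio that fit in buffer_size bytes."""
    bytes_per_second = sample_rate * num_channels * bytes_per_sample
    return buffer_size / bytes_per_second
```

With the assumed 24000 Hz mono input, 480000 bytes is 10 seconds. At 16000 Hz the same buffer would hold 15 seconds, so the value has to be chosen together with the sample rate.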


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -7,7 +7,7 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -70,20 +70,22 @@ async def main(room_url: str, token: str):
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await task.cancel()
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
# Here we don't want to cancel, we just want to finish sending
# whatever is queued, so we use an EndFrame().
await task.queue_frame(EndFrame())
runner = PipelineRunner()
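
The switch from `PipelineTask(pipeline, PipelineParams(...))` to `PipelineTask(pipeline, params=PipelineParams(...))` follows the "force constructor keyword arguments" commit. The real signature is not shown in this diff, so the sketch below uses stand-in classes (`ExampleParams` and `ExampleTask`, both hypothetical) to show how a bare `*` in a Python signature produces this requirement.

```python
from dataclasses import dataclass


@dataclass
class ExampleParams:
    """Stand-in for a params object; not the real PipelineParams."""

    allow_interruptions: bool = False


class ExampleTask:
    """Stand-in for a task whose options must be passed by keyword."""

    def __init__(self, pipeline, *, params=None):
        # The bare `*` makes every following parameter keyword-only.
        self.pipeline = pipeline
        self.params = params if params is not None else ExampleParams()
```

With this signature, `ExampleTask(pipeline, ExampleParams())` raises `TypeError`, while `ExampleTask(pipeline, params=ExampleParams())` works. That is why the call sites in these examples gain an explicit `params=`.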


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -5,6 +5,15 @@ import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
@@ -12,16 +21,6 @@ logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token: str):
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
transport = DailyTransport(
room_url,
token,
@@ -63,7 +62,7 @@ async def main(room_url: str, token: str):
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -75,11 +74,11 @@ async def main(room_url: str, token: str):
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await task.cancel()
runner = PipelineRunner()


@@ -1,5 +1,5 @@
python-dotenv==1.0.1
modal==0.65.48
pipecat-ai[daily,silero,cartesia,openai]==0.0.48
fastapi==0.115.4
aiohttp==3.11.9
modal==0.71.3
pipecat-ai[daily,silero,cartesia,openai]==0.0.52
fastapi==0.115.6
aiohttp==3.11.11


@@ -1,85 +0,0 @@
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="image.png">
</div>
# Dialin example
Example project that demonstrates how to add phone number dialin to your Pipecat bots. We include examples for both Daily (`bot_daily.py`) and Twilio (`bot_twilio.py`), depending on who you want to use as a phone vendor.
- 🔁 Transport: Daily WebRTC
- 💬 Speech-to-Text: Deepgram via Daily transport
- 🤖 LLM: GPT-4o / OpenAI
- 🔉 Text-to-Speech: ElevenLabs
#### Should I use Daily or Twilio as a vendor?
If you're starting from scratch, using Daily to provision phone numbers alongside Daily as a transport offers some convenience (such as automatic call forwarding).
If you already have Twilio numbers and workflows that you want to connect to your Pipecat bots, there is some additional configuration required (you'll need to create an `on_dialin_ready` handler and use the Twilio client to trigger the forward).
You can read more about this, as well as see respective walkthroughs in our docs.
## Setup
```shell
# Install the requirements
pip install -r requirements.txt
# Set up your env
mv env.example .env
```
## Using Daily numbers
Run `bot_runner.py` to handle incoming HTTP requests:
`python bot_runner.py --host localhost`
Then target the following URL:
`POST /daily_start_bot`
For more configuration options, please consult Daily's API documentation.
## Using Twilio numbers
As above, but target the following URL:
`POST /twilio_start_bot`
For more configuration options, please consult Twilio's API documentation.
## Deployment example
A Dockerfile is included in this demo for convenience. Here is an example of how to build and deploy your bot to [fly.io](https://fly.io).
*Please note: This demo spawns agents as subprocesses for convenience / demonstration purposes. You would likely not want to do this in production as it would limit concurrency to available system resources. For more information on how to deploy your bots using VMs, refer to the Pipecat documentation.*
### Build the docker image
`docker build -t tag:project .`
### Launch the fly project
`mv fly.example.toml fly.toml`
`fly launch` (using the included fly.toml)
### Set up your secrets on Fly
Set the necessary secrets (found in `env.example`)
`fly secrets set DAILY_API_KEY=... OPENAI_API_KEY=... ELEVENLABS_API_KEY=... ELEVENLABS_VOICE_ID=...`
If you're using Twilio as a number vendor:
`fly secrets set TWILIO_ACCOUNT_SID=... TWILIO_AUTH_TOKEN=...`
### Deploy!
`fly deploy`
## Need to do something more advanced?
This demo covers the basics of bot telephony. If you want to know more about working with PSTN / SIP, please ping us on [Discord](https://discord.gg/pipecat).


@@ -1,103 +0,0 @@
import argparse
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyDialinSettings, DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
async def main(room_url: str, token: str, callId: str, callDomain: str):
# diallin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
diallin_settings = DailyDialinSettings(call_id=callId, call_domain=callDomain)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
dialin_settings=diallin_settings,
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Oh, hello! Who dares dial me at this hour?!'.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-d", type=str, help="Call Domain")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.d))


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -16,8 +16,7 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
load_dotenv(override=True)
@@ -26,7 +25,7 @@ logger.add(sys.stderr, level="DEBUG")
async def main():
transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
transport = LocalAudioTransport(LocalAudioTransportParams(audio_out_enabled=True))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
@@ -41,7 +40,7 @@ async def main():
await asyncio.sleep(1)
await task.queue_frames([TTSSpeakFrame("Hello there, how is it going!"), EndFrame()])
runner = PipelineRunner()
runner = PipelineRunner(handle_sigint=False if sys.platform == "win32" else True)
await asyncio.gather(runner.run(task), say_something())


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -13,7 +13,7 @@ from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
@@ -53,7 +53,7 @@ async def main():
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await task.cancel()
await runner.run(task)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -18,8 +18,7 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
load_dotenv(override=True)
@@ -34,7 +33,9 @@ async def main():
transport = TkLocalTransport(
tk_root,
TransportParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
TkTransportParams(
camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024
),
)
imagegen = FalImageGenService(


@@ -0,0 +1,65 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.google import GoogleImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Show a still frame image",
DailyParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
)
imagegen = GoogleImageGenService(
api_key=os.getenv("GOOGLE_API_KEY"),
)
runner = PipelineRunner()
task = PipelineTask(
Pipeline([imagegen, transport.output()]),
params=PipelineParams(enable_metrics=True),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await task.queue_frame(TextFrame("a cat in the style of picasso"))
await task.queue_frame(TextFrame("a dog in the style of picasso"))
await task.queue_frame(TextFrame("a fish in the style of picasso"))
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -51,7 +51,6 @@ async def main():
)
elevenlabs_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -30,8 +30,7 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport, TkOutputTransport
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
load_dotenv(override=True)
@@ -152,7 +151,7 @@ async def main():
transport = TkLocalTransport(
tk_root,
TransportParams(
TkTransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,7 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, LLMMessagesFrame, MetricsFrame
from pipecat.frames.frames import Frame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
ProcessingMetricsData,
@@ -38,6 +38,8 @@ logger.add(sys.stderr, level="DEBUG")
class MetricsLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, MetricsFrame):
for d in frame.data:
if isinstance(d, TTFBMetricsData):
@@ -47,9 +49,7 @@ class MetricsLogger(FrameProcessor):
elif isinstance(d, LLMUsageMetricsData):
tokens = d.value
print(
f"!!! MetricsFrame: {frame}, tokens: {
tokens.prompt_tokens}, characters: {
tokens.completion_tokens}"
f"!!! MetricsFrame: {frame}, tokens: {tokens.prompt_tokens}, characters: {tokens.completion_tokens}"
)
elif isinstance(d, TTSUsageMetricsData):
print(f"!!! MetricsFrame: {frame}, characters: {d.value}")
@@ -105,7 +105,10 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(enable_metrics=True, enable_usage_metrics=True),
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_first_participant_joined")
@@ -113,7 +116,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
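
Note on the `params=` change in this hunk: the task constructor is now called with `params=PipelineParams(...)` instead of a second positional argument. A minimal sketch of how a Python constructor can enforce that, assuming a bare `*` in the signature. `SketchTask` and `SketchParams` are hypothetical stand-ins, not Pipecat's classes, and whether `PipelineTask` uses exactly this mechanism is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchParams:
    """Hypothetical stand-in for a params object; not Pipecat's PipelineParams."""

    allow_interruptions: bool = False
    enable_metrics: bool = False


class SketchTask:
    """Hypothetical stand-in for a task whose options are keyword-only."""

    def __init__(self, pipeline, *, params: Optional[SketchParams] = None):
        # The bare `*` makes every parameter after it keyword-only.
        self.pipeline = pipeline
        self.params = params or SketchParams()


# Keyword form: accepted.
task = SketchTask(["stage"], params=SketchParams(enable_metrics=True))

# Positional form: rejected with a TypeError at call time.
try:
    SketchTask(["stage"], SketchParams(enable_metrics=True))
    positional_accepted = True
except TypeError:
    positional_accepted = False
```

With a signature like this, old call sites that pass the params object positionally fail loudly instead of silently binding to the wrong parameter, which is why every example in the diff switches to the keyword form.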

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -15,13 +15,19 @@ from PIL import Image
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, OutputImageRawFrame, SystemFrame, TextFrame
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
OutputImageRawFrame,
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -45,7 +51,7 @@ class ImageSyncAggregator(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if not isinstance(frame, SystemFrame) and direction == FrameDirection.DOWNSTREAM:
if isinstance(frame, BotStartedSpeakingFrame):
await self.push_frame(
OutputImageRawFrame(
image=self._speaking_image_bytes,
@@ -53,7 +59,8 @@ class ImageSyncAggregator(FrameProcessor):
format=self._speaking_image_format,
)
)
await self.push_frame(frame)
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(
OutputImageRawFrame(
image=self._waiting_image_bytes,
@@ -61,8 +68,8 @@ class ImageSyncAggregator(FrameProcessor):
format=self._waiting_image_format,
)
)
else:
await self.push_frame(frame)
await self.push_frame(frame)
async def main():
@@ -84,7 +91,7 @@ async def main():
),
)
tts = CartesiaHttpTTSService(
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
@@ -109,16 +116,24 @@ async def main():
pipeline = Pipeline(
[
transport.input(),
image_sync_aggregator,
context_aggregator.user(),
llm,
tts,
image_sync_aggregator,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(pipeline)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
@@ -126,6 +141,10 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
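
Note on the `ImageSyncAggregator` change in this file: the processor now appears to switch images only when the bot starts or stops speaking, and to forward every incoming frame unchanged. A minimal sketch of that control flow, assuming plain Python objects in place of frames. `SpeakingStarted`, `SpeakingStopped`, and `ImageSwitcher` are hypothetical names used only for illustration.

```python
class SpeakingStarted:
    """Hypothetical stand-in for a 'bot started speaking' event."""


class SpeakingStopped:
    """Hypothetical stand-in for a 'bot stopped speaking' event."""


class ImageSwitcher:
    """Hypothetical processor: emits an image on state changes, forwards every event."""

    def __init__(self, speaking_image: str, waiting_image: str):
        self.speaking_image = speaking_image
        self.waiting_image = waiting_image
        self.output = []

    def process(self, event) -> None:
        if isinstance(event, SpeakingStarted):
            self.output.append(self.speaking_image)
        elif isinstance(event, SpeakingStopped):
            self.output.append(self.waiting_image)
        # The triggering event is always passed along, whatever its type.
        self.output.append(event)


switcher = ImageSwitcher("speaking.png", "waiting.png")
started, audio, stopped = SpeakingStarted(), "audio-chunk", SpeakingStopped()
for event in (started, audio, stopped):
    switcher.process(event)
```

In this sketch an image is emitted once per state change rather than around every event, which matches the move of the aggregator to after `tts` in the pipeline list.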

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -13,7 +13,6 @@ from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -77,7 +76,7 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -90,7 +89,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -75,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -88,7 +87,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -78,13 +77,25 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -101,7 +101,15 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
@@ -114,6 +122,10 @@ async def main():
messages = [({"content": "Please briefly introduce yourself to the user."})]
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -16,7 +16,6 @@ from runner import configure
from pipecat.frames.frames import (
BotInterruptionFrame,
LLMMessagesFrame,
StopInterruptionFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
@@ -80,7 +79,15 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@stt.event_handler("on_speech_started")
async def on_speech_started(stt, *args, **kwargs):
@@ -94,7 +101,11 @@ async def main():
async def on_first_participant_joined(transport, participant):
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -73,13 +72,25 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -75,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -88,7 +87,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,12 +14,12 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService, OpenAITTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.playht import PlayHTHttpTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -38,14 +38,17 @@ async def main():
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="alloy")
tts = PlayHTHttpTTSService(
user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
@@ -70,14 +73,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -78,7 +77,7 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -91,7 +90,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -82,14 +81,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -0,0 +1,109 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import os
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService, OpenAISTTService, OpenAITTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                audio_out_sample_rate=24000,
                transcription_enabled=False,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
            ),
        )

        # You can use the OpenAI compatible API like Groq.
        # stt = OpenAISTTService(
        #     base_url="https://api.groq.com/openai/v1",
        #     api_key="gsk_***",
        #     model="whisper-large-v3",
        # )
        stt = OpenAISTTService(api_key=os.getenv("OPENAI_API_KEY"), model="whisper-1")

        tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="alloy")

        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                stt,  # STT
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(
            pipeline,
            params=PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.cancel()

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -15,7 +15,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -80,14 +79,26 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -74,14 +73,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -79,19 +78,27 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
# Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await task.cancel()
runner = PipelineRunner()
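
Note on the `on_participant_left` change in this hunk: the handler moves from queueing an `EndFrame` to calling `task.cancel()`. The general difference between the two shutdown styles can be sketched with plain `asyncio`, assuming a queue-driven worker. `STOP`, `worker`, `graceful_stop`, and `immediate_cancel` are hypothetical names, and the mapping onto Pipecat's behaviour (a queued end marker drains pending work, a cancel does not) is an assumption rather than something the diff states.

```python
import asyncio

STOP = object()  # hypothetical sentinel, playing the role of an end-of-stream frame


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Consume items in order until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is STOP:
            return
        await asyncio.sleep(0)  # yield to the event loop, as real work would
        processed.append(item)


async def graceful_stop() -> list:
    """Queue the sentinel behind pending work; everything queued earlier still runs."""
    queue, processed = asyncio.Queue(), []
    for item in ("a", "b", "c"):
        queue.put_nowait(item)
    queue.put_nowait(STOP)
    await worker(queue, processed)
    return processed


async def immediate_cancel() -> list:
    """Cancel the worker task right away; pending work is dropped."""
    queue, processed = asyncio.Queue(), []
    for item in ("a", "b", "c"):
        queue.put_nowait(item)
    task = asyncio.create_task(worker(queue, processed))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return processed


graceful_result = asyncio.run(graceful_stop())
cancelled_result = asyncio.run(immediate_cancel())
```

When the last participant has already left there is nobody to hear any remaining output, so stopping without draining is a reasonable default for these examples.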

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -39,7 +38,6 @@ async def main():
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
@@ -71,14 +69,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -89,8 +88,11 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True, enable_metrics=True, enable_usage_metrics=True
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@@ -98,7 +100,11 @@ async def main():
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -80,14 +79,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,14 +14,12 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.google import GoogleTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.google import GoogleLLMService, GoogleSTTService, GoogleTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -40,21 +38,22 @@ async def main():
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = GoogleTTSService(
voice_id="en-US-Neural2-J",
params=GoogleTTSService.InputParams(language="en-US", rate="1.05"),
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = GoogleTTSService(
voice_id="en-US-Journey-F",
params=GoogleTTSService.InputParams(language=Language.EN_US),
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
messages = [
{
@@ -78,14 +77,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -79,14 +78,26 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,14 +14,10 @@ from loguru import logger
from runner import configure
from pipecat.audio.filters.krisp_filter import KrispFilter
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -63,28 +59,40 @@ async def main():
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,13 +14,12 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.services.rime import RimeHttpTTSService
from pipecat.services.rime import RimeTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -45,10 +44,9 @@ async def main():
),
)
tts = RimeHttpTTSService(
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
params=RimeHttpTTSService.InputParams(reduce_latency=True),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
@@ -76,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -89,7 +87,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -75,13 +74,17 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -216,11 +216,7 @@ async def main():
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = GoogleLLMService(
model="gemini-1.5-flash-latest",
# model="gemini-exp-1114",
api_key=os.getenv("GOOGLE_API_KEY"),
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
messages = [
{
@@ -255,7 +251,7 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -268,6 +264,10 @@ async def main():
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -75,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
PipelineParams(
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -88,7 +87,11 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()

View File

@@ -48,7 +48,6 @@ async def main():
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts2 = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -21,7 +21,7 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -61,7 +61,6 @@ async def main():
"Test",
DailyParams(
audio_in_enabled=True,
audio_in_sample_rate=24000,
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
@@ -78,7 +77,13 @@ async def main():
runner = PipelineRunner()
task = PipelineTask(pipeline)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=24000,
audio_out_sample_rate=24000,
),
)
await runner.run(task)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -22,10 +22,9 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -62,12 +61,12 @@ async def main():
tk_root.title("Local Mirror")
daily_transport = DailyTransport(
room_url, token, "Test", DailyParams(audio_in_enabled=True, audio_in_sample_rate=24000)
room_url, token, "Test", DailyParams(audio_in_enabled=True)
)
tk_transport = TkLocalTransport(
tk_root,
TransportParams(
TkTransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
@@ -82,7 +81,13 @@ async def main():
pipeline = Pipeline([daily_transport.input(), MirrorProcessor(), tk_transport.output()])
task = PipelineTask(pipeline)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=24000,
audio_out_sample_rate=24000,
),
)
async def run_tk():
while not task.has_finished():

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -76,7 +76,7 @@ async def main():
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
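
Note on the `params=` change above: the hunks in this diff consistently move `PipelineParams` from a positional to a keyword argument, matching the commit "PipelineTask: force constructor keyword arguments". The snippet below is a minimal sketch of how a constructor can enforce this with a bare `*` in its signature. `SketchTask` and `SketchParams` are hypothetical stand-ins invented for illustration; they are not pipecat classes, and the real `PipelineTask` signature may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchParams:
    """Hypothetical stand-in for a params object (not a pipecat class)."""

    allow_interruptions: bool = False


class SketchTask:
    """Hypothetical task whose options must be passed by keyword."""

    def __init__(self, pipeline, *, params: Optional[SketchParams] = None):
        # Everything after the bare `*` is keyword-only.
        self.pipeline = pipeline
        self.params = params or SketchParams()


# Keyword form: accepted.
task = SketchTask(["stage"], params=SketchParams(allow_interruptions=True))

# Positional form: rejected by Python itself with a TypeError.
try:
    SketchTask(["stage"], SketchParams(allow_interruptions=True))
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```

With a signature like this, older call sites that pass the params object positionally fail immediately at construction time, which is why each example in the diff needs the one-line update.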

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -7,6 +7,7 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -32,7 +33,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
def __init__(self, participant_id: Optional[str] = None):
super().__init__()
self._participant_id = participant_id
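
Note on the `Optional[str]` change above: the `str | None` union syntax in an annotation that is evaluated at definition time needs Python 3.10 or newer, while `typing.Optional[str]` also works on older interpreters. Presumably that is the motivation here, although the diff does not say so. The sketch below uses a hypothetical `SketchRequester` class (not the example's `UserImageRequester`, which derives from `FrameProcessor`) to show that the two spellings describe the same type.

```python
from typing import Optional, Union, get_type_hints


class SketchRequester:
    """Hypothetical stand-in that keeps only the constructor signature."""

    def __init__(self, participant_id: Optional[str] = None):
        self._participant_id = participant_id


# Optional[str] is shorthand for Union[str, None].
hints = get_type_hints(SketchRequester.__init__)
same_type = hints["participant_id"] == Union[str, None]

# The default still applies when no participant is given.
default_requester = SketchRequester()
named_requester = SketchRequester("participant-123")
```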

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -7,6 +7,7 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -32,7 +33,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
def __init__(self, participant_id: Optional[str] = None):
super().__init__()
self._participant_id = participant_id
@@ -72,9 +73,7 @@ async def main():
vision_aggregator = VisionImageFrameAggregator()
google = GoogleLLMService(
model="gemini-1.5-flash-latest", api_key=os.getenv("GOOGLE_API_KEY")
)
google = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -7,6 +7,7 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -32,7 +33,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
def __init__(self, participant_id: Optional[str] = None):
super().__init__()
self._participant_id = participant_id

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -7,6 +7,7 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -32,7 +33,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
def __init__(self, participant_id: Optional[str] = None):
super().__init__()
self._participant_id = participant_id

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -16,8 +16,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
load_dotenv(override=True)
@@ -34,7 +33,7 @@ class TranscriptionLogger(FrameProcessor):
async def main():
transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))
transport = LocalAudioTransport(LocalAudioTransportParams(audio_in_enabled=True))
stt = WhisperSTTService()
@@ -44,7 +43,7 @@ async def main():
task = PipelineTask(pipeline)
runner = PipelineRunner()
runner = PipelineRunner(handle_sigint=False if sys.platform == "win32" else True)
await runner.run(task)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024, Daily
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

Some files were not shown because too many files have changed in this diff.