Compare commits

...

217 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
86ed485711 Merge pull request #3440 from pipecat-ai/changelog-0.0.99
Release 0.0.99 - Changelog Update
2026-01-13 17:02:41 -08:00
Aleix Conchillo Flaqué
7e1b4a4e90 update cosmetic changelog updates for 0.0.99 2026-01-13 16:59:46 -08:00
aconchillo
4531d517da Update changelog for version 0.0.99 2026-01-14 00:49:15 +00:00
Aleix Conchillo Flaqué
6fd5847f84 Merge pull request #3439 from pipecat-ai/aleix/uv-lock-2026-01-13
uv.lock: upgrade to latest versions
2026-01-13 16:48:07 -08:00
Aleix Conchillo Flaqué
2015eba9b2 uv.lock: upgrade to latest versions 2026-01-13 16:45:44 -08:00
Mark Backman
84f16ee895 Merge pull request #3438 from pipecat-ai/mb/fix-26a
Fix 26a foundational
2026-01-13 19:43:50 -05:00
Aleix Conchillo Flaqué
5b2af03b16 Merge pull request #3437 from pipecat-ai/aleix/update-aggregator-logs
LLMContextAggregatorPair: make strategy logs less verbose
2026-01-13 16:39:29 -08:00
Mark Backman
b313395dc3 Fix 26a foundational 2026-01-13 19:31:24 -05:00
Aleix Conchillo Flaqué
0d6bdbee10 LLMContextAggregatorPair: make strategy logs less verbose 2026-01-13 15:11:22 -08:00
Aleix Conchillo Flaqué
248dac3a9d Merge pull request #3420 from pipecat-ai/pk/fix-gemini-3-parallel-function-calls
Fix parallel function calling with Gemini 3.
2026-01-13 14:40:33 -08:00
Paul Kompfner
be49a54856 Fast-exit in the fix for parallel function calling with Gemini 3, if we can determine up-front that there's no work to do 2026-01-13 17:32:20 -05:00
Aleix Conchillo Flaqué
bd9ee0d646 Merge pull request #3434 from pipecat-ai/aleix/context-appregator-pair-tuple
context aggregator pair tuple
2026-01-13 14:12:51 -08:00
Mark Backman
442e0e582d Merge pull request #3431 from pipecat-ai/mb/update-realtime-examples-transcript-handler
Update GeminiLiveLLMService to push thought frames, update 26a for new transcript events
2026-01-13 17:10:40 -05:00
kompfner
38194c0cff Merge pull request #3436 from pipecat-ai/pk/remove-transcript-processor-reference
Remove dead import of `TranscriptProcessor` (which is now deprecated)
2026-01-13 17:06:17 -05:00
Paul Kompfner
0ebdaba03c Remove dead import of TranscriptProcessor (which is now deprecated) 2026-01-13 17:02:57 -05:00
Aleix Conchillo Flaqué
ee82377d68 examples: fix 22d to push some CancelFrame and EndFrame 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
861588e4a3 examples: update all examples to use the new LLMContextAggregatorPair tuple 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
1ab3bf2ef6 LLMContextAggregatorPair: instances can now return a tuple 2026-01-13 14:01:53 -08:00
Mark Backman
bb00d223c9 Update 26a to use context aggregator transcription events 2026-01-13 17:01:10 -05:00
Aleix Conchillo Flaqué
86fbfaddd1 Merge pull request #3435 from pipecat-ai/aleix/fix-llm-context-create-audio-message
LLMContext: fix create_audio_message
2026-01-13 13:59:28 -08:00
Aleix Conchillo Flaqué
5612bf513b LLMContext: fix create_audio_message 2026-01-13 13:53:34 -08:00
Mark Backman
87d0dc9e24 Merge pull request #3412 from pipecat-ai/mb/remove-41a-b
Remove foundational examples 41a and 41b
2026-01-13 16:45:26 -05:00
Paul Kompfner
30fbcfbf71 Rework fix for parallel function calling with Gemini 3 2026-01-13 16:33:59 -05:00
Mark Backman
5d90f4ea06 Merge pull request #3428 from pipecat-ai/mb/fix-tracing-none-values
Fix TTS, realtime LLM services could return unknown for model_name
2026-01-13 15:40:10 -05:00
kompfner
f6d09e1574 Merge pull request #3430 from pipecat-ai/pk/request-image-frame-fixes
Fix request_image_frame and usage
2026-01-13 15:36:44 -05:00
Mark Backman
b8e48dee7f Merge pull request #3433 from pipecat-ai/mb/port-realtime-examples-transcript-events
Update examples to use transcription events from context aggregators
2026-01-13 15:36:06 -05:00
Mark Backman
a6ccb9ec69 Merge pull request #3427 from pipecat-ai/mb/add-07j-gladia-vad-example
Add 07j Gladia VAD foundational example, add to release evals
2026-01-13 15:35:24 -05:00
Mark Backman
66551ebdf5 Merge pull request #3426 from pipecat-ai/mb/changelog-3404
Add changelog fragments for PR 3404
2026-01-13 15:34:58 -05:00
Aleix Conchillo Flaqué
21534f7d83 added changelog file for #3430 2026-01-13 12:21:22 -08:00
Mark Backman
d591f9e108 Remove 28-transcription-processor.py 2026-01-13 15:20:59 -05:00
Mark Backman
aa2589d3be Update examples to use transcription events from context aggregators 2026-01-13 15:19:47 -05:00
Aleix Conchillo Flaqué
9d6067fa78 examples(foundational): speak "Let me check on that" in 14d examples 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
027e54425a examples(foundational): associate image requests to function calls 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
e268c73c41 LLMAssistantAggregator: cache function call requested images 2026-01-13 12:10:08 -08:00
Aleix Conchillo Flaqué
d3c57e2da0 UserImageRawFrame: don't deprecate request field 2026-01-13 11:56:13 -08:00
Aleix Conchillo Flaqué
02eace5a16 UserImageRequestFrame: don't deprecate function call related fields 2026-01-13 11:55:55 -08:00
Mark Backman
15bc1dd999 Update GeminiLiveLLMService to push Thought frames when thought content is returned 2026-01-13 14:13:00 -05:00
Paul Kompfner
b937956dc8 Fix request_image_frame and usage 2026-01-13 13:23:01 -05:00
Mark Backman
efbc0c8510 Fix TTS, realtime LLM services could return unknown for model_name 2026-01-13 12:12:15 -05:00
Himanshu Gunwant
d0f227189c fix: openai llm model name is unknown (#3422) 2026-01-13 11:55:52 -05:00
Mark Backman
41eef5efc4 Add 07j Gladia VAD foundational example, add to release evals 2026-01-13 11:36:15 -05:00
Mark Backman
f00f9d9f1a Add changelog fragments for PR 3404 2026-01-13 11:29:17 -05:00
Mark Backman
ae59b3ba36 Merge pull request #3404 from poseneror/feature/gladia-vad-events
feat(gladia): add VAD events support
2026-01-13 11:26:56 -05:00
Paul Kompfner
6668712f7b Add evals for parallel function calling 2026-01-13 11:03:38 -05:00
Paul Kompfner
8812686b17 Fix parallel function calling with Gemini 3.
Gemini expects parallel function calls to be passed in as a single multi-part `Content` block. This is important because only one of the function calls in a batch of parallel function calls gets a thought signature—if they're passed in as separate `Content` blocks, there'd be one or more missing thought signatures, which would result in a Gemini error.
2026-01-13 11:03:38 -05:00
kompfner
8b0f0b5bb4 Merge pull request #3425 from pipecat-ai/pk/gemini-3-flash-new-thinking-levels
Add Gemini 3 Flash-specific thinking levels
2026-01-13 11:02:53 -05:00
Paul Kompfner
f5e8a04e3b Bump aiortc dependency, which relaxes the constraint on av, which was pinned to 14.4.0, which no longer has all necessary wheels 2026-01-13 10:50:08 -05:00
Mark Backman
a298ce3b41 Merge pull request #3424 from pipecat-ai/mb/tts-append-trailing-space
Add append_trailing_space to TTSService to prevent vocalizing trailin…
2026-01-13 10:42:40 -05:00
Mark Backman
31daa889e8 Add append_trailing_space to TTSService to prevent vocalizing trailing punctuation; update DeepgramTTSService and RimeTTSService to use the arg 2026-01-13 10:38:54 -05:00
Paul Kompfner
76a058178e Add Gemini 3 Flash-specific thinking levels 2026-01-13 09:50:59 -05:00
poseneror
3304b18ac2 Add should_interrupt + broadcast user events 2026-01-13 14:27:35 +02:00
poseneror
b95a6afe77 feat(gladia): add VAD events support
Add support for Gladia's speech_start/speech_end events to emit
UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames.

When enable_vad=True in GladiaInputParams:
- speech_start triggers interruption and pushes UserStartedSpeakingFrame
- speech_end pushes UserStoppedSpeakingFrame
- Tracks speaking state to prevent duplicate events

This allows using Gladia's built-in VAD instead of a separate VAD
in the pipeline.
2026-01-13 14:27:35 +02:00
Mark Backman
f6ed7d7582 Merge pull request #3418 from pipecat-ai/mb/speechmatics-task-cleanup 2026-01-12 19:24:56 -05:00
Mark Backman
cd3290df1c Small cleanup for task creation in SpeechmaticsSTTService 2026-01-12 16:00:32 -05:00
Mark Backman
2296caf529 Merge pull request #3414 from pipecat-ai/mb/changelog-3410
Update changelog for PR 3410.changed.md
2026-01-12 13:43:42 -05:00
Mark Backman
90ded6658d Merge pull request #3403 from pipecat-ai/mb/inworld-tts-add-keepalive
InworldTTSService: Add keepalive task
2026-01-12 13:31:24 -05:00
Mark Backman
7e97fb80a5 Merge pull request #3392 from pipecat-ai/mb/websocket-service-connection-closed-error
Add reconnect logic to WebsocketService in the event of ConnectionClo…
2026-01-12 13:11:43 -05:00
Mark Backman
b58471fdb1 Add Exotel and Vonage to Serializers in README services list 2026-01-12 12:24:56 -05:00
Aleix Conchillo Flaqué
46b4f9f29b Merge pull request #3413 from pipecat-ai/aleix/fix-assistant-thought-aggregation
LLMAssistantAggregator: reset aggregation after adding the thought, not before
2026-01-12 09:21:42 -08:00
Aleix Conchillo Flaqué
ec20d72aba LLMAssistantAggregator: reset aggregation after adding the thought, not before 2026-01-12 09:18:13 -08:00
Mark Backman
5743e2a99b Update changelog for PR 3410.changed.md 2026-01-12 12:15:40 -05:00
Mark Backman
2f429a2e76 Merge pull request #3410 from Vonage/feat/fastapi-ws-vonage-serializer
feat: update FastAPI WebSocket transport and add Vonage serializer
2026-01-12 12:10:57 -05:00
Varun Pratap Singh
3e982f7a4a refactor: rename audio_packet_bytes to fixed_audio_packet_size 2026-01-12 22:11:39 +05:30
Mark Backman
89484e281d Remove foundational examples 41a and 41b 2026-01-12 10:11:58 -05:00
Varun Pratap Singh
14a115f372 changelog: add fragments for PR #3410 2026-01-12 18:12:27 +05:30
Varun Pratap Singh
e96595fe59 feat: update FastAPI WebSocket transport and add Vonage serializer 2026-01-12 17:50:38 +05:30
Mark Backman
f58d21862b WebsocketService: Add _maybe_try_reconnect and use for exception cases 2026-01-11 16:43:37 -05:00
Mark Backman
aac24ad2d4 InworldTTSService: Add keepalive task 2026-01-10 11:20:20 -05:00
Aleix Conchillo Flaqué
1df9575e20 Merge pull request #3400 from pipecat-ai/aleix/ensure-bot-speaking-flag-is-set
BaseOutputTransport: ensure bot speaking flag is set on time
2026-01-10 07:34:26 -08:00
Aleix Conchillo Flaqué
64609fe80f BaseOutputTransport: ensure bot speaking flag is set on time 2026-01-09 20:40:25 -08:00
Aleix Conchillo Flaqué
533a54e111 Merge pull request #3399 from pipecat-ai/aleix/groq-switch-orpheus
GroqTTSService: switch to canopylabs/orpheus-v1-english
2026-01-09 20:39:56 -08:00
Aleix Conchillo Flaqué
b59c3eb470 GroqTTSService: switch to canopylabs/orpheus-v1-english 2026-01-09 18:14:48 -08:00
Aleix Conchillo Flaqué
0366fc35cb Merge pull request #3398 from pipecat-ai/aleix/examples-foundational-fix-49c-transport
examples(foundational): add missing transport.output() to 49c
2026-01-09 17:39:54 -08:00
Aleix Conchillo Flaqué
d86ff4b1ee Merge pull request #3397 from pipecat-ai/aleix/add-setup-pipeline-task
PipelineTask: add external pipeline task setup files
2026-01-09 17:39:25 -08:00
Aleix Conchillo Flaqué
f8040324e1 Merge pull request #3396 from pipecat-ai/aleix/add-pipeline-task-pipeline-property
PipelineTask: add pipeline property
2026-01-09 17:38:51 -08:00
Mark Backman
9c81acb159 Track websocket disconnecting status to improve error handling 2026-01-09 20:24:07 -05:00
Aleix Conchillo Flaqué
65395b1112 examples(foundational): add missing transport.output() to 49c 2026-01-09 16:44:04 -08:00
Aleix Conchillo Flaqué
d2696be03b PipelineTask: add external pipeline task setup files 2026-01-09 16:42:27 -08:00
Aleix Conchillo Flaqué
2da4d420f9 PipelineTask: add pipeline property 2026-01-09 15:47:02 -08:00
Aleix Conchillo Flaqué
a992f95c02 clarify changelog with #3343 fix 2026-01-09 10:37:16 -08:00
Aleix Conchillo Flaqué
edd8e07df6 update changelog with #3343 fix 2026-01-09 10:31:29 -08:00
Aleix Conchillo Flaqué
c813d43da0 Merge pull request #3343 from omChauhanDev/fix/auto-resolve-function-result
fix: keeping the Aggregator and Service states synchronized.
2026-01-09 10:04:20 -08:00
Aleix Conchillo Flaqué
c973445ab7 Merge pull request #3385 from pipecat-ai/aleix/context-aggregator-turn-stop-messages
user and assistant aggregator turn events
2026-01-09 09:52:48 -08:00
Aleix Conchillo Flaqué
25f6ba76d6 add start timestamp to user and assistant turn messages 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
8f47c569f9 examples(foundational): add 28-user-assistant-turns.py 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
c16801e524 examples(foundational): update 49 series with on_assistant_thought 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
dafcd0448f added changelog for new assistant turn events 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
24a52375c7 tests: added LLMAssistantAggregator unit tests 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
5f9e95038e BaseObject: improve logging messages 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
5cbb21afb2 deprecate TranscriptProcessor and related dataclasses and frames 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
119fab2996 LLMAssistantAggregator: allow thought aggregation without appending to context 2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
38d354c4ed LLMAssistantAggregator: add assistant turn and thought events 2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
cdb1074e11 LLMAssistantAggregator: no need to use BotStoppedSpeakingFrame
The end of turn is already handle with interruptions or with
LLMFullResponseEndFrame. LLMFullResponseEndFrame should never be blocked,
otherwise the assistant would not work.
2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
4b61fd2d7d LLMUserAggregator: add user turn stopped message argument
It is now possible to get the user aggregation when a `on_user_turn_stopped`
event is emitted.
2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
5a0a5c120b Merge pull request #3394 from pipecat-ai/aleix/base-smart-turn-update-vad-start-secs
smartturn: rename on_vad_start_secs_updated to update_vad_start_secs
2026-01-09 09:40:29 -08:00
Aleix Conchillo Flaqué
d92926ae54 smartturn: rename on_vad_start_secs_updated to update_vad_start_secs 2026-01-09 09:34:15 -08:00
Aleix Conchillo Flaqué
b34af5da24 Merge pull request #3372 from pipecat-ai/aleix/add-user-turn-controller-processor
add new UserTurnController and UserTurnProcessor
2026-01-09 09:29:10 -08:00
Aleix Conchillo Flaqué
5da1f86575 scripts: add 53-concurrent-llm-evaluation.py to release evals 2026-01-09 09:26:38 -08:00
Aleix Conchillo Flaqué
b0185e3539 tests: improve LLMUserAggregator tests 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
7232da6ba1 tests: added unit tests for UserTurnProcessor 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
9dff75cd44 examples: add 53-concurrent-llm-evaluation.py 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
6038860be0 tests: added unit tests for UserTurnController 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
4653de9f03 tests: rename test_bot_turn_start_strategy to test_user_turn_stop_strategy 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
fef79651ef turns: add UserTurnProcessor for advanced pipeline user turn management 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
3d54ca0a7c LLMUserAggregator: user UserTurnController for user turn management 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
199986815c turns: add UserTurnController for user turn management 2026-01-09 09:21:28 -08:00
Filipi da Silva Fuchter
0a3c00f68b Merge pull request #3391 from pipecat-ai/filipi/krisp_followup_improvements
Krisp VIVA follow-up improvements
2026-01-09 10:39:23 -05:00
marcus-daily
3e2467eb71 Fixing ruff formatting 2026-01-09 15:07:13 +00:00
marcus-daily
c4cc476c3d Updating changelog 2026-01-09 15:07:13 +00:00
marcus-daily
cc6ff1ac54 Reverting quickstart to match main 2026-01-09 15:07:13 +00:00
marcus-daily
b075502c4c Addressing code review comments 2026-01-09 15:07:13 +00:00
marcus-daily
35a99f92ab Take into account VAD start_secs when passing audio data to Smart Turn, and add an extra 500ms of pre-speech audio for good measure 2026-01-09 15:07:13 +00:00
Mark Backman
4fe0836cf9 Add reconnect logic to WebsocketService in the event of ConnectionClosedError 2026-01-09 09:03:01 -05:00
filipi87
8b7cc65ae6 Mentioning the Krisp Viva improvements in the changelog. 2026-01-09 10:43:01 -03:00
filipi87
4d495ba74f Fixing ruff format. 2026-01-09 10:32:36 -03:00
filipi87
de5de0b162 Fixed KrispVivaTurn to properly release the Krisp SDK. 2026-01-09 10:31:17 -03:00
filipi87
311da30802 Updating the Krisp Viva example to use Krisp turn model. 2026-01-09 10:19:13 -03:00
Garegin Harutyunyan
16819a5caa Krisp VIVA SDK Filter and Turn support. (#3261)
* Krisp VIVA SDK Filter and Turn support.

* Reverted the krisp_filter.py as it's already deprectaed.

* enabled test with krisp_audio mock.

* More review comment fixes.
reverted the state logic in viva filter to be similar to the existing impl on main branch.
Fixed tests, ruff, etc.

* More review comments for Turn detection.
removed integration tests.

* Moved the SDK init/deinit into start/stop
2026-01-09 08:15:08 -05:00
Mark Backman
72a44c2fcd Merge pull request #3386 from pipecat-ai/mb/deepgram-deprecate-vad-events
Deprecate support for vad_events in DeepgramSTTService
2026-01-09 07:56:03 -05:00
Mark Backman
7783b20b91 Merge pull request #3390 from dhruvladia-sarvam/update/sarvam-plugins 2026-01-09 07:11:13 -05:00
dhruvladia-sarvam
962ccbc0d7 fix 2026-01-09 14:26:28 +05:30
Mark Backman
4d61c5d7b2 Deprecate support for vad_events in DeepgramSTTService 2026-01-08 20:32:30 -05:00
Mark Backman
7ca4597ade Merge pull request #3379 from lukepayyapilli/fix/fastapi-websocket-json-text-handling
Fix FastAPIWebsocketTransport to handle both binary and text messages
2026-01-08 17:26:35 -05:00
Luke Payyapilli
f1a22728ab Add websocket extra to coverage workflow 2026-01-08 17:13:31 -05:00
Luke Payyapilli
ca88fc849f Add websocket extra to CI for FastAPI test coverage 2026-01-08 17:09:27 -05:00
Luke Payyapilli
ccd795445f Fix protobuf serializer test to compare attributes instead of frame objects 2026-01-08 17:00:40 -05:00
Luke Payyapilli
1874269a48 Remove FrameSerializerType enum and type property from serializers 2026-01-08 16:54:23 -05:00
Mark Backman
8b20373a8e Merge pull request #3380 from pipecat-ai/mb/changelog-3366
Add changelog fragment for PR 3366
2026-01-08 14:49:47 -05:00
Aleix Conchillo Flaqué
15dcb77a0c Merge pull request #3364 from pipecat-ai/rajneesh/add-daily-sip-provider-option
Add support for specifying sip provider.
2026-01-08 11:47:09 -08:00
Mark Backman
5d2fac9cd7 Add changelog fragment for PR 3366 2026-01-08 14:43:23 -05:00
Mark Backman
682b253760 Merge pull request #3366 from lukepayyapilli/fix/cartesia-allow-none-language
Allow language=None in CartesiaTTSService for auto-detection
2026-01-08 14:42:09 -05:00
Luke Payyapilli
f440de82e2 Handle None language in _process_word_timestamps_for_language 2026-01-08 13:59:21 -05:00
Mark Backman
5e0e6822c7 Merge pull request #3360 from pipecat-ai/mb/openai-realtime-send-image
Add video input (e.g. image input) support for OpenAI Realtime
2026-01-08 13:26:35 -05:00
Mark Backman
2aadac7a4d Update OpenAIRealtime image to video to align with GeminiLive 2026-01-08 13:23:08 -05:00
Filipi da Silva Fuchter
1098394486 Merge pull request #3374 from pipecat-ai/filipi/external_turn_controllers_interruptions
External turn controllers improvements
2026-01-08 13:05:41 -05:00
Mark Backman
b90a34228f Update 19c to remove pausing audio and input 2026-01-08 13:00:45 -05:00
Mark Backman
8bf8ebd34b Remove start_audio_paused from OpenAI Realtime demos, and others 2026-01-08 13:00:45 -05:00
Mark Backman
673d88417c Change Gemini Live and OpenAI Realtime logging to trace when sending a video frame 2026-01-08 13:00:45 -05:00
Mark Backman
3a7b489208 Add foundational 19c and add to evals 2026-01-08 13:00:45 -05:00
Mark Backman
7ae9eebc34 Add image input support for OpenAI Realtime 2026-01-08 13:00:44 -05:00
Mark Backman
8f83ba5878 Merge pull request #3376 from dhruvladia-sarvam/update/sarvam-plugins
headers update
2026-01-08 12:57:32 -05:00
filipi87
b8af3fa214 Improving should_interrupt docs for Speechmatics. 2026-01-08 14:53:29 -03:00
dhruvladia-sarvam
5ddec4f596 fix 2026-01-08 23:07:40 +05:30
dhruvladia-sarvam
8f4b4f4941 fix 2026-01-08 23:04:43 +05:30
dhruvladia-sarvam
953349f262 fix 2026-01-08 22:53:59 +05:30
Luke Payyapilli
b52ae0e56b Fix FastAPIWebsocketTransport to handle both binary and text messages 2026-01-08 11:25:18 -05:00
dhruvladia-sarvam
893b448534 headers update 2026-01-08 21:09:41 +05:30
Mark Backman
973769b8bc Merge pull request #3370 from pipecat-ai/mb/fix-azure-tts
AzureTTSService cleanup
2026-01-08 09:34:02 -05:00
filipi87
c8fa9d34e1 Adding a changelog entry for the new should_interrupt property. 2026-01-08 10:58:07 -03:00
filipi87
3069deb92f Allows defining whether Speechmatics should send an interruption when the user’s turn has started. 2026-01-08 10:50:33 -03:00
filipi87
68c9c01747 Allows defining whether Flux should send an interruption when the user’s turn has started. 2026-01-08 10:44:53 -03:00
filipi87
5e8f0baa12 Allows defining whether Deepgram should send an interruption when the user’s turn has started. 2026-01-08 10:36:52 -03:00
Mark Backman
8d1286cc00 Merge pull request #3371 from speechmatics/fix/voice-version-bump 2026-01-08 07:21:43 -05:00
Aleix Conchillo Flaqué
bda4dd339a Merge pull request #3373 from pipecat-ai/aleix/update-copyright-notices-2026
update examples and tests copyright and use a proper dash in 2024-2026
2026-01-07 20:36:40 -08:00
rajneeshksoni
f2e3034d24 Add support for specifying sip provider.
optional "provider" field in the RoomSipParams
2026-01-08 09:05:56 +05:30
Aleix Conchillo Flaqué
2626154a64 update examples and tests copyright and use a proper dash in 2024-2026 2026-01-07 19:32:22 -08:00
Sam Sykes
b770b2a419 Changelog 2026-01-07 16:56:56 -08:00
Sam Sykes
158c34b0f9 version bump 2026-01-07 16:54:53 -08:00
Mark Backman
d507c88d3e Merge pull request #3369 from pipecat-ai/mb/copyright-2026
Update copyright date range to 2024-2026
2026-01-07 17:07:05 -05:00
Mark Backman
98f70b775f Update copyright date range to 2024-2026 2026-01-07 16:58:13 -05:00
Mark Backman
54f4b824e4 Merge pull request #3356 from pipecat-ai/mb/gemini-live-user-transcript-timeout
Add timeout for handling user transcript messages
2026-01-07 16:47:23 -05:00
Mark Backman
2aa5307f0a Add _push_user_transcription to unify the logic to push user transcripts from a single utility function 2026-01-07 16:43:48 -05:00
Mark Backman
6c10d6ef8a Merge pull request #3367 from pipecat-ai/marcus/smart-turn-v3.2
Updated Smart Turn model weights to v3.2
2026-01-07 16:37:53 -05:00
Mark Backman
89b36f2b25 AzureTTSService: Restore metrics generation 2026-01-07 16:33:52 -05:00
Mark Backman
79a6adbcf3 AzureTTSService: Handle first chunk only for timestamps and TTFB metrics 2026-01-07 16:15:01 -05:00
Mark Backman
95f00a3c4b AzureTTSService: Align error handling with Pipecat norms 2026-01-07 15:45:30 -05:00
Mark Backman
3f8373f76f AzureTTSService: prevent word timestamp carryover on interruption 2026-01-07 15:39:37 -05:00
Mark Backman
23a9d3f4d7 Merge pull request #3334 from obata-kotobasamurai/fix/azure-tts-word-timestamp
Add word-level timestamp support to Azure TTS with race condition fix
2026-01-07 14:48:02 -05:00
Mark Backman
333279f45a Merge pull request #3328 from speechmatics/fix/speectmatics-vad
Update to SpeechmaticsSTTService for `0.0.99`
2026-01-07 14:42:21 -05:00
yukiobata1
add5f51201 updated azure tts.py file 2026-01-08 03:14:37 +09:00
marcus-daily
d1bedef5b3 Updated Smart Turn model weights to v3.2 2026-01-07 17:23:11 +00:00
Mark Backman
54cf0116a8 Merge pull request #3363 from pipecat-ai/mb/update-audo-context-inheritance
Update AudioContextTTSService to inherit from WebsocketTTSService
2026-01-07 12:08:47 -05:00
Luke Payyapilli
6b252fb46e Allow language=None in CartesiaTTSService for auto-detection 2026-01-07 11:50:21 -05:00
Sam Sykes
3e00a16f0f Remove unused import and correction to docs. 2026-01-07 07:45:26 -08:00
Sam Sykes
ecfd93544a Correction to UserStartedSpeakingFrame timing. 2026-01-07 07:43:47 -08:00
Sam Sykes
3ec89e49bf Added changelog for split_sentences and code tidy for end of turn handling. 2026-01-07 07:41:49 -08:00
Mark Backman
8762506e9f Update AudioContextTTSService to inherit from WebsocketTTSService 2026-01-07 09:10:27 -05:00
yukiobata1
7204bf9914 added changegelog 2026-01-07 13:32:31 +09:00
yukiobata1
f62c262f23 Call start_word_timestamps() when the first audio chunk arrives 2026-01-07 13:10:41 +09:00
Mark Backman
10aa784809 Merge pull request #3351 from okue/fix/stt-model-name-attribute
Fix STT model name attribute retrieval in tracing decorator
2026-01-06 16:10:56 -05:00
Filipi da Silva Fuchter
904f5dc183 Merge pull request #3338 from omChauhanDev/fix/smallwebrtc-mute-timeout-spam
fix(smallwebrtc): suppress timeout warnings when tracks are disabled
2026-01-06 09:07:52 -05:00
Mark Backman
c61a5e7173 Merge pull request #3346 from pipecat-ai/mb/cartesia-pronunciation-dict
Cartesia TTS: Add support for pronunciation_dict_id
2026-01-06 08:52:09 -05:00
Filipi da Silva Fuchter
81b28beef5 Merge pull request #3357 from pipecat-ai/filipi/live_avatar
Added support for using the HeyGen LiveAvatar API with the HeyGenTransport
2026-01-06 08:22:39 -05:00
filipi87
0d34356678 Adding a changelog entry for the HeyGen LiveAvatar API change. 2026-01-06 10:19:19 -03:00
filipi87
5412840a93 Added support for using the HeyGen LiveAvatar API with the HeyGenTransport. 2026-01-06 10:16:12 -03:00
yukiobata1
137bbb3d2c updated tts.py to match mark's version 2026-01-06 21:16:13 +09:00
Mark Backman
5a40054ac2 Merge pull request #3216 from mayurdd/patch-1
Adding include_language_detection param to Elevenlabs Realtime STT
2026-01-05 17:01:02 -05:00
Mark Backman
be621fbc5c Add timeout for handling user transcript messages 2026-01-05 16:58:14 -05:00
Mark Backman
9ab4836601 Merge pull request #3323 from pipecat-ai/mb/changelog-3322
Add changelog fragment for PR #3322
2026-01-05 16:55:52 -05:00
mayurdd
4671102833 Addressing the comments 2026-01-05 13:35:37 -08:00
Mayur Sirwani
67401a275b Adding include_language_detection to Elevenlabs Realtime STT
Adding a param to the config while connecting to the session
2026-01-05 13:27:27 -08:00
Mark Backman
c422588071 Merge pull request #3345 from pipecat-ai/mb/avoid-tts-dot
Add trailing space to DeepgramTTSService text generation
2026-01-05 15:56:14 -05:00
kompfner
fb12fec899 Merge pull request #3354 from pipecat-ai/pk/fix-aws-nova-sonic-example-for-nova-2-sonic
Fix the 20e example to use the proper conversation-start pattern for …
2026-01-05 11:17:57 -05:00
Paul Kompfner
c53c49558f Fix the 20e example to use the proper conversation-start pattern for the Nova 2 Sonic model 2026-01-05 10:56:08 -05:00
okue
1a26a2daa4 Fix STT model name attribute retrieval in tracing decorator
Changed getattr with default value to use 'or' operator for fallback.
This ensures proper model name retrieval when model_name attribute exists but is None or empty.
2026-01-05 17:20:48 +09:00
Mark Backman
d8be1282b5 Cartesia TTS: Add support for pronunciation_dict_id 2026-01-04 09:30:04 -05:00
Mark Backman
91bc5236b5 Add trailing space to DeepgramTTSService text generation 2026-01-04 08:53:48 -05:00
Om Chauhan
b278957111 fix: broadcast FunctionCallResultFrame, on implicit return 2026-01-03 19:52:25 +05:30
Mark Backman
1c80c739d6 Merge pull request #3335 from pipecat-ai/mb/update-evals-07-variants
Add 07 example variants to release evals
2026-01-02 15:32:12 -05:00
Om Chauhan
700a94222b fix(smallwebrtc): suppress timeout warnings when tracks are disabled 2026-01-01 22:00:08 +05:30
Sam Sykes
d5d2156689 Updated changelog. 2025-12-31 19:07:11 +00:00
Sam Sykes
8203ad08a8 Updated to have default as FIXED for Pipecat VAD. 2025-12-31 19:05:29 +00:00
Mark Backman
31907b90f0 Add 07 example variants to release evals 2025-12-31 09:11:00 -05:00
Mark Backman
7b595f10ce Merge pull request #3329 from omChauhanDev/deepgram-tts-validation
added encoding validation in DeepgramTTSService
2025-12-31 08:20:40 -05:00
yukiobata1
4f93d331b7 Added await to self.start_word_timestamps() 2025-12-31 19:19:21 +09:00
yukiobata1
32c6dccebe Add word-level timestamp support to Azure TTS with cumulative PTS fix
This commit adds word boundary support to AzureTTSService and fixes
the race condition that causes scrambled TTS output across multiple
sentences.

## Features Added

- Change AzureTTSService to inherit from WordTTSService
- Subscribe to Azure SDK's synthesis_word_boundary event
- Emit word-level text with timing information via _words_queue
- Add synthesis lock for sequential sentence processing

## Race Condition Fix

Previously, each sentence's word boundary timestamps reset to 0,
causing downstream components to interleave words when reordering
frames by PTS. This resulted in scrambled output like:
  'Hello ! I What am questions AI have assistant...'

The fix adds cumulative audio offset tracking to ensure monotonically
increasing PTS across all sentences:
  Sentence 1: pts = 0.1s, 0.5s, 0.8s (cumulative at end: 0.8s)
  Sentence 2: pts = 0.9s, 1.2s, 1.5s (0.8s + relative offset)

## Key Changes

- _cumulative_audio_offset: tracks total audio duration
- _handle_word_boundary: adds cumulative offset to timestamps
- _handle_completed: accumulates audio duration for next sentence
- flush_audio: resets cumulative offset at end of LLM response
- _handle_interruption: resets state on user interruption
- run_tts: uses synthesis lock for sequential processing

Fixes #2918

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 18:49:48 +09:00
Aleix Conchillo Flaqué
cbdc2b7d2d Merge pull request #3330 from pipecat-ai/aleix/update-turn-start-strategies-deprecations
update turn start strategies deprecations
2025-12-30 21:04:47 -08:00
Aleix Conchillo Flaqué
66a9dc70c7 LLMUserAggregator: fix turn strategies renaming 2025-12-30 20:59:48 -08:00
Aleix Conchillo Flaqué
846ca500d3 turns: update old turn_start_strategies deprecations 2025-12-30 19:50:10 -08:00
Om Chauhan
bd6afd445d added changelog 2025-12-31 09:18:18 +05:30
Om Chauhan
0663bbc2fb added encoding validation in DeepgramTTSService 2025-12-31 08:33:17 +05:30
Sam Sykes
8e7a951af8 updated changelog 2025-12-31 01:36:58 +00:00
Sam Sykes
ba1aeb8f7f Changelog 2025-12-31 01:31:46 +00:00
Sam Sykes
f7c74cfa80 Updated VAD 2025-12-31 01:28:31 +00:00
Mark Backman
2e700c8576 Merge pull request #3324 from pipecat-ai/mb/bump-small-webrtc-prebuilt-version
Bump small-webrtc-prebuilt verison to 2.0.4, update uv.lock
2025-12-30 20:10:11 -05:00
Mark Backman
f4626a4fc4 Bump small-webrtc-prebuilt verison to 2.0.4, update uv.lock 2025-12-30 14:19:20 -05:00
Mark Backman
e0b40a330f Add changelog fragment for PR #3322 2025-12-30 08:40:28 -05:00
652 changed files with 10468 additions and 4760 deletions

View File

@@ -33,7 +33,7 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
- name: Run tests with coverage
run: |

View File

@@ -37,7 +37,7 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
- name: Test with pytest
run: |

View File

@@ -7,6 +7,518 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.99] - 2026-01-13
### Added
- Introducing user turn strategies. User turn strategies indicate when the user
turn starts or stops. In conversational agents, these are often referred to
as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g.
using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using
an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both strategies; strategies are
evaluated in order until one evaluates to true.
Available user turn start strategies:
- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy
Available user turn stop strategies:
- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy
The default strategies are:
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]
urn strategies are configured when setting up `LLMContextAggregatorPair`.
For example:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
)
],
)
),
)
```
In order to use the user turn strategies you must update to the new
universal `LLMContext` and `LLMContextAggregatorPair`.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural
network via pyrnnoise library.
(PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
voice conversations:
- Support for real-time audio streaming with WebSocket connection
- Built-in server-side VAD (Voice Activity Detection)
- Multiple voice options: Ara, Rex, Sal, Eve, Leo
- Built-in tools support: web_search, x_search, file_search
- Custom function calling with standard Pipecat tools schema
- Configurable audio formats (PCM at 8kHz-48kHz)
(PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))
- Added an approximation of TTFB for Ultravox.
(PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))
- Added a new `AudioContextTTSService` to the TTS service base classes. The
`AudioContextWordTTSService` now inherits from `AudioContextTTSService` and
`WebsocketWordTTSService`.
(PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))
- `LLMUserAggregator` now exposes the following events:
- `on_user_turn_started`: triggered when a user turn starts
- `on_user_turn_stopped`: triggered when a user turn ends
- `on_user_turn_stop_timeout`: triggered when a user turn does not stop
and times out
(PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))
- Introducing user mute strategies. User mute strategies indicate when user
input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user
input from interrupting bot speech, tool execution, or other critical system
operations.
A list of strategies can be specified; all strategies are evaluated for
every frame so that each strategy can maintain its internal state. A user
frame is muted if any of the configured strategies indicates it should be
muted.
Available user mute strategies:
* `FirstSpeechUserMuteStrategy`
* `MuteUntilFirstBotCompleteUserMuteStrategy`
* `AlwaysUserMuteStrategy`
* `FunctionCallUserMuteStrategy`
User mute strategies replace the legacy `STTMuteFilter` and provide a more
flexible and composable approach to muting user input.
User mute strategies are configured when setting up the
`LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_mute_strategies=[
FirstSpeechUserMuteStrategy(),
]
),
)
```
In order to use user mute strategies you should update to the new universal
`LLMContext` and `LLMContextAggregatorPair`.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService`
and `NvidiaTTSService`.
(PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300))
- Added `enable_interruptions` constructor argument to all user turn
strategies. This tells the `LLMUserAggregator` to push or not push an
`InterruptionFrame`.
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control
sentence splitting behavior for finals on sentence boundaries.
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
- Added word-level timestamp support to `AzureTTSService` for accurate
text-to-audio synchronization.
(PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334))
- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams`
and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation
dictionary feature for custom pronunciations.
(PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346))
- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport`
(see https://www.liveavatar.com/).
(PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))
- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
- New `start_video_paused` parameter to control initial video input state
- New `video_frame_detail` parameter to set image processing quality
("auto",
"low", or "high"). This corresponds to OpenAI Realtime's `image_detail`
parameter.
- `set_video_input_paused()` method to pause/resume video input at runtime
- `set_video_frame_detail()` method to adjust video frame quality
dynamically
- Automatic rate limiting (1 frame per second) to prevent API overload
(PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
and interruptions based on the controller's user turn strategies.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added `UserTurnController` to manage user turns. It emits
`on_user_turn_started`, `on_user_turn_stopped`, and
`on_user_turn_stop_timeout` events, and can be integrated into processors to
detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are
implemented using this controller.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added `should_interrupt` property to `DeepgramFluxSTTService`,
`DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the
bot should be interrupted when the external service detects user speech.
(PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))
- `LLMAssistantAggregator` now exposes the following events:
- `on_assistant_turn_started`: triggered when the assistant turn starts
- `on_assistant_turn_stopped`: triggered when the assistant turn ends
- `on_assistant_thought`: triggered when there's an assistant thought
available
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
SDK (requires `krisp_audio`).
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Added support for setting up a pipeline task from external files. You can now
register custom pipeline task setup files by setting the
`PIPECAT_SETUP_FILES` environment variable. This variable should contain a
colon-separated list of Python files (e.g. `export
PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a
function with the following signature:
```python
async def setup_pipeline_task(task: PipelineTask):
...
```
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
- Added a keepalive task for `InworldTTSService` to keep the service connected
in the event of no generations for longer periods of time.
(PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403))
- Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When
enabled, `GladiaSTTService` acts as the turn controller, emitting
`UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally
`InterruptionFrame`.
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
- Added `should_interrupt` property to `GladiaSTTService` to configure whether
the bot should be interrupted when the external service detects user speech.
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector
WebSocket protocol.
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
- Added `append_trailing_space` parameter to `TTSService` to automatically
append a trailing space to text before sending to TTS, helping prevent some
services from vocalizing trailing punctuation.
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
### Changed
- Updated `ElevenLabsRealtimeSTTService` to accept the
`include_language_detection` parameter to detect language.
```python
stt = ElevenLabsRealtimeSTTService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
include_language_detection=True
)
```
(PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))
- Updated `SpeechmaticsSTTService` to use new Python Voice SDK with improved
VAD, Smart Turn capabilities, and brings dramatic improvements to latency
without any impact on accuracy. Use the `turn_detection_mode` parameter to control
the endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
```python
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
```
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
- `daily-python` updated to 0.23.0.
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by
`DailyTransport` now include the transport source (i.e., the originating
audio track).
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
- Updates to Inworld TTS services:
- Improved `InworldTTSService`'s websocket implementation to better flush
and close context to better handle long inputs.
- Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
(PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))
- Updated `DeepgramSTTService` to push user started/stopped speaking and
interruption frames when `vad_enabled` is set to true. This centralizes the
frames into the service, removing the need to have your application code
handle Deepgram's events and push these frames.
(PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314))
- Added encoding validation to `DeepgramTTSService` to prevent unsupported
encodings from reaching the API. The service now raises `ValueError` at
initialization with a clear error message.
(PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329))
- Updated `read_audio_frame` & `read_video_frame` methods in
`SmallWebRTCClient` to check if the track is enabled before logging a
warning.
(PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336))
- Updated `CartesiaTTSService` to support setting `language=None`, resulting in
Cartesia auto-detecting the language of the conversation.
(PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366))
- The bundled Smart Turn weights are now updated to v3.2, which has better
handling of short utterances, and is more robust against background noise.
(PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367))
- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`
(PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371))
- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
meaning that the start of the turn audio is not cut off. This improves
accuracy for short utterances.
- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
(PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))
- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
to share a single SDK instance within the same process.
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english`
and voice ID to `autumn`.
(PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399))
- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio
packetization via the `fixed_audio_packet_size` parameter to support media
endpoints requiring strict framing and real-time pacing.
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to
`True` to prevent punctuation (e.g., “dot”) from being pronounced.
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`,
`LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns
thought content.
(PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431))
### Deprecated
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use
`pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- `FrameProcessor.interruption_strategies` is deprecated, use
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in
`pipecat.processors.aggregators.llm_response` are now deprecated. Use the new
universal `LLMContext` and `LLMContextAggregatorPair` instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and
`UserStoppedSpeakingFrame` frames.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame`
frames are deprecated.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in
unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies`
parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is
deprecated. Use the new `turn_detection_mode` parameter instead, with
`TurnDetectionMode.EXTERNAL`,`TurnDetectionMode.ADAPTIVE`, or
`TurnDetectionMode.SMART_TURN`. The `enable_vad` parameter is also
deprecated and is inferred from the `turn_detection_mode`.
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
- `OpenAILLMContext` and its associated things (context aggregators, etc.) are
now deprecated in favor of the universal `LLMContext` and its associated
things.
From the developer's point of view, switching to using `LLMContext`
machinery will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```
(PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))
- `STTMuteFilter` is deprecated and will be removed in a future version. Use
`LLMUserAggregator`'s new `user_mute_strategies` instead.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
- `FrameProcessor.interruptions_allowed` is now deprecated, use
`LLMUserAggregator`'s new parameter `user_mute_strategies` instead.
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
- `PipelineParams.allow_interruptions` is now deprecated, use
`LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
example, to disable interruptions but still get user turns you can do:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
),
),
)
```
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
- `TranscriptProcessor` and related data classes and frames
(`TranscriptionMessage`, `ThoughtTranscriptionMessage`,
`TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and
`LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and
`on_assistant_turn_stopped`) instead.
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
- Deprecated support for the `vad_events` `LiveOptions` in
`DeepgramSTTService`. Instead, use a local Silero VAD for VAD events.
Additionally, deprecated `should_interrupt` which will be removed along with
`vad_events` support in a future release.
(PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386))
- Loading external observers from files is deprecated, use the new pipeline
task setup files and `PIPECAT_SETUP_FILES` environment variable instead.
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
### Fixed
- Improved error handling in `ElevenLabsRealtimeSTTService`
- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
that blocks the process if the websocket disconnects due to an error
(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
- Fixed a bug in `STTMuteFilter` where the user was not always muted during
function calls, especially when there were multiple simultaneous calls.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
- Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate
memory" error when processing silence audio frames.
(PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))
- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
- Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
in order to finalize transcription.
- Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
detection.
- Only emit VAD + interruption frames if VAD is enabled within the plugin
(modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
- Fixed an issue with function calling where a handler failing to invoke its
result callback could leave the context stuck in IN_PROGRESS, causing LLM
inference for subsequent function call results to block while waiting on the
unresolved call.
(PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343))
- Fixed an issue with DeepgramTTSService where the model would output "Dot"
instead of a period in some circumstances.
(PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345))
- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351))
- Fixed an issue in GeminiLiveLLMService where TranscriptionFrames were
occasionally not pushed.
(PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356))
- Fixed potential memory leaks and initialization issues in `KrispVivaFilter`
by improving SDK lifecycle management.
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was
set after awaiting, allowing the event loop to re-enter the method before the
guard was set.
(PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400))
- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422))
- Fixed an issue in `traced_tts`, `traced_gemini_live`, and
`traced_openai_realtime` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428))
- Fixed `request_image_frame` (for backwards compatibility) and restored
function-callrelated fields in `UserImageRequestFrame` and
`UserImageRawFrame`, preventing a case where adding a non-LLM message to the
context could trigger duplicate LLM inferences (on image arrival and on
function-call result), potentially causing an infinite inference loop.
(PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430))
- Fixed `LLMContext.create_audio_message()` by correcting an internal helper
that was incorrectly declared async while being run in `asyncio.to_thread()`.
(PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435))
### Other
- Added `52-live-transcription.py` foundational example demonstrating live
transcription and translation from English to Spanish. In this example, the
bot is not interruptible: as the user continues speaking, English
transcriptions are queued, and the bot continuously translates and speaks
each queued sentence in Spanish without being interrupted by new user speech.
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows
how to use `UserTurnProcessor`.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added a new foundational example `28-user-assistant-turns.py` that shows how
to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to
gather a conversation transcript.
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
## [0.0.98] - 2025-12-17
### Added

View File

@@ -1,6 +1,6 @@
BSD 2-Clause License
Copyright (c) 20242025, Daily
Copyright (c) 20242026, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

View File

@@ -78,7 +78,7 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |

View File

@@ -1,42 +0,0 @@
- Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both strategies; strategies are evaluated in order until one evaluates to true.
Available user turn start strategies:
- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy
Available user turn stop strategies:
- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy
The default strategies are:
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]
Turn strategies are configured when setting up `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
)
],
)
),
)
```
In order to use the user turn strategies you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`.

View File

@@ -1 +0,0 @@
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in unexpected behavior, use `LLMUserAggregator`'s new `turn_start_strategies` parameter instead.

View File

@@ -1 +0,0 @@
- `FrameProcessor.interruption_strategies` is deprecated, use `LLMUserAggregator`'s new `turn_start_strategies` parameter instead.

View File

@@ -1 +0,0 @@
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames are deprecated.

View File

@@ -1 +0,0 @@
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames.

View File

@@ -1 +0,0 @@
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new universal `LLMContext` and `LLMContextAggregatorPair` instead.

View File

@@ -1 +0,0 @@
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with `LLMUserAggregator`'s new `turn_start_strategies` parameter instead.

View File

@@ -1 +0,0 @@
- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural network via pyrnnoise library.

View File

@@ -1,15 +0,0 @@
- Updated `SpeechmaticsSTTService` to use new Python Voice SDK with improved VAD,
Smart Turn capabilities, and brings dramatic improvements to latency without
any impact on accuracy. Use the `turn_detection_mode` parameter to control the
endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
```python
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
```

View File

@@ -1,4 +0,0 @@
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is deprecated.
Use the new `turn_detection_mode` parameter instead, with `TurnDetectionMode.EXTERNAL`,
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The `enable_vad`
parameter is also deprecated and is inferred from the `turn_detection_mode`.

View File

@@ -1,2 +0,0 @@
- Improved error handling in `ElevenLabsRealtimeSTTService`
- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop that blocks the process if the websocket disconnects due to an error

View File

@@ -1 +0,0 @@
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by `DailyTransport` now include the transport source (i.e., the originating audio track).

View File

@@ -1 +0,0 @@
- `daily-python` updated to 0.23.0.

View File

@@ -1,15 +0,0 @@
- `OpenAILLMContext` and its associated things (context aggregators, etc.) are now deprecated in favor of the universal `LLMContext` and its associated things.
From the developer's point of view, switching to using `LLMContext` machinery will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```

View File

@@ -1,8 +0,0 @@
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations:
- Support for real-time audio streaming with WebSocket connection
- Built-in server-side VAD (Voice Activity Detection)
- Multiple voice options: Ara, Rex, Sal, Eve, Leo
- Built-in tools support: web_search, x_search, file_search
- Custom function calling with standard Pipecat tools schema
- Configurable audio formats (PCM at 8kHz-48kHz)

View File

@@ -1 +0,0 @@
- Added an approximation of TTFB for Ultravox.

View File

@@ -1,5 +0,0 @@
- Updates to Inworld TTS services:
- Improved `InworldTTSService`'s websocket implementation to better flush and
close context to better handle long inputs.
- Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.

View File

@@ -1 +0,0 @@
- Added a new `AudioContextTTSService` to the TTS service base classes. The `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`.

View File

@@ -1,4 +0,0 @@
- `LLMUserAggregator` now exposes the following events:
- `on_user_turn_started`: triggered when a user turn starts
- `on_user_turn_stopped`: triggered when a user turn ends
- `on_user_turn_stop_timeout`: triggered when a user turn does not stop and times out

View File

@@ -1,29 +0,0 @@
- Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations.
A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted.
Available user mute strategies:
* `FirstSpeechUserMuteStrategy`
* `MuteUntilFirstBotCompleteUserMuteStrategy`
* `AlwaysUserMuteStrategy`
* `FunctionCallUserMuteStrategy`
User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input.
User mute strategies are configured when setting up the `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_mute_strategies=[
FirstSpeechUserMuteStrategy(),
]
),
)
```
In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`.

View File

@@ -1 +0,0 @@
- `STTMuteFilter` is deprecated and will be removed in a future version. Use `LLMUserAggregator`'s new `user_mute_strategies` instead.

View File

@@ -1 +0,0 @@
- Fixed a bug in `STTMuteFilter` where the user was not always muted during function calls, especially when there were multiple simultaneous calls.

View File

@@ -1 +0,0 @@
- `FrameProcessor.interruptions_allowed` is now deprecated, use `LLMUserAggregator`'s new parameter `user_mute_strategies` instead.

View File

@@ -1,12 +0,0 @@
- `PipelineParams.allow_interruptions` is now deprecated, use `LLMUserAggregator`'s new parameter `turn_start_strategies` instead. For example, to disable interruptions but still get user turns you can do:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
),
),
)
```

View File

@@ -1 +0,0 @@
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`.

View File

@@ -1 +0,0 @@
- Updated `DeepgramSTTService` to push user started/stopped speaking and interruption frames when `vad_enabled` is set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames.

View File

@@ -1 +0,0 @@
- Added `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` to push or not push an `InterruptionFrame`.

View File

@@ -1 +0,0 @@
- Added `52-live-transcription.py` foundational example demonstrating live transcription and translation from English to Spanish. In this example, the bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech.

View File

@@ -97,7 +97,8 @@ INWORLD_API_KEY=...
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_MODEL_PATH=...
KRISP_VIVA_FILTER_MODEL_PATH=...
KRISP_VIVA_TURN_MODEL_PATH=...
# LiveKit
LIVEKIT_API_KEY=...

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -85,7 +85,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -98,11 +98,11 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -68,7 +68,7 @@ async def main():
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -82,11 +82,11 @@ async def main():
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -78,7 +78,7 @@ async def main():
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def main():
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -106,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -119,12 +119,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(),
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
ml,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -120,7 +120,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -138,12 +138,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(),
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
image_sync_aggregator,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -77,7 +77,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -90,11 +90,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -131,7 +131,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
@@ -140,11 +140,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -117,7 +117,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -132,11 +132,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -82,7 +82,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -100,11 +100,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
transport.input(),
rtvi,
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -79,7 +79,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -95,11 +95,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
transport.input(),
rtvi,
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -82,7 +82,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -105,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -118,12 +118,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
audiobuffer, # write audio data to a file
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -81,7 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
transport.input(), # Transport user input
rtvi,
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS (HumeTTSService with word timestamps)
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -103,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
lc = LangchainProcessor(history_chain)
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -116,11 +116,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
lc, # Langchain
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -71,7 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
@@ -80,11 +80,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -81,7 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -96,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -86,7 +86,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -99,11 +99,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -72,7 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
@@ -81,11 +81,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -75,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -88,11 +88,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -85,7 +85,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -100,11 +100,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -80,7 +80,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -93,11 +93,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -84,7 +84,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -84,7 +84,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -83,7 +83,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -96,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -81,7 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -96,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -0,0 +1,140 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
params=GladiaInputParams(
language_config=LanguageConfig(
languages=[Language.EN],
),
enable_vad=True,
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -87,7 +87,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -100,11 +100,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -75,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -88,11 +88,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User respones
user_aggregator, # User respones
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -120,7 +120,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Setup context aggregators for message handling
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -133,11 +133,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # Speech-to-text
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # Strands Agents processor
tts, # Text-to-speech
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -80,7 +80,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -93,11 +93,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -102,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -115,11 +115,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # Gemini TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -107,7 +107,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -120,11 +120,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # Gemini TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -90,7 +90,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -103,11 +103,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User respones
user_aggregator, # User respones
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -90,7 +90,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -103,11 +103,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User respones
user_aggregator, # User respones
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -80,7 +80,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -93,11 +93,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,9 +1,26 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Interruptible bot with Krisp VIVA noise filtering and turn detection.
This example demonstrates a conversational bot with:
- Krisp VIVA noise reduction on incoming audio
- Krisp VIVA Turn detection for natural interruptions
- Voice activity detection (VAD)
Required environment variables:
- KRISP_VIVA_FILTER_MODEL_PATH: Path to the Krisp noise filter model file (.kef)
- KRISP_VIVA_TURN_MODEL_PATH: Path to the Krisp turn detection model file (.kef)
- DEEPGRAM_API_KEY: Deepgram API key for STT/TTS
- OPENAI_API_KEY: OpenAI API key for LLM
Optional environment variables:
- KRISP_NOISE_SUPPRESSION_LEVEL: Noise suppression level 0-100 (default: 100)
Higher values = more aggressive noise reduction
"""
import os
@@ -11,7 +28,7 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.turn.krisp_viva_turn import KrispTurnParams, KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
@@ -78,11 +95,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=KrispVivaTurn())]
),
),
)
@@ -91,11 +108,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -83,7 +83,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -98,11 +98,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -77,7 +77,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -90,11 +90,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -249,7 +249,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -257,7 +257,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
),
)
audio_collector = UserAudioCollector(context, context_aggregator.user())
audio_collector = UserAudioCollector(context, user_aggregator)
pull_transcript_out_of_llm_output = TranscriptExtractor(context)
fixup_context_messages = TranscriptionContextFixup(context)
@@ -265,12 +265,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
audio_collector,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
pull_transcript_out_of_llm_output,
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
fixup_context_messages,
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -82,7 +82,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -77,7 +77,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -90,11 +90,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -80,7 +80,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -93,11 +93,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -62,7 +62,7 @@ async def main():
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -75,11 +75,11 @@ async def main():
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

Some files were not shown because too many files have changed in this diff Show More