Compare commits


690 Commits

Author SHA1 Message Date
Mark Backman
5a6cc4d35c Replace assert-based type narrowing with local variables and guards
Use local variable narrowing and if-guards instead of assert statements
for type safety, since asserts are stripped with python -O.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 16:46:45 -05:00
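The commit above swaps assert-based narrowing for local variables and guards. A minimal sketch of that idea follows; the `Connection` class and its method names are hypothetical and do not come from the Pipecat codebase.

```python
from typing import Optional


class Connection:
    """Hypothetical stand-in for an object with an optional attribute."""

    def __init__(self, socket: Optional[str] = None) -> None:
        self._socket = socket

    def send_with_assert(self, data: str) -> str:
        # Under `python -O` this assert is stripped, so a None socket
        # would reach the f-string below without any check.
        assert self._socket is not None
        return f"{self._socket}:{data}"

    def send_with_guard(self, data: str) -> str:
        # Copy to a local so a type checker can narrow Optional[str] to str,
        # and fail explicitly whether or not -O is in effect.
        socket = self._socket
        if socket is None:
            raise RuntimeError("socket is not connected")
        return f"{socket}:{data}"
```

The guard version behaves the same in optimized and non-optimized runs, which is the property the commit message cites.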
Mark Backman
28be775740 Reduce type: ignore comments by fixing avoidable type mismatches
Replace ~20 type: ignore comments with proper type fixes:
- Widen set_tools() to accept List[dict] | ToolsSchema | NotGiven
- Widen create_task() to accept Coroutine | Awaitable
- Fix _turn_params to use BaseTurnParams instead of SmartTurnParams
- Make _thought_llm Optional[str] with assertion guard
- Add mixer assertion, websocket narrowing, ice_servers cast
- Use dict.get() in protobuf serializer
- Make remote_participants Optional in Daily transport

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 15:30:35 -05:00
Mark Backman
bc730e4069 Enable pyright basic type checking for core framework
Add pyright configuration (basic mode, Python 3.10) to pyproject.toml
and fix all 276 type errors in the core framework (everything except
services/ and adapters/). This establishes a CI-ready type checking
baseline as Pipecat approaches 1.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 15:30:35 -05:00
Mark Backman
104d06551a Merge pull request #3679 from pipecat-ai/mb/remove-to-be-updated
Remove SequentialMergePipeline
2026-02-08 15:28:38 -05:00
Mark Backman
90ad2a4e81 Remove SequentialMergePipeline 2026-02-08 14:44:48 -05:00
Mark Backman
570f2d7fc0 Merge pull request #3667 from ianbbqzy/ian/fix-auto-mode-space
[inworld] aggregate_sentence mode needs trailing space
2026-02-07 18:22:32 -05:00
Ian Lee
f3d99adf8f [inworld] aggregate_sentence mode needs trailing space 2026-02-07 15:18:24 -08:00
Mark Backman
d34f416281 Merge pull request #3598 from dhruvladia-sarvam/sarvam-v3-update
ASR and TTS v3 update
2026-02-07 10:51:35 -05:00
Mark Backman
5a1deb7cb4 Merge pull request #3659 from pipecat-ai/mb/change-vad-defaults
Set VADParams stop_secs to 0.2 by default
2026-02-06 23:51:50 -05:00
Mark Backman
a5fc2b1650 Set VADParams stop_secs to 0.2 by default 2026-02-06 23:49:08 -05:00
Aleix Conchillo Flaqué
5cb8d91431 added changelog file for #3616 2026-02-06 16:45:23 -08:00
Aleix Conchillo Flaqué
ce690848c0 Merge pull request #3616 from omChauhanDev/fix/function-call-timeout-task-cleanup
fix: ensure function call timeout task is always cancelled
2026-02-06 16:40:56 -08:00
Aleix Conchillo Flaqué
30f51edfcd Merge pull request #3668 from pipecat-ai/aleix/parallel-pipeline-buffering
Buffer internal frames during ParallelPipeline lifecycle sync
2026-02-06 15:25:32 -08:00
Aleix Conchillo Flaqué
cd03d449cb Update changelog skill with skip rules and allowed types 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
57df03aade Update CLAUDE.md with PR workflow instructions 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
4945cfbd8f Buffer internal frames during ParallelPipeline lifecycle synchronization
Processors inside parallel sub-pipelines can push frames during
StartFrame/EndFrame/CancelFrame processing. Previously these frames
could escape the ParallelPipeline before all branches finished
processing the lifecycle frame. Now they are buffered and flushed
after synchronization completes.
2026-02-06 15:15:46 -08:00
Mark Backman
8d37d3bae7 Merge pull request #3666 from pipecat-ai/mb/deepgram-stt-smart-format
DeepgramSTTService: disable smart_format by default
2026-02-06 14:04:37 -05:00
Mark Backman
d7b1624d3c Merge pull request #3663 from lukepayyapilli/fix/stream-close-sambanova-google
fix: close stream on cancellation for SambaNova and Google OpenAI services
2026-02-06 14:02:31 -05:00
Mark Backman
7f65204c3b DeepgramSTTService: disable smart_format by default 2026-02-06 13:45:10 -05:00
Aleix Conchillo Flaqué
97eff414c3 Merge pull request #3660 from pipecat-ai/aleix/interruption-frame-completion-event
Attach asyncio.Event to InterruptionFrame for completion signaling
2026-02-06 10:14:26 -08:00
Aleix Conchillo Flaqué
5b67e76de7 Add changelog for PR #3660 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
b9e79bd06a CLAUDE.md: explain about InterruptionFrame.complete() 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
d5105a78e6 STTMuteFilter should call frame.complete() when InterruptionFrame is blocked 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
a352b2d7a0 Add tests for InterruptionFrame completion event
Add tests for the event-based interruption completion: complete() sets
the event, complete() is safe without an event, the event fires at
the pipeline sink, and a warning is logged when the frame is blocked.

Also remove the unconditional await after the timeout so the function
returns instead of hanging when complete() is never called.
2026-02-06 09:57:24 -08:00
Aleix Conchillo Flaqué
2345090b10 Attach asyncio.Event to InterruptionFrame for completion signaling
Move the interruption wait event from per-processor instance state to
the frame itself. The event is created in
push_interruption_task_frame_and_wait(), threaded through
InterruptionTaskFrame → InterruptionFrame, and set when the frame
reaches the pipeline sink. This scopes the event to each interruption
flow rather than sharing mutable state on the processor.

Also adds a 2s timeout warning to help diagnose cases where
InterruptionFrame.complete() is never called.
2026-02-06 09:57:24 -08:00
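The commit above scopes the completion event to the frame rather than to the processor. The sketch below illustrates that pattern with plain asyncio; `InterruptionFrameSketch`, `push_and_wait`, and the `sink` callable are hypothetical names, and returning after the timeout is an assumption rather than the project's confirmed behavior.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class InterruptionFrameSketch:
    """Hypothetical frame that carries its own completion event."""

    event: Optional[asyncio.Event] = None

    def complete(self) -> None:
        # Safe to call even when nobody attached an event.
        if self.event is not None:
            self.event.set()


async def push_and_wait(
    sink: Callable[[InterruptionFrameSketch], Awaitable[None]],
    warn_after: float = 2.0,
) -> bool:
    """Attach a fresh event to a frame, hand it to `sink`, and wait.

    Returns True if complete() was called before the timeout.
    """
    event = asyncio.Event()
    frame = InterruptionFrameSketch(event=event)
    await sink(frame)
    try:
        await asyncio.wait_for(event.wait(), timeout=warn_after)
        return True
    except asyncio.TimeoutError:
        print("warning: complete() was not called in time")
        return False
```

Because each call creates its own event, two overlapping interruption flows cannot set or clear each other's state.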
Mark Backman
af562bf9a8 Merge pull request #3664 from pipecat-ai/mb/elevenlabs-scribe-v2
Update ElevenLabsSTTService to scribe_v2
2026-02-06 12:31:44 -05:00
Mark Backman
d4993f0dcf Update ElevenLabsSTTService to scribe_v2 2026-02-06 11:37:23 -05:00
Luke Payyapilli
1790a84bfd add changelog 2026-02-06 10:05:02 -05:00
Luke Payyapilli
29c53b99a4 fix: close stream on cancellation for SambaNova and Google OpenAI services 2026-02-06 10:02:40 -05:00
Mark Backman
aa5a855eab Merge pull request #3656 from pipecat-ai/mb/openai-realtime-stt
Add OpenAIRealtimeSTTService
2026-02-06 09:15:58 -05:00
Mark Backman
e66d6f8ffe Merge pull request #3658 from pipecat-ai/mb/bump-protobuf-5.29.6
Upgrade protobuf to >=5.29.6
2026-02-05 19:09:30 -05:00
Mark Backman
b8ac2ba713 Merge pull request #3593 from ianbbqzy/ian/inworld-auto-mode
Add auto_mode support for inworld plugin
2026-02-05 18:16:38 -05:00
Ian Lee
6eea40858e fix lint and changelog 2026-02-05 15:10:36 -08:00
Mark Backman
90700d10aa Upgrade protobuf to >=5.29.6 2026-02-05 18:08:52 -05:00
Mark Backman
fa85f7bbc7 Merge pull request #3640 from lukepayyapilli/fix/openai-stream-close
fix: close stream on cancellation to prevent socket leaks
2026-02-05 18:00:06 -05:00
Mark Backman
669f013970 Merge pull request #3657 from pipecat-ai/filipi/changing_no_audio_log_to_debug
Changing the ‘no audio received’ log from warning to debug.
2026-02-05 17:35:24 -05:00
filipi87
76f63e54e2 Changing the ‘no audio received’ log from warning to debug. 2026-02-05 18:07:14 -03:00
Filipi da Silva Fuchter
cce5a13444 Merge pull request #3650 from pipecat-ai/filipi/twilio_issues
Ignoring RTVI messages inside the Serializers by default.
2026-02-05 15:52:59 -05:00
Mark Backman
d11e1cd631 Update 13k to use ElevenLabsRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
8b9da632d1 Add OpenAIRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
b36f7892a4 Merge pull request #3654 from pipecat-ai/aleix/more-claude-update
CLAUDE.md: add RTVI and serializers
2026-02-05 15:23:35 -05:00
Mark Backman
9b43cde128 Merge pull request #3355 from itsderek23/user-bot-latency
Add `user_bot_latency_seconds` to OpenTelemetry turn spans
2026-02-05 15:23:15 -05:00
filipi87
6af4d872a8 Refactoring the serializers to ignore the RTVI messages by default. 2026-02-05 16:52:53 -03:00
Ian Lee
22398e1410 add changelog back 2026-02-05 11:39:39 -08:00
Ian Lee
d10467e043 update timestamps reset handling 2026-02-05 11:39:39 -08:00
Ian Lee
cbe131636d add changelog 2026-02-05 11:39:39 -08:00
Ian Lee
fef9e3ea32 Add auto_mode support for inworld plugin 2026-02-05 11:39:39 -08:00
Mark Backman
56d8ef2bf4 Deprecate UserBotLatencyLogObserver, update 29 example 2026-02-05 14:29:45 -05:00
Derek Haynes
8791559351 Add changelog entry for PR #3355 2026-02-05 14:29:45 -05:00
Derek Haynes
f6c919354f Add test for user bot latency 2026-02-05 14:29:45 -05:00
Derek Haynes
93138466d6 Feat: Add user-bot latency to OTel turn spans
This adds user-to-bot response latency tracking to OpenTelemetry spans:

- Created UserBotLatencyObserver as a reusable component for tracking
user-to-bot response latency
- Records the value as an attribute on turn spans (turn.user_bot_latency_seconds)
- Updated TurnTraceObserver to use UserBotLatencyObserver, following the same pattern as TurnTrackingObserver
- Updated PipelineTask to automatically create and wire UserBotLatencyObserver
when tracing is enabled (same as TurnTrackingObserver)
2026-02-05 14:29:42 -05:00
Mark Backman
5a5a98b497 Merge pull request #3649 from itsderek23/fix/tracing-orphan-spans
Fix orphan otel spans during flow initialization and transitions
2026-02-05 14:23:52 -05:00
Aleix Conchillo Flaqué
2b4f507d37 CLAUDE.md: add RTVI and serializers 2026-02-05 11:06:00 -08:00
Mark Backman
d6f3a90662 Merge pull request #3652 from pipecat-ai/mb/upgrade-small-webrtc-prebuilt-2.1.0
Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0
2026-02-05 13:48:54 -05:00
Derek Haynes
8fb0e37965 Update changelog for #3649 2026-02-05 11:35:22 -07:00
Derek Haynes
0d45b48f7b Fix import placement 2026-02-05 11:26:58 -07:00
Mark Backman
6af4520b1f Merge pull request #3635 from pipecat-ai/filipi/fix_websocket
Fixed an error in the WebSocket transport that occurred when an InputTransportMessageFrame was received and broadcast.
2026-02-05 12:22:59 -05:00
filipi87
ba469e5645 Add changelog entry
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 12:19:51 -05:00
Mark Backman
bd12b60b5c Merge pull request #3614 from okue/fix/websocket-broadcast-frame-misuse
fix: pass frame class instead of instance to broadcast_frame in websocket transports
2026-02-05 12:19:03 -05:00
Mark Backman
54db37ea47 Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0 2026-02-05 12:09:51 -05:00
filipi87
752e16f553 Ignoring RTVI messages inside TwilioSerializer by default. 2026-02-05 10:51:03 -03:00
Derek Haynes
7c7408a048 Fix orphan spans in tracing during flow initialization and transitions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 06:06:13 -07:00
Mark Backman
8f42343927 Merge pull request #3630 from pipecat-ai/mb/add-function-call-messages-rtvi
Add native RTVI function call lifecycle messages
2026-02-04 16:20:42 -05:00
Mark Backman
46da6cd91b Update changelogs 2026-02-04 11:19:30 -05:00
Mark Backman
ecb02d9049 Bump RTVI_PROTOCOL_VERSION to 1.2.0 2026-02-04 11:17:38 -05:00
Mark Backman
cc68e00125 Deprecate llm-function-call message 2026-02-04 11:17:23 -05:00
Mark Backman
e0e3b5250b Add RTVIObserverParams to control what information is included in function call events 2026-02-04 11:05:05 -05:00
Luke Payyapilli
55a3b10e70 fix(openai): close stream on cancellation to prevent socket leaks 2026-02-04 09:59:10 -05:00
dhruvladia-sarvam
e6b06414b3 change default speaker for bulbul:v3-beta to shubh 2026-02-04 16:46:35 +05:30
Aleix Conchillo Flaqué
6bcfb40d12 Merge pull request #3636 from pipecat-ai/aleix/initial-claude-md
initial CLAUDE.md
2026-02-03 19:31:16 -08:00
Aleix Conchillo Flaqué
65b1a8ce36 initial CLAUDE.md 2026-02-03 18:04:54 -08:00
Mark Backman
2db3d94d06 Merge pull request #3628 from pipecat-ai/mb/broadcast-speech-control-params-frame
Fix: Broadcast SpeechControlParamsFrame from VADController
2026-02-03 18:44:15 -05:00
Mark Backman
2a26b9f7a3 Fix: Broadcast SpeechControlParamsFrame from VADController 2026-02-03 18:40:39 -05:00
Aleix Conchillo Flaqué
4f77c532fb Merge pull request #3623 from pipecat-ai/aleix/pipeline-task-rtvi-always-set-bot-ready
PipelineTask: also call set_bot_ready() for external RTVI processors
2026-02-03 14:21:03 -08:00
Aleix Conchillo Flaqué
c3a4da4a29 PipelineTask: also call set_bot_ready() for external RTVI processors 2026-02-03 14:16:08 -08:00
Mark Backman
84ca0b6d58 Merge pull request #3629 from pipecat-ai/fix/telephony-websocket-stopasynciteration
Fix StopAsyncIteration in parse_telephony_websocket
2026-02-03 12:10:07 -05:00
Mark Backman
c1857d255d Avoid nesting try/excepts 2026-02-03 12:00:04 -05:00
Mark Backman
d50ec33079 Merge pull request #3542 from lukepayyapilli/fix/terminal-frames-uninterruptible
fix: make EndFrame and StopFrame uninterruptible to prevent pipeline freeze
2026-02-03 10:08:17 -05:00
Mark Backman
40c84faff5 Remove handle_function_call_start 2026-02-03 10:00:59 -05:00
Mark Backman
84cd9346f9 Add native RTVI function call lifecycle messages 2026-02-03 10:00:59 -05:00
Luke Payyapilli
5d5b19e1d2 Add changelog entry 2026-02-03 09:12:59 -05:00
Luke Payyapilli
8d3e10f054 Make EndFrame and StopFrame uninterruptible to prevent pipeline freeze 2026-02-03 09:12:59 -05:00
dhruvladia-sarvam
1665ce181a refactor(sarvam): centralize model configuration with dataclasses 2026-02-03 14:33:41 +05:30
James Hush
803a20cc00 Fix formatting: remove extra blank line
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:46:44 +08:00
James Hush
90bead06ab Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-03 16:42:13 +08:00
James Hush
b427d534ae Add tests for parse_telephony_websocket StopAsyncIteration handling
Tests cover:
- No messages received (raises ValueError)
- One message received (logs warning, continues)
- Two messages received (normal operation)
- All telephony providers (Twilio, Telnyx, Plivo, Exotel)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:33:36 +08:00
James Hush
b030f1178d Add changelog and improve docstring for parse_telephony_websocket
- Added changelog entry for bug fix
- Enhanced docstring with Args and Raises sections

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:26:09 +08:00
James Hush
a627597bca Fix StopAsyncIteration in parse_telephony_websocket
Handle WebSocket disconnections gracefully when telephony providers send
fewer messages than expected. Adds explicit StopAsyncIteration handling
for both first and second message retrieval.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:25:07 +08:00
Aleix Conchillo Flaqué
4c10ddb7bb upgrade uv.lock 2026-02-02 16:25:06 -08:00
Mark Backman
a4e499dc80 Merge pull request #3617 from pipecat-ai/fix/cjk-sentence-splitting
Fix sentence splitting for CJK and other non-Latin languages
2026-02-02 18:16:51 -05:00
Mark Backman
ca49acfaa6 Merge pull request #3619 from pipecat-ai/mb/resemble-readme
Resemble cleanup
2026-02-02 09:20:11 -05:00
Mark Backman
86147f15f3 Renumber the Resemble foundational example 2026-02-02 09:07:05 -05:00
Mark Backman
5cda72d138 Add Resemble TTS to README 2026-02-02 09:05:03 -05:00
Mark Backman
54e62a8177 Merge pull request #3134 from pipecat-ai/mb/resemble-tts-draft
Add ResembleAITTSService
2026-02-02 08:59:27 -05:00
Mark Backman
a592b7fdf0 Update per PR 1789, align with ErrorFrame norms 2026-02-02 08:55:29 -05:00
Mark Backman
ba2b7c05d6 Add ResembleAITTSService 2026-02-02 08:55:27 -05:00
James Hush
774041e9a1 Add changelog for PR #3617
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:47:22 +08:00
James Hush
763002f2bc Fix sentence splitting for CJK and other non-Latin languages in TTS pipeline
NLTK's sent_tokenize() only supports ~15 European languages and defaults to
English. For Japanese, Chinese, Korean, Hindi, Arabic, and other non-Latin
languages, NLTK fails to recognize sentence boundaries like 。?! causing
text to accumulate until flush instead of being emitted sentence-by-sentence.

Add a fallback in match_endofsentence() that scans for unambiguous non-Latin
sentence-ending punctuation when NLTK fails to split the text. Latin
punctuation (. ! ? ; …) is excluded from the fallback since NLTK handles
those correctly and they can be ambiguous (abbreviations, decimals, etc.).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:27:49 +08:00
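A rough sketch of the fallback scan described above, assuming it only needs the position just past the first unambiguous non-Latin sentence ender. The function name and the exact punctuation set are illustrative assumptions, not the contents of `match_endofsentence()`.

```python
from typing import Optional

# Assumed, illustrative set of unambiguous non-Latin sentence enders.
NON_LATIN_ENDERS = frozenset(
    "\u3002"  # CJK ideographic full stop
    "\uff1f"  # fullwidth question mark
    "\uff01"  # fullwidth exclamation mark
    "\u0964"  # Devanagari danda
    "\u061f"  # Arabic question mark
)


def fallback_end_of_sentence(text: str) -> Optional[int]:
    """Return the index just past the first non-Latin sentence ender, or None."""
    for index, char in enumerate(text):
        if char in NON_LATIN_ENDERS:
            return index + 1
    return None
```

Latin punctuation such as `.` is deliberately absent from the set, so text like "Dr. Smith" yields no match and is left to the primary tokenizer.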
Om Chauhan
50dedf350d fix: ensure function call timeout task is always cancelled 2026-02-02 08:38:54 +05:30
okue
d3ecbb11c1 fix: pass frame class instead of instance to broadcast_frame in websocket transports
broadcast_frame() expects a frame class and kwargs, but the three
websocket input transports (fastapi, client, server) were incorrectly
passing a frame instance. This would cause a TypeError at runtime when
an InputTransportMessageFrame was received.
2026-02-01 20:38:34 +09:00
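To illustrate why passing an instance fails when a class and kwargs are expected, here is a toy version of such an API. This `broadcast_frame` and `MessageFrame` are hypothetical stand-ins and do not reproduce Pipecat's actual signature or behavior.

```python
from dataclasses import dataclass
from typing import Any, List, Type


@dataclass
class MessageFrame:
    message: str


def broadcast_frame(frame_cls: Type[Any], **kwargs: Any) -> List[Any]:
    # Build one independent instance per direction from the class and kwargs.
    return [frame_cls(**kwargs), frame_cls(**kwargs)]


# Correct: pass the class plus keyword arguments.
frames = broadcast_frame(MessageFrame, message="hello")

# Incorrect: an instance is not callable, so construction raises TypeError.
try:
    broadcast_frame(MessageFrame(message="hello"))
    misuse_error = None
except TypeError as exc:
    misuse_error = exc
```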
Aleix Conchillo Flaqué
f453227ba3 Merge pull request #3612 from pipecat-ai/aleix/use-kokoro-onnx
KokoroTTSService: use kokoro-onnx instead of kokoro
2026-01-31 21:03:55 -08:00
Aleix Conchillo Flaqué
52cc64019a Merge pull request #3611 from pipecat-ai/aleix/aicoustics-example-update
examples: update 07zd to use vad_analyzer in LLMUserAggregator
2026-01-31 21:02:50 -08:00
Aleix Conchillo Flaqué
95689cc81c KokoroTTSService: use kokoro-onnx instead of kokoro 2026-01-31 17:20:27 -08:00
Aleix Conchillo Flaqué
675c7c43e3 examples: update 07zd to use vad_analyzer in LLMUserAggregator 2026-01-31 15:31:15 -08:00
Aleix Conchillo Flaqué
bfd19e867c Merge pull request #3610 from pipecat-ai/aleix/dont-add-rtvi-observer-if-already-there
PipelineTask: don't add RTVIObserver if already there
2026-01-31 14:57:52 -08:00
Aleix Conchillo Flaqué
acc9923c0a PipelineTask: don't add RTVIObserver if already there 2026-01-31 14:54:29 -08:00
Mark Backman
bdc9e7e2e4 Merge pull request #3608 from pipecat-ai/mb/quickstart-0.0.101
Update quickstart for 0.0.101
2026-01-31 10:39:17 -05:00
Mark Backman
a587e1b99a Update quickstart for 0.0.101 2026-01-31 09:52:24 -05:00
Aleix Conchillo Flaqué
7853e5ca93 Merge pull request #3606 from pipecat-ai/changelog-0.0.101
Release 0.0.101 - Changelog Update
2026-01-30 22:58:22 -08:00
aconchillo
614b8e1a62 Update changelog for version 0.0.101 2026-01-30 22:54:31 -08:00
Aleix Conchillo Flaqué
ef51c2a5c6 changelog: fix 3582 changed file 2026-01-30 22:48:26 -08:00
Aleix Conchillo Flaqué
f42dc0d38e Merge pull request #3605 from pipecat-ai/aleix/gemini-live-schedule-transcription-timeout-handler
GeminiLiveLLMService: let the transcription timeout handler be scheduled
2026-01-30 22:44:05 -08:00
Aleix Conchillo Flaqué
d87f3543c7 GeminiLiveLLMService: let the transcription timeout handler be scheduled 2026-01-30 22:41:10 -08:00
Aleix Conchillo Flaqué
fee633cb92 scripts(evals): disable kokoro for now 2026-01-30 21:23:42 -08:00
Aleix Conchillo Flaqué
607af91153 Merge pull request #3604 from pipecat-ai/mb/fix-ivr-navigator-aggregation
Fix IVRNavigator to push AggregatedTextFrame when switching to conver…
2026-01-30 21:22:20 -08:00
Mark Backman
e779233918 Fix IVRNavigator to push AggregatedTextFrame when switching to conversation mode 2026-01-30 21:07:49 -05:00
Aleix Conchillo Flaqué
604d5d0b14 examples: update 07zi and 07zj to use vad_analyzer from LLMUserAggregator 2026-01-30 16:14:02 -08:00
Mark Backman
342ae7af41 Merge pull request #3601 from pipecat-ai/mb/add-22-release-evals
Add 22 foundational to release evals
2026-01-30 15:31:54 -05:00
Mark Backman
c92ec1552e Add 22 foundational to release evals 2026-01-30 15:12:52 -05:00
Aleix Conchillo Flaqué
93160f1455 scripts(evals): remove vad_analyzer from transport 2026-01-30 12:08:12 -08:00
Aleix Conchillo Flaqué
e3158e1131 Merge pull request #3600 from pipecat-ai/aleix/llm-server-timeout-task-never-waited
LLMService: make sure function call timeout handler is started
2026-01-30 12:01:18 -08:00
Mark Backman
63a23246d5 Add UserTurnCompletionLLMServiceMixin (#3518)
* Added UserTurnCompletionLLMServiceMixin class

* Added 22-filter-incomplete-turns.py foundational example

* Removed old 22 natural conversation foundational examples

* Added test_user_turn_completion_mixin.py
2026-01-30 14:57:15 -05:00
Aleix Conchillo Flaqué
569ea9849a Merge pull request #3599 from pipecat-ai/aleix/release-evals-disable-rtvi
scripts(evals): disable RTVI
2026-01-30 11:44:46 -08:00
Aleix Conchillo Flaqué
a98ca9b65b LLMService: make sure function call timeout handler is started 2026-01-30 11:38:26 -08:00
Aleix Conchillo Flaqué
c9310789dc scripts(evals): use new vad_analyzer from LLMUserAggregator 2026-01-30 10:57:17 -08:00
Aleix Conchillo Flaqué
b93e12d701 scripts(evals): disable RTVI 2026-01-30 10:52:38 -08:00
Aleix Conchillo Flaqué
3f77da627d Merge pull request #3583 from pipecat-ai/aleix/move-vad-analyzer-to-llm-user-aggregator
VAD analyzer is now passed to LLMUserAggregator
2026-01-30 10:46:10 -08:00
Aleix Conchillo Flaqué
35d265770d LLMUserAggregator: don't process certain self-queued frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
9632efec8c VADProcessor: broadcast frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
27dbfa1eda NvidiaTTSService: return AsyncIterator instead of AsyncIterable 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
183c0aa4ef LLMUserAggregator: queue frames internally so strategies and controllers can process them 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
a69a037ffa changelog: add updates for #3583 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c46e7f5da0 TurnAnalyzerUserTurnStopStrategy: only update vad params if frame contains vad 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
307aeaeda0 examples: update with LLMUserAggregatorParams vad_analyzer and VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
305ab44132 tests: add unittest.main() call 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b486f35c70 audio: add new VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c92080b0d2 LLMUserAggregator: add vad_analyzer and use VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
ddfedaf478 audio(vad): add new VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b1ad4d5ab0 BaseInputTransport: deprecate vad_analyzer 2026-01-30 10:07:33 -08:00
Aleix Conchillo Flaqué
0857aa87be Merge pull request #3595 from pipecat-ai/aleix/add-kokoro-tts-support
services(tts): add new KokoroTTSService
2026-01-30 09:49:05 -08:00
Aleix Conchillo Flaqué
fd3c5f69b7 upgrade uv.lock 2026-01-30 09:41:33 -08:00
Aleix Conchillo Flaqué
72ab329513 services(tts): add new KokoroTTSService 2026-01-30 09:39:01 -08:00
Filipi da Silva Fuchter
7999d08b7e Merge pull request #3052 from Navigate-AI/fork/main
Include pts in video and audio frames in SmallWebRTCClient
2026-01-30 09:03:29 -05:00
dhruvladia-sarvam
57821cf709 fix 2026-01-30 16:07:52 +05:30
dhruvladia-sarvam
18045582a9 ASR and TTS v3 update 2026-01-30 15:53:06 +05:30
Mark Backman
7be2b8cc34 Merge pull request #3587 from pipecat-ai/mb/gradium-improvements
GradiumSTTService now flushes pending transcripts on VAD stopped dete…
2026-01-29 18:11:25 -05:00
Aleix Conchillo Flaqué
671cc8eb74 Merge pull request #3590 from pipecat-ai/aleix/custom-cli-runner-args
runner: allow custom CLI arguments
2026-01-29 13:53:27 -08:00
Aleix Conchillo Flaqué
b4dce656f0 Merge pull request #3594 from pipecat-ai/aleix/user-turn-controller-reset-timeout-on-interims
UserTurnController: reset user turn timeout with interim transcriptions
2026-01-29 13:12:44 -08:00
Aleix Conchillo Flaqué
253a1d1114 UserTurnController: reset user turn timeout with interim transcriptions 2026-01-29 13:10:10 -08:00
Aleix Conchillo Flaqué
ca613bcb79 Merge pull request #3592 from pipecat-ai/aleix/broadcast-frame-no-deepcopy
don't deep copy fields when broadcasting frames
2026-01-29 11:50:20 -08:00
Aleix Conchillo Flaqué
0423acd8a0 STTService: just clear buffer before running run_stt() 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
7eabaaa0ef FrameProcessors: do not deepcopy fields when broadcasting frames 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
bbb8b53d03 runner: allow custom CLI arguments 2026-01-29 10:15:53 -08:00
Aleix Conchillo Flaqué
f3b72e9263 Merge pull request #3585 from pipecat-ai/aleix/improve-piper-tts-support
improve Piper TTS support
2026-01-29 08:36:13 -08:00
Mark Backman
31c7fbc5ba Add delay_in_frames and language support 2026-01-29 10:59:04 -05:00
Mark Backman
6ab12626d6 GradiumSTTService now flushes pending transcripts on VAD stopped detection 2026-01-29 10:26:17 -05:00
Mark Backman
b77a50de73 Merge pull request #3529 from lukepayyapilli/fix/llm-timeout-without-retry
feat: handle exceptions for BaseOpenAILLMService
2026-01-29 09:12:54 -05:00
Luke Payyapilli
433c1b9b92 add catch-all exception handler per review feedback 2026-01-29 09:07:06 -05:00
Aleix Conchillo Flaqué
bd00587092 changelog: add files for 3585 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
5a85e27cc5 PiperHttpTTSService: allow passing a voice id 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
11daa43b1b TTSService: resample _stream_audio_frames_from_iterator() input audio if needed 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
875614ff7a tts: add support for local PiperTTSService 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
eb1bf1e446 tts: rename PiperTTSService to PiperHttpTTSService 2026-01-28 23:27:32 -08:00
mattie ruth backman
7456a0a55f Fix the /start and /offer/api proxy endpoints for smallWebRTC to match pipecat cloud behavior WRT requestData 2026-01-28 15:25:13 -05:00
Filipi da Silva Fuchter
27277ed3d9 Merge pull request #3571 from pipecat-ai/filipi/funcion_call_improvements
Function call improvements
2026-01-28 14:03:40 -05:00
filipi87
5543bc56f3 Add changelog files for PR #3571
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 15:43:59 -03:00
filipi87
c8496dfb8e Updated the examples which use UserImageRequestFrame to defer the function call result. 2026-01-28 15:39:21 -03:00
filipi87
d3f4cbb620 Providing a way to defer the function call results. 2026-01-28 15:39:06 -03:00
filipi87
c9f922c479 Removed an overridden method that was identical to the parent implementation. 2026-01-28 15:38:40 -03:00
Aleix Conchillo Flaqué
49bd3da26b Merge pull request #3582 from pipecat-ai/aleix/daily-sample-room-url
rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL
2026-01-28 10:38:14 -08:00
Aleix Conchillo Flaqué
f3ef488925 rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL 2026-01-28 10:05:27 -08:00
Aleix Conchillo Flaqué
4f08098917 Merge pull request #3580 from Pulkit0729/fix/livekit
fix: adding missing livekit transport configs
2026-01-28 10:04:34 -08:00
Pulkit
a7cd5b0322 fix: adding missing livekit transport configs 2026-01-28 23:15:03 +05:30
Aleix Conchillo Flaqué
55dadc9118 tests(genesys): fix formatting 2026-01-28 09:15:42 -08:00
Aleix Conchillo Flaqué
01bbf61e0d Merge pull request #3500 from ssillerom/feature/genesys_serializer
Feature/genesys serializer
2026-01-28 09:09:11 -08:00
ssillerom
10fb77c0e2 added changelog file 2026-01-28 18:07:33 +01:00
ssillerom
2612fae527 ruff linting 2026-01-28 18:02:51 +01:00
ssillerom
c5be67f293 fix: create disconnect message passing output vars 2026-01-28 17:56:21 +01:00
kompfner
312caaba86 Merge pull request #3429 from lukepayyapilli/fix/gemini-live-interrupted-signal
feat: handle server_content.interrupted for faster interruptions
2026-01-28 10:25:36 -05:00
Luke Payyapilli
ff0eb6d286 fix: emit ErrorFrame on LLM completion timeout 2026-01-28 09:44:32 -05:00
ssillerom
ef6bbace98 fixes: super init inherited class to set event handlers in the constructor 2026-01-28 15:40:24 +01:00
Filipi da Silva Fuchter
06ec21387f Merge pull request #3581 from pipecat-ai/filipi/open_ai_audio_duration
Fixed race condition in OpenAIRealtimeLLMService
2026-01-28 07:42:35 -05:00
filipi87
bdae177125 Adding changelog entry for the OpenAIRealtimeLLMService fix. 2026-01-28 08:39:11 -03:00
filipi87
468e159f9b Fixed race condition in OpenAIRealtimeLLMService that could cause an error when truncating the conversation. 2026-01-28 08:36:31 -03:00
ssillerom
a4acafd3be feature: added event handlers in constructor and call func in each _handle_* func 2026-01-28 10:54:26 +01:00
ssillerom
105824a372 Merge main into feature/genesys_serializer
Incorporates latest changes from main branch including:
- AIC filter and VAD updates
- STT service improvements
- Base serializer changes
- Various bug fixes
2026-01-28 10:48:56 +01:00
ssillerom
55e0d4ecc4 ruff fixes done 2026-01-28 08:59:28 +01:00
ssillerom
9102e81cb8 added tests to the PR 2026-01-27 23:39:43 +01:00
ssillerom
d7d8e93a3d feature: added custom params in closed message to genesys, simplified create_* functions, simplified constructor method and simplified opened message 2026-01-27 23:36:47 +01:00
Mark Backman
bf9b166464 Merge pull request #3575 from pipecat-ai/mb/fix-turn-stopped-event-end-cancel-frame
Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame…
2026-01-27 14:55:34 -05:00
Mark Backman
e80e0eab29 Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame or CancelFrame 2026-01-27 14:50:10 -05:00
Mark Backman
61242e6575 Merge pull request #3574 from pipecat-ai/mb/fix-websocket-close-message-handling
Fix WebsocketService infinite loop on graceful server disconnect
2026-01-27 13:53:26 -05:00
Aleix Conchillo Flaqué
8841387121 Merge pull request #3560 from pipecat-ai/aleix/serializer-base-objects
FrameSerializer: subclass from BaseObject so we can add events
2026-01-27 09:58:44 -08:00
Aleix Conchillo Flaqué
ee695ae9fe FrameSerializer: subclass from BaseObject so we can add events 2026-01-27 09:53:46 -08:00
Mark Backman
52012b0fb2 Fix WebsocketService infinite loop on graceful server disconnect 2026-01-27 12:41:28 -05:00
Mark Backman
f7a1c6b719 Merge pull request #3408 from ai-coustics/aic-v2
Add ai-coustics AIC SDK v2 support with model downloading
2026-01-27 10:38:26 -05:00
Gökmen Görgen
6aa77ccc13 group aic related changes in changelog. 2026-01-27 16:22:54 +01:00
Gökmen Görgen
45b7ec4e2c re-enable 07zd-interruptible-aicoustics.py in release evals. 2026-01-27 16:18:56 +01:00
Mark Backman
1c434c6ad5 Merge pull request #3562 from speechmatics/fix/smx-ttfs-finals
Support TTFS for Speechmatics STT
2026-01-27 08:35:34 -05:00
Mark Backman
4591affba9 Merge pull request #3568 from pipecat-ai/mb/changelog-3536 2026-01-27 07:14:41 -05:00
Sam Sykes
91346f5f37 Add support for self.request_finalize() for Pipecat-based VAD. 2026-01-27 10:44:35 +00:00
Filipi da Silva Fuchter
6a66ebe332 Merge pull request #3541 from pipecat-ai/filipi/audio_buffer
Refactoring AudioBufferProcessor to fix audio track synchronization.
2026-01-27 05:32:41 -05:00
Filipi da Silva Fuchter
c1d4180042 Merge pull request #3567 from pipecat-ai/filipi/openai_realtime_audio_duration
Fixed race condition in OpenAIRealtimeBetaLLMService
2026-01-27 05:30:33 -05:00
Gökmen Görgen
81a53c699c handle AIC processor init errors gracefully and ensure _aic_ready reflects readiness 2026-01-27 11:28:05 +01:00
Sam Sykes
60168f7f69 remove comment 2026-01-26 23:16:43 +00:00
Sam Sykes
23d7608e5f changelog update 2026-01-26 23:15:30 +00:00
Sam Sykes
99242c0a93 linting updates 2026-01-26 23:14:40 +00:00
Sam Sykes
3a71865cf4 removed old metrics 2026-01-26 23:11:25 +00:00
Mark Backman
ecf2e69f3f Merge pull request #3536 from surapuramakhil/main
LLMAssistantAggregator: preserve non-ASCII characters in JSON output
2026-01-26 16:42:05 -05:00
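One common way to preserve non-ASCII characters in JSON output with the Python standard library is `ensure_ascii=False`. Whether the PR above uses exactly this option is an assumption; the snippet only shows the difference in output.

```python
import json

payload = {"text": "こんにちは"}

# Default behavior escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are written as-is.
preserved = json.dumps(payload, ensure_ascii=False)
```

Both strings decode to the same object; only the serialized form differs.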
Mark Backman
febd52274d Add changelog fragment for PR 3536 2026-01-26 16:42:00 -05:00
Mark Backman
1542d922e7 Merge pull request #3546 from pipecat-ai/pk/changelog-fragment-for-pr-3406
Added a changelog fragment for PR 3406
2026-01-26 16:31:57 -05:00
Paul Kompfner
15d5d1159e Added a changelog fragment for PR 3406 2026-01-26 16:27:33 -05:00
Mark Backman
884630a6bd Merge pull request #3559 from pipecat-ai/aleix/transport-broadcast-fixes
transports: fix broadcast_frame_class reference
2026-01-26 16:25:31 -05:00
Mark Backman
1cf137c6a8 Merge pull request #3565 from pipecat-ai/markbackman-patch-1 2026-01-26 15:49:35 -05:00
filipi87
98fcfd7c91 Adding changelog entry for the OpenAIRealtimeBetaLLMService fix. 2026-01-26 17:19:08 -03:00
filipi87
2f23f2e39c Fixed race condition in OpenAIRealtimeBetaLLMService that could cause an error when truncating the conversation. 2026-01-26 17:08:27 -03:00
Mark Backman
9c6b11cecf Update README links to use absolute URLs 2026-01-26 13:03:39 -05:00
Sam Sykes
fc1444c9d6 Updated changelog 2026-01-26 16:25:37 +00:00
Sam Sykes
ea94939add update dependency 2026-01-26 16:24:56 +00:00
Sam Sykes
0c69ae6371 Changelog entry. 2026-01-26 16:07:59 +00:00
Sam Sykes
8b88280bb1 Default to using EXTERNAL mode. 2026-01-26 15:52:42 +00:00
Sam Sykes
960d0faea5 support is_eou for final segment in utterance 2026-01-26 15:48:04 +00:00
Luke Payyapilli
b9390ccb1b Address review: remove UserStartedSpeakingFrame, add explanatory comment 2026-01-26 10:08:17 -05:00
Mark Backman
061a0dc43d Merge pull request #3498 from pipecat-ai/mb/azure-tts-8khz-workaround
AzureTTSService 8khz workaround
2026-01-26 09:48:22 -05:00
Mark Backman
328bbe069f Merge pull request #3554 from pipecat-ai/mb/simplify-stt-ttfb
Simplify STT finalize handling
2026-01-26 08:00:04 -05:00
Mark Backman
dc32ecc872 Merge pull request #3555 from pipecat-ai/mb/speechmatics-stt-ttfb
Align Speechmatics STT TTFB metrics with STT classes
2026-01-26 07:59:34 -05:00
Gökmen Görgen
ca2eb1904f Merge remote-tracking branch 'origin/aic-v2' into aic-v2 2026-01-26 10:16:23 +01:00
Gökmen Görgen
4bce58f270 update changelog and remove outdated dependency notes 2026-01-26 10:15:15 +01:00
Gökmen Görgen
7572d63f8f Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:40 +01:00
Gökmen Görgen
3c463c9416 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:33 +01:00
Gökmen Görgen
bd618d64e3 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:16 +01:00
Gökmen Görgen
a824660df7 add unit tests for AICVADAnalyzer and AICFilter. 2026-01-26 09:56:36 +01:00
Gökmen Görgen
58b9019852 bump aic-sdk to 2.0.1 in optional dependencies. 2026-01-26 09:14:16 +01:00
Gökmen Görgen
afcdef8c81 docstring clarification. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bd92104fb3 clarify voice confidence method behavior in AIC VAD. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
34e9f224a8 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dca7f3b5b0 add changelog. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
70a85cd192 use path for keeping the consistency between the parameters. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
91e86658b7 force developer to set a license key, it's required. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0a8588669c address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e99400148 two dots are Rust-specific things; I'm not sure they're familiar to Python developers. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
648f20db6d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
09b5b6b12d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e6a423955 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dc8972cd94 log optimal number of frames for given sample rate in AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
e4e2231958 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
18b3ee743b replace os with pathlib.Path in AICFilter for path handling consistency. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
65b8e0e89c rename enabled to bypass in AICFilter for clarity. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
b77f8b065f remove voice gain. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
5fd43faec3 add min speech duration. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
abebcf37bd address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
ca4e3c79f9 Update pyproject.toml
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
e8d1bec03b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
f0cc54589e remove enhancement level parameter from AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
22b9aac2ff use quail model in the example. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
7f86f4ac27 fix class name. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
dcab79753b even if the parameters are fixed, keep aic ready for processing. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bdded9b026 set SDK ID for telemetry in AIC filter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
1e1e275fea address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
effb6aa8f4 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a4a9bae79e drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
c943ef9261 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
f05809520b Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new AICFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
ec17dc6626 aic-sdk-py v2.
# Conflicts:
#	uv.lock

# Conflicts:
#	examples/foundational/07zd-interruptible-aicoustics.py
#	pyproject.toml
#	src/pipecat/audio/filters/aic_filter.py
#	src/pipecat/audio/vad/aic_vad.py
2026-01-26 08:44:17 +01:00
Gökmen Görgen
4e85e81d9b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a1cc88a233 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
61a230ec53 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Stephan Eckes <stephan@steck.tech>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a13380b574 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
2a927189d9 reorganize imports. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a90c15362c drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
d3bdd2d246 use new model id. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
465ae4f706 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a0d801b658 Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new AICFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
35919a84e3 aic-sdk-py v2.
# Conflicts:
#	uv.lock
2026-01-26 08:44:17 +01:00
Aleix Conchillo Flaqué
f94a60f381 transports: fix broadcast_frame_class reference 2026-01-25 15:42:09 -08:00
ssillerom
a446bca72d changes: added OutputTransportUrgentFrame to on closed, removed callback 2026-01-25 21:12:28 +01:00
Sergio Sillero
8ae834366b Merge branch 'pipecat-ai:main' into feature/genesys_serializer 2026-01-25 21:04:27 +01:00
Mark Backman
a4acc12f91 Align Speechmatics STT TTFB metrics with STT classes 2026-01-24 18:26:34 -05:00
Mark Backman
e93112e76e Simplify STT finalize handling 2026-01-24 15:28:27 -05:00
Mark Backman
680bcaac66 Merge pull request #3550 from pipecat-ai/mb/update-smart-turn-data-env-var
Update env var to PIPECAT_SMART_TURN_LOG_DATA
2026-01-24 13:52:36 -05:00
Mark Backman
d2ac9006a2 Update env var to PIPECAT_SMART_TURN_LOG_DATA 2026-01-24 12:50:42 -05:00
Mark Backman
bcb019e8ab Add TTFB metrics for STT services (#3495) 2026-01-23 18:47:34 -05:00
kompfner
4ea546785f Merge pull request #3406 from omChauhanDev/fix/openrouter-gemini-messages
fix(openrouter): handle multiple system messages for Gemini models
2026-01-23 14:53:59 -05:00
filipi87
f128cdd19a Adding a changelog entry to the AudioBufferProcessor fix. 2026-01-23 16:16:01 -03:00
filipi87
7921bce4af Refactoring AudioBufferProcessor to fix audio track synchronization. 2026-01-23 16:15:48 -03:00
Luke Payyapilli
cadced3f79 feat: handle server_content.interrupted for faster barge-in response 2026-01-23 10:41:04 -05:00
Aleix Conchillo Flaqué
8951442b8e Merge pull request #3534 from pipecat-ai/aleix/claude-skills-pr-description
claude: add pr-description skill
2026-01-22 17:34:46 -08:00
Aleix Conchillo Flaqué
7e6e3031e7 claude: add pr-description skill 2026-01-22 13:41:50 -08:00
Akhil
3b3c7aa8cc LLMAssistantAggregator: preserve non-ASCII characters in JSON output
Add ensure_ascii=False to json.dumps() calls for tool call arguments
and function call results to prevent unnecessary unicode escaping.
2026-01-22 15:37:44 -06:00
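The effect of `ensure_ascii=False` described in 3b3c7aa8cc can be seen with the standard library alone. This is a minimal sketch, not the aggregator's actual code; the `arguments` dictionary is a hypothetical stand-in for tool call arguments.

```python
import json

# Hypothetical tool-call arguments containing non-ASCII text.
arguments = {"city": "Zürich", "greeting": "こんにちは"}

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(arguments)

# With ensure_ascii=False the characters are written out unchanged.
preserved = json.dumps(arguments, ensure_ascii=False)

print(escaped)
print(preserved)
```

Both strings decode to the same object; only the serialized text differs, which matters when that text is shown to a model or a person.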
Aleix Conchillo Flaqué
308829f92b Merge pull request #3533 from pipecat-ai/aleix/claude-skills-docstring
claude: add docstring skill
2026-01-22 12:58:38 -08:00
Aleix Conchillo Flaqué
82a799e63e claude: add docstring skill 2026-01-22 12:53:38 -08:00
Cale Shapera
6b5bcae86f change default Inworld TTS model to inworld-tts-1.5-max (#3531) 2026-01-22 14:21:15 -05:00
Mark Backman
836073849c Merge pull request #3527 from weakcamel/patch-1
Update README.md - fix Google Imagen URL
2026-01-22 10:46:10 -05:00
Waldek Maleska
b13b65d6e2 Update README.md - fix Google Imagen URL 2026-01-22 15:17:41 +00:00
Mark Backman
3d545b718d Merge pull request #3344 from omChauhanDev/fix/stt-dynamic-language-update
fix: treat language as first-class STT setting
2026-01-22 09:21:56 -05:00
marcus-daily
f2fa5d9733 Updating changelog 2026-01-22 14:17:59 +00:00
marcus-daily
76b774072c Formatting fixes 2026-01-22 14:17:59 +00:00
marcus-daily
b6341ffaa5 Save Smart Turn input data if SMART_TURN_LOG_DATA is set 2026-01-22 14:17:59 +00:00
Mark Backman
29fae67c9e Merge pull request #3523 from omChauhanDev/add-location-support-google-tts
feat(google): add location parameter to TTS services
2026-01-22 09:12:16 -05:00
Mark Backman
718ea1c15e Merge pull request #3526 from pipecat-ai/mb/remove-logs
Remove application logs
2026-01-22 08:48:07 -05:00
Mark Backman
8e09d94614 Remove application logs 2026-01-22 08:28:52 -05:00
Aleix Conchillo Flaqué
de73e28563 Merge pull request #3510 from omChauhanDev/feat/add-reached-filter-methods
feat(task): add additive filter methods for frame monitoring
2026-01-21 21:05:33 -08:00
Aleix Conchillo Flaqué
55250b4f7e Merge pull request #3521 from pipecat-ai/aleix/claude-changelog-skill
claude: initial /changelog skill
2026-01-21 20:50:47 -08:00
Om Chauhan
281145a991 added changelog 2026-01-22 09:55:57 +05:30
Om Chauhan
7bd32e2fe5 feat(google): add location parameter to TTS services 2026-01-22 09:49:19 +05:30
James Hush
8f05d95f50 feat: add video_out_codec parameter for DailyTransport (#3520)
* feat: add video_out_codec parameter for DailyTransport

Add video_out_codec parameter to TransportParams allowing configuration
of the preferred video codec (VP8, H264, H265) for video output.

When set, this passes the preferredCodec option to Daily's
VideoPublishingSettings during the join operation.

* chore: move video_out_codec parameter to changelog folder (#3522)

* Initial plan

* Move video_out_codec parameter to changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

* Revert all CHANGELOG.md changes, keep only changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-01-22 11:31:07 +08:00
Om Chauhan
87c12f3098 changed frame filter storage type from tuples to sets 2026-01-22 08:43:46 +05:30
Om Chauhan
9c0bf89247 added changelog 2026-01-22 08:43:46 +05:30
Om Chauhan
6e44a2ab49 feat(task): add additive filter methods for frame monitoring 2026-01-22 08:43:46 +05:30
Aleix Conchillo Flaqué
7aa7b86aed claude: initial /changelog skill 2026-01-21 18:43:04 -08:00
Aleix Conchillo Flaqué
5ad9faeb4c Merge pull request #3519 from pipecat-ai/aleix/embedded-rtvi-processor
automatically add RTVI to the pipeline
2026-01-21 18:17:26 -08:00
Aleix Conchillo Flaqué
9e8f8b45c6 added changelog files for #3519 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
0ee11ad333 tests: disable RTVI in tests by default 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
124a3c35af RTVIObserver: don't handle some frames direction 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
054e504868 examples(foundational): remove RTVI (automatically added by PipelineTask) 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
e85a00cc0e PipelineTask: automatically add RTVI processor and RTVI observer
If `enable_rtvi` is enabled (enabled by default), an RTVI processor will be
added automatically to the pipeline. Also, an RTVI observer will be
registered.
2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
cc61cdbba3 RTVIProcessor: add create_rtvi_observer() 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
62f4708d43 transports: broadcast InputTransportMessageFrame frames 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
ba0ddb1832 FrameProcessor: copy kwargs when broadcasting frame 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
eacd2a4b71 FrameProcessor: add broadcast_frame_instance() 2026-01-21 18:14:17 -08:00
Mark Backman
7ed110650d Merge pull request #3516 from okue/minorpatch1
refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
2026-01-21 10:33:59 -05:00
okue
4a724379fc refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
The _bot_speaking flag does not need to be set in this method,
so the redundant assignment has been removed.
2026-01-21 23:59:15 +09:00
Aleix Conchillo Flaqué
768d3958dd Merge pull request #3512 from pipecat-ai/changelog-0.0.100
Release 0.0.100 - Changelog Update
2026-01-20 19:32:56 -08:00
aconchillo
5f9ff8bd58 Update changelog for version 0.0.100 2026-01-20 19:21:19 -08:00
Aleix Conchillo Flaqué
59ed422052 Merge pull request #3511 from pipecat-ai/aleix/camb-tts-client-on-start
CambTTSService: initialize client during StartFrame
2026-01-20 19:17:45 -08:00
Aleix Conchillo Flaqué
7e0ca113af CambTTSService: initialize client during StartFrame 2026-01-20 19:07:12 -08:00
Aleix Conchillo Flaqué
13c52e0e6d Merge pull request #3509 from pipecat-ai/aleix/nvidia-stt-tts-improvements
NVIDIA STT/TTS performance improvements
2026-01-20 16:39:12 -08:00
Aleix Conchillo Flaqué
a787fd9cd8 NVIDIATTSService: process incoming audio frame right away
Process audio as soon as we receive it from the generator. Previously, we were
reading from the generator and adding elements into a queue until there was no
more data, then we would process the queue.
2026-01-20 15:41:05 -08:00
Aleix Conchillo Flaqué
14495c425a NVIDIASTTService: no need for additional queue and task 2026-01-20 13:50:17 -08:00
Aleix Conchillo Flaqué
461bd0a2e0 update changelog for #3494 and #3499 2026-01-20 13:26:40 -08:00
Aleix Conchillo Flaqué
bd45ce2b4e Merge pull request #3499 from lukepayyapilli/fix/livekit-video-queue-memory-leak
fix(livekit): prevent memory leak when video_in_enabled is False
2026-01-20 13:21:21 -08:00
Aleix Conchillo Flaqué
a266644b06 Merge pull request #3494 from omChauhanDev/fix/uninterruptible-frame-handling
fix: preserve UninterruptibleFrames in __reset_process_queue
2026-01-20 13:19:40 -08:00
Mark Backman
03faadd7f9 Merge pull request #3508 from pipecat-ai/ss/log-daily-ids
Log Daily participant and meeting session IDs upon successful join in…
2026-01-20 15:43:48 -05:00
Aleix Conchillo Flaqué
bf43032652 Merge pull request #3504 from pipecat-ai/aleix/nvidia-stt-tts-error-handling
NVIDIA STT/TTS error handling
2026-01-20 09:41:08 -08:00
Sunah Suh
fa6f924b31 Log Daily participant and meeting session IDs upon successful join in Daily Transport 2026-01-20 11:31:17 -06:00
Aleix Conchillo Flaqué
a010a020fd add changelog for 3504 2026-01-20 09:03:30 -08:00
Aleix Conchillo Flaqué
655006aff5 NvidiaSegmentedSTTService: simplify exception handling 2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
671dc8cd9b NvidiaSTTService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
9a718ded1e NvidiaTTSService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
024809b39a Merge pull request #3503 from pipecat-ai/aleix/ai-service-start-end-cancel
AIService: handle StartFrame/EndFrame/CancelFrame exceptions
2026-01-20 08:56:39 -08:00
Aleix Conchillo Flaqué
6cf0d53d00 AIService: handle StartFrame/EndFrame/CancelFrame exceptions
If AIService subclasses implement start()/stop()/cancel() and exceptions are not
handled, execution will not continue and therefore the originator frames will
not be pushed. This would cause the pipeline to not be started (i.e. StartFrame
would not be pushed downstream) or stopped properly.
2026-01-20 08:54:22 -08:00
kompfner
778dacc9a8 Merge pull request #3486 from pipecat-ai/pk/fix-nova-sonic-reset-conversation
Fix `AWSNovaSonicLLMService.reset_conversation()`
2026-01-20 10:07:38 -05:00
Paul Kompfner
06b3ecd2d6 In AWS Nova Sonic service, send the "interactive" user message (which triggers the bot response) only after sending the audio input start event, per the AWS team's recommendation 2026-01-20 09:56:25 -05:00
Paul Kompfner
b4d143e39b Add CHANGELOG for fixing AWSNovaSonicLLMService.reset_conversation() 2026-01-20 09:56:25 -05:00
Paul Kompfner
c89083e72e Improve 20e example to ask the bot to give a recap when loading a previous conversation from disk 2026-01-20 09:56:25 -05:00
Luke Payyapilli
1ac811ab32 chore: revert unrelated uv.lock changes 2026-01-20 09:19:43 -05:00
Luke Payyapilli
f6359d460e chore: install livekit as optional extra in CI instead of dev dep 2026-01-20 09:16:16 -05:00
Aleix Conchillo Flaqué
f03a7175c7 Merge pull request #3501 from pipecat-ai/aleix/improve-eval-numerical-word-prompt
scripts(eval): give examples to numerical word answers
2026-01-19 20:22:06 -08:00
Aleix Conchillo Flaqué
aed44c863a scripts(eval): give examples to numerical word answers
Some models need extra help.
2026-01-19 14:37:00 -08:00
ssillerom
fa5da3b0be change comments 2026-01-19 20:49:23 +01:00
ssillerom
7e82a0cf49 feature: Genesys AudioHook WebSocket protocol serializer for Pipecat 2026-01-19 20:45:22 +01:00
Mark Backman
cddd6d5b0a Merge pull request #3492 from pipecat-ai/mb/remove-unused-imports
Remove unused imports
2026-01-19 14:07:16 -05:00
Mark Backman
11cf891ac8 Manual updates for unused imports 2026-01-19 14:03:22 -05:00
Luke Payyapilli
c89ae717fe style: fix ruff formatting 2026-01-19 11:13:41 -05:00
Luke Payyapilli
562bdd3084 test: add livekit to dev deps and improve test clarity 2026-01-19 11:11:54 -05:00
Mark Backman
cc4c3650e1 Merge pull request #3491 from pipecat-ai/mb/update-release-evals
Add Camb TTS to release evals
2026-01-19 11:04:05 -05:00
Luke Payyapilli
dfc1f09b77 fix(livekit): prevent memory leak when video_in_enabled is False 2026-01-19 11:00:23 -05:00
Mark Backman
0b1a4792b8 Bump to latest azure-cognitiveservices-speech version, 1.47.0 2026-01-19 09:52:28 -05:00
Mark Backman
14bd3b1b32 Set Azure TTS default prosody rate to None 2026-01-19 09:19:57 -05:00
Mark Backman
f733e77496 AzureTTS: work around word ordering issue at 8khz sample rate 2026-01-19 09:13:41 -05:00
Filipi da Silva Fuchter
5fc46cc450 Merge pull request #3493 from omChauhanDev/fix/globally-unique-pc-id
fix: make SmallWebRTCConnection pc_id globally unique
2026-01-19 09:04:48 -05:00
Om Chauhan
4a9eb82f92 fix: preserve UninterruptibleFrames in __reset_process_queue 2026-01-18 20:39:13 +05:30
Om Chauhan
990d8386e4 fix: make SmallWebRTCConnection pc_id globally unique 2026-01-18 19:41:51 +05:30
Mark Backman
ce7d823770 Remove unused imports 2026-01-18 08:22:22 -05:00
Mark Backman
0b93c3f900 Add Camb TTS to release evals 2026-01-17 16:27:16 -05:00
Mark Backman
829c5f4604 Merge pull request #3169 from Incanta/hathora
Add Hathora STT and TTS services
2026-01-17 16:25:12 -05:00
Mike Seese
dc8ea615d9 add hathora to run-release-evals.py 2026-01-17 10:33:58 -08:00
Mike Seese
a3d206050d move hathora example as requested 2026-01-17 10:31:08 -08:00
Mike Seese
f48a567873 run the linter 2026-01-17 10:30:47 -08:00
Mark Backman
e69ccd8ea7 Merge pull request #3490 from pipecat-ai/mb/on-user-mute-events
Add on_user_mute_started and on_user_mute_stopped events
2026-01-17 11:05:15 -05:00
Mark Backman
11924bb980 Add on_user_mute_started and on_user_mute_stopped events 2026-01-17 11:01:46 -05:00
Mark Backman
af89154e96 Merge pull request #3489 from pipecat-ai/mb/fix-azure-tts-punctuation-spacing
fix: AzureTTSService punctuation spacing
2026-01-17 11:00:30 -05:00
Mark Backman
1485ea0831 Merge pull request #3488 from pipecat-ai/mb/on-user-turn-idle
Update on_user_idle to on_user_turn_idle
2026-01-17 11:00:16 -05:00
Mark Backman
e22bc777d8 Fix spacing for CJK languages 2026-01-17 09:04:50 -05:00
Mark Backman
043403fe23 fix: AzureTTSService punctuation spacing 2026-01-17 08:18:31 -05:00
Mark Backman
1e1160906e Update on_user_idle to on_user_turn_idle 2026-01-17 07:04:27 -05:00
Aleix Conchillo Flaqué
f7d3e63063 Merge pull request #3474 from pipecat-ai/fix/optional-member-access-function-call-cancel
Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
2026-01-16 22:06:45 -08:00
Paul Kompfner
6fa797c8e4 Fix AWS Nova Sonic reset_conversation(), which would previously error out.
Issues:
- After disconnecting, we were prematurely sending audio messages using the new prompt and content names, before the new prompt and content were created
- We weren't properly sending system instruction and conversation history messages to Nova Sonic with `"interactive": false`
2026-01-16 22:31:54 -05:00
Mark Backman
473d39791b Merge pull request #3482 from pipecat-ai/mb/user-idle-in-user-aggregator
Add UserIdleController, deprecate UserIdleProcessor
2026-01-16 18:47:10 -05:00
Aleix Conchillo Flaqué
2114abb8c6 add changelog file for 3484 2026-01-16 15:46:29 -08:00
Aleix Conchillo Flaqué
4fb4c26f55 Merge pull request #3484 from amichyrpi/main
Remove async_mode parameter from Mem0 storage
2026-01-16 15:44:52 -08:00
Mark Backman
2e8e574ea5 Add UserIdleController, deprecate UserIdleProcessor 2026-01-16 18:44:19 -05:00
Aleix Conchillo Flaqué
84c7e97be2 Merge pull request #3483 from pipecat-ai/aleix/throttle-user-speaking-frame
throttle user speaking frame
2026-01-16 15:29:37 -08:00
Amory Hen
a6e7c99d55 Remove async_mode parameter from Mem0 storage 2026-01-17 00:26:38 +01:00
Aleix Conchillo Flaqué
ac3fa7f91f BaseOutputTransport: minor cleanup 2026-01-16 15:15:49 -08:00
Aleix Conchillo Flaqué
6eadad53b2 BaseInputTransport: throttle UserSpeakingFrame 2026-01-16 15:15:49 -08:00
kompfner
b11150f31f Merge pull request #3480 from pipecat-ai/pk/fix-grok-realtime-smallwebrtc
Fix an issue where Grok Realtime would error out when running with Sm…
2026-01-16 15:46:27 -05:00
Paul Kompfner
836cf60611 Fix an issue where Grok Realtime would error out when running with SmallWebRTC transport.
The underlying issue was related to the fact that we were sending audio to Grok before we had configured the Grok session with our default input sample rate (16000), so Grok was interpreting those initial audio chunks as having its default sample rate (24000). We didn't see this issue when using the Daily transport simply because in our test environments Daily took a smidge longer than a reflexive (localhost) pure WebRTC connection, so we would only send audio to Grok *after* we had configured the Grok session with the desired sample rate.
2026-01-16 15:41:33 -05:00
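The sample rate mismatch described in 836cf60611 can be put in numbers. This is a rough arithmetic sketch using only the two rates named in the message (16000 and 24000); the one-second chunk length is an assumption chosen for illustration.

```python
SENT_RATE = 16000      # rate the audio was actually captured at
ASSUMED_RATE = 24000   # rate the receiver assumes before the session is configured

# One second of audio at the real rate (illustrative chunk length).
samples = SENT_RATE * 1

# Duration the receiver believes it got, and the resulting speed-up factor.
perceived_seconds = samples / ASSUMED_RATE
speedup = ASSUMED_RATE / SENT_RATE

print(perceived_seconds, speedup)
```

Audio read at the wrong rate is effectively played 1.5 times too fast, which is consistent with the service failing on the chunks sent before the session was configured.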
James Hush
1c13ad95a5 Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
Extract dictionary value to local variable and check for None before
accessing cancel_on_interruption attribute, since the dictionary values
are typed as Optional[FunctionCallInProgressFrame].
2026-01-16 15:04:26 -05:00
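The narrowing pattern described in 1c13ad95a5 can be sketched as follows. The names `InProgressCall` and `should_cancel` are hypothetical stand-ins, not Pipecat's actual code; only the `cancel_on_interruption` attribute comes from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InProgressCall:
    """Hypothetical stand-in for FunctionCallInProgressFrame."""

    cancel_on_interruption: bool


def should_cancel(calls: Dict[str, Optional[InProgressCall]], call_id: str) -> bool:
    # Pull the value into a local variable so the type checker can narrow it.
    call = calls.get(call_id)
    if call is None:
        # Entry is missing or explicitly None: nothing to cancel.
        return False
    # After the guard, `call` is known to be an InProgressCall.
    return call.cancel_on_interruption
```

Indexing the dictionary twice (once for the check, once for the access) does not narrow the type, which is why the local variable is needed.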
Mark Backman
1e8516e91d Merge pull request #3476 from pipecat-ai/mb/project-urls
Update project.urls for PyPI
2026-01-16 14:57:39 -05:00
Mark Backman
32c775311d Merge pull request #3471 from pipecat-ai/mb/fix-pydantic-2.12-docs
Revert pydantic 2.12 extra type annotation
2026-01-16 14:57:24 -05:00
Mark Backman
28d0bb98de Merge pull request #3472 from pipecat-ai/mb/whisker-dev
Add whisker_setup.py setup file to .gitignore
2026-01-16 14:55:48 -05:00
Aleix Conchillo Flaqué
a9a9f3aeaa Merge pull request #3462 from pipecat-ai/aleix/fix-min-words-transcription-aggregation
MinWordsUserTurnStartStrategy: don't aggregate transcriptions
2026-01-16 11:18:23 -08:00
Aleix Conchillo Flaqué
c2a0735975 MinWordsUserTurnStartStrategy: don't aggregate transcriptions
If we aggregate transcriptions we will get incorrect interruptions. For example,
if we have a strategy with min_words=3 and we say "One" and pause, then "Two"
and pause and then "Three", this would trigger the start of the turn when it
shouldn't. We should only look at the incoming transcription text and not
aggregate it with the previous one.
2026-01-16 11:16:06 -08:00
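The "One" / "Two" / "Three" case from c2a0735975 can be reproduced with a small sketch. Both functions below are hypothetical illustrations of the two behaviours, not the strategy's real implementation, and they assume words are separated by whitespace.

```python
from typing import List


def count_words(text: str) -> int:
    return len(text.split())


def triggers_when_aggregated(transcriptions: List[str], min_words: int) -> bool:
    # Old behaviour: keep appending text across pauses, then count the total.
    aggregated = ""
    for text in transcriptions:
        aggregated = f"{aggregated} {text}".strip()
        if count_words(aggregated) >= min_words:
            return True
    return False


def triggers_per_transcription(transcriptions: List[str], min_words: int) -> bool:
    # New behaviour: each incoming transcription is counted on its own.
    return any(count_words(text) >= min_words for text in transcriptions)


paused = ["One", "Two", "Three"]
print(triggers_when_aggregated(paused, 3), triggers_per_transcription(paused, 3))
```

With `min_words=3`, three separate one-word transcriptions start a turn only under the aggregated variant, while a single three-word transcription starts a turn under both.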
Aleix Conchillo Flaqué
41cb53f6c2 Merge pull request #3479 from pipecat-ai/aleix/turns-mute-to-user-mute
turns: move mute to user_mute
2026-01-16 11:11:50 -08:00
Aleix Conchillo Flaqué
58552af8fd examples(foundational): remove STTMuteFilter example 2026-01-16 11:07:20 -08:00
Aleix Conchillo Flaqué
c7ab87b0cc turns: move mute to user_mute 2026-01-16 11:07:20 -08:00
Mark Backman
11ecc5fdee Update project.urls for PyPI 2026-01-16 12:48:13 -05:00
kompfner
19fb3eed9f Merge pull request #3466 from pipecat-ai/pk/fix-aws-nova-sonic-rtvi-bot-output
Fix realtime (speech-to-speech) services' RTVI event compatibility
2026-01-16 09:56:13 -05:00
Mark Backman
b292b32374 Merge pull request #3461 from glennpow/glenn/websocket-headers
Allow WebsocketClientTransport to send custom headers
2026-01-15 20:26:36 -05:00
Mark Backman
63d1393bb0 Add whisker_setup.py to .gitignore 2026-01-15 20:21:25 -05:00
Glenn Powell
37914cb062 Removed import and added changelog entry. 2026-01-15 16:47:15 -08:00
Mark Backman
ec40696854 Revert pydantic 2.12 extra type annotation 2026-01-15 19:16:15 -05:00
Mike Seese
2249f3d673 add requested changes from code review 2026-01-15 15:27:56 -08:00
Mike Seese
d2df324f29 fix some bugs after testing changes 2026-01-15 15:27:56 -08:00
Mike Seese
67fdb0b659 use parent _settings dict instead of self._params pattern 2026-01-15 15:27:56 -08:00
Mike Seese
e77bdf66f9 add can_generate_metrics functions 2026-01-15 15:27:56 -08:00
Mike Seese
1b3b67779c switch hathora services to use InputParams pattern 2026-01-15 15:27:55 -08:00
Mike Seese
6c7e386391 remove traced_stt from run_stt 2026-01-15 15:27:55 -08:00
Mike Seese
ba25b279d6 fix issues with PR suggestions 2026-01-15 15:27:55 -08:00
Mike Seese
e7c83c19b6 port turn_start_strategies to the newer user_turn_strategies 2026-01-15 15:27:55 -08:00
Mike Seese
7be7fb49a3 remove turn_analyzer args from transport params 2026-01-15 15:27:54 -08:00
Mike Seese
bcccb4cbb3 put fallback sample_rate value in function arg 2026-01-15 15:27:54 -08:00
Mike Seese
e9f1d951d3 Apply suggestions from code review
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-01-15 15:27:54 -08:00
Mike Seese
e5632a9339 transition Hathora service to use the unified API and apply PR feedback
add Hathora to root files

Hathora run linter

added hathora changelog
2026-01-15 15:27:53 -08:00
Mike Seese
1510fb4fc0 add Hathora STT and TTS services 2026-01-15 15:26:52 -08:00
Mark Backman
64a1ad2649 Merge pull request #3470 from pipecat-ai/mb/fix-docs-0.0.99
Docs fixes after 0.0.99
2026-01-15 17:34:44 -05:00
Mark Backman
4458ca1d24 Mock FastAPI 2026-01-15 17:29:47 -05:00
Mark Backman
21aaa48e62 Fix pydantic issues impacting autodoc 2026-01-15 17:29:47 -05:00
Mark Backman
e75c241030 Merge pull request #3468 from pipecat-ai/mb/camb-cleanuo
Clean up CambTTSService
2026-01-15 17:16:28 -05:00
Mark Backman
60216048a8 Docs fixes after 0.0.99 2026-01-15 16:40:42 -05:00
Mark Backman
f3c2e29fb4 Clean up CambTTSService 2026-01-15 15:59:17 -05:00
Paul Kompfner
ce99924be4 Add CHANGELOG entry describing fix for the missing "bot-llm-text" RTVI event when using realtime (speech-to-speech) services 2026-01-15 15:55:39 -05:00
Paul Kompfner
5de80a60d4 Fix "bot-llm-text" not firing when using Grok Realtime 2026-01-15 15:30:00 -05:00
Paul Kompfner
5753762350 Fix "bot-llm-text" not firing when using OpenAI Realtime 2026-01-15 15:16:08 -05:00
Paul Kompfner
885b318b04 Fix "bot-llm-text" not firing when using Gemini Live 2026-01-15 15:03:45 -05:00
Paul Kompfner
7a22d58cf4 Fix "bot-llm-text" not firing when using AWS Nova Sonic 2026-01-15 14:56:50 -05:00
Mark Backman
c8e4b462c9 Merge pull request #3460 from pipecat-ai/mb/reorder-07-examples
Renumber the 07 foundational examples
2026-01-15 14:44:21 -05:00
Mark Backman
30a3f42255 Merge pull request #3349 from eRuaro/feat/camb-tts-integration
Add Camb.ai TTS integration with MARS models
2026-01-15 14:43:12 -05:00
Neil Ruaro
26ddb2de2f minimal uv.lock update for camb-sdk 2026-01-16 03:18:01 +08:00
Neil Ruaro
f60eeaa212 reverted uv.lock, updated readthedocs.yaml, copyright year updates 2026-01-16 02:50:18 +08:00
Neil Ruaro
8cf72b36cb manually add camb-sdk to uv.lock, exclude camb from docs build 2026-01-16 02:26:38 +08:00
Neil Ruaro
38c3bcef96 exclude camb from docs build 2026-01-16 02:20:26 +08:00
Neil Ruaro
80604ba7b6 remove _update_settings method 2026-01-16 02:00:48 +08:00
Neil Ruaro
256c70c631 use UserTurnStrategies 2026-01-16 01:32:08 +08:00
Glenn Powell
0e3532c529 Allow WebsocketClientTransport to send custom headers 2026-01-15 09:31:48 -08:00
Neil Ruaro
9942fcfeb2 updated per PR reviews 2026-01-16 01:20:17 +08:00
Neil Ruaro
003c24ca6e Make model parameter explicit in docstring example 2026-01-16 01:18:37 +08:00
Neil Ruaro
ed120d014d Add model-specific sample rates, transport example, and fix audio buffer alignment 2026-01-16 01:18:37 +08:00
Neil Ruaro
e76a3d04f0 Update Camb TTS to 48kHz sample rate 2026-01-16 01:18:37 +08:00
Neil Ruaro
641d17007f Clean up Camb TTS service and tests 2026-01-16 01:18:37 +08:00
Neil Ruaro
9293b5f24a Migrate Camb TTS service from raw HTTP to official SDK
- Replace aiohttp with camb SDK (AsyncCambAI client)
- Add support for passing existing SDK client instance
- Simplify API: no longer requires aiohttp_session parameter
- Update example to use simplified initialization
- Rewrite tests to mock SDK client instead of HTTP servers
2026-01-16 01:18:37 +08:00
Neil Ruaro
c1f3cbd1d4 Yield TTSAudioRawFrame directly instead of calling private method 2026-01-16 01:18:37 +08:00
Neil Ruaro
78fa2ab65e Update default voice ID, fix MARS naming, and clean up example 2026-01-16 01:18:37 +08:00
Neil Ruaro
56da2caeed Update Camb.ai TTS inference options 2026-01-16 01:18:37 +08:00
Neil Ruaro
a541d65255 Update MARS model names to mars-flash, mars-pro, mars-instruct
Rename model identifiers from mars-8-* to the new naming convention:
- mars-8-flash -> mars-flash (default)
- mars-8 -> removed
- mars-8-instruct -> mars-instruct
- Added mars-pro
2026-01-16 01:18:37 +08:00
Neil Ruaro
a3d7e9eafe Address PR feedback: add --voice-id arg, remove test script
- Add --voice-id CLI argument to example (default: 2681)
- Remove test_camb_quick.py from examples/ (tests belong in tests/)
- Update docstring with new usage
2026-01-16 01:18:36 +08:00
Neil Ruaro
54933bea2a Rename changelog to PR number 2026-01-16 01:18:36 +08:00
Neil Ruaro
fcab9899cc Add changelog entry for Camb.ai TTS integration 2026-01-16 01:18:36 +08:00
Neil Ruaro
be098e85db Remove non-working Daily/WebRTC example
The Daily transport example had authentication issues. Keeping the
local audio example (07zb-interruptible-camb-local.py) which works.
2026-01-16 01:18:36 +08:00
Neil Ruaro
ed0ff46a87 added local test 2026-01-16 01:18:36 +08:00
Neil Ruaro
7ae0d651d6 added cambai tts integration 2026-01-16 01:18:36 +08:00
Mark Backman
efd4432cfb Renumber the 07 foundational examples 2026-01-15 10:26:17 -05:00
kompfner
24082b84f2 Merge pull request #3453 from pipecat-ai/pk/consistency-pass-on-user-started-stopped-speaking-frames
Do a consistency pass on how we're sending `UserStartedSpeakingFrame`…
2026-01-15 09:24:14 -05:00
Aleix Conchillo Flaqué
dcd5840341 Merge pull request #3455 from pipecat-ai/aleix/reset-user-turn-start-strategies
UserTurnController: reset user turn start strategies when turn triggered
2026-01-14 19:28:32 -08:00
Aleix Conchillo Flaqué
9e705ce768 UserTurnController: reset user turn start strategies when turn triggered 2026-01-14 18:20:29 -08:00
Mark Backman
965466cc09 Merge pull request #3454 from pipecat-ai/mb/external-turn-strategies-timeout
fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrat…
2026-01-14 20:15:31 -05:00
Mark Backman
f3993f1775 fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrategies 2026-01-14 20:10:56 -05:00
Paul Kompfner
e107902b14 Do a consistency pass on how we're sending UserStartedSpeakingFrames and UserStoppedSpeakingFrames. The codebase is now consistent in broadcasting both types of frames up and downstream. 2026-01-14 18:47:15 -05:00
kompfner
e7b5ff49f4 Merge pull request #3447 from pipecat-ai/pk/add-pr-3420-to-changelog
Add PR 3420 to CHANGELOG (it was missing)
2026-01-14 15:33:44 -05:00
Paul Kompfner
e33172c44e Add PR 3420 to CHANGELOG (it was missing) 2026-01-14 15:33:07 -05:00
Mark Backman
3d858e8aa6 Merge pull request #3444 from pipecat-ai/mb/update-quickstart-0.0.99
Update quickstart example for 0.0.99
2026-01-14 10:29:55 -05:00
Mark Backman
eab059c49a Merge pull request #3446 from pipecat-ai/mb/add-3392-changelog
Add PR 3392 to changelog, linting cleanup
2026-01-14 10:28:57 -05:00
Mark Backman
4aaff04fb3 Add PR 3392 to changelog, linting cleanup 2026-01-14 09:43:17 -05:00
Mark Backman
cb364f3cab Update quickstart example for 0.0.99 2026-01-14 08:59:20 -05:00
Mark Backman
a9bfb090c3 Merge pull request #3287 from ashotbagh/feature/asyncai-multicontext-wss
Fix TTFB metric and add multi-context WebSocket support for Async TTS
2026-01-14 07:52:52 -05:00
Ashot
c4ae4025f3 Adjustments of Async TTS for multicontext websocket support 2026-01-14 16:33:30 +04:00
Ashot
15067c678d adapt Async TTS to updated AudioContextTTSService 2026-01-14 15:45:27 +04:00
Ashot
5ae592f38e Improve Async TTS interruption handling by using AudioContextTTSService class and add changelog fragments 2026-01-14 15:45:27 +04:00
Ashot
9cdbc56be3 Fix TTFB metric and add multi-context WebSocket support for Async TTS 2026-01-14 15:45:27 +04:00
Aleix Conchillo Flaqué
86ed485711 Merge pull request #3440 from pipecat-ai/changelog-0.0.99
Release 0.0.99 - Changelog Update
2026-01-13 17:02:41 -08:00
Aleix Conchillo Flaqué
7e1b4a4e90 update cosmetic changelog updates for 0.0.99 2026-01-13 16:59:46 -08:00
aconchillo
4531d517da Update changelog for version 0.0.99 2026-01-14 00:49:15 +00:00
Aleix Conchillo Flaqué
6fd5847f84 Merge pull request #3439 from pipecat-ai/aleix/uv-lock-2026-01-13
uv.lock: upgrade to latest versions
2026-01-13 16:48:07 -08:00
Aleix Conchillo Flaqué
2015eba9b2 uv.lock: upgrade to latest versions 2026-01-13 16:45:44 -08:00
Mark Backman
84f16ee895 Merge pull request #3438 from pipecat-ai/mb/fix-26a
Fix 26a foundational
2026-01-13 19:43:50 -05:00
Aleix Conchillo Flaqué
5b2af03b16 Merge pull request #3437 from pipecat-ai/aleix/update-aggregator-logs
LLMContextAggregatorPair: make strategy logs less verbose
2026-01-13 16:39:29 -08:00
Mark Backman
b313395dc3 Fix 26a foundational 2026-01-13 19:31:24 -05:00
Aleix Conchillo Flaqué
0d6bdbee10 LLMContextAggregatorPair: make strategy logs less verbose 2026-01-13 15:11:22 -08:00
Aleix Conchillo Flaqué
248dac3a9d Merge pull request #3420 from pipecat-ai/pk/fix-gemini-3-parallel-function-calls
Fix parallel function calling with Gemini 3.
2026-01-13 14:40:33 -08:00
Paul Kompfner
be49a54856 Fast-exit in the fix for parallel function calling with Gemini 3, if we can determine up-front that there's no work to do 2026-01-13 17:32:20 -05:00
Aleix Conchillo Flaqué
bd9ee0d646 Merge pull request #3434 from pipecat-ai/aleix/context-appregator-pair-tuple
context aggregator pair tuple
2026-01-13 14:12:51 -08:00
Mark Backman
442e0e582d Merge pull request #3431 from pipecat-ai/mb/update-realtime-examples-transcript-handler
Update GeminiLiveLLMService to push thought frames, update 26a for new transcript events
2026-01-13 17:10:40 -05:00
kompfner
38194c0cff Merge pull request #3436 from pipecat-ai/pk/remove-transcript-processor-reference
Remove dead import of `TranscriptProcessor` (which is now deprecated)
2026-01-13 17:06:17 -05:00
Paul Kompfner
0ebdaba03c Remove dead import of TranscriptProcessor (which is now deprecated) 2026-01-13 17:02:57 -05:00
Aleix Conchillo Flaqué
ee82377d68 examples: fix 22d to push some CancelFrame and EndFrame 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
861588e4a3 examples: update all examples to use the new LLMContextAggregatorPair tuple 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
1ab3bf2ef6 LLMContextAggregatorPair: instances can now return a tuple 2026-01-13 14:01:53 -08:00
Mark Backman
bb00d223c9 Update 26a to use context aggregator transcription events 2026-01-13 17:01:10 -05:00
Aleix Conchillo Flaqué
86fbfaddd1 Merge pull request #3435 from pipecat-ai/aleix/fix-llm-context-create-audio-message
LLMContext: fix create_audio_message
2026-01-13 13:59:28 -08:00
Aleix Conchillo Flaqué
5612bf513b LLMContext: fix create_audio_message 2026-01-13 13:53:34 -08:00
Mark Backman
87d0dc9e24 Merge pull request #3412 from pipecat-ai/mb/remove-41a-b
Remove foundational examples 41a and 41b
2026-01-13 16:45:26 -05:00
Paul Kompfner
30fbcfbf71 Rework fix for parallel function calling with Gemini 3 2026-01-13 16:33:59 -05:00
Mark Backman
5d90f4ea06 Merge pull request #3428 from pipecat-ai/mb/fix-tracing-none-values
Fix TTS, realtime LLM services could return unknown for model_name
2026-01-13 15:40:10 -05:00
kompfner
f6d09e1574 Merge pull request #3430 from pipecat-ai/pk/request-image-frame-fixes
Fix request_image_frame and usage
2026-01-13 15:36:44 -05:00
Mark Backman
b8e48dee7f Merge pull request #3433 from pipecat-ai/mb/port-realtime-examples-transcript-events
Update examples to use transcription events from context aggregators
2026-01-13 15:36:06 -05:00
Mark Backman
a6ccb9ec69 Merge pull request #3427 from pipecat-ai/mb/add-07j-gladia-vad-example
Add 07j Gladia VAD foundational example, add to release evals
2026-01-13 15:35:24 -05:00
Mark Backman
66551ebdf5 Merge pull request #3426 from pipecat-ai/mb/changelog-3404
Add changelog fragments for PR 3404
2026-01-13 15:34:58 -05:00
Aleix Conchillo Flaqué
21534f7d83 added changelog file for #3430 2026-01-13 12:21:22 -08:00
Mark Backman
d591f9e108 Remove 28-transcription-processor.py 2026-01-13 15:20:59 -05:00
Mark Backman
aa2589d3be Update examples to use transcription events from context aggregators 2026-01-13 15:19:47 -05:00
Aleix Conchillo Flaqué
9d6067fa78 examples(foundational): speak "Let me check on that" in 14d examples 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
027e54425a examples(foundational): associate image requests to function calls 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
e268c73c41 LLMAssistantAggregator: cache function call requested images 2026-01-13 12:10:08 -08:00
Aleix Conchillo Flaqué
d3c57e2da0 UserImageRawFrame: don't deprecate request field 2026-01-13 11:56:13 -08:00
Aleix Conchillo Flaqué
02eace5a16 UserImageRequestFrame: don't deprecate function call related fields 2026-01-13 11:55:55 -08:00
Mark Backman
15bc1dd999 Update GeminiLiveLLMService to push Thought frames when thought content is returned 2026-01-13 14:13:00 -05:00
Paul Kompfner
b937956dc8 Fix request_image_frame and usage 2026-01-13 13:23:01 -05:00
Mark Backman
efbc0c8510 Fix TTS, realtime LLM services could return unknown for model_name 2026-01-13 12:12:15 -05:00
Himanshu Gunwant
d0f227189c fix: openai llm model name is unknown (#3422) 2026-01-13 11:55:52 -05:00
Mark Backman
41eef5efc4 Add 07j Gladia VAD foundational example, add to release evals 2026-01-13 11:36:15 -05:00
Mark Backman
f00f9d9f1a Add changelog fragments for PR 3404 2026-01-13 11:29:17 -05:00
Mark Backman
ae59b3ba36 Merge pull request #3404 from poseneror/feature/gladia-vad-events
feat(gladia): add VAD events support
2026-01-13 11:26:56 -05:00
Paul Kompfner
6668712f7b Add evals for parallel function calling 2026-01-13 11:03:38 -05:00
Paul Kompfner
8812686b17 Fix parallel function calling with Gemini 3.
Gemini expects parallel function calls to be passed in as a single multi-part `Content` block. This is important because only one of the function calls in a batch of parallel function calls gets a thought signature—if they're passed in as separate `Content` blocks, there'd be one or more missing thought signatures, which would result in a Gemini error.
2026-01-13 11:03:38 -05:00
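
The grouping idea in the commit above can be sketched roughly as follows. This is a minimal illustration using plain dictionaries, not the Gemini SDK types or Pipecat's actual fix. The message shape (`role`, `parts`, `function_call`) and the helper name `merge_parallel_function_calls` are hypothetical, and the sketch assumes that adjacent function-call messages with the same role belong to one parallel batch.

```python
from typing import Any, Dict, List

# Hypothetical message shape: {"role": str, "parts": [ {...}, ... ]}
Message = Dict[str, Any]


def _is_function_call(msg: Message) -> bool:
    """Return True if the message consists only of function-call parts."""
    parts = msg.get("parts", [])
    return bool(parts) and all("function_call" in part for part in parts)


def merge_parallel_function_calls(messages: List[Message]) -> List[Message]:
    """Merge runs of adjacent function-call messages into one multi-part message.

    Assumption: adjacent function-call messages with the same role form one
    batch of parallel calls. The input list is not mutated.
    """
    merged: List[Message] = []
    for msg in messages:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and _is_function_call(prev)
            and _is_function_call(msg)
            and prev["role"] == msg["role"]
        ):
            # Same batch: append the parts to the previous message.
            prev["parts"].extend(msg["parts"])
        else:
            # Copy the parts list so later merging never touches the input.
            merged.append({**msg, "parts": list(msg.get("parts", []))})
    return merged
```

With this shape, the single part that carries a thought signature ends up in the same block as its sibling calls, which is the property the commit message describes.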
kompfner
8b0f0b5bb4 Merge pull request #3425 from pipecat-ai/pk/gemini-3-flash-new-thinking-levels
Add Gemini 3 Flash-specific thinking levels
2026-01-13 11:02:53 -05:00
Paul Kompfner
f5e8a04e3b Bump aiortc dependency, which relaxes the constraint on av, which was pinned to 14.4.0, which no longer has all necessary wheels 2026-01-13 10:50:08 -05:00
Mark Backman
a298ce3b41 Merge pull request #3424 from pipecat-ai/mb/tts-append-trailing-space
Add append_trailing_space to TTSService to prevent vocalizing trailin…
2026-01-13 10:42:40 -05:00
Mark Backman
31daa889e8 Add append_trailing_space to TTSService to prevent vocalizing trailing punctuation; update DeepgramTTSService and RimeTTSService to use the arg 2026-01-13 10:38:54 -05:00
Paul Kompfner
76a058178e Add Gemini 3 Flash-specific thinking levels 2026-01-13 09:50:59 -05:00
poseneror
3304b18ac2 Add should_interrupt + broadcast user events 2026-01-13 14:27:35 +02:00
poseneror
b95a6afe77 feat(gladia): add VAD events support
Add support for Gladia's speech_start/speech_end events to emit
UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames.

When enable_vad=True in GladiaInputParams:
- speech_start triggers interruption and pushes UserStartedSpeakingFrame
- speech_end pushes UserStoppedSpeakingFrame
- Tracks speaking state to prevent duplicate events

This allows using Gladia's built-in VAD instead of a separate VAD
in the pipeline.
2026-01-13 14:27:35 +02:00
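
The duplicate-event guard described in the commit above could look something like the sketch below. It is an assumption-laden illustration, not the Gladia service code: the class name `SpeakingStateTracker` and its methods are hypothetical, and frames are represented by their names as strings rather than real Pipecat frame objects.

```python
from typing import List


class SpeakingStateTracker:
    """Track whether the user is speaking so each transition is emitted once."""

    def __init__(self) -> None:
        self._speaking = False
        # Names of the frames that would be pushed, in order (illustrative only).
        self.emitted: List[str] = []

    def on_speech_start(self) -> None:
        """Handle a speech_start event; ignore it if already speaking."""
        if self._speaking:
            return
        self._speaking = True
        self.emitted.append("UserStartedSpeakingFrame")

    def on_speech_end(self) -> None:
        """Handle a speech_end event; ignore it if not currently speaking."""
        if not self._speaking:
            return
        self._speaking = False
        self.emitted.append("UserStoppedSpeakingFrame")
```

Repeated `speech_start` or `speech_end` events then produce no extra frames, since only a real state change appends to `emitted`.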
Mark Backman
f6ed7d7582 Merge pull request #3418 from pipecat-ai/mb/speechmatics-task-cleanup 2026-01-12 19:24:56 -05:00
Mark Backman
cd3290df1c Small cleanup for task creation in SpeechmaticsSTTService 2026-01-12 16:00:32 -05:00
Mark Backman
2296caf529 Merge pull request #3414 from pipecat-ai/mb/changelog-3410
Update changelog for PR 3410.changed.md
2026-01-12 13:43:42 -05:00
Mark Backman
90ded6658d Merge pull request #3403 from pipecat-ai/mb/inworld-tts-add-keepalive
InworldTTSService: Add keepalive task
2026-01-12 13:31:24 -05:00
Mark Backman
7e97fb80a5 Merge pull request #3392 from pipecat-ai/mb/websocket-service-connection-closed-error
Add reconnect logic to WebsocketService in the event of ConnectionClo…
2026-01-12 13:11:43 -05:00
Mark Backman
b58471fdb1 Add Exotel and Vonage to Serializers in README services list 2026-01-12 12:24:56 -05:00
Aleix Conchillo Flaqué
46b4f9f29b Merge pull request #3413 from pipecat-ai/aleix/fix-assistant-thought-aggregation
LLMAssistantAggregator: reset aggregation after adding the thought, not before
2026-01-12 09:21:42 -08:00
Aleix Conchillo Flaqué
ec20d72aba LLMAssistantAggregator: reset aggregation after adding the thought, not before 2026-01-12 09:18:13 -08:00
Mark Backman
5743e2a99b Update changelog for PR 3410.changed.md 2026-01-12 12:15:40 -05:00
Mark Backman
2f429a2e76 Merge pull request #3410 from Vonage/feat/fastapi-ws-vonage-serializer
feat: update FastAPI WebSocket transport and add Vonage serializer
2026-01-12 12:10:57 -05:00
Varun Pratap Singh
3e982f7a4a refactor: rename audio_packet_bytes to fixed_audio_packet_size 2026-01-12 22:11:39 +05:30
Mark Backman
89484e281d Remove foundational examples 41a and 41b 2026-01-12 10:11:58 -05:00
Varun Pratap Singh
14a115f372 changelog: add fragments for PR #3410 2026-01-12 18:12:27 +05:30
Varun Pratap Singh
e96595fe59 feat: update FastAPI WebSocket transport and add Vonage serializer 2026-01-12 17:50:38 +05:30
Mark Backman
f58d21862b WebsocketService: Add _maybe_try_reconnect and use for exception cases 2026-01-11 16:43:37 -05:00
Om Chauhan
38506f51f7 fix(openrouter): handle multiple system messages for Gemini models 2026-01-11 21:19:47 +05:30
Mark Backman
aac24ad2d4 InworldTTSService: Add keepalive task 2026-01-10 11:20:20 -05:00
Aleix Conchillo Flaqué
1df9575e20 Merge pull request #3400 from pipecat-ai/aleix/ensure-bot-speaking-flag-is-set
BaseOutputTransport: ensure bot speaking flag is set on time
2026-01-10 07:34:26 -08:00
Aleix Conchillo Flaqué
64609fe80f BaseOutputTransport: ensure bot speaking flag is set on time 2026-01-09 20:40:25 -08:00
Aleix Conchillo Flaqué
533a54e111 Merge pull request #3399 from pipecat-ai/aleix/groq-switch-orpheus
GroqTTSService: switch to canopylabs/orpheus-v1-english
2026-01-09 20:39:56 -08:00
Aleix Conchillo Flaqué
b59c3eb470 GroqTTSService: switch to canopylabs/orpheus-v1-english 2026-01-09 18:14:48 -08:00
Aleix Conchillo Flaqué
0366fc35cb Merge pull request #3398 from pipecat-ai/aleix/examples-foundational-fix-49c-transport
examples(foundational): add missing transport.output() to 49c
2026-01-09 17:39:54 -08:00
Aleix Conchillo Flaqué
d86ff4b1ee Merge pull request #3397 from pipecat-ai/aleix/add-setup-pipeline-task
PipelineTask: add external pipeline task setup files
2026-01-09 17:39:25 -08:00
Aleix Conchillo Flaqué
f8040324e1 Merge pull request #3396 from pipecat-ai/aleix/add-pipeline-task-pipeline-property
PipelineTask: add pipeline property
2026-01-09 17:38:51 -08:00
Mark Backman
9c81acb159 Track websocket disconnecting status to improve error handling 2026-01-09 20:24:07 -05:00
Aleix Conchillo Flaqué
65395b1112 examples(foundational): add missing transport.output() to 49c 2026-01-09 16:44:04 -08:00
Aleix Conchillo Flaqué
d2696be03b PipelineTask: add external pipeline task setup files 2026-01-09 16:42:27 -08:00
Aleix Conchillo Flaqué
2da4d420f9 PipelineTask: add pipeline property 2026-01-09 15:47:02 -08:00
Aleix Conchillo Flaqué
a992f95c02 clarify changelog with #3343 fix 2026-01-09 10:37:16 -08:00
Aleix Conchillo Flaqué
edd8e07df6 update changelog with #3343 fix 2026-01-09 10:31:29 -08:00
Aleix Conchillo Flaqué
c813d43da0 Merge pull request #3343 from omChauhanDev/fix/auto-resolve-function-result
fix: keeping the Aggregator and Service states synchronized.
2026-01-09 10:04:20 -08:00
Aleix Conchillo Flaqué
c973445ab7 Merge pull request #3385 from pipecat-ai/aleix/context-aggregator-turn-stop-messages
user and assistant aggregator turn events
2026-01-09 09:52:48 -08:00
Aleix Conchillo Flaqué
25f6ba76d6 add start timestamp to user and assistant turn messages 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
8f47c569f9 examples(foundational): add 28-user-assistant-turns.py 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
c16801e524 examples(foundational): update 49 series with on_assistant_thought 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
dafcd0448f added changelog for new assistant turn events 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
24a52375c7 tests: added LLMAssistantAggregator unit tests 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
5f9e95038e BaseObject: improve logging messages 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
5cbb21afb2 deprecate TranscriptProcessor and related dataclasses and frames 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
119fab2996 LLMAssistantAggregator: allow thought aggregation without appending to context 2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
38d354c4ed LLMAssistantAggregator: add assistant turn and thought events 2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
cdb1074e11 LLMAssistantAggregator: no need to use BotStoppedSpeakingFrame
The end of turn is already handled with interruptions or with
LLMFullResponseEndFrame. LLMFullResponseEndFrame should never be blocked,
otherwise the assistant would not work.
2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
4b61fd2d7d LLMUserAggregator: add user turn stopped message argument
It is now possible to get the user aggregation when an `on_user_turn_stopped`
event is emitted.
2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
5a0a5c120b Merge pull request #3394 from pipecat-ai/aleix/base-smart-turn-update-vad-start-secs
smartturn: rename on_vad_start_secs_updated to update_vad_start_secs
2026-01-09 09:40:29 -08:00
Aleix Conchillo Flaqué
d92926ae54 smartturn: rename on_vad_start_secs_updated to update_vad_start_secs 2026-01-09 09:34:15 -08:00
Aleix Conchillo Flaqué
b34af5da24 Merge pull request #3372 from pipecat-ai/aleix/add-user-turn-controller-processor
add new UserTurnController and UserTurnProcessor
2026-01-09 09:29:10 -08:00
Aleix Conchillo Flaqué
5da1f86575 scripts: add 53-concurrent-llm-evaluation.py to release evals 2026-01-09 09:26:38 -08:00
Aleix Conchillo Flaqué
b0185e3539 tests: improve LLMUserAggregator tests 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
7232da6ba1 tests: added unit tests for UserTurnProcessor 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
9dff75cd44 examples: add 53-concurrent-llm-evaluation.py 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
6038860be0 tests: added unit tests for UserTurnController 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
4653de9f03 tests: rename test_bot_turn_start_strategy to test_user_turn_stop_strategy 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
fef79651ef turns: add UserTurnProcessor for advanced pipeline user turn management 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
3d54ca0a7c LLMUserAggregator: use UserTurnController for user turn management 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
199986815c turns: add UserTurnController for user turn management 2026-01-09 09:21:28 -08:00
Filipi da Silva Fuchter
0a3c00f68b Merge pull request #3391 from pipecat-ai/filipi/krisp_followup_improvements
Krisp VIVA follow-up improvements
2026-01-09 10:39:23 -05:00
marcus-daily
3e2467eb71 Fixing ruff formatting 2026-01-09 15:07:13 +00:00
marcus-daily
c4cc476c3d Updating changelog 2026-01-09 15:07:13 +00:00
marcus-daily
cc6ff1ac54 Reverting quickstart to match main 2026-01-09 15:07:13 +00:00
marcus-daily
b075502c4c Addressing code review comments 2026-01-09 15:07:13 +00:00
marcus-daily
35a99f92ab Take into account VAD start_secs when passing audio data to Smart Turn, and add an extra 500ms of pre-speech audio for good measure 2026-01-09 15:07:13 +00:00
Mark Backman
4fe0836cf9 Add reconnect logic to WebsocketService in the event of ConnectionClosedError 2026-01-09 09:03:01 -05:00
filipi87
8b7cc65ae6 Mentioning the Krisp Viva improvements in the changelog. 2026-01-09 10:43:01 -03:00
filipi87
4d495ba74f Fixing ruff format. 2026-01-09 10:32:36 -03:00
filipi87
de5de0b162 Fixed KrispVivaTurn to properly release the Krisp SDK. 2026-01-09 10:31:17 -03:00
filipi87
311da30802 Updating the Krisp Viva example to use Krisp turn model. 2026-01-09 10:19:13 -03:00
Garegin Harutyunyan
16819a5caa Krisp VIVA SDK Filter and Turn support. (#3261)
* Krisp VIVA SDK Filter and Turn support.

* Reverted the krisp_filter.py as it's already deprecated.

* enabled test with krisp_audio mock.

* More review comment fixes.
reverted the state logic in viva filter to be similar to the existing impl on main branch.
Fixed tests, ruff, etc.

* More review comments for Turn detection.
removed integration tests.

* Moved the SDK init/deinit into start/stop
2026-01-09 08:15:08 -05:00
Mark Backman
72a44c2fcd Merge pull request #3386 from pipecat-ai/mb/deepgram-deprecate-vad-events
Deprecate support for vad_events in DeepgramSTTService
2026-01-09 07:56:03 -05:00
Mark Backman
7783b20b91 Merge pull request #3390 from dhruvladia-sarvam/update/sarvam-plugins 2026-01-09 07:11:13 -05:00
dhruvladia-sarvam
962ccbc0d7 fix 2026-01-09 14:26:28 +05:30
Mark Backman
4d61c5d7b2 Deprecate support for vad_events in DeepgramSTTService 2026-01-08 20:32:30 -05:00
Mark Backman
7ca4597ade Merge pull request #3379 from lukepayyapilli/fix/fastapi-websocket-json-text-handling
Fix FastAPIWebsocketTransport to handle both binary and text messages
2026-01-08 17:26:35 -05:00
Luke Payyapilli
f1a22728ab Add websocket extra to coverage workflow 2026-01-08 17:13:31 -05:00
Luke Payyapilli
ca88fc849f Add websocket extra to CI for FastAPI test coverage 2026-01-08 17:09:27 -05:00
Luke Payyapilli
ccd795445f Fix protobuf serializer test to compare attributes instead of frame objects 2026-01-08 17:00:40 -05:00
Luke Payyapilli
1874269a48 Remove FrameSerializerType enum and type property from serializers 2026-01-08 16:54:23 -05:00
Mark Backman
8b20373a8e Merge pull request #3380 from pipecat-ai/mb/changelog-3366
Add changelog fragment for PR 3366
2026-01-08 14:49:47 -05:00
Aleix Conchillo Flaqué
15dcb77a0c Merge pull request #3364 from pipecat-ai/rajneesh/add-daily-sip-provider-option
Add support for specifying sip provider.
2026-01-08 11:47:09 -08:00
Mark Backman
5d2fac9cd7 Add changelog fragment for PR 3366 2026-01-08 14:43:23 -05:00
Mark Backman
682b253760 Merge pull request #3366 from lukepayyapilli/fix/cartesia-allow-none-language
Allow language=None in CartesiaTTSService for auto-detection
2026-01-08 14:42:09 -05:00
Luke Payyapilli
f440de82e2 Handle None language in _process_word_timestamps_for_language 2026-01-08 13:59:21 -05:00
Mark Backman
5e0e6822c7 Merge pull request #3360 from pipecat-ai/mb/openai-realtime-send-image
Add video input (e.g. image input) support for OpenAI Realtime
2026-01-08 13:26:35 -05:00
Mark Backman
2aadac7a4d Update OpenAIRealtime image to video to align with GeminiLive 2026-01-08 13:23:08 -05:00
Filipi da Silva Fuchter
1098394486 Merge pull request #3374 from pipecat-ai/filipi/external_turn_controllers_interruptions
External turn controllers improvements
2026-01-08 13:05:41 -05:00
Mark Backman
b90a34228f Update 19c to remove pausing audio and input 2026-01-08 13:00:45 -05:00
Mark Backman
8bf8ebd34b Remove start_audio_paused from OpenAI Realtime demos, and others 2026-01-08 13:00:45 -05:00
Mark Backman
673d88417c Change Gemini Live and OpenAI Realtime logging to trace when sending a video frame 2026-01-08 13:00:45 -05:00
Mark Backman
3a7b489208 Add foundational 19c and add to evals 2026-01-08 13:00:45 -05:00
Mark Backman
7ae9eebc34 Add image input support for OpenAI Realtime 2026-01-08 13:00:44 -05:00
Mark Backman
8f83ba5878 Merge pull request #3376 from dhruvladia-sarvam/update/sarvam-plugins
headers update
2026-01-08 12:57:32 -05:00
filipi87
b8af3fa214 Improving should_interrupt docs for Speechmatics. 2026-01-08 14:53:29 -03:00
dhruvladia-sarvam
5ddec4f596 fix 2026-01-08 23:07:40 +05:30
dhruvladia-sarvam
8f4b4f4941 fix 2026-01-08 23:04:43 +05:30
dhruvladia-sarvam
953349f262 fix 2026-01-08 22:53:59 +05:30
Luke Payyapilli
b52ae0e56b Fix FastAPIWebsocketTransport to handle both binary and text messages 2026-01-08 11:25:18 -05:00
dhruvladia-sarvam
893b448534 headers update 2026-01-08 21:09:41 +05:30
Mark Backman
973769b8bc Merge pull request #3370 from pipecat-ai/mb/fix-azure-tts
AzureTTSService cleanup
2026-01-08 09:34:02 -05:00
filipi87
c8fa9d34e1 Adding a changelog entry for the new should_interrupt property. 2026-01-08 10:58:07 -03:00
filipi87
3069deb92f Allows defining whether Speechmatics should send an interruption when the user’s turn has started. 2026-01-08 10:50:33 -03:00
filipi87
68c9c01747 Allows defining whether Flux should send an interruption when the user’s turn has started. 2026-01-08 10:44:53 -03:00
filipi87
5e8f0baa12 Allows defining whether Deepgram should send an interruption when the user’s turn has started. 2026-01-08 10:36:52 -03:00
Mark Backman
8d1286cc00 Merge pull request #3371 from speechmatics/fix/voice-version-bump 2026-01-08 07:21:43 -05:00
Aleix Conchillo Flaqué
bda4dd339a Merge pull request #3373 from pipecat-ai/aleix/update-copyright-notices-2026
update examples and tests copyright and use a proper dash in 2024-2026
2026-01-07 20:36:40 -08:00
rajneeshksoni
f2e3034d24 Add support for specifying sip provider.
optional "provider" field in the RoomSipParams
2026-01-08 09:05:56 +05:30
Aleix Conchillo Flaqué
2626154a64 update examples and tests copyright and use a proper dash in 2024-2026 2026-01-07 19:32:22 -08:00
Sam Sykes
b770b2a419 Changelog 2026-01-07 16:56:56 -08:00
Sam Sykes
158c34b0f9 version bump 2026-01-07 16:54:53 -08:00
Mark Backman
d507c88d3e Merge pull request #3369 from pipecat-ai/mb/copyright-2026
Update copyright date range to 2024-2026
2026-01-07 17:07:05 -05:00
Mark Backman
98f70b775f Update copyright date range to 2024-2026 2026-01-07 16:58:13 -05:00
Mark Backman
54f4b824e4 Merge pull request #3356 from pipecat-ai/mb/gemini-live-user-transcript-timeout
Add timeout for handling user transcript messages
2026-01-07 16:47:23 -05:00
Mark Backman
2aa5307f0a Add _push_user_transcription to unify the logic to push user transcripts from a single utility function 2026-01-07 16:43:48 -05:00
Mark Backman
6c10d6ef8a Merge pull request #3367 from pipecat-ai/marcus/smart-turn-v3.2
Updated Smart Turn model weights to v3.2
2026-01-07 16:37:53 -05:00
Mark Backman
89b36f2b25 AzureTTSService: Restore metrics generation 2026-01-07 16:33:52 -05:00
Mark Backman
79a6adbcf3 AzureTTSService: Handle first chunk only for timestamps and TTFB metrics 2026-01-07 16:15:01 -05:00
Mark Backman
95f00a3c4b AzureTTSService: Align error handling with Pipecat norms 2026-01-07 15:45:30 -05:00
Mark Backman
3f8373f76f AzureTTSService: prevent word timestamp carryover on interruption 2026-01-07 15:39:37 -05:00
Mark Backman
23a9d3f4d7 Merge pull request #3334 from obata-kotobasamurai/fix/azure-tts-word-timestamp
Add word-level timestamp support to Azure TTS with race condition fix
2026-01-07 14:48:02 -05:00
Mark Backman
333279f45a Merge pull request #3328 from speechmatics/fix/speectmatics-vad
Update to SpeechmaticsSTTService for `0.0.99`
2026-01-07 14:42:21 -05:00
yukiobata1
add5f51201 updated azure tts.py file 2026-01-08 03:14:37 +09:00
marcus-daily
d1bedef5b3 Updated Smart Turn model weights to v3.2 2026-01-07 17:23:11 +00:00
Mark Backman
54cf0116a8 Merge pull request #3363 from pipecat-ai/mb/update-audo-context-inheritance
Update AudioContextTTSService to inherit from WebsocketTTSService
2026-01-07 12:08:47 -05:00
Luke Payyapilli
6b252fb46e Allow language=None in CartesiaTTSService for auto-detection 2026-01-07 11:50:21 -05:00
Sam Sykes
3e00a16f0f Remove unused import and correction to docs. 2026-01-07 07:45:26 -08:00
Sam Sykes
ecfd93544a Correction to UserStartedSpeakingFrame timing. 2026-01-07 07:43:47 -08:00
Sam Sykes
3ec89e49bf Added changelog for split_sentences and code tidy for end of turn handling. 2026-01-07 07:41:49 -08:00
Mark Backman
8762506e9f Update AudioContextTTSService to inherit from WebsocketTTSService 2026-01-07 09:10:27 -05:00
yukiobata1
7204bf9914 added changelog 2026-01-07 13:32:31 +09:00
yukiobata1
f62c262f23 Call start_word_timestamps() when the first audio chunk arrives 2026-01-07 13:10:41 +09:00
Mark Backman
10aa784809 Merge pull request #3351 from okue/fix/stt-model-name-attribute
Fix STT model name attribute retrieval in tracing decorator
2026-01-06 16:10:56 -05:00
Filipi da Silva Fuchter
904f5dc183 Merge pull request #3338 from omChauhanDev/fix/smallwebrtc-mute-timeout-spam
fix(smallwebrtc): suppress timeout warnings when tracks are disabled
2026-01-06 09:07:52 -05:00
Mark Backman
c61a5e7173 Merge pull request #3346 from pipecat-ai/mb/cartesia-pronunciation-dict
Cartesia TTS: Add support for pronunciation_dict_id
2026-01-06 08:52:09 -05:00
Filipi da Silva Fuchter
81b28beef5 Merge pull request #3357 from pipecat-ai/filipi/live_avatar
Added support for using the HeyGen LiveAvatar API with the HeyGenTransport
2026-01-06 08:22:39 -05:00
filipi87
0d34356678 Adding a changelog entry for the HeyGen LiveAvatar API change. 2026-01-06 10:19:19 -03:00
filipi87
5412840a93 Added support for using the HeyGen LiveAvatar API with the HeyGenTransport. 2026-01-06 10:16:12 -03:00
yukiobata1
137bbb3d2c updated tts.py to match mark's version 2026-01-06 21:16:13 +09:00
Mark Backman
5a40054ac2 Merge pull request #3216 from mayurdd/patch-1
Adding include_language_detection param to Elevenlabs Realtime STT
2026-01-05 17:01:02 -05:00
Mark Backman
be621fbc5c Add timeout for handling user transcript messages 2026-01-05 16:58:14 -05:00
Mark Backman
9ab4836601 Merge pull request #3323 from pipecat-ai/mb/changelog-3322
Add changelog fragment for PR #3322
2026-01-05 16:55:52 -05:00
mayurdd
4671102833 Addressing the comments 2026-01-05 13:35:37 -08:00
Mayur Sirwani
67401a275b Adding include_language_detection to Elevenlabs Realtime STT
Adding a param to the config while connecting to the session
2026-01-05 13:27:27 -08:00
Mark Backman
c422588071 Merge pull request #3345 from pipecat-ai/mb/avoid-tts-dot
Add trailing space to DeepgramTTSService text generation
2026-01-05 15:56:14 -05:00
kompfner
fb12fec899 Merge pull request #3354 from pipecat-ai/pk/fix-aws-nova-sonic-example-for-nova-2-sonic
Fix the 20e example to use the proper conversation-start pattern for …
2026-01-05 11:17:57 -05:00
Paul Kompfner
c53c49558f Fix the 20e example to use the proper conversation-start pattern for the Nova 2 Sonic model 2026-01-05 10:56:08 -05:00
okue
1a26a2daa4 Fix STT model name attribute retrieval in tracing decorator
Changed getattr with default value to use 'or' operator for fallback.
This ensures proper model name retrieval when model_name attribute exists but is None or empty.
2026-01-05 17:20:48 +09:00
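
The difference between the two fallback styles named in the commit above can be shown with plain Python. This is a small sketch, not the tracing decorator itself: `FakeSTTService`, the two helper functions, and the `"unknown"` placeholder value are hypothetical names chosen for illustration.

```python
from typing import Optional


class FakeSTTService:
    """Stand-in service whose model_name attribute exists but is unset."""

    model_name: Optional[str] = None


def name_with_default(service: object) -> Optional[str]:
    # The getattr default applies only when the attribute is missing,
    # so an existing None value is returned unchanged.
    return getattr(service, "model_name", "unknown")


def name_with_or(service: object) -> str:
    # The `or` fallback also covers None and empty strings.
    return getattr(service, "model_name", None) or "unknown"
```

The `or` form trades strictness for robustness: any falsy value, including an empty string, is replaced by the fallback.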
Mark Backman
d8be1282b5 Cartesia TTS: Add support for pronunciation_dict_id 2026-01-04 09:30:04 -05:00
Mark Backman
91bc5236b5 Add trailing space to DeepgramTTSService text generation 2026-01-04 08:53:48 -05:00
Om Chauhan
1ceb01665f fix: treat language as first-class STT setting 2026-01-04 11:04:30 +05:30
Om Chauhan
b278957111 fix: broadcast FunctionCallResultFrame, on implicit return 2026-01-03 19:52:25 +05:30
Mark Backman
1c80c739d6 Merge pull request #3335 from pipecat-ai/mb/update-evals-07-variants
Add 07 example variants to release evals
2026-01-02 15:32:12 -05:00
Om Chauhan
700a94222b fix(smallwebrtc): suppress timeout warnings when tracks are disabled 2026-01-01 22:00:08 +05:30
Sam Sykes
d5d2156689 Updated changelog. 2025-12-31 19:07:11 +00:00
Sam Sykes
8203ad08a8 Updated to have default as FIXED for Pipecat VAD. 2025-12-31 19:05:29 +00:00
Mark Backman
31907b90f0 Add 07 example variants to release evals 2025-12-31 09:11:00 -05:00
Mark Backman
7b595f10ce Merge pull request #3329 from omChauhanDev/deepgram-tts-validation
added encoding validation in DeepgramTTSService
2025-12-31 08:20:40 -05:00
yukiobata1
4f93d331b7 Added await to self.start_word_timestamps() 2025-12-31 19:19:21 +09:00
yukiobata1
32c6dccebe Add word-level timestamp support to Azure TTS with cumulative PTS fix
This commit adds word boundary support to AzureTTSService and fixes
the race condition that causes scrambled TTS output across multiple
sentences.

## Features Added

- Change AzureTTSService to inherit from WordTTSService
- Subscribe to Azure SDK's synthesis_word_boundary event
- Emit word-level text with timing information via _words_queue
- Add synthesis lock for sequential sentence processing

## Race Condition Fix

Previously, each sentence's word boundary timestamps reset to 0,
causing downstream components to interleave words when reordering
frames by PTS. This resulted in scrambled output like:
  'Hello ! I What am questions AI have assistant...'

The fix adds cumulative audio offset tracking to ensure monotonically
increasing PTS across all sentences:
  Sentence 1: pts = 0.1s, 0.5s, 0.8s (cumulative at end: 0.8s)
  Sentence 2: pts = 0.9s, 1.2s, 1.5s (0.8s + relative offset)

## Key Changes

- _cumulative_audio_offset: tracks total audio duration
- _handle_word_boundary: adds cumulative offset to timestamps
- _handle_completed: accumulates audio duration for next sentence
- flush_audio: resets cumulative offset at end of LLM response
- _handle_interruption: resets state on user interruption
- run_tts: uses synthesis lock for sequential processing
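
The cumulative offset bookkeeping described above can be sketched as follows. This is a standalone illustration, not the `AzureTTSService` code: the `CumulativePts` class and its method names are hypothetical, and advancing the offset by each sentence's audio duration is an assumption about how the duration is measured.

```python
class CumulativePts:
    """Hypothetical helper: keeps word timestamps monotonic across sentences."""

    def __init__(self) -> None:
        # Total audio duration of all completed sentences, in seconds.
        self._offset = 0.0

    def word_pts(self, relative_secs: float) -> float:
        # Word boundaries arrive relative to the start of the current sentence.
        return self._offset + relative_secs

    def sentence_completed(self, duration_secs: float) -> None:
        # Accumulate so the next sentence starts where this one ended.
        self._offset += duration_secs

    def reset(self) -> None:
        # Used at the end of an LLM response or on user interruption.
        self._offset = 0.0


tracker = CumulativePts()
first = [tracker.word_pts(t) for t in (0.1, 0.5, 0.8)]
tracker.sentence_completed(0.8)
second = [tracker.word_pts(t) for t in (0.1, 0.4, 0.7)]
```

With the offset applied, every timestamp in `second` is later than the last one in `first`, so reordering by PTS can no longer interleave the two sentences.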

Fixes #2918

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 18:49:48 +09:00
Aleix Conchillo Flaqué
cbdc2b7d2d Merge pull request #3330 from pipecat-ai/aleix/update-turn-start-strategies-deprecations
update turn start strategies deprecations
2025-12-30 21:04:47 -08:00
Aleix Conchillo Flaqué
66a9dc70c7 LLMUserAggregator: fix turn strategies renaming 2025-12-30 20:59:48 -08:00
Aleix Conchillo Flaqué
846ca500d3 turns: update old turn_start_strategies deprecations 2025-12-30 19:50:10 -08:00
Om Chauhan
bd6afd445d added changelog 2025-12-31 09:18:18 +05:30
Om Chauhan
0663bbc2fb added encoding validation in DeepgramTTSService 2025-12-31 08:33:17 +05:30
Sam Sykes
8e7a951af8 updated changelog 2025-12-31 01:36:58 +00:00
Sam Sykes
ba1aeb8f7f Changelog 2025-12-31 01:31:46 +00:00
Sam Sykes
f7c74cfa80 Updated VAD 2025-12-31 01:28:31 +00:00
Mark Backman
2e700c8576 Merge pull request #3324 from pipecat-ai/mb/bump-small-webrtc-prebuilt-version
Bump small-webrtc-prebuilt verison to 2.0.4, update uv.lock
2025-12-30 20:10:11 -05:00
Mark Backman
f4626a4fc4 Bump small-webrtc-prebuilt verison to 2.0.4, update uv.lock 2025-12-30 14:19:20 -05:00
Mark Backman
e0b40a330f Add changelog fragment for PR #3322 2025-12-30 08:40:28 -05:00
Martin Liu
8dfc59be13 Include pts in incoming video and audio frames 2025-11-12 18:36:56 -05:00
736 changed files with 25719 additions and 9897 deletions

View File

@@ -0,0 +1,47 @@
---
name: changelog
description: Create changelog files for important commits in a PR
---
Create changelog files for the important commits in this PR. The PR number is provided as an argument.
## Instructions
1. Skip changelog for: documentation-only, internal refactoring, test-only, CI changes.
2. First, check what commits are on the current branch compared to main:
```
git log main..HEAD --oneline
```
3. For each significant change, create a changelog file in the `changelog/` folder using the format:
Allowed types: `added`, `changed`, `deprecated`, `removed`, `fixed`, `security`, `performance`, `other`
- `{PR_NUMBER}.added.md` - for new features
- `{PR_NUMBER}.added.2.md`, `{PR_NUMBER}.added.3.md` - for additional entries of the same type
- `{PR_NUMBER}.changed.md` - for changes to existing functionality
- `{PR_NUMBER}.fixed.md` - for bug fixes
- `{PR_NUMBER}.deprecated.md` - for deprecations
- `{PR_NUMBER}.removed.md` - for removed features
- `{PR_NUMBER}.security.md` - for security fixes
- `{PR_NUMBER}.performance.md` - for performance improvements
- `{PR_NUMBER}.other.md` - for other changes
4. Each changelog file should contain at least one main line starting with `- `, followed by a clear description of the change.
5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.
6. Use ⚠️ emoji prefix for breaking changes.
## Example
For PR #3519 with a new feature and a bug fix:
`changelog/3519.added.md`:
```
- Added `SomeNewFeature` for doing something useful.
```
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
```
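The naming scheme above can also be expressed as a small helper. This is only a sketch of the convention as listed; `fragment_path` and `ALLOWED_TYPES` are hypothetical names, not part of the project tooling.

```python
ALLOWED_TYPES = {
    "added", "changed", "deprecated", "removed",
    "fixed", "security", "performance", "other",
}


def fragment_path(pr_number: int, change_type: str, index: int = 1) -> str:
    """Build the changelog fragment path for a PR (hypothetical helper)."""
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown changelog type: {change_type}")
    if index < 1:
        raise ValueError("index must be 1 or greater")
    # The first entry of a type has no numeric suffix; later ones use .2, .3, ...
    suffix = "" if index == 1 else f".{index}"
    return f"changelog/{pr_number}.{change_type}{suffix}.md"
```

For example, `fragment_path(3519, "added", 2)` follows the `{PR_NUMBER}.added.2.md` pattern for a second entry of the same type.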

View File

@@ -0,0 +1,257 @@
---
name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
3. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component
4. Apply documentation in this order:
- Module docstring (at top, after the copyright header and before imports)
- Class docstrings
- `__init__` methods (always document constructor parameters)
- Public methods (not starting with `_`)
- Dataclass/config classes with field descriptions
5. Skip documentation for:
- Private methods (starting with `_`)
- Simple dunder methods (`__str__`, `__repr__`, `__post_init__`)
- Very simple pass-through properties
- **Already documented code** - If a class, method, or function already has a complete docstring that follows the project style, do not modify it. A docstring is complete if it has:
- A one-line summary
- Args section (if it has parameters)
- Returns section (if it returns something meaningful)
- Only add or improve documentation where it is missing or incomplete
## Module Docstring Format
```python
"""[One-line description of module purpose].
[Optional: Longer explanation of functionality, key classes, or use cases.]
"""
```
Example:
```python
"""Neuphonic text-to-speech service implementations.
This module provides WebSocket and HTTP-based integrations with Neuphonic's
text-to-speech API for real-time audio synthesis.
"""
```
## Class Docstring Format
```python
class ClassName:
    """One-line summary describing what the class does.

    [Longer description explaining purpose, behavior, and key features.
    Use action-oriented language.]

    [Optional: Event handlers, usage notes, or important caveats.]
    """
```
Example:
```python
class FrameProcessor(BaseObject):
    """Base class for all frame processors in the pipeline.

    Frame processors are the building blocks of Pipecat pipelines, they can be
    linked to form complex processing pipelines. They receive frames, process
    them, and pass them to the next or previous processor in the chain.

    Event handlers available:

    - on_before_process_frame: Called before a frame is processed
    - on_after_process_frame: Called after a frame is processed

    Example::

        @processor.event_handler("on_before_process_frame")
        async def on_before_process_frame(processor, frame):
            ...

        @processor.event_handler("on_after_process_frame")
        async def on_after_process_frame(processor, frame):
            ...
    """
```
Note: When listing event handlers, do NOT use backticks. Include an `Example::` section (with double colon for Sphinx) showing the decorator pattern and function signature for each event.
## Constructor (`__init__`) Format
```python
def __init__(self, *, param1: Type, param2: Type = default, **kwargs):
    """Initialize the [ClassName].

    Args:
        param1: Description of param1 and its purpose.
        param2: Description of param2. Defaults to [default].
        **kwargs: Additional arguments passed to parent class.
    """
```
Example:
```python
def __init__(
    self,
    *,
    api_key: str,
    voice_id: Optional[str] = None,
    sample_rate: Optional[int] = 22050,
    **kwargs,
):
    """Initialize the Neuphonic TTS service.

    Args:
        api_key: Neuphonic API key for authentication.
        voice_id: ID of the voice to use for synthesis.
        sample_rate: Audio sample rate in Hz. Defaults to 22050.
        **kwargs: Additional arguments passed to parent InterruptibleTTSService.
    """
```
## Method Docstring Format
```python
async def method_name(self, param1: Type) -> ReturnType:
    """One-line summary of what method does.

    [Longer description if behavior isn't obvious.]

    Args:
        param1: Description of param1.

    Returns:
        Description of return value.

    Raises:
        ExceptionType: When this exception is raised.
    """
```
Example:
```python
async def put(self, item: Tuple[Frame, FrameDirection, FrameCallback]):
    """Put an item into the priority queue.

    System frames (`SystemFrame`) have higher priority than any other
    frames. If a non-frame item is provided it will have the highest priority.

    Args:
        item: The item to enqueue.
    """
```
## Dataclass/Config Format
```python
@dataclass
class ConfigName:
    """One-line description of configuration.

    [Explanation of when/how to use this config.]

    Parameters:
        field1: Description of field1.
        field2: Description of field2. Defaults to [default].
    """

    field1: Type
    field2: Type = default_value
```
Example:
```python
@dataclass
class FrameProcessorSetup:
    """Configuration parameters for frame processor initialization.

    Parameters:
        clock: The clock instance for timing operations.
        task_manager: The task manager for handling async operations.
        observer: Optional observer for monitoring frame processing events.
    """

    clock: BaseClock
    task_manager: BaseTaskManager
    observer: Optional[BaseObserver] = None
```
## Enum Documentation Format
```python
class EnumName(Enum):
    """One-line description of the enum purpose.

    [Longer description of how the enum is used.]

    Parameters:
        VALUE1: Description of VALUE1.
        VALUE2: Description of VALUE2.
    """

    VALUE1 = 1
    VALUE2 = 2
```
## Writing Style Guidelines
- **Concise and professional** - No casual language or filler words
- **Action-oriented** - Start with verbs: "Processes...", "Manages...", "Converts..."
- **Purpose before implementation** - Explain WHY before HOW
- **Clear parameter descriptions** - Include type hints, defaults, and purpose
- **No redundant type info** - Type hints are in the signature, don't repeat in description
- **Use backticks for code references** - Wrap class names, method names, event names, parameter names, and code snippets in backticks
Good: "Neuphonic API key for authentication."
Bad: "str: The API key (string) that is used for authenticating with Neuphonic."
Good: "Triggers `on_speech_started` when the `VADAnalyzer` detects speech."
Bad: "Triggers on_speech_started when the VADAnalyzer detects speech."
## Deprecation Notice Format
When documenting deprecated code:
```python
"""[Description].

.. deprecated:: X.X.X
    `ClassName` is deprecated and will be removed in a future version.
    Use `NewClassName` instead.
"""
```
## Checklist
Before finishing, verify:
- [ ] Module has a docstring at the top (after the copyright header, before imports)
- [ ] All public classes have docstrings
- [ ] All `__init__` methods document their parameters
- [ ] All public methods have docstrings with Args/Returns/Raises as needed
- [ ] Dataclasses use "Parameters:" section for field descriptions
- [ ] Enums document each value in "Parameters:" section
- [ ] Writing is concise and action-oriented
- [ ] No documentation added to private methods (starting with `_`)
- [ ] Existing complete docstrings were left unchanged

View File

@@ -0,0 +1,128 @@
---
name: pr-description
description: Update a GitHub PR description with a summary of changes
---
Update a GitHub pull request description based on the changes in the PR.
## Arguments
```
/pr-description <PR_NUMBER> [--fixes <ISSUE_NUMBERS>]
```
- `PR_NUMBER` (required): The pull request number to update
- `--fixes` (optional): Comma-separated issue numbers that this PR fixes (e.g., `--fixes 123,456`)
Examples:
- `/pr-description 3534`
- `/pr-description 3534 --fixes 123`
- `/pr-description 3534 --fixes 123,456,789`
## Instructions
1. First, gather information about the PR:
- Use GitHub plugin to get PR details (title, current description, base branch)
- Use local git to get commits: `git log main..HEAD --oneline`
- Use local git to get the diff: `git diff main..HEAD`
- Parse any `--fixes` argument for issue numbers
2. Check the existing PR description:
- If it already has a complete, accurate description that reflects the changes, do nothing
- If it's missing sections, incomplete, or outdated compared to the actual changes, proceed to update
- If it only has the template placeholder text, generate a full description
3. Analyze the changes:
- Understand the purpose of each commit
- Identify any breaking changes (API changes, removed features, behavior changes)
- Look for new features, bug fixes, refactoring, or documentation changes
- Collect issue numbers from:
- The `--fixes` argument (if provided)
- Commit messages (patterns like "Fixes #123", "Closes #456", "Resolves #789")
4. Generate or update the PR description with these sections:
## PR Description Format
### Summary (always include)
Brief bullet points describing what changed and why. Focus on the *purpose* and *impact*, not implementation details.
```markdown
## Summary
- Added X to enable Y
- Fixed bug where Z would happen
- Refactored W for better maintainability
```
### Breaking Changes (include only if applicable)
Document any changes that affect existing users or APIs.
```markdown
## Breaking Changes
- `ClassName.method()` now requires a `param` argument
- Removed deprecated `old_function()` - use `new_function()` instead
```
### Testing (include when non-obvious)
How to verify the changes work. Skip for trivial changes.
```markdown
## Testing
- Run `uv run pytest tests/test_feature.py` to verify the fix
- Example usage: `uv run examples/new_feature.py`
```
### Fixes (include if issues are provided or found in commits)
List issues this PR fixes. GitHub will automatically close these issues when the PR is merged.
```markdown
## Fixes
- Fixes #123
- Fixes #456
```
Note: Use "Fixes #X" format (not "Closes" or "Resolves") for consistency. Each issue should be on its own line with "Fixes" to ensure GitHub auto-closes them.
## Guidelines
- **Be concise** - Reviewers should understand the PR in 30 seconds
- **Focus on why** - The diff shows *what* changed, explain *why*
- **Skip empty sections** - Only include sections that have content
- **Use bullet points** - Easier to scan than paragraphs
- **Don't duplicate the diff** - Avoid listing every file or line changed
## Example Output
```markdown
## Summary
- Added `/docstring` skill for documenting Python modules with Google-style docstrings
- Skill finds classes by name and handles conflicts when multiple matches exist
- Skips already-documented code to avoid unnecessary changes
## Testing
/docstring ClassName
## Fixes
- Fixes #123
```
## Checklist
Before updating the PR:
- [ ] Verified existing description needs updating (not already complete)
- [ ] Summary accurately reflects the changes
- [ ] Breaking changes are clearly documented (if any)
- [ ] No unnecessary sections included
- [ ] Description is concise and scannable

View File

@@ -33,7 +33,14 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra websocket
- name: Run tests with coverage
run: |

View File

@@ -37,7 +37,14 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra websocket
- name: Test with pytest
run: |

16
.gitignore vendored
View File

@@ -4,7 +4,14 @@ __pycache__/
*~
venv
.venv
/.idea
.idea
.gradle
.next
next-env.d.ts
local.properties
*.log
*.lock
smart_turn_audio_log
#*#
# Distribution / Packaging
@@ -27,7 +34,7 @@ share/python-wheels/
*.egg
MANIFEST
.DS_Store
.env
.env*
fly.toml
# Examples
@@ -51,4 +58,7 @@ docs/api/_build/
docs/api/api
# uv
.python-version
.python-version
# Pipecat
whisker_setup.py

View File

@@ -7,6 +7,916 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.101] - 2026-01-30
### Added
- Additions for `AICFilter` and `AICVADAnalyzer`:
- Added model downloading support to `AICFilter` with `model_id` and
`model_download_dir` parameters.
- Added `model_path` parameter to `AICFilter` for loading local `.aicmodel`
files.
- Added unit tests for `AICFilter` and `AICVADAnalyzer`.
(PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))
- Added handling for `server_content.interrupted` signal in the Gemini Live
service for faster interruption response in the case where there isn't
already turn tracking in the pipeline, e.g. local VAD + context aggregators.
When there is already turn tracking in the pipeline, the additional
interruption does no harm.
(PR [#3429](https://github.com/pipecat-ai/pipecat/pull/3429))
- Added new `GenesysFrameSerializer` for the Genesys AudioHook WebSocket
protocol, enabling bidirectional audio streaming between Pipecat pipelines
and Genesys Cloud contact center.
(PR [#3500](https://github.com/pipecat-ai/pipecat/pull/3500))
- Added `reached_upstream_types` and `reached_downstream_types` read-only
properties to `PipelineTask` for inspecting current frame filters.
(PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))
- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()`
methods to `PipelineTask` for appending frame types.
(PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))
- Added `UserTurnCompletionLLMServiceMixin` for LLM services to detect and
filter incomplete user turns. When enabled via `filter_incomplete_user_turns`
in `LLMUserAggregatorParams`, the LLM outputs a turn completion marker at the
start of each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete
long). Incomplete turns are suppressed, and configurable timeouts
automatically re-prompt the user.
(PR [#3518](https://github.com/pipecat-ai/pipecat/pull/3518))
- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a
frame instance by extracting its fields and creating new instances for each
direction.
(PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))
- `PipelineTask` now automatically adds `RTVIProcessor` and registers
`RTVIObserver` when `enable_rtvi=True` (default), simplifying pipeline setup.
(PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))
- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI
observers.
(PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))
- Added `video_out_codec` parameter to `TransportParams` allowing configuration
of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video
output in `DailyTransport`.
(PR [#3520](https://github.com/pipecat-ai/pipecat/pull/3520))
- Added `location` parameter to Google TTS services (`GoogleHttpTTSService`,
`GoogleTTSService`, `GeminiTTSService`) for regional endpoint support.
(PR [#3523](https://github.com/pipecat-ai/pipecat/pull/3523))
- Added new `PIPECAT_SMART_TURN_LOG_DATA` environment variable, which causes
Smart Turn input data to be saved to disk.
(PR [#3525](https://github.com/pipecat-ai/pipecat/pull/3525))
- Added `result_callback` parameter to `UserImageRequestFrame` to support
deferred function call results.
(PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))
- Added `function_call_timeout_secs` parameter to `LLMService` to configure
timeout for deferred function calls (defaults to 10.0 seconds).
(PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))
- Added `vad_analyzer` parameter to `LLMUserAggregatorParams`. VAD analysis is
now handled inside the `LLMUserAggregator` rather than in the transport,
keeping voice activity detection closer to where it is consumed. The
`vad_analyzer` on `BaseInputTransport` is now deprecated.
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```
(PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))
- Added `VADProcessor` for detecting speech in audio streams within a pipeline.
Pushes `VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`, and
`UserSpeakingFrame` downstream based on VAD state changes.
(PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))
- Added `VADController` for managing voice activity detection state and
emitting speech events independently of transport or pipeline processors.
(PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))
- Added local `PiperTTSService` for offline text-to-speech using Piper voice
models. The existing HTTP-based service has been renamed to
`PiperHttpTTSService`.
(PR [#3585](https://github.com/pipecat-ai/pipecat/pull/3585))
- `main()` in `pipecat.runner.run` now accepts an optional
`argparse.ArgumentParser`, allowing bots to define custom CLI arguments
accessible via `runner_args.cli_args`.
(PR [#3590](https://github.com/pipecat-ai/pipecat/pull/3590))
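A rough sketch of the custom-argument flow this entry describes. Only the `argparse` part runs here; the commented call into the runner reflects the entry above with an assumed call shape, and the `--greeting` flag and `build_parser` helper are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper that declares bot-specific CLI arguments."""
    parser = argparse.ArgumentParser(description="Example bot")
    # `--greeting` is an illustrative flag, not a Pipecat option.
    parser.add_argument("--greeting", default="Hello", help="Opening line for the bot")
    return parser


parser = build_parser()
args = parser.parse_args(["--greeting", "Hi there"])

# In a bot, the parser would be handed to the runner instead, for example:
#   from pipecat.runner.run import main
#   main(parser)  # assumed call shape
# The parsed values would then be read from `runner_args.cli_args`.
```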
- Added `KokoroTTSService` for local text-to-speech synthesis using the
Kokoro-82M model.
(PR [#3595](https://github.com/pipecat-ai/pipecat/pull/3595))
### Changed
- Updated `AICFilter` and `AICVADAnalyzer` to use aic-sdk ~= 2.0.1.
(PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))
- Improved the STT TTFB (Time To First Byte) measurement, reporting the delay
between when the user stops speaking and when the final transcription is
received. Note: Unlike traditional TTFB which measures from a discrete
request, STT services receive continuous audio input—so we measure from
speech end to final transcript, which captures the latency that matters for
voice AI applications. In support of this change, added `finalized` field to
`TranscriptionFrame` to indicate when a transcript is the final result for an
utterance.
(PR [#3495](https://github.com/pipecat-ai/pipecat/pull/3495))
- `SarvamSTTService` now defaults `vad_signals` and `high_vad_sensitivity` to
`None` (omitted from connection parameters), improving latency by ~300ms
compared to the previous defaults.
(PR [#3495](https://github.com/pipecat-ai/pipecat/pull/3495))
- Changed frame filter storage from tuples to sets in `PipelineTask`.
(PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))
- Changed default Inworld TTS model from `inworld-tts-1` to
`inworld-tts-1.5-max`.
(PR [#3531](https://github.com/pipecat-ai/pipecat/pull/3531))
- `FrameSerializer` now subclasses from `BaseObject` to enable event support.
(PR [#3560](https://github.com/pipecat-ai/pipecat/pull/3560))
- Added support for TTFS in `SpeechmaticsSTTService` and set the default mode
to `EXTERNAL` to support Pipecat-controlled VAD.
- Changed dependency to `speechmatics-voice[smart]>=0.2.8`.
(PR [#3562](https://github.com/pipecat-ai/pipecat/pull/3562))
- ⚠️ Changed function call handling to use timeout-based completion instead of
immediate callback execution.
- Function calls that defer their results (e.g., `UserImageRequestFrame`)
now use a timeout mechanism
- The `result_callback` is invoked automatically when the deferred
operation completes or after timeout
- This change affects examples using `UserImageRequestFrame` - the
`result_callback` should now be passed to the frame instead of being called
immediately
(PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))
- Pipecat runner now uses `DAILY_ROOM_URL` instead of `DAILY_SAMPLE_ROOM_URL`.
(PR [#3582](https://github.com/pipecat-ai/pipecat/pull/3582))
- Updates to `GradiumSTTService`:
- Now flushes pending transcriptions when VAD detects the user stopped
speaking, improving response latency.
- `GradiumSTTService` now supports `InputParams` for configuring `language`
and `delay_in_frames` settings.
(PR [#3587](https://github.com/pipecat-ai/pipecat/pull/3587))
### Deprecated
- ⚠️ Deprecated `vad_analyzer` parameter on `BaseInputTransport`. Pass
`vad_analyzer` to `LLMUserAggregatorParams` instead or use `VADProcessor` in
the pipeline.
(PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))
### Removed
- Removed deprecated `AICFilter` parameters: `enhancement_level`, `voice_gain`,
`noise_gate_enable`.
(PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))
### Fixed
- Fixed an issue where if you were using `OpenRouterLLMService` with a Gemini
model, it wouldn't handle multiple `"system"` messages as expected (and as we
do in `GoogleLLMService`), which is to convert subsequent ones into `"user"`
messages. Instead, the latest `"system"` message would overwrite the previous
ones.
(PR [#3406](https://github.com/pipecat-ai/pipecat/pull/3406))
- Transports now properly broadcast `InputTransportMessageFrame` frames both
upstream and downstream instead of only pushing downstream.
(PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))
- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing
shared mutable references between the downstream and upstream frame
instances.
(PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))
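To illustrate why the deep copy matters, here is a minimal sketch that does not use Pipecat itself: `DemoFrame` and `broadcast` are hypothetical stand-ins, and the real `broadcast_frame()` may differ in its details.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DemoFrame:
    """Hypothetical stand-in for a frame; not a Pipecat class."""

    metadata: dict = field(default_factory=dict)


def broadcast(frame_cls, **kwargs):
    """Build one instance per direction, each with its own copy of kwargs."""
    downstream = frame_cls(**copy.deepcopy(kwargs))
    upstream = frame_cls(**copy.deepcopy(kwargs))
    return downstream, upstream


down, up = broadcast(DemoFrame, metadata={"tags": ["a"]})
# Mutating one instance must not leak into the other.
down.metadata["tags"].append("b")
```

Without the copies, both frames would hold the same `metadata` dict, and the append on `down` would also show up on `up`.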
- Fixed OpenAI LLM services to emit `ErrorFrame` on completion timeout,
enabling proper error handling and LLMSwitcher failover.
(PR [#3529](https://github.com/pipecat-ai/pipecat/pull/3529))
- Fixed a logging issue where non-ASCII characters (e.g., Japanese or Chinese)
were unnecessarily escaped to Unicode sequences when a function call
occurred.
(PR [#3536](https://github.com/pipecat-ai/pipecat/pull/3536))
- Fixed how audio tracks are synchronized inside the `AudioBufferProcessor` to
fix timing issues where silence and audio were misaligned between user and
bot buffers.
(PR [#3541](https://github.com/pipecat-ai/pipecat/pull/3541))
- Fixed race condition in `OpenAIRealtimeBetaLLMService` that could cause an
error when truncating the conversation.
(PR [#3567](https://github.com/pipecat-ai/pipecat/pull/3567))
- Fixed an infinite loop in `WebsocketService` that blocked the event loop when
a remote server closed the connection gracefully.
(PR [#3574](https://github.com/pipecat-ai/pipecat/pull/3574))
- Fixed `LLMUserAggregator` and `LLMAssistantAggregator` not emitting pending
transcripts via `on_user_turn_stopped` and `on_assistant_turn_stopped` events
when the conversation ends (`EndFrame`) or is cancelled (`CancelFrame`).
(PR [#3575](https://github.com/pipecat-ai/pipecat/pull/3575))
- Added missing `LiveKitRunnerArguments` and `LiveKitTransport` support in
runner utilities to enable LiveKit transport configuration.
(PR [#3580](https://github.com/pipecat-ai/pipecat/pull/3580))
- Fixed race condition in `OpenAIRealtimeLLMService` that could cause an error
when truncating the conversation.
(PR [#3581](https://github.com/pipecat-ai/pipecat/pull/3581))
- Fixed `PiperHttpTTSService` (formerly `PiperTTSService`) to resample audio output
based on the model's sample rate parsed from the WAV header.
(PR [#3585](https://github.com/pipecat-ai/pipecat/pull/3585))
- Fixed `UserTurnController` to reset user turn timeout when interim
transcriptions are received.
(PR [#3594](https://github.com/pipecat-ai/pipecat/pull/3594))
- Fixed an issue in the `IVRNavigator` where the `TextFrame`s pushed had
incorrect spacing. Now, the internal `IVRProcessor` pushes
`AggregatedTextFrame`s when in conversation mode. This allows for controlling
spacing of the outputted, aggregated text.
(PR [#3604](https://github.com/pipecat-ai/pipecat/pull/3604))
- Fixed `GeminiLiveLLMService` transcription timeout handler not being
scheduled by yielding to the event loop after task creation.
(PR [#3605](https://github.com/pipecat-ai/pipecat/pull/3605))
## [0.0.100] - 2026-01-20
### Added
- Added Hathora service to support Hathora-hosted TTS and STT models
(non-streaming only).
(PR [#3169](https://github.com/pipecat-ai/pipecat/pull/3169))
- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models
(mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech
synthesis.
(PR [#3349](https://github.com/pipecat-ai/pipecat/pull/3349))
- Added the `additional_headers` param to `WebsocketClientParams`, allowing
`WebsocketClientTransport` to send custom headers on connect, for cases such
as authentication.
(PR [#3461](https://github.com/pipecat-ai/pipecat/pull/3461))
- Added `UserIdleController` for detecting user idle state, integrated into
`LLMUserAggregator` and `UserTurnProcessor` via optional `user_idle_timeout`
parameter. Emits `on_user_turn_idle` event for application-level handling.
Deprecated `UserIdleProcessor` in favor of the new compositional approach.
(PR [#3482](https://github.com/pipecat-ai/pipecat/pull/3482))
- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to
`LLMUserAggregator` for tracking user mute state changes.
(PR [#3490](https://github.com/pipecat-ai/pipecat/pull/3490))
### Changed
- Enhanced interruption handling in `AsyncAITTSService` by supporting
multi-context WebSocket sessions for more robust context management.
(PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))
- Throttle `UserSpeakingFrame` to broadcast at most every 200ms instead of on
every audio chunk, reducing frame processing overhead during user speech.
(PR [#3483](https://github.com/pipecat-ai/pipecat/pull/3483))
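The throttling idea can be sketched generically as below. This is an assumption-laden illustration, not the Pipecat implementation: the `Throttle` class is hypothetical, and the injected clock exists only to make the example deterministic.

```python
import time


class Throttle:
    """Hypothetical helper: allows an action at most once per interval."""

    def __init__(self, interval_secs: float, clock=time.monotonic) -> None:
        self._interval = interval_secs
        self._clock = clock
        self._last = None  # time of the last allowed action

    def ready(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False


# Simulated arrival times of audio chunks, in seconds.
ticks = iter([0.0, 0.05, 0.1, 0.25, 0.3, 0.5])
throttle = Throttle(0.2, clock=lambda: next(ticks))
sent = [throttle.ready() for _ in range(6)]
```

Only the chunks for which `ready()` returns `True` would trigger a broadcast; the rest are processed without emitting a frame.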
### Deprecated
- Deprecated `pipecat.turns.mute` (introduced in Pipecat 0.0.99) in favor of
`pipecat.turns.user_mute`, for consistency with other package names.
(PR [#3479](https://github.com/pipecat-ai/pipecat/pull/3479))
### Fixed
- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.
(PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))
- Fixed an issue where the "bot-llm-text" RTVI event would not fire for
realtime (speech-to-speech) services:
- `AWSNovaSonicLLMService`
- `GeminiLiveLLMService`
- `OpenAIRealtimeLLMService`
- `GrokRealtimeLLMService`
The issue was that these services weren't pushing `LLMTextFrame`s. Now
they do.
(PR [#3446](https://github.com/pipecat-ai/pipecat/pull/3446))
- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is
talking when using `ExternalUserTurnStrategies`.
(PR [#3454](https://github.com/pipecat-ai/pipecat/pull/3454))
- Fixed an issue where user turn start strategies were not being reset after a
user turn started, causing incorrect strategy behavior.
(PR [#3455](https://github.com/pipecat-ai/pipecat/pull/3455))
- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions,
preventing incorrect turn starts when words are spoken with pauses between
them.
(PR [#3462](https://github.com/pipecat-ai/pipecat/pull/3462))
- Fixed an issue where Grok Realtime would error out when running with
SmallWebRTC transport.
(PR [#3480](https://github.com/pipecat-ai/pipecat/pull/3480))
- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was
causing an error. See
https://docs.mem0.ai/platform/features/async-mode-default-change.
(PR [#3484](https://github.com/pipecat-ai/pipecat/pull/3484))
- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously
error out. Now it successfully reconnects and "rehydrates" from the context
object.
(PR [#3486](https://github.com/pipecat-ai/pipecat/pull/3486))
- Fixed `AzureTTSService` transcript formatting issues:
- Punctuation now appears without extra spaces (e.g., "Hello!" instead of
"Hello !")
- CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces
between characters
(PR [#3489](https://github.com/pipecat-ai/pipecat/pull/3489))
- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in
some cases.
(PR [#3494](https://github.com/pipecat-ai/pipecat/pull/3494))
- Fixed memory leak in `LiveKitTransport` when `video_in_enabled` is `False`.
(PR [#3499](https://github.com/pipecat-ai/pipecat/pull/3499))
- Fixed an issue in `AIService` where unhandled exceptions in `start()`,
`stop()`, or `cancel()` implementations would prevent `process_frame()` from
continuing, so `StartFrame`, `EndFrame`, or `CancelFrame` was never pushed
downstream and the pipeline did not start or stop properly.
(PR [#3503](https://github.com/pipecat-ai/pipecat/pull/3503))
- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from
constructor to `start()` for better error handling.
(PR [#3504](https://github.com/pipecat-ai/pipecat/pull/3504))
- Optimized `NVIDIATTSService` to process incoming audio frames immediately.
(PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))
- Optimized `NVIDIASTTService` by removing unnecessary queue and task.
(PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))
- Fixed a `CambTTSService` issue where the client was initialized in the
constructor, which prevented proper pipeline error handling.
(PR [#3511](https://github.com/pipecat-ai/pipecat/pull/3511))
## [0.0.99] - 2026-01-13
### Added
- Introducing user turn strategies. User turn strategies indicate when the user
turn starts or stops. In conversational agents, these are often referred to
as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g.
using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using
an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both start and stop; the strategies
in each list are evaluated in order until one evaluates to true.
Available user turn start strategies:
- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy
Available user turn stop strategies:
- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy
The default strategies are:
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]
Turn strategies are configured when setting up `LLMContextAggregatorPair`.
For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[
                TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                )
            ],
        )
    ),
)
```
In order to use the user turn strategies you must update to the new
universal `LLMContext` and `LLMContextAggregatorPair`.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
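The "evaluated in order until one evaluates to true" rule for user turn strategies can be sketched without any Pipecat classes. Everything below is hypothetical: `first_matching_strategy`, the `Strategy` alias, and the event dictionaries are stand-ins chosen for illustration, not the library's API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for a strategy: a name plus a predicate over an event dict.
Strategy = Tuple[str, Callable[[dict], bool]]


def first_matching_strategy(strategies: Sequence[Strategy], event: dict) -> Optional[str]:
    """Return the name of the first strategy whose predicate is true, else None."""
    for name, predicate in strategies:
        if predicate(event):
            return name  # Later strategies are not consulted.
    return None


# Mirrors the shape of the default start list: VAD first, then transcription.
start_strategies = [
    ("vad", lambda event: event.get("vad_speaking", False)),
    ("transcription", lambda event: bool(event.get("text"))),
]

print(first_matching_strategy(start_strategies, {"vad_speaking": True, "text": "hi"}))
print(first_matching_strategy(start_strategies, {"text": "hi"}))
print(first_matching_strategy(start_strategies, {}))
```

Order matters in this model: placing the transcription check first would let it win whenever text is present, even if VAD also fired.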
- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural
network via pyrnnoise library.
(PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
voice conversations:
- Support for real-time audio streaming with WebSocket connection
- Built-in server-side VAD (Voice Activity Detection)
- Multiple voice options: Ara, Rex, Sal, Eve, Leo
- Built-in tools support: web_search, x_search, file_search
- Custom function calling with standard Pipecat tools schema
- Configurable audio formats (PCM at 8kHz-48kHz)
(PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))
- Added an approximation of TTFB for Ultravox.
(PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))
- Added a new `AudioContextTTSService` to the TTS service base classes. The
`AudioContextWordTTSService` now inherits from `AudioContextTTSService` and
`WebsocketWordTTSService`.
(PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))
- `LLMUserAggregator` now exposes the following events:
- `on_user_turn_started`: triggered when a user turn starts
- `on_user_turn_stopped`: triggered when a user turn ends
- `on_user_turn_stop_timeout`: triggered when a user turn does not stop
and times out
(PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))
- Introducing user mute strategies. User mute strategies indicate when user
input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user
input from interrupting bot speech, tool execution, or other critical system
operations.
A list of strategies can be specified; all strategies are evaluated for
every frame so that each strategy can maintain its internal state. A user
frame is muted if any of the configured strategies indicates it should be
muted.
Available user mute strategies:
- `FirstSpeechUserMuteStrategy`
- `MuteUntilFirstBotCompleteUserMuteStrategy`
- `AlwaysUserMuteStrategy`
- `FunctionCallUserMuteStrategy`
User mute strategies replace the legacy `STTMuteFilter` and provide a more
flexible and composable approach to muting user input.
User mute strategies are configured when setting up the
`LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            FirstSpeechUserMuteStrategy(),
        ]
    ),
)
```
In order to use user mute strategies you should update to the new universal
`LLMContext` and `LLMContextAggregatorPair`.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
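Unlike turn strategies, mute strategies are all evaluated on every frame, and the frame is muted if any of them says so. A minimal sketch of that combination rule, assuming a hypothetical `should_mute()` method and a made-up `CountingMuteStrategy` (neither is Pipecat's interface):

```python
class CountingMuteStrategy:
    """Hypothetical strategy: mutes the first `n` frames it sees, then stops."""

    def __init__(self, n: int):
        self.remaining = n
        self.seen = 0

    def should_mute(self, frame: str) -> bool:
        self.seen += 1  # Internal state is updated on every frame.
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


def is_muted(strategies, frame) -> bool:
    # Build the full list first so every strategy runs. Passing a generator to
    # any() would stop at the first True and skip the remaining strategies.
    results = [strategy.should_mute(frame) for strategy in strategies]
    return any(results)


first, second = CountingMuteStrategy(1), CountingMuteStrategy(2)
decisions = [is_muted([first, second], frame) for frame in ("f1", "f2", "f3")]
print(decisions, first.seen, second.seen)
```

Both strategies see all three frames even though the first one alone would have been enough to mute `f1`.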
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService`
and `NvidiaTTSService`.
(PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300))
- Added `enable_interruptions` constructor argument to all user turn
strategies. This tells the `LLMUserAggregator` to push or not push an
`InterruptionFrame`.
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control
sentence splitting behavior for finals on sentence boundaries.
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
- Added word-level timestamp support to `AzureTTSService` for accurate
text-to-audio synchronization.
(PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334))
- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams`
and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation
dictionary feature for custom pronunciations.
(PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346))
- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport`
(see https://www.liveavatar.com/).
(PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))
- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
- New `start_video_paused` parameter to control initial video input state
- New `video_frame_detail` parameter to set image processing quality
("auto", "low", or "high"). This corresponds to OpenAI Realtime's
`image_detail` parameter.
- `set_video_input_paused()` method to pause/resume video input at runtime
- `set_video_frame_detail()` method to adjust video frame quality
dynamically
- Automatic rate limiting (1 frame per second) to prevent API overload
(PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
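The "1 frame per second" rate limiting mentioned above can be approximated with a simple time-based gate. This is a sketch under assumptions, not the service's actual code: `FrameRateLimiter` and its injectable `clock` parameter are hypothetical names introduced so the behavior can be shown deterministically.

```python
import time
from typing import Callable, Optional


class FrameRateLimiter:
    """Allow at most one frame per `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval_s:
            return False  # Too soon after the previous frame: drop this one.
        self._last_sent = now
        return True


# Fake clock so the example does not depend on wall time.
timestamps = iter([0.0, 0.4, 1.0, 1.5, 2.1])
limiter = FrameRateLimiter(1.0, clock=lambda: next(timestamps))
decisions = [limiter.should_send() for _ in range(5)]
print(decisions)
```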
- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
and interruptions based on the controller's user turn strategies.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added `UserTurnController` to manage user turns. It emits
`on_user_turn_started`, `on_user_turn_stopped`, and
`on_user_turn_stop_timeout` events, and can be integrated into processors to
detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are
implemented using this controller.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added `should_interrupt` property to `DeepgramFluxSTTService`,
`DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the
bot should be interrupted when the external service detects user speech.
(PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))
- `LLMAssistantAggregator` now exposes the following events:
- `on_assistant_turn_started`: triggered when the assistant turn starts
- `on_assistant_turn_stopped`: triggered when the assistant turn ends
- `on_assistant_thought`: triggered when there's an assistant thought
available
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
SDK (requires `krisp_audio`).
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Added support for setting up a pipeline task from external files. You can now
register custom pipeline task setup files by setting the
`PIPECAT_SETUP_FILES` environment variable. This variable should contain a
colon-separated list of Python files (e.g. `export
PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a
function with the following signature:
```python
async def setup_pipeline_task(task: PipelineTask):
    ...
```
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
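To make the file contract concrete, here is a rough sketch of how a colon-separated list of setup files could be loaded and applied. It is not Pipecat's loader: `load_setup_functions`, `run_setup_files`, the generated module names, and the list standing in for a `PipelineTask` are all assumptions for illustration.

```python
import asyncio
import importlib.util
import tempfile
from pathlib import Path


def load_setup_functions(env_value: str) -> list:
    """Load `setup_pipeline_task` from each file in a colon-separated list."""
    functions = []
    filenames = [name for name in env_value.split(":") if name]
    for index, filename in enumerate(filenames):
        # Hypothetical module name; any unique name works here.
        spec = importlib.util.spec_from_file_location(f"setup_file_{index}", filename)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        functions.append(module.setup_pipeline_task)
    return functions


async def run_setup_files(task, env_value: str) -> None:
    """Await each setup function in the order the files were listed."""
    for setup in load_setup_functions(env_value):
        await setup(task)


# Demo: write two setup files and apply them to a list standing in for a task.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for label in ("first", "second"):
        path = Path(tmp) / f"{label}.py"
        path.write_text(
            "async def setup_pipeline_task(task):\n"
            f"    task.append({label!r})\n"
        )
        paths.append(str(path))
    fake_task = []
    asyncio.run(run_setup_files(fake_task, ":".join(paths)))

print(fake_task)
```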
- Added a keepalive task for `InworldTTSService` to keep the service connected
in the event of no generations for longer periods of time.
(PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403))
- Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When
enabled, `GladiaSTTService` acts as the turn controller, emitting
`UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally
`InterruptionFrame`.
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
- Added `should_interrupt` property to `GladiaSTTService` to configure whether
the bot should be interrupted when the external service detects user speech.
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector
WebSocket protocol.
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
- Added `append_trailing_space` parameter to `TTSService` to automatically
append a trailing space to text before sending to TTS, helping prevent some
services from vocalizing trailing punctuation.
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
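The effect of `append_trailing_space` amounts to a one-line text transformation. The helper below is a hypothetical illustration, not the `TTSService` code; in particular, skipping text that already ends in a space is an assumption.

```python
def prepare_tts_text(text: str, append_trailing_space: bool = False) -> str:
    """Optionally add one trailing space so the final punctuation is not read aloud."""
    if append_trailing_space and not text.endswith(" "):
        return text + " "
    return text


print(repr(prepare_tts_text("Hello.", append_trailing_space=True)))
```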
### Changed
- Updated `ElevenLabsRealtimeSTTService` to accept the
`include_language_detection` parameter to detect language.
```python
stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    include_language_detection=True,
)
```
(PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))
- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which adds
improved VAD and Smart Turn capabilities and dramatically improves latency
without any impact on accuracy. Use the `turn_detection_mode` parameter to control
the endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
```python
stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    params=SpeechmaticsSTTService.InputParams(
        language=Language.EN,
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    ),
)
```
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
- `daily-python` updated to 0.23.0.
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by
`DailyTransport` now include the transport source (i.e., the originating
audio track).
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
- Updates to Inworld TTS services:
- Improved `InworldTTSService`'s websocket implementation to flush and close
contexts more reliably, which improves handling of long inputs.
- Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
(PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))
- Improved the error handling and reconnection logic for `WebsocketServer` by
distinguishing between errors when disconnecting and websocket communication
errors.
(PR [#3392](https://github.com/pipecat-ai/pipecat/pull/3392))
- Updated `DeepgramSTTService` to push user started/stopped speaking and
interruption frames when `vad_enabled` is set to true. This centralizes the
frames into the service, removing the need to have your application code
handle Deepgram's events and push these frames.
(PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314))
- Added encoding validation to `DeepgramTTSService` to prevent unsupported
encodings from reaching the API. The service now raises `ValueError` at
initialization with a clear error message.
(PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329))
- Updated `read_audio_frame` & `read_video_frame` methods in
`SmallWebRTCClient` to check if the track is enabled before logging a
warning.
(PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336))
- Updated `CartesiaTTSService` to support setting `language=None`, resulting in
Cartesia auto-detecting the language of the conversation.
(PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366))
- The bundled Smart Turn weights are now updated to v3.2, which has better
handling of short utterances, and is more robust against background noise.
(PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367))
- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`.
(PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371))
- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
meaning that the start of the turn audio is not cut off. This improves
accuracy for short utterances.
- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
(PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))
- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
to share a single SDK instance within the same process.
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english`
and voice ID to `autumn`.
(PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399))
- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio
packetization via the `fixed_audio_packet_size` parameter to support media
endpoints requiring strict framing and real-time pacing.
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
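Fixed-size packetization means buffering incoming audio and emitting only complete packets of a set byte length. The sketch below shows just that buffering step with a hypothetical `packetize` helper; it is not the transport's implementation, and the real-time pacing mentioned above is left out. The 320-byte size is an example value: 20 ms of 8 kHz, 16-bit mono PCM.

```python
def packetize(buffer: bytearray, chunk: bytes, packet_size: int) -> list:
    """Append `chunk` to `buffer` and return every complete fixed-size packet."""
    buffer.extend(chunk)
    packets = []
    while len(buffer) >= packet_size:
        packets.append(bytes(buffer[:packet_size]))
        del buffer[:packet_size]  # Leftover bytes wait for the next chunk.
    return packets


PACKET_SIZE = 320  # Example only: 8000 samples/s * 2 bytes * 0.02 s.
pending = bytearray()
first_batch = packetize(pending, b"\x00" * 500, PACKET_SIZE)
second_batch = packetize(pending, b"\x00" * 500, PACKET_SIZE)
print(len(first_batch), len(second_batch), len(pending))
```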
- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to
`True` to prevent punctuation (e.g., “dot”) from being pronounced.
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`,
`LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns
thought content.
(PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431))
### Deprecated
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use
`pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- `FrameProcessor.interruption_strategies` is deprecated, use
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in
`pipecat.processors.aggregators.llm_response` are now deprecated. Use the new
universal `LLMContext` and `LLMContextAggregatorPair` instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and
`UserStoppedSpeakingFrame` frames.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame`
frames are deprecated.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in
unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies`
parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is
deprecated. Use the new `turn_detection_mode` parameter instead, with
`TurnDetectionMode.EXTERNAL`, `TurnDetectionMode.ADAPTIVE`, or
`TurnDetectionMode.SMART_TURN`. The `enable_vad` parameter is also
deprecated and is inferred from the `turn_detection_mode`.
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
- `OpenAILLMContext` and its associated things (context aggregators, etc.) are
now deprecated in favor of the universal `LLMContext` and its associated
things.
From the developer's point of view, switching to using `LLMContext`
machinery will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```python
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```
(PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))
- `STTMuteFilter` is deprecated and will be removed in a future version. Use
`LLMUserAggregator`'s new `user_mute_strategies` instead.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
- `FrameProcessor.interruptions_allowed` is now deprecated, use
`LLMUserAggregator`'s new parameter `user_mute_strategies` instead.
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
- `PipelineParams.allow_interruptions` is now deprecated, use
`LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
example, to disable interruptions but still get user turns you can do:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
        ),
    ),
)
```
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
- `TranscriptProcessor` and related data classes and frames
(`TranscriptionMessage`, `ThoughtTranscriptionMessage`,
`TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and
`LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and
`on_assistant_turn_stopped`) instead.
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
- Deprecated support for the `vad_events` `LiveOptions` in
`DeepgramSTTService`. Instead, use a local Silero VAD for VAD events.
Additionally, deprecated `should_interrupt` which will be removed along with
`vad_events` support in a future release.
(PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386))
- Loading external observers from files is deprecated, use the new pipeline
task setup files and `PIPECAT_SETUP_FILES` environment variable instead.
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
### Fixed
- Improved error handling in `ElevenLabsRealtimeSTTService`.
(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
that blocks the process if the websocket disconnects due to an error.
(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
- Fixed a bug in `STTMuteFilter` where the user was not always muted during
function calls, especially when there were multiple simultaneous calls.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
- Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate
memory" error when processing silence audio frames.
(PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))
- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
- Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
in order to finalize transcription.
- Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
detection.
- Only emit VAD + interruption frames if VAD is enabled within the plugin
(modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
- Fixed an issue with function calling where a handler failing to invoke its
result callback could leave the context stuck in IN_PROGRESS, causing LLM
inference for subsequent function call results to block while waiting on the
unresolved call.
(PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343))
- Fixed an issue with `DeepgramTTSService` where the model would output "Dot"
instead of a period in some circumstances.
(PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345))
- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351))
- Fixed an issue in `GeminiLiveLLMService` where `TranscriptionFrame`s were
occasionally not pushed.
(PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356))
- Fixed potential memory leaks and initialization issues in `KrispVivaFilter`
by improving SDK lifecycle management.
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was
set after awaiting, allowing the event loop to re-enter the method before the
guard was set.
(PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400))
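The race described in the `BaseOutputTransport` entry is a general asyncio pitfall: a guard flag set after an `await` lets a second caller pass the check while the first is suspended. The sketch below reproduces it with a hypothetical `Speaker` class; it is not the transport's code.

```python
import asyncio


class Speaker:
    def __init__(self):
        self.speaking = False
        self.started_events = 0

    async def _notify_started(self):
        await asyncio.sleep(0)  # Yields to the event loop, like any real await.
        self.started_events += 1

    async def start_racy(self):
        if not self.speaking:
            await self._notify_started()
            self.speaking = True  # Too late: another caller already passed the check.

    async def start_guarded(self):
        if not self.speaking:
            self.speaking = True  # Set the guard before the first await.
            await self._notify_started()


async def count_started(method_name: str) -> int:
    speaker = Speaker()
    method = getattr(speaker, method_name)
    await asyncio.gather(method(), method())  # Two concurrent callers.
    return speaker.started_events


racy = asyncio.run(count_started("start_racy"))
guarded = asyncio.run(count_started("start_guarded"))
print(racy, guarded)
```

With the flag set before awaiting, the second caller sees `speaking` already true and returns without notifying again.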
- Fixed parallel function calling when using Gemini thinking.
(PR [#3420](https://github.com/pipecat-ai/pipecat/pull/3420))
- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422))
- Fixed an issue in `traced_tts`, `traced_gemini_live`, and
`traced_openai_realtime` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428))
- Fixed `request_image_frame` (for backwards compatibility) and restored
function-call-related fields in `UserImageRequestFrame` and
`UserImageRawFrame`, preventing a case where adding a non-LLM message to the
context could trigger duplicate LLM inferences (on image arrival and on
function-call result), potentially causing an infinite inference loop.
(PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430))
- Fixed `LLMContext.create_audio_message()` by correcting an internal helper
that was incorrectly declared async while being run in `asyncio.to_thread()`.
(PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435))
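Why an `async def` helper breaks under `asyncio.to_thread()`: the worker thread only calls the function, which for a coroutine function returns an un-run coroutine object instead of the result. A minimal reproduction, with hypothetical helper names `encode_async` and `encode_sync` (the actual internal helper is not shown in this changelog):

```python
import asyncio


async def encode_async(data: bytes) -> str:
    # Incorrectly declared async: to_thread() will only create the coroutine.
    return data.hex()


def encode_sync(data: bytes) -> str:
    # Plain function: to_thread() runs it in a worker thread and returns the value.
    return data.hex()


async def main():
    wrong = await asyncio.to_thread(encode_async, b"\x01\x02")
    right = await asyncio.to_thread(encode_sync, b"\x01\x02")
    got_coroutine = asyncio.iscoroutine(wrong)
    wrong.close()  # Avoid a "coroutine was never awaited" warning.
    return got_coroutine, right


result = asyncio.run(main())
print(result)
```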
### Other
- Added `52-live-transcription.py` foundational example demonstrating live
transcription and translation from English to Spanish. In this example, the
bot is not interruptible: as the user continues speaking, English
transcriptions are queued, and the bot continuously translates and speaks
each queued sentence in Spanish without being interrupted by new user speech.
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows
how to use `UserTurnProcessor`.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added a new foundational example `28-user-assistant-turns.py` that shows how
to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to
gather a conversation transcript.
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
## [0.0.98] - 2025-12-17
### Added

CLAUDE.md Normal file

@@ -0,0 +1,143 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
Transport Input → Pipeline Source → [Processor1] → [Processor2] → ... → Pipeline Sink → Transport Output
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): External I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
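The frame flow described above can be pictured with a toy model. The classes below are simplified stand-ins written for this sketch (`ToyTextFrame`, `ToyProcessor`, `Uppercase`, `Sink`, `link`); they are not Pipecat's `Frame`, `FrameProcessor`, or `Pipeline`, which also handle frame direction, queues, and lifecycle.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyTextFrame:
    text: str


class ToyProcessor:
    """Receives a frame, processes it, and pushes the result to the next processor."""

    def __init__(self):
        self.next = None

    async def process_frame(self, frame):
        await self.push_frame(frame)  # Default: pass the frame through unchanged.

    async def push_frame(self, frame):
        if self.next is not None:
            await self.next.process_frame(frame)


class Uppercase(ToyProcessor):
    async def process_frame(self, frame):
        await self.push_frame(ToyTextFrame(frame.text.upper()))


class Sink(ToyProcessor):
    def __init__(self):
        super().__init__()
        self.received = []

    async def process_frame(self, frame):
        self.received.append(frame)


def link(*processors):
    """Chain processors in order and return the first one."""
    for current, following in zip(processors, processors[1:]):
        current.next = following
    return processors[0]


sink = Sink()
source = link(ToyProcessor(), Uppercase(), sink)
asyncio.run(source.process_frame(ToyTextFrame("hello")))
print(sink.received)
```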
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and `LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to trigger them. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
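The `frame.complete()` rule for interruptions can be illustrated with a toy model of a frame that carries an optional `asyncio.Event`. `ToyInterruptionFrame`, `swallowing_processor`, and `push_and_wait` are hypothetical names for this sketch, and the timeout exists only so the example terminates; the real `InterruptionFrame` and `push_interruption_task_frame_and_wait()` may differ in detail.

```python
import asyncio
from typing import Optional


class ToyInterruptionFrame:
    def __init__(self, event: Optional[asyncio.Event] = None):
        self.event = event

    def complete(self):
        # Signal any waiter that the frame has been fully handled.
        if self.event is not None:
            self.event.set()


async def swallowing_processor(frame: ToyInterruptionFrame, call_complete: bool):
    # This processor does not push the frame downstream, so it never
    # reaches the sink that would normally set the event.
    if call_complete:
        frame.complete()


async def push_and_wait(call_complete: bool, timeout: float = 0.05) -> bool:
    event = asyncio.Event()
    frame = ToyInterruptionFrame(event)
    await swallowing_processor(frame, call_complete)
    try:
        await asyncio.wait_for(event.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False  # Without the timeout, the caller would wait forever.


completed = asyncio.run(push_and_wait(call_complete=True))
stalled = asyncio.run(push_and_wait(call_complete=False))
print(completed, stalled)
```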
### Key Directories
| Directory | Purpose |
|---------------------------|----------------------------------------------------|
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
### Docstring Example
```python
class MyService(LLMService):
    """Description of what the service does.

    More detailed description.

    Event handlers available:

    - on_connected: Called when we are connected

    Example::

        @service.event_handler("on_connected")
        async def on_connected(service, frame):
            ...
    """

    def __init__(self, param1: str, **kwargs):
        """Initialize the service.

        Args:
            param1: Description of param1.
            **kwargs: Additional arguments passed to parent.
        """
        super().__init__(**kwargs)
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Pull Requests
After creating a PR, use `/changelog <pr_number>` to generate the changelog file and `/pr-description <pr_number>` to update the PR description.


@@ -1,6 +1,6 @@
BSD 2-Clause License
Copyright (c) 2024–2025, Daily
Copyright (c) 2024–2026, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:


@@ -73,15 +73,15 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |

View File

@@ -1,42 +0,0 @@
- Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both strategies; strategies are evaluated in order until one evaluates to true.
Available user turn start strategies:
- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy
Available user turn stop strategies:
- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy
The default strategies are:
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]
Turn strategies are configured when setting up `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
)
],
)
),
)
```
In order to use the user turn strategies you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
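The "evaluated in order until one evaluates to true" rule can be sketched in plain Python. This is only an illustration of the ordering semantics: `TurnStrategy`, `first_triggered`, and the event dictionary are hypothetical names invented here and are not part of Pipecat's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TurnStrategy:
    """Hypothetical stand-in for a user turn strategy (not a Pipecat class)."""

    name: str
    check: Callable[[dict], bool]


def first_triggered(strategies: Sequence[TurnStrategy], event: dict) -> Optional[str]:
    """Evaluate strategies in order and return the name of the first that fires."""
    for strategy in strategies:
        if strategy.check(event):
            return strategy.name  # later strategies are not consulted
    return None


# Illustrative start strategies: VAD activity first, then transcription text.
start_strategies = [
    TurnStrategy("vad", lambda e: e.get("vad_speaking", False)),
    TurnStrategy("transcription", lambda e: bool(e.get("text"))),
]
```

With this ordering, an event that carries both VAD activity and text is attributed to the VAD strategy, because it appears first in the list.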

View File

@@ -1 +0,0 @@
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in unexpected behavior, use `LLMUserAggregator`'s new `turn_start_strategies` parameter instead.

View File

@@ -1 +0,0 @@
- `FrameProcessor.interruption_strategies` is deprecated, use `LLMUserAggregator`'s new `turn_start_strategies` parameter instead.

View File

@@ -1 +0,0 @@
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames are deprecated.

View File

@@ -1 +0,0 @@
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames.

View File

@@ -1 +0,0 @@
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new universal `LLMContext` and `LLMContextAggregatorPair` instead.

View File

@@ -1 +0,0 @@
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with `LLMUserAggregator`'s new `turn_start_strategies` parameter instead.

1
changelog/3134.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback.

View File

@@ -1 +0,0 @@
- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural network via pyrnnoise library.

View File

@@ -1,15 +0,0 @@
- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which adds
improved VAD and Smart Turn capabilities and dramatically improves latency
without any impact on accuracy. Use the `turn_detection_mode` parameter to
control the endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
```python
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
```

View File

@@ -1,4 +0,0 @@
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is deprecated.
Use the new `turn_detection_mode` parameter instead, with `TurnDetectionMode.EXTERNAL`,
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The `enable_vad`
parameter is also deprecated and is inferred from the `turn_detection_mode`.

View File

@@ -1,2 +0,0 @@
- Improved error handling in `ElevenLabsRealtimeSTTService`.
- Fixed an issue in `ElevenLabsRealtimeSTTService` that caused an infinite loop, blocking the process, when the websocket disconnected due to an error.

View File

@@ -1 +0,0 @@
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by `DailyTransport` now include the transport source (i.e., the originating audio track).

View File

@@ -1 +0,0 @@
- `daily-python` updated to 0.23.0.

View File

@@ -1,15 +0,0 @@
- `OpenAILLMContext` and its associated things (context aggregators, etc.) are now deprecated in favor of the universal `LLMContext` and its associated things.
From the developer's point of view, switching to using `LLMContext` machinery will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```python
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```

View File

@@ -1,8 +0,0 @@
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations:
- Support for real-time audio streaming with WebSocket connection
- Built-in server-side VAD (Voice Activity Detection)
- Multiple voice options: Ara, Rex, Sal, Eve, Leo
- Built-in tools support: web_search, x_search, file_search
- Custom function calling with standard Pipecat tools schema
- Configurable audio formats (PCM at 8kHz-48kHz)

View File

@@ -1 +0,0 @@
- Added an approximation of TTFB for Ultravox.

View File

@@ -1,5 +0,0 @@
- Updates to Inworld TTS services:
- Improved `InworldTTSService`'s websocket implementation to flush and close
contexts more reliably, which improves handling of long inputs.
- Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.

View File

@@ -1 +0,0 @@
- Added a new `AudioContextTTSService` to the TTS service base classes. The `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`.

View File

@@ -1,4 +0,0 @@
- `LLMUserAggregator` now exposes the following events:
- `on_user_turn_started`: triggered when a user turn starts
- `on_user_turn_stopped`: triggered when a user turn ends
- `on_user_turn_stop_timeout`: triggered when a user turn does not stop and times out

View File

@@ -1,29 +0,0 @@
- Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations.
A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted.
Available user mute strategies:
* `FirstSpeechUserMuteStrategy`
* `MuteUntilFirstBotCompleteUserMuteStrategy`
* `AlwaysUserMuteStrategy`
* `FunctionCallUserMuteStrategy`
User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input.
User mute strategies are configured when setting up the `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_mute_strategies=[
FirstSpeechUserMuteStrategy(),
]
),
)
```
In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
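The evaluation rule described above (every strategy sees every frame, and the frame is muted if any strategy says so) can be sketched as follows. The classes and the string frame names are hypothetical and exist only for this illustration; Pipecat's real strategies have their own interface.

```python
from typing import List


class MuteStrategy:
    """Hypothetical base class; real Pipecat strategies have their own interface."""

    def process(self, frame: str) -> bool:
        raise NotImplementedError


class MuteUntilFirstBotSpeech(MuteStrategy):
    """Mutes until a 'bot_stopped_speaking' frame has been seen."""

    def __init__(self) -> None:
        self._bot_done = False

    def process(self, frame: str) -> bool:
        if frame == "bot_stopped_speaking":
            self._bot_done = True
        return not self._bot_done


class MuteDuringFunctionCalls(MuteStrategy):
    """Mutes while at least one function call is in flight."""

    def __init__(self) -> None:
        self._in_flight = 0

    def process(self, frame: str) -> bool:
        if frame == "function_call_started":
            self._in_flight += 1
        elif frame == "function_call_finished":
            self._in_flight = max(0, self._in_flight - 1)
        return self._in_flight > 0


def should_mute(strategies: List[MuteStrategy], frame: str) -> bool:
    # Build the full list first so every strategy sees the frame;
    # any() on a generator would short-circuit and starve later strategies.
    results = [s.process(frame) for s in strategies]
    return any(results)
```

Evaluating all strategies before combining the results is the design point: a short-circuiting `any(...)` would leave later strategies with stale internal state.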

View File

@@ -1 +0,0 @@
- `STTMuteFilter` is deprecated and will be removed in a future version. Use `LLMUserAggregator`'s new `user_mute_strategies` instead.

View File

@@ -1 +0,0 @@
- Fixed a bug in `STTMuteFilter` where the user was not always muted during function calls, especially when there were multiple simultaneous calls.

View File

@@ -1 +0,0 @@
- `FrameProcessor.interruptions_allowed` is now deprecated, use `LLMUserAggregator`'s new parameter `user_mute_strategies` instead.

View File

@@ -1,12 +0,0 @@
- `PipelineParams.allow_interruptions` is now deprecated, use `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For example, to disable interruptions but still get user turns you can do:
```python
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
),
),
)
```

View File

@@ -1 +0,0 @@
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`.

View File

@@ -1 +0,0 @@
- Updated `DeepgramSTTService` to push user started/stopped speaking and interruption frames when `vad_enabled` is set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames.

View File

@@ -1 +0,0 @@
- Added `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` to push or not push an `InterruptionFrame`.

View File

@@ -1 +0,0 @@
- Added `52-live-transcription.py` foundational example demonstrating live transcription and translation from English to Spanish. In this example, the bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech.

View File

@@ -1 +0,0 @@
- Frame processors can now push frames from the top of the pipeline using new methods `queue_task_frame()` and `queue_task_frames()`.

1
changelog/3355.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `UserBotLatencyObserver` for tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.

View File

@@ -0,0 +1 @@
- Deprecated `UserBotLatencyLogObserver`. Use `UserBotLatencyObserver` directly with its `on_latency_measured` event handler instead.

1
changelog/3542.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed pipeline freeze when `InterruptionFrame` discards `EndFrame` or `StopFrame` by making terminal frames uninterruptible.

1
changelog/3589.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed OpenAI LLM stream not being closed on cancellation/exception, which could leak sockets.

1
changelog/3593.added.md Normal file
View File

@@ -0,0 +1 @@
- Added support for Inworld TTS Websocket Auto Mode for improved latency.

View File

@@ -0,0 +1 @@
- Updated timestamps to be cumulative within an agent turn, using the `flushCompleted` message as the indication that timestamps from the server have been reset to 0.
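A minimal sketch of the cumulative-timestamp idea, assuming the server restarts its word timestamps at 0 after each flush. `CumulativeTimestamps` and its method names are hypothetical and are not taken from the Inworld service; for simplicity the offset is the latest word timestamp seen, not the true end of the audio.

```python
from typing import List, Tuple

WordTime = Tuple[str, float]


class CumulativeTimestamps:
    """Sketch: keep word times monotonic across server-side resets in one turn."""

    def __init__(self) -> None:
        self._offset = 0.0  # added to every incoming timestamp
        self._last_seen = 0.0  # latest cumulative timestamp seen so far

    def add(self, words: List[WordTime]) -> List[WordTime]:
        shifted = [(w, t + self._offset) for w, t in words]
        if shifted:
            self._last_seen = max(self._last_seen, shifted[-1][1])
        return shifted

    def on_flush_completed(self) -> None:
        # The server restarts its clock at 0 after a flush, so carry the offset.
        self._offset = self._last_seen

    def reset_turn(self) -> None:
        # A new agent turn starts from zero again.
        self._offset = 0.0
        self._last_seen = 0.0
```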

1
changelog/3610.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `PipelineTask` adding duplicate `RTVIProcessor` and `RTVIObserver` when they were already provided in the pipeline or observers list. They are now detected and skipped, with appropriate warnings and errors logged for mismatched configurations.

View File

@@ -0,0 +1 @@
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the underlying TTS engine.

1
changelog/3616.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed function call timeout task not being cancelled when the handler completes without calling `result_callback` or is cancelled externally, which caused `RuntimeWarning: coroutine was never awaited`.
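The general pattern behind this kind of fix is to cancel the timeout task in a `finally` block so that it is cleaned up however the handler exits. The following is a generic asyncio sketch, not Pipecat's implementation; `run_with_timeout_watchdog` is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_timeout_watchdog(
    handler: Callable[[], Awaitable[None]], timeout: float
) -> asyncio.Task:
    """Run handler() alongside a watchdog task and always clean the watchdog up."""

    async def watchdog() -> None:
        await asyncio.sleep(timeout)
        print("function call timed out")

    timeout_task = asyncio.create_task(watchdog())
    try:
        await handler()
    finally:
        # Always cancel, whether the handler returned, raised, or was cancelled.
        timeout_task.cancel()
        # Wait for the cancellation to be processed so nothing is left pending.
        await asyncio.gather(timeout_task, return_exceptions=True)
    return timeout_task
```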

5
changelog/3617.fixed.md Normal file
View File

@@ -0,0 +1,5 @@
- Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin
languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK
languages, causing text to accumulate until flush instead of being split at
sentence boundaries. Added fallback detection for unambiguous non-Latin
sentence-ending punctuation (e.g., `。`, `？`, `！`).
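A fallback of this kind can be sketched as a scan for sentence-ending marks. The set of marks below (`。`, `？`, `！`) is an assumption made for illustration, and `split_sentences` is a hypothetical helper; the set Pipecat actually checks may be larger or different.

```python
from typing import List

# Assumed set of unambiguous sentence enders; the actual set may differ.
SENTENCE_ENDERS = frozenset("。？！")


def split_sentences(text: str) -> List[str]:
    """Split text after each sentence-ending mark, keeping the mark attached."""
    sentences: List[str] = []
    start = 0
    for i, ch in enumerate(text):
        if ch in SENTENCE_ENDERS:
            sentences.append(text[start : i + 1])
            start = i + 1
    if start < len(text):
        sentences.append(text[start:])  # trailing text without an ender
    return sentences
```

In a streaming TTS setting, the trailing fragment would typically stay buffered until more text arrives or the turn is flushed.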

1
changelog/3623.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `PipelineTask` to also call `set_bot_ready()` when an external `RTVIProcessor` is provided.

1
changelog/3628.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `VADController` not broadcasting `SpeechControlParamsFrame` on startup, which prevented STT services from receiving VAD params needed for TTFB measurement.

1
changelog/3629.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `StopAsyncIteration` exceptions in `parse_telephony_websocket()` when WebSocket connections close before sending expected messages.

1
changelog/3630.added.md Normal file
View File

@@ -0,0 +1 @@
- Added RTVI function call lifecycle events (`llm-function-call-started`, `llm-function-call-in-progress`, `llm-function-call-stopped`) with configurable security levels via `RTVIObserverParams.function_call_report_level`. Supports per-function control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or `FULL`).

View File

@@ -0,0 +1 @@
- Deprecated `RTVILLMFunctionCallMessage`, `RTVILLMFunctionCallMessageData`, and `RTVIProcessor.handle_function_call()`. Use the new `llm-function-call-in-progress` event sent automatically by `RTVIObserver` instead.

1
changelog/3635.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed WebSocket transport error when broadcasting `InputTransportMessageFrame` by correctly instantiating the frame with its message parameter.

1
changelog/3649.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed orphan OpenTelemetry spans during flow initialization and transitions in tracing.

View File

@@ -0,0 +1 @@
- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.

1
changelog/3656.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection.

10
changelog/3659.changed.md Normal file
View File

@@ -0,0 +1,10 @@
- ⚠️ The `VADParams` `stop_secs` default is changing from `0.8` seconds
to `0.2` seconds. This change both simplifies the developer experience and
improves the performance of STT services. With a shorter `stop_secs` value,
STT services using a local VAD can finalize sooner, resulting in faster
transcription.
- `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for
additional user speech using `user_speech_timeout` (default: 0.6 sec).
- `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically adjusts
the user wait time based on the audio input.

View File

@@ -0,0 +1 @@
- Moved interruption wait event from per-processor instance state to `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume an `InterruptionFrame` before it reaches the pipeline sink must call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()`. A warning is logged if completion does not happen within 2 seconds.
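The idea of carrying the wait event on the frame itself can be sketched with a plain `asyncio.Event`. `SketchInterruptionFrame` and `blocking_processor` are hypothetical names, and the choice to keep waiting after the warning is an assumption of this sketch, not a statement about Pipecat's behavior.

```python
import asyncio
import logging

logger = logging.getLogger("interruption-sketch")


class SketchInterruptionFrame:
    """Hypothetical frame carrying its own completion event (not Pipecat's class)."""

    def __init__(self) -> None:
        self._done = asyncio.Event()

    def complete(self) -> None:
        self._done.set()

    async def wait(self, warn_after: float = 2.0) -> None:
        try:
            await asyncio.wait_for(self._done.wait(), timeout=warn_after)
        except asyncio.TimeoutError:
            logger.warning("interruption not completed after %.1fs", warn_after)
            await self._done.wait()  # assumption: the warning is only diagnostic


async def blocking_processor(frame: SketchInterruptionFrame) -> None:
    # A processor that consumes the frame instead of forwarding it must
    # signal completion itself, otherwise the waiter stalls.
    await asyncio.sleep(0.01)
    frame.complete()


async def demo() -> bool:
    frame = SketchInterruptionFrame()
    task = asyncio.create_task(blocking_processor(frame))
    await frame.wait()
    await task
    return frame._done.is_set()
```

Because the event lives on the frame, any processor that holds the frame can release the waiter without knowing which processor started the interruption.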

1
changelog/3663.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `SambaNovaLLMService` and `GoogleLLMOpenAIBetaService` streams not being closed on cancellation/exception, which could leak sockets.

View File

@@ -0,0 +1 @@
- Updated the default model to `scribe_v2` for `ElevenLabsSTTService`.

View File

@@ -0,0 +1 @@
- Changed the `DeepgramSTTService` default setting for `smart_format` to `False`, as agents don't need smart formatting. Disabling this setting provides a small performance improvement, as well.

1
changelog/3667.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed an issue in `InworldTTSService` where punctuation was pronounced. Now, the `InworldTTSService` ensures proper spacing between sentences, resolving pronunciation issues.

1
changelog/3668.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `ParallelPipeline` allowing frames pushed by internal processors to escape during lifecycle frame (`StartFrame`/`EndFrame`/`CancelFrame`) synchronization. These frames are now buffered and flushed after all branches complete.

1
changelog/3678.added.md Normal file
View File

@@ -0,0 +1 @@
- Added pyright basic type checking configuration for the core framework.

View File

@@ -91,6 +91,25 @@ autodoc_mock_imports = [
# MLX dependencies (Apple Silicon specific)
"mlx",
"mlx_whisper", # Note: might need underscore format too
# Pydantic v2 compatibility issues in third-party SDKs
"hume",
"hume.tts",
"hume.tts.types",
"cartesia",
"camb",
"sarvamai",
"openpipe",
"openai.types.beta.realtime",
"langchain_core",
"langchain_core.messages",
# FastAPI - Pydantic v2 compatibility issues during Sphinx autodoc
"fastapi",
"fastapi.applications",
"fastapi.routing",
"fastapi.params",
"fastapi.middleware",
"fastapi.responses",
"uvicorn",
]
# HTML output settings

View File

@@ -31,6 +31,9 @@ AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Camb.ai
CAMB_API_KEY=...
# Cartesia
CARTESIA_API_KEY=...
CARTESIA_VOICE_ID=...
@@ -40,7 +43,7 @@ CEREBRAS_API_KEY=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
DAILY_ROOM_URL=https://...
# Deepgram
DEEPGRAM_API_KEY=...
@@ -82,6 +85,9 @@ GROK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Hathora
HATHORA_API_KEY=...
# Heygen
HEYGEN_API_KEY=...
HEYGEN_LIVE_AVATAR_API_KEY=...
@@ -97,7 +103,8 @@ INWORLD_API_KEY=...
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_MODEL_PATH=...
KRISP_VIVA_FILTER_MODEL_PATH=...
KRISP_VIVA_TURN_MODEL_PATH=...
# LiveKit
LIVEKIT_API_KEY=...
@@ -149,6 +156,10 @@ PLIVO_AUTH_TOKEN=...
# Qwen
QWEN_API_KEY=...
# Resemble AI
RESEMBLE_API_KEY=
RESEMBLE_VOICE_UUID=
# Rime
RIME_API_KEY=...
RIME_VOICE_ID=...

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -16,7 +16,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.piper.tts import PiperTTSService
from pipecat.services.piper.tts import PiperHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -24,9 +24,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
@@ -39,7 +38,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = PiperTTSService(
tts = PiperHttpTTSService(
base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -25,9 +25,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,9 +23,8 @@ from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -22,9 +22,8 @@ from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -19,7 +19,6 @@ from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -64,7 +63,6 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
)
@@ -85,12 +83,13 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -98,11 +97,11 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -49,7 +48,6 @@ async def main():
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
)
@@ -68,7 +66,7 @@ async def main():
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -76,17 +74,18 @@ async def main():
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -14,7 +14,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
InterruptionFrame,
TranscriptionFrame,
@@ -54,7 +53,6 @@ async def main():
params=LiveKitParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
)
@@ -78,12 +76,13 @@ async def main():
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -91,11 +90,11 @@ async def main():
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -65,9 +65,8 @@ class MonthPrepender(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_out_enabled=True,

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,7 +11,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import Frame, LLMRunFrame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
@@ -62,24 +61,20 @@ class MetricsLogger(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
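
The comment in this hunk says lambdas defer parameter creation until the transport type is chosen. A minimal sketch of that pattern, assuming a hypothetical `Params` dataclass and `select_params` helper (neither is part of Pipecat); the `created` list exists only to show which factories actually ran:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Records every Params instance that gets constructed (illustration only).
created: List[str] = []


@dataclass
class Params:
    """Hypothetical stand-in for a transport params class."""

    transport: str
    audio_in_enabled: bool = True
    audio_out_enabled: bool = True

    def __post_init__(self) -> None:
        created.append(self.transport)


# Each value is a zero-argument factory, so nothing is built at import time.
transport_params: Dict[str, Callable[[], Params]] = {
    "daily": lambda: Params("daily"),
    "twilio": lambda: Params("twilio"),
    "webrtc": lambda: Params("webrtc"),
}


def select_params(transport_type: str) -> Params:
    """Look up the factory for the chosen transport and call it."""
    return transport_params[transport_type]()


selected = select_params("webrtc")
```

Only the factory for the selected key is invoked, so any expensive objects nested in the other entries are never instantiated.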
@@ -106,12 +101,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -119,12 +115,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(),
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
ml,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -12,7 +12,6 @@ from PIL import Image
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
@@ -77,9 +76,8 @@ class ImageSyncAggregator(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -87,7 +85,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
@@ -95,7 +92,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -120,12 +116,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -138,12 +135,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(),
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
image_sync_aggregator,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,7 +11,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -35,24 +34,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -77,12 +72,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -90,11 +86,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,7 +11,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -34,24 +33,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -76,12 +71,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -89,11 +85,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -33,9 +33,8 @@ from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -131,7 +130,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
@@ -140,11 +139,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -12,7 +12,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -37,24 +36,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -117,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -125,6 +120,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -132,11 +128,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -17,7 +17,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMMessagesUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -50,24 +49,20 @@ def get_session_history(session_id: str) -> BaseChatMessageHistory:
return message_store[session_id]
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -103,12 +98,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
lc = LangchainProcessor(history_chain)
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -116,11 +112,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
lc, # Langchain
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -32,9 +32,8 @@ from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -71,7 +70,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
@@ -80,11 +79,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -13,7 +13,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -37,24 +36,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -81,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -89,6 +84,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -96,11 +92,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -12,7 +12,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -36,24 +35,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -86,12 +81,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -99,11 +95,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -33,9 +33,8 @@ from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -72,7 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
@@ -81,11 +80,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -12,7 +12,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -36,24 +35,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -75,12 +70,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -88,11 +84,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -13,7 +13,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -37,24 +36,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -85,7 +80,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
@@ -93,6 +88,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -100,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -12,7 +12,6 @@ from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -36,24 +35,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -78,12 +73,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -91,11 +87,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            context_aggregator.user(),  # User responses
+            user_aggregator,  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            assistant_aggregator,  # Assistant spoken responses
         ]
     )


@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024–2025, Daily
+# Copyright (c) 2024-2026, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -12,7 +12,6 @@ from loguru import logger
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -35,24 +34,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
 load_dotenv(override=True)
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
 transport_params = {
     "daily": lambda: DailyParams(
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
     ),
     "twilio": lambda: FastAPIWebsocketParams(
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
     ),
     "webrtc": lambda: TransportParams(
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
     ),
 }
@@ -78,12 +73,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
     context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
                 stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
             ),
+            vad_analyzer=SileroVADAnalyzer(),
         ),
     )
@@ -91,11 +87,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            context_aggregator.user(),  # User responses
+            user_aggregator,  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            assistant_aggregator,  # Assistant spoken responses
         ]
     )


@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024–2025, Daily
+# Copyright (c) 2024-2026, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -12,7 +12,6 @@ from loguru import logger
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -36,24 +35,20 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
 load_dotenv(override=True)
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
 transport_params = {
     "daily": lambda: DailyParams(
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
     ),
     "twilio": lambda: FastAPIWebsocketParams(
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
     ),
     "webrtc": lambda: TransportParams(
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
     ),
 }
@@ -80,12 +75,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
     context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
                 stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
             ),
+            vad_analyzer=SileroVADAnalyzer(),
         ),
     )
@@ -93,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            context_aggregator.user(),  # User responses
+            user_aggregator,  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            assistant_aggregator,  # Assistant spoken responses
         ]
     )
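
Note on the `transport_params` comment in the hunks above: it describes deferring parameter construction with lambdas until the transport type is known. The sketch below illustrates that pattern in isolation using only the standard library. `FakeParams`, `make_params`, `params_factories`, and `select_params` are hypothetical names invented for this illustration and are not part of Pipecat; the real examples build `DailyParams`, `FastAPIWebsocketParams`, and `TransportParams` instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-in for a transport params object; not a Pipecat class.
@dataclass
class FakeParams:
    name: str
    audio_in_enabled: bool = True
    audio_out_enabled: bool = True


# Records which params objects were actually constructed.
created: List[str] = []


def make_params(name: str) -> FakeParams:
    created.append(name)
    return FakeParams(name=name)


# Each value is a zero-argument callable, so building this dict
# constructs no params objects at all.
params_factories: Dict[str, Callable[[], FakeParams]] = {
    "daily": lambda: make_params("daily"),
    "twilio": lambda: make_params("twilio"),
    "webrtc": lambda: make_params("webrtc"),
}


def select_params(transport_type: str) -> FakeParams:
    """Call only the factory for the transport chosen at runtime."""
    return params_factories[transport_type]()


selected = select_params("webrtc")
```

After `select_params("webrtc")` runs, only the `"webrtc"` factory has been called; the other two entries stay as uncalled lambdas, which is the point of storing functions rather than instances.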
