Compare commits

1171 Commits

Author SHA1 Message Date
Mark Backman
5a6cc4d35c Replace assert-based type narrowing with local variables and guards
Use local variable narrowing and if-guards instead of assert statements
for type safety, since asserts are stripped with python -O.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 16:46:45 -05:00
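Note on 5a6cc4d35c: the narrowing pattern the message describes can be sketched roughly as below. This is an illustration only, not code from the commit; `Connection`, `send_with_assert`, and `send_with_guard` are hypothetical names, and a plain list stands in for a real socket.

```python
from typing import List, Optional


class Connection:
    """Hypothetical class, used only to contrast the two narrowing styles."""

    def __init__(self) -> None:
        # A list stands in for a real socket object (assumption for the sketch).
        self._socket: Optional[List[str]] = None

    def send_with_assert(self, data: str) -> bool:
        # Narrowing via assert: under `python -O` the assert is stripped,
        # so a None socket would only fail on the next line.
        assert self._socket is not None
        self._socket.append(data)
        return True

    def send_with_guard(self, data: str) -> bool:
        # Narrowing via a local variable plus an if-guard: the check runs
        # with or without -O, and a type checker narrows `socket` to a list.
        socket = self._socket
        if socket is None:
            return False
        socket.append(data)
        return True
```

The guard version returns early instead of raising, which is one possible policy; logging or raising an explicit exception would work the same way.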
Mark Backman
28be775740 Reduce type: ignore comments by fixing avoidable type mismatches
Replace ~20 type: ignore comments with proper type fixes:
- Widen set_tools() to accept List[dict] | ToolsSchema | NotGiven
- Widen create_task() to accept Coroutine | Awaitable
- Fix _turn_params to use BaseTurnParams instead of SmartTurnParams
- Make _thought_llm Optional[str] with assertion guard
- Add mixer assertion, websocket narrowing, ice_servers cast
- Use dict.get() in protobuf serializer
- Make remote_participants Optional in Daily transport

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 15:30:35 -05:00
Mark Backman
bc730e4069 Enable pyright basic type checking for core framework
Add pyright configuration (basic mode, Python 3.10) to pyproject.toml
and fix all 276 type errors in the core framework (everything except
services/ and adapters/). This establishes a CI-ready type checking
baseline as Pipecat approaches 1.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 15:30:35 -05:00
Mark Backman
104d06551a Merge pull request #3679 from pipecat-ai/mb/remove-to-be-updated
Remove SequentialMergePipeline
2026-02-08 15:28:38 -05:00
Mark Backman
90ad2a4e81 Remove SequentialMergePipeline 2026-02-08 14:44:48 -05:00
Mark Backman
570f2d7fc0 Merge pull request #3667 from ianbbqzy/ian/fix-auto-mode-space
[inworld] aggregate_sentence mode needs trailing space
2026-02-07 18:22:32 -05:00
Ian Lee
f3d99adf8f [inworld] aggregate_sentence mode needs trailing space 2026-02-07 15:18:24 -08:00
Mark Backman
d34f416281 Merge pull request #3598 from dhruvladia-sarvam/sarvam-v3-update
ASR and TTS v3 update
2026-02-07 10:51:35 -05:00
Mark Backman
5a1deb7cb4 Merge pull request #3659 from pipecat-ai/mb/change-vad-defaults
Set VADParams stop_secs to 0.2 by default
2026-02-06 23:51:50 -05:00
Mark Backman
a5fc2b1650 Set VADParams stop_secs to 0.2 by default 2026-02-06 23:49:08 -05:00
Aleix Conchillo Flaqué
5cb8d91431 added changelog file for #3616 2026-02-06 16:45:23 -08:00
Aleix Conchillo Flaqué
ce690848c0 Merge pull request #3616 from omChauhanDev/fix/function-call-timeout-task-cleanup
fix: ensure function call timeout task is always cancelled
2026-02-06 16:40:56 -08:00
Aleix Conchillo Flaqué
30f51edfcd Merge pull request #3668 from pipecat-ai/aleix/parallel-pipeline-buffering
Buffer internal frames during ParallelPipeline lifecycle sync
2026-02-06 15:25:32 -08:00
Aleix Conchillo Flaqué
cd03d449cb Update changelog skill with skip rules and allowed types 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
57df03aade Update CLAUDE.md with PR workflow instructions 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
4945cfbd8f Buffer internal frames during ParallelPipeline lifecycle synchronization
Processors inside parallel sub-pipelines can push frames during
StartFrame/EndFrame/CancelFrame processing. Previously these frames
could escape the ParallelPipeline before all branches finished
processing the lifecycle frame. Now they are buffered and flushed
after synchronization completes.
2026-02-06 15:15:46 -08:00
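Note on 4945cfbd8f: a rough sketch of the buffer-then-flush idea, assuming a single gate that is closed while branches synchronize. `LifecycleSyncBuffer` and its methods are hypothetical and are not the ParallelPipeline implementation; the ordering relative to the lifecycle frame itself is left out because the message does not state it.

```python
from typing import Any, List


class LifecycleSyncBuffer:
    """Hypothetical gate: holds frames while a lifecycle frame is being synchronized."""

    def __init__(self) -> None:
        self._syncing = False
        self._buffer: List[Any] = []
        self.output: List[Any] = []  # frames that have left the gate

    def begin_sync(self) -> None:
        # Called when a lifecycle frame starts being processed by the branches.
        self._syncing = True

    def push(self, frame: Any) -> None:
        # Frames pushed during synchronization are held back, not emitted.
        if self._syncing:
            self._buffer.append(frame)
        else:
            self.output.append(frame)

    def end_sync(self) -> None:
        # All branches are done: release buffered frames in arrival order.
        self._syncing = False
        self.output.extend(self._buffer)
        self._buffer.clear()
```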
Mark Backman
8d37d3bae7 Merge pull request #3666 from pipecat-ai/mb/deepgram-stt-smart-format
DeepgramSTTService: disable smart_format by default
2026-02-06 14:04:37 -05:00
Mark Backman
d7b1624d3c Merge pull request #3663 from lukepayyapilli/fix/stream-close-sambanova-google
fix: close stream on cancellation for SambaNova and Google OpenAI services
2026-02-06 14:02:31 -05:00
Mark Backman
7f65204c3b DeepgramSTTService: disable smart_format by default 2026-02-06 13:45:10 -05:00
Aleix Conchillo Flaqué
97eff414c3 Merge pull request #3660 from pipecat-ai/aleix/interruption-frame-completion-event
Attach asyncio.Event to InterruptionFrame for completion signaling
2026-02-06 10:14:26 -08:00
Aleix Conchillo Flaqué
5b67e76de7 Add changelog for PR #3660 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
b9e79bd06a CLAUDE.md: explain about InterruptionFrame.complete() 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
d5105a78e6 STTMuteFilter should call frame.complete() when InterruptionFrame is blocked 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
a352b2d7a0 Add tests for InterruptionFrame completion event
Add tests for the event-based interruption completion: complete() sets
the event, complete() is safe without an event, the event fires at
the pipeline sink, and a warning is logged when the frame is blocked.

Also remove the unconditional await after the timeout so the function
returns instead of hanging when complete() is never called.
2026-02-06 09:57:24 -08:00
Aleix Conchillo Flaqué
2345090b10 Attach asyncio.Event to InterruptionFrame for completion signaling
Move the interruption wait event from per-processor instance state to
the frame itself. The event is created in
push_interruption_task_frame_and_wait(), threaded through
InterruptionTaskFrame → InterruptionFrame, and set when the frame
reaches the pipeline sink. This scopes the event to each interruption
flow rather than sharing mutable state on the processor.

Also adds a 2s timeout warning to help diagnose cases where
InterruptionFrame.complete() is never called.
2026-02-06 09:57:24 -08:00
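Note on 2345090b10: the per-frame completion event can be sketched as below. This is a simplified illustration under assumptions, not the Pipecat code; `InterruptionFrameSketch`, `push_and_wait`, and the `deliver` callback are hypothetical, and only the 2s timeout and the `complete()` name come from the commit messages.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class InterruptionFrameSketch:
    """Hypothetical stand-in for a frame that carries its own completion event."""

    event: Optional[asyncio.Event] = None

    def complete(self) -> None:
        # Safe to call when no event is attached.
        if self.event is not None:
            self.event.set()


async def push_and_wait(
    deliver: Callable[[InterruptionFrameSketch], Awaitable[None]],
    timeout: float = 2.0,
) -> bool:
    """Create one event per interruption flow, deliver the frame, wait for completion.

    Returns True if complete() was called in time, False after the timeout.
    """
    event = asyncio.Event()  # scoped to this flow, not shared processor state
    frame = InterruptionFrameSketch(event=event)
    await deliver(frame)
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Warn and return instead of waiting forever.
        print("warning: complete() was not called within the timeout")
        return False
```

Because the event lives on the frame, two overlapping interruption flows cannot set or clear each other's state.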
Mark Backman
af562bf9a8 Merge pull request #3664 from pipecat-ai/mb/elevenlabs-scribe-v2
Update ElevenLabsSTTService to scribe_v2
2026-02-06 12:31:44 -05:00
Mark Backman
d4993f0dcf Update ElevenLabsSTTService to scribe_v2 2026-02-06 11:37:23 -05:00
Luke Payyapilli
1790a84bfd add changelog 2026-02-06 10:05:02 -05:00
Luke Payyapilli
29c53b99a4 fix: close stream on cancellation for SambaNova and Google OpenAI services 2026-02-06 10:02:40 -05:00
Mark Backman
aa5a855eab Merge pull request #3656 from pipecat-ai/mb/openai-realtime-stt
Add OpenAIRealtimeSTTService
2026-02-06 09:15:58 -05:00
Mark Backman
e66d6f8ffe Merge pull request #3658 from pipecat-ai/mb/bump-protobuf-5.29.6
Upgrade protobuf to >=5.29.6
2026-02-05 19:09:30 -05:00
Mark Backman
b8ac2ba713 Merge pull request #3593 from ianbbqzy/ian/inworld-auto-mode
Add auto_mode support for inworld plugin
2026-02-05 18:16:38 -05:00
Ian Lee
6eea40858e fix lint and changelog 2026-02-05 15:10:36 -08:00
Mark Backman
90700d10aa Upgrade protobuf to >=5.29.6 2026-02-05 18:08:52 -05:00
Mark Backman
fa85f7bbc7 Merge pull request #3640 from lukepayyapilli/fix/openai-stream-close
fix: close stream on cancellation to prevent socket leaks
2026-02-05 18:00:06 -05:00
Mark Backman
669f013970 Merge pull request #3657 from pipecat-ai/filipi/changing_no_audio_log_to_debug
Changing the ‘no audio received’ log from warning to debug.
2026-02-05 17:35:24 -05:00
filipi87
76f63e54e2 Changing the ‘no audio received’ log from warning to debug. 2026-02-05 18:07:14 -03:00
Filipi da Silva Fuchter
cce5a13444 Merge pull request #3650 from pipecat-ai/filipi/twilio_issues
Ignoring RTVI messages inside the Serializers by default.
2026-02-05 15:52:59 -05:00
Mark Backman
d11e1cd631 Update 13k to use ElevenLabsRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
8b9da632d1 Add OpenAIRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
b36f7892a4 Merge pull request #3654 from pipecat-ai/aleix/more-claude-update
CLAUDE.md: add RTVI and serializers
2026-02-05 15:23:35 -05:00
Mark Backman
9b43cde128 Merge pull request #3355 from itsderek23/user-bot-latency
Add `user_bot_latency_seconds` to OpenTelemetry turn spans
2026-02-05 15:23:15 -05:00
filipi87
6af4d872a8 Refactoring the serializers to ignore the RTVI messages by default. 2026-02-05 16:52:53 -03:00
Ian Lee
22398e1410 add changelog back 2026-02-05 11:39:39 -08:00
Ian Lee
d10467e043 update timestamps reset handling 2026-02-05 11:39:39 -08:00
Ian Lee
cbe131636d add changelog 2026-02-05 11:39:39 -08:00
Ian Lee
fef9e3ea32 Add auto_mode support for inworld plugin 2026-02-05 11:39:39 -08:00
Mark Backman
56d8ef2bf4 Deprecate UserBotLatencyLogObserver, update 29 example 2026-02-05 14:29:45 -05:00
Derek Haynes
8791559351 Add changelog entry for PR #3355 2026-02-05 14:29:45 -05:00
Derek Haynes
f6c919354f Add test for user bot latency 2026-02-05 14:29:45 -05:00
Derek Haynes
93138466d6 Feat: Add user-bot latency to OTel turn spans
This adds user-to-bot response latency tracking to OpenTelemetry spans:

- Created UserBotLatencyObserver as a reusable component for tracking
user-to-bot response latency
- Records the value as an attribute on turn spans (turn.user_bot_latency_seconds)
- Updated TurnTraceObserver to use UserBotLatencyObserver, following the same pattern as TurnTrackingObserver
- Updated PipelineTask to automatically create and wire UserBotLatencyObserver
when tracing is enabled (same as TurnTrackingObserver)
2026-02-05 14:29:42 -05:00
Mark Backman
5a5a98b497 Merge pull request #3649 from itsderek23/fix/tracing-orphan-spans
Fix orphan otel spans during flow initialization and transitions
2026-02-05 14:23:52 -05:00
Aleix Conchillo Flaqué
2b4f507d37 CLAUDE.md: add RTVI and serializers 2026-02-05 11:06:00 -08:00
Mark Backman
d6f3a90662 Merge pull request #3652 from pipecat-ai/mb/upgrade-small-webrtc-prebuilt-2.1.0
Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0
2026-02-05 13:48:54 -05:00
Derek Haynes
8fb0e37965 Update changelog for #3649 2026-02-05 11:35:22 -07:00
Derek Haynes
0d45b48f7b Fix import placement 2026-02-05 11:26:58 -07:00
Mark Backman
6af4520b1f Merge pull request #3635 from pipecat-ai/filipi/fix_websocket
Fixed an error in the WebSocket transport that occurred when an InputTransportMessageFrame was received and broadcast.
2026-02-05 12:22:59 -05:00
filipi87
ba469e5645 Add changelog entry
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 12:19:51 -05:00
Mark Backman
bd12b60b5c Merge pull request #3614 from okue/fix/websocket-broadcast-frame-misuse
fix: pass frame class instead of instance to broadcast_frame in websocket transports
2026-02-05 12:19:03 -05:00
Mark Backman
54db37ea47 Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0 2026-02-05 12:09:51 -05:00
filipi87
752e16f553 Ignoring RTVI messages inside TwilioSerializer by default. 2026-02-05 10:51:03 -03:00
Derek Haynes
7c7408a048 Fix orphan spans in tracing during flow initialization and transitions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 06:06:13 -07:00
Mark Backman
8f42343927 Merge pull request #3630 from pipecat-ai/mb/add-function-call-messages-rtvi
Add native RTVI function call lifecycle messages
2026-02-04 16:20:42 -05:00
Mark Backman
46da6cd91b Update changelogs 2026-02-04 11:19:30 -05:00
Mark Backman
ecb02d9049 Bump RTVI_PROTOCOL_VERSION to 1.2.0 2026-02-04 11:17:38 -05:00
Mark Backman
cc68e00125 Deprecate llm-function-call message 2026-02-04 11:17:23 -05:00
Mark Backman
e0e3b5250b Add RTVIObserverParams to control what information is included in function call events 2026-02-04 11:05:05 -05:00
Luke Payyapilli
55a3b10e70 fix(openai): close stream on cancellation to prevent socket leaks 2026-02-04 09:59:10 -05:00
dhruvladia-sarvam
e6b06414b3 change default speaker for bulbul:v3-beta to shubh 2026-02-04 16:46:35 +05:30
Aleix Conchillo Flaqué
6bcfb40d12 Merge pull request #3636 from pipecat-ai/aleix/initial-claude-md
initial CLAUDE.md
2026-02-03 19:31:16 -08:00
Aleix Conchillo Flaqué
65b1a8ce36 initial CLAUDE.md 2026-02-03 18:04:54 -08:00
Mark Backman
2db3d94d06 Merge pull request #3628 from pipecat-ai/mb/broadcast-speech-control-params-frame
Fix: Broadcast SpeechControlParamsFrame from VADController
2026-02-03 18:44:15 -05:00
Mark Backman
2a26b9f7a3 Fix: Broadcast SpeechControlParamsFrame from VADController 2026-02-03 18:40:39 -05:00
Aleix Conchillo Flaqué
4f77c532fb Merge pull request #3623 from pipecat-ai/aleix/pipeline-task-rtvi-always-set-bot-ready
PipelineTask: also call set_bot_ready() for external RTVI processors
2026-02-03 14:21:03 -08:00
Aleix Conchillo Flaqué
c3a4da4a29 PipelineTask: also call set_bot_ready() for external RTVI processors 2026-02-03 14:16:08 -08:00
Mark Backman
84ca0b6d58 Merge pull request #3629 from pipecat-ai/fix/telephony-websocket-stopasynciteration
Fix StopAsyncIteration in parse_telephony_websocket
2026-02-03 12:10:07 -05:00
Mark Backman
c1857d255d Avoid nesting try/excepts 2026-02-03 12:00:04 -05:00
Mark Backman
d50ec33079 Merge pull request #3542 from lukepayyapilli/fix/terminal-frames-uninterruptible
fix: make EndFrame and StopFrame uninterruptible to prevent pipeline freeze
2026-02-03 10:08:17 -05:00
Mark Backman
40c84faff5 Remove handle_function_call_start 2026-02-03 10:00:59 -05:00
Mark Backman
84cd9346f9 Add native RTVI function call lifecycle messages 2026-02-03 10:00:59 -05:00
Luke Payyapilli
5d5b19e1d2 Add changelog entry 2026-02-03 09:12:59 -05:00
Luke Payyapilli
8d3e10f054 Make EndFrame and StopFrame uninterruptible to prevent pipeline freeze 2026-02-03 09:12:59 -05:00
dhruvladia-sarvam
1665ce181a refactor(sarvam): centralize model configuration with dataclasses 2026-02-03 14:33:41 +05:30
James Hush
803a20cc00 Fix formatting: remove extra blank line
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:46:44 +08:00
James Hush
90bead06ab Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-03 16:42:13 +08:00
James Hush
b427d534ae Add tests for parse_telephony_websocket StopAsyncIteration handling
Tests cover:
- No messages received (raises ValueError)
- One message received (logs warning, continues)
- Two messages received (normal operation)
- All telephony providers (Twilio, Telnyx, Plivo, Exotel)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:33:36 +08:00
James Hush
b030f1178d Add changelog and improve docstring for parse_telephony_websocket
- Added changelog entry for bug fix
- Enhanced docstring with Args and Raises sections

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:26:09 +08:00
James Hush
a627597bca Fix StopAsyncIteration in parse_telephony_websocket
Handle WebSocket disconnections gracefully when telephony providers send
fewer messages than expected. Adds explicit StopAsyncIteration handling
for both first and second message retrieval.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:25:07 +08:00
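Note on a627597bca: the handling described here, together with the cases listed in the tests commit (no message raises ValueError, one message logs a warning and continues), might look roughly like the sketch below. `read_first_two` is a hypothetical helper over a generic async iterator, not the real parse_telephony_websocket.

```python
import asyncio
from typing import AsyncIterator, Optional, Tuple


async def read_first_two(messages: AsyncIterator[str]) -> Tuple[str, Optional[str]]:
    """Read up to two initial messages, tolerating an early disconnect."""
    try:
        first = await messages.__anext__()
    except StopAsyncIteration:
        # The stream ended before anything arrived: nothing to parse.
        raise ValueError("connection closed before any message was received") from None

    try:
        second: Optional[str] = await messages.__anext__()
    except StopAsyncIteration:
        # Only one message arrived: warn and continue with what we have.
        print("warning: only one message received")
        second = None

    return first, second
```

The two try blocks are sequential rather than nested, which keeps each failure case readable on its own.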
Aleix Conchillo Flaqué
4c10ddb7bb upgrade uv.lock 2026-02-02 16:25:06 -08:00
Mark Backman
a4e499dc80 Merge pull request #3617 from pipecat-ai/fix/cjk-sentence-splitting
Fix sentence splitting for CJK and other non-Latin languages
2026-02-02 18:16:51 -05:00
Mark Backman
ca49acfaa6 Merge pull request #3619 from pipecat-ai/mb/resemble-readme
Resemble cleanup
2026-02-02 09:20:11 -05:00
Mark Backman
86147f15f3 Renumber the Resemble foundational example 2026-02-02 09:07:05 -05:00
Mark Backman
5cda72d138 Add Resemble TTS to README 2026-02-02 09:05:03 -05:00
Mark Backman
54e62a8177 Merge pull request #3134 from pipecat-ai/mb/resemble-tts-draft
Add ResembleAITTSService
2026-02-02 08:59:27 -05:00
Mark Backman
a592b7fdf0 Update per PR 1789, align with ErrorFrame norms 2026-02-02 08:55:29 -05:00
Mark Backman
ba2b7c05d6 Add ResembleAITTSService 2026-02-02 08:55:27 -05:00
James Hush
774041e9a1 Add changelog for PR #3617
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:47:22 +08:00
James Hush
763002f2bc Fix sentence splitting for CJK and other non-Latin languages in TTS pipeline
NLTK's sent_tokenize() only supports ~15 European languages and defaults to
English. For Japanese, Chinese, Korean, Hindi, Arabic, and other non-Latin
languages, NLTK fails to recognize sentence boundaries like 。？！ causing
text to accumulate until flush instead of being emitted sentence-by-sentence.

Add a fallback in match_endofsentence() that scans for unambiguous non-Latin
sentence-ending punctuation when NLTK fails to split the text. Latin
punctuation (. ! ? ; …) is excluded from the fallback since NLTK handles
those correctly and they can be ambiguous (abbreviations, decimals, etc.).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:27:49 +08:00
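Note on 763002f2bc: a minimal sketch of the fallback scan, meant to run only when the primary tokenizer finds no split. The function name and the punctuation set are assumptions for illustration; the set is not exhaustive and is not the list used in match_endofsentence().

```python
from typing import Optional

# Assumed, non-exhaustive set of unambiguous non-Latin sentence terminators:
# ideographic full stop, fullwidth ? and !, halfwidth ideographic full stop,
# Devanagari danda, Arabic question mark, Arabic full stop.
NON_LATIN_ENDINGS = frozenset("\u3002\uff1f\uff01\uff61\u0964\u061f\u06d4")


def fallback_end_of_sentence(text: str) -> Optional[int]:
    """Return the index just past the first non-Latin terminator, or None.

    Latin punctuation (. ! ? ; ...) is deliberately ignored here because it
    can be ambiguous (abbreviations, decimals) and is left to the tokenizer.
    """
    for index, char in enumerate(text):
        if char in NON_LATIN_ENDINGS:
            return index + 1
    return None
```

A caller would emit `text[:end]` as a sentence when `end` is not None and keep accumulating otherwise.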
Om Chauhan
50dedf350d fix: ensure function call timeout task is always cancelled 2026-02-02 08:38:54 +05:30
okue
d3ecbb11c1 fix: pass frame class instead of instance to broadcast_frame in websocket transports
broadcast_frame() expects a frame class and kwargs, but the three
websocket input transports (fastapi, client, server) were incorrectly
passing a frame instance. This would cause a TypeError at runtime when
an InputTransportMessageFrame was received.
2026-02-01 20:38:34 +09:00
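Note on d3ecbb11c1: the class-versus-instance mix-up can be reproduced with a small stand-in. `MessageFrame` and this `broadcast_frame` are hypothetical and only mimic the "frame class plus kwargs" calling convention described in the message.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Type


@dataclass
class MessageFrame:
    """Hypothetical frame type for illustration."""

    message: str


def broadcast_frame(frame_cls: Type[Any], **kwargs: Any) -> Tuple[Any, Any]:
    """Hypothetical helper: build one fresh frame per direction from a class and kwargs."""
    upstream = frame_cls(**kwargs)
    downstream = frame_cls(**kwargs)
    return upstream, downstream


# Correct: pass the class and its fields as keyword arguments.
up, down = broadcast_frame(MessageFrame, message="hello")

# Incorrect: passing an instance makes the helper try to call it,
# which raises TypeError because the instance is not callable.
try:
    broadcast_frame(MessageFrame(message="hello"))
except TypeError as exc:
    print(f"TypeError: {exc}")
```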
Aleix Conchillo Flaqué
f453227ba3 Merge pull request #3612 from pipecat-ai/aleix/use-kokoro-onnx
KokoroTTSService: use kokoro-onnx instead of kokoro
2026-01-31 21:03:55 -08:00
Aleix Conchillo Flaqué
52cc64019a Merge pull request #3611 from pipecat-ai/aleix/aicoustics-example-update
examples: update 07zd to use vad_analyzer in LLMUserAggregator
2026-01-31 21:02:50 -08:00
Aleix Conchillo Flaqué
95689cc81c KokoroTTSService: use kokoro-onnx instead of kokoro 2026-01-31 17:20:27 -08:00
Aleix Conchillo Flaqué
675c7c43e3 examples: update 07zd to use vad_analyzer in LLMUserAggregator 2026-01-31 15:31:15 -08:00
Aleix Conchillo Flaqué
bfd19e867c Merge pull request #3610 from pipecat-ai/aleix/dont-add-rtvi-observer-if-already-there
PipelineTask: don't add RTVIObserver if already there
2026-01-31 14:57:52 -08:00
Aleix Conchillo Flaqué
acc9923c0a PipelineTask: don't add RTVIObserver if already there 2026-01-31 14:54:29 -08:00
Mark Backman
bdc9e7e2e4 Merge pull request #3608 from pipecat-ai/mb/quickstart-0.0.101
Update quickstart for 0.0.101
2026-01-31 10:39:17 -05:00
Mark Backman
a587e1b99a Update quickstart for 0.0.101 2026-01-31 09:52:24 -05:00
Aleix Conchillo Flaqué
7853e5ca93 Merge pull request #3606 from pipecat-ai/changelog-0.0.101
Release 0.0.101 - Changelog Update
2026-01-30 22:58:22 -08:00
aconchillo
614b8e1a62 Update changelog for version 0.0.101 2026-01-30 22:54:31 -08:00
Aleix Conchillo Flaqué
ef51c2a5c6 changelog: fix 3582 changed file 2026-01-30 22:48:26 -08:00
Aleix Conchillo Flaqué
f42dc0d38e Merge pull request #3605 from pipecat-ai/aleix/gemini-live-schedule-transcription-timeout-handler
GeminiLiveLLMService: let the transcription timeout handler be scheduled
2026-01-30 22:44:05 -08:00
Aleix Conchillo Flaqué
d87f3543c7 GeminiLiveLLMService: let the transcription timeout handler be scheduled 2026-01-30 22:41:10 -08:00
Aleix Conchillo Flaqué
fee633cb92 scripts(evals): disable kokoro for now 2026-01-30 21:23:42 -08:00
Aleix Conchillo Flaqué
607af91153 Merge pull request #3604 from pipecat-ai/mb/fix-ivr-navigator-aggregation
Fix IVRNavigator to push AggregatedTextFrame when switching to conver…
2026-01-30 21:22:20 -08:00
Mark Backman
e779233918 Fix IVRNavigator to push AggregatedTextFrame when switching to conversation mode 2026-01-30 21:07:49 -05:00
Aleix Conchillo Flaqué
604d5d0b14 examples: update 07zi and 07zj to use vad_analyzer from LLMUserAggregator 2026-01-30 16:14:02 -08:00
Mark Backman
342ae7af41 Merge pull request #3601 from pipecat-ai/mb/add-22-release-evals
Add 22 foundational to release evals
2026-01-30 15:31:54 -05:00
Mark Backman
c92ec1552e Add 22 foundational to release evals 2026-01-30 15:12:52 -05:00
Aleix Conchillo Flaqué
93160f1455 scripts(evals): remove vad_analyzer from transport 2026-01-30 12:08:12 -08:00
Aleix Conchillo Flaqué
e3158e1131 Merge pull request #3600 from pipecat-ai/aleix/llm-server-timeout-task-never-waited
LLMService: make sure function call timeout handler is started
2026-01-30 12:01:18 -08:00
Mark Backman
63a23246d5 Add UserTurnCompletionLLMServiceMixin (#3518)
* Added UserTurnCompletionLLMServiceMixin class

* Added 22-filter-incomplete-turns.py foundational example

* Removed old 22 natural conversation foundational examples

* Added test_user_turn_completion_mixin.py
2026-01-30 14:57:15 -05:00
Aleix Conchillo Flaqué
569ea9849a Merge pull request #3599 from pipecat-ai/aleix/release-evals-disable-rtvi
scripts(evals): disable RTVI
2026-01-30 11:44:46 -08:00
Aleix Conchillo Flaqué
a98ca9b65b LLMService: make sure function call timeout handler is started 2026-01-30 11:38:26 -08:00
Aleix Conchillo Flaqué
c9310789dc scripts(evals): use new vad_analyzer from LLMUserAggregator 2026-01-30 10:57:17 -08:00
Aleix Conchillo Flaqué
b93e12d701 scripts(evals): disable RTVI 2026-01-30 10:52:38 -08:00
Aleix Conchillo Flaqué
3f77da627d Merge pull request #3583 from pipecat-ai/aleix/move-vad-analyzer-to-llm-user-aggregator
VAD analyzer is now passed to LLMUserAggregator
2026-01-30 10:46:10 -08:00
Aleix Conchillo Flaqué
35d265770d LLMUserAggregator: don't process certain self-queued frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
9632efec8c VADProcessor: broadcast frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
27dbfa1eda NvidiaTTSService: return AsyncIterator instead of AsyncIterable 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
183c0aa4ef LLMUserAggregator: queue frames internally so strategies and controllers can process them 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
a69a037ffa changelog: add updates for #3583 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c46e7f5da0 TurnAnalyzerUserTurnStopStrategy: only update vad params if frame contains vad 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
307aeaeda0 examples: update with LLMUserAggregatorParams vad_analyzer and VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
305ab44132 tests: add unittest.main() call 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b486f35c70 audio: add new VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c92080b0d2 LLMUserAggregator: add vad_analyzer and use VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
ddfedaf478 audio(vad): add new VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b1ad4d5ab0 BaseInputTransport: deprecate vad_analyzer 2026-01-30 10:07:33 -08:00
Aleix Conchillo Flaqué
0857aa87be Merge pull request #3595 from pipecat-ai/aleix/add-kokoro-tts-support
services(tts): add new KokoroTTSService
2026-01-30 09:49:05 -08:00
Aleix Conchillo Flaqué
fd3c5f69b7 upgrade uv.lock 2026-01-30 09:41:33 -08:00
Aleix Conchillo Flaqué
72ab329513 services(tts): add new KokoroTTSService 2026-01-30 09:39:01 -08:00
Filipi da Silva Fuchter
7999d08b7e Merge pull request #3052 from Navigate-AI/fork/main
Include pts in video and audio frames in SmallWebRTCClient
2026-01-30 09:03:29 -05:00
dhruvladia-sarvam
57821cf709 fix 2026-01-30 16:07:52 +05:30
dhruvladia-sarvam
18045582a9 ASR and TTS v3 update 2026-01-30 15:53:06 +05:30
Mark Backman
7be2b8cc34 Merge pull request #3587 from pipecat-ai/mb/gradium-improvements
GradiumSTTService now flushes pending transcripts on VAD stopped dete…
2026-01-29 18:11:25 -05:00
Aleix Conchillo Flaqué
671cc8eb74 Merge pull request #3590 from pipecat-ai/aleix/custom-cli-runner-args
runner: allow custom CLI arguments
2026-01-29 13:53:27 -08:00
Aleix Conchillo Flaqué
b4dce656f0 Merge pull request #3594 from pipecat-ai/aleix/user-turn-controller-reset-timeout-on-interims
UserTurnController: reset user turn timeout with interim transcriptions
2026-01-29 13:12:44 -08:00
Aleix Conchillo Flaqué
253a1d1114 UserTurnController: reset user turn timeout with interim transcriptions 2026-01-29 13:10:10 -08:00
Aleix Conchillo Flaqué
ca613bcb79 Merge pull request #3592 from pipecat-ai/aleix/broadcast-frame-no-deepcopy
don't deep copy fields when broadcasting frames
2026-01-29 11:50:20 -08:00
Aleix Conchillo Flaqué
0423acd8a0 STTService: just clear buffer before running run_stt() 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
7eabaaa0ef FrameProcessors: do not deepcopy fields when broadcasting frames 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
bbb8b53d03 runner: allow custom CLI arguments 2026-01-29 10:15:53 -08:00
Aleix Conchillo Flaqué
f3b72e9263 Merge pull request #3585 from pipecat-ai/aleix/improve-piper-tts-support
improve Piper TTS support
2026-01-29 08:36:13 -08:00
Mark Backman
31c7fbc5ba Add delay_in_frames and language support 2026-01-29 10:59:04 -05:00
Mark Backman
6ab12626d6 GradiumSTTService now flushes pending transcripts on VAD stopped detection 2026-01-29 10:26:17 -05:00
Mark Backman
b77a50de73 Merge pull request #3529 from lukepayyapilli/fix/llm-timeout-without-retry
feat: handle exceptions for BaseOpenAILLMService
2026-01-29 09:12:54 -05:00
Luke Payyapilli
433c1b9b92 add catch-all exception handler per review feedback 2026-01-29 09:07:06 -05:00
Aleix Conchillo Flaqué
bd00587092 changelog: add files for 3585 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
5a85e27cc5 PiperHttpTTSService: allow passing a voice id 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
11daa43b1b TTSService: resample _stream_audio_frames_from_iterator() input audio if needed 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
875614ff7a tts: add support for local PiperTTSService 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
eb1bf1e446 tts: rename PiperTTSService to PiperHttpTTSService 2026-01-28 23:27:32 -08:00
mattie ruth backman
7456a0a55f Fix the /start and /offer/api proxy endpoints for smallWebRTC to match pipecat cloud behavior WRT requestData 2026-01-28 15:25:13 -05:00
Filipi da Silva Fuchter
27277ed3d9 Merge pull request #3571 from pipecat-ai/filipi/funcion_call_improvements
Function call improvements
2026-01-28 14:03:40 -05:00
filipi87
5543bc56f3 Add changelog files for PR #3571
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 15:43:59 -03:00
filipi87
c8496dfb8e Updated the examples which use UserImageRequestFrame to defer the function call result. 2026-01-28 15:39:21 -03:00
filipi87
d3f4cbb620 Providing a way to defer the function call results. 2026-01-28 15:39:06 -03:00
filipi87
c9f922c479 Removed an overridden method that was identical to the parent implementation. 2026-01-28 15:38:40 -03:00
Aleix Conchillo Flaqué
49bd3da26b Merge pull request #3582 from pipecat-ai/aleix/daily-sample-room-url
rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL
2026-01-28 10:38:14 -08:00
Aleix Conchillo Flaqué
f3ef488925 rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL 2026-01-28 10:05:27 -08:00
Aleix Conchillo Flaqué
4f08098917 Merge pull request #3580 from Pulkit0729/fix/livekit
fix: adding missing livekit transport configs
2026-01-28 10:04:34 -08:00
Pulkit
a7cd5b0322 fix: adding missing livekit transport configs 2026-01-28 23:15:03 +05:30
Aleix Conchillo Flaqué
55dadc9118 tests(genesys): fix formatting 2026-01-28 09:15:42 -08:00
Aleix Conchillo Flaqué
01bbf61e0d Merge pull request #3500 from ssillerom/feature/genesys_serializer
Feature/genesys serializer
2026-01-28 09:09:11 -08:00
ssillerom
10fb77c0e2 added changelog file 2026-01-28 18:07:33 +01:00
ssillerom
2612fae527 ruff linting 2026-01-28 18:02:51 +01:00
ssillerom
c5be67f293 fix: create disconnect message passing output vars 2026-01-28 17:56:21 +01:00
kompfner
312caaba86 Merge pull request #3429 from lukepayyapilli/fix/gemini-live-interrupted-signal
feat: handle server_content.interrupted for faster interruptions
2026-01-28 10:25:36 -05:00
Luke Payyapilli
ff0eb6d286 fix: emit ErrorFrame on LLM completion timeout 2026-01-28 09:44:32 -05:00
ssillerom
ef6bbace98 fixes: super init inherited class to set event handlers in the constructor 2026-01-28 15:40:24 +01:00
Filipi da Silva Fuchter
06ec21387f Merge pull request #3581 from pipecat-ai/filipi/open_ai_audio_duration
Fixed race condition in OpenAIRealtimeLLMService
2026-01-28 07:42:35 -05:00
filipi87
bdae177125 Adding changelog entry for the OpenAIRealtimeLLMService fix. 2026-01-28 08:39:11 -03:00
filipi87
468e159f9b Fixed race condition in OpenAIRealtimeLLMService that could cause an error when truncating the conversation. 2026-01-28 08:36:31 -03:00
ssillerom
a4acafd3be feature: added event handlers in constructor and call func in each _handle_* func 2026-01-28 10:54:26 +01:00
ssillerom
105824a372 Merge main into feature/genesys_serializer
Incorporates latest changes from main branch including:
- AIC filter and VAD updates
- STT service improvements
- Base serializer changes
- Various bug fixes
2026-01-28 10:48:56 +01:00
ssillerom
55e0d4ecc4 ruff fixes done 2026-01-28 08:59:28 +01:00
ssillerom
9102e81cb8 added tests to the PR 2026-01-27 23:39:43 +01:00
ssillerom
d7d8e93a3d feature: added custom params in closed message to genesys, simplified create_* functions, simplified constructor method and simplified opened message 2026-01-27 23:36:47 +01:00
Mark Backman
bf9b166464 Merge pull request #3575 from pipecat-ai/mb/fix-turn-stopped-event-end-cancel-frame
Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame…
2026-01-27 14:55:34 -05:00
Mark Backman
e80e0eab29 Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame or CancelFrame 2026-01-27 14:50:10 -05:00
Mark Backman
61242e6575 Merge pull request #3574 from pipecat-ai/mb/fix-websocket-close-message-handling
Fix WebsocketService infinite loop on graceful server disconnect
2026-01-27 13:53:26 -05:00
Aleix Conchillo Flaqué
8841387121 Merge pull request #3560 from pipecat-ai/aleix/serializer-base-objects
FrameSerializer: subclass from BaseObject so we can add events
2026-01-27 09:58:44 -08:00
Aleix Conchillo Flaqué
ee695ae9fe FrameSerializer: subclass from BaseObject so we can add events 2026-01-27 09:53:46 -08:00
Mark Backman
52012b0fb2 Fix WebsocketService infinite loop on graceful server disconnect 2026-01-27 12:41:28 -05:00
Mark Backman
f7a1c6b719 Merge pull request #3408 from ai-coustics/aic-v2
Add ai-coustics AIC SDK v2 support with model downloading
2026-01-27 10:38:26 -05:00
Gökmen Görgen
6aa77ccc13 group aic related changes in changelog. 2026-01-27 16:22:54 +01:00
Gökmen Görgen
45b7ec4e2c re-enable 07zd-interruptible-aicoustics.py in release evals. 2026-01-27 16:18:56 +01:00
Mark Backman
1c434c6ad5 Merge pull request #3562 from speechmatics/fix/smx-ttfs-finals
Support TTFS for Speechmatics STT
2026-01-27 08:35:34 -05:00
Mark Backman
4591affba9 Merge pull request #3568 from pipecat-ai/mb/changelog-3536 2026-01-27 07:14:41 -05:00
Sam Sykes
91346f5f37 Add support for self.request_finalize() for Pipecat-based VAD. 2026-01-27 10:44:35 +00:00
Filipi da Silva Fuchter
6a66ebe332 Merge pull request #3541 from pipecat-ai/filipi/audio_buffer
Refactoring AudioBufferProcessor to fix audio track synchronization.
2026-01-27 05:32:41 -05:00
Filipi da Silva Fuchter
c1d4180042 Merge pull request #3567 from pipecat-ai/filipi/openai_realtime_audio_duration
Fixed race condition in OpenAIRealtimeBetaLLMService
2026-01-27 05:30:33 -05:00
Gökmen Görgen
81a53c699c handle AIC processor init errors gracefully and ensure _aic_ready reflects readiness 2026-01-27 11:28:05 +01:00
Sam Sykes
60168f7f69 remove comment 2026-01-26 23:16:43 +00:00
Sam Sykes
23d7608e5f changelog update 2026-01-26 23:15:30 +00:00
Sam Sykes
99242c0a93 linting updates 2026-01-26 23:14:40 +00:00
Sam Sykes
3a71865cf4 removed old metrics 2026-01-26 23:11:25 +00:00
Mark Backman
ecf2e69f3f Merge pull request #3536 from surapuramakhil/main
LLMAssistantAggregator: preserve non-ASCII characters in JSON output
2026-01-26 16:42:05 -05:00
Mark Backman
febd52274d Add changelog fragment for PR 3536 2026-01-26 16:42:00 -05:00
Mark Backman
1542d922e7 Merge pull request #3546 from pipecat-ai/pk/changelog-fragment-for-pr-3406
Added a changelog fragment for PR 3406
2026-01-26 16:31:57 -05:00
Paul Kompfner
15d5d1159e Added a changelog fragment for PR 3406 2026-01-26 16:27:33 -05:00
Mark Backman
884630a6bd Merge pull request #3559 from pipecat-ai/aleix/transport-broadcast-fixes
transports: fix broadcast_frame_class reference
2026-01-26 16:25:31 -05:00
Mark Backman
1cf137c6a8 Merge pull request #3565 from pipecat-ai/markbackman-patch-1 2026-01-26 15:49:35 -05:00
filipi87
98fcfd7c91 Adding changelog entry for the OpenAIRealtimeBetaLLMService fix. 2026-01-26 17:19:08 -03:00
filipi87
2f23f2e39c Fixed race condition in OpenAIRealtimeBetaLLMService that could cause an error when truncating the conversation. 2026-01-26 17:08:27 -03:00
Mark Backman
9c6b11cecf Update README links to use absolute URLs 2026-01-26 13:03:39 -05:00
Sam Sykes
fc1444c9d6 Updated changelog 2026-01-26 16:25:37 +00:00
Sam Sykes
ea94939add update dependency 2026-01-26 16:24:56 +00:00
Sam Sykes
0c69ae6371 Changelog entry. 2026-01-26 16:07:59 +00:00
Sam Sykes
8b88280bb1 Default to using EXTERNAL mode. 2026-01-26 15:52:42 +00:00
Sam Sykes
960d0faea5 support is_eou for final segment in utterance 2026-01-26 15:48:04 +00:00
Luke Payyapilli
b9390ccb1b Address review: remove UserStartedSpeakingFrame, add explanatory comment 2026-01-26 10:08:17 -05:00
Mark Backman
061a0dc43d Merge pull request #3498 from pipecat-ai/mb/azure-tts-8khz-workaround
AzureTTSService 8khz workaround
2026-01-26 09:48:22 -05:00
Mark Backman
328bbe069f Merge pull request #3554 from pipecat-ai/mb/simplify-stt-ttfb
Simplify STT finalize handling
2026-01-26 08:00:04 -05:00
Mark Backman
dc32ecc872 Merge pull request #3555 from pipecat-ai/mb/speechmatics-stt-ttfb
Align Speechmatics STT TTFB metrics with STT classes
2026-01-26 07:59:34 -05:00
Gökmen Görgen
ca2eb1904f Merge remote-tracking branch 'origin/aic-v2' into aic-v2 2026-01-26 10:16:23 +01:00
Gökmen Görgen
4bce58f270 update changelog and remove outdated dependency notes 2026-01-26 10:15:15 +01:00
Gökmen Görgen
7572d63f8f Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:40 +01:00
Gökmen Görgen
3c463c9416 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:33 +01:00
Gökmen Görgen
bd618d64e3 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:16 +01:00
Gökmen Görgen
a824660df7 add unit tests for AICVADAnalyzer and AICFilter. 2026-01-26 09:56:36 +01:00
Gökmen Görgen
58b9019852 bump aic-sdk to 2.0.1 in optional dependencies. 2026-01-26 09:14:16 +01:00
Gökmen Görgen
afcdef8c81 docstring clarification. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bd92104fb3 clarify voice confidence method behavior in AIC VAD. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
34e9f224a8 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dca7f3b5b0 add changelog. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
70a85cd192 use path for keeping the consistency between the parameters. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
91e86658b7 force developer to set a license key, it's required. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0a8588669c address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e99400148 two dots are Rust-specific things; I'm not sure they're familiar to Python developers. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
648f20db6d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
09b5b6b12d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e6a423955 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dc8972cd94 log optimal number of frames for given sample rate in AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
e4e2231958 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
18b3ee743b replace os with pathlib.Path in AICFilter for path handling consistency. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
65b8e0e89c rename enabled to bypass in AICFilter for clarity. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
b77f8b065f remove voice gain. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
5fd43faec3 add min speech duration. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
abebcf37bd address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
ca4e3c79f9 Update pyproject.toml
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
e8d1bec03b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
f0cc54589e remove enhancement level parameter from AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
22b9aac2ff use quail model in the example. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
7f86f4ac27 fix class name. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
dcab79753b even if the parameters are fixed, keep aic ready for processing. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bdded9b026 set SDK ID for telemetry in AIC filter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
1e1e275fea address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
effb6aa8f4 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a4a9bae79e drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
c943ef9261 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
f05809520b Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new AICFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
ec17dc6626 aic-sdk-py v2.
# Conflicts:
#	uv.lock

# Conflicts:
#	examples/foundational/07zd-interruptible-aicoustics.py
#	pyproject.toml
#	src/pipecat/audio/filters/aic_filter.py
#	src/pipecat/audio/vad/aic_vad.py
2026-01-26 08:44:17 +01:00
Gökmen Görgen
4e85e81d9b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a1cc88a233 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
61a230ec53 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Stephan Eckes <stephan@steck.tech>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a13380b574 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
2a927189d9 reorganize imports. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a90c15362c drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
d3bdd2d246 use new model id. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
465ae4f706 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a0d801b658 Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new AICFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
35919a84e3 aic-sdk-py v2.
# Conflicts:
#	uv.lock
2026-01-26 08:44:17 +01:00
Aleix Conchillo Flaqué
f94a60f381 transports: fix broadcast_frame_class reference 2026-01-25 15:42:09 -08:00
ssillerom
a446bca72d changes: added OutputTransportUrgentFrame to on closed, removed callback 2026-01-25 21:12:28 +01:00
Sergio Sillero
8ae834366b Merge branch 'pipecat-ai:main' into feature/genesys_serializer 2026-01-25 21:04:27 +01:00
Mark Backman
a4acc12f91 Align Speechmatics STT TTFB metrics with STT classes 2026-01-24 18:26:34 -05:00
Mark Backman
e93112e76e Simplify STT finalize handling 2026-01-24 15:28:27 -05:00
Mark Backman
680bcaac66 Merge pull request #3550 from pipecat-ai/mb/update-smart-turn-data-env-var
Update env var to PIPECAT_SMART_TURN_LOG_DATA
2026-01-24 13:52:36 -05:00
Mark Backman
d2ac9006a2 Update env var to PIPECAT_SMART_TURN_LOG_DATA 2026-01-24 12:50:42 -05:00
Mark Backman
bcb019e8ab Add TTFB metrics for STT services (#3495) 2026-01-23 18:47:34 -05:00
kompfner
4ea546785f Merge pull request #3406 from omChauhanDev/fix/openrouter-gemini-messages
fix(openrouter): handle multiple system messages for Gemini models
2026-01-23 14:53:59 -05:00
filipi87
f128cdd19a Adding a changelog entry to the AudioBufferProcessor fix. 2026-01-23 16:16:01 -03:00
filipi87
7921bce4af Refactoring AudioBufferProcessor to fix audio track synchronization. 2026-01-23 16:15:48 -03:00
Luke Payyapilli
cadced3f79 feat: handle server_content.interrupted for faster barge-in response 2026-01-23 10:41:04 -05:00
Aleix Conchillo Flaqué
8951442b8e Merge pull request #3534 from pipecat-ai/aleix/claude-skills-pr-description
claude: add pr-description skill
2026-01-22 17:34:46 -08:00
Aleix Conchillo Flaqué
7e6e3031e7 claude: add pr-description skill 2026-01-22 13:41:50 -08:00
Akhil
3b3c7aa8cc LLMAssistantAggregator: preserve non-ASCII characters in JSON output
Add ensure_ascii=False to json.dumps() calls for tool call arguments
and function call results to prevent unnecessary unicode escaping.
2026-01-22 15:37:44 -06:00
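The effect of `ensure_ascii=False` described in the commit above can be illustrated with a minimal sketch. The `arguments` dictionary is a hypothetical stand-in for tool call arguments and is not taken from the Pipecat source.

```python
import json

# Hypothetical tool-call arguments containing non-ASCII text (an assumption
# for illustration; not the actual payload used by LLMAssistantAggregator).
arguments = {"city": "München", "greeting": "こんにちは"}

# Default behaviour: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(arguments)

# With ensure_ascii=False the characters are kept as-is in the output string.
preserved = json.dumps(arguments, ensure_ascii=False)

print(escaped)
print(preserved)
```

Both strings decode back to the same dictionary, so the change only affects how readable the serialized output is.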
Aleix Conchillo Flaqué
308829f92b Merge pull request #3533 from pipecat-ai/aleix/claude-skills-docstring
claude: add docstring skill
2026-01-22 12:58:38 -08:00
Aleix Conchillo Flaqué
82a799e63e claude: add docstring skill 2026-01-22 12:53:38 -08:00
Cale Shapera
6b5bcae86f change default Inworld TTS model to inworld-tts-1.5-max (#3531) 2026-01-22 14:21:15 -05:00
Mark Backman
836073849c Merge pull request #3527 from weakcamel/patch-1
Update README.md - fix Google Imagen URL
2026-01-22 10:46:10 -05:00
Waldek Maleska
b13b65d6e2 Update README.md - fix Google Imagen URL 2026-01-22 15:17:41 +00:00
Mark Backman
3d545b718d Merge pull request #3344 from omChauhanDev/fix/stt-dynamic-language-update
fix: treat language as first-class STT setting
2026-01-22 09:21:56 -05:00
marcus-daily
f2fa5d9733 Updating changelog 2026-01-22 14:17:59 +00:00
marcus-daily
76b774072c Formatting fixes 2026-01-22 14:17:59 +00:00
marcus-daily
b6341ffaa5 Save Smart Turn input data if SMART_TURN_LOG_DATA is set 2026-01-22 14:17:59 +00:00
Mark Backman
29fae67c9e Merge pull request #3523 from omChauhanDev/add-location-support-google-tts
feat(google): add location parameter to TTS services
2026-01-22 09:12:16 -05:00
Mark Backman
718ea1c15e Merge pull request #3526 from pipecat-ai/mb/remove-logs
Remove application logs
2026-01-22 08:48:07 -05:00
Mark Backman
8e09d94614 Remove application logs 2026-01-22 08:28:52 -05:00
Aleix Conchillo Flaqué
de73e28563 Merge pull request #3510 from omChauhanDev/feat/add-reached-filter-methods
feat(task): add additive filter methods for frame monitoring
2026-01-21 21:05:33 -08:00
Aleix Conchillo Flaqué
55250b4f7e Merge pull request #3521 from pipecat-ai/aleix/claude-changelog-skill
claude: initial /changelog skill
2026-01-21 20:50:47 -08:00
Om Chauhan
281145a991 added changelog 2026-01-22 09:55:57 +05:30
Om Chauhan
7bd32e2fe5 feat(google): add location parameter to TTS services 2026-01-22 09:49:19 +05:30
James Hush
8f05d95f50 feat: add video_out_codec parameter for DailyTransport (#3520)
* feat: add video_out_codec parameter for DailyTransport

Add video_out_codec parameter to TransportParams allowing configuration
of the preferred video codec (VP8, H264, H265) for video output.

When set, this passes the preferredCodec option to Daily's
VideoPublishingSettings during the join operation.

* chore: move video_out_codec parameter to changelog folder (#3522)

* Initial plan

* Move video_out_codec parameter to changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

* Revert all CHANGELOG.md changes, keep only changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-01-22 11:31:07 +08:00
Om Chauhan
87c12f3098 changed frame filter storage type from tuples to sets 2026-01-22 08:43:46 +05:30
Om Chauhan
9c0bf89247 added changelog 2026-01-22 08:43:46 +05:30
Om Chauhan
6e44a2ab49 feat(task): add additive filter methods for frame monitoring 2026-01-22 08:43:46 +05:30
Aleix Conchillo Flaqué
7aa7b86aed claude: initial /changelog skill 2026-01-21 18:43:04 -08:00
Aleix Conchillo Flaqué
5ad9faeb4c Merge pull request #3519 from pipecat-ai/aleix/embedded-rtvi-processor
automatically add RTVI to the pipeline
2026-01-21 18:17:26 -08:00
Aleix Conchillo Flaqué
9e8f8b45c6 added changelog files for #3519 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
0ee11ad333 tests: disable RTVI in tests by default 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
124a3c35af RTVIObserver: don't handle some frames direction 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
054e504868 examples(foundational): remove RTVI (automatically added by PipelineTask) 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
e85a00cc0e PipelineTask: automatically add RTVI processor and RTVI observer
If `enable_rtvi` is set (it is enabled by default), an RTVI processor is
added automatically to the pipeline and an RTVI observer is registered.
2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
cc61cdbba3 RTVIProcessor: add create_rtvi_observer() 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
62f4708d43 transports: broadcast InputTransportMessageFrame frames 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
ba0ddb1832 FrameProcessor: copy kwargs when broadcasting frame 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
eacd2a4b71 FrameProcessor: add broadcast_frame_instance() 2026-01-21 18:14:17 -08:00
Mark Backman
7ed110650d Merge pull request #3516 from okue/minorpatch1
refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
2026-01-21 10:33:59 -05:00
okue
4a724379fc refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
The _bot_speaking flag does not need to be set in this method,
so the redundant assignment has been removed.
2026-01-21 23:59:15 +09:00
Aleix Conchillo Flaqué
768d3958dd Merge pull request #3512 from pipecat-ai/changelog-0.0.100
Release 0.0.100 - Changelog Update
2026-01-20 19:32:56 -08:00
aconchillo
5f9ff8bd58 Update changelog for version 0.0.100 2026-01-20 19:21:19 -08:00
Aleix Conchillo Flaqué
59ed422052 Merge pull request #3511 from pipecat-ai/aleix/camb-tts-client-on-start
CambTTSService: initialize client during StartFrame
2026-01-20 19:17:45 -08:00
Aleix Conchillo Flaqué
7e0ca113af CambTTSService: initialize client during StartFrame 2026-01-20 19:07:12 -08:00
Aleix Conchillo Flaqué
13c52e0e6d Merge pull request #3509 from pipecat-ai/aleix/nvidia-stt-tts-improvements
NVIDIA STT/TTS performance improvements
2026-01-20 16:39:12 -08:00
Aleix Conchillo Flaqué
a787fd9cd8 NVIDIATTSService: process incoming audio frame right away
Process audio as soon as we receive it from the generator. Previously, we were
reading from the generator and adding elements into a queue until there was no
more data, then we would process the queue.
2026-01-20 15:41:05 -08:00
Aleix Conchillo Flaqué
14495c425a NVIDIASTTService: no need for additional queue and task 2026-01-20 13:50:17 -08:00
Aleix Conchillo Flaqué
461bd0a2e0 update changelog for #3494 and #3499 2026-01-20 13:26:40 -08:00
Aleix Conchillo Flaqué
bd45ce2b4e Merge pull request #3499 from lukepayyapilli/fix/livekit-video-queue-memory-leak
fix(livekit): prevent memory leak when video_in_enabled is False
2026-01-20 13:21:21 -08:00
Aleix Conchillo Flaqué
a266644b06 Merge pull request #3494 from omChauhanDev/fix/uninterruptible-frame-handling
fix: preserve UninterruptibleFrames in __reset_process_queue
2026-01-20 13:19:40 -08:00
Mark Backman
03faadd7f9 Merge pull request #3508 from pipecat-ai/ss/log-daily-ids
Log Daily participant and meeting session IDs upon successful join in…
2026-01-20 15:43:48 -05:00
Aleix Conchillo Flaqué
bf43032652 Merge pull request #3504 from pipecat-ai/aleix/nvidia-stt-tts-error-handling
NVIDIA STT/TTS error handling
2026-01-20 09:41:08 -08:00
Sunah Suh
fa6f924b31 Log Daily participant and meeting session IDs upon successful join in Daily Transport 2026-01-20 11:31:17 -06:00
Aleix Conchillo Flaqué
a010a020fd add changelog for 3504 2026-01-20 09:03:30 -08:00
Aleix Conchillo Flaqué
655006aff5 NvidiaSegmentedSTTService: simplify exception handling 2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
671dc8cd9b NvidiaSTTService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
9a718ded1e NvidiaTTSService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
024809b39a Merge pull request #3503 from pipecat-ai/aleix/ai-service-start-end-cancel
AIService: handle StartFrame/EndFrame/CancelFrame exceptions
2026-01-20 08:56:39 -08:00
Aleix Conchillo Flaqué
6cf0d53d00 AIService: handle StartFrame/EndFrame/CancelFrame exceptions
If AIService subclasses implement start()/stop()/cancel() and exceptions are
not handled, execution will not continue and therefore the originating frames
will not be pushed. This would cause the pipeline to not be started (i.e.
StartFrame would not be pushed downstream) or stopped properly.
2026-01-20 08:54:22 -08:00
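The failure mode described in the commit above can be sketched as follows. This is a simplified illustration under stated assumptions, not Pipecat's implementation: `SketchService`, its string frames, and the `pushed`/`errors` lists are all hypothetical names.

```python
import asyncio


class SketchService:
    """Hypothetical stand-in for an AIService subclass (not Pipecat code)."""

    def __init__(self):
        self.pushed = []  # frames forwarded downstream
        self.errors = []  # errors captured instead of propagating

    async def start(self, frame):
        # Simulates a subclass whose start() fails, e.g. during client setup.
        raise RuntimeError("client init failed")

    async def push_frame(self, frame):
        self.pushed.append(frame)

    async def process_frame(self, frame):
        if frame == "StartFrame":
            try:
                await self.start(frame)
            except Exception as exc:
                # Record the error so execution continues past this point.
                self.errors.append(str(exc))
        # The originating frame is always pushed, even if start() raised.
        await self.push_frame(frame)


service = SketchService()
asyncio.run(service.process_frame("StartFrame"))
print(service.pushed, service.errors)
```

Without the `try`/`except`, the exception would leave `process_frame()` before `push_frame()` runs, and the start frame would never reach downstream processors.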
kompfner
778dacc9a8 Merge pull request #3486 from pipecat-ai/pk/fix-nova-sonic-reset-conversation
Fix `AWSNovaSonicLLMService.reset_conversation()`
2026-01-20 10:07:38 -05:00
Paul Kompfner
06b3ecd2d6 In AWS Nova Sonic service, send the "interactive" user message (which triggers the bot response) only after sending the audio input start event, per the AWS team's recommendation 2026-01-20 09:56:25 -05:00
Paul Kompfner
b4d143e39b Add CHANGELOG for fixing AWSNovaSonicLLMService.reset_conversation() 2026-01-20 09:56:25 -05:00
Paul Kompfner
c89083e72e Improve 20e example to ask the bot to give a recap when loading a previous conversation from disk 2026-01-20 09:56:25 -05:00
Luke Payyapilli
1ac811ab32 chore: revert unrelated uv.lock changes 2026-01-20 09:19:43 -05:00
Luke Payyapilli
f6359d460e chore: install livekit as optional extra in CI instead of dev dep 2026-01-20 09:16:16 -05:00
Aleix Conchillo Flaqué
f03a7175c7 Merge pull request #3501 from pipecat-ai/aleix/improve-eval-numerical-word-prompt
scripts(eval): give examples to numerical word answers
2026-01-19 20:22:06 -08:00
Aleix Conchillo Flaqué
aed44c863a scripts(eval): give examples to numerical word answers
Some models need extra help.
2026-01-19 14:37:00 -08:00
ssillerom
fa5da3b0be change comments 2026-01-19 20:49:23 +01:00
ssillerom
7e82a0cf49 feature: Genesys AudioHook WebSocket protocol serializer for Pipecat 2026-01-19 20:45:22 +01:00
Mark Backman
cddd6d5b0a Merge pull request #3492 from pipecat-ai/mb/remove-unused-imports
Remove unused imports
2026-01-19 14:07:16 -05:00
Mark Backman
11cf891ac8 Manual updates for unused imports 2026-01-19 14:03:22 -05:00
Luke Payyapilli
c89ae717fe style: fix ruff formatting 2026-01-19 11:13:41 -05:00
Luke Payyapilli
562bdd3084 test: add livekit to dev deps and improve test clarity 2026-01-19 11:11:54 -05:00
Mark Backman
cc4c3650e1 Merge pull request #3491 from pipecat-ai/mb/update-release-evals
Add Camb TTS to release evals
2026-01-19 11:04:05 -05:00
Luke Payyapilli
dfc1f09b77 fix(livekit): prevent memory leak when video_in_enabled is False 2026-01-19 11:00:23 -05:00
Mark Backman
0b1a4792b8 Bump to latest azure-cognitiveservices-speech version, 1.47.0 2026-01-19 09:52:28 -05:00
Mark Backman
14bd3b1b32 Set Azure TTS default prosody rate to None 2026-01-19 09:19:57 -05:00
Mark Backman
f733e77496 AzureTTS: work around word ordering issue at 8khz sample rate 2026-01-19 09:13:41 -05:00
Filipi da Silva Fuchter
5fc46cc450 Merge pull request #3493 from omChauhanDev/fix/globally-unique-pc-id
fix: make SmallWebRTCConnection pc_id globally unique
2026-01-19 09:04:48 -05:00
Om Chauhan
4a9eb82f92 fix: preserve UninterruptibleFrames in __reset_process_queue 2026-01-18 20:39:13 +05:30
Om Chauhan
990d8386e4 fix: make SmallWebRTCConnection pc_id globally unique 2026-01-18 19:41:51 +05:30
Mark Backman
ce7d823770 Remove unused imports 2026-01-18 08:22:22 -05:00
Mark Backman
0b93c3f900 Add Camb TTS to release evals 2026-01-17 16:27:16 -05:00
Mark Backman
829c5f4604 Merge pull request #3169 from Incanta/hathora
Add Hathora STT and TTS services
2026-01-17 16:25:12 -05:00
Mike Seese
dc8ea615d9 add hathora to run-release-evals.py 2026-01-17 10:33:58 -08:00
Mike Seese
a3d206050d move hathora example as requested 2026-01-17 10:31:08 -08:00
Mike Seese
f48a567873 run the linter 2026-01-17 10:30:47 -08:00
Mark Backman
e69ccd8ea7 Merge pull request #3490 from pipecat-ai/mb/on-user-mute-events
Add on_user_mute_started and on_user_mute_stopped events
2026-01-17 11:05:15 -05:00
Mark Backman
11924bb980 Add on_user_mute_started and on_user_mute_stopped events 2026-01-17 11:01:46 -05:00
Mark Backman
af89154e96 Merge pull request #3489 from pipecat-ai/mb/fix-azure-tts-punctuation-spacing
fix: AzureTTSService punctuation spacing
2026-01-17 11:00:30 -05:00
Mark Backman
1485ea0831 Merge pull request #3488 from pipecat-ai/mb/on-user-turn-idle
Update on_user_idle to on_user_turn_idle
2026-01-17 11:00:16 -05:00
Mark Backman
e22bc777d8 Fix spacing for CJK languages 2026-01-17 09:04:50 -05:00
Mark Backman
043403fe23 fix: AzureTTSService punctuation spacing 2026-01-17 08:18:31 -05:00
Mark Backman
1e1160906e Update on_user_idle to on_user_turn_idle 2026-01-17 07:04:27 -05:00
Aleix Conchillo Flaqué
f7d3e63063 Merge pull request #3474 from pipecat-ai/fix/optional-member-access-function-call-cancel
Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
2026-01-16 22:06:45 -08:00
Paul Kompfner
6fa797c8e4 Fix AWS Nova Sonic reset_conversation(), which would previously error out.
Issues:
- After disconnecting, we were prematurely sending audio messages using the new prompt and content names, before the new prompt and content were created
- We weren't properly sending system instruction and conversation history messages to Nova Sonic with `"interactive": false`
2026-01-16 22:31:54 -05:00
Mark Backman
473d39791b Merge pull request #3482 from pipecat-ai/mb/user-idle-in-user-aggregator
Add UserIdleController, deprecate UserIdleProcessor
2026-01-16 18:47:10 -05:00
Aleix Conchillo Flaqué
2114abb8c6 add changelog file for 3484 2026-01-16 15:46:29 -08:00
Aleix Conchillo Flaqué
4fb4c26f55 Merge pull request #3484 from amichyrpi/main
Remove async_mode parameter from Mem0 storage
2026-01-16 15:44:52 -08:00
Mark Backman
2e8e574ea5 Add UserIdleController, deprecate UserIdleProcessor 2026-01-16 18:44:19 -05:00
Aleix Conchillo Flaqué
84c7e97be2 Merge pull request #3483 from pipecat-ai/aleix/throttle-user-speaking-frame
throttle user speaking frame
2026-01-16 15:29:37 -08:00
Amory Hen
a6e7c99d55 Remove async_mode parameter from Mem0 storage 2026-01-17 00:26:38 +01:00
Aleix Conchillo Flaqué
ac3fa7f91f BaseOutputTransport: minor cleanup 2026-01-16 15:15:49 -08:00
Aleix Conchillo Flaqué
6eadad53b2 BaseInputTransport: throttle UserSpeakingFrame 2026-01-16 15:15:49 -08:00
kompfner
b11150f31f Merge pull request #3480 from pipecat-ai/pk/fix-grok-realtime-smallwebrtc
Fix an issue where Grok Realtime would error out when running with Sm…
2026-01-16 15:46:27 -05:00
Paul Kompfner
836cf60611 Fix an issue where Grok Realtime would error out when running with SmallWebRTC transport.
The underlying issue was related to the fact that we were sending audio to Grok before we had configured the Grok session with our default input sample rate (16000), so Grok was interpreting those initial audio chunks as having its default sample rate (24000). We didn't see this issue when using the Daily transport simply because in our test environments Daily took a smidge longer than a reflexive (localhost) pure WebRTC connection, so we would only send audio to Grok *after* we had configured the Grok session with the desired sample rate.
2026-01-16 15:41:33 -05:00
James Hush
1c13ad95a5 Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
Extract dictionary value to local variable and check for None before
accessing cancel_on_interruption attribute, since the dictionary values
are typed as Optional[FunctionCallInProgressFrame].
2026-01-16 15:04:26 -05:00
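The narrowing pattern this commit describes (read the dictionary value into a local variable, then check it for `None`) might look like the sketch below. `InProgressCall` and `calls_to_cancel` are hypothetical names; only the `cancel_on_interruption` attribute and the `Optional` value type come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InProgressCall:
    """Hypothetical stand-in for FunctionCallInProgressFrame."""

    cancel_on_interruption: bool


def calls_to_cancel(calls: Dict[str, Optional[InProgressCall]]) -> List[str]:
    """Return the IDs of calls that should be cancelled on interruption."""
    cancelled = []
    for call_id in calls:
        # Extract the value once so the type checker can narrow it.
        call = calls[call_id]
        if call is None:
            continue
        # Here `call` is known to be InProgressCall, not Optional.
        if call.cancel_on_interruption:
            cancelled.append(call_id)
    return cancelled


print(calls_to_cancel({"a": InProgressCall(True), "b": None, "c": InProgressCall(False)}))
```

Checking `calls[call_id] is None` and then accessing `calls[call_id].cancel_on_interruption` does not narrow the type, because a type checker treats the two subscript expressions as independent lookups.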
Mark Backman
1e8516e91d Merge pull request #3476 from pipecat-ai/mb/project-urls
Update project.urls for PyPI
2026-01-16 14:57:39 -05:00
Mark Backman
32c775311d Merge pull request #3471 from pipecat-ai/mb/fix-pydantic-2.12-docs
Revert pydantic 2.12 extra type annotation
2026-01-16 14:57:24 -05:00
Mark Backman
28d0bb98de Merge pull request #3472 from pipecat-ai/mb/whisker-dev
Add whisker_setup.py setup file to .gitignore
2026-01-16 14:55:48 -05:00
Aleix Conchillo Flaqué
a9a9f3aeaa Merge pull request #3462 from pipecat-ai/aleix/fix-min-words-transcription-aggregation
MinWordsUserTurnStartStrategy: don't aggregate transcriptions
2026-01-16 11:18:23 -08:00
Aleix Conchillo Flaqué
c2a0735975 MinWordsUserTurnStartStrategy: don't aggregate transcriptions
If we aggregate transcriptions we will get incorrect interruptions. For example,
if we have a strategy with min_words=3 and we say "One" and pause, then "Two"
and pause and then "Three", this would trigger the start of the turn when it
shouldn't. We should only look at the incoming transcription text and not
aggregate it with the previous ones.
2026-01-16 11:16:06 -08:00
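The "One", "Two", "Three" scenario from the commit above can be reproduced with a small sketch. The function names are hypothetical, and counting words by splitting on whitespace is an assumption about how `min_words` is evaluated.

```python
from typing import List


def word_count(text: str) -> int:
    # Assumption: words are separated by whitespace.
    return len(text.split())


def triggers_per_transcription(transcriptions: List[str], min_words: int) -> List[bool]:
    """Check each incoming transcription on its own."""
    return [word_count(t) >= min_words for t in transcriptions]


def triggers_aggregated(transcriptions: List[str], min_words: int) -> List[bool]:
    """Accumulate text across transcriptions before checking."""
    results = []
    aggregated = ""
    for t in transcriptions:
        aggregated = f"{aggregated} {t}".strip()
        results.append(word_count(aggregated) >= min_words)
    return results


utterances = ["One", "Two", "Three"]
print(triggers_per_transcription(utterances, 3))  # [False, False, False]
print(triggers_aggregated(utterances, 3))         # [False, False, True]
```

With aggregation, the third single-word utterance starts a turn even though no individual transcription reached three words, which is the incorrect interruption the commit removes.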
Aleix Conchillo Flaqué
41cb53f6c2 Merge pull request #3479 from pipecat-ai/aleix/turns-mute-to-user-mute
turns: move mute to user_mute
2026-01-16 11:11:50 -08:00
Aleix Conchillo Flaqué
58552af8fd examples(foundational): remove STTMuteFilter example 2026-01-16 11:07:20 -08:00
Aleix Conchillo Flaqué
c7ab87b0cc turns: move mute to user_mute 2026-01-16 11:07:20 -08:00
Mark Backman
11ecc5fdee Update project.urls for PyPI 2026-01-16 12:48:13 -05:00
kompfner
19fb3eed9f Merge pull request #3466 from pipecat-ai/pk/fix-aws-nova-sonic-rtvi-bot-output
Fix realtime (speech-to-speech) services' RTVI event compatibility
2026-01-16 09:56:13 -05:00
Mark Backman
b292b32374 Merge pull request #3461 from glennpow/glenn/websocket-headers
Allow WebsocketClientTransport to send custom headers
2026-01-15 20:26:36 -05:00
Mark Backman
63d1393bb0 Add whisker_setup.py to .gitignore 2026-01-15 20:21:25 -05:00
Glenn Powell
37914cb062 Removed import and added changelog entry. 2026-01-15 16:47:15 -08:00
Mark Backman
ec40696854 Revert pydantic 2.12 extra type annotation 2026-01-15 19:16:15 -05:00
Mike Seese
2249f3d673 add requested changes from code review 2026-01-15 15:27:56 -08:00
Mike Seese
d2df324f29 fix some bugs after testing changes 2026-01-15 15:27:56 -08:00
Mike Seese
67fdb0b659 use parent _settings dict instead of self._params pattern 2026-01-15 15:27:56 -08:00
Mike Seese
e77bdf66f9 add can_generate_metrics functions 2026-01-15 15:27:56 -08:00
Mike Seese
1b3b67779c switch hathora services to use InputParams pattern 2026-01-15 15:27:55 -08:00
Mike Seese
6c7e386391 remove traced_stt from run_stt 2026-01-15 15:27:55 -08:00
Mike Seese
ba25b279d6 fix issues with PR suggestions 2026-01-15 15:27:55 -08:00
Mike Seese
e7c83c19b6 port turn_start_strategies to the newer user_turn_strategies 2026-01-15 15:27:55 -08:00
Mike Seese
7be7fb49a3 remove turn_analyzer args from transport params 2026-01-15 15:27:54 -08:00
Mike Seese
bcccb4cbb3 put fallback sample_rate value in function arg 2026-01-15 15:27:54 -08:00
Mike Seese
e9f1d951d3 Apply suggestions from code review
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-01-15 15:27:54 -08:00
Mike Seese
e5632a9339 transition Hathora service to use the unified API and apply PR feedback
add Hathora to root files

Hathora run linter

added hathora changelog
2026-01-15 15:27:53 -08:00
Mike Seese
1510fb4fc0 add Hathora STT and TTS services 2026-01-15 15:26:52 -08:00
Mark Backman
64a1ad2649 Merge pull request #3470 from pipecat-ai/mb/fix-docs-0.0.99
Docs fixes after 0.0.99
2026-01-15 17:34:44 -05:00
Mark Backman
4458ca1d24 Mock FastAPI 2026-01-15 17:29:47 -05:00
Mark Backman
21aaa48e62 Fix pydantic issues impacting autodoc 2026-01-15 17:29:47 -05:00
Mark Backman
e75c241030 Merge pull request #3468 from pipecat-ai/mb/camb-cleanuo
Clean up CambTTSService
2026-01-15 17:16:28 -05:00
Mark Backman
60216048a8 Docs fixes after 0.0.99 2026-01-15 16:40:42 -05:00
Mark Backman
f3c2e29fb4 Clean up CambTTSService 2026-01-15 15:59:17 -05:00
Paul Kompfner
ce99924be4 Add CHANGELOG entry describing fix for the missing "bot-llm-text" RTVI event when using realtime (speech-to-speech) services 2026-01-15 15:55:39 -05:00
Paul Kompfner
5de80a60d4 Fix "bot-llm-text" not firing when using Grok Realtime 2026-01-15 15:30:00 -05:00
Paul Kompfner
5753762350 Fix "bot-llm-text" not firing when using OpenAI Realtime 2026-01-15 15:16:08 -05:00
Paul Kompfner
885b318b04 Fix "bot-llm-text" not firing when using Gemini Live 2026-01-15 15:03:45 -05:00
Paul Kompfner
7a22d58cf4 Fix "bot-llm-text" not firing when using AWS Nova Sonic 2026-01-15 14:56:50 -05:00
Mark Backman
c8e4b462c9 Merge pull request #3460 from pipecat-ai/mb/reorder-07-examples
Renumber the 07 foundational examples
2026-01-15 14:44:21 -05:00
Mark Backman
30a3f42255 Merge pull request #3349 from eRuaro/feat/camb-tts-integration
Add Camb.ai TTS integration with MARS models
2026-01-15 14:43:12 -05:00
Neil Ruaro
26ddb2de2f minimal uv.lock update for camb-sdk 2026-01-16 03:18:01 +08:00
Neil Ruaro
f60eeaa212 reverted uv.lock, updated readthedocs.yaml, copyright year updates 2026-01-16 02:50:18 +08:00
Neil Ruaro
8cf72b36cb manually add camb-sdk to uv.lock, exclude camb from docs build 2026-01-16 02:26:38 +08:00
Neil Ruaro
38c3bcef96 exclude camb from docs build 2026-01-16 02:20:26 +08:00
Neil Ruaro
80604ba7b6 remove _update_settings method 2026-01-16 02:00:48 +08:00
Neil Ruaro
256c70c631 use UserTurnStrategies 2026-01-16 01:32:08 +08:00
Glenn Powell
0e3532c529 Allow WebsocketClientTransport to send custom headers 2026-01-15 09:31:48 -08:00
Neil Ruaro
9942fcfeb2 updated per PR reviews 2026-01-16 01:20:17 +08:00
Neil Ruaro
003c24ca6e Make model parameter explicit in docstring example 2026-01-16 01:18:37 +08:00
Neil Ruaro
ed120d014d Add model-specific sample rates, transport example, and fix audio buffer alignment 2026-01-16 01:18:37 +08:00
Neil Ruaro
e76a3d04f0 Update Camb TTS to 48kHz sample rate 2026-01-16 01:18:37 +08:00
Neil Ruaro
641d17007f Clean up Camb TTS service and tests 2026-01-16 01:18:37 +08:00
Neil Ruaro
9293b5f24a Migrate Camb TTS service from raw HTTP to official SDK
- Replace aiohttp with camb SDK (AsyncCambAI client)
- Add support for passing existing SDK client instance
- Simplify API: no longer requires aiohttp_session parameter
- Update example to use simplified initialization
- Rewrite tests to mock SDK client instead of HTTP servers
2026-01-16 01:18:37 +08:00
Neil Ruaro
c1f3cbd1d4 Yield TTSAudioRawFrame directly instead of calling private method 2026-01-16 01:18:37 +08:00
Neil Ruaro
78fa2ab65e Update default voice ID, fix MARS naming, and clean up example 2026-01-16 01:18:37 +08:00
Neil Ruaro
56da2caeed Update Camb.ai TTS inference options 2026-01-16 01:18:37 +08:00
Neil Ruaro
a541d65255 Update MARS model names to mars-flash, mars-pro, mars-instruct
Rename model identifiers from mars-8-* to the new naming convention:
- mars-8-flash -> mars-flash (default)
- mars-8 -> removed
- mars-8-instruct -> mars-instruct
- Added mars-pro
2026-01-16 01:18:37 +08:00
Neil Ruaro
a3d7e9eafe Address PR feedback: add --voice-id arg, remove test script
- Add --voice-id CLI argument to example (default: 2681)
- Remove test_camb_quick.py from examples/ (tests belong in tests/)
- Update docstring with new usage
2026-01-16 01:18:36 +08:00
Neil Ruaro
54933bea2a Rename changelog to PR number 2026-01-16 01:18:36 +08:00
Neil Ruaro
fcab9899cc Add changelog entry for Camb.ai TTS integration 2026-01-16 01:18:36 +08:00
Neil Ruaro
be098e85db Remove non-working Daily/WebRTC example
The Daily transport example had authentication issues. Keeping the
local audio example (07zb-interruptible-camb-local.py) which works.
2026-01-16 01:18:36 +08:00
Neil Ruaro
ed0ff46a87 added local test 2026-01-16 01:18:36 +08:00
Neil Ruaro
7ae0d651d6 added cambai tts integration 2026-01-16 01:18:36 +08:00
Mark Backman
efd4432cfb Renumber the 07 foundational examples 2026-01-15 10:26:17 -05:00
kompfner
24082b84f2 Merge pull request #3453 from pipecat-ai/pk/consistency-pass-on-user-started-stopped-speaking-frames
Do a consistency pass on how we're sending `UserStartedSpeakingFrame`…
2026-01-15 09:24:14 -05:00
Aleix Conchillo Flaqué
dcd5840341 Merge pull request #3455 from pipecat-ai/aleix/reset-user-turn-start-strategies
UserTurnController: reset user turn start strategies when turn triggered
2026-01-14 19:28:32 -08:00
Aleix Conchillo Flaqué
9e705ce768 UserTurnController: reset user turn start strategies when turn triggered 2026-01-14 18:20:29 -08:00
Mark Backman
965466cc09 Merge pull request #3454 from pipecat-ai/mb/external-turn-strategies-timeout
fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrat…
2026-01-14 20:15:31 -05:00
Mark Backman
f3993f1775 fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrategies 2026-01-14 20:10:56 -05:00
Paul Kompfner
e107902b14 Do a consistency pass on how we're sending UserStartedSpeakingFrames and UserStoppedSpeakingFrames. The codebase is now consistent in broadcasting both types of frames up and downstream. 2026-01-14 18:47:15 -05:00
kompfner
e7b5ff49f4 Merge pull request #3447 from pipecat-ai/pk/add-pr-3420-to-changelog
Add PR 3420 to CHANGELOG (it was missing)
2026-01-14 15:33:44 -05:00
Paul Kompfner
e33172c44e Add PR 3420 to CHANGELOG (it was missing) 2026-01-14 15:33:07 -05:00
Mark Backman
3d858e8aa6 Merge pull request #3444 from pipecat-ai/mb/update-quickstart-0.0.99
Update quickstart example for 0.0.99
2026-01-14 10:29:55 -05:00
Mark Backman
eab059c49a Merge pull request #3446 from pipecat-ai/mb/add-3392-changelog
Add PR 3392 to changelog, linting cleanup
2026-01-14 10:28:57 -05:00
Mark Backman
4aaff04fb3 Add PR 3392 to changelog, linting cleanup 2026-01-14 09:43:17 -05:00
Mark Backman
cb364f3cab Update quickstart example for 0.0.99 2026-01-14 08:59:20 -05:00
Mark Backman
a9bfb090c3 Merge pull request #3287 from ashotbagh/feature/asyncai-multicontext-wss
Fix TTFB metric and add multi-context WebSocket support for Async TTS
2026-01-14 07:52:52 -05:00
Ashot
c4ae4025f3 Adjustments of Async TTS for multicontext websocket support 2026-01-14 16:33:30 +04:00
Ashot
15067c678d adapt Async TTS to updated AudioContextTTSService 2026-01-14 15:45:27 +04:00
Ashot
5ae592f38e Improve Async TTS interruption handling by using AudioContextTTSService class and add changelog fragments 2026-01-14 15:45:27 +04:00
Ashot
9cdbc56be3 Fix TTFB metric and add multi-context WebSocket support for Async TTS 2026-01-14 15:45:27 +04:00
Aleix Conchillo Flaqué
86ed485711 Merge pull request #3440 from pipecat-ai/changelog-0.0.99
Release 0.0.99 - Changelog Update
2026-01-13 17:02:41 -08:00
Aleix Conchillo Flaqué
7e1b4a4e90 update cosmetic changelog updates for 0.0.99 2026-01-13 16:59:46 -08:00
aconchillo
4531d517da Update changelog for version 0.0.99 2026-01-14 00:49:15 +00:00
Aleix Conchillo Flaqué
6fd5847f84 Merge pull request #3439 from pipecat-ai/aleix/uv-lock-2026-01-13
uv.lock: upgrade to latest versions
2026-01-13 16:48:07 -08:00
Aleix Conchillo Flaqué
2015eba9b2 uv.lock: upgrade to latest versions 2026-01-13 16:45:44 -08:00
Mark Backman
84f16ee895 Merge pull request #3438 from pipecat-ai/mb/fix-26a
Fix 26a foundational
2026-01-13 19:43:50 -05:00
Aleix Conchillo Flaqué
5b2af03b16 Merge pull request #3437 from pipecat-ai/aleix/update-aggregator-logs
LLMContextAggregatorPair: make strategy logs less verbose
2026-01-13 16:39:29 -08:00
Mark Backman
b313395dc3 Fix 26a foundational 2026-01-13 19:31:24 -05:00
Aleix Conchillo Flaqué
0d6bdbee10 LLMContextAggregatorPair: make strategy logs less verbose 2026-01-13 15:11:22 -08:00
Aleix Conchillo Flaqué
248dac3a9d Merge pull request #3420 from pipecat-ai/pk/fix-gemini-3-parallel-function-calls
Fix parallel function calling with Gemini 3.
2026-01-13 14:40:33 -08:00
Paul Kompfner
be49a54856 Fast-exit in the fix for parallel function calling with Gemini 3, if we can determine up-front that there's no work to do 2026-01-13 17:32:20 -05:00
Aleix Conchillo Flaqué
bd9ee0d646 Merge pull request #3434 from pipecat-ai/aleix/context-appregator-pair-tuple
context aggregator pair tuple
2026-01-13 14:12:51 -08:00
Mark Backman
442e0e582d Merge pull request #3431 from pipecat-ai/mb/update-realtime-examples-transcript-handler
Update GeminiLiveLLMService to push thought frames, update 26a for new transcript events
2026-01-13 17:10:40 -05:00
kompfner
38194c0cff Merge pull request #3436 from pipecat-ai/pk/remove-transcript-processor-reference
Remove dead import of `TranscriptProcessor` (which is now deprecated)
2026-01-13 17:06:17 -05:00
Paul Kompfner
0ebdaba03c Remove dead import of TranscriptProcessor (which is now deprecated) 2026-01-13 17:02:57 -05:00
Aleix Conchillo Flaqué
ee82377d68 examples: fix 22d to push some CancelFrame and EndFrame 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
861588e4a3 examples: update all examples to use the new LLMContextAggregatorPair tuple 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
1ab3bf2ef6 LLMContextAggregatorPair: instances can now return a tuple 2026-01-13 14:01:53 -08:00
Mark Backman
bb00d223c9 Update 26a to use context aggregator transcription events 2026-01-13 17:01:10 -05:00
Aleix Conchillo Flaqué
86fbfaddd1 Merge pull request #3435 from pipecat-ai/aleix/fix-llm-context-create-audio-message
LLMContext: fix create_audio_message
2026-01-13 13:59:28 -08:00
Aleix Conchillo Flaqué
5612bf513b LLMContext: fix create_audio_message 2026-01-13 13:53:34 -08:00
Mark Backman
87d0dc9e24 Merge pull request #3412 from pipecat-ai/mb/remove-41a-b
Remove foundational examples 41a and 41b
2026-01-13 16:45:26 -05:00
Paul Kompfner
30fbcfbf71 Rework fix for parallel function calling with Gemini 3 2026-01-13 16:33:59 -05:00
Mark Backman
5d90f4ea06 Merge pull request #3428 from pipecat-ai/mb/fix-tracing-none-values
Fix TTS, realtime LLM services could return unknown for model_name
2026-01-13 15:40:10 -05:00
kompfner
f6d09e1574 Merge pull request #3430 from pipecat-ai/pk/request-image-frame-fixes
Fix request_image_frame and usage
2026-01-13 15:36:44 -05:00
Mark Backman
b8e48dee7f Merge pull request #3433 from pipecat-ai/mb/port-realtime-examples-transcript-events
Update examples to use transcription events from context aggregators
2026-01-13 15:36:06 -05:00
Mark Backman
a6ccb9ec69 Merge pull request #3427 from pipecat-ai/mb/add-07j-gladia-vad-example
Add 07j Gladia VAD foundational example, add to release evals
2026-01-13 15:35:24 -05:00
Mark Backman
66551ebdf5 Merge pull request #3426 from pipecat-ai/mb/changelog-3404
Add changelog fragments for PR 3404
2026-01-13 15:34:58 -05:00
Aleix Conchillo Flaqué
21534f7d83 added changelog file for #3430 2026-01-13 12:21:22 -08:00
Mark Backman
d591f9e108 Remove 28-transcription-processor.py 2026-01-13 15:20:59 -05:00
Mark Backman
aa2589d3be Update examples to use transcription events from context aggregators 2026-01-13 15:19:47 -05:00
Aleix Conchillo Flaqué
9d6067fa78 examples(foundational): speak "Let me check on that" in 14d examples 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
027e54425a examples(foundational): associate image requests to function calls 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
e268c73c41 LLMAssistantAggregator: cache function call requested images 2026-01-13 12:10:08 -08:00
Aleix Conchillo Flaqué
d3c57e2da0 UserImageRawFrame: don't deprecate request field 2026-01-13 11:56:13 -08:00
Aleix Conchillo Flaqué
02eace5a16 UserImageRequestFrame: don't deprecate function call related fields 2026-01-13 11:55:55 -08:00
Mark Backman
15bc1dd999 Update GeminiLiveLLMService to push Thought frames when thought content is returned 2026-01-13 14:13:00 -05:00
Paul Kompfner
b937956dc8 Fix request_image_frame and usage 2026-01-13 13:23:01 -05:00
Mark Backman
efbc0c8510 Fix TTS, realtime LLM services could return unknown for model_name 2026-01-13 12:12:15 -05:00
Himanshu Gunwant
d0f227189c fix: openai llm model name is unknown (#3422) 2026-01-13 11:55:52 -05:00
Mark Backman
41eef5efc4 Add 07j Gladia VAD foundational example, add to release evals 2026-01-13 11:36:15 -05:00
Mark Backman
f00f9d9f1a Add changelog fragments for PR 3404 2026-01-13 11:29:17 -05:00
Mark Backman
ae59b3ba36 Merge pull request #3404 from poseneror/feature/gladia-vad-events
feat(gladia): add VAD events support
2026-01-13 11:26:56 -05:00
Paul Kompfner
6668712f7b Add evals for parallel function calling 2026-01-13 11:03:38 -05:00
Paul Kompfner
8812686b17 Fix parallel function calling with Gemini 3.
Gemini expects parallel function calls to be passed in as a single multi-part `Content` block. This is important because only one of the function calls in a batch of parallel function calls gets a thought signature—if they're passed in as separate `Content` blocks, there'd be one or more missing thought signatures, which would result in a Gemini error.
2026-01-13 11:03:38 -05:00
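The grouping described above can be sketched with plain dictionaries. This is an illustration only: it does not use the Gemini SDK types, the message shape (`role`, `parts`, `function_call`, `thought_signature`) is assumed, and `merge_parallel_function_calls` is a hypothetical helper rather than the code in this commit. It also assumes that back-to-back function-call-only model messages belong to one parallel batch.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def is_function_call_only(msg: Message) -> bool:
    """True for a model message whose parts are all function calls."""
    parts = msg.get("parts", [])
    return (
        msg.get("role") == "model"
        and bool(parts)
        and all("function_call" in part for part in parts)
    )


def merge_parallel_function_calls(messages: List[Message]) -> List[Message]:
    """Fold consecutive function-call-only model messages into one block."""
    merged: List[Message] = []
    for msg in messages:
        if merged and is_function_call_only(msg) and is_function_call_only(merged[-1]):
            # Same batch: append the parts to the previous block.
            merged[-1]["parts"].extend(msg["parts"])
        else:
            # Copy the parts list so the input messages are not mutated.
            merged.append({**msg, "parts": list(msg.get("parts", []))})
    return merged


history = [
    {"role": "user", "parts": [{"text": "Weather in Paris and Rome?"}]},
    {"role": "model", "parts": [{"function_call": {"name": "get_weather", "args": {"city": "Paris"}}, "thought_signature": "sig"}]},
    {"role": "model", "parts": [{"function_call": {"name": "get_weather", "args": {"city": "Rome"}}}]},
]
result = merge_parallel_function_calls(history)
```

After merging, the single model block carries both calls, so the one thought signature in the batch travels with all of them.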
kompfner
8b0f0b5bb4 Merge pull request #3425 from pipecat-ai/pk/gemini-3-flash-new-thinking-levels
Add Gemini 3 Flash-specific thinking levels
2026-01-13 11:02:53 -05:00
Paul Kompfner
f5e8a04e3b Bump aiortc dependency, which relaxes the constraint on av, which was pinned to 14.4.0, which no longer has all necessary wheels 2026-01-13 10:50:08 -05:00
Mark Backman
a298ce3b41 Merge pull request #3424 from pipecat-ai/mb/tts-append-trailing-space
Add append_trailing_space to TTSService to prevent vocalizing trailin…
2026-01-13 10:42:40 -05:00
Mark Backman
31daa889e8 Add append_trailing_space to TTSService to prevent vocalizing trailing punctuation; update DeepgramTTSService and RimeTTSService to use the arg 2026-01-13 10:38:54 -05:00
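A minimal sketch of what a trailing-space option could do to the text before it is sent for synthesis. `prepare_tts_text` is a hypothetical helper; only the `append_trailing_space` name comes from the commit message, and the real TTSService logic may differ.

```python
def prepare_tts_text(text: str, append_trailing_space: bool = False) -> str:
    """Optionally end the text with a space so a final '.' is not read aloud."""
    if append_trailing_space and text and not text.endswith(" "):
        return text + " "
    return text
```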
Paul Kompfner
76a058178e Add Gemini 3 Flash-specific thinking levels 2026-01-13 09:50:59 -05:00
poseneror
3304b18ac2 Add should_interrupt + broadcast user events 2026-01-13 14:27:35 +02:00
poseneror
b95a6afe77 feat(gladia): add VAD events support
Add support for Gladia's speech_start/speech_end events to emit
UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames.

When enable_vad=True in GladiaInputParams:
- speech_start triggers interruption and pushes UserStartedSpeakingFrame
- speech_end pushes UserStoppedSpeakingFrame
- Tracks speaking state to prevent duplicate events

This allows using Gladia's built-in VAD instead of a separate VAD
in the pipeline.
2026-01-13 14:27:35 +02:00
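The duplicate-event guard described above can be sketched as a small state machine. `SpeakingStateTracker` is a hypothetical class, and frame names are returned as strings so the example does not depend on Pipecat; the real service pushes frame objects and also triggers an interruption on `speech_start`.

```python
from typing import Optional


class SpeakingStateTracker:
    """Maps speech_start/speech_end events to frame names, skipping repeats."""

    def __init__(self) -> None:
        self._is_speaking = False

    def on_event(self, event_type: str) -> Optional[str]:
        if event_type == "speech_start" and not self._is_speaking:
            self._is_speaking = True
            return "UserStartedSpeakingFrame"
        if event_type == "speech_end" and self._is_speaking:
            self._is_speaking = False
            return "UserStoppedSpeakingFrame"
        # Duplicate or unrelated event: nothing to emit.
        return None


tracker = SpeakingStateTracker()
events = ["speech_start", "speech_start", "speech_end", "speech_end"]
emitted = [tracker.on_event(event) for event in events]
```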
Mark Backman
f6ed7d7582 Merge pull request #3418 from pipecat-ai/mb/speechmatics-task-cleanup 2026-01-12 19:24:56 -05:00
Mark Backman
cd3290df1c Small cleanup for task creation in SpeechmaticsSTTService 2026-01-12 16:00:32 -05:00
Mark Backman
2296caf529 Merge pull request #3414 from pipecat-ai/mb/changelog-3410
Update changelog for PR 3410.changed.md
2026-01-12 13:43:42 -05:00
Mark Backman
90ded6658d Merge pull request #3403 from pipecat-ai/mb/inworld-tts-add-keepalive
InworldTTSService: Add keepalive task
2026-01-12 13:31:24 -05:00
Mark Backman
7e97fb80a5 Merge pull request #3392 from pipecat-ai/mb/websocket-service-connection-closed-error
Add reconnect logic to WebsocketService in the event of ConnectionClo…
2026-01-12 13:11:43 -05:00
Mark Backman
b58471fdb1 Add Exotel and Vonage to Serializers in README services list 2026-01-12 12:24:56 -05:00
Aleix Conchillo Flaqué
46b4f9f29b Merge pull request #3413 from pipecat-ai/aleix/fix-assistant-thought-aggregation
LLMAssistantAggregator: reset aggregation after adding the thought, not before
2026-01-12 09:21:42 -08:00
Aleix Conchillo Flaqué
ec20d72aba LLMAssistantAggregator: reset aggregation after adding the thought, not before 2026-01-12 09:18:13 -08:00
Mark Backman
5743e2a99b Update changelog for PR 3410.changed.md 2026-01-12 12:15:40 -05:00
Mark Backman
2f429a2e76 Merge pull request #3410 from Vonage/feat/fastapi-ws-vonage-serializer
feat: update FastAPI WebSocket transport and add Vonage serializer
2026-01-12 12:10:57 -05:00
Varun Pratap Singh
3e982f7a4a refactor: rename audio_packet_bytes to fixed_audio_packet_size 2026-01-12 22:11:39 +05:30
Mark Backman
89484e281d Remove foundational examples 41a and 41b 2026-01-12 10:11:58 -05:00
Varun Pratap Singh
14a115f372 changelog: add fragments for PR #3410 2026-01-12 18:12:27 +05:30
Varun Pratap Singh
e96595fe59 feat: update FastAPI WebSocket transport and add Vonage serializer 2026-01-12 17:50:38 +05:30
Mark Backman
f58d21862b WebsocketService: Add _maybe_try_reconnect and use for exception cases 2026-01-11 16:43:37 -05:00
Om Chauhan
38506f51f7 fix(openrouter): handle multiple system messages for Gemini models 2026-01-11 21:19:47 +05:30
Mark Backman
aac24ad2d4 InworldTTSService: Add keepalive task 2026-01-10 11:20:20 -05:00
Aleix Conchillo Flaqué
1df9575e20 Merge pull request #3400 from pipecat-ai/aleix/ensure-bot-speaking-flag-is-set
BaseOutputTransport: ensure bot speaking flag is set on time
2026-01-10 07:34:26 -08:00
Aleix Conchillo Flaqué
64609fe80f BaseOutputTransport: ensure bot speaking flag is set on time 2026-01-09 20:40:25 -08:00
Aleix Conchillo Flaqué
533a54e111 Merge pull request #3399 from pipecat-ai/aleix/groq-switch-orpheus
GroqTTSService: switch to canopylabs/orpheus-v1-english
2026-01-09 20:39:56 -08:00
Aleix Conchillo Flaqué
b59c3eb470 GroqTTSService: switch to canopylabs/orpheus-v1-english 2026-01-09 18:14:48 -08:00
Aleix Conchillo Flaqué
0366fc35cb Merge pull request #3398 from pipecat-ai/aleix/examples-foundational-fix-49c-transport
examples(foundational): add missing transport.output() to 49c
2026-01-09 17:39:54 -08:00
Aleix Conchillo Flaqué
d86ff4b1ee Merge pull request #3397 from pipecat-ai/aleix/add-setup-pipeline-task
PipelineTask: add external pipeline task setup files
2026-01-09 17:39:25 -08:00
Aleix Conchillo Flaqué
f8040324e1 Merge pull request #3396 from pipecat-ai/aleix/add-pipeline-task-pipeline-property
PipelineTask: add pipeline property
2026-01-09 17:38:51 -08:00
Mark Backman
9c81acb159 Track websocket disconnecting status to improve error handling 2026-01-09 20:24:07 -05:00
Aleix Conchillo Flaqué
65395b1112 examples(foundational): add missing transport.output() to 49c 2026-01-09 16:44:04 -08:00
Aleix Conchillo Flaqué
d2696be03b PipelineTask: add external pipeline task setup files 2026-01-09 16:42:27 -08:00
Aleix Conchillo Flaqué
2da4d420f9 PipelineTask: add pipeline property 2026-01-09 15:47:02 -08:00
Aleix Conchillo Flaqué
a992f95c02 clarify changelog with #3343 fix 2026-01-09 10:37:16 -08:00
Aleix Conchillo Flaqué
edd8e07df6 update changelog with #3343 fix 2026-01-09 10:31:29 -08:00
Aleix Conchillo Flaqué
c813d43da0 Merge pull request #3343 from omChauhanDev/fix/auto-resolve-function-result
fix: keeping the Aggregator and Service states synchronized.
2026-01-09 10:04:20 -08:00
Aleix Conchillo Flaqué
c973445ab7 Merge pull request #3385 from pipecat-ai/aleix/context-aggregator-turn-stop-messages
user and assistant aggregator turn events
2026-01-09 09:52:48 -08:00
Aleix Conchillo Flaqué
25f6ba76d6 add start timestamp to user and assistant turn messages 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
8f47c569f9 examples(foundational): add 28-user-assistant-turns.py 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
c16801e524 examples(foundational): update 49 series with on_assistant_thought 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
dafcd0448f added changelog for new assistant turn events 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
24a52375c7 tests: added LLMAssistantAggregator unit tests 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
5f9e95038e BaseObject: improve logging messages 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
5cbb21afb2 deprecate TranscriptProcessor and related dataclasses and frames 2026-01-09 09:50:21 -08:00
Aleix Conchillo Flaqué
119fab2996 LLMAssistantAggregator: allow thought aggregation without appending to context 2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
38d354c4ed LLMAssistantAggregator: add assistant turn and thought events 2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
cdb1074e11 LLMAssistantAggregator: no need to use BotStoppedSpeakingFrame
The end of turn is already handled with interruptions or with
LLMFullResponseEndFrame. LLMFullResponseEndFrame should never be blocked,
otherwise the assistant would not work.
2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
4b61fd2d7d LLMUserAggregator: add user turn stopped message argument
It is now possible to get the user aggregation when an `on_user_turn_stopped`
event is emitted.
2026-01-09 09:42:41 -08:00
Aleix Conchillo Flaqué
5a0a5c120b Merge pull request #3394 from pipecat-ai/aleix/base-smart-turn-update-vad-start-secs
smartturn: rename on_vad_start_secs_updated to update_vad_start_secs
2026-01-09 09:40:29 -08:00
Aleix Conchillo Flaqué
d92926ae54 smartturn: rename on_vad_start_secs_updated to update_vad_start_secs 2026-01-09 09:34:15 -08:00
Aleix Conchillo Flaqué
b34af5da24 Merge pull request #3372 from pipecat-ai/aleix/add-user-turn-controller-processor
add new UserTurnController and UserTurnProcessor
2026-01-09 09:29:10 -08:00
Aleix Conchillo Flaqué
5da1f86575 scripts: add 53-concurrent-llm-evaluation.py to release evals 2026-01-09 09:26:38 -08:00
Aleix Conchillo Flaqué
b0185e3539 tests: improve LLMUserAggregator tests 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
7232da6ba1 tests: added unit tests for UserTurnProcessor 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
9dff75cd44 examples: add 53-concurrent-llm-evaluation.py 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
6038860be0 tests: added unit tests for UserTurnController 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
4653de9f03 tests: rename test_bot_turn_start_strategy to test_user_turn_stop_strategy 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
fef79651ef turns: add UserTurnProcessor for advanced pipeline user turn management 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
3d54ca0a7c LLMUserAggregator: use UserTurnController for user turn management 2026-01-09 09:21:28 -08:00
Aleix Conchillo Flaqué
199986815c turns: add UserTurnController for user turn management 2026-01-09 09:21:28 -08:00
Filipi da Silva Fuchter
0a3c00f68b Merge pull request #3391 from pipecat-ai/filipi/krisp_followup_improvements
Krisp VIVA follow-up improvements
2026-01-09 10:39:23 -05:00
marcus-daily
3e2467eb71 Fixing ruff formatting 2026-01-09 15:07:13 +00:00
marcus-daily
c4cc476c3d Updating changelog 2026-01-09 15:07:13 +00:00
marcus-daily
cc6ff1ac54 Reverting quickstart to match main 2026-01-09 15:07:13 +00:00
marcus-daily
b075502c4c Addressing code review comments 2026-01-09 15:07:13 +00:00
marcus-daily
35a99f92ab Take into account VAD start_secs when passing audio data to Smart Turn, and add an extra 500ms of pre-speech audio for good measure 2026-01-09 15:07:13 +00:00
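A sketch of the arithmetic this commit describes: the amount of buffered audio to hand to Smart Turn covers the VAD `start_secs` window plus 500 ms of padding. `pre_speech_bytes` and its parameters are hypothetical, and 16-bit mono PCM is an assumption.

```python
PRE_SPEECH_PADDING_SECS = 0.5  # the extra 500 ms mentioned in the commit


def pre_speech_bytes(
    vad_start_secs: float,
    sample_rate: int,
    sample_width: int = 2,  # assumed 16-bit PCM
    channels: int = 1,      # assumed mono
) -> int:
    """Bytes of audio to keep from before the VAD reported speech."""
    samples = round((vad_start_secs + PRE_SPEECH_PADDING_SECS) * sample_rate)
    return samples * sample_width * channels
```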
Mark Backman
4fe0836cf9 Add reconnect logic to WebsocketService in the event of ConnectionClosedError 2026-01-09 09:03:01 -05:00
filipi87
8b7cc65ae6 Mentioning the Krisp Viva improvements in the changelog. 2026-01-09 10:43:01 -03:00
filipi87
4d495ba74f Fixing ruff format. 2026-01-09 10:32:36 -03:00
filipi87
de5de0b162 Fixed KrispVivaTurn to properly release the Krisp SDK. 2026-01-09 10:31:17 -03:00
filipi87
311da30802 Updating the Krisp Viva example to use Krisp turn model. 2026-01-09 10:19:13 -03:00
Garegin Harutyunyan
16819a5caa Krisp VIVA SDK Filter and Turn support. (#3261)
* Krisp VIVA SDK Filter and Turn support.

* Reverted the krisp_filter.py as it's already deprecated.

* enabled test with krisp_audio mock.

* More review comment fixes.
reverted the state logic in viva filter to be similar to the existing impl on main branch.
Fixed tests, ruff, etc.

* More review comments for Turn detection.
removed integration tests.

* Moved the SDK init/deinit into start/stop
2026-01-09 08:15:08 -05:00
Mark Backman
72a44c2fcd Merge pull request #3386 from pipecat-ai/mb/deepgram-deprecate-vad-events
Deprecate support for vad_events in DeepgramSTTService
2026-01-09 07:56:03 -05:00
Mark Backman
7783b20b91 Merge pull request #3390 from dhruvladia-sarvam/update/sarvam-plugins 2026-01-09 07:11:13 -05:00
dhruvladia-sarvam
962ccbc0d7 fix 2026-01-09 14:26:28 +05:30
Mark Backman
4d61c5d7b2 Deprecate support for vad_events in DeepgramSTTService 2026-01-08 20:32:30 -05:00
Mark Backman
7ca4597ade Merge pull request #3379 from lukepayyapilli/fix/fastapi-websocket-json-text-handling
Fix FastAPIWebsocketTransport to handle both binary and text messages
2026-01-08 17:26:35 -05:00
Luke Payyapilli
f1a22728ab Add websocket extra to coverage workflow 2026-01-08 17:13:31 -05:00
Luke Payyapilli
ca88fc849f Add websocket extra to CI for FastAPI test coverage 2026-01-08 17:09:27 -05:00
Luke Payyapilli
ccd795445f Fix protobuf serializer test to compare attributes instead of frame objects 2026-01-08 17:00:40 -05:00
Luke Payyapilli
1874269a48 Remove FrameSerializerType enum and type property from serializers 2026-01-08 16:54:23 -05:00
Mark Backman
8b20373a8e Merge pull request #3380 from pipecat-ai/mb/changelog-3366
Add changelog fragment for PR 3366
2026-01-08 14:49:47 -05:00
Aleix Conchillo Flaqué
15dcb77a0c Merge pull request #3364 from pipecat-ai/rajneesh/add-daily-sip-provider-option
Add support for specifying sip provider.
2026-01-08 11:47:09 -08:00
Mark Backman
5d2fac9cd7 Add changelog fragment for PR 3366 2026-01-08 14:43:23 -05:00
Mark Backman
682b253760 Merge pull request #3366 from lukepayyapilli/fix/cartesia-allow-none-language
Allow language=None in CartesiaTTSService for auto-detection
2026-01-08 14:42:09 -05:00
Luke Payyapilli
f440de82e2 Handle None language in _process_word_timestamps_for_language 2026-01-08 13:59:21 -05:00
Mark Backman
5e0e6822c7 Merge pull request #3360 from pipecat-ai/mb/openai-realtime-send-image
Add video input (e.g. image input) support for OpenAI Realtime
2026-01-08 13:26:35 -05:00
Mark Backman
2aadac7a4d Update OpenAIRealtime image to video to align with GeminiLive 2026-01-08 13:23:08 -05:00
Filipi da Silva Fuchter
1098394486 Merge pull request #3374 from pipecat-ai/filipi/external_turn_controllers_interruptions
External turn controllers improvements
2026-01-08 13:05:41 -05:00
Mark Backman
b90a34228f Update 19c to remove pausing audio and input 2026-01-08 13:00:45 -05:00
Mark Backman
8bf8ebd34b Remove start_audio_paused from OpenAI Realtime demos, and others 2026-01-08 13:00:45 -05:00
Mark Backman
673d88417c Change Gemini Live and OpenAI Realtime logging to trace when sending a video frame 2026-01-08 13:00:45 -05:00
Mark Backman
3a7b489208 Add foundational 19c and add to evals 2026-01-08 13:00:45 -05:00
Mark Backman
7ae9eebc34 Add image input support for OpenAI Realtime 2026-01-08 13:00:44 -05:00
Mark Backman
8f83ba5878 Merge pull request #3376 from dhruvladia-sarvam/update/sarvam-plugins
headers update
2026-01-08 12:57:32 -05:00
filipi87
b8af3fa214 Improving should_interrupt docs for Speechmatics. 2026-01-08 14:53:29 -03:00
dhruvladia-sarvam
5ddec4f596 fix 2026-01-08 23:07:40 +05:30
dhruvladia-sarvam
8f4b4f4941 fix 2026-01-08 23:04:43 +05:30
dhruvladia-sarvam
953349f262 fix 2026-01-08 22:53:59 +05:30
Luke Payyapilli
b52ae0e56b Fix FastAPIWebsocketTransport to handle both binary and text messages 2026-01-08 11:25:18 -05:00
dhruvladia-sarvam
893b448534 headers update 2026-01-08 21:09:41 +05:30
Mark Backman
973769b8bc Merge pull request #3370 from pipecat-ai/mb/fix-azure-tts
AzureTTSService cleanup
2026-01-08 09:34:02 -05:00
filipi87
c8fa9d34e1 Adding a changelog entry for the new should_interrupt property. 2026-01-08 10:58:07 -03:00
filipi87
3069deb92f Allows defining whether Speechmatics should send an interruption when the user’s turn has started. 2026-01-08 10:50:33 -03:00
filipi87
68c9c01747 Allows defining whether Flux should send an interruption when the user’s turn has started. 2026-01-08 10:44:53 -03:00
filipi87
5e8f0baa12 Allows defining whether Deepgram should send an interruption when the user’s turn has started. 2026-01-08 10:36:52 -03:00
Mark Backman
8d1286cc00 Merge pull request #3371 from speechmatics/fix/voice-version-bump 2026-01-08 07:21:43 -05:00
Aleix Conchillo Flaqué
bda4dd339a Merge pull request #3373 from pipecat-ai/aleix/update-copyright-notices-2026
update examples and tests copyright and use a proper dash in 2024-2026
2026-01-07 20:36:40 -08:00
rajneeshksoni
f2e3034d24 Add support for specifying sip provider.
optional "provider" field in the RoomSipParams
2026-01-08 09:05:56 +05:30
Aleix Conchillo Flaqué
2626154a64 update examples and tests copyright and use a proper dash in 2024-2026 2026-01-07 19:32:22 -08:00
Sam Sykes
b770b2a419 Changelog 2026-01-07 16:56:56 -08:00
Sam Sykes
158c34b0f9 version bump 2026-01-07 16:54:53 -08:00
Mark Backman
d507c88d3e Merge pull request #3369 from pipecat-ai/mb/copyright-2026
Update copyright date range to 2024-2026
2026-01-07 17:07:05 -05:00
Mark Backman
98f70b775f Update copyright date range to 2024-2026 2026-01-07 16:58:13 -05:00
Mark Backman
54f4b824e4 Merge pull request #3356 from pipecat-ai/mb/gemini-live-user-transcript-timeout
Add timeout for handling user transcript messages
2026-01-07 16:47:23 -05:00
Mark Backman
2aa5307f0a Add _push_user_transcription to unify the logic to push user transcripts from a single utility function 2026-01-07 16:43:48 -05:00
Mark Backman
6c10d6ef8a Merge pull request #3367 from pipecat-ai/marcus/smart-turn-v3.2
Updated Smart Turn model weights to v3.2
2026-01-07 16:37:53 -05:00
Mark Backman
89b36f2b25 AzureTTSService: Restore metrics generation 2026-01-07 16:33:52 -05:00
Mark Backman
79a6adbcf3 AzureTTSService: Handle first chunk only for timestamps and TTFB metrics 2026-01-07 16:15:01 -05:00
Mark Backman
95f00a3c4b AzureTTSService: Align error handling with Pipecat norms 2026-01-07 15:45:30 -05:00
Mark Backman
3f8373f76f AzureTTSService: prevent word timestamp carryover on interruption 2026-01-07 15:39:37 -05:00
Mark Backman
23a9d3f4d7 Merge pull request #3334 from obata-kotobasamurai/fix/azure-tts-word-timestamp
Add word-level timestamp support to Azure TTS with race condition fix
2026-01-07 14:48:02 -05:00
Mark Backman
333279f45a Merge pull request #3328 from speechmatics/fix/speectmatics-vad
Update to SpeechmaticsSTTService for `0.0.99`
2026-01-07 14:42:21 -05:00
yukiobata1
add5f51201 updated azure tts.py file 2026-01-08 03:14:37 +09:00
marcus-daily
d1bedef5b3 Updated Smart Turn model weights to v3.2 2026-01-07 17:23:11 +00:00
Mark Backman
54cf0116a8 Merge pull request #3363 from pipecat-ai/mb/update-audo-context-inheritance
Update AudioContextTTSService to inherit from WebsocketTTSService
2026-01-07 12:08:47 -05:00
Luke Payyapilli
6b252fb46e Allow language=None in CartesiaTTSService for auto-detection 2026-01-07 11:50:21 -05:00
Sam Sykes
3e00a16f0f Remove unused import and correction to docs. 2026-01-07 07:45:26 -08:00
Sam Sykes
ecfd93544a Correction to UserStartedSpeakingFrame timing. 2026-01-07 07:43:47 -08:00
Sam Sykes
3ec89e49bf Added changelog for split_sentences and code tidy for end of turn handling. 2026-01-07 07:41:49 -08:00
Mark Backman
8762506e9f Update AudioContextTTSService to inherit from WebsocketTTSService 2026-01-07 09:10:27 -05:00
yukiobata1
7204bf9914 added changelog 2026-01-07 13:32:31 +09:00
yukiobata1
f62c262f23 Call start_word_timestamps() when the first audio chunk arrives 2026-01-07 13:10:41 +09:00
Mark Backman
10aa784809 Merge pull request #3351 from okue/fix/stt-model-name-attribute
Fix STT model name attribute retrieval in tracing decorator
2026-01-06 16:10:56 -05:00
Filipi da Silva Fuchter
904f5dc183 Merge pull request #3338 from omChauhanDev/fix/smallwebrtc-mute-timeout-spam
fix(smallwebrtc): suppress timeout warnings when tracks are disabled
2026-01-06 09:07:52 -05:00
Mark Backman
c61a5e7173 Merge pull request #3346 from pipecat-ai/mb/cartesia-pronunciation-dict
Cartesia TTS: Add support for pronunciation_dict_id
2026-01-06 08:52:09 -05:00
Filipi da Silva Fuchter
81b28beef5 Merge pull request #3357 from pipecat-ai/filipi/live_avatar
Added support for using the HeyGen LiveAvatar API with the HeyGenTransport
2026-01-06 08:22:39 -05:00
filipi87
0d34356678 Adding a changelog entry for the HeyGen LiveAvatar API change. 2026-01-06 10:19:19 -03:00
filipi87
5412840a93 Added support for using the HeyGen LiveAvatar API with the HeyGenTransport. 2026-01-06 10:16:12 -03:00
yukiobata1
137bbb3d2c updated tts.py to match mark's version 2026-01-06 21:16:13 +09:00
Mark Backman
5a40054ac2 Merge pull request #3216 from mayurdd/patch-1
Adding include_language_detection param to Elevenlabs Realtime STT
2026-01-05 17:01:02 -05:00
Mark Backman
be621fbc5c Add timeout for handling user transcript messages 2026-01-05 16:58:14 -05:00
Mark Backman
9ab4836601 Merge pull request #3323 from pipecat-ai/mb/changelog-3322
Add changelog fragment for PR #3322
2026-01-05 16:55:52 -05:00
mayurdd
4671102833 Addressing the comments 2026-01-05 13:35:37 -08:00
Mayur Sirwani
67401a275b Adding include_language_detection to Elevenlabs Realtime STT
Adding a param to the config while connecting to the session
2026-01-05 13:27:27 -08:00
Mark Backman
c422588071 Merge pull request #3345 from pipecat-ai/mb/avoid-tts-dot
Add trailing space to DeepgramTTSService text generation
2026-01-05 15:56:14 -05:00
kompfner
fb12fec899 Merge pull request #3354 from pipecat-ai/pk/fix-aws-nova-sonic-example-for-nova-2-sonic
Fix the 20e example to use the proper conversation-start pattern for …
2026-01-05 11:17:57 -05:00
Paul Kompfner
c53c49558f Fix the 20e example to use the proper conversation-start pattern for the Nova 2 Sonic model 2026-01-05 10:56:08 -05:00
okue
1a26a2daa4 Fix STT model name attribute retrieval in tracing decorator
Changed getattr with default value to use 'or' operator for fallback.
This ensures proper model name retrieval when model_name attribute exists but is None or empty.
2026-01-05 17:20:48 +09:00
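The difference between the two fallbacks can be shown with a stand-in object. `FakeSTTService` and the `"unknown"` fallback string are assumptions for illustration; the tracing decorator itself is not reproduced here.

```python
class FakeSTTService:
    """Stand-in for a service whose model_name exists but is unset."""

    model_name = None


service = FakeSTTService()

# getattr's default is used only when the attribute is missing,
# so an existing None value passes straight through.
with_default = getattr(service, "model_name", "unknown")

# `or` also falls back when the attribute exists but is None or empty.
with_or = getattr(service, "model_name", None) or "unknown"
```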
Mark Backman
d8be1282b5 Cartesia TTS: Add support for pronunciation_dict_id 2026-01-04 09:30:04 -05:00
Mark Backman
91bc5236b5 Add trailing space to DeepgramTTSService text generation 2026-01-04 08:53:48 -05:00
Om Chauhan
1ceb01665f fix: treat language as first-class STT setting 2026-01-04 11:04:30 +05:30
Om Chauhan
b278957111 fix: broadcast FunctionCallResultFrame, on implicit return 2026-01-03 19:52:25 +05:30
Mark Backman
1c80c739d6 Merge pull request #3335 from pipecat-ai/mb/update-evals-07-variants
Add 07 example variants to release evals
2026-01-02 15:32:12 -05:00
Om Chauhan
700a94222b fix(smallwebrtc): suppress timeout warnings when tracks are disabled 2026-01-01 22:00:08 +05:30
Sam Sykes
d5d2156689 Updated changelog. 2025-12-31 19:07:11 +00:00
Sam Sykes
8203ad08a8 Updated to have default as FIXED for Pipecat VAD. 2025-12-31 19:05:29 +00:00
Mark Backman
31907b90f0 Add 07 example variants to release evals 2025-12-31 09:11:00 -05:00
Mark Backman
7b595f10ce Merge pull request #3329 from omChauhanDev/deepgram-tts-validation
added encoding validation in DeepgramTTSService
2025-12-31 08:20:40 -05:00
yukiobata1
4f93d331b7 Added await to self.start_word_timestamps() 2025-12-31 19:19:21 +09:00
yukiobata1
32c6dccebe Add word-level timestamp support to Azure TTS with cumulative PTS fix
This commit adds word boundary support to AzureTTSService and fixes
the race condition that causes scrambled TTS output across multiple
sentences.

## Features Added

- Change AzureTTSService to inherit from WordTTSService
- Subscribe to Azure SDK's synthesis_word_boundary event
- Emit word-level text with timing information via _words_queue
- Add synthesis lock for sequential sentence processing

## Race Condition Fix

Previously, each sentence's word boundary timestamps reset to 0,
causing downstream components to interleave words when reordering
frames by PTS. This resulted in scrambled output like:
  'Hello ! I What am questions AI have assistant...'

The fix adds cumulative audio offset tracking to ensure monotonically
increasing PTS across all sentences:
  Sentence 1: pts = 0.1s, 0.5s, 0.8s (cumulative at end: 0.8s)
  Sentence 2: pts = 0.9s, 1.2s, 1.5s (0.8s + relative offset)

## Key Changes

- _cumulative_audio_offset: tracks total audio duration
- _handle_word_boundary: adds cumulative offset to timestamps
- _handle_completed: accumulates audio duration for next sentence
- flush_audio: resets cumulative offset at end of LLM response
- _handle_interruption: resets state on user interruption
- run_tts: uses synthesis lock for sequential processing
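
The cumulative-offset idea described above can be sketched as follows. This is a minimal illustration, not the actual AzureTTSService code: the class `CumulativeOffsetTracker` and its method names are hypothetical and only mirror the roles of `_handle_word_boundary`, `_handle_completed`, and the reset done on flush or interruption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CumulativeOffsetTracker:
    """Hypothetical helper that keeps word PTS values increasing across sentences."""

    cumulative_offset: float = 0.0  # total audio duration of completed sentences (seconds)
    words: List[Tuple[str, float]] = field(default_factory=list)

    def handle_word_boundary(self, word: str, relative_seconds: float) -> float:
        # Each sentence reports offsets starting at 0, so shift them by the
        # audio already produced by earlier sentences.
        pts = self.cumulative_offset + relative_seconds
        self.words.append((word, pts))
        return pts

    def handle_completed(self, sentence_duration: float) -> None:
        # Accumulate this sentence's duration for the next sentence.
        self.cumulative_offset += sentence_duration

    def reset(self) -> None:
        # Assumed to run at the end of an LLM response or on user interruption.
        self.cumulative_offset = 0.0
        self.words.clear()


tracker = CumulativeOffsetTracker()
for word, offset in [("Hello", 0.1), ("there", 0.5), ("!", 0.8)]:
    tracker.handle_word_boundary(word, offset)
tracker.handle_completed(0.8)
for word, offset in [("I", 0.1), ("am", 0.4), ("here", 0.7)]:
    tracker.handle_word_boundary(word, offset)

for word, pts in tracker.words:
    print(f"{word}: {pts:.1f}s")
```

With the second sentence shifted by the 0.8s already played, its words land after the first sentence's words instead of interleaving with them when frames are reordered by PTS.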

Fixes #2918

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 18:49:48 +09:00
Aleix Conchillo Flaqué
cbdc2b7d2d Merge pull request #3330 from pipecat-ai/aleix/update-turn-start-strategies-deprecations
update turn start strategies deprecations
2025-12-30 21:04:47 -08:00
Aleix Conchillo Flaqué
66a9dc70c7 LLMUserAggregator: fix turn strategies renaming 2025-12-30 20:59:48 -08:00
Aleix Conchillo Flaqué
846ca500d3 turns: update old turn_start_strategies deprecations 2025-12-30 19:50:10 -08:00
Om Chauhan
bd6afd445d added changelog 2025-12-31 09:18:18 +05:30
Om Chauhan
0663bbc2fb added encoding validation in DeepgramTTSService 2025-12-31 08:33:17 +05:30
Sam Sykes
8e7a951af8 updated changelog 2025-12-31 01:36:58 +00:00
Sam Sykes
ba1aeb8f7f Changelog 2025-12-31 01:31:46 +00:00
Sam Sykes
f7c74cfa80 Updated VAD 2025-12-31 01:28:31 +00:00
Mark Backman
2e700c8576 Merge pull request #3324 from pipecat-ai/mb/bump-small-webrtc-prebuilt-version
Bump small-webrtc-prebuilt version to 2.0.4, update uv.lock
2025-12-30 20:10:11 -05:00
Aleix Conchillo Flaqué
fd2efb3b3a Merge pull request #3325 from pipecat-ai/aleix/rename-bot-turn-start-to-user-turn-stop
turns: rename bot turn start to user turn stop strategies
2025-12-30 14:36:02 -08:00
Aleix Conchillo Flaqué
eb5a797b12 turns: rename bot turn start to user turn stop strategies 2025-12-30 14:33:58 -08:00
Mark Backman
f4626a4fc4 Bump small-webrtc-prebuilt version to 2.0.4, update uv.lock 2025-12-30 14:19:20 -05:00
Aleix Conchillo Flaqué
fb9a772e33 Merge pull request #3319 from pipecat-ai/aleix/openaillmcontext-backwards-compatibility
BaseInputTransport: fix OpenAILLMContext backwards compatibility
2025-12-30 09:35:43 -08:00
Aleix Conchillo Flaqué
4630e76942 ExternalUserTurnStartStrategy: disable interruptions 2025-12-30 09:32:31 -08:00
Aleix Conchillo Flaqué
4dba9ea329 BaseInputTransport: fix OpenAILLMContext backwards compatibility 2025-12-30 09:32:31 -08:00
Mark Backman
233bc23bf9 Merge pull request #3320 from joshwhiton/fix-changelog-numba-pin
Fix numba pin wording in changelog
2025-12-30 08:50:06 -05:00
Mark Backman
e0b40a330f Add changelog fragment for PR #3322 2025-12-30 08:40:28 -05:00
Mark Backman
9c6d0f1be1 Merge pull request #3322 from gui217/fix/rnnoise_filter_handle_empty_audio
Fix/rnnoise filter handle empty audio
2025-12-30 08:39:42 -05:00
gui217
32c3298eff Apply ruff formatting to test file 2025-12-30 13:39:36 +02:00
gui217
ec5fb392c4 Clean up test comments after rnnoise filter fix 2025-12-30 13:35:13 +02:00
gui217
bad8f8aa51 Fix rnnoise filter to handle empty audio 2025-12-30 13:32:36 +02:00
joshwhiton
6a7b6bcded Fix numba pin wording in changelog 2025-12-30 11:26:22 +07:00
Aleix Conchillo Flaqué
00548769cb Merge pull request #3318 from pipecat-ai/aleix/llm-user-aggregator-allow-interruptions
LLMUserAggregator: also read deprecated allow_interruptions
2025-12-29 18:11:57 -08:00
Aleix Conchillo Flaqué
0a0ab51cc7 LLMUserAggregator: also read deprecated allow_interruptions 2025-12-29 17:57:18 -08:00
Mark Backman
8339c2c2c7 Merge pull request #3317 from pipecat-ai/mb/add-changelog-other
Add 'other' changelog category
2025-12-29 20:46:18 -05:00
Aleix Conchillo Flaqué
ad4c22cf44 Merge pull request #3316 from pipecat-ai/aleix/llm-user-aggreagtor-enable-interruptions
turns(user): add support for enabling/disabling interruptions
2025-12-29 17:45:56 -08:00
Mark Backman
8ac6421988 Add 'other' changelog category 2025-12-29 20:43:24 -05:00
Aleix Conchillo Flaqué
9fe99ed880 add and update changelog entries 2025-12-29 17:35:10 -08:00
Aleix Conchillo Flaqué
97ab0d4f53 examples: added 52-live-translation without interruptions 2025-12-29 17:30:06 -08:00
Aleix Conchillo Flaqué
ffbbb1b3f5 turns(user): add support for enabling/disabling interruptions 2025-12-29 17:00:03 -08:00
Aleix Conchillo Flaqué
e22a6c9e4d Merge pull request #3305 from omChauhanDev/fix_unregister_function
fix: missing key access by adding existence check
2025-12-29 14:52:41 -08:00
Aleix Conchillo Flaqué
09e79149ea Merge pull request #3310 from omChauhanDev/fix-task-manager
fix: preserve asyncio.Task return value in create_task
2025-12-29 14:43:24 -08:00
Aleix Conchillo Flaqué
c799d63f8c Merge pull request #3308 from pipecat-ai/aleix/external-turn-start-strategies
turns: add external user and bot turn start strategies
2025-12-29 14:42:38 -08:00
Aleix Conchillo Flaqué
bd9a316d7a transports: don't use interruptions_allowed to avoid deprecation warning 2025-12-29 14:40:00 -08:00
Aleix Conchillo Flaqué
c8f47b4b22 turns: add UserTurnStartedParams and BotTurnStartedParams 2025-12-29 14:32:08 -08:00
Aleix Conchillo Flaqué
cf46431d92 update changelog file 2025-12-29 10:30:41 -08:00
Mark Backman
c28ed2206c DeepgramSTTService pushes user started/stopped speaking and interruption frames 2025-12-29 10:17:35 -08:00
Mark Backman
30e6a33930 Update VoicemailDetector to use ExternalTurnStartStrategies 2025-12-29 10:17:35 -08:00
Aleix Conchillo Flaqué
46db8e58d6 LLMUserAggregator: fix backwards compatibility with ExternalTurnStartStrategies 2025-12-29 10:17:35 -08:00
Aleix Conchillo Flaqué
e757b4bf6f tests: added external user and bot turn start strategies 2025-12-29 10:17:35 -08:00
Aleix Conchillo Flaqué
c821e9f8fd turns: add external user and bot turn start strategies
External strategies are strategies where the logic for user turn start and turn
end comes from a different processor (e.g. an STT).
2025-12-29 10:17:35 -08:00
Mark Backman
01ce06c756 Merge pull request #3288 from pipecat-ai/mb/inworld-cleanup
Inworld TTS service clean up
2025-12-29 13:07:20 -05:00
Mark Backman
4bc490c843 Merge pull request #3289 from pipecat-ai/mb/audio-context-tts-service-base
Add AudioContextTTSService base class, update AudioContextWordTTSServ…
2025-12-29 13:05:06 -05:00
Mark Backman
345885fe7d Merge pull request #3271 from pipecat-ai/mb/changelog-3268
Update fragment name for 3268
2025-12-29 13:04:03 -05:00
Mark Backman
6475077fc8 Merge pull request #3313 from pipecat-ai/mb/ultravox-s2s-readme
Update Ultravox README link
2025-12-29 13:03:39 -05:00
Mark Backman
d646ca594b Update Ultravox README link 2025-12-29 11:43:28 -05:00
Mark Backman
7c0d897aa3 Merge pull request #3300 from omChauhanDev/nvidia-expose-use_ssl-param
exposed use_ssl param in nvidia services
2025-12-29 09:18:26 -05:00
Aleix Conchillo Flaqué
0e8e3afc85 Merge pull request #3307 from pipecat-ai/aleix/simplify-turns-package-imports
turns: simplify imports and don't require full strategy module path
2025-12-28 18:51:23 -08:00
Aleix Conchillo Flaqué
db85043841 Merge pull request #3297 from pipecat-ai/aleix/deprecate-allow-interruptions
deprecate allow interruptions
2025-12-28 18:50:15 -08:00
Om Chauhan
a181e01310 fixed: create_task to return coroutine result 2025-12-29 07:46:15 +05:30
Aleix Conchillo Flaqué
5496aa722f turns: simplify imports and don't require full strategy module path 2025-12-28 16:20:15 -08:00
Aleix Conchillo Flaqué
053f59ed6e FrameProcessor: deprecated interruptions_allowed 2025-12-28 08:27:02 -08:00
Aleix Conchillo Flaqué
5b93fb9609 PipelineTask: deprecate allow_interruptions parameter 2025-12-28 08:27:02 -08:00
Aleix Conchillo Flaqué
192ede6e34 Merge pull request #3298 from pipecat-ai/aleix/push-user-started-speaking-first
push UserStartedSpeakingFrame before interruption
2025-12-28 08:24:50 -08:00
Aleix Conchillo Flaqué
956f004424 Merge pull request #3296 from pipecat-ai/aleix/move-turn-start-strategies-to-aggregator
LLMUserAggregator: move turn_start_strategies from PipelineTask
2025-12-28 08:19:23 -08:00
Aleix Conchillo Flaqué
8b861d9143 LLMUserAggregator: move turn_start_strategies from PipelineTask 2025-12-28 08:16:34 -08:00
Aleix Conchillo Flaqué
e5bd55d1d5 Merge pull request #3292 from pipecat-ai/aleix/initial-user-mute-strategies
initial user mute strategies
2025-12-28 08:14:48 -08:00
Aleix Conchillo Flaqué
094d9fd7d7 turns(mute): make strategies available in __init__ 2025-12-28 08:12:44 -08:00
Aleix Conchillo Flaqué
c7589663b5 deprecate STTMuteFilter in favor of LLMUserAggregator user mute strategies 2025-12-28 08:12:44 -08:00
Om Chauhan
0f144f48cb fix: missing key access by adding existence check 2025-12-28 10:28:37 +05:30
Aleix Conchillo Flaqué
a962c4eeba STTMuteFilter: use FunctionCallsStartedFrame and support multiple function calls 2025-12-27 13:52:30 -08:00
Aleix Conchillo Flaqué
43fc26cf0e tests: add user mute strategies tests to user aggregator 2025-12-27 13:49:31 -08:00
Aleix Conchillo Flaqué
53b450c1d1 added changelog entry for user mute strategies 2025-12-27 13:49:31 -08:00
Aleix Conchillo Flaqué
0efa36a04e examples(foundational): added 24-user-mute-strategy.py example 2025-12-27 13:49:31 -08:00
Om Chauhan
edc7db22b6 renamed changelog 2025-12-26 22:21:24 +05:30
Om Chauhan
2c2317de5d added changelog 2025-12-25 20:23:27 +05:30
Om Chauhan
604384b3ce exposed use_ssl param 2025-12-25 20:09:42 +05:30
Aleix Conchillo Flaqué
260b7e7959 push UserStartedSpeakingFrame before interruption 2025-12-24 15:33:44 -08:00
Aleix Conchillo Flaqué
0abaae2f07 LLMUserAggregator: no need to reset strategies
Turn start strategies are already reset when triggered, so there's no need to
reset them again.
2025-12-24 11:08:43 -08:00
Aleix Conchillo Flaqué
30922d365f minor turn start strategies cleanup 2025-12-24 11:08:43 -08:00
Aleix Conchillo Flaqué
c33c8d2195 LLMUserAggregator: add support for user mute strategies 2025-12-24 11:08:43 -08:00
Aleix Conchillo Flaqué
5a4236bc71 tests: add user mute strategy tests 2025-12-24 11:08:43 -08:00
Aleix Conchillo Flaqué
1d70275574 initial user mute strategies 2025-12-24 11:08:43 -08:00
Aleix Conchillo Flaqué
ee35ea0966 Merge pull request #3291 from pipecat-ai/aleix/llm-user-aggregator-timeout
LLMUserAggregator bot turn start strategies timeout fallback
2025-12-23 18:34:57 -08:00
Aleix Conchillo Flaqué
ffb5895404 tests: add initial tests for universal LLMUserAggregator 2025-12-23 15:51:06 -08:00
Aleix Conchillo Flaqué
1f0357ae5e LLMUserAggregator: add bot turn start strategies timeout fallback 2025-12-23 15:42:57 -08:00
Mark Backman
44a698cbcc Add AudioContextTTSService base class, update AudioContextWordTTSService inheritance 2025-12-23 10:36:17 -05:00
Mark Backman
74ab68cb58 Add changelog fragment 2025-12-23 10:15:50 -05:00
Mark Backman
5038ebf205 Clean up _receive_messages to use WebsocketService class 2025-12-23 09:44:21 -05:00
Mark Backman
1da215f576 Inworld TTS service clean up 2025-12-23 09:24:29 -05:00
Aleix Conchillo Flaqué
40493e8ce8 Merge pull request #3286 from pipecat-ai/aleix/improve-turn-analyzer-bot-turn-start-strategy
improve turn analyzer bot turn start strategy
2025-12-22 21:46:48 -08:00
Aleix Conchillo Flaqué
4017bfa769 LLMUserAggregator: improve turn_analyzer warning 2025-12-22 21:44:49 -08:00
Aleix Conchillo Flaqué
480a9d092c TurnAnalyzerBotTurnStartStrategy: make sure to use turn analyzer state 2025-12-22 16:29:48 -08:00
Aleix Conchillo Flaqué
b5fe1c9cd8 fix old interruption/speaking strategies docstrings 2025-12-22 16:19:25 -08:00
Mark Backman
49b53d72a9 Merge pull request #3276 from pipecat-ai/mb/grok-realtime-cleanup
GrokRealtimeLLMService cleanup
2025-12-22 18:13:23 -05:00
Aleix Conchillo Flaqué
ae9ee33af9 Merge pull request #3284 from pipecat-ai/aleix/min-words-bot-not-speaking
MinWordsUserTurnStartStrategy: single word interrupt if bot not speaking
2025-12-22 15:07:36 -08:00
Mark Backman
01466c19fc Merge pull request #3285 from pipecat-ai/mb/revert-changes-quickstat
Revert turn strategies changes to quickstart
2025-12-22 18:07:30 -05:00
Mark Backman
93689827e9 Revert turn strategies changes to quickstart 2025-12-22 18:05:05 -05:00
Aleix Conchillo Flaqué
a0d5ee3873 MinWordsUserTurnStartStrategy: single word interrupt if bot not speaking 2025-12-22 14:32:21 -08:00
Mark Backman
08a9b434c1 Merge pull request #3277 from pipecat-ai/mb/fix-deprecation-warning-LLMContextAssistantTimestampFrame
fix: Separate LLMContextAssistantTimestampFrame from OpenAILLMContext…
2025-12-22 13:51:26 -05:00
Mark Backman
2910b683a4 Fix STT services that rely on VAD stop speaking status to finalize the transcript (#3283)
Updates to AssemblyAISTTService, CartesiaSTTService, DeepgramSageMakerSTTService, DeepgramSTTService to use VADUser*SpeakingFrame
2025-12-22 12:54:06 -05:00
Mark Backman
0958c658db Merge pull request #3279 from pipecat-ai/mb/fix-11labs-realtime-stt-vad-speaking
fix: Update ElevenLabsRealtimeSTTService to use VADUser speaking frames
2025-12-22 12:11:18 -05:00
Mark Backman
00bb08bacc fix: Update ElevenLabsRealtimeSTTService to use VADUser speaking frames 2025-12-21 15:57:42 -05:00
Mark Backman
65f23adf4a fix: Separate LLMContextAssistantTimestampFrame from OpenAILLMContextAssistantTimestampFrame 2025-12-21 09:06:35 -05:00
Mark Backman
5ad8e5436d Add Grok Voice Agent to README services list 2025-12-20 08:11:41 -05:00
Mark Backman
845b4ad20e Add 51 foundational to evals 2025-12-20 08:07:25 -05:00
Mark Backman
32c4f914c4 Add event handling and class for response.function_call_arguments.delta 2025-12-20 08:06:39 -05:00
Mark Backman
348fa5a719 Improve SessionProperties initialization: remove voice from args, set default for TurnDetection 2025-12-20 08:02:48 -05:00
Mark Backman
0576783c5e Improve sample_rate handling in GrokRealtimeLLMService 2025-12-20 07:46:31 -05:00
Mrunmay Chichkhede
d7d979dde1 feat: Add GrokRealtimeLLMService for xAI Grok Voice Agent API (#3267) 2025-12-20 07:04:12 -05:00
Sam Sykes
76bae6e699 Update SpeechmaticsSTTService to use the python voice SDK 2025-12-19 19:59:18 -05:00
Mark Backman
f31416c5e4 Update fragment name for 3268 2025-12-19 17:55:10 -05:00
Aleix Conchillo Flaqué
5c779abad2 Merge pull request #3045 from pipecat-ai/aleix/redesign-interruption-strategies
introducing user and bot turn start strategies
2025-12-19 14:51:33 -08:00
Aleix Conchillo Flaqué
ec7a7ed048 add RNNoiseFilter to changelog and update pyrnnoise to 0.4.1 2025-12-19 14:48:06 -08:00
Aleix Conchillo Flaqué
bf791527dc update CHANGELOG for new user/bot turn start strategies 2025-12-19 14:48:06 -08:00
Aleix Conchillo Flaqué
5816f960cc LLMUserAggregator: add on_user_turn_started/on_bot_turn_started events 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
9bf6668b52 LLMUserAggregator: show error if using turn analyzer in transport 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
4a32aa5266 TurnAnalyzerBotTurnStartStrategy: don't use text on interim transcriptions 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
c9048d3a0f LLMUserAggregator: prevent consecutive user/bot turn starts 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
9e56d1ac65 TurnStartStrategies: set user and bot strategies defaults if None 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
d22e1f18bb examples: update with new user and bot turn start strategies 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
83263a30af llm_response: deprecate old LLMUserAggregatorParams and LLMAssistantAggregatorParams 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
169fc0b568 frames: deprecate emulated field in UserStartedSpeakingFrame/UserStoppedSpeakingFrame 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
a9cca0b934 LLMAssistantAggregatorParams: copy to llm_response_universal 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
dff6b5402a LLMUserAggregator: use TranscriptionUserTurnStartStrategy for emulated interruptions 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
2cdf207227 turns: add TranscriptionUserTurnStartStrategy 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
a388ff927c LLMUserAggregator: broadcast user started/stopped speaking frames 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
222ccbb471 SegmentedSTTService: use VAD user started/stopped speaking frames 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
49ebe34599 BaseInputTransport: broadcast SpeechControlParamsFrame 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
c4c4b4107b TurnAnalyzerBotTurnStartStrategy: broadcast SpeechControlParamsFrame 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
7e6b0839b0 examples(foundational): don't use legacy LLMUserAggregatorParams 2025-12-19 14:47:02 -08:00
Aleix Conchillo Flaqué
d33c72a8b0 LLMUserAggregator: allow external user started/stopped speaking frames 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
962eb73cc4 frames: deprecated EmulateUserStartedSpeakingFrame/EmulateUserStoppedSpeakingFrame 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
3d62b9c203 tests: added user turn start strategies unit tests 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
7e69288898 tests: added bot turn start strategies unit tests 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
76561da850 TranscriptionBotTurnStartStrategy: improve by using interim transcriptions 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
355fcf3282 BaseInputTransport: deprecate the use of turn analyzer in transport 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
359ac302f5 audio(interruptions): deprecate MinWordsInterruptionStrategy 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
223052e6e7 LLMUserAggregator: use new user and bot turn start strategies 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
0f6668d41b PipelineTask: pass turn start strategies to StartFrame 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
6a62c8d6da FrameProcessor: add user and bot turn start strategies 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
5dd3af25ac frames: add turn start strategies to StartFrame 2025-12-19 14:47:01 -08:00
Aleix Conchillo Flaqué
76c79a7dfa introduce new user and bot turn start strategies 2025-12-19 14:47:01 -08:00
Mark Backman
fac1a05eb5 Merge pull request #3268 from fixie-ai/mike/ttfb
Add ttfb tracking for Ultravox
2025-12-19 15:49:57 -05:00
kompfner
917c512aa8 Merge pull request #3263 from pipecat-ai/pk/deprecate-openai-llm-context
Deprecate `OpenAILLMContext` and associated things
2025-12-19 13:19:48 -05:00
Mike Depinet
5ec08ff1d8 Add ttfb tracking for Ultravox 2025-12-19 09:26:11 -08:00
Paul Kompfner
9b6f5853cf Deprecate OpenAILLMContext and associated things 2025-12-19 11:23:06 -05:00
Mark Backman
5e94b20562 Merge pull request #3233 from jaydamani/jay/improve-elevenlabs-services
Improve Elevenlabs realtime transcription service
2025-12-18 19:07:43 -05:00
Mark Backman
f6785de120 Merge pull request #3262 from pipecat-ai/mb/renumber-ultravox-foundational
Move Ultravox foundational example to 50, add to release evals
2025-12-18 14:31:46 -05:00
Mark Backman
56c58f7302 Move Ultravox foundational example to 50, add to release evals 2025-12-18 13:38:12 -05:00
Aleix Conchillo Flaqué
7f53483f6b Merge pull request #3257 from pipecat-ai/aleix/daily-transcriptions-track-type
add transport source to Daily transcriptions
2025-12-17 19:29:22 -08:00
Aleix Conchillo Flaqué
274db3e05c DailyTransport: add transport_source to transcription frames 2025-12-17 19:24:08 -08:00
Aleix Conchillo Flaqué
fb6c30156a pyproject: update daily-python to 0.23.0 2025-12-17 19:24:08 -08:00
Aleix Conchillo Flaqué
6c0e4be4ac Merge pull request #3205 from gui217/feat/rnnoise
Feat/rnnoise
2025-12-17 18:22:22 -08:00
Mark Backman
9623575b78 Merge pull request #3255 from pipecat-ai/mb/use-uv-ruff 2025-12-17 17:20:03 -05:00
Mark Backman
31b3bd737a Update linting scripts to use ruff version installed by uv 2025-12-17 16:31:14 -05:00
Aleix Conchillo Flaqué
f9fef78070 Merge pull request #3253 from pipecat-ai/changelog-0.0.98
Release 0.0.98 - Changelog Update
2025-12-17 11:22:35 -08:00
Aleix Conchillo Flaqué
92970c7873 changelog: add PR prefix to PR link 2025-12-17 14:20:34 -05:00
aconchillo
491d298c10 Update changelog for version 0.0.98 2025-12-17 11:16:03 -08:00
Aleix Conchillo Flaqué
c46a20328d changelog: fix 3230 entry 2025-12-17 11:06:57 -08:00
Aleix Conchillo Flaqué
7e4dbf42e8 Merge pull request #3252 from pipecat-ai/aleix/vision-response-frames
add vision response and text frames
2025-12-17 11:01:06 -08:00
Aleix Conchillo Flaqué
159e403ae4 MoondreamService: yield vision response and text frames 2025-12-17 10:42:08 -08:00
Aleix Conchillo Flaqué
d3d50ac580 frames: added vision response and text frames 2025-12-17 10:42:08 -08:00
jay
614d5e0d19 add changelog 2025-12-18 00:08:30 +05:30
jay
83a3295a39 update error handling based on code review 2025-12-18 00:03:47 +05:30
Aleix Conchillo Flaqué
e03e5f3a59 Merge pull request #3251 from pipecat-ai/aleix/more-evals-prompt-eng
scripts(evals): more eval prompts improvements
2025-12-17 10:29:50 -08:00
Mark Backman
65e4719cec Merge pull request #3250 from pipecat-ai/mb/add-pr-link-to-changelog-lines
Add PR link to the changelog line item
2025-12-17 12:58:48 -05:00
Aleix Conchillo Flaqué
d07b37b288 scripts(evals): more eval prompts improvements 2025-12-17 09:55:12 -08:00
Mark Backman
ca97d9dc4b Merge pull request #3249 from pipecat-ai/mb/cleanup-pipecat-version
Clean up use of pipecat version
2025-12-17 12:17:53 -05:00
Mark Backman
4c20483a7e Add PR link to the changelog line item 2025-12-17 12:12:05 -05:00
Mark Backman
6d84f36d05 Merge pull request #3214 from pipecat-ai/mb/update-run-inference
Update run_inference to use the provided LLM configuration params
2025-12-17 12:03:50 -05:00
Mark Backman
0b6e8f5bca Merge pull request #3246 from pipecat-ai/mb/changelog-3245
Add changelog fragment for PR 3245
2025-12-17 11:55:54 -05:00
Paul Kompfner
cdd6f5aa6a Fix Anthropic LLM's run_inference so that it works even when extended thinking is enabled 2025-12-17 11:55:46 -05:00
Mark Backman
f1a0d547ce Clean up use of pipecat version 2025-12-17 11:49:54 -05:00
mattie ruth backman
b1b7fc6357 Bump the RTVI version to 1.1.0 and add pipecat versioning to the botReady about field 2025-12-17 11:48:02 -05:00
Mark Backman
b3403e884d Merge pull request #3247 from pipecat-ai/mb/strip-whitespace-simple-text-agg
SimpleTextAggregator: Strip whitespace in the returned aggregation
2025-12-17 11:43:37 -05:00
Mark Backman
16e304016d SimpleTextAggregator: Strip whitespace in the returned aggregation 2025-12-17 11:33:39 -05:00
Mark Backman
21a55f6aae Update run_inference to use the provided LLM configuration params 2025-12-17 10:58:05 -05:00
Mark Backman
310df33de6 Add changelog fragment for PR 3245 2025-12-17 08:45:16 -05:00
Mark Backman
c8a86059fb Merge pull request #3245 from simopot/add-soniox-language-hints-strict
Add language_hints_strict parameter to SonioxSTTService
2025-12-17 08:43:25 -05:00
Mark Backman
c537d7bafb Merge pull request #3235 from pipecat-ai/mb/dev-runner-daily-pstn-dialin
Added Daily PSTN dial-in support to the development runner
2025-12-17 08:31:42 -05:00
Simo Potinkara
1fce68cef1 Add language_hints_strict parameter to SonioxSTTService
Add support for the language_hints_strict parameter in Soniox STT
configuration. When set to true, this parameter strictly enforces
language hints, restricting transcription to only the specified
languages.
2025-12-17 13:24:26 +02:00
Aleix Conchillo Flaqué
ecd9ec4ad2 Merge pull request #3241 from pipecat-ai/aleix/evals-remove-idle-timeout
evals remove idle timeout and prompt improvements
2025-12-16 18:04:52 -08:00
Aleix Conchillo Flaqué
db983cb693 BaseObject: log file and line number for uncaught exceptions 2025-12-16 17:29:14 -08:00
Aleix Conchillo Flaqué
5b30f1b1ef scripts(evals): improve prompts 2025-12-16 17:26:50 -08:00
Aleix Conchillo Flaqué
5f7dbfe775 scripts(evals): don't use on_idle_timeout 2025-12-16 17:26:42 -08:00
Aleix Conchillo Flaqué
2bb6ba59fc Merge pull request #3240 from pipecat-ai/aleix/cartesia-ensure-word-timestamps-started
WordTTSService: make sure word timestamps are always started
2025-12-16 14:02:55 -08:00
Aleix Conchillo Flaqué
ac7b06faba WordTTSService: make sure word timestamps are always started 2025-12-16 14:00:52 -08:00
Mark Backman
afa7573834 Merge pull request #3239 from pipecat-ai/mb/update-inworld-tts
Inworld TTS services: Add websocket TTS class, add word-timestamp ali…
2025-12-16 16:26:43 -05:00
Mark Backman
f2eb9eeb56 Merge pull request #3232 from pipecat-ai/mb/changelog-3230
Add changelog fragment for PR 3230
2025-12-16 16:23:17 -05:00
kompfner
9e49e09360 Merge pull request #3226 from pipecat-ai/filipi/elevenlabs_http_voice_settings
Fixed an issue where ElevenLabsHttpTTSService was not updating voice settings
2025-12-16 16:07:34 -05:00
kompfner
b5221cd2c1 Merge pull request #3234 from hwuiwon/hw/bugfix-llmcontext
Fix LLM context tool audio content handling
2025-12-16 16:04:16 -05:00
Hwuiwon Kim
796f3aeff3 fix 2025-12-16 15:56:08 -05:00
Mark Backman
de94790b94 Merge pull request #3236 from pipecat-ai/mb/websocket-stt-services
Update websocket STT services to use the WebsocketSTTService base class
2025-12-16 13:59:52 -05:00
Mark Backman
bd3bf9a00e Inworld TTS services: Add websocket TTS class, add word-timestamp alignment 2025-12-16 13:47:24 -05:00
kompfner
92f934031d Merge pull request #3224 from pipecat-ai/pk/simplify-gemini-thinking
Clean up logic related to applying Gemini thought signatures to conte…
2025-12-16 13:35:17 -05:00
Mark Backman
11b92d89d0 Add session ID to GladiaSTTService logs, reset bytes_sent counter 2025-12-16 10:06:16 -05:00
Mark Backman
0d1a122582 Add changelog for PR 3236 2025-12-16 09:48:47 -05:00
Mark Backman
24b5efb9d8 Update SonioxSTTService to use WebsocketSTTService 2025-12-16 09:46:35 -05:00
Mark Backman
eeb3b85e39 Update AWSTranscribeSTTService to use WebsocketSTTService 2025-12-16 09:37:31 -05:00
Mark Backman
8255770b6c Update AssemblyAISTTService to use WebsocketSTTService 2025-12-16 09:30:03 -05:00
Mark Backman
d3f918eb58 Update GladiaSTTService to use WebsocketSTTService 2025-12-16 09:20:53 -05:00
Mark Backman
36c6549426 Added Daily PSTN dial-in support to the development runner 2025-12-15 19:10:00 -05:00
Aleix Conchillo Flaqué
88d909d468 Merge pull request #3231 from pipecat-ai/aleix/improve-evals-assert-on-exit
evals: use EndFrame reason field to provide eval result
2025-12-15 13:23:29 -08:00
Aleix Conchillo Flaqué
21e346abe2 scripts(evals): improve eval prompts 2025-12-15 13:21:40 -08:00
Aleix Conchillo Flaqué
70a80847a7 scripts(evals): use future instead of a queue to store eval result 2025-12-15 13:21:28 -08:00
Hwuiwon Kim
27647fc067 Fix LLM context tool conversion and audio content handling 2025-12-15 13:43:57 -05:00
Mark Backman
85fe6d4c34 Add changelog fragment for PR 3230 2025-12-15 13:02:01 -05:00
Mark Backman
4cd971e4bd Merge pull request #3230 from kstonekuan/fix/smallwebrtcrequesthandler-return-type
Fix return type for SmallWebRTCRequestHandler.handle_web_request
2025-12-15 13:01:45 -05:00
jay
7e424d750e improve error handling to log all error types 2025-12-15 23:18:44 +05:30
jay
59c3abeb92 fix issue with infinite loop when websocket disconnects 2025-12-15 23:06:35 +05:30
Paul Kompfner
54926f390d Make image writing to and reading from LLMContext more robust; let's allow storing in context image types other than JPEG, meaning not lossily and unnecessarily re-encoding non-JPEG images as JPEG. 2025-12-15 10:39:36 -05:00
Kingston Kuan
50362ca37e Merge branch 'pipecat-ai:main' into fix/smallwebrtcrequesthandler-return-type 2025-12-15 16:41:59 +08:00
Aleix Conchillo Flaqué
a14c911fb2 scripts(evals): improve eval assertion on exit 2025-12-14 12:37:05 -08:00
Aleix Conchillo Flaqué
a5e42337a4 frames: EndFrame and CancelFrame reason is now Any 2025-12-14 12:16:14 -08:00
Aleix Conchillo Flaqué
4f848e9631 Merge pull request #3227 from fixie-ai/mike/upstream
Add Ultravox service
2025-12-13 18:29:02 -08:00
Kingston
93df7044fa fix return type for SmallWebRTCRequestHandler 2025-12-13 22:11:06 +08:00
Paul Kompfner
e604e9b490 Support conversations with Gemini 3 Pro Image (model "gemini-3-pro-image-preview").
Prior to this change, after the model generated an image the conversation would not be able to progress. It would stall out because we were never storing the image in context, so the model would never realize it already did the work of generating an image. We didn't run into issues with Gemini 2.5 Flash Image, because that model always followed up an image with a text message.
2025-12-12 18:20:17 -05:00
Mike Depinet
2e4fa3f8db PR comments
Also satisfy some Pyright complaints and update default model
2025-12-12 15:03:31 -08:00
Mark Backman
5f6448a8a4 Merge pull request #3228 from pipecat-ai/mb/gemini-live-update
Update GeminiLiveLLMService model to gemini-2.5-flash-native-audio-pr…
2025-12-12 14:32:45 -05:00
Mark Backman
6cda357ce8 Remove timestamp check from TestThoughtTranscription 2025-12-12 14:28:39 -05:00
Mark Backman
7e87f61d17 Update GeminiLiveLLMService model to gemini-2.5-flash-native-audio-preview-12-2025 2025-12-12 14:18:57 -05:00
Mike Depinet
ccdf83800b Rename changelog entries 2025-12-12 10:21:56 -08:00
Mike Depinet
4b81be7acf Add Ultravox service (#1)
Adds support for using Ultravox Realtime as a speech-to-speech service.

Also removes the deprecated Ultravox speech-to-text vllm model integration to avoid confusion.
2025-12-12 10:16:15 -08:00
Paul Kompfner
abc2ad8cbc Avoid printing out entire thought signatures in logs 2025-12-12 13:01:45 -05:00
Paul Kompfner
64471d65f8 Clean up logic related to applying Gemini thought signatures to context messages 2025-12-12 12:53:11 -05:00
Filipi Fuchter
3c4991a41f Mentioning the ElevenLabsHttpTTSService voice settings fix in the changelog. 2025-12-12 14:48:32 -03:00
Filipi Fuchter
71d6516a14 Fixed an issue where ElevenLabsHttpTTSService was not updating voice settings when receiving a TTSUpdateSettingsFrame. 2025-12-12 14:46:24 -03:00
Filipi da Silva Fuchter
22288648e6 Merge pull request #3210 from pipecat-ai/filipi/heygen_liveavatar
Adding support for the HeyGen LiveAvatar API
2025-12-12 09:19:58 -03:00
Filipi Fuchter
a6ee040d82 Adding the changelog mentioning the HeyGen changes. 2025-12-12 08:58:48 -03:00
Filipi Fuchter
87fc860cd5 Changing the HeyGenVideoService example to use the live avatar API. 2025-12-12 08:52:10 -03:00
Filipi Fuchter
b25ad21941 Refactoring HeyGenVideoService and HeyGenTransport to work with both APIs. 2025-12-12 08:51:35 -03:00
Filipi Fuchter
debcea3baa Adding the new HEYGEN_LIVE_AVATAR_API_KEY to the requested environment's variables. 2025-12-12 08:51:01 -03:00
Filipi Fuchter
c2abe42a64 Adding support for the HeyGen LiveAvatar API. 2025-12-12 08:49:52 -03:00
Filipi Fuchter
56dee06a29 Refactored the Interactive Avatar API to extend the HeyGen base API. 2025-12-12 08:49:16 -03:00
Filipi Fuchter
60cc14cafd Created HeyGen base API to support both Interactive Avatar and LiveAvatar. 2025-12-12 08:48:39 -03:00
kompfner
1e98094394 Merge pull request #3175 from pipecat-ai/pk/thinking-exploration
Additional functionality related to thinking, for Google and Anthropic LLMs.
2025-12-11 17:15:37 -05:00
Paul Kompfner
ccdd6cde52 Fix a couple of typos in comments 2025-12-11 17:05:09 -05:00
Paul Kompfner
12979293ad Add thinking examples to eval suite 2025-12-11 15:58:48 -05:00
Paul Kompfner
28248e9b00 Split up thinking examples so that there isn't an llm command-line arg for controlling which LLM to use. This change is preparation for adding these examples to our suite of evals. 2025-12-11 15:07:35 -05:00
Paul Kompfner
0e88ad672e Add ThoughtTranscriptionMessage.role, which is always "assistant" 2025-12-11 14:41:16 -05:00
kompfner
f41c3dcbc3 Merge pull request #3212 from pipecat-ai/pk/nova-2-sonic
Nova 2 Sonic support
2025-12-11 09:36:50 -05:00
Mark Backman
645e1802f8 Merge pull request #3219 from pipecat-ai/mb/deprecate-fal-smart-turn 2025-12-10 13:13:44 -05:00
Mark Backman
6636da682c Merge pull request #3085 from rimelabs/feature/rimeNonJsonTTsservice
Add RimeNonJsonTTSService for non-JSON WebSocket API support
2025-12-10 10:38:39 -05:00
Mark Backman
10a32c943f deprecate: FalSmartTurnAnalyzer and LocalSmartTurnAnalyzer 2025-12-10 08:14:28 -05:00
Gokul Js
455579ffcc Refactor RimeNonJsonTTSService to extend InterruptibleTTSService, removing dependency on WebsocketTTSService and streamlining audio interruption handling. 2025-12-10 04:56:52 +05:30
Paul Kompfner
c37da6ab78 In the AWS Nova Sonic example, shorten the simulated weather function call delay 2025-12-09 16:53:18 -05:00
Paul Kompfner
1892854516 In the AWS Nova Sonic example, send back "location" from the weather-fetching function to help the model associate a tool response with a tool call. If you interrupt the model while more than one function call is outbound, it can seemingly get confused about which tool result goes with which call. 2025-12-09 16:27:23 -05:00
Mark Backman
735e597bf2 Merge pull request #3209 from pipecat-ai/hush/07n-prompt
Update system prompt in Gemini example to be more instructive
2025-12-09 15:45:46 -05:00
Vanessa Pyne
52980a69c5 Merge pull request #3215 from pipecat-ai/vp-user-bot-latency-observer-internal-var-change
user-bot-latency log observer internal var change
2025-12-09 13:03:29 -06:00
vipyne
ff2f1dac82 user-bot-latency log observer internal var change 2025-12-09 12:34:38 -06:00
Paul Kompfner
3cbfbb997e Added CHANGELOG for AWS Nova 2 Sonic-related changes 2025-12-09 12:57:19 -05:00
Paul Kompfner
3e66cb50e0 Update AWS Nova Sonic example to showcase async tool calling 2025-12-09 12:44:21 -05:00
Paul Kompfner
b821dd2507 Fix a bug in AWSNovaSonicLLMService where we would mishandle cancelled tool calls in context 2025-12-09 12:12:55 -05:00
Paul Kompfner
0c5bccd1f1 Changes related to Nova 2 Sonic's support for the model speaking first 2025-12-09 11:55:23 -05:00
Paul Kompfner
926514ca18 Add support to AWSNovaSonicLLMService for new "endpointingSensitivity" parameter. 2025-12-09 11:26:43 -05:00
Paul Kompfner
ca5e668f4a Update AWSNovaSonicLLMService docstring with more (and more up-to-date) info 2025-12-09 10:14:27 -05:00
Paul Kompfner
53de6c0b9a Update list of supported regions in 40-aws-nova-sonic.py 2025-12-09 09:46:53 -05:00
Paul Kompfner
b22ac8292f Update default model in AWSNovaSonicLLMService to "amazon.nova-2-sonic-v1:0" 2025-12-09 09:38:47 -05:00
James Hush
83877ab1e6 Update system prompt in Gemini example to be more instructive
Changed the on_client_connected system message from a direct greeting to
an instruction that tells the AI to introduce itself, giving the LLM more
flexibility in how it starts the conversation.
2025-12-09 09:04:10 +01:00
gui217
1c0e25a90d fix unit tests 2025-12-09 09:56:20 +02:00
Gokul Js
2a6a0d83db Update docstring in RimeNonJsonTTSService to clarify the focus on the current plain text protocol and note potential future support for JSON WebSocket. 2025-12-09 02:49:37 +05:30
Gokul Js
6ca117a3c1 Remove unused import of 'language' in tts.py to clean up the code and improve readability. 2025-12-09 02:45:17 +05:30
Gokul Js
4fcb099fd7 Add RimeNonJsonTTSService to support non-JSON streaming mode, enabling WebSocket streaming for the Arcana model. 2025-12-09 02:43:57 +05:30
Paul Kompfner
c5ff5cc219 Update CHANGELOG 2025-12-08 16:09:59 -05:00
Aleix Conchillo Flaqué
88289f578a Merge pull request #3208 from pipecat-ai/thor/add-client-identification
add Gemini client identification
2025-12-08 13:05:04 -08:00
Paul Kompfner
229ff794d6 Better handle Gemini non-function thought signatures 2025-12-08 15:56:40 -05:00
Aleix Conchillo Flaqué
096db3eb6c Merge pull request #3207 from pipecat-ai/aleix/voicemail-conversation-detected-event
VoicemailDetector: add on_conversation_detected event
2025-12-08 11:59:45 -08:00
Aleix Conchillo Flaqué
cfd1cada8c VoicemailDetector: add on_conversation_detected event 2025-12-08 11:57:14 -08:00
Aleix Conchillo Flaqué
ee435b6f1e update CHANGELOG 2025-12-08 11:54:09 -08:00
Aleix Conchillo Flaqué
d289b38ba7 tests(google): mock the new pipecat.version() 2025-12-08 11:51:01 -08:00
Aleix Conchillo Flaqué
b0f63c3785 pipecat: add version() function 2025-12-08 11:51:01 -08:00
Paul Kompfner
1249ee3de3 Better handle Gemini non-function thought signatures 2025-12-08 13:07:25 -05:00
Vanessa Pyne
b09d8bd595 Merge pull request #3206 from pipecat-ai/vp-update-bot-latency-observer
use VADUserStarted/StoppedSpeakingFrame s in user_bot_latency_log_observer.py
2025-12-08 11:37:56 -06:00
vipyne
540a48b1b6 use VADUserStarted/StoppedSpeakingFrame s in user_bot_latency_log_observer.py 2025-12-08 11:37:31 -06:00
Paul Kompfner
aa0529ff82 Update comments for accuracy 2025-12-08 11:47:06 -05:00
Paul Kompfner
7e92597c0e Remove LLMThoughtSignatureFrame in favor of using the more generic LLMMessagesAppendFrame 2025-12-08 11:10:05 -05:00
Gokul Js
99f89351fa Add support for non-JSON streaming mode in RimeTTSService, enabling both JSON and raw audio WebSocket streaming for enhanced performance and flexibility. 2025-12-08 21:32:50 +05:30
Gokul Js
0b4d984be6 Standardize error handling in RimeNonJsonTTSService by replacing specific error messages with a generic "Unknown error occurred" format, enhancing consistency in error reporting. 2025-12-08 21:24:30 +05:30
Paul Kompfner
17203ba3e6 Change FunctionInProgressFrame.llm_specific_extra to a more generic FunctionInProgressFrame.append_extra_context_messages. 2025-12-08 10:50:19 -05:00
Gokul Js
924831089c Enhance error handling in RimeNonJsonTTSService by standardizing error messages for improved clarity and consistency in reporting. 2025-12-08 21:17:01 +05:30
Gokul Js
329b8ac426 Refactor error handling in RimeNonJsonTTSService to provide a more generic error message, improving clarity in error reporting. 2025-12-08 21:06:48 +05:30
Paul Kompfner
61674d7758 Add process_thought constructor argument to TranscriptProcessor to control whether to handle thoughts in addition to assistant utterances. Defaults to False. 2025-12-08 10:27:36 -05:00
Gokul Js
b9990811b5 Merge branch 'main' into feature/rimeNonJsonTTsservice 2025-12-08 20:54:01 +05:30
Paul Kompfner
8ccc2cbf31 Add unit tests for ThoughtTranscriptProcessor 2025-12-08 10:14:31 -05:00
Gokul Js
f4e33fc8dd Update docstrings in RimeNonJsonTTSService for clarity and consistency, specifying 'Non-JSON' in relevant descriptions. 2025-12-08 20:32:13 +05:30
Gokul Js
5bfea84bd5 Refactor RimeNonJsonTTSService to extend WebsocketTTSService, enhancing WebSocket functionality and improving code clarity 2025-12-08 20:30:46 +05:30
Paul Kompfner
ef703e9d16 Get rid of ThoughtTranscriptProcessor, moving its logic into AssistantTranscriptProcessor instead 2025-12-08 09:59:32 -05:00
Paul Kompfner
44aa11737b Minor docstring update for accuracy 2025-12-08 09:29:10 -05:00
Paul Kompfner
49f1f7d6a2 Added CHANGELOG entry describing new thinking-related functionality 2025-12-08 09:29:10 -05:00
Paul Kompfner
4ea51ff67c Slight refactor of handling thought-signature-containing special context messages in the Gemini adapter 2025-12-08 09:29:10 -05:00
Paul Kompfner
747bd4f737 Tweak the prompt of the thinking + functions example to not confuse Gemini as much (Gemini found the original prompt a bit ambiguous, it seems) 2025-12-08 09:29:10 -05:00
Paul Kompfner
15f5583fd2 Simplify, at the expense of a bit of not-yet-needed flexibility: rather than associating a loose thought_metadata with each thought, use a signature. Thought signatures are the only "thought metadata" we use today. 2025-12-08 09:29:10 -05:00
Paul Kompfner
c8c6f424cd Add support for Gemini 3 Pro non-function-call-related thought signatures 2025-12-08 09:29:10 -05:00
Paul Kompfner
0cdf0c4504 Bump Google GenAI library version to at least 1.51.0, as that's the version where thinking_level—required for controlling Gemini 3 Pro thinking—is introduced 2025-12-08 09:29:10 -05:00
Paul Kompfner
217f03b9cc Add additional functionality related to "thinking", for Google and Anthropic LLMs.
Thinking, sometimes called "extended thinking" or "reasoning", is an LLM process where the model takes some additional time before giving an answer. It's useful for complex tasks that may require some level of planning and structured, step-by-step reasoning. The model can output its thoughts (or thought summaries, depending on the model) in addition to the answer. The thoughts are usually pretty granular and not really suitable for being spoken out loud in a conversation, but can be useful for logging or prompt debugging.

Here's what's added:

1. New typed input parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries).
2. New frames for representing thoughts output by LLMs.
3. A generic mechanism for associating extra LLM-specific data with a function call in context, used specifically to support Google's function-call-related "thought signatures", which are necessary to ensure thinking continuity between function calls in a chain (where the model thinks, makes a function call, thinks some more, etc.)
4. A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages.
5. An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances.
2025-12-08 09:29:01 -05:00
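Item 5 above can be sketched in plain Python. This is a minimal illustration, not Pipecat's implementation: `SketchMessage` and `build_transcript` are hypothetical names, and the `(kind, text)` event tuples are an assumed input shape. The `process_thought` flag (off by default) and the "assistant" role for thoughts follow later commits in this log.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SketchMessage:
    """Hypothetical transcript entry; not a Pipecat class."""

    role: str
    content: str
    is_thought: bool = False


def build_transcript(
    events: List[Tuple[str, str]], process_thought: bool = False
) -> List[SketchMessage]:
    """Turn (kind, text) events into transcript messages.

    kind is one of "user", "assistant", or "thought" (assumed labels).
    Thoughts are dropped unless process_thought is True.
    """
    transcript: List[SketchMessage] = []
    for kind, text in events:
        if kind == "thought":
            if process_thought:
                # Thoughts are recorded under the "assistant" role.
                transcript.append(SketchMessage("assistant", text, is_thought=True))
            continue
        transcript.append(SketchMessage(kind, text))
    return transcript
```

Keeping thoughts out by default matches the note above that thoughts are usually too granular to surface in a conversation but remain useful for logging or prompt debugging.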
Gokul Js
12093fcffc Update default sample_rate parameter in RimeNonJsonTTSService to None for flexibility 2025-12-08 19:50:38 +05:30
Gokul Js
e5fb643cf5 Improve docstring formatting in RimeNonJsonTTSService for better readability 2025-12-08 19:45:13 +05:30
Mark Backman
4517475db7 Merge pull request #3197 from pipecat-ai/mb/cartesia-stt-cleanup
Clean up CartesiaSTTService
2025-12-08 08:53:40 -05:00
gui217
c48858742a clean up 2025-12-08 11:51:20 +02:00
gui217
90ef758522 align uv.lock 2025-12-08 11:40:08 +02:00
gui217
3974937352 align uv.lock 2025-12-08 11:39:40 +02:00
gui217
d64ab08bc4 chore: update uv.lock 2025-12-08 11:39:17 +02:00
gui217
6603ecfe29 chore: update uv.lock with pyrnnoise and restore revision 3 2025-12-08 11:39:10 +02:00
gui217
d3ae0b6a14 rebase 2025-12-08 11:36:44 +02:00
Aleix Conchillo Flaqué
92b6e8d66b Merge pull request #3189 from pipecat-ai/aleix/introduce-uninterruptible-frames
introduce uninterruptible frames
2025-12-07 14:02:35 -08:00
Aleix Conchillo Flaqué
3be1a7afaa Merge pull request #3202 from pipecat-ai/aleix/remove-manta
README: remove manta badge
2025-12-07 14:00:13 -08:00
thorwebdev
15df3c06e8 chore: add test. 2025-12-06 22:36:04 -05:00
Aleix Conchillo Flaqué
f0af0a6b96 README: remove manta badge 2025-12-05 16:16:19 -08:00
Mark Backman
4cefe1357c Merge pull request #3201 from pipecat-ai/changelog-0.0.97
Release 0.0.97 - Changelog Update
2025-12-05 18:49:15 -05:00
markbackman
4df0a9bf73 Update changelog for version 0.0.97 2025-12-05 18:47:21 -05:00
Mark Backman
9ef139d020 Merge pull request #3200 from pipecat-ai/mb/improve-changelog-template
Fix newlines between sections in changelog template
2025-12-05 18:42:52 -05:00
Mark Backman
9103d4ae05 Fix newlines between sections in changelog template 2025-12-05 18:40:49 -05:00
Aleix Conchillo Flaqué
bd63b6cefa Merge pull request #3198 from pipecat-ai/aleix/examples-14i-new-model
examples(foundational): update 14i-fireworks with new serverless model
2025-12-05 15:33:12 -08:00
Aleix Conchillo Flaqué
4d03270bc3 examples(foundational): update 14i-fireworks with new serverless model 2025-12-05 15:31:29 -08:00
Mark Backman
0debcee761 Clean up CartesiaSTTService 2025-12-05 18:12:11 -05:00
Mark Backman
6aee72c5b4 Merge pull request #3196 from pipecat-ai/mb/docs-cleanup-prep-0.0.97
Docs cleanup before 0.0.97 release
2025-12-05 15:16:36 -05:00
Mark Backman
8d62cfb1b6 Merge pull request #3195 from ivaaan/add-hume-header
Add tracking headers to Hume service
2025-12-05 14:50:18 -05:00
ivaaan
41214236ab add changelog 2025-12-05 20:47:04 +01:00
Mark Backman
b25963a63b Docs cleanup before 0.0.97 release 2025-12-05 14:19:26 -05:00
ivaaan
8c6ef21d84 add stop, cancel 2025-12-05 20:13:58 +01:00
thorwebdev
f729b1625b chore: move into services file. 2025-12-05 13:31:58 -05:00
ivaaan
0ffaa09c95 add tracking headers to Hume service 2025-12-05 19:00:47 +01:00
Aleix Conchillo Flaqué
f6e31b7e89 Merge pull request #3185 from pipecat-ai/fix/websocket-service-cancelled-error-handling
fix(websocket): handle CancelledError to prevent reconnection on shutdown
2025-12-05 09:25:49 -08:00
Aleix Conchillo Flaqué
49b2b12e04 frames: change function call frame base types 2025-12-05 09:22:29 -08:00
Aleix Conchillo Flaqué
7ad3969690 introduce UninterruptibleFrame frames 2025-12-05 09:21:36 -08:00
thorwebdev
af089a65ae feat: add Gemini client identification. 2025-12-05 12:06:28 -05:00
Aleix Conchillo Flaqué
48422dd442 WebsocketService: avoid reconnection on shutdown 2025-12-05 09:03:04 -08:00
Vanessa Pyne
fed6a8b669 Merge pull request #3187 from pipecat-ai/vp-mcp-filter-followup
add mcp filter example and changelog
2025-12-05 10:58:19 -06:00
vipyne
82e0253a62 add mcp filter example and changelog 2025-12-05 10:56:59 -06:00
Vanessa Pyne
a7f26dca60 Merge pull request #3152 from RuiDaniel/mcp_client_filters
Add filters to MCP Client
2025-12-05 10:50:27 -06:00
Vanessa Pyne
459ef27f3f Merge pull request #3079 from pipecat-ai/vp-add-exact-model-version-function
set full model name for base openai models
2025-12-05 10:48:53 -06:00
Mark Backman
464cfa5ccb Merge pull request #3188 from pipecat-ai/mb/improve-changelog-process
Auto-generate changelog from fragments
2025-12-05 11:42:25 -05:00
Mark Backman
9289881a80 Remove 3120.added.md 2025-12-05 11:35:50 -05:00
Mark Backman
34033cd454 Add new changelog entries 2025-12-05 11:35:50 -05:00
Mark Backman
47c21c9579 Delete README.md in changelog 2025-12-05 11:35:50 -05:00
Mark Backman
3b0bcf0b66 Validate fragment types match the expected types 2025-12-05 11:35:50 -05:00
Mark Backman
c4a8308027 Fail when no changelog fragments are available 2025-12-05 11:35:50 -05:00
Mark Backman
e9f76dcaf2 Set the date automatically when the workflow runs, leaving an optional override 2025-12-05 11:35:50 -05:00
Mark Backman
21b2229b2b Auto-generate changelog from fragments 2025-12-05 11:35:49 -05:00
Aleix Conchillo Flaqué
11aa9c9e68 update CHANGELOG, remove wait_for_all 2025-12-05 08:34:07 -08:00
Aleix Conchillo Flaqué
9f4680e9bd Merge pull request #3190 from pipecat-ai/aleix/no-need-wait-for-all
LLMService: let's not introduce wait_for_all for now
2025-12-05 08:31:44 -08:00
Aleix Conchillo Flaqué
04443a3820 LLMService: let's not introduce wait_for_all for now 2025-12-05 08:26:04 -08:00
Mark Backman
1571cc58ac Merge pull request #3192 from pipecat-ai/mb/cartesia-stt-timestamp
Add full transcript result for CartesiaSTTService
2025-12-05 10:37:06 -05:00
Mark Backman
dea80cf946 Add full transcript result for CartesiaSTTService 2025-12-05 10:25:46 -05:00
Mark Backman
91dec044c4 Merge pull request #3171 from LaurentMazare/gradium
Gradium integration.
2025-12-05 09:43:44 -05:00
laurent
8cf4267d87 Switch to a debug. 2025-12-05 15:37:17 +01:00
Mark Backman
0ee7cab6c6 Merge pull request #3184 from ashotbagh/feat/asyncai-multilingual-addons
Added new languages support for AsyncAI
2025-12-05 08:42:09 -05:00
Ashot
74c2039bfb Updated changelog. 2025-12-05 16:54:38 +04:00
Ashot
66088837cd Fixed default language issue in async tts 2025-12-05 16:51:05 +04:00
laurent
07ebf8534a Add the example. 2025-12-05 10:51:22 +01:00
laurent
fce4cfba15 Changelog update. 2025-12-05 10:46:01 +01:00
laurent
af52833ca0 Update the readme and env.example. 2025-12-05 10:44:30 +01:00
laurent
9fdf756375 Fix. 2025-12-05 10:38:35 +01:00
laurent
283bbb385c And remove the request-id. 2025-12-05 10:35:19 +01:00
laurent
8c6b2edb25 Various code review tweaks. 2025-12-05 10:33:48 +01:00
Laurent Mazare
6ab30f9b87 Apply suggestions from code review
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-12-05 10:25:47 +01:00
Aleix Conchillo Flaqué
3d93285bdf Merge pull request #3176 from pipecat-ai/aleix/exception-filename-line-number
log file name and line number when exception occurs
2025-12-04 11:08:32 -08:00
Aleix Conchillo Flaqué
7261cd28f2 log file name and line number when exception occurs 2025-12-04 11:06:45 -08:00
vipyne
33eeb8ce44 Use _full_model_name in llm trace if available 2025-12-04 11:54:45 -06:00
vipyne
ebda94ca98 set full model name for base openai models 2025-12-04 11:54:45 -06:00
Mark Backman
40b17cff8f Merge pull request #3186 from pipecat-ai/mb/11labs-fix-metrics-tracking
fix: ElevenLabsTTSService character usage metrics
2025-12-04 12:36:39 -05:00
marcus-daily
7ba0ebba11 Smart Turn analyzer now uses the full context of the turn rather than just the audio since VAD last triggered (fixes #3094) 2025-12-04 16:40:08 +00:00
Mark Backman
b39087027c fix: ElevenLabsTTSService character usage metrics 2025-12-04 09:41:18 -05:00
Ashot
e65974c870 Added new languages support for AsyncAI 2025-12-04 16:15:28 +04:00
marcus-daily
b1e5d68d97 Updating changelog 2025-12-04 11:32:16 +00:00
marcus-daily
39bca074d7 Smart Turn v3.1 2025-12-04 11:32:16 +00:00
Aleix Conchillo Flaqué
b5e79f9dc5 Merge pull request #3181 from pipecat-ai/aleix/sync-to-utils-sync
move pipecat.sync to pipecat.utils.sync
2025-12-03 19:41:18 -08:00
Aleix Conchillo Flaqué
613b96819f Merge pull request #3180 from pipecat-ai/aleix/deepgram-tts-service-fix
DeepgramTTSService: fix websocket header logging
2025-12-03 19:40:43 -08:00
Mark Backman
57c24670ea Merge pull request #3132 from pipecat-ai/mb/normalize-llm-text-frame-output
Add split_text_by_spaces string util, normalize aggregator input
2025-12-03 22:05:14 -05:00
Mark Backman
d79dd94019 Make aggregate return an AsyncIterator, other clean up 2025-12-03 22:00:34 -05:00
Mark Backman
fa8e7458e1 Clean up 2025-12-03 22:00:04 -05:00
Mark Backman
4d66191963 fix: PatternPairAggregator to process patterns only once 2025-12-03 22:00:04 -05:00
Mark Backman
7e9d67002e SkipTagsAggregator and PatternPairAggregator now subclass SimpleTextAggregator 2025-12-03 22:00:04 -05:00
Mark Backman
ffbb6e5937 Update SimpleTextAggregator to handle character by character input, use a buffer to handle ambiguous EOS scenarios, and add a flush method to all aggregators 2025-12-03 22:00:02 -05:00
Mark Backman
535b85cf90 Add split_text_by_spaces string util 2025-12-03 21:55:30 -05:00
Aleix Conchillo Flaqué
8dc9872ed5 deprecate pipecat.sync package 2025-12-03 18:44:41 -08:00
Aleix Conchillo Flaqué
f37a53cc25 utils(sync): move sync to utils.sync 2025-12-03 18:20:12 -08:00
Aleix Conchillo Flaqué
9cce28c64c DeepgramTTSService: use websocket response headers for logging 2025-12-03 18:16:25 -08:00
Aleix Conchillo Flaqué
3ca94363ec Merge pull request #3168 from pipecat-ai/aleix/dont-override-skip-tts
LLMTextFrame: don't override skip_tts
2025-12-03 18:15:50 -08:00
Rpcd
9dd882ecf8 Update src/pipecat/services/mcp_service.py
Co-authored-by: Vanessa Pyne <vipyne@gmail.com>
2025-12-03 17:28:37 +00:00
Rpcd
0bbb14eb9b Update src/pipecat/services/mcp_service.py
Co-authored-by: Vanessa Pyne <vipyne@gmail.com>
2025-12-03 17:28:29 +00:00
Mark Backman
050f287ec4 Merge pull request #3072 from jjmaldonis/deepgram/add-deepgram-request-ids-to-debug-logs
deepgram: added request IDs to debug logs
2025-12-03 09:37:25 -05:00
Jason Maldonis
e6f5561785 updated changelog 2025-12-03 08:18:09 -06:00
Jason Maldonis
2df91f4b37 fixed linting 2025-12-03 08:09:16 -06:00
Jason Maldonis
7db49b9067 deepgram: added request IDs to debug logs
Deepgram request IDs are necessary for investigating behavior at the
request level. This commit adds DEBUG logs that print Deepgram request
IDs when using Deepgram's STT or TTS.
2025-12-03 08:09:13 -06:00
Vanessa Pyne
7c497bdc89 Merge pull request #3130 from pipecat-ai/vp-nvidia-docs
update nvidia services naming
2025-12-02 13:04:16 -06:00
vipyne
1aa4247d2b remove nim from pyproject.toml 2025-12-02 12:55:13 -06:00
laurent
1ffa9ff51f Gradium integration. 2025-12-02 13:34:51 +01:00
Rpcd
435b53f1a0 Update src/pipecat/services/mcp_service.py
Co-authored-by: Vanessa Pyne <vipyne@gmail.com>
2025-12-02 09:22:08 +00:00
Rpcd
406bdfad0d Update src/pipecat/services/mcp_service.py
Co-authored-by: Vanessa Pyne <vipyne@gmail.com>
2025-12-02 09:21:59 +00:00
vipyne
acba544e6f pr notes for nvidia service name change 2025-12-01 22:41:17 -06:00
vipyne
5d93c64ee5 typo fixes and uv.lock update 2025-12-01 22:41:17 -06:00
vipyne
de10bc8803 changelog for riva,nim -> nvidia name change 2025-12-01 22:41:17 -06:00
vipyne
36f5c1722d deprecate riva and nim service paths in favor of nvidia 2025-12-01 22:41:17 -06:00
vipyne
a8280522e5 examples: rename nvidia foundational examples 2025-12-01 22:41:17 -06:00
vipyne
05d65dfdd3 Update NVIDIA NIM and Riva services to Nvidia
- pip install pipecat-ai[nim]
- pip install pipecat-ai[riva]

+ pip install pipecat-ai[nvidia]

and

- from pipecat.services.nim.llm import NimLLMService
+ from pipecat.services.nvidia.llm import NvidiaLLMService

- from pipecat.services.riva.stt import RivaSTTService
+ from pipecat.services.nvidia.stt import NvidiaSTTService

- from pipecat.services.riva.tts import RivaTTSService
+ from pipecat.services.nvidia.tts import NvidiaTTSService
2025-12-01 22:41:17 -06:00
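The rename above can be applied to existing code mechanically. The helper below is a hypothetical sketch (the function `migrate_nvidia_imports` is not part of Pipecat); the module and class mappings are taken from the commit message, and everything else is an assumption.

```python
import re

# Old -> new module paths, as listed in the commit message.
MODULE_RENAMES = {
    "pipecat.services.nim.llm": "pipecat.services.nvidia.llm",
    "pipecat.services.riva.stt": "pipecat.services.nvidia.stt",
    "pipecat.services.riva.tts": "pipecat.services.nvidia.tts",
}

# Old -> new class names, as listed in the commit message.
CLASS_RENAMES = {
    "NimLLMService": "NvidiaLLMService",
    "RivaSTTService": "NvidiaSTTService",
    "RivaTTSService": "NvidiaTTSService",
}


def migrate_nvidia_imports(source: str) -> str:
    """Rewrite deprecated nim/riva module paths and class names in source text."""
    for old, new in MODULE_RENAMES.items():
        source = source.replace(old, new)
    for old, new in CLASS_RENAMES.items():
        # Word boundaries keep longer identifiers that contain the name untouched.
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source
```

The install extra changes as well (`pipecat-ai[nim]` and `pipecat-ai[riva]` become `pipecat-ai[nvidia]`), which this sketch does not touch.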
Aleix Conchillo Flaqué
a3962e3b47 LLMTextFrame: don't override skip_tts 2025-12-01 18:37:07 -08:00
Aleix Conchillo Flaqué
cd231cf829 Merge pull request #3120 from pipecat-ai/aleix/function-calls-wait-for-all
allow waiting for all function calls to complete
2025-12-01 18:35:53 -08:00
Aleix Conchillo Flaqué
9fafc1692d update uv.lock 2025-12-01 18:32:00 -08:00
Aleix Conchillo Flaqué
7648d0436c examples(19): linting 2025-12-01 18:30:34 -08:00
Aleix Conchillo Flaqué
bff8747e38 LLMService: allow waiting for all function calls to complete 2025-12-01 18:30:25 -08:00
Mark Backman
d227c0c097 Merge pull request #3155 from pipecat-ai/mb/fix-sarvam-tts-not-flushing
fix: flush audio in SarvamTTSService
2025-12-01 17:22:33 -05:00
Mark Backman
9ccde60521 fix: flush audio in SarvamTTSService 2025-12-01 17:18:34 -05:00
Mark Backman
b84a40666c Merge pull request #3156 from pipecat-ai/mb/deepgram-stt-stopped-frame
fix: DeepgramTTSService, let the base class push TTSStoppedFrame
2025-12-01 17:18:19 -05:00
Mark Backman
e72b135a4c fix: DeepgramTTSService, let the base class push TTSStoppedFrame 2025-12-01 17:15:51 -05:00
Aleix Conchillo Flaqué
2235d8f5a2 CHANGELOG formatting 2025-12-01 10:24:42 -08:00
Mark Backman
6e20a50a4b Merge pull request #3153 from pipecat-ai/mb/fix-aws-stt-region
fix: AWSTranscribeSTTService always set to us-east-1
2025-12-01 13:07:22 -05:00
Mark Backman
89d9ca045a fix: AWSTranscribeSTTService always set to us-east-1 2025-12-01 13:02:08 -05:00
Mark Backman
4b95ee92eb Merge pull request #3166 from pipecat-ai/mb/update-changelog-AWSBedrockAgentCoreProcessor
Retroactively add changelog to 0.0.96 for AWSBedrockAgentCoreProcessor
2025-12-01 11:51:47 -05:00
Mark Backman
d481ac6cc6 Retroactively add changelog to 0.0.96 for AWSBedrockAgentCoreProcessor 2025-12-01 11:49:00 -05:00
Mark Backman
e5a91296b5 Merge pull request #3162 from ai-coustics/add-stt-optimized-model
Add Quail STT as default model for `AICFilter`
2025-11-30 09:59:37 -05:00
Corvin Jaedicke
d8d10a0685 add changelog entry 2025-11-28 15:24:19 +01:00
Corvin Jaedicke
6dd9ed03b1 bump version to include new STT model, noise gate deprecation warning 2025-11-28 15:14:43 +01:00
Filipi da Silva Fuchter
d486c80804 Merge pull request #3151 from pipecat-ai/filipi/fix_runner_ice_servers
Fixing runner ICE servers to be compatible with what is expected by the mobile SDKs.
2025-11-27 10:24:02 -03:00
Filipi Fuchter
dedea7c420 Fixing runner ICE servers to be compatible with what is expected by the mobile SDKs. 2025-11-27 09:27:26 -03:00
Aleix Conchillo Flaqué
b78eb5de6b Merge pull request #3148 from pipecat-ai/aleix/pipecat-0.0.96-update
update CHANGELOG for 0.0.96 with proper date
2025-11-26 17:21:31 -08:00
Aleix Conchillo Flaqué
95aa13beb1 update CHANGELOG for 0.0.96 with proper date 2025-11-26 17:16:54 -08:00
Mark Backman
88ce85342c Merge pull request #3147 from pipecat-ai/mb/fix-sagemaker-error-handling
Fix error handling in DeepgramSageMakerSTTService
2025-11-26 20:15:45 -05:00
Mark Backman
bedd40ae8b Fix error handling in DeepgramSageMakerSTTService 2025-11-26 20:12:31 -05:00
Mark Backman
fda327b3ee Merge pull request #3146 from pipecat-ai/mb/fix-aws-bedrock-region
fix: AWSBedrockLLMService was always set to us-east-1
2025-11-26 19:56:09 -05:00
Mark Backman
ace95b6e6d fix: AWSBedrockLLMService was always set to us-east-1 2025-11-26 19:52:04 -05:00
Aleix Conchillo Flaqué
26c5c28c5c Merge pull request #3145 from pipecat-ai/aleix/simli-enable-logging-param
SimliVideoService: add enable_logging input parameter
2025-11-26 16:49:12 -08:00
Aleix Conchillo Flaqué
81f862749d SimliVideoService: add enable_logging input parameter 2025-11-26 16:36:06 -08:00
Aleix Conchillo Flaqué
b8bf7b4132 Merge pull request #3143 from pipecat-ai/aleix/pipecat-0.0.96
update CHANGELOG for 0.0.96
2025-11-26 16:31:44 -08:00
Aleix Conchillo Flaqué
d90121ef3b update CHANGELOG for 0.0.96 2025-11-26 15:30:06 -08:00
Filipi da Silva Fuchter
d0b7b4fb0a Merge pull request #3144 from pipecat-ai/filipi/fix_flux_reconnection_issue
Fixed an issue with DeepgramFluxSTTService where it sometimes failed to reconnect.
2025-11-26 20:29:41 -03:00
Filipi Fuchter
4acc317923 Fixed an issue with DeepgramFluxSTTService where it sometimes failed to reconnect. 2025-11-26 20:23:03 -03:00
Filipi da Silva Fuchter
7caf5751ee Merge pull request #3084 from pipecat-ai/filipi/improve_error_handler
Improving error handler.
2025-11-26 18:40:44 -03:00
Filipi Fuchter
1330ef3ad6 Enhanced error handling across the framework.
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-11-26 18:34:25 -03:00
Mark Backman
9efb21d61e Merge pull request #3115 from pipecat-ai/mb/deepgram-websocket-tts
Update DeepgramTTSService to use Deepgram's Websocket TTS API
2025-11-26 13:30:52 -05:00
Mark Backman
6d93b8e9d8 Update DeepgramTTSService to use Deepgram's Websocket TTS API 2025-11-26 13:25:34 -05:00
Aleix Conchillo Flaqué
6f527e509e update CHANGELOG with FishAudioTTSService s1 model update 2025-11-26 10:22:59 -08:00
Aleix Conchillo Flaqué
6cf1d0417e Merge pull request #3136 from kcui5/patch-1
Update Fish Audio default model to s1
2025-11-26 10:19:26 -08:00
Mark Backman
19d8b0dfc2 Merge pull request #3011 from thsunkid/feat/add-cached-reasoning-tokens-metrics-to-opentel-spans 2025-11-26 07:45:33 -05:00
Kyle Cui
7fa0cbf2a9 Update Fish Audio default model to s1
Update default model from speech-1.5 to s1 for Fish Audio TTS service
2025-11-26 01:50:38 -08:00
Thu Nguyen
36c4bc2df2 Update changelog 2025-11-26 13:01:48 +07:00
Thu Nguyen
42be0183af Merge branch 'main' into feat/add-cached-reasoning-tokens-metrics-to-opentel-spans 2025-11-26 12:59:43 +07:00
RuiDaniel
7961f8a664 same behaviour on error 2025-11-25 18:35:59 +00:00
RuiDaniel
4ca143e8af add mcp filters to client 2025-11-25 18:27:22 +00:00
Mark Backman
2607699664 Merge pull request #3125 from pipecat-ai/mb/fix-sagemaker-imports
fix: remove stt_sagemaker import from deepgram/__init__.py
2025-11-24 21:31:31 -05:00
Mark Backman
47fa3b8556 Merge pull request #3108 from fbarril/livekit-transport-helper
add livekit helper
2025-11-24 20:13:13 -05:00
Mark Backman
fa0100c38b fix: remove stt_sagemaker import from deepgram/__init__.py 2025-11-24 20:04:18 -05:00
kompfner
e5142c1210 Merge pull request #3113 from pipecat-ai/pk/agentcore-processor
Initial implementation of `AWSBedrockAgentCoreProcessor`
2025-11-24 19:10:44 -05:00
Paul Kompfner
5907b51c7d In AWSBedrockAgentCoreProcessor use self.create_task()/self.cancel_task() instead of using asyncio directly. 2025-11-24 18:53:39 -05:00
Paul Kompfner
9e4ec4f7f3 Implement AWSBedrockAgentCoreProcessor 2025-11-24 18:53:35 -05:00
fbarril
e2161ea63d add pyjwt as a livekit dependency 2025-11-24 23:30:11 +00:00
fbarril
7c81f66241 Merge remote-tracking branch 'origin/main' into livekit-transport-helper
# Conflicts:
#	CHANGELOG.md
#	uv.lock
2025-11-24 23:29:22 +00:00
fbarril
60da466379 add pyjwt as a livekit dependency 2025-11-24 23:27:32 +00:00
fbarril
12c29b71f3 add entry to CHANGELOG.md 2025-11-24 23:27:13 +00:00
Mark Backman
b52b108932 Merge pull request #3118 from pipecat-ai/mb/deepgram-stt-sagemaker
Add SageMaker BiDi client and DeepgramSageMakerSTTService
2025-11-24 16:47:25 -05:00
Mark Backman
a357ff0205 Alphabetize the project.optional-dependencies 2025-11-24 16:43:44 -05:00
Mark Backman
0ece8b5894 Add 07c Deepgram SageMaker example 2025-11-24 16:41:01 -05:00
Mark Backman
782b257bbb Add DeepgramSageMakerSTTService 2025-11-24 16:41:01 -05:00
Mark Backman
ab8dcd6ede Add SageMaker BiDi client 2025-11-24 16:41:00 -05:00
Mark Backman
012c2f7dde Merge pull request #3106 from pipecat-ai/mb/update-11labs-realtime-stt
Fix sample_rate issue in ElevenLabsRealtimeSTTService, add timestamps…
2025-11-24 08:10:30 -05:00
Mark Backman
87fdd8f006 Fix MiniMax changelog entries 2025-11-24 08:07:20 -05:00
Mark Backman
7bdac02837 Fix sample_rate issue in ElevenLabsRealtimeSTTService, add timestamps and logging 2025-11-24 08:06:33 -05:00
Mark Backman
861567bc59 Merge pull request #3119 from pipecat-ai/aleix/changelog-formatting
format CHANGELOG
2025-11-24 08:05:11 -05:00
Aleix Conchillo Flaqué
d0ff43134a format CHANGELOG 2025-11-23 17:48:57 -08:00
Dante Noguez
3458b74fc9 Fix 11labs realtime dynamic updates (#3117) 2025-11-22 10:02:37 -05:00
mattie ruth backman
a6202c4d1a Fixed CHANGELOG post rebase 2025-11-21 17:16:10 -05:00
mattie ruth backman
3c3141796a Overlooked Changelog updates 2025-11-21 17:16:10 -05:00
mattie ruth backman
8b8b57b09c Introduced new bot-output RTVI event to provide...
a best effort version of the bot's output

- The `RTVIObserver` now emits `bot-output` messages based off
  the new `AggregatedTextFrame`s (`bot-tts-text` and
  `bot-llm-text` are still supported and generated, but
  `bot-transcript` is now deprecated in lieu of this new, more
  thorough, message).
- The new `RTVIBotOutputMessage` includes the fields:
  - `spoken`: A boolean indicating whether the text was spoken by TTS
  - `aggregated_by`: A string representing how the text was aggregated
    ("sentence", "word", "my custom aggregation")
- Introduced new fields to `RTVIObserver` to support the new
  `bot-output` messaging:
  - `bot_output_enabled`: Defaults to True. Set to false to disable
    bot-output messages.
  - `skip_aggregator_types`: Defaults to `None`. Set to a list of
    strings that match aggregation types that should not be included
    in bot-output messages. (Ex. `credit_card`)
2025-11-21 17:16:10 -05:00
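The filtering behaviour described above can be sketched as follows. This is an illustration only: `BotOutput` and `build_bot_output` are hypothetical names, the `text` field is an assumption, and only `spoken`, `aggregated_by`, `bot_output_enabled`, and `skip_aggregator_types` come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BotOutput:
    """Hypothetical stand-in for the payload of a `bot-output` message."""

    text: str  # assumed field carrying the aggregated text
    spoken: bool  # whether the text was spoken by TTS
    aggregated_by: str  # e.g. "sentence", "word", or a custom type


def build_bot_output(
    text: str,
    aggregated_by: str,
    spoken: bool,
    *,
    bot_output_enabled: bool = True,
    skip_aggregator_types: Optional[List[str]] = None,
) -> Optional[BotOutput]:
    """Return a payload, or None when bot-output is disabled or the type is skipped."""
    if not bot_output_enabled:
        return None
    if skip_aggregator_types and aggregated_by in skip_aggregator_types:
        return None
    return BotOutput(text=text, spoken=spoken, aggregated_by=aggregated_by)
```

For example, passing `skip_aggregator_types=["credit_card"]` would suppress any aggregation tagged `credit_card` while sentences still go out.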
mattie ruth backman
4f30a48ecd Rime and Cartesia TTS Updates:
`CartesiaTTSService`:
 - Modified use of custom default text_aggregator to avoid deprecation warnings and push users
   towards use of transformers or the `LLMTextProcessor`
 - Added convenience methods for taking advantage of Cartesia's SSML tags: spell, emotion,
   pauses, volume, and speed.

`RimeTTSService`:
 - Modified use of custom default text_aggregator to avoid deprecation warnings and push users
   towards use of transformers or the `LLMTextProcessor`
 - Added convenience methods for taking advantage of Rime's customization options: spell,
   pauses, pronunciations, and inline speed control.
2025-11-21 17:16:10 -05:00
mattie ruth backman
ecbc41045c Added ability to transform text just-in-time before it gets sent to the TTS 2025-11-21 17:16:10 -05:00
mattie ruth backman
e1528d0f0c Added support to TTS services to skip sending text to the...
actual TTS service to be spoken based on its aggregation type.
2025-11-21 17:16:10 -05:00
mattie ruth backman
6b6d760cf1 Introduced LLMTextProcessor and deprecated custom text_aggregators in TTS
Introduced `LLMTextProcessor`: A new processor meant to allow customization for how
LLMTextFrames should be aggregated and considered. Its purpose is to turn
`LLMTextFrame`s into `AggregatedTextFrame`s. By default, a TTSService will still
aggregate `LLMTextFrame`s by sentence for the service to consume. However, if you
wish to override how the llm text is aggregated, you should no longer override the
TTS's internal text_aggregator, but instead, insert this processor between your LLM
and TTS in the pipeline.
2025-11-21 17:16:10 -05:00
mattie ruth backman
7a4372a909 Introduced a new AggregatedTextFrame Frame type that TTSTextFrame inherits from
This frame introduces an `aggregated_by` field to describe the type of text included
in the frame and allows unspoken groupings of text to be pushed through the pipeline
and treated similarly to TTSTextFrames.
2025-11-21 17:16:10 -05:00
mattie ruth backman
0e820a01b9 Introduce append_to_context to TextFrames
Adding support for setting whether or not the text in the TextFrame
should be added to the LLM context (by the LLM assistant aggregator).
Defaults to `True`.
2025-11-21 17:16:10 -05:00
mattie ruth backman
24266c238f Augmented PatternPairAggregator so that matched patterns can...
be treated as their own aggregation, taking advantage of the new
ability to assign a type to an aggregation
2025-11-21 17:16:10 -05:00
mattie ruth backman
dcc20f86e1 Updated the BaseTextAggregator to categorize aggregations
Modified the BaseTextAggregator type so that when text gets aggregated, metadata can
be associated with it. Currently, that just means a `type`, so that the aggregation
can be classified or described. Changes made to support this:
  - **IMPORTANT**: Aggregators are now expected to strip leading/trailing white space
    characters before returning their aggregation from `aggregation()` or `.text`. This
    way all aggregators have a consistent contract allowing downstream use to know how
    to stitch aggregations back together
  - Introduced a new `Aggregation` dataclass to represent both the aggregated `text` and
    a string identifying the `type` of aggregation (ex. "sentence", "word", "my custom
    aggregation")
  - **BREAKING**: `BaseTextAggregator.text` now returns an `Aggregation` (instead of `str`).
    To update: `aggregated_text = myAggregator.text` -> `aggregated_text = myAggregator.text.text`
  - **BREAKING**: `BaseTextAggregator.aggregate()` now returns `Optional[Aggregation]`
    (instead of `Optional[str]`). To update:
      ```
      aggregation = myAggregator.aggregate(text)
      if aggregation:
        print(f"successfully aggregated text: {aggregation.text}")  # instead of {aggregation}
      ```
  - `SimpleTextAggregator`, `SkipTagsAggregator`, `PatternPairAggregator` updated to
     produce/consume `Aggregation` objects.
  - All uses of the above Aggregators have been updated accordingly.
2025-11-21 17:16:10 -05:00
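As a rough illustration of the contract described in the commit above, the following self-contained sketch mimics an aggregator that returns a typed aggregation. The `Aggregation` dataclass here is a local stand-in with the two fields the message names (`text` and `type`); `ToySentenceAggregator` and its sentence-splitting rule are hypothetical and are not Pipecat's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aggregation:
    """Local stand-in: aggregated text plus a label describing its type."""

    text: str
    type: str


class ToySentenceAggregator:
    """Hypothetical aggregator that emits one aggregation per sentence."""

    def __init__(self) -> None:
        self._buffer = ""

    @property
    def text(self) -> Aggregation:
        # Pending text, stripped of leading/trailing whitespace per the contract.
        return Aggregation(text=self._buffer.strip(), type="sentence")

    def aggregate(self, text: str) -> Optional[Aggregation]:
        self._buffer += text
        # Assumption: a sentence ends at the first ".", "!" or "?".
        for i, ch in enumerate(self._buffer):
            if ch in ".!?":
                sentence = self._buffer[: i + 1].strip()
                self._buffer = self._buffer[i + 1 :]
                return Aggregation(text=sentence, type="sentence")
        return None


aggregator = ToySentenceAggregator()
first = aggregator.aggregate("  Hello wor")   # no terminator yet
second = aggregator.aggregate("ld. How are")  # completes one sentence
```

Because every aggregation is stripped, a consumer in this sketch can rejoin consecutive aggregations with a single space without inspecting their edges.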
fbarril
ec8964425a add livekit helper 2025-11-21 00:27:57 +00:00
Vanessa Pyne
26918728df Merge pull request #3096 from pipecat-ai/vp-minimax-2962-v2
minimax 2962 language updates
2025-11-20 10:41:35 -06:00
vipyne
954849379b cleanup 2025-11-20 10:41:09 -06:00
vipyne
06542a2dbc Update CHANGELOG 2025-11-20 10:41:09 -06:00
Vanessa Pyne
59d40eac45 Update src/pipecat/services/minimax/tts.py
Co-authored-by: Mark Backman <mark@daily.co>

add warning
2025-11-20 10:41:09 -06:00
vipyne
17cf6c56cf minimax updates
some `debug`s -> `trace`s

add western US base_url to docs

ensure error_message is defined

add deprecation warning for `english_normalization` param
2025-11-20 10:41:09 -06:00
minimax
616e6ba351 docs(minimax): add API endpoint comment for west US region 2025-11-20 10:41:08 -06:00
minimax
f3cb5e0106 feat(minimax): comprehensive updates to TTS service
- Add support for speech-2.6-hd and speech-2.6-turbo models
- Add 16 new languages (total 40): Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino, Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian, Swedish, Tamil
- Add new emotions: calm and fluent
- Add new parameters: text_normalization (renamed from english_normalization), latex_read, force_cbr, exclude_aggregated_audio, subtitle_enable, subtitle_type
- Extract trace_id from response headers for all requests
- Improve error handling for non-streaming error responses
- Add detailed extra_info logging (audio_length, audio_size, usage_characters, word_count)
- Add validation warnings for language/model compatibility
- Fix silent error issue where HTTP 200 responses with errors were ignored

BREAKING CHANGE: Renamed parameter english_normalization to text_normalization
2025-11-20 10:41:08 -06:00
Aleix Conchillo Flaqué
c89f230c99 fix CHANGELOG 2025-11-20 08:40:30 -08:00
Aleix Conchillo Flaqué
69cd5716cd Merge pull request #3102 from pipecat-ai/aleix/daily-python-0.22.0
pyproject: update daily-python to 0.22.0
2025-11-20 08:35:39 -08:00
Mark Backman
ab58f72322 Merge pull request #3101 from hwuiwon/hw/inworld-talking-speed
feat: Add speaking rate control to Inworld TTS service.
2025-11-20 09:50:55 -05:00
Hwuiwon Kim
ead361f665 fix 2025-11-20 07:45:13 -05:00
Aleix Conchillo Flaqué
fa6b8851ed pyproject: update daily-python to 0.22.0 2025-11-19 21:56:38 -08:00
Hwuiwon Kim
1cc69d475d feat: Add speaking rate control to Inworld TTS service & fix param cases 2025-11-19 22:57:53 -05:00
Mark Backman
51bdd8b728 Merge pull request #3097 from hwuiwon/fix-typo
Fix typo in STT event handler documentation
2025-11-19 17:10:32 -05:00
Hwuiwon Kim
30ff488714 Fix typo in event handler documentation 2025-11-19 17:04:07 -05:00
Gokul Js
0707141998 fix 2025-11-20 01:36:35 +05:30
Gokul Js
cc861d6b70 Refactor WebSocket connection code in RimeNonJsonTTSService for improved readability 2025-11-19 22:46:36 +05:30
Gokul Js
de4e9c54f6 Increase WebSocket max size limit in RimeNonJsonTTSService to enhance data handling capacity 2025-11-19 22:44:50 +05:30
Gokul Js
da671cd232 Fix whitespace inconsistency in audio flushing method of RimeNonJsonTTSService 2025-11-19 22:19:36 +05:30
Gokul Js
1d9696e614 Add audio flushing after sending text in RimeNonJsonTTSService
This update ensures that audio is flushed immediately after sending bare text to the WebSocket, improving the responsiveness of the Text-to-Speech service.
2025-11-19 22:19:00 +05:30
Gokul Js
afeef94900 Remove unused audio_format parameter from extra settings in RimeNonJsonTTSService 2025-11-19 04:55:14 +05:30
Gokul Js
860d9c4f29 Refactor _update_settings method in RimeNonJsonTTSService for improved readability and maintainability 2025-11-19 04:53:27 +05:30
Gokul Js
4393191166 Add method to update settings in RimeNonJsonTTSService 2025-11-19 04:53:21 +05:30
Gokul Js
88daad524e Refactor whitespace in RimeNonJsonTTSService to improve code readability 2025-11-19 03:43:49 +05:30
Gokul Js
66c58f8155 fix 2025-11-19 03:40:59 +05:30
Gokul Js
7bbb5be910 format fix 2025-11-19 03:35:54 +05:30
Gokul Js
0dcb65bd56 add run tts method for rimeNonJsonTTs 2025-11-19 03:34:58 +05:30
Gokul Js
2784b0f438 Add RimeNonJsonTTSService for non-JSON WebSocket API support
This commit introduces the RimeNonJsonTTSService class, enabling Text-to-Speech synthesis over WebSocket endpoints that require plain text messages. The service includes configuration parameters for language, segmentation, and audio settings, and handles WebSocket connections for raw audio byte transmission. Limitations include the lack of support for word-level timestamps and context IDs.
2025-11-19 03:24:57 +05:30
Martin Liu
8dfc59be13 Include pts in incoming video and audio frames 2025-11-12 18:36:56 -05:00
Thu Nguyen
35593b8574 Add cached and reasoning token metrics to OpenTelemetry spans 2025-11-09 00:38:30 +07:00
725 changed files with 46315 additions and 15443 deletions


@@ -0,0 +1,47 @@
---
name: changelog
description: Create changelog files for important commits in a PR
---
Create changelog files for the important commits in this PR. The PR number is provided as an argument.
## Instructions
1. Skip changelog for: documentation-only, internal refactoring, test-only, CI changes.
2. First, check what commits are on the current branch compared to main:
```
git log main..HEAD --oneline
```
3. For each significant change, create a changelog file in the `changelog/` folder using the format:
Allowed types: `added`, `changed`, `deprecated`, `removed`, `fixed`, `security`, `performance`, `other`
- `{PR_NUMBER}.added.md` - for new features
- `{PR_NUMBER}.added.2.md`, `{PR_NUMBER}.added.3.md` - for additional entries of the same type
- `{PR_NUMBER}.changed.md` - for changes to existing functionality
- `{PR_NUMBER}.fixed.md` - for bug fixes
- `{PR_NUMBER}.deprecated.md` - for deprecations
- `{PR_NUMBER}.removed.md` - for removed features
- `{PR_NUMBER}.security.md` - for security fixes
- `{PR_NUMBER}.performance.md` - for performance improvements
- `{PR_NUMBER}.other.md` - for other changes
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change.
5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.
6. Use ⚠️ emoji prefix for breaking changes.
## Example
For PR #3519 with a new feature and a bug fix:
`changelog/3519.added.md`:
```
- Added `SomeNewFeature` for doing something useful.
```
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
```


@@ -0,0 +1,257 @@
---
name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
3. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component
4. Apply documentation in this order:
- Module docstring (at top, after imports)
- Class docstrings
- `__init__` methods (always document constructor parameters)
- Public methods (not starting with `_`)
- Dataclass/config classes with field descriptions
5. Skip documentation for:
- Private methods (starting with `_`)
- Simple dunder methods (`__str__`, `__repr__`, `__post_init__`)
- Very simple pass-through properties
- **Already documented code** - If a class, method, or function already has a complete docstring that follows the project style, do not modify it. A docstring is complete if it has:
- A one-line summary
- Args section (if it has parameters)
- Returns section (if it returns something meaningful)
- Only add or improve documentation where it is missing or incomplete
## Module Docstring Format
```python
"""[One-line description of module purpose].
[Optional: Longer explanation of functionality, key classes, or use cases.]
"""
```
Example:
```python
"""Neuphonic text-to-speech service implementations.
This module provides WebSocket and HTTP-based integrations with Neuphonic's
text-to-speech API for real-time audio synthesis.
"""
```
## Class Docstring Format
```python
class ClassName:
"""One-line summary describing what the class does.
[Longer description explaining purpose, behavior, and key features.
Use action-oriented language.]
[Optional: Event handlers, usage notes, or important caveats.]
"""
```
Example:
```python
class FrameProcessor(BaseObject):
"""Base class for all frame processors in the pipeline.
Frame processors are the building blocks of Pipecat pipelines, they can be
linked to form complex processing pipelines. They receive frames, process
them, and pass them to the next or previous processor in the chain.
Event handlers available:
- on_before_process_frame: Called before a frame is processed
- on_after_process_frame: Called after a frame is processed
Example::
@processor.event_handler("on_before_process_frame")
async def on_before_process_frame(processor, frame):
...
@processor.event_handler("on_after_process_frame")
async def on_after_process_frame(processor, frame):
...
"""
```
Note: When listing event handlers, do NOT use backticks. Include an `Example::` section (with double colon for Sphinx) showing the decorator pattern and function signature for each event.
## Constructor (`__init__`) Format
```python
def __init__(self, *, param1: Type, param2: Type = default, **kwargs):
"""Initialize the [ClassName].
Args:
param1: Description of param1 and its purpose.
param2: Description of param2. Defaults to [default].
**kwargs: Additional arguments passed to parent class.
"""
```
Example:
```python
def __init__(
self,
*,
api_key: str,
voice_id: Optional[str] = None,
sample_rate: Optional[int] = 22050,
**kwargs,
):
"""Initialize the Neuphonic TTS service.
Args:
api_key: Neuphonic API key for authentication.
voice_id: ID of the voice to use for synthesis.
sample_rate: Audio sample rate in Hz. Defaults to 22050.
**kwargs: Additional arguments passed to parent InterruptibleTTSService.
"""
```
## Method Docstring Format
```python
async def method_name(self, param1: Type) -> ReturnType:
"""One-line summary of what method does.
[Longer description if behavior isn't obvious.]
Args:
param1: Description of param1.
Returns:
Description of return value.
Raises:
ExceptionType: When this exception is raised.
"""
```
Example:
```python
async def put(self, item: Tuple[Frame, FrameDirection, FrameCallback]):
"""Put an item into the priority queue.
System frames (`SystemFrame`) have higher priority than any other
frames. If a non-frame item is provided it will have the highest priority.
Args:
item: The item to enqueue.
"""
```
## Dataclass/Config Format
```python
@dataclass
class ConfigName:
"""One-line description of configuration.
[Explanation of when/how to use this config.]
Parameters:
field1: Description of field1.
field2: Description of field2. Defaults to [default].
"""
field1: Type
field2: Type = default_value
```
Example:
```python
@dataclass
class FrameProcessorSetup:
"""Configuration parameters for frame processor initialization.
Parameters:
clock: The clock instance for timing operations.
task_manager: The task manager for handling async operations.
observer: Optional observer for monitoring frame processing events.
"""
clock: BaseClock
task_manager: BaseTaskManager
observer: Optional[BaseObserver] = None
```
## Enum Documentation Format
```python
class EnumName(Enum):
"""One-line description of the enum purpose.
[Longer description of how the enum is used.]
Parameters:
VALUE1: Description of VALUE1.
VALUE2: Description of VALUE2.
"""
VALUE1 = 1
VALUE2 = 2
```
## Writing Style Guidelines
- **Concise and professional** - No casual language or filler words
- **Action-oriented** - Start with verbs: "Processes...", "Manages...", "Converts..."
- **Purpose before implementation** - Explain WHY before HOW
- **Clear parameter descriptions** - Include type hints, defaults, and purpose
- **No redundant type info** - Type hints are in the signature, don't repeat in description
- **Use backticks for code references** - Wrap class names, method names, event names, parameter names, and code snippets in backticks
Good: "Neuphonic API key for authentication."
Bad: "str: The API key (string) that is used for authenticating with Neuphonic."
Good: "Triggers `on_speech_started` when the `VADAnalyzer` detects speech."
Bad: "Triggers on_speech_started when the VADAnalyzer detects speech."
## Deprecation Notice Format
When documenting deprecated code:
```python
"""[Description].
.. deprecated:: X.X.X
`ClassName` is deprecated and will be removed in a future version.
Use `NewClassName` instead.
"""
```
## Checklist
Before finishing, verify:
- [ ] Module has a docstring at the top (after copyright header and imports)
- [ ] All public classes have docstrings
- [ ] All `__init__` methods document their parameters
- [ ] All public methods have docstrings with Args/Returns/Raises as needed
- [ ] Dataclasses use "Parameters:" section for field descriptions
- [ ] Enums document each value in "Parameters:" section
- [ ] Writing is concise and action-oriented
- [ ] No documentation added to private methods (starting with `_`)
- [ ] Existing complete docstrings were left unchanged


@@ -0,0 +1,128 @@
---
name: pr-description
description: Update a GitHub PR description with a summary of changes
---
Update a GitHub pull request description based on the changes in the PR.
## Arguments
```
/pr-description <PR_NUMBER> [--fixes <ISSUE_NUMBERS>]
```
- `PR_NUMBER` (required): The pull request number to update
- `--fixes` (optional): Comma-separated issue numbers that this PR fixes (e.g., `--fixes 123,456`)
Examples:
- `/pr-description 3534`
- `/pr-description 3534 --fixes 123`
- `/pr-description 3534 --fixes 123,456,789`
## Instructions
1. First, gather information about the PR:
- Use GitHub plugin to get PR details (title, current description, base branch)
- Use local git to get commits: `git log main..HEAD --oneline`
- Use local git to get the diff: `git diff main..HEAD`
- Parse any `--fixes` argument for issue numbers
2. Check the existing PR description:
- If it already has a complete, accurate description that reflects the changes, do nothing
- If it's missing sections, incomplete, or outdated compared to the actual changes, proceed to update
- If it only has the template placeholder text, generate a full description
3. Analyze the changes:
- Understand the purpose of each commit
- Identify any breaking changes (API changes, removed features, behavior changes)
- Look for new features, bug fixes, refactoring, or documentation changes
- Collect issue numbers from:
- The `--fixes` argument (if provided)
- Commit messages (patterns like "Fixes #123", "Closes #456", "Resolves #789")
4. Generate or update the PR description with these sections:
## PR Description Format
### Summary (always include)
Brief bullet points describing what changed and why. Focus on the *purpose* and *impact*, not implementation details.
```markdown
## Summary
- Added X to enable Y
- Fixed bug where Z would happen
- Refactored W for better maintainability
```
### Breaking Changes (include only if applicable)
Document any changes that affect existing users or APIs.
```markdown
## Breaking Changes
- `ClassName.method()` now requires a `param` argument
- Removed deprecated `old_function()` - use `new_function()` instead
```
### Testing (include when non-obvious)
How to verify the changes work. Skip for trivial changes.
```markdown
## Testing
- Run `uv run pytest tests/test_feature.py` to verify the fix
- Example usage: `uv run examples/new_feature.py`
```
### Fixes (include if issues are provided or found in commits)
List issues this PR fixes. GitHub will automatically close these issues when the PR is merged.
```markdown
## Fixes
- Fixes #123
- Fixes #456
```
Note: Use "Fixes #X" format (not "Closes" or "Resolves") for consistency. Each issue should be on its own line with "Fixes" to ensure GitHub auto-closes them.
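Collecting issue numbers from the `--fixes` argument and from commit messages, then rendering the section in the "Fixes #X" format, might look like the sketch below. The helper names are hypothetical and the regular expression covers only the three keywords mentioned above.

```python
import re
from typing import Iterable, List, Optional

# Matches "Fixes #123", "Closes #456", "Resolves #789" (case-insensitive).
_ISSUE_RE = re.compile(r"\b(?:fixes|closes|resolves)\s+#(\d+)", re.IGNORECASE)


def collect_issue_numbers(
    commit_messages: Iterable[str], fixes_arg: Optional[str] = None
) -> List[int]:
    """Merge issue numbers from --fixes and commit messages, first seen first."""
    found: List[int] = []
    if fixes_arg:
        found.extend(int(part) for part in fixes_arg.split(",") if part.strip())
    for message in commit_messages:
        found.extend(int(num) for num in _ISSUE_RE.findall(message))
    return list(dict.fromkeys(found))  # de-duplicate while keeping order


def render_fixes_section(issues: List[int]) -> str:
    """Render one "Fixes #N" bullet per issue, each on its own line."""
    lines = ["## Fixes"] + [f"- Fixes #{n}" for n in issues]
    return "\n".join(lines)
```

Normalizing every keyword to "Fixes" at render time keeps the output consistent even when commits use "Closes" or "Resolves".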
## Guidelines
- **Be concise** - Reviewers should understand the PR in 30 seconds
- **Focus on why** - The diff shows *what* changed, explain *why*
- **Skip empty sections** - Only include sections that have content
- **Use bullet points** - Easier to scan than paragraphs
- **Don't duplicate the diff** - Avoid listing every file or line changed
## Example Output
```markdown
## Summary
- Added `/docstring` skill for documenting Python modules with Google-style docstrings
- Skill finds classes by name and handles conflicts when multiple matches exist
- Skips already-documented code to avoid unnecessary changes
## Testing
/docstring ClassName
## Fixes
- Fixes #123
```
## Checklist
Before updating the PR:
- [ ] Verified existing description needs updating (not already complete)
- [ ] Summary accurately reflects the changes
- [ ] Breaking changes are clearly documented (if any)
- [ ] No unnecessary sections included
- [ ] Description is concise and scannable


@@ -21,20 +21,20 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
- name: Set up Python
run: uv python install 3.10
run: uv python install 3.12
- name: Install development dependencies
run: uv sync --group dev
- name: Build project
run: uv build
- name: Install project in editable mode
run: uv pip install --editable .
run: uv pip install --editable .


@@ -33,7 +33,14 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra websocket
- name: Run tests with coverage
run: |


@@ -22,22 +22,22 @@ jobs:
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
- name: Set up Python
run: uv python install 3.10
run: uv python install 3.12
- name: Install development dependencies
run: uv sync --group dev
- name: Ruff formatter
id: ruff-format
run: uv run ruff format --diff
- name: Ruff linter (all rules)
id: ruff-check
run: uv run ruff check
run: uv run ruff check

174
.github/workflows/generate-changelog.yml vendored Normal file

@@ -0,0 +1,174 @@
name: Generate Changelog for Release
on:
workflow_dispatch:
inputs:
version:
description: "Release version (e.g., 0.0.97)"
required: true
type: string
date:
description: "Release date (YYYY-MM-DD format, defaults to today)"
required: false
type: string
default: ""
permissions:
contents: write
pull-requests: write
jobs:
generate-changelog:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
enable-cache: true
- name: Install dependencies
run: |
uv sync --group dev
- name: Set release date
id: set_date
run: |
if [ -z "${{ inputs.date }}" ]; then
RELEASE_DATE=$(date +%Y-%m-%d)
echo "Using today's date: $RELEASE_DATE"
else
RELEASE_DATE="${{ inputs.date }}"
echo "Using provided date: $RELEASE_DATE"
fi
echo "release_date=$RELEASE_DATE" >> $GITHUB_OUTPUT
- name: Validate inputs
run: |
# Validate version format (basic check)
if ! [[ "${{ inputs.version }}" =~ ^[0-9]+\.[0-9]+\.[0-9]+.*$ ]]; then
echo "Error: Version must be in format X.Y.Z (e.g., 0.0.97)"
exit 1
fi
# Validate date format if provided
if [ -n "${{ inputs.date }}" ]; then
if ! date -d "${{ inputs.date }}" >/dev/null 2>&1; then
# Try macOS date format
if ! date -j -f "%Y-%m-%d" "${{ inputs.date }}" >/dev/null 2>&1; then
echo "Error: Date must be in YYYY-MM-DD format (e.g., 2025-12-04)"
exit 1
fi
fi
fi
- name: Check for changelog fragments
id: check_fragments
run: |
FRAGMENT_COUNT=$(find changelog -name "*.md" ! -name "_template.md.j2" | wc -l | tr -d ' ')
echo "fragment_count=$FRAGMENT_COUNT" >> $GITHUB_OUTPUT
if [ "$FRAGMENT_COUNT" -eq "0" ]; then
echo "❌ Error: No changelog fragments found in changelog/"
echo ""
echo "Cannot create a release without changelog entries."
echo "Add changelog fragments to the changelog/ directory (e.g., 1234.added.md) and try again."
exit 1
fi
# Validate fragment types
VALID_TYPES="added changed deprecated removed fixed security performance other"
INVALID_FRAGMENTS=""
for file in changelog/*.md; do
# Skip template
if [[ "$file" == "changelog/_template.md.j2" ]]; then
continue
fi
# Extract type from filename (e.g., 1234.added.md -> added)
filename=$(basename "$file")
# Handle both 1234.added.md and 1234.added.2.md patterns
type=$(echo "$filename" | sed -E 's/^[0-9]+\.([a-z]+)(\.[0-9]+)?\.md$/\1/')
# Check if type is valid
if ! echo "$VALID_TYPES" | grep -wq "$type"; then
INVALID_FRAGMENTS="$INVALID_FRAGMENTS\n - $filename (type: '$type')"
fi
done
if [ -n "$INVALID_FRAGMENTS" ]; then
echo "❌ Error: Invalid changelog fragment types found:"
echo -e "$INVALID_FRAGMENTS"
echo ""
echo "Valid types are: $VALID_TYPES"
echo "Example: 1234.added.md, 5678.fixed.md"
exit 1
fi
echo "✓ Found $FRAGMENT_COUNT changelog fragment(s)"
echo "has_fragments=true" >> $GITHUB_OUTPUT
- name: Preview changelog
run: |
echo "## Preview of changelog for version ${{ inputs.version }}"
echo ""
uv run towncrier build --draft --version "${{ inputs.version }}" --date "${{ steps.set_date.outputs.release_date }}"
- name: Build changelog
run: |
uv run towncrier build --version "${{ inputs.version }}" --date "${{ steps.set_date.outputs.release_date }}" --yes
- name: Create Pull Request
uses: peter-evans/create-pull-request@v7
with:
token: ${{ secrets.GITHUB_TOKEN }}
commit-message: "Update changelog for version ${{ inputs.version }}"
title: "Release ${{ inputs.version }} - Changelog Update"
body: |
## Changelog Update for Release ${{ inputs.version }}
This PR updates the CHANGELOG.md with all changes for version **${{ inputs.version }}**.
### Summary
- **Version:** ${{ inputs.version }}
- **Date:** ${{ steps.set_date.outputs.release_date }}
- **Fragments processed:** ${{ steps.check_fragments.outputs.fragment_count }}
### What this PR does
- ✅ Adds new release section to CHANGELOG.md
- ✅ Removes processed changelog fragments
- ✅ Ready to merge for release
### Next Steps
1. Review the changelog entries below
2. Make any necessary edits to CHANGELOG.md if needed
3. Merge this PR
4. Continue with your release process
---
<details>
<summary>📋 Preview of changes</summary>
The changelog has been updated with entries from the following fragments:
```bash
${{ steps.check_fragments.outputs.fragment_count }} fragments processed
```
</details>
branch: changelog-${{ inputs.version }}
delete-branch: true
labels: |
changelog
release
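For local checks before triggering the workflow, the fragment type validation step can be mirrored in Python. This is a sketch, not part of the workflow: `fragment_type` and `invalid_fragments` are hypothetical helpers, the pattern copies the `sed` expression used in the validation step, and the set of valid types is passed in by the caller rather than assumed.

```python
import re
from typing import Iterable, List, Optional, Set

# Same shape as the sed expression: 1234.added.md or 1234.added.2.md.
FRAGMENT_RE = re.compile(r"^[0-9]+\.([a-z]+)(\.[0-9]+)?\.md$")


def fragment_type(filename: str) -> Optional[str]:
    """Extract the type from a fragment filename, or None if it does not match."""
    match = FRAGMENT_RE.match(filename)
    return match.group(1) if match else None


def invalid_fragments(filenames: Iterable[str], valid_types: Set[str]) -> List[str]:
    """Return filenames whose type is missing or not in valid_types."""
    return [name for name in filenames if fragment_type(name) not in valid_types]
```

Unlike the shell version, which falls back to the whole filename when the pattern does not match, this sketch reports a non-matching name as having no type at all; both end up flagging the file as invalid.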


@@ -50,7 +50,6 @@ jobs:
run: |
uv sync --group dev --all-extras \
--no-extra krisp \
--no-extra ultravox \
--no-extra local-smart-turn \
--no-extra moondream \
--no-extra mlx-whisper


@@ -37,7 +37,14 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra websocket
- name: Test with pytest
run: |

16
.gitignore vendored

@@ -4,7 +4,14 @@ __pycache__/
*~
venv
.venv
/.idea
.idea
.gradle
.next
next-env.d.ts
local.properties
*.log
*.lock
smart_turn_audio_log
#*#
# Distribution / Packaging
@@ -27,7 +34,7 @@ share/python-wheels/
*.egg
MANIFEST
.DS_Store
.env
.env*
fly.toml
# Examples
@@ -51,4 +58,7 @@ docs/api/_build/
docs/api/api
# uv
.python-version
.python-version
# Pipecat
whisker_setup.py


@@ -11,7 +11,7 @@ build:
jobs:
post_install:
- pip install uv
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra ultravox --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
sphinx:
configuration: docs/api/conf.py

File diff suppressed because it is too large

143
CLAUDE.md Normal file

@@ -0,0 +1,143 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
Transport Input → Pipeline Source → [Processor1] → [Processor2] → ... → Pipeline Sink → Transport Output
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): External I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
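To make the flow concrete, here is a deliberately simplified, synchronous toy model of processors chained into a pipeline. None of these classes are Pipecat's: `TextFrame`, `ToyProcessor`, `Uppercase`, `Sink`, and `link` are hypothetical stand-ins, and the sketch models only the downstream direction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextFrame:
    """Toy frame carrying text (stand-in for a real frame type)."""

    text: str


class ToyProcessor:
    """Hypothetical processor: handle a frame, then push it to the next one."""

    def __init__(self) -> None:
        self.next: Optional["ToyProcessor"] = None

    def process_frame(self, frame: TextFrame) -> None:
        # Default behavior: pass the frame through unchanged.
        self.push_frame(frame)

    def push_frame(self, frame: TextFrame) -> None:
        if self.next is not None:
            self.next.process_frame(frame)


class Uppercase(ToyProcessor):
    """Transforms the frame before pushing it downstream."""

    def process_frame(self, frame: TextFrame) -> None:
        self.push_frame(TextFrame(frame.text.upper()))


class Sink(ToyProcessor):
    """Collects every frame that reaches the end of the chain."""

    def __init__(self) -> None:
        super().__init__()
        self.received: List[TextFrame] = []

    def process_frame(self, frame: TextFrame) -> None:
        self.received.append(frame)


def link(processors: List[ToyProcessor]) -> ToyProcessor:
    """Chain processors in order and return the first one."""
    for current, following in zip(processors, processors[1:]):
        current.next = following
    return processors[0]


sink = Sink()
source = link([ToyProcessor(), Uppercase(), sink])
source.process_frame(TextFrame("hello"))
```

After the last line, `sink.received` holds a single frame with the text `HELLO`. The real framework is asynchronous and also carries frames upstream, which this toy omits.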
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`), but they can be triggered by other processors as well, in which case the user turn start strategies don't need to trigger them. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
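The completion rule for interruptions can be illustrated with plain `asyncio`. The sketch below is a toy under stated assumptions: `ToyInterruptionFrame`, `swallowing_processor`, and `push_and_wait` are hypothetical names that imitate the described behavior and do not reproduce Pipecat's classes or signatures.

```python
import asyncio
from typing import Optional


class ToyInterruptionFrame:
    """Hypothetical stand-in: carries an optional event set on completion."""

    def __init__(self, event: Optional[asyncio.Event] = None) -> None:
        self.event = event

    def complete(self) -> None:
        if self.event is not None:
            self.event.set()


async def swallowing_processor(frame: ToyInterruptionFrame) -> None:
    # This processor decides not to push the frame downstream, so it must
    # signal completion itself; otherwise the waiter below would time out.
    frame.complete()


async def push_and_wait(timeout: float = 1.0) -> bool:
    """Push a frame and wait until some processor marks it complete."""
    event = asyncio.Event()
    frame = ToyInterruptionFrame(event)
    await swallowing_processor(frame)
    try:
        await asyncio.wait_for(event.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


completed = asyncio.run(push_and_wait())
```

If `swallowing_processor` dropped the frame without calling `complete()`, the waiter in this toy would only return after the timeout, which is the stall the rule is meant to prevent.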
### Key Directories
| Directory | Purpose |
|---------------------------|----------------------------------------------------|
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
### Docstring Example
```python
class MyService(LLMService):
    """Description of what the service does.

    More detailed description.

    Event handlers available:

    - on_connected: Called when we are connected

    Example::

        @service.event_handler("on_connected")
        async def on_connected(service, frame):
            ...
    """

    def __init__(self, param1: str, **kwargs):
        """Initialize the service.

        Args:
            param1: Description of param1.
            **kwargs: Additional arguments passed to parent.
        """
        super().__init__(**kwargs)
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
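Steps 3 to 5 above can be sketched with a self-contained toy model. Nothing here is Pipecat's API: `ToyService`, `ToyFrame`, `ToyTextFrame`, `ToyErrorFrame`, `Direction`, and `run_remote_call` are hypothetical stand-ins, and sending the error frame upstream is an assumption of this sketch rather than something the list above specifies. Real services should follow the existing implementations in `src/pipecat/services/`.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    DOWNSTREAM = "downstream"
    UPSTREAM = "upstream"


@dataclass
class ToyFrame:
    """Hypothetical base frame."""


@dataclass
class ToyTextFrame(ToyFrame):
    text: str


@dataclass
class ToyErrorFrame(ToyFrame):
    error: str


class ToyService:
    """Hypothetical service: handles text frames, passes everything else through."""

    def __init__(self) -> None:
        self.pushed: list[tuple[ToyFrame, Direction]] = []

    async def push_frame(self, frame: ToyFrame, direction: Direction) -> None:
        # Record the frame instead of handing it to a neighbouring processor.
        self.pushed.append((frame, direction))

    async def run_remote_call(self, text: str) -> str:
        # Placeholder for a provider request; fails on empty input.
        if not text:
            raise ValueError("empty input")
        return text.upper()

    async def process_frame(self, frame: ToyFrame, direction: Direction) -> None:
        if isinstance(frame, ToyTextFrame) and direction is Direction.DOWNSTREAM:
            try:
                result = await self.run_remote_call(frame.text)
                await self.push_frame(ToyTextFrame(result), direction)
            except Exception as exc:
                # Assumption: errors are reported upstream in this sketch.
                await self.push_frame(ToyErrorFrame(str(exc)), Direction.UPSTREAM)
        else:
            # Default: push the frame in the direction it came from.
            await self.push_frame(frame, direction)


async def demo() -> list[tuple[ToyFrame, Direction]]:
    service = ToyService()
    await service.process_frame(ToyTextFrame("hello"), Direction.DOWNSTREAM)
    await service.process_frame(ToyTextFrame(""), Direction.DOWNSTREAM)
    await service.process_frame(ToyFrame(), Direction.UPSTREAM)
    return service.pushed


pushed = asyncio.run(demo())
```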
## Pull Requests
After creating a PR, use `/changelog <pr_number>` to generate the changelog file and `/pr-description <pr_number>` to update the PR description.

View File

@@ -79,7 +79,7 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
**Examples:**
- [RivaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/riva/stt.py)
- [NvidiaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/nvidia/stt.py)
- [FalSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/fal/stt.py)
#### Key requirements:

View File

@@ -17,24 +17,122 @@ We welcome contributions of all kinds! Your help is appreciated. Follow these st
git checkout -b your-branch-name
```
4. **Make your changes**: Edit or add files as necessary.
5. **Test your changes**: Ensure that your changes look correct and follow the style set in the codebase.
6. **Commit your changes**: Once you're satisfied with your changes, commit them with a meaningful message.
5. **Add a changelog entry**: Create a changelog fragment file (see [Changelog Entries](#changelog-entries) below).
6. **Test your changes**: Ensure that your changes look correct and follow the style set in the codebase.
7. **Commit your changes**: Once you're satisfied with your changes, commit them with a meaningful message.
```bash
git commit -m "Description of your changes"
```
7. **Push your changes**: Push your branch to your forked repository.
8. **Push your changes**: Push your branch to your forked repository.
```bash
git push origin your-branch-name
```
8. **Submit a Pull Request (PR)**: Open a PR from your forked repository to the main branch of this repo.
9. **Submit a Pull Request (PR)**: Open a PR from your forked repository to the main branch of this repo.
> Important: Describe the changes you've made clearly!
Our maintainers will review your PR, and once everything is good, your contributions will be merged!
## Changelog Entries
Every pull request that makes a user-facing change should include a changelog entry. We use a changelog fragment system to avoid merge conflicts.
### Creating a Changelog Fragment
1. Create a new file in the `changelog/` directory with this naming pattern:
```
<PR_number>.<type>.md
```
2. Choose the appropriate type:
- `added.md` - New features
- `changed.md` - Changes in existing functionality
- `deprecated.md` - Soon-to-be removed features
- `removed.md` - Removed features
- `fixed.md` - Bug fixes
- `security.md` - Security fixes
- `other.md` - Other changes (documentation, dependencies, etc.)
3. Write your changelog entry as a Markdown bullet point. Include the `-` at the start:
**Example files:**
`changelog/1234.added.md`:
```markdown
- Added support for Anthropic Claude 3.5 Sonnet with improved streaming performance.
```
`changelog/5678.fixed.md`:
```markdown
- Fixed an issue where audio frames were dropped during high-load scenarios.
```
**For entries with nested bullets:**
`changelog/1234.changed.md`:
```markdown
- Updated service configuration:
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
```
### Multiple Changes in One PR
**Different types of changes:** Create separate fragment files for each type:
```
changelog/1234.added.md
changelog/1234.fixed.md
```
**Multiple changes of the same type:** Create numbered fragment files:
```
changelog/1234.changed.md
changelog/1234.changed.2.md
```
**Related changes:** Use nested bullets in a single fragment:
```markdown
- Updated service configuration:
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
```
**Rule of thumb:** One logical change per fragment file. If changes are unrelated, use separate files.
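The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the project's tooling: `parse_fragment_name` and `FRAGMENT_RE` are hypothetical names, and treating an unnumbered fragment as index 1 is an assumption made for illustration.

```python
import re

# Fragment types listed in this guide.
FRAGMENT_TYPES = ("added", "changed", "deprecated", "removed", "fixed", "security", "other")

# <PR_number>.<type>.md, optionally <PR_number>.<type>.<n>.md for numbered fragments.
FRAGMENT_RE = re.compile(
    r"^(?P<pr>\d+)\.(?P<type>" + "|".join(FRAGMENT_TYPES) + r")(?:\.(?P<index>\d+))?\.md$"
)


def parse_fragment_name(name: str):
    """Return (pr_number, type, index) for a valid fragment file name, else None."""
    match = FRAGMENT_RE.match(name)
    if match is None:
        return None
    # Assumption: an unnumbered fragment counts as the first one.
    index = int(match.group("index")) if match.group("index") else 1
    return int(match.group("pr")), match.group("type"), index


for candidate in ("1234.added.md", "1234.changed.2.md", "1234.feature.md", "notes.md"):
    print(candidate, parse_fragment_name(candidate))
```

A check like this could run locally before committing; the real preview of how fragments render remains the `towncrier` command described in the next section.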
### Preview Your Changes
To see what your changelog entry will look like:
```bash
towncrier build --draft --version Unreleased
```
This won't modify any files, just show you a preview.
### When to Skip Changelog Entries
You can skip adding a changelog entry for:
- Documentation-only changes
- Internal refactoring with no user-facing impact
- Test-only changes
- CI/build configuration changes
If you're unsure whether your change needs a changelog entry, ask in your PR!
## Dependency Management
This project uses [uv](https://docs.astral.sh/uv/) for dependency management. The `uv.lock` file is committed to ensure reproducible builds.

View File

@@ -1,6 +1,6 @@
BSD 2-Clause License
Copyright (c) 2024–2025, Daily
Copyright (c) 2024–2026, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

View File

@@ -3,7 +3,6 @@
</div></h1>
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/pipecat-ai/pipecat)
[![](https://getmanta.ai/api/badges?text=Manta%20Graph&link=manta)](https://getmanta.ai/pipecat)
# 🎙️ Pipecat: Real-Time Voice & Multimodal AI Agents
@@ -72,19 +71,19 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -154,7 +153,6 @@ You can get started with Pipecat running on your local machine, then move your a
--no-extra gstreamer \
--no-extra krisp \
--no-extra local \
--no-extra ultravox # (ultravox not fully supported on macOS)
```
3. Install the git pre-commit hooks:

1
changelog/3134.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback.

1
changelog/3355.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `UserBotLatencyObserver` for tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.

View File

@@ -0,0 +1 @@
- Deprecated `UserBotLatencyLogObserver`. Use `UserBotLatencyObserver` directly with its `on_latency_measured` event handler instead.

1
changelog/3542.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed pipeline freeze when `InterruptionFrame` discards `EndFrame` or `StopFrame` by making terminal frames uninterruptible.

1
changelog/3589.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed OpenAI LLM stream not being closed on cancellation/exception, which could leak sockets.

1
changelog/3593.added.md Normal file
View File

@@ -0,0 +1 @@
- Added support for Inworld TTS Websocket Auto Mode for improved latency

View File

@@ -0,0 +1 @@
- Updated timestamps to be cumulative within an agent turn, using the `flushCompleted` message as an indication of when timestamps from the server are reset to 0.

1
changelog/3610.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `PipelineTask` adding duplicate `RTVIProcessor` and `RTVIObserver` when they were already provided in the pipeline or observers list. They are now detected and skipped, with appropriate warnings and errors logged for mismatched configurations.

View File

@@ -0,0 +1 @@
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the underlying TTS engine.

1
changelog/3616.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed function call timeout task not being cancelled when the handler completes without calling `result_callback` or is cancelled externally, which caused `RuntimeWarning: coroutine was never awaited`.

5
changelog/3617.fixed.md Normal file
View File

@@ -0,0 +1,5 @@
- Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin
languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK
languages, causing text to accumulate until flush instead of being split at
sentence boundaries. Added fallback detection for unambiguous non-Latin
sentence-ending punctuation (e.g., `。`, `？`, `！`).

1
changelog/3623.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `PipelineTask` to also call `set_bot_ready()` when an external `RTVIProcessor` is provided.

1
changelog/3628.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `VADController` not broadcasting `SpeechControlParamsFrame` on startup, which prevented STT services from receiving VAD params needed for TTFB measurement.

1
changelog/3629.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `StopAsyncIteration` exceptions in `parse_telephony_websocket()` when WebSocket connections close before sending expected messages.

1
changelog/3630.added.md Normal file
View File

@@ -0,0 +1 @@
- Added RTVI function call lifecycle events (`llm-function-call-started`, `llm-function-call-in-progress`, `llm-function-call-stopped`) with configurable security levels via `RTVIObserverParams.function_call_report_level`. Supports per-function control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or `FULL`).

View File

@@ -0,0 +1 @@
- Deprecated `RTVILLMFunctionCallMessage`, `RTVILLMFunctionCallMessageData`, and `RTVIProcessor.handle_function_call()`. Use the new `llm-function-call-in-progress` event sent automatically by `RTVIObserver` instead.

1
changelog/3635.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed WebSocket transport error when broadcasting `InputTransportMessageFrame` by correctly instantiating the frame with its message parameter.

1
changelog/3649.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed orphan OpenTelemetry spans during flow initialization and transitions in tracing.

View File

@@ -0,0 +1 @@
- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.

1
changelog/3656.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection.

10
changelog/3659.changed.md Normal file
View File

@@ -0,0 +1,10 @@
- ⚠️ The `VADParams` `stop_secs` default is changing from `0.8` seconds
to `0.2` seconds. This change both simplifies the developer experience and
improves the performance of STT services. With a shorter `stop_secs` value,
STT services using a local VAD can finalize sooner, resulting in faster
transcription.
- `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for
additional user speech using `user_speech_timeout` (default: 0.6 sec).
- `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically adjusts
the user wait time based on the audio input.

View File

@@ -0,0 +1 @@
- Moved interruption wait event from per-processor instance state to `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume an `InterruptionFrame` before it reaches the pipeline sink must call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()`. A warning is logged if completion does not happen within 2 seconds.

1
changelog/3663.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `SambaNovaLLMService` and `GoogleLLMOpenAIBetaService` streams not being closed on cancellation/exception, which could leak sockets.

View File

@@ -0,0 +1 @@
- Updated the default model to `scribe_v2` for `ElevenLabsSTTService`.

View File

@@ -0,0 +1 @@
- Changed the `DeepgramSTTService` default setting for `smart_format` to `False`, as agents don't need smart formatting. Disabling this setting provides a small performance improvement, as well.

1
changelog/3667.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed an issue in `InworldTTSService` where punctuation was pronounced. Now, the `InworldTTSService` ensures proper spacing between sentences, resolving pronunciation issues.

1
changelog/3668.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `ParallelPipeline` allowing frames pushed by internal processors to escape during lifecycle frame (`StartFrame`/`EndFrame`/`CancelFrame`) synchronization. These frames are now buffered and flushed after all branches complete.

1
changelog/3678.added.md Normal file
View File

@@ -0,0 +1 @@
- Added pyright basic type checking configuration for the core framework.

16
changelog/_template.md.j2 Normal file
View File

@@ -0,0 +1,16 @@
{% for section, _ in sections.items() %}
{% if sections[section] %}
{% for category, val in definitions.items() if category in sections[section]%}
### {{ definitions[category]['name'] }}
{% for text, values in sections[section][category].items() %}
{{ text }}
(PR {{ values|join(', ') }})
{% endfor %}
{% endfor %}
{% else %}
No significant changes.
{% endif %}
{% endfor %}

View File

@@ -2,7 +2,7 @@
# Build docs using uv
echo "Installing dependencies with uv..."
uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra ultravox --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
# Check if sphinx-build is available
if ! uv run sphinx-build --version &> /dev/null; then
@@ -24,4 +24,4 @@ if [ $? -eq 0 ]; then
else
echo "Documentation build failed!" >&2
exit 1
fi
fi

View File

@@ -61,9 +61,6 @@ autodoc_mock_imports = [
# OpenCV - sometimes has import issues during docs build
"cv2",
# Heavy ML packages excluded from ReadTheDocs
# ultravox dependencies
"vllm",
"vllm.engine.arg_utils",
# local-smart-turn dependencies
"coremltools",
"coremltools.models",
@@ -94,6 +91,25 @@ autodoc_mock_imports = [
# MLX dependencies (Apple Silicon specific)
"mlx",
"mlx_whisper", # Note: might need underscore format too
# Pydantic v2 compatibility issues in third-party SDKs
"hume",
"hume.tts",
"hume.tts.types",
"cartesia",
"camb",
"sarvamai",
"openpipe",
"openai.types.beta.realtime",
"langchain_core",
"langchain_core.messages",
# FastAPI - Pydantic v2 compatibility issues during Sphinx autodoc
"fastapi",
"fastapi.applications",
"fastapi.routing",
"fastapi.params",
"fastapi.middleware",
"fastapi.responses",
"uvicorn",
]
# HTML output settings
@@ -119,7 +135,6 @@ def import_core_modules():
"pipecat.observers",
"pipecat.runner",
"pipecat.serializers",
"pipecat.sync",
"pipecat.transcriptions",
"pipecat.utils",
]

View File

@@ -30,7 +30,6 @@ Quick Links
Runner <api/pipecat.runner>
Serializers <api/pipecat.serializers>
Services <api/pipecat.services>
Sync <api/pipecat.sync>
Transcriptions <api/pipecat.transcriptions>
Transports <api/pipecat.transports>
Utils <api/pipecat.utils>
Utils <api/pipecat.utils>

View File

@@ -31,6 +31,9 @@ AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Camb.ai
CAMB_API_KEY=...
# Cartesia
CARTESIA_API_KEY=...
CARTESIA_VOICE_ID=...
@@ -40,10 +43,11 @@ CEREBRAS_API_KEY=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
DAILY_ROOM_URL=https://...
# Deepgram
DEEPGRAM_API_KEY=...
SAGEMAKER_ENDPOINT_NAME=...
# DeepSeek
DEEPSEEK_API_KEY=...
@@ -72,14 +76,21 @@ GOOGLE_CLOUD_PROJECT_ID=...
GOOGLE_CLOUD_LOCATION=...
GOOGLE_TEST_CREDENTIALS=...
# Gradium
GRADIUM_API_KEY=...
# Grok
GROK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Hathora
HATHORA_API_KEY=...
# Heygen
HEYGEN_API_KEY=...
HEYGEN_LIVE_AVATAR_API_KEY=...
# Hume
HUME_API_KEY=...
@@ -92,7 +103,8 @@ INWORLD_API_KEY=...
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_MODEL_PATH=...
KRISP_VIVA_FILTER_MODEL_PATH=...
KRISP_VIVA_TURN_MODEL_PATH=...
# LiveKit
LIVEKIT_API_KEY=...
@@ -144,6 +156,10 @@ PLIVO_AUTH_TOKEN=...
# Qwen
QWEN_API_KEY=...
# Resemble AI
RESEMBLE_API_KEY=
RESEMBLE_VOICE_UUID=
# Rime
RIME_API_KEY=...
RIME_VOICE_ID=...
@@ -186,8 +202,11 @@ TOGETHER_API_KEY=...
TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
# Ultravox Realtime
ULTRAVOX_API_KEY=...
# WhatsApp
WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_APP_SECRET=...
WHATSAPP_APP_SECRET=...
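
The variables above are read by the examples with `os.getenv`, which returns `None` silently when a key is missing. A small sketch of failing early instead is shown below; `require_env` is a hypothetical helper, not part of Pipecat, and the sample mapping stands in for the real environment:

```python
import os


def require_env(name, environ=None):
    """Return the value of `name`, raising if it is unset or empty."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# An explicit mapping keeps the sketch independent of the real environment.
sample_env = {"ULTRAVOX_API_KEY": "placeholder-value", "RESEMBLE_API_KEY": ""}
ultravox_key = require_env("ULTRAVOX_API_KEY", sample_env)
```

An empty value such as `RESEMBLE_API_KEY=` is treated the same as a missing one here, which is a design choice of this sketch.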

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -16,7 +16,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.piper.tts import PiperTTSService
from pipecat.services.piper.tts import PiperHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -24,9 +24,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
@@ -39,7 +38,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = PiperTTSService(
tts = PiperHttpTTSService(
base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
)
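
The reworded comment above describes deferring parameter construction until a transport is chosen. A minimal sketch of that pattern follows; `ExpensiveParams` and `select_params` are hypothetical stand-ins, not Pipecat classes or functions:

```python
class ExpensiveParams:
    """Stand-in for a transport params object that is costly to build."""

    created = []  # records every construction so the deferral is observable

    def __init__(self, kind):
        ExpensiveParams.created.append(kind)
        self.kind = kind


# Each value is a zero-argument callable, so building the dict constructs nothing.
transport_params = {
    "daily": lambda: ExpensiveParams("daily"),
    "twilio": lambda: ExpensiveParams("twilio"),
    "webrtc": lambda: ExpensiveParams("webrtc"),
}

created_before = list(ExpensiveParams.created)  # snapshot taken before any selection


def select_params(transport_type):
    """Call only the factory for the selected transport."""
    return transport_params[transport_type]()


params = select_params("twilio")
```

Only the selected factory runs, so the unused transports never pay the construction cost.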

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -15,7 +15,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.riva.tts import FastPitchTTSService
from pipecat.services.nvidia.tts import NvidiaTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
@@ -36,7 +35,7 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
tts = NvidiaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
task = PipelineTask(
Pipeline([tts, transport.output()]),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -25,9 +25,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,9 +23,8 @@ from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -22,9 +22,8 @@ from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -17,22 +17,25 @@ from fastapi.responses import RedirectResponse
from loguru import logger
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.smallwebrtc.connection import IceServer, SmallWebRTCConnection
from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -60,8 +63,6 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
)
@@ -82,17 +83,25 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
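
The change above replaces `context_aggregator.user()` and `context_aggregator.assistant()` with tuple unpacking of the pair. How Pipecat implements this is not shown in the diff; the sketch below assumes one plausible mechanism, an `__iter__` method, using a hypothetical `AggregatorPairSketch` class that is not the real `LLMContextAggregatorPair`:

```python
class AggregatorPairSketch:
    """Hypothetical stand-in; not pipecat's LLMContextAggregatorPair."""

    def __init__(self, context):
        # Plain tuples stand in for the real user/assistant aggregator objects.
        self._user = ("user", context)
        self._assistant = ("assistant", context)

    def user(self):
        return self._user

    def assistant(self):
        return self._assistant

    def __iter__(self):
        # Yielding both halves in order is what makes `a, b = pair` work.
        return iter((self._user, self._assistant))


context = ["system prompt"]
pair = AggregatorPairSketch(context)
user_aggregator, assistant_aggregator = pair
```

With this shape, the accessor style and the unpacking style return the same two objects, so a pipeline can migrate between them without changing behavior.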

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -12,20 +12,23 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.daily import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.daily.transport import DailyParams, DailyTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -45,8 +48,6 @@ async def main():
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
)
@@ -65,16 +66,26 @@ async def main():
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -12,10 +12,8 @@ import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
InterruptionFrame,
TranscriptionFrame,
@@ -27,12 +25,17 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.livekit import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.livekit.transport import LiveKitParams, LiveKitTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -50,8 +53,6 @@ async def main():
params=LiveKitParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
)
@@ -75,17 +76,25 @@ async def main():
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -23,7 +23,6 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
@@ -66,9 +65,8 @@ class MonthPrepender(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_out_enabled=True,

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -26,7 +26,6 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -9,10 +9,8 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import Frame, LLMRunFrame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
@@ -24,7 +22,10 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -34,6 +35,8 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -58,27 +61,20 @@ class MetricsLogger(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -105,18 +101,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
ml,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,10 +10,8 @@ from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
@@ -25,7 +23,10 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -34,6 +35,8 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -73,9 +76,8 @@ class ImageSyncAggregator(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -83,8 +85,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
@@ -92,8 +92,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -118,7 +116,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
@@ -129,12 +135,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(),
stt,
context_aggregator.user(),
user_aggregator,
llm,
tts,
image_sync_aggregator,
transport.output(),
context_aggregator.assistant(),
assistant_aggregator,
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -9,16 +9,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.stt import CartesiaSTTService
@@ -27,31 +28,26 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -76,17 +72,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -9,16 +9,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -27,30 +28,25 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -75,17 +71,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -15,10 +15,10 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response import (
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
@@ -29,12 +29,12 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
4. Text-to-Speech (TTS)
- Low latency streaming audio synthesis
- Multiple voice options available including `sarah`, `theo`, and `megan`
- Multiple voice options available including `sarah`, `theo`, `megan` and `jack`
5. Configuration Options
- `operating_point` parameter defaults to `ENHANCED` for optimal accuracy
@@ -95,10 +95,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
enable_vad=True,
enable_diarization=True,
focus_speakers=["S1"],
end_of_utterance_silence_trigger=0.5,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
# focus_speakers=["S1"],
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
speaker_passive_format="<PASSIVE><{speaker_id}>{text}</{speaker_id}></PASSIVE>",
),
@@ -132,20 +130,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
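
The updated comment in this file says the transport parameters are wrapped in lambdas so that nothing is constructed until a transport type is chosen at runtime. A minimal, library-free sketch of that deferral pattern is shown below. `ExpensiveParams`, `PARAM_FACTORIES`, `build_params`, and the `created` list are hypothetical names used only for illustration; they are not Pipecat APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Records which params objects were actually constructed (illustration only).
created: List[str] = []


@dataclass
class ExpensiveParams:
    """Hypothetical stand-in for a params object that is costly to build."""

    name: str

    def __post_init__(self) -> None:
        created.append(self.name)


# Each value is a zero-argument factory, so no ExpensiveParams exists yet.
PARAM_FACTORIES: Dict[str, Callable[[], ExpensiveParams]] = {
    "daily": lambda: ExpensiveParams("daily"),
    "twilio": lambda: ExpensiveParams("twilio"),
    "webrtc": lambda: ExpensiveParams("webrtc"),
}


def build_params(transport_type: str) -> ExpensiveParams:
    """Call only the factory that matches the selected transport."""
    try:
        factory = PARAM_FACTORIES[transport_type]
    except KeyError:
        raise ValueError(f"Unknown transport type: {transport_type!r}") from None
    return factory()


params = build_params("webrtc")
```

Only the selected factory runs, so the objects for the other transports are never instantiated.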

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,19 +10,17 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response import (
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
@@ -33,30 +31,25 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -75,7 +68,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
TTS Features:
- Low latency streaming audio synthesis
- Multiple voice options available including `sarah`, `theo`, and `megan`
- Multiple voice options available including `sarah`, `theo`, `megan` and `jack`
For more information:
- STT: https://docs.speechmatics.com/rt-api-ref
@@ -88,8 +81,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
enable_diarization=True,
end_of_utterance_silence_trigger=0.5,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
)
@@ -121,20 +112,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(aggregation_timeout=0.005),
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
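
The hunks in this diff switch from calling `context_aggregator.user()` and `context_aggregator.assistant()` to unpacking the pair directly into `user_aggregator, assistant_aggregator`. One way a class can support both styles is to define `__iter__`. The sketch below assumes that mechanism purely for illustration; `AggregatorPair`, `UserSide`, and `AssistantSide` are hypothetical classes, and the diff does not show how `LLMContextAggregatorPair` implements unpacking.

```python
from typing import Iterator, List, Union


class UserSide:
    """Hypothetical user-side aggregator sharing a context list."""

    def __init__(self, context: List[dict]) -> None:
        self.context = context


class AssistantSide:
    """Hypothetical assistant-side aggregator sharing the same context list."""

    def __init__(self, context: List[dict]) -> None:
        self.context = context


class AggregatorPair:
    """Hypothetical pair that supports accessor methods and tuple unpacking."""

    def __init__(self, context: List[dict]) -> None:
        self._user = UserSide(context)
        self._assistant = AssistantSide(context)

    def user(self) -> UserSide:
        return self._user

    def assistant(self) -> AssistantSide:
        return self._assistant

    def __iter__(self) -> Iterator[Union[UserSide, AssistantSide]]:
        # Yielding exactly two items makes `a, b = AggregatorPair(...)` work.
        yield self._user
        yield self._assistant


shared_context: List[dict] = []
pair = AggregatorPair(shared_context)
user_side, assistant_side = pair
```

With this design the accessor style keeps working, which can make a migration like the one in these examples incremental.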

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -15,16 +15,17 @@ from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMMessagesUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frameworks.langchain import LangchainProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -33,6 +34,8 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -46,27 +49,20 @@ def get_session_history(session_id: str) -> BaseChatMessageHistory:
return message_store[session_id]
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -102,17 +98,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
lc = LangchainProcessor(history_chain)
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
lc, # Langchain
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -17,6 +17,7 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response_universal import (
LLMContext,
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -26,13 +27,13 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -69,17 +70,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,16 +11,17 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -29,31 +30,26 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -80,17 +76,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -0,0 +1,141 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Initialize Deepgram SageMaker STT Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram model
stt = DeepgramSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
model="us.amazon.nova-pro-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()
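
The new example above passes `os.getenv("SAGEMAKER_ENDPOINT_NAME")` and `os.getenv("AWS_REGION")` straight into the service constructors, and `os.getenv` returns `None` when a variable is unset. A small helper that fails early with a clear message may be useful when adapting the example. `require_env` is a hypothetical helper, not part of Pipecat, and how the services themselves react to `None` is not shown in this diff.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; empty strings are treated as missing.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example usage (commented out so the sketch has no side effects on import):
# endpoint_name = require_env("SAGEMAKER_ENDPOINT_NAME")
# region = require_env("AWS_REGION")
```

Raising before the pipeline is built keeps configuration errors separate from runtime connection errors.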

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,17 +11,15 @@ from deepgram import LiveOptions
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
InterruptionFrame,
LLMRunFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
)
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -30,13 +28,13 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -73,17 +71,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
@@ -96,14 +97,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@stt.event_handler("on_speech_started")
async def on_speech_started(stt, *args, **kwargs):
await task.queue_frames([InterruptionFrame(), UserStartedSpeakingFrame()])
@stt.event_handler("on_utterance_end")
async def on_utterance_end(stt, *args, **kwargs):
await task.queue_frames([UserStoppedSpeakingFrame()])
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -28,31 +29,26 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -74,17 +70,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,16 +11,17 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
@@ -29,31 +30,26 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -84,17 +80,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.elevenlabs.stt import ElevenLabsRealtimeSTTService
@@ -28,31 +29,26 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -77,17 +73,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -28,30 +29,25 @@ from pipecat.services.playht.tts import PlayHTHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -77,17 +73,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -29,30 +30,25 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -79,17 +75,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.azure.llm import AzureLLMService
@@ -28,30 +29,25 @@ from pipecat.services.azure.tts import AzureHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -83,17 +79,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.azure.llm import AzureLLMService
@@ -28,30 +29,25 @@ from pipecat.services.azure.tts import AzureTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -83,17 +79,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -0,0 +1,135 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_out_sample_rate=24000,
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()
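
Note on the `transport_params` comment in the file above: each dictionary value is a lambda, so no parameter object is built until the selected entry is called. Below is a minimal sketch of that deferral pattern. `FakeParams`, `created`, and `select_params` are hypothetical names used only for illustration and have no pipecat dependency; in the example itself, `create_transport` handles the selection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Records which parameter objects were actually constructed.
created: List[str] = []


@dataclass
class FakeParams:
    """Hypothetical stand-in for DailyParams / FastAPIWebsocketParams / TransportParams."""

    name: str

    def __post_init__(self) -> None:
        created.append(self.name)


# Storing lambdas means nothing is instantiated when the dict is defined.
transport_params: Dict[str, Callable[[], FakeParams]] = {
    "daily": lambda: FakeParams("daily"),
    "twilio": lambda: FakeParams("twilio"),
    "webrtc": lambda: FakeParams("webrtc"),
}


def select_params(transport_type: str) -> FakeParams:
    """Call only the factory for the transport chosen at runtime."""
    return transport_params[transport_type]()


params = select_params("webrtc")
```

Only the `"webrtc"` factory runs here. This matters when a constructor is expensive, for example one that loads a model.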

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,48 +10,45 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -59,10 +56,15 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAISTTService(
stt = OpenAIRealtimeSTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
# Uses local VAD by default.
# To enable server-side VAD, set turn_detection=None or
# a dict with server_vad settings.
# turn_detection={"type": "server_vad", "threshold": 0.5},
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
@@ -77,17 +79,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,16 +11,17 @@ import time
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -29,30 +30,25 @@ from pipecat.services.openpipe.llm import OpenPipeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -82,17 +78,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,16 +11,17 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -29,30 +30,25 @@ from pipecat.services.xtts.tts import XTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -80,17 +76,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,57 +10,44 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.ultravox.stt import UltravoxSTTService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# NOTE: This example requires GPU resources to run efficiently.
# The Ultravox model is compute-intensive and performs best with GPU acceleration.
# This can be deployed on cloud GPU providers like Cerebrium.ai for optimal performance.
# Want to initialize the ultravox processor since it takes time to load the model and dont
# want to load it every time the pipeline is run
ultravox_processor = UltravoxSTTService(
model_name="fixie-ai/ultravox-v0_5-llama-3_1-8b",
hf_token=os.getenv("HF_TOKEN"),
)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -68,17 +55,49 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
params=GladiaInputParams(
language_config=LanguageConfig(
languages=[Language.EN],
),
enable_vad=True,
),
)
tts = CartesiaTTSService(
api_key=os.environ.get("CARTESIA_API_KEY"),
voice_id="97f4b8fb-f2fe-444b-bb9a-c109783a857a",
api_key=os.getenv("CARTESIA_API_KEY", ""),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
ultravox_processor,
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
@@ -94,6 +113,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
@@ -101,7 +123,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -30,30 +31,25 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -86,17 +82,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
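
The comment in this file describes deferring transport parameter creation with lambdas. A minimal sketch of that pattern follows, assuming nothing about pipecat itself: `FakeParams`, `param_factories`, and `select_params` are hypothetical stand-ins, used only to show that entries which are never selected are never constructed.

```python
from dataclasses import dataclass

# Records which parameter objects were actually built.
constructed = []


@dataclass
class FakeParams:
    """Hypothetical stand-in for a transport params class (not a pipecat type)."""

    name: str
    audio_in_enabled: bool = True
    audio_out_enabled: bool = True

    def __post_init__(self):
        # Simulates construction cost, e.g. loading a VAD model.
        constructed.append(self.name)


# Each value is a zero-argument factory, so nothing is built at import time.
param_factories = {
    "daily": lambda: FakeParams("daily"),
    "twilio": lambda: FakeParams("twilio"),
    "webrtc": lambda: FakeParams("webrtc"),
}


def select_params(transport_type: str) -> FakeParams:
    """Build params only for the transport chosen at runtime."""
    return param_factories[transport_type]()


selected = select_params("webrtc")
```

Only the selected factory runs, so objects tied to the other transports are never instantiated.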


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -28,30 +29,25 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -73,17 +69,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User respones
user_aggregator, # User respones
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
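
This diff replaces `context_aggregator.user()` / `context_aggregator.assistant()` with tuple unpacking of the pair. As a rough illustration of how one object could support both call styles, here is a sketch using a hypothetical `AggregatorPair` class. It is not pipecat's `LLMContextAggregatorPair` implementation, and the tuples stand in for real aggregator objects.

```python
class AggregatorPair:
    """Hypothetical stand-in; not pipecat's LLMContextAggregatorPair."""

    def __init__(self, context):
        # Both halves share the same context object.
        self._user = ("user", context)
        self._assistant = ("assistant", context)

    def user(self):
        # Accessor style, as in the removed lines of the diff.
        return self._user

    def assistant(self):
        return self._assistant

    def __iter__(self):
        # Yielding both halves in a fixed order enables `a, b = pair`.
        return iter((self._user, self._assistant))


context = ["system prompt"]
pair = AggregatorPair(context)

# Unpacking style, as in the added lines of the diff.
user_aggregator, assistant_aggregator = AggregatorPair(context)
```

With `__iter__` defined this way, the accessor methods and the unpacked names refer to equivalent objects, so call sites can migrate one at a time.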


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,17 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.groq.llm import GroqLLMService
@@ -29,30 +29,25 @@ from pipecat.services.groq.tts import GroqTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -76,19 +71,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(
context, user_params=LLMUserAggregatorParams(aggregation_timeout=0.05)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -8,13 +8,17 @@
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesAppendFrame, LLMRunFrame
from pipecat.frames.frames import LLMMessagesAppendFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frameworks.strands_agents import StrandsAgentsProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -23,6 +27,8 @@ from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
# Strands agent setup
try:
@@ -35,24 +41,20 @@ except ImportError:
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
@@ -71,9 +73,9 @@ def build_agent(model_id: str, max_tokens: int):
@tool
def check_weather(location: str) -> str:
if location.lower() == "san francisco":
return "The weather in San Francisco is sunny and 30 degrees."
return "The weather in San Francisco is sunny and 75 degrees."
elif location.lower() == "sydney":
return "The weather in Sydney is cloudy and 20 degrees."
return "The weather in Sydney is cloudy and 60 degrees."
else:
return "I'm not sure about the weather in that location."
@@ -114,17 +116,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Setup context aggregators for message handling
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # Speech-to-text
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # Strands Agents processor
tts, # Text-to-speech
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -8,16 +8,17 @@
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
@@ -26,30 +27,25 @@ from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -79,17 +75,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -25,16 +25,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.llm import GoogleLLMService
@@ -43,13 +44,13 @@ from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -57,8 +58,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
@@ -66,8 +65,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -89,6 +86,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower
)
messages = [
@@ -99,17 +97,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # Gemini TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.llm import GoogleLLMService
@@ -29,30 +30,25 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -106,17 +102,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # Gemini TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
@@ -136,7 +140,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
messages.append(
{
"role": "system",
"content": "Hello! I'm your AI assistant. I can help you with a variety of tasks. What would you like to know?",
"content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",
}
)
await task.queue_frames([LLMRunFrame()])


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,49 +10,45 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.google.stt import GoogleSTTService
from pipecat.services.google.tts import GoogleHttpTTSService, GoogleTTSService
from pipecat.services.google.tts import GoogleHttpTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -75,8 +71,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
# turn on thinking if you want it
# params=GoogleLLMService.InputParams(extra={"thinking_config": {"thinking_budget": 4096}}),)
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
)
messages = [
@@ -87,17 +85,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User respones
user_aggregator, # User respones
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024–2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.llm import GoogleLLMService
@@ -29,30 +30,25 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -75,8 +71,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
# turn on thinking if you want it
# params=GoogleLLMService.InputParams(extra={"thinking_config": {"thinking_budget": 4096}}),)
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
)
messages = [
@@ -87,17 +85,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
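The `transport_params` comment in this file describes deferring parameter creation until the transport type is selected at runtime. A minimal, library-free sketch of that pattern is below. `FakeAnalyzer`, `FakeParams`, and `build_params` are hypothetical stand-ins introduced for illustration only; they are not Pipecat classes or functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Records every construction so the deferral is observable.
constructed: List[str] = []


class FakeAnalyzer:
    """Hypothetical stand-in for an object that is costly to build."""

    def __init__(self, owner: str) -> None:
        constructed.append(owner)


@dataclass
class FakeParams:
    """Hypothetical stand-in for a transport params class."""

    name: str
    analyzer: FakeAnalyzer


# Each value is a zero-argument factory, so nothing is built when the
# dict itself is defined.
transport_params: Dict[str, Callable[[], FakeParams]] = {
    "daily": lambda: FakeParams("daily", FakeAnalyzer("daily")),
    "twilio": lambda: FakeParams("twilio", FakeAnalyzer("twilio")),
    "webrtc": lambda: FakeParams("webrtc", FakeAnalyzer("webrtc")),
}


def build_params(transport_type: str) -> FakeParams:
    """Call only the factory registered for the selected transport."""
    return transport_params[transport_type]()
```

Calling `build_params("webrtc")` constructs one `FakeAnalyzer`; the factories for the other two transports are never invoked.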


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -10,16 +10,17 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.assemblyai.stt import AssemblyAISTTService
@@ -28,31 +29,26 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
),
}
@@ -79,17 +75,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
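The change above replaces `context_aggregator.user()` and `context_aggregator.assistant()` with tuple unpacking of the pair object. One way an object can support both styles is to define `__iter__` alongside the accessor methods. The sketch below is an assumption about how such a type could be written; `AggregatorPairSketch` is hypothetical and is not Pipecat's `LLMContextAggregatorPair` implementation.

```python
from typing import Iterator


class AggregatorPairSketch:
    """Hypothetical pair type used only to illustrate unpacking."""

    def __init__(self, user: str, assistant: str) -> None:
        self._user = user
        self._assistant = assistant

    def user(self) -> str:
        # Accessor style: pair.user()
        return self._user

    def assistant(self) -> str:
        # Accessor style: pair.assistant()
        return self._assistant

    def __iter__(self) -> Iterator[str]:
        # Yielding both members in a fixed order enables `a, b = pair`.
        yield self._user
        yield self._assistant


pair = AggregatorPairSketch("user-agg", "assistant-agg")
user_aggregator, assistant_aggregator = pair
```

With this shape, code written against the accessor methods and code that unpacks the pair both resolve to the same two objects.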


@@ -1,9 +1,26 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Interruptible bot with Krisp VIVA noise filtering and turn detection.
This example demonstrates a conversational bot with:
- Krisp VIVA noise reduction on incoming audio
- Krisp VIVA Turn detection for natural interruptions
- Voice activity detection (VAD)
Required environment variables:
- KRISP_VIVA_FILTER_MODEL_PATH: Path to the Krisp noise filter model file (.kef)
- KRISP_VIVA_TURN_MODEL_PATH: Path to the Krisp turn detection model file (.kef)
- DEEPGRAM_API_KEY: Deepgram API key for STT/TTS
- OPENAI_API_KEY: OpenAI API key for LLM
Optional environment variables:
- KRISP_NOISE_SUPPRESSION_LEVEL: Noise suppression level 0-100 (default: 100)
Higher values = more aggressive noise reduction
"""
import os
@@ -11,16 +28,17 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -29,32 +47,27 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispVivaFilter(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispVivaFilter(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispVivaFilter(),
),
}
@@ -77,17 +90,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=KrispVivaTurn())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
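The module docstring added in this file lists four required environment variables and one optional one, `KRISP_NOISE_SUPPRESSION_LEVEL`, with a range of 0-100 and a default of 100. A minimal sketch of checking those values before starting the bot is below. The helper names `missing_required` and `noise_suppression_level` are hypothetical, and the diff does not show whether the example or the Krisp filter performs this validation itself.

```python
import os
from typing import List, Mapping

# Names taken from the docstring in this diff.
REQUIRED_VARS = (
    "KRISP_VIVA_FILTER_MODEL_PATH",
    "KRISP_VIVA_TURN_MODEL_PATH",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_required(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def noise_suppression_level(env: Mapping[str, str]) -> int:
    """Parse the optional level, falling back to the documented default of 100."""
    level = int(env.get("KRISP_NOISE_SUPPRESSION_LEVEL", "100"))
    if not 0 <= level <= 100:
        raise ValueError(f"KRISP_NOISE_SUPPRESSION_LEVEL must be 0-100, got {level}")
    return level


if __name__ == "__main__":
    # Report problems against the real process environment.
    print("missing:", missing_required(os.environ))
    print("level:", noise_suppression_level(os.environ))
```

Passing the environment as a mapping keeps both helpers easy to exercise with a plain dict instead of the real process environment.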


@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -11,16 +11,17 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_filter import KrispFilter
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -29,32 +30,27 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispFilter(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispFilter(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
audio_in_filter=KrispFilter(),
),
}
@@ -77,17 +73,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)

Some files were not shown because too many files have changed in this diff.