Compare commits

250 Commits

Author SHA1 Message Date
kompfner
c861beb066 Fix await usage in transcription timeout task 2026-01-23 11:15:16 -05:00
Aleix Conchillo Flaqué
8951442b8e Merge pull request #3534 from pipecat-ai/aleix/claude-skills-pr-description
claude: add pr-description skill
2026-01-22 17:34:46 -08:00
Aleix Conchillo Flaqué
7e6e3031e7 claude: add pr-description skill 2026-01-22 13:41:50 -08:00
Aleix Conchillo Flaqué
308829f92b Merge pull request #3533 from pipecat-ai/aleix/claude-skills-docstring
claude: add docstring skill
2026-01-22 12:58:38 -08:00
Aleix Conchillo Flaqué
82a799e63e claude: add docstring skill 2026-01-22 12:53:38 -08:00
Cale Shapera
6b5bcae86f change default Inworld TTS model to inworld-tts-1.5-max (#3531) 2026-01-22 14:21:15 -05:00
Mark Backman
836073849c Merge pull request #3527 from weakcamel/patch-1
Update README.md - fix Google Imagen URL
2026-01-22 10:46:10 -05:00
Waldek Maleska
b13b65d6e2 Update README.md - fix Google Imagen URL 2026-01-22 15:17:41 +00:00
Mark Backman
3d545b718d Merge pull request #3344 from omChauhanDev/fix/stt-dynamic-language-update
fix: treat language as first-class STT setting
2026-01-22 09:21:56 -05:00
marcus-daily
f2fa5d9733 Updating changelog 2026-01-22 14:17:59 +00:00
marcus-daily
76b774072c Formatting fixes 2026-01-22 14:17:59 +00:00
marcus-daily
b6341ffaa5 Save Smart Turn input data if SMART_TURN_LOG_DATA is set 2026-01-22 14:17:59 +00:00
Mark Backman
29fae67c9e Merge pull request #3523 from omChauhanDev/add-location-support-google-tts
feat(google): add location parameter to TTS services
2026-01-22 09:12:16 -05:00
Mark Backman
718ea1c15e Merge pull request #3526 from pipecat-ai/mb/remove-logs
Remove application logs
2026-01-22 08:48:07 -05:00
Mark Backman
8e09d94614 Remove application logs 2026-01-22 08:28:52 -05:00
Aleix Conchillo Flaqué
de73e28563 Merge pull request #3510 from omChauhanDev/feat/add-reached-filter-methods
feat(task): add additive filter methods for frame monitoring
2026-01-21 21:05:33 -08:00
Aleix Conchillo Flaqué
55250b4f7e Merge pull request #3521 from pipecat-ai/aleix/claude-changelog-skill
claude: initial /changelog skill
2026-01-21 20:50:47 -08:00
Om Chauhan
281145a991 added changelog 2026-01-22 09:55:57 +05:30
Om Chauhan
7bd32e2fe5 feat(google): add location parameter to TTS services 2026-01-22 09:49:19 +05:30
James Hush
8f05d95f50 feat: add video_out_codec parameter for DailyTransport (#3520)
* feat: add video_out_codec parameter for DailyTransport

Add video_out_codec parameter to TransportParams allowing configuration
of the preferred video codec (VP8, H264, H265) for video output.

When set, this passes the preferredCodec option to Daily's
VideoPublishingSettings during the join operation.

* chore: move video_out_codec parameter to changelog folder (#3522)

* Initial plan

* Move video_out_codec parameter to changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

* Revert all CHANGELOG.md changes, keep only changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-01-22 11:31:07 +08:00
Om Chauhan
87c12f3098 changed frame filter storage type from tuples to sets 2026-01-22 08:43:46 +05:30
Om Chauhan
9c0bf89247 added changelog 2026-01-22 08:43:46 +05:30
Om Chauhan
6e44a2ab49 feat(task): add additive filter methods for frame monitoring 2026-01-22 08:43:46 +05:30
Aleix Conchillo Flaqué
7aa7b86aed claude: initial /changelog skill 2026-01-21 18:43:04 -08:00
Aleix Conchillo Flaqué
5ad9faeb4c Merge pull request #3519 from pipecat-ai/aleix/embedded-rtvi-processor
automatically add RTVI to the pipeline
2026-01-21 18:17:26 -08:00
Aleix Conchillo Flaqué
9e8f8b45c6 added changelog files for #3519 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
0ee11ad333 tests: disable RTVI in tests by default 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
124a3c35af RTVIObserver: don't handle some frames direction 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
054e504868 examples(foundational): remove RTVI (automatically added by PipelineTask) 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
e85a00cc0e PipelineTask: automatically add RTVI processor and RTVI observer
If `enable_rtvi` is enabled (enabled by default), an RTVI processor will be
added automatically to the pipeline. Also, an RTVI observer will be
registered.
2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
cc61cdbba3 RTVIProcessor: add create_rtvi_observer() 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
62f4708d43 transports: broadcast InputTransportMessageFrame frames 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
ba0ddb1832 FrameProcessor: copy kwargs when broadcasting frame 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
eacd2a4b71 FrameProcessor: add broadcast_frame_instance() 2026-01-21 18:14:17 -08:00
Mark Backman
7ed110650d Merge pull request #3516 from okue/minorpatch1
refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
2026-01-21 10:33:59 -05:00
okue
4a724379fc refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
The _bot_speaking flag does not need to be set in this method,
so the redundant assignment has been removed.
2026-01-21 23:59:15 +09:00
Aleix Conchillo Flaqué
768d3958dd Merge pull request #3512 from pipecat-ai/changelog-0.0.100
Release 0.0.100 - Changelog Update
2026-01-20 19:32:56 -08:00
aconchillo
5f9ff8bd58 Update changelog for version 0.0.100 2026-01-20 19:21:19 -08:00
Aleix Conchillo Flaqué
59ed422052 Merge pull request #3511 from pipecat-ai/aleix/camb-tts-client-on-start
CambTTSService: initialize client during StartFrame
2026-01-20 19:17:45 -08:00
Aleix Conchillo Flaqué
7e0ca113af CambTTSService: initialize client during StartFrame 2026-01-20 19:07:12 -08:00
Aleix Conchillo Flaqué
13c52e0e6d Merge pull request #3509 from pipecat-ai/aleix/nvidia-stt-tts-improvements
NVIDIA STT/TTS performance improvements
2026-01-20 16:39:12 -08:00
Aleix Conchillo Flaqué
a787fd9cd8 NVIDIATTSService: process incoming audio frame right away
Process audio as soon as we receive it from the generator. Previously, we were
reading from the generator and adding elements into a queue until there was no
more data, then we would process the queue.
2026-01-20 15:41:05 -08:00
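The change described in a787fd9cd8 can be illustrated with a minimal sketch. It does not use the NVIDIA or Pipecat APIs: `audio_chunks`, `queue_then_process` and `process_right_away` are hypothetical names, and the generator simply stands in for the TTS audio stream.

```python
from typing import Iterator, List


def audio_chunks() -> Iterator[bytes]:
    """Hypothetical generator standing in for the TTS audio stream."""
    for i in range(3):
        yield bytes([i])


def queue_then_process(log: List[str]) -> None:
    """Previous approach: drain the generator into a queue, then process the queue."""
    queue = []
    for chunk in audio_chunks():
        log.append(f"received {chunk[0]}")
        queue.append(chunk)
    for chunk in queue:
        log.append(f"processed {chunk[0]}")


def process_right_away(log: List[str]) -> None:
    """New approach: process each chunk as soon as it arrives."""
    for chunk in audio_chunks():
        log.append(f"received {chunk[0]}")
        log.append(f"processed {chunk[0]}")


old_log: List[str] = []
new_log: List[str] = []
queue_then_process(old_log)
process_right_away(new_log)
print(old_log.index("processed 0"))  # 3: the first chunk waits for the whole stream
print(new_log.index("processed 0"))  # 1: the first chunk is handled immediately
```

In the second variant the first chunk is handled before the rest of the stream has been read, which is where the latency improvement would come from.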
Aleix Conchillo Flaqué
14495c425a NVIDIASTTService: no need for additional queue and task 2026-01-20 13:50:17 -08:00
Aleix Conchillo Flaqué
461bd0a2e0 update changelog for #3494 and #3499 2026-01-20 13:26:40 -08:00
Aleix Conchillo Flaqué
bd45ce2b4e Merge pull request #3499 from lukepayyapilli/fix/livekit-video-queue-memory-leak
fix(livekit): prevent memory leak when video_in_enabled is False
2026-01-20 13:21:21 -08:00
Aleix Conchillo Flaqué
a266644b06 Merge pull request #3494 from omChauhanDev/fix/uninterruptible-frame-handling
fix: preserve UninterruptibleFrames in __reset_process_queue
2026-01-20 13:19:40 -08:00
Mark Backman
03faadd7f9 Merge pull request #3508 from pipecat-ai/ss/log-daily-ids
Log Daily participant and meeting session IDs upon successful join in…
2026-01-20 15:43:48 -05:00
Aleix Conchillo Flaqué
bf43032652 Merge pull request #3504 from pipecat-ai/aleix/nvidia-stt-tts-error-handling
NVIDIA STT/TTS error handling
2026-01-20 09:41:08 -08:00
Sunah Suh
fa6f924b31 Log Daily participant and meeting session IDs upon successful join in Daily Transport 2026-01-20 11:31:17 -06:00
Aleix Conchillo Flaqué
a010a020fd add changelog for 3504 2026-01-20 09:03:30 -08:00
Aleix Conchillo Flaqué
655006aff5 NvidiaSegmentedSTTService: simplify exception handling 2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
671dc8cd9b NvidiaSTTService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
9a718ded1e NvidiaTTSService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
024809b39a Merge pull request #3503 from pipecat-ai/aleix/ai-service-start-end-cancel
AIService: handle StartFrame/EndFrame/CancelFrame exceptions
2026-01-20 08:56:39 -08:00
Aleix Conchillo Flaqué
6cf0d53d00 AIService: handle StartFrame/EndFrame/CancelFrame exceptions
If AIService subclasses implement start()/stop()/cancel() and exceptions are not
handled, execution will not continue and therefore the originator frames will
not be pushed. This would cause the pipeline to not be started (i.e. StartFrame
would not be pushed downstream) or stopped properly.
2026-01-20 08:54:22 -08:00
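A minimal sketch of the pattern described in 6cf0d53d00, assuming a simplified service where frames are plain strings. `SketchService` is a hypothetical stand-in and not the Pipecat `AIService` class. The point is only that the originating frame is still pushed when `start()` raises.

```python
import asyncio


class SketchService:
    """Hypothetical stand-in for an AIService subclass; not the Pipecat class."""

    def __init__(self):
        self.pushed = []  # frames forwarded downstream
        self.errors = []  # errors reported instead of raised

    async def start(self, frame):
        # A subclass start() that fails, e.g. a client that cannot be created.
        raise RuntimeError("client init failed")

    async def push_frame(self, frame):
        self.pushed.append(frame)

    async def process_frame(self, frame):
        if frame == "StartFrame":
            try:
                await self.start(frame)
            except Exception as e:
                # Report the error but keep going: the frame must still be pushed.
                self.errors.append(str(e))
        await self.push_frame(frame)


service = SketchService()
asyncio.run(service.process_frame("StartFrame"))
print(service.pushed)  # ['StartFrame']
print(service.errors)  # ['client init failed']
```

Without the `try`/`except`, the exception would propagate out of `process_frame()` and the `StartFrame` would never reach the processors downstream.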
kompfner
778dacc9a8 Merge pull request #3486 from pipecat-ai/pk/fix-nova-sonic-reset-conversation
Fix `AWSNovaSonicLLMService.reset_conversation()`
2026-01-20 10:07:38 -05:00
Paul Kompfner
06b3ecd2d6 In AWS Nova Sonic service, send the "interactive" user message (which triggers the bot response) only after sending the audio input start event, per the AWS team's recommendation 2026-01-20 09:56:25 -05:00
Paul Kompfner
b4d143e39b Add CHANGELOG for fixing AWSNovaSonicLLMService.reset_conversation() 2026-01-20 09:56:25 -05:00
Paul Kompfner
c89083e72e Improve 20e example to ask the bot to give a recap when loading a previous conversation from disk 2026-01-20 09:56:25 -05:00
Luke Payyapilli
1ac811ab32 chore: revert unrelated uv.lock changes 2026-01-20 09:19:43 -05:00
Luke Payyapilli
f6359d460e chore: install livekit as optional extra in CI instead of dev dep 2026-01-20 09:16:16 -05:00
Aleix Conchillo Flaqué
f03a7175c7 Merge pull request #3501 from pipecat-ai/aleix/improve-eval-numerical-word-prompt
scripts(eval): give examples to numerical word answers
2026-01-19 20:22:06 -08:00
Aleix Conchillo Flaqué
aed44c863a scripts(eval): give examples to numerical word answers
Some models need extra help.
2026-01-19 14:37:00 -08:00
Mark Backman
cddd6d5b0a Merge pull request #3492 from pipecat-ai/mb/remove-unused-imports
Remove unused imports
2026-01-19 14:07:16 -05:00
Mark Backman
11cf891ac8 Manual updates for unused imports 2026-01-19 14:03:22 -05:00
Luke Payyapilli
c89ae717fe style: fix ruff formatting 2026-01-19 11:13:41 -05:00
Luke Payyapilli
562bdd3084 test: add livekit to dev deps and improve test clarity 2026-01-19 11:11:54 -05:00
Mark Backman
cc4c3650e1 Merge pull request #3491 from pipecat-ai/mb/update-release-evals
Add Camb TTS to release evals
2026-01-19 11:04:05 -05:00
Luke Payyapilli
dfc1f09b77 fix(livekit): prevent memory leak when video_in_enabled is False 2026-01-19 11:00:23 -05:00
Filipi da Silva Fuchter
5fc46cc450 Merge pull request #3493 from omChauhanDev/fix/globally-unique-pc-id
fix: make SmallWebRTCConnection pc_id globally unique
2026-01-19 09:04:48 -05:00
Om Chauhan
4a9eb82f92 fix: preserve UninterruptibleFrames in __reset_process_queue 2026-01-18 20:39:13 +05:30
Om Chauhan
990d8386e4 fix: make SmallWebRTCConnection pc_id globally unique 2026-01-18 19:41:51 +05:30
Mark Backman
ce7d823770 Remove unused imports 2026-01-18 08:22:22 -05:00
Mark Backman
0b93c3f900 Add Camb TTS to release evals 2026-01-17 16:27:16 -05:00
Mark Backman
829c5f4604 Merge pull request #3169 from Incanta/hathora
Add Hathora STT and TTS services
2026-01-17 16:25:12 -05:00
Mike Seese
dc8ea615d9 add hathora to run-release-evals.py 2026-01-17 10:33:58 -08:00
Mike Seese
a3d206050d move hathora example as requested 2026-01-17 10:31:08 -08:00
Mike Seese
f48a567873 run the linter 2026-01-17 10:30:47 -08:00
Mark Backman
e69ccd8ea7 Merge pull request #3490 from pipecat-ai/mb/on-user-mute-events
Add on_user_mute_started and on_user_mute_stopped events
2026-01-17 11:05:15 -05:00
Mark Backman
11924bb980 Add on_user_mute_started and on_user_mute_stopped events 2026-01-17 11:01:46 -05:00
Mark Backman
af89154e96 Merge pull request #3489 from pipecat-ai/mb/fix-azure-tts-punctuation-spacing
fix: AzureTTSService punctuation spacing
2026-01-17 11:00:30 -05:00
Mark Backman
1485ea0831 Merge pull request #3488 from pipecat-ai/mb/on-user-turn-idle
Update on_user_idle to on_user_turn_idle
2026-01-17 11:00:16 -05:00
Mark Backman
e22bc777d8 Fix spacing for CJK languages 2026-01-17 09:04:50 -05:00
Mark Backman
043403fe23 fix: AzureTTSService punctuation spacing 2026-01-17 08:18:31 -05:00
Mark Backman
1e1160906e Update on_user_idle to on_user_turn_idle 2026-01-17 07:04:27 -05:00
Aleix Conchillo Flaqué
f7d3e63063 Merge pull request #3474 from pipecat-ai/fix/optional-member-access-function-call-cancel
Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
2026-01-16 22:06:45 -08:00
Paul Kompfner
6fa797c8e4 Fix AWS Nova Sonic reset_conversation(), which would previously error out.
Issues:
- After disconnecting, we were prematurely sending audio messages using the new prompt and content names, before the new prompt and content were created
- We weren't properly sending system instruction and conversation history messages to Nova Sonic with `"interactive": false`
2026-01-16 22:31:54 -05:00
Mark Backman
473d39791b Merge pull request #3482 from pipecat-ai/mb/user-idle-in-user-aggregator
Add UserIdleController, deprecate UserIdleProcessor
2026-01-16 18:47:10 -05:00
Aleix Conchillo Flaqué
2114abb8c6 add changelog file for 3484 2026-01-16 15:46:29 -08:00
Aleix Conchillo Flaqué
4fb4c26f55 Merge pull request #3484 from amichyrpi/main
Remove async_mode parameter from Mem0 storage
2026-01-16 15:44:52 -08:00
Mark Backman
2e8e574ea5 Add UserIdleController, deprecate UserIdleProcessor 2026-01-16 18:44:19 -05:00
Aleix Conchillo Flaqué
84c7e97be2 Merge pull request #3483 from pipecat-ai/aleix/throttle-user-speaking-frame
throttle user speaking frame
2026-01-16 15:29:37 -08:00
Amory Hen
a6e7c99d55 Remove async_mode parameter from Mem0 storage 2026-01-17 00:26:38 +01:00
Aleix Conchillo Flaqué
ac3fa7f91f BaseOutputTransport: minor cleanup 2026-01-16 15:15:49 -08:00
Aleix Conchillo Flaqué
6eadad53b2 BaseInputTransport: throttle UserSpeakingFrame 2026-01-16 15:15:49 -08:00
kompfner
b11150f31f Merge pull request #3480 from pipecat-ai/pk/fix-grok-realtime-smallwebrtc
Fix an issue where Grok Realtime would error out when running with Sm…
2026-01-16 15:46:27 -05:00
Paul Kompfner
836cf60611 Fix an issue where Grok Realtime would error out when running with SmallWebRTC transport.
The underlying issue was related to the fact that we were sending audio to Grok before we had configured the Grok session with our default input sample rate (16000), so Grok was interpreting those initial audio chunks as having its default sample rate (24000). We didn't see this issue when using the Daily transport simply because in our test environments Daily took a smidge longer than a reflexive (localhost) pure WebRTC connection, so we would only send audio to Grok *after* we had configured the Grok session with the desired sample rate.
2026-01-16 15:41:33 -05:00
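The ordering problem described in 836cf60611 can be sketched with an `asyncio.Event` that holds audio back until the session has been configured. This is one possible way to enforce the ordering; the commit does not say whether the actual fix waits, buffers, or drops early audio. `SketchRealtimeSession` and the message tuples are hypothetical and not the Grok or Pipecat API.

```python
import asyncio


class SketchRealtimeSession:
    """Hypothetical session wrapper; not the Grok or Pipecat API."""

    def __init__(self):
        self._configured = asyncio.Event()
        self.sent = []  # messages in the order they would go over the wire

    async def configure(self, sample_rate: int):
        self.sent.append(("session.update", sample_rate))
        self._configured.set()

    async def send_audio(self, chunk: bytes):
        # Hold audio until the session knows our input sample rate.
        await self._configured.wait()
        self.sent.append(("audio", chunk))


async def main():
    session = SketchRealtimeSession()
    # Audio arrives first, as it can on a fast localhost connection.
    audio_task = asyncio.create_task(session.send_audio(b"\x00\x01"))
    await asyncio.sleep(0)  # let the audio task run; it blocks on the event
    await session.configure(16000)
    await audio_task
    return session.sent


sent = asyncio.run(main())
print(sent)  # [('session.update', 16000), ('audio', b'\x00\x01')]
```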
James Hush
1c13ad95a5 Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
Extract dictionary value to local variable and check for None before
accessing cancel_on_interruption attribute, since the dictionary values
are typed as Optional[FunctionCallInProgressFrame].
2026-01-16 15:04:26 -05:00
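The narrowing pattern described in 1c13ad95a5 looks roughly like the sketch below. `InProgress`, `calls` and `should_cancel` are hypothetical names for illustration; only the `cancel_on_interruption` attribute and the `Optional` dictionary values come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InProgress:
    """Hypothetical stand-in for FunctionCallInProgressFrame."""

    cancel_on_interruption: bool


# Dictionary values are Optional, so direct attribute access is flagged by Pylance.
calls: Dict[str, Optional[InProgress]] = {"a": InProgress(True), "b": None}


def should_cancel(call_id: str) -> bool:
    # Bind the value to a local so the None check narrows its type for the checker.
    frame = calls.get(call_id)
    if frame is None:
        return False
    return frame.cancel_on_interruption


print(should_cancel("a"))  # True
print(should_cancel("b"))  # False
```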
Mark Backman
1e8516e91d Merge pull request #3476 from pipecat-ai/mb/project-urls
Update project.urls for PyPI
2026-01-16 14:57:39 -05:00
Mark Backman
32c775311d Merge pull request #3471 from pipecat-ai/mb/fix-pydantic-2.12-docs
Revert pydantic 2.12 extra type annotation
2026-01-16 14:57:24 -05:00
Mark Backman
28d0bb98de Merge pull request #3472 from pipecat-ai/mb/whisker-dev
Add whisker_setup.py setup file to .gitignore
2026-01-16 14:55:48 -05:00
Aleix Conchillo Flaqué
a9a9f3aeaa Merge pull request #3462 from pipecat-ai/aleix/fix-min-words-transcription-aggregation
MinWordsUserTurnStartStrategy: don't aggregate transcriptions
2026-01-16 11:18:23 -08:00
Aleix Conchillo Flaqué
c2a0735975 MinWordsUserTurnStartStrategy: don't aggregate transcriptions
If we aggregate transcriptions we will get incorrect interruptions. For example,
if we have a strategy with min_words=3 and we say "One" and pause, then "Two"
and pause and then "Three", this would trigger the start of the turn when it
shouldn't. We should only look at the incoming transcription text and not
aggregate it with the previous ones.
2026-01-16 11:16:06 -08:00
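The difference between the two behaviours described in c2a0735975 can be sketched as follows. Both helpers are hypothetical and not Pipecat APIs, and counting words by splitting on whitespace is an assumption.

```python
from typing import List


def should_start_turn_aggregated(transcriptions: List[str], min_words: int) -> bool:
    """Old behaviour (sketch): count words across every transcription received so far."""
    total = sum(len(t.split()) for t in transcriptions)
    return total >= min_words


def should_start_turn_per_transcription(transcriptions: List[str], min_words: int) -> bool:
    """New behaviour (sketch): each incoming transcription is checked on its own."""
    return any(len(t.split()) >= min_words for t in transcriptions)


# "One" (pause) "Two" (pause) "Three", with min_words=3
paused = ["One", "Two", "Three"]
print(should_start_turn_aggregated(paused, 3))  # True: the turn starts when it shouldn't
print(should_start_turn_per_transcription(paused, 3))  # False
print(should_start_turn_per_transcription(["One two three"], 3))  # True
```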
Aleix Conchillo Flaqué
41cb53f6c2 Merge pull request #3479 from pipecat-ai/aleix/turns-mute-to-user-mute
turns: move mute to user_mute
2026-01-16 11:11:50 -08:00
Aleix Conchillo Flaqué
58552af8fd examples(foundational): remove STTMuteFilter example 2026-01-16 11:07:20 -08:00
Aleix Conchillo Flaqué
c7ab87b0cc turns: move mute to user_mute 2026-01-16 11:07:20 -08:00
Mark Backman
11ecc5fdee Update project.urls for PyPI 2026-01-16 12:48:13 -05:00
kompfner
19fb3eed9f Merge pull request #3466 from pipecat-ai/pk/fix-aws-nova-sonic-rtvi-bot-output
Fix realtime (speech-to-speech) services' RTVI event compatibility
2026-01-16 09:56:13 -05:00
Mark Backman
b292b32374 Merge pull request #3461 from glennpow/glenn/websocket-headers
Allow WebsocketClientTransport to send custom headers
2026-01-15 20:26:36 -05:00
Mark Backman
63d1393bb0 Add whisker_setup.py to .gitignore 2026-01-15 20:21:25 -05:00
Glenn Powell
37914cb062 Removed import and added changelog entry. 2026-01-15 16:47:15 -08:00
Mark Backman
ec40696854 Revert pydantic 2.12 extra type annotation 2026-01-15 19:16:15 -05:00
Mike Seese
2249f3d673 add requested changes from code review 2026-01-15 15:27:56 -08:00
Mike Seese
d2df324f29 fix some bugs after testing changes 2026-01-15 15:27:56 -08:00
Mike Seese
67fdb0b659 use parent _settings dict instead of self._params pattern 2026-01-15 15:27:56 -08:00
Mike Seese
e77bdf66f9 add can_generate_metrics functions 2026-01-15 15:27:56 -08:00
Mike Seese
1b3b67779c switch hathora services to use InputParams pattern 2026-01-15 15:27:55 -08:00
Mike Seese
6c7e386391 remove traced_stt from run_stt 2026-01-15 15:27:55 -08:00
Mike Seese
ba25b279d6 fix issues with PR suggestions 2026-01-15 15:27:55 -08:00
Mike Seese
e7c83c19b6 port turn_start_strategies to the newer user_turn_strategies 2026-01-15 15:27:55 -08:00
Mike Seese
7be7fb49a3 remove turn_analyzer args from transport params 2026-01-15 15:27:54 -08:00
Mike Seese
bcccb4cbb3 put fallback sample_rate value in function arg 2026-01-15 15:27:54 -08:00
Mike Seese
e9f1d951d3 Apply suggestions from code review
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-01-15 15:27:54 -08:00
Mike Seese
e5632a9339 transition Hathora service to use the unified API and apply PR feedback
add Hathora to root files

Hathora run linter

added hathora changelog
2026-01-15 15:27:53 -08:00
Mike Seese
1510fb4fc0 add Hathora STT and TTS services 2026-01-15 15:26:52 -08:00
Mark Backman
64a1ad2649 Merge pull request #3470 from pipecat-ai/mb/fix-docs-0.0.99
Docs fixes after 0.0.99
2026-01-15 17:34:44 -05:00
Mark Backman
4458ca1d24 Mock FastAPI 2026-01-15 17:29:47 -05:00
Mark Backman
21aaa48e62 Fix pydantic issues impacting autodoc 2026-01-15 17:29:47 -05:00
Mark Backman
e75c241030 Merge pull request #3468 from pipecat-ai/mb/camb-cleanuo
Clean up CambTTSService
2026-01-15 17:16:28 -05:00
Mark Backman
60216048a8 Docs fixes after 0.0.99 2026-01-15 16:40:42 -05:00
Mark Backman
f3c2e29fb4 Clean up CambTTSService 2026-01-15 15:59:17 -05:00
Paul Kompfner
ce99924be4 Add CHANGELOG entry describing fix for the missing "bot-llm-text" RTVI event when using realtime (speech-to-speech) services 2026-01-15 15:55:39 -05:00
Paul Kompfner
5de80a60d4 Fix "bot-llm-text" not firing when using Grok Realtime 2026-01-15 15:30:00 -05:00
Paul Kompfner
5753762350 Fix "bot-llm-text" not firing when using OpenAI Realtime 2026-01-15 15:16:08 -05:00
Paul Kompfner
885b318b04 Fix "bot-llm-text" not firing when using Gemini Live 2026-01-15 15:03:45 -05:00
Paul Kompfner
7a22d58cf4 Fix "bot-llm-text" not firing when using AWS Nova Sonic 2026-01-15 14:56:50 -05:00
Mark Backman
c8e4b462c9 Merge pull request #3460 from pipecat-ai/mb/reorder-07-examples
Renumber the 07 foundational examples
2026-01-15 14:44:21 -05:00
Mark Backman
30a3f42255 Merge pull request #3349 from eRuaro/feat/camb-tts-integration
Add Camb.ai TTS integration with MARS models
2026-01-15 14:43:12 -05:00
Neil Ruaro
26ddb2de2f minimal uv.lock update for camb-sdk 2026-01-16 03:18:01 +08:00
Neil Ruaro
f60eeaa212 reverted uv.lock, updated readthedocs.yaml, copyright year updates 2026-01-16 02:50:18 +08:00
Neil Ruaro
8cf72b36cb manually add camb-sdk to uv.lock, exclude camb from docs build 2026-01-16 02:26:38 +08:00
Neil Ruaro
38c3bcef96 exclude camb from docs build 2026-01-16 02:20:26 +08:00
Neil Ruaro
80604ba7b6 remove _update_settings method 2026-01-16 02:00:48 +08:00
Neil Ruaro
256c70c631 use UserTurnStrategies 2026-01-16 01:32:08 +08:00
Glenn Powell
0e3532c529 Allow WebsocketClientTransport to send custom headers 2026-01-15 09:31:48 -08:00
Neil Ruaro
9942fcfeb2 updated per PR reviews 2026-01-16 01:20:17 +08:00
Neil Ruaro
003c24ca6e Make model parameter explicit in docstring example 2026-01-16 01:18:37 +08:00
Neil Ruaro
ed120d014d Add model-specific sample rates, transport example, and fix audio buffer alignment 2026-01-16 01:18:37 +08:00
Neil Ruaro
e76a3d04f0 Update Camb TTS to 48kHz sample rate 2026-01-16 01:18:37 +08:00
Neil Ruaro
641d17007f Clean up Camb TTS service and tests 2026-01-16 01:18:37 +08:00
Neil Ruaro
9293b5f24a Migrate Camb TTS service from raw HTTP to official SDK
- Replace aiohttp with camb SDK (AsyncCambAI client)
- Add support for passing existing SDK client instance
- Simplify API: no longer requires aiohttp_session parameter
- Update example to use simplified initialization
- Rewrite tests to mock SDK client instead of HTTP servers
2026-01-16 01:18:37 +08:00
Neil Ruaro
c1f3cbd1d4 Yield TTSAudioRawFrame directly instead of calling private method 2026-01-16 01:18:37 +08:00
Neil Ruaro
78fa2ab65e Update default voice ID, fix MARS naming, and clean up example 2026-01-16 01:18:37 +08:00
Neil Ruaro
56da2caeed Update Camb.ai TTS inference options 2026-01-16 01:18:37 +08:00
Neil Ruaro
a541d65255 Update MARS model names to mars-flash, mars-pro, mars-instruct
Rename model identifiers from mars-8-* to the new naming convention:
- mars-8-flash -> mars-flash (default)
- mars-8 -> removed
- mars-8-instruct -> mars-instruct
- Added mars-pro
2026-01-16 01:18:37 +08:00
Neil Ruaro
a3d7e9eafe Address PR feedback: add --voice-id arg, remove test script
- Add --voice-id CLI argument to example (default: 2681)
- Remove test_camb_quick.py from examples/ (tests belong in tests/)
- Update docstring with new usage
2026-01-16 01:18:36 +08:00
Neil Ruaro
54933bea2a Rename changelog to PR number 2026-01-16 01:18:36 +08:00
Neil Ruaro
fcab9899cc Add changelog entry for Camb.ai TTS integration 2026-01-16 01:18:36 +08:00
Neil Ruaro
be098e85db Remove non-working Daily/WebRTC example
The Daily transport example had authentication issues. Keeping the
local audio example (07zb-interruptible-camb-local.py) which works.
2026-01-16 01:18:36 +08:00
Neil Ruaro
ed0ff46a87 added local test 2026-01-16 01:18:36 +08:00
Neil Ruaro
7ae0d651d6 added cambai tts integration 2026-01-16 01:18:36 +08:00
Mark Backman
efd4432cfb Renumber the 07 foundational examples 2026-01-15 10:26:17 -05:00
kompfner
24082b84f2 Merge pull request #3453 from pipecat-ai/pk/consistency-pass-on-user-started-stopped-speaking-frames
Do a consistency pass on how we're sending `UserStartedSpeakingFrame`…
2026-01-15 09:24:14 -05:00
Aleix Conchillo Flaqué
dcd5840341 Merge pull request #3455 from pipecat-ai/aleix/reset-user-turn-start-strategies
UserTurnController: reset user turn start strategies when turn triggered
2026-01-14 19:28:32 -08:00
Aleix Conchillo Flaqué
9e705ce768 UserTurnController: reset user turn start strategies when turn triggered 2026-01-14 18:20:29 -08:00
Mark Backman
965466cc09 Merge pull request #3454 from pipecat-ai/mb/external-turn-strategies-timeout
fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrat…
2026-01-14 20:15:31 -05:00
Mark Backman
f3993f1775 fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrategies 2026-01-14 20:10:56 -05:00
Paul Kompfner
e107902b14 Do a consistency pass on how we're sending UserStartedSpeakingFrames and UserStoppedSpeakingFrames. The codebase is now consistent in broadcasting both types of frames up and downstream. 2026-01-14 18:47:15 -05:00
kompfner
e7b5ff49f4 Merge pull request #3447 from pipecat-ai/pk/add-pr-3420-to-changelog
Add PR 3420 to CHANGELOG (it was missing)
2026-01-14 15:33:44 -05:00
Paul Kompfner
e33172c44e Add PR 3420 to CHANGELOG (it was missing) 2026-01-14 15:33:07 -05:00
Mark Backman
3d858e8aa6 Merge pull request #3444 from pipecat-ai/mb/update-quickstart-0.0.99
Update quickstart example for 0.0.99
2026-01-14 10:29:55 -05:00
Mark Backman
eab059c49a Merge pull request #3446 from pipecat-ai/mb/add-3392-changelog
Add PR 3392 to changelog, linting cleanup
2026-01-14 10:28:57 -05:00
Mark Backman
4aaff04fb3 Add PR 3392 to changelog, linting cleanup 2026-01-14 09:43:17 -05:00
Mark Backman
cb364f3cab Update quickstart example for 0.0.99 2026-01-14 08:59:20 -05:00
Mark Backman
a9bfb090c3 Merge pull request #3287 from ashotbagh/feature/asyncai-multicontext-wss
Fix TTFB metric and add multi-context WebSocket support for Async TTS
2026-01-14 07:52:52 -05:00
Ashot
c4ae4025f3 Adjustments of Async TTS for multicontext websocket support 2026-01-14 16:33:30 +04:00
Ashot
15067c678d adapt Async TTS to updated AudioContextTTSService 2026-01-14 15:45:27 +04:00
Ashot
5ae592f38e Improve Async TTS interruption handling by using AudioContextTTSService class and add changelog fragments 2026-01-14 15:45:27 +04:00
Ashot
9cdbc56be3 Fix TTFB metric and add multi-context WebSocket support for Async TTS 2026-01-14 15:45:27 +04:00
Aleix Conchillo Flaqué
86ed485711 Merge pull request #3440 from pipecat-ai/changelog-0.0.99
Release 0.0.99 - Changelog Update
2026-01-13 17:02:41 -08:00
Aleix Conchillo Flaqué
7e1b4a4e90 update cosmetic changelog updates for 0.0.99 2026-01-13 16:59:46 -08:00
aconchillo
4531d517da Update changelog for version 0.0.99 2026-01-14 00:49:15 +00:00
Aleix Conchillo Flaqué
6fd5847f84 Merge pull request #3439 from pipecat-ai/aleix/uv-lock-2026-01-13
uv.lock: upgrade to latest versions
2026-01-13 16:48:07 -08:00
Aleix Conchillo Flaqué
2015eba9b2 uv.lock: upgrade to latest versions 2026-01-13 16:45:44 -08:00
Mark Backman
84f16ee895 Merge pull request #3438 from pipecat-ai/mb/fix-26a
Fix 26a foundational
2026-01-13 19:43:50 -05:00
Aleix Conchillo Flaqué
5b2af03b16 Merge pull request #3437 from pipecat-ai/aleix/update-aggregator-logs
LLMContextAggregatorPair: make strategy logs less verbose
2026-01-13 16:39:29 -08:00
Mark Backman
b313395dc3 Fix 26a foundational 2026-01-13 19:31:24 -05:00
Aleix Conchillo Flaqué
0d6bdbee10 LLMContextAggregatorPair: make strategy logs less verbose 2026-01-13 15:11:22 -08:00
Aleix Conchillo Flaqué
248dac3a9d Merge pull request #3420 from pipecat-ai/pk/fix-gemini-3-parallel-function-calls
Fix parallel function calling with Gemini 3.
2026-01-13 14:40:33 -08:00
Paul Kompfner
be49a54856 Fast-exit in the fix for parallel function calling with Gemini 3, if we can determine up-front that there's no work to do 2026-01-13 17:32:20 -05:00
Aleix Conchillo Flaqué
bd9ee0d646 Merge pull request #3434 from pipecat-ai/aleix/context-appregator-pair-tuple
context aggregator pair tuple
2026-01-13 14:12:51 -08:00
Mark Backman
442e0e582d Merge pull request #3431 from pipecat-ai/mb/update-realtime-examples-transcript-handler
Update GeminiLiveLLMService to push thought frames, update 26a for new transcript events
2026-01-13 17:10:40 -05:00
kompfner
38194c0cff Merge pull request #3436 from pipecat-ai/pk/remove-transcript-processor-reference
Remove dead import of `TranscriptProcessor` (which is now deprecated)
2026-01-13 17:06:17 -05:00
Paul Kompfner
0ebdaba03c Remove dead import of TranscriptProcessor (which is now deprecated) 2026-01-13 17:02:57 -05:00
Aleix Conchillo Flaqué
ee82377d68 examples: fix 22d to push some CancelFrame and EndFrame 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
861588e4a3 examples: update all examples to use the new LLMContextAggregatorPair tuple 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
1ab3bf2ef6 LLMContextAggregatorPair: instances can now return a tuple 2026-01-13 14:01:53 -08:00
Mark Backman
bb00d223c9 Update 26a to use context aggregator transcription events 2026-01-13 17:01:10 -05:00
Aleix Conchillo Flaqué
86fbfaddd1 Merge pull request #3435 from pipecat-ai/aleix/fix-llm-context-create-audio-message
LLMContext: fix create_audio_message
2026-01-13 13:59:28 -08:00
Aleix Conchillo Flaqué
5612bf513b LLMContext: fix create_audio_message 2026-01-13 13:53:34 -08:00
Mark Backman
87d0dc9e24 Merge pull request #3412 from pipecat-ai/mb/remove-41a-b
Remove foundational examples 41a and 41b
2026-01-13 16:45:26 -05:00
Paul Kompfner
30fbcfbf71 Rework fix for parallel function calling with Gemini 3 2026-01-13 16:33:59 -05:00
Mark Backman
5d90f4ea06 Merge pull request #3428 from pipecat-ai/mb/fix-tracing-none-values
Fix TTS, realtime LLM services could return unknown for model_name
2026-01-13 15:40:10 -05:00
kompfner
f6d09e1574 Merge pull request #3430 from pipecat-ai/pk/request-image-frame-fixes
Fix request_image_frame and usage
2026-01-13 15:36:44 -05:00
Mark Backman
b8e48dee7f Merge pull request #3433 from pipecat-ai/mb/port-realtime-examples-transcript-events
Update examples to use transcription events from context aggregators
2026-01-13 15:36:06 -05:00
Mark Backman
a6ccb9ec69 Merge pull request #3427 from pipecat-ai/mb/add-07j-gladia-vad-example
Add 07j Gladia VAD foundational example, add to release evals
2026-01-13 15:35:24 -05:00
Mark Backman
66551ebdf5 Merge pull request #3426 from pipecat-ai/mb/changelog-3404
Add changelog fragments for PR 3404
2026-01-13 15:34:58 -05:00
Aleix Conchillo Flaqué
21534f7d83 added changelog file for #3430 2026-01-13 12:21:22 -08:00
Mark Backman
d591f9e108 Remove 28-transcription-processor.py 2026-01-13 15:20:59 -05:00
Mark Backman
aa2589d3be Update examples to use transcription events from context aggregators 2026-01-13 15:19:47 -05:00
Aleix Conchillo Flaqué
9d6067fa78 examples(foundational): speak "Let me check on that" in 14d examples 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
027e54425a examples(foundational): associate image requests to function calls 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
e268c73c41 LLMAssistantAggregator: cache function call requested images 2026-01-13 12:10:08 -08:00
Aleix Conchillo Flaqué
d3c57e2da0 UserImageRawFrame: don't deprecate request field 2026-01-13 11:56:13 -08:00
Aleix Conchillo Flaqué
02eace5a16 UserImageRequestFrame: don't deprecate function call related fields 2026-01-13 11:55:55 -08:00
Mark Backman
15bc1dd999 Update GeminiLiveLLMService to push Thought frames when thought content is returned 2026-01-13 14:13:00 -05:00
Paul Kompfner
b937956dc8 Fix request_image_frame and usage 2026-01-13 13:23:01 -05:00
Mark Backman
efbc0c8510 Fix TTS, realtime LLM services could return unknown for model_name 2026-01-13 12:12:15 -05:00
Himanshu Gunwant
d0f227189c fix: openai llm model name is unknown (#3422) 2026-01-13 11:55:52 -05:00
Mark Backman
41eef5efc4 Add 07j Gladia VAD foundational example, add to release evals 2026-01-13 11:36:15 -05:00
Mark Backman
f00f9d9f1a Add changelog fragments for PR 3404 2026-01-13 11:29:17 -05:00
Mark Backman
ae59b3ba36 Merge pull request #3404 from poseneror/feature/gladia-vad-events
feat(gladia): add VAD events support
2026-01-13 11:26:56 -05:00
Paul Kompfner
6668712f7b Add evals for parallel function calling 2026-01-13 11:03:38 -05:00
Paul Kompfner
8812686b17 Fix parallel function calling with Gemini 3.
Gemini expects parallel function calls to be passed in as a single multi-part `Content` block. This is important because only one of the function calls in a batch of parallel function calls gets a thought signature—if they're passed in as separate `Content` blocks, there'd be one or more missing thought signatures, which would result in a Gemini error.
2026-01-13 11:03:38 -05:00
kompfner
8b0f0b5bb4 Merge pull request #3425 from pipecat-ai/pk/gemini-3-flash-new-thinking-levels
Add Gemini 3 Flash-specific thinking levels
2026-01-13 11:02:53 -05:00
Paul Kompfner
f5e8a04e3b Bump aiortc dependency, which relaxes the constraint on av, which was pinned to 14.4.0, which no longer has all necessary wheels 2026-01-13 10:50:08 -05:00
Mark Backman
a298ce3b41 Merge pull request #3424 from pipecat-ai/mb/tts-append-trailing-space
Add append_trailing_space to TTSService to prevent vocalizing trailin…
2026-01-13 10:42:40 -05:00
Mark Backman
31daa889e8 Add append_trailing_space to TTSService to prevent vocalizing trailing punctuation; update DeepgramTTSService and RimeTTSService to use the arg 2026-01-13 10:38:54 -05:00
Paul Kompfner
76a058178e Add Gemini 3 Flash-specific thinking levels 2026-01-13 09:50:59 -05:00
poseneror
3304b18ac2 Add should_interrupt + broadcast user events 2026-01-13 14:27:35 +02:00
poseneror
b95a6afe77 feat(gladia): add VAD events support
Add support for Gladia's speech_start/speech_end events to emit
UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames.

When enable_vad=True in GladiaInputParams:
- speech_start triggers interruption and pushes UserStartedSpeakingFrame
- speech_end pushes UserStoppedSpeakingFrame
- Tracks speaking state to prevent duplicate events

This allows using Gladia's built-in VAD instead of a separate VAD
in the pipeline.
2026-01-13 14:27:35 +02:00
Mark Backman
f6ed7d7582 Merge pull request #3418 from pipecat-ai/mb/speechmatics-task-cleanup 2026-01-12 19:24:56 -05:00
Mark Backman
cd3290df1c Small cleanup for task creation in SpeechmaticsSTTService 2026-01-12 16:00:32 -05:00
Mark Backman
2296caf529 Merge pull request #3414 from pipecat-ai/mb/changelog-3410
Update changelog for PR 3410.changed.md
2026-01-12 13:43:42 -05:00
Mark Backman
90ded6658d Merge pull request #3403 from pipecat-ai/mb/inworld-tts-add-keepalive
InworldTTSService: Add keepalive task
2026-01-12 13:31:24 -05:00
Mark Backman
7e97fb80a5 Merge pull request #3392 from pipecat-ai/mb/websocket-service-connection-closed-error
Add reconnect logic to WebsocketService in the event of ConnectionClo…
2026-01-12 13:11:43 -05:00
Mark Backman
b58471fdb1 Add Exotel and Vonage to Serializers in README services list 2026-01-12 12:24:56 -05:00
Aleix Conchillo Flaqué
46b4f9f29b Merge pull request #3413 from pipecat-ai/aleix/fix-assistant-thought-aggregation
LLMAssistantAggregator: reset aggregation after adding the thought, not before
2026-01-12 09:21:42 -08:00
Aleix Conchillo Flaqué
ec20d72aba LLMAssistantAggregator: reset aggregation after adding the thought, not before 2026-01-12 09:18:13 -08:00
Mark Backman
5743e2a99b Update changelog for PR 3410.changed.md 2026-01-12 12:15:40 -05:00
Mark Backman
2f429a2e76 Merge pull request #3410 from Vonage/feat/fastapi-ws-vonage-serializer
feat: update FastAPI WebSocket transport and add Vonage serializer
2026-01-12 12:10:57 -05:00
Varun Pratap Singh
3e982f7a4a refactor: rename audio_packet_bytes to fixed_audio_packet_size 2026-01-12 22:11:39 +05:30
Mark Backman
89484e281d Remove foundational examples 41a and 41b 2026-01-12 10:11:58 -05:00
Varun Pratap Singh
14a115f372 changelog: add fragments for PR #3410 2026-01-12 18:12:27 +05:30
Varun Pratap Singh
e96595fe59 feat: update FastAPI WebSocket transport and add Vonage serializer 2026-01-12 17:50:38 +05:30
Mark Backman
f58d21862b WebsocketService: Add _maybe_try_reconnect and use for exception cases 2026-01-11 16:43:37 -05:00
Mark Backman
aac24ad2d4 InworldTTSService: Add keepalive task 2026-01-10 11:20:20 -05:00
Mark Backman
9c81acb159 Track websocket disconnecting status to improve error handling 2026-01-09 20:24:07 -05:00
Mark Backman
4fe0836cf9 Add reconnect logic to WebsocketService in the event of ConnectionClosedError 2026-01-09 09:03:01 -05:00
Om Chauhan
1ceb01665f fix: treat language as first-class STT setting 2026-01-04 11:04:30 +05:30
380 changed files with 8651 additions and 4503 deletions

@@ -0,0 +1,40 @@
---
name: changelog
description: Create changelog files for important commits in a PR
---
Create changelog files for the important commits in this PR. The PR number is provided as an argument.
## Instructions
1. First, check what commits are on the current branch compared to main:
```
git log main..HEAD --oneline
```
2. For each significant change, create a changelog file in the `changelog/` folder using the format:
- `{PR_NUMBER}.added.md` - for new features
- `{PR_NUMBER}.added.2.md`, `{PR_NUMBER}.added.3.md` - for additional new features
- `{PR_NUMBER}.changed.md` - for changes to existing functionality
- `{PR_NUMBER}.fixed.md` - for bug fixes
- `{PR_NUMBER}.deprecated.md` - for deprecations
3. Each changelog file must contain at least one main line starting with `- `, followed by a clear description of the change.
4. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.
5. Use ⚠️ emoji prefix for breaking changes.
## Example
For PR #3519 with a new feature and a bug fix:
`changelog/3519.added.md`:
```
- Added `SomeNewFeature` for doing something useful.
```
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
```
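The naming scheme in step 2 can be sketched in a few lines of Python. This is only an illustration of the convention described above; `fragment_paths` and `FRAGMENT_TYPES` are hypothetical names, not part of the skill or of Pipecat.

```python
from collections import Counter

# Fragment types named in the instructions above.
FRAGMENT_TYPES = ("added", "changed", "fixed", "deprecated")


def fragment_paths(pr_number, change_types):
    """Return one changelog path per change, numbering repeats of a type."""
    seen = Counter()
    paths = []
    for change_type in change_types:
        if change_type not in FRAGMENT_TYPES:
            raise ValueError(f"unknown fragment type: {change_type}")
        seen[change_type] += 1
        count = seen[change_type]
        # The first fragment of a type has no index; later ones get .2, .3, ...
        suffix = "" if count == 1 else f".{count}"
        paths.append(f"changelog/{pr_number}.{change_type}{suffix}.md")
    return paths
```

For a PR with two new features and one bug fix, `fragment_paths(3519, ["added", "fixed", "added"])` yields the three file names the instructions would produce.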

@@ -0,0 +1,257 @@
---
name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
3. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component
4. Apply documentation in this order:
- Module docstring (at top, after the copyright header and before imports)
- Class docstrings
- `__init__` methods (always document constructor parameters)
- Public methods (not starting with `_`)
- Dataclass/config classes with field descriptions
5. Skip documentation for:
- Private methods (starting with `_`)
- Simple dunder methods (`__str__`, `__repr__`, `__post_init__`)
- Very simple pass-through properties
- **Already documented code** - If a class, method, or function already has a complete docstring that follows the project style, do not modify it. A docstring is complete if it has:
- A one-line summary
- Args section (if it has parameters)
- Returns section (if it returns something meaningful)
- Only add or improve documentation where it is missing or incomplete
## Module Docstring Format
```python
"""[One-line description of module purpose].
[Optional: Longer explanation of functionality, key classes, or use cases.]
"""
```
Example:
```python
"""Neuphonic text-to-speech service implementations.
This module provides WebSocket and HTTP-based integrations with Neuphonic's
text-to-speech API for real-time audio synthesis.
"""
```
## Class Docstring Format
```python
class ClassName:
    """One-line summary describing what the class does.

    [Longer description explaining purpose, behavior, and key features.
    Use action-oriented language.]

    [Optional: Event handlers, usage notes, or important caveats.]
    """
```
Example:
```python
class FrameProcessor(BaseObject):
    """Base class for all frame processors in the pipeline.

    Frame processors are the building blocks of Pipecat pipelines, they can be
    linked to form complex processing pipelines. They receive frames, process
    them, and pass them to the next or previous processor in the chain.

    Event handlers available:

    - on_before_process_frame: Called before a frame is processed
    - on_after_process_frame: Called after a frame is processed

    Example::

        @processor.event_handler("on_before_process_frame")
        async def on_before_process_frame(processor, frame):
            ...

        @processor.event_handler("on_after_process_frame")
        async def on_after_process_frame(processor, frame):
            ...
    """
```
Note: When listing event handlers, do NOT use backticks. Include an `Example::` section (with double colon for Sphinx) showing the decorator pattern and function signature for each event.
## Constructor (`__init__`) Format
```python
def __init__(self, *, param1: Type, param2: Type = default, **kwargs):
    """Initialize the [ClassName].

    Args:
        param1: Description of param1 and its purpose.
        param2: Description of param2. Defaults to [default].
        **kwargs: Additional arguments passed to parent class.
    """
```
Example:
```python
def __init__(
    self,
    *,
    api_key: str,
    voice_id: Optional[str] = None,
    sample_rate: Optional[int] = 22050,
    **kwargs,
):
    """Initialize the Neuphonic TTS service.

    Args:
        api_key: Neuphonic API key for authentication.
        voice_id: ID of the voice to use for synthesis.
        sample_rate: Audio sample rate in Hz. Defaults to 22050.
        **kwargs: Additional arguments passed to parent InterruptibleTTSService.
    """
```
## Method Docstring Format
```python
async def method_name(self, param1: Type) -> ReturnType:
    """One-line summary of what method does.

    [Longer description if behavior isn't obvious.]

    Args:
        param1: Description of param1.

    Returns:
        Description of return value.

    Raises:
        ExceptionType: When this exception is raised.
    """
```
Example:
```python
async def put(self, item: Tuple[Frame, FrameDirection, FrameCallback]):
    """Put an item into the priority queue.

    System frames (`SystemFrame`) have higher priority than any other
    frames. If a non-frame item is provided it will have the highest priority.

    Args:
        item: The item to enqueue.
    """
```
## Dataclass/Config Format
```python
@dataclass
class ConfigName:
    """One-line description of configuration.

    [Explanation of when/how to use this config.]

    Parameters:
        field1: Description of field1.
        field2: Description of field2. Defaults to [default].
    """

    field1: Type
    field2: Type = default_value
```
Example:
```python
@dataclass
class FrameProcessorSetup:
    """Configuration parameters for frame processor initialization.

    Parameters:
        clock: The clock instance for timing operations.
        task_manager: The task manager for handling async operations.
        observer: Optional observer for monitoring frame processing events.
    """

    clock: BaseClock
    task_manager: BaseTaskManager
    observer: Optional[BaseObserver] = None
```
## Enum Documentation Format
```python
class EnumName(Enum):
    """One-line description of the enum purpose.

    [Longer description of how the enum is used.]

    Parameters:
        VALUE1: Description of VALUE1.
        VALUE2: Description of VALUE2.
    """

    VALUE1 = 1
    VALUE2 = 2
```
## Writing Style Guidelines
- **Concise and professional** - No casual language or filler words
- **Action-oriented** - Start with verbs: "Processes...", "Manages...", "Converts..."
- **Purpose before implementation** - Explain WHY before HOW
- **Clear parameter descriptions** - Include purpose and defaults
- **No redundant type info** - Type hints are in the signature, don't repeat in description
- **Use backticks for code references** - Wrap class names, method names, event names, parameter names, and code snippets in backticks
Good: "Neuphonic API key for authentication."
Bad: "str: The API key (string) that is used for authenticating with Neuphonic."
Good: "Triggers `on_speech_started` when the `VADAnalyzer` detects speech."
Bad: "Triggers on_speech_started when the VADAnalyzer detects speech."
## Deprecation Notice Format
When documenting deprecated code:
```python
"""[Description].
.. deprecated:: X.X.X
    `ClassName` is deprecated and will be removed in a future version.
    Use `NewClassName` instead.
"""
```
## Checklist
Before finishing, verify:
- [ ] Module has a docstring at the top (after the copyright header, before imports)
- [ ] All public classes have docstrings
- [ ] All `__init__` methods document their parameters
- [ ] All public methods have docstrings with Args/Returns/Raises as needed
- [ ] Dataclasses use "Parameters:" section for field descriptions
- [ ] Enums document each value in "Parameters:" section
- [ ] Writing is concise and action-oriented
- [ ] No documentation added to private methods (starting with `_`)
- [ ] Existing complete docstrings were left unchanged

@@ -0,0 +1,128 @@
---
name: pr-description
description: Update a GitHub PR description with a summary of changes
---
Update a GitHub pull request description based on the changes in the PR.
## Arguments
```
/pr-description <PR_NUMBER> [--fixes <ISSUE_NUMBERS>]
```
- `PR_NUMBER` (required): The pull request number to update
- `--fixes` (optional): Comma-separated issue numbers that this PR fixes (e.g., `--fixes 123,456`)
Examples:
- `/pr-description 3534`
- `/pr-description 3534 --fixes 123`
- `/pr-description 3534 --fixes 123,456,789`
## Instructions
1. First, gather information about the PR:
- Use GitHub plugin to get PR details (title, current description, base branch)
- Use local git to get commits: `git log main..HEAD --oneline`
- Use local git to get the diff: `git diff main..HEAD`
- Parse any `--fixes` argument for issue numbers
2. Check the existing PR description:
- If it already has a complete, accurate description that reflects the changes, do nothing
- If it's missing sections, incomplete, or outdated compared to the actual changes, proceed to update
- If it only has the template placeholder text, generate a full description
3. Analyze the changes:
- Understand the purpose of each commit
- Identify any breaking changes (API changes, removed features, behavior changes)
- Look for new features, bug fixes, refactoring, or documentation changes
- Collect issue numbers from:
- The `--fixes` argument (if provided)
- Commit messages (patterns like "Fixes #123", "Closes #456", "Resolves #789")
4. Generate or update the PR description with these sections:
## PR Description Format
### Summary (always include)
Brief bullet points describing what changed and why. Focus on the *purpose* and *impact*, not implementation details.
```markdown
## Summary
- Added X to enable Y
- Fixed bug where Z would happen
- Refactored W for better maintainability
```
### Breaking Changes (include only if applicable)
Document any changes that affect existing users or APIs.
```markdown
## Breaking Changes
- `ClassName.method()` now requires a `param` argument
- Removed deprecated `old_function()` - use `new_function()` instead
```
### Testing (include when non-obvious)
How to verify the changes work. Skip for trivial changes.
```markdown
## Testing
- Run `uv run pytest tests/test_feature.py` to verify the fix
- Example usage: `uv run examples/new_feature.py`
```
### Fixes (include if issues are provided or found in commits)
List issues this PR fixes. GitHub will automatically close these issues when the PR is merged.
```markdown
## Fixes
- Fixes #123
- Fixes #456
```
Note: Use "Fixes #X" format (not "Closes" or "Resolves") for consistency. Each issue should be on its own line with "Fixes" to ensure GitHub auto-closes them.
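Collecting issue numbers from the `--fixes` argument and from commit messages, then emitting them in the "Fixes #X" format, could look roughly like this. The function names are hypothetical, and the regular expression only covers the three keywords listed in the instructions.

```python
import re

# Keywords the instructions list as appearing in commit messages.
ISSUE_PATTERN = re.compile(r"\b(?:Fixes|Closes|Resolves)\s+#(\d+)", re.IGNORECASE)


def collect_issue_numbers(fixes_arg, commit_messages):
    """Merge issue numbers from `--fixes` and from commit messages."""
    issues = []
    if fixes_arg:
        issues.extend(int(part) for part in fixes_arg.split(",") if part.strip())
    for message in commit_messages:
        issues.extend(int(num) for num in ISSUE_PATTERN.findall(message))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(issues))


def fixes_section(issues):
    """Render the Fixes section, or an empty string when there are no issues."""
    if not issues:
        return ""
    return "\n".join(["## Fixes"] + [f"- Fixes #{n}" for n in issues])
```

Returning an empty string for an empty list matches the guideline to skip sections that have no content.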
## Guidelines
- **Be concise** - Reviewers should understand the PR in 30 seconds
- **Focus on why** - The diff shows *what* changed, explain *why*
- **Skip empty sections** - Only include sections that have content
- **Use bullet points** - Easier to scan than paragraphs
- **Don't duplicate the diff** - Avoid listing every file or line changed
## Example Output
```markdown
## Summary
- Added `/docstring` skill for documenting Python modules with Google-style docstrings
- Skill finds classes by name and handles conflicts when multiple matches exist
- Skips already-documented code to avoid unnecessary changes
## Testing
/docstring ClassName
## Fixes
- Fixes #123
```
## Checklist
Before updating the PR:
- [ ] Verified existing description needs updating (not already complete)
- [ ] Summary accurately reflects the changes
- [ ] Breaking changes are clearly documented (if any)
- [ ] No unnecessary sections included
- [ ] Description is concise and scannable

@@ -33,7 +33,7 @@ jobs:
- name: Install dependencies
run: |
-uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
+uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra livekit --extra websocket
- name: Run tests with coverage
run: |

@@ -37,7 +37,7 @@ jobs:
- name: Install dependencies
run: |
-uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
+uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra livekit --extra websocket
- name: Test with pytest
run: |

.gitignore

@@ -4,7 +4,14 @@ __pycache__/
*~
venv
.venv
-/.idea
+.idea
+.gradle
+.next
+next-env.d.ts
+local.properties
+*.log
+*.lock
+smart_turn_audio_log
#*#
# Distribution / Packaging
@@ -27,7 +34,7 @@ share/python-wheels/
*.egg
MANIFEST
.DS_Store
-.env
+.env*
fly.toml
# Examples
@@ -51,4 +58,7 @@ docs/api/_build/
docs/api/api
# uv
.python-version
.python-version
# Pipecat
whisker_setup.py

@@ -7,6 +7,664 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.100] - 2026-01-20
### Added
- Added Hathora service to support Hathora-hosted TTS and STT models (only
non-streaming)
(PR [#3169](https://github.com/pipecat-ai/pipecat/pull/3169))
- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models
(mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech
synthesis.
(PR [#3349](https://github.com/pipecat-ai/pipecat/pull/3349))
- Added the `additional_headers` param to `WebsocketClientParams`, allowing
`WebsocketClientTransport` to send custom headers on connect, for cases such
as authentication.
(PR [#3461](https://github.com/pipecat-ai/pipecat/pull/3461))
- Added `UserIdleController` for detecting user idle state, integrated into
`LLMUserAggregator` and `UserTurnProcessor` via optional `user_idle_timeout`
parameter. Emits `on_user_turn_idle` event for application-level handling.
Deprecated `UserIdleProcessor` in favor of the new compositional approach.
(PR [#3482](https://github.com/pipecat-ai/pipecat/pull/3482))
- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to
`LLMUserAggregator` for tracking user mute state changes.
(PR [#3490](https://github.com/pipecat-ai/pipecat/pull/3490))
### Changed
- Enhanced interruption handling in `AsyncAITTSService` by supporting
multi-context WebSocket sessions for more robust context management.
(PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))
- Throttled `UserSpeakingFrame` broadcasts to at most one every 200ms instead of
one per audio chunk, reducing frame processing overhead during user speech.
(PR [#3483](https://github.com/pipecat-ai/pipecat/pull/3483))
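The 200ms throttling described above follows a common pattern: remember when the last broadcast happened and skip any that arrive too soon. The sketch below is a generic illustration with a hypothetical `Throttle` class; it is not Pipecat's implementation.

```python
import time


class Throttle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval=0.2, clock=time.monotonic):
        self._interval = interval
        self._clock = clock  # injectable so the behavior can be tested
        self._last = None

    def ready(self):
        """Return True if enough time has passed since the last allowed action."""
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False
```

A caller would check `throttle.ready()` for each incoming audio chunk and broadcast only when it returns `True`.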
### Deprecated
- Deprecated `pipecat.turns.mute` (introduced in Pipecat 0.0.99) in favor of
`pipecat.turns.user_mute`, for consistency with other package names.
(PR [#3479](https://github.com/pipecat-ai/pipecat/pull/3479))
### Fixed
- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.
(PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))
- Fixed an issue where the "bot-llm-text" RTVI event would not fire for
realtime (speech-to-speech) services:
- `AWSNovaSonicLLMService`
- `GeminiLiveLLMService`
- `OpenAIRealtimeLLMService`
- `GrokRealtimeLLMService`
The issue was that these services weren't pushing `LLMTextFrame`s. Now
they do.
(PR [#3446](https://github.com/pipecat-ai/pipecat/pull/3446))
- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is
talking when using `ExternalUserTurnStrategies`.
(PR [#3454](https://github.com/pipecat-ai/pipecat/pull/3454))
- Fixed an issue where user turn start strategies were not being reset after a
user turn started, causing incorrect strategy behavior.
(PR [#3455](https://github.com/pipecat-ai/pipecat/pull/3455))
- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions,
preventing incorrect turn starts when words are spoken with pauses between
them.
(PR [#3462](https://github.com/pipecat-ai/pipecat/pull/3462))
- Fixed an issue where Grok Realtime would error out when running with
SmallWebRTC transport.
(PR [#3480](https://github.com/pipecat-ai/pipecat/pull/3480))
- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was
causing an error. See
https://docs.mem0.ai/platform/features/async-mode-default-change.
(PR [#3484](https://github.com/pipecat-ai/pipecat/pull/3484))
- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously
error out. Now it successfully reconnects and "rehydrates" from the context
object.
(PR [#3486](https://github.com/pipecat-ai/pipecat/pull/3486))
- Fixed `AzureTTSService` transcript formatting issues:
- Punctuation now appears without extra spaces (e.g., "Hello!" instead of
"Hello !")
- CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces
between characters
(PR [#3489](https://github.com/pipecat-ai/pipecat/pull/3489))
- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in
some cases.
(PR [#3494](https://github.com/pipecat-ai/pipecat/pull/3494))
- Fixed memory leak in `LiveKitTransport` when `video_in_enabled` is `False`.
(PR [#3499](https://github.com/pipecat-ai/pipecat/pull/3499))
- Fixed an issue in `AIService` where unhandled exceptions in `start()`,
`stop()`, or `cancel()` implementations would prevent `process_frame()` from
continuing, so `StartFrame`, `EndFrame`, or `CancelFrame` was never pushed
downstream and the pipeline did not start or stop properly.
(PR [#3503](https://github.com/pipecat-ai/pipecat/pull/3503))
- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from
constructor to `start()` for better error handling.
(PR [#3504](https://github.com/pipecat-ai/pipecat/pull/3504))
- Optimized `NVIDIATTSService` to process incoming audio frames immediately.
(PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))
- Optimized `NVIDIASTTService` by removing unnecessary queue and task.
(PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))
- Fixed a `CambTTSService` issue where the client was initialized in the
constructor, which prevented proper pipeline error handling.
(PR [#3511](https://github.com/pipecat-ai/pipecat/pull/3511))
## [0.0.99] - 2026-01-13
### Added
- Introducing user turn strategies. User turn strategies indicate when the user
turn starts or stops. In conversational agents, these are often referred to
as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g.
using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using
an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both start and stop; strategies are
evaluated in order until one evaluates to true.
Available user turn start strategies:
- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy
Available user turn stop strategies:
- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy
The default strategies are:
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]
Turn strategies are configured when setting up `LLMContextAggregatorPair`.
For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[
                TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                )
            ],
        )
    ),
)
```
In order to use the user turn strategies you must update to the new
universal `LLMContext` and `LLMContextAggregatorPair`.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
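The "evaluated in order until one evaluates to true" rule for turn strategies is ordinary first-match evaluation. A minimal sketch, assuming a strategy can be modeled as a plain callable (the real strategy classes are not shown in this changelog, so `first_match` and the callable shape are hypothetical):

```python
def first_match(strategies, event):
    """Return the index of the first strategy that evaluates to true, else None.

    Each strategy is modeled here as a callable taking an event dict and
    returning a bool; this is a stand-in, not the Pipecat strategy API.
    """
    for index, strategy in enumerate(strategies):
        if strategy(event):
            return index  # later strategies are not evaluated
    return None
```

With this model, placing a VAD-based strategy before a transcription-based one means the transcription check only runs when VAD did not already signal a turn start.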
- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural
network via pyrnnoise library.
(PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
voice conversations:
- Support for real-time audio streaming with WebSocket connection
- Built-in server-side VAD (Voice Activity Detection)
- Multiple voice options: Ara, Rex, Sal, Eve, Leo
- Built-in tools support: web_search, x_search, file_search
- Custom function calling with standard Pipecat tools schema
- Configurable audio formats (PCM at 8kHz-48kHz)
(PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))
- Added an approximation of TTFB for Ultravox.
(PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))
- Added a new `AudioContextTTSService` to the TTS service base classes. The
`AudioContextWordTTSService` now inherits from `AudioContextTTSService` and
`WebsocketWordTTSService`.
(PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))
- `LLMUserAggregator` now exposes the following events:
- `on_user_turn_started`: triggered when a user turn starts
- `on_user_turn_stopped`: triggered when a user turn ends
- `on_user_turn_stop_timeout`: triggered when a user turn does not stop
and times out
(PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))
- Introducing user mute strategies. User mute strategies indicate when user
input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user
input from interrupting bot speech, tool execution, or other critical system
operations.
A list of strategies can be specified; all strategies are evaluated for
every frame so that each strategy can maintain its internal state. A user
frame is muted if any of the configured strategies indicates it should be
muted.
Available user mute strategies:
- `FirstSpeechUserMuteStrategy`
- `MuteUntilFirstBotCompleteUserMuteStrategy`
- `AlwaysUserMuteStrategy`
- `FunctionCallUserMuteStrategy`
User mute strategies replace the legacy `STTMuteFilter` and provide a more
flexible and composable approach to muting user input.
User mute strategies are configured when setting up the
`LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            FirstSpeechUserMuteStrategy(),
        ]
    ),
)
```
In order to use user mute strategies you should update to the new universal
`LLMContext` and `LLMContextAggregatorPair`.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
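Unlike turn strategies, mute strategies are all evaluated for every frame, and the frame is muted if any of them says so. The difference from first-match evaluation is that there is no short-circuiting. The sketch below illustrates this with a hypothetical `MuteFirstFrames` strategy and a `process` method; neither is the Pipecat API.

```python
class MuteFirstFrames:
    """Hypothetical strategy: mutes the first `count` frames it is shown."""

    def __init__(self, count):
        self.remaining = count

    def process(self, frame):
        muted = self.remaining > 0
        if muted:
            self.remaining -= 1
        return muted


def should_mute(strategies, frame):
    """Return True if any strategy wants the frame muted."""
    # Build the full list first so every strategy sees the frame and can
    # update its state; any() over a generator would stop at the first True.
    decisions = [strategy.process(frame) for strategy in strategies]
    return any(decisions)
```

If `any()` were applied directly to a generator expression, strategies later in the list would miss frames and their internal state would drift.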
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService`
and `NvidiaTTSService`.
(PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300))
- Added `enable_interruptions` constructor argument to all user turn
strategies. This tells the `LLMUserAggregator` to push or not push an
`InterruptionFrame`.
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control
sentence splitting behavior for finals on sentence boundaries.
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
- Added word-level timestamp support to `AzureTTSService` for accurate
text-to-audio synchronization.
(PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334))
- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams`
and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation
dictionary feature for custom pronunciations.
(PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346))
- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport`
(see https://www.liveavatar.com/).
(PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))
- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
- New `start_video_paused` parameter to control initial video input state
- New `video_frame_detail` parameter to set image processing quality ("auto",
"low", or "high"). This corresponds to OpenAI Realtime's `image_detail`
parameter.
- `set_video_input_paused()` method to pause/resume video input at runtime
- `set_video_frame_detail()` method to adjust video frame quality
dynamically
- Automatic rate limiting (1 frame per second) to prevent API overload
(PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
and interruptions based on the controller's user turn strategies.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added `UserTurnController` to manage user turns. It emits
`on_user_turn_started`, `on_user_turn_stopped`, and
`on_user_turn_stop_timeout` events, and can be integrated into processors to
detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are
implemented using this controller.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added `should_interrupt` property to `DeepgramFluxSTTService`,
`DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the
bot should be interrupted when the external service detects user speech.
(PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))
- `LLMAssistantAggregator` now exposes the following events:
- `on_assistant_turn_started`: triggered when the assistant turn starts
- `on_assistant_turn_stopped`: triggered when the assistant turn ends
- `on_assistant_thought`: triggered when there's an assistant thought
available
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
SDK (requires `krisp_audio`).
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Added support for setting up a pipeline task from external files. You can now
register custom pipeline task setup files by setting the
`PIPECAT_SETUP_FILES` environment variable. This variable should contain a
colon-separated list of Python files (e.g. `export
PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a
function with the following signature:
```python
async def setup_pipeline_task(task: PipelineTask):
    ...
```
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
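How such setup files might be discovered and loaded can be sketched with the standard library. This is an assumption-laden illustration, not Pipecat's loader: the helper names are hypothetical, and splitting on a literal `:` simply follows the "colon-separated" wording above.

```python
import importlib.util
import os
from pathlib import Path


def setup_file_paths(value):
    """Split a colon-separated list of files, ignoring empty entries."""
    return [Path(part) for part in value.split(":") if part]


def setup_files_from_env():
    """Read the file list from the PIPECAT_SETUP_FILES environment variable."""
    return setup_file_paths(os.environ.get("PIPECAT_SETUP_FILES", ""))


def load_setup_function(path):
    """Import `path` as a module and return its `setup_pipeline_task` function."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.setup_pipeline_task
```

Each returned function is a coroutine function, so a caller would `await` it with the pipeline task as its argument.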
- Added a keepalive task for `InworldTTSService` to keep the service connected
in the event of no generations for longer periods of time.
(PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403))
- Added `enable_vad` to `GladiaInputParams` for use in the `GladiaSTTService`. When
enabled, `GladiaSTTService` acts as the turn controller, emitting
`UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally
`InterruptionFrame`.
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
- Added `should_interrupt` property to `GladiaSTTService` to configure whether
the bot should be interrupted when the external service detects user speech.
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector
WebSocket protocol.
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
- Added `append_trailing_space` parameter to `TTSService` to automatically
append a trailing space to text before sending to TTS, helping prevent some
services from vocalizing trailing punctuation.
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
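The effect of `append_trailing_space` can be pictured as a small text-preparation step before synthesis. The function below is a hypothetical sketch, not the `TTSService` code; in particular, skipping text that already ends in a space is an assumption.

```python
def prepare_tts_text(text, append_trailing_space=False):
    """Return the text to send to TTS, optionally with one trailing space."""
    # Assumption: avoid doubling the space when the text already ends with one.
    if append_trailing_space and not text.endswith(" "):
        return text + " "
    return text
```

With the option enabled, `"Hello."` is sent as `"Hello. "`, so the final punctuation mark is no longer the last character the service receives.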
### Changed
- Updated `ElevenLabsRealtimeSTTService` to accept the
`include_language_detection` parameter to detect language.
```python
stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    include_language_detection=True,
)
```
(PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))
- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which adds
improved VAD and Smart Turn capabilities and dramatically improves latency
without any impact on accuracy. Use the `turn_detection_mode` parameter to
control the endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
```python
stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    params=SpeechmaticsSTTService.InputParams(
        language=Language.EN,
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    ),
)
```
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
- `daily-python` updated to 0.23.0.
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by
`DailyTransport` now include the transport source (i.e., the originating
audio track).
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
- Updates to Inworld TTS services:
  - Improved `InworldTTSService`'s websocket implementation so that it flushes
    and closes contexts more reliably and handles long inputs better.
  - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
(PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))
- Improved the error handling and reconnection logic for `WebsocketServer` by
distinguishing between errors when disconnecting and websocket communication
errors.
(PR [#3392](https://github.com/pipecat-ai/pipecat/pull/3392))
- Updated `DeepgramSTTService` to push user started/stopped speaking and
interruption frames when `vad_enabled` is set to true. This centralizes the
frames into the service, removing the need to have your application code
handle Deepgram's events and push these frames.
(PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314))
- Added encoding validation to `DeepgramTTSService` to prevent unsupported
encodings from reaching the API. The service now raises `ValueError` at
initialization with a clear error message.
(PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329))
- Updated `read_audio_frame` & `read_video_frame` methods in
`SmallWebRTCClient` to check if the track is enabled before logging a
warning.
(PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336))
- Updated `CartesiaTTSService` to support setting `language=None`, resulting in
Cartesia auto-detecting the language of the conversation.
(PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366))
- The bundled Smart Turn weights are now updated to v3.2, which has better
handling of short utterances, and is more robust against background noise.
(PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367))
- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`.
(PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371))
- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
meaning that the start of the turn audio is not cut off. This improves
accuracy for short utterances.
- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
(PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))
- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
to share a single SDK instance within the same process.
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english`
and voice ID to `autumn`.
(PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399))
- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio
packetization via the `fixed_audio_packet_size` parameter to support media
endpoints requiring strict framing and real-time pacing.
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to
`True` to prevent punctuation (e.g., “dot”) from being pronounced.
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`,
`LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns
thought content.
(PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431))
### Deprecated
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use
`pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- `FrameProcessor.interruption_strategies` is deprecated, use
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in
`pipecat.processors.aggregators.llm_response` are now deprecated. Use the new
universal `LLMContext` and `LLMContextAggregatorPair` instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and
`UserStoppedSpeakingFrame` frames.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame`
frames are deprecated.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in
unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies`
parameter instead.
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is
deprecated. Use the new `turn_detection_mode` parameter instead, with
`TurnDetectionMode.EXTERNAL`, `TurnDetectionMode.ADAPTIVE`, or
`TurnDetectionMode.SMART_TURN`. The `enable_vad` parameter is also
deprecated and is inferred from the `turn_detection_mode`.
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
- `OpenAILLMContext` and its associated classes (context aggregators, etc.) are
now deprecated in favor of the universal `LLMContext` and its associated
classes.
For developers, switching to the `LLMContext` machinery will usually be a
matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```python
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```
(PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))
- `STTMuteFilter` is deprecated and will be removed in a future version. Use
`LLMUserAggregator`'s new `user_mute_strategies` instead.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
- `FrameProcessor.interruptions_allowed` is now deprecated, use
`LLMUserAggregator`'s new parameter `user_mute_strategies` instead.
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
- `PipelineParams.allow_interruptions` is now deprecated, use
`LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
example, to disable interruptions but still get user turns you can do:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
        ),
    ),
)
```
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
- `TranscriptProcessor` and related data classes and frames
(`TranscriptionMessage`, `ThoughtTranscriptionMessage`,
`TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and
`LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and
`on_assistant_turn_stopped`) instead.
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
- Deprecated support for the `vad_events` `LiveOptions` in
`DeepgramSTTService`. Instead, use a local Silero VAD for VAD events.
Additionally, deprecated `should_interrupt` which will be removed along with
`vad_events` support in a future release.
(PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386))
- Loading external observers from files is deprecated, use the new pipeline
task setup files and `PIPECAT_SETUP_FILES` environment variable instead.
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
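As a rough illustration of the setup-file convention, the sketch below splits a colon-separated `PIPECAT_SETUP_FILES`-style value and calls each file's `setup_pipeline_task` function. It uses only the standard library; `load_setup_functions`, `run_setup_files`, and the stand-in `FakeTask` class are hypothetical names, and Pipecat's real loader may work differently.

```python
import asyncio
import importlib.util
import os
import tempfile


def load_setup_functions(value: str):
    """Load `setup_pipeline_task` from each file in a colon-separated list."""
    functions = []
    for index, path in enumerate(p for p in value.split(":") if p):
        spec = importlib.util.spec_from_file_location(f"setup_file_{index}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        functions.append(module.setup_pipeline_task)
    return functions


class FakeTask:
    """Stand-in for a pipeline task; records which setup files ran."""

    def __init__(self):
        self.configured_by = []


async def run_setup_files(value: str, task) -> None:
    # Call each setup function in the order the files are listed.
    for setup in load_setup_functions(value):
        await setup(task)


def demo() -> list:
    # Write two throwaway setup files, then run them against the fake task.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("setup1.py", "setup2.py"):
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(
                    "async def setup_pipeline_task(task):\n"
                    f"    task.configured_by.append({name!r})\n"
                )
            paths.append(path)
        task = FakeTask()
        asyncio.run(run_setup_files(":".join(paths), task))
        return task.configured_by


print(demo())
```

Splitting on `:` matches the format described for the environment variable, but it would not cope with paths that themselves contain colons.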
### Fixed
- Improved error handling in `ElevenLabsRealtimeSTTService`.
(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
that blocks the process if the websocket disconnects due to an error.
(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
- Fixed a bug in `STTMuteFilter` where the user was not always muted during
function calls, especially when there were multiple simultaneous calls.
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
- Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate
memory" error when processing silence audio frames.
(PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))
- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
  - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
    in order to finalize transcription.
  - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
    detection.
  - Only emit VAD + interruption frames if VAD is enabled within the plugin
    (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
- Fixed an issue with function calling where a handler failing to invoke its
result callback could leave the context stuck in IN_PROGRESS, causing LLM
inference for subsequent function call results to block while waiting on the
unresolved call.
(PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343))
- Fixed an issue with `DeepgramTTSService` where the model would output "Dot"
instead of a period in some circumstances.
(PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345))
- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351))
- Fixed an issue in `GeminiLiveLLMService` where `TranscriptionFrame`s were
occasionally not pushed.
(PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356))
- Fixed potential memory leaks and initialization issues in `KrispVivaFilter`
by improving SDK lifecycle management.
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was
set after awaiting, allowing the event loop to re-enter the method before the
guard was set.
(PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400))
- Fixed parallel function calling when using Gemini thinking.
(PR [#3420](https://github.com/pipecat-ai/pipecat/pull/3420))
- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422))
- Fixed an issue in `traced_tts`, `traced_gemini_live`, and
`traced_openai_realtime` where `model_name` in OpenTelemetry appears as
`unknown`.
(PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428))
- Fixed `request_image_frame` (for backwards compatibility) and restored
function-call-related fields in `UserImageRequestFrame` and
`UserImageRawFrame`, preventing a case where adding a non-LLM message to the
context could trigger duplicate LLM inferences (on image arrival and on
function-call result), potentially causing an infinite inference loop.
(PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430))
- Fixed `LLMContext.create_audio_message()` by correcting an internal helper
that was incorrectly declared async while being run in `asyncio.to_thread()`.
(PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435))
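The last fix reflects a general `asyncio` pitfall: `asyncio.to_thread()` expects a plain synchronous callable. The sketch below uses hypothetical helper names (`encode_sync`, `encode_async`) rather than Pipecat's internal code, and shows that passing an `async def` function yields an un-awaited coroutine object instead of the result.

```python
import asyncio
import inspect


def encode_sync(data: bytes) -> str:
    """Blocking helper: safe to run in a worker thread."""
    return data.hex()


async def encode_async(data: bytes) -> str:
    """Same logic, but wrongly declared async for use with to_thread()."""
    return data.hex()


async def main() -> tuple:
    good = await asyncio.to_thread(encode_sync, b"\x01\x02")
    # Calling an async function in the thread only creates a coroutine object.
    bad = await asyncio.to_thread(encode_async, b"\x01\x02")
    is_coroutine = inspect.iscoroutine(bad)
    bad.close()  # Avoid a "coroutine was never awaited" warning.
    return good, is_coroutine


result = asyncio.run(main())
print(result)
```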
### Other
- Added `52-live-transcription.py` foundational example demonstrating live
transcription and translation from English to Spanish. In this example, the
bot is not interruptible: as the user continues speaking, English
transcriptions are queued, and the bot continuously translates and speaks
each queued sentence in Spanish without being interrupted by new user speech.
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows
how to use `UserTurnProcessor`.
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
- Added a new foundational example `28-user-assistant-turns.py` that shows how
to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to
gather a conversation transcript.
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
## [0.0.98] - 2025-12-17
### Added

@@ -73,15 +73,15 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |

@@ -1,42 +0,0 @@
- Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both start and stop; the strategies in each list are evaluated in order until one evaluates to true.
Available user turn start strategies:
- `VADUserTurnStartStrategy`
- `TranscriptionUserTurnStartStrategy`
- `MinWordsUserTurnStartStrategy`
- `ExternalUserTurnStartStrategy`
Available user turn stop strategies:
- `TranscriptionUserTurnStopStrategy`
- `TurnAnalyzerUserTurnStopStrategy`
- `ExternalUserTurnStopStrategy`
The default strategies are:
- start: [`VADUserTurnStartStrategy`, `TranscriptionUserTurnStartStrategy`]
- stop: [`TranscriptionUserTurnStopStrategy`]
Turn strategies are configured when setting up `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[
                TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                )
            ],
        )
    ),
)
```
In order to use the user turn strategies you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
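The "evaluated in order until one evaluates to true" rule can be sketched in plain Python. The classes and the dict-based event below are hypothetical stand-ins, not Pipecat's strategy classes, and are only meant to show the first-match ordering.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VADStart:
    """Hypothetical strategy: fires when VAD reports speech."""

    def should_start(self, event: dict) -> bool:
        return event.get("vad_speaking", False)


@dataclass
class MinWordsStart:
    """Hypothetical strategy: fires once the transcript has enough words."""

    min_words: int = 3

    def should_start(self, event: dict) -> bool:
        return len(event.get("transcript", "").split()) >= self.min_words


def first_matching(strategies: Sequence, event: dict) -> Optional[str]:
    # Walk the list in order and stop at the first strategy that returns True.
    for strategy in strategies:
        if strategy.should_start(event):
            return type(strategy).__name__
    return None


strategies = [VADStart(), MinWordsStart(min_words=2)]
print(first_matching(strategies, {"vad_speaking": True, "transcript": "hi there"}))
print(first_matching(strategies, {"transcript": "hello again"}))
print(first_matching(strategies, {"transcript": "hi"}))
```

Because evaluation stops at the first match, the order of the list decides which strategy wins when several would fire for the same input.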

@@ -1 +0,0 @@
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.

@@ -1 +0,0 @@
- `FrameProcessor.interruption_strategies` is deprecated, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.

@@ -1 +0,0 @@
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames are deprecated.

@@ -1 +0,0 @@
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames.

@@ -1 +0,0 @@
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new universal `LLMContext` and `LLMContextAggregatorPair` instead.

@@ -1 +0,0 @@
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.

@@ -1 +0,0 @@
- Added `RNNoiseFilter` for real-time noise suppression using the RNNoise neural network via the `pyrnnoise` library.

@@ -1,7 +0,0 @@
- Updated `ElevenLabsRealtimeSTTService` to accept the `include_language_detection` parameter to detect language.
```python
stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    include_language_detection=True,
)
```

@@ -1,15 +0,0 @@
- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which adds
improved VAD and Smart Turn capabilities and dramatically improves latency
without affecting accuracy. Use the `turn_detection_mode` parameter to control the
endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
```python
stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    params=SpeechmaticsSTTService.InputParams(
        language=Language.EN,
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    ),
)
```

@@ -1,4 +0,0 @@
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is deprecated.
Use the new `turn_detection_mode` parameter instead, with `TurnDetectionMode.EXTERNAL`,
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The `enable_vad`
parameter is also deprecated and is inferred from the `turn_detection_mode`.

@@ -1,2 +0,0 @@
- Improved error handling in `ElevenLabsRealtimeSTTService`.
- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop that blocks the process if the websocket disconnects due to an error.

@@ -1 +0,0 @@
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by `DailyTransport` now include the transport source (i.e., the originating audio track).

@@ -1 +0,0 @@
- `daily-python` updated to 0.23.0.

@@ -1,15 +0,0 @@
- `OpenAILLMContext` and its associated classes (context aggregators, etc.) are now deprecated in favor of the universal `LLMContext` and its associated classes.
For developers, switching to the `LLMContext` machinery will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```python
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```

@@ -1,8 +0,0 @@
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations:
  - Support for real-time audio streaming with WebSocket connection
  - Built-in server-side VAD (Voice Activity Detection)
  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
  - Built-in tools support: `web_search`, `x_search`, `file_search`
  - Custom function calling with standard Pipecat tools schema
  - Configurable audio formats (PCM at 8kHz-48kHz)

@@ -1 +0,0 @@
- Added an approximation of TTFB for Ultravox.

@@ -1,5 +0,0 @@
- Updates to Inworld TTS services:
  - Improved `InworldTTSService`'s websocket implementation so that it flushes
    and closes contexts more reliably and handles long inputs better.
  - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.

@@ -1 +0,0 @@
- Added a new `AudioContextTTSService` to the TTS service base classes. The `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`.

@@ -1,4 +0,0 @@
- `LLMUserAggregator` now exposes the following events:
  - `on_user_turn_started`: triggered when a user turn starts
  - `on_user_turn_stopped`: triggered when a user turn ends
  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop and times out

@@ -1,29 +0,0 @@
- Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations.
A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted.
Available user mute strategies:
* `FirstSpeechUserMuteStrategy`
* `MuteUntilFirstBotCompleteUserMuteStrategy`
* `AlwaysUserMuteStrategy`
* `FunctionCallUserMuteStrategy`
User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input.
User mute strategies are configured when setting up the `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            FirstSpeechUserMuteStrategy(),
        ]
    ),
)
```
In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
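The evaluation rule described above (every strategy sees every frame, and the frame is muted if any strategy says so) can be sketched as follows. The classes and string frame names are hypothetical stand-ins rather than Pipecat's strategy API.

```python
class MuteDuringFunctionCall:
    """Hypothetical strategy: mute while at least one function call is running."""

    def __init__(self):
        self.calls_in_progress = 0

    def process(self, frame: str) -> bool:
        if frame == "function_call_started":
            self.calls_in_progress += 1
        elif frame == "function_call_finished":
            self.calls_in_progress = max(0, self.calls_in_progress - 1)
        return self.calls_in_progress > 0


class MuteUntilBotSpoke:
    """Hypothetical strategy: mute until the bot finishes its first utterance."""

    def __init__(self):
        self.bot_has_spoken = False

    def process(self, frame: str) -> bool:
        if frame == "bot_stopped_speaking":
            self.bot_has_spoken = True
        return not self.bot_has_spoken


def should_mute(strategies, frame: str) -> bool:
    # Build the full list first so every strategy updates its state,
    # instead of letting any() short-circuit on a generator.
    decisions = [strategy.process(frame) for strategy in strategies]
    return any(decisions)


strategies = [MuteUntilBotSpoke(), MuteDuringFunctionCall()]
frames = ["user_audio", "bot_stopped_speaking", "function_call_started",
          "user_audio", "function_call_finished", "user_audio"]
muted = [should_mute(strategies, frame) for frame in frames]
print(muted)
```

Collecting all decisions before calling `any()` is the design point here: a short-circuiting check would skip later strategies and leave their internal state stale.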

@@ -1 +0,0 @@
- `STTMuteFilter` is deprecated and will be removed in a future version. Use `LLMUserAggregator`'s new `user_mute_strategies` instead.

@@ -1 +0,0 @@
- Fixed a bug in `STTMuteFilter` where the user was not always muted during function calls, especially when there were multiple simultaneous calls.

@@ -1 +0,0 @@
- `FrameProcessor.interruptions_allowed` is now deprecated, use `LLMUserAggregator`'s new parameter `user_mute_strategies` instead.

@@ -1,12 +0,0 @@
- `PipelineParams.allow_interruptions` is now deprecated, use `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For example, to disable interruptions but still get user turns you can do:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
        ),
    ),
)
```

@@ -1 +0,0 @@
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`.

View File

@@ -1 +0,0 @@
- Updated `DeepgramSTTService` to push user started/stopped speaking and interruption frames when `vad_enabled` is set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames.

View File

@@ -1 +0,0 @@
- Added `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` whether to push an `InterruptionFrame`.

View File

@@ -1 +0,0 @@
- Added `52-live-transcription.py` foundational example demonstrating live transcription and translation from English to Spanish. In this example, the bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech.

View File

@@ -1 +0,0 @@
- Fixed an `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate memory" error when processing silence audio frames.

View File

@@ -1 +0,0 @@
- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control sentence splitting behavior for finals on sentence boundaries.

View File

@@ -1,4 +0,0 @@
- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
  - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame` in order to finalize transcription.
  - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn detection.
  - Only emit VAD + interruption frames if VAD is enabled within the plugin (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).

View File

@@ -1 +0,0 @@
- Added encoding validation to `DeepgramTTSService` to prevent unsupported encodings from reaching the API. The service now raises `ValueError` at initialization with a clear error message.

View File

@@ -1,2 +0,0 @@
- Added word-level timestamp support to `AzureTTSService` for accurate text-to-audio synchronization.

View File

@@ -1 +0,0 @@
- Updated `read_audio_frame` & `read_video_frame` methods in `SmallWebRTCClient` to check if the track is enabled before logging a warning.

View File

@@ -1 +0,0 @@
- Fixed an issue with function calling where a handler failing to invoke its result callback could leave the context stuck in IN_PROGRESS, causing LLM inference for subsequent function call results to block while waiting on the unresolved call.

View File

@@ -1 +0,0 @@
- Fixed an issue with `DeepgramTTSService` where the model would output "Dot" instead of a period in some circumstances.

View File

@@ -1 +0,0 @@
- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams` and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation dictionary feature for custom pronunciations.

View File

@@ -1 +0,0 @@
- Fixed an issue in `GeminiLiveLLMService` where `TranscriptionFrame`s were occasionally not pushed.

View File

@@ -1 +0,0 @@
- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport` (see https://www.liveavatar.com/).

View File

@@ -1,8 +0,0 @@
- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
  - New `start_video_paused` parameter to control initial video input state
  - New `video_frame_detail` parameter to set image processing quality ("auto",
    "low", or "high"). This corresponds to OpenAI Realtime's `image_detail`
    parameter.
  - `set_video_input_paused()` method to pause/resume video input at runtime
  - `set_video_frame_detail()` method to adjust video frame quality dynamically
  - Automatic rate limiting (1 frame per second) to prevent API overload

View File

@@ -1 +0,0 @@
- Updated `CartesiaTTSService` to support setting `language=None`, resulting in Cartesia auto-detecting the language of the conversation.

View File

@@ -1,3 +0,0 @@
- The bundled Smart Turn weights are now updated to v3.2, which has better
handling of short utterances, and is more robust against background
noise.

View File

@@ -1 +0,0 @@
- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`

View File

@@ -1 +0,0 @@
- Added `UserTurnProcessor`, a frame processor built on `UserTurnController` that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames and interruptions based on the controller's user turn strategies.

View File

@@ -1 +0,0 @@
- Added `UserTurnController` to manage user turns. It emits `on_user_turn_started`, `on_user_turn_stopped`, and `on_user_turn_stop_timeout` events, and can be integrated into processors to detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are implemented using this controller.

View File

@@ -1 +0,0 @@
- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows how to use `UserTurnProcessor`.

View File

@@ -1 +0,0 @@
- Added `should_interrupt` property to `DeepgramFluxSTTService`, `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the bot should be interrupted when the external service detects user speech.

View File

@@ -1,5 +0,0 @@
- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
meaning that the start of the turn audio is not cut off. This improves
accuracy for short utterances.
- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.

View File

@@ -1,4 +0,0 @@
- `LLMAssistantAggregator` now exposes the following events:
  - `on_assistant_turn_started`: triggered when the assistant turn starts
  - `on_assistant_turn_stopped`: triggered when the assistant turn ends
  - `on_assistant_thought`: triggered when there's an assistant thought available

View File

@@ -1 +0,0 @@
- `TranscriptProcessor` and related data classes and frames (`TranscriptionMessage`, `ThoughtTranscriptionMessage`, `TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and `LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and `on_assistant_turn_stopped`) instead.

View File

@@ -1 +0,0 @@
- Added a new foundational example `28-user-assistant-turns.py` that shows how to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to gather a conversation transcript.

View File

@@ -1 +0,0 @@
- Deprecated support for the `vad_events` `LiveOptions` in `DeepgramSTTService`. Instead, use a local Silero VAD for VAD events. Additionally, deprecated `should_interrupt` which will be removed along with `vad_events` support in a future release.

View File

@@ -1 +0,0 @@
- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA SDK (requires `krisp_audio`).

View File

@@ -1 +0,0 @@
- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter` to share a single SDK instance within the same process.

View File

@@ -1 +0,0 @@
- Fixed potential memory leaks and initialization issues in `KrispVivaFilter` by improving SDK lifecycle management.

View File

@@ -1,6 +0,0 @@
- Added support for setting up a pipeline task from external files. You can now register custom pipeline task setup files by setting the `PIPECAT_SETUP_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a function with the following signature:
```python
async def setup_pipeline_task(task: PipelineTask):
    ...
```
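
As a rough illustration of how a colon-separated list of setup files could be consumed, here is a minimal sketch using only the standard library. It is not Pipecat's loader: `load_setup_functions` and `run_setup_files` are hypothetical helper names, and only the `PIPECAT_SETUP_FILES` variable and the `setup_pipeline_task` function name come from the entry above.

```python
import asyncio
import importlib.util
import os
from pathlib import Path


def load_setup_functions(env_var: str = "PIPECAT_SETUP_FILES"):
    """Collect `setup_pipeline_task` callables from the files listed in env_var."""
    functions = []
    for entry in os.environ.get(env_var, "").split(":"):
        if not entry:
            continue  # skip empty segments such as a trailing colon
        path = Path(entry)
        # Load the file as a standalone module named after the file stem.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        setup = getattr(module, "setup_pipeline_task", None)
        if setup is not None:
            functions.append(setup)
    return functions


async def run_setup_files(task) -> None:
    """Await each setup function in the order the files were listed.

    `task` would be a PipelineTask in a real application; this sketch
    accepts any object so it can run without Pipecat installed.
    """
    for setup in load_setup_functions():
        await setup(task)
```

Splitting on `:` follows the format described above; paths that themselves contain a colon would need different handling.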

View File

@@ -1 +0,0 @@
- Loading external observers from files is deprecated; use the new pipeline task setup files and the `PIPECAT_SETUP_FILES` environment variable instead.

View File

@@ -1 +0,0 @@
- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english` and voice ID to `autumn`.

View File

@@ -1 +0,0 @@
- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was set after awaiting, allowing the event loop to re-enter the method before the guard was set.
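
The class of bug described here can be reproduced with plain `asyncio`. The sketch below is not the `BaseOutputTransport` code; `SpeakingTracker` and its methods are hypothetical names. It only shows why a guard flag has to be set before the first `await` rather than after it.

```python
import asyncio


class SpeakingTracker:
    """Hypothetical stand-in for a component that tracks a 'bot speaking' flag."""

    def __init__(self):
        self.bot_speaking = False
        self.started_events = 0

    async def _notify_started(self):
        await asyncio.sleep(0)  # yields control to the event loop
        self.started_events += 1

    async def started_unsafe(self):
        if not self.bot_speaking:
            # Flag is set only after awaiting: another task can pass the
            # check above while this one is suspended.
            await self._notify_started()
            self.bot_speaking = True

    async def started_safe(self):
        if not self.bot_speaking:
            # Flag is set before awaiting, so re-entry is blocked.
            self.bot_speaking = True
            await self._notify_started()


async def count_started_events(method_name: str) -> int:
    tracker = SpeakingTracker()
    method = getattr(tracker, method_name)
    # Two concurrent callers race on the same tracker.
    await asyncio.gather(method(), method())
    return tracker.started_events
```

Running `count_started_events("started_unsafe")` lets both callers through the check, while `"started_safe"` lets only the first one through.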

View File

@@ -0,0 +1 @@
- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()` methods to `PipelineTask` for appending frame types.

changelog/3510.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `reached_upstream_types` and `reached_downstream_types` read-only properties to `PipelineTask` for inspecting current frame filters.

View File

@@ -0,0 +1 @@
- Changed frame filter storage from tuples to sets in `PipelineTask`.
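
To make the additive, set-based filters concrete, here is a minimal sketch. `SketchTask` is a hypothetical class, not `PipelineTask`; the method and property names mirror the entries above, but their signatures and exact behavior here are assumptions.

```python
class SketchTask:
    """Hypothetical task that stores upstream frame-type filters in a set."""

    def __init__(self):
        self._reached_upstream_types = set()

    def add_reached_upstream_filter(self, types) -> None:
        # Additive: new types are merged in, and duplicates collapse
        # automatically because the storage is a set.
        self._reached_upstream_types.update(types)

    @property
    def reached_upstream_types(self) -> frozenset:
        # Assumption: expose an immutable snapshot so callers cannot
        # mutate the internal set through the property.
        return frozenset(self._reached_upstream_types)

    def matches_upstream(self, frame) -> bool:
        # isinstance() needs a tuple, so convert at the point of use.
        return isinstance(frame, tuple(self._reached_upstream_types))
```

Compared with tuple storage, appending a type that is already present is a no-op, which keeps repeated calls from growing the filter.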

View File

@@ -0,0 +1 @@
- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI observers.

View File

@@ -0,0 +1 @@
- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a frame instance by extracting its fields and creating new instances for each direction.

changelog/3519.added.md Normal file
View File

@@ -0,0 +1 @@
- `PipelineTask` now automatically adds `RTVIProcessor` and registers `RTVIObserver` when `enable_rtvi=True` (default), simplifying pipeline setup.

View File

@@ -0,0 +1 @@
- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing shared mutable references between the downstream and upstream frame instances.
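
The shared-reference problem this fix addresses can be shown in a few lines. The sketch below is not the `FrameProcessor` implementation; `SketchFrame`, `broadcast_shallow`, and `broadcast_deep` are hypothetical names used only to contrast reusing kwargs with deep copying them.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SketchFrame:
    """Hypothetical frame carrying a mutable payload."""

    payload: dict = field(default_factory=dict)


def broadcast_shallow(frame_cls, **kwargs):
    # Both instances receive the same objects from kwargs, so a mutation
    # made through one frame is visible through the other.
    return frame_cls(**kwargs), frame_cls(**kwargs)


def broadcast_deep(frame_cls, **kwargs):
    # Each direction gets its own deep copy of the arguments.
    downstream = frame_cls(**copy.deepcopy(kwargs))
    upstream = frame_cls(**copy.deepcopy(kwargs))
    return downstream, upstream
```

With `broadcast_shallow`, appending to `downstream.payload["words"]` also changes `upstream.payload["words"]`; with `broadcast_deep`, the two frames stay independent.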

changelog/3519.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Transports now properly broadcast `InputTransportMessageFrame` frames both upstream and downstream instead of only pushing downstream.

changelog/3520.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `video_out_codec` parameter to `TransportParams` allowing configuration of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video output in `DailyTransport`.

changelog/3523.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `location` parameter to Google TTS services (`GoogleHttpTTSService`, `GoogleTTSService`, `GeminiTTSService`) for regional endpoint support.

changelog/3525.added.md Normal file
View File

@@ -0,0 +1 @@
- Added new `SMART_TURN_LOG_DATA` environment variable, which causes Smart Turn input data to be saved to disk.

View File

@@ -0,0 +1,2 @@
- Changed default Inworld TTS model from `inworld-tts-1` to
`inworld-tts-1.5-max`.

View File

@@ -91,6 +91,25 @@ autodoc_mock_imports = [
 # MLX dependencies (Apple Silicon specific)
 "mlx",
 "mlx_whisper", # Note: might need underscore format too
+# Pydantic v2 compatibility issues in third-party SDKs
+"hume",
+"hume.tts",
+"hume.tts.types",
+"cartesia",
+"camb",
+"sarvamai",
+"openpipe",
+"openai.types.beta.realtime",
+"langchain_core",
+"langchain_core.messages",
+# FastAPI - Pydantic v2 compatibility issues during Sphinx autodoc
+"fastapi",
+"fastapi.applications",
+"fastapi.routing",
+"fastapi.params",
+"fastapi.middleware",
+"fastapi.responses",
+"uvicorn",
 ]
 # HTML output settings

View File

@@ -31,6 +31,9 @@ AZURE_DALLE_API_KEY=...
 AZURE_DALLE_ENDPOINT=https://...
 AZURE_DALLE_MODEL=...
+# Camb.ai
+CAMB_API_KEY=...
 # Cartesia
 CARTESIA_API_KEY=...
 CARTESIA_VOICE_ID=...
@@ -82,6 +85,9 @@ GROK_API_KEY=...
 # Groq
 GROQ_API_KEY=...
+# Hathora
+HATHORA_API_KEY=...
 # Heygen
 HEYGEN_API_KEY=...
 HEYGEN_LIVE_AVATAR_API_KEY=...

View File

@@ -85,7 +85,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -98,11 +98,11 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -68,7 +68,7 @@ async def main():
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -82,11 +82,11 @@ async def main():
 pipeline = Pipeline(
 [
 transport.input(), # Transport user input
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -78,7 +78,7 @@ async def main():
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def main():
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -106,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -119,12 +119,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(),
 stt,
-context_aggregator.user(),
+user_aggregator,
 llm,
 tts,
 ml,
 transport.output(),
-context_aggregator.assistant(),
+assistant_aggregator,
 ]
 )

View File

@@ -120,7 +120,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -138,12 +138,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(),
 stt,
-context_aggregator.user(),
+user_aggregator,
 llm,
 tts,
 image_sync_aggregator,
 transport.output(),
-context_aggregator.assistant(),
+assistant_aggregator,
 ]
 )

View File

@@ -77,7 +77,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -90,11 +90,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -131,7 +131,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
 )
@@ -140,11 +140,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -117,7 +117,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -132,11 +132,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt, # STT
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -103,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 lc = LangchainProcessor(history_chain)
 context = LLMContext()
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -116,11 +116,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 lc, # Langchain
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -71,7 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
 )
@@ -80,11 +80,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt, # STT
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -81,7 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -96,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt, # STT
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -86,7 +86,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -99,11 +99,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt, # STT
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -72,7 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
 )
@@ -81,11 +81,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt, # STT
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -75,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -88,11 +88,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt, # STT
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -85,7 +85,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -100,11 +100,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

View File

@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 ]
 context = LLMContext(messages)
-context_aggregator = LLMContextAggregatorPair(
+user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
 context,
 user_params=LLMUserAggregatorParams(
 user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
 [
 transport.input(), # Transport user input
 stt,
-context_aggregator.user(), # User responses
+user_aggregator, # User responses
 llm, # LLM
 tts, # TTS
 transport.output(), # Transport bot output
-context_aggregator.assistant(), # Assistant spoken responses
+assistant_aggregator, # Assistant spoken responses
 ]
 )

Some files were not shown because too many files have changed in this diff.