Compare commits

746 Commits

Author SHA1 Message Date
Mark Backman
bd4596570c Changelog entry for lazy import fix 2026-02-20 16:35:17 -07:00
Mark Backman
9110e9f30b Lazy import LocalSmartTurnAnalyzerV3 to avoid unnecessary transformers load
Move the LocalSmartTurnAnalyzerV3 import from module level into
__post_init__ so that importing user_turn_strategies no longer eagerly
loads the transformers package. This eliminates the spurious "PyTorch
was not found" warning for users who don't use the smart turn analyzer.
2026-02-20 16:31:39 -07:00
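The lazy-import pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual pipecat code: `LazyAnalyzerStrategy` is a hypothetical class, and the stdlib `statistics` module stands in for a heavy dependency such as transformers.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LazyAnalyzerStrategy:
    """Hypothetical stand-in for a strategy that needs a heavy dependency."""

    # Callers may pass their own analyzer; otherwise one is created lazily.
    analyzer: Optional[Any] = field(default=None)

    def __post_init__(self):
        if self.analyzer is None:
            # Deferred import: it runs only when an instance is created
            # without an analyzer, so importing this module stays cheap.
            # `statistics` is a placeholder for the heavy package.
            from statistics import mean

            self.analyzer = mean
```

With this shape, code that only imports the module never pays for the dependency; the cost moves to the first instantiation that actually needs it.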
Aleix Conchillo Flaqué
0370bb15e4 update uv.lock 2026-02-20 13:47:04 -08:00
Aleix Conchillo Flaqué
2b3595485f Merge pull request #3788 from dhruvladia-sarvam/v3-fix-final
initial
2026-02-20 13:46:18 -08:00
Filipi da Silva Fuchter
63c664becb Merge pull request #3787 from pipecat-ai/filipi/refresh_active_audio_context
Fix race condition where context times out after sending second transcript
2026-02-20 14:50:38 -05:00
dhruvladia-sarvam
fecf462139 initial 2026-02-21 01:02:37 +05:30
Daksh Dua
023063759a Changelog entry for TTS race condition fix. 2026-02-20 16:00:34 -03:00
Daksh Dua
c49eda98e7 Fix race condition where context times out after sending second transcript 2026-02-20 15:37:14 -03:00
Filipi da Silva Fuchter
5d07326e36 Merge pull request #3732 from pipecat-ai/filipi/tts_updates
Refactored audio context management in TTS services
2026-02-20 13:02:42 -05:00
filipi87
fa659311b6 Changelog entry 2026-02-20 14:57:59 -03:00
filipi87
125c423356 Refactored audio context management in TTS services to improve encapsulation and reduce code duplication 2026-02-20 14:57:44 -03:00
Filipi da Silva Fuchter
c9615c8db6 Merge pull request #3779 from pipecat-ai/filipi/filter_observer
Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer.
2026-02-20 12:42:02 -05:00
Aleix Conchillo Flaqué
28c542f6ed Merge pull request #3785 from pipecat-ai/mb/deepgram-sagemaker-tts
Add DeepgramSageMakerTTSService
2026-02-20 09:01:32 -08:00
Aleix Conchillo Flaqué
5708c81b93 Merge pull request #3782 from pipecat-ai/aleix/fix-mutable-default-args-aggregator-pair
Fix mutable default arguments in LLMContextAggregatorPair
2026-02-20 08:02:18 -08:00
Mark Backman
82ce3ea8de Update 07c example to use DeepgramSageMakerTTSService 2026-02-20 08:10:41 -07:00
Mark Backman
62ada92188 Add changelog for PR #3785 2026-02-20 08:09:57 -07:00
Mark Backman
273692421f Add DeepgramSageMakerTTSService for Deepgram TTS on AWS SageMaker
Adds a TTS service that connects to Deepgram models deployed on AWS
SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
Deepgram TTS protocol (Speak, Flush, Clear, Close) over the BiDi
client, with interruption handling and per-turn TTFB metrics.

Updates the example and env.example with separate STT/TTS endpoint names.
2026-02-20 08:08:00 -07:00
Mark Backman
0a3e212f93 Merge pull request #3784 from pipecat-ai/mb/stt-sagemaker-finalize
Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService
2026-02-20 09:26:23 -05:00
Mark Backman
43d686c622 Add changelog entry for PR #3784 2026-02-20 07:17:36 -07:00
Mark Backman
4d136e1e28 Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService 2026-02-20 07:15:38 -07:00
Aleix Conchillo Flaqué
2024285c75 Add changelog entries for PR #3782 2026-02-19 20:52:31 -08:00
Aleix Conchillo Flaqué
bc830c16f1 Fix mutable default arguments in LLMContextAggregatorPair
Replace mutable default parameter values with None and instantiate
inside the method body to avoid shared state across calls.
2026-02-19 20:52:00 -08:00
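For readers unfamiliar with the pitfall this commit addresses, here is a small generic sketch. The function names are hypothetical and unrelated to LLMContextAggregatorPair; the point is only the difference between a shared default object and a `None` sentinel.

```python
from typing import List, Optional


def collect_shared(item: str, bucket: List[str] = []) -> List[str]:
    # Bug: the default list is created once, at function definition time,
    # so every call without `bucket` appends to the same object.
    bucket.append(item)
    return bucket


def collect_fresh(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    # Fix: use None as the default and build a new list inside the body.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `collect_shared` twice without a `bucket` returns a list holding both items, while `collect_fresh` returns a one-item list each time.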
filipi87
18630c9478 Adding changelog entry for RTVI observer ignored_sources feature. 2026-02-19 18:41:05 -03:00
filipi87
3a8d3cc841 Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer. 2026-02-19 18:36:12 -03:00
Filipi da Silva Fuchter
2963c7589d Merge pull request #3774 from pipecat-ai/mb/broadcast-frames-rtvi-observer
Fix RTVIObserver missing upstream-only frames
2026-02-19 15:32:48 -05:00
filipi87
63caa403cb Improving RTVI doc description. 2026-02-19 17:31:25 -03:00
Aleix Conchillo Flaqué
846cf0794d Merge pull request #3615 from omChauhanDev/fix/daily-transport-message-queue
fix(daily): queue outbound messages until transport joins
2026-02-19 11:55:11 -08:00
Aleix Conchillo Flaqué
498349c17e Merge pull request #3776 from pipecat-ai/aleix/stt-ttfb-metrics-refactor
Refactor STT TTFB metrics to use base class start/stop pattern
2026-02-19 11:46:46 -08:00
Aleix Conchillo Flaqué
474b27305f Merge pull request #3748 from pipecat-ai/mb/user-idle-configurable
Make UserIdleController always-on with dynamic timeout updates
2026-02-19 11:44:51 -08:00
Aleix Conchillo Flaqué
20509e8f96 Merge pull request #3744 from pipecat-ai/mb/user-idle-timeout-frame
Redesign UserIdleController to use BotStoppedSpeakingFrame
2026-02-19 11:34:42 -08:00
filipi87
5b2fa69bdc Renaming from broadcasted_sibling_id to broadcast_sibling_id 2026-02-19 16:24:07 -03:00
Aleix Conchillo Flaqué
4f8cacc769 Merge pull request #3747 from pipecat-ai/mb/update-comment-mute-strategy
Update comment in _maybe_mute_frame
2026-02-19 11:19:44 -08:00
Aleix Conchillo Flaqué
0145fb4ea0 Merge pull request #3763 from lukepayyapilli/fix/asyncgen-cleanup-uvloop-crash
Fix async generator cleanup to prevent uvloop crash on Python 3.12+
2026-02-19 11:14:00 -08:00
Aleix Conchillo Flaqué
8e52df7f03 Add changelog entries for PR #3776 2026-02-19 10:52:45 -08:00
Aleix Conchillo Flaqué
8ee99e37ff Merge pull request #3768 from tanmayc25/fix/tavus-sample-rate
fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport
2026-02-19 10:52:34 -08:00
Aleix Conchillo Flaqué
bae4211369 Update dependency lock file 2026-02-19 10:52:28 -08:00
Aleix Conchillo Flaqué
859cd7c920 Refactor STT TTFB metrics to use base class start/stop pattern
Eliminate custom _emit_stt_ttfb_metric and manual timestamp tracking in
STTService by reusing FrameProcessor's start_ttfb_metrics/stop_ttfb_metrics
with new start_time/end_time parameters. This keeps the chronological
start→stop ordering and removes _speech_end_time and _last_transcription_time
state from STTService.
2026-02-19 10:52:24 -08:00
filipi87
d608c400f9 Preventing the duplicated BotStartedSpeakingFrame and BotStoppedSpeakingFrame. 2026-02-19 15:49:22 -03:00
Aleix Conchillo Flaqué
94e93bed83 Merge pull request #3719 from pipecat-ai/aleix/sip-transfer-refer-frames
Add SIP transfer and SIP REFER frames to Daily transport
2026-02-19 10:09:13 -08:00
filipi87
b1cee140b9 Refactoring to use broadcasted_sibling_id instead of broadcasted field. 2026-02-19 15:06:50 -03:00
Aleix Conchillo Flaqué
352361bdd2 Update changelog skill to avoid line wrapping 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
baa61468a1 Add changelog entries for PR #3719 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
7501ba2e45 Undeprecate DailyUpdateRemoteParticipantsFrame
Remove the deprecation warning and __post_init__ override. Also fix the
default value for remote_participants to use field(default_factory=dict)
instead of None.
2026-02-19 09:20:33 -08:00
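The `field(default_factory=dict)` default mentioned above might look like the sketch below. `ParticipantsUpdate` is a hypothetical dataclass used only for illustration; the real frame's other fields and base classes are not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ParticipantsUpdate:
    """Hypothetical frame-like dataclass with a per-instance dict default."""

    # default_factory builds a new dict for each instance, so callers never
    # see None and never share one dict between instances.
    remote_participants: Dict[str, Any] = field(default_factory=dict)
```

Compared with a `None` default, this lets consumers index or iterate the field without a guard.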
Aleix Conchillo Flaqué
200716e8fe Add SIP transfer and SIP REFER frames to Daily transport
Add write_transport_frame() hook to BaseOutputTransport so subclasses
can handle custom frame types that flow through the audio queue. Add
DailySIPTransferFrame and DailySIPReferFrame as DataFrame subclasses
that queue with audio, ensuring SIP operations execute only after the
bot finishes its current utterance. Override write_transport_frame in
DailyOutputTransport to dispatch these frames to the existing
sip_call_transfer() and sip_refer() client methods.

Also switch DailyOutputTransport.send_message error handling from
logger.error to push_error for consistency.
2026-02-19 09:20:33 -08:00
Mark Backman
50ef4909e3 Add changelog entries for PR #3774 2026-02-19 07:44:52 -07:00
Mark Backman
63df4642b5 Fix RTVIObserver missing upstream-only frames by adding broadcasted flag
RTVIObserver previously filtered out all upstream frames to avoid
duplicate messages from broadcasted frames. This caused upstream-only
frames to be silently ignored. Instead, add a `broadcasted` field to
the Frame base class that is set by broadcast_frame() and
broadcast_frame_instance(), and only skip upstream copies of
broadcasted frames.
2026-02-19 07:43:20 -07:00
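The filtering rule this commit describes can be sketched as below. This is a simplified model under stated assumptions: `Frame`, `Direction`, and `should_observe` are hypothetical names, and only the `broadcasted` field comes from the commit message.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    DOWNSTREAM = 1
    UPSTREAM = 2


@dataclass
class Frame:
    name: str
    # Set to True by the broadcasting code when a frame is sent both ways.
    broadcasted: bool = False


def should_observe(frame: Frame, direction: Direction) -> bool:
    # Skip only the upstream copy of a broadcasted frame; upstream-only
    # frames (broadcasted=False) are still observed.
    return not (direction is Direction.UPSTREAM and frame.broadcasted)
```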
Filipi da Silva Fuchter
43869a499d Merge pull request #3773 from pipecat-ai/mb/fix-ci-apt-get-update
Fix CI: add apt-get update before installing system packages
2026-02-19 09:28:25 -05:00
Mark Backman
d2bf3952ec Merge pull request #3772 from simliai/main
Update SimliClient to latest
2026-02-19 09:13:14 -05:00
Mark Backman
92c380ee77 Add apt-get update before installing system packages in CI
The CI was failing because the runner's package index was stale,
causing a 404 when fetching libasound2-dev (a dependency of
portaudio19-dev). Running apt-get update first refreshes the index.
2026-02-19 07:01:07 -07:00
antonyesk601
a55ba40921 fix: remove misimport 2026-02-19 10:41:17 +00:00
antonyesk601
fb1bfd03dd update SimliClient to latest 2026-02-19 10:35:50 +00:00
Filipi da Silva Fuchter
a0a7b3101d Merge pull request #3765 from ianbbqzy/ian/inworld-default-async
[inworld] default timestamp transport strategy to ASYNC
2026-02-18 16:59:01 -05:00
Filipi da Silva Fuchter
39dc4ba99c Updated changelog/3765.changed.md 2026-02-18 16:58:27 -05:00
Filipi da Silva Fuchter
a5b5a8e5cf Merge pull request #3759 from pipecat-ai/mb/gradium-context-update
Switch Gradium TTS to AudioContextWordTTSService for multiplexing
2026-02-18 10:16:57 -05:00
filipi87
1daea78b91 Fix GradiumTTSService to reuse context IDs across multiple run_tts calls and prevent the parent class from pushing text frames. 2026-02-18 12:12:49 -03:00
Tanmay Chaudhari
6066eec853 Add changelog for PR #3768 2026-02-18 14:31:16 +05:30
Tanmay Chaudhari
cd379671aa fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport 2026-02-18 14:18:16 +05:30
Ian Lee
8006223911 [inworld] default timestamp transport strategy to ASYNC 2026-02-17 15:13:20 -08:00
Luke Payyapilli
247f0bbcd3 Fix async generator cleanup to prevent uvloop crash on Python 3.12+ 2026-02-17 13:10:31 -05:00
Mark Backman
3537420d91 Merge pull request #3761 from speechmatics/fix/sdk-version 2026-02-17 08:02:00 -05:00
Sam Sykes
65fb88e61e chore: update version specifier for speechmatics-voice
Change the version specifier from `>=0.2.8` to
`~=0.2.8` for the `speechmatics-voice` package.
This ensures compatibility with future patch
versions while preventing potential breaking
changes from minor updates.
2026-02-17 09:58:17 +00:00
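To make the difference between `>=0.2.8` and `~=0.2.8` concrete, here is a simplified sketch of the compatible-release rule. It handles plain numeric versions only and is not a full PEP 440 implementation; both helper names are hypothetical.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version such as '0.2.8'."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(candidate: str, base: str) -> bool:
    """Approximate `~=base`: at least `base`, with all but the last
    component of `base` kept fixed."""
    cand, floor = parse_release(candidate), parse_release(base)
    prefix = floor[:-1]
    return cand >= floor and cand[: len(prefix)] == prefix
```

Under this rule `0.2.9` and `0.2.10` satisfy `~=0.2.8`, while `0.3.0` does not, although it would satisfy `>=0.2.8`.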
Sam Sykes
b345f48ac1 fix: update dependency specifier for speechmatics-voice
Change the version specifier from >=0.2.8 to ~=0.2.8 for the
speechmatics-voice package to ensure compatibility with future
patch versions.
2026-02-17 09:55:43 +00:00
Mark Backman
f181e12d8f Add changelog for PR #3759 2026-02-16 11:35:45 -07:00
Mark Backman
36de6003d0 Switch Gradium TTS to AudioContextWordTTSService for multiplexing
Use client_req_id-based multiplexing instead of disconnecting and
reconnecting the websocket on every interruption. This follows the
same pattern used by Cartesia, ElevenLabs, and other services via
AudioContextWordTTSService.

Key changes:
- Base class: InterruptibleWordTTSService -> AudioContextWordTTSService
- Add close_ws_on_eos: False to setup message to keep connection alive
- Add client_req_id to text, end_of_stream messages for demultiplexing
- Route audio via append_to_audio_context() instead of push_frame()
- Silently drop messages for cancelled/unknown contexts on interruption
- Add _handle_interruption() that resets context without reconnecting
- Remove no-op push_frame() override
2026-02-16 11:34:16 -07:00
Mark Backman
dba4de77bf Merge pull request #3684 from ai-coustics/goedev/aic-model-caching
AIC model caching
2026-02-16 10:43:14 -05:00
Mark Backman
507765625f Make UserIdleController always-on with dynamic timeout updates
Always create UserIdleController (timeout=0 means disabled), removing
all Optional guards. Add UserIdleTimeoutUpdateFrame to allow changing
the idle timeout at runtime.
2026-02-14 09:54:30 -05:00
Mark Backman
8f5e5e8e7c Update comment in _maybe_mute_frame 2026-02-14 09:41:42 -05:00
Mark Backman
c682a44bb6 Merge pull request #3738 from lukepayyapilli/fix/mute-events-before-start-frame
Fix LLMUserAggregator broadcasting mute events before StartFrame
2026-02-14 09:40:40 -05:00
Mark Backman
cb7023681f Add changelog for PR #3744 2026-02-14 08:57:46 -05:00
Mark Backman
012ef41ff4 Redesign UserIdleController to use BotStoppedSpeakingFrame
Replace the continuous heartbeat-based timer (UserSpeakingFrame/BotSpeakingFrame
+ asyncio.Event loop) with a simple one-shot timer that starts when
BotStoppedSpeakingFrame is received and cancels on UserStartedSpeakingFrame or
BotStartedSpeakingFrame. This eliminates false idle triggers caused by gaps
between the user finishing speaking and the bot starting to speak (LLM/TTS
latency).

Guard the timer start with two conditions to prevent false triggers:
- User turn in progress: during interruptions, BotStoppedSpeaking arrives
  while the user is still speaking mid-turn.
- Function calls in progress: FunctionCallsStarted arrives before
  BotStoppedSpeaking because the bot speaks concurrently with the function
  call starting, so the timer must wait for the result and subsequent bot
  response.
2026-02-14 08:55:56 -05:00
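A one-shot timer with the two guards described above could be sketched like this. It is an illustrative model only: `OneShotIdleTimer` and its method names are hypothetical, and the real controller reacts to frames rather than direct method calls.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class OneShotIdleTimer:
    """Hypothetical one-shot idle timer started when the bot stops speaking."""

    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]):
        self._timeout = timeout
        self._on_idle = on_idle
        self._task: Optional[asyncio.Task] = None
        self.user_turn_in_progress = False
        self.function_calls_in_progress = 0

    def bot_stopped_speaking(self) -> None:
        # Guards: do not start while the user is mid-turn or while a
        # function call is still pending.
        if self.user_turn_in_progress or self.function_calls_in_progress > 0:
            return
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        # Called when the user or the bot starts speaking.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._timeout)
        await self._on_idle()
```

Because the timer only starts after the bot stops speaking, the latency gap between the end of the user's speech and the start of the bot's reply cannot trigger it.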
Filipi da Silva Fuchter
f6bb5fa124 Merge pull request #3741 from pipecat-ai/filipi/update_prebuilt
Using the latest version of pipecat-ai-small-webrtc-prebuilt.
2026-02-13 15:31:48 -05:00
filipi87
2489c76bc6 Using the latest version of pipecat-ai-small-webrtc-prebuilt. 2026-02-13 16:43:25 -03:00
Mark Backman
73cb96bf66 Merge pull request #3739 from pipecat-ai/mb/docs-skill
Add /update-docs Claude Code skill
2026-02-13 13:26:06 -05:00
Mark Backman
79ec61d1d8 Merge pull request #3642 from pipecat-ai/cb/rime-arcana-v3
Update RimeTTSService for arcana and mistv2 model support
2026-02-13 13:25:27 -05:00
Mark Backman
ca440594fe Merge pull request #3720 from pipecat-ai/mb/fix-grok-realtime
Fix Grok Realtime voice type validation for server responses
2026-02-13 13:24:53 -05:00
Mark Backman
6c25dd4aa2 Merge pull request #3736 from pipecat-ai/mb/improve-events-docstrings
Improve events docstrings
2026-02-13 13:24:15 -05:00
Mark Backman
09bb6bb03b Merge pull request #3735 from pipecat-ai/mb/fix-llm-tracing-error-handilng
Fix double execution of service functions on tracing errors
2026-02-13 13:23:55 -05:00
Mark Backman
746fdfbfef Merge pull request #3728 from pipecat-ai/mb/upgrade-pillow
Bump Pillow upper bound from <12 to <13
2026-02-13 13:23:41 -05:00
Mark Backman
f7af9f1efd Broaden /update-docs scope to detect missing doc sections 2026-02-13 13:14:45 -05:00
Mark Backman
a5f95acaf5 Add changelog for PR #3735 2026-02-13 13:08:03 -05:00
Mark Backman
e50b138ab2 Fix double execution of service functions when tracing errors occur
The outer try/except in each service decorator caught both tracing
setup errors and application errors from the wrapped function. If the
function itself raised (e.g. LLM rate limit, TTS timeout), the
exception was caught and the function was called a second time.

Fix by tracking whether the original function was called via a
fn_called flag. If the function was already called, re-raise the
exception instead of falling back to untraced re-execution.
2026-02-13 13:08:03 -05:00
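The `fn_called` approach can be sketched with a simplified, synchronous decorator. This is an assumption-laden illustration: `traced` and `setup_tracing` are hypothetical names, and the real service decorators are likely asynchronous and more involved.

```python
import functools
from typing import Any, Callable


def traced(setup_tracing: Callable[[], None]) -> Callable:
    """Hypothetical decorator: trace if possible, never run `fn` twice."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            fn_called = False
            try:
                setup_tracing()  # may fail independently of fn
                fn_called = True
                return fn(*args, **kwargs)
            except Exception:
                if fn_called:
                    # The error came from fn itself: propagate it rather
                    # than calling fn a second time.
                    raise
                # Tracing setup failed before fn ran: fall back to an
                # untraced call.
                return fn(*args, **kwargs)

        return wrapper

    return decorator
```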
Mark Backman
3640c7a2dd Merge pull request #3733 from pipecat-ai/mb/fix-traceable-init
Deprecate unused class_decorators tracing module and fix stale comments
2026-02-13 13:04:34 -05:00
Mark Backman
2454bedf29 Add /update-docs skill for keeping docs in sync with source changes
Adds a Claude Code skill that analyzes the current branch diff against
main, maps changed source files to their doc pages, and makes targeted
updates to Configuration, InputParams, Usage, Notes, and Event Handlers
sections.
2026-02-13 12:52:23 -05:00
Luke Payyapilli
3adb2f50a6 Fix LLMUserAggregator broadcasting mute events before StartFrame 2026-02-13 11:59:56 -05:00
Mark Backman
01b7a93e08 Deprecate unused Traceable/class_decorators module and fix stale comments
The class_decorators.py module (Traceable, @traceable, @traced) is not
used anywhere in the codebase. Mark it deprecated and fix the misleading
comment in service_decorators.py that referenced it as if it were active.
2026-02-13 11:25:40 -05:00
Mark Backman
347eaf582d Merge pull request #3721 from pipecat-ai/fix/pipeline-scoped-tracing-context
Replace singleton context providers with pipeline-scoped TracingContext
2026-02-13 11:24:37 -05:00
Mark Backman
25ca296477 Move tracing fields to AIService and extract _get_turn_context helper
Consolidate _tracing_enabled and _tracing_context from LLMService,
STTService, and TTSService into the shared AIService base class.
Extract _get_turn_context() helper in service_decorators.py to
encapsulate the repeated pattern across all traced decorators.
2026-02-13 11:21:24 -05:00
Mark Backman
3fce88555f Improve events docstrings 2026-02-13 09:39:44 -05:00
Mark Backman
9e6f27c9f1 Merge pull request #3625 from ianbbqzy/ian/inworld-async-timestamp
[inworld] Allow Async delivery of timestamps info
2026-02-12 21:20:22 -05:00
Ian Lee
94f01af545 [inworld] Allow Async delivery of timestamps info
* speed up first audio chunk latency
2026-02-12 17:48:58 -08:00
Filipi da Silva Fuchter
432870cc36 Merge pull request #3729 from pipecat-ai/filipi/elevenlabs_issue
TTS services fixes.
2026-02-12 16:31:46 -05:00
Filipi da Silva Fuchter
e065907745 Merge pull request #3718 from pipecat-ai/filipi/bot_started_speaking
Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking.
2026-02-12 16:31:14 -05:00
Mark Backman
b7a5ca3d1e Merge pull request #3730 from pipecat-ai/mb/stt-keepalive
Move STT keepalive from WebsocketSTTService to STTService base class
2026-02-12 15:37:23 -05:00
filipi87
9569625f03 Changelog entries for the TTS fixes. 2026-02-12 16:11:02 -03:00
Mark Backman
18afe37bd1 Add changelog entries for PR #3642 2026-02-12 14:09:24 -05:00
Mark Backman
2b9777b812 Update RimeTTSService InputParams for arcana and mistv2 model support
Add model-specific params (arcana: repetition_penalty, temperature, top_p;
mistv2: no_text_normalization, save_oovs, segment) with dynamic query param
building via _build_settings(). Model/voice/param changes now trigger
WebSocket reconnection since all settings are URL query params.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 14:01:41 -05:00
filipi87
8866ab1585 Fixing RimeTTSService to reuse the same context when needed. 2026-02-12 15:53:38 -03:00
filipi87
f0995164d9 Fixing PlayHTTTSService to reuse the same context when needed. 2026-02-12 15:50:18 -03:00
filipi87
136732afae Fixing InworldTTSService to reuse the same context when needed. 2026-02-12 15:46:59 -03:00
filipi87
3410eb82b3 Fixing CartesiaTTSService to reuse the same context when needed. 2026-02-12 15:26:49 -03:00
Chad Bailey
794811fbdb Updated WSS endpoint for Rime Arcana v3 support 2026-02-12 13:24:29 -05:00
filipi87
abea22ec57 Fixing AsyncAITTSService to reuse the same context when needed. 2026-02-12 15:17:47 -03:00
Mark Backman
08beb0264a Add changelog entries for PR #3730 2026-02-12 13:14:11 -05:00
Mark Backman
2e15b4842c Move STT keepalive mechanism from WebsocketSTTService to STTService base class
This allows non-websocket STT services (like SarvamSTTService, which uses
the Sarvam Python SDK for connection management) to reuse the same keepalive
pattern. Subclasses override _send_keepalive() and _is_keepalive_ready() for
their specific protocol.
2026-02-12 11:09:39 -05:00
filipi87
6d95a2425c Fixing ElevenLabs TTS word timestamp interleaving across sentences. 2026-02-12 12:54:47 -03:00
Mark Backman
4667a3d66d Add changelog for #3728 2026-02-12 09:42:23 -05:00
Mark Backman
0bf2477d2c Bump Pillow upper bound from <12 to <13 2026-02-12 09:41:18 -05:00
Mark Backman
71a752c971 Add tests for TracingContext and TurnTraceObserver
Cover pipeline-scoped tracing context lifecycle, span hierarchy,
conversation/turn context management, and concurrent pipeline isolation.
2026-02-11 23:27:35 -05:00
Mark Backman
358f237507 Replace singleton context providers with pipeline-scoped TracingContext
ConversationContextProvider and TurnContextProvider were singletons that
stored tracing context as class-level state. When two PipelineTask instances
ran concurrently, they would overwrite each other's context, causing service
spans to attach to the wrong pipeline's turn span.

Replace both singletons with a single TracingContext object owned by each
PipelineTask, threaded to services via StartFrame.
2026-02-11 21:58:10 -05:00
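The problem with class-level state, and the per-task alternative, can be sketched as follows. All class names here are hypothetical stand-ins; the real TracingContext carries span and conversation data and reaches services via StartFrame.

```python
from typing import Optional


class SingletonTurnContext:
    """Old shape: state lives on the class, so all pipelines share it."""

    _current_turn: Optional[str] = None

    @classmethod
    def set_turn(cls, turn: str) -> None:
        cls._current_turn = turn

    @classmethod
    def get_turn(cls) -> Optional[str]:
        return cls._current_turn


class ScopedTracingContext:
    """New shape: state lives on an instance owned by one task."""

    def __init__(self) -> None:
        self._current_turn: Optional[str] = None

    def set_turn(self, turn: str) -> None:
        self._current_turn = turn

    def get_turn(self) -> Optional[str]:
        return self._current_turn


class Task:
    """Hypothetical pipeline task that owns its own tracing context."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tracing_context = ScopedTracingContext()
```

With the singleton, the second task to set a turn overwrites the first task's value; with one context per task, each keeps its own.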
Mark Backman
d99a256715 Merge pull request #3706 from ianbbqzy/ian/inworld-user-agent
[Inworld] add User-Agent and X-Request-Id for better traceability
2026-02-11 19:38:26 -05:00
Ian Lee
dcbcab1542 [Inworld] add User-Agent and X-Request-Id for better traceability 2026-02-11 15:47:20 -08:00
Mark Backman
a966947220 Add changelog for #3720 2026-02-11 18:04:58 -05:00
Mark Backman
16b060d9e9 Fix Grok Realtime voice type validation for server responses
The Grok API now returns prefixed voice names (e.g. "human_Ara") in
session.updated events, causing Pydantic validation errors. Widen the
voice field type from GrokVoice to GrokVoice | str to accept both
user-facing names and server-returned values.
2026-02-11 18:04:20 -05:00
filipi87
ed7fde324e Adding changelog entry for the RTVIObserver fix. 2026-02-11 16:23:42 -03:00
filipi87
beb4e86b5f Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking. 2026-02-11 16:17:28 -03:00
Aleix Conchillo Flaqué
e75ccd9c2f Merge pull request #3717 from pipecat-ai/aleix/update-claude-md-pr-instructions
Add /pr-submit skill and clean up CLAUDE.md
2026-02-11 10:40:20 -08:00
Aleix Conchillo Flaqué
a80919ceff Move PR submission instructions from CLAUDE.md to /pr-submit skill
Extract the procedural PR workflow into an actionable skill that can be
invoked with /pr-submit. CLAUDE.md is better suited for project context
and conventions, not step-by-step procedures.
2026-02-11 09:57:42 -08:00
Aleix Conchillo Flaqué
1fe4538982 Update PR submission instructions in CLAUDE.md
Expand the Pull Requests section with detailed step-by-step instructions
including branch naming, commit guidance, changelog generation, and PR
description updates.
2026-02-11 09:51:10 -08:00
Filipi da Silva Fuchter
9a48d93bd2 Merge pull request #3713 from pipecat-ai/filipi/smallwebrtc_8khz
Fixing smallwebrtc transport input audio resampling logic.
2026-02-11 11:58:32 -05:00
filipi87
0c3e59ed61 Adding changelog entry for the SmallWebRTCTransport fix. 2026-02-11 13:07:52 -03:00
filipi87
ec2b38dc29 Fixing smallwebrtc transport input audio resampling logic. 2026-02-11 13:01:25 -03:00
Gökmen Görgen
2036757b84 add unit tests for AICModelManager and AICFilter error handling, model loading, and processor behavior 2026-02-11 15:22:37 +01:00
Mark Backman
0574167fbd Merge pull request #3709 from pipecat-ai/mb/fix-quickstart-pcc-deploy
Fix quickstart pcc-deploy.toml
2026-02-10 22:19:37 -05:00
Mark Backman
972ad93e18 Fix quickstart pcc-deploy.toml 2026-02-10 22:17:09 -05:00
Mark Backman
ac53594967 Merge pull request #3708 from pipecat-ai/mb/fix-quickstart-pyproject
Fix quickstart pyproject.toml
2026-02-10 22:09:49 -05:00
Mark Backman
b063d9d43b Fix quickstart pyproject.toml 2026-02-10 22:06:38 -05:00
Mark Backman
48e93beadf Merge pull request #3705 from pipecat-ai/mb/quickstart-0.0.102
Update quickstart for 0.0.102
2026-02-10 21:57:33 -05:00
Aleix Conchillo Flaqué
640940a41a Merge pull request #3704 from pipecat-ai/changelog-0.0.102
Release 0.0.102 - Changelog Update
2026-02-10 18:31:30 -08:00
aconchillo
f1e2001a4e Update changelog for version 0.0.102 2026-02-10 18:28:21 -08:00
Aleix Conchillo Flaqué
12dc6c0b9e Merge pull request #3707 from pipecat-ai/aleix/fix-openai-stream-close-compat
fix(openai): use compatible stream closing for non-OpenAI providers
2026-02-10 18:26:18 -08:00
Aleix Conchillo Flaqué
93f4402198 Update stream close test to match new _closing helper 2026-02-10 18:19:57 -08:00
Aleix Conchillo Flaqué
f3eb5b30a0 Add changelog for #3707 2026-02-10 18:01:29 -08:00
Aleix Conchillo Flaqué
18aad05a7c fix(openai): use compatible stream closing for non-OpenAI providers
OpenAI's AsyncStream uses close() while async generators (e.g. from
OpenPipe) use aclose(). Replace direct async-with on the stream with a
helper that handles both protocols.
2026-02-10 17:59:21 -08:00
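A helper that supports both closing protocols might look roughly like this. It is a sketch, not the actual `_closing` helper referenced in the follow-up commit: the name `closing_stream` and the order of the checks are assumptions.

```python
import contextlib
import inspect
from typing import Any, AsyncIterator


@contextlib.asynccontextmanager
async def closing_stream(stream: Any) -> AsyncIterator[Any]:
    """Yield `stream`, then close it with whichever method it offers."""
    try:
        yield stream
    finally:
        if hasattr(stream, "aclose"):
            # Async generators expose aclose().
            await stream.aclose()
        elif hasattr(stream, "close"):
            # Other stream objects expose close(); await it if needed.
            result = stream.close()
            if inspect.isawaitable(result):
                await result
```

Usage would be `async with closing_stream(stream) as s:` followed by `async for chunk in s:`, regardless of which provider produced the stream.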
Mark Backman
883b24f577 Update quickstart for 0.0.102 2026-02-10 18:14:04 -05:00
Mark Backman
17ab9c425f Merge pull request #3675 from pipecat-ai/mb/elevenlabs-realtime-send-silence
Add silence-based keepalive to WebsocketSTTService
2026-02-10 18:03:38 -05:00
Mark Backman
2f5e61ac55 Add silence-based keepalive to WebsocketSTTService
Adds opt-in keepalive_timeout and keepalive_interval params to
WebsocketSTTService. When enabled, a background task sends silent audio
(or a service-specific protocol message) when the connection has been
idle, preventing server-side timeout disconnects.

Subclasses override _send_keepalive(silence) to wrap the silence in
their wire format. The default sends raw PCM bytes.

Enables keepalive for ElevenLabs (10s), Gladia (20s), and Soniox (1s),
replacing their per-service custom keepalive tasks.
2026-02-10 17:58:47 -05:00
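The keepalive loop described above can be sketched as a small background task. This is an illustration under assumptions: `SilenceKeepalive`, `mark_activity`, and `pcm_silence` are hypothetical names, the silence format (16-bit mono PCM, 20 ms at 16 kHz) is chosen for the example, and only `keepalive_timeout` and `keepalive_interval` come from the commit message.

```python
import asyncio
import time
from typing import Awaitable, Callable


def pcm_silence(sample_rate: int = 16000, duration_ms: int = 20) -> bytes:
    """Return 16-bit mono PCM silence (all-zero samples)."""
    return b"\x00" * (sample_rate * 2 * duration_ms // 1000)


class SilenceKeepalive:
    """Hypothetical helper that sends silence when a connection is idle."""

    def __init__(
        self,
        send: Callable[[bytes], Awaitable[None]],
        keepalive_timeout: float,
        keepalive_interval: float,
    ) -> None:
        self._send = send
        self._timeout = keepalive_timeout
        self._interval = keepalive_interval
        self._last_activity = time.monotonic()

    def mark_activity(self) -> None:
        # Call whenever real audio is sent to the service.
        self._last_activity = time.monotonic()

    async def run(self) -> None:
        # Runs until the surrounding task is cancelled.
        while True:
            await asyncio.sleep(self._interval)
            idle_for = time.monotonic() - self._last_activity
            if idle_for >= self._timeout:
                await self._send(pcm_silence())
```

A service-specific subclass would replace the `send` callable with one that wraps the bytes in its own wire format, which is the role the commit assigns to `_send_keepalive(silence)`.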
Aleix Conchillo Flaqué
1128c5b7fb Merge pull request #3702 from pipecat-ai/aleix/add-missing-local-smartturn-dependency
pyproject: add local smartturn as a default dependency
2026-02-10 14:34:43 -08:00
Aleix Conchillo Flaqué
a9a5edd8ca pyproject: add local smartturn as a default dependency 2026-02-10 14:32:32 -08:00
Filipi da Silva Fuchter
a98c884e31 Merge pull request #3621 from pipecat-ai/filipi/context_compressure
Context summarization feature implementation
2026-02-10 17:04:47 -05:00
filipi87
2475697955 Changelog entries for context summarization 2026-02-10 18:59:12 -03:00
filipi87
ba242d4875 Context summarization example with Google 2026-02-10 18:59:03 -03:00
filipi87
5deb80932b Context summarization example with OpenAI 2026-02-10 18:58:55 -03:00
filipi87
4a00e6829f Automated tests for the context summarizer. 2026-02-10 18:58:44 -03:00
filipi87
9d89afa7d4 Automated tests for the context summarization feature. 2026-02-10 18:58:33 -03:00
filipi87
92b6ecd945 New Claude skill to help refactor and cleanup the code. 2026-02-10 18:58:22 -03:00
filipi87
314d074c61 Context summarization feature implementation. 2026-02-10 18:58:12 -03:00
Filipi da Silva Fuchter
9c627e7292 Merge pull request #3653 from pipecat-ai/filipi/heygen_lite
HeyGen improvements.
2026-02-10 12:12:22 -05:00
Filipi da Silva Fuchter
ad179b0852 Merge pull request #3584 from pipecat-ai/filipi/speak_frame
TTS services improvements.
2026-02-10 12:11:47 -05:00
filipi87
5128089d42 Add changelog entries for PR #3653. 2026-02-10 14:02:32 -03:00
filipi87
87a79df048 Updating the heygen examples to use sandbox by default. 2026-02-10 14:02:20 -03:00
filipi87
24f90715e3 Use LITE as the default mode, and add support for video_settings and is_sandbox in LiveAvatarNewSessionRequest. 2026-02-10 14:02:09 -03:00
filipi87
e00b98343e Changelog entries for TTS context tracking 2026-02-10 11:37:21 -03:00
filipi87
ad1bec4583 Updated openai example to use on_tts_request and append_to_text. 2026-02-10 11:28:35 -03:00
filipi87
a47d7f98ee Refactored all 30+ TTS service implementations to support context tracking 2026-02-10 11:28:08 -03:00
filipi87
19cd242261 Added TTS context tracking system to trace audio generation through the pipeline. 2026-02-10 11:27:58 -03:00
filipi87
9bb712a47b Simplified universal context aggregators, _handle_text() to only check frame.append_to_context instead of also checking self._started 2026-02-10 11:27:30 -03:00
filipi87
1dccbe7c0b Simplified context aggregators, _handle_text() to only check frame.append_to_context instead of also checking self._started 2026-02-10 11:27:13 -03:00
Mark Backman
2dd3e2f1e7 Merge pull request #3697 from pipecat-ai/mb/soniox-rt-4
Update SonioxSTTService default model to stt-rt-v4
2026-02-10 09:24:39 -05:00
filipi87
f206aaa28d - Added context_id field to all TTS-related frames (TTSAudioRawFrame, TTSStartedFrame, TTSStoppedFrame, AggregatedTextFrame, TTSTextFrame)
- Added append_to_context parameter to TTSSpeakFrame for conditional LLM context addition
2026-02-10 11:22:26 -03:00
Mark Backman
60e42f5690 Merge pull request #3701 from pipecat-ai/mb/changelog-3700 2026-02-10 09:19:42 -05:00
Mark Backman
88e981c013 Set vad_force_turn_endpoint to False in SonioxSTTService 2026-02-10 09:16:03 -05:00
Mark Backman
7bd8dfe898 Add changelog for PR 3700 2026-02-10 08:20:03 -05:00
Mark Backman
83039a1a35 Merge pull request #3700 from ashotbagh/chore/async-migration
chore: update Async API URL and default model
2026-02-10 08:17:04 -05:00
Ashot
28e8b61eb4 chore: update Async API URL and default model 2026-02-10 15:23:51 +04:00
Mark Backman
d47d95e1f0 Update SonioxSTTService default model to stt-rt-v4 2026-02-09 23:48:08 -05:00
Mark Backman
79b9d929c5 Merge pull request #3682 from eoinoreilly30/patch-1
Add new voice options 'marin' and 'cedar'
2026-02-09 23:47:39 -05:00
Eoin
dfc0856d54 Added changelog entry 2026-02-10 12:31:26 +09:00
Eoin
f3c1cd4cd6 Lint 2026-02-10 12:31:26 +09:00
Eoin
18d91d6df3 Add new voice options 'marin' and 'cedar' 2026-02-10 12:31:26 +09:00
Mark Backman
688f502488 Merge pull request #3644 from pipecat-ai/mb/update-assembly-ai-default-config
AssemblyAISTTService: Disable turn detection when setting vad_force_t…
2026-02-09 22:27:44 -05:00
Mark Backman
7684a94c33 AssemblyAISTTService: Disable turn detection when setting vad_force_turn_endpoint to True 2026-02-09 22:20:35 -05:00
Aleix Conchillo Flaqué
e27f4bccfb Merge pull request #3695 from pipecat-ai/aleix/more-claude-updates
CLAUDE.md: add pipeline task and pipeline runner
2026-02-09 18:14:30 -08:00
Mark Backman
fa8b0aeda8 Merge pull request #3690 from pipecat-ai/mb/add-claude-settings
Add shared Claude Code settings
2026-02-09 19:22:28 -05:00
Aleix Conchillo Flaqué
946f0f4e77 CLAUDE.md: add pipeline task and pipeline runner 2026-02-09 16:19:11 -08:00
Mark Backman
b9cf3f3225 Merge pull request #3694 from pipecat-ai/mb/claude-updates
Add observers, error handling, task management, and testing to CLAUDE.md
2026-02-09 19:05:49 -05:00
Aleix Conchillo Flaqué
d32c4b2f5f Merge pull request #3693 from pipecat-ai/aleix/update-examples-remove-default-turn-analyzer
remove the now default turn analyzer from examples
2026-02-09 16:04:19 -08:00
Mark Backman
77a5d16a10 Merge pull request #3692 from pipecat-ai/mb/request-metadata-updates
Rename RequestMetadataFrame to ServiceSwitcherRequestMetadataFrame with service targeting
2026-02-09 18:19:29 -05:00
Mark Backman
ca224834b2 Add observers, error handling, task management, and testing to CLAUDE.md 2026-02-09 18:12:24 -05:00
Aleix Conchillo Flaqué
3867bc6302 LLMUserAggregator: update turn analyzer warning 2026-02-09 14:33:38 -08:00
Aleix Conchillo Flaqué
83a8379401 examples: remove the now default turn analyzer user turn stop strategy 2026-02-09 14:33:38 -08:00
mattie ruth backman
f2688deb0d Update args field in RTVILLMFunctionCallInProgressMessageData to match API of existing RTVILLMFunctionCallResultData. 2026-02-09 17:17:01 -05:00
Mark Backman
981253c703 Rename RequestMetadataFrame to ServiceSwitcherRequestMetadataFrame with service targeting
Add a `service` field so the frame targets a specific service, allowing
ServiceSwitcher.push_frame to consume it only when the targeted service
matches the active service. STTService and test mocks now push the frame
downstream after handling instead of silently consuming it.
2026-02-09 16:48:34 -05:00
Mark Backman
aa6c9797ca Merge pull request #3671 from pipecat-ai/mb/sarvam-cleanup
Clean up on Sarvam STT and TTS classes
2026-02-09 15:58:34 -05:00
Mark Backman
6305e04569 Clean up on Sarvam STT and TTS classes 2026-02-09 15:53:05 -05:00
Mark Backman
3ff9b7b5ad Merge pull request #3687 from pipecat-ai/mb/rtvi-mute-events
Emit RTVI events for user mute/unmute
2026-02-09 15:18:28 -05:00
Mark Backman
cc797ba3cf Add shared Claude Code settings to disable commit attribution 2026-02-09 15:15:31 -05:00
Aleix Conchillo Flaqué
91c8122c17 Merge pull request #3689 from pipecat-ai/aleix/default-smart-turn-stop-strategy
Use TurnAnalyzerUserTurnStopStrategy as default stop strategy
2026-02-09 12:07:16 -08:00
Aleix Conchillo Flaqué
944ac92593 Fix test_langchain to use explicit stop strategy
The default stop strategy changed to TurnAnalyzerUserTurnStopStrategy,
which requires actual audio analysis. Use SpeechTimeoutUserTurnStopStrategy
explicitly since this test is not testing turn detection.
2026-02-09 12:00:41 -08:00
Aleix Conchillo Flaqué
ca0d2e68c3 Add changelog for #3689 2026-02-09 11:58:09 -08:00
Aleix Conchillo Flaqué
631463e573 Use TurnAnalyzerUserTurnStopStrategy as default stop strategy
Change the default user turn stop strategy from
TranscriptionUserTurnStopStrategy to TurnAnalyzerUserTurnStopStrategy
with LocalSmartTurnAnalyzerV3. Also reduce AUDIO_INPUT_TIMEOUT_SECS
from 1.0 to 0.5 and remove its debug log.
2026-02-09 11:58:09 -08:00
Mark Backman
6a553367a2 Merge pull request #3676 from pipecat-ai/mb/code-review-skill
Add Claude code-review skill
2026-02-09 14:48:20 -05:00
Mark Backman
00ec6c77ea Emit RTVI events for user mute/unmute state changes
Add UserMuteStartedFrame/UserMuteStoppedFrame and corresponding RTVI
messages so clients can observe when mute strategies activate/deactivate.
2026-02-09 14:44:32 -05:00
Mark Backman
ee6520db30 Merge pull request #3637 from pipecat-ai/mb/improve-user-stop-turn
Improve user turn stop timing by triggering timeout from VAD stop, push STT metadata to user aggregator
2026-02-09 14:43:22 -05:00
Aleix Conchillo Flaqué
2a572aedba Simplify ServiceSwitcher with closure-based filters
- Make ServiceSwitcherStrategy inherit from BaseObject with properties
  for services and active_service, and move initial service selection
  into the base class
- Add on_service_switched event to ServiceSwitcherStrategy
- handle_frame now returns the switched-to service (or None), allowing
  ServiceSwitcher to swallow ManuallySwitchServiceFrame on switch and
  request metadata from the new active service
- Override push_frame to suppress RequestMetadataFrame and
  ServiceMetadataFrame from inactive services
- Remove ServiceSwitcherFilter and ServiceSwitcherFilterFrame in favor
  of plain FunctionFilter instances with closures that check the
  strategy's active service directly
- FunctionFilter: add FilterType alias
- FunctionFilter: when direction is None, frames in both directions
  are filtered instead of just one
- Add docstrings to ServiceSwitcher and its components
2026-02-09 14:12:33 -05:00
Mark Backman
5e66702cf5 Improved the accuracy of the UserBotLatencyObserver and UserBotLatencyLogObserver 2026-02-09 14:12:33 -05:00
Mark Backman
34b068d657 Improve user turn stop timing by triggering timeout from VAD stop
Refactor TranscriptionUserTurnStopStrategy and TurnAnalyzerUserTurnStopStrategy
to use VADUserStoppedSpeakingFrame as the ground truth for when speech ended,
rather than triggering timeouts from transcription frames.
2026-02-09 14:12:33 -05:00
Mark Backman
05e2a013b3 Merge pull request #3672 from pipecat-ai/mb/rtvi-duplicate-events
Filter RTVIObserver to downstream frames only and broadcast FunctionCallCancelFrame
2026-02-09 12:58:28 -05:00
Mark Backman
5f64dae0cf Filter RTVIObserver to downstream frames only and broadcast FunctionCallCancelFrame
RTVIObserver now skips upstream frames to prevent duplicate RTVI messages
when frames are broadcast in both directions. Also changed
FunctionCallCancelFrame to use broadcast_frame for consistency with
other function call frames.
2026-02-09 12:39:25 -05:00
Mark Backman
1bf8b54502 Merge pull request #3683 from dhruvladia-sarvam/sarvam-v3-update 2026-02-09 06:49:59 -05:00
Gökmen Görgen
ed3ec045aa add changelog file. 2026-02-09 12:04:09 +01:00
Gökmen Görgen
67d39a97f7 AIC model caching. 2026-02-09 11:51:28 +01:00
dhruvladia-sarvam
947ff03c9f v3 addition 2026-02-09 13:04:45 +05:30
Om Chauhan
a4e187e138 replace background task with flush-on-join 2026-02-09 06:04:08 +05:30
Om Chauhan
9f380170d7 added changelog 2026-02-09 05:37:43 +05:30
Om Chauhan
12f27f9cda fix(daily): queue outbound messages until transport joins 2026-02-09 05:37:43 +05:30
Mark Backman
104d06551a Merge pull request #3679 from pipecat-ai/mb/remove-to-be-updated
Remove SequentialMergePipeline
2026-02-08 15:28:38 -05:00
Mark Backman
90ad2a4e81 Remove SequentialMergePipeline 2026-02-08 14:44:48 -05:00
Mark Backman
3494a94cac Add Claude code-review skill 2026-02-08 11:06:48 -05:00
Mark Backman
570f2d7fc0 Merge pull request #3667 from ianbbqzy/ian/fix-auto-mode-space
[inworld] aggregate_sentence mode needs trailing space
2026-02-07 18:22:32 -05:00
Ian Lee
f3d99adf8f [inworld] aggregate_sentence mode needs trailing space 2026-02-07 15:18:24 -08:00
Mark Backman
d34f416281 Merge pull request #3598 from dhruvladia-sarvam/sarvam-v3-update
ASR and TTS v3 update
2026-02-07 10:51:35 -05:00
Mark Backman
5a1deb7cb4 Merge pull request #3659 from pipecat-ai/mb/change-vad-defaults
Set VADParams stop_secs to 0.2 by default
2026-02-06 23:51:50 -05:00
Mark Backman
a5fc2b1650 Set VADParams stop_secs to 0.2 by default 2026-02-06 23:49:08 -05:00
Aleix Conchillo Flaqué
5cb8d91431 added changelog file for #3616 2026-02-06 16:45:23 -08:00
Aleix Conchillo Flaqué
ce690848c0 Merge pull request #3616 from omChauhanDev/fix/function-call-timeout-task-cleanup
fix: ensure function call timeout task is always cancelled
2026-02-06 16:40:56 -08:00
Aleix Conchillo Flaqué
30f51edfcd Merge pull request #3668 from pipecat-ai/aleix/parallel-pipeline-buffering
Buffer internal frames during ParallelPipeline lifecycle sync
2026-02-06 15:25:32 -08:00
Aleix Conchillo Flaqué
cd03d449cb Update changelog skill with skip rules and allowed types 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
57df03aade Update CLAUDE.md with PR workflow instructions 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
4945cfbd8f Buffer internal frames during ParallelPipeline lifecycle synchronization
Processors inside parallel sub-pipelines can push frames during
StartFrame/EndFrame/CancelFrame processing. Previously these frames
could escape the ParallelPipeline before all branches finished
processing the lifecycle frame. Now they are buffered and flushed
after synchronization completes.
2026-02-06 15:15:46 -08:00
Mark Backman
8d37d3bae7 Merge pull request #3666 from pipecat-ai/mb/deepgram-stt-smart-format
DeepgramSTTService: disable smart_format by default
2026-02-06 14:04:37 -05:00
Mark Backman
d7b1624d3c Merge pull request #3663 from lukepayyapilli/fix/stream-close-sambanova-google
fix: close stream on cancellation for SambaNova and Google OpenAI services
2026-02-06 14:02:31 -05:00
Mark Backman
7f65204c3b DeepgramSTTService: disable smart_format by default 2026-02-06 13:45:10 -05:00
Aleix Conchillo Flaqué
97eff414c3 Merge pull request #3660 from pipecat-ai/aleix/interruption-frame-completion-event
Attach asyncio.Event to InterruptionFrame for completion signaling
2026-02-06 10:14:26 -08:00
Aleix Conchillo Flaqué
5b67e76de7 Add changelog for PR #3660 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
b9e79bd06a CLAUDE.md: explain about InterruptionFrame.complete() 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
d5105a78e6 STTMuteFilter should call frame.complete() when InterruptionFrame is blocked 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
a352b2d7a0 Add tests for InterruptionFrame completion event
Add tests for the event-based interruption completion: complete() sets
the event, complete() is safe without an event, the event fires at
the pipeline sink, and a warning is logged when the frame is blocked.

Also remove the unconditional await after the timeout so the function
returns instead of hanging when complete() is never called.
2026-02-06 09:57:24 -08:00
Aleix Conchillo Flaqué
2345090b10 Attach asyncio.Event to InterruptionFrame for completion signaling
Move the interruption wait event from per-processor instance state to
the frame itself. The event is created in
push_interruption_task_frame_and_wait(), threaded through
InterruptionTaskFrame → InterruptionFrame, and set when the frame
reaches the pipeline sink. This scopes the event to each interruption
flow rather than sharing mutable state on the processor.

Also adds a 2s timeout warning to help diagnose cases where
InterruptionFrame.complete() is never called.
2026-02-06 09:57:24 -08:00
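The event-on-the-frame approach described in the commit above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `InterruptionFrameSketch` and `push_and_wait` are hypothetical names, and the only details taken from the commit message are the per-frame `asyncio.Event`, the `complete()` method, and the 2 s timeout.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class InterruptionFrameSketch:
    """Hypothetical stand-in for a frame that carries its own completion event."""

    event: Optional[asyncio.Event] = None

    def complete(self) -> None:
        # Safe to call even when no event was attached to the frame.
        if self.event is not None:
            self.event.set()


async def push_and_wait(
    push: Callable[[InterruptionFrameSketch], Awaitable[None]],
    timeout: float = 2.0,
) -> bool:
    """Attach a fresh event to a frame, push it, and wait for completion.

    Returns True if complete() was called within the timeout, else False.
    """
    event = asyncio.Event()  # scoped to this one interruption flow
    frame = InterruptionFrameSketch(event=event)
    await push(frame)
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # The commit describes logging a warning here; this sketch only
        # reports the timeout and returns instead of waiting again.
        return False
```

Because the event lives on the frame rather than on the processor, two overlapping interruption flows in this sketch cannot set or clear each other's state.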
Mark Backman
af562bf9a8 Merge pull request #3664 from pipecat-ai/mb/elevenlabs-scribe-v2
Update ElevenLabsSTTService to scribe_v2
2026-02-06 12:31:44 -05:00
Mark Backman
d4993f0dcf Update ElevenLabsSTTService to scribe_v2 2026-02-06 11:37:23 -05:00
Luke Payyapilli
1790a84bfd add changelog 2026-02-06 10:05:02 -05:00
Luke Payyapilli
29c53b99a4 fix: close stream on cancellation for SambaNova and Google OpenAI services 2026-02-06 10:02:40 -05:00
Mark Backman
aa5a855eab Merge pull request #3656 from pipecat-ai/mb/openai-realtime-stt
Add OpenAIRealtimeSTTService
2026-02-06 09:15:58 -05:00
Mark Backman
e66d6f8ffe Merge pull request #3658 from pipecat-ai/mb/bump-protobuf-5.29.6
Upgrade protobuf to >=5.29.6
2026-02-05 19:09:30 -05:00
Mark Backman
b8ac2ba713 Merge pull request #3593 from ianbbqzy/ian/inworld-auto-mode
Add auto_mode support for inworld plugin
2026-02-05 18:16:38 -05:00
Ian Lee
6eea40858e fix lint and changelog 2026-02-05 15:10:36 -08:00
Mark Backman
90700d10aa Upgrade protobuf to >=5.29.6 2026-02-05 18:08:52 -05:00
Mark Backman
fa85f7bbc7 Merge pull request #3640 from lukepayyapilli/fix/openai-stream-close
fix: close stream on cancellation to prevent socket leaks
2026-02-05 18:00:06 -05:00
Mark Backman
669f013970 Merge pull request #3657 from pipecat-ai/filipi/changing_no_audio_log_to_debug
Changing the ‘no audio received’ log from warning to debug.
2026-02-05 17:35:24 -05:00
filipi87
76f63e54e2 Changing the ‘no audio received’ log from warning to debug. 2026-02-05 18:07:14 -03:00
Filipi da Silva Fuchter
cce5a13444 Merge pull request #3650 from pipecat-ai/filipi/twilio_issues
Ignoring RTVI messages inside the Serializers by default.
2026-02-05 15:52:59 -05:00
Mark Backman
d11e1cd631 Update 13k to use ElevenLabsRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
8b9da632d1 Add OpenAIRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
b36f7892a4 Merge pull request #3654 from pipecat-ai/aleix/more-claude-update
CLAUDE.md: add RTVI and serializers
2026-02-05 15:23:35 -05:00
Mark Backman
9b43cde128 Merge pull request #3355 from itsderek23/user-bot-latency
Add `user_bot_latency_seconds` to OpenTelemetry turn spans
2026-02-05 15:23:15 -05:00
filipi87
6af4d872a8 Refactoring the serializers to ignore the RTVI messages by default. 2026-02-05 16:52:53 -03:00
Ian Lee
22398e1410 add changelog back 2026-02-05 11:39:39 -08:00
Ian Lee
d10467e043 update timestamps reset handling 2026-02-05 11:39:39 -08:00
Ian Lee
cbe131636d add changelog 2026-02-05 11:39:39 -08:00
Ian Lee
fef9e3ea32 Add auto_mode support for inworld plugin 2026-02-05 11:39:39 -08:00
Mark Backman
56d8ef2bf4 Deprecate UserBotLatencyLogObserver, update 29 example 2026-02-05 14:29:45 -05:00
Derek Haynes
8791559351 Add changelog entry for PR #3355 2026-02-05 14:29:45 -05:00
Derek Haynes
f6c919354f Add test for user bot latency 2026-02-05 14:29:45 -05:00
Derek Haynes
93138466d6 Feat: Add user-bot latency to OTel turn spans
This adds user-to-bot response latency tracking to OpenTelemetry spans:

- Created UserBotLatencyObserver as a reusable component for tracking
user-to-bot response latency
- Records the value as an attribute on turn spans (turn.user_bot_latency_seconds)
- Updated TurnTraceObserver to use UserBotLatencyObserver, following the same pattern as TurnTrackingObserver
- Updated PipelineTask to automatically create and wire UserBotLatencyObserver
when tracing is enabled (same as TurnTrackingObserver)
2026-02-05 14:29:42 -05:00
Mark Backman
5a5a98b497 Merge pull request #3649 from itsderek23/fix/tracing-orphan-spans
Fix orphan otel spans during flow initialization and transitions
2026-02-05 14:23:52 -05:00
Aleix Conchillo Flaqué
2b4f507d37 CLAUDE.md: add RTVI and serializers 2026-02-05 11:06:00 -08:00
Mark Backman
d6f3a90662 Merge pull request #3652 from pipecat-ai/mb/upgrade-small-webrtc-prebuilt-2.1.0
Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0
2026-02-05 13:48:54 -05:00
Derek Haynes
8fb0e37965 Update changelog for #3649 2026-02-05 11:35:22 -07:00
Derek Haynes
0d45b48f7b Fix import placement 2026-02-05 11:26:58 -07:00
Mark Backman
6af4520b1f Merge pull request #3635 from pipecat-ai/filipi/fix_websocket
Fixed an error in the WebSocket transport that occurred when an InputTransportMessageFrame was received and broadcast.
2026-02-05 12:22:59 -05:00
filipi87
ba469e5645 Add changelog entry
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 12:19:51 -05:00
Mark Backman
bd12b60b5c Merge pull request #3614 from okue/fix/websocket-broadcast-frame-misuse
fix: pass frame class instead of instance to broadcast_frame in websocket transports
2026-02-05 12:19:03 -05:00
Mark Backman
54db37ea47 Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0 2026-02-05 12:09:51 -05:00
filipi87
752e16f553 Ignoring RTVI messages inside TwilioSerializer by default. 2026-02-05 10:51:03 -03:00
Derek Haynes
7c7408a048 Fix orphan spans in tracing during flow initialization and transitions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 06:06:13 -07:00
Mark Backman
8f42343927 Merge pull request #3630 from pipecat-ai/mb/add-function-call-messages-rtvi
Add native RTVI function call lifecycle messages
2026-02-04 16:20:42 -05:00
Mark Backman
46da6cd91b Update changelogs 2026-02-04 11:19:30 -05:00
Mark Backman
ecb02d9049 Bump RTVI_PROTOCOL_VERSION to 1.2.0 2026-02-04 11:17:38 -05:00
Mark Backman
cc68e00125 Deprecate llm-function-call message 2026-02-04 11:17:23 -05:00
Mark Backman
e0e3b5250b Add RTVIObserverParams to control what information is included in function call events 2026-02-04 11:05:05 -05:00
Luke Payyapilli
55a3b10e70 fix(openai): close stream on cancellation to prevent socket leaks 2026-02-04 09:59:10 -05:00
dhruvladia-sarvam
e6b06414b3 change default speaker for bulbul:v3-beta to shubh 2026-02-04 16:46:35 +05:30
Aleix Conchillo Flaqué
6bcfb40d12 Merge pull request #3636 from pipecat-ai/aleix/initial-claude-md
initial CLAUDE.md
2026-02-03 19:31:16 -08:00
Aleix Conchillo Flaqué
65b1a8ce36 initial CLAUDE.md 2026-02-03 18:04:54 -08:00
Mark Backman
2db3d94d06 Merge pull request #3628 from pipecat-ai/mb/broadcast-speech-control-params-frame
Fix: Broadcast SpeechControlParamsFrame from VADController
2026-02-03 18:44:15 -05:00
Mark Backman
2a26b9f7a3 Fix: Broadcast SpeechControlParamsFrame from VADController 2026-02-03 18:40:39 -05:00
Aleix Conchillo Flaqué
4f77c532fb Merge pull request #3623 from pipecat-ai/aleix/pipeline-task-rtvi-always-set-bot-ready
PipelineTask: also call set_bot_ready() for external RTVI processors
2026-02-03 14:21:03 -08:00
Aleix Conchillo Flaqué
c3a4da4a29 PipelineTask: also call set_bot_ready() for external RTVI processors 2026-02-03 14:16:08 -08:00
Mark Backman
84ca0b6d58 Merge pull request #3629 from pipecat-ai/fix/telephony-websocket-stopasynciteration
Fix StopAsyncIteration in parse_telephony_websocket
2026-02-03 12:10:07 -05:00
Mark Backman
c1857d255d Avoid nesting try/excepts 2026-02-03 12:00:04 -05:00
Mark Backman
d50ec33079 Merge pull request #3542 from lukepayyapilli/fix/terminal-frames-uninterruptible
fix: make EndFrame and StopFrame uninterruptible to prevent pipeline freeze
2026-02-03 10:08:17 -05:00
Mark Backman
40c84faff5 Remove handle_function_call_start 2026-02-03 10:00:59 -05:00
Mark Backman
84cd9346f9 Add native RTVI function call lifecycle messages 2026-02-03 10:00:59 -05:00
Luke Payyapilli
5d5b19e1d2 Add changelog entry 2026-02-03 09:12:59 -05:00
Luke Payyapilli
8d3e10f054 Make EndFrame and StopFrame uninterruptible to prevent pipeline freeze 2026-02-03 09:12:59 -05:00
dhruvladia-sarvam
1665ce181a refactor(sarvam): centralize model configuration with dataclasses 2026-02-03 14:33:41 +05:30
James Hush
803a20cc00 Fix formatting: remove extra blank line
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:46:44 +08:00
James Hush
90bead06ab Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-03 16:42:13 +08:00
James Hush
b427d534ae Add tests for parse_telephony_websocket StopAsyncIteration handling
Tests cover:
- No messages received (raises ValueError)
- One message received (logs warning, continues)
- Two messages received (normal operation)
- All telephony providers (Twilio, Telnyx, Plivo, Exotel)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:33:36 +08:00
James Hush
b030f1178d Add changelog and improve docstring for parse_telephony_websocket
- Added changelog entry for bug fix
- Enhanced docstring with Args and Raises sections

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:26:09 +08:00
James Hush
a627597bca Fix StopAsyncIteration in parse_telephony_websocket
Handle WebSocket disconnections gracefully when telephony providers send
fewer messages than expected. Adds explicit StopAsyncIteration handling
for both first and second message retrieval.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:25:07 +08:00
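The handling described in the commit above might look something like the following sketch. The function name `read_first_two_messages` and the return shape are assumptions made for illustration; only the behavior (error on zero messages, continue on one, normal operation on two) comes from the commit and its accompanying tests.

```python
import asyncio
from typing import AsyncIterator, List


async def read_first_two_messages(messages: AsyncIterator[str]) -> List[str]:
    """Read up to two initial messages, tolerating an early disconnect."""
    try:
        first = await messages.__anext__()
    except StopAsyncIteration:
        # Nothing arrived before the socket closed: treat as an error.
        raise ValueError("WebSocket closed before any message was received") from None

    try:
        second = await messages.__anext__()
    except StopAsyncIteration:
        # Only one message arrived: continue with what we have.
        return [first]

    return [first, second]
```

The two `try` blocks are kept sequential rather than nested, so each retrieval has its own clearly scoped handler.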
Aleix Conchillo Flaqué
4c10ddb7bb upgrade uv.lock 2026-02-02 16:25:06 -08:00
Mark Backman
a4e499dc80 Merge pull request #3617 from pipecat-ai/fix/cjk-sentence-splitting
Fix sentence splitting for CJK and other non-Latin languages
2026-02-02 18:16:51 -05:00
Mark Backman
ca49acfaa6 Merge pull request #3619 from pipecat-ai/mb/resemble-readme
Resemble cleanup
2026-02-02 09:20:11 -05:00
Mark Backman
86147f15f3 Renumber the Resemble foundational example 2026-02-02 09:07:05 -05:00
Mark Backman
5cda72d138 Add Resemble TTS to README 2026-02-02 09:05:03 -05:00
Mark Backman
54e62a8177 Merge pull request #3134 from pipecat-ai/mb/resemble-tts-draft
Add ResembleAITTSService
2026-02-02 08:59:27 -05:00
Mark Backman
a592b7fdf0 Update per PR 1789, align with ErrorFrame norms 2026-02-02 08:55:29 -05:00
Mark Backman
ba2b7c05d6 Add ResembleAITTSService 2026-02-02 08:55:27 -05:00
James Hush
774041e9a1 Add changelog for PR #3617
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:47:22 +08:00
James Hush
763002f2bc Fix sentence splitting for CJK and other non-Latin languages in TTS pipeline
NLTK's sent_tokenize() only supports ~15 European languages and defaults to
English. For Japanese, Chinese, Korean, Hindi, Arabic, and other non-Latin
languages, NLTK fails to recognize sentence boundaries like 。？！ causing
text to accumulate until flush instead of being emitted sentence-by-sentence.

Add a fallback in match_endofsentence() that scans for unambiguous non-Latin
sentence-ending punctuation when NLTK fails to split the text. Latin
punctuation (. ! ? ; …) is excluded from the fallback since NLTK handles
those correctly and they can be ambiguous (abbreviations, decimals, etc.).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:27:49 +08:00
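The fallback idea in the commit above can be sketched as a simple scan for non-Latin sentence terminators. This is an illustration only: the function name `find_non_latin_sentence_end` and the exact punctuation set are assumptions, and the sketch does not call NLTK or reproduce `match_endofsentence()` itself.

```python
from typing import Optional

# Assumed set of unambiguous non-Latin sentence terminators (illustrative):
# ideographic full stop, fullwidth question/exclamation marks, halfwidth
# ideographic full stop, Devanagari danda, Arabic question mark, Urdu full stop.
NON_LATIN_SENTENCE_ENDINGS = frozenset("。？！｡।؟۔")


def find_non_latin_sentence_end(text: str) -> Optional[int]:
    """Return the index just past the first non-Latin terminator, or None."""
    for index, char in enumerate(text):
        if char in NON_LATIN_SENTENCE_ENDINGS:
            return index + 1
    return None
```

Latin punctuation such as `.` is deliberately absent from the set, so a string like `"Dr. Smith"` yields no match and would be left to the primary tokenizer.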
Om Chauhan
50dedf350d fix: ensure function call timeout task is always cancelled 2026-02-02 08:38:54 +05:30
okue
d3ecbb11c1 fix: pass frame class instead of instance to broadcast_frame in websocket transports
broadcast_frame() expects a frame class and kwargs, but the three
websocket input transports (fastapi, client, server) were incorrectly
passing a frame instance. This would cause a TypeError at runtime when
an InputTransportMessageFrame was received.
2026-02-01 20:38:34 +09:00
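The class-versus-instance distinction in the commit above can be illustrated with a small stand-in. `broadcast_frame_sketch` and `MessageFrameSketch` are hypothetical names for this example, not the library's API; the sketch only assumes what the commit states, namely that the broadcast helper takes a frame class plus keyword arguments.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Type


@dataclass
class MessageFrameSketch:
    """Hypothetical frame type used only for this illustration."""

    message: Any


def broadcast_frame_sketch(frame_cls: Type, **kwargs: Any) -> Tuple[Any, Any]:
    """Build one independent frame instance per direction from a class and kwargs."""
    downstream = frame_cls(**kwargs)
    upstream = frame_cls(**kwargs)
    return downstream, upstream


# Correct usage: pass the class and its constructor arguments.
down, up = broadcast_frame_sketch(MessageFrameSketch, message={"type": "ping"})
```

Passing an already constructed instance as `frame_cls` fails in this sketch with a `TypeError`, because the helper tries to call the instance as if it were a constructor.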
Aleix Conchillo Flaqué
f453227ba3 Merge pull request #3612 from pipecat-ai/aleix/use-kokoro-onnx
KokoroTTSService: use kokoro-onnx instead of kokoro
2026-01-31 21:03:55 -08:00
Aleix Conchillo Flaqué
52cc64019a Merge pull request #3611 from pipecat-ai/aleix/aicoustics-example-update
examples: update 07zd to use vad_analyzer in LLMUserAggregator
2026-01-31 21:02:50 -08:00
Aleix Conchillo Flaqué
95689cc81c KokoroTTSService: use kokoro-onnx instead of kokoro 2026-01-31 17:20:27 -08:00
Aleix Conchillo Flaqué
675c7c43e3 examples: update 07zd to use vad_analyzer in LLMUserAggregator 2026-01-31 15:31:15 -08:00
Aleix Conchillo Flaqué
bfd19e867c Merge pull request #3610 from pipecat-ai/aleix/dont-add-rtvi-observer-if-already-there
PipelineTask: don't add RTVIObserver if already there
2026-01-31 14:57:52 -08:00
Aleix Conchillo Flaqué
acc9923c0a PipelineTask: don't add RTVIObserver if already there 2026-01-31 14:54:29 -08:00
Mark Backman
bdc9e7e2e4 Merge pull request #3608 from pipecat-ai/mb/quickstart-0.0.101
Update quickstart for 0.0.101
2026-01-31 10:39:17 -05:00
Mark Backman
a587e1b99a Update quickstart for 0.0.101 2026-01-31 09:52:24 -05:00
Aleix Conchillo Flaqué
7853e5ca93 Merge pull request #3606 from pipecat-ai/changelog-0.0.101
Release 0.0.101 - Changelog Update
2026-01-30 22:58:22 -08:00
aconchillo
614b8e1a62 Update changelog for version 0.0.101 2026-01-30 22:54:31 -08:00
Aleix Conchillo Flaqué
ef51c2a5c6 changelog: fix 3582 changed file 2026-01-30 22:48:26 -08:00
Aleix Conchillo Flaqué
f42dc0d38e Merge pull request #3605 from pipecat-ai/aleix/gemini-live-schedule-transcription-timeout-handler
GeminiLiveLLMService: let the transcription timeout handler be scheduled
2026-01-30 22:44:05 -08:00
Aleix Conchillo Flaqué
d87f3543c7 GeminiLiveLLMService: let the transcription timeout handler be scheduled 2026-01-30 22:41:10 -08:00
Aleix Conchillo Flaqué
fee633cb92 scripts(evals): disable kokoro for now 2026-01-30 21:23:42 -08:00
Aleix Conchillo Flaqué
607af91153 Merge pull request #3604 from pipecat-ai/mb/fix-ivr-navigator-aggregation
Fix IVRNavigator to push AggregatedTextFrame when switching to conver…
2026-01-30 21:22:20 -08:00
Mark Backman
e779233918 Fix IVRNavigator to push AggregatedTextFrame when switching to conversation mode 2026-01-30 21:07:49 -05:00
Aleix Conchillo Flaqué
604d5d0b14 examples: update 07zi and 07zj to use vad_analyzer from LLMUserAggregator 2026-01-30 16:14:02 -08:00
Mark Backman
342ae7af41 Merge pull request #3601 from pipecat-ai/mb/add-22-release-evals
Add 22 foundational to release evals
2026-01-30 15:31:54 -05:00
Mark Backman
c92ec1552e Add 22 foundational to release evals 2026-01-30 15:12:52 -05:00
Aleix Conchillo Flaqué
93160f1455 scripts(evals): remove vad_analyzer from transport 2026-01-30 12:08:12 -08:00
Aleix Conchillo Flaqué
e3158e1131 Merge pull request #3600 from pipecat-ai/aleix/llm-server-timeout-task-never-waited
LLMService: make sure function call timeout handler is started
2026-01-30 12:01:18 -08:00
Mark Backman
63a23246d5 Add UserTurnCompletionLLMServiceMixin (#3518)
* Added UserTurnCompletionLLMServiceMixin class

* Added 22-filter-incomplete-turns.py foundational example

* Removed old 22 natural conversation foundational examples

* Added test_user_turn_completion_mixin.py
2026-01-30 14:57:15 -05:00
Aleix Conchillo Flaqué
569ea9849a Merge pull request #3599 from pipecat-ai/aleix/release-evals-disable-rtvi
scripts(evals): disable RTVI
2026-01-30 11:44:46 -08:00
Aleix Conchillo Flaqué
a98ca9b65b LLMService: make sure function call timeout handler is started 2026-01-30 11:38:26 -08:00
Aleix Conchillo Flaqué
c9310789dc scripts(evals): use new vad_analyzer from LLMUserAggregator 2026-01-30 10:57:17 -08:00
Aleix Conchillo Flaqué
b93e12d701 scripts(evals): disable RTVI 2026-01-30 10:52:38 -08:00
Aleix Conchillo Flaqué
3f77da627d Merge pull request #3583 from pipecat-ai/aleix/move-vad-analyzer-to-llm-user-aggregator
VAD analyzer is now passed to LLMUserAggregator
2026-01-30 10:46:10 -08:00
Aleix Conchillo Flaqué
35d265770d LLMUserAggregator: don't process certain self-queued frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
9632efec8c VADProcessor: broadcast frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
27dbfa1eda NvidiaTTSService: return AsyncIterator instead of AsyncIterable 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
183c0aa4ef LLMUserAggregator: queue frames internally so strategies and controllers can process them 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
a69a037ffa changelog: add updates for #3583 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c46e7f5da0 TurnAnalyzerUserTurnStopStrategy: only update vad params if frame contains vad 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
307aeaeda0 examples: update with LLMUserAggregatorParams vad_analyzer and VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
305ab44132 tests: add unittest.main() call 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b486f35c70 audio: add new VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c92080b0d2 LLMUserAggregator: add vad_analyzer and use VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
ddfedaf478 audio(vad): add new VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b1ad4d5ab0 BaseInputTransport: deprecate vad_analyzer 2026-01-30 10:07:33 -08:00
Aleix Conchillo Flaqué
0857aa87be Merge pull request #3595 from pipecat-ai/aleix/add-kokoro-tts-support
services(tts): add new KokoroTTSService
2026-01-30 09:49:05 -08:00
Aleix Conchillo Flaqué
fd3c5f69b7 upgrade uv.lock 2026-01-30 09:41:33 -08:00
Aleix Conchillo Flaqué
72ab329513 services(tts): add new KokoroTTSService 2026-01-30 09:39:01 -08:00
Filipi da Silva Fuchter
7999d08b7e Merge pull request #3052 from Navigate-AI/fork/main
Include pts in video and audio frames in SmallWebRTCClient
2026-01-30 09:03:29 -05:00
dhruvladia-sarvam
57821cf709 fix 2026-01-30 16:07:52 +05:30
dhruvladia-sarvam
18045582a9 ASR and TTS v3 update 2026-01-30 15:53:06 +05:30
Mark Backman
7be2b8cc34 Merge pull request #3587 from pipecat-ai/mb/gradium-improvements
GradiumSTTService now flushes pending transcripts on VAD stopped dete…
2026-01-29 18:11:25 -05:00
Aleix Conchillo Flaqué
671cc8eb74 Merge pull request #3590 from pipecat-ai/aleix/custom-cli-runner-args
runner: allow custom CLI arguments
2026-01-29 13:53:27 -08:00
Aleix Conchillo Flaqué
b4dce656f0 Merge pull request #3594 from pipecat-ai/aleix/user-turn-controller-reset-timeout-on-interims
UserTurnController: reset user turn timeout with interim transcriptions
2026-01-29 13:12:44 -08:00
Aleix Conchillo Flaqué
253a1d1114 UserTurnController: reset user turn timeout with interim transcriptions 2026-01-29 13:10:10 -08:00
Aleix Conchillo Flaqué
ca613bcb79 Merge pull request #3592 from pipecat-ai/aleix/broadcast-frame-no-deepcopy
don't deep copy fields when broadcasting frames
2026-01-29 11:50:20 -08:00
Aleix Conchillo Flaqué
0423acd8a0 STTService: just clear buffer before running run_stt() 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
7eabaaa0ef FrameProcessors: do not deepcopy fields when broadcasting frames 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
bbb8b53d03 runner: allow custom CLI arguments 2026-01-29 10:15:53 -08:00
Aleix Conchillo Flaqué
f3b72e9263 Merge pull request #3585 from pipecat-ai/aleix/improve-piper-tts-support
improve Piper TTS support
2026-01-29 08:36:13 -08:00
Mark Backman
31c7fbc5ba Add delay_in_frames and language support 2026-01-29 10:59:04 -05:00
Mark Backman
6ab12626d6 GradiumSTTService now flushes pending transcripts on VAD stopped detection 2026-01-29 10:26:17 -05:00
Mark Backman
b77a50de73 Merge pull request #3529 from lukepayyapilli/fix/llm-timeout-without-retry
feat: handle exceptions for BaseOpenAILLMService
2026-01-29 09:12:54 -05:00
Luke Payyapilli
433c1b9b92 add catch-all exception handler per review feedback 2026-01-29 09:07:06 -05:00
Aleix Conchillo Flaqué
bd00587092 changelog: add files for 3585 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
5a85e27cc5 PiperHttpTTSService: allow passing a voice id 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
11daa43b1b TTSService: resample _stream_audio_frames_from_iterator() input audio if needed 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
875614ff7a tts: add support for local PiperTTSService 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
eb1bf1e446 tts: rename PiperTTSService to PiperHttpTTSService 2026-01-28 23:27:32 -08:00
mattie ruth backman
7456a0a55f Fix the /start and /offer/api proxy endpoints for smallWebRTC to match pipecat cloud behavior WRT requestData 2026-01-28 15:25:13 -05:00
Filipi da Silva Fuchter
27277ed3d9 Merge pull request #3571 from pipecat-ai/filipi/funcion_call_improvements
Function call improvements
2026-01-28 14:03:40 -05:00
filipi87
5543bc56f3 Add changelog files for PR #3571
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 15:43:59 -03:00
filipi87
c8496dfb8e Updated the examples which use UserImageRequestFrame to defer the function call result. 2026-01-28 15:39:21 -03:00
filipi87
d3f4cbb620 Providing a way to defer the function call results. 2026-01-28 15:39:06 -03:00
filipi87
c9f922c479 Removed an overridden method that was identical to the parent implementation. 2026-01-28 15:38:40 -03:00
Aleix Conchillo Flaqué
49bd3da26b Merge pull request #3582 from pipecat-ai/aleix/daily-sample-room-url
rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL
2026-01-28 10:38:14 -08:00
Aleix Conchillo Flaqué
f3ef488925 rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL 2026-01-28 10:05:27 -08:00
Aleix Conchillo Flaqué
4f08098917 Merge pull request #3580 from Pulkit0729/fix/livekit
fix: adding missing livekit transport configs
2026-01-28 10:04:34 -08:00
Pulkit
a7cd5b0322 fix: adding missing livekit transport configs 2026-01-28 23:15:03 +05:30
Aleix Conchillo Flaqué
55dadc9118 tests(genesys): fix formatting 2026-01-28 09:15:42 -08:00
Aleix Conchillo Flaqué
01bbf61e0d Merge pull request #3500 from ssillerom/feature/genesys_serializer
Feature/genesys serializer
2026-01-28 09:09:11 -08:00
ssillerom
10fb77c0e2 added changelog file 2026-01-28 18:07:33 +01:00
ssillerom
2612fae527 ruff linting 2026-01-28 18:02:51 +01:00
ssillerom
c5be67f293 fix: create disconnect message passing output vars 2026-01-28 17:56:21 +01:00
kompfner
312caaba86 Merge pull request #3429 from lukepayyapilli/fix/gemini-live-interrupted-signal
feat: handle server_content.interrupted for faster interruptions
2026-01-28 10:25:36 -05:00
Luke Payyapilli
ff0eb6d286 fix: emit ErrorFrame on LLM completion timeout 2026-01-28 09:44:32 -05:00
ssillerom
ef6bbace98 fixes: super init inherited class to set event handlers in the constructor 2026-01-28 15:40:24 +01:00
Filipi da Silva Fuchter
06ec21387f Merge pull request #3581 from pipecat-ai/filipi/open_ai_audio_duration
Fixed race condition in OpenAIRealtimeLLMService
2026-01-28 07:42:35 -05:00
filipi87
bdae177125 Adding changelog entry for the OpenAIRealtimeLLMService fix. 2026-01-28 08:39:11 -03:00
filipi87
468e159f9b Fixed race condition in OpenAIRealtimeLLMService that could cause an error when truncating the conversation. 2026-01-28 08:36:31 -03:00
ssillerom
a4acafd3be feature: added event handlers in constructor and call func in each _handle_* func 2026-01-28 10:54:26 +01:00
ssillerom
105824a372 Merge main into feature/genesys_serializer
Incorporates latest changes from main branch including:
- AIC filter and VAD updates
- STT service improvements
- Base serializer changes
- Various bug fixes
2026-01-28 10:48:56 +01:00
ssillerom
55e0d4ecc4 ruff fixes done 2026-01-28 08:59:28 +01:00
ssillerom
9102e81cb8 added tests to the PR 2026-01-27 23:39:43 +01:00
ssillerom
d7d8e93a3d feature: added custom params in closed message to genesys, simplified create_* functions, simplified constructor method and simplified opened message 2026-01-27 23:36:47 +01:00
Mark Backman
bf9b166464 Merge pull request #3575 from pipecat-ai/mb/fix-turn-stopped-event-end-cancel-frame
Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame…
2026-01-27 14:55:34 -05:00
Mark Backman
e80e0eab29 Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame or CancelFrame 2026-01-27 14:50:10 -05:00
Mark Backman
61242e6575 Merge pull request #3574 from pipecat-ai/mb/fix-websocket-close-message-handling
Fix WebsocketService infinite loop on graceful server disconnect
2026-01-27 13:53:26 -05:00
Aleix Conchillo Flaqué
8841387121 Merge pull request #3560 from pipecat-ai/aleix/serializer-base-objects
FrameSerializer: subclass from BaseObject so we can add events
2026-01-27 09:58:44 -08:00
Aleix Conchillo Flaqué
ee695ae9fe FrameSerializer: subclass from BaseObject so we can add events 2026-01-27 09:53:46 -08:00
Mark Backman
52012b0fb2 Fix WebsocketService infinite loop on graceful server disconnect 2026-01-27 12:41:28 -05:00
Mark Backman
f7a1c6b719 Merge pull request #3408 from ai-coustics/aic-v2
Add ai-coustics AIC SDK v2 support with model downloading
2026-01-27 10:38:26 -05:00
Gökmen Görgen
6aa77ccc13 group aic related changes in changelog. 2026-01-27 16:22:54 +01:00
Gökmen Görgen
45b7ec4e2c re-enable 07zd-interruptible-aicoustics.py in release evals. 2026-01-27 16:18:56 +01:00
Mark Backman
1c434c6ad5 Merge pull request #3562 from speechmatics/fix/smx-ttfs-finals
Support TTFS for Speechmatics STT
2026-01-27 08:35:34 -05:00
Mark Backman
4591affba9 Merge pull request #3568 from pipecat-ai/mb/changelog-3536 2026-01-27 07:14:41 -05:00
Sam Sykes
91346f5f37 Add support for self.request_finalize() for Pipecat-based VAD. 2026-01-27 10:44:35 +00:00
Filipi da Silva Fuchter
6a66ebe332 Merge pull request #3541 from pipecat-ai/filipi/audio_buffer
Refactoring AudioBufferProcessor to fix audio track synchronization.
2026-01-27 05:32:41 -05:00
Filipi da Silva Fuchter
c1d4180042 Merge pull request #3567 from pipecat-ai/filipi/openai_realtime_audio_duration
Fixed race condition in OpenAIRealtimeBetaLLMService
2026-01-27 05:30:33 -05:00
Gökmen Görgen
81a53c699c handle AIC processor init errors gracefully and ensure _aic_ready reflects readiness 2026-01-27 11:28:05 +01:00
Sam Sykes
60168f7f69 remove comment 2026-01-26 23:16:43 +00:00
Sam Sykes
23d7608e5f changelog update 2026-01-26 23:15:30 +00:00
Sam Sykes
99242c0a93 linting updates 2026-01-26 23:14:40 +00:00
Sam Sykes
3a71865cf4 removed old metrics 2026-01-26 23:11:25 +00:00
Mark Backman
ecf2e69f3f Merge pull request #3536 from surapuramakhil/main
LLMAssistantAggregator: preserve non-ASCII characters in JSON output
2026-01-26 16:42:05 -05:00
Mark Backman
febd52274d Add changelog fragment for PR 3536 2026-01-26 16:42:00 -05:00
Mark Backman
1542d922e7 Merge pull request #3546 from pipecat-ai/pk/changelog-fragment-for-pr-3406
Added a changelog fragment for PR 3406
2026-01-26 16:31:57 -05:00
Paul Kompfner
15d5d1159e Added a changelog fragment for PR 3406 2026-01-26 16:27:33 -05:00
Mark Backman
884630a6bd Merge pull request #3559 from pipecat-ai/aleix/transport-broadcast-fixes
transports: fix broadcast_frame_class reference
2026-01-26 16:25:31 -05:00
Mark Backman
1cf137c6a8 Merge pull request #3565 from pipecat-ai/markbackman-patch-1 2026-01-26 15:49:35 -05:00
filipi87
98fcfd7c91 Adding changelog entry for the OpenAIRealtimeBetaLLMService fix. 2026-01-26 17:19:08 -03:00
filipi87
2f23f2e39c Fixed race condition in OpenAIRealtimeBetaLLMService that could cause an error when truncating the conversation. 2026-01-26 17:08:27 -03:00
Mark Backman
9c6b11cecf Update README links to use absolute URLs 2026-01-26 13:03:39 -05:00
Sam Sykes
fc1444c9d6 Updated changelog 2026-01-26 16:25:37 +00:00
Sam Sykes
ea94939add update dependency 2026-01-26 16:24:56 +00:00
Sam Sykes
0c69ae6371 Changelog entry. 2026-01-26 16:07:59 +00:00
Sam Sykes
8b88280bb1 Default to using EXTERNAL mode. 2026-01-26 15:52:42 +00:00
Sam Sykes
960d0faea5 support is_eou for final segment in utterance 2026-01-26 15:48:04 +00:00
Luke Payyapilli
b9390ccb1b Address review: remove UserStartedSpeakingFrame, add explanatory comment 2026-01-26 10:08:17 -05:00
Mark Backman
061a0dc43d Merge pull request #3498 from pipecat-ai/mb/azure-tts-8khz-workaround
AzureTTSService 8khz workaround
2026-01-26 09:48:22 -05:00
Mark Backman
328bbe069f Merge pull request #3554 from pipecat-ai/mb/simplify-stt-ttfb
Simplify STT finalize handling
2026-01-26 08:00:04 -05:00
Mark Backman
dc32ecc872 Merge pull request #3555 from pipecat-ai/mb/speechmatics-stt-ttfb
Align Speechmatics STT TTFB metrics with STT classes
2026-01-26 07:59:34 -05:00
Gökmen Görgen
ca2eb1904f Merge remote-tracking branch 'origin/aic-v2' into aic-v2 2026-01-26 10:16:23 +01:00
Gökmen Görgen
4bce58f270 update changelog and remove outdated dependency notes 2026-01-26 10:15:15 +01:00
Gökmen Görgen
7572d63f8f Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:40 +01:00
Gökmen Görgen
3c463c9416 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:33 +01:00
Gökmen Görgen
bd618d64e3 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:16 +01:00
Gökmen Görgen
a824660df7 add unit tests for AICVADAnalyzer and AICFilter. 2026-01-26 09:56:36 +01:00
Gökmen Görgen
58b9019852 bump aic-sdk to 2.0.1 in optional dependencies. 2026-01-26 09:14:16 +01:00
Gökmen Görgen
afcdef8c81 docstring clarification. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bd92104fb3 clarify voice confidence method behavior in AIC VAD. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
34e9f224a8 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dca7f3b5b0 add changelog. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
70a85cd192 use path for keeping the consistency between the parameters. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
91e86658b7 force developer to set a license key, it's required. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0a8588669c address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e99400148 two dots are rust specific things, I'm not sure if it's familiar for Python developers. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
648f20db6d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
09b5b6b12d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e6a423955 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dc8972cd94 log optimal number of frames for given sample rate in AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
e4e2231958 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
18b3ee743b replace os with pathlib.Path in AICFilter for path handling consistency. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
65b8e0e89c rename enabled to bypass in AICFilter for clarity. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
b77f8b065f remove voice gain. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
5fd43faec3 add min speech duration. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
abebcf37bd address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
ca4e3c79f9 Update pyproject.toml
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
e8d1bec03b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
f0cc54589e remove enhancement level parameter from AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
22b9aac2ff use quail model in the example. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
7f86f4ac27 fix class name. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
dcab79753b even if the parameters are fixed, keep aic ready for processing. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bdded9b026 set SDK ID for telemetry in AIC filter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
1e1e275fea address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
effb6aa8f4 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a4a9bae79e drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
c943ef9261 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
f05809520b Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new AICFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
ec17dc6626 aic-sdk-py v2.
# Conflicts:
#	uv.lock

# Conflicts:
#	examples/foundational/07zd-interruptible-aicoustics.py
#	pyproject.toml
#	src/pipecat/audio/filters/aic_filter.py
#	src/pipecat/audio/vad/aic_vad.py
2026-01-26 08:44:17 +01:00
Gökmen Görgen
4e85e81d9b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a1cc88a233 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
61a230ec53 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Stephan Eckes <stephan@steck.tech>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a13380b574 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
2a927189d9 reorganize imports. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a90c15362c drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
d3bdd2d246 use new model id. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
465ae4f706 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a0d801b658 Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new AICFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
35919a84e3 aic-sdk-py v2.
# Conflicts:
#	uv.lock
2026-01-26 08:44:17 +01:00
Aleix Conchillo Flaqué
f94a60f381 transports: fix broadcast_frame_class reference 2026-01-25 15:42:09 -08:00
ssillerom
a446bca72d changes: added OutputTransportUrgentFrame to on closed, removed callback 2026-01-25 21:12:28 +01:00
Sergio Sillero
8ae834366b Merge branch 'pipecat-ai:main' into feature/genesys_serializer 2026-01-25 21:04:27 +01:00
Mark Backman
a4acc12f91 Align Speechmatics STT TTFB metrics with STT classes 2026-01-24 18:26:34 -05:00
Mark Backman
e93112e76e Simplify STT finalize handling 2026-01-24 15:28:27 -05:00
Mark Backman
680bcaac66 Merge pull request #3550 from pipecat-ai/mb/update-smart-turn-data-env-var
Update env var to PIPECAT_SMART_TURN_LOG_DATA
2026-01-24 13:52:36 -05:00
Mark Backman
d2ac9006a2 Update env var to PIPECAT_SMART_TURN_LOG_DATA 2026-01-24 12:50:42 -05:00
Mark Backman
bcb019e8ab Add TTFB metrics for STT services (#3495) 2026-01-23 18:47:34 -05:00
kompfner
4ea546785f Merge pull request #3406 from omChauhanDev/fix/openrouter-gemini-messages
fix(openrouter): handle multiple system messages for Gemini models
2026-01-23 14:53:59 -05:00
filipi87
f128cdd19a Adding a changelog entry to the AudioBufferProcessor fix. 2026-01-23 16:16:01 -03:00
filipi87
7921bce4af Refactoring AudioBufferProcessor to fix audio track synchronization. 2026-01-23 16:15:48 -03:00
Luke Payyapilli
cadced3f79 feat: handle server_content.interrupted for faster barge-in response 2026-01-23 10:41:04 -05:00
Aleix Conchillo Flaqué
8951442b8e Merge pull request #3534 from pipecat-ai/aleix/claude-skills-pr-description
claude: add pr-description skill
2026-01-22 17:34:46 -08:00
Aleix Conchillo Flaqué
7e6e3031e7 claude: add pr-description skill 2026-01-22 13:41:50 -08:00
Akhil
3b3c7aa8cc LLMAssistantAggregator: preserve non-ASCII characters in JSON output
Add ensure_ascii=False to json.dumps() calls for tool call arguments
and function call results to prevent unnecessary unicode escaping.
2026-01-22 15:37:44 -06:00
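The effect of `ensure_ascii=False` described in the commit above can be illustrated with a minimal standalone sketch. The `arguments` dictionary is a made-up stand-in for tool call arguments, not data from the Pipecat code base; only the standard-library `json.dumps` behavior is shown.

```python
import json

# Hypothetical tool call arguments containing non-ASCII text (illustration only).
arguments = {"city": "Zürich", "greeting": "こんにちは"}

# Default behavior: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(arguments)

# With ensure_ascii=False the characters are written out unchanged.
preserved = json.dumps(arguments, ensure_ascii=False)

print(escaped)
print(preserved)
```

Both strings decode to the same object; the second is simply shorter and easier to read in logs and LLM context.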
Aleix Conchillo Flaqué
308829f92b Merge pull request #3533 from pipecat-ai/aleix/claude-skills-docstring
claude: add docstring skill
2026-01-22 12:58:38 -08:00
Aleix Conchillo Flaqué
82a799e63e claude: add docstring skill 2026-01-22 12:53:38 -08:00
Cale Shapera
6b5bcae86f change default Inworld TTS model to inworld-tts-1.5-max (#3531) 2026-01-22 14:21:15 -05:00
Mark Backman
836073849c Merge pull request #3527 from weakcamel/patch-1
Update README.md - fix Google Imagen URL
2026-01-22 10:46:10 -05:00
Waldek Maleska
b13b65d6e2 Update README.md - fix Google Imagen URL 2026-01-22 15:17:41 +00:00
Mark Backman
3d545b718d Merge pull request #3344 from omChauhanDev/fix/stt-dynamic-language-update
fix: treat language as first-class STT setting
2026-01-22 09:21:56 -05:00
marcus-daily
f2fa5d9733 Updating changelog 2026-01-22 14:17:59 +00:00
marcus-daily
76b774072c Formatting fixes 2026-01-22 14:17:59 +00:00
marcus-daily
b6341ffaa5 Save Smart Turn input data if SMART_TURN_LOG_DATA is set 2026-01-22 14:17:59 +00:00
Mark Backman
29fae67c9e Merge pull request #3523 from omChauhanDev/add-location-support-google-tts
feat(google): add location parameter to TTS services
2026-01-22 09:12:16 -05:00
Mark Backman
718ea1c15e Merge pull request #3526 from pipecat-ai/mb/remove-logs
Remove application logs
2026-01-22 08:48:07 -05:00
Mark Backman
8e09d94614 Remove application logs 2026-01-22 08:28:52 -05:00
Aleix Conchillo Flaqué
de73e28563 Merge pull request #3510 from omChauhanDev/feat/add-reached-filter-methods
feat(task): add additive filter methods for frame monitoring
2026-01-21 21:05:33 -08:00
Aleix Conchillo Flaqué
55250b4f7e Merge pull request #3521 from pipecat-ai/aleix/claude-changelog-skill
claude: initial /changelog skill
2026-01-21 20:50:47 -08:00
Om Chauhan
281145a991 added changelog 2026-01-22 09:55:57 +05:30
Om Chauhan
7bd32e2fe5 feat(google): add location parameter to TTS services 2026-01-22 09:49:19 +05:30
James Hush
8f05d95f50 feat: add video_out_codec parameter for DailyTransport (#3520)
* feat: add video_out_codec parameter for DailyTransport

Add video_out_codec parameter to TransportParams allowing configuration
of the preferred video codec (VP8, H264, H265) for video output.

When set, this passes the preferredCodec option to Daily's
VideoPublishingSettings during the join operation.

* chore: move video_out_codec parameter to changelog folder (#3522)

* Initial plan

* Move video_out_codec parameter to changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

* Revert all CHANGELOG.md changes, keep only changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-01-22 11:31:07 +08:00
Om Chauhan
87c12f3098 changed frame filter storage type from tuples to sets 2026-01-22 08:43:46 +05:30
Om Chauhan
9c0bf89247 added changelog 2026-01-22 08:43:46 +05:30
Om Chauhan
6e44a2ab49 feat(task): add additive filter methods for frame monitoring 2026-01-22 08:43:46 +05:30
Aleix Conchillo Flaqué
7aa7b86aed claude: initial /changelog skill 2026-01-21 18:43:04 -08:00
Aleix Conchillo Flaqué
5ad9faeb4c Merge pull request #3519 from pipecat-ai/aleix/embedded-rtvi-processor
automatically add RTVI to the pipeline
2026-01-21 18:17:26 -08:00
Aleix Conchillo Flaqué
9e8f8b45c6 added changelog files for #3519 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
0ee11ad333 tests: disable RTVI in tests by default 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
124a3c35af RTVIObserver: don't handle some frames direction 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
054e504868 examples(foundational): remove RTVI (automatically added by PipelineTask) 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
e85a00cc0e PipelineTask: automatically add RTVI processor and RTVI observer
If `enable_rtvi` is enabled (enabled by default), an RTVI processor will be
added automatically to the pipeline. Also, an RTVI observer will be
registered.
2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
cc61cdbba3 RTVIProcessor: add create_rtvi_observer() 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
62f4708d43 transports: broadcast InputTransportMessageFrame frames 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
ba0ddb1832 FrameProcessor: copy kwargs when broadcasting frame 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
eacd2a4b71 FrameProcessor: add broadcast_frame_instance() 2026-01-21 18:14:17 -08:00
Mark Backman
7ed110650d Merge pull request #3516 from okue/minorpatch1
refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
2026-01-21 10:33:59 -05:00
okue
4a724379fc refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
The _bot_speaking flag does not need to be set in this method,
so the redundant assignment has been removed.
2026-01-21 23:59:15 +09:00
Aleix Conchillo Flaqué
768d3958dd Merge pull request #3512 from pipecat-ai/changelog-0.0.100
Release 0.0.100 - Changelog Update
2026-01-20 19:32:56 -08:00
aconchillo
5f9ff8bd58 Update changelog for version 0.0.100 2026-01-20 19:21:19 -08:00
Aleix Conchillo Flaqué
59ed422052 Merge pull request #3511 from pipecat-ai/aleix/camb-tts-client-on-start
CambTTSService: initialize client during StartFrame
2026-01-20 19:17:45 -08:00
Aleix Conchillo Flaqué
7e0ca113af CambTTSService: initialize client during StartFrame 2026-01-20 19:07:12 -08:00
Aleix Conchillo Flaqué
13c52e0e6d Merge pull request #3509 from pipecat-ai/aleix/nvidia-stt-tts-improvements
NVIDIA STT/TTS performance improvements
2026-01-20 16:39:12 -08:00
Aleix Conchillo Flaqué
a787fd9cd8 NVIDIATTSService: process incoming audio frame right away
Process audio as soon as we receive it from the generator. Previously, we were
reading from the generator and adding elements into a queue until there was no
more data, then we would process the queue.
2026-01-20 15:41:05 -08:00
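The difference between the two approaches described in the commit above can be sketched roughly as follows. This is not the NVIDIATTSService code: `audio_chunks`, `queue_then_process`, and `process_immediately` are hypothetical names, and the generator is a stand-in for whatever the real client returns.

```python
from typing import Iterator, List, Tuple

Event = Tuple[str, bytes]


def audio_chunks() -> Iterator[bytes]:
    """Hypothetical stand-in for the audio generator returned by a TTS client."""
    for i in range(3):
        yield f"chunk-{i}".encode()


def queue_then_process() -> List[Event]:
    """Previous approach: drain the generator into a queue, then process the queue."""
    events: List[Event] = []
    queue: List[bytes] = []
    for chunk in audio_chunks():
        events.append(("received", chunk))
        queue.append(chunk)
    for chunk in queue:
        events.append(("processed", chunk))
    return events


def process_immediately() -> List[Event]:
    """New approach: process each chunk as soon as it is received."""
    events: List[Event] = []
    for chunk in audio_chunks():
        events.append(("received", chunk))
        events.append(("processed", chunk))
    return events


def first_processed_index(events: List[Event]) -> int:
    """Position of the first 'processed' event in the event log."""
    return next(i for i, (kind, _) in enumerate(events) if kind == "processed")


old_events = queue_then_process()
new_events = process_immediately()
print(first_processed_index(old_events), first_processed_index(new_events))
```

In the queued variant the first chunk is only processed after every chunk has arrived, which is the latency the commit removes.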
Aleix Conchillo Flaqué
14495c425a NVIDIASTTService: no need for additional queue and task 2026-01-20 13:50:17 -08:00
Aleix Conchillo Flaqué
461bd0a2e0 update changelog for #3494 and #3499 2026-01-20 13:26:40 -08:00
Aleix Conchillo Flaqué
bd45ce2b4e Merge pull request #3499 from lukepayyapilli/fix/livekit-video-queue-memory-leak
fix(livekit): prevent memory leak when video_in_enabled is False
2026-01-20 13:21:21 -08:00
Aleix Conchillo Flaqué
a266644b06 Merge pull request #3494 from omChauhanDev/fix/uninterruptible-frame-handling
fix: preserve UninterruptibleFrames in __reset_process_queue
2026-01-20 13:19:40 -08:00
Mark Backman
03faadd7f9 Merge pull request #3508 from pipecat-ai/ss/log-daily-ids
Log Daily participant and meeting session IDs upon successful join in…
2026-01-20 15:43:48 -05:00
Aleix Conchillo Flaqué
bf43032652 Merge pull request #3504 from pipecat-ai/aleix/nvidia-stt-tts-error-handling
NVIDIA STT/TTS error handling
2026-01-20 09:41:08 -08:00
Sunah Suh
fa6f924b31 Log Daily participant and meeting session IDs upon successful join in Daily Transport 2026-01-20 11:31:17 -06:00
Aleix Conchillo Flaqué
a010a020fd add changelog for 3504 2026-01-20 09:03:30 -08:00
Aleix Conchillo Flaqué
655006aff5 NvidiaSegmentedSTTService: simplify exception handling 2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
671dc8cd9b NvidiaSTTService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
9a718ded1e NvidiaTTSService: initialize client on StartFrame
Initialize client on StartFrame so errors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
024809b39a Merge pull request #3503 from pipecat-ai/aleix/ai-service-start-end-cancel
AIService: handle StartFrame/EndFrame/CancelFrame exceptions
2026-01-20 08:56:39 -08:00
Aleix Conchillo Flaqué
6cf0d53d00 AIService: handle StartFrame/EndFrame/CancelFrame exceptions
If AIService subclasses implement start()/stop()/cancel() and exceptions are not
handled, execution will not continue and therefore the originator frames will
not be pushed. This would cause the pipeline to not be started (i.e. StartFrame
would not be pushed downstream) or stopped properly.
2026-01-20 08:54:22 -08:00
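The failure mode described in the commit above, and the general shape of the fix, can be sketched as follows. `SketchService` is a hypothetical class written for illustration; its methods and the string frames are assumptions and do not mirror the real AIService API.

```python
import asyncio
from typing import List


class SketchService:
    """Hypothetical service whose start() hook fails (illustration only)."""

    def __init__(self) -> None:
        self.pushed: List[str] = []
        self.errors: List[str] = []

    async def start(self, frame: str) -> None:
        # Simulates a subclass hook that raises, e.g. a client failing to initialize.
        raise RuntimeError("client failed to initialize")

    async def push_frame(self, frame: str) -> None:
        self.pushed.append(frame)

    async def push_error(self, message: str) -> None:
        self.errors.append(message)

    async def process_frame(self, frame: str) -> None:
        if frame == "StartFrame":
            try:
                await self.start(frame)
            except Exception as e:
                # Report the error instead of letting it abort frame processing.
                await self.push_error(f"start failed: {e}")
        # The originating frame is still pushed, so the pipeline keeps going.
        await self.push_frame(frame)


service = SketchService()
asyncio.run(service.process_frame("StartFrame"))
print(service.pushed, service.errors)
```

Without the `try`/`except`, the exception would propagate out of `process_frame` and the `StartFrame` would never reach `push_frame`.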
kompfner
778dacc9a8 Merge pull request #3486 from pipecat-ai/pk/fix-nova-sonic-reset-conversation
Fix `AWSNovaSonicLLMService.reset_conversation()`
2026-01-20 10:07:38 -05:00
Paul Kompfner
06b3ecd2d6 In AWS Nova Sonic service, send the "interactive" user message (which triggers the bot response) only after sending the audio input start event, per the AWS team's recommendation 2026-01-20 09:56:25 -05:00
Paul Kompfner
b4d143e39b Add CHANGELOG for fixing AWSNovaSonicLLMService.reset_conversation() 2026-01-20 09:56:25 -05:00
Paul Kompfner
c89083e72e Improve 20e example to ask the bot to give a recap when loading a previous conversation from disk 2026-01-20 09:56:25 -05:00
Luke Payyapilli
1ac811ab32 chore: revert unrelated uv.lock changes 2026-01-20 09:19:43 -05:00
Luke Payyapilli
f6359d460e chore: install livekit as optional extra in CI instead of dev dep 2026-01-20 09:16:16 -05:00
Aleix Conchillo Flaqué
f03a7175c7 Merge pull request #3501 from pipecat-ai/aleix/improve-eval-numerical-word-prompt
scripts(eval): give examples to numerical word answers
2026-01-19 20:22:06 -08:00
Aleix Conchillo Flaqué
aed44c863a scripts(eval): give examples to numerical word answers
Some models need extra help.
2026-01-19 14:37:00 -08:00
ssillerom
fa5da3b0be change comments 2026-01-19 20:49:23 +01:00
ssillerom
7e82a0cf49 feature: Genesys AudioHook WebSocket protocol serializer for Pipecat 2026-01-19 20:45:22 +01:00
Mark Backman
cddd6d5b0a Merge pull request #3492 from pipecat-ai/mb/remove-unused-imports
Remove unused imports
2026-01-19 14:07:16 -05:00
Mark Backman
11cf891ac8 Manual updates for unused imports 2026-01-19 14:03:22 -05:00
Luke Payyapilli
c89ae717fe style: fix ruff formatting 2026-01-19 11:13:41 -05:00
Luke Payyapilli
562bdd3084 test: add livekit to dev deps and improve test clarity 2026-01-19 11:11:54 -05:00
Mark Backman
cc4c3650e1 Merge pull request #3491 from pipecat-ai/mb/update-release-evals
Add Camb TTS to release evals
2026-01-19 11:04:05 -05:00
Luke Payyapilli
dfc1f09b77 fix(livekit): prevent memory leak when video_in_enabled is False 2026-01-19 11:00:23 -05:00
Mark Backman
0b1a4792b8 Bump to latest azure-cognitiveservices-speech version, 1.47.0 2026-01-19 09:52:28 -05:00
Mark Backman
14bd3b1b32 Set Azure TTS default prosody rate to None 2026-01-19 09:19:57 -05:00
Mark Backman
f733e77496 AzureTTS: work around word ordering issue at 8khz sample rate 2026-01-19 09:13:41 -05:00
Filipi da Silva Fuchter
5fc46cc450 Merge pull request #3493 from omChauhanDev/fix/globally-unique-pc-id
fix: make SmallWebRTCConnection pc_id globally unique
2026-01-19 09:04:48 -05:00
Om Chauhan
4a9eb82f92 fix: preserve UninterruptibleFrames in __reset_process_queue 2026-01-18 20:39:13 +05:30
Om Chauhan
990d8386e4 fix: make SmallWebRTCConnection pc_id globally unique 2026-01-18 19:41:51 +05:30
Mark Backman
ce7d823770 Remove unused imports 2026-01-18 08:22:22 -05:00
Mark Backman
0b93c3f900 Add Camb TTS to release evals 2026-01-17 16:27:16 -05:00
Mark Backman
829c5f4604 Merge pull request #3169 from Incanta/hathora
Add Hathora STT and TTS services
2026-01-17 16:25:12 -05:00
Mike Seese
dc8ea615d9 add hathora to run-release-evals.py 2026-01-17 10:33:58 -08:00
Mike Seese
a3d206050d move hathora example as requested 2026-01-17 10:31:08 -08:00
Mike Seese
f48a567873 run the linter 2026-01-17 10:30:47 -08:00
Mark Backman
e69ccd8ea7 Merge pull request #3490 from pipecat-ai/mb/on-user-mute-events
Add on_user_mute_started and on_user_mute_stopped events
2026-01-17 11:05:15 -05:00
Mark Backman
11924bb980 Add on_user_mute_started and on_user_mute_stopped events 2026-01-17 11:01:46 -05:00
Mark Backman
af89154e96 Merge pull request #3489 from pipecat-ai/mb/fix-azure-tts-punctuation-spacing
fix: AzureTTSService punctuation spacing
2026-01-17 11:00:30 -05:00
Mark Backman
1485ea0831 Merge pull request #3488 from pipecat-ai/mb/on-user-turn-idle
Update on_user_idle to on_user_turn_idle
2026-01-17 11:00:16 -05:00
Mark Backman
e22bc777d8 Fix spacing for CJK languages 2026-01-17 09:04:50 -05:00
Mark Backman
043403fe23 fix: AzureTTSService punctuation spacing 2026-01-17 08:18:31 -05:00
Mark Backman
1e1160906e Update on_user_idle to on_user_turn_idle 2026-01-17 07:04:27 -05:00
Aleix Conchillo Flaqué
f7d3e63063 Merge pull request #3474 from pipecat-ai/fix/optional-member-access-function-call-cancel
Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
2026-01-16 22:06:45 -08:00
Paul Kompfner
6fa797c8e4 Fix AWS Nova Sonic reset_conversation(), which would previously error out.
Issues:
- After disconnecting, we were prematurely sending audio messages using the new prompt and content names, before the new prompt and content were created
- We weren't properly sending system instruction and conversation history messages to Nova Sonic with `"interactive": false`
2026-01-16 22:31:54 -05:00
Mark Backman
473d39791b Merge pull request #3482 from pipecat-ai/mb/user-idle-in-user-aggregator
Add UserIdleController, deprecate UserIdleProcessor
2026-01-16 18:47:10 -05:00
Aleix Conchillo Flaqué
2114abb8c6 add changelog file for 3484 2026-01-16 15:46:29 -08:00
Aleix Conchillo Flaqué
4fb4c26f55 Merge pull request #3484 from amichyrpi/main
Remove async_mode parameter from Mem0 storage
2026-01-16 15:44:52 -08:00
Mark Backman
2e8e574ea5 Add UserIdleController, deprecate UserIdleProcessor 2026-01-16 18:44:19 -05:00
Aleix Conchillo Flaqué
84c7e97be2 Merge pull request #3483 from pipecat-ai/aleix/throttle-user-speaking-frame
throttle user speaking frame
2026-01-16 15:29:37 -08:00
Amory Hen
a6e7c99d55 Remove async_mode parameter from Mem0 storage 2026-01-17 00:26:38 +01:00
Aleix Conchillo Flaqué
ac3fa7f91f BaseOutputTransport: minor cleanup 2026-01-16 15:15:49 -08:00
Aleix Conchillo Flaqué
6eadad53b2 BaseInputTransport: throttle UserSpeakingFrame 2026-01-16 15:15:49 -08:00
kompfner
b11150f31f Merge pull request #3480 from pipecat-ai/pk/fix-grok-realtime-smallwebrtc
Fix an issue where Grok Realtime would error out when running with Sm…
2026-01-16 15:46:27 -05:00
Paul Kompfner
836cf60611 Fix an issue where Grok Realtime would error out when running with SmallWebRTC transport.
The underlying issue was related to the fact that we were sending audio to Grok before we had configured the Grok session with our default input sample rate (16000), so Grok was interpreting those initial audio chunks as having its default sample rate (24000). We didn't see this issue when using the Daily transport simply because in our test environments Daily took a smidge longer than a reflexive (localhost) pure WebRTC connection, so we would only send audio to Grok *after* we had configured the Grok session with the desired sample rate.
2026-01-16 15:41:33 -05:00
James Hush
1c13ad95a5 Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
Extract dictionary value to local variable and check for None before
accessing cancel_on_interruption attribute, since the dictionary values
are typed as Optional[FunctionCallInProgressFrame].
2026-01-16 15:04:26 -05:00
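The pattern described in the commit above (read the dictionary value into a local variable, then check it for `None` before touching an attribute) looks roughly like this. `InProgress` and `should_cancel` are hypothetical names; `InProgress` is only a stand-in for `FunctionCallInProgressFrame`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InProgress:
    """Hypothetical stand-in for FunctionCallInProgressFrame."""

    cancel_on_interruption: bool


def should_cancel(calls: Dict[str, Optional[InProgress]], call_id: str) -> bool:
    # Extract the value once so the type checker can narrow Optional[InProgress].
    frame = calls.get(call_id)
    if frame is None:
        return False
    # Here `frame` is known to be an InProgress, so attribute access is safe.
    return frame.cancel_on_interruption


calls: Dict[str, Optional[InProgress]] = {
    "a": InProgress(cancel_on_interruption=True),
    "b": None,
}
print(should_cancel(calls, "a"), should_cancel(calls, "b"), should_cancel(calls, "missing"))
```

Checking `calls[call_id] is not None` and then reading `calls[call_id].cancel_on_interruption` would not satisfy a type checker, because narrowing is not applied to repeated subscript lookups with a non-literal key.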
Mark Backman
1e8516e91d Merge pull request #3476 from pipecat-ai/mb/project-urls
Update project.urls for PyPI
2026-01-16 14:57:39 -05:00
Mark Backman
32c775311d Merge pull request #3471 from pipecat-ai/mb/fix-pydantic-2.12-docs
Revert pydantic 2.12 extra type annotation
2026-01-16 14:57:24 -05:00
Mark Backman
28d0bb98de Merge pull request #3472 from pipecat-ai/mb/whisker-dev
Add whisker_setup.py setup file to .gitignore
2026-01-16 14:55:48 -05:00
Aleix Conchillo Flaqué
a9a9f3aeaa Merge pull request #3462 from pipecat-ai/aleix/fix-min-words-transcription-aggregation
MinWordsUserTurnStartStrategy: don't aggregate transcriptions
2026-01-16 11:18:23 -08:00
Aleix Conchillo Flaqué
c2a0735975 MinWordsUserTurnStartStrategy: don't aggregate transcriptions
If we aggregate transcriptions we will get incorrect interruptions. For example,
if we have a strategy with min_words=3 and we say "One" and pause, then "Two"
and pause and then "Three", this would trigger the start of the turn when it
shouldn't. We should only look at the incoming transcription text and not
aggregate it with the previous one.
2026-01-16 11:16:06 -08:00
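The "One" / "Two" / "Three" example from the commit above can be reproduced with a small sketch. Both functions are hypothetical illustrations of the two counting policies and are not the MinWordsUserTurnStartStrategy implementation.

```python
from typing import List


def starts_turn_aggregated(transcriptions: List[str], min_words: int) -> bool:
    """Old behavior (sketch): count words across all transcriptions seen so far."""
    total = 0
    for text in transcriptions:
        total += len(text.split())
        if total >= min_words:
            return True
    return False


def starts_turn_per_transcription(transcriptions: List[str], min_words: int) -> bool:
    """New behavior (sketch): each incoming transcription is judged on its own."""
    return any(len(text.split()) >= min_words for text in transcriptions)


paused_speech = ["One", "Two", "Three"]   # three separate utterances with pauses
single_utterance = ["One two three"]      # one utterance with three words

print(starts_turn_aggregated(paused_speech, 3))         # aggregated count reaches 3
print(starts_turn_per_transcription(paused_speech, 3))  # no single utterance has 3 words
print(starts_turn_per_transcription(single_utterance, 3))
```

With `min_words=3`, only the aggregated variant treats the three paused single-word utterances as the start of a turn, which is the incorrect interruption the commit describes.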
Aleix Conchillo Flaqué
41cb53f6c2 Merge pull request #3479 from pipecat-ai/aleix/turns-mute-to-user-mute
turns: move mute to user_mute
2026-01-16 11:11:50 -08:00
Aleix Conchillo Flaqué
58552af8fd examples(foundational): remove STTMuteFilter example 2026-01-16 11:07:20 -08:00
Aleix Conchillo Flaqué
c7ab87b0cc turns: move mute to user_mute 2026-01-16 11:07:20 -08:00
Mark Backman
11ecc5fdee Update project.urls for PyPI 2026-01-16 12:48:13 -05:00
kompfner
19fb3eed9f Merge pull request #3466 from pipecat-ai/pk/fix-aws-nova-sonic-rtvi-bot-output
Fix realtime (speech-to-speech) services' RTVI event compatibility
2026-01-16 09:56:13 -05:00
Mark Backman
b292b32374 Merge pull request #3461 from glennpow/glenn/websocket-headers
Allow WebsocketClientTransport to send custom headers
2026-01-15 20:26:36 -05:00
Mark Backman
63d1393bb0 Add whisker_setup.py to .gitignore 2026-01-15 20:21:25 -05:00
Glenn Powell
37914cb062 Removed import and added changelog entry. 2026-01-15 16:47:15 -08:00
Mark Backman
ec40696854 Revert pydantic 2.12 extra type annotation 2026-01-15 19:16:15 -05:00
Mike Seese
2249f3d673 add requested changes from code review 2026-01-15 15:27:56 -08:00
Mike Seese
d2df324f29 fix some bugs after testing changes 2026-01-15 15:27:56 -08:00
Mike Seese
67fdb0b659 use parent _settings dict instead of self._params pattern 2026-01-15 15:27:56 -08:00
Mike Seese
e77bdf66f9 add can_generate_metrics functions 2026-01-15 15:27:56 -08:00
Mike Seese
1b3b67779c switch hathora services to use InputParams pattern 2026-01-15 15:27:55 -08:00
Mike Seese
6c7e386391 remove traced_stt from run_stt 2026-01-15 15:27:55 -08:00
Mike Seese
ba25b279d6 fix issues with PR suggestions 2026-01-15 15:27:55 -08:00
Mike Seese
e7c83c19b6 port turn_start_strategies to the newer user_turn_strategies 2026-01-15 15:27:55 -08:00
Mike Seese
7be7fb49a3 remove turn_analyzer args from transport params 2026-01-15 15:27:54 -08:00
Mike Seese
bcccb4cbb3 put fallback sample_rate value in function arg 2026-01-15 15:27:54 -08:00
Mike Seese
e9f1d951d3 Apply suggestions from code review
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-01-15 15:27:54 -08:00
Mike Seese
e5632a9339 transition Hathora service to use the unified API and apply PR feedback
add Hathora to root files

Hathora run linter

added hathora changelog
2026-01-15 15:27:53 -08:00
Mike Seese
1510fb4fc0 add Hathora STT and TTS services 2026-01-15 15:26:52 -08:00
Mark Backman
64a1ad2649 Merge pull request #3470 from pipecat-ai/mb/fix-docs-0.0.99
Docs fixes after 0.0.99
2026-01-15 17:34:44 -05:00
Mark Backman
4458ca1d24 Mock FastAPI 2026-01-15 17:29:47 -05:00
Mark Backman
21aaa48e62 Fix pydantic issues impacting autodoc 2026-01-15 17:29:47 -05:00
Mark Backman
e75c241030 Merge pull request #3468 from pipecat-ai/mb/camb-cleanuo
Clean up CambTTSService
2026-01-15 17:16:28 -05:00
Mark Backman
60216048a8 Docs fixes after 0.0.99 2026-01-15 16:40:42 -05:00
Mark Backman
f3c2e29fb4 Clean up CambTTSService 2026-01-15 15:59:17 -05:00
Paul Kompfner
ce99924be4 Add CHANGELOG entry describing fix for the missing "bot-llm-text" RTVI event when using realtime (speech-to-speech) services 2026-01-15 15:55:39 -05:00
Paul Kompfner
5de80a60d4 Fix "bot-llm-text" not firing when using Grok Realtime 2026-01-15 15:30:00 -05:00
Paul Kompfner
5753762350 Fix "bot-llm-text" not firing when using OpenAI Realtime 2026-01-15 15:16:08 -05:00
Paul Kompfner
885b318b04 Fix "bot-llm-text" not firing when using Gemini Live 2026-01-15 15:03:45 -05:00
Paul Kompfner
7a22d58cf4 Fix "bot-llm-text" not firing when using AWS Nova Sonic 2026-01-15 14:56:50 -05:00
Mark Backman
c8e4b462c9 Merge pull request #3460 from pipecat-ai/mb/reorder-07-examples
Renumber the 07 foundational examples
2026-01-15 14:44:21 -05:00
Mark Backman
30a3f42255 Merge pull request #3349 from eRuaro/feat/camb-tts-integration
Add Camb.ai TTS integration with MARS models
2026-01-15 14:43:12 -05:00
Neil Ruaro
26ddb2de2f minimal uv.lock update for camb-sdk 2026-01-16 03:18:01 +08:00
Neil Ruaro
f60eeaa212 reverted uv.lock, updated readthedocs.yaml, copyright year updates 2026-01-16 02:50:18 +08:00
Neil Ruaro
8cf72b36cb manually add camb-sdk to uv.lock, exclude camb from docs build 2026-01-16 02:26:38 +08:00
Neil Ruaro
38c3bcef96 exclude camb from docs build 2026-01-16 02:20:26 +08:00
Neil Ruaro
80604ba7b6 remove _update_settings method 2026-01-16 02:00:48 +08:00
Neil Ruaro
256c70c631 use UserTurnStrategies 2026-01-16 01:32:08 +08:00
Glenn Powell
0e3532c529 Allow WebsocketClientTransport to send custom headers 2026-01-15 09:31:48 -08:00
Neil Ruaro
9942fcfeb2 updated per PR reviews 2026-01-16 01:20:17 +08:00
Neil Ruaro
003c24ca6e Make model parameter explicit in docstring example 2026-01-16 01:18:37 +08:00
Neil Ruaro
ed120d014d Add model-specific sample rates, transport example, and fix audio buffer alignment 2026-01-16 01:18:37 +08:00
Neil Ruaro
e76a3d04f0 Update Camb TTS to 48kHz sample rate 2026-01-16 01:18:37 +08:00
Neil Ruaro
641d17007f Clean up Camb TTS service and tests 2026-01-16 01:18:37 +08:00
Neil Ruaro
9293b5f24a Migrate Camb TTS service from raw HTTP to official SDK
- Replace aiohttp with camb SDK (AsyncCambAI client)
- Add support for passing existing SDK client instance
- Simplify API: no longer requires aiohttp_session parameter
- Update example to use simplified initialization
- Rewrite tests to mock SDK client instead of HTTP servers
2026-01-16 01:18:37 +08:00
Neil Ruaro
c1f3cbd1d4 Yield TTSAudioRawFrame directly instead of calling private method 2026-01-16 01:18:37 +08:00
Neil Ruaro
78fa2ab65e Update default voice ID, fix MARS naming, and clean up example 2026-01-16 01:18:37 +08:00
Neil Ruaro
56da2caeed Update Camb.ai TTS inference options 2026-01-16 01:18:37 +08:00
Neil Ruaro
a541d65255 Update MARS model names to mars-flash, mars-pro, mars-instruct
Rename model identifiers from mars-8-* to the new naming convention:
- mars-8-flash -> mars-flash (default)
- mars-8 -> removed
- mars-8-instruct -> mars-instruct
- Added mars-pro
2026-01-16 01:18:37 +08:00
Neil Ruaro
a3d7e9eafe Address PR feedback: add --voice-id arg, remove test script
- Add --voice-id CLI argument to example (default: 2681)
- Remove test_camb_quick.py from examples/ (tests belong in tests/)
- Update docstring with new usage
2026-01-16 01:18:36 +08:00
Neil Ruaro
54933bea2a Rename changelog to PR number 2026-01-16 01:18:36 +08:00
Neil Ruaro
fcab9899cc Add changelog entry for Camb.ai TTS integration 2026-01-16 01:18:36 +08:00
Neil Ruaro
be098e85db Remove non-working Daily/WebRTC example
The Daily transport example had authentication issues. Keeping the
local audio example (07zb-interruptible-camb-local.py) which works.
2026-01-16 01:18:36 +08:00
Neil Ruaro
ed0ff46a87 added local test 2026-01-16 01:18:36 +08:00
Neil Ruaro
7ae0d651d6 added cambai tts integration 2026-01-16 01:18:36 +08:00
Mark Backman
efd4432cfb Renumber the 07 foundational examples 2026-01-15 10:26:17 -05:00
kompfner
24082b84f2 Merge pull request #3453 from pipecat-ai/pk/consistency-pass-on-user-started-stopped-speaking-frames
Do a consistency pass on how we're sending `UserStartedSpeakingFrame`…
2026-01-15 09:24:14 -05:00
Aleix Conchillo Flaqué
dcd5840341 Merge pull request #3455 from pipecat-ai/aleix/reset-user-turn-start-strategies
UserTurnController: reset user turn start strategies when turn triggered
2026-01-14 19:28:32 -08:00
Aleix Conchillo Flaqué
9e705ce768 UserTurnController: reset user turn start strategies when turn triggered 2026-01-14 18:20:29 -08:00
Mark Backman
965466cc09 Merge pull request #3454 from pipecat-ai/mb/external-turn-strategies-timeout
fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrat…
2026-01-14 20:15:31 -05:00
Mark Backman
f3993f1775 fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrategies 2026-01-14 20:10:56 -05:00
Paul Kompfner
e107902b14 Do a consistency pass on how we're sending UserStartedSpeakingFrames and UserStoppedSpeakingFrames. The codebase is now consistent in broadcasting both types of frames up and downstream. 2026-01-14 18:47:15 -05:00
kompfner
e7b5ff49f4 Merge pull request #3447 from pipecat-ai/pk/add-pr-3420-to-changelog
Add PR 3420 to CHANGELOG (it was missing)
2026-01-14 15:33:44 -05:00
Paul Kompfner
e33172c44e Add PR 3420 to CHANGELOG (it was missing) 2026-01-14 15:33:07 -05:00
Mark Backman
3d858e8aa6 Merge pull request #3444 from pipecat-ai/mb/update-quickstart-0.0.99
Update quickstart example for 0.0.99
2026-01-14 10:29:55 -05:00
Mark Backman
eab059c49a Merge pull request #3446 from pipecat-ai/mb/add-3392-changelog
Add PR 3392 to changelog, linting cleanup
2026-01-14 10:28:57 -05:00
Mark Backman
4aaff04fb3 Add PR 3392 to changelog, linting cleanup 2026-01-14 09:43:17 -05:00
Mark Backman
cb364f3cab Update quickstart example for 0.0.99 2026-01-14 08:59:20 -05:00
Mark Backman
a9bfb090c3 Merge pull request #3287 from ashotbagh/feature/asyncai-multicontext-wss
Fix TTFB metric and add multi-context WebSocket support for Async TTS
2026-01-14 07:52:52 -05:00
Ashot
c4ae4025f3 Adjustments of Async TTS for multicontext websocket support 2026-01-14 16:33:30 +04:00
Ashot
15067c678d adapt Async TTS to updated AudioContextTTSService 2026-01-14 15:45:27 +04:00
Ashot
5ae592f38e Improve Async TTS interruption handling by using AudioContextTTSService class and add changelog fragments 2026-01-14 15:45:27 +04:00
Ashot
9cdbc56be3 Fix TTFB metric and add multi-context WebSocket support for Async TTS 2026-01-14 15:45:27 +04:00
Aleix Conchillo Flaqué
86ed485711 Merge pull request #3440 from pipecat-ai/changelog-0.0.99
Release 0.0.99 - Changelog Update
2026-01-13 17:02:41 -08:00
Aleix Conchillo Flaqué
7e1b4a4e90 update cosmetic changelog updates for 0.0.99 2026-01-13 16:59:46 -08:00
aconchillo
4531d517da Update changelog for version 0.0.99 2026-01-14 00:49:15 +00:00
Aleix Conchillo Flaqué
6fd5847f84 Merge pull request #3439 from pipecat-ai/aleix/uv-lock-2026-01-13
uv.lock: upgrade to latest versions
2026-01-13 16:48:07 -08:00
Aleix Conchillo Flaqué
2015eba9b2 uv.lock: upgrade to latest versions 2026-01-13 16:45:44 -08:00
Mark Backman
84f16ee895 Merge pull request #3438 from pipecat-ai/mb/fix-26a
Fix 26a foundational
2026-01-13 19:43:50 -05:00
Aleix Conchillo Flaqué
5b2af03b16 Merge pull request #3437 from pipecat-ai/aleix/update-aggregator-logs
LLMContextAggregatorPair: make strategy logs less verbose
2026-01-13 16:39:29 -08:00
Mark Backman
b313395dc3 Fix 26a foundational 2026-01-13 19:31:24 -05:00
Aleix Conchillo Flaqué
0d6bdbee10 LLMContextAggregatorPair: make strategy logs less verbose 2026-01-13 15:11:22 -08:00
Aleix Conchillo Flaqué
248dac3a9d Merge pull request #3420 from pipecat-ai/pk/fix-gemini-3-parallel-function-calls
Fix parallel function calling with Gemini 3.
2026-01-13 14:40:33 -08:00
Paul Kompfner
be49a54856 Fast-exit in the fix for parallel function calling with Gemini 3, if we can determine up-front that there's no work to do 2026-01-13 17:32:20 -05:00
Aleix Conchillo Flaqué
bd9ee0d646 Merge pull request #3434 from pipecat-ai/aleix/context-appregator-pair-tuple
context aggregator pair tuple
2026-01-13 14:12:51 -08:00
Mark Backman
442e0e582d Merge pull request #3431 from pipecat-ai/mb/update-realtime-examples-transcript-handler
Update GeminiLiveLLMService to push thought frames, update 26a for new transcript events
2026-01-13 17:10:40 -05:00
kompfner
38194c0cff Merge pull request #3436 from pipecat-ai/pk/remove-transcript-processor-reference
Remove dead import of `TranscriptProcessor` (which is now deprecated)
2026-01-13 17:06:17 -05:00
Paul Kompfner
0ebdaba03c Remove dead import of TranscriptProcessor (which is now deprecated) 2026-01-13 17:02:57 -05:00
Aleix Conchillo Flaqué
ee82377d68 examples: fix 22d to push some CancelFrame and EndFrame 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
861588e4a3 examples: update all examples to use the new LLMContextAggregatorPair tuple 2026-01-13 14:01:53 -08:00
Aleix Conchillo Flaqué
1ab3bf2ef6 LLMContextAggregatorPair: instances can now return a tuple 2026-01-13 14:01:53 -08:00
Mark Backman
bb00d223c9 Update 26a to use context aggregator transcription events 2026-01-13 17:01:10 -05:00
Aleix Conchillo Flaqué
86fbfaddd1 Merge pull request #3435 from pipecat-ai/aleix/fix-llm-context-create-audio-message
LLMContext: fix create_audio_message
2026-01-13 13:59:28 -08:00
Aleix Conchillo Flaqué
5612bf513b LLMContext: fix create_audio_message 2026-01-13 13:53:34 -08:00
Mark Backman
87d0dc9e24 Merge pull request #3412 from pipecat-ai/mb/remove-41a-b
Remove foundational examples 41a and 41b
2026-01-13 16:45:26 -05:00
Paul Kompfner
30fbcfbf71 Rework fix for parallel function calling with Gemini 3 2026-01-13 16:33:59 -05:00
Mark Backman
5d90f4ea06 Merge pull request #3428 from pipecat-ai/mb/fix-tracing-none-values
Fix TTS, realtime LLM services could return unknown for model_name
2026-01-13 15:40:10 -05:00
kompfner
f6d09e1574 Merge pull request #3430 from pipecat-ai/pk/request-image-frame-fixes
Fix request_image_frame and usage
2026-01-13 15:36:44 -05:00
Mark Backman
b8e48dee7f Merge pull request #3433 from pipecat-ai/mb/port-realtime-examples-transcript-events
Update examples to use transcription events from context aggregators
2026-01-13 15:36:06 -05:00
Mark Backman
a6ccb9ec69 Merge pull request #3427 from pipecat-ai/mb/add-07j-gladia-vad-example
Add 07j Gladia VAD foundational example, add to release evals
2026-01-13 15:35:24 -05:00
Mark Backman
66551ebdf5 Merge pull request #3426 from pipecat-ai/mb/changelog-3404
Add changelog fragments for PR 3404
2026-01-13 15:34:58 -05:00
Aleix Conchillo Flaqué
21534f7d83 added changelog file for #3430 2026-01-13 12:21:22 -08:00
Mark Backman
d591f9e108 Remove 28-transcription-processor.py 2026-01-13 15:20:59 -05:00
Mark Backman
aa2589d3be Update examples to use transcription events from context aggregators 2026-01-13 15:19:47 -05:00
Aleix Conchillo Flaqué
9d6067fa78 examples(foundational): speak "Let me check on that" in 14d examples 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
027e54425a examples(foundational): associate image requests to function calls 2026-01-13 12:11:30 -08:00
Aleix Conchillo Flaqué
e268c73c41 LLMAssistantAggregator: cache function call requested images 2026-01-13 12:10:08 -08:00
Aleix Conchillo Flaqué
d3c57e2da0 UserImageRawFrame: don't deprecate request field 2026-01-13 11:56:13 -08:00
Aleix Conchillo Flaqué
02eace5a16 UserImageRequestFrame: don't deprecate function call related fields 2026-01-13 11:55:55 -08:00
Mark Backman
15bc1dd999 Update GeminiLiveLLMService to push Thought frames when thought content is returned 2026-01-13 14:13:00 -05:00
Paul Kompfner
b937956dc8 Fix request_image_frame and usage 2026-01-13 13:23:01 -05:00
Mark Backman
efbc0c8510 Fix TTS, realtime LLM services could return unknown for model_name 2026-01-13 12:12:15 -05:00
Himanshu Gunwant
d0f227189c fix: openai llm model name is unknown (#3422) 2026-01-13 11:55:52 -05:00
Mark Backman
41eef5efc4 Add 07j Gladia VAD foundational example, add to release evals 2026-01-13 11:36:15 -05:00
Mark Backman
f00f9d9f1a Add changelog fragments for PR 3404 2026-01-13 11:29:17 -05:00
Mark Backman
ae59b3ba36 Merge pull request #3404 from poseneror/feature/gladia-vad-events
feat(gladia): add VAD events support
2026-01-13 11:26:56 -05:00
Paul Kompfner
6668712f7b Add evals for parallel function calling 2026-01-13 11:03:38 -05:00
Paul Kompfner
8812686b17 Fix parallel function calling with Gemini 3.
Gemini expects parallel function calls to be passed in as a single multi-part `Content` block. This is important because only one of the function calls in a batch of parallel function calls gets a thought signature—if they're passed in as separate `Content` blocks, there'd be one or more missing thought signatures, which would result in a Gemini error.
2026-01-13 11:03:38 -05:00
kompfner
8b0f0b5bb4 Merge pull request #3425 from pipecat-ai/pk/gemini-3-flash-new-thinking-levels
Add Gemini 3 Flash-specific thinking levels
2026-01-13 11:02:53 -05:00
Paul Kompfner
f5e8a04e3b Bump aiortc dependency, which relaxes the constraint on av, which was pinned to 14.4.0, which no longer has all necessary wheels 2026-01-13 10:50:08 -05:00
Mark Backman
a298ce3b41 Merge pull request #3424 from pipecat-ai/mb/tts-append-trailing-space
Add append_trailing_space to TTSService to prevent vocalizing trailin…
2026-01-13 10:42:40 -05:00
Mark Backman
31daa889e8 Add append_trailing_space to TTSService to prevent vocalizing trailing punctuation; update DeepgramTTSService and RimeTTSService to use the arg 2026-01-13 10:38:54 -05:00
Paul Kompfner
76a058178e Add Gemini 3 Flash-specific thinking levels 2026-01-13 09:50:59 -05:00
poseneror
3304b18ac2 Add should_interrupt + broadcast user events 2026-01-13 14:27:35 +02:00
poseneror
b95a6afe77 feat(gladia): add VAD events support
Add support for Gladia's speech_start/speech_end events to emit
UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames.

When enable_vad=True in GladiaInputParams:
- speech_start triggers interruption and pushes UserStartedSpeakingFrame
- speech_end pushes UserStoppedSpeakingFrame
- Tracks speaking state to prevent duplicate events

This allows using Gladia's built-in VAD instead of a separate VAD
in the pipeline.
2026-01-13 14:27:35 +02:00
Mark Backman
f6ed7d7582 Merge pull request #3418 from pipecat-ai/mb/speechmatics-task-cleanup 2026-01-12 19:24:56 -05:00
Mark Backman
cd3290df1c Small cleanup for task creation in SpeechmaticsSTTService 2026-01-12 16:00:32 -05:00
Mark Backman
2296caf529 Merge pull request #3414 from pipecat-ai/mb/changelog-3410
Update changelog for PR 3410.changed.md
2026-01-12 13:43:42 -05:00
Mark Backman
90ded6658d Merge pull request #3403 from pipecat-ai/mb/inworld-tts-add-keepalive
InworldTTSService: Add keepalive task
2026-01-12 13:31:24 -05:00
Mark Backman
7e97fb80a5 Merge pull request #3392 from pipecat-ai/mb/websocket-service-connection-closed-error
Add reconnect logic to WebsocketService in the event of ConnectionClo…
2026-01-12 13:11:43 -05:00
Mark Backman
b58471fdb1 Add Exotel and Vonage to Serializers in README services list 2026-01-12 12:24:56 -05:00
Aleix Conchillo Flaqué
46b4f9f29b Merge pull request #3413 from pipecat-ai/aleix/fix-assistant-thought-aggregation
LLMAssistantAggregator: reset aggregation after adding the thought, not before
2026-01-12 09:21:42 -08:00
Aleix Conchillo Flaqué
ec20d72aba LLMAssistantAggregator: reset aggregation after adding the thought, not before 2026-01-12 09:18:13 -08:00
Mark Backman
5743e2a99b Update changelog for PR 3410.changed.md 2026-01-12 12:15:40 -05:00
Mark Backman
2f429a2e76 Merge pull request #3410 from Vonage/feat/fastapi-ws-vonage-serializer
feat: update FastAPI WebSocket transport and add Vonage serializer
2026-01-12 12:10:57 -05:00
Varun Pratap Singh
3e982f7a4a refactor: rename audio_packet_bytes to fixed_audio_packet_size 2026-01-12 22:11:39 +05:30
Mark Backman
89484e281d Remove foundational examples 41a and 41b 2026-01-12 10:11:58 -05:00
Varun Pratap Singh
14a115f372 changelog: add fragments for PR #3410 2026-01-12 18:12:27 +05:30
Varun Pratap Singh
e96595fe59 feat: update FastAPI WebSocket transport and add Vonage serializer 2026-01-12 17:50:38 +05:30
Mark Backman
f58d21862b WebsocketService: Add _maybe_try_reconnect and use for exception cases 2026-01-11 16:43:37 -05:00
Om Chauhan
38506f51f7 fix(openrouter): handle multiple system messages for Gemini models 2026-01-11 21:19:47 +05:30
Mark Backman
aac24ad2d4 InworldTTSService: Add keepalive task 2026-01-10 11:20:20 -05:00
Mark Backman
9c81acb159 Track websocket disconnecting status to improve error handling 2026-01-09 20:24:07 -05:00
Mark Backman
4fe0836cf9 Add reconnect logic to WebsocketService in the event of ConnectionClosedError 2026-01-09 09:03:01 -05:00
Om Chauhan
1ceb01665f fix: treat language as first-class STT setting 2026-01-04 11:04:30 +05:30
Martin Liu
8dfc59be13 Include pts in incoming video and audio frames 2025-11-12 18:36:56 -05:00
571 changed files with 28347 additions and 11328 deletions

.claude/settings.json

@@ -0,0 +1,5 @@
{
  "attribution": {
    "commit": ""
  }
}


@@ -0,0 +1,47 @@
---
name: changelog
description: Create changelog files for important commits in a PR
---
Create changelog files for the important commits in this PR. The PR number is provided as an argument.
## Instructions
1. Skip changelog for: documentation-only, internal refactoring, test-only, CI changes.
2. First, check what commits are on the current branch compared to main:
```
git log main..HEAD --oneline
```
3. For each significant change, create a changelog file in the `changelog/` folder using the format:
Allowed types: `added`, `changed`, `deprecated`, `removed`, `fixed`, `security`, `performance`, `other`
- `{PR_NUMBER}.added.md` - for new features
- `{PR_NUMBER}.added.2.md`, `{PR_NUMBER}.added.3.md` - for additional entries of the same type
- `{PR_NUMBER}.changed.md` - for changes to existing functionality
- `{PR_NUMBER}.fixed.md` - for bug fixes
- `{PR_NUMBER}.deprecated.md` - for deprecations
- `{PR_NUMBER}.removed.md` - for removed features
- `{PR_NUMBER}.security.md` - for security fixes
- `{PR_NUMBER}.performance.md` - for performance improvements
- `{PR_NUMBER}.other.md` - for other changes
4. Each changelog file should contain at least one main line starting with `- ` followed by a clear description of the change. No line wrapping.
5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.
6. Use ⚠️ emoji prefix for breaking changes.
## Example
For PR #3519 with a new feature and a bug fix:
`changelog/3519.added.md`:
```
- Added `SomeNewFeature` for doing something useful.
```
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
```
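The naming rules above can be sketched as a small helper. This is only an illustration: `next_fragment_path` and `write_fragment` are hypothetical names, not part of the repository, and the sketch assumes fragments live in a local `changelog/` directory.

```python
from pathlib import Path

# Types listed in the instructions above.
ALLOWED_TYPES = {
    "added", "changed", "deprecated", "removed",
    "fixed", "security", "performance", "other",
}


def next_fragment_path(changelog_dir: Path, pr_number: int, change_type: str) -> Path:
    """Return the first unused fragment path for this PR and change type."""
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"Unknown changelog type: {change_type!r}")
    # The first entry has no numeric suffix, e.g. 3519.added.md.
    candidate = changelog_dir / f"{pr_number}.{change_type}.md"
    index = 2
    # Additional entries are numbered from 2, e.g. 3519.added.2.md.
    while candidate.exists():
        candidate = changelog_dir / f"{pr_number}.{change_type}.{index}.md"
        index += 1
    return candidate


def write_fragment(
    changelog_dir: Path,
    pr_number: int,
    change_type: str,
    description: str,
    breaking: bool = False,
) -> Path:
    """Write a single-line fragment, prefixing breaking changes with the warning emoji."""
    changelog_dir.mkdir(parents=True, exist_ok=True)
    path = next_fragment_path(changelog_dir, pr_number, change_type)
    prefix = "⚠️ " if breaking else ""
    path.write_text(f"- {prefix}{description}\n", encoding="utf-8")
    return path
```

Calling `write_fragment(Path("changelog"), 3519, "added", "Added `SomeNewFeature` for doing something useful.")` twice would, under these assumptions, produce `3519.added.md` and then `3519.added.2.md`.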


@@ -0,0 +1,307 @@
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.
---
## Skill Overview
This skill analyzes all changes introduced in your branch and performs the following actions:
1. **Analyze Branch Changes**
- Review uncommitted changes and outgoing commits
2. **Refactor for Readability**
- Improve clarity, naming, structure, and modern Python usage
3. **Enhance Performance**
- Identify safe, conservative optimization opportunities
4. **Add Documentation**
- Apply Pipecat-style, Google-format docstrings
5. **Ensure Pattern Consistency**
- Match existing Pipecat services, pipelines, and examples
6. **Validate Examples**
- Ensure examples follow foundational patterns (e.g. `07-interruptible.py`)
---
## Usage
Invoke the skill using any of the following commands:
- “Clean up my branch code”
- “Refactor the changes in my branch”
- “Review and improve my branch code”
- `/cleanup`
---
## What This Skill Does
### 1. Analyze Branch Changes
The skill retrieves all uncommitted changes and outgoing commits to understand:
- New files added
- Modified files
- Code additions and deletions
- Overall scope and intent of changes
---
### 2. Code Refactoring
#### Readability Improvements
- Replace tuples with named classes or dataclasses
- Improve variable, method, and class naming
- Extract complex logic into well-named helper methods
- Add missing type hints
- Simplify nested or complex conditionals
- Replace deprecated methods and features
- Normalize formatting to match Pipecat style
#### Performance Enhancements
- Identify inefficient loops or repeated work
- Suggest appropriate data structures
- Optimize async workflows and I/O
- Remove redundant operations
> Performance changes are conservative and non-breaking.
---
### 3. Documentation
Documentation follows **Google-style docstrings**, consistent with Pipecat conventions.
#### Class Documentation
```python
class ExampleService:
    """Brief one-line description.

    Detailed explanation of the class purpose, responsibilities,
    and important behaviors.

    Supported features:

    - Feature 1
    - Feature 2
    - Feature 3
    """
```
#### Method Documentation
```python
def process_data(self, data: str, options: Optional[dict] = None) -> bool:
    """Process incoming data with optional configuration.

    Args:
        data: The input data to process.
        options: Optional configuration dictionary.

    Returns:
        True if processing succeeded, False otherwise.

    Raises:
        ValueError: If data is empty or invalid.
    """
```
#### Pydantic Model Parameters
```python
class InputParams(BaseModel):
    """Configuration parameters for the service.

    Parameters:
        timeout: Request timeout in seconds.
        retry_count: Number of retry attempts.
        enable_logging: Whether to enable debug logging.
    """

    timeout: Optional[float] = None
    retry_count: int = 3
    enable_logging: bool = False
```
---
### 4. Pattern Consistency Checks
#### Service Classes
- Correct inheritance (`TTSService`, `STTService`, `LLMService`)
- Consistent constructor signatures
- Frame emission patterns
- Metrics support:
- `can_generate_metrics()`
- TTFB metrics
- Usage metrics
- Alignment with similar existing services
#### Examples
Validated against `examples/foundational/07-interruptible.py`:
- Proper `create_transport()` usage
- Correct pipeline structure
- Task setup and observers
- Event handler registration
- Runner and bot entrypoint consistency
---
### 5. Specific Implementation Patterns
#### Service Implementation
```python
class ExampleTTSService(TTSService):
    def __init__(self, *, api_key: Optional[str] = None, **kwargs):
        super().__init__(**kwargs)
        self._api_key = api_key or os.getenv("SERVICE_API_KEY")

    def can_generate_metrics(self) -> bool:
        return True

    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        try:
            await self.start_ttfb_metrics()
            yield TTSStartedFrame()
            # ... processing ...
            yield TTSAudioRawFrame(...)
        finally:
            await self.stop_ttfb_metrics()
```
---
#### Example Structure Pattern
```python
transport_params = {
    "daily": lambda: DailyParams(...),
    "twilio": lambda: FastAPIWebsocketParams(...),
    "webrtc": lambda: TransportParams(...),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = DeepgramSTTService(...)
    tts = SomeTTSService(...)
    llm = OpenAILLMService(...)

    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(...)

    pipeline = Pipeline([...])
    task = PipelineTask(pipeline, params=..., observers=[...])

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)
```
---
## Execution Flow
1. Fetch uncommitted and outgoing changes
2. Categorize files (services, examples, tests, utilities)
3. Analyze each file:
- Readability
- Performance
- Documentation
- Pattern consistency
4. Generate actionable recommendations
5. Apply Pipecat standards
---
## Examples
### Before: Tuple Usage
```python
def get_audio_info(self) -> Tuple[int, int]:
    return (48000, 1)
```
### After: Named Class
```python
from dataclasses import dataclass


@dataclass
class AudioInfo:
    """Audio configuration information.

    Parameters:
        sample_rate: Sample rate in Hz.
        num_channels: Number of audio channels.
    """

    sample_rate: int
    num_channels: int


def get_audio_info(self) -> AudioInfo:
    return AudioInfo(sample_rate=48000, num_channels=1)
```
---
### Before: Missing Documentation
```python
class NewTTSService(TTSService):
    def __init__(self, api_key: str, voice: str):
        self._api_key = api_key
        self._voice = voice
```
### After: Fully Documented
```python
class NewTTSService(TTSService):
    """Text-to-speech service using NewProvider API.

    Streams PCM audio and emits TTSAudioRawFrame frames compatible
    with Pipecat transports.

    Supported features:

    - Text-to-speech synthesis
    - Streaming PCM audio
    - Voice customization
    - TTFB metrics
    """

    def __init__(self, *, api_key: str, voice: str, **kwargs):
        """Initialize the NewTTSService.

        Args:
            api_key: API key for authentication.
            voice: Voice identifier to use.
            **kwargs: Additional arguments passed to the parent service.
        """
        super().__init__(**kwargs)
        self._api_key = api_key
        self.set_voice(voice)
```
---
## Notes
- Non-breaking improvements only
- Backward compatibility preserved
- Conservative performance changes
- Google-style docstrings
- Pattern checks follow recent Pipecat code


@@ -0,0 +1,107 @@
---
name: code-review
description: Automated code review for pull requests using multiple specialized agents
disable-model-invocation: true
allowed-tools: Bash(gh issue view:*), Bash(gh search:*), Bash(gh issue list:*), Bash(gh pr comment:*), Bash(gh pr diff:*), Bash(gh pr view:*), Bash(gh pr list:*)
---
Provide a code review for the given pull request.
**Agent assumptions (applies to all agents and subagents):**
- All tools are functional and will work without error. Do not test tools or make exploratory calls. Make sure this is clear to every subagent that is launched.
- Only call a tool if it is required to complete the task. Every tool call should have a clear purpose.
To do this, follow these steps precisely:
1. Launch a haiku agent to check if any of the following are true:
- The pull request is closed
- The pull request is a draft
- The pull request does not need code review (e.g. automated PR, trivial change that is obviously correct)
- Claude has already commented on this PR (check `gh pr view <PR> --comments` for comments left by claude)
If any condition is true, stop and do not proceed.
Note: Still review Claude-generated PRs.
2. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including:
- The root CLAUDE.md file, if it exists
- Any CLAUDE.md files in directories containing files modified by the pull request
3. Launch a sonnet agent to view the pull request and return a summary of the changes
4. Launch 4 agents in parallel to independently review the changes. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following:
Agents 1 + 2: CLAUDE.md compliance sonnet agents
Audit changes for CLAUDE.md compliance in parallel. Note: When evaluating CLAUDE.md compliance for a file, only consider CLAUDE.md files located in that file's directory or in one of its parent directories.
Agent 3: Opus bug agent (parallel subagent with agent 4)
Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff.
Agent 4: Opus bug agent (parallel subagent with agent 3)
Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code.
**CRITICAL: We only want HIGH SIGNAL issues.** Flag issues where:
- The code will fail to compile or parse (syntax errors, type errors, missing imports, unresolved references)
- The code will definitely produce wrong results regardless of inputs (clear logic errors)
- Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken
Do NOT flag:
- Code style or quality concerns
- Potential issues that depend on specific inputs or state
- Subjective suggestions or improvements
If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time.
In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent.
5. For each issue found in the previous step by agents 3 and 4, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations.
6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review.
7. If issues were found, skip to step 8 to post comments.
If NO issues were found, post a summary comment using `gh pr comment` (if `--comment` argument is provided):
"No issues found. Checked for bugs and CLAUDE.md compliance."
8. Create a list of all comments that you plan on leaving. This is only for you to make sure you are comfortable with the comments. Do not post this list anywhere.
9. Post an inline comment for each issue using `gh pr review`. For each comment:
- Provide a brief description of the issue
- For small, self-contained fixes, include a committable suggestion block
- For larger fixes (6+ lines, structural changes, or changes spanning multiple locations), describe the issue and suggested fix without a suggestion block
- Never post a committable suggestion UNLESS committing the suggestion fixes the issue entirely. If follow up steps are required, do not leave a committable suggestion.
**IMPORTANT: Only post ONE comment per unique issue. Do not post duplicate comments.**
Use this list when evaluating issues in Steps 4 and 5 (these are false positives, do NOT flag):
- Pre-existing issues
- Something that appears to be a bug but is actually correct
- Pedantic nitpicks that a senior engineer would not flag
- Issues that a linter will catch (do not run the linter to verify)
- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md
- Issues mentioned in CLAUDE.md but explicitly silenced in the code (e.g., via a lint ignore comment)
Notes:
- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch.
- Create a todo list before starting.
- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md, include a link to it).
- If no issues are found, post a comment with the following format:
---
## Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
---
- When linking to code in inline comments, use the following format precisely, otherwise the Markdown preview won't render correctly: `https://github.com/OWNER/REPO/blob/FULL_SHA/path/to/file.py#L10-L15`
- Requires full git sha
- You must provide the full sha. Commands like `https://github.com/owner/repo/blob/$(git rev-parse HEAD)/foo/bar` will not work, since your comment will be directly rendered in Markdown.
- Repo name must match the repo you're code reviewing
- # sign after the file name
- Line range format is L[start]-L[end]
- Provide at least 1 line of context before and after, centered on the line you are commenting about (e.g., if you are commenting about lines 5-6, you should link to `L4-L7`)
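
The link rules above can be sketched as a small helper. `build_permalink` is a hypothetical function written only for illustration, and it assumes a 40-character hexadecimal SHA-1 commit id.

```python
import re

# Assumption: commits are identified by a full 40-character SHA-1 hex digest.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def build_permalink(
    owner: str,
    repo: str,
    sha: str,
    path: str,
    first_line: int,
    last_line: int,
    context: int = 1,
) -> str:
    """Build a blob link with the given number of context lines on each side."""
    if not _FULL_SHA.fullmatch(sha):
        raise ValueError("a full 40-character commit SHA is required")
    if first_line < 1 or last_line < first_line:
        raise ValueError("invalid line range")
    # Widen the range by the context, never going below line 1.
    start = max(1, first_line - context)
    end = last_line + context
    # '#' follows the file name; the range uses the L[start]-L[end] form.
    return f"https://github.com/{owner}/{repo}/blob/{sha}/{path}#L{start}-L{end}"
```

With the default context of one line, a comment about lines 5-6 yields a link ending in `#L4-L7`.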

@@ -0,0 +1,257 @@
---
name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
3. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component
4. Apply documentation in this order:
- Module docstring (at top, after the copyright header and before imports)
- Class docstrings
- `__init__` methods (always document constructor parameters)
- Public methods (not starting with `_`)
- Dataclass/config classes with field descriptions
5. Skip documentation for:
- Private methods (starting with `_`)
- Simple dunder methods (`__str__`, `__repr__`, `__post_init__`)
- Very simple pass-through properties
- **Already documented code** - If a class, method, or function already has a complete docstring that follows the project style, do not modify it. A docstring is complete if it has:
- A one-line summary
- Args section (if it has parameters)
- Returns section (if it returns something meaningful)
- Only add or improve documentation where it is missing or incomplete
## Module Docstring Format
```python
"""[One-line description of module purpose].

[Optional: Longer explanation of functionality, key classes, or use cases.]
"""
```
Example:
```python
"""Neuphonic text-to-speech service implementations.

This module provides WebSocket and HTTP-based integrations with Neuphonic's
text-to-speech API for real-time audio synthesis.
"""
```
## Class Docstring Format
```python
class ClassName:
    """One-line summary describing what the class does.

    [Longer description explaining purpose, behavior, and key features.
    Use action-oriented language.]

    [Optional: Event handlers, usage notes, or important caveats.]
    """
```
Example:
```python
class FrameProcessor(BaseObject):
    """Base class for all frame processors in the pipeline.

    Frame processors are the building blocks of Pipecat pipelines, they can be
    linked to form complex processing pipelines. They receive frames, process
    them, and pass them to the next or previous processor in the chain.

    Event handlers available:

    - on_before_process_frame: Called before a frame is processed
    - on_after_process_frame: Called after a frame is processed

    Example::

        @processor.event_handler("on_before_process_frame")
        async def on_before_process_frame(processor, frame):
            ...

        @processor.event_handler("on_after_process_frame")
        async def on_after_process_frame(processor, frame):
            ...
    """
```
Note: When listing event handlers, do NOT use backticks. Include an `Example::` section (with double colon for Sphinx) showing the decorator pattern and function signature for each event.
## Constructor (`__init__`) Format
```python
def __init__(self, *, param1: Type, param2: Type = default, **kwargs):
    """Initialize the [ClassName].

    Args:
        param1: Description of param1 and its purpose.
        param2: Description of param2. Defaults to [default].
        **kwargs: Additional arguments passed to parent class.
    """
```
Example:
```python
def __init__(
    self,
    *,
    api_key: str,
    voice_id: Optional[str] = None,
    sample_rate: Optional[int] = 22050,
    **kwargs,
):
    """Initialize the Neuphonic TTS service.

    Args:
        api_key: Neuphonic API key for authentication.
        voice_id: ID of the voice to use for synthesis.
        sample_rate: Audio sample rate in Hz. Defaults to 22050.
        **kwargs: Additional arguments passed to parent InterruptibleTTSService.
    """
```
## Method Docstring Format
```python
async def method_name(self, param1: Type) -> ReturnType:
    """One-line summary of what method does.

    [Longer description if behavior isn't obvious.]

    Args:
        param1: Description of param1.

    Returns:
        Description of return value.

    Raises:
        ExceptionType: When this exception is raised.
    """
```
Example:
```python
async def put(self, item: Tuple[Frame, FrameDirection, FrameCallback]):
    """Put an item into the priority queue.

    System frames (`SystemFrame`) have higher priority than any other
    frames. If a non-frame item is provided it will have the highest priority.

    Args:
        item: The item to enqueue.
    """
```
## Dataclass/Config Format
```python
@dataclass
class ConfigName:
    """One-line description of configuration.

    [Explanation of when/how to use this config.]

    Parameters:
        field1: Description of field1.
        field2: Description of field2. Defaults to [default].
    """

    field1: Type
    field2: Type = default_value
```
Example:
```python
@dataclass
class FrameProcessorSetup:
    """Configuration parameters for frame processor initialization.

    Parameters:
        clock: The clock instance for timing operations.
        task_manager: The task manager for handling async operations.
        observer: Optional observer for monitoring frame processing events.
    """

    clock: BaseClock
    task_manager: BaseTaskManager
    observer: Optional[BaseObserver] = None
```
## Enum Documentation Format
```python
class EnumName(Enum):
    """One-line description of the enum purpose.

    [Longer description of how the enum is used.]

    Parameters:
        VALUE1: Description of VALUE1.
        VALUE2: Description of VALUE2.
    """

    VALUE1 = 1
    VALUE2 = 2
```
## Writing Style Guidelines
- **Concise and professional** - No casual language or filler words
- **Action-oriented** - Start with verbs: "Processes...", "Manages...", "Converts..."
- **Purpose before implementation** - Explain WHY before HOW
- **Clear parameter descriptions** - Include type hints, defaults, and purpose
- **No redundant type info** - Type hints are in the signature, don't repeat in description
- **Use backticks for code references** - Wrap class names, method names, event names, parameter names, and code snippets in backticks
Good: "Neuphonic API key for authentication."
Bad: "str: The API key (string) that is used for authenticating with Neuphonic."
Good: "Triggers `on_speech_started` when the `VADAnalyzer` detects speech."
Bad: "Triggers on_speech_started when the VADAnalyzer detects speech."
## Deprecation Notice Format
When documenting deprecated code:
```python
"""[Description].

.. deprecated:: X.X.X
    `ClassName` is deprecated and will be removed in a future version.
    Use `NewClassName` instead.
"""
```
## Checklist
Before finishing, verify:
- [ ] Module has a docstring at the top (after the copyright header, before imports)
- [ ] All public classes have docstrings
- [ ] All `__init__` methods document their parameters
- [ ] All public methods have docstrings with Args/Returns/Raises as needed
- [ ] Dataclasses use "Parameters:" section for field descriptions
- [ ] Enums document each value in "Parameters:" section
- [ ] Writing is concise and action-oriented
- [ ] No documentation added to private methods (starting with `_`)
- [ ] Existing complete docstrings were left unchanged

@@ -0,0 +1,128 @@
---
name: pr-description
description: Update a GitHub PR description with a summary of changes
---
Update a GitHub pull request description based on the changes in the PR.
## Arguments
```
/pr-description <PR_NUMBER> [--fixes <ISSUE_NUMBERS>]
```
- `PR_NUMBER` (required): The pull request number to update
- `--fixes` (optional): Comma-separated issue numbers that this PR fixes (e.g., `--fixes 123,456`)
Examples:
- `/pr-description 3534`
- `/pr-description 3534 --fixes 123`
- `/pr-description 3534 --fixes 123,456,789`
## Instructions
1. First, gather information about the PR:
- Use GitHub plugin to get PR details (title, current description, base branch)
- Use local git to get commits: `git log main..HEAD --oneline`
- Use local git to get the diff: `git diff main..HEAD`
- Parse any `--fixes` argument for issue numbers
2. Check the existing PR description:
- If it already has a complete, accurate description that reflects the changes, do nothing
- If it's missing sections, incomplete, or outdated compared to the actual changes, proceed to update
- If it only has the template placeholder text, generate a full description
3. Analyze the changes:
- Understand the purpose of each commit
- Identify any breaking changes (API changes, removed features, behavior changes)
- Look for new features, bug fixes, refactoring, or documentation changes
- Collect issue numbers from:
- The `--fixes` argument (if provided)
- Commit messages (patterns like "Fixes #123", "Closes #456", "Resolves #789")
4. Generate or update the PR description with these sections:
## PR Description Format
### Summary (always include)
Brief bullet points describing what changed and why. Focus on the *purpose* and *impact*, not implementation details.
```markdown
## Summary
- Added X to enable Y
- Fixed bug where Z would happen
- Refactored W for better maintainability
```
### Breaking Changes (include only if applicable)
Document any changes that affect existing users or APIs.
```markdown
## Breaking Changes
- `ClassName.method()` now requires a `param` argument
- Removed deprecated `old_function()` - use `new_function()` instead
```
### Testing (include when non-obvious)
How to verify the changes work. Skip for trivial changes.
```markdown
## Testing
- Run `uv run pytest tests/test_feature.py` to verify the fix
- Example usage: `uv run examples/new_feature.py`
```
### Fixes (include if issues are provided or found in commits)
List issues this PR fixes. GitHub will automatically close these issues when the PR is merged.
```markdown
## Fixes
- Fixes #123
- Fixes #456
```
Note: Use "Fixes #X" format (not "Closes" or "Resolves") for consistency. Each issue should be on its own line with "Fixes" to ensure GitHub auto-closes them.
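The issue collection and the Fixes section described above can be sketched as follows. The function names and the exact keyword regex are illustrative assumptions, not something the skill prescribes.

```python
import re
from typing import List, Optional

# Assumption: recognize the common GitHub closing keywords followed by "#<number>".
ISSUE_PATTERN = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def collect_issue_numbers(fixes_arg: Optional[str], commit_messages: List[str]) -> List[int]:
    """Merge issue numbers from the --fixes argument and commit messages."""
    numbers: List[int] = []
    if fixes_arg:
        numbers += [int(part.strip()) for part in fixes_arg.split(",") if part.strip()]
    for message in commit_messages:
        numbers += [int(match) for match in ISSUE_PATTERN.findall(message)]
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(numbers))


def render_fixes_section(numbers: List[int]) -> str:
    """Render the Fixes section, one "Fixes #X" line per issue."""
    if not numbers:
        return ""  # skip empty sections
    return "\n".join(["## Fixes"] + [f"- Fixes #{n}" for n in numbers])
```

Issues found under "Closes" or "Resolves" in commit messages are still rendered with "Fixes", in line with the consistency note above.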
## Guidelines
- **Be concise** - Reviewers should understand the PR in 30 seconds
- **Focus on why** - The diff shows *what* changed, explain *why*
- **Skip empty sections** - Only include sections that have content
- **Use bullet points** - Easier to scan than paragraphs
- **Don't duplicate the diff** - Avoid listing every file or line changed
## Example Output
```markdown
## Summary
- Added `/docstring` skill for documenting Python modules with Google-style docstrings
- Skill finds classes by name and handles conflicts when multiple matches exist
- Skips already-documented code to avoid unnecessary changes
## Testing
/docstring ClassName
## Fixes
- Fixes #123
```
## Checklist
Before updating the PR:
- [ ] Verified existing description needs updating (not already complete)
- [ ] Summary accurately reflects the changes
- [ ] Breaking changes are clearly documented (if any)
- [ ] No unnecessary sections included
- [ ] Description is concise and scannable

@@ -0,0 +1,28 @@
---
name: pr-submit
description: Create and submit a GitHub PR from the current branch
---
Submit the current changes as a GitHub pull request.
## Instructions
1. Check the current state of the repository:
- Run `git status` to see staged, unstaged, and untracked changes
- Run `git diff` to see current changes
- Run `git log --oneline -10` to see recent commits
2. If there are uncommitted changes relevant to the PR:
- Ask the user if they want a specific prefix for the branch name (e.g., `alice/`, `fix/`, `feat/`)
- Create a new branch based on the current branch
- Commit the changes using multiple commits if the changes are unrelated
3. Push the branch and create the PR:
- Push with `-u` flag to set upstream tracking
- Create the PR using `gh pr create`
4. After the PR is created:
- Run `/changelog <pr_number>` to generate changelog files, then commit and push them
- Run `/pr-description <pr_number>` to update the PR description
5. Return the PR URL to the user.

@@ -0,0 +1,250 @@
---
name: update-docs
description: Update documentation pages to match source code changes on the current branch
---
Update documentation pages to reflect source code changes on the current branch. Analyzes the diff against main, maps changed source files to their corresponding doc pages, and makes targeted edits.
## Arguments
```
/update-docs [DOCS_PATH]
```
- `DOCS_PATH` (optional): Path to the docs repository root. If not provided, ask the user.
Examples:
- `/update-docs /Users/me/src/docs`
- `/update-docs`
## Instructions
### Step 1: Resolve docs path
If `DOCS_PATH` was provided as an argument, use it. Otherwise, ask the user for the path to their docs repository.
Verify the path exists and contains `server/services/` subdirectory.
### Step 2: Create docs branch
Get the current pipecat branch name:
```bash
git rev-parse --abbrev-ref HEAD
```
In the docs repo, create a new branch off main with a matching name:
```bash
cd DOCS_PATH && git checkout main && git pull && git checkout -b {branch-name}-docs
```
For example, if the pipecat branch is `feat/new-service`, the docs branch becomes `feat/new-service-docs`.
All doc edits in subsequent steps are made on this branch.
### Step 3: Detect changed source files
Run:
```bash
git diff main..HEAD --name-only
```
Filter to files that could affect documentation:
- `src/pipecat/services/**/*.py` (service implementations)
- `src/pipecat/transports/**/*.py` (transport implementations)
- `src/pipecat/serializers/**/*.py` (serializer implementations)
- `src/pipecat/processors/**/*.py` (processor implementations)
- `src/pipecat/audio/**/*.py` (audio utilities)
- `src/pipecat/turns/**/*.py` (turn management)
- `src/pipecat/observers/**/*.py` (observers)
- `src/pipecat/pipeline/**/*.py` (pipeline core)
Ignore `__init__.py`, `__pycache__`, test files, and files that only contain type re-exports.
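A minimal sketch of this filter, assuming the input is the output of `git diff main..HEAD --name-only` with POSIX-style paths. `is_doc_relevant` is a hypothetical helper, and it cannot detect files that only contain type re-exports, which still requires reading the file.

```python
from pathlib import PurePosixPath

# Top-level packages under src/pipecat/ that can affect documentation.
DOC_RELEVANT_DIRS = (
    "services",
    "transports",
    "serializers",
    "processors",
    "audio",
    "turns",
    "observers",
    "pipeline",
)


def is_doc_relevant(path: str) -> bool:
    """Return True if a changed file could affect a documentation page."""
    p = PurePosixPath(path)
    parts = p.parts
    if len(parts) < 4 or parts[:2] != ("src", "pipecat"):
        return False
    if parts[2] not in DOC_RELEVANT_DIRS:
        return False
    if p.suffix != ".py" or p.name == "__init__.py":
        return False
    if "__pycache__" in parts or p.name.startswith("test_"):
        return False
    return True
```

The result is only a candidate list; the skip list in the mapping file still applies afterwards.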
### Step 4: Map source files to doc pages
For each changed source file, find the corresponding doc page. Read the mapping file at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md` and apply its tiered lookup: tier 1 (known exceptions) → tier 2 (pattern matching) → tier 3 (search fallback). **First match wins.**
### Step 5: Analyze each source-doc pair
For each mapped pair:
1. **Read the full source file** to understand current state
2. **Read the diff** for that file: `git diff main..HEAD -- <source_file>`
3. **Read the current doc page** in full
Identify what changed by comparing source to docs:
- **Constructor parameters**: Compare `__init__` signature to the Configuration section's `<ParamField>` entries
- **InputParams fields**: Compare `InputParams(BaseModel)` class fields to the InputParams table
- **Event handlers**: Compare `_register_event_handler` calls and event handler definitions to Event Handlers section
- **Class names / imports**: Check if Usage examples reference correct names
- **Behavioral changes**: Check if Notes section needs updating
### Step 6: Make targeted edits
For each doc page that needs updates, edit **only the sections that need changes**. Preserve all other content exactly as-is.
#### Rules
- **Never remove content** unless the corresponding source code was removed
- **Never rewrite sections** that are already accurate
- **Match existing formatting** — if the page uses `<ParamField>` tags, use them; if it uses tables, use tables
- **Keep descriptions concise** — match the tone and length of surrounding content
- **Preserve CardGroup, links, and examples** unless they reference removed functionality
- **Don't touch frontmatter** unless the class was renamed
#### Section-specific guidance
**Configuration** (constructor params):
- Use `<ParamField path="name" type="type" default="value">` format if the page already uses it
- Add new params in logical order (required first, then optional)
- Remove params that no longer exist in source
- Update types/defaults that changed
**InputParams** (runtime settings):
- Use markdown table format: `| Parameter | Type | Default | Description |`
- Match the field names and types from the `InputParams(BaseModel)` class
- Include the default values from the source
**Usage** (code examples):
- Update import paths, class names, and parameter names
- Only modify examples if they would break or be misleading with the new API
- Don't rewrite working examples just to add new optional params
**Notes**:
- Add notes for new behavioral gotchas or breaking changes
- Remove notes about limitations that were fixed
- Keep existing notes that are still accurate
**Event Handlers**:
- Update the event table and example code
- Add new events, remove deleted ones
- Update handler signatures if they changed
**Overview / Key Features / Prerequisites**:
- Only update if the PR fundamentally changes what the service does (new capability, removed capability, renamed class)
- Most PRs will NOT need changes to these sections
### Step 7: Update guides
Guides at `DOCS_PATH/guides/` reference specific class names, parameters, imports, and code patterns. After completing reference doc edits, check if any guides need updates too.
For each changed source file, collect the class names, renamed parameters, and changed imports from the diff. Search the guides directory:
```bash
grep -rl "ClassName\|old_param_name" DOCS_PATH/guides/
```
For each guide that references changed code:
1. Read the full guide
2. Update class names, parameter names, import paths, and code examples that are now incorrect
3. **Don't rewrite prose** — only fix the specific references that changed
4. Leave guides alone if they reference the service generally but don't use any changed APIs
Guide directories:
- `guides/learn/` — conceptual tutorials (pipeline, LLM, STT, TTS, etc.)
- `guides/fundamentals/` — practical how-tos (metrics, recording, transcripts, etc.)
- `guides/features/` — feature-specific guides (Gemini Live, OpenAI audio, WhatsApp, etc.)
- `guides/telephony/` — telephony integration guides (Twilio, Plivo, Telnyx, etc.)
### Step 8: Identify doc gaps
After processing all mapped pairs, check for two kinds of gaps:
**Missing pages**: Source files that had no doc page mapping (no match in tier 1, 2, or 3) and are not marked as "(skip)". For each, tell the user:
- The source file path
- The main class(es) it defines
- Whether a new doc page should be created
**Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.
If the user wants a new page, create it using this template structure:
````
---
title: "Service Name"
description: "Brief description"
---
## Overview
[Description from class docstring or source analysis]
<CardGroup cols={2}>
[Cards for API reference and examples if available]
</CardGroup>
## Installation
```bash
pip install "pipecat-ai[package-name]"
```
## Prerequisites
[Environment variables and account setup]
## Configuration
[ParamField entries for constructor params]
## InputParams
[Table of InputParams fields, if the service has them]
## Usage
### Basic Setup
```python
[Minimal working example]
```
## Notes
[Important caveats]
## Event Handlers
[Event table and example code]
````
### Step 9: Output summary
After all edits are complete, print a summary:
```
## Documentation Updates
### Updated reference pages
- `server/services/stt/deepgram.mdx` — Updated Configuration (added `new_param`), InputParams (updated `language` default)
- `server/services/tts/elevenlabs.mdx` — Updated Event Handlers (added `on_connected`)
### Updated guides
- `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)
### Unmapped source files
- `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)
### Skipped files
- `src/pipecat/services/ai_service.py` — internal base class
```
## Guidelines
- **Be conservative** — only change what the diff warrants. Don't "improve" docs beyond what changed in source.
- **Read before editing** — always read the full doc page before making changes so you understand the existing structure.
- **Preserve voice** — match the writing style of the existing doc page, don't impose a different tone.
- **One PR at a time** — this skill operates on the current branch's diff against main. Don't look at other branches.
- **Parallel analysis** — when multiple source files map to different doc pages, analyze and edit them in parallel for efficiency.
- **Shared source files** — files like `services/google/google.py` are shared bases. Check which services import from them and update all affected doc pages.
## Checklist
Before finishing, verify:
- [ ] All changed source files were checked against the mapping table
- [ ] Each doc page edit matches the actual source code change (not guessed)
- [ ] No content was removed unless the corresponding source was removed
- [ ] New parameters have accurate types and defaults from source
- [ ] Formatting matches the existing page style
- [ ] Guides referencing changed APIs were checked and updated
- [ ] Unmapped files were reported to the user

@@ -0,0 +1,79 @@
# Source-to-Doc Mapping
Maps pipecat source files to their documentation pages. Source paths are relative to `src/pipecat/`. Doc paths are relative to `DOCS_PATH`.
## Name mismatches
These source paths don't follow the standard `services/{provider}/{type}.py` → `server/services/{type}/{provider}.mdx` pattern.
| Source path | Doc page |
|---|---|
| `services/google/llm.py` | `server/services/llm/gemini.mdx` |
| `services/google/llm_vertex.py` | `server/services/llm/google-vertex.mdx` |
| `services/google/google.py` | (shared base — check which services use it) |
| `services/google/gemini_live/**` | `server/services/s2s/gemini-live.mdx` |
| `services/google/gemini_live/llm_vertex.py` | `server/services/s2s/gemini-live-vertex.mdx` |
| `services/aws_nova_sonic/**` | `server/services/s2s/aws.mdx` |
| `services/ultravox/**` | `server/services/s2s/ultravox.mdx` |
| `services/grok/realtime/**` | `server/services/s2s/grok.mdx` |
| `services/openai/realtime/**` | `server/services/s2s/openai.mdx` |
| `processors/frameworks/rtvi.py` | `server/frameworks/rtvi/rtvi-processor.mdx` and `server/frameworks/rtvi/rtvi-observer.mdx` |
| `processors/transcript_processor.py` | `server/utilities/transcript-processor.mdx` |
| `processors/user_idle_processor.py` | `server/utilities/user-idle-processor.mdx` |
| `processors/idle_frame_processor.py` | `server/pipeline/pipeline-idle-detection.mdx` |
| `pipeline/task.py` | `server/pipeline/pipeline-task.mdx` |
| `pipeline/runner.py` | `server/utilities/runner/guide.mdx` |
| `transports/base_transport.py` | `server/services/transport/transport-params.mdx` |
## Skip list
These files should never trigger doc updates.
| Pattern | Reason |
|---|---|
| `services/ai_service.py` | Internal base class |
| `services/stt_service.py` | Internal base class |
| `services/tts_service.py` | Internal base class |
| `services/llm_service.py` | Internal base class |
| `services/websocket_service.py` | Internal base class |
| `services/openai_realtime_beta/**` | Deprecated |
| `services/openai_realtime/**` | Deprecated |
| `services/gemini_multimodal_live/**` | Deprecated |
| `services/aws/agent_core.py` | Internal |
| `services/aws/sagemaker/**` | No doc page |
| `transports/base_input.py` | Internal base class |
| `transports/base_output.py` | Internal base class |
| `transports/websocket/client.py` | No doc page |
| `serializers/base_serializer.py` | Internal base class |
| `serializers/protobuf.py` | Internal |
| `processors/audio/**` | Internal |
| `pipeline/pipeline.py` | Core architecture, not a service doc |
## Pattern matching
For files not in the tables above, apply these patterns. Convert underscores to hyphens in provider names for doc filenames.
| Source pattern | Doc pattern |
|---|---|
| `services/{provider}/stt*.py` | `server/services/stt/{provider}.mdx` |
| `services/{provider}/tts*.py` | `server/services/tts/{provider}.mdx` |
| `services/{provider}/llm*.py` | `server/services/llm/{provider}.mdx` |
| `services/{provider}/image*.py` | `server/services/image-generation/{provider}.mdx` |
| `services/{provider}/video*.py` | `server/services/video/{provider}.mdx` |
| `services/{provider}/realtime/**` | `server/services/s2s/{provider}.mdx` |
| `transports/{name}/**` | `server/services/transport/{name}.mdx` |
| `serializers/{name}.py` | `server/services/serializers/{name}.mdx` |
| `observers/**` | `server/utilities/observers/` (match by class name) |
| `audio/vad/**` | `server/utilities/audio/` (match by class name) |
| `audio/filters/**` | `server/utilities/audio/` (match by class name) |
| `audio/mixers/**` | `server/utilities/audio/` (match by class name) |
| `processors/filters/**` | `server/utilities/filters/` (match by class name) |
If the doc file doesn't exist at the resolved path, the file is **unmapped**.
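The service rows of the pattern table can be sketched as below. This covers only the `stt`, `tts`, `llm`, `image`, and `video` patterns; `resolve_service_doc` is a hypothetical helper, and the name-mismatch and skip tables above must be consulted first (for example, `services/google/llm.py` is an exception that this function would resolve incorrectly on its own).

```python
import re
from typing import Optional

# Source filename prefix -> docs subdirectory, taken from the table above.
SERVICE_KINDS = {
    "stt": "stt",
    "tts": "tts",
    "llm": "llm",
    "image": "image-generation",
    "video": "video",
}

SERVICE_PATTERN = re.compile(r"services/([^/]+)/(stt|tts|llm|image|video)[^/]*\.py")


def resolve_service_doc(source_path: str) -> Optional[str]:
    """Map a path relative to src/pipecat/ to a doc path relative to DOCS_PATH."""
    match = SERVICE_PATTERN.fullmatch(source_path)
    if match is None:
        return None  # not a service pattern; try other patterns or the search fallback
    provider = match.group(1).replace("_", "-")  # underscores become hyphens
    return f"server/services/{SERVICE_KINDS[match.group(2)]}/{provider}.mdx"
```

The caller still has to check that the resolved file exists in the docs repository before treating the source file as mapped.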
## Search fallback
For files that don't match any table or pattern above:
1. Extract the main class name(s) from the source file
2. Search the docs directory for that class name: `grep -r "ClassName" DOCS_PATH/server/`
3. If found in a doc page, use that as the mapping

@@ -29,11 +29,21 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
run: |
- uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
+ uv sync --group dev \
+ --extra anthropic \
+ --extra aws \
+ --extra google \
+ --extra langchain \
+ --extra livekit \
+ --extra local-smart-turn-v3 \
+ --extra piper \
+ --extra tracing \
+ --extra websocket
- name: Run tests with coverage
run: |

@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
- python-version: ['3.10.18', '3.11.13', '3.12.11', '3.13.5']
+ python-version: ['3.10.19', '3.11.14', '3.12.12', '3.13.12']
name: Python ${{ matrix.python-version }}
steps:
@@ -40,20 +40,10 @@ jobs:
uv python install ${{ matrix.python-version }}
uv python pin ${{ matrix.python-version }}
- - name: Test uv sync with all extras (Python < 3.13)
- if: "!startsWith(matrix.python-version, '3.13.')"
+ - name: Test uv sync with all extras
run: |
uv sync --group dev --all-extras --no-extra krisp
- - name: Test uv sync without PyTorch extras (Python 3.13+)
- if: startsWith(matrix.python-version, '3.13.')
- run: |
- uv sync --group dev --all-extras \
- --no-extra krisp \
- --no-extra local-smart-turn \
- --no-extra moondream \
- --no-extra mlx-whisper
- name: Verify installation
run: |
uv run python --version

@@ -33,11 +33,21 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
run: |
- uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
+ uv sync --group dev \
+ --extra anthropic \
+ --extra aws \
+ --extra google \
+ --extra langchain \
+ --extra livekit \
+ --extra local-smart-turn-v3 \
+ --extra piper \
+ --extra tracing \
+ --extra websocket
- name: Test with pytest
run: |

.gitignore

@@ -4,7 +4,14 @@ __pycache__/
*~
venv
.venv
- /.idea
+ .idea
+ .gradle
+ .next
+ next-env.d.ts
+ local.properties
+ *.log
+ *.lock
+ smart_turn_audio_log
#*#
# Distribution / Packaging
@@ -27,7 +34,7 @@ share/python-wheels/
*.egg
MANIFEST
.DS_Store
- .env
+ .env*
fly.toml
# Examples
@@ -51,4 +58,7 @@ docs/api/_build/
docs/api/api
# uv
- .python-version
+ .python-version
+ # Pipecat
+ whisker_setup.py

File diff suppressed because it is too large.

CLAUDE.md

@@ -0,0 +1,155 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Frame processors that form the external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task** (`src/pipecat/pipeline/task.py`): Runs and manages a pipeline. A pipeline task sends the first frame, `StartFrame`, into the pipeline so that processors know they can start processing and pushing frames. Internally, it wraps the user-defined pipeline with two additional processors: a source processor before it and a sink processor after it. These are used for error handling, pipeline-task-level events, heartbeat monitoring, and more.
- **Pipeline Runner** (`src/pipecat/pipeline/runner.py`): High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and `LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat ultimately derive from `BaseObject`, which supports events. Event handlers run in the background in an async task (default) or synchronously (`sync=True`) when immediate action is needed. Synchronous event handlers must execute quickly.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
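
The `frame.complete()` requirement from the Interruptions pattern can be illustrated with a small stand-alone model. This is only a sketch built on `asyncio`: `ToyInterruptionFrame` and `push_and_wait` are hypothetical stand-ins, not Pipecat classes, and the "sink" is simulated inline.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyInterruptionFrame:
    """Hypothetical stand-in for an interruption frame carrying an optional event."""

    event: Optional[asyncio.Event] = None

    def complete(self) -> None:
        # Signal any waiter that the interruption has been fully handled.
        if self.event is not None:
            self.event.set()


async def push_and_wait(swallow: bool, call_complete: bool, timeout: float = 0.1) -> bool:
    """Return True if the waiter was released, False if it stalled until the timeout."""
    frame = ToyInterruptionFrame(event=asyncio.Event())

    async def processor() -> None:
        if swallow:
            # The frame is not pushed downstream, so the sink never sees it.
            if call_complete:
                frame.complete()
        else:
            # The frame reaches the (toy) sink, which completes it.
            frame.complete()

    await processor()
    try:
        await asyncio.wait_for(frame.event.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

In this model, a processor that swallows the frame without calling `complete()` leaves the waiter blocked until the timeout expires, which is the stall the rule is meant to prevent.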
### Key Directories
| Directory | Purpose |
|---------------------------|----------------------------------------------------|
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
### Docstring Example
```python
class MyService(LLMService):
    """Description of what the service does.

    More detailed description.

    Event handlers available:

    - on_connected: Called when we are connected

    Example::

        @service.event_handler("on_connected")
        async def on_connected(service, frame):
            ...
    """

    def __init__(self, param1: str, **kwargs):
        """Initialize the service.

        Args:
            param1: Description of param1.
            **kwargs: Additional arguments passed to parent.
        """
        super().__init__(**kwargs)
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
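
Steps 3 to 5 above can be sketched with toy classes. This is a simplified, synchronous illustration under stated assumptions: `Direction`, `ToyTextFrame`, `ToyErrorFrame`, and `ToyUppercaseService` are hypothetical stand-ins and do not reflect the real Pipecat base classes, which are asynchronous.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Direction(Enum):
    DOWNSTREAM = "downstream"
    UPSTREAM = "upstream"


@dataclass
class ToyTextFrame:
    text: str


@dataclass
class ToyErrorFrame:
    error: str


class ToyUppercaseService:
    """Hypothetical service: handles ToyTextFrame, passes everything else through."""

    def __init__(self) -> None:
        # Records every pushed frame together with its direction.
        self.pushed: List[Tuple[object, Direction]] = []

    def push_frame(self, frame: object, direction: Direction) -> None:
        self.pushed.append((frame, direction))

    def process_frame(self, frame: object, direction: Direction) -> None:
        if isinstance(frame, ToyTextFrame):
            try:
                self.push_frame(ToyTextFrame(self._transform(frame.text)), direction)
            except ValueError as exc:
                # Report failures as an error frame travelling upstream.
                self.push_frame(ToyErrorFrame(str(exc)), Direction.UPSTREAM)
        else:
            # Frames this service does not handle keep their original direction.
            self.push_frame(frame, direction)

    @staticmethod
    def _transform(text: str) -> str:
        if not text:
            raise ValueError("empty text")
        return text.upper()
```

The point of the sketch is the `else` branch: anything the service does not understand is forwarded unchanged in the direction it arrived.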
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.

@@ -73,15 +73,15 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |

@@ -1,42 +0,0 @@
- Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both start and stop; strategies are evaluated in order until one evaluates to true.
Available user turn start strategies:
- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy
Available user turn stop strategies:
- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy
The default strategies are:
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]
Turn strategies are configured when setting up `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[
                TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                )
            ],
        )
    ),
)
```
In order to use the user turn strategies you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
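
The "evaluated in order until one evaluates to true" rule for turn strategies can be sketched as a first-match loop. This is an illustrative model only: strategies are represented as hypothetical `(name, predicate)` pairs over a toy event dictionary, not as the real strategy classes.

```python
from typing import Callable, List, Optional, Tuple

# A strategy is modelled as (name, predicate over a toy event dict).
Strategy = Tuple[str, Callable[[dict], bool]]


def first_matching_strategy(strategies: List[Strategy], event: dict) -> Optional[str]:
    """Evaluate strategies in list order and return the name of the first that fires."""
    for name, predicate in strategies:
        if predicate(event):
            return name  # Later strategies are not evaluated.
    return None


# Toy equivalents of a VAD-based and a transcription-based start strategy.
start_strategies: List[Strategy] = [
    ("vad", lambda e: e.get("vad_speaking", False)),
    ("transcription", lambda e: bool(e.get("transcript"))),
]
```

Because evaluation stops at the first match, list order acts as a priority: in this sketch the VAD predicate wins whenever both would fire.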

@@ -1 +0,0 @@
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.

@@ -1 +0,0 @@
- `FrameProcessor.interruption_strategies` is deprecated, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.

@@ -1 +0,0 @@
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames are deprecated.

@@ -1 +0,0 @@
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames.

@@ -1 +0,0 @@
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new universal `LLMContext` and `LLMContextAggregatorPair` instead.

@@ -1 +0,0 @@
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.

@@ -1 +0,0 @@
- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural network via pyrnnoise library.

@@ -1,7 +0,0 @@
- Updated `ElevenLabsRealtimeSTTService` to accept the `include_language_detection` parameter to detect language.
```python
stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    include_language_detection=True,
)
```

@@ -1,15 +0,0 @@
- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which adds
improved VAD and Smart Turn capabilities and dramatically reduces latency
without affecting accuracy. Use the `turn_detection_mode` parameter to control
the endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
```python
stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    params=SpeechmaticsSTTService.InputParams(
        language=Language.EN,
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    ),
)
```

@@ -1,4 +0,0 @@
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is deprecated.
Use the new `turn_detection_mode` parameter instead, with `TurnDetectionMode.EXTERNAL`,
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The `enable_vad`
parameter is also deprecated and is inferred from the `turn_detection_mode`.

@@ -1,2 +0,0 @@
- Improved error handling in `ElevenLabsRealtimeSTTService`
- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop that blocks the process if the websocket disconnects due to an error

@@ -1 +0,0 @@
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by `DailyTransport` now include the transport source (i.e., the originating audio track).

@@ -1 +0,0 @@
- `daily-python` updated to 0.23.0.

@@ -1,15 +0,0 @@
- `OpenAILLMContext` and its associated classes (context aggregators, etc.) are now deprecated in favor of the universal `LLMContext` and its counterparts.
From the developer's point of view, switching to using `LLMContext` machinery will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```python
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```

@@ -1,8 +0,0 @@
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations:
- Support for real-time audio streaming with WebSocket connection
- Built-in server-side VAD (Voice Activity Detection)
- Multiple voice options: Ara, Rex, Sal, Eve, Leo
- Built-in tools support: web_search, x_search, file_search
- Custom function calling with standard Pipecat tools schema
- Configurable audio formats (PCM at 8kHz-48kHz)

@@ -1 +0,0 @@
- Added an approximation of TTFB for Ultravox.

@@ -1,5 +0,0 @@
- Updates to Inworld TTS services:
- Improved `InworldTTSService`'s websocket implementation to flush and close
contexts more reliably, which improves handling of long inputs.
- Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.

@@ -1 +0,0 @@
- Added a new `AudioContextTTSService` to the TTS service base classes. The `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`.

@@ -1,4 +0,0 @@
- `LLMUserAggregator` now exposes the following events:
- `on_user_turn_started`: triggered when a user turn starts
- `on_user_turn_stopped`: triggered when a user turn ends
- `on_user_turn_stop_timeout`: triggered when a user turn does not stop and times out

@@ -1,29 +0,0 @@
- Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations.
A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted.
Available user mute strategies:
* `FirstSpeechUserMuteStrategy`
* `MuteUntilFirstBotCompleteUserMuteStrategy`
* `AlwaysUserMuteStrategy`
* `FunctionCallUserMuteStrategy`
User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input.
User mute strategies are configured when setting up the `LLMContextAggregatorPair`. For example:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            FirstSpeechUserMuteStrategy(),
        ]
    ),
)
```
In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
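
Unlike turn strategies, mute strategies are all evaluated for every frame, and a frame is muted if any of them says so. A minimal sketch of that combination rule follows; `ToyMuteStrategy` and `is_muted` are hypothetical names used only for illustration, not the real strategy interface.

```python
from typing import List


class ToyMuteStrategy:
    """Hypothetical strategy that mutes the first N frames it sees."""

    def __init__(self, mute_first_n: int) -> None:
        self.mute_first_n = mute_first_n
        self.seen = 0

    def should_mute(self, frame: object) -> bool:
        self.seen += 1  # Internal state advances on every frame.
        return self.seen <= self.mute_first_n


def is_muted(strategies: List[ToyMuteStrategy], frame: object) -> bool:
    # Build the full list first so every strategy is evaluated; passing a
    # generator to any() would short-circuit and starve later strategies.
    results = [s.should_mute(frame) for s in strategies]
    return any(results)
```

Materializing the results before calling `any()` is the design choice that matters here: it keeps each strategy's internal state in sync with the frame stream even when an earlier strategy already decided to mute.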

@@ -1 +0,0 @@
- `STTMuteFilter` is deprecated and will be removed in a future version. Use `LLMUserAggregator`'s new `user_mute_strategies` instead.

@@ -1 +0,0 @@
- Fixed a bug in `STTMuteFilter` where the user was not always muted during function calls, especially when there were multiple simultaneous calls.

@@ -1 +0,0 @@
- `FrameProcessor.interruptions_allowed` is now deprecated, use `LLMUserAggregator`'s new parameter `user_mute_strategies` instead.

@@ -1,12 +0,0 @@
- `PipelineParams.allow_interruptions` is now deprecated, use `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For example, to disable interruptions but still get user turns you can do:
```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
        ),
    ),
)
```

@@ -1 +0,0 @@
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`.

@@ -1 +0,0 @@
- Updated `DeepgramSTTService` to push user started/stopped speaking and interruption frames when `vad_enabled` is set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames.

@@ -1 +0,0 @@
- Added `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` to push or not push an `InterruptionFrame`.

@@ -1 +0,0 @@
- Added `52-live-transcription.py` foundational example demonstrating live transcription and translation from English to Spanish. In this example, the bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech.

@@ -1 +0,0 @@
- Fixed an `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate memory" error when processing silence audio frames.

@@ -1 +0,0 @@
- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control sentence splitting behavior for finals on sentence boundaries.

@@ -1,4 +0,0 @@
- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
- Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame` in order to finalize transcription.
- Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn detection.
- Only emit VAD + interruption frames if VAD is enabled within the plugin (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).

@@ -1 +0,0 @@
- Added encoding validation to `DeepgramTTSService` to prevent unsupported encodings from reaching the API. The service now raises `ValueError` at initialization with a clear error message.

@@ -1,2 +0,0 @@
- Added word-level timestamp support to `AzureTTSService` for accurate text-to-audio synchronization.

@@ -1 +0,0 @@
- Updated `read_audio_frame` & `read_video_frame` methods in `SmallWebRTCClient` to check if the track is enabled before logging a warning.

@@ -1 +0,0 @@
- Fixed an issue with function calling where a handler failing to invoke its result callback could leave the context stuck in `IN_PROGRESS`, causing LLM inference for subsequent function call results to block while waiting on the unresolved call.

@@ -1 +0,0 @@
- Fixed an issue with `DeepgramTTSService` where the model would output "Dot" instead of a period in some circumstances.

@@ -1 +0,0 @@
- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams` and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation dictionary feature for custom pronunciations.

@@ -1 +0,0 @@
- Fixed an issue in `GeminiLiveLLMService` where `TranscriptionFrame`s were occasionally not pushed.

@@ -1 +0,0 @@
- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport` (see https://www.liveavatar.com/).

@@ -1,8 +0,0 @@
- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
- New `start_video_paused` parameter to control initial video input state
- New `video_frame_detail` parameter to set image processing quality ("auto",
"low", or "high"). This corresponds to OpenAI Realtime's `image_detail`
parameter.
- `set_video_input_paused()` method to pause/resume video input at runtime
- `set_video_frame_detail()` method to adjust video frame quality dynamically
- Automatic rate limiting (1 frame per second) to prevent API overload
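
The changelog does not describe how the one-frame-per-second limit is implemented. One common approach is a minimum-interval gate, sketched below; `FrameRateLimiter` is a hypothetical helper for illustration and is not part of `OpenAIRealtimeLLMService`.

```python
from typing import Optional


class FrameRateLimiter:
    """Rejects frames that arrive sooner than min_interval after the last accepted one."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last_accepted: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Return True if a frame arriving at time `now` (in seconds) should be forwarded."""
        if self._last_accepted is None or now - self._last_accepted >= self.min_interval:
            self._last_accepted = now
            return True
        return False
```

In real use the caller would pass `time.monotonic()` as `now`; taking the timestamp as an argument keeps the sketch deterministic and easy to test.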

@@ -1 +0,0 @@
- Updated `CartesiaTTSService` to support setting `language=None`, resulting in Cartesia auto-detecting the language of the conversation.

@@ -1,3 +0,0 @@
- The bundled Smart Turn weights are now updated to v3.2, which has better
handling of short utterances, and is more robust against background
noise.

@@ -1 +0,0 @@
- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`

@@ -1 +0,0 @@
- Added `UserTurnProcessor`, a frame processor built on `UserTurnController` that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames and interruptions based on the controller's user turn strategies.

@@ -1 +0,0 @@
- Added `UserTurnController` to manage user turns. It emits `on_user_turn_started`, `on_user_turn_stopped`, and `on_user_turn_stop_timeout` events, and can be integrated into processors to detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are implemented using this controller.

@@ -1 +0,0 @@
- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows how to use `UserTurnProcessor`.

@@ -1 +0,0 @@
- Added `should_interrupt` property to `DeepgramFluxSTTService`, `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the bot should be interrupted when the external service detects user speech.

@@ -1,5 +0,0 @@
- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
meaning that the start of the turn audio is not cut off. This improves
accuracy for short utterances.
- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.

@@ -1,4 +0,0 @@
- `LLMAssistantAggregator` now exposes the following events:
- `on_assistant_turn_started`: triggered when the assistant turn starts
- `on_assistant_turn_stopped`: triggered when the assistant turn ends
- `on_assistant_thought`: triggered when there's an assistant thought available

@@ -1 +0,0 @@
- `TranscriptProcessor` and related data classes and frames (`TranscriptionMessage`, `ThoughtTranscriptionMessage`, `TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and `LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and `on_assistant_turn_stopped`) instead.

@@ -1 +0,0 @@
- Added a new foundational example `28-user-assistant-turns.py` that shows how to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to gather a conversation transcript.

@@ -1 +0,0 @@
- Deprecated support for the `vad_events` `LiveOptions` in `DeepgramSTTService`. Instead, use a local Silero VAD for VAD events. Additionally, deprecated `should_interrupt` which will be removed along with `vad_events` support in a future release.

@@ -1 +0,0 @@
- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA SDK (requires `krisp_audio`).

@@ -1 +0,0 @@
- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter` to share a single SDK instance within the same process.

@@ -1 +0,0 @@
- Fixed potential memory leaks and initialization issues in `KrispVivaFilter` by improving SDK lifecycle management.

@@ -1,6 +0,0 @@
- Added support for setting up a pipeline task from external files. You can now register custom pipeline task setup files by setting the `PIPECAT_SETUP_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a function with the following signature:
```python
async def setup_pipeline_task(task: PipelineTask):
    ...
```
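
To make the mechanism concrete, here is a sketch of how a loader for such setup files could work using `importlib`. This is an assumption about the general technique, not Pipecat's actual loader; `load_setup_functions` and the `_setup_file_N` module names are hypothetical.

```python
import importlib.util
import os
from typing import Callable, List


def load_setup_functions(env_var: str = "PIPECAT_SETUP_FILES") -> List[Callable]:
    """Collect setup_pipeline_task from each colon-separated file listed in env_var."""
    functions: List[Callable] = []
    paths = [p for p in os.environ.get(env_var, "").split(":") if p]
    for index, path in enumerate(paths):
        # Load the file as an anonymous module without touching sys.path.
        spec = importlib.util.spec_from_file_location(f"_setup_file_{index}", path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        func = getattr(module, "setup_pipeline_task", None)
        if callable(func):
            functions.append(func)
    return functions
```

A caller would then `await` each returned function with the pipeline task, in the order the files were listed.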

@@ -1 +0,0 @@
- Loading external observers from files is deprecated, use the new pipeline task setup files and `PIPECAT_SETUP_FILES` environment variable instead.

@@ -1 +0,0 @@
- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english` and voice ID to `autumn`.

@@ -1 +0,0 @@
- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was set after awaiting, allowing the event loop to re-enter the method before the guard was set.
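
The class of bug described here, a guard flag set only after an `await`, can be reproduced in a few lines. The sketch below is a generic `asyncio` illustration; `ToyOutput` and its methods are hypothetical and are not taken from `BaseOutputTransport`.

```python
import asyncio


class ToyOutput:
    """Hypothetical output that should announce 'bot started speaking' only once."""

    def __init__(self) -> None:
        self.bot_speaking = False
        self.started_events = 0

    async def _notify(self) -> None:
        await asyncio.sleep(0)  # Yields to the event loop, like any real await.
        self.started_events += 1

    async def start_racy(self) -> None:
        if not self.bot_speaking:
            await self._notify()      # Another caller can enter here...
            self.bot_speaking = True  # ...because the guard is set too late.

    async def start_safe(self) -> None:
        if not self.bot_speaking:
            self.bot_speaking = True  # Set the guard before the first await.
            await self._notify()


async def count_events(method_name: str) -> int:
    """Call the named method twice concurrently and report how many events fired."""
    out = ToyOutput()
    await asyncio.gather(getattr(out, method_name)(), getattr(out, method_name)())
    return out.started_events
```

With two concurrent callers, the racy variant fires the event twice while the safe variant fires it once, because the flag is already set when the second caller checks it.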

1
changelog/3615.fixed.md Normal file
@@ -0,0 +1 @@
- Fixed race condition where `RTVIObserver` could send messages before `DailyTransport` join completed. Outbound messages are now queued & delivered after the transport is ready.
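
The queue-then-deliver behavior can be sketched generically as follows. `QueuedSender` is a hypothetical illustration of the pattern and does not mirror the actual `RTVIObserver` or `DailyTransport` code.

```python
from collections import deque
from typing import Deque, List


class QueuedSender:
    """Buffers outbound messages until mark_ready() is called, then flushes in order."""

    def __init__(self) -> None:
        self._ready = False
        self._pending: Deque[str] = deque()
        self.sent: List[str] = []  # Stands in for the real transport.

    def send(self, message: str) -> None:
        if self._ready:
            self.sent.append(message)
        else:
            # Transport not ready yet: keep the message for later delivery.
            self._pending.append(message)

    def mark_ready(self) -> None:
        self._ready = True
        while self._pending:
            self.sent.append(self._pending.popleft())
```

Flushing from the left of the deque preserves the original send order, so early messages are delayed rather than dropped or reordered.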

1
changelog/3625.added.md Normal file
@@ -0,0 +1 @@
- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This allows timestamp information to arrive after the corresponding audio chunks, which significantly improves first-audio-chunk latency.

1
changelog/3642.added.md Normal file
@@ -0,0 +1 @@
- Added model-specific `InputParams` to `RimeTTSService`: arcana params (`repetition_penalty`, `temperature`, `top_p`) and mistv2 params (`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param changes now trigger WebSocket reconnection.

@@ -0,0 +1 @@
- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the `wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from mistv2-specific values to `None` — only explicitly-set params are sent as query params.

@@ -0,0 +1,3 @@
- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager` in `aic_filter.py`.
- Multiple filters using the same model path or `(model_id, model_download_dir)` share one loaded model, with reference counting and concurrent load deduplication.
- Model file I/O runs off the event loop so the filter does not block.
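
The three properties listed above (shared instances, reference counting with load deduplication, and off-loop I/O) can be sketched with plain `asyncio`. `ToyModelManager` is a hypothetical illustration, not the `AICModelManager` implementation, and the "model" is just a byte string.

```python
import asyncio
from typing import Dict


class ToyModelManager:
    """Hypothetical shared-model cache with reference counting."""

    def __init__(self) -> None:
        self._loads: Dict[str, asyncio.Future] = {}
        self._refs: Dict[str, int] = {}
        self.load_calls = 0

    def _read_model(self, path: str) -> bytes:
        self.load_calls += 1  # Stands in for blocking file I/O.
        return f"model:{path}".encode()

    async def acquire(self, path: str) -> bytes:
        self._refs[path] = self._refs.get(path, 0) + 1
        if path not in self._loads:
            # The first caller starts the load in a worker thread; concurrent
            # callers for the same path await the same task.
            self._loads[path] = asyncio.ensure_future(
                asyncio.to_thread(self._read_model, path)
            )
        return await self._loads[path]

    def release(self, path: str) -> None:
        self._refs[path] -= 1
        if self._refs[path] == 0:
            # Last user gone: drop the cached model.
            del self._refs[path]
            del self._loads[path]
```

Caching the in-flight task rather than the finished result is what deduplicates concurrent loads: a second `acquire()` arriving mid-load simply awaits the same task.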

changelog/3698.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed async generator cleanup in OpenAI LLM streaming to prevent `AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699).

@@ -0,0 +1 @@
- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for better traceability.

changelog/3713.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all sample rates, including 8kHz audio.

changelog/3718.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed a race condition in `RTVIObserver` where bot output messages could be sent before the bot-started-speaking event.

@@ -0,0 +1 @@
- Added `write_transport_frame()` hook to `BaseOutputTransport` allowing transport subclasses to handle custom frame types that flow through the audio queue.

changelog/3719.added.md Normal file

@@ -0,0 +1 @@
- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily transport. These frames queue SIP transfer and SIP REFER operations with audio, so the operation executes only after the bot finishes its current utterance.

@@ -0,0 +1 @@
- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now queued with audio like other transport frames.

changelog/3720.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed Grok Realtime `session.updated` event parsing failure caused by the API returning prefixed voice names (e.g. `"human_Ara"` instead of `"Ara"`).

@@ -0,0 +1 @@
- Bumped Pillow dependency upper bound from `<12` to `<13` to allow Pillow 12.x.

@@ -0,0 +1 @@
- Fixed context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`, `RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and `PlayHTTTSService`. Services now properly reuse the same context ID across multiple `run_tts()` invocations within a single LLM turn, preventing context tracking issues and incorrect lifecycle signaling.

changelog/3729.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed word timestamp interleaving issue in `ElevenLabsTTSService` when processing multiple sentences within a single LLM turn.

changelog/3730.added.md Normal file

@@ -0,0 +1 @@
- Added keepalive support to `SarvamSTTService` to prevent idle connection timeouts (e.g. when used behind a `ServiceSwitcher`).

@@ -0,0 +1 @@
- Moved STT keepalive mechanism from `WebsocketSTTService` to the `STTService` base class, allowing any STT service (not just websocket-based ones) to use idle-connection keepalive via the `keepalive_timeout` and `keepalive_interval` parameters.

@@ -0,0 +1,3 @@
- Improved audio context management in `AudioContextTTSService` by moving context ID tracking to the base class and adding `reuse_context_id_within_turn` parameter to control concurrent TTS request handling.
- Added helper methods: `has_active_audio_context()`, `get_active_audio_context_id()`, `remove_active_audio_context()`, and `reset_active_audio_context()`.
- Simplified the Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS implementations by removing duplicate context management code.

@@ -0,0 +1 @@
- Deprecated unused `Traceable`, `@traceable`, `@traced`, and `AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module will be removed in a future release.

changelog/3735.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed tracing service decorators executing the wrapped function twice when the function itself raised an exception (e.g., LLM rate limit, TTS timeout).
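
The changelog does not describe the cause, so the following is only an assumed illustration of one common way a decorator ends up running its target twice: a fallback `except` meant for tracing errors that also catches errors from the wrapped function. `traced_buggy`, `traced_fixed`, and `start_span` are hypothetical names, not Pipecat's decorators.

```python
import functools
from typing import Any, Awaitable, Callable


def traced_buggy(start_span: Callable[[], None]):
    def decorator(fn: Callable[..., Awaitable[Any]]):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            try:
                start_span()
                return await fn(*args, **kwargs)
            except Exception:
                # Intended as a fallback for tracing errors, but it also
                # catches errors raised by fn and runs fn a second time.
                return await fn(*args, **kwargs)
        return wrapper
    return decorator


def traced_fixed(start_span: Callable[[], None]):
    def decorator(fn: Callable[..., Awaitable[Any]]):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            try:
                start_span()
            except Exception:
                pass  # tracing failures must not break the service call
            return await fn(*args, **kwargs)  # runs exactly once
        return wrapper
    return decorator
```

Keeping the tracing setup and the wrapped call in separate `try` scopes means an exception from the function propagates to the caller without triggering a retry.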

changelog/3737.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame` reaches downstream processors.

changelog/3744.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed `UserIdleController` false idle triggers caused by gaps between user and bot activity frames. The idle timer now starts only after `BotStoppedSpeakingFrame` and is suppressed during active user turns and function calls.

changelog/3748.added.md Normal file

@@ -0,0 +1 @@
- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection at runtime by updating the timeout dynamically.

Some files were not shown because too many files have changed in this diff.