From 4531d517daf78b6577a4a72f59a9997f957bfab5 Mon Sep 17 00:00:00 2001
From: aconchillo <951761+aconchillo@users.noreply.github.com>
Date: Wed, 14 Jan 2026 00:49:15 +0000
Subject: [PATCH 1/2] Update changelog for version 0.0.99

---
 CHANGELOG.md                   | 522 +++++++++++++++++++++++++++++++++
 changelog/3045.added.md        |  42 ---
 changelog/3045.deprecated.2.md |   1 -
 changelog/3045.deprecated.3.md |   1 -
 changelog/3045.deprecated.4.md |   1 -
 changelog/3045.deprecated.5.md |   1 -
 changelog/3045.deprecated.6.md |   1 -
 changelog/3045.deprecated.md   |   1 -
 changelog/3205.added.md        |   1 -
 changelog/3216.changed.md      |   7 -
 changelog/3225.changed.md      |  15 -
 changelog/3225.deprecated.md   |   4 -
 changelog/3233.fixed.md        |   2 -
 changelog/3257.changed.2.md    |   1 -
 changelog/3257.changed.md      |   1 -
 changelog/3263.deprecated.md   |  15 -
 changelog/3267.added.md        |   8 -
 changelog/3268.added.md        |   1 -
 changelog/3288.changed.md      |   5 -
 changelog/3289.added.md        |   1 -
 changelog/3291.added.md        |   4 -
 changelog/3292.added.md        |  29 --
 changelog/3292.deprecated.md   |   1 -
 changelog/3292.fixed.md        |   1 -
 changelog/3297.deprecated.2.md |   1 -
 changelog/3297.deprecated.md   |  12 -
 changelog/3300.added.md        |   1 -
 changelog/3314.changed.md      |   1 -
 changelog/3316.added.md        |   1 -
 changelog/3316.other.md        |   1 -
 changelog/3322.fixed.md        |   1 -
 changelog/3328.added.md        |   1 -
 changelog/3328.fixed.md        |   4 -
 changelog/3329.changed.md      |   1 -
 changelog/3334.added.md        |   2 -
 changelog/3336.changed.md      |   1 -
 changelog/3343.fixed.md        |   1 -
 changelog/3345.fixed.md        |   1 -
 changelog/3346.added.md        |   1 -
 changelog/3351.fixed.md        |   1 -
 changelog/3356.fixed.md        |   1 -
 changelog/3357.added.md        |   1 -
 changelog/3360.added.md        |   8 -
 changelog/3366.changed.md      |   1 -
 changelog/3367.changed.md      |   3 -
 changelog/3371.changed.md      |   1 -
 changelog/3372.added.2.md      |   1 -
 changelog/3372.added.md        |   1 -
 changelog/3372.other.md        |   1 -
 changelog/3374.added.md        |   1 -
 changelog/3377.changed.md      |   5 -
 changelog/3385.added.md        |   4 -
 changelog/3385.deprecated.md   |   1 -
 changelog/3385.other.md        |   1 -
 changelog/3386.deprecated.md   |   1 -
 changelog/3391.added.md        |   1 -
 changelog/3391.changed.md      |   1 -
 changelog/3391.fixed.md        |   1 -
 changelog/3397.added.md        |   6 -
 changelog/3397.deprecated.md   |   1 -
 changelog/3399.changed.md      |   1 -
 changelog/3400.fixed.md        |   1 -
 changelog/3403.added.md        |   1 -
 changelog/3404.added.2.md      |   1 -
 changelog/3404.added.md        |   1 -
 changelog/3410.added.md        |   1 -
 changelog/3410.changed.md      |   1 -
 changelog/3422.fixed.md        |   1 -
 changelog/3424.added.md        |   1 -
 changelog/3424.changed.md      |   1 -
 changelog/3428.fixed.md        |   1 -
 changelog/3430.fixed.md        |   1 -
 changelog/3431.changed.md      |   1 -
 changelog/3435.fixed.md        |   1 -
 74 files changed, 522 insertions(+), 230 deletions(-)
 delete mode 100644 changelog/3045.added.md
 delete mode 100644 changelog/3045.deprecated.2.md
 delete mode 100644 changelog/3045.deprecated.3.md
 delete mode 100644 changelog/3045.deprecated.4.md
 delete mode 100644 changelog/3045.deprecated.5.md
 delete mode 100644 changelog/3045.deprecated.6.md
 delete mode 100644 changelog/3045.deprecated.md
 delete mode 100644 changelog/3205.added.md
 delete mode 100644 changelog/3216.changed.md
 delete mode 100644 changelog/3225.changed.md
 delete mode 100644 changelog/3225.deprecated.md
 delete mode 100644 changelog/3233.fixed.md
 delete mode 100644 changelog/3257.changed.2.md
 delete mode 100644 changelog/3257.changed.md
 delete mode 100644 changelog/3263.deprecated.md
 delete mode 100644 changelog/3267.added.md
 delete mode 100644 changelog/3268.added.md
 delete mode 100644 changelog/3288.changed.md
 delete mode 100644 changelog/3289.added.md
 delete mode 100644 changelog/3291.added.md
 delete mode 100644 changelog/3292.added.md
 delete mode 100644 changelog/3292.deprecated.md
 delete mode 100644 changelog/3292.fixed.md
 delete mode 100644 changelog/3297.deprecated.2.md
 delete mode 100644 changelog/3297.deprecated.md
 delete mode 100644 changelog/3300.added.md
 delete mode 100644 changelog/3314.changed.md
 delete mode 100644 changelog/3316.added.md
 delete mode 100644 changelog/3316.other.md
 delete mode 100644 changelog/3322.fixed.md
 delete mode 100644 changelog/3328.added.md
 delete mode 100644 changelog/3328.fixed.md
 delete mode 100644 changelog/3329.changed.md
 delete mode 100644 changelog/3334.added.md
 delete mode 100644 changelog/3336.changed.md
 delete mode 100644 changelog/3343.fixed.md
 delete mode 100644 changelog/3345.fixed.md
 delete mode 100644 changelog/3346.added.md
 delete mode 100644 changelog/3351.fixed.md
 delete mode 100644 changelog/3356.fixed.md
 delete mode 100644 changelog/3357.added.md
 delete mode 100644 changelog/3360.added.md
 delete mode 100644 changelog/3366.changed.md
 delete mode 100644 changelog/3367.changed.md
 delete mode 100644 changelog/3371.changed.md
 delete mode 100644 changelog/3372.added.2.md
 delete mode 100644 changelog/3372.added.md
 delete mode 100644 changelog/3372.other.md
 delete mode 100644 changelog/3374.added.md
 delete mode 100644 changelog/3377.changed.md
 delete mode 100644 changelog/3385.added.md
 delete mode 100644 changelog/3385.deprecated.md
 delete mode 100644 changelog/3385.other.md
 delete mode 100644 changelog/3386.deprecated.md
 delete mode 100644 changelog/3391.added.md
 delete mode 100644 changelog/3391.changed.md
 delete mode 100644 changelog/3391.fixed.md
 delete mode 100644 changelog/3397.added.md
 delete mode 100644 changelog/3397.deprecated.md
 delete mode 100644 changelog/3399.changed.md
 delete mode 100644 changelog/3400.fixed.md
 delete mode 100644 changelog/3403.added.md
 delete mode 100644 changelog/3404.added.2.md
 delete mode 100644 changelog/3404.added.md
 delete mode 100644 changelog/3410.added.md
 delete mode 100644 changelog/3410.changed.md
 delete mode 100644 changelog/3422.fixed.md
 delete mode 100644 changelog/3424.added.md
 delete mode 100644 changelog/3424.changed.md
 delete mode 100644 changelog/3428.fixed.md
 delete mode 100644 changelog/3430.fixed.md
 delete mode 100644 changelog/3431.changed.md
 delete mode 100644 changelog/3435.fixed.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a22a905ad..290a7802c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,528 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
+## [0.0.99] - 2026-01-13
+
+### Added
+
+- Introducing user turn strategies. User turn strategies indicate when the user
+  turn starts or stops. In conversational agents, these are often referred to
+  as start/stop speaking or turn-taking plans or policies.
+
+  User turn start strategies indicate when the user starts speaking (e.g.
+  using VAD events or when a user says one or more words).
+
+  User turn stop strategies indicate when the user stops speaking (e.g. using
+  an end-of-turn detection model or by observing incoming transcriptions).
+
+  A list of strategies can be specified for both strategies; strategies are
+  evaluated in order until one evaluates to true.
+
+  Available user turn start strategies:
+  - VADUserTurnStartStrategy
+  - TranscriptionUserTurnStartStrategy
+  - MinWordsUserTurnStartStrategy
+  - ExternalUserTurnStartStrategy
+
+  Available user turn stop strategies:
+  - TranscriptionUserTurnStopStrategy
+  - TurnAnalyzerUserTurnStopStrategy
+  - ExternalUserTurnStopStrategy
+
+  The default strategies are:
+
+  - start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
+  - stop: [TranscriptionUserTurnStopStrategy]
+
+  Turn strategies are configured when setting up `LLMContextAggregatorPair`.
+  For example:
+
+  ```python
+  context_aggregator = LLMContextAggregatorPair(
+      context,
+      user_params=LLMUserAggregatorParams(
+          user_turn_strategies=UserTurnStrategies(
+              stop=[
+                  TurnAnalyzerUserTurnStopStrategy(
+turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
+                  )
+              ],
+          )
+      ),
+  )
+  ```
+
+  In order to use the user turn strategies you must update to the new
+  universal `LLMContext` and `LLMContextAggregatorPair`.
+(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
+
+- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural
+  network via pyrnnoise library.
+(PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))
+
+- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
+  voice conversations:
+
+  - Support for real-time audio streaming with WebSocket connection
+  - Built-in server-side VAD (Voice Activity Detection)
+  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
+  - Built-in tools support: web_search, x_search, file_search
+  - Custom function calling with standard Pipecat tools schema
+  - Configurable audio formats (PCM at 8kHz-48kHz)
+(PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))
+
+- Added an approximation of TTFB for Ultravox.
+(PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))
+
+- Added a new `AudioContextTTSService` to the TTS service base classes. The
+  `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and
+  `WebsocketWordTTSService`.
+(PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))
+
+- `LLMUserAggregator` now exposes the following events:
+  - `on_user_turn_started`: triggered when a user turn starts
+  - `on_user_turn_stopped`: triggered when a user turn ends
+  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop
+    and times out
+(PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))
+
+- Introducing user mute strategies. User mute strategies indicate when user
+  input should be muted based on the current system state.
+
+  In conversational agents, user mute strategies are used to prevent user
+  input from interrupting bot speech, tool execution, or other critical system
+  operations.
+
+  A list of strategies can be specified; all strategies are evaluated for
+  every frame so that each strategy can maintain its internal state. A user
+  frame is muted if any of the configured strategies indicates it should be
+  muted.
+
+  Available user mute strategies:
+
+  * `FirstSpeechUserMuteStrategy`
+  * `MuteUntilFirstBotCompleteUserMuteStrategy`
+  * `AlwaysUserMuteStrategy`
+  * `FunctionCallUserMuteStrategy`
+
+  User mute strategies replace the legacy `STTMuteFilter` and provide a more
+  flexible and composable approach to muting user input.
+
+  User mute strategies are configured when setting up the
+  `LLMContextAggregatorPair`. For example:
+
+  ```python
+  context_aggregator = LLMContextAggregatorPair(
+      context,
+      user_params=LLMUserAggregatorParams(
+          user_mute_strategies=[
+              FirstSpeechUserMuteStrategy(),
+          ]
+      ),
+  )
+  ```
+
+  In order to use user mute strategies you should update to the new universal
+  `LLMContext` and `LLMContextAggregatorPair`.
+(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
+
+- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService`
+  and `NvidiaTTSService`.
+(PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300))
+
+- Added `enable_interruptions` constructor argument to all user turn
+  strategies. This tells the `LLMUserAggregator` to push or not push an
+  `InterruptionFrame`.
+(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
+
+- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control
+  sentence splitting behavior for finals on sentence boundaries.
+(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
+
+- Added word-level timestamp support to `AzureTTSService` for accurate
+  text-to-audio synchronization.
+(PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334))
+
+- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams`
+  and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation
+  dictionary feature for custom pronunciations.
+(PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346))
+
+- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport`
+  (see https://www.liveavatar.com/).
+(PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))
+
+- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
+  - New `start_video_paused` parameter to control initial video input state
+  - New `video_frame_detail` parameter to set image processing quality
+    ("auto",
+    "low", or "high"). This corresponds to OpenAI Realtime's `image_detail`
+    parameter.
+  - `set_video_input_paused()` method to pause/resume video input at runtime
+  - `set_video_frame_detail()` method to adjust video frame quality
+    dynamically
+  - Automatic rate limiting (1 frame per second) to prevent API overload
+(PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
+
+- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
+  that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
+  and interruptions based on the controller's user turn strategies.
+(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
+
+- Added `UserTurnController` to manage user turns. It emits
+  `on_user_turn_started`, `on_user_turn_stopped`, and
+  `on_user_turn_stop_timeout` events, and can be integrated into processors to
+  detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are
+  implemented using this controller.
+(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
+
+- Added `should_interrupt` property to `DeepgramFluxSTTService`,
+  `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the
+  bot should be interrupted when the external service detects user speech.
+(PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))
+
+- `LLMAssistantAggregator` now exposes the following events:
+  - `on_assistant_turn_started`: triggered when the assistant turn starts
+  - `on_assistant_turn_stopped`: triggered when the assistant turn ends
+  - `on_assistant_thought`: triggered when there's an assistant thought
+    available
+(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
+
+- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
+  SDK (requires `krisp_audio`).
+(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
+
+- Added support for setting up a pipeline task from external files. You can now
+  register custom pipeline task setup files by setting the
+  `PIPECAT_SETUP_FILES` environment variable. This variable should contain a
+  colon-separated list of Python files (e.g. `export
+  PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a
+  function with the following signature:
+
+  ```python
+  async def setup_pipeline_task(task: PipelineTask):
+      ...
+  ```
+(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
+
+- Added a keepalive task for `InworldTTSService` to keep the service connected
+  in the event of no generations for longer periods of time.
+(PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403))
+
+- Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When
+  enabled, `GladiaSTTService` acts as the turn controller, emitting
+  `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally
+  `InterruptionFrame`.
+(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
+
+- Added `should_interrupt` property to `GladiaSTTService` to configure whether
+  the bot should be interrupted when the external service detects user speech.
+(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
+
+- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector
+  WebSocket protocol.
+(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
+
+- Added `append_trailing_space` parameter to `TTSService` to automatically
+  append a trailing space to text before sending to TTS, helping prevent some
+  services from vocalizing trailing punctuation.
+(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
+
+### Changed
+
+- Updated `ElevenLabsRealtimeSTTService` to accept the
+  `include_language_detection` parameter to detect language.
+  ```python
+  stt = ElevenLabsRealtimeSTTService(
+      api_key=os.getenv("ELEVENLABS_API_KEY"),
+      include_language_detection=True
+  )
+  ```
+(PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))
+
+- Updated `SpeechmaticsSTTService` to use new Python Voice SDK with improved
+  VAD,
+  Smart Turn capabilities, and brings dramatic improvements to latency
+  without
+  any impact on accuracy. Use the `turn_detection_mode` parameter to control
+  the
+  endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
+  `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
+  ```python
+  stt = SpeechmaticsSTTService(
+      api_key=os.getenv("SPEECHMATICS_API_KEY"),
+      params=SpeechmaticsSTTService.InputParams(
+          language=Language.EN,
+turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
+          speaker_active_format="<{speaker_id}>{text}",
+      ),
+  )
+  ```
+(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
+
+- `daily-python` updated to 0.23.0.
+(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
+
+- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by
+  `DailyTransport` now include the transport source (i.e., the originating
+  audio track).
+(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
+
+- Updates to Inworld TTS services:
+
+  - Improved `InworldTTSService`'s websocket implementation to better flush
+    and
+    close context to better handle long inputs.
+  - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
+(PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))
+
+- Updated `DeepgramSTTService` to push user started/stopped speaking and
+  interruption frames when `vad_enabled` is set to true. This centralizes the
+  frames into the service, removing the need to have your application code
+  handle Deepgram's events and push these frames.
+(PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314))
+
+- Added encoding validation to `DeepgramTTSService` to prevent unsupported
+  encodings from reaching the API. The service now raises `ValueError` at
+  initialization with a clear error message.
+(PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329))
+
+- Updated `read_audio_frame` & `read_video_frame` methods in
+  `SmallWebRTCClient` to check if the track is enabled before logging a
+  warning.
+(PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336))
+
+- Updated `CartesiaTTSService` to support setting `language=None`, resulting in
+  Cartesia auto-detecting the language of the conversation.
+(PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366))
+
+- The bundled Smart Turn weights are now updated to v3.2, which has better
+  handling of short utterances, and is more robust against background
+  noise.
+(PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367))
+
+- Updated `SpeechmaticsSTTService` dependency to
+  `speechmatics-voice[smart]>=0.2.6`
+(PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371))
+
+- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
+  meaning that the start of the turn audio is not cut off. This improves
+  accuracy for short utterances.
+
+  - The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
+(PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))
+
+- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
+  to share a single SDK instance within the same process.
+(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
+
+- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english`
+  and voice ID to `autumn`.
+(PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399))
+
+- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio
+  packetization via the `fixed_audio_packet_size` parameter to support media
+  endpoints requiring strict framing and real-time pacing.
+(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
+
+- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to
+  `True` to prevent punctuation (e.g., “dot”) from being pronounced.
+(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
+
+- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`,
+  `LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns
+  thought content.
+(PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431))
+
+### Deprecated
+
+- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use
+  `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
+  `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
+(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
+
+- `FrameProcessor.interruption_strategies` is deprecated, use
+  `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
+(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
+
+- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in
+  `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new
+  universal `LLMContext` and `LLMContextAggregatorPair` instead.
+(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
+
+- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and
+  `UserStoppedSpeakingFrame` frames.
+(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
+
+- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame`
+  frames are deprecated.
+(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
+
+- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in
+  unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies`
+  parameter instead.
+(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
+
+- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is
+  deprecated.
+  Use the new `turn_detection_mode` parameter instead, with
+  `TurnDetectionMode.EXTERNAL`,
+  `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The
+  `enable_vad`
+  parameter is also deprecated and is inferred from the
+  `turn_detection_mode`.
+(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
+
+- `OpenAILLMContext` and its associated things (context aggregators, etc.) are
+  now deprecated in favor of the universal `LLMContext` and its associated
+  things.
+
+  From the developer's point of view, switching to using `LLMContext`
+  machinery will usually be a matter of going from this:
+
+  ```python
+  context = OpenAILLMContext(messages, tools)
+  context_aggregator = llm.create_context_aggregator(context)
+  ```
+
+  To this:
+
+  ```
+  context = LLMContext(messages, tools)
+  context_aggregator = LLMContextAggregatorPair(context)
+  ```
+(PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))
+
+- `STTMuteFilter` is deprecated and will be removed in a future version. Use
+  `LLMUserAggregator`'s new `user_mute_strategies` instead.
+(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
+
+- `FrameProcessor.interruptions_allowed` is now deprecated, use
+  `LLMUserAggregator`'s new parameter `user_mute_strategies` instead.
+(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
+
+- `PipelineParams.allow_interruptions` is now deprecated, use
+  `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
+  example, to disable interruptions but still get user turns you can do:
+
+  ```python
+  context_aggregator = LLMContextAggregatorPair(
+      context,
+      user_params=LLMUserAggregatorParams(
+          user_turn_strategies=UserTurnStrategies(
+start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
+          ),
+      ),
+  )
+  ```
+(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
+
+- `TranscriptProcessor` and related data classes and frames
+  (`TranscriptionMessage`, `ThoughtTranscriptionMessage`,
+  `TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and
+  `LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and
+  `on_assistant_turn_stopped`) instead.
+(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
+
+- Deprecated support for the `vad_events` `LiveOptions` in
+  `DeepgramSTTService`. Instead, use a local Silero VAD for VAD events.
+  Additionally, deprecated `should_interrupt` which will be removed along with
+  `vad_events` support in a future release.
+(PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386))
+
+- Loading external observers from files is deprecated, use the new pipeline
+  task setup files and `PIPECAT_SETUP_FILES` environment variable instead.
+(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
+
+### Fixed
+
+- Improved error handling in `ElevenLabsRealtimeSTTService`
+  - Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
+    that blocks the process if the websocket disconnects due to an error
+(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
+
+- Fixed a bug in `STTMuteFilter` where the user was not always muted during
+  function calls, especially when there were multiple simultaneous calls.
+(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
+
+- Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate
+  memory" error when processing silence audio frames.
+(PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))
+
+- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
+  - Fixed `SpeechmaticsSTTService` to listen for
+    `VADUserStoppedSpeakingFrame` in order to finalize transcription.
+  - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
+    detection.
+  - Only emit VAD + interruption frames if VAD is enabled within the plugin
+    (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
+(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
+
+- Fixed an issue with function calling where a handler failing to invoke its
+  result callback could leave the context stuck in IN_PROGRESS, causing LLM
+  inference for subsequent function call results to block while waiting on the
+  unresolved call.
+(PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343))
+
+- Fixed an issue with DeepgramTTSService where the model would output "Dot"
+  instead of a period in some circumstances.
+(PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345))
+
+- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as
+  `unknown`.
+(PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351))
+
+- Fixed an issue in GeminiLiveLLMService where TranscriptionFrames were
+  occasionally not pushed.
+(PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356))
+
+- Fixed potential memory leaks and initialization issues in `KrispVivaFilter`
+  by improving SDK lifecycle management.
+(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
+
+- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was
+  set after awaiting, allowing the event loop to re-enter the method before the
+  guard was set.
+(PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400))
+
+- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as
+  `unknown`.
+(PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422))
+
+- Fixed an issue in `traced_tts`, `traced_gemini_live`, and
+  `traced_openai_realtime` where `model_name` in OpenTelemetry appears as
+  `unknown`.
+(PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428))
+
+- Fixed `request_image_frame` (for backwards compatibility) and restored
+  function-call–related fields in `UserImageRequestFrame` and
+  `UserImageRawFrame`, preventing a case where adding a non-LLM message to the
+  context could trigger duplicate LLM inferences (on image arrival and on
+  function-call result), potentially causing an infinite inference loop.
+(PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430))
+
+- Fixed `LLMContext.create_audio_message()` by correcting an internal helper
+  that was incorrectly declared async while being run in `asyncio.to_thread()`.
+(PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435))
+
+### Other
+
+- Added `52-live-transcription.py` foundational example demonstrating live
+  transcription and translation from English to Spanish. In this example, the
+  bot is not interruptible: as the user continues speaking, English
+  transcriptions are queued, and the bot continuously translates and speaks
+  each queued sentence in Spanish without being interrupted by new user speech.
+(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
+
+- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows
+  how to use `UserTurnProcessor`.
+(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
+
+- Added a new foundational example `28-user-assistant-turns.py` that shows how
+  to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to
+  gather a conversation transcript.
+(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
+
 ## [0.0.98] - 2025-12-17
 
 ### Added
diff --git a/changelog/3045.added.md b/changelog/3045.added.md
deleted file mode 100644
index 0dda476c7..000000000
--- a/changelog/3045.added.md
+++ /dev/null
@@ -1,42 +0,0 @@
-- Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.
-
-  User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words).
-
-  User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions).
-
-  A list of strategies can be specified for both strategies; strategies are evaluated in order until one evaluates to true.
-
-  Available user turn start strategies:
-  - VADUserTurnStartStrategy
-  - TranscriptionUserTurnStartStrategy
-  - MinWordsUserTurnStartStrategy
-  - ExternalUserTurnStartStrategy
-
-  Available user turn stop strategies:
-  - TranscriptionUserTurnStopStrategy
-  - TurnAnalyzerUserTurnStopStrategy
-  - ExternalUserTurnStopStrategy
-
-  The default strategies are:
-
-  - start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
-  - stop: [TranscriptionUserTurnStopStrategy]
-
-  Turn strategies are configured when setting up `LLMContextAggregatorPair`. For example:
-
-  ```python
-  context_aggregator = LLMContextAggregatorPair(
-      context,
-      user_params=LLMUserAggregatorParams(
-          user_turn_strategies=UserTurnStrategies(
-              stop=[
-                  TurnAnalyzerUserTurnStopStrategy(
-                      turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
-                  )
-              ],
-          )
-      ),
-  )
-  ```
-
-  In order to use the user turn strategies you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
diff --git a/changelog/3045.deprecated.2.md b/changelog/3045.deprecated.2.md
deleted file mode 100644
index 7f03ff54a..000000000
--- a/changelog/3045.deprecated.2.md
+++ /dev/null
@@ -1 +0,0 @@
-- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
diff --git a/changelog/3045.deprecated.3.md b/changelog/3045.deprecated.3.md
deleted file mode 100644
index 594950603..000000000
--- a/changelog/3045.deprecated.3.md
+++ /dev/null
@@ -1 +0,0 @@
-- `FrameProcessor.interruption_strategies` is deprecated, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
diff --git a/changelog/3045.deprecated.4.md b/changelog/3045.deprecated.4.md
deleted file mode 100644
index fda634ce8..000000000
--- a/changelog/3045.deprecated.4.md
+++ /dev/null
@@ -1 +0,0 @@
-- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames are deprecated.
diff --git a/changelog/3045.deprecated.5.md b/changelog/3045.deprecated.5.md
deleted file mode 100644
index 57781a489..000000000
--- a/changelog/3045.deprecated.5.md
+++ /dev/null
@@ -1 +0,0 @@
-- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames.
diff --git a/changelog/3045.deprecated.6.md b/changelog/3045.deprecated.6.md
deleted file mode 100644
index 3bf804220..000000000
--- a/changelog/3045.deprecated.6.md
+++ /dev/null
@@ -1 +0,0 @@
-- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new universal `LLMContext` and `LLMContextAggregatorPair` instead.
diff --git a/changelog/3045.deprecated.md b/changelog/3045.deprecated.md
deleted file mode 100644
index c58f1a4e2..000000000
--- a/changelog/3045.deprecated.md
+++ /dev/null
@@ -1 +0,0 @@
-- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
diff --git a/changelog/3205.added.md b/changelog/3205.added.md
deleted file mode 100644
index dc72a1cf0..000000000
--- a/changelog/3205.added.md
+++ /dev/null
@@ -1 +0,0 @@
-- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural network via pyrnnoise library.
diff --git a/changelog/3216.changed.md b/changelog/3216.changed.md
deleted file mode 100644
index b8d480fe2..000000000
--- a/changelog/3216.changed.md
+++ /dev/null
@@ -1,7 +0,0 @@
-- Updated `ElevenLabsRealtimeSTTService` to accept the `include_language_detection` parameter to detect language.
-  ```python
-  stt = ElevenLabsRealtimeSTTService(
-      api_key=os.getenv("ELEVENLABS_API_KEY"),
-      include_language_detection=True
-  )
-  ```
diff --git a/changelog/3225.changed.md b/changelog/3225.changed.md
deleted file mode 100644
index c5063f58e..000000000
--- a/changelog/3225.changed.md
+++ /dev/null
@@ -1,15 +0,0 @@
-- Updated `SpeechmaticsSTTService` to use new Python Voice SDK with improved VAD,
-  Smart Turn capabilities, and brings dramatic improvements to latency without
-  any impact on accuracy. Use the `turn_detection_mode` parameter to control the
-  endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
-  `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
-  ```python
-  stt = SpeechmaticsSTTService(
-      api_key=os.getenv("SPEECHMATICS_API_KEY"),
-      params=SpeechmaticsSTTService.InputParams(
-          language=Language.EN,
-          turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
-          speaker_active_format="<{speaker_id}>{text}",
-      ),
-  )
-  ```
diff --git a/changelog/3225.deprecated.md b/changelog/3225.deprecated.md
deleted file mode 100644
index 162c82a82..000000000
--- a/changelog/3225.deprecated.md
+++ /dev/null
@@ -1,4 +0,0 @@
-- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is deprecated.
- Use the new `turn_detection_mode` parameter instead, with `TurnDetectionMode.EXTERNAL`, - `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The `enable_vad` - parameter is also deprecated and is inferred from the `turn_detection_mode`. diff --git a/changelog/3233.fixed.md b/changelog/3233.fixed.md deleted file mode 100644 index 3f17fd765..000000000 --- a/changelog/3233.fixed.md +++ /dev/null @@ -1,2 +0,0 @@ -- Improved error handling in `ElevenLabsRealtimeSTTService` -- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop that blocks the process if the websocket disconnects due to an error \ No newline at end of file diff --git a/changelog/3257.changed.2.md b/changelog/3257.changed.2.md deleted file mode 100644 index 333c69746..000000000 --- a/changelog/3257.changed.2.md +++ /dev/null @@ -1 +0,0 @@ -- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by `DailyTransport` now include the transport source (i.e., the originating audio track). diff --git a/changelog/3257.changed.md b/changelog/3257.changed.md deleted file mode 100644 index fda547eef..000000000 --- a/changelog/3257.changed.md +++ /dev/null @@ -1 +0,0 @@ -- `daily-python` updated to 0.23.0. diff --git a/changelog/3263.deprecated.md b/changelog/3263.deprecated.md deleted file mode 100644 index 11f659c2e..000000000 --- a/changelog/3263.deprecated.md +++ /dev/null @@ -1,15 +0,0 @@ -- `OpenAILLMContext` and its associated things (context aggregators, etc.) are now deprecated in favor of the universal `LLMContext` and its associated things. 
- - From the developer's point of view, switching to using `LLMContext` machinery will usually be a matter of going from this: - - ```python - context = OpenAILLMContext(messages, tools) - context_aggregator = llm.create_context_aggregator(context) - ``` - - To this: - - ``` - context = LLMContext(messages, tools) - context_aggregator = LLMContextAggregatorPair(context) - ``` diff --git a/changelog/3267.added.md b/changelog/3267.added.md deleted file mode 100644 index bdeccd6ed..000000000 --- a/changelog/3267.added.md +++ /dev/null @@ -1,8 +0,0 @@ -- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations: - - - Support for real-time audio streaming with WebSocket connection - - Built-in server-side VAD (Voice Activity Detection) - - Multiple voice options: Ara, Rex, Sal, Eve, Leo - - Built-in tools support: web_search, x_search, file_search - - Custom function calling with standard Pipecat tools schema - - Configurable audio formats (PCM at 8kHz-48kHz) diff --git a/changelog/3268.added.md b/changelog/3268.added.md deleted file mode 100644 index 6bbcd038c..000000000 --- a/changelog/3268.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added an approximation of TTFB for Ultravox. diff --git a/changelog/3288.changed.md b/changelog/3288.changed.md deleted file mode 100644 index 52a9694cd..000000000 --- a/changelog/3288.changed.md +++ /dev/null @@ -1,5 +0,0 @@ -- Updates to Inworld TTS services: - - - Improved `InworldTTSService`'s websocket implementation to better flush and - close context to better handle long inputs. - - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`. diff --git a/changelog/3289.added.md b/changelog/3289.added.md deleted file mode 100644 index fb19607eb..000000000 --- a/changelog/3289.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added a new `AudioContextTTSService` to the TTS service base classes. 
The `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`. diff --git a/changelog/3291.added.md b/changelog/3291.added.md deleted file mode 100644 index dacfd4f12..000000000 --- a/changelog/3291.added.md +++ /dev/null @@ -1,4 +0,0 @@ -- `LLMUserAggregator` now exposes the following events: - - `on_user_turn_started`: triggered when a user turn starts - - `on_user_turn_stopped`: triggered when a user turn ends - - `on_user_turn_stop_timeout`: triggered when a user turn does not stop and times out diff --git a/changelog/3292.added.md b/changelog/3292.added.md deleted file mode 100644 index 936d927d8..000000000 --- a/changelog/3292.added.md +++ /dev/null @@ -1,29 +0,0 @@ -- Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state. - - In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations. - - A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted. - - Available user mute strategies: - - * `FirstSpeechUserMuteStrategy` - * `MuteUntilFirstBotCompleteUserMuteStrategy` - * `AlwaysUserMuteStrategy` - * `FunctionCallUserMuteStrategy` - - User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input. - - User mute strategies are configured when setting up the `LLMContextAggregatorPair`. For example: - - ```python - context_aggregator = LLMContextAggregatorPair( - context, - user_params=LLMUserAggregatorParams( - user_mute_strategies=[ - FirstSpeechUserMuteStrategy(), - ] - ), - ) - ``` - - In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`. 
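A note on the entry above: unlike turn strategies, mute strategies are all evaluated for every frame, and a frame is muted if any of them says so. The sketch below illustrates why that differs from a short-circuiting check. `CountingMuteStrategy` and `is_muted` are hypothetical names used only for illustration; they are not part of Pipecat, and the real strategies inspect actual frames.

```python
from typing import List


class CountingMuteStrategy:
    """Hypothetical strategy that mutes the first `n` frames it sees."""

    def __init__(self, n: int) -> None:
        self.remaining = n
        self.seen = 0

    def should_mute(self, frame: str) -> bool:
        self.seen += 1
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


def is_muted(strategies: List[CountingMuteStrategy], frame: str) -> bool:
    # Build the full list first so every strategy sees the frame and can
    # update its state; any() over a generator would stop at the first True.
    decisions = [s.should_mute(frame) for s in strategies]
    return any(decisions)


strategies = [CountingMuteStrategy(1), CountingMuteStrategy(2)]
results = [is_muted(strategies, f) for f in ("a", "b", "c")]
```

Here both strategies see all three frames, so the second one keeps its own countdown accurate even while the first is the one causing the mute.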
diff --git a/changelog/3292.deprecated.md b/changelog/3292.deprecated.md deleted file mode 100644 index 3aceea5f1..000000000 --- a/changelog/3292.deprecated.md +++ /dev/null @@ -1 +0,0 @@ -- `STTMuteFilter` is deprecated and will be removed in a future version. Use `LLMUserAggregator`'s new `user_mute_strategies` instead. diff --git a/changelog/3292.fixed.md b/changelog/3292.fixed.md deleted file mode 100644 index 4d3df66b0..000000000 --- a/changelog/3292.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed a bug in `STTMuteFilter` where the user was not always muted during function calls, especially when there were multiple simultaneous calls. diff --git a/changelog/3297.deprecated.2.md b/changelog/3297.deprecated.2.md deleted file mode 100644 index 49d5723af..000000000 --- a/changelog/3297.deprecated.2.md +++ /dev/null @@ -1 +0,0 @@ -- `FrameProcessor.interruptions_allowed` is now deprecated, use `LLMUserAggregator`'s new parameter `user_mute_strategies` instead. diff --git a/changelog/3297.deprecated.md b/changelog/3297.deprecated.md deleted file mode 100644 index d29a8a020..000000000 --- a/changelog/3297.deprecated.md +++ /dev/null @@ -1,12 +0,0 @@ -- `PipelineParams.allow_interruptions` is now deprecated, use `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For example, to disable interruptions but still get user turns you can do: - - ```python - context_aggregator = LLMContextAggregatorPair( - context, - user_params=LLMUserAggregatorParams( - user_turn_strategies=UserTurnStrategies( - start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)], - ), - ), - ) - ``` diff --git a/changelog/3300.added.md b/changelog/3300.added.md deleted file mode 100644 index 6e0066559..000000000 --- a/changelog/3300.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`. 
\ No newline at end of file diff --git a/changelog/3314.changed.md b/changelog/3314.changed.md deleted file mode 100644 index 3b9b074a8..000000000 --- a/changelog/3314.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Updated `DeepgramSTTService` to push user started/stopped speaking and interruption frames when `vad_enabled` is set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames. diff --git a/changelog/3316.added.md b/changelog/3316.added.md deleted file mode 100644 index d4c76c46a..000000000 --- a/changelog/3316.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` to push or not push an `InterruptionFrame`. diff --git a/changelog/3316.other.md b/changelog/3316.other.md deleted file mode 100644 index 23c1be025..000000000 --- a/changelog/3316.other.md +++ /dev/null @@ -1 +0,0 @@ -- Added `52-live-transcription.py` foundational example demonstrating live transcription and translation from English to Spanish. In this example, the bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech. diff --git a/changelog/3322.fixed.md b/changelog/3322.fixed.md deleted file mode 100644 index 3ad2b75bc..000000000 --- a/changelog/3322.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate memory" error when processing silence audio frames. diff --git a/changelog/3328.added.md b/changelog/3328.added.md deleted file mode 100644 index db793e828..000000000 --- a/changelog/3328.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control sentence splitting behavior for finals on sentence boundaries. 
diff --git a/changelog/3328.fixed.md b/changelog/3328.fixed.md deleted file mode 100644 index 6f09c2386..000000000 --- a/changelog/3328.fixed.md +++ /dev/null @@ -1,4 +0,0 @@ -- Updated `SpeechmaticsSTTService` for version `0.0.99+`: - - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame` in order to finalize transcription. - - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn detection. - - Only emit VAD + interruption frames if VAD is enabled within the plugin (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`). diff --git a/changelog/3329.changed.md b/changelog/3329.changed.md deleted file mode 100644 index 8271decf2..000000000 --- a/changelog/3329.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Added encoding validation to `DeepgramTTSService` to prevent unsupported encodings from reaching the API. The service now raises `ValueError` at initialization with a clear error message. diff --git a/changelog/3334.added.md b/changelog/3334.added.md deleted file mode 100644 index 993bca20e..000000000 --- a/changelog/3334.added.md +++ /dev/null @@ -1,2 +0,0 @@ -- Added word-level timestamp support to `AzureTTSService` for accurate text-to-audio synchronization. - \ No newline at end of file diff --git a/changelog/3336.changed.md b/changelog/3336.changed.md deleted file mode 100644 index 2f6a28b30..000000000 --- a/changelog/3336.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Updated `read_audio_frame` & `read_video_frame` methods in `SmallWebRTCClient` to check if the track is enabled before logging a warning. 
\ No newline at end of file diff --git a/changelog/3343.fixed.md b/changelog/3343.fixed.md deleted file mode 100644 index ee037b304..000000000 --- a/changelog/3343.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed an issue with function calling where a handler failing to invoke its result callback could leave the context stuck in IN_PROGRESS, causing LLM inference for subsequent function call results to block while waiting on the unresolved call. diff --git a/changelog/3345.fixed.md b/changelog/3345.fixed.md deleted file mode 100644 index 61c0a575e..000000000 --- a/changelog/3345.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed an issue with DeepgramTTSService where the model would output "Dot" instead of a period in some circumstances. diff --git a/changelog/3346.added.md b/changelog/3346.added.md deleted file mode 100644 index b72d23b55..000000000 --- a/changelog/3346.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams` and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation dictionary feature for custom pronunciations. diff --git a/changelog/3351.fixed.md b/changelog/3351.fixed.md deleted file mode 100644 index 2792839cb..000000000 --- a/changelog/3351.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as `unknown`. diff --git a/changelog/3356.fixed.md b/changelog/3356.fixed.md deleted file mode 100644 index 0532e742a..000000000 --- a/changelog/3356.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed an issue in GeminiLiveLLMService where TranscriptionFrames were occasionally not pushed. diff --git a/changelog/3357.added.md b/changelog/3357.added.md deleted file mode 100644 index 08a739573..000000000 --- a/changelog/3357.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport` (see https://www.liveavatar.com/). 
diff --git a/changelog/3360.added.md b/changelog/3360.added.md deleted file mode 100644 index c9f22e823..000000000 --- a/changelog/3360.added.md +++ /dev/null @@ -1,8 +0,0 @@ -- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`: - - New `start_video_paused` parameter to control initial video input state - - New `video_frame_detail` parameter to set image processing quality ("auto", - "low", or "high"). This corresponds to OpenAI Realtime's `image_detail` - parameter. - - `set_video_input_paused()` method to pause/resume video input at runtime - - `set_video_frame_detail()` method to adjust video frame quality dynamically - - Automatic rate limiting (1 frame per second) to prevent API overload diff --git a/changelog/3366.changed.md b/changelog/3366.changed.md deleted file mode 100644 index d9a0a4d89..000000000 --- a/changelog/3366.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Updated `CartesiaTTSService` to support setting `language=None`, resulting in Cartesia auto-detecting the language of the conversation. diff --git a/changelog/3367.changed.md b/changelog/3367.changed.md deleted file mode 100644 index 036ff1c62..000000000 --- a/changelog/3367.changed.md +++ /dev/null @@ -1,3 +0,0 @@ -- The bundled Smart Turn weights are now updated to v3.2, which has better - handling of short utterances, and is more robust against background - noise. 
diff --git a/changelog/3371.changed.md b/changelog/3371.changed.md deleted file mode 100644 index eb96711fb..000000000 --- a/changelog/3371.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6` diff --git a/changelog/3372.added.2.md b/changelog/3372.added.2.md deleted file mode 100644 index 6b1bf1ed4..000000000 --- a/changelog/3372.added.2.md +++ /dev/null @@ -1 +0,0 @@ -- Added `UserTurnProcessor`, a frame processor built on `UserTurnController` that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames and interruptions based on the controller's user turn strategies. diff --git a/changelog/3372.added.md b/changelog/3372.added.md deleted file mode 100644 index 3468f7ac1..000000000 --- a/changelog/3372.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `UserTurnController` to manage user turns. It emits `on_user_turn_started`, `on_user_turn_stopped`, and `on_user_turn_stop_timeout` events, and can be integrated into processors to detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are implemented using this controller. diff --git a/changelog/3372.other.md b/changelog/3372.other.md deleted file mode 100644 index d0e96c20b..000000000 --- a/changelog/3372.other.md +++ /dev/null @@ -1 +0,0 @@ -- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows how to use `UserTurnProcessor`. diff --git a/changelog/3374.added.md b/changelog/3374.added.md deleted file mode 100644 index eb04650ef..000000000 --- a/changelog/3374.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `should_interrupt` property to `DeepgramFluxSTTService`, `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the bot should be interrupted when the external service detects user speech. 
\ No newline at end of file diff --git a/changelog/3377.changed.md b/changelog/3377.changed.md deleted file mode 100644 index 8a006bd42..000000000 --- a/changelog/3377.changed.md +++ /dev/null @@ -1,5 +0,0 @@ -- Smart Turn now takes into account `vad_start_seconds` when buffering audio, - meaning that the start of the turn audio is not cut off. This improves - accuracy for short utterances. - -- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn. diff --git a/changelog/3385.added.md b/changelog/3385.added.md deleted file mode 100644 index b79a1d584..000000000 --- a/changelog/3385.added.md +++ /dev/null @@ -1,4 +0,0 @@ -- `LLMAssistantAggregator` now exposes the following events: - - `on_assistant_turn_started`: triggered when the assistant turn starts - - `on_assistant_turn_stopped`: triggered when the assistant turn ends - - `on_assistant_thought`: triggered when there's an assistant thought available diff --git a/changelog/3385.deprecated.md b/changelog/3385.deprecated.md deleted file mode 100644 index e810af08a..000000000 --- a/changelog/3385.deprecated.md +++ /dev/null @@ -1 +0,0 @@ -- `TranscriptProcessor` and related data classes and frames (`TranscriptionMessage`, `ThoughtTranscriptionMessage`, `TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and `LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and `on_assistant_turn_stopped`) instead. diff --git a/changelog/3385.other.md b/changelog/3385.other.md deleted file mode 100644 index 69f612908..000000000 --- a/changelog/3385.other.md +++ /dev/null @@ -1 +0,0 @@ -- Added a new foundational example `28-user-assistant-turns.py` that shows how to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to gather a conversation transcript. 
diff --git a/changelog/3386.deprecated.md b/changelog/3386.deprecated.md deleted file mode 100644 index 3c9c141ce..000000000 --- a/changelog/3386.deprecated.md +++ /dev/null @@ -1 +0,0 @@ -- Deprecated support for the `vad_events` `LiveOptions` in `DeepgramSTTService`. Instead, use a local Silero VAD for VAD events. Additionally, deprecated `should_interrupt` which will be removed along with `vad_events` support in a future release. diff --git a/changelog/3391.added.md b/changelog/3391.added.md deleted file mode 100644 index 7dfaa9a2f..000000000 --- a/changelog/3391.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA SDK (requires `krisp_audio`). diff --git a/changelog/3391.changed.md b/changelog/3391.changed.md deleted file mode 100644 index fb12beac0..000000000 --- a/changelog/3391.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter` to share a single SDK instance within the same process. diff --git a/changelog/3391.fixed.md b/changelog/3391.fixed.md deleted file mode 100644 index 95c14ebd5..000000000 --- a/changelog/3391.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed potential memory leaks and initialization issues in `KrispVivaFilter` by improving SDK lifecycle management. \ No newline at end of file diff --git a/changelog/3397.added.md b/changelog/3397.added.md deleted file mode 100644 index 2f819cc43..000000000 --- a/changelog/3397.added.md +++ /dev/null @@ -1,6 +0,0 @@ -- Added support for setting up a pipeline task from external files. You can now register custom pipeline task setup files by setting the `PIPECAT_SETUP_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a function with the following signature: - - ```python - async def setup_pipeline_task(task: PipelineTask): - ... 
- ``` diff --git a/changelog/3397.deprecated.md b/changelog/3397.deprecated.md deleted file mode 100644 index b9028c5be..000000000 --- a/changelog/3397.deprecated.md +++ /dev/null @@ -1 +0,0 @@ -- Loading external observers from files is deprecated, use the new pipeline task setup files and `PIPECAT_SETUP_FILES` environment variable instead. diff --git a/changelog/3399.changed.md b/changelog/3399.changed.md deleted file mode 100644 index fecf505bc..000000000 --- a/changelog/3399.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english` and voice ID to `autumn`. diff --git a/changelog/3400.fixed.md b/changelog/3400.fixed.md deleted file mode 100644 index aaf881bb7..000000000 --- a/changelog/3400.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was set after awaiting, allowing the event loop to re-enter the method before the guard was set. diff --git a/changelog/3403.added.md b/changelog/3403.added.md deleted file mode 100644 index 6b55ef97d..000000000 --- a/changelog/3403.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added a keepalive task for `InworldTTSService` to keep the service connected in the event of no generations for longer periods of time. diff --git a/changelog/3404.added.2.md b/changelog/3404.added.2.md deleted file mode 100644 index 0f15c39c3..000000000 --- a/changelog/3404.added.2.md +++ /dev/null @@ -1 +0,0 @@ -- Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When enabled, `GladiaSTTService` acts as the turn controller, emitting `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally `InterruptionFrame`. 
diff --git a/changelog/3404.added.md b/changelog/3404.added.md deleted file mode 100644 index 733c22ebe..000000000 --- a/changelog/3404.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `should_interrupt` property to `GladiaSTTService` to configure whether the bot should be interrupted when the external service detects user speech. diff --git a/changelog/3410.added.md b/changelog/3410.added.md deleted file mode 100644 index 094532343..000000000 --- a/changelog/3410.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector WebSocket protocol. diff --git a/changelog/3410.changed.md b/changelog/3410.changed.md deleted file mode 100644 index 3e54436de..000000000 --- a/changelog/3410.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio packetization via the `fixed_audio_packet_size` parameter to support media endpoints requiring strict framing and real-time pacing. diff --git a/changelog/3422.fixed.md b/changelog/3422.fixed.md deleted file mode 100644 index fa34d9262..000000000 --- a/changelog/3422.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as `unknown`. diff --git a/changelog/3424.added.md b/changelog/3424.added.md deleted file mode 100644 index 61cc8ea77..000000000 --- a/changelog/3424.added.md +++ /dev/null @@ -1 +0,0 @@ -- Added `append_trailing_space` parameter to `TTSService` to automatically append a trailing space to text before sending to TTS, helping prevent some services from vocalizing trailing punctuation. diff --git a/changelog/3424.changed.md b/changelog/3424.changed.md deleted file mode 100644 index 2e665ca2d..000000000 --- a/changelog/3424.changed.md +++ /dev/null @@ -1 +0,0 @@ -- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to `True` to prevent punctuation (e.g., “dot”) from being pronounced. 
diff --git a/changelog/3428.fixed.md b/changelog/3428.fixed.md deleted file mode 100644 index e82ff082e..000000000 --- a/changelog/3428.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed an issue in `traced_tts`, `traced_gemini_live`, and `traced_openai_realtime` where `model_name` in OpenTelemetry appears as `unknown`. diff --git a/changelog/3430.fixed.md b/changelog/3430.fixed.md deleted file mode 100644 index 6f689eb78..000000000 --- a/changelog/3430.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed `request_image_frame` (for backwards compatibility) and restored function-call–related fields in `UserImageRequestFrame` and `UserImageRawFrame`, preventing a case where adding a non-LLM message to the context could trigger duplicate LLM inferences (on image arrival and on function-call result), potentially causing an infinite inference loop. diff --git a/changelog/3431.changed.md b/changelog/3431.changed.md deleted file mode 100644 index 0a9164491..000000000 --- a/changelog/3431.changed.md +++ /dev/null @@ -1 +0,0 @@ -- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`, `LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns thought content. diff --git a/changelog/3435.fixed.md b/changelog/3435.fixed.md deleted file mode 100644 index a4482c328..000000000 --- a/changelog/3435.fixed.md +++ /dev/null @@ -1 +0,0 @@ -- Fixed `LLMContext.create_audio_message()` by correcting an internal helper that was incorrectly declared async while being run in `asyncio.to_thread()`. 
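A note on the last fix above (`LLMContext.create_audio_message()`): the class of bug it describes can be reproduced in a few lines. The helper names below are hypothetical, since the actual Pipecat helper is not shown in this patch. The sketch only demonstrates the general behaviour of `asyncio.to_thread()` when it is handed an `async def` function.

```python
import asyncio
import inspect


async def async_helper(x: int) -> int:
    """Hypothetical helper wrongly declared async."""
    return x * 2


def sync_helper(x: int) -> int:
    """The same helper declared as a plain function."""
    return x * 2


async def main():
    # Calling an `async def` in a worker thread only creates a coroutine
    # object; its body never runs, so no result is produced.
    wrong = await asyncio.to_thread(async_helper, 21)
    is_coroutine = inspect.iscoroutine(wrong)
    wrong.close()  # avoid a "coroutine was never awaited" warning

    # A plain function runs in the thread and returns its value.
    right = await asyncio.to_thread(sync_helper, 21)
    return is_coroutine, right


is_coroutine, right = asyncio.run(main())
```

Declaring the helper as a regular function, as the changelog entry describes, is the usual remedy when the work is meant to run in a thread.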
From 7e1b4a4e905c767076c2c18b97ce0562478665f1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Aleix=20Conchillo=20Flaqu=C3=A9?= Date: Tue, 13 Jan 2026 16:59:46 -0800 Subject: [PATCH 2/2] update cosmetic changelog updates for 0.0.99 --- CHANGELOG.md | 218 ++++++++++++++++++++++++--------------------------- 1 file changed, 104 insertions(+), 114 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 290a7802c..3d583b4e1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,13 +15,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies. - User turn start strategies indicate when the user starts speaking (e.g. + User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words). - User turn stop strategies indicate when the user stops speaking (e.g. using + User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions). - A list of strategies can be specified for both strategies; strategies are + A list of strategies can be specified for both strategies; strategies are evaluated in order until one evaluates to true. Available user turn start strategies: @@ -40,7 +40,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy] - stop: [TranscriptionUserTurnStopStrategy] - Turn strategies are configured when setting up `LLMContextAggregatorPair`. + Turn strategies are configured when setting up `LLMContextAggregatorPair`.
For example: ```python @@ -58,13 +58,13 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()) ) ``` - In order to use the user turn strategies you must update to the new + In order to use the user turn strategies you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`. -(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) + (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) - Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural network via pyrnnoise library. -(PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205)) + (PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205)) - Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations: @@ -75,31 +75,31 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()) - Built-in tools support: web_search, x_search, file_search - Custom function calling with standard Pipecat tools schema - Configurable audio formats (PCM at 8kHz-48kHz) -(PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267)) + (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267)) - Added an approximation of TTFB for Ultravox. -(PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268)) + (PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268)) - Added a new `AudioContextTTSService` to the TTS service base classes. The `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`. 
-(PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289)) + (PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289)) - `LLMUserAggregator` now exposes the following events: - `on_user_turn_started`: triggered when a user turn starts - `on_user_turn_stopped`: triggered when a user turn ends - `on_user_turn_stop_timeout`: triggered when a user turn does not stop - and times out -(PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291)) + and times out + (PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291)) - Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state. - In conversational agents, user mute strategies are used to prevent user + In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations. - A list of strategies can be specified; all strategies are evaluated for + A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted. @@ -111,10 +111,10 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()) * `AlwaysUserMuteStrategy` * `FunctionCallUserMuteStrategy` - User mute strategies replace the legacy `STTMuteFilter` and provide a more + User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input. - User mute strategies are configured when setting up the + User mute strategies are configured when setting up the `LLMContextAggregatorPair`. 
For example: ```python @@ -128,35 +128,35 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()) ) ``` - In order to use user mute strategies you should update to the new universal + In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`. -(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292)) + (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292)) - Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`. -(PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300)) + (PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300)) - Added `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` to push or not push an `InterruptionFrame`. -(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316)) + (PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316)) - Added `split_sentences` parameter to `SpeechmaticsSTTService` to control sentence splitting behavior for finals on sentence boundaries. -(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328)) + (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328)) - Added word-level timestamp support to `AzureTTSService` for accurate text-to-audio synchronization. -(PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334)) + (PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334)) - Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams` and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation dictionary feature for custom pronunciations. -(PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346)) + (PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346)) - Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport` (see https://www.liveavatar.com/). 
-(PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357)) + (PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357)) - Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`: - New `start_video_paused` parameter to control initial video input state @@ -166,37 +166,37 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()) parameter. - `set_video_input_paused()` method to pause/resume video input at runtime - `set_video_frame_detail()` method to adjust video frame quality - dynamically + dynamically - Automatic rate limiting (1 frame per second) to prevent API overload -(PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360)) + (PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360)) - Added `UserTurnProcessor`, a frame processor built on `UserTurnController` that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames and interruptions based on the controller's user turn strategies. -(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372)) + (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372)) - Added `UserTurnController` to manage user turns. It emits `on_user_turn_started`, `on_user_turn_stopped`, and `on_user_turn_stop_timeout` events, and can be integrated into processors to detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are implemented using this controller. -(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372)) + (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372)) - Added `should_interrupt` property to `DeepgramFluxSTTService`, `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the bot should be interrupted when the external service detects user speech. 
-(PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374)) + (PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374)) - `LLMAssistantAggregator` now exposes the following events: - `on_assistant_turn_started`: triggered when the assistant turn starts - `on_assistant_turn_stopped`: triggered when the assistant turn ends - `on_assistant_thought`: triggered when there's an assistant thought available -(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385)) + (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385)) - Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA SDK (requires `krisp_audio`). -(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391)) + (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391)) - Added support for setting up a pipeline task from external files. You can now register custom pipeline task setup files by setting the @@ -209,30 +209,30 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()) async def setup_pipeline_task(task: PipelineTask): ... ``` -(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397)) + (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397)) - Added a keepalive task for `InworldTTSService` to keep the service connected in the event of no generations for longer periods of time. -(PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403)) + (PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403)) - Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When enabled, `GladiaSTTService` acts as the turn controller, emitting `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally `InterruptionFrame`. -(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404)) + (PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404)) - Added `should_interrupt` property to `GladiaSTTService` to configure whether the bot should be interrupted when the external service detects user speech. 
-(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404)) + (PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404)) - Added `VonageFrameSerializer` for the Vonage Video API Audio Connector WebSocket protocol. -(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410)) + (PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410)) - Added `append_trailing_space` parameter to `TTSService` to automatically append a trailing space to text before sending to TTS, helping prevent some services from vocalizing trailing punctuation. -(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424)) + (PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424)) ### Changed @@ -244,16 +244,13 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()) include_language_detection=True ) ``` -(PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216)) + (PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216)) - Updated `SpeechmaticsSTTService` to use new Python Voice SDK with improved - VAD, - Smart Turn capabilities, and brings dramatic improvements to latency - without - any impact on accuracy. Use the `turn_detection_mode` parameter to control - the - endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default), - `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. + VAD and Smart Turn capabilities, bringing dramatic improvements to latency + without any impact on accuracy. Use the `turn_detection_mode` parameter to control + the endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default), + `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. ```python stt = SpeechmaticsSTTService( api_key=os.getenv("SPEECHMATICS_API_KEY"), @@ -264,126 +261,119 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE, ), ) ``` -(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225)) + (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225)) - `daily-python` updated to 0.23.0. 
-(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257)) + (PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257)) - `TranscriptionFrame` and `InterimTranscriptionFrame` produced by `DailyTransport` now include the transport source (i.e., the originating audio track). -(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257)) + (PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257)) - Updates to Inworld TTS services: - Improved `InworldTTSService`'s websocket implementation to better flush - and - close context to better handle long inputs. + and close context to better handle long inputs. - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`. -(PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288)) + (PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288)) - Updated `DeepgramSTTService` to push user started/stopped speaking and interruption frames when `vad_enabled` is set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames. -(PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314)) + (PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314)) - Added encoding validation to `DeepgramTTSService` to prevent unsupported encodings from reaching the API. The service now raises `ValueError` at initialization with a clear error message. -(PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329)) + (PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329)) - Updated `read_audio_frame` & `read_video_frame` methods in `SmallWebRTCClient` to check if the track is enabled before logging a warning. -(PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336)) + (PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336)) - Updated `CartesiaTTSService` to support setting `language=None`, resulting in Cartesia auto-detecting the language of the conversation. 
-(PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366)) + (PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366)) - The bundled Smart Turn weights are now updated to v3.2, which has better - handling of short utterances, and is more robust against background - noise. -(PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367)) + handling of short utterances, and is more robust against background noise. + (PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367)) -- Updated `SpeechmaticsSTTService` dependency to - `speechmatics-voice[smart]>=0.2.6` -(PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371)) +- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6` + (PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371)) - Smart Turn now takes into account `vad_start_seconds` when buffering audio, - meaning that the start of the turn audio is not cut off. This improves - accuracy for short utterances. - - - The default value of `pre_speech_ms` is now set to 500ms for Smart Turn. -(PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377)) + meaning that the start of the turn audio is not cut off. This improves + accuracy for short utterances. + - The default value of `pre_speech_ms` is now set to 500ms for Smart Turn. + (PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377)) - Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter` to share a single SDK instance within the same process. -(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391)) + (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391)) - Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english` and voice ID to `autumn`. 
-(PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399)) + (PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399)) - Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio packetization via the `fixed_audio_packet_size` parameter to support media endpoints requiring strict framing and real-time pacing. -(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410)) + (PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410)) - `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to `True` to prevent punctuation (e.g., “dot”) from being pronounced. -(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424)) + (PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424)) - Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`, `LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns thought content. -(PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431)) + (PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431)) ### Deprecated - `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with `LLMUserAggregator`'s new `user_turn_strategies` parameter instead. -(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) + (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) - `FrameProcessor.interruption_strategies` is deprecated, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead. -(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) + (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) - The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new universal `LLMContext` and `LLMContextAggregatorPair` instead. 
-(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) + (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) - Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames. -(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) + (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) - `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames are deprecated. -(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) + (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) - ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead. -(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) + (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045)) - For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is - deprecated. - Use the new `turn_detection_mode` parameter instead, with - `TurnDetectionMode.EXTERNAL`, - `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The - `enable_vad` - parameter is also deprecated and is inferred from the - `turn_detection_mode`. -(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225)) + deprecated. Use the new `turn_detection_mode` parameter instead, with + `TurnDetectionMode.EXTERNAL`, `TurnDetectionMode.ADAPTIVE`, or + `TurnDetectionMode.SMART_TURN`. The `enable_vad` parameter is also + deprecated and is inferred from the `turn_detection_mode`. + (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225)) - `OpenAILLMContext` and its associated things (context aggregators, etc.) are now deprecated in favor of the universal `LLMContext` and its associated things. 
- From the developer's point of view, switching to using `LLMContext` + From the developer's point of view, switching to using `LLMContext` machinery will usually be a matter of going from this: ```python @@ -397,15 +387,15 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE, context = LLMContext(messages, tools) context_aggregator = LLMContextAggregatorPair(context) ``` -(PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263)) + (PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263)) - `STTMuteFilter` is deprecated and will be removed in a future version. Use `LLMUserAggregator`'s new `user_mute_strategies` instead. -(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292)) + (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292)) - `FrameProcessor.interruptions_allowed` is now deprecated, use `LLMUserAggregator`'s new parameter `user_mute_strategies` instead. -(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297)) + (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297)) - `PipelineParams.allow_interruptions` is now deprecated, use `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For @@ -421,95 +411,95 @@ start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)], ), ) ``` -(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297)) + (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297)) - `TranscriptProcessor` and related data classes and frames (`TranscriptionMessage`, `ThoughtTranscriptionMessage`, `TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and `LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and `on_assistant_turn_stopped`) instead. -(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385)) + (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385)) - Deprecated support for the `vad_events` `LiveOptions` in `DeepgramSTTService`. Instead, use a local Silero VAD for VAD events. 
Additionally, deprecated `should_interrupt` which will be removed along with `vad_events` support in a future release. -(PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386)) + (PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386)) - Loading external observers from files is deprecated, use the new pipeline task setup files and `PIPECAT_SETUP_FILES` environment variable instead. -(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397)) + (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397)) ### Fixed - Improved error handling in `ElevenLabsRealtimeSTTService` - Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop that blocks the process if the websocket disconnects due to an error -(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233)) + (PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233)) - Fixed a bug in `STTMuteFilter` where the user was not always muted during function calls, especially when there were multiple simultaneous calls. -(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292)) + (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292)) - Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate memory" error when processing silence audio frames. -(PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322)) + (PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322)) - Updated `SpeechmaticsSTTService` for version `0.0.99+`: - - Fixed `SpeechmaticsSTTService` to listen for - `VADUserStoppedSpeakingFrame` in order to finalize transcription. + - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame` + in order to finalize transcription. - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn - detection. + detection. - Only emit VAD + interruption frames if VAD is enabled within the plugin - (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`). 
-(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328)) + (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`). + (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328)) - Fixed an issue with function calling where a handler failing to invoke its result callback could leave the context stuck in IN_PROGRESS, causing LLM inference for subsequent function call results to block while waiting on the unresolved call. -(PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343)) + (PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343)) - Fixed an issue with DeepgramTTSService where the model would output "Dot" instead of a period in some circumstances. -(PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345)) + (PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345)) - Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as `unknown`. -(PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351)) + (PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351)) - Fixed an issue in GeminiLiveLLMService where TranscriptionFrames were occasionally not pushed. -(PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356)) + (PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356)) - Fixed potential memory leaks and initialization issues in `KrispVivaFilter` by improving SDK lifecycle management. -(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391)) + (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391)) - Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was set after awaiting, allowing the event loop to re-enter the method before the guard was set. -(PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400)) + (PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400)) - Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as `unknown`. 
-(PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422)) + (PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422)) - Fixed an issue in `traced_tts`, `traced_gemini_live`, and `traced_openai_realtime` where `model_name` in OpenTelemetry appears as `unknown`. -(PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428)) + (PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428)) - Fixed `request_image_frame` (for backwards compatibility) and restored function-call–related fields in `UserImageRequestFrame` and `UserImageRawFrame`, preventing a case where adding a non-LLM message to the context could trigger duplicate LLM inferences (on image arrival and on function-call result), potentially causing an infinite inference loop. -(PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430)) + (PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430)) - Fixed `LLMContext.create_audio_message()` by correcting an internal helper that was incorrectly declared async while being run in `asyncio.to_thread()`. -(PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435)) + (PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435)) ### Other @@ -518,16 +508,16 @@ start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)], bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech. -(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316)) + (PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316)) - Added a new foundational example `53-concurrent-llm-evaluation.py` that shows how to use `UserTurnProcessor`. 
-(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372)) + (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372)) - Added a new foundational example `28-user-assistant-turns.py` that shows how to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to gather a conversation transcript. -(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385)) + (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385)) ## [0.0.98] - 2025-12-17