# Changelog

All notable changes to **Pipecat** will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

<!-- towncrier release notes start -->

## [1.2.1] - 2026-05-15

### Changed

- Changed the default WebSocket endpoints for `GradiumSTTService` and
  `GradiumTTSService` to the region-neutral
  `wss://api.gradium.ai/api/speech/asr` and
  `wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
  traffic to the nearest endpoint. Override the URL to pin a specific
  region.
  (PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))

### Fixed

- Fixed bot hangs when `filter_incomplete_user_turns` was enabled and the LLM
  responded by calling a tool. The user turn never finalized, so the assistant
  aggregator gated the tool-result context push and the LLM continuation never
  ran. Tool calls now finalize the turn the moment they start, before the
  function dispatches.
  (PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))

## [1.2.0] - 2026-05-14

### Added

- Added a `session_id` field to `RunnerArguments` so bots can log or trace a
  per-session identifier in local development the same way they can in Pipecat
  Cloud. The development runner now mints a UUID at every construction site,
  and paths that already returned a `sessionId` to the caller (Daily `/start`,
  dial-in webhook) share that same UUID with the runner args instead of
  generating two. The SmallWebRTC `/api/offer` endpoint also accepts an
  optional `session_id` query parameter so the `/sessions/{session_id}/...`
  proxy can thread it through.
  (PR [#4385](https://github.com/pipecat-ai/pipecat/pull/4385))

- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService`
  for controlling Cartesia's server-side text buffering. When unset, Pipecat
  picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE`
  mode (custom buffering — avoids stacking client-side aggregation on top of
  Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode
  (Cartesia's managed buffering applies). Pass an explicit value (0–5000ms) to
  override.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and
  `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
  Improvement Program. When set, the value is forwarded to Deepgram as a query
  parameter on the speak request. Defaults to `None`, which preserves the
  existing behavior. See https://dpgr.am/deepgram-mip for pricing implications
  before enabling.
  (PR [#4400](https://github.com/pipecat-ai/pipecat/pull/4400))

- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set
  via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that
  appends a developer-role message to the context whenever `LLMSetToolsFrame`
  changes the set of advertised standard tools. Helps the LLM stay coherent
  across mid-conversation tool changes, mitigating several flavors of
  tool-call-related hallucination: calling tools that have been removed,
  avoiding tools that have been re-added, and hallucinating output (made-up
  answers or tool-call-shaped non-tool-calls) when tools are unavailable.
  (PR [#4404](https://github.com/pipecat-ai/pipecat/pull/4404))

- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in
  `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the
  inference-triggered event and suppresses `on_user_turn_stopped`, leaving
  finalization to another strategy in the chain such as
  `LLMTurnCompletionUserTurnStopStrategy`.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop` —
  a generic stop strategy that finalizes the user turn whenever a
  `UserTurnInferenceCompletedFrame` arrives, regardless of which component
  produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base;
  future producers (Flux, custom end-of-turn classifiers, etc.) can use the
  base directly or subclass it to add producer-specific setup.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `on_user_turn_inference_triggered`, a new event on the user turn
  controller, processor, aggregator and stop strategies that fires when a
  strategy has enough signal to start LLM inference. By default it fires
  together with `on_user_turn_stopped`; a gating strategy can fire only the
  inference-triggered event and defer finalization to a peer.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `FilterIncompleteUserTurnStrategies` in
  `pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization
  that wraps the detector chain with `deferred(...)` and appends
  `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case:
  `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass
  `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`.
  When installed, the strategy gates `on_user_turn_stopped` on a
  `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by
  any component that can judge turn completeness — e.g. the
  `UserTurnCompletionLLMServiceMixin` when the LLM marks the turn complete
  with `✓`). A `finalization_timeout` provides a safety net if no completion
  frame ever arrives.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added first-class RTVI support for the UI Agent Protocol:
  - Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server
    messages, plus `ui-command` and `ui-task` server-to-client messages, with
    paired `*Data` / `*Message` pydantic models.
  - Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`,
    `Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching
    default handlers live in `@pipecat-ai/client-react`.
  - Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`,
    and `ui-cancel-task` messages.
  - Adds five UI pipeline frames, mirroring the `client-message`
    frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` /
    `RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` /
    `UITaskMessage` envelopes, while the processor pushes inbound
    `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame`
    alongside `on_ui_message`.
  - Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.
  (PR [#4407](https://github.com/pipecat-ai/pipecat/pull/4407))

- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore
  processor now resolve credentials via the standard boto3 provider chain (EC2
  instance profiles, EKS pod roles / IRSA, ECS task roles, SSO,
  `~/.aws/credentials`) when explicit credentials and `AWS_*` environment
  variables are absent. Services running with IAM roles no longer need to
  export static credentials.
  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))

- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can
  bias transcription for both file-based and realtime transcription.
  (PR [#4426](https://github.com/pipecat-ai/pipecat/pull/4426))

- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and
  `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum
  silence duration before the watchdog sends a silence packet to prevent
  dangling turns. The actual threshold is
  `max(chunk_duration * 2, watchdog_min_timeout)`, so it also adapts
  automatically to the audio chunk size in use.
  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))

- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on
  models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini
  2.x); the conversation now continues while the tool runs. On models that
  don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time
  warning explaining the limitation. (Note: an intermittent 1008 error can
  occasionally fire on Gemini 2.5 during long-running tool calls; we
  auto-reconnect.)
  (PR [#4448](https://github.com/pipecat-ai/pipecat/pull/4448))

- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition
  using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint.
  Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is
  VAD-aware, and automatically reconnects on error.
  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))

- Added NVIDIA Magpie TTS services via AWS SageMaker:
  `NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM
  back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream
  with full interruption support via `InterruptibleTTSService`).
  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))

- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`,
  for use with reasoning-capable Realtime models such as `gpt-realtime-2`.
  (PR [#4470](https://github.com/pipecat-ai/pipecat/pull/4470))

- Inworld TTS updates:
  - Added `delivery_mode` setting (`STABLE`/`BALANCED`/`CREATIVE`) to
    `InworldTTSService` and `InworldHttpTTSService`, enabling the
    stability-vs-creativity tradeoff in `inworld-tts-2`.
  - Added language support to `InworldTTSService` and
    `InworldHttpTTSService`. The `language` setting is now forwarded to the API,
    and a new `language_to_inworld_language()` helper normalizes Pipecat
    `Language` enums to Inworld's BCP-47 locale tags.
  (PR [#4473](https://github.com/pipecat-ai/pipecat/pull/4473))

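The `max_buffer_delay_ms` defaulting rule from the `CartesiaTTSService` entry above can be sketched as a small function. This is an illustration, not Pipecat's code: `AggregationMode` and `resolve_max_buffer_delay_ms` are hypothetical names, and returning `None` is assumed to mean "leave the field out of the request".

```python
from enum import Enum
from typing import Optional


class AggregationMode(Enum):
    """Hypothetical stand-in for the two text aggregation modes."""

    SENTENCE = "sentence"
    TOKEN = "token"


def resolve_max_buffer_delay_ms(
    mode: AggregationMode, max_buffer_delay_ms: Optional[int] = None
) -> Optional[int]:
    """Return the buffer delay to send to Cartesia, or None to omit it."""
    if max_buffer_delay_ms is not None:
        # An explicit constructor value always overrides the default.
        return max_buffer_delay_ms
    if mode is AggregationMode.SENTENCE:
        # Text is already aggregated into sentences client-side, so turn off
        # the server-side buffer rather than stacking a second one on top.
        return 0
    # TOKEN mode: omit the field so Cartesia's managed buffering applies.
    return None


print(resolve_max_buffer_delay_ms(AggregationMode.SENTENCE))  # 0
print(resolve_max_buffer_delay_ms(AggregationMode.TOKEN))  # None
print(resolve_max_buffer_delay_ms(AggregationMode.TOKEN, 250))  # 250
```
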
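The watchdog threshold from the `DeepgramFluxSTT` entry above, worked through as a function. `watchdog_threshold` is a hypothetical helper name, and durations are assumed to be in seconds.

```python
def watchdog_threshold(chunk_duration: float, watchdog_min_timeout: float = 0.5) -> float:
    """Seconds without audio before the watchdog sends a silence packet."""
    # Allow two chunk periods so a slow cadence of large chunks is not
    # mistaken for silence, but never go below the configured floor.
    return max(chunk_duration * 2, watchdog_min_timeout)


# 20 ms chunks: 2 * 0.02 = 0.04 s is below the floor, so 0.5 s applies.
print(watchdog_threshold(0.02))  # 0.5
# 500 ms chunks: the threshold grows to 2 * 0.5 = 1.0 s.
print(watchdog_threshold(0.5))  # 1.0
```
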
### Changed

- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the
  generally available `tts-rt-v1`.
  (PR [#4386](https://github.com/pipecat-ai/pipecat/pull/4386))

- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16`
  to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the
  `use_normalized_timestamps` and `max_buffer_delay_ms` fields.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead
  of the deprecated `use_original_timestamps` field. Word timestamps now
  reflect what was actually spoken (post text-normalization and
  pronunciation-dictionary substitution), matching the convention Pipecat uses
  for ElevenLabs. This is a behavior change for `sonic-3` users, who were
  previously receiving timestamps tied to the input transcript.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Broadened `tool_resources` to `app_resources` for easy access not just in
  tool handlers but in other places like custom `FrameProcessor`s. Three
  changes: a rename (`tool_resources` → `app_resources`), a new `app_resources`
  property on `PipelineTask`, and a new `pipeline_task` property on
  `FrameProcessor`. Tool handlers now read `params.app_resources`; custom
  processors read `self.pipeline_task.app_resources`. The previous
  `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and
  `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit
  `DeprecationWarning`s.
  (PR [#4395](https://github.com/pipecat-ai/pipecat/pull/4395))

- Lowered the per-message log in
  `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App
  messages can be high-frequency and were noisy at debug level; set the loguru
  level to `TRACE` to see them again.
  (PR [#4397](https://github.com/pipecat-ai/pipecat/pull/4397))

- Changed the default model for `GrokRealtimeLLMService` to
  `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The
  previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is
  being removed.
  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))

- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to
  `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`,
  `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing
  users can pin the prior model explicitly via the `model`/`tts_model`
  argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid
  model IDs.
  (PR [#4422](https://github.com/pipecat-ai/pipecat/pull/4422))

- Changed the default model for `GrokLLMService` from `grok-3` to
  `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
  (PR [#4429](https://github.com/pipecat-ai/pipecat/pull/4429))

- `DeepgramFluxSTT` watchdog silence threshold is now dynamic:
  `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms.
  This prevents false silence injections when large audio chunks are sent at
  lower frequency.
  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))

- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the
  turn is complete (on `on_turn_context_completed`) rather than waiting until
  all audio has finished playing back. The `isFinal` message from ElevenLabs is
  now used to signal `TTSStoppedFrame` and clean up the audio context,
  improving turn transition timing.
  (PR [#4433](https://github.com/pipecat-ai/pipecat/pull/4433))

- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio
  encoding by default, which returns audio bytes without headers.
  (PR [#4446](https://github.com/pipecat-ai/pipecat/pull/4446))

- Moved `create_task`, `cancel_task`, the `task_manager` property, and
  `setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom
  `BaseObject` subclasses (turn strategies, controllers, etc.) now inherit
  these methods directly instead of reimplementing the task manager wiring.
  Owners propagate the task manager to their child `BaseObject`s via
  `await child.setup(task_manager)`.
  (PR [#4449](https://github.com/pipecat-ai/pipecat/pull/4449))

- Changed the default OpenAI Realtime input audio transcription model from
  `gpt-4o-transcribe` to `gpt-realtime-whisper` for both
  `OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does
  not accept the `prompt` parameter; if a prompt is supplied alongside
  `gpt-realtime-whisper`, it is dropped automatically and a warning is logged.
  To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or
  `"gpt-4o-mini-transcribe"`).
  (PR [#4450](https://github.com/pipecat-ai/pipecat/pull/4450))

- Updated the default model for `CartesiaTTSService` and
  `CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.
  (PR [#4462](https://github.com/pipecat-ai/pipecat/pull/4462))

- Changed the default model for `OpenAIRealtimeLLMService` from
  `gpt-realtime-1.5` to `gpt-realtime-2`.
  (PR [#4472](https://github.com/pipecat-ai/pipecat/pull/4472))

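The prompt-handling rule from the OpenAI Realtime transcription entry above, sketched as a plain function that builds transcription settings. The dict shape (`model` plus an optional `prompt`), the function name, and the warning text are assumptions for illustration, not Pipecat's code.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)

# Per the entry above, this model does not accept a prompt.
PROMPTLESS_MODELS = {"gpt-realtime-whisper"}


def build_transcription_settings(
    model: str = "gpt-realtime-whisper", prompt: Optional[str] = None
) -> Dict[str, str]:
    """Build input-audio transcription settings, dropping unsupported prompts."""
    settings = {"model": model}
    if prompt:
        if model in PROMPTLESS_MODELS:
            # Drop the prompt rather than sending a field the model rejects.
            logger.warning("Model %s does not accept a prompt; ignoring it", model)
        else:
            settings["prompt"] = prompt
    return settings


print(build_transcription_settings(prompt="Pipecat"))  # {'model': 'gpt-realtime-whisper'}
```

Pinning `model="gpt-4o-transcribe"` is what keeps the prompt in the settings under this rule.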
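The `tool_resources` to `app_resources` rename above follows a common Python deprecation pattern: the new property owns the value, and the old name forwards to it while emitting a `DeprecationWarning`. `ExampleTask` is a hypothetical class, not `PipelineTask`; this only sketches the pattern.

```python
import warnings
from typing import Any, Dict, Optional


class ExampleTask:
    """Hypothetical object that renamed tool_resources to app_resources."""

    def __init__(self, app_resources: Optional[Dict[str, Any]] = None):
        self._app_resources: Dict[str, Any] = app_resources or {}

    @property
    def app_resources(self) -> Dict[str, Any]:
        return self._app_resources

    @property
    def tool_resources(self) -> Dict[str, Any]:
        # Deprecated alias: returns the same object as app_resources, and the
        # warning points at the caller's line (stacklevel=2).
        warnings.warn(
            "tool_resources is deprecated; use app_resources instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._app_resources
```

Because both names return the same dict, code migrated to `app_resources` and code still on the old alias see identical state during the deprecation window.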
### Deprecated

- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use
  `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add
  `LLMTurnCompletionUserTurnStopStrategy` to a custom
  `user_turn_strategies.stop`) instead. Setting the legacy flag still works for
  one release: the aggregator emits a `DeprecationWarning` and rewires the
  strategies as if you had passed `FilterIncompleteUserTurnStrategies`
  directly.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the
  `create_file_resampler()` / `create_stream_resampler()` factories).
  Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class
  will be removed in Pipecat 2.0 along with the default `resampy` and `numba`
  dependencies.
  (PR [#4428](https://github.com/pipecat-ai/pipecat/pull/4428))

### Fixed

- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as
  `ErrorFrame`s. The latest API emits a `flush_done` per transcript when
  server-side buffering is disabled; Pipecat now consumes them silently since
  each turn already has its own `context_id`.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`,
  `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance
  (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both
  the class and an instance.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200
  response — one with the API's error text and a second, less informative
  "Unknown error" frame from the outer exception handler. It now pushes a
  single frame that includes the HTTP status code and returns cleanly.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally
  for user turn stop strategies. It is now only imported when
  `default_user_turn_stop_strategies()` is called. This improves startup time
  and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning
  when the default stop strategies are not used.
  (PR [#4393](https://github.com/pipecat-ai/pipecat/pull/4393))

- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was
  stored in `Settings` but never sent to xAI, so every session silently fell
  back to xAI's server-side default. The model is now passed via the `?model=`
  query parameter on the WebSocket URL as xAI's Voice Agent API requires.
  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))

- Fixed `on_user_turn_stopped` firing prematurely when
  `filter_incomplete_user_turns` was enabled. The event now fires only after
  the LLM confirms the user turn is complete (`✓`); previously the smart-turn
  detector's tentative stop was bubbling up before the LLM had a chance to veto
  it, causing observers, transcript appenders and UI indicators to receive an
  early — and sometimes duplicated — signal.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting
  across two assistant messages in the LLM context and not surfacing in
  `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted
  at the end of a TTS context now carries a PTS just past the last word so it
  can't overtake clock-queued `TTSTextFrame`s in the transport's output, and
  `LLMAssistantAggregator` now triggers
  `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the
  frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting
  transcripts).
  (PR [#4414](https://github.com/pipecat-ai/pipecat/pull/4414))

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged
  words (e.g. `bookLook`) when using Flash models. Flash often splits sentences
  mid-stream into alignment chunks that begin with a real inter-word space, but
  the previous fix unconditionally stripped that space from every chunk.
  Leading spaces are now stripped only on the first alignment chunk of an
  utterance, so subsequent chunks correctly flush partial words across
  boundaries.
  (PR [#4415](https://github.com/pipecat-ai/pipecat/pull/4415))

- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor
  erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
  was set in the environment. The half-populated kwargs are no longer forwarded
  to aioboto3; partial env-var configurations now fall through to the boto3
  credential chain like fully-unset configurations do.
  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing
  romanized/normalized text to the LLM context. With non-Latin input (e.g.,
  Chinese), the assistant transcript was getting populated with pinyin
  (`Ni Hao !` instead of `你好!`), which then degraded subsequent LLM turns. The
  services now consume `alignment` by default and only switch to
  `normalizedAlignment` / `normalized_alignment` when
  `pronunciation_dictionary_locators` is configured (where `alignment` has
  overlapping restarts that produce duplicated/garbled words, per #4316). Both
  fields are read with preferred-with-fallback semantics since each is
  nullable per the API schema.
  (PR [#4424](https://github.com/pipecat-ai/pipecat/pull/4424))

- Fixed a deadlock in `TTSService` that could permanently stall pipeline
  processing when all three conditions occurred together:
  `pause_frame_processing=True`, an interruption arrived before any TTS audio
  was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`,
  `FunctionCallResultFrame`) was in the processing queue at that moment. The
  process task would block on `__process_event.wait()` indefinitely because
  `BotStoppedSpeakingFrame` never arrives (no audio was played) and the
  interruption handler did not resume processing. Affects services using
  `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and
  ResembleAI.
  (PR [#4431](https://github.com/pipecat-ai/pipecat/pull/4431))

- Fixed interruptions being delayed when a slow interruptible frame was being
  processed and an uninterruptible frame was waiting in the queue. The bot
  would stall until the slow frame finished instead of cancelling it
  immediately on interruption.
  (PR [#4434](https://github.com/pipecat-ai/pipecat/pull/4434))

- Fixed `TTSService` dropping uninterruptible frames (e.g.
  `FunctionCallResultFrame`) from its internal serialization queue when an
  interruption occurs. Previously, the queue was recreated on every
  interruption, silently discarding any queued frames. The queue is now reset
  instead of recreated, preserving uninterruptible frames so they are always
  delivered downstream.
  (PR [#4435](https://github.com/pipecat-ai/pipecat/pull/4435))

- Fixed a race condition in the Daily transport that caused `AttributeError:
  'NoneType' object has no attribute 'send_app_message'` when tearing down a
  pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the
  same `DailyTransportClient` and both call `cleanup()`, which was releasing
  the underlying `CallClient` on the first call — leaving the second caller
  with a `None` client.
  (PR [#4440](https://github.com/pipecat-ai/pipecat/pull/4440))

- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService`
  and `OpenAIRealtimeLLMService`. These services previously honored the flag by
  simply not cancelling in-flight function calls on interruption; the
  introduction of the new async-tool mechanism (which threads
  started/intermediate/final messages through the LLM context) broke that path
  because the realtime services didn't know how to interpret those messages.
  Note that new-style streamed intermediate results
  (`FunctionCallResultProperties(is_final=False)`) are not supported on these
  realtime services. Similar fixes for other impacted realtime services are
  forthcoming.
  (PR [#4441](https://github.com/pipecat-ai/pipecat/pull/4441))

- Fixed two misspelled Gemini TTS voice names in
  `GeminiTTSService.AVAILABLE_VOICES`.
  (PR [#4443](https://github.com/pipecat-ai/pipecat/pull/4443))

- Extended the `cancel_on_interruption=False` regression fix to
  `GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and
  `UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in
  #4441 (each service detects async-tool messages in the LLM context and routes
  the final result to its formal tool-result channel; Azure inherits
  transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different
  approach because its API freezes the conversation between
  `client_tool_invocation` and the matching `client_tool_result` — for
  async-registered functions it now ships a placeholder `client_tool_result`
  immediately when the function is invoked (to unfreeze the conversation), then
  injects the real result as user-side text once the tool finishes. Streamed
  intermediate results (`FunctionCallResultProperties(is_final=False)`) are
  still not supported on any of these realtime services. `GeminiLiveLLMService`
  and `InworldRealtimeLLMService` are excluded for now: Gemini Live's
  async-tool path needs deeper investigation, and Inworld tool calling needs to
  be sorted out first.
  (PR [#4447](https://github.com/pipecat-ai/pipecat/pull/4447))

- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses
  (observed with `gpt-realtime-2`). A single response can now contain more than
  one audio item, and the first item's `audio.done` may arrive after the second
  item's deltas have started. Deltas still arrive strictly in playback order,
  so we continue to forward them as received (matching OpenAI's reference
  implementation). The fix removes spurious warnings, ensures truncation always
  targets the latest audio item, and emits a single bracketing
  `TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the
  `TTSStoppedFrame` is now pushed on `response.done`).
  (PR [#4465](https://github.com/pipecat-ai/pipecat/pull/4465))

- Fixed missing `output` attribute on LLM OpenTelemetry spans when the LLM call
  is interrupted mid-stream.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))

- Fixed incorrect `metrics.ttfb` on STT OpenTelemetry spans, and parented them
  to the current turn span.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))

- Fixed incorrect `metrics.ttfb` on TTS OpenTelemetry spans for streaming
  services.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))

- Extended the `cancel_on_interruption=False` regression fix to
  `InworldRealtimeLLMService`. Uses the same approach as in #4441 (the service
  detects async-tool messages in the LLM context and routes the final result to
  its formal tool-result channel). Note: as of this writing, Inworld Realtime
  doesn't appear to handle the resulting delayed tool result reliably — the
  routing is best-effort and the service surfaces a one-time warning when
  async-tool messages are seen. Streamed intermediate results
  (`FunctionCallResultProperties(is_final=False)`) are still not supported on
  this realtime service. (Inworld was excluded from #4447 pending resolution of
  an unrelated tool-calling issue, which turned out to be an account-level
  matter.)
  (PR [#4474](https://github.com/pipecat-ai/pipecat/pull/4474))

- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules,
  preserving word boundaries and per-word timestamp alignment during downstream
  aggregation.
  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))

- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve
  provider text spacing, avoiding artificial spaces when timestamp groups are
  reassembled downstream.
  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))

- Fixed `SonioxSTTService` final transcription frames missing detected language
  metadata when Soniox returns token-level language annotations.
  (PR [#4482](https://github.com/pipecat-ai/pipecat/pull/4482))

- Fixed Soniox final transcription language detection to use the most common
  recognized token language, avoiding mislabeling an utterance when the last
  token is tagged with a different language.
  (PR [#4495](https://github.com/pipecat-ai/pipecat/pull/4495))

- Fixed dropped audio in streaming TTS services whose wire protocol doesn't
  echo `context_id` back on incoming audio (Sarvam, Smallest, Soniox, Inworld,
  and others). Previously, audio that arrived between contexts or at the very
  start of a turn was tagged with `context_id=None` and silently dropped with
  an "unable to append audio to context: no context ID provided" debug log.
  `TTSService.get_active_audio_context_id()` now falls back to the
  synthesis-side `_turn_context_id` when the playback cursor isn't set yet.
  (PR [#4497](https://github.com/pipecat-ai/pipecat/pull/4497))

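The ElevenLabs Flash merged-word fix above can be made concrete with a small sketch that rebuilds text from alignment chunks. It assumes each chunk is a plain string and that only the utterance's first chunk may carry a spurious leading space; `join_alignment_chunks` and `join_alignment_chunks_old` are hypothetical helpers, not the services' code.

```python
from typing import List


def join_alignment_chunks(chunks: List[str]) -> str:
    """Rebuild utterance text, stripping a leading space only from chunk 0."""
    parts = []
    for index, chunk in enumerate(chunks):
        # Later chunks may start with a real inter-word space (Flash splits
        # sentences mid-stream), so only the first chunk is stripped.
        parts.append(chunk.lstrip() if index == 0 else chunk)
    return "".join(parts)


def join_alignment_chunks_old(chunks: List[str]) -> str:
    """Previous behavior: strip every chunk, which merges words."""
    return "".join(chunk.lstrip() for chunk in chunks)


chunks = [" I can book", " Look at this"]
print(join_alignment_chunks_old(chunks))  # I can bookLook at this
print(join_alignment_chunks(chunks))  # I can book Look at this
```
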
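The AWS credential fix above comes down to one rule: forward explicit keys only when both halves are present, and otherwise pass nothing so boto3's default provider chain resolves credentials. A minimal sketch under that reading; `explicit_credential_kwargs` is a hypothetical helper, and the `env` parameter exists only so the rule can be exercised without touching real environment variables.

```python
import os
from typing import Dict, Mapping, Optional


def explicit_credential_kwargs(
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return client kwargs for explicit credentials, or {} to defer to boto3."""
    key_id = aws_access_key_id or env.get("AWS_ACCESS_KEY_ID")
    secret = aws_secret_access_key or env.get("AWS_SECRET_ACCESS_KEY")
    token = aws_session_token or env.get("AWS_SESSION_TOKEN")
    if not (key_id and secret):
        # Unset or half-populated: forward nothing, so the standard chain
        # (instance profile, IRSA, ECS task role, SSO, ~/.aws/credentials) applies.
        return {}
    kwargs = {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
    if token:
        kwargs["aws_session_token"] = token
    return kwargs


print(explicit_credential_kwargs(env={"AWS_ACCESS_KEY_ID": "AKIA"}))  # {}
```

The returned dict can be unpacked into a boto3 or aioboto3 `Session.client(...)` call; an empty dict leaves credential resolution entirely to boto3.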
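The `TTSService` queue fix above (reset instead of recreate) can be illustrated with a toy queue. This is a hypothetical sketch, not Pipecat's implementation: `Frame` and `SerializationQueue` are made-up names, a `deque` stands in for whatever queue the service actually uses, and "reset" is taken to mean dropping interruptible frames in place while keeping uninterruptible ones in their original order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Frame:
    name: str
    uninterruptible: bool = False


class SerializationQueue:
    def __init__(self) -> None:
        self._items: Deque[Frame] = deque()

    def put(self, frame: Frame) -> None:
        self._items.append(frame)

    def reset(self) -> None:
        # On interruption, keep uninterruptible frames (in order) and clear
        # the rest in place, instead of swapping in a brand-new empty queue.
        kept = [f for f in self._items if f.uninterruptible]
        self._items.clear()
        self._items.extend(kept)

    def names(self) -> List[str]:
        return [f.name for f in self._items]


serial_queue = SerializationQueue()
serial_queue.put(Frame("text-1"))
serial_queue.put(Frame("function-call-result", uninterruptible=True))
serial_queue.put(Frame("text-2"))
serial_queue.reset()
print(serial_queue.names())  # ['function-call-result']
```
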
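The Soniox language-detection change above replaces "the last token's language" with a majority vote over token languages. A minimal sketch, assuming each final token is a dict with an optional `language` key; `utterance_language` is a hypothetical helper, not the service's code.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def utterance_language(tokens: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the most common token language, or None if no token is tagged."""
    counts = Counter(tok["language"] for tok in tokens if tok.get("language"))
    if not counts:
        return None
    # Ties go to the language seen first, since most_common() orders equal
    # counts by first occurrence.
    return counts.most_common(1)[0][0]


tokens = [
    {"text": "Hola", "language": "es"},
    {"text": " amigo", "language": "es"},
    {"text": " OK", "language": "en"},
]
# Looking only at the last token would have labeled this "en".
print(utterance_language(tokens))  # es
```
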
### Security

- Fixed a path traversal issue in the development runner's
  `/files/{filename:path}` download endpoint. Previously, when the runner was
  started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could
  escape the configured folder because `%2F`-encoded separators bypassed
  Starlette's path normalisation. The endpoint now resolves the joined path and
  rejects any filename that escapes the allowed base with a 403, and also
  returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
  (PR [#4417](https://github.com/pipecat-ai/pipecat/pull/4417))

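The check described in the Security entry above (resolve the joined path, then reject anything outside the base folder) can be sketched with `pathlib`. `resolve_download` is a hypothetical helper rather than the runner's actual handler. It assumes the `filename` it receives is already URL-decoded, and the extra 404 for a missing file is an assumption not stated in the entry. `Path.is_relative_to` needs Python 3.9 or newer.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_download(base_folder: Optional[str], filename: str) -> Tuple[int, Optional[Path]]:
    """Return an HTTP status and, on success, the file path to serve."""
    if base_folder is None:
        # No --folder configured: there is nothing to serve.
        return 404, None
    base = Path(base_folder).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a decoded
    # "../../etc/passwd" (or an absolute path) ends up outside base.
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        return 403, None
    if not candidate.is_file():
        # Assumption for this sketch: a missing file is also a 404.
        return 404, None
    return 200, candidate
```

Comparing resolved paths, rather than inspecting the raw string for `..`, is what makes the check independent of how the separators were encoded in the request.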
## [1.1.0] - 2026-04-27

### Added

- Added `MistralSTTService` for real-time speech-to-text using Mistral's
  Voxtral Realtime API (`voxtral-mini-transcribe-realtime-2602`). Supports
  streaming transcription with interim results, automatic language detection,
  and VAD-driven utterance lifecycle.
  (PR [#4253](https://github.com/pipecat-ai/pipecat/pull/4253))

- Added `buttons` field to `OutputDTMFFrame` and `OutputDTMFUrgentFrame` for
  sending multi-key DTMF sequences as a `list[KeypadEntry]`. Use
  `OutputDTMFFrame.from_string("123#")` (or the equivalent on
  `OutputDTMFUrgentFrame`) to build one from a dial string, and `to_string()`
  to convert back.
  (PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))

- Added `DailyTransport.send_dtmf()` to expose the Daily call client's DTMF
  sending capability, enabling applications to send tones during a call (e.g.
  IVR navigation).
  (PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))

- Added `DailyOutputDTMFFrame` and `DailyOutputDTMFUrgentFrame` frames. In
  addition to the inherited `buttons`, they accept `session_id`,
  `digit_duration_ms` and `method`, which are forwarded to Daily's `send_dtmf`
  as `sessionId`, `digitDurationMs` and `method`.
  (PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))

- Added incremental `pyright` type checking. A `pyrightconfig.json` at the repo
|
||
root uses `typeCheckingMode: "basic"` with an explicit `include` list of
|
||
modules that pass cleanly (`clocks`, `metrics`, `transcriptions`, `frames`,
|
||
`observers`, `extensions`, `turns`, `pipeline`, `runner`). Remaining modules
|
||
will be added in subsequent PRs. CI enforces the checked set via `uv run
|
||
pyright` in the format workflow.
|
||
(PR [#4324](https://github.com/pipecat-ai/pipecat/pull/4324))
|
||
|
||
- Added multilingual support to `DeepgramFluxSTTService` via a new
|
||
`language_hints: list[Language]` setting. Works with Deepgram's new
|
||
`flux-general-multi` model to bias transcription across English, Spanish,
|
||
French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.
|
||
Omit the hints to use auto-detection, or pass a subset to bias toward
|
||
expected languages. Hints can be updated mid-stream via
|
||
`STTUpdateSettingsFrame` (sent as a Deepgram `Configure` control message, no
|
||
reconnect) to support detect-then-lock flows.
|
||
(PR [#4326](https://github.com/pipecat-ai/pipecat/pull/4326))
|
||
|
||
- Added fine-grained server-side VAD tuning options to
|
||
`SarvamSTTService.Settings` for the `saaras:v3` model, including speech
|
||
thresholds, frame-count controls, pre-speech padding, interruption
|
||
sensitivity, and initial-frame skipping.
|
||
(PR [#4334](https://github.com/pipecat-ai/pipecat/pull/4334))
|
||
|
||
- Added `XAISTTService` for real-time speech-to-text using xAI's voice STT
|
||
WebSocket API (`wss://api.x.ai/v1/stt`). Streams raw audio (PCM, µ-law, or
|
||
A-law) and emits interim and final transcription frames driven by the
|
||
server's `is_final` / `speech_final` flags. Settings expose
|
||
`interim_results`, `endpointing`, `language`, `multichannel`, `channels`, and
|
||
`diarize`. Requires the `xai` optional extra (`pip install
|
||
"pipecat-ai[xai]"`).
|
||
(PR [#4340](https://github.com/pipecat-ai/pipecat/pull/4340))
|
||
|
||
- Added `XAITTSService` for streaming text-to-speech using xAI's WebSocket TTS
|
||
endpoint (`wss://api.x.ai/v1/tts`). Streams `text.delta` chunks up and base64
|
||
`audio.delta` chunks down on the same connection so audio begins flowing
|
||
before the full utterance finishes synthesizing; complements the batch-HTTP
|
||
`XAIHttpTTSService`. Defaults to raw PCM output so `TTSAudioRawFrame` needs
|
||
no decoding. The `xai` optional extra now pulls in
|
||
`pipecat-ai[websockets-base]`.
|
||
(PR [#4341](https://github.com/pipecat-ai/pipecat/pull/4341))
|
||
|
||
- Added `SonioxTTSService`, a real-time WebSocket TTS service that streams text
|
||
in and audio out over a persistent connection. Install with `pip install
|
||
"pipecat-ai[soniox]"`.
|
||
(PR [#4360](https://github.com/pipecat-ai/pipecat/pull/4360))
|
||
|
||
- Added support for Daily's built-in `screenVideo` destination in
  `DailyTransport`. When `"screenVideo"` is included in the
  `video_out_destinations` transport parameter, a dedicated screen video track
  is created at join time and frames with `transport_destination="screenVideo"`
  are routed to it.

  ```python
  from pipecat.frames.frames import OutputImageRawFrame
  from pipecat.transports.daily.transport import DailyParams

  params = DailyParams(
      video_out_enabled=True,
      video_out_is_live=True,
      video_out_width=1280,
      video_out_height=720,
      video_out_destinations=["screenVideo"],
  )

  ...

  frame = OutputImageRawFrame(...)
  frame.transport_destination = "screenVideo"
  ```
  (PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))

- Added `camera_out_send_settings` to `DailyParams`. This dict is passed
  verbatim to the Daily client's camera publishing settings, allowing
  applications to fully control encoding, codec, bitrate, and framerate.

  ```python
  from pipecat.transports.daily.transport import DailyParams

  params = DailyParams(
      camera_out_send_settings={
          "maxQuality": "high",
          "encodings": {
              "high": {"maxBitrate": 2_000_000, "maxFramerate": 30}
          },
      },
  )
  ```
  (PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))

- Added `tool_resources` to `PipelineTask` and `FunctionCallParams`. Pass an
  application-defined object (DB handles, clients, state, etc.) to
  `PipelineTask(..., tool_resources=...)` and access it from any tool handler
  via `params.tool_resources`. Passed by reference; the caller retains their
  handle and can read mutations after the task finishes. Resolves #4256.
  (PR [#4371](https://github.com/pipecat-ai/pipecat/pull/4371))

### Changed

- Updated NVIDIA STT services to align with Nemotron Speech defaults and
  configuration: `api_key` is now optional for local deployments, additional
  recognition settings are available (including alternatives, word offsets, and
  diarization), and streaming/segmented docs now reflect Nemotron Speech APIs.
- NVIDIA streaming STT now sets `TranscriptionFrame.finalized=True` when the
  provider marks a result as final, and preserves `language` on both
  `TranscriptionFrame` and `InterimTranscriptionFrame`.
  (PR [#4269](https://github.com/pipecat-ai/pipecat/pull/4269))

- Updated `NvidiaLLMService` to emit model reasoning as `LLMThought*Frame`s
  (from both `reasoning_content` and `<think>...</think>` output), avoid mixing
  reasoning text into normal assistant content, and allow keyless local NIM
  endpoints while warning when the cloud endpoint is used without an API key.
  (PR [#4270](https://github.com/pipecat-ai/pipecat/pull/4270))

- STT services now reconnect safely when settings change: reconnection is
  deferred until the current user turn ends (i.e., until
  `UserStoppedSpeakingFrame` is received) rather than interrupting an active
  speech session. Audio frames received while the reconnect is in progress are
  buffered and replayed once the new connection is ready. `CartesiaSTTService`
  and `DeepgramSTTService` both use this new behavior.
  (PR [#4311](https://github.com/pipecat-ai/pipecat/pull/4311))

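The deferral-and-replay behavior described above can be modeled as a small state machine. This is a hypothetical, synchronous toy: `DeferredReconnectSTT` and its method names are illustrative, not Pipecat's API. It also assumes that a settings change arriving while nobody is speaking reconnects right away.

```python
from collections import deque


class DeferredReconnectSTT:
    """Toy model: defer reconnects until the user turn ends, buffer audio meanwhile."""

    def __init__(self) -> None:
        self.connected = True
        self.user_speaking = False
        self.reconnect_pending = False
        self.buffer: deque = deque()
        self.sent: list = []  # audio chunks delivered to the (fake) provider

    def on_settings_changed(self) -> None:
        # Never tear the connection down mid-turn; just remember to do it later.
        self.reconnect_pending = True
        if not self.user_speaking:
            self._start_reconnect()

    def on_user_started_speaking(self) -> None:
        self.user_speaking = True

    def on_user_stopped_speaking(self) -> None:
        self.user_speaking = False
        if self.reconnect_pending:
            self._start_reconnect()

    def _start_reconnect(self) -> None:
        self.reconnect_pending = False
        self.connected = False  # new connection is being established

    def on_connection_ready(self) -> None:
        self.connected = True
        # Replay everything that arrived during the reconnect window, in order.
        while self.buffer:
            self.sent.append(self.buffer.popleft())

    def on_audio(self, chunk: bytes) -> None:
        if self.connected:
            self.sent.append(chunk)
        else:
            self.buffer.append(chunk)
```

Buffering is done with a FIFO queue, so replayed audio reaches the new connection in exactly the order it was captured.
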
- Reduced debug log noise for LLM services. The system instruction is now
  logged once when composed (e.g. when turn completion is enabled) instead of
  on every LLM call. Per-call logs now show only the conversation messages,
  consistent across Google, Anthropic, AWS, and OpenAI services.
  (PR [#4314](https://github.com/pipecat-ai/pipecat/pull/4314))

- `LiveKitRunnerArguments.token` is now a required `str` (previously `str |
  None` with a default of `None`). LiveKit requires a token to join a room, so
  the type now reflects reality. This only affects custom runners that
  construct `LiveKitRunnerArguments` directly; code consuming the argument from
  the standard runner is unaffected.
  (PR [#4324](https://github.com/pipecat-ai/pipecat/pull/4324))

- `TranscriptionFrame.language` and `InterimTranscriptionFrame.language`
  emitted by `DeepgramFluxSTTService` now reflect the language Deepgram
  detected for each turn (read from the `languages` field on Flux's `TurnInfo`
  event). On `flux-general-multi` this gives per-turn accuracy for downstream
  consumers (e.g. TTS voice selection). `flux-general-en` continues to emit
  `Language.EN`.
  (PR [#4326](https://github.com/pipecat-ai/pipecat/pull/4326))

- Added `includes_inter_frame_spaces` parameter to
  `TTSService.add_word_timestamps` and `_add_word_timestamps` (default `None`).
  When `True`, downstream consumers will not inject additional spaces between
  tokens; `None` leaves each frame's own default unchanged.
- `InworldTTSService` now passes `includes_inter_frame_spaces=True` when
  reporting word timestamps, since Inworld tokens already include inter-word
  spacing.
  (PR [#4330](https://github.com/pipecat-ai/pipecat/pull/4330))

- `SarvamSTTService` now uses `saaras:v3` as its default model instead of
  `saarika:v2.5`. Applications that relied on the previous default should set
  `settings=SarvamSTTService.Settings(model="saarika:v2.5")` explicitly.
  (PR [#4334](https://github.com/pipecat-ai/pipecat/pull/4334))

- `SpeechTimeoutUserTurnStopStrategy` now waits only `user_speech_timeout` when
  a transcript arrives without a VAD stop event, rather than
  `max(ttfs_p99_latency, user_speech_timeout)`. If you had `ttfs_p99_latency >
  user_speech_timeout`, turn detection in that path is slightly faster than
  before.
  (PR [#4337](https://github.com/pipecat-ai/pipecat/pull/4337))

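For the no-VAD-stop path in the entry above, the change in wait time is simple arithmetic. The two functions below only restate that entry's before/after formulas. The function names are hypothetical, and times are in seconds.

```python
def wait_before(ttfs_p99_latency: float, user_speech_timeout: float) -> float:
    """Previous wait when a transcript arrived without a VAD stop event."""
    return max(ttfs_p99_latency, user_speech_timeout)


def wait_after(ttfs_p99_latency: float, user_speech_timeout: float) -> float:
    """Current wait for the same path: ttfs_p99_latency no longer matters."""
    return user_speech_timeout


# If the STT P99 latency exceeds the timeout, the turn now ends sooner.
print(wait_before(0.8, 0.6), wait_after(0.8, 0.6))  # 0.8 0.6
```

When `ttfs_p99_latency <= user_speech_timeout`, both formulas give the same result, so only configurations with a slow P99 see a difference.
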
- If you use an STT service that emits finalized transcripts (Speechmatics,
  Soniox, Deepgram Flux, AssemblyAI) with `SpeechTimeoutUserTurnStopStrategy`,
  user turns now end as soon as `user_speech_timeout` elapses after VAD stop.
  Previously the strategy also waited for the STT P99 latency
  (`ttfs_p99_latency`) even when the transcript was already marked final.
  `user_speech_timeout` is still honored as a floor: STT finalization never
  shortens it.
  (PR [#4337](https://github.com/pipecat-ai/pipecat/pull/4337))

- ⚠️ `PlivoFrameSerializer` and `TelnyxFrameSerializer` now raise `ValueError`
  at construction when `auto_hang_up=True` (the default) but required
  credentials are missing, matching `TwilioFrameSerializer`. Previously they
  constructed successfully and the hangup failed silently at call-end, leaving
  phantom billable sessions on the provider. If you relied on the old silent
  behavior, pass `auto_hang_up=False` explicitly or provide the credentials.
  The specific fields checked are `call_id`/`auth_id`/`auth_token` for Plivo
  and `call_control_id`/`api_key` for Telnyx.
  (PR [#4349](https://github.com/pipecat-ai/pipecat/pull/4349))

- `ToolsSchema(standard_tools=...)` now accepts any `Sequence[FunctionSchema |
  DirectFunction]` rather than requiring an exact `list` of the union. Callers
  can pass a narrower `list[FunctionSchema]` (or any other `Sequence`) without
  the type checker complaining about list invariance.
  (PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))

- Updated `aic-sdk` dependency to `~=2.2.0`. The `AIC_LICENSE_KEY` environment
  variable replaces the previous `AICOUSTICS_LICENSE_KEY`.
  (PR [#4362](https://github.com/pipecat-ai/pipecat/pull/4362))

- Loosened the `protobuf` dependency to `>=5.29.6,<7`, so projects pinned to
  protobuf 5.x can install `pipecat-ai` again. The previous `>=6.31.1,<7` pin
  (introduced in 1.0.8 alongside the `nvidia-riva-client 2.25.1` upgrade)
  silently blocked any environment whose dependency graph already constrained
  protobuf to the 5.x line. The bundled `frames_pb2.py` is now compiled with
  protoc 5.x so it imports cleanly on both 5.x and 6.x runtimes.

  Installing the `nvidia` extra still pulls protobuf 6.x: `nvidia-riva-client
  2.25.1` ships gencode that requires a 6.x runtime, so `pipecat-ai[nvidia]`
  now declares `protobuf>=6.31.1,<7` explicitly to cover an upstream packaging
  gap (https://github.com/nvidia-riva/python-clients/issues/172).
  (PR [#4372](https://github.com/pipecat-ai/pipecat/pull/4372))

- Daily rooms created by the development runner (`pipecat.runner.run`) now
  expire after 4 hours with `eject_at_room_exp=True`, mirroring Pipecat Cloud's
  max session limit. Previously, runner-created rooms inherited a 2-hour
  expiration on the default code paths and had no expiration at all when
  callers posted partial `dailyRoomProperties` (e.g. `{"start_video_off":
  true}`) to `/start`, causing rooms to accumulate indefinitely. Explicit `exp`
  and `eject_at_room_exp` values in `dailyRoomProperties` are still respected.
  (PR [#4374](https://github.com/pipecat-ai/pipecat/pull/4374))

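The room-expiry defaulting rule above can be summarized as "fill in `exp` and `eject_at_room_exp` only when the caller didn't." `room_properties` is a hypothetical helper. Treating `exp` as a Unix timestamp in seconds is an assumption about the Daily room property, not something stated in this entry.

```python
import time
from typing import Optional

MAX_SESSION_SECS = 4 * 60 * 60  # 4 hours, matching the limit described above


def room_properties(requested: Optional[dict] = None, now: Optional[float] = None) -> dict:
    """Merge caller-supplied room properties with expiry defaults."""
    props = dict(requested or {})  # copy so the caller's dict is untouched
    now = time.time() if now is None else now
    # setdefault keeps explicit values and only fills in missing keys.
    props.setdefault("exp", int(now) + MAX_SESSION_SECS)
    props.setdefault("eject_at_room_exp", True)
    return props
```

A partial request such as `{"start_video_off": True}` now gains both expiry keys, while explicit values are passed through unchanged.
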
- Updated `daily-python` dependency to `~=0.28.0`.
  (PR [#4379](https://github.com/pipecat-ai/pipecat/pull/4379))

### Deprecated

- Deprecated `TransportParams.video_out_bitrate` for the Daily transport. Use
  `DailyParams.camera_out_send_settings` instead to configure camera publishing
  encodings (bitrate, framerate, codec, etc.).
  (PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))

### Fixed

- Fixed handling of tool calls that have no registered handler: they now fail
  with a normal final tool result instead of leaving tool-call state hanging.
  (PR [#4301](https://github.com/pipecat-ai/pipecat/pull/4301))

- Fixed `pipecat-ai[tavus]` not installing the required `daily-python`
  dependency. Installing the `tavus` extra now correctly pulls in
  `pipecat-ai[daily]`.
  (PR [#4304](https://github.com/pipecat-ai/pipecat/pull/4304))

- Fixed audio loss and potential errors when STT settings were updated
  mid-speech. Previously, `CartesiaSTTService` and `DeepgramSTTService` would
  immediately disconnect and reconnect when settings changed, dropping any
  in-flight audio. Reconnection is now deferred until the user stops speaking,
  and audio arriving during the reconnect window is buffered and replayed.
  (PR [#4311](https://github.com/pipecat-ai/pipecat/pull/4311))

- Fixed `SmallestTTSService` WebSocket endpoint URL to match Smallest AI v4.0.0
  API (`wss://waves-api.smallest.ai` → `wss://api.smallest.ai`) and restored
  keepalive using a silent space message instead of the unsupported flush
  command.
  (PR [#4320](https://github.com/pipecat-ai/pipecat/pull/4320))

- Fixed whitespace handling in TTS token streaming mode. Inter-token whitespace
  (e.g., spaces between words) is now preserved for correct prosody, while
  leading whitespace before the first non-whitespace token is still stripped to
  avoid issues with TTS models that are sensitive to leading spaces.
  (PR [#4323](https://github.com/pipecat-ai/pipecat/pull/4323))

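The token-streaming rule above fits in a short generator: strip whitespace only until the first visible character, then pass tokens through untouched. `normalize_stream` is a hypothetical name, and Pipecat's implementation may differ.

```python
from typing import Iterable, Iterator


def normalize_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield tokens with whitespace removed from the start of the stream only."""
    started = False
    for token in tokens:
        if not started:
            token = token.lstrip()
            if not token:
                continue  # whitespace-only token before any text: drop it
            started = True
        # After the first visible token, keep spacing exactly as streamed.
        yield token
```
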
- Fixed `SentryMetrics` silently dropping `MetricsFrame`s from
  `stop_ttfb_metrics` and `stop_processing_metrics`. `SentryMetrics` called the
  base `FrameProcessorMetrics` implementation but discarded its return value,
  so `FrameProcessor` never pushed the `MetricsFrame` downstream. This
  prevented observers (e.g. `UserBotLatencyObserver`, `MetricsLogObserver`)
  from seeing TTFB and processing metrics for any service using
  `metrics=SentryMetrics()`. The metrics were still calculated and Sentry
  transactions still completed; only the downstream frame push was affected.
  (PR [#4325](https://github.com/pipecat-ai/pipecat/pull/4325))

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` emitting word
  timestamps and `TTSTextFrame` content that matched the input text instead of
  the spoken audio when a pronunciation dictionary
  (`pronunciation_dictionary_locators`) or text normalization rewrote the
  input. Both services now consume ElevenLabs' normalized alignment, so
  downstream consumers (captions, transcripts, context aggregation) reflect
  what the listener actually hears.
  (PR [#4344](https://github.com/pipecat-ai/pipecat/pull/4344))

- Fixed a crash in `DeepgramSTTService` when an `STTUpdateSettingsFrame`
  arrived before the WebSocket handshake completed (for example, when pushing
  an update upstream on `StartFrame`). The settings-triggered reconnect
  cancelled the in-flight connection task before its keepalive task was
  created, causing an `UnboundLocalError: cannot access local variable
  'keepalive_task'` in the handler's `finally` block.
  (PR [#4347](https://github.com/pipecat-ai/pipecat/pull/4347))

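The crash above is a general Python pitfall: a name assigned inside `try` is still unbound in `finally` if cancellation arrives first. The sketch below shows the usual fix, binding the name before `try`. `connection_handler`, `connect`, `receive`, and `keepalive` are hypothetical stand-ins, not Deepgram or Pipecat code.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def keepalive() -> None:
    while True:
        await asyncio.sleep(5)  # a real service would send a keepalive message here


async def connection_handler(
    connect: Callable[[], Awaitable[None]],
    receive: Callable[[], Awaitable[None]],
) -> None:
    # Bind the name up front so `finally` can always refer to it,
    # even if we are cancelled before the keepalive task exists.
    keepalive_task: Optional[asyncio.Task] = None
    try:
        await connect()
        keepalive_task = asyncio.create_task(keepalive())
        await receive()
    finally:
        if keepalive_task is not None:
            keepalive_task.cancel()
```

If `keepalive_task = None` were missing, cancelling during `connect()` would replace the expected `CancelledError` with an `UnboundLocalError` raised from the `finally` block.
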
- Fixed direct-function registration crashing for functions without a
  docstring. `DirectFunctionWrapper` passed `inspect.getdoc()`'s result to
  `docstring_parser.parse()`, which raises when the docstring is `None`.
  Functions now register cleanly whether or not they have a docstring; an empty
  docstring produces empty description and parameter metadata as expected.
  (PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))

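`inspect.getdoc()` returns `None` for a function without a docstring, so any parser downstream needs a fallback. Below is a minimal, stdlib-only sketch of that guard. The `describe` helper is hypothetical and substitutes a trivial first-paragraph split for `docstring_parser`.

```python
import inspect
from typing import Callable


def describe(func: Callable) -> dict:
    """Build minimal tool metadata from a function's name and docstring."""
    doc = inspect.getdoc(func) or ""  # getdoc() returns None when there is no docstring
    summary = doc.split("\n\n", 1)[0].strip()
    return {"name": func.__name__, "description": summary}


def get_weather(city: str) -> str:
    """Look up the weather.

    Args:
        city: City name.
    """
    return "sunny"


def no_docs(city: str) -> str:
    return "sunny"
```
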
- Fixed `AssemblyAISTTService`, `CartesiaSTTService`, `GradiumSTTService`, and
  `SonioxSTTService` crashing the pipeline on transient WebSocket send
  failures. Each `run_stt` sent audio directly without catching errors, so a
  single network hiccup mid-stream raised an uncaught exception through
  `process_frame`. The guards now log a warning and let the connection-state
  check on the next call handle recovery, matching the pattern used by
  Deepgram, xAI, Azure, and other push-based STTs.
  (PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))

- Fixed Gemini Live losing conversation history in the (rare) case of a
  WebSocket reconnect before any session resumption handle is received. When
  the session reconnects (e.g. on system instruction change), conversation
  history is now re-seeded into the new session before it is marked ready for
  input.
  (PR [#4355](https://github.com/pipecat-ai/pipecat/pull/4355))

- Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte
  MTU (IPv6, Tailscale overlays, many consumer VPNs). aiortc's default SCTP
  chunk size of 1200 bytes produces ~1305-byte UDP datagrams after headers,
  which the kernel rejects with EMSGSIZE; aiortc has no path-MTU discovery so
  it retransmits forever at the same oversized size. The chunk size is now
  clamped to 1100 bytes (~1205-byte datagrams, ~75 bytes of slack). Override
  with `PIPECAT_SCTP_MAX_CHUNK_SIZE` if your path MTU requires a different
  value.
  (PR [#4358](https://github.com/pipecat-ai/pipecat/pull/4358))

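The MTU arithmetic in the SmallWebRTC entry above can be checked directly. The ~105-byte overhead is inferred from the entry's own figures (1200-byte chunk to ~1305-byte datagram), not computed header by header. Parsing `PIPECAT_SCTP_MAX_CHUNK_SIZE` with `int(...)` is also an assumption about how the override is read.

```python
import os

PATH_MTU = 1280  # IPv6 minimum link MTU, common on VPNs and overlays
HEADER_OVERHEAD = 1305 - 1200  # ~105 bytes, inferred from the figures above


def max_chunk_size() -> int:
    """Chunk size to use: the env override if set, else the 1100-byte clamp."""
    return int(os.environ.get("PIPECAT_SCTP_MAX_CHUNK_SIZE", "1100"))


def datagram_size(chunk_size: int) -> int:
    return chunk_size + HEADER_OVERHEAD


for chunk in (1200, max_chunk_size()):
    size = datagram_size(chunk)
    print(chunk, size, "fits" if size <= PATH_MTU else "EMSGSIZE")
```
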
## [1.0.0] - 2026-04-14

Migration guide: https://docs.pipecat.ai/pipecat/migration/migration-1.0

### Added

- Updated LemonSlice transport:
  - Added `on_avatar_connected` and `on_avatar_disconnected` events triggered
    when the avatar joins and leaves the room.
  - Added `api_url` parameter to `LemonSliceNewSessionRequest` to allow
    overriding the LemonSlice API endpoint.
  - Added support for passing arbitrary named parameters to the LemonSlice
    API endpoint.
  (PR [#3995](https://github.com/pipecat-ai/pipecat/pull/3995))

- Added Inworld Realtime LLM service with WebSocket-based cascade STT/LLM/TTS,
  semantic VAD, function calling, and Router support.
  (PR [#4140](https://github.com/pipecat-ai/pipecat/pull/4140))

- ⚠️ Added WebSocket-based `OpenAIResponsesLLMService` as the new default for
  the OpenAI Responses API. It maintains a persistent connection to
  `wss://api.openai.com/v1/responses` and automatically uses
  `previous_response_id` to send only incremental context, falling back to full
  context on reconnection or cache miss. The previous HTTP-based implementation
  is now available as `OpenAIResponsesHttpLLMService`.
  (PR [#4141](https://github.com/pipecat-ai/pipecat/pull/4141))

- Added `group_parallel_tools` parameter to `LLMService` (default `True`). When
  `True`, all function calls from the same LLM response batch share a group ID
  and the LLM is triggered exactly once after the last call completes. Set to
  `False` to trigger inference independently for each function call result as
  it arrives.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

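One way to picture `group_parallel_tools`: track the call IDs from one LLM response and, as each result lands, decide whether to run inference. `ParallelToolBatch` is a hypothetical counter for illustration, not Pipecat's internal bookkeeping.

```python
from typing import Iterable


class ParallelToolBatch:
    """Counts LLM runs triggered as the function calls of one response complete."""

    def __init__(self, call_ids: Iterable[str], group_parallel_tools: bool = True) -> None:
        self.pending = set(call_ids)
        self.group_parallel_tools = group_parallel_tools
        self.llm_runs = 0

    def complete(self, call_id: str) -> None:
        self.pending.discard(call_id)
        if not self.group_parallel_tools:
            self.llm_runs += 1  # every result triggers its own inference
        elif not self.pending:
            self.llm_runs += 1  # grouped: run once, after the last call finishes


grouped = ParallelToolBatch(["a", "b", "c"])
ungrouped = ParallelToolBatch(["a", "b", "c"], group_parallel_tools=False)
for call_id in ("b", "a", "c"):
    grouped.complete(call_id)
    ungrouped.complete(call_id)
print(grouped.llm_runs, ungrouped.llm_runs)  # 1 3
```

Completion order does not matter in the grouped case; only the last outstanding call triggers a run.
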
- Added async function call support to `register_function()` and
  `register_direct_function()` via `cancel_on_interruption=False`. When set to
  `False`, the LLM continues the conversation immediately without waiting for
  the function result. The result is injected back into the context as a
  `developer` message once available, triggering a new LLM inference at that
  point.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- Added `enable_prompt_caching` setting to `AWSBedrockLLMService` for Bedrock
  ConverseStream prompt caching.
  (PR [#4219](https://github.com/pipecat-ai/pipecat/pull/4219))

- Added support for streaming intermediate results from async function calls.
  Call `result_callback` multiple times with
  `properties=FunctionCallResultProperties(is_final=False)` to push incremental
  updates, then call it once more (with `is_final=True`, the default) to
  deliver the final result. Only valid for functions registered with
  `cancel_on_interruption=False`.
  (PR [#4230](https://github.com/pipecat-ai/pipecat/pull/4230))

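A handler that streams progress could look roughly like the sketch below. To keep it runnable without Pipecat, `FunctionCallResultProperties` is a local stand-in with only `is_final`, and `params` is a `SimpleNamespace` that fakes the `result_callback` attribute. Check the real `FunctionCallParams` for the exact callback signature.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class FunctionCallResultProperties:  # local stand-in; only the field used here
    is_final: bool = True


async def search_flights(params) -> None:
    """Hypothetical tool handler that reports progress before its final answer."""
    for done in (1, 2):
        await asyncio.sleep(0)  # pretend to do some work
        await params.result_callback(
            {"status": f"searched {done}/3 airlines"},
            properties=FunctionCallResultProperties(is_final=False),
        )
    # Final call: is_final defaults to True.
    await params.result_callback({"flights": ["UA 123", "DL 456"]})


async def demo() -> list:
    received = []

    async def result_callback(result, properties=None):
        props = properties or FunctionCallResultProperties()
        received.append((result, props.is_final))

    await search_flights(SimpleNamespace(result_callback=result_callback))
    return received


updates = asyncio.run(demo())
```

As the entry notes, this pattern applies only to functions registered with `cancel_on_interruption=False`.
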
- Added `LLMMessagesTransformFrame` to facilitate programmatically editing
  context in a frame-based way.

  The previous approach required the caller to directly grab a reference to
  the context object, grab a "snapshot" of its messages _at that point in
  time_, transform the messages, and then push an `LLMMessagesUpdateFrame` with
  the transformed messages. This approach can lead to problems: what if there
  had already been a change to the context queued in the pipeline? The
  transformed messages would simply overwrite it without consideration.
  (PR [#4231](https://github.com/pipecat-ai/pipecat/pull/4231))

- The development runner now exports a module-level `app` FastAPI instance
  (`from pipecat.runner.run import app`) so you can register custom routes
  before calling `main()`.
  (PR [#4234](https://github.com/pipecat-ai/pipecat/pull/4234))

- `ToolsSchema` now accepts `custom_tools` for OpenAI LLM services
  (`OpenAILLMService`, `OpenAIResponsesLLMService`,
  `OpenAIResponsesHttpLLMService`, and `OpenAIRealtimeLLMService`), letting you
  pass provider-specific tools like `tool_search` alongside standard function
  tools.
  (PR [#4248](https://github.com/pipecat-ai/pipecat/pull/4248))

- Added enhancements to `NvidiaTTSService`:

  - Cross-sentence stitching: multiple sentences within an LLM turn are fed
    into a single `SynthesizeOnline` gRPC stream for seamless audio across
    sentence boundaries (requires Magpie TTS model v1.7.0+).
  - `custom_dictionary` and `encoding` parameters for IPA-based custom
    pronunciation and output audio encoding.
  - Metrics generation (`can_generate_metrics` returns true) and
    `stop_all_metrics()` when an audio context is interrupted.
  - gRPC error handling around synthesis config retrieval
    (`GetRivaSynthesisConfig`).
  (PR [#4249](https://github.com/pipecat-ai/pipecat/pull/4249))

- Added `MistralTTSService` for streaming text-to-speech using Mistral's
  Voxtral TTS API (`voxtral-mini-tts-2603`). Supports SSE-based audio streaming
  with automatic resampling from the API's native 24kHz to any requested sample
  rate. Requires the `mistral` optional extra (`pip install
  pipecat-ai[mistral]`).
  (PR [#4251](https://github.com/pipecat-ai/pipecat/pull/4251))

- Added `truncate_large_values` parameter to `LLMContext.get_messages()`. When
  `True`, returns compact deep copies of messages with binary data (base64
  images, audio) replaced by short placeholders and long string values in
  LLM-specific messages recursively truncated. Useful for serialization,
  logging, and debugging tools.
  (PR [#4272](https://github.com/pipecat-ai/pipecat/pull/4272))

- `CartesiaSTTService` now supports runtime settings updates (e.g. changing
  `language` or `model` via `STTUpdateSettingsFrame`). The service
  automatically reconnects with the new parameters. Previously, settings
  updates were silently ignored.
  (PR [#4282](https://github.com/pipecat-ai/pipecat/pull/4282))

- Added `pcm_32000` and `pcm_48000` sample rate support to ElevenLabs TTS
  services.
  (PR [#4293](https://github.com/pipecat-ai/pipecat/pull/4293))

- Added `enable_logging` parameter to `ElevenLabsHttpTTSService`. Set to
  `False` to enable zero retention mode (enterprise only).
  (PR [#4293](https://github.com/pipecat-ai/pipecat/pull/4293))

### Changed

- Updated `onnxruntime` from 1.23.2 to 1.24.3, adding support for Python 3.14.
  (PR [#3984](https://github.com/pipecat-ai/pipecat/pull/3984))

- `MCPClient` now requires `async with MCPClient(...) as mcp:` or explicit
  `start()`/`close()` calls to manage the connection lifecycle.
  (PR [#4034](https://github.com/pipecat-ai/pipecat/pull/4034))

- ⚠️ Updated `langchain` extra to require langchain 1.x (from 0.3.x),
  langchain-community 0.4.x (from 0.3.x), and langchain-openai 1.x (from
  0.3.x). If you pin these packages in your project, update your pins
  accordingly.
  (PR [#4192](https://github.com/pipecat-ai/pipecat/pull/4192))

- `WebsocketService` reconnection errors are now non-fatal. When a websocket
  service exhausts its reconnection attempts (either via exponential backoff or
  quick failure detection), it emits a non-fatal `ErrorFrame` instead of a
  fatal one. This allows application-level failover (e.g. `ServiceSwitcher`) to
  handle the failure instead of killing the entire pipeline.
  (PR [#4201](https://github.com/pipecat-ai/pipecat/pull/4201))

- Changed `GrokLLMService` default model from `grok-3-beta` to `grok-3`, now
  that the model is generally available.
  (PR [#4209](https://github.com/pipecat-ai/pipecat/pull/4209))

- `GoogleImageGenService` now defaults to `imagen-4.0-generate-001` (previously
  `imagen-3.0-generate-002`).
  (PR [#4213](https://github.com/pipecat-ai/pipecat/pull/4213))

- ⚠️ `BaseOpenAILLMService.get_chat_completions()` now accepts an `LLMContext`
  instead of `OpenAILLMInvocationParams`. If you override this method, update
  your signature accordingly.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- When multiple function calls are returned in a single LLM response, by
  default (when `group_parallel_tools=True`) the LLM is now triggered exactly
  once after the last call in the batch completes, rather than waiting for all
  function calls.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- ⚠️ `LLMService.function_call_timeout_secs` now defaults to `None` instead of
  `10.0`. Deferred function calls will run indefinitely unless a timeout is
  explicitly set at the service level or per-call. If you relied on the
  previous 10-second default, pass `function_call_timeout_secs=10.0`
  explicitly.
  (PR [#4224](https://github.com/pipecat-ai/pipecat/pull/4224))

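Under the new default, a deferred call behaves like `asyncio.wait_for(..., timeout=None)`, which never times out. The sketch below contrasts that with an explicit limit. `run_function_call` and `slow_lookup` are hypothetical, and Pipecat's own timeout handling may be implemented differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_function_call(
    handler: Callable[[], Awaitable[str]],
    timeout_secs: Optional[float] = None,
) -> str:
    # timeout=None makes wait_for wait indefinitely (the new default);
    # a number restores a hard limit, like function_call_timeout_secs=10.0.
    return await asyncio.wait_for(handler(), timeout=timeout_secs)


async def slow_lookup() -> str:
    await asyncio.sleep(0.05)
    return "done"


async def demo() -> tuple:
    unbounded = await run_function_call(slow_lookup)
    try:
        await run_function_call(slow_lookup, timeout_secs=0.01)
        bounded = "done"
    except asyncio.TimeoutError:
        bounded = "timed out"
    return unbounded, bounded


print(asyncio.run(demo()))  # ('done', 'timed out')
```
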
- Updated `NvidiaTTSService`:
|
||
|
||
- Made `api_key` optional for local NIM deployments.
|
||
- Voice, language, and quality can be updated without reconnecting the gRPC
|
||
client; new values take effect on the next synthesis turn, not for the
|
||
current turn's in-flight requests.
|
||
- Replaced per-sentence synchronous `synthesize_online` calls with async
|
||
queue-backed gRPC streaming.
|
||
- Streaming now uses asyncio tasks with explicit gRPC cancellation on
|
||
interruption and stale-response filtering when a stream is aborted or
|
||
replaced.
|
||
- Renamed Riva references to Nemotron Speech in docs and messages.
|
||
- Disabled automatic TTS start frames at the service level
|
||
(`push_start_frame=False`) and emit `TTSStartedFrame` when a stitched
|
||
synthesis stream is started for a context.
|
||
(PR [#4249](https://github.com/pipecat-ai/pipecat/pull/4249))
|
||
|
||
### Removed
|
||
|
||
- ⚠️ Removed `OpenPipeLLMService` and the `openpipe` extra. OpenPipe was
|
||
acquired by CoreWeave and the package is no longer maintained. If you were
|
||
using `openpipe` as an LLM provider, switch to the underlying provider
|
||
directly (e.g. `openai`). The OpenPipe interface can still be used with
|
||
`OpenAILLMService` by specifying a `base_url`.
|
||
(PR [#4191](https://github.com/pipecat-ai/pipecat/pull/4191))
|
||
|
||
- ⚠️ Removed `NoisereduceFilter`. Use system-level noise reduction or a
|
||
service-based alternative instead.
|
||
(PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))
|
||
|
||
- ⚠️ Removed deprecated `vad_enabled` and `vad_audio_passthrough` transport
|
||
params.
|
||
(PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))
|
||
|
||
- ⚠️ Removed deprecated `camera_in_enabled`, `camera_in_is_live`,
|
||
`camera_in_width`, `camera_in_height`, `camera_out_enabled`,
|
||
`camera_out_is_live`, `camera_out_width`, `camera_out_height`, and
|
||
`camera_out_color` transport params. Use the `video_in_*` and `video_out_*`
|
||
equivalents instead.
|
||
(PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))
|
||
|
||
- ⚠️ Removed `FrameProcessor.wait_for_task()`. Use `create_task()` and manage
|
||
tasks with the built-in `TaskManager` instead.
|
||
(PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))
|
||
|
||
- ⚠️ Removed deprecated transport frames: `TransportMessageFrame`,
|
||
`TransportMessageUrgentFrame`, `InputTransportMessageUrgentFrame`,
|
||
`DailyTransportMessageFrame`, and `DailyTransportMessageUrgentFrame`. Use
|
||
`OutputTransportMessageFrame`, `OutputTransportMessageUrgentFrame`,
|
||
`InputTransportMessageFrame`, `DailyOutputTransportMessageFrame`, and
|
||
`DailyOutputTransportMessageUrgentFrame` instead.
|
||
(PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))
|
||
|
||
- ⚠️ Removed `create_default_resampler()` from `pipecat.audio.utils`.
|
||
(PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))
|
||
|
||
- ⚠️ Removed `DailyRunner.configure_with_args()`. Use `PipelineRunner` with
|
||
`RunnerArguments` instead.
|
||
(PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))
|
||
|
||
- ⚠️ Removed deprecated `on_pipeline_ended`, `on_pipeline_cancelled`, and
  `on_pipeline_stopped` events from `PipelineTask`. Use `on_pipeline_finished`
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed single-argument function call support from `LLMService`. Functions
  must use named parameters instead of a single `arguments` parameter.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer`.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `RTVIObserver.errors_enabled` parameter.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated RTVI models, frames, and processor methods including
  `RTVIConfig`, `RTVIServiceConfig`, `RTVIServiceOptionConfig`, various
  `RTVI*Data` models, `RTVIActionFrame`, and
  `RTVIProcessor.handle_function_call`/`handle_function_call_start`. Use the
  updated RTVI processor API instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `KeypadEntryFrame` alias.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated interruption frames: `StartInterruptionFrame` and
  `BotInterruptionFrame`. Use `InterruptionFrame` and `InterruptionTaskFrame`
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `LLMService.request_image_frame()`. Push a `UserImageRequestFrame`
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `TTSService.say()`. Push a `TTSSpeakFrame` into the pipeline
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `KrispFilter`. The `krisp` extra has been removed from
  `pyproject.toml`.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `AudioBufferProcessor.user_continuous_stream` parameter. Use
  `user_audio_passthrough` instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `LLMService.start_callback` parameter. Register an
  `on_llm_response_start` event handler instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `observers` field from `PipelineParams`. Pass observers
  directly to the `PipelineTask` constructor instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `pipecat.services.openai_realtime` package. Use
  `pipecat.services.openai.realtime` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.google.llm_vertex` module. Use
  `pipecat.services.google.vertex.llm` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `GoogleLLMOpenAIBetaService` from
  `pipecat.services.google.openai`. Use `GoogleLLMService` from
  `pipecat.services.google.llm` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `OpenAIRealtimeBetaLLMService` and
  `AzureRealtimeBetaLLMService`. Use `OpenAIRealtimeLLMService` and
  `AzureRealtimeLLMService` from `pipecat.services.openai.realtime` and
  `pipecat.services.azure.realtime` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.ai_services` module. Import from
  `pipecat.services.ai_service`, `pipecat.services.llm_service`,
  `pipecat.services.stt_service`, `pipecat.services.tts_service`, etc. instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.gemini_multimodal_live` package. Use
  `pipecat.services.google.gemini_live` instead. Note that class names no
  longer include "Multimodal" (e.g. `GeminiMultimodalLiveLLMService` →
  `GeminiLiveLLMService`).
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.google.gemini_live.llm_vertex`
  module. Use `pipecat.services.google.gemini_live.vertex.llm` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.nim` package. Use
  `pipecat.services.nvidia.llm` instead (`NimLLMService` → `NvidiaLLMService`).
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.deepgram.stt_sagemaker` and
  `pipecat.services.deepgram.tts_sagemaker` modules. Use
  `pipecat.services.deepgram.sagemaker.stt` and
  `pipecat.services.deepgram.sagemaker.tts` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.aws_nova_sonic` package. Use
  `pipecat.services.aws.nova_sonic` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.riva` package. Use
  `pipecat.services.nvidia.stt` and `pipecat.services.nvidia.tts` instead
  (`RivaSTTService` → `NvidiaSTTService`, `RivaTTSService` →
  `NvidiaTTSService`).
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated compatibility modules:
  `pipecat.services.openai_realtime_beta` (use
  `pipecat.services.openai.realtime`),
  `pipecat.services.openai_realtime.context`,
  `pipecat.services.openai_realtime.frames`,
  `pipecat.services.openai.realtime.context`,
  `pipecat.services.openai.realtime.frames`,
  `pipecat.services.gemini_multimodal_live` (use
  `pipecat.services.google.gemini_live`),
  `pipecat.services.aws_nova_sonic.context` (use
  `pipecat.services.aws.nova_sonic`), `pipecat.services.google.openai` and
  `pipecat.services.google.llm_openai` (use `pipecat.services.google.llm`).
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed `VisionImageFrameAggregator` (from
  `pipecat.processors.aggregators.vision_image_frame`). Vision/image handling
  is now built into `LLMContext` (from
  `pipecat.processors.aggregators.llm_context`). See the `12*` examples for the
  recommended replacement pattern.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed `OpenAILLMContext`, `OpenAILLMContextFrame`, and
  `OpenAILLMContext.from_messages()`. Use `LLMContext` (from
  `pipecat.processors.aggregators.llm_context`) and `LLMContextFrame` (from
  `pipecat.frames.frames`) instead. All services now exclusively use the
  universal `LLMContext`.

  From the developer's point of view, migrating will usually be a matter of
  going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  from pipecat.processors.aggregators.llm_context import LLMContext
  from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair

  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed deprecated frame types `LLMMessagesFrame` and
  `OpenAILLMContextAssistantTimestampFrame` from `pipecat.frames.frames`.
  Instead of `LLMMessagesFrame`, use `LLMContextFrame` with the new messages,
  or `LLMMessagesUpdateFrame` with `run_llm=True`.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed `GatedOpenAILLMContextAggregator` (from
  `pipecat.processors.aggregators.gated_open_ai_llm_context`). Use
  `GatedLLMContextAggregator` (from
  `pipecat.processors.aggregators.gated_llm_context`) instead.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed deprecated service-specific context and aggregator machinery,
  which was superseded by the universal `LLMContext` system.

  Service-specific classes removed: `AnthropicLLMContext`,
  `AnthropicContextAggregatorPair`, `AWSBedrockLLMContext`,
  `AWSBedrockContextAggregatorPair`, `OpenAIContextAggregatorPair`, and their
  user/assistant aggregators. Also removed `create_context_aggregator()` from
  `LLMService`, `OpenAILLMService`, `AnthropicLLMService`, and
  `AWSBedrockLLMService`.

  Base aggregator classes removed (from
  `pipecat.processors.aggregators.llm_response`): `BaseLLMResponseAggregator`,
  `LLMContextResponseAggregator`, `LLMUserContextAggregator`,
  `LLMAssistantContextAggregator`, `LLMUserResponseAggregator`,
  `LLMAssistantResponseAggregator`.

  From the developer's point of view, migrating will usually be a matter of
  going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  from pipecat.processors.aggregators.llm_context import LLMContext
  from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair

  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed deprecated service parameters and shims that have been replaced by
  the `settings=Service.Settings(...)` pattern or direct `__init__` parameters:
  - `PollyTTSService` alias (use `AWSTTSService`)
  - `TTSService`: `text_aggregator`, `text_filter` init params
  - `AWSNovaSonicLLMService`: `send_transcription_frames` init param
  - `DeepgramSTTService`: `url` init param (use `base_url`)
  - `FishAudioTTSService`: `model` init param (use `reference_id` or
    `settings`)
  - `GladiaSTTService`: `language` and `confidence` from `GladiaInputParams`,
    `InputParams` class alias
  - `GeminiTTSService`: `api_key` init param
  - `GeminiLiveLLMService`: `base_url` init param (use `http_options`)
  - `GoogleVertexLLMService`: `InputParams` class with
    `location`/`project_id` fields (use direct init params); `project_id` is now
    required, `location` defaults to `"us-east4"`
  - `MiniMaxHttpTTSService`: `english_normalization` from `InputParams` (use
    `text_normalization`)
  - `SimliVideoService`: `simli_config` init param (use `api_key`/`face_id`),
    `use_turn_server` init param; `api_key` and `face_id` are now required
  - `AnthropicLLMService`: `enable_prompt_caching_beta` from `InputParams`
    (use `enable_prompt_caching`)
  (PR [#4220](https://github.com/pipecat-ai/pipecat/pull/4220))

- ⚠️ Removed deprecated `pipecat.transports.services` and
  `pipecat.transports.network` module aliases. Update imports to use
  `pipecat.transports.daily.transport`, `pipecat.transports.livekit.transport`,
  `pipecat.transports.websocket.*`, `pipecat.transports.webrtc.*`, and
  `pipecat.transports.daily.utils` respectively.
  (PR [#4225](https://github.com/pipecat-ai/pipecat/pull/4225))

- ⚠️ Removed deprecated `pipecat.sync` package. Use `pipecat.utils.sync`
  instead.
  (PR [#4225](https://github.com/pipecat-ai/pipecat/pull/4225))

- ⚠️ Removed deprecated `TranscriptionMessage`, `ThoughtTranscriptionMessage`,
  and `TranscriptionUpdateFrame` from `pipecat.frames.frames`.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `allow_interruptions` parameter from `PipelineParams`,
  `StartFrame`, and `FrameProcessor`. Interruptions are now always allowed by
  default. Use `LLMUserAggregator`'s `user_turn_strategies` /
  `user_mute_strategies` parameters to control interruption behavior.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `STTMuteFilter`, `STTMuteConfig`, and `STTMuteStrategy`
  from `pipecat.processors.filters.stt_mute_filter`. Use
  `pipecat.turns.user_mute` strategies with `LLMUserAggregator`'s
  `user_mute_strategies` parameter instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `pipecat.processors.transcript_processor` module
  (`TranscriptProcessor`, `TranscriptProcessorConfig`). Use pipeline observers
  instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `EmulateUserStartedSpeakingFrame` and
  `EmulateUserStoppedSpeakingFrame` frames, and the `emulated` field from
  `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame`.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `interruption_strategies` parameter from
  `PipelineParams`, `StartFrame`, and `FrameProcessor`. Use
  `LLMUserAggregator`'s `user_turn_strategies` parameter instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `pipecat.audio.interruptions` module
  (`BaseInterruptionStrategy`, `MinWordsInterruptionStrategy`). Use
  `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
  `LLMUserAggregator`'s `user_turn_strategies` parameter instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `pipecat.utils.tracing.class_decorators` module. Use
  `pipecat.utils.tracing.service_decorators` instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `add_pattern_pair` method from `PatternPairAggregator`.
  Use `add_pattern` instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `UserResponseAggregator` class from
  `pipecat.processors.aggregators.user_response`. Use `LLMUserAggregator`
  instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed `ExternalUserTurnStrategies` and the automatic fallback to it in
  `LLMUserAggregator` when a `SpeechControlParamsFrame` was received from the
  transport.
  (PR [#4229](https://github.com/pipecat-ai/pipecat/pull/4229))

- ⚠️ Removed `vad_analyzer` and `turn_analyzer` parameters from
  `TransportParams` and all transport input classes, along with all deprecated
  VAD/turn analysis logic in `BaseInputTransport`. VAD and turn detection are
  now handled entirely by `LLMUserAggregator`.
  (PR [#4229](https://github.com/pipecat-ai/pipecat/pull/4229))

- ⚠️ Removed deprecated `TranscriptionUserTurnStopStrategy` alias (deprecated
  in 0.0.102). Use `SpeechTimeoutUserTurnStopStrategy` instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `vad_events` setting and `should_interrupt` parameter
  from `DeepgramSTTService` (deprecated in 0.0.99). Use Silero VAD for voice
  activity detection instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `send_transcription_frames` parameter from
  `OpenAIRealtimeLLMService` (deprecated in 0.0.92). Transcription frames are
  always sent.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `UserIdleProcessor` (deprecated in 0.0.100). Use
  `LLMUserAggregator` with the `user_idle_timeout` parameter instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `UserBotLatencyLogObserver` (deprecated in 0.0.102).
  Use `UserBotLatencyObserver` with its `on_latency_measured` event handler
  instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed the `riva` install extra. Use `nvidia` instead (`pip install
  "pipecat-ai[nvidia]"`).
  (PR [#4235](https://github.com/pipecat-ai/pipecat/pull/4235))

- Removed the empty `remote-smart-turn` install extra (was already a no-op).
  (PR [#4235](https://github.com/pipecat-ai/pipecat/pull/4235))

- ⚠️ Removed `DeprecatedModuleProxy` and all service `__init__.py` re-export
  shims. Flat imports like `from pipecat.services.openai import
  OpenAILLMService` no longer work. Use the full submodule path instead: `from
  pipecat.services.openai.llm import OpenAILLMService`. This is already the
  established pattern across all examples and internal code.
  (PR [#4239](https://github.com/pipecat-ai/pipecat/pull/4239))

- ⚠️ Removed deprecated `PIPECAT_OBSERVER_FILES` environment variable support.
  Use `PIPECAT_SETUP_FILES` instead.
  (PR [#4267](https://github.com/pipecat-ai/pipecat/pull/4267))

### Fixed

- Fixed `IdleFrameProcessor` unconditionally clearing its `asyncio.Event` in a
  `finally` block instead of only on the success path.
  (PR [#3796](https://github.com/pipecat-ai/pipecat/pull/3796))

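A minimal sketch of the pattern behind this fix, written in plain `asyncio` rather than taken from Pipecat's code. The hypothetical `idle_watcher` below clears the event only after `wait()` succeeds, so activity signalled while the idle callback is running is not wiped out by a `finally` block. The function name, its `iterations` bound, and the callback shape are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def idle_watcher(
    event: asyncio.Event,
    timeout: float,
    on_idle: Callable[[], Awaitable[None]],
    iterations: int,
) -> None:
    """Hypothetical watcher: calls on_idle when a window passes with no activity."""
    for _ in range(iterations):
        try:
            await asyncio.wait_for(event.wait(), timeout=timeout)
            # Success path: activity was observed, so reset for the next window.
            event.clear()
        except asyncio.TimeoutError:
            # No activity. The event is deliberately NOT cleared here, so a
            # signal set while on_idle() runs carries over to the next window.
            await on_idle()
```

With the `finally: event.clear()` variant, an event set inside `on_idle()` would be erased and the next window would time out again, producing a second, spurious idle callback.
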
- Fixed `MCPClient` opening a new connection for every tool call instead of
  reusing the session.
  (PR [#4034](https://github.com/pipecat-ai/pipecat/pull/4034))

- `GoogleLLMService` now applies a low-latency thinking default
  (`thinking_level="minimal"`) for Gemini 3+ Flash models.
  (PR [#4067](https://github.com/pipecat-ai/pipecat/pull/4067))

- Fixed `WebsocketService` entering an infinite reconnection loop when a server
  accepts the WebSocket handshake but immediately closes the connection (e.g.
  invalid API key, close code 1008). The service now detects connections that
  fail repeatedly within seconds of being established and stops retrying after
  3 consecutive quick failures.
  (PR [#4201](https://github.com/pipecat-ai/pipecat/pull/4201))

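The retry rule above can be sketched as a small tracker that counts consecutive short-lived connections. This is an illustration, not Pipecat's implementation. `QuickFailureTracker` is hypothetical, and the 5-second "quick" window is an assumed value, because the entry only says "within seconds". The limit of 3 comes from the entry itself.

```python
import time
from typing import Callable, Optional


class QuickFailureTracker:
    """Hypothetical helper: stop reconnecting after N consecutive quick closes."""

    def __init__(
        self,
        quick_window_s: float = 5.0,  # assumed threshold for "within seconds"
        max_quick_failures: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = quick_window_s
        self._max = max_quick_failures
        self._clock = clock
        self._connected_at: Optional[float] = None
        self._quick_failures = 0

    def on_connected(self) -> None:
        # The handshake succeeded; remember when.
        self._connected_at = self._clock()

    def on_closed(self) -> bool:
        """Record a close and return True if a reconnect should be attempted."""
        lived = (
            self._clock() - self._connected_at
            if self._connected_at is not None
            else None
        )
        if lived is not None and lived < self._window:
            self._quick_failures += 1  # accepted, then dropped almost at once
        else:
            self._quick_failures = 0  # a longer-lived connection resets the streak
        self._connected_at = None
        return self._quick_failures < self._max
```

Injecting `clock` keeps the logic deterministic in tests. A connection that stays up longer than the window resets the count, so occasional network blips are still retried.
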
- Fixed `InworldHttpTTSService` streaming responses crashing with
  `UnicodeDecodeError` when multi-byte UTF-8 characters were split across chunk
  boundaries. This caused TTS audio to cut off mid-sentence intermittently.
  (PR [#4202](https://github.com/pipecat-ai/pipecat/pull/4202))

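The usual remedy for this class of bug is an incremental decoder, which holds an incomplete multi-byte sequence until the next chunk arrives. Below is a sketch using the standard library's `codecs.getincrementaldecoder`. `decode_utf8_stream` is a hypothetical name, and this is not necessarily how the Inworld service implements its fix.

```python
import codecs
from typing import Iterable, Iterator


def decode_utf8_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode UTF-8 bytes that may split a character across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # Incomplete trailing bytes are buffered inside the decoder.
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True flushes the buffer (and raises if the stream ended mid-character).
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

By contrast, calling `chunk.decode("utf-8")` on each chunk independently fails whenever a boundary falls inside a character such as the two-byte `é`.
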
- Fixed a crash (`JSONDecodeError`) when a user interruption occurs while the
  LLM is streaming function call arguments. Previously, the incomplete JSON
  arguments were passed directly to `json.loads()`, causing an unhandled
  exception. Affected services: OpenAI, Google (OpenAI-compatible), and
  SambaNova.
  (PR [#4203](https://github.com/pipecat-ai/pipecat/pull/4203))

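Below is a sketch of a defensive parse for streamed function-call arguments. It assumes the caller simply skips the call when the JSON is incomplete. `parse_tool_arguments` is a hypothetical helper, not the services' actual code, and treating an empty string as "no arguments" is also an assumption.

```python
import json
from typing import Any, Dict, Optional


def parse_tool_arguments(raw: str) -> Optional[Dict[str, Any]]:
    """Parse streamed tool-call arguments; None means "incomplete, skip the call"."""
    if not raw.strip():
        return {}  # assumption: an empty string means the function takes no arguments
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # The stream was cut off (e.g. by an interruption) mid-JSON.
        return None
    return parsed if isinstance(parsed, dict) else None
```
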
- Fixed `BaseOutputTransport` discarding pending `UninterruptibleFrame` items
  (e.g. function-call context updates) when an interruption arrived. The audio
  task is now kept alive and only interruptible frames are drained when
  uninterruptible frames are present in the queue.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- Fixed spurious LLM inference being triggered when a function call result
  arrived while the user was actively speaking. The context frame is now
  suppressed until the user stops speaking.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- Fixed `CartesiaTTSService` failing with "Context has closed" errors when
  switching voice, model, or language via `TTSUpdateSettingsFrame`. The service
  now automatically flushes the current audio context and opens a fresh one
  when these settings change.
  (PR [#4220](https://github.com/pipecat-ai/pipecat/pull/4220))

- Fixed duplicate LLM replies that could occur when multiple async function
  call results arrived while an LLM request was already queued.
  (PR [#4230](https://github.com/pipecat-ai/pipecat/pull/4230))

- Fixed undefined `_warn_deprecated_param` calls in `OpenAIRealtimeLLMService`
  and `GrokRealtimeLLMService` for the deprecated `session_properties` init
  parameter.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- Fixed Gemini Live bot hanging after a session resumption reconnect. Audio,
  video, and text input were silently dropped after reconnecting because the
  internal `_ready_for_realtime_input` flag was not being reset.
  (PR [#4242](https://github.com/pipecat-ai/pipecat/pull/4242))

- Fixed `VADController` getting stuck in the `SPEAKING` state when audio frames
  stop arriving mid-speech (e.g. the user mutes their mic). A new
  `audio_idle_timeout` parameter (default 1s, set to 0 to disable) forces a
  transition back to `QUIET` and emits `on_speech_stopped` when no audio is
  received while speaking.
  (PR [#4244](https://github.com/pipecat-ai/pipecat/pull/4244))

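The timeout described above can be sketched as a simplified two-state machine with a periodic idle check. This is an illustration only. `IdleAwareVAD`, `check_idle`, and the `"speech_started"`/`"speech_stopped"` event strings are hypothetical, and the sketch switches state on a single frame, omitting the start/stop hysteresis that a real VAD applies.

```python
from enum import Enum
from typing import List, Optional


class VADState(Enum):
    QUIET = "quiet"
    SPEAKING = "speaking"


class IdleAwareVAD:
    """Simplified VAD state machine with an audio idle timeout (illustrative)."""

    def __init__(self, audio_idle_timeout: float = 1.0) -> None:
        self.audio_idle_timeout = audio_idle_timeout  # 0 disables the check
        self.state = VADState.QUIET
        self.events: List[str] = []
        self._last_audio_at: Optional[float] = None

    def on_audio(self, now: float, is_speech: bool) -> None:
        # Every received audio frame refreshes the idle timer.
        self._last_audio_at = now
        if is_speech and self.state is VADState.QUIET:
            self.state = VADState.SPEAKING
            self.events.append("speech_started")
        elif not is_speech and self.state is VADState.SPEAKING:
            self.state = VADState.QUIET
            self.events.append("speech_stopped")

    def check_idle(self, now: float) -> None:
        # Meant to be called periodically, e.g. from a timer task.
        if self.audio_idle_timeout <= 0 or self.state is not VADState.SPEAKING:
            return
        if self._last_audio_at is not None and now - self._last_audio_at >= self.audio_idle_timeout:
            # No audio arrived while speaking: force QUIET and emit the stop event.
            self.state = VADState.QUIET
            self.events.append("speech_stopped")
```
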
- Fixed `PipelineRunner._gc_collect()` blocking the event loop by running
  `gc.collect()` synchronously. It is now offloaded via `asyncio.to_thread` to
  avoid stalling concurrent pipeline tasks.
  (PR [#4255](https://github.com/pipecat-ai/pipecat/pull/4255))

- Fixed `ElevenLabsTTSService` incorrectly enabling `auto_mode` when using
  `TextAggregationMode.TOKEN`. Auto mode disables server-side buffering and is
  designed for complete sentences; enabling it with token streaming degraded
  speech quality. The default is now derived automatically from the aggregation
  strategy: `auto_mode=True` for `SENTENCE`, `auto_mode=False` for `TOKEN`.
  Callers can still override it by passing `auto_mode` explicitly.
  (PR [#4265](https://github.com/pipecat-ai/pipecat/pull/4265))

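The resolution rule above amounts to an explicit override that falls back to a default derived from the aggregation mode. Here is a minimal sketch of that rule. The `TextAggregationMode` enum is a local stand-in for Pipecat's, and its values are made up. `resolve_auto_mode` is hypothetical.

```python
from enum import Enum
from typing import Optional


class TextAggregationMode(Enum):
    """Local stand-in for Pipecat's enum; the values here are made up."""
    SENTENCE = "sentence"
    TOKEN = "token"


def resolve_auto_mode(mode: TextAggregationMode, auto_mode: Optional[bool] = None) -> bool:
    # An explicit caller choice always wins.
    if auto_mode is not None:
        return auto_mode
    # Otherwise derive it: auto mode suits complete sentences, not token streams.
    return mode is TextAggregationMode.SENTENCE
```
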
- Fixed `ValueError: write to closed file` during pipeline shutdown when
  observers were active. Observer proxy tasks are now cancelled before observer
  resources are cleaned up.
  (PR [#4267](https://github.com/pipecat-ai/pipecat/pull/4267))

- Fixed delayed turn completion when STT transcripts arrive after the p99
  timeout. Previously, a late transcript (beyond the p99 window) would fall
  through to the 5-second `user_turn_stop_timeout` fallback. Now the turn stop
  triggers immediately when the late transcript arrives.
  (PR [#4283](https://github.com/pipecat-ai/pipecat/pull/4283))

- Fixed `ElevenLabsTTSService` ignoring `enable_logging=False` and
  `enable_ssml_parsing=False`. The truthy check treated `False` the same as
  `None` (both skipped), and Python's `str(False)` produced `"False"` instead
  of the lowercase `"false"` expected by the API.
  (PR [#4293](https://github.com/pipecat-ai/pipecat/pull/4293))

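Both halves of the fix can be sketched in a few lines. The sketch checks for `None` instead of truthiness and lowercases the boolean. `build_query_params` is a hypothetical helper, not the service's actual code. Only the two parameter names come from the entry above.

```python
from typing import Dict, Optional


def build_query_params(
    enable_logging: Optional[bool] = None,
    enable_ssml_parsing: Optional[bool] = None,
) -> Dict[str, str]:
    params: Dict[str, str] = {}
    for key, value in (
        ("enable_logging", enable_logging),
        ("enable_ssml_parsing", enable_ssml_parsing),
    ):
        # `is not None` keeps an explicit False; `if value:` would drop it.
        if value is not None:
            # The API expects lowercase "true"/"false", not "True"/"False".
            params[key] = str(value).lower()
    return params
```
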
- Fixed `on_assistant_turn_stopped` not resetting internal state when the LLM
  returned no text tokens. Added an `interrupted` field to
  `AssistantTurnStoppedMessage` to indicate whether the assistant turn was
  interrupted.
  (PR [#4294](https://github.com/pipecat-ai/pipecat/pull/4294))

- Fixed `LLMContextSummarizer` failing with "No messages to summarize" when
  using `system_instruction` instead of a system-role message at the start of
  the context. The summarizer previously scanned the entire context for the
  first system message, which could match a mid-conversation injection (e.g.
  idle notifications) instead of the initial prompt, causing the summarization
  range to be empty.
  (PR [#4295](https://github.com/pipecat-ai/pipecat/pull/4295))

## [0.0.108] - 2026-03-27

### Added

- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`,
  `sarvam-105b` and `sarvam-105b-32k`.
  (PR [#3978](https://github.com/pipecat-ai/pipecat/pull/3978))

- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override
  this to perform provider-specific setup (e.g. eagerly opening a server-side
  context) before text starts flowing. Called each time a new turn context ID
  is created.
  (PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))

- Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API.
  (PR [#4031](https://github.com/pipecat-ai/pipecat/pull/4031))

- Added support for "developer" role messages in conversation context across
  all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock),
  "developer" messages are converted to "user" messages (use
  `system_instruction` to set the system instruction). For OpenAI services,
  "developer" messages pass through in conversation history. For the Responses
  API, they are kept as "developer" role (matching the existing "system" →
  "developer" conversion).
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

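For the non-OpenAI adapters, the role conversion described above can be sketched over plain OpenAI-style message dicts. `developer_to_user` is a hypothetical helper, not the adapters' actual code. The real adapters also reshape message content into each provider's own format.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def developer_to_user(messages: List[Message]) -> List[Message]:
    """Illustrative: rewrite "developer" roles as "user" for non-OpenAI providers."""
    # Build new dicts rather than mutating the caller's context.
    return [{**m, "role": "user"} if m.get("role") == "developer" else m for m in messages]
```
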
- Added `SmallestTTSService`, a WebSocket-based TTS service integration with
  Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with
  configurable voice, language, speed, consistency, similarity, and enhancement
  settings.
  (PR [#4092](https://github.com/pipecat-ai/pipecat/pull/4092))

- Added warnings in turn stop strategies when `VADParams.stop_secs` differs
  from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`,
  which collapses the STT wait timeout to 0s and may cause delayed turn
  detection. The warnings guide developers to re-run the
  [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD
  settings.
  (PR [#4115](https://github.com/pipecat-ai/pipecat/pull/4115))

- Added `domain` parameter to `AssemblyAISTTSettings` for specialized
  recognition modes such as Medical Mode (`domain="medical-v1"`).
  (PR [#4117](https://github.com/pipecat-ai/pipecat/pull/4117))

- Added `NovitaLLMService` for using Novita AI's LLM models via their
  OpenAI-compatible API.
  (PR [#4119](https://github.com/pipecat-ai/pipecat/pull/4119))

- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer
  resources are properly released when no longer needed. Custom `VADAnalyzer`
  subclasses can override `cleanup()` to free any held resources.
  (PR [#4120](https://github.com/pipecat-ai/pipecat/pull/4120))

- Added `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires
  after the final transcript is pushed, providing a reliable hook for
  end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both
  Pipecat and AssemblyAI turn detection modes.
  (PR [#4128](https://github.com/pipecat-ai/pipecat/pull/4128))

- Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux
  speech-to-text on AWS SageMaker endpoints. Use with
  `ExternalUserTurnStrategies` to take advantage of Flux's turn detection.
  (PR [#4143](https://github.com/pipecat-ai/pipecat/pull/4143))

- Added `Mem0MemoryService.get_memories()` convenience method for retrieving
  all stored memories outside the pipeline (e.g. to build a personalized
  greeting at connection time). This avoids the need to manually handle client
  type branching, filter construction, and async wrapping.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

### Changed

- Added a context prewarming path for `InworldTTSService` to improve first
  audio latency.
  (PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))

- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp
  VIVA SDK (requires `krisp_audio`).
  (PR [#4022](https://github.com/pipecat-ai/pipecat/pull/4022))

- Modified `InworldTTSService` to close the context at the end of each turn
  instead of relying on an idle timeout.
  (PR [#4028](https://github.com/pipecat-ai/pipecat/pull/4028))

- Added Gemini 3 support to the Gemini Live service.
  (PR [#4078](https://github.com/pipecat-ai/pipecat/pull/4078))

- `TTSService`: the default `stop_frame_timeout_s` (idle time before an
  automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has
  changed from `2.0` to `3.0` seconds.
  (PR [#4084](https://github.com/pipecat-ai/pipecat/pull/4084))

- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system
  message, matching all other adapters. Previously it searched for the first
  "system" message anywhere in the conversation history. A "system" message
  appearing later in the list will now be converted to "user" instead of being
  extracted as the system instruction.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

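The new rule can be sketched over OpenAI-style message dicts. Only `messages[0]` is eligible to become the system instruction, and any later "system" message is demoted to "user". `split_initial_system` is a hypothetical helper for illustration, not the adapter's code.

```python
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]


def split_initial_system(messages: List[Message]) -> Tuple[Optional[str], List[Message]]:
    """Extract only messages[0] as the system instruction (illustrative)."""
    if messages and messages[0].get("role") == "system":
        system: Optional[str] = messages[0]["content"]
        rest = messages[1:]
    else:
        system = None
        rest = messages
    # A "system" message anywhere later is demoted to "user", not extracted.
    converted = [{**m, "role": "user"} if m.get("role") == "system" else m for m in rest]
    return system, converted
```
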
- Fixed `InworldTTSService` to fall back to the full text when TTS timestamps
  are not received.
  (PR [#4113](https://github.com/pipecat-ai/pipecat/pull/4113))

- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova
  Sonic) now prefer `system_instruction` from service settings over an initial
  system message in the LLM context, matching the behavior of non-realtime
  services. Previously, context-provided system instructions took precedence. A
  warning is now logged when both are set.
  (PR [#4130](https://github.com/pipecat-ai/pipecat/pull/4130))

- Bumped `nvidia-riva-client` minimum version to `>=2.25.1`.
  (PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))

- Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`).
  (PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))

- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a
  warning at startup. The log message has been downgraded to debug level since
  these are valid service-specific values that are passed through correctly.
  (PR [#4137](https://github.com/pipecat-ai/pipecat/pull/4137))

- `GrokLLMService` and `GrokRealtimeLLMService` now live in the
  `pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three
  use the same xAI API. Update imports from `pipecat.services.grok.*` to
  `pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import
  GrokLLMService`).
  (PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))

- ⚠️ Bumped `mem0ai` dependency from `~=0.1.94` to `>=1.0.8,<2`. Users of the
  `mem0` extra will need to update their mem0ai package.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

### Deprecated

- `pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and
  `pipecat.services.grok.realtime.events` are deprecated. The old import paths
  still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`,
  `pipecat.services.xai.realtime.llm`, and
  `pipecat.services.xai.realtime.events` instead.
  (PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))

### Removed

- ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and
  `"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that
  called `await self.add_word_timestamps([("Reset", 0)])` or `await
  self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`,
  replace them with `await self.append_to_audio_context(ctx_id,
  TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage
  the word-timestamp reset automatically.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Removed `SambaNovaSTTService`. SambaNova no longer offers speech-to-text
  audio models. Use another STT provider instead.
  (PR [#4154](https://github.com/pipecat-ai/pipecat/pull/4154))

### Fixed

- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring
  `settings.system_instruction`. The system instruction was being read from a
  deprecated constructor parameter instead of the settings object, causing it
  to be silently ignored.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when
  the only message in context was a system message. The lone system message is
  now converted to "user" role instead of being extracted, matching the
  existing Anthropic adapter behavior.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

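The guard described above can be sketched as a special case for a context that contains nothing but a system message. `split_system_for_bedrock` is a hypothetical helper working on OpenAI-style dicts. It illustrates the rule and is not the adapter's actual code.

```python
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]


def split_system_for_bedrock(messages: List[Message]) -> Tuple[Optional[str], List[Message]]:
    """Illustrative: never extract the system message if it is the only one."""
    if not messages or messages[0].get("role") != "system":
        return None, list(messages)
    if len(messages) == 1:
        # Extracting it would leave an empty message list, so send it as "user".
        return None, [{**messages[0], "role": "user"}]
    return messages[0]["content"], list(messages[1:])
```
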
- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was
  deferred while waiting for the bot to finish responding and `turn_complete`
  never arrived. As a possible root-cause fix, `turn_complete` messages are now
  handled even if they lack `usage_metadata`. As a fallback, the deferred
  `EndFrame` now has a 30-second safety timeout.
  (PR [#4125](https://github.com/pipecat-ai/pipecat/pull/4125))

- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous
  contexts exceeded") caused by rapid user interruptions. When interruptions
  arrived before any TTS text was generated, phantom contexts were created on
  the ElevenLabs server that were never closed, eventually exceeding the
  5-context limit.
  (PR [#4126](https://github.com/pipecat-ai/pipecat/pull/4126))

- Fixed the final sentence being dropped from the conversation context when
  using RTVI text input with non-word-timestamp TTS services. The
  `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`,
  causing the `LLMAssistantAggregator` to finalize the context before the final
  sentence arrived.
  (PR [#4127](https://github.com/pipecat-ai/pipecat/pull/4127))

- Fixed audio crackling and popping in recordings when both user and bot are
  speaking. `AudioBufferProcessor` no longer injects silence into a track's
  buffer while that track is actively producing audio, preventing mid-utterance
  interruptions in the recorded output.
  (PR [#4135](https://github.com/pipecat-ai/pipecat/pull/4135))

- Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale
  words or backward PTS values into later turns.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Fixed a race condition in `InterruptibleTTSService` where, if `run_tts` had
  been invoked but `BotStartedSpeakingFrame` had not yet been received, a user
  interruption could allow stale audio to leak through.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Fixed Gemini Live local VAD mode (`GeminiVADParams(disabled=True)` with
  external VAD) not working. The bot now correctly detects user speech and
  signals turn boundaries to the Gemini API.
  (PR [#4146](https://github.com/pipecat-ai/pipecat/pull/4146))

- Fixed Gemini Live message handling to process all `server_content` fields
|
||
independently. Gemini 3.x can bundle multiple fields (e.g. `model_turn` and
|
||
`output_transcription`) on the same message, but the previous `elif` chain
|
||
only processed the first match, silently dropping the rest.
|
||
(PR [#4147](https://github.com/pipecat-ai/pipecat/pull/4147))
|
||
|
||
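The difference between an `elif` chain and independent checks can be shown with plain dicts standing in for `server_content` messages. This is a simplified sketch; the field names follow the entry above, and both handler functions are hypothetical.

```python
def handle_with_elif(server_content: dict) -> list:
    """Old behavior: only the first populated field is handled."""
    if server_content.get("model_turn"):
        return ["model_turn"]
    elif server_content.get("output_transcription"):
        return ["output_transcription"]
    elif server_content.get("turn_complete"):
        return ["turn_complete"]
    return []


def handle_independently(server_content: dict) -> list:
    """New behavior: every populated field is handled on the same message."""
    handled = []
    if server_content.get("model_turn"):
        handled.append("model_turn")
    if server_content.get("output_transcription"):
        handled.append("output_transcription")
    if server_content.get("turn_complete"):
        handled.append("turn_complete")
    return handled


message = {"model_turn": {"parts": ["..."]}, "output_transcription": {"text": "Hi"}}
print(handle_with_elif(message))      # ['model_turn']
print(handle_independently(message))  # ['model_turn', 'output_transcription']
```
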
- Fixed `ServiceSwitcher` with `ServiceSwitcherStrategyFailover` incorrectly
  triggering failover when `ErrorFrame`s from other pipeline stages (e.g. TTS)
  propagated upstream through the switcher. Previously, any non-fatal error
  passing through would be misattributed to the active service and trigger an
  unwanted service switch. Now only errors originating from the switcher's own
  managed services trigger failover.
  (PR [#4149](https://github.com/pipecat-ai/pipecat/pull/4149))

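One way to attribute errors before failing over is to compare each error's source against the services the switcher manages. The sketch below assumes errors carry a `source` name, leaves fatal errors to normal pipeline shutdown, and wraps around to the first service after the last one; `ErrorEvent` and `FailoverSwitcher` are hypothetical names, not Pipecat classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorEvent:
    source: str          # name of the component that raised the error
    fatal: bool = False


class FailoverSwitcher:
    """Hypothetical sketch of error-attributed failover across services."""

    def __init__(self, services: List[str]):
        self.services = services
        self.active_index = 0

    @property
    def active(self) -> str:
        return self.services[self.active_index]

    def handle_error(self, error: ErrorEvent) -> bool:
        """Return True if the error caused a switch to the next service."""
        # Errors from other pipeline stages (e.g. TTS) just pass through.
        if error.source not in self.services or error.fatal:
            return False
        self.active_index = (self.active_index + 1) % len(self.services)
        return True


switcher = FailoverSwitcher(["stt_primary", "stt_backup"])
switcher.handle_error(ErrorEvent(source="tts"))          # ignored
switcher.handle_error(ErrorEvent(source="stt_primary"))  # switches
print(switcher.active)  # stt_backup
```
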
- Fixed `LiveKitOutputTransport` not clearing the `rtc.AudioSource` internal
  buffer on interruption, causing the bot to continue speaking for several
  seconds after being interrupted.
  (PR [#4151](https://github.com/pipecat-ai/pipecat/pull/4151))

- Fixed a crash in OpenAI LLM processing when the provider returns
  `chunk.choices[0].delta.audio = None`, which caused `'NoneType' object has no
  attribute 'get'` errors during audio transcript handling.
  (PR [#4152](https://github.com/pipecat-ai/pipecat/pull/4152))

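The crash comes from a subtlety in `getattr`: its default only applies when the attribute is missing, not when it is present but `None`. A minimal sketch of a defensive read, using `SimpleNamespace` as a stand-in for the streaming delta object (the helper name `audio_transcript` is hypothetical):

```python
from types import SimpleNamespace
from typing import Optional


def audio_transcript(delta) -> Optional[str]:
    """Read delta.audio["transcript"], tolerating a missing or None `audio`."""
    # getattr(delta, "audio", {}) would still return None when the provider
    # sends `audio: null`; `or {}` covers that case as well.
    audio = getattr(delta, "audio", None) or {}
    return audio.get("transcript")


print(audio_transcript(SimpleNamespace(audio=None)))                     # None
print(audio_transcript(SimpleNamespace(audio={"transcript": "Hello"})))  # Hello
```
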
- Fixed error floods in `DeepgramSTTService` when the WebSocket connection
  drops. With Deepgram SDK 6.x, `send_media()` raises exceptions on a dead
  connection instead of silently failing, causing every queued audio frame to
  log an error. Now `send_media()` failures are caught gracefully: a single
  warning is logged and audio frames are skipped until the existing
  reconnection logic restores the connection.
  (PR [#4153](https://github.com/pipecat-ai/pipecat/pull/4153))

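The warn-once-and-skip behavior can be sketched with a small state flag. This is a simplified, synchronous illustration with a hypothetical `AudioSender` class; the real service sends audio asynchronously through the Deepgram SDK and reconnects through its own logic.

```python
import logging

logger = logging.getLogger(__name__)


class AudioSender:
    """Hypothetical sketch: warn once on a dead connection, then skip audio."""

    def __init__(self, send):
        self._send = send        # callable that raises on a dead connection
        self._connected = True
        self.warnings = 0
        self.skipped = 0

    def send_audio(self, chunk: bytes) -> None:
        if not self._connected:
            self.skipped += 1    # drop audio until the connection is restored
            return
        try:
            self._send(chunk)
        except Exception as exc:
            # Log a single warning instead of one error per queued frame.
            self._connected = False
            self.warnings += 1
            logger.warning("Audio send failed, skipping audio until reconnect: %s", exc)

    def on_reconnected(self) -> None:
        self._connected = True
```
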
- `Mem0MemoryService` no longer blocks the event loop during memory storage and
  retrieval. All Mem0 API calls now run in a background thread, and message
  storage is fire-and-forget so it doesn't delay downstream processing.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

- Fixed `Mem0MemoryService` failing to store messages when the context
  contained system or developer role messages. The Mem0 API only accepts user
  and assistant roles, so other roles are now filtered out before storing.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

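Both Mem0 changes can be sketched with the standard library: filter the roles, then hand the blocking call to `asyncio.to_thread()` inside a task that is not awaited by the caller. The `store` callable stands in for the Mem0 client call and, like the helper names, is an assumption; keeping a reference to the task in `pending` prevents it from being garbage-collected mid-flight.

```python
import asyncio

ALLOWED_ROLES = {"user", "assistant"}


def filter_for_mem0(messages):
    """Keep only the roles the Mem0 API accepts."""
    return [m for m in messages if m.get("role") in ALLOWED_ROLES]


async def store_in_background(store, messages, pending: set) -> asyncio.Task:
    """Run a blocking `store(messages)` in a worker thread without waiting for it."""
    task = asyncio.create_task(asyncio.to_thread(store, filter_for_mem0(messages)))
    pending.add(task)                        # hold a reference while it runs
    task.add_done_callback(pending.discard)  # drop it once finished
    return task
```
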
- Added the missing `on_dtmf_event` callback to the `DailyCallbacks` built in
  `LemonSliceTransportClient.setup()`, fixing a `ValidationError` at pipeline
  setup time.
  (PR [#4161](https://github.com/pipecat-ai/pipecat/pull/4161))

- Fixed an issue in `InworldTTSService` where, in cases of fast interruption,
  we would continue receiving audio from the previous context.
  (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))

- Fixed a word timestamp interleaving issue in `InworldTTSService` when
  processing multiple sentences.
  (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))

- Fixed duplicate `TTSStoppedFrame` being pushed in TTS services using
  `push_stop_frames=True`. When the stop-frame timeout fired, a second
  `TTSStoppedFrame` could be pushed after the normal one at context completion.
  (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))

- ⚠️ Fixed `DeepgramSTTService` compatibility with deepgram-sdk 6.1.0. The SDK
  now requires explicit message objects for `send_keep_alive()`,
  `send_close_stream()`, and `send_finalize()`. The minimum deepgram-sdk
  version is now 6.1.0.
  (PR [#4174](https://github.com/pipecat-ai/pipecat/pull/4174))

- Fixed RTVI events not being delivered to clients when using WebSocket
  transports. `ProtobufFrameSerializer` now sets `ignore_rtvi_messages=False`
  by default.
  (PR [#4176](https://github.com/pipecat-ai/pipecat/pull/4176))

- Fixed a timing issue where turn detection timer tasks (idle controller,
  speech timeout, turn analyzer, and turn completion) could miss their first
  tick because the newly created asyncio task was not yet scheduled when the
  caller continued.
  (PR [#4183](https://github.com/pipecat-ai/pipecat/pull/4183))

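A newly created asyncio task does not start running until the creating coroutine yields to the event loop. The sketch below shows one common way to make sure a timer task has started before the caller moves on, an explicit `await asyncio.sleep(0)`; the `start_timer` helper is hypothetical and this is not necessarily how Pipecat implements the fix.

```python
import asyncio


async def start_timer(events: list) -> asyncio.Task:
    async def timer():
        events.append("timer started")
        await asyncio.sleep(0.01)
        events.append("timer fired")

    task = asyncio.create_task(timer())
    # Without this yield, the caller would continue before timer() ran at all.
    await asyncio.sleep(0)
    events.append("caller continued")
    return task


async def demo():
    events = []
    task = await start_timer(events)
    await task
    return events


print(asyncio.run(demo()))  # ['timer started', 'caller continued', 'timer fired']
```
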
- Fixed `FastAPIWebsocketTransport` intermittently hanging on shutdown when the
  remote side (e.g. Twilio) disconnects while audio is being sent. A race
  condition between the send and receive paths could cause the
  `on_client_disconnected` callback to be skipped, leaving the pipeline waiting
  for a disconnect signal that never came.
  (PR [#4186](https://github.com/pipecat-ai/pipecat/pull/4186))

### Performance

- `RimeTTSService` now handles Rime's `done` WebSocket message to complete
  audio contexts immediately, eliminating the 3-second idle timeout that
  previously added latency at the end of each utterance.
  (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))

## [0.0.107] - 2026-03-23

### Added

- Added `frame_order` parameter to `SyncParallelPipeline`. Set
  `frame_order=FrameOrder.PIPELINE` to push synchronized output frames in
  pipeline definition order (all frames from the first pipeline, then the
  second, etc.) instead of the default arrival order.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

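The two orderings can be illustrated with outputs tagged by the index of the pipeline that produced them. The `FrameOrder` enum below is a stand-in: only the `PIPELINE` member is named in this entry, so the `ARRIVAL` member and the `order_outputs` helper are assumptions.

```python
from enum import Enum


class FrameOrder(Enum):
    ARRIVAL = "arrival"    # hypothetical name for the default behavior
    PIPELINE = "pipeline"


def order_outputs(outputs, order: FrameOrder):
    """outputs: list of (pipeline_index, frame) tuples in arrival order."""
    if order is FrameOrder.PIPELINE:
        # sorted() is stable, so frames from the same pipeline keep their
        # relative arrival order.
        return [frame for _, frame in sorted(outputs, key=lambda item: item[0])]
    return [frame for _, frame in outputs]


arrivals = [(1, "b1"), (0, "a1"), (1, "b2"), (0, "a2")]
print(order_outputs(arrivals, FrameOrder.ARRIVAL))   # ['b1', 'a1', 'b2', 'a2']
print(order_outputs(arrivals, FrameOrder.PIPELINE))  # ['a1', 'a2', 'b1', 'b2']
```
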
- Added `sync_with_audio` field to `OutputImageRawFrame`. When set to `True`,
  the output transport queues image frames with audio so they are displayed
  only after all preceding audio has been sent, enabling synchronized
  audio/image playback.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

- Added `OpenAIResponsesLLMService`, a new LLM service that uses the OpenAI
  Responses API. Supports streaming text, function calling, usage metrics, and
  out-of-band inference. Works with the universal `LLMContext` and
  `LLMContextAggregatorPair`. See
  `examples/foundational/07-interruptible-openai-responses.py` and
  `14-function-calling-openai-responses.py`.
  (PR [#4074](https://github.com/pipecat-ai/pipecat/pull/4074))

- Added `audio_out_auto_silence` parameter to `TransportParams` (defaults to
  `True`). When set to `False`, the transport waits for audio data instead of
  inserting silence when the output queue is empty, which is useful for
  scenarios that require uninterrupted audio playback without artificial gaps.
  (PR [#4104](https://github.com/pipecat-ai/pipecat/pull/4104))

### Changed

- Renamed tracing span attributes to align with OpenTelemetry GenAI semantic
  conventions: `gen_ai.system` to `gen_ai.provider.name`, `system` to
  `gen_ai.system_instructions`, `gen_ai.usage.cache_read_input_tokens` to
  `gen_ai.usage.cache_read.input_tokens`, and
  `gen_ai.usage.cache_creation_input_tokens` to
  `gen_ai.usage.cache_creation.input_tokens`.
  (PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))

- `DeepgramSageMakerTTSService` now correctly routes audio through the base
  `TTSService` audio context queue. Audio frames are delivered via
  `append_to_audio_context()` instead of being pushed directly, enabling proper
  ordering, interruption handling, and start/stop frame lifecycle management.
  Interruptions now trigger a `Clear` message to Deepgram (flushing its text
  buffer) at the right time via `on_audio_context_interrupted`.
  (PR [#4083](https://github.com/pipecat-ai/pipecat/pull/4083))

- `GradiumTTSService` now sends a per-context `setup` message with
  `client_req_id` before the first text message for each TTS context, following
  Gradium's multiplexing protocol. Previously, a single setup message was sent
  at connection time without a `client_req_id`, which prevented Gradium from
  associating requests with their sessions when using `close_ws_on_eos=False`.
  (PR [#4091](https://github.com/pipecat-ai/pipecat/pull/4091))

### Fixed

- Fixed stale `system_instruction` in LLM tracing spans by reading from
  `_settings.system_instruction` instead of the removed `_system_instruction`
  attribute.
  (PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))

- Fixed `SyncParallelPipeline` breaking the Whisker debugger.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

- Fixed `SyncParallelPipeline` race condition where concurrent SystemFrame
  processing (e.g. from RTVI) could corrupt sink queues and cause deadlocks.
  SystemFrames now take a fast path that passes them through without draining
  queued output.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

- Fixed TTS frame ordering so that non-system frames always arrive in correct
  order relative to the `TTSStartedFrame`/`TTSAudioRawFrame`/`TTSStoppedFrame`
  sequence. Previously these frames could race ahead of or behind audio context
  frames, producing out-of-order output downstream.
  (PR [#4075](https://github.com/pipecat-ai/pipecat/pull/4075))

- Fixed `SarvamTTSService` audio and error frame handling: these frames now
  route through `append_to_audio_context()` instead of `push_frame()`,
  ensuring correct behavior with audio contexts and interruptions.
  (PR [#4082](https://github.com/pipecat-ai/pipecat/pull/4082))

- Fixed audio frame ordering and interruption handling in Fish Audio, LMNT,
  Neuphonic, and Rime NonJson TTS services. These services were bypassing the
  base `TTSService` audio context serialization queue by pushing audio frames
  directly, which could cause out-of-order frames and broken interruptions
  during speech.
  (PR [#4090](https://github.com/pipecat-ai/pipecat/pull/4090))

- Fixed Genesys AudioHook serializer to always include the `parameters` field in
  protocol messages. The AudioHook protocol requires every message to carry a
  `parameters` object (even if empty), but `_create_message` omitted it when no
  parameters were provided. This caused clients that validate message structure
  (including the Genesys reference implementation) to reject `pong` and
  parameter-less `closed` responses, breaking server sequence tracking and
  preventing `outputVariables` from reaching the Architect flow.
  (PR [#4093](https://github.com/pipecat-ai/pipecat/pull/4093))

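The fix amounts to never dropping the `parameters` key. Below is a hypothetical sketch of a message builder; the top-level fields (`version`, `type`, `seq`, `clientseq`, `id`, `parameters`) are assumed from the AudioHook message envelope, and the function is not the serializer's actual `_create_message`.

```python
import json


def create_message(msg_type, seq, client_seq, session_id, parameters=None):
    """Hypothetical AudioHook-style message builder."""
    return json.dumps(
        {
            "version": "2",
            "type": msg_type,
            "seq": seq,
            "clientseq": client_seq,
            "id": session_id,
            # Always include `parameters`, even when there is nothing to send.
            "parameters": parameters if parameters is not None else {},
        }
    )


print(create_message("pong", 3, 5, "session-123"))
# {"version": "2", "type": "pong", "seq": 3, "clientseq": 5, "id": "session-123", "parameters": {}}
```
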
## [0.0.106] - 2026-03-18

### Added

- Added optional `service` field to `ServiceUpdateSettingsFrame` (and its
  subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`,
  `STTUpdateSettingsFrame`) to target a specific service instance. When
  `service` is set, only the matching service applies the settings; others
  forward the frame unchanged. This enables updating a single service when
  multiple services of the same type exist in the pipeline.
  (PR [#4004](https://github.com/pipecat-ai/pipecat/pull/4004))

- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily
  runner. These convenience parameters let callers specify a SIP provider name
  and geographic region directly without manually constructing
  `DailyRoomProperties` and `DailyRoomSipParams`.
  (PR [#4005](https://github.com/pipecat-ai/pipecat/pull/4005))

- Added `PerplexityLLMAdapter` that automatically transforms conversation
  messages to satisfy Perplexity's stricter API constraints (strict role
  alternation, no non-initial system messages, last message must be user/tool).
  Previously, certain conversation histories could cause Perplexity API errors
  that didn't occur with OpenAI (`PerplexityLLMService` subclasses
  `OpenAILLMService` since Perplexity uses an OpenAI-compatible API).
  (PR [#4009](https://github.com/pipecat-ai/pipecat/pull/4009))

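The constraints listed above can be sketched as a single normalization pass. This is a hypothetical illustration, not the adapter's algorithm: it merges consecutive same-role messages, turns non-initial system messages into user messages, and, as an assumption, appends a placeholder user message when the history ends on an assistant turn. Tool messages and non-string content are not handled.

```python
def to_perplexity_messages(messages):
    """Hypothetical sketch of Perplexity-style message normalization."""
    result = []
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role == "system" and i > 0:
            role = "user"  # only an initial system message is allowed
        content = msg["content"]
        if result and result[-1]["role"] == role and role in ("user", "assistant"):
            # Merge consecutive same-role messages to keep strict alternation.
            merged = result[-1]["content"] + "\n\n" + content
            result[-1] = {"role": role, "content": merged}
        else:
            result.append({"role": role, "content": content})
    if result and result[-1]["role"] not in ("user", "tool"):
        # Placeholder text is an assumption made for this sketch.
        result.append({"role": "user", "content": "Continue."})
    return result
```
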
- Added DTMF input event support to the Daily transport. Incoming DTMF tones
  are now received via Daily's `on_dtmf_event` callback and pushed into the
  pipeline as `InputDTMFFrame`, enabling bots to react to keypad presses from
  phone callers.
  (PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))

- Added `WakePhraseUserTurnStartStrategy` for triggering user turns based on
  wake phrases, with support for `single_activation` mode. Deprecates
  `WakeCheckFilter`.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

- Added `default_user_turn_start_strategies()` and
  `default_user_turn_stop_strategies()` helper functions for composing custom
  strategy lists.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

### Changed

- Changed tool result JSON serialization to use `ensure_ascii=False`,
  preserving UTF-8 characters instead of escaping them. This reduces context
  size and token usage for non-English languages.
  (PR [#3457](https://github.com/pipecat-ai/pipecat/pull/3457))

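The effect of `ensure_ascii` is easy to see with the standard library. A small illustration; the example data is arbitrary, and character count is only a rough proxy for token usage.

```python
import json

tool_result = {"city": "Zürich", "greeting": "こんにちは"}

escaped = json.dumps(tool_result)                        # ensure_ascii=True (default)
preserved = json.dumps(tool_result, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)    # {"city": "Z\u00fcrich", "greeting": "\u3053\u3093\u306b\u3061\u306f"}
print(preserved)  # {"city": "Zürich", "greeting": "こんにちは"}
print(len(escaped), len(preserved))  # 69 39
```
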
- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of
  `OpenAIRealtimeSTTSettings`, making it runtime-updatable via
  `STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is
  deprecated as of 0.0.106.
  (PR [#3991](https://github.com/pipecat-ai/pipecat/pull/3991))

- Updated `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable
  release).
  (PR [#3997](https://github.com/pipecat-ai/pipecat/pull/3997))

- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`,
  aligning it with the HeyGen and Tavus video services. It supports
  `SimliVideoService.Settings(...)` for configuration and uses
  `start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage
  (`api_key`, `face_id`, etc.) remains unchanged.
  (PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))

- Updated `pipecat-ai-small-webrtc-prebuilt` to `2.4.0`.
  (PR [#4023](https://github.com/pipecat-ai/pipecat/pull/4023))

- Nova Sonic assistant text transcripts are now delivered in real time using
  speculative text events instead of delayed final text events. Previously,
  assistant text only arrived after all audio had finished playing, causing
  laggy transcripts in client UIs. Speculative text arrives before each audio
  chunk, providing text synchronized with what the bot is saying. This also
  simplifies the internal text handling by removing the interruption re-push
  hack and assistant text buffer.
  (PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))

- Updated `daily-python` dependency to 0.25.0.
  (PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))

- Added `enable_dialout` parameter to `configure()` in `pipecat.runner.daily`
  to support dial-out rooms. Also narrowed misleading `Optional` type hints and
  deduplicated token expiry calculation.
  (PR [#4048](https://github.com/pipecat-ai/pipecat/pull/4048))

- Extended `ProcessFrameResult` to stop strategies, allowing a stop strategy to
  short-circuit evaluation of subsequent strategies by returning `STOP`.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

- `GradiumSTTService` now takes separate `encoding` and `sample_rate`
  constructor arguments, which are combined internally to form the
  `input_format`. PCM accepts `8000`, `16000`, and `24000` Hz sample rates.
  (PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))

- Improved `GradiumSTTService` transcription accuracy by reworking how text
  fragments are accumulated and finalized. Previously, trailing words could be
  dropped when the server's `flushed` response arrived before all text tokens
  were delivered. The service now uses a short aggregation delay after flush to
  capture trailing tokens, producing complete utterances.
  (PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))

### Deprecated

- `SimliVideoService.InputParams` is deprecated. Use the direct constructor
  parameters `max_session_length`, `max_idle_time`, and `enable_logging`
  instead.
  (PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))

- Deprecated `LocalSmartTurnAnalyzerV2` and `LocalCoreMLSmartTurnAnalyzer`. Use
  `LocalSmartTurnAnalyzerV3` instead. Instantiating these analyzers will now
  emit a `DeprecationWarning`.
  (PR [#4012](https://github.com/pipecat-ai/pipecat/pull/4012))

- Deprecated `WakeCheckFilter` in favor of `WakePhraseUserTurnStartStrategy`.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

### Fixed

- Fixed an issue where the default model for `OpenAILLMService` and
  `AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now
  restored to `gpt-4.1`.
  (PR [#4000](https://github.com/pipecat-ai/pipecat/pull/4000))

- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut
  down before in-flight frames (e.g. LLM function call responses) finished
  processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline
  as `ControlFrame`s, ensuring all pending work is flushed before shutdown
  begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate
  (`SystemFrame`).
  (PR [#4006](https://github.com/pipecat-ai/pipecat/pull/4006))

- Fixed `ParallelPipeline` dropping or misordering frames during lifecycle
  synchronization. Buffered frames are now flushed in the correct order
  relative to synchronization frames (`StartFrame` goes first,
  `EndFrame`/`CancelFrame` go after), and frames added to the buffer during
  flush are also drained.
  (PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))

- Fixed `TTSService` potentially canceling in-flight audio during shutdown. The
  stop sequence now waits for all queued audio contexts to finish processing
  before canceling the stop frame task.
  (PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))

- Fixed `Language` enum values (e.g. `Language.ES`) not being converted to
  service-specific codes when passed via
  `settings=Service.Settings(language=Language.ES)` at init time. This caused
  API errors (e.g. 400 from Rime) because the raw enum was sent instead of the
  expected language code (e.g. `"spa"`). Runtime updates via
  `UpdateSettingsFrame` were unaffected. The fix centralizes conversion in the
  base `TTSService` and `STTService` classes so all services handle this
  consistently.
  (PR [#4024](https://github.com/pipecat-ai/pipecat/pull/4024))

- Fixed `DeepgramSTTService` ignoring the `base_url` scheme when using `ws://`
  or `http://`. Previously these were silently overwritten with `wss://` /
  `https://`, breaking air-gapped or private deployments that don't use TLS.
  All scheme choices (`wss://`, `https://`, `ws://`, `http://`, or bare
  hostname) are now respected.
  (PR [#4026](https://github.com/pipecat-ai/pipecat/pull/4026))

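Respecting the scheme comes down to only filling in a default when none was given. A minimal sketch, assuming `http://` and `https://` map to `ws://` and `wss://` for the WebSocket connection and that a bare hostname defaults to `wss://`; `resolve_ws_url` is a hypothetical helper. Plain prefix checks are used because `urllib.parse.urlsplit` can parse a bare `host:port` string in surprising ways.

```python
def resolve_ws_url(base_url: str) -> str:
    """Hypothetical sketch: keep an explicit scheme, default to TLS otherwise."""
    if base_url.startswith(("wss://", "ws://")):
        return base_url                          # already a WebSocket URL
    if base_url.startswith("https://"):
        return "wss://" + base_url[len("https://"):]
    if base_url.startswith("http://"):
        return "ws://" + base_url[len("http://"):]  # no TLS, as requested
    return "wss://" + base_url                   # bare hostname: assume TLS


print(resolve_ws_url("ws://10.0.0.5:8080"))  # ws://10.0.0.5:8080
print(resolve_ws_url("api.deepgram.com"))    # wss://api.deepgram.com
```
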
- Fixed `LLMSwitcher.register_function()` and `register_direct_function()` not
  accepting or forwarding the `timeout_secs` parameter.
  (PR [#4037](https://github.com/pipecat-ai/pipecat/pull/4037))

- Fixed empty user transcriptions in Nova Sonic causing spurious interruptions.
  Previously, an empty transcription could trigger an interruption of the
  assistant's response even though the user hadn't actually spoken.
  (PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))

- Fixed `SonioxSTTService` and `OpenAIRealtimeSTTService` crashing when language
  parameters contain plain strings instead of `Language` enum values.
  (PR [#4046](https://github.com/pipecat-ai/pipecat/pull/4046))

- Fixed premature user turn stops caused by late transcriptions arriving
  between turns. A stale transcript from the previous turn could persist into
  the next turn and trigger a stop before the current turn's real transcript
  arrived. Stop strategies are now reset at both turn start and turn stop to
  prevent state from leaking across turn boundaries.
  (PR [#4057](https://github.com/pipecat-ai/pipecat/pull/4057))

- Fixed raw language strings like `"de-DE"` silently failing when passed to
  TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go
  through the same `Language` enum resolution as enum values, so regional codes
  like `"de-DE"` are properly converted to service-expected formats like
  `"de"`. Unrecognized strings log a warning instead of failing silently.
  (PR [#4058](https://github.com/pipecat-ai/pipecat/pull/4058))

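The resolution step can be sketched with a tiny stand-in for the `Language` enum. Assumptions: the real enum and per-service mappings are much larger (some services, like Rime, expect codes such as `"spa"`), this sketch simply strips the region, and unrecognized strings are passed through unchanged after the warning.

```python
import logging
from enum import Enum


class Language(str, Enum):
    """Tiny stand-in for Pipecat's much larger `Language` enum."""

    DE = "de"
    DE_DE = "de-DE"
    ES = "es"


def to_service_language(value) -> str:
    """Resolve a `Language` member or raw string to a base code such as "de"."""
    if not isinstance(value, Language):
        try:
            value = Language(value)  # "de-DE" -> Language.DE_DE
        except ValueError:
            logging.warning("Unrecognized language %r, passing it through", value)
            return value
    return value.value.split("-")[0]  # drop the region: "de-DE" -> "de"


print(to_service_language("de-DE"))      # de
print(to_service_language(Language.ES))  # es
```
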
- Fixed Deepgram STT list-type settings (`keyterm`, `keywords`, `search`,
  `redact`, `replace`) being stringified instead of passed as lists to the SDK,
  which caused them to be sent as literal strings (e.g. `"['pipecat']"`) in the
  WebSocket query params.
  (PR [#4063](https://github.com/pipecat-ai/pipecat/pull/4063))

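Python's `urllib.parse.urlencode` shows the difference between stringifying a list and sending it as repeated query parameters. Whether the SDK builds its query string this way is an assumption; the sketch only illustrates the symptom described above.

```python
from urllib.parse import urlencode

settings = {"model": "nova-3", "keyterm": ["pipecat", "daily"]}

# Bug pattern: str() turns the list into its Python repr, sent as one value.
broken = urlencode({key: str(value) for key, value in settings.items()})

# Passing lists through: doseq=True repeats the key once per item.
fixed = urlencode(settings, doseq=True)

print(broken)  # model=nova-3&keyterm=%5B%27pipecat%27%2C+%27daily%27%5D
print(fixed)   # model=nova-3&keyterm=pipecat&keyterm=daily
```
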
- Fixed `MinWordsUserTurnStartStrategy` including text below the word threshold
  in the output by resetting aggregation when the minimum word count is not
  met.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

- Fixed audio overlap and potentially dropped TTS content when multiple assistant
  turns occur in quick succession. `TTSService` now flushes remaining text
  before pausing frame processing on `LLMFullResponseEndFrame`/`EndFrame`,
  instead of pausing first.
  (PR [#4071](https://github.com/pipecat-ai/pipecat/pull/4071))

### Security

- Bumped PyJWT minimum version from 2.10.1 to 2.12.0 in the `livekit` extra to
  address CVE-2026-32597 (GHSA-752w-5fwx-jx9f), where PyJWT <= 2.11.0 accepted
  unknown `crit` header extensions.
  (PR [#4035](https://github.com/pipecat-ai/pipecat/pull/4035))

## [0.0.105] - 2026-03-10

### Added

- Added concurrent audio context support: `CartesiaTTSService` can now
  synthesize the next sentence while the previous one is still playing, by
  setting `pause_frame_processing=False` and routing each sentence through its
  own audio context queue.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

- Added custom video track support to Daily transport. Use
  `video_out_destinations` in `DailyParams` to publish multiple video tracks
  simultaneously, mirroring the existing `audio_out_destinations` feature.
  (PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))

- Added `ServiceSwitcherStrategyFailover` that automatically switches to the
  next service when the active service reports a non-fatal error. Recovery
  policies can be implemented via the `on_service_switched` event handler.
  (PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))

- Added optional `timeout_secs` parameter to `register_function()` and
  `register_direct_function()` for per-tool function call timeout control,
  overriding the global `function_call_timeout_secs` default.
  (PR [#3915](https://github.com/pipecat-ai/pipecat/pull/3915))

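The override rule (a per-tool value wins, otherwise the global default applies) can be sketched with `asyncio.wait_for`. `DEFAULT_TIMEOUT_SECS`, `run_tool`, and the error-dict return value are hypothetical and not Pipecat's API.

```python
import asyncio
from typing import Optional

DEFAULT_TIMEOUT_SECS = 10.0  # stand-in for a global function_call_timeout_secs


async def run_tool(handler, timeout_secs: Optional[float] = None):
    """Run a tool handler, preferring its own timeout over the global default."""
    timeout = timeout_secs if timeout_secs is not None else DEFAULT_TIMEOUT_SECS
    try:
        return await asyncio.wait_for(handler(), timeout=timeout)
    except asyncio.TimeoutError:
        return {"error": f"timed out after {timeout}s"}


async def slow_lookup():
    await asyncio.sleep(1)
    return {"temperature": 21}


print(asyncio.run(run_tool(slow_lookup, timeout_secs=0.05)))
# {'error': 'timed out after 0.05s'}
```
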
- Added `cloud-audio-only` recording option to Daily transport's
  `enable_recording` property.
  (PR [#3916](https://github.com/pipecat-ai/pipecat/pull/3916))

- Wired up `system_instruction` in `BaseOpenAILLMService`,
  `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default
  system prompt, matching the behavior of the Google services. This enables
  sharing a single `LLMContext` across multiple LLM services, where each
  service provides its own system instruction independently.

  ```python
  import os

  from pipecat.frames.frames import LLMRunFrame
  from pipecat.processors.aggregators.llm_context import LLMContext
  from pipecat.services.openai.llm import OpenAILLMService

  llm = OpenAILLMService(
      api_key=os.getenv("OPENAI_API_KEY"),
      system_instruction="You are a helpful assistant.",
  )

  context = LLMContext()

  # `transport` and `task` are created elsewhere in the bot.
  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      context.add_message({"role": "user", "content": "Please introduce yourself."})
      await task.queue_frames([LLMRunFrame()])
  ```
  (PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))

- Added `vad_threshold` parameter to `AssemblyAIConnectionParams` for
  configuring voice activity detection sensitivity in U3 Pro. Aligning this
  with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone"
  where AssemblyAI transcribes speech that VAD hasn't detected yet.
  (PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))

- Added `push_empty_transcripts` parameter to `BaseWhisperSTTService` and
  `OpenAISTTService` to allow empty transcripts to be pushed downstream as
  `TranscriptionFrame` instead of discarding them (the default behavior). This
  is intended for situations where VAD fires even though the user did not
  speak. In these cases, it is useful to know that nothing was transcribed so
  that the agent can resume speaking, instead of waiting longer for a
  transcription.
  (PR [#3930](https://github.com/pipecat-ai/pipecat/pull/3930))

- LLM services (`BaseOpenAILLMService`, `AnthropicLLMService`,
  `AWSBedrockLLMService`) now log a warning when both `system_instruction` and
  a system message in the context are set. The constructor's
  `system_instruction` takes precedence.
  (PR [#3932](https://github.com/pipecat-ai/pipecat/pull/3932))

- Runtime settings updates (via `STTUpdateSettingsFrame`) now work for AWS
  Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and
  Soniox STT services. Previously, changing settings at runtime only stored the
  new values without reconnecting.
  (PR [#3946](https://github.com/pipecat-ai/pipecat/pull/3946))

- Exposed `on_summary_applied` event on `LLMAssistantAggregator`, allowing
  users to listen for context summarization events without accessing private
  members.
  (PR [#3947](https://github.com/pipecat-ai/pipecat/pull/3947))

- Deepgram Flux STT settings (`keyterm`, `eot_threshold`,
  `eager_eot_threshold`, `eot_timeout_ms`) can now be updated mid-stream via
  `STTUpdateSettingsFrame` without triggering a reconnect. The new values are
  sent to Deepgram as a Configure WebSocket message on the existing connection.
  (PR [#3953](https://github.com/pipecat-ai/pipecat/pull/3953))

- Added `system_instruction` parameter to `run_inference` across all LLM
  services, allowing callers to override the system prompt for one-shot
  inference calls. Used by `_generate_summary` to pass the summarization prompt
  cleanly.
  (PR [#3968](https://github.com/pipecat-ai/pipecat/pull/3968))

### Changed

- Audio context management (previously in `AudioContextTTSService`) is now
  built into `TTSService`. All WebSocket providers (`cartesia`, `elevenlabs`,
  `asyncai`, `inworld`, `rime`, `gradium`, `resembleai`) now inherit from
  `WebsocketTTSService` directly. Word-timestamp baseline is set automatically
  on the first audio chunk of each context instead of requiring each provider
  to call `start_word_timestamps()` in their receive loop.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

- Daily transport now uses `CustomVideoSource`/`CustomVideoTrack` instead of
  `VirtualCameraDevice` for the default camera output, mirroring how audio
  already works with `CustomAudioSource`/`CustomAudioTrack`.
  (PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))

- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions`
  class was removed from the SDK and is now provided by pipecat directly;
  import it from `pipecat.services.deepgram.stt` instead of `deepgram`.
  (PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))

- `ServiceSwitcherStrategy` base class now provides a `handle_error()` hook for
  subclasses to implement error-based switching. `ServiceSwitcher` defaults to
  `ServiceSwitcherStrategyManual` and `strategy_type` is now optional.
  (PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))

- Added support for Voice Focus 2.0 models.
  - Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
  - Removed unused `ParameterFixedError` exception handling in `AICFilter`
    parameter setup.
  (PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))

- `max_context_tokens` and `max_unsummarized_messages` in
  `LLMAutoContextSummarizationConfig` (and deprecated
  `LLMContextSummarizationConfig`) can now be set to `None` independently to
  disable that summarization threshold. At least one must remain set.
  (PR [#3914](https://github.com/pipecat-ai/pipecat/pull/3914))

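The "at least one must remain set" rule can be sketched as a small dataclass. The class name, the default values, and the `>=` comparisons are placeholders chosen for this illustration, not Pipecat's defaults or implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummarizationThresholds:
    """Hypothetical stand-in for the two summarization thresholds."""

    max_context_tokens: Optional[int] = 8000       # placeholder value
    max_unsummarized_messages: Optional[int] = 20  # placeholder value

    def __post_init__(self):
        if self.max_context_tokens is None and self.max_unsummarized_messages is None:
            raise ValueError("At least one summarization threshold must be set")

    def should_summarize(self, tokens: int, unsummarized: int) -> bool:
        # A threshold set to None is skipped entirely.
        if self.max_context_tokens is not None and tokens >= self.max_context_tokens:
            return True
        if (
            self.max_unsummarized_messages is not None
            and unsummarized >= self.max_unsummarized_messages
        ):
            return True
        return False


only_messages = SummarizationThresholds(max_context_tokens=None, max_unsummarized_messages=5)
print(only_messages.should_summarize(tokens=1_000_000, unsummarized=4))  # False
```
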
- ⚠️ Removed `formatted_finals` and `word_finalization_max_wait_time` from
|
||
`AssemblyAIConnectionParams` as these were v2 API parameters not supported in
|
||
v3. Clarified that `format_turns` only applies to Universal-Streaming models;
|
||
U3 Pro has automatic formatting built-in.
|
||
(PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))
|
||
|
||
- Changed `DeepgramTTSService` to send a Clear message on interruption instead
|
||
of disconnecting and reconnecting the WebSocket, allowing the connection to
|
||
persist throughout the session.
|
||
(PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))
|
||
|
||
- Re-added `enhancement_level` support to `AICFilter` with runtime
|
||
`FilterEnableFrame` control, applying `ProcessorParameter.Bypass` and
|
||
`ProcessorParameter.EnhancementLevel` together.
|
||
(PR [#3961](https://github.com/pipecat-ai/pipecat/pull/3961))
|
||
|
||
- Updated `daily-python` dependency from `~=0.23.0` to `~=0.24.0`.
|
||
(PR [#3970](https://github.com/pipecat-ai/pipecat/pull/3970))
|
||
|
||
- Updated `FishAudioTTSService` default model from `s1` to `s2-pro`, matching
|
||
Fish Audio's latest recommended model for improved quality and speed.
|
||
(PR [#3973](https://github.com/pipecat-ai/pipecat/pull/3973))
|
||
|
||
- `AzureSTTService` `region` parameter is now optional when `private_endpoint`
|
||
is provided. A `ValueError` is raised if neither is given, and a warning is
|
||
logged if both are provided (`private_endpoint` takes priority).
|
||
(PR [#3974](https://github.com/pipecat-ai/pipecat/pull/3974))
|
||
|
||
### Deprecated

- Deprecated `AudioContextTTSService` and `AudioContextWordTTSService`.
  Subclass `WebsocketTTSService` directly instead; audio context management is
  now part of the base `TTSService`.
- Deprecated `WordTTSService`, `WebsocketWordTTSService`, and
  `InterruptibleWordTTSService`. Word timestamp logic is now always active in
  `TTSService` and no longer needs to be opted into via a subclass.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

- Deprecated `pipecat.services.google.llm_vertex`,
  `pipecat.services.google.llm_openai`, and
  `pipecat.services.google.gemini_live.llm_vertex` modules. Use
  `pipecat.services.google.vertex.llm`, `pipecat.services.google.openai.llm`,
  and `pipecat.services.google.gemini_live.vertex.llm` instead. The old import
  paths still work but will emit a `DeprecationWarning`.
  (PR [#3980](https://github.com/pipecat-ai/pipecat/pull/3980))

### Removed

- ⚠️ Removed `supports_word_timestamps` parameter from `TTSService.__init__()`.
  Word timestamp logic is now always active. Remove this argument from any
  custom subclass `super().__init__()` calls.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

### Fixed

- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The
  deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit
  `KeepAlive` messages every 5 seconds, within the recommended 3–5 second
  interval before Deepgram's 10-second inactivity timeout.
  (PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))

- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in
  `AICFilter` caused by holding a `memoryview` on the mutable audio buffer
  across async yield points.
  (PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))

- Fixed TTS context not being appended to the assistant message history when
  using `TTSSpeakFrame` with `append_to_context=True` with some TTS providers.
  (PR [#3936](https://github.com/pipecat-ai/pipecat/pull/3936))

- Fixed context summarization leaving orphaned tool responses in the kept
  context when tool calls were moved to the summarized portion.
  (PR [#3937](https://github.com/pipecat-ai/pipecat/pull/3937))

- Fixed turn completion state not resetting at end of LLM responses.
  `LLMFullResponseEndFrame` is pushed (not received) by the LLM service, so the
  mixin now handles it in `push_frame` instead of `process_frame`.
  (PR [#3956](https://github.com/pipecat-ai/pipecat/pull/3956))

- Fixed turn completion instructions being injected as a context system message
  instead of using `system_instruction`. This caused warning spam when
  `system_instruction` was also set, and the instructions didn't persist across
  full context updates.
  (PR [#3957](https://github.com/pipecat-ai/pipecat/pull/3957))

- Fixed `TTSService` audio context queue getting blocked when
  `append_to_audio_context()` was called with a `None` context ID, which
  prevented subsequent audio from being delivered.
  (PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))

- Fixed `on_call_state_updated` event handler in LiveKit transport receiving
  an incorrect number of arguments due to a redundant `self` passed to
  `_call_event_handler`.
  (PR [#3959](https://github.com/pipecat-ai/pipecat/pull/3959))

- Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services
  treating `conversation_already_has_active_response` as a fatal error. These
  services now log it as a non-fatal debug event when a response is already in
  progress.
  (PR [#3960](https://github.com/pipecat-ai/pipecat/pull/3960))

- Fixed `SmallWebRTCConnection` silently discarding messages sent before the
  data channel is open by queuing them and flushing once the channel is ready.
  A bounded queue (`MAX_MESSAGE_QUEUE_SIZE = 50`) prevents unbounded memory
  growth, and a 10-second timeout after connection clears the queue and falls
  back to discard mode if the data channel never opens.
  (PR [#3962](https://github.com/pipecat-ai/pipecat/pull/3962))

- Fixed `AzureSTTService` failing to initialize when `private_endpoint` is
  provided. The Azure Speech SDK's `SpeechConfig` does not accept both `region`
  and `endpoint` simultaneously, so they are now passed conditionally.
  (PR [#3967](https://github.com/pipecat-ai/pipecat/pull/3967))

- Fixed `GoogleLLMService` ignoring the `system_instruction` set via
  constructor or `GoogleLLMSettings` when a system message was also present in
  the context. The settings value now correctly takes priority, and a warning
  is logged when both are set.
  (PR [#3976](https://github.com/pipecat-ai/pipecat/pull/3976))

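The `SmallWebRTCConnection` queuing fix in this section follows a common pattern: buffer outbound messages until a channel opens, cap the buffer, and give up after a deadline. The following is a minimal, hypothetical sketch of that pattern in plain Python, not Pipecat's implementation. `PendingMessageBuffer`, its method names, the lazy (on-send) timeout check, and the drop-oldest policy when the queue is full are all assumptions; only the 50-message bound and the 10-second timeout come from the entry.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Optional

MAX_MESSAGE_QUEUE_SIZE = 50  # bound named in the changelog entry
QUEUE_TIMEOUT_SECS = 10.0  # timeout named in the changelog entry


class PendingMessageBuffer:
    """Hypothetical helper: queue messages until a data channel opens."""

    def __init__(
        self,
        send: Callable[[Any], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._clock = clock
        # deque(maxlen=...) silently drops the oldest entry when full (assumed policy).
        self._queue: Deque[Any] = deque(maxlen=MAX_MESSAGE_QUEUE_SIZE)
        self._connected_at: Optional[float] = None
        self._channel_open = False
        self._discard = False

    def on_connected(self) -> None:
        # Start the timeout window once the peer connection is up.
        self._connected_at = self._clock()

    def on_channel_open(self) -> None:
        # Flush queued messages in order, then send directly from now on.
        self._channel_open = True
        while self._queue:
            self._send(self._queue.popleft())

    def send(self, message: Any) -> None:
        if self._channel_open:
            self._send(message)
            return
        self._check_timeout()
        if not self._discard:
            self._queue.append(message)

    def _check_timeout(self) -> None:
        # Simplification: the deadline is checked lazily on send, not by a timer.
        if (
            self._connected_at is not None
            and not self._discard
            and self._clock() - self._connected_at >= QUEUE_TIMEOUT_SECS
        ):
            self._queue.clear()
            self._discard = True
```

With an injected fake clock, sending 60 messages before the channel opens delivers only the newest 50 once `on_channel_open()` runs, and anything sent after the deadline (while the channel is still closed) is dropped.
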
### Other

- Updated foundational examples to use `system_instruction` on LLM services
  instead of adding system messages to `LLMContext`.
  (PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))

- Updated AssemblyAI turn detection example to use `keyterms_prompt` list
  format instead of `prompt` string for improved clarity.
  (PR [#3929](https://github.com/pipecat-ai/pipecat/pull/3929))

- Updated foundational examples and eval scripts to use `"user"` role instead
  of `"system"` when adding messages to `LLMContext`, since system prompts
  should be set via `system_instruction` on the LLM service.
  (PR [#3931](https://github.com/pipecat-ai/pipecat/pull/3931))

## [0.0.104] - 2026-03-02

### Added

- Added `TextAggregationMetricsData` metric measuring the time from the first
  LLM token to the first complete sentence, representing the latency cost of
  sentence aggregation in the TTS pipeline.
  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))

- Added support for using strongly-typed objects instead of dicts for updating
  service settings at runtime.

  Instead of, say:

  ```python
  await task.queue_frame(
      STTUpdateSettingsFrame(settings={"language": Language.ES})
  )
  ```

  you'd do:

  ```python
  await task.queue_frame(
      STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
  )
  ```

  Each service now vends strongly-typed classes like `DeepgramSTTSettings`
  representing the service's runtime-updatable settings.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Added support for specifying private endpoints for Azure Speech-to-Text,
  enabling use in private networks behind firewalls.
  (PR [#3764](https://github.com/pipecat-ai/pipecat/pull/3764))

- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time
  LemonSlice Avatars to any Daily room.
  (PR [#3791](https://github.com/pipecat-ai/pipecat/pull/3791))

- Added `output_medium` parameter to `AgentInputParams` and
  `OneShotInputParams` in Ultravox service to control initial output medium
  (text or voice) at call creation time.
  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))

- Added `TurnMetricsData` as a generic metrics class for turn detection, with
  e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData`
  with `e2e_processing_time_ms` tracking the interval from VAD
  speech-to-silence transition to turn completion.
  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))

- Added `on_audio_context_interrupted()` and `on_audio_context_completed()`
  callbacks to `AudioContextTTSService`. Subclasses can override these to
  perform provider-specific cleanup instead of overriding
  `_handle_interruption()`.
  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))

- Added `on_summary_applied` event to `LLMContextSummarizer` for observability,
  providing message counts before and after context summarization.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added `summary_message_template` to `LLMContextSummarizationConfig` for
  customizing how summaries are formatted when injected into context (e.g.,
  wrapping in XML tags).
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default
  120s) to prevent hung LLM calls from permanently blocking future
  summarizations.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added optional `llm` field to `LLMContextSummarizationConfig` for routing
  summarization to a dedicated LLM service (e.g., a cheaper/faster model)
  instead of the pipeline's primary model.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added AssemblyAI `u3-rt-pro` model support with built-in turn detection mode.
  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))

- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization
  from anywhere in the pipeline (e.g. a function call tool). Accepts an
  optional `config: LLMContextSummaryConfig` to override summary generation
  settings per request.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- Added `LLMContextSummaryConfig` (summary generation params:
  `target_context_tokens`, `min_messages_after_summary`,
  `summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger
  thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested
  `summary_config`). These replace the monolithic
  `LLMContextSummarizationConfig`.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- Added support for the `speed_alpha` parameter to the `arcana` model in
  `RimeTTSService`.
  (PR [#3873](https://github.com/pipecat-ai/pipecat/pull/3873))

- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports
  (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen,
  Tavus) when a client connects. Enables observers to track transport readiness
  timing.
  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))

- Added `StartupTimingObserver` for measuring how long each processor's
  `start()` method takes during pipeline startup. Also measures transport
  readiness (the time from `StartFrame` to first client connection) via the
  `on_transport_timing_report` event.
  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))

- Added `BotConnectedFrame` for SFU transports and `on_transport_timing_report`
  event to `StartupTimingObserver` with bot and client connection timing.
  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))

- Added optional `direction` parameter to `PipelineTask.queue_frame()` and
  `PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the
  end of the pipeline.
  (PR [#3883](https://github.com/pipecat-ai/pipecat/pull/3883))

- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing
  per-service TTFB, text aggregation, user turn duration, and function call
  latency metrics for each user-to-bot response cycle.
  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))

- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver`
  measuring the time from client connection to first bot speech. An
  `on_latency_breakdown` is also emitted for this first speech event.
  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))

- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an
  `InterruptionFrame` both upstream and downstream directly from the calling
  processor, avoiding the round-trip through the pipeline task that
  `push_interruption_task_frame_and_wait()` required.
  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))

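`TextAggregationMetricsData` (the first entry in this section) measures the time from the first LLM token to the first complete sentence. The sketch below shows only the arithmetic of that measurement over timestamped tokens. The function name, the `(timestamp, text)` input shape, and the naive end-of-sentence regex are assumptions for illustration; Pipecat's own sentence aggregation is not reproduced here and handles more cases (abbreviations, numbers, and so on).

```python
import re
from typing import Iterable, Optional, Tuple

# Naive sentence boundary: terminal punctuation, optionally followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s*$")


def first_sentence_latency(tokens: Iterable[Tuple[float, str]]) -> Optional[float]:
    """Seconds from the first token to the token that completes a sentence.

    Returns None if the stream ends without a complete sentence.
    """
    first_ts: Optional[float] = None
    text = ""
    for ts, token in tokens:
        if first_ts is None:
            first_ts = ts  # timestamp of the very first token
        text += token
        if _SENTENCE_END.search(text):
            return ts - first_ts
    return None
```

For example, tokens `"Hello"`, `" there"`, `"."` arriving at 10.0 s, 10.25 s, and 10.5 s give a latency of 0.5 s: the cost of waiting for the sentence to finish before handing it to TTS.
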
### Changed

- Added `text_aggregation_mode` parameter to `TTSService` and all TTS
  subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All
  text now flows through text aggregators regardless of mode, enabling pattern
  detection and tag handling in TOKEN mode.
  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))

- ⚠️ Refactored runtime-updatable service settings to use strongly-typed
  classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific
  subclasses) instead of plain dicts. Each service's `_settings` now holds
  these strongly-typed objects. For service maintainers, see changes in
  COMMUNITY_INTEGRATIONS.md.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Word timestamp support has been moved from `WordTTSService` into `TTSService`
  via a new `supports_word_timestamps` parameter. Services that previously
  extended `WordTTSService`, `AudioContextWordTTSService`, or
  `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their
  parent `__init__` instead.
  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))

- Improved Ultravox TTFB measurement accuracy by using VAD speech end time
  instead of `UserStoppedSpeakingFrame` timing.
  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))

- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini
  realtime services: added `InterruptionFrame` handling with metrics cleanup,
  processing metrics at response boundaries, and improved agent transcript
  handling for both voice and text output modalities.
  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))

- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
  (PR [#3807](https://github.com/pipecat-ai/pipecat/pull/3807))

- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and
  `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to
  `KRISP_VIVA_API_KEY` environment variable.
  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))

- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security
  vulnerability.
  (PR [#3811](https://github.com/pipecat-ai/pipecat/pull/3811))

- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally
  speaking, you don't want a user interruption to prevent a service setting
  change from going into effect. Note that you usually don't use
  `ServiceSettingsUpdateFrame` directly; you use one of its subclasses:
  - `LLMUpdateSettingsFrame`
  - `TTSUpdateSettingsFrame`
  - `STTUpdateSettingsFrame`
  (PR [#3819](https://github.com/pipecat-ai/pipecat/pull/3819))

- Updated context summarization to use `user` role instead of `assistant` for
  summary messages.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Renamed the `AssemblyAISTTService` parameter
  `min_end_of_turn_silence_when_confident` to `min_turn_silence` (the old name
  is still supported with a deprecation warning).
  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))

- ⚠️ Renamed `LLMAssistantAggregatorParams` fields:
  `enable_context_summarization` → `enable_auto_context_summarization` and
  `context_summarization_config` → `auto_context_summarization_config` (now
  accepts `LLMAutoContextSummarizationConfig`). The old names still work with a
  `DeprecationWarning` for one release cycle.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to
  `True` when using `CommitStrategy.MANUAL`.
  (PR [#3865](https://github.com/pipecat-ai/pipecat/pull/3865))

- Updated the `numba` version pin from `==` to `>=0.61.2`.
  (PR [#3868](https://github.com/pipecat-ai/pipecat/pull/3868))

- Updated tracing code to use `ServiceSettings` dataclass API
  (`given_fields()`, attribute access) instead of dict-style access
  (`.items()`, `in`, subscript).
  (PR [#3879](https://github.com/pipecat-ai/pipecat/pull/3879))

- ⚠️ Removed `event` field and `complete()` method from `InterruptionFrame`.
  Removed `event` field from `InterruptionTaskFrame`. These are no longer
  needed since `broadcast_interruption()` does not require a round-trip
  completion signal.
  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))

- Moved `pipecat.services.deepgram.stt_sagemaker` and
  `pipecat.services.deepgram.tts_sagemaker` to
  `pipecat.services.deepgram.sagemaker.stt` and
  `pipecat.services.deepgram.sagemaker.tts`. The old import paths still work
  but emit a `DeprecationWarning`.
  (PR [#3902](https://github.com/pipecat-ai/pipecat/pull/3902))

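The typed-settings entries in this section (the runtime-update refactor and the tracing change that uses `given_fields()`) depend on a settings delta knowing which fields it actually sets. One common way to do that is a sentinel default, sketched below with the standard library only. `NOT_GIVEN`, `ExampleSTTSettings`, and `apply_delta()` are hypothetical names, not Pipecat's classes, and the sentinel mechanism is an assumption. A sentinel (rather than `None`) keeps an explicit `None` meaningful, which matters for settings that use `None` to disable a feature.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


class _NotGiven:
    """Sentinel type meaning 'this field was not set in the delta'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGiven()


@dataclass
class ExampleSTTSettings:  # hypothetical stand-in, not a Pipecat class
    model: Any = NOT_GIVEN
    language: Any = NOT_GIVEN

    def given_fields(self) -> Dict[str, Any]:
        # Only fields that were explicitly set (an explicit None counts as set).
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not NOT_GIVEN
        }


def apply_delta(current: ExampleSTTSettings, delta: ExampleSTTSettings) -> Dict[str, Any]:
    """Copy the delta's given fields onto `current`; return the applied fields."""
    applied = delta.given_fields()
    for name, value in applied.items():
        setattr(current, name, value)
    return applied
```

Applying `ExampleSTTSettings(language="es")` to a settings object leaves `model` untouched, because `model` was never given in the delta.
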
### Deprecated

- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS
  subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or
  `text_aggregation_mode=TextAggregationMode.TOKEN` instead.
  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))

- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services
  in favor of runtime updates via `TTSUpdateSettingsFrame`,
  `STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.

  ⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas
  previously only `set_language()` caused the service to actually react to the
  update (e.g. by reconnecting to a remote service so it can pick up the
  change), now all these methods do. This change was made as part of a refactor
  making them all work the same way under the hood.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of
  passing typed settings delta objects with
  `*UpdateSettingsFrame(delta=...)`.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Deprecated `WordTTSService`, `WebsocketWordTTSService`,
  `AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their
  non-word counterparts with `supports_word_timestamps=True` instead:
  - `WordTTSService` → `TTSService(supports_word_timestamps=True)`
  - `WebsocketWordTTSService` →
    `WebsocketTTSService(supports_word_timestamps=True)`
  - `AudioContextWordTTSService` →
    `AudioContextTTSService(supports_word_timestamps=True)`
  - `InterruptibleWordTTSService` →
    `InterruptibleTTSService(supports_word_timestamps=True)`
  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))

- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`.
  `BaseSmartTurn` now emits `TurnMetricsData` directly.
  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))

- Deprecated `LLMContextSummarizationConfig`. Use
  `LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig`
  instead. The old class emits a `DeprecationWarning`.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- Deprecated `push_interruption_task_frame_and_wait()` in `FrameProcessor`. Use
  `broadcast_interruption()` instead. The old method now delegates to
  `broadcast_interruption()` and logs a deprecation warning.
  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))

### Removed

- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The
  `transformers` and `onnxruntime` packages are now always installed as core
  dependencies since they are required by the default turn stop strategy,
  `TurnAnalyzerUserTurnStopStrategy`, which uses `LocalSmartTurnAnalyzerV3`.
  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))

- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been
  shut down and is no longer available.
  (PR [#3838](https://github.com/pipecat-ai/pipecat/pull/3838))

### Fixed

- Added `LLMSpecificMessage` handling in `LLMContextSummarizationUtil` to skip
  provider-specific messages during context summarization.
  (PR [#3794](https://github.com/pipecat-ai/pipecat/pull/3794))

- Treated `response_cancel_not_active` as a non-fatal error in realtime
  services (`OpenAIRealtimeLLMService`, `GrokRealtimeLLMService`,
  `OpenAIRealtimeBetaLLMService`) to prevent WebSocket disconnection when
  cancelling an inactive response.
  (PR [#3795](https://github.com/pipecat-ai/pipecat/pull/3795))

- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies
  (`transformers`, `onnxruntime`) into core dependencies instead of using a
  self-referential extra.
  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))

- Fixed `SentryMetrics` method signatures to match updated
  `FrameProcessorMetrics` base class, resolving `TypeError` when using
  `start_time`/`end_time` keyword arguments.
  (PR [#3808](https://github.com/pipecat-ai/pipecat/pull/3808))

- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and
  `AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
  (PR [#3813](https://github.com/pipecat-ai/pipecat/pull/3813))

- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI,
  ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio
  contexts after normal speech completion, only on interruption.
  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))

- Fixed STT TTFB metrics measuring timeout expiry time instead of actual
  transcript arrival time.
  (PR [#3822](https://github.com/pipecat-ai/pipecat/pull/3822))

- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being
  unintentionally pushed downstream in `LLMUserAggregator`. They are now
  consumed like `TranscriptionFrame`.
  (PR [#3825](https://github.com/pipecat-ai/pipecat/pull/3825))

- Fixed misleading "Empty audio frame received for STT service" warnings when
  using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`)
  that buffer audio internally.
  (PR [#3828](https://github.com/pipecat-ai/pipecat/pull/3828))

- Fixed an issue with `RimeNonJsonTTSService` where trailing punctuation was
  sometimes vocalized.
  (PR [#3837](https://github.com/pipecat-ai/pipecat/pull/3837))

- Fixed `TTSSpeakFrame` not committing spoken text to the conversation context
  when used outside of an LLM response (e.g., bot greetings or injected
  speech).
  (PR [#3845](https://github.com/pipecat-ai/pipecat/pull/3845))

- Removed verbose per-chunk audio logging from `GenesysAudioHookSerializer`
  that flooded production logs.
  (PR [#3850](https://github.com/pipecat-ai/pipecat/pull/3850))

- Added a beta feature warning when using custom prompts with AssemblyAI.
  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))

- Fixed `LocalSmartTurnAnalyzerV3` producing incorrect end-of-turn predictions
  at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic
  resampling to 16kHz before Whisper feature extraction.
  (PR [#3857](https://github.com/pipecat-ai/pipecat/pull/3857))

- Fixed `PipelineTask` double-inserting `RTVIProcessor` into the frame chain
  when the user provides both an `RTVIProcessor` in the pipeline and a custom
  `RTVIObserver` subclass in observers.
  (PR [#3867](https://github.com/pipecat-ai/pipecat/pull/3867))

- Fixed turn completion instructions being lost when `LLMMessagesUpdateFrame`
  replaces the LLM context. When `filter_incomplete_user_turns` is enabled, the
  turn completion system message is now re-injected after context replacement.
  (PR [#3888](https://github.com/pipecat-ai/pipecat/pull/3888))

- Fixed Azure TTS and STT services silently swallowing cancellation errors
  (invalid API key, network failures, rate limiting) instead of propagating
  them as `ErrorFrame`s to the pipeline.
  (PR [#3893](https://github.com/pipecat-ai/pipecat/pull/3893))

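The `LocalSmartTurnAnalyzerV3` fix in this section resamples non-16kHz input (for example 8kHz telephony audio) to 16kHz before feature extraction. As a rough illustration of what sample-rate conversion does, here is a linear-interpolation resampler over float samples in plain Python. It is a stand-in under stated assumptions: the entry does not say which resampler Pipecat uses, and production code typically uses a band-limited resampler rather than linear interpolation, which applies no anti-aliasing.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation (illustrative only, no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    out: List[float] = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]  # hold the last sample at the edge
        out.append(samples[j] + (nxt - samples[j]) * frac)
    return out
```

Converting 8kHz to 16kHz doubles the sample count and fills each new sample halfway between its neighbors, so one second of 8kHz audio becomes 16,000 samples.
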
### Performance

- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to
  `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on
  every interruption by using `client_req_id`-based multiplexing.
  (PR [#3759](https://github.com/pipecat-ai/pipecat/pull/3759))

### Other

- Standardized Sarvam STT/TTS User-Agent header handling to consistently send
  Pipecat SDK identity in websocket requests.
  (PR [#3886](https://github.com/pipecat-ai/pipecat/pull/3886))

## [0.0.103] - 2026-02-20

### Added

- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This
  allows timestamp info to trail audio chunk arrival, resulting in much better
  first-audio-chunk latency.
  (PR [#3625](https://github.com/pipecat-ai/pipecat/pull/3625))

- Added model-specific `InputParams` to `RimeTTSService`: arcana params
  (`repetition_penalty`, `temperature`, `top_p`) and mistv2 params
  (`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param
  changes now trigger WebSocket reconnection.
  (PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))

- Added `write_transport_frame()` hook to `BaseOutputTransport` allowing
  transport subclasses to handle custom frame types that flow through the audio
  queue.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily
  transport. These frames queue SIP transfer and SIP REFER operations with
  audio, so the operation executes only after the bot finishes its current
  utterance.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Added keepalive support to `SarvamSTTService` to prevent idle connection
  timeouts (e.g. when used behind a `ServiceSwitcher`).
  (PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))

- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection
  at runtime by updating the timeout dynamically.
  (PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))

- Added `broadcast_sibling_id` field to the base `Frame` class. This field is
  automatically set by `broadcast_frame()` and `broadcast_frame_instance()` to
  the ID of the paired frame pushed in the opposite direction, allowing
  receivers to identify broadcast pairs.
  (PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))

- Added `ignored_sources` parameter to `RTVIObserverParams` and
  `add_ignored_source()`/`remove_ignored_source()` methods to `RTVIObserver` to
  suppress RTVI messages from specific pipeline processors (e.g. a silent
  evaluation LLM).
  (PR [#3779](https://github.com/pipecat-ai/pipecat/pull/3779))

- Added `DeepgramSageMakerTTSService` for running Deepgram TTS models deployed
  on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
  Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling,
  and per-turn TTFB metrics.
  (PR [#3785](https://github.com/pipecat-ai/pipecat/pull/3785))

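The `broadcast_sibling_id` entry in this section describes frames pushed in both directions that reference each other's IDs. The stand-alone sketch below illustrates that pairing with a minimal dataclass; `SketchFrame` and `broadcast_pair()` are hypothetical and only mirror the behavior the entry describes, not Pipecat's `Frame` class or `broadcast_frame()`.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional, Tuple

_frame_ids = itertools.count(1)  # monotonically increasing frame IDs


@dataclass
class SketchFrame:  # hypothetical stand-in for a pipeline frame
    name: str
    id: int = field(default_factory=lambda: next(_frame_ids))
    broadcast_sibling_id: Optional[int] = None  # None for non-broadcast frames


def broadcast_pair(name: str) -> Tuple[SketchFrame, SketchFrame]:
    """Create a downstream/upstream pair whose sibling IDs point at each other."""
    downstream = SketchFrame(name)
    upstream = SketchFrame(name)
    downstream.broadcast_sibling_id = upstream.id
    upstream.broadcast_sibling_id = downstream.id
    return downstream, upstream
```

A processor that can see both directions can use the sibling ID to recognize the second copy of a broadcast and avoid handling the same event twice.
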
### Changed

- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the
  `wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from
  mistv2-specific values to `None`; only explicitly-set params are sent as
  query params.
  (PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))

- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager`
  in `aic_filter.py`.
  - Multiple filters using the same model path or
    `(model_id, model_download_dir)` share one loaded model, with reference
    counting and concurrent load deduplication.
  - Model file I/O runs off the event loop so the filter does not block.
  (PR [#3684](https://github.com/pipecat-ai/pipecat/pull/3684))

- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for
  better traceability.
  (PR [#3706](https://github.com/pipecat-ai/pipecat/pull/3706))

- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now
  queued with audio like other transport frames.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Bumped Pillow dependency upper bound from `<12` to `<13` to allow Pillow
  12.x.
  (PR [#3728](https://github.com/pipecat-ai/pipecat/pull/3728))

- Moved STT keepalive mechanism from `WebsocketSTTService` to the `STTService`
  base class, allowing any STT service (not just websocket-based ones) to use
  idle-connection keepalive via the `keepalive_timeout` and
  `keepalive_interval` parameters.
  (PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))

- Improved audio context management in `AudioContextTTSService` by moving
  context ID tracking to the base class and adding
  `reuse_context_id_within_turn` parameter to control concurrent TTS request
  handling.
  - Added helper methods: `has_active_audio_context()`,
    `get_active_audio_context_id()`, `remove_active_audio_context()`,
    `reset_active_audio_context()`
  - Simplified Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS
    implementations by removing duplicate context management code
  (PR [#3732](https://github.com/pipecat-ai/pipecat/pull/3732))

- `UserIdleController` is now always created with a default timeout of 0
  (disabled). The `user_idle_timeout` parameter changed from
  `Optional[float] = None` to `float = 0` in `UserTurnProcessor`,
  `LLMUserAggregatorParams`, and `UserIdleController`.
  (PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))

- Changed the version specifier from `>=0.2.8` to `~=0.2.8` for the
  `speechmatics-voice` package to ensure compatibility with future patch
  versions.
  (PR [#3761](https://github.com/pipecat-ai/pipecat/pull/3761))

- Updated `InworldTTSService` and `InworldHttpTTSService` to use `ASYNC`
  timestamp transport strategy by default.
  (PR [#3765](https://github.com/pipecat-ai/pipecat/pull/3765))

- Added `start_time` and `end_time` parameters to `start_ttfb_metrics()`,
  `stop_ttfb_metrics()`, `start_processing_metrics()`, and
  `stop_processing_metrics()` in `FrameProcessor` and `FrameProcessorMetrics`,
  allowing custom timestamps for metrics measurement. `STTService` now uses
  these instead of custom TTFB tracking.
  (PR [#3776](https://github.com/pipecat-ai/pipecat/pull/3776))

- Updated default Anthropic model from `claude-sonnet-4-5-20250929` to
  `claude-sonnet-4-6`.
  (PR [#3792](https://github.com/pipecat-ai/pipecat/pull/3792))

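The `AICFilter` entry in this section describes sharing read-only models through a reference-counted manager that deduplicates concurrent loads and keeps file I/O off the event loop. The asyncio sketch below shows that general technique. `SharedModelManager`, `acquire()`, `release()`, and the blocking loader callable are hypothetical names, not the `AICModelManager` API, and error handling for failed loads is omitted for brevity.

```python
import asyncio
from typing import Any, Callable, Dict, Hashable


class SharedModelManager:
    """Hypothetical ref-counted cache of read-only models keyed by path/ID."""

    def __init__(self, loader: Callable[[Hashable], Any]) -> None:
        self._loader = loader  # blocking load function, run in a worker thread
        self._models: Dict[Hashable, Any] = {}
        self._refcounts: Dict[Hashable, int] = {}
        self._pending: Dict[Hashable, "asyncio.Future[Any]"] = {}
        self._lock = asyncio.Lock()

    async def acquire(self, key: Hashable) -> Any:
        async with self._lock:
            if key in self._models:
                self._refcounts[key] += 1
                return self._models[key]
            task = self._pending.get(key)
            if task is None:
                # First caller starts the load; later callers await the same task.
                task = asyncio.ensure_future(asyncio.to_thread(self._loader, key))
                self._pending[key] = task
        model = await task
        async with self._lock:
            self._pending.pop(key, None)
            self._models.setdefault(key, model)
            self._refcounts[key] = self._refcounts.get(key, 0) + 1
            return self._models[key]

    async def release(self, key: Hashable) -> None:
        async with self._lock:
            self._refcounts[key] -= 1
            if self._refcounts[key] == 0:
                # Last user is gone: drop the shared model.
                del self._refcounts[key]
                del self._models[key]
```

Two concurrent `acquire()` calls for the same key trigger a single load and receive the same object; the model is dropped only after both callers `release()` it, so the next `acquire()` loads it again.
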
### Deprecated

- Deprecated unused `Traceable`, `@traceable`, `@traced`, and
  `AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module
  will be removed in a future release.
  (PR [#3733](https://github.com/pipecat-ai/pipecat/pull/3733))

### Fixed
|
||
|
||
- Fixed race condition where `RTVIObserver` could send messages before
  `DailyTransport` join completed. Outbound messages are now queued and
  delivered after the transport is ready.
  (PR [#3615](https://github.com/pipecat-ai/pipecat/pull/3615))

- Fixed async generator cleanup in OpenAI LLM streaming to prevent
  `AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699).
  (PR [#3698](https://github.com/pipecat-ai/pipecat/pull/3698))

- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all
  sample rates, including 8kHz audio.
  (PR [#3713](https://github.com/pipecat-ai/pipecat/pull/3713))

- Fixed a race condition in `RTVIObserver` where bot output messages could be
  sent before the bot-started-speaking event.
  (PR [#3718](https://github.com/pipecat-ai/pipecat/pull/3718))

- Fixed Grok Realtime `session.updated` event parsing failure caused by the API
  returning prefixed voice names (e.g. `"human_Ara"` instead of `"Ara"`).
  (PR [#3720](https://github.com/pipecat-ai/pipecat/pull/3720))

- Fixed context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`,
  `RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and
  `PlayHTTTSService`. Services now properly reuse the same context ID across
  multiple `run_tts()` invocations within a single LLM turn, preventing context
  tracking issues and incorrect lifecycle signaling.
  (PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))

- Fixed word timestamp interleaving issue in `ElevenLabsTTSService` when
  processing multiple sentences within a single LLM turn.
  (PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))

- Fixed tracing service decorators executing the wrapped function twice when
  the function itself raised an exception (e.g., LLM rate limit, TTS timeout).
  (PR [#3735](https://github.com/pipecat-ai/pipecat/pull/3735))

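This kind of bug often comes from a wrapper that catches every exception and retries the call without tracing, so a function that fails on its own runs a second time. The sketch below shows a safer shape under that assumption: only a failure to start the span triggers the untraced fallback. `traced` and `start_span` are hypothetical names, not Pipecat's tracing API, and `contextlib.nullcontext()` stands in for a real span.

```python
import asyncio
import contextlib
import functools


def traced(start_span):
    """Hypothetical tracing decorator; `start_span(name)` returns a context manager."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            try:
                span = start_span(func.__name__)
            except Exception:
                # Only a tracing failure falls back to an untraced call.
                return await func(*args, **kwargs)
            with span:
                # Errors raised by func propagate; func is not called again.
                return await func(*args, **kwargs)

        return wrapper

    return decorator


calls = []


@traced(lambda name: contextlib.nullcontext())
async def call_llm():
    calls.append("call")
    raise TimeoutError("simulated rate limit")


try:
    asyncio.run(call_llm())
except TimeoutError:
    pass

print(len(calls))  # 1: the failing function ran exactly once
```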
- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame`
  reaches downstream processors.
  (PR [#3737](https://github.com/pipecat-ai/pipecat/pull/3737))

- Fixed `UserIdleController` false idle triggers caused by gaps between user
  and bot activity frames. The idle timer now starts only after
  `BotStoppedSpeakingFrame` and is suppressed during active user turns and
  function calls.
  (PR [#3744](https://github.com/pipecat-ai/pipecat/pull/3744))

- Fixed incorrect `sample_rate` assignment in
  `TavusInputTransport._on_participant_audio_data` (was using
  `audio.audio_frames` instead of `audio.sample_rate`).
  (PR [#3768](https://github.com/pipecat-ai/pipecat/pull/3768))

- Fixed `RTVIObserver` not processing upstream-only frames. Previously, all
  upstream frames were filtered out to avoid duplicate messages from broadcast
  frames. Now only the upstream copies of broadcast frames are skipped.
  (PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))

- Fixed mutable default arguments in `LLMContextAggregatorPair.__init__()` that
  could cause shared state across instances.
  (PR [#3782](https://github.com/pipecat-ai/pipecat/pull/3782))

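Python evaluates a default such as `{}` or `[]` once, when the function is defined, so every call that omits the argument shares the same object. Here is a minimal illustration of the symptom and the usual `None`-sentinel fix. It uses hypothetical `LeakyPair` and `SafePair` classes rather than Pipecat's `LLMContextAggregatorPair`.

```python
from typing import Optional


class LeakyPair:
    """Hypothetical example: the default dict is shared by every instance."""

    def __init__(self, params: dict = {}):  # evaluated once, at definition time
        self.params = params


class SafePair:
    """Hypothetical example: each instance gets its own fresh dict."""

    def __init__(self, params: Optional[dict] = None):
        self.params = {} if params is None else params


a, b = LeakyPair(), LeakyPair()
a.params["vad"] = "silero"
print(b.params)  # {'vad': 'silero'}: b sees a's change

c, d = SafePair(), SafePair()
c.params["vad"] = "silero"
print(d.params)  # {}: d is unaffected
```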
- Fixed `DeepgramSageMakerSTTService` to properly track finalize lifecycle
  using `request_finalize()` / `confirm_finalize()` and use `is_final` (instead
  of `is_final and speech_final`) for final transcription detection, matching
  `DeepgramSTTService` behavior.
  (PR [#3784](https://github.com/pipecat-ai/pipecat/pull/3784))

- Fixed a race condition in `AudioContextTTSService` where the audio context
  could time out between consecutive TTS requests within the same turn, causing
  audio to be discarded.
  (PR [#3787](https://github.com/pipecat-ai/pipecat/pull/3787))

- Fixed `push_interruption_task_frame_and_wait()` hanging indefinitely when the
  `InterruptionFrame` does not reach the pipeline sink within the timeout.
  Added a `timeout` keyword argument to customize the wait duration.
  (PR [#3789](https://github.com/pipecat-ai/pipecat/pull/3789))

## [0.0.102] - 2026-02-10

### Added

- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming
  WebSocket API with word-level timestamps and jitter buffering for smooth
  audio playback.
  (PR [#3134](https://github.com/pipecat-ai/pipecat/pull/3134))

- Added `UserBotLatencyObserver` for tracking user-to-bot response latency.
  When tracing is enabled, latency measurements are automatically recorded as
  `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.
  (PR [#3355](https://github.com/pipecat-ai/pipecat/pull/3355))

- Added `append_to_context` parameter to `TTSSpeakFrame` for conditional LLM
  context addition.
  - Allows fine-grained control over whether text should be added to
    conversation context
  - Defaults to `True` to maintain backward compatibility
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Added TTS context tracking system with `context_id` field to trace audio
  generation through the pipeline.
  - `TTSAudioRawFrame`, `TTSStartedFrame`, `TTSStoppedFrame` now include
    `context_id`
  - `AggregatedTextFrame` and `TTSTextFrame` now include `context_id`
  - Enables tracking which TTS request generated specific audio chunks
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Added support for Inworld TTS WebSocket Auto Mode for improved latency.
  (PR [#3593](https://github.com/pipecat-ai/pipecat/pull/3593))

- Added new frames for context summarization: `LLMContextSummaryRequestFrame`
  and `LLMContextSummaryResultFrame`.
  (PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))

- Added a context summarization feature to automatically compress conversation
  history when conversation length limits (by token or message count) are
  reached, enabling efficient long-running conversations.
  - Configure via `enable_context_summarization=True` in
    `LLMAssistantAggregatorParams`
  - Customize behavior with `LLMContextSummarizationConfig` (max tokens,
    thresholds, etc.)
  - Automatically preserves incomplete function call sequences during
    summarization
  - See new examples:
    `examples/foundational/54-context-summarization-openai.py` and
    `examples/foundational/54a-context-summarization-google.py`
  (PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))

- Added RTVI function call lifecycle events (`llm-function-call-started`,
  `llm-function-call-in-progress`, `llm-function-call-stopped`) with
  configurable security levels via
  `RTVIObserverParams.function_call_report_level`. Supports per-function
  control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or
  `FULL`).
  (PR [#3630](https://github.com/pipecat-ai/pipecat/pull/3630))

- Added `RequestMetadataFrame` and metadata handling for `ServiceSwitcher` to
  ensure STT services correctly emit `STTMetadataFrame` when switching between
  services. Only the active service's metadata is propagated downstream,
  switching services triggers the newly active service to re-emit its metadata,
  and proper frame ordering is maintained at startup.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Added `STTMetadataFrame` to broadcast STT service latency information at
  pipeline start.
  - STT services broadcast P99 time-to-final-segment (`ttfs_p99_latency`) to
    downstream processors
  - Turn stop strategies automatically configure their STT timeout from this
    metadata
  - Developers can override `ttfs_p99_latency` via constructor argument for
    custom deployments
  - Added measured P99 values for STT providers
  - See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) to
    measure latency for your configuration
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Added support for `is_sandbox` parameter in `LiveAvatarNewSessionRequest` to
  enable sandbox mode for HeyGen LiveAvatar sessions.
  (PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))

- Added support for `video_settings` parameter in `LiveAvatarNewSessionRequest`
  to configure video encoding (H264/VP8) and quality levels.
  (PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))

- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using
  OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD
  and server-side VAD modes, noise reduction, and automatic reconnection.
  (PR [#3656](https://github.com/pipecat-ai/pipecat/pull/3656))

- Added `bulbul:v3-beta` TTS model support for Sarvam AI with temperature
  control and 25 new speaker voices.
  (PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))

- Added `saaras:v3` STT model support for Sarvam AI with new `mode` parameter
  (transcribe, translate, verbatim, translit, codemix) and prompt support.
  (PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))

- Added new OpenAI TTS voice options `marin` and `cedar`.
  (PR [#3682](https://github.com/pipecat-ai/pipecat/pull/3682))

- Added `UserMuteStartedFrame` and `UserMuteStoppedFrame` system frames, and
  corresponding `user-mute-started` / `user-mute-stopped` RTVI messages, so
  clients can observe when mute strategies activate or deactivate.
  (PR [#3687](https://github.com/pipecat-ai/pipecat/pull/3687))

### Changed

- Updated all 30+ TTS service implementations to support context tracking with
  `context_id`.
  - Services now generate and propagate context IDs through TTS frames
  - Enables end-to-end tracing of TTS requests through the pipeline
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- ⚠️ `TTSService.run_tts()` now requires a `context_id` parameter for context
  tracking.
  - Custom TTS service implementations must update their `run_tts()`
    signature
  - Before:
    `async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:`
  - After:
    `async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:`
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Simplified context aggregators to use the `frame.append_to_context` flag
  instead of tracking internal state.
  - Cleaner logic in `LLMResponseAggregator` and
    `LLMResponseUniversalAggregator`
  - More consistent behavior across aggregator implementations
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Updated timestamps to be cumulative within an agent turn, using the
  `flushCompleted` message as an indication of when timestamps from the server
  are reset to 0.
  (PR [#3593](https://github.com/pipecat-ai/pipecat/pull/3593))

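The bookkeeping this implies can be sketched as follows. It assumes the server reports word times that restart at 0 after each `flushCompleted` message, so every later timestamp is shifted by the latest end time already reported in the turn. `CumulativeTimestamps` is a hypothetical helper, not the service's implementation.

```python
class CumulativeTimestamps:
    """Hypothetical helper that keeps word timestamps monotonic within a turn."""

    def __init__(self) -> None:
        self._offset = 0.0    # seconds added to every server-reported time
        self._last_end = 0.0  # latest adjusted end time seen in this turn

    def add_word(self, word: str, start: float, end: float) -> tuple[str, float, float]:
        adjusted = (word, start + self._offset, end + self._offset)
        self._last_end = max(self._last_end, adjusted[2])
        return adjusted

    def on_flush_completed(self) -> None:
        # The server restarts its clock at 0 after a flush, so shift later
        # words past everything already reported.
        self._offset = self._last_end

    def reset_turn(self) -> None:
        # A new agent turn starts counting from 0 again.
        self._offset = 0.0
        self._last_end = 0.0


ts = CumulativeTimestamps()
print(ts.add_word("Hello", 0.0, 0.4))  # ('Hello', 0.0, 0.4)
print(ts.add_word("world", 0.4, 0.8))  # ('world', 0.4, 0.8)
ts.on_flush_completed()
print(ts.add_word("again", 0.0, 0.5))  # ('again', 0.8, 1.3)
```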
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the
  underlying TTS engine.
  (PR [#3612](https://github.com/pipecat-ai/pipecat/pull/3612))

- Improved user turn stop timing in `TranscriptionUserTurnStopStrategy` and
  `TurnAnalyzerUserTurnStopStrategy`.
  - Timeout now starts on `VADUserStoppedSpeakingFrame` for tighter, more
    predictable timing
  - Added support for finalized transcripts
    (`TranscriptionFrame.finalized=True`) to trigger earlier
  - Added fallback timeout for edge cases where transcripts arrive without
    VAD events
  - Removed `InterimTranscriptionFrame` handling (no longer affects timing)
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Improved the accuracy of the `UserBotLatencyObserver` and
  `UserBotLatencyLogObserver` by measuring from the time when the user actually
  starts speaking.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- ⚠️ Renamed `timeout` parameter to `user_speech_timeout` in
  `TranscriptionUserTurnStopStrategy`.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Updated `VADUserStartedSpeakingFrame` to include `start_secs` and
  `timestamp`, and `VADUserStoppedSpeakingFrame` to include `stop_secs` and
  `timestamp`, removing the need to separately handle
  `SpeechControlParamsFrame` for `VADParams` values.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- ⚠️ Renamed `TranscriptionUserTurnStopStrategy` to
  `SpeechTimeoutUserTurnStopStrategy`. The old name is deprecated and will be
  removed in a future release.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- `AssemblyAISTTService` now automatically configures optimal settings for
  manual turn detection when `vad_force_turn_endpoint=True`. This sets
  `end_of_turn_confidence_threshold=1.0` and `max_turn_silence=2000` by
  default, which disables model-based turn detection and reduces latency by
  relying on external VAD for turn endpoints. Warnings are logged if
  conflicting settings are detected.
  (PR [#3644](https://github.com/pipecat-ai/pipecat/pull/3644))

- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.
  (PR [#3652](https://github.com/pipecat-ai/pipecat/pull/3652))

- Changed default session mode from "CUSTOM" to "LITE" in HeyGen LiveAvatar
  integration, with VP8 as the default video encoding.
  (PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))

- ⚠️ The default `VADParams` `stop_secs` value changed from `0.8` seconds to
  `0.2` seconds. This change both simplifies the developer experience and
  improves the performance of STT services: with a shorter `stop_secs` value,
  STT services using a local VAD can finalize sooner, resulting in faster
  transcription. How long the bot waits for the user to continue is now
  controlled by the user turn stop strategy:
  - `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for
    additional user speech using `user_speech_timeout` (default: 0.6 sec).
  - `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically
    adjusts the user wait time based on the audio input.
  (PR [#3659](https://github.com/pipecat-ai/pipecat/pull/3659))

- Moved interruption wait event from per-processor instance state to
  `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal
  when the interruption has fully traversed the pipeline. Custom processors
  that block or consume an `InterruptionFrame` before it reaches the pipeline
  sink must call `frame.complete()` to avoid stalling
  `push_interruption_task_frame_and_wait()`. A warning is logged if completion
  does not happen within 2 seconds.
  (PR [#3660](https://github.com/pipecat-ai/pipecat/pull/3660))

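The completion handshake can be modeled with an `asyncio.Event` carried by the frame. Whichever component finally consumes the frame calls `complete()`, and the waiter bounds its wait with a timeout. `ToyInterruptionFrame` is a hypothetical, self-contained stand-in for illustration. It does not reproduce Pipecat's `InterruptionFrame`.

```python
import asyncio


class ToyInterruptionFrame:
    """Hypothetical stand-in: the frame owns its own completion event."""

    def __init__(self) -> None:
        self._done = asyncio.Event()

    def complete(self) -> None:
        # Called by the pipeline sink, or by any processor that consumes the frame.
        self._done.set()

    async def wait(self, timeout: float) -> bool:
        try:
            await asyncio.wait_for(self._done.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> tuple[bool, bool]:
    frame = ToyInterruptionFrame()

    async def consuming_processor():
        await asyncio.sleep(0.01)  # does some work, then swallows the frame
        frame.complete()           # without this, the waiter would time out

    task = asyncio.create_task(consuming_processor())
    completed = await frame.wait(timeout=1.0)
    await task

    # A frame that nobody completes: the bounded wait returns instead of hanging.
    stalled = await ToyInterruptionFrame().wait(timeout=0.05)
    return completed, stalled


print(asyncio.run(demo()))  # (True, False)
```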
- Updated the default model to `scribe_v2` for `ElevenLabsSTTService`.
  (PR [#3664](https://github.com/pipecat-ai/pipecat/pull/3664))

- Changed the `DeepgramSTTService` default setting for `smart_format` to
  `False`, as agents don't need smart formatting. Disabling this setting also
  provides a small performance improvement.
  (PR [#3666](https://github.com/pipecat-ai/pipecat/pull/3666))

- Changed `FunctionCallCancelFrame` to broadcast in both directions for
  consistency with other function call frames.
  (PR [#3672](https://github.com/pipecat-ai/pipecat/pull/3672))

- Changed default user turn stop strategy from
  `TranscriptionUserTurnStopStrategy` to `TurnAnalyzerUserTurnStopStrategy`
  with `LocalSmartTurnAnalyzerV3`.
  (PR [#3689](https://github.com/pipecat-ai/pipecat/pull/3689))

- Renamed `RequestMetadataFrame` to `ServiceSwitcherRequestMetadataFrame` and
  added a `service` field to target a specific service. The frame is now pushed
  downstream by services after handling instead of being silently consumed.
  (PR [#3692](https://github.com/pipecat-ai/pipecat/pull/3692))

- Updated `SonioxSTTService` to set `vad_force_turn_endpoint` to `True` by
  default. This setting disables the turn detection logic available natively
  in Soniox; instead, the service relies on a local VAD to finalize the
  transcript. This configuration meaningfully reduces the time to final
  segment for Soniox: with this setting enabled, Soniox outputs a transcript in
  ~250ms (median). Pipecat enables smart-turn detection by default using the
  `LocalSmartTurnAnalyzerV3`. To use the native turn detection logic in
  Soniox, set `vad_force_turn_endpoint` to `False`.
  (PR [#3697](https://github.com/pipecat-ai/pipecat/pull/3697))

- Updated `SonioxSTTService` default model to `stt-rt-v4`.
  (PR [#3697](https://github.com/pipecat-ai/pipecat/pull/3697))

- Updated the default model to `async_flash_v1.0` and base URL to
  `https://api.async.com` for `AsyncAITTSService`.
  (PR [#3701](https://github.com/pipecat-ai/pipecat/pull/3701))

### Deprecated

- Deprecated `UserBotLatencyLogObserver`. Use `UserBotLatencyObserver` directly
  with its `on_latency_measured` event handler instead.
  (PR [#3355](https://github.com/pipecat-ai/pipecat/pull/3355))

- Deprecated `RTVILLMFunctionCallMessage`, `RTVILLMFunctionCallMessageData`,
  and `RTVIProcessor.handle_function_call()`. Use the new
  `llm-function-call-in-progress` event sent automatically by `RTVIObserver`
  instead.
  (PR [#3630](https://github.com/pipecat-ai/pipecat/pull/3630))

### Removed

- ⚠️ Removed `timeout` parameter from `TurnAnalyzerUserTurnStopStrategy`. The
  timeout is now managed internally based on STT latency.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

### Fixed

- Fixed pipeline freeze when `InterruptionFrame` discards `EndFrame` or
  `StopFrame` by making terminal frames uninterruptible.
  (PR [#3542](https://github.com/pipecat-ai/pipecat/pull/3542))

- Fixed OpenAI LLM stream not being closed on cancellation/exception, which
  could leak sockets.
  (PR [#3589](https://github.com/pipecat-ai/pipecat/pull/3589))

- Fixed `PipelineTask` adding duplicate `RTVIProcessor` and `RTVIObserver` when
  they were already provided in the pipeline or observers list. They are now
  detected and skipped, with appropriate warnings and errors logged for
  mismatched configurations.
  (PR [#3610](https://github.com/pipecat-ai/pipecat/pull/3610))

- Fixed function call timeout task not being cancelled when the handler
  completes without calling `result_callback` or is cancelled externally, which
  caused `RuntimeWarning: coroutine was never awaited`.
  (PR [#3616](https://github.com/pipecat-ai/pipecat/pull/3616))

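The general rule is that a helper task started next to a handler must be cancelled on every exit path: normal return, exception, or external cancellation. Here is a minimal sketch that does this with `try`/`finally`. `run_with_timeout` and `on_timeout` are hypothetical names, and what Pipecat does when the timeout fires is not modeled here.

```python
import asyncio


async def run_with_timeout(handler, on_timeout, timeout_secs: float):
    """Run `handler`; call `on_timeout` if it has not finished in time (hypothetical)."""

    async def timeout_watcher():
        await asyncio.sleep(timeout_secs)
        await on_timeout()

    watcher = asyncio.create_task(timeout_watcher())
    try:
        return await handler()
    finally:
        # Cancel on success, exception, or external cancellation alike.
        watcher.cancel()


async def demo() -> list:
    timeouts: list[str] = []

    async def on_timeout():
        timeouts.append("timed out")

    async def fast_handler():
        return "ok"

    async def failing_handler():
        raise ValueError("boom")

    async def slow_handler():
        await asyncio.sleep(0.2)
        return "late"

    results: list = [await run_with_timeout(fast_handler, on_timeout, 0.05)]
    try:
        await run_with_timeout(failing_handler, on_timeout, 0.05)
    except ValueError:
        results.append("raised")
    await asyncio.sleep(0.1)        # past the timeout: no stale watcher fires
    results.append(list(timeouts))  # snapshot: []
    results.append(await run_with_timeout(slow_handler, on_timeout, 0.02))
    results.append(list(timeouts))  # the slow handler did trigger the watcher
    return results


print(asyncio.run(demo()))  # ['ok', 'raised', [], 'late', ['timed out']]
```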
- Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin
  languages in the TTS pipeline. NLTK's sentence tokenizer does not support CJK
  languages, causing text to accumulate until flush instead of being split at
  sentence boundaries. Added fallback detection for unambiguous non-Latin
  sentence-ending punctuation (e.g., `。`, `？`, `！`).
  (PR [#3617](https://github.com/pipecat-ai/pipecat/pull/3617))

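Here is a minimal sketch of the fallback idea. It splits after unambiguous full-width terminators and keeps a trailing fragment buffered until its terminator arrives. The terminator set, the regex, and `take_complete_sentences` are assumptions for illustration, not Pipecat's aggregator code.

```python
import re

# Full-width sentence terminators (an illustrative subset, not exhaustive).
CJK_TERMINATORS = "。？！"
_SPLIT_AFTER_TERMINATOR = re.compile(f"(?<=[{CJK_TERMINATORS}])")


def take_complete_sentences(buffer: str) -> tuple[list[str], str]:
    """Split off complete sentences; return them plus the unfinished remainder."""
    pieces = [p for p in _SPLIT_AFTER_TERMINATOR.split(buffer) if p]
    remainder = ""
    if pieces and pieces[-1][-1] not in CJK_TERMINATORS:
        remainder = pieces.pop()  # keep buffering until its terminator arrives
    return pieces, remainder


sentences, rest = take_complete_sentences("今日は晴れです。散歩しますか？まだ")
print(sentences)  # ['今日は晴れです。', '散歩しますか？']
print(rest)       # まだ
```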
- Fixed `PipelineTask` to also call `set_bot_ready()` when an external
  `RTVIProcessor` is provided.
  (PR [#3623](https://github.com/pipecat-ai/pipecat/pull/3623))

- Fixed `VADController` not broadcasting `SpeechControlParamsFrame` on startup,
  which prevented STT services from receiving VAD params needed for TTFB
  measurement.
  (PR [#3628](https://github.com/pipecat-ai/pipecat/pull/3628))

- Fixed `StopAsyncIteration` exceptions in `parse_telephony_websocket()` when
  WebSocket connections close before sending expected messages.
  (PR [#3629](https://github.com/pipecat-ai/pipecat/pull/3629))

- Fixed WebSocket transport error when broadcasting
  `InputTransportMessageFrame` by correctly instantiating the frame with its
  message parameter.
  (PR [#3635](https://github.com/pipecat-ai/pipecat/pull/3635))

- Fixed orphan OpenTelemetry spans during flow initialization and transitions
  in tracing.
  (PR [#3649](https://github.com/pipecat-ai/pipecat/pull/3649))

- Fixed `SambaNovaLLMService` and `GoogleLLMOpenAIBetaService` streams not
  being closed on cancellation/exception, which could leak sockets.
  (PR [#3663](https://github.com/pipecat-ai/pipecat/pull/3663))

- Fixed an issue in `InworldTTSService` where punctuation was pronounced. The
  service now ensures proper spacing between sentences, which resolves the
  pronunciation issue.
  (PR [#3667](https://github.com/pipecat-ai/pipecat/pull/3667))

- Fixed `ParallelPipeline` allowing frames pushed by internal processors to
  escape during lifecycle frame (`StartFrame`/`EndFrame`/`CancelFrame`)
  synchronization. These frames are now buffered and flushed after all branches
  complete.
  (PR [#3668](https://github.com/pipecat-ai/pipecat/pull/3668))

- Fixed issues in Sarvam STT and TTS services: missing event handler
  registration for VAD signals, `Optional[bool]` type annotations, WebSocket
  state cleanup on API errors, and TTS disconnect/reconnection state
  management.
  (PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))

- Fixed `RTVIObserver` sending duplicate client messages for frames that are
  broadcast in both directions (e.g. `UserStartedSpeakingFrame`,
  `FunctionCallResultFrame`).
  (PR [#3672](https://github.com/pipecat-ai/pipecat/pull/3672))

- Fixed WebSocket STT services (ElevenLabs, Cartesia, Gladia, Soniox)
  disconnecting due to idle timeout when no audio is being sent (e.g. when
  inactive behind a `ServiceSwitcher`). `WebsocketSTTService` now provides
  opt-in silence-based keepalive via `keepalive_timeout` and
  `keepalive_interval` parameters.
  (PR [#3675](https://github.com/pipecat-ai/pipecat/pull/3675))

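Here is a rough sketch of silence-based keepalive logic. The parameter names `keepalive_timeout` and `keepalive_interval` come from the entry above. Their semantics here are assumptions: idle time before keepalives begin, and minimum spacing between them. The 16-bit mono PCM format is also assumed, and `silence_chunk` and `should_send_keepalive` are hypothetical helpers.

```python
def silence_chunk(sample_rate: int, duration_ms: int, channels: int = 1) -> bytes:
    """Zero-valued 16-bit PCM audio: 2 bytes per sample per channel."""
    num_samples = sample_rate * duration_ms // 1000
    return b"\x00\x00" * (num_samples * channels)


def should_send_keepalive(
    now: float,
    last_audio_at: float,
    last_keepalive_at: float,
    keepalive_timeout: float,
    keepalive_interval: float,
) -> bool:
    """Send silence only after `keepalive_timeout` seconds without real audio,
    and then at most once every `keepalive_interval` seconds."""
    idle_for = now - last_audio_at
    since_last_keepalive = now - last_keepalive_at
    return idle_for >= keepalive_timeout and since_last_keepalive >= keepalive_interval


print(len(silence_chunk(16000, 20)))                     # 640
print(should_send_keepalive(3.0, 0.0, 0.0, 5.0, 5.0))    # False: not idle long enough
print(should_send_keepalive(10.0, 0.0, 0.0, 5.0, 5.0))   # True
print(should_send_keepalive(12.0, 0.0, 10.0, 5.0, 5.0))  # False: sent one 2s ago
```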
## [0.0.101] - 2026-01-30

### Added

- Additions for `AICFilter` and `AICVADAnalyzer`:
  - Added model downloading support to `AICFilter` with `model_id` and
    `model_download_dir` parameters.
  - Added `model_path` parameter to `AICFilter` for loading local `.aicmodel`
    files.
  - Added unit tests for `AICFilter` and `AICVADAnalyzer`.
  (PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))

- Added handling for `server_content.interrupted` signal in the Gemini Live
  service for faster interruption response in the case where there isn't
  already turn tracking in the pipeline, e.g. local VAD + context aggregators.
  When there is already turn tracking in the pipeline, the additional
  interruption does no harm.
  (PR [#3429](https://github.com/pipecat-ai/pipecat/pull/3429))

- Added new `GenesysFrameSerializer` for the Genesys AudioHook WebSocket
  protocol, enabling bidirectional audio streaming between Pipecat pipelines
  and Genesys Cloud contact center.
  (PR [#3500](https://github.com/pipecat-ai/pipecat/pull/3500))

- Added `reached_upstream_types` and `reached_downstream_types` read-only
  properties to `PipelineTask` for inspecting current frame filters.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()`
  methods to `PipelineTask` for appending frame types.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Added `UserTurnCompletionLLMServiceMixin` for LLM services to detect and
  filter incomplete user turns. When enabled via `filter_incomplete_user_turns`
  in `LLMUserAggregatorParams`, the LLM outputs a turn completion marker at the
  start of each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete
  long). Incomplete turns are suppressed, and configurable timeouts
  automatically re-prompt the user.
  (PR [#3518](https://github.com/pipecat-ai/pipecat/pull/3518))

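Here is a minimal, non-streaming sketch of the marker check. It looks at the first non-space character, maps it to a status, and suppresses the reply unless the turn is complete. The status names, `split_turn_marker`, and `should_speak` are hypothetical. The real mixin works on streamed LLM output and also handles the re-prompt timeouts, which are not modeled here.

```python
from typing import Optional

# Markers from the entry above; the status labels are hypothetical names.
TURN_MARKERS = {
    "✓": "complete",
    "○": "incomplete_short",
    "◐": "incomplete_long",
}


def split_turn_marker(response: str) -> tuple[Optional[str], str]:
    """Return (status, text) for a response that may start with a marker."""
    stripped = response.lstrip()
    if stripped and stripped[0] in TURN_MARKERS:
        return TURN_MARKERS[stripped[0]], stripped[1:].lstrip()
    return None, response  # no marker: leave the text untouched


def should_speak(response: str) -> bool:
    status, _ = split_turn_marker(response)
    # Suppress incomplete turns; treating "no marker" as speakable is an
    # assumption of this sketch.
    return status in (None, "complete")


print(split_turn_marker("✓ Sure, the weather is sunny."))  # ('complete', 'Sure, the weather is sunny.')
print(should_speak("○"))                                    # False
```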
- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a
  frame instance by extracting its fields and creating new instances for each
  direction.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- `PipelineTask` now automatically adds `RTVIProcessor` and registers
  `RTVIObserver` when `enable_rtvi=True` (default), simplifying pipeline setup.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI
  observers.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Added `video_out_codec` parameter to `TransportParams` allowing configuration
  of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video
  output in `DailyTransport`.
  (PR [#3520](https://github.com/pipecat-ai/pipecat/pull/3520))

- Added `location` parameter to Google TTS services (`GoogleHttpTTSService`,
  `GoogleTTSService`, `GeminiTTSService`) for regional endpoint support.
  (PR [#3523](https://github.com/pipecat-ai/pipecat/pull/3523))

- Added new `PIPECAT_SMART_TURN_LOG_DATA` environment variable, which causes
  Smart Turn input data to be saved to disk.
  (PR [#3525](https://github.com/pipecat-ai/pipecat/pull/3525))

- Added `result_callback` parameter to `UserImageRequestFrame` to support
  deferred function call results.
  (PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))

- Added `function_call_timeout_secs` parameter to `LLMService` to configure
  timeout for deferred function calls (defaults to 10.0 seconds).
  (PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))

- Added `vad_analyzer` parameter to `LLMUserAggregatorParams`. VAD analysis is
  now handled inside the `LLMUserAggregator` rather than in the transport,
  keeping voice activity detection closer to where it is consumed. The
  `vad_analyzer` on `BaseInputTransport` is now deprecated.

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          vad_analyzer=SileroVADAnalyzer(),
      ),
  )
  ```
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

- Added `VADProcessor` for detecting speech in audio streams within a pipeline.
  Pushes `VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`, and
  `UserSpeakingFrame` downstream based on VAD state changes.
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

- Added `VADController` for managing voice activity detection state and
  emitting speech events independently of transport or pipeline processors.
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

- Added local `PiperTTSService` for offline text-to-speech using Piper voice
  models. The existing HTTP-based service has been renamed to
  `PiperHttpTTSService`.
  (PR [#3585](https://github.com/pipecat-ai/pipecat/pull/3585))

- `main()` in `pipecat.runner.run` now accepts an optional
  `argparse.ArgumentParser`, allowing bots to define custom CLI arguments
  accessible via `runner_args.cli_args`.
  (PR [#3590](https://github.com/pipecat-ai/pipecat/pull/3590))

- Added `KokoroTTSService` for local text-to-speech synthesis using the
  Kokoro-82M model.
  (PR [#3595](https://github.com/pipecat-ai/pipecat/pull/3595))

### Changed

- Updated `AICFilter` and `AICVADAnalyzer` to use `aic-sdk ~= 2.0.1`.
  (PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))

- Improved the STT TTFB (Time To First Byte) measurement, which now reports the
  delay between when the user stops speaking and when the final transcription
  is received. Traditional TTFB measures from a discrete request, but STT
  services receive continuous audio input, so we measure from speech end to
  final transcript, which captures the latency that matters for voice AI
  applications. In support of this change, added a `finalized` field to
  `TranscriptionFrame` to indicate when a transcript is the final result for an
  utterance.
  (PR [#3495](https://github.com/pipecat-ai/pipecat/pull/3495))

- `SarvamSTTService` now defaults `vad_signals` and `high_vad_sensitivity` to
  `None` (omitted from connection parameters), improving latency by ~300ms
  compared to the previous defaults.
  (PR [#3495](https://github.com/pipecat-ai/pipecat/pull/3495))

- Changed frame filter storage from tuples to sets in `PipelineTask`.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Changed default Inworld TTS model from `inworld-tts-1` to
  `inworld-tts-1.5-max`.
  (PR [#3531](https://github.com/pipecat-ai/pipecat/pull/3531))

- `FrameSerializer` now subclasses from `BaseObject` to enable event support.
  (PR [#3560](https://github.com/pipecat-ai/pipecat/pull/3560))

- Added support for TTFS in `SpeechmaticsSTTService` and set the default mode
  to `EXTERNAL` to support Pipecat-controlled VAD.
  - Changed dependency to `speechmatics-voice[smart]>=0.2.8`
  (PR [#3562](https://github.com/pipecat-ai/pipecat/pull/3562))

- ⚠️ Changed function call handling to use timeout-based completion instead of
  immediate callback execution.
  - Function calls that defer their results (e.g., `UserImageRequestFrame`)
    now use a timeout mechanism
  - The `result_callback` is invoked automatically when the deferred
    operation completes or after timeout
  - This change affects examples using `UserImageRequestFrame`: the
    `result_callback` should now be passed to the frame instead of being
    called immediately
  (PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))

- Pipecat runner now uses `DAILY_ROOM_URL` instead of `DAILY_SAMPLE_ROOM_URL`.
  (PR [#3582](https://github.com/pipecat-ai/pipecat/pull/3582))

- Updates to `GradiumSTTService`:
  - Now flushes pending transcriptions when VAD detects the user stopped
    speaking, improving response latency.
  - `GradiumSTTService` now supports `InputParams` for configuring `language`
    and `delay_in_frames` settings.
  (PR [#3587](https://github.com/pipecat-ai/pipecat/pull/3587))

### Deprecated

- ⚠️ Deprecated `vad_analyzer` parameter on `BaseInputTransport`. Pass
  `vad_analyzer` to `LLMUserAggregatorParams` instead or use `VADProcessor` in
  the pipeline.
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

### Removed

- Removed deprecated `AICFilter` parameters: `enhancement_level`, `voice_gain`,
  `noise_gate_enable`.
  (PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))

### Fixed

- Fixed an issue where `OpenRouterLLMService`, when used with a Gemini model,
  didn't handle multiple `"system"` messages as expected (and as we do in
  `GoogleLLMService`), which is to convert subsequent ones into `"user"`
  messages. Instead, the latest `"system"` message would overwrite the previous
  ones.
  (PR [#3406](https://github.com/pipecat-ai/pipecat/pull/3406))

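The intended handling can be sketched on OpenAI-style message dicts. The first `"system"` message stays a system message, and later ones are converted to `"user"` messages instead of replacing it. `normalize_system_messages` is a hypothetical helper, not the code in `OpenRouterLLMService` or `GoogleLLMService`.

```python
def normalize_system_messages(messages: list[dict]) -> list[dict]:
    """Keep the first system message; convert later ones to user messages."""
    result = []
    seen_system = False
    for message in messages:
        if message.get("role") == "system":
            if seen_system:
                # Copy instead of mutating the caller's dict.
                message = {**message, "role": "user"}
            seen_system = True
        result.append(message)
    return result


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
    {"role": "system", "content": "Keep answers short."},
]
for message in normalize_system_messages(messages):
    print(message["role"], "-", message["content"])
# system - You are a helpful assistant.
# user - Hi!
# user - Keep answers short.
```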
- Transports now properly broadcast `InputTransportMessageFrame` frames both
  upstream and downstream instead of only pushing downstream.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing
  shared mutable references between the downstream and upstream frame
  instances.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Fixed OpenAI LLM services to emit `ErrorFrame` on completion timeout,
|
||
enabling proper error handling and LLMSwitcher failover.
|
||
(PR [#3529](https://github.com/pipecat-ai/pipecat/pull/3529))
|
||
|
||
- Fixed a logging issue where non-ASCII characters (e.g., Japanese, Chinese)
  were unnecessarily escaped to Unicode escape sequences in function call log
  output.
  (PR [#3536](https://github.com/pipecat-ai/pipecat/pull/3536))

- Fixed audio track synchronization in `AudioBufferProcessor`, which caused
  silence and audio to be misaligned between the user and bot buffers.
  (PR [#3541](https://github.com/pipecat-ai/pipecat/pull/3541))

- Fixed a race condition in `OpenAIRealtimeBetaLLMService` that could cause an
  error when truncating the conversation.
  (PR [#3567](https://github.com/pipecat-ai/pipecat/pull/3567))

- Fixed an infinite loop in `WebsocketService` that blocked the event loop when
  a remote server closed the connection gracefully.
  (PR [#3574](https://github.com/pipecat-ai/pipecat/pull/3574))

- Fixed `LLMUserAggregator` and `LLMAssistantAggregator` not emitting pending
  transcripts via `on_user_turn_stopped` and `on_assistant_turn_stopped` events
  when the conversation ends (`EndFrame`) or is cancelled (`CancelFrame`).
  (PR [#3575](https://github.com/pipecat-ai/pipecat/pull/3575))

- Added missing `LiveKitRunnerArguments` and `LiveKitTransport` support in
  runner utilities to enable LiveKit transport configuration.
  (PR [#3580](https://github.com/pipecat-ai/pipecat/pull/3580))

- Fixed a race condition in `OpenAIRealtimeLLMService` that could cause an
  error when truncating the conversation.
  (PR [#3581](https://github.com/pipecat-ai/pipecat/pull/3581))

- Fixed `PiperHttpTTSService` (formerly `PiperTTSService`) to resample audio
  output based on the model's sample rate parsed from the WAV header.
  (PR [#3585](https://github.com/pipecat-ai/pipecat/pull/3585))

- Fixed `UserTurnController` to reset user turn timeout when interim
  transcriptions are received.
  (PR [#3594](https://github.com/pipecat-ai/pipecat/pull/3594))

- Fixed an issue in the `IVRNavigator` where the pushed `TextFrame`s had
  incorrect spacing. The internal `IVRProcessor` now pushes
  `AggregatedTextFrame`s when in conversation mode, which makes the spacing of
  the aggregated output text controllable.
  (PR [#3604](https://github.com/pipecat-ai/pipecat/pull/3604))

- Fixed the `GeminiLiveLLMService` transcription timeout handler not being
  scheduled. The service now yields to the event loop after creating the task.
  (PR [#3605](https://github.com/pipecat-ai/pipecat/pull/3605))

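The `FrameProcessor.broadcast_frame()` fix in this list comes down to ordinary Python aliasing. If a downstream frame and an upstream frame are built from the same kwargs, any mutable value is shared between them. The sketch below is illustrative only. `Frame` and `broadcast()` are hypothetical stand-ins, not Pipecat's classes. It shows why deep-copying the kwargs for each frame keeps the two instances independent.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Frame:
    """Hypothetical stand-in for a frame object (not Pipecat's class)."""

    direction: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def broadcast(deep_copy: bool, **kwargs: Any) -> Tuple[Frame, Frame]:
    """Build one downstream and one upstream frame from the same kwargs."""

    def make(direction: str) -> Frame:
        # Without a deep copy, both frames receive the very same mutable objects.
        frame_kwargs = copy.deepcopy(kwargs) if deep_copy else kwargs
        return Frame(direction=direction, **frame_kwargs)

    return make("downstream"), make("upstream")


down, up = broadcast(False, metadata={"tags": ["a"]})
down.metadata["tags"].append("b")  # the mutation leaks into the upstream frame
print(up.metadata)  # {'tags': ['a', 'b']}

down, up = broadcast(True, metadata={"tags": ["a"]})
down.metadata["tags"].append("b")  # only the downstream copy changes
print(up.metadata)  # {'tags': ['a']}
```

Deep copying costs an extra copy per broadcast. In exchange, a processor that mutates one frame can never silently change the frame travelling in the other direction.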
## [0.0.100] - 2026-01-20

### Added

- Added Hathora service support for Hathora-hosted TTS and STT models
  (non-streaming only).
  (PR [#3169](https://github.com/pipecat-ai/pipecat/pull/3169))

- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models
  (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech
  synthesis.
  (PR [#3349](https://github.com/pipecat-ai/pipecat/pull/3349))

- Added the `additional_headers` param to `WebsocketClientParams`, allowing
  `WebsocketClientTransport` to send custom headers on connect, for cases such
  as authentication.
  (PR [#3461](https://github.com/pipecat-ai/pipecat/pull/3461))

- Added `UserIdleController` for detecting user idle state, integrated into
  `LLMUserAggregator` and `UserTurnProcessor` via optional `user_idle_timeout`
  parameter. Emits `on_user_turn_idle` event for application-level handling.
  Deprecated `UserIdleProcessor` in favor of the new compositional approach.
  (PR [#3482](https://github.com/pipecat-ai/pipecat/pull/3482))

- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to
  `LLMUserAggregator` for tracking user mute state changes.
  (PR [#3490](https://github.com/pipecat-ai/pipecat/pull/3490))

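The `UserIdleController` entry above describes the feature only in terms of a `user_idle_timeout` and an `on_user_turn_idle` event. The sketch below is a rough, hypothetical illustration of the idea, not Pipecat's implementation. The idle detector is a countdown task that restarts on every bit of user activity and fires a callback only when a countdown completes. `IdleTimer`, `activity()`, and `stop()` are made-up names.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class IdleTimer:
    """Hypothetical idle detector: calls `on_idle` after `timeout` s without activity."""

    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]):
        self._timeout = timeout
        self._on_idle = on_idle
        self._task: Optional[asyncio.Task] = None

    def activity(self) -> None:
        """Record user activity (e.g. speech or a transcription); restart the countdown."""
        self.stop()
        self._task = asyncio.create_task(self._countdown())

    def stop(self) -> None:
        """Cancel any pending countdown."""
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _countdown(self) -> None:
        await asyncio.sleep(self._timeout)
        await self._on_idle()


async def main() -> int:
    idle_events = 0

    async def on_idle() -> None:
        nonlocal idle_events
        idle_events += 1

    timer = IdleTimer(0.2, on_idle)
    for _ in range(4):  # the user stays active: a new event every 50 ms
        timer.activity()
        await asyncio.sleep(0.05)
    await asyncio.sleep(0.4)  # then goes quiet for longer than the timeout
    timer.stop()
    return idle_events


print(asyncio.run(main()))  # 1
```

Restarting the countdown task is the simplest design. The callback fires only once the user has been silent for a full timeout, and it never fires while activity keeps arriving.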
### Changed

- Enhanced interruption handling in `AsyncAITTSService` by supporting
  multi-context WebSocket sessions for more robust context management.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Throttled `UserSpeakingFrame` so that it is broadcast at most once every
  200ms instead of on every audio chunk, reducing frame processing overhead
  during user speech.
  (PR [#3483](https://github.com/pipecat-ai/pipecat/pull/3483))

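Here is a minimal sketch of the kind of time-based throttle described for `UserSpeakingFrame` above. It assumes the simplest policy: let one event through, then drop events until the interval has passed. `Throttle` is a hypothetical helper, not Pipecat code. A fake integer-millisecond clock keeps the example deterministic.

```python
import time
from typing import Callable, List, Optional


class Throttle:
    """Hypothetical helper: allows an event at most once per `interval`.

    `clock` must return the current time in the same unit as `interval`.
    """

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic):
        self._interval = interval
        self._clock = clock
        self._last: Optional[float] = None

    def ready(self) -> bool:
        """Return True (and start a new interval) if an event may be sent now."""
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False


# One second of user speech arriving as 20 ms audio chunks, timed in integer
# milliseconds so the interval comparison is exact.
now_ms = 0
throttle = Throttle(200, clock=lambda: now_ms)
sent_at_ms: List[int] = []
for chunk in range(50):
    now_ms = chunk * 20
    if throttle.ready():
        sent_at_ms.append(now_ms)

print(sent_at_ms)  # [0, 200, 400, 600, 800]
```

With 20 ms chunks, 50 audio chunks produce only 5 broadcasts in this sketch.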
### Deprecated

- For consistency with other package names, `pipecat.turns.mute` (introduced
  in Pipecat 0.0.99) is deprecated in favor of `pipecat.turns.user_mute`.
  (PR [#3479](https://github.com/pipecat-ai/pipecat/pull/3479))

### Fixed

- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Fixed an issue where the "bot-llm-text" RTVI event would not fire for
  realtime (speech-to-speech) services:

  - `AWSNovaSonicLLMService`
  - `GeminiLiveLLMService`
  - `OpenAIRealtimeLLMService`
  - `GrokRealtimeLLMService`

  The issue was that these services weren't pushing `LLMTextFrame`s. Now
  they do.
  (PR [#3446](https://github.com/pipecat-ai/pipecat/pull/3446))

- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is
  talking when using `ExternalUserTurnStrategies`.
  (PR [#3454](https://github.com/pipecat-ai/pipecat/pull/3454))

- Fixed an issue where user turn start strategies were not being reset after a
  user turn started, causing incorrect strategy behavior.
  (PR [#3455](https://github.com/pipecat-ai/pipecat/pull/3455))

- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions,
  preventing incorrect turn starts when words are spoken with pauses between
  them.
  (PR [#3462](https://github.com/pipecat-ai/pipecat/pull/3462))

- Fixed an issue where Grok Realtime would error out when running with
  SmallWebRTC transport.
  (PR [#3480](https://github.com/pipecat-ai/pipecat/pull/3480))

- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was
  causing an error. See
  https://docs.mem0.ai/platform/features/async-mode-default-change.
  (PR [#3484](https://github.com/pipecat-ai/pipecat/pull/3484))

- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously
  error out. Now it successfully reconnects and "rehydrates" from the context
  object.
  (PR [#3486](https://github.com/pipecat-ai/pipecat/pull/3486))

- Fixed `AzureTTSService` transcript formatting issues:
  - Punctuation now appears without extra spaces (e.g., "Hello!" instead of
    "Hello !")
  - CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces
    between characters
  (PR [#3489](https://github.com/pipecat-ai/pipecat/pull/3489))

- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in
  some cases.
  (PR [#3494](https://github.com/pipecat-ai/pipecat/pull/3494))

- Fixed a memory leak in `LiveKitTransport` when `video_in_enabled` is `False`.
  (PR [#3499](https://github.com/pipecat-ai/pipecat/pull/3499))

- Fixed an issue in `AIService` where unhandled exceptions in `start()`,
  `stop()`, or `cancel()` implementations would prevent `process_frame()` from
  continuing. As a result, `StartFrame`, `EndFrame`, or `CancelFrame` was never
  pushed downstream, and the pipeline would not start or stop properly.
  (PR [#3503](https://github.com/pipecat-ai/pipecat/pull/3503))

- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from
  constructor to `start()` for better error handling.
  (PR [#3504](https://github.com/pipecat-ai/pipecat/pull/3504))

- Optimized `NVIDIATTSService` to process incoming audio frames immediately.
  (PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))

- Optimized `NVIDIASTTService` by removing unnecessary queue and task.
  (PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))

- Fixed a `CambTTSService` issue where the client was initialized in the
  constructor, which prevented proper pipeline error handling.
  (PR [#3511](https://github.com/pipecat-ai/pipecat/pull/3511))

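The `AzureTTSService` transcript fix above concerns how word-level tokens are joined back into text. The following is a simplified, hypothetical join rule, not Azure's or Pipecat's actual code. It adds no space before closing punctuation, and no space between two characters from scripts that are written without spaces. For brevity it treats only Han, Kana, and CJK/fullwidth punctuation as such scripts, and the code-point ranges are an approximation. Korean is deliberately left out, because Korean text does separate words with spaces.

```python
from typing import List

# Punctuation that attaches to the preceding word without a space.
NO_SPACE_BEFORE = set(".,!?;:%)]}")


def is_unspaced_script(ch: str) -> bool:
    """Rough check for characters written without spaces (Han, Kana, CJK punctuation)."""
    cp = ord(ch)
    return (
        0x3000 <= cp <= 0x303F  # CJK symbols and punctuation, e.g. "。"
        or 0x3040 <= cp <= 0x30FF  # Hiragana and Katakana
        or 0x3400 <= cp <= 0x9FFF  # CJK Unified Ideographs (incl. Extension A)
        or 0xFF00 <= cp <= 0xFFEF  # Halfwidth and fullwidth forms, e.g. "！"
    )


def join_words(words: List[str]) -> str:
    """Join word-boundary tokens into a transcript (hypothetical rule set)."""
    text = ""
    for word in words:
        if not word:
            continue
        if not text:
            text = word
        elif word[0] in NO_SPACE_BEFORE or (
            is_unspaced_script(text[-1]) and is_unspaced_script(word[0])
        ):
            text += word  # attach directly: punctuation, or CJK next to CJK
        else:
            text += " " + word
    return text


print(join_words(["Hello", "!", "How", "are", "you", "?"]))  # Hello! How are you?
print(join_words(["今日", "は", "いい", "天気", "です", "。"]))  # 今日はいい天気です。
```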
## [0.0.99] - 2026-01-13

### Added

- Introducing user turn strategies. User turn strategies indicate when the user
  turn starts or stops. In conversational agents, these are often referred to
  as start/stop speaking or turn-taking plans or policies.

  User turn start strategies indicate when the user starts speaking (e.g.
  using VAD events or when a user says one or more words).

  User turn stop strategies indicate when the user stops speaking (e.g. using
  an end-of-turn detection model or by observing incoming transcriptions).

  A list of strategies can be specified for both start and stop; strategies
  are evaluated in order until one evaluates to true.

  Available user turn start strategies:

  - `VADUserTurnStartStrategy`
  - `TranscriptionUserTurnStartStrategy`
  - `MinWordsUserTurnStartStrategy`
  - `ExternalUserTurnStartStrategy`

  Available user turn stop strategies:

  - `TranscriptionUserTurnStopStrategy`
  - `TurnAnalyzerUserTurnStopStrategy`
  - `ExternalUserTurnStopStrategy`

  The default strategies are:

  - start: `[VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]`
  - stop: `[TranscriptionUserTurnStopStrategy]`

  Turn strategies are configured when setting up `LLMContextAggregatorPair`.
  For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              stop=[
                  TurnAnalyzerUserTurnStopStrategy(
                      turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                  )
              ],
          )
      ),
  )
  ```

  To use user turn strategies, you must update to the new universal
  `LLMContext` and `LLMContextAggregatorPair`.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- Added `RNNoiseFilter` for real-time noise suppression using the RNNoise
  neural network via the `pyrnnoise` library.
  (PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))

- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
  voice conversations:

  - Support for real-time audio streaming with WebSocket connection
  - Built-in server-side VAD (Voice Activity Detection)
  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
  - Built-in tools support: web_search, x_search, file_search
  - Custom function calling with standard Pipecat tools schema
  - Configurable audio formats (PCM at 8kHz-48kHz)
  (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))

- Added an approximation of TTFB for Ultravox.
  (PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))

- Added a new `AudioContextTTSService` to the TTS service base classes. The
  `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and
  `WebsocketWordTTSService`.
  (PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))

- `LLMUserAggregator` now exposes the following events:

  - `on_user_turn_started`: triggered when a user turn starts
  - `on_user_turn_stopped`: triggered when a user turn ends
  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop
    and times out
  (PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))

- Introducing user mute strategies. User mute strategies indicate when user
  input should be muted based on the current system state.

  In conversational agents, user mute strategies are used to prevent user
  input from interrupting bot speech, tool execution, or other critical system
  operations.

  A list of strategies can be specified; all strategies are evaluated for
  every frame so that each strategy can maintain its internal state. A user
  frame is muted if any of the configured strategies indicates it should be
  muted.

  Available user mute strategies:

  - `FirstSpeechUserMuteStrategy`
  - `MuteUntilFirstBotCompleteUserMuteStrategy`
  - `AlwaysUserMuteStrategy`
  - `FunctionCallUserMuteStrategy`

  User mute strategies replace the legacy `STTMuteFilter` and provide a more
  flexible and composable approach to muting user input.

  User mute strategies are configured when setting up the
  `LLMContextAggregatorPair`. For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_mute_strategies=[
              FirstSpeechUserMuteStrategy(),
          ]
      ),
  )
  ```

  In order to use user mute strategies you should update to the new universal
  `LLMContext` and `LLMContextAggregatorPair`.
  (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))

- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService`
  and `NvidiaTTSService`.
  (PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300))

- Added `enable_interruptions` constructor argument to all user turn
  strategies. This tells the `LLMUserAggregator` whether to push an
  `InterruptionFrame`.
  (PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))

- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control
  whether final transcripts are split on sentence boundaries.
  (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))

- Added word-level timestamp support to `AzureTTSService` for accurate
  text-to-audio synchronization.
  (PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334))

- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams`
  and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation
  dictionary feature for custom pronunciations.
  (PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346))

- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport`
  (see https://www.liveavatar.com/).
  (PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))

- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:

  - New `start_video_paused` parameter to control initial video input state
  - New `video_frame_detail` parameter to set image processing quality
    ("auto", "low", or "high"). This corresponds to OpenAI Realtime's
    `image_detail` parameter.
  - `set_video_input_paused()` method to pause/resume video input at runtime
  - `set_video_frame_detail()` method to adjust video frame quality
    dynamically
  - Automatic rate limiting (1 frame per second) to prevent API overload
  (PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))

- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
  that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
  and interruptions based on the controller's user turn strategies.
  (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))

- Added `UserTurnController` to manage user turns. It emits
  `on_user_turn_started`, `on_user_turn_stopped`, and
  `on_user_turn_stop_timeout` events, and can be integrated into processors to
  detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are
  implemented using this controller.
  (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))

- Added `should_interrupt` property to `DeepgramFluxSTTService`,
  `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the
  bot should be interrupted when the external service detects user speech.
  (PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))

- `LLMAssistantAggregator` now exposes the following events:

  - `on_assistant_turn_started`: triggered when the assistant turn starts
  - `on_assistant_turn_stopped`: triggered when the assistant turn ends
  - `on_assistant_thought`: triggered when there's an assistant thought
    available
  (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))

- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
  SDK (requires `krisp_audio`).
  (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))

- Added support for setting up a pipeline task from external files. You can now
  register custom pipeline task setup files by setting the
  `PIPECAT_SETUP_FILES` environment variable. This variable should contain a
  colon-separated list of Python files (e.g.
  `export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define
  a function with the following signature:

  ```python
  async def setup_pipeline_task(task: PipelineTask):
      ...
  ```

  (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))

- Added a keepalive task for `InworldTTSService` to keep the service connected
  in the event of no generations for longer periods of time.
  (PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403))

- Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When
  enabled, `GladiaSTTService` acts as the turn controller, emitting
  `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally
  `InterruptionFrame`.
  (PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))

- Added `should_interrupt` property to `GladiaSTTService` to configure whether
  the bot should be interrupted when the external service detects user speech.
  (PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))

- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector
  WebSocket protocol.
  (PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))

- Added `append_trailing_space` parameter to `TTSService` to automatically
  append a trailing space to text before sending to TTS, helping prevent some
  services from vocalizing trailing punctuation.
  (PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))

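The user turn strategies and user mute strategies introduced above use different evaluation rules:

- Turn strategies are evaluated in order until one evaluates to true.
- Every mute strategy sees every frame, so each can keep its internal state, and the frame is muted if any of them says so.

The sketch below models both rules with a hypothetical `CountingStrategy` and plain strings standing in for frames. It is not Pipecat's strategy API.

```python
from typing import Callable, List


class CountingStrategy:
    """Hypothetical stateful strategy: records how many frames it has evaluated."""

    def __init__(self, predicate: Callable[[str], bool]):
        self.predicate = predicate
        self.frames_seen = 0

    def evaluate(self, frame: str) -> bool:
        self.frames_seen += 1  # internal state that needs to see every frame
        return self.predicate(frame)


def turn_started(strategies: List[CountingStrategy], frame: str) -> bool:
    # Turn strategies: evaluated in order, stopping at the first one that is true.
    return any(strategy.evaluate(frame) for strategy in strategies)


def should_mute(strategies: List[CountingStrategy], frame: str) -> bool:
    # Mute strategies: evaluate every strategy first, then mute if any said so.
    results = [strategy.evaluate(frame) for strategy in strategies]
    return any(results)


vad = CountingStrategy(lambda frame: frame == "vad_started")
transcription = CountingStrategy(lambda frame: frame.startswith("text:"))
print(turn_started([vad, transcription], "vad_started"))  # True
print(vad.frames_seen, transcription.frames_seen)  # 1 0 (second strategy skipped)

bot_speaking = CountingStrategy(lambda frame: frame == "vad_started")
function_call = CountingStrategy(lambda frame: False)
print(should_mute([bot_speaking, function_call], "vad_started"))  # True
print(bot_speaking.frames_seen, function_call.frames_seen)  # 1 1 (both saw the frame)
```

Short-circuiting is fine when only the first match matters. Strategies that track state, such as "has the bot finished its first turn yet", need to see every frame, which is why the mute rule evaluates them all.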
### Changed

- Updated `ElevenLabsRealtimeSTTService` to accept the
  `include_language_detection` parameter to detect language.

  ```python
  stt = ElevenLabsRealtimeSTTService(
      api_key=os.getenv("ELEVENLABS_API_KEY"),
      include_language_detection=True
  )
  ```

  (PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))

- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which adds
  improved VAD and Smart Turn capabilities and brings dramatic latency
  improvements without any impact on accuracy. Use the `turn_detection_mode`
  parameter to control the endpointing of speech, with
  `TurnDetectionMode.EXTERNAL` (default), `TurnDetectionMode.ADAPTIVE`, or
  `TurnDetectionMode.SMART_TURN`.

  ```python
  stt = SpeechmaticsSTTService(
      api_key=os.getenv("SPEECHMATICS_API_KEY"),
      params=SpeechmaticsSTTService.InputParams(
          language=Language.EN,
          turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
          speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
      ),
  )
  ```

  (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))

- `daily-python` updated to 0.23.0.
  (PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))

- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by
  `DailyTransport` now include the transport source (i.e., the originating
  audio track).
  (PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))

- Updates to Inworld TTS services:

  - Improved `InworldTTSService`'s websocket implementation to flush and close
    contexts more reliably, so that long inputs are handled better.
  - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
  (PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))

- Improved the error handling and reconnection logic for `WebsocketServer` by
  distinguishing between errors when disconnecting and websocket communication
  errors.
  (PR [#3392](https://github.com/pipecat-ai/pipecat/pull/3392))

- Updated `DeepgramSTTService` to push user started/stopped speaking and
  interruption frames when `vad_enabled` is set to true. This centralizes the
  frames into the service, removing the need to have your application code
  handle Deepgram's events and push these frames.
  (PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314))

- Added encoding validation to `DeepgramTTSService` to prevent unsupported
  encodings from reaching the API. The service now raises `ValueError` at
  initialization with a clear error message.
  (PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329))

- Updated `read_audio_frame` & `read_video_frame` methods in
  `SmallWebRTCClient` to check if the track is enabled before logging a
  warning.
  (PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336))

- Updated `CartesiaTTSService` to support setting `language=None`, resulting in
  Cartesia auto-detecting the language of the conversation.
  (PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366))

- The bundled Smart Turn weights are now updated to v3.2, which has better
  handling of short utterances, and is more robust against background noise.
  (PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367))

- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`
  (PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371))

- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
  meaning that the start of the turn audio is not cut off. This improves
  accuracy for short utterances.

- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
  (PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))

- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
  to share a single SDK instance within the same process.
  (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))

- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english`
  and voice ID to `autumn`.
  (PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399))

- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio
  packetization via the `fixed_audio_packet_size` parameter to support media
  endpoints requiring strict framing and real-time pacing.
  (PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))

- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to
  `True` to prevent punctuation (e.g., “dot”) from being pronounced.
  (PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))

- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`,
  `LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns
  thought content.
  (PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431))

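The `fixed_audio_packet_size` option for `FastAPIWebsocketTransport` above is about re-chunking audio into equally sized packets. Below is a minimal, hypothetical sketch of that idea; it is not the transport's implementation. It buffers incoming bytes, emits only full packets, and pads the final remainder with zero bytes. The zero padding assumes signed 16-bit linear PCM, where zero is silence. `packet_duration_s()` gives each packet's playback time, which is the interval a real-time sender would pace at.

```python
from typing import List


class FixedSizePacketizer:
    """Hypothetical helper: re-chunks audio bytes into packets of exactly `packet_size` bytes."""

    def __init__(self, packet_size: int):
        if packet_size <= 0:
            raise ValueError("packet_size must be positive")
        self._packet_size = packet_size
        self._buffer = bytearray()

    def push(self, audio: bytes) -> List[bytes]:
        """Buffer `audio` and return every complete packet that is now available."""
        self._buffer.extend(audio)
        packets = []
        while len(self._buffer) >= self._packet_size:
            packets.append(bytes(self._buffer[: self._packet_size]))
            del self._buffer[: self._packet_size]
        return packets

    def flush(self) -> List[bytes]:
        """Return leftover bytes zero-padded to a full packet (silence for 16-bit PCM)."""
        if not self._buffer:
            return []
        padding = bytes(self._packet_size - len(self._buffer))
        packet = bytes(self._buffer) + padding
        self._buffer.clear()
        return [packet]


def packet_duration_s(
    packet_size: int, sample_rate: int, channels: int = 1, sample_width: int = 2
) -> float:
    """Playback time of one packet; a real-time sender would wait this long between packets."""
    return packet_size / (sample_rate * channels * sample_width)


packetizer = FixedSizePacketizer(640)  # 640 bytes = 20 ms of 16 kHz, 16-bit mono audio
packets = packetizer.push(b"\x01" * 1000) + packetizer.push(b"\x01" * 500) + packetizer.flush()
print([len(p) for p in packets])  # [640, 640, 640]
print(packet_duration_s(640, 16000))  # 0.02
```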
### Deprecated

- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use
  `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
  `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- `FrameProcessor.interruption_strategies` is deprecated; use
  `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in
  `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new
  universal `LLMContext` and `LLMContextAggregatorPair` instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and
  `UserStoppedSpeakingFrame` frames.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame`
  frames are deprecated.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in
  unexpected behavior; use `LLMUserAggregator`'s new `user_turn_strategies`
  parameter instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is
  deprecated. Use the new `turn_detection_mode` parameter instead, with
  `TurnDetectionMode.EXTERNAL`, `TurnDetectionMode.ADAPTIVE`, or
  `TurnDetectionMode.SMART_TURN`. The `enable_vad` parameter is also
  deprecated and is inferred from the `turn_detection_mode`.
  (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))

- `OpenAILLMContext` and its associated classes (context aggregators, etc.) are
  now deprecated in favor of the universal `LLMContext` and its associated
  classes.

  From the developer's point of view, switching to the `LLMContext` machinery
  will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))

- `STTMuteFilter` is deprecated and will be removed in a future version. Use
  `LLMUserAggregator`'s new `user_mute_strategies` instead.
  (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))

- `FrameProcessor.interruptions_allowed` is now deprecated; use
  `LLMUserAggregator`'s new `user_mute_strategies` parameter instead.
  (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))

- `PipelineParams.allow_interruptions` is now deprecated; use
  `LLMUserAggregator`'s new `user_turn_strategies` parameter instead. For
  example, to disable interruptions but still get user turns, you can do:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
          ),
      ),
  )
  ```

  (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))

- `TranscriptProcessor` and related data classes and frames
  (`TranscriptionMessage`, `ThoughtTranscriptionMessage`,
  `TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and
  `LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and
  `on_assistant_turn_stopped`) instead.
  (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))

- Deprecated support for the `vad_events` `LiveOptions` in
  `DeepgramSTTService`. Instead, use a local Silero VAD for VAD events.
  Additionally, deprecated `should_interrupt`, which will be removed along with
  `vad_events` support in a future release.
  (PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386))

- Loading external observers from files is deprecated; use the new pipeline
  task setup files and `PIPECAT_SETUP_FILES` environment variable instead.
  (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))

### Fixed

- Improved error handling in `ElevenLabsRealtimeSTTService`
  (PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))

- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
  that blocks the process if the websocket disconnects due to an error
  (PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))

- Fixed a bug in `STTMuteFilter` where the user was not always muted during
  function calls, especially when there were multiple simultaneous calls.
  (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))

- Fixed an `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate
  memory" error when processing silent audio frames.
  (PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))

- Updated `SpeechmaticsSTTService` for version `0.0.99+`:

  - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
    in order to finalize transcription.
  - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
    detection.
  - Only emit VAD + interruption frames if VAD is enabled within the plugin
    (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
  (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))

- Fixed an issue with function calling where a handler failing to invoke its
  result callback could leave the context stuck in IN_PROGRESS, causing LLM
  inference for subsequent function call results to block while waiting on the
  unresolved call.
  (PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343))

- Fixed an issue with `DeepgramTTSService` where the model would output "Dot"
  instead of a period in some circumstances.
  (PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345))

- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as
  `unknown`.
  (PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351))

- Fixed an issue in `GeminiLiveLLMService` where `TranscriptionFrame`s were
  occasionally not pushed.
  (PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356))

- Fixed potential memory leaks and initialization issues in `KrispVivaFilter`
  by improving SDK lifecycle management.
  (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))

- Fixed a timing issue in `BaseOutputTransport` where the bot speaking flag was
  set after awaiting, allowing the event loop to re-enter the method before the
  guard was set.
  (PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400))

- Fixed parallel function calling when using Gemini thinking.
  (PR [#3420](https://github.com/pipecat-ai/pipecat/pull/3420))

- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as
|
||
`unknown`.
|
||
(PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422))
|
||
|
||
- Fixed an issue in `traced_tts`, `traced_gemini_live`, and
|
||
`traced_openai_realtime` where `model_name` in OpenTelemetry appears as
|
||
`unknown`.
|
||
(PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428))
|
||
|
||
- Fixed `request_image_frame` (for backwards compatibility) and restored
|
||
function-call–related fields in `UserImageRequestFrame` and
|
||
`UserImageRawFrame`, preventing a case where adding a non-LLM message to the
|
||
context could trigger duplicate LLM inferences (on image arrival and on
|
||
function-call result), potentially causing an infinite inference loop.
|
||
(PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430))
|
||
|
||
- Fixed `LLMContext.create_audio_message()` by correcting an internal helper
|
||
that was incorrectly declared async while being run in `asyncio.to_thread()`.
|
||
(PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435))
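
Why an `async def` helper breaks under `asyncio.to_thread()` can be shown with the standard library alone. `to_thread()` calls the function in a worker thread and returns its result. If that function is a coroutine function, calling it only creates a coroutine object, so none of its body runs. The `encode_audio*` helpers below are hypothetical illustrations, not Pipecat's internal helper.

```python
import asyncio
import base64


async def encode_audio_async(data: bytes) -> str:
    # Buggy: declared async, so calling it in a thread just returns a coroutine.
    return base64.b64encode(data).decode("ascii")


def encode_audio(data: bytes) -> str:
    # Fixed: a plain function does the blocking work inside the worker thread.
    return base64.b64encode(data).decode("ascii")


async def main() -> tuple:
    buggy = await asyncio.to_thread(encode_audio_async, b"\x00\x01")
    is_coroutine = asyncio.iscoroutine(buggy)
    buggy.close()  # avoid a "coroutine was never awaited" warning
    fixed = await asyncio.to_thread(encode_audio, b"\x00\x01")
    return is_coroutine, fixed


if __name__ == "__main__":
    print(asyncio.run(main()))  # (True, 'AAE=')
```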

### Other

- Added `52-live-transcription.py` foundational example demonstrating live
  transcription and translation from English to Spanish. In this example, the
  bot is not interruptible: as the user continues speaking, English
  transcriptions are queued, and the bot continuously translates and speaks
  each queued sentence in Spanish without being interrupted by new user speech.
  (PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))

- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows
  how to use `UserTurnProcessor`.
  (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))

- Added a new foundational example `28-user-assistant-turns.py` that shows how
  to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to
  gather a conversation transcript.
  (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))

## [0.0.98] - 2025-12-17

### Added

- Added `RimeNonJsonTTSService`, which supports non-JSON streaming mode. This
  new class supports websocket streaming for the Arcana model.
  (PR [#3085](https://github.com/pipecat-ai/pipecat/pull/3085))

- Added additional functionality related to "thinking" for Google and
  Anthropic LLMs.

  1. New typed parameters for Google and Anthropic LLMs that control the
     models' thinking behavior (like how much thinking to do, and whether to
     output thoughts or thought summaries):
     - `AnthropicLLMService.ThinkingConfig`
     - `GoogleLLMService.ThinkingConfig`
  2. New frames for representing thoughts output by LLMs:
     - `LLMThoughtStartFrame`
     - `LLMThoughtTextFrame`
     - `LLMThoughtEndFrame`
  3. A generic mechanism for recording LLM thoughts to context, used
     specifically to support Anthropic, whose thought signatures are expected
     to appear alongside the text of the thoughts within assistant context
     messages. See:
     - `LLMThoughtEndFrame.signature`
     - `LLMAssistantAggregator` handling of the above field
     - `AnthropicLLMAdapter` handling of `"thought"` context messages
  4. Google-specific logic for inserting thought signatures into the context,
     to help maintain thinking continuity in a chain of LLM calls. See:
     - `GoogleLLMService` sending `LLMMessagesAppendFrame`s to add
       LLM-specific `"thought_signature"` messages to context
     - `GeminiLLMAdapter` handling of `"thought_signature"` messages
  5. An expansion of `TranscriptProcessor` to process LLM thoughts in
     addition to user and assistant utterances. See:
     - `TranscriptProcessor(process_thoughts=True)` (defaults to `False`)
     - `ThoughtTranscriptionMessage`, which is now also emitted with the
       `"on_transcript_update"` event

  (PR [#3175](https://github.com/pipecat-ai/pipecat/pull/3175))
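
As a rough illustration of the start/text/end frame pattern, the sketch below accumulates streamed thought text into a single context-style entry and attaches the signature carried by the end frame. The frame classes are simplified, hypothetical stand-ins that only borrow the names listed above. The real Pipecat frames and the `LLMAssistantAggregator` logic may differ, and the `{"type": "thought", ...}` dict shape is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class LLMThoughtStartFrame:
    pass


@dataclass
class LLMThoughtTextFrame:
    text: str


@dataclass
class LLMThoughtEndFrame:
    signature: Optional[str] = None


Frame = Union[LLMThoughtStartFrame, LLMThoughtTextFrame, LLMThoughtEndFrame]


def collect_thoughts(frames: List[Frame]) -> List[dict]:
    """Turn a stream of thought frames into context-style entries (assumed shape)."""
    thoughts: List[dict] = []
    buffer: Optional[List[str]] = None
    for frame in frames:
        if isinstance(frame, LLMThoughtStartFrame):
            buffer = []  # a new thought begins
        elif isinstance(frame, LLMThoughtTextFrame) and buffer is not None:
            buffer.append(frame.text)
        elif isinstance(frame, LLMThoughtEndFrame) and buffer is not None:
            # The signature arrives with the end frame and is stored with the text.
            thoughts.append(
                {"type": "thought", "text": "".join(buffer), "signature": frame.signature}
            )
            buffer = None
    return thoughts


if __name__ == "__main__":
    stream = [
        LLMThoughtStartFrame(),
        LLMThoughtTextFrame("The user wants "),
        LLMThoughtTextFrame("the weather."),
        LLMThoughtEndFrame(signature="sig-123"),
    ]
    print(collect_thoughts(stream))
```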

- Data and control frames can now be marked as non-interruptible by using the
  `UninterruptibleFrame` mixin. Frames marked as `UninterruptibleFrame` will
  not be interrupted during processing, and any queued frames of this type will
  be retained in the internal queues. This is useful when you need ordered
  frames (data or control) that should not be discarded or cancelled due to
  interruptions.
  (PR [#3189](https://github.com/pipecat-ai/pipecat/pull/3189))
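
The queue behavior described above can be sketched as follows. This is a simplified, hypothetical model: `UninterruptibleFrame` here is a local marker class that only mirrors the mixin's name, and `handle_interruption()` is an illustrative helper, not Pipecat's internal queue implementation. On an interruption, ordinary queued frames are dropped, while frames carrying the mixin keep their original order.

```python
from collections import deque
from dataclasses import dataclass


class UninterruptibleFrame:
    """Local marker mixin; mirrors the name of Pipecat's mixin only."""


@dataclass
class Frame:
    name: str


@dataclass
class ResultFrame(Frame, UninterruptibleFrame):
    """Hypothetical frame that must survive interruptions (e.g. a tool result)."""


def handle_interruption(queue: deque) -> deque:
    # Drop ordinary frames; keep uninterruptible ones in their original order.
    return deque(frame for frame in queue if isinstance(frame, UninterruptibleFrame))


if __name__ == "__main__":
    pending = deque(
        [Frame("tts-1"), ResultFrame("result-a"), Frame("tts-2"), ResultFrame("result-b")]
    )
    print([frame.name for frame in handle_interruption(pending)])  # ['result-a', 'result-b']
```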

- Added `on_conversation_detected` event to `VoicemailDetector`.
  (PR [#3207](https://github.com/pipecat-ai/pipecat/pull/3207))

- Added `x-goog-api-client` header with Pipecat's version to all Google
  services' requests.
  (PR [#3208](https://github.com/pipecat-ai/pipecat/pull/3208))

- Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/).
  (PR [#3210](https://github.com/pipecat-ai/pipecat/pull/3210))

- Added functionality to `AWSNovaSonicLLMService` related to the new (and now
  default) Nova 2 Sonic model (`"amazon.nova-2-sonic-v1:0"`):

  - Added the `endpointing_sensitivity` parameter to control how quickly the
    model decides the user has stopped speaking.
  - Made the assistant-response-trigger hack a no-op. It's only needed for
    the older Nova Sonic model.

  (PR [#3212](https://github.com/pipecat-ai/pipecat/pull/3212))

- [Ultravox Realtime](https://docs.ultravox.ai) is now a supported
  speech-to-speech service.

  - Added `UltravoxRealtimeLLMService` for the integration.
  - Added `49-ultravox-realtime.py` example (with tool calling).

  (PR [#3227](https://github.com/pipecat-ai/pipecat/pull/3227))

- Added Daily PSTN dial-in support to the development runner with the
  `--dialin` flag. This includes:

  - `/daily-dialin-webhook` endpoint that handles incoming Daily PSTN webhooks
  - Automatic Daily room creation with SIP configuration
  - `DialinSettings` and `DailyDialinRequest` types in `pipecat.runner.types`
    for type-safe dial-in data
  - The runner now mimics Pipecat Cloud's dial-in webhook handling for local
    development

  (PR [#3235](https://github.com/pipecat-ai/pipecat/pull/3235))

- Added Gladia session ID to logs for `GladiaSTTService`.
  (PR [#3236](https://github.com/pipecat-ai/pipecat/pull/3236))

- Added `InworldHttpTTSService`, which uses Inworld's HTTP-based TTS service in
  either streaming or non-streaming mode. Note: This class was previously named
  `InworldTTSService`.
  (PR [#3239](https://github.com/pipecat-ai/pipecat/pull/3239))

- Added `language_hints_strict` parameter to `SonioxSTTService` to strictly
  enforce language hints. This ensures that transcription occurs in the
  specified language.
  (PR [#3245](https://github.com/pipecat-ai/pipecat/pull/3245))

- Added Pipecat library version info to the `about` field in the `bot-ready`
  RTVI message.
  (PR [#3248](https://github.com/pipecat-ai/pipecat/pull/3248))

- Added `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and
  `VisionTextFrame`. These are used by vision services, similar to LLM
  services.
  (PR [#3252](https://github.com/pipecat-ai/pipecat/pull/3252))

### Changed

- `FunctionCallInProgressFrame` and `FunctionCallResultFrame` have changed from
  system frames to a control frame and a data frame, respectively, and are
  now both marked as `UninterruptibleFrame`.
  (PR [#3189](https://github.com/pipecat-ai/pipecat/pull/3189))

- `UserBotLatencyLogObserver` now uses `VADUserStartedSpeakingFrame` and
  `VADUserStoppedSpeakingFrame` to determine latency from user stopped speaking
  to bot started speaking.
  (PR [#3206](https://github.com/pipecat-ai/pipecat/pull/3206))

- Updated `HeyGenVideoService` and `HeyGenTransport` to support both HeyGen
  APIs (Interactive Avatar and Live Avatar).

  Using them is as simple as specifying the `service_type` when creating the
  `HeyGenVideoService` and the `HeyGenTransport`:

  ```python
  import os

  # `HeyGenVideoService`, `ServiceType`, and `session` must already be
  # imported/created elsewhere in your bot.
  heyGen = HeyGenVideoService(
      api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
      service_type=ServiceType.LIVE_AVATAR,
      session=session,
  )
  ```

  (PR [#3210](https://github.com/pipecat-ai/pipecat/pull/3210))

- Made `"amazon.nova-2-sonic-v1:0"` the new default model for
  `AWSNovaSonicLLMService`.
  (PR [#3212](https://github.com/pipecat-ai/pipecat/pull/3212))

- Updated the `run_inference` methods in the LLM service classes
  (`AnthropicLLMService`, `AWSBedrockLLMService`, `GoogleLLMService`, and
  `OpenAILLMService` and its base classes) to use the provided LLM
  configuration parameters.
  (PR [#3214](https://github.com/pipecat-ai/pipecat/pull/3214))

- Updated default models for:

  - `GeminiLiveLLMService` to `gemini-2.5-flash-native-audio-preview-12-2025`.
  - `GeminiLiveVertexLLMService` to `gemini-live-2.5-flash-native-audio`.

  (PR [#3228](https://github.com/pipecat-ai/pipecat/pull/3228))

- Changed the `reason` field in `EndFrame`, `CancelFrame`, `EndTaskFrame`, and
  `CancelTaskFrame` from `str` to `Any` to indicate that it can hold values
  other than strings.
  (PR [#3231](https://github.com/pipecat-ai/pipecat/pull/3231))

- Updated websocket STT services to use the `WebsocketSTTService` base class.
  This base class manages the websocket connection and handles reconnects.
  Updated services:

  - `AssemblyAISTTService`
  - `AWSTranscribeSTTService`
  - `GladiaSTTService`
  - `SonioxSTTService`

  (PR [#3236](https://github.com/pipecat-ai/pipecat/pull/3236))

- Changed Inworld's TTS service implementations:

  - Previously, the HTTP implementation was named `InworldTTSService`. That
    has been moved to `InworldHttpTTSService`. This service now supports
    word-timestamp alignment data in both streaming and non-streaming modes.
  - Updated the `InworldTTSService` class to use Inworld's Websocket API.
    This class now has support for word-timestamp alignment data and tracks
    contexts for each user turn.

  (PR [#3239](https://github.com/pipecat-ai/pipecat/pull/3239))

- ⚠️ Breaking change: `WordTTSService.start_word_timestamps()` and
  `WordTTSService.reset_word_timestamps()` are now async.
  (PR [#3240](https://github.com/pipecat-ai/pipecat/pull/3240))

- Updated the current RTVI version to 1.1.0 to reflect recent additions and
  deprecations.

  - New RTVI Messages: `send-text` and `bot-output`
  - Deprecated Messages: `append-to-context` and `bot-transcription`

  (PR [#3248](https://github.com/pipecat-ai/pipecat/pull/3248))

- `MoondreamService` now pushes `VisionFullResponseStartFrame`,
  `VisionFullResponseEndFrame` and `VisionTextFrame`.
  (PR [#3252](https://github.com/pipecat-ai/pipecat/pull/3252))

### Deprecated

- `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer` are deprecated and will
  be removed in a future version. Use `LocalSmartTurnAnalyzerV3` instead.
  (PR [#3219](https://github.com/pipecat-ai/pipecat/pull/3219))

### Removed

- Removed the deprecated VLLM-based open source Ultravox STT service.
  (PR [#3227](https://github.com/pipecat-ai/pipecat/pull/3227))

### Fixed

- Fixed a bug in `AWSNovaSonicLLMService` where we would mishandle cancelled
  tool calls in the context, resulting in errors.
  (PR [#3212](https://github.com/pipecat-ai/pipecat/pull/3212))

- Improved support for conversation history with Gemini 2.5 Flash Image (model
  `"gemini-2.5-flash-image"`). Prior to this fix, the model had no memory of
  previous images it had generated, so it couldn't iterate on them.
  (PR [#3224](https://github.com/pipecat-ai/pipecat/pull/3224))

- Added support for conversations with Gemini 3 Pro Image (model
  `"gemini-3-pro-image-preview"`). Prior to this fix, the conversation could
  not progress after the model generated an image.
  (PR [#3224](https://github.com/pipecat-ai/pipecat/pull/3224))

- Fixed an issue where `ElevenLabsHttpTTSService` was not updating
  voice settings when receiving a `TTSUpdateSettingsFrame`.
  (PR [#3226](https://github.com/pipecat-ai/pipecat/pull/3226))

- Fixed the return type for the
  `SmallWebRTCRequestHandler.handle_web_request()` function.
  (PR [#3230](https://github.com/pipecat-ai/pipecat/pull/3230))

- Fixed a bug in LLM context audio content handling.
  (PR [#3234](https://github.com/pipecat-ai/pipecat/pull/3234))

- In `GladiaSTTService`, reset the `_bytes_sent` counter on connecting the
  websocket. This avoids unnecessary audio buffer trimming.
  (PR [#3236](https://github.com/pipecat-ai/pipecat/pull/3236))

- Fixed a TTS service word-timestamp issue that could cause generated
  `TTSTextFrame` instances to have an incorrect pts (`pts = -1`).
  (PR [#3240](https://github.com/pipecat-ai/pipecat/pull/3240))

- Fixed an issue in `SimpleTextAggregator` where spaces were not being stripped
  before returning the aggregation. This resulted in an extra space for TTS
  services that don't support word-timestamp alignment data.
  (PR [#3247](https://github.com/pipecat-ai/pipecat/pull/3247))

## [0.0.97] - 2025-12-05

### Added

- Added new Gradium services, `GradiumSTTService` and `GradiumTTSService`, for
  speech-to-text and text-to-speech functionality using Gradium's API.

- Additions for `AsyncAITTSService` and `AsyncAIHttpTTSService`:

  - Added new `languages`: `pt`, `nl`, `ar`, `ru`, `ro`, `ja`, `he`, `hy`,
    `tr`, `hi`, `zh`.
  - Updated the default model to `asyncflow_multilingual_v1.0` for improved
    accuracy and broader language coverage.

- Added optional tool and tool output filters for MCP services.

### Changed

- Updated Deepgram logging to include Deepgram request IDs for improved
  debugging.

- Text Aggregation Improvements:

  - **Breaking Change**: `BaseTextAggregator.aggregate()` now returns
    `AsyncIterator[Aggregation]` instead of `Optional[Aggregation]`. This
    enables the aggregator to return multiple results based on the provided
    text.
  - Refactored text aggregators to use inheritance: `SkipTagsAggregator` and
    `PatternPairAggregator` now inherit from `SimpleTextAggregator`, reusing
    the base class's sentence detection logic.
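
Because `aggregate()` now returns an async iterator, one incoming text chunk can yield several aggregations, so callers consume it with `async for` instead of checking a single optional result. The sketch below is hypothetical: `SentenceAggregator` is a simplified stand-in (a naive split on `.`, `!` and `?`), not Pipecat's `SimpleTextAggregator`, and its `Aggregation` only mirrors the `text`/`type` fields described in these notes.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Aggregation:
    text: str
    type: str


class SentenceAggregator:
    """Hypothetical aggregator: buffers text and yields each complete sentence."""

    def __init__(self):
        self._buffer = ""

    async def aggregate(self, text: str) -> AsyncIterator[Aggregation]:
        self._buffer += text
        start = 0
        for i, char in enumerate(self._buffer):
            if char in ".!?":
                sentence = self._buffer[start : i + 1].strip()
                if sentence:
                    yield Aggregation(text=sentence, type="sentence")
                start = i + 1
        # Keep any incomplete trailing text for the next chunk.
        self._buffer = self._buffer[start:]


async def main() -> List[str]:
    aggregator = SentenceAggregator()
    results = []
    # The first chunk contains two complete sentences, so it yields twice.
    for chunk in ["Hello there. How are you? I'm", " fine."]:
        async for aggregation in aggregator.aggregate(chunk):
            results.append(aggregation.text)
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```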

- Improved interruption handling to prevent bots from repeating themselves.
  When an LLM service returns multiple sentences in a single response (e.g.,
  `GoogleLLMService`), the text is now split into individual sentences before
  being sent to TTS. This ensures interruptions occur at sentence boundaries,
  preventing the bot from repeating content after being interrupted during long
  responses.

- Updated `AICFilter` to use Quail STT as the default model
  (`AICModelType.QUAIL_STT`). Quail STT is optimized for human-to-machine
  interaction (e.g., voice agents, speech-to-text) and operates at a native
  sample rate of 16 kHz with fixed enhancement parameters.

- If an unexpected exception is caught, or if `FrameProcessor.push_error()` is
  called with an exception, the file name and line number where the exception
  occurred are now logged.
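
For reference, the file name and line number of an exception can be recovered from its traceback with the standard `traceback` module. This is a generic sketch of the technique. It assumes the innermost frame is the one of interest and is not necessarily how Pipecat formats its log messages. The `exception_location()` and `risky()` functions are hypothetical.

```python
import traceback


def exception_location(exc: BaseException) -> str:
    """Return 'file:line' for the innermost frame where `exc` was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    if not frames:
        return "<unknown>"  # the exception was never raised
    last = frames[-1]  # the frame that actually raised
    return f"{last.filename}:{last.lineno}"


def risky() -> None:
    raise ValueError("boom")


if __name__ == "__main__":
    try:
        risky()
    except ValueError as e:
        print(f"Error: {e} (at {exception_location(e)})")
```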

- Updated Smart Turn model weights to v3.1.

- Smart Turn analyzer now uses the full context of the turn rather than just
  the audio since VAD last triggered.

- Updated `CartesiaSTTService` to return the full transcription `result` in the
  `TranscriptionFrame` and `InterimTranscriptionFrame`. This provides access to
  word timestamp data.

- `HumeTTSService` changes:

  - Added tracking headers (`X-Hume-Client-Name` and `X-Hume-Client-Version`)
    to all requests made by `HumeTTSService` to the Hume API for better usage
    tracking and analytics.
  - Added `stop()` and `cancel()` cleanup methods to `HumeTTSService` to
    properly close the HTTP client and prevent resource leaks.

### Deprecated

- NVIDIA Services name changes (all functionality is unchanged):

  - `NimLLMService` is now deprecated; use `NvidiaLLMService` instead.
  - `RivaSTTService` is now deprecated; use `NvidiaSTTService` instead.
  - `RivaTTSService` is now deprecated; use `NvidiaTTSService` instead.
  - Use `uv pip install pipecat-ai[nvidia]` instead of
    `uv pip install pipecat-ai[riva]`

- The `noise_gate_enable` parameter in `AICFilter` is deprecated and no longer
  has any effect. Noise gating is now handled automatically by the AIC VAD
  system. Use `AICFilter.create_vad_analyzer()` for VAD functionality instead.

- Package `pipecat.sync` is deprecated; use `pipecat.utils.sync` instead.

### Fixed

- Fixed a bug in `PatternPairAggregator` where pattern handlers could be called
  multiple times for `KEEP` or `AGGREGATE` patterns.

- Fixed sentence aggregation to correctly handle ambiguous punctuation in
  streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").
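
A simplified sketch of the disambiguation rules involved: a period only ends a sentence when it is followed by whitespace or the end of the text, so `$29.95` stays intact, and only when the preceding word is not a known abbreviation, so `Mr. Smith` stays intact. The abbreviation list and the `split_sentences()` helper are illustrative assumptions. Pipecat's actual sentence-matching logic is more thorough. This sketch also operates on a complete string, whereas a streaming aggregator must additionally hold back a trailing period until the next chunk shows whether it ends a sentence.

```python
import re
from typing import List

# Illustrative list; a real implementation would need a much larger one.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "e.g.", "i.e."}


def split_sentences(text: str) -> List[str]:
    sentences = []
    start = 0
    # Candidate boundaries: '.', '!' or '?' followed by whitespace or end of text.
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        end = match.end()
        words = text[start:end].split()
        last_word = words[-1].lower() if words else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # "Mr." does not end a sentence
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)  # incomplete trailing text
    return sentences


if __name__ == "__main__":
    print(split_sentences("It costs $29.95 today. Mr. Smith agreed! Really?"))
```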

- Fixed an issue in `AWSTranscribeSTTService` where the `region` arg was always
  set to `us-east-1` when providing an AWS_REGION env var.

- Fixed an issue in `SarvamTTSService` where the last sentence was not being
  spoken. Now, audio is flushed when the TTS service receives the
  `LLMFullResponseEndFrame` or `EndFrame`.

- Fixed an issue in `DeepgramTTSService` where a `TTSStoppedFrame` was
  incorrectly pushed after a function call. This caused an issue with the
  voice-ui-kit's conversational panel rendering of the LLM output after a
  function call.

- Fixed an issue where `LLMTextFrame.skip_tts` was being overwritten by LLM
  services.

- Fixed an issue that caused `WebsocketService` instances to attempt
  reconnection during shutdown.

- Fixed an issue in `ElevenLabsTTSService` where character usage metrics were
  only reported on the first TTS generation per turn.

## [0.0.96] - 2025-11-26 🦃 "Happy Thanksgiving!" 🦃

### Added

- Added `AWSBedrockAgentCoreProcessor` to support invoking an AgentCore-hosted
  agent in a Pipecat pipeline.

- Enhanced error handling across the framework:

  - Added `on_error` callback to `FrameProcessor` for centralized error
    handling.

  - Renamed `push_error(error: ErrorFrame)` to
    `push_error_frame(error: ErrorFrame)` for clarity.

  - Added new `push_error` method for simplified error reporting:

    ```python
    from typing import Optional

    async def push_error(
        self,
        error_msg: str,
        exception: Optional[Exception] = None,
        fatal: bool = False,
    ): ...
    ```

  - Standardized error logging by replacing `logger.exception` calls with
    `logger.error` throughout the codebase.

- Added `cache_read_input_tokens`, `cache_creation_input_tokens` and
  `reasoning_tokens` to OTel spans for LLM calls.

- Added `LiveKitRESTHelper` utility class for managing LiveKit rooms via REST
  API.

- Added `DeepgramSageMakerSTTService`, which connects to a SageMaker-hosted
  Deepgram STT model. Added `07c-interruptible-deepgram-sagemaker.py`
  foundational example.

- Added `SageMakerBidiClient` to connect to SageMaker-hosted BiDi-compatible
  services.

- Added support for `include_timestamps` and `enable_logging` in
  `ElevenLabsRealtimeSTTService`. When `include_timestamps` is enabled,
  timestamp data is included in the `TranscriptionFrame`'s `result`
  parameter.

- Added optional speaking rate control to `InworldTTSService`.

- Introduced a new `AggregatedTextFrame` type to support passing text along
  with an `aggregated_by` field to describe the type of text included.
  `TTSTextFrame`s now inherit from `AggregatedTextFrame`. With this
  inheritance, an observer can watch for `AggregatedTextFrame`s to accumulate
  the perceived output and determine whether or not the text was spoken based
  on whether that frame is also a `TTSTextFrame`.

  With this frame, the LLM token stream can be transformed into custom
  composable chunks, allowing for aggregation outside the TTS service. This
  makes it possible to listen for or handle those aggregations and sets the
  stage for doing things like composing a best-effort version of the perceived
  LLM output in a more digestible form, whether or not it is processed by a
  TTS service, and even if no TTS service exists.
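
To illustrate the inheritance approach described above, the sketch below uses local stand-in classes named after the frames and a hypothetical `summarize_output()` helper. These are not the real Pipecat classes. Every `AggregatedTextFrame` contributes to the perceived output, and an `isinstance` check against the `TTSTextFrame` subclass tells whether that text was spoken.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AggregatedTextFrame:
    text: str
    aggregated_by: str  # e.g. "sentence", "word", or a custom type


@dataclass
class TTSTextFrame(AggregatedTextFrame):
    """Text that was actually sent through TTS."""


def summarize_output(frames: List[object]) -> List[dict]:
    output = []
    for frame in frames:
        if isinstance(frame, AggregatedTextFrame):
            output.append(
                {
                    "text": frame.text,
                    "aggregated_by": frame.aggregated_by,
                    # A TTSTextFrame is also an AggregatedTextFrame, so this
                    # single check distinguishes spoken from unspoken text.
                    "spoken": isinstance(frame, TTSTextFrame),
                }
            )
    return output


if __name__ == "__main__":
    frames = [
        TTSTextFrame("Hello!", "sentence"),
        AggregatedTextFrame("https://example.com", "link"),
    ]
    print(summarize_output(frames))
```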

- Introduced `LLMTextProcessor`: A new processor meant to allow customization
  of how `LLMTextFrame`s should be aggregated and considered. Its purpose is to
  turn `LLMTextFrame`s into `AggregatedTextFrame`s. By default, a `TTSService`
  will still aggregate `LLMTextFrame`s by sentence for the service to
  consume. However, if you wish to override how the LLM text is aggregated, you
  should no longer override the TTS's internal `text_aggregator`, but instead,
  insert this processor between your LLM and TTS in the pipeline.

- New `bot-output` RTVI message to represent what the bot actually "says".

  - The `RTVIObserver` now emits `bot-output` messages based on the new
    `AggregatedTextFrame`s (`bot-tts-text` and `bot-llm-text` are still
    supported and generated, but `bot-transcription` is now deprecated in favor
    of this new, more thorough, message).

  - The new `RTVIBotOutputMessage` includes the fields:

    - `spoken`: A boolean indicating whether the text was spoken by TTS

    - `aggregated_by`: A string representing how the text was aggregated
      ("sentence", "word", "my custom aggregation")

  - Introduced new fields to `RTVIObserver` to support the new `bot-output`
    messaging:

    - `bot_output_enabled`: Defaults to `True`. Set to `False` to disable
      `bot-output` messages.

    - `skip_aggregator_types`: Defaults to `None`. Set to a list of strings that
      match aggregation types that should not be included in `bot-output`
      messages. (Ex. `credit_card`)

  - Introduced new methods, `add_text_transformer()` and
    `remove_text_transformer()`, to `RTVIObserver` to support providing (and
    subsequently removing) callbacks for various types of aggregations (or all
    aggregations with `*`) that can modify the text before being sent as a
    `bot-output` or `tts-text` message. (Think obscuring the credit card or
    inserting extra detail the client might want that the context doesn't need.)
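
The transformer and skip-list behavior can be modeled with a small registry keyed by aggregation type, where `*` matches every type. This is a hypothetical sketch. `TextTransformerRegistry`, its method names, the synchronous `(text, type)` callback signature, and the "type-specific first, then `*`" ordering are illustrative assumptions and do not reproduce `RTVIObserver`'s actual signatures.

```python
from typing import Callable, Dict, List, Optional

Transformer = Callable[[str, str], str]  # (text, aggregation_type) -> new text


class TextTransformerRegistry:
    """Hypothetical registry; not RTVIObserver's real implementation."""

    def __init__(self, skip_aggregator_types: Optional[List[str]] = None):
        self._skip = set(skip_aggregator_types or [])
        self._transformers: Dict[str, List[Transformer]] = {}

    def add(self, aggregation_type: str, transformer: Transformer) -> None:
        self._transformers.setdefault(aggregation_type, []).append(transformer)

    def remove(self, aggregation_type: str, transformer: Transformer) -> None:
        # Raises ValueError if the transformer was never registered.
        self._transformers.get(aggregation_type, []).remove(transformer)

    def apply(self, text: str, aggregation_type: str) -> Optional[str]:
        if aggregation_type in self._skip:
            return None  # this type is never sent to the client
        # Type-specific transformers run first, then the "*" wildcard ones.
        for key in (aggregation_type, "*"):
            for transformer in self._transformers.get(key, []):
                text = transformer(text, aggregation_type)
        return text


def mask_card(text: str, _type: str) -> str:
    # Keep only the last four digits visible.
    return "*" * (len(text) - 4) + text[-4:]


if __name__ == "__main__":
    registry = TextTransformerRegistry(skip_aggregator_types=["internal_note"])
    registry.add("credit_card", mask_card)
    registry.add("*", lambda text, _type: text.strip())
    print(registry.apply("4111111111111111", "credit_card"))
    print(registry.apply(" Hi there. ", "sentence"))
    print(registry.apply("secret", "internal_note"))
```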

- In `MiniMaxHttpTTSService`:

  - Added support for speech-2.6-hd and speech-2.6-turbo models

  - Added languages: Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino,
    Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian,
    Swedish, and Tamil

  - Added new emotions: calm and fluent

- Added `enable_logging` to `SimliVideoService` input parameters. It's disabled
  by default.

### Changed

- Updated `FishAudioTTSService` default model to `s1`.

- Updated `DeepgramTTSService` to use Deepgram's TTS websocket API. ⚠️ This is
  a potential breaking change, which only affects you if you're self-hosting
  `DeepgramTTSService`. The new service uses Websockets and improves TTFB
  latency.

- Updated `daily-python` to 0.22.0.

- `BaseTextAggregator` changes:

  Modified the `BaseTextAggregator` type so that when text gets aggregated,
  metadata can be associated with it. Currently, that just means a `type`, so
  that the aggregation can be classified or described. Changes made to support
  this:

  - ⚠️ IMPORTANT: Aggregators are now expected to strip leading/trailing white
    space characters before returning their aggregation from `aggregate()` or
    `.text`. This way all aggregators have a consistent contract allowing
    downstream use to know how to stitch aggregations back together.

  - Introduced a new `Aggregation` dataclass to represent both the aggregated
    `text` and a string identifying the `type` of aggregation (ex. "sentence",
    "word", "my custom aggregation")

  - ⚠️ Breaking change: `BaseTextAggregator.text` now returns an `Aggregation`
    (instead of `str`).

    Before:

    ```python
    aggregated_text = myAggregator.text
    ```

    Now:

    ```python
    aggregated_text = myAggregator.text.text
    ```

  - ⚠️ Breaking change: `BaseTextAggregator.aggregate()` now returns
    `Optional[Aggregation]` (instead of `Optional[str]`).

    Before:

    ```python
    aggregation = myAggregator.aggregate(text)
    print(f"successfully aggregated text: {aggregation}")
    ```

    Now:

    ```python
    aggregation = myAggregator.aggregate(text)
    if aggregation:
        print(f"successfully aggregated text: {aggregation.text}")
    ```

  - `SimpleTextAggregator`, `SkipTagsAggregator`, `PatternPairAggregator`
    updated to produce/consume `Aggregation` objects.

  - All uses of the above Aggregators have been updated accordingly.
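
The whitespace contract can be illustrated with a minimal stand-in. Because every aggregation is returned stripped, a consumer can rebuild the text flow by joining aggregations with a single space. The `Aggregation` below only mirrors the `text` and `type` fields described above, and `make_aggregation()` and `stitch()` are hypothetical helpers, not part of Pipecat.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Aggregation:
    text: str
    type: str  # e.g. "sentence", "word", "my custom aggregation"


def make_aggregation(raw: str, agg_type: str = "sentence") -> Aggregation:
    # Per the contract, leading/trailing whitespace is stripped before returning.
    return Aggregation(text=raw.strip(), type=agg_type)


def stitch(aggregations: Iterable[Aggregation]) -> str:
    # Since no aggregation carries its own edge whitespace, one space is enough.
    return " ".join(a.text for a in aggregations if a.text)


if __name__ == "__main__":
    parts = [make_aggregation("Hello there. "), make_aggregation(" How are you?")]
    print(repr(stitch(parts)))  # 'Hello there. How are you?'
```

Without the stripping step, joining `"Hello there. "` and `" How are you?"` with a space would leave three spaces between the sentences.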

- Augmented the `PatternPairAggregator` so that matched patterns can be treated
  as their own aggregation, taking advantage of the new `Aggregation` type. To
  that end:

  - Introduced a new, preferred version of `add_pattern` to support a new option
    for treating a match as a separate aggregation returned from
    `aggregate()`. This replaces the now-deprecated `add_pattern_pair` method;
    you provide a `MatchAction` instead of the `remove_match` field.

  - `MatchAction` enum: `REMOVE`, `KEEP`, `AGGREGATE`, allowing customization
    of how a match should be handled.

    - `REMOVE`: The text along with its delimiters will be removed from the
      streaming text. Sentence aggregation will continue as if this text
      did not exist.

    - `KEEP`: The delimiters will be removed, but the content between them
      will be kept. Sentence aggregation will continue with the internal
      text included.

    - `AGGREGATE`: The delimiters will be removed and the content between them
      will be treated as a separate aggregation. Any text before the start of
      the pattern will be returned early, whether or not a complete sentence
      was found. Then the pattern will be returned. After the closing delimiter
      is found, aggregation will resume with sentence matching. The content
      between the delimiters is not aggregated by sentence; it is aggregated as
      one single block of text.

  - `PatternMatch` now extends `Aggregation` and provides richer info to
    handlers.

  - ⚠️ Breaking change: The `PatternMatch` type returned to handlers registered
    via `on_pattern_match` has been updated to subclass from the new
    `Aggregation` type, which means that `content` has been replaced with
    `text` and `pattern_id` has been replaced with `type`:

    ```python
    async def on_match_tag(match: PatternMatch):
        pattern = match.type  # instead of match.pattern_id
        text = match.text  # instead of match.content
    ```
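
The three actions can be compared on a complete (non-streaming) string with the sketch below. It is a hypothetical model that reuses the `MatchAction` member names, a made-up `<note>...</note>` delimiter pair, and a naive sentence split. It is not the `PatternPairAggregator` implementation, which works incrementally on streamed text.

```python
import re
from enum import Enum
from typing import List, Tuple


class MatchAction(Enum):
    REMOVE = "remove"
    KEEP = "keep"
    AGGREGATE = "aggregate"


def split_sentences(text: str) -> List[str]:
    # Naive sentence split, good enough for the illustration.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def apply_action(text: str, start: str, end: str, action: MatchAction) -> List[Tuple[str, str]]:
    """Return (aggregation_type, text) pairs for a single delimiter pair."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    if action is MatchAction.REMOVE:
        # Drop delimiters and content, then collapse the leftover whitespace.
        cleaned = re.sub(r"\s+", " ", pattern.sub("", text))
        return [("sentence", s) for s in split_sentences(cleaned)]
    if action is MatchAction.KEEP:
        # Drop only the delimiters; the content joins normal sentence splitting.
        return [("sentence", s) for s in split_sentences(pattern.sub(r"\1", text))]
    # AGGREGATE: text before the match is flushed early, the match becomes its
    # own block, and sentence splitting resumes after the closing delimiter.
    results: List[Tuple[str, str]] = []
    pos = 0
    for match in pattern.finditer(text):
        results += [("sentence", s) for s in split_sentences(text[pos : match.start()])]
        results.append(("pattern", match.group(1).strip()))
        pos = match.end()
    results += [("sentence", s) for s in split_sentences(text[pos:])]
    return results


if __name__ == "__main__":
    sample = "Hi there. <note>New user. Be kind.</note> How can I help?"
    for action in MatchAction:
        print(action.name, apply_action(sample, "<note>", "</note>", action))
```

Note how `KEEP` splits the tagged content into two sentences, while `AGGREGATE` returns it as one block.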

- `TextFrame` now includes the field `append_to_context` to support setting
  whether or not the encompassing text should be added to the LLM context (by
  the LLM assistant aggregator). It defaults to `True`.

- `TTSService` base class updates:

  - `TTSService`s now accept a new `skip_aggregator_types` parameter to avoid
    speaking certain aggregation types (now determined/returned by the
    aggregator).

  - Introduced the ability to do a just-in-time transform of text before it gets
    sent to the TTS service via callbacks you can set up via a new init field,
    `text_transforms`, or a new method, `add_text_transformer()`. This makes it
    possible to do things like introduce TTS-specific tags for spelling or
    emotion, or change the pronunciation of something on the fly.
    `remove_text_transformer` has also been added to support removing a
    registered transform callback.

  - TTS services push `AggregatedTextFrame` in addition to `TTSTextFrame`s
    either when an aggregation occurs that should not be spoken or when the TTS
    service supports word-by-word timestamping. In the latter case, the
    `TTSService` preliminarily generates an `AggregatedTextFrame`, aggregated by
    sentence to generate the full sentence content as early as possible.

- Updated `CartesiaTTSService`:

  - Modified use of custom default `text_aggregator` to avoid deprecation
    warnings and push users towards use of transformers or the
    `LLMTextProcessor`

  - Added convenience methods for taking advantage of Cartesia's SSML tags:
    spell, emotion, pauses, volume, and speed.

- Updated `RimeTTSService`:

  - Modified use of custom default `text_aggregator` to avoid deprecation
    warnings and push users towards use of transformers or the
    `LLMTextProcessor`

  - Added convenience methods for taking advantage of Rime's customization
    options: spell, pauses, pronunciations, and inline speed control.

### Deprecated

- The TTS constructor field `text_aggregator` is deprecated in favor of the new
  `LLMTextProcessor`. TTS services still have an internal aggregator for
  default behavior, but if you want to override the aggregation behavior, you
  should use the new processor.

- The RTVI `bot-transcription` event is deprecated in favor of the new
  `bot-output` message, which is the canonical representation of bot output
  (spoken or not). The code still emits a transcription message for backwards
  compatibility while the transition occurs.

- Deprecated `add_pattern_pair` in the `PatternPairAggregator`, which takes a
  `pattern_id` and `remove_match` field, in favor of the new `add_pattern`
  method, which takes a `type` and an `action`.

- `english_normalization` input parameter for `MiniMaxHttpTTSService` is
  deprecated; use `text_normalization` instead.

### Fixed

- Fixed an issue in `AWSBedrockLLMService` where the `aws_region` arg was
  always set to `us-east-1` when providing an AWS_REGION env var.

- Fixed an issue with `DeepgramFluxSTTService` where it sometimes failed to
  reconnect.

- Fixed an issue in `ElevenLabsRealtimeSTTService` where dynamic language
  updates were not working.

- Fixed an issue in `ElevenLabsRealtimeSTTService` where setting the sample
  rate would result in transcripts failing.

- Fixed `InworldTTSService` audio config payload to use camelCase keys expected
  by the Inworld API.

## [0.0.95] - 2025-11-18

### Added

- Added ai-coustics integrated VAD (`AICVADAnalyzer`) with `AICFilter` factory
  and example wiring; leverages the enhancement model for robust detection with
  no ONNX dependency or added processing complexity.

- Added a watchdog to `DeepgramFluxSTTService` to prevent dangling tasks in
  case the user is speaking and we stop receiving audio.
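
A watchdog of this kind can be sketched with `asyncio`. Each received audio chunk restarts a countdown, and if no audio arrives before it expires, a callback forces the pending turn to finish instead of leaving it dangling. Calling `stop()` cancels the countdown when the turn ends normally. The `SpeechWatchdog` class, its timeout, and the `on_timeout` callback are hypothetical. They only illustrate the idea behind the `DeepgramFluxSTTService` change, not its implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SpeechWatchdog:
    """Hypothetical watchdog: fires if audio stops arriving mid-turn."""

    def __init__(self, timeout: float, on_timeout: Callable[[], Awaitable[None]]):
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._task: Optional[asyncio.Task] = None

    def audio_received(self) -> None:
        # Every chunk restarts the countdown (requires a running event loop).
        self.stop()
        self._task = asyncio.create_task(self._countdown())

    def stop(self) -> None:
        if self._task and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _countdown(self) -> None:
        await asyncio.sleep(self._timeout)
        await self._on_timeout()


async def main() -> list:
    events = []

    async def finalize_turn() -> None:
        events.append("forced end of turn")

    watchdog = SpeechWatchdog(timeout=0.2, on_timeout=finalize_turn)
    for _ in range(3):
        watchdog.audio_received()  # audio keeps arriving: no timeout yet
        await asyncio.sleep(0.02)
    await asyncio.sleep(0.4)  # audio stops: the watchdog fires once
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['forced end of turn']
```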

- Introduced a minimum confidence parameter in `DeepgramFluxSTTService` to
  avoid generating transcriptions below a defined threshold.

- Added `ElevenLabsRealtimeSTTService`, which implements the Realtime STT
  service from ElevenLabs.

- Added word-level timestamp support to the Hume TTS service.

### Changed

- ⚠️ Breaking change: `LLMContext.create_image_message()`,
  `LLMContext.create_audio_message()`, `LLMContext.add_image_frame_message()`
  and `LLMContext.add_audio_frames_message()` are now async methods. This fixes
  an issue where the asyncio event loop would be blocked while encoding audio or
  images.

- `ConsumerProcessor` now queues frames from the producer internally instead of
  pushing them directly. This allows us to subclass consumer processors and
  manipulate frames before they are pushed.

- `BaseTextFilter` now only requires subclasses to implement the `filter()`
  method.

- Extracted the logic for retrying connections and created a new
  `send_with_retry` method inside `WebsocketService`.
|
||
|
||
- Refactored `DeepgramFluxSTTService` to automatically reconnect if sending a
|
||
message fails.
|
||
|
||
- Updated all STT and TTS services to use consistent error handling pattern with
|
||
`push_error()` method for better pipeline error event integration.
|
||
|
||
- Added support for `maybe_capture_participant_camera()` and
|
||
`maybe_capture_participant_screen()` for `SmallWebRTCTransport` in the runner
|
||
utils.
|
||
|
||
- Added Hindi support for Rime TTS services.
|
||
|
||
- Updated `GeminiTTSService` to use Google Cloud Text-to-Speech streaming API
|
||
instead of the deprecated Gemini API. Now uses `credentials` /
|
||
`credentials_path` for authentication. The `api_key` parameter is deprecated.
|
||
Also, added support for `prompt` parameter for style instructions and
|
||
expressive markup tags. Significantly improved latency with streaming
|
||
synthesis.
|
||
|
||
- Updated language mappings for the Google and Gemini TTS services to match
|
||
official documentation.
|
||
|
||
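The reason `create_image_message()` and related methods became async is that base64-encoding large images or audio buffers is CPU-bound work that would otherwise run directly on the event loop. A minimal sketch of the general technique, handing the encoding to a worker thread with `asyncio.to_thread()`; `encode_payload` is a hypothetical helper, not Pipecat's implementation:

```python
import asyncio
import base64


async def encode_payload(data: bytes) -> str:
    """Hypothetical helper: base64-encode media off the event loop thread."""
    # asyncio.to_thread() (Python 3.9+) runs the encoding in a worker thread
    # and the awaiting coroutine yields to the loop. How much truly runs in
    # parallel depends on whether the underlying C call releases the GIL.
    encoded = await asyncio.to_thread(base64.b64encode, data)
    return encoded.decode("ascii")


async def main() -> str:
    fake_image = b"\x89PNG\r\n\x1a\n" + bytes(1024)  # stand-in image bytes
    return await encode_payload(fake_image)


encoded = asyncio.run(main())
print(len(encoded))  # 1376
```

For callers, the practical consequence of the breaking change is that these methods must now be awaited, e.g. `message = await LLMContext.create_image_message(image=..., size=...)`.
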
### Deprecated

- The `api_key` parameter in `GeminiTTSService` is deprecated. Use
  `credentials` or `credentials_path` instead for Google Cloud authentication.

### Fixed

- Fixed a `SimliVideoService` connection issue.

- Fixed an issue in the `Runner` where, when using `SmallWebRTCTransport`, the
  `request_data` was not being passed to the `SmallWebRTCRunnerArguments` body.

- Fixed a subtle issue where assistant context messages ended up with double
  spaces between words or sentences.

- Fixed an issue where `NeuphonicTTSService` wasn't pushing `TTSTextFrame`s,
  meaning assistant messages weren't being written to context.

- Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying
  LLM completions and tools when using the universal `LLMContext`.

- Fixed an issue where `DeepgramFluxSTTService` failed to connect when passed a
  `keyterm` or `tag` containing a space.

- Prevented `HeyGenVideoService` from automatically disconnecting after 5
  minutes.

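The `keyterm`/`tag` fix for `DeepgramFluxSTTService` concerns values containing spaces, which must be percent-encoded before they are placed in a WebSocket query string. A minimal standard-library sketch of that encoding step; the endpoint below is a placeholder and the parameter layout is an assumption, not a description of how Pipecat actually builds the Flux URL:

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

params = {
    "keyterm": ["Pipecat Cloud", "Deepgram"],  # list values become repeated keys
    "tag": ["demo session"],
}
# doseq=True expands list values into repeated key=value pairs, and
# quote_via=quote percent-encodes spaces as %20 instead of "+".
query = urlencode(params, doseq=True, quote_via=quote)
url = f"wss://example.invalid/listen?{query}"  # placeholder endpoint
print(query)  # keyterm=Pipecat%20Cloud&keyterm=Deepgram&tag=demo%20session

# Round-trip check: decoding the query restores the original spaces.
print(parse_qs(urlsplit(url).query)["tag"])  # ['demo session']
```
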
## [0.0.94] - 2025-11-10

### Changed

- Added support for retrying `SpeechmaticsTTSService` requests when the service
  returns a 503 error. Default retry values are defined in `InputParams`.

### Deprecated

- The `KrispFilter` is deprecated and will be removed in a future version. Use
  the `KrispVivaFilter` instead.

### Removed

- `LivekitFrameSerializer` has been removed. Use `LiveKitTransport` instead.

### Fixed

- Fixed a bug related to `LLMAssistantAggregator` where spaces were sometimes
  missing from assistant messages in context.

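Both spacing fixes in this changelog (missing spaces here, double spaces in 0.0.95) come down to joining streamed text chunks that may or may not carry their own whitespace. A minimal sketch of one way to join such chunks, assuming each chunk is either a whole word without surrounding whitespace or a fragment that already includes it; `join_chunks` is hypothetical and is not how `LLMAssistantAggregator` is implemented:

```python
def join_chunks(chunks):
    """Hypothetical helper: join text chunks with exactly one space between words."""
    result = ""
    for chunk in chunks:
        if not chunk:
            continue
        # Add a separator only when neither side already supplies whitespace.
        if result and not result[-1].isspace() and not chunk[0].isspace():
            result += " "
        result += chunk
    # Collapse any whitespace runs (including newlines) into single spaces.
    return " ".join(result.split())


print(join_chunks(["Hello", "world."]))  # Hello world.
print(join_chunks(["Hello ", " world."]))  # Hello world.
```

The word-level assumption matters: sub-word fragments such as `["Hel", "lo"]` or standalone punctuation would get a spurious space, which is exactly the kind of case a real aggregator has to distinguish.
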
## [0.0.93] - 2025-11-07

### Added

- Added support for Sarvam Speech-to-Text service (`SarvamSTTService`) with
  streaming WebSocket support for `saarika` (STT) and `saaras` (STT-translate)
  models.

- Added support for passing in a `ToolsSchema` in lieu of a list of
  provider-specific dicts when initializing `OpenAIRealtimeLLMService` or when
  updating it using `LLMUpdateSettingsFrame`.

- Added `TransportParams.audio_out_silence_secs`, which specifies how many
  seconds of silence to output when an `EndFrame` reaches the output
  transport. This can help ensure that all audio data is fully delivered to
  clients.

- Added a new `FrameProcessor.broadcast_frame()` method. This pushes two
  instances of a given frame class, one upstream and the other downstream.

  ```python
  await self.broadcast_frame(UserSpeakingFrame)
  ```

- Added `MetricsLogObserver` for logging performance metrics from
  `MetricsFrame` instances. Supports filtering via the `include_metrics`
  parameter to control which metrics types are logged (TTFB, processing time,
  LLM token usage, TTS usage, smart turn metrics).

- Added `pronunciation_dictionary_locators` to `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService`.

- Added support for loading external observers. You can now register custom
  pipeline observers by setting the `PIPECAT_OBSERVER_FILES` environment
  variable. This variable should contain a colon-separated list of Python files
  (e.g. `export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."`). Each
  file must define a function with the following signature:

  ```python
  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
      ...
  ```

- Added support for new sonic-3 languages in `CartesiaTTSService` and
  `CartesiaHttpTTSService`.

- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why
  the pipeline is being ended.

- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to
  indicate why the pipeline is being canceled. This can also be specified when
  you cancel a task with `PipelineTask.cancel(reason="cancellation reason")`.

- Added `include_prob_metrics` parameter to Whisper STT services to enable
  access to probability metrics from transcription results.

- Added utility functions `extract_whisper_probability()`,
  `extract_openai_gpt4o_probability()`, and `extract_deepgram_probability()` to
  extract probability metrics from `TranscriptionFrame` objects for
  Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services
  respectively.

- Added `LLMSwitcher.register_direct_function()`. It works much like
  `LLMSwitcher.register_function()` in that it's a shorthand for registering
  a function on all LLMs in the switcher, except this new method takes a direct
  function (a `FunctionSchema`-less function).

- Added `MCPClient.get_tools_schema()` and `MCPClient.register_tools_schema()`
  as a two-step alternative to `MCPClient.register_tools()`, to allow users to
  pass MCP tools to, say, `GeminiLiveLLMService` (as well as other
  speech-to-speech services) in the constructor.

- Added support for passing in an `LLMSwitcher` to `MCPClient.register_tools()`
  (as well as the new `MCPClient.register_tools_schema()`).

- Added `cpu_count` parameter to `LocalSmartTurnAnalyzerV3`. This is set to `1`
  by default for more predictable performance on low-CPU systems.

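The `PIPECAT_OBSERVER_FILES` entry describes a plugin-style loader. A minimal standard-library sketch of how such a loader could work, splitting the colon-separated variable (so it assumes POSIX-style paths without colons) and importing each file with `importlib`; `load_observer_factories` is hypothetical, and Pipecat's actual loader may differ, for example in when and how it awaits `create_observers`:

```python
import asyncio
import importlib.util
import os
import tempfile
from pathlib import Path


def load_observer_factories(env_var="PIPECAT_OBSERVER_FILES"):
    """Hypothetical loader: return create_observers() from each listed file."""
    factories = []
    paths = [p for p in os.environ.get(env_var, "").split(":") if p]
    for index, path in enumerate(paths):
        # Give each module a unique name so files do not shadow each other.
        spec = importlib.util.spec_from_file_location(f"_observers_{index}", path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load observer file: {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        factory = getattr(module, "create_observers", None)
        if factory is None:
            raise AttributeError(f"{path} does not define create_observers()")
        factories.append(factory)
    return factories


# Usage: write a throwaway observer file, point the variable at it, load it.
with tempfile.TemporaryDirectory() as tmp:
    observer_file = Path(tmp) / "observer1.py"
    observer_file.write_text("async def create_observers(task):\n    return []\n")
    os.environ["PIPECAT_OBSERVER_FILES"] = str(observer_file)
    factories = load_observer_factories()

# The real hook receives a PipelineTask; None stands in for it here.
observers = asyncio.run(factories[0](None))
print(len(factories), observers)  # 1 []
```
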
### Changed

- Updated `simli-ai` to 0.1.25.

- `STTMuteFilter` no longer sends `STTMuteFrame` to the STT service. The filter
  now blocks frames locally without instructing the STT service to stop
  processing audio. This prevents inactivity-related errors (such as 409 errors
  from Google STT) while maintaining the same muting behavior at the application
  level. Important: `STTMuteFilter` should be placed _after_ the STT service
  itself.

- Improved `GoogleSTTService` error handling to properly catch gRPC `Aborted`
  exceptions (corresponding to 409 errors) caused by stream inactivity. These
  exceptions are now logged at DEBUG level instead of ERROR level, since they
  indicate expected behavior when no audio is sent for 10+ seconds (e.g., during
  long silences or when audio input is blocked). The service automatically
  reconnects when this occurs.

- Bumped the `fastapi` dependency's upper bound to `<0.122.0`.

- Updated the default model for `GoogleVertexLLMService` to `gemini-2.5-flash`.

- Updated the `GoogleVertexLLMService` to use the `GoogleLLMService` as a base
  class instead of the `OpenAILLMService`.

- Updated STT and TTS services to pass through unverified language codes with a
  warning instead of returning `None`. This allows developers to use newly
  supported languages before Pipecat's service classes are updated, while still
  providing guidance on verified languages.

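The language-code change replaces a hard `None` for unknown languages with a warning plus pass-through. A minimal sketch of that resolution logic, assuming a hypothetical per-service table of verified codes; the real services map Pipecat's `Language` enum, which is not modeled here:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical table of verified languages for one service.
VERIFIED_LANGUAGES = {"en": "en-US", "es": "es-ES", "hi": "hi-IN"}


def resolve_language(code: str) -> str:
    """Map a verified code, or pass an unverified one through with a warning."""
    if code in VERIFIED_LANGUAGES:
        return VERIFIED_LANGUAGES[code]
    logger.warning(
        "Language %r is not verified for this service; passing it through as-is.",
        code,
    )
    return code


print(resolve_language("es"))  # es-ES
print(resolve_language("sw-KE"))  # sw-KE (and a warning is logged)
```
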
### Removed

- Removed `needs_mcp_alternate_schema()` from `LLMService`. The mechanism that
  relied on it went away.

### Fixed

- Restored backward compatibility for vision/image features (broken in 0.0.92)
  when using non-universal context and assistant aggregators.

- Fixed `DeepgramSTTService._disconnect()` to properly await the
  `is_connected()` method call, which is an async coroutine in the Deepgram SDK.

- Fixed an issue where the `SmallWebRTCRequest` dataclass in the runner would
  scrub arbitrary request data from the client due to camelCase typing. This
  fixes data passthrough for JS clients where `APIRequest` is used.

- Fixed a bug in `GeminiLiveLLMService` where in some circumstances it wouldn't
  respond after a tool call.

- Fixed `GeminiLiveLLMService` session resumption after a connection timeout.

- `GeminiLiveLLMService` now properly supports context-provided system
  instruction and tools.

- Fixed `GoogleLLMService` token counting to avoid double-counting tokens when
  Gemini sends usage metadata across multiple streaming chunks.

## [0.0.92] - 2025-10-31 🎃 "The Haunted Edition" 👻

### Added

- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction
  in latency when compared to the `DeepgramTTSService`.

- Added support for the `speaking_rate` input parameter in
  `GoogleHttpTTSService`.

- Added `enable_speaker_diarization` and `enable_language_identification` to
  `SonioxSTTService`.

- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated
  examples 07a\* to use the new TTS service.

- Added support for including images or audio in LLM context messages using
  `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()`
  (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For
  example, when creating `LLMMessagesAppendFrame`:

  ```python
  message = LLMContext.create_image_message(image=..., size=...)
  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  ```

- Added new event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`,
  `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.

- Added `generation_config` parameter support to `CartesiaTTSService` and
  `CartesiaHttpTTSService` for Cartesia Sonic-3 models. Includes a new
  `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5),
  and `emotion` (60+ options) parameters for fine-grained speech generation
  control.

- Expanded support for universal `LLMContext` to `OpenAIRealtimeLLMService`.
  As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `OpenAIRealtimeLLMService` now supports the universal
  `LLMContext`, it is not meant to be swapped out for another LLM service at
  runtime with `LLMSwitcher`.)

  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream
  from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`,
  say, you'll want to adjust accordingly:

  ```python
  pipeline = Pipeline(
      [
          transport.input(),
          context_aggregator.user(),

          # BEFORE
          llm,
          transcript.user(),

          # AFTER
          transcript.user(),
          llm,

          transport.output(),
          transcript.assistant(),
          context_aggregator.assistant(),
      ]
  )
  ```

  Also worth noting: whether or not you use the new context-setup pattern with
  `OpenAIRealtimeLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: OpenAIContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: OpenAIRealtimeLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `RealtimeMessagesUpdateFrame` and
  `RealtimeFunctionCallResultFrame` have been deprecated, since they're no
  longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more
  like other LLM services in Pipecat, relying on updates to its context, pushed
  by context aggregators, to update its internal state. Listen for
  `LLMContextFrame`s for context updates.

  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService`
  when it's configured with `output_modalities=['audio']`. If you need
  to process its output, listen for `TTSTextFrame`s instead.

- Expanded support for universal `LLMContext` to `GeminiLiveLLMService`.
  As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `GeminiLiveLLMService` now supports the universal
  `LLMContext`, it is not meant to be swapped out for another LLM service at
  runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with
  `GeminiLiveLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: GeminiLiveContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: GeminiLiveLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService`
  when it's configured with `modalities=GeminiModalities.AUDIO`. If you need
  to process its output, listen for `TTSTextFrame`s instead.

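The Cartesia `generation_config` entry lists bounded ranges for `volume` (0.5-2.0) and `speed` (0.6-1.5). A minimal sketch of validating those documented ranges client-side before sending a request; `GenerationSettings` is a hypothetical stand-in, not Pipecat's `GenerationConfig` class, and `emotion` is left as an unvalidated string because the 60+ allowed values are not listed here:

```python
from dataclasses import asdict, dataclass
from typing import Optional


def _check_range(name: str, value: Optional[float], low: float, high: float) -> None:
    """Raise ValueError if a set value falls outside [low, high]."""
    if value is not None and not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the range {low}-{high}")


@dataclass
class GenerationSettings:
    """Hypothetical stand-in for Sonic-3 generation options."""

    volume: Optional[float] = None  # documented range 0.5-2.0
    speed: Optional[float] = None  # documented range 0.6-1.5
    emotion: Optional[str] = None  # passed through without validation

    def __post_init__(self) -> None:
        _check_range("volume", self.volume, 0.5, 2.0)
        _check_range("speed", self.speed, 0.6, 1.5)

    def to_payload(self) -> dict:
        """Drop unset fields so that only explicit choices are sent."""
        return {key: value for key, value in asdict(self).items() if value is not None}


print(GenerationSettings(volume=1.2, speed=0.9).to_payload())
# {'volume': 1.2, 'speed': 0.9}
```
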
### Changed

- The development runner's `/start` endpoint now supports passing
  `dailyRoomProperties` and `dailyMeetingTokenProperties` in the request body
  when `createDailyRoom` is true. Properties are validated against the
  `DailyRoomProperties` and `DailyMeetingTokenProperties` types respectively
  and passed to Daily's room and token creation APIs.

- `UserImageRawFrame` has new fields `append_to_context` and `text`. The
  `append_to_context` field indicates whether this image and text should be
  added to the LLM context (by the LLM assistant aggregator). The `text` field,
  if set, might also guide the LLM or the vision service on how to analyze the
  image.

- `UserImageRequestFrame` has new fields `append_to_context` and `text`. Both
  fields will be used to set the same fields on the captured
  `UserImageRawFrame`.

- `UserImageRequestFrame` no longer requires a function call name and ID.

- Updated `MoondreamService` to process `UserImageRawFrame`.

- `VisionService` expects `UserImageRawFrame` in order to analyze images.

- `DailyTransport` triggers the `on_error` event if transcription can't be
  started or stopped.

- `DailyTransport` updates: `start_dialout()` now returns two values:
  `session_id` and `error`. `start_recording()` now returns two values:
  `stream_id` and `error`.

- Updated `daily-python` to 0.21.0.

- `SimliVideoService` now accepts `api_key` and `face_id` parameters directly,
  with optional `params` for `max_session_length` and `max_idle_time`
  configuration, aligning with other Pipecat service patterns.

- Updated the default model to `sonic-3` for `CartesiaTTSService` and
  `CartesiaHttpTTSService`.

- `FunctionFilter` now has a `filter_system_frames` arg, which controls whether
  or not `SystemFrame`s are filtered.

- Upgraded `aws_sdk_bedrock_runtime` to v0.1.1 to resolve potential CPU issues
  when running `AWSNovaSonicLLMService`.

### Deprecated

- The `expect_stripped_words` parameter of `LLMAssistantAggregatorParams` is
  ignored when used with the newer `LLMAssistantAggregator`, which now handles
  word spacing automatically.

- `LLMService.request_image_frame()` is deprecated; push a
  `UserImageRequestFrame` instead.

- `UserResponseAggregator` is deprecated and will be removed in a future version.

- The `send_transcription_frames` argument to `OpenAIRealtimeLLMService` is
  deprecated. Transcription frames are now always sent. They go upstream, to be
  handled by the user context aggregator. See the "Added" section for details.

- Types in `pipecat.services.openai.realtime.context` and
  `pipecat.services.openai.realtime.frames` are deprecated, as they're no
  longer used by `OpenAIRealtimeLLMService`. See the "Added" section for
  details.

- The `SimliVideoService` `simli_config` parameter is deprecated. Use the
  `api_key` and `face_id` parameters instead.

### Removed

- Removed `enable_non_final_tokens` and `max_non_final_tokens_duration_ms` from
  `SonioxSTTService`.

- Removed the `aiohttp_session` arg from `SarvamTTSService` as it's no longer
  used.

### Fixed

- Fixed a `PipelineTask` issue that was causing an idle timeout for frames that
  were being generated but not reaching the end of the pipeline. Since the exact
  point when frames are discarded is unknown, we now monitor pipeline frames
  using an observer. If the observer detects frames are being generated, it will
  prevent the pipeline from being considered idle.

- Fixed an issue in `HumeTTSService` where it always used Octave 2, which does
  not support the `description` field. Now, if a description is provided, it
  switches to Octave 1.

- Fixed an issue where `DailyTransport` would time out prematurely on join and
  on leave.

- Fixed an issue in the runner where starting a `DailyTransport` room via
  `/start` didn't support using the `DAILY_SAMPLE_ROOM_URL` env var.

- Fixed an issue in `ServiceSwitcher` where all of its `STTService`s would
  produce `TranscriptionFrame`s.

### Other

- Updated all vision 12-series foundational examples to load images from a file.

- Added 14-series video examples for different services. These new examples
  request an image from the user's camera through a function call.

## [0.0.91] - 2025-10-21

### Added

- It is now possible to start a bot from the `/start` endpoint when using the
  runner with Daily's transport. This follows the Pipecat Cloud format with
  `createDailyRoom` and `body` fields in the POST request body.

- Added an ellipsis character (`…`) to the end-of-sentence detection in the
  string utils.

- Expanded support for universal `LLMContext` to `AWSNovaSonicLLMService`.
  As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `AWSNovaSonicLLMService` now supports the universal
  `LLMContext`, it is not meant to be swapped out for another LLM service at
  runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with
  `AWSNovaSonicLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: AWSNovaSonicContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: AWSNovaSonicLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

- Added support for the `bulbul:v3` model in `SarvamTTSService` and
  `SarvamHttpTTSService`.

- Added `keyterms_prompt` parameter to `AssemblyAIConnectionParams`.

- Added `speech_model` parameter to `AssemblyAIConnectionParams` to access the
  multilingual model.

- Added support for trickle ICE to the `SmallWebRTCTransport`.

- Added support for updating `OpenAITTSService` settings (`instructions` and
  `speed`) at runtime via `TTSUpdateSettingsFrame`.

- Added `--whatsapp` flag to the runner to better surface WhatsApp transport
  logs.

- Added `on_connected` and `on_disconnected` events to TTS and STT
  websocket-based services.

- Added an `aggregate_sentences` arg to `ElevenLabsHttpTTSService`, which
  defaults to `True`.

- Added a `room_properties` arg to the Daily runner's `configure()` method,
  allowing `DailyRoomProperties` to be provided.

- The runner `--folder` argument now supports downloading files from
  subdirectories.

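The ellipsis entry extends end-of-sentence detection with the single-character `…`. A minimal regex sketch of the idea, assuming a simplified rule where a sentence ends at `.`, `!`, `?` or `…`, optionally followed by closing quotes or brackets; `ends_sentence` is hypothetical, Pipecat's own string utilities may use a different approach, and this naive rule would also fire on abbreviations such as "Dr.":

```python
import re

# Sentence-final punctuation, including the single-character ellipsis
# (U+2026), optionally followed by closing quotes/brackets and whitespace.
_END_OF_SENTENCE = re.compile(r"[.!?\u2026][\"')\]]*\s*$")


def ends_sentence(text: str) -> bool:
    """Return True if text appears to end with sentence-final punctuation."""
    return _END_OF_SENTENCE.search(text) is not None


print(ends_sentence("Let me think…"))  # True
print(ends_sentence("Let me think"))  # False
print(ends_sentence('She said "yes."'))  # True
```
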
### Changed

- `RunnerArguments` now includes the `body` field, so there's no need to add it
  to subclasses. Also, all `RunnerArguments` fields are now keyword-only.

- `CartesiaSTTService` now inherits from `WebsocketSTTService`.

- Package upgrades:

  - `daily-python` upgraded to 0.20.0.
  - `openai` upgraded to support up to 2.x.x.
  - `openpipe` upgraded to support up to 5.x.x.

- `SpeechmaticsSTTService` updated dependencies for `speechmatics-rt>=0.5.0`.

### Deprecated

- The `send_transcription_frames` argument to `AWSNovaSonicLLMService` is
  deprecated. Transcription frames are now always sent. They go upstream, to be
  handled by the user context aggregator. See the "Added" section for details.

- Types in `pipecat.services.aws.nova_sonic.context` are deprecated, as they're
  no longer used by `AWSNovaSonicLLMService`. See the "Added" section for
  details.

### Fixed

- Fixed an issue where the `RTVIProcessor` was sending duplicate
  `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` messages.

- Fixed an issue in `AWSBedrockLLMService` where both `temperature` and `top_p`
  were always sent together, causing conflicts with models like Claude Sonnet 4.5
  that don't allow both parameters simultaneously. The service now only includes
  inference parameters that are explicitly set, and `InputParams` defaults have
  been changed to `None` to rely on AWS Bedrock's built-in model defaults.

- Fixed an issue in `RivaSegmentedSTTService` where a runtime error occurred due
  to a mismatch in the `_handle_transcription` method's signature.

- Fixed multiple pipeline task cancellation issues. `asyncio.CancelledError` is
  now handled properly in `PipelineTask`, making it possible to cleanly cancel
  an asyncio task that is executing a `PipelineRunner`. Also,
  `PipelineTask.cancel()` no longer blocks waiting for the `CancelFrame` to
  reach the end of the pipeline (going back to the behavior in < 0.0.83).

- Fixed an issue in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` where
  the Flash models would split words, resulting in a space being inserted
  between words.

- Fixed an issue where audio filters' `stop()` would not be called when using
  `CancelFrame`.

- Fixed an issue in `ElevenLabsHttpTTSService` where
  `apply_text_normalization` was incorrectly set as a query parameter. It's now
  being added as a request parameter.

- Fixed an issue where `RimeHttpTTSService` and `PiperTTSService` could generate
  audio frames that were not correctly 16-bit aligned, potentially leading to
  internal errors or static audio.

- Fixed an issue in `SpeechmaticsSTTService` where `AdditionalVocabEntry` items
  needed to have `sounds_like` for the session to start.

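The `RimeHttpTTSService`/`PiperTTSService` fix concerns 16-bit PCM, where each sample is two bytes, so an HTTP chunk with an odd byte count splits a sample in half. A minimal sketch of the usual remedy, carrying the leftover byte into the next chunk; `PcmAligner` is a hypothetical helper illustrating the technique, not Pipecat's code:

```python
class PcmAligner:
    """Hypothetical helper: re-chunk a byte stream into whole PCM samples."""

    def __init__(self, sample_width: int = 2):  # 2 bytes per sample = 16-bit
        self._sample_width = sample_width
        self._remainder = b""

    def feed(self, chunk: bytes) -> bytes:
        """Return the largest sample-aligned prefix; keep the rest for later."""
        data = self._remainder + chunk
        usable = len(data) - (len(data) % self._sample_width)
        self._remainder = data[usable:]
        return data[:usable]

    def flush(self) -> bytes:
        """Return (and clear) any leftover partial-sample bytes."""
        leftover, self._remainder = self._remainder, b""
        return leftover


aligner = PcmAligner()
frames = [aligner.feed(c) for c in (b"\x01\x02\x03", b"\x04\x05", b"\x06")]
print([len(f) for f in frames])  # [2, 2, 2]
```

A non-empty `flush()` at the end of a stream means the source delivered half a sample; dropping it is generally safer than emitting a misaligned frame.
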
### Other

- Added foundational example `47-sentry-metrics.py`, demonstrating how to use the
  `SentryMetrics` processor.

- Added foundational example `14x-function-calling-openpipe.py`.

## [0.0.90] - 2025-10-10

### Added

- Added audio filter `KrispVivaFilter` using the Krisp VIVA SDK.

- Added `--folder` argument to the runner, allowing files saved in that folder
  to be downloaded from `http://HOST:PORT/file/FILE`.

- Added `GeminiLiveVertexLLMService`, for accessing Gemini Live via Google
  Vertex AI.

- Added some new configuration options to `GeminiLiveLLMService`:

  - `thinking`
  - `enable_affective_dialog`
  - `proactivity`

  Note that these new configuration options require using a newer model than
  the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last
  two require specifying `http_options=HttpOptions(api_version="v1alpha")`.

- Added `on_pipeline_error` event to `PipelineTask`. This event is fired when
  an `ErrorFrame` is pushed (use `FrameProcessor.push_error()`).

  ```python
  @task.event_handler("on_pipeline_error")
  async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
      ...
  ```

- Added a `service_tier` `InputParam` to the `BaseOpenAILLMService`. This
  parameter can influence the latency of the response. For example,
  `"priority"` will result in faster completions in exchange for a higher
  price.

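The `@task.event_handler(...)` decorator in the `on_pipeline_error` example follows a common register-then-dispatch pattern. A minimal self-contained sketch of that pattern; `EventEmitter` and its `emit()` method are hypothetical and much simpler than Pipecat's own implementation, which (per the 0.0.86 notes in this changelog) runs non-sync handlers in separate tasks rather than awaiting them inline:

```python
import asyncio
from collections import defaultdict


class EventEmitter:
    """Hypothetical register-then-dispatch registry, not Pipecat's code."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def event_handler(self, event_name: str):
        """Decorator that registers a coroutine function for event_name."""

        def decorator(handler):
            self._handlers[event_name].append(handler)
            return handler

        return decorator

    async def emit(self, event_name: str, *args):
        """Await each handler in registration order (kept simple here)."""
        for handler in self._handlers[event_name]:
            await handler(self, *args)


task = EventEmitter()
received = []


@task.event_handler("on_pipeline_error")
async def on_pipeline_error(emitter, frame):
    received.append(frame)


asyncio.run(task.emit("on_pipeline_error", "stand-in ErrorFrame"))
print(received)  # ['stand-in ErrorFrame']
```
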
### Changed

- Updated `GeminiLiveLLMService` to use the `google-genai` library rather than
  using WebSockets directly.

### Deprecated

- `LivekitFrameSerializer` is now deprecated. Use `LiveKitTransport` instead.

- `pipecat.services.openai_realtime` is now deprecated; use
  `pipecat.services.openai.realtime` instead, or
  `pipecat.services.azure.realtime` for Azure Realtime.

- `pipecat.services.aws_nova_sonic` is now deprecated; use
  `pipecat.services.aws.nova_sonic` instead.

- `GeminiMultimodalLiveLLMService` is now deprecated; use
  `GeminiLiveLLMService`.

### Fixed

- Fixed a `GoogleVertexLLMService` issue that would generate an error if no
  token information was returned.

- `GeminiLiveLLMService` will now end gracefully (i.e. after the bot has
  finished) upon receiving an `EndFrame`.

- `GeminiLiveLLMService` will try to seamlessly reconnect when it loses its
  connection.

## [0.0.89] - 2025-10-07

### Fixed

- Reverted a change introduced in 0.0.88 that was causing pipelines to be frozen
  when using interruption strategies and processors that block interruption
  frames (e.g. `STTMuteFilter`).

## [0.0.88] - 2025-10-07

### Added

- Added support for Nano Banana models to `GoogleLLMService`. For example, you
  can now use the `gemini-2.5-flash-image` model to generate images.

- Added `HumeTTSService` for text-to-speech synthesis using Hume AI's expressive
  voice models. Provides high-quality, emotionally expressive speech synthesis
  with support for various voice models. Includes an example in
  `examples/foundational/07ad-interruptible-hume.py`. Use with:
  `uv pip install pipecat-ai[hume]`.

### Changed

- Updated default `GoogleLLMService` model to `gemini-2.5-flash`.

### Deprecated

- PlayHT is shutting down their API on December 31st, 2025. As a result,
  `PlayHTTTSService` and `PlayHTHttpTTSService` are deprecated and will be
  removed in a future version.

### Fixed

- Fixed an issue with `AWSNovaSonicLLMService` where the client wouldn't
  connect due to a breaking change in the AWS dependency chain.

- `PermissionError` is now caught if NLTK's `punkt_tab` can't be downloaded.

- Fixed an issue that would cause wrong user/assistant context ordering when
  using interruption strategies.

- Fixed RTVI incoming message handling, broken in 0.0.87.

## [0.0.87] - 2025-10-02

### Added

- Added `WebsocketSTTService` base class for websocket-based STT services.
  Combines STT functionality with websocket connectivity, providing automatic
  error handling and reconnection capabilities with exponential backoff.

- Added `DeepgramFluxSTTService` for real-time speech recognition using
  Deepgram's Flux WebSocket API. Flux understands conversational flow and
  automatically handles turn-taking.

- Added RTVI messages for user/bot audio levels and system logs.

- Included cached tokens from OpenAI-based LLM services in `MetricsFrame`.

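`WebsocketSTTService` is described as reconnecting with exponential backoff. A minimal sketch of such a reconnect loop, assuming a caller-supplied async `connect` callable; `connect_with_backoff`, `backoff_delays`, and the base delay, cap, and attempt count are illustrative assumptions, not Pipecat's actual names or values:

```python
import asyncio


def backoff_delays(base: float, cap: float, attempts: int) -> list:
    """Exponential delays base, 2*base, 4*base, ..., each capped at cap."""
    return [min(cap, base * 2**i) for i in range(attempts)]


async def connect_with_backoff(connect, base=0.5, cap=8.0, attempts=5):
    """Await connect() until it succeeds, sleeping between failed attempts."""
    last_error = None
    for attempt, delay in enumerate(backoff_delays(base, cap, attempts), start=1):
        try:
            return await connect()
        except OSError as error:  # ConnectionRefusedError is an OSError
            last_error = error
            if attempt < attempts:
                await asyncio.sleep(delay)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


print(backoff_delays(0.5, 8.0, 6))  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]

calls = 0


async def flaky_connect():
    """Stand-in connection that fails twice before succeeding."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionRefusedError("server not ready")
    return "connected"


result = asyncio.run(connect_with_backoff(flaky_connect, base=0.01))
print(result, calls)  # connected 3
```

Production implementations often add random jitter to each delay so that many clients do not reconnect in lockstep.
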
### Changed

- Updated the default model for `AnthropicLLMService` to
  `claude-sonnet-4-5-20250929`.

### Deprecated

- `DailyTransportMessageFrame` and `DailyTransportMessageUrgentFrame` are
  deprecated, use `DailyOutputTransportMessageFrame` and
  `DailyOutputTransportMessageUrgentFrame` respectively instead.

- `LiveKitTransportMessageFrame` and `LiveKitTransportMessageUrgentFrame` are
  deprecated, use `LiveKitOutputTransportMessageFrame` and
  `LiveKitOutputTransportMessageUrgentFrame` respectively instead.

- `TransportMessageFrame` and `TransportMessageUrgentFrame` are deprecated, use
  `OutputTransportMessageFrame` and `OutputTransportMessageUrgentFrame`
  respectively instead.

- `InputTransportMessageUrgentFrame` is deprecated, use
  `InputTransportMessageFrame` instead.

- `DailyUpdateRemoteParticipantsFrame` is deprecated and will be removed in a
  future version. Instead, create your own custom frame and handle it in the
  `@transport.output().event_handler("on_after_push_frame")` event handler or a
  custom processor.

### Fixed

- Fixed an issue in `AWSBedrockLLMService` where timeout exceptions weren't
  being detected.

- Fixed a `PipelineTask` issue that could prevent the application from exiting
  if `task.cancel()` was called when the task was already finished.

- Fixed an issue where local SmartTurn was not being run in a separate thread.

## [0.0.86] - 2025-09-24

### Added

- Added `HeyGenTransport`. This is an integration for HeyGen Interactive
  Avatar, a video service that handles audio streaming and requests HeyGen to
  generate avatar video responses (see https://www.heygen.com/). When used, the
  Pipecat bot joins the same virtual room as the HeyGen Avatar and the user.

- Added support for `region` and `edge` settings to `TwilioFrameSerializer`.

- Added support for using universal `LLMContext` with:

  - `LLMLogObserver`
  - `GatedLLMContextAggregator` (formerly `GatedOpenAILLMContextAggregator`)
  - `LangchainProcessor`
  - `Mem0MemoryService`

- Added `StrandsAgentProcessor`, which allows you to use the Strands Agents
  framework to build your voice agents.
  See https://strandsagents.com

- Added `ElevenLabsSTTService` for speech-to-text transcription.

- Added a peer connection monitor to the `SmallWebRTCConnection` that
  automatically disconnects if the connection fails to establish within
  the timeout (1 minute by default).

- Added memory cleanup improvements to reduce memory peaks.

- Added `on_before_process_frame`, `on_after_process_frame`,
  `on_before_push_frame` and `on_after_push_frame` events, which are called
  before and after a frame is processed or pushed. Because these events are
  synchronous, handlers should ideally perform lightweight tasks in order not
  to block the pipeline. See
  `examples/foundational/45-before-and-after-events.py`.

- Added `on_before_leave` synchronous event to `DailyTransport`.

- Added `on_before_disconnect` synchronous event to `LiveKitTransport`.

- It is now possible to register synchronous event handlers. By default, all
  event handlers are executed in a separate task. However, in some cases we want
  to guarantee order of execution, for example, executing something before
  disconnecting a transport.

  ```python
  self._register_event_handler("on_event_name", sync=True)
  ```

- Added support for global location in `GoogleVertexLLMService`. The service now
  supports both regional locations (e.g., "us-east4") and the "global" location
  for Vertex AI endpoints. When using "global" location, the service will use
  `aiplatform.googleapis.com` as the API host instead of the regional format.

- Added `on_pipeline_finished` event to `PipelineTask`. This event is fired
  when the pipeline is done running. This can be the result of a
  `StopFrame`, `CancelFrame` or `EndFrame`.
|
||
```python
|
||
@task.event_handler("on_pipeline_finished")
|
||
async def on_pipeline_finished(task: PipelineTask, frame: Frame):
|
||
...
|
||
```
|
||
|
||
- Added support for new RTVI `send-text` event, along with the ability to toggle
|
||
the audio response off (skip tts) while handling the new context.
|
||
|
||
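The ordering difference between synchronous and asynchronous event handlers can be sketched with plain `asyncio`. This is a minimal illustration under assumptions, not Pipecat's event handler implementation: the `EventEmitter` class, its `register()`, `emit()` and `drain()` methods, and the handler names are hypothetical. Synchronous handlers are awaited inline, in registration order, before `emit()` returns. Asynchronous handlers are scheduled as separate tasks and may finish later.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[], Awaitable[None]]


class EventEmitter:
    """Hypothetical emitter illustrating sync vs. async event handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[tuple[Handler, bool]]] = {}
        self._tasks: set[asyncio.Task] = set()

    def register(self, name: str, handler: Handler, sync: bool = False) -> None:
        self._handlers.setdefault(name, []).append((handler, sync))

    async def emit(self, name: str) -> None:
        for handler, sync in self._handlers.get(name, []):
            if sync:
                # Awaited inline: completes before emit() returns.
                await handler()
            else:
                # Scheduled in its own task: may still be pending after emit() returns.
                task = asyncio.create_task(handler())
                self._tasks.add(task)
                task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        # Wait for any outstanding asynchronous handlers.
        if self._tasks:
            await asyncio.gather(*self._tasks)


async def demo() -> list[str]:
    log: list[str] = []
    emitter = EventEmitter()

    async def flush_state() -> None:
        await asyncio.sleep(0.01)
        log.append("flushed")

    async def notify() -> None:
        await asyncio.sleep(0.01)
        log.append("notified")

    emitter.register("on_before_disconnect", flush_state, sync=True)
    emitter.register("on_before_disconnect", notify)
    await emitter.emit("on_before_disconnect")
    log.append("disconnected")  # runs only after the sync handler has finished
    await emitter.drain()
    return log


print(asyncio.run(demo()))  # ['flushed', 'disconnected', 'notified']
```

The synchronous handler is guaranteed to run before "disconnected" is logged, while the asynchronous one only runs once the caller yields to the event loop.
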
### Changed

- Updated `aiortc` to 1.13.0.

- Updated `sentry` to 2.38.0.

- `BaseOutputTransport` methods `write_audio_frame` and `write_video_frame` now
  return a boolean to indicate if the transport implementation was able to write
  the given frame or not.

- Updated Silero VAD model to v6.

- Updated `livekit` to 1.0.13.

- `torch` and `torchaudio` are no longer required for running Smart Turn
  locally. This avoids gigabytes of dependencies being installed.

- Updated `websockets` dependency to support version 15.0. Removed deprecated
  usage of `ConnectionClosed.code` and `ConnectionClosed.reason` attributes in
  `AWSTranscribeSTTService` for compatibility.

- Refactored `pyproject.toml` to reduce websockets dependency repetition using
  self-referencing extras. All websockets-dependent services now reference a
  shared `websockets-base` extra.

### Deprecated

- `GladiaSTTService`'s `confidence` arg is deprecated. `confidence` is no
  longer needed to determine which transcription or translation frames to
  emit.

- `PipelineTask` events `on_pipeline_stopped`, `on_pipeline_ended` and
  `on_pipeline_cancelled` are now deprecated. Use `on_pipeline_finished`
  instead.

- Support for the RTVI `append-to-context` event is deprecated in favor of the
  new `send-text` event, making way for future events like `send-image`.

### Fixed

- Fixed an issue where the pipeline could freeze if a task cancellation never
  completed because a third-party library swallowed `asyncio.CancelledError`. We
  now apply a timeout to task cancellations to prevent these freezes. If the
  timeout is reached, the system logs warnings and leaves dangling tasks behind,
  which can help diagnose where cancellation is being blocked.

- Fixed an `AudioBufferProcessor` issue that was causing user audio to be
  missing in stereo recordings, causing bot and user overlaps.

- Fixed a `BaseOutputTransport` issue that could produce large saved
  `AudioBufferProcessor` files when using an audio mixer.

- Fixed a `PipelineRunner` issue on Windows where setting up SIGINT and SIGTERM
  was raising an exception.

- Fixed an issue where multiple handlers for an event would not run in parallel.

- Fixed `DailyTransport.sip_call_transfer()` to automatically use the session
  ID from the `on_dialin_connected` event when not explicitly provided. It now
  supports cold transfers (from incoming dial-in calls) by automatically
  tracking session IDs from connection events.

- Fixed a memory leak in `SmallWebRTCTransport`. In `aiortc`, when you receive
  a `MediaStreamTrack` (audio or video), frames are produced asynchronously. If
  the code never consumes these frames, they are queued in memory, causing a
  memory leak.

- Fixed an issue in `AsyncAITTSService` where `TTSTextFrames` were not being
  pushed.

- Fixed an issue that would cause `push_interruption_task_frame_and_wait()` to
  not wait if a previous interruption had already happened.

- Fixed a couple of bugs in `ServiceSwitcher`:

  - Using multiple `ServiceSwitcher`s in a pipeline would result in an error.
  - `ServiceSwitcherFrame`s (such as `ManuallySwitchServiceFrame`s) were having
    an effect too early, essentially "jumping the queue" in terms of pipeline
    frame ordering.

- Fixed a self-cancellation deadlock in `UserIdleProcessor` when returning
  `False` from an idle callback. The task now terminates naturally instead of
  attempting to cancel itself.

- Fixed an issue in `AudioBufferProcessor` where a recording was not created
  when the bot speaks and user input is blocked.

- Fixed a `FastAPIWebsocketTransport` and `SmallWebRTCTransport` issue where
  `on_client_disconnected` would be triggered when the bot ends the
  conversation. That is, `on_client_disconnected` should only be triggered when
  the remote client actually disconnects.

- Fixed an issue in `HeyGenVideoService` where the `BotStartedSpeakingFrame`
  was blocked from moving through the Pipeline.

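The cancellation-timeout fix can be illustrated with a minimal `asyncio` sketch. This is not Pipecat's task manager code: `stubborn_worker()` and `cancel_with_timeout()` are hypothetical names. The worker simulates a third-party library that swallows `asyncio.CancelledError`. Because `asyncio.wait()` with a timeout does not cancel or block on unfinished tasks, the caller can give up, log a warning, and leave the dangling task behind instead of freezing.

```python
import asyncio


async def stubborn_worker(stop: asyncio.Event) -> None:
    # Simulates third-party code that swallows CancelledError.
    while not stop.is_set():
        try:
            await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            pass  # the cancellation request is ignored and the loop keeps running


async def cancel_with_timeout(task: asyncio.Task, timeout: float) -> bool:
    """Request cancellation and wait at most `timeout` seconds for the task to finish."""
    task.cancel()
    done, _ = await asyncio.wait({task}, timeout=timeout)
    if not done:
        print(f"warning: task {task.get_name()!r} did not finish cancelling within {timeout}s")
        return False
    return True


async def demo() -> tuple[bool, bool]:
    stop = asyncio.Event()
    task = asyncio.create_task(stubborn_worker(stop), name="stubborn")
    await asyncio.sleep(0)  # let the worker start
    cancelled_in_time = await cancel_with_timeout(task, timeout=0.1)
    # The caller was not blocked forever; stop the dangling task for this demo.
    stop.set()
    await task
    return cancelled_in_time, task.cancelled()


print(asyncio.run(demo()))  # (False, False)
```

Because the worker swallowed the cancellation, the task never ends up in the cancelled state. The timeout is what keeps the caller from hanging.
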
## [0.0.85] - 2025-09-12

### Added

- `AzureSTTService` now pushes interim transcriptions.

- Added `voice_cloning_key` to `GoogleTTSService` to support custom cloned
  voices.

- Added `speaking_rate` to `GoogleTTSService.InputParams` to control the
  speaking rate.

- Added a `speed` arg to `OpenAITTSService` to control the speed of the voice
  response.

- Added `FrameProcessor.push_interruption_task_frame_and_wait()`. Use this
  method to programmatically interrupt the bot from any part of the
  pipeline. This guarantees that all the processors in the pipeline are
  interrupted in order (from upstream to downstream). Internally, this works by
  first pushing an `InterruptionTaskFrame` upstream until it reaches the
  pipeline task. The pipeline task then generates an `InterruptionFrame`, which
  flows downstream through all processors. Once the `InterruptionFrame` has
  reached the processor waiting for the interruption, the function returns and
  execution continues after the call. Think of it as sending an upstream request
  for interruption and waiting until the acknowledgment flows back downstream.

- Added new base `TaskFrame` (which is a system frame). This is the base class
  for all task frames (`EndTaskFrame`, `CancelTaskFrame`, etc.) that are meant
  to be pushed upstream to reach the pipeline task.

- Expanded support for universal `LLMContext` to the AWS Bedrock LLM service.
  Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is
  a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime.

- Added new fields to the development runner's `parse_telephony_websocket`
  method in support of providing dynamic data to a bot.

  - Twilio: Added a new `body` parameter, which parses the websocket message
    for `customParameters`. Provide data via the `Parameter` nouns in your
    TwiML to use this feature.
  - Telnyx & Exotel: Both providers make the `to` and `from` phone numbers
    available in the websocket messages. You can now access these numbers as
    `call_data["to"]` and `call_data["from"]`.

  Note: Each telephony provider offers different features. Refer to the
  corresponding example in `pipecat-examples` to see how to pass custom data
  to your bot.

- Added `body` to the `WebsocketRunnerArguments` as an optional parameter.
  Custom `body` information can be passed from the server into the bot file via
  the `bot()` method using this new parameter.

- Added video streaming support to `LiveKitTransport`.

- Added `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService` which provide
  access to OpenAI Realtime.

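The request/acknowledgment flow behind `push_interruption_task_frame_and_wait()` can be modeled with plain `asyncio`. This is a simplified sketch, not Pipecat's implementation: `Processor`, `PipelineTaskModel`, `interrupt_and_wait()` and `request_interruption()` are hypothetical stand-ins for frame processors, the pipeline task, and the `InterruptionTaskFrame`/`InterruptionFrame` round trip. A processor asks the task to interrupt, then waits on an event that is set only when the downstream interruption reaches it.

```python
import asyncio
from typing import Optional


class Processor:
    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log
        self.task: Optional["PipelineTaskModel"] = None
        self._ack: Optional[asyncio.Event] = None

    async def handle_interruption(self) -> None:
        # Called as the interruption flows downstream through the pipeline.
        self.log.append(self.name)
        if self._ack is not None:
            self._ack.set()  # the interruption reached the waiting processor

    async def interrupt_and_wait(self) -> None:
        self._ack = asyncio.Event()
        self.task.request_interruption()  # "upstream" request to the task
        await self._ack.wait()            # wait for the "downstream" acknowledgment
        self._ack = None


class PipelineTaskModel:
    def __init__(self, processors: list) -> None:
        self.processors = processors
        self._pending: set = set()
        for p in processors:
            p.task = self

    def request_interruption(self) -> None:
        # The task answers the request by sending an interruption downstream.
        t = asyncio.create_task(self._broadcast())
        self._pending.add(t)
        t.add_done_callback(self._pending.discard)

    async def _broadcast(self) -> None:
        for p in self.processors:  # upstream-to-downstream order
            await p.handle_interruption()


async def demo() -> list:
    log: list = []
    stt, llm, tts = (Processor(n, log) for n in ("stt", "llm", "tts"))
    PipelineTaskModel([stt, llm, tts])
    await llm.interrupt_and_wait()
    log.append("llm resumed")
    return log


print(asyncio.run(demo()))  # ['stt', 'llm', 'tts', 'llm resumed']
```

In this toy model the handlers never suspend, so the whole broadcast finishes before the waiting processor resumes. In a real pipeline, downstream processors may still be handling the interruption concurrently.
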
### Changed

- `pipeline.tests.utils.run_test()` now allows passing `PipelineParams` instead
  of individual parameters.

### Removed

- Removed `VisionImageRawFrame` in favor of context frames (`LLMContextFrame` or
  `OpenAILLMContextFrame`).

### Deprecated

- `BotInterruptionFrame` is now deprecated, use `InterruptionTaskFrame` instead.

- `StartInterruptionFrame` is now deprecated, use `InterruptionFrame` instead.

- Deprecated `VisionImageFrameAggregator` because `VisionImageRawFrame` has been
  removed. See the `12*` examples for the new recommended replacement pattern.

- `NoisereduceFilter` is now deprecated and will be removed in a future
  version. Use other audio filters like `KrispFilter` or `AICFilter`.

- Deprecated `OpenAIRealtimeBetaLLMService` and `AzureRealtimeBetaLLMService`.
  Use `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService`, respectively.
  Each service will be removed in an upcoming version, 1.0.0.

### Fixed

- Fixed a `BaseOutputTransport` issue that caused incorrect detection of when
  the bot stopped talking while using an audio mixer.

- Fixed a `LiveKitTransport` issue where RTVI messages were not properly
  encoded.

- Added additional fixups to Mistral context messages to ensure they meet
  Mistral-specific requirements, avoiding Mistral "invalid request" errors.

- Fixed `DailyTransport` transcription handling to gracefully handle a missing
  `rawResponse` field in transcription messages, preventing `KeyError` crashes.

## [0.0.84] - 2025-09-05

### Added

- Added the ability to send DTMF to `LiveKitTransport`.

- Expanded support for universal `LLMContext` to the Anthropic LLM service.
  Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is
  a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime.

### Changed

- Updated `daily-python` to 0.19.9.

- Restored `DailyTransport`'s native DTMF support using Daily's `send_dtmf()`
  method instead of generated audio tones.

### Fixed

- Fixed an `AWSBedrockLLMService` crash caused by an extra `await`.

- Fixed an `OpenAIImageGenService` issue where it was not creating
  `URLImageRawFrame` correctly.

## [0.0.83] - 2025-09-03

### Added

- Added multilingual support for AsyncAI in `AsyncAITTSService` and `AsyncAIHttpTTSService`.

  - New `languages`: `es`, `fr`, `de`, `it`.

- Added new frames `InputTransportMessageUrgentFrame` and
  `DailyInputTransportMessageUrgentFrame` for transport messages received from
  external sources.

- Added `UserSpeakingFrame`. This will be sent upstream and downstream while VAD
  detects the user is speaking.

- Expanded support for universal `LLMContext` to more LLM services. Using the
  universal `LLMContext` and associated `LLMContextAggregatorPair` is a
  prerequisite for using `LLMSwitcher` to switch between LLMs at runtime.
  Here are the newly-supported services:

  - Azure
  - Cerebras
  - Deepseek
  - Fireworks AI
  - Google Vertex AI
  - Grok
  - Groq
  - Mistral
  - NVIDIA NIM
  - Ollama
  - OpenPipe
  - OpenRouter
  - Perplexity
  - Qwen
  - SambaNova
  - Together.ai

- Added support for WhatsApp User-initiated Calls.

- Added new audio filter `AICFilter`, which provides speech enhancement to
  improve VAD/STT performance, with no ONNX dependency.
  See https://ai-coustics.com/sdk/

- Added a timeout around cancel input tasks to prevent indefinite hangs when
  cancellation is swallowed by third-party code.

- Added `pipecat.extensions.ivr` for automated IVR system navigation with
  configurable goals and conversation handling. Supports DTMF input, verbal
  responses, and intelligent menu traversal.

  Basic usage:

  ```python
  from pipecat.extensions.ivr.ivr_navigator import IVRNavigator, IVRStatus

  # Create IVR navigator with your goal
  ivr_navigator = IVRNavigator(
      llm=llm_service,
      ivr_prompt="Navigate to billing department to dispute a charge",
  )

  # Handle different outcomes
  @ivr_navigator.event_handler("on_conversation_detected")
  async def on_conversation(processor, conversation_history):
      # Switch to normal conversation mode
      pass

  @ivr_navigator.event_handler("on_ivr_status_changed")
  async def on_ivr_status(processor, status):
      if status == IVRStatus.COMPLETED:
          # End pipeline, transfer call, or start bot conversation
          pass
      elif status == IVRStatus.STUCK:
          # Handle navigation failure
          pass
  ```

- `BaseOutputTransport` now implements `write_dtmf()` by loading DTMF audio and
  sending it through the transport. This makes sending DTMF generic across all
  output transports.

- Added new config parameters to `GladiaSTTService`.
  - PreProcessingConfig > `audio_enhancer` to enhance audio quality.
  - CustomVocabularyItem > `pronunciations` and `language` to specify special
    pronunciations and in which language it will be pronounced.

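Sending DTMF "as audio" means sending a dual-tone signal. Pipecat loads pre-existing DTMF audio. The sketch below substitutes a different technique and synthesizes the tone, purely to show what that audio contains. Each keypad key is the sum of two sine waves: one from the standard low-frequency row group (697, 770, 852, 941 Hz) and one from the high-frequency column group (1209, 1336, 1477, 1633 Hz). The function names, sample rate, duration and amplitude are illustrative assumptions. The output is 16-bit little-endian mono PCM.

```python
import math
import struct

ROW_FREQS = (697, 770, 852, 941)      # Hz, low group
COL_FREQS = (1209, 1336, 1477, 1633)  # Hz, high group
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair for a single keypad key."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_FREQS[row], COL_FREQS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_pcm16(key: str, sample_rate: int = 8000, duration_s: float = 0.1,
               amplitude: float = 0.5) -> bytes:
    """Synthesize a DTMF tone as 16-bit little-endian mono PCM."""
    low, high = dtmf_frequencies(key)
    n = round(sample_rate * duration_s)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Average the two sines so the peak stays within [-amplitude, amplitude].
        value = amplitude * (math.sin(2 * math.pi * low * t)
                             + math.sin(2 * math.pi * high * t)) / 2
        samples.append(int(value * 32767))
    return struct.pack(f"<{n}h", *samples)


audio = dtmf_pcm16("5")
print(dtmf_frequencies("5"), len(audio))  # (770, 1336) 1600
```

At 8 kHz, a 100 ms tone is 800 samples, or 1600 bytes of 16-bit PCM.
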
### Changed

- `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` are also pushed
  upstream.

- `ParallelPipeline` now waits for `CancelFrame` to finish in all branches
  before pushing it downstream.

- Added `sip_codecs` to the `DailyRoomSipParams`.

- Updated the `configure()` function in `pipecat.runner.daily` to include new
  args to create SIP-enabled rooms. Additionally, added new args to control the
  room and token expiration durations.

- `pipecat.frames.frames.KeypadEntry` is deprecated and has been moved to
  `pipecat.audio.dtmf.types.KeypadEntry`.

- Updated `RimeTTSService`'s `flush_audio` message to conform with Rime's
  official API.

- Updated the default model for `CerebrasLLMService` to GPT-OSS-120B.

### Removed

- Removed `StopInterruptionFrame`. This was a legacy frame that was not really
  used anywhere and it didn't provide any useful meaning. It was only
  pushed after `UserStoppedSpeakingFrame`, so developers can just use
  `UserStoppedSpeakingFrame`.

- `DailyTransport.write_dtmf()` has been removed in favor of the generic
  `BaseOutputTransport.write_dtmf()`.

- Removed deprecated `DailyTransport.send_dtmf()`.

### Deprecated

- Transports have been re-organized.

  ```
  pipecat.transports.network.small_webrtc -> pipecat.transports.smallwebrtc.transport
  pipecat.transports.network.webrtc_connection -> pipecat.transports.smallwebrtc.connection
  pipecat.transports.network.websocket_client -> pipecat.transports.websocket.client
  pipecat.transports.network.websocket_server -> pipecat.transports.websocket.server
  pipecat.transports.network.fastapi_websocket -> pipecat.transports.websocket.fastapi
  pipecat.transports.services.daily -> pipecat.transports.daily.transport
  pipecat.transports.services.helpers.daily_rest -> pipecat.transports.daily.utils
  pipecat.transports.services.livekit -> pipecat.transports.livekit.transport
  pipecat.transports.services.tavus -> pipecat.transports.tavus.transport
  ```

- `pipecat.frames.frames.KeypadEntry` is deprecated, use
  `pipecat.audio.dtmf.types.KeypadEntry` instead.

### Fixed

- Fixed an issue where messages received from the transport were always being resent.

- Fixed `SmallWebRTCTransport` to not use `mid` to decide if the transceiver should
  be `sendrecv` or not.

- Fixed an issue where Deepgram swallowed `asyncio.CancelledError` during
  disconnect, preventing tasks from being cancelled.

- Fixed an issue where `PipelineTask` was not cleaning up the observers.

### Performance

- Reduced latency and improved memory performance in `Mem0MemoryService`.

## [0.0.82] - 2025-08-28

### Added

- Added a new `LLMRunFrame` to trigger an LLM response:

  ```python
  await task.queue_frames([LLMRunFrame()])
  ```

  This replaces `OpenAILLMContextFrame`, which you’d previously typically use
  like this:

  ```python
  await task.queue_frames([context_aggregator.user().get_context_frame()])
  ```

  Use this way of kicking off your conversation when you’ve already initialized
  your context and are simply instructing the bot when to go:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)

  # ...

  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMRunFrame()])
  ```

  Note that if you want to add new messages when kicking off the conversation,
  you could use `LLMMessagesAppendFrame` with `run_llm=True` instead:

  ```python
  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMMessagesAppendFrame(new_messages, run_llm=True)])
  ```

  In the rare case you don’t have a context aggregator in your pipeline, then
  you may continue using a context frame.

- Added support for switching between audio+text and text-only modes within the
  same pipeline. This is done by pushing
  `LLMConfigureOutputFrame(skip_tts=True)` to enter text-only mode, and
  disabling it to return to audio+text. The LLM will still generate tokens and
  add them to the context, but they will not be sent to TTS.

- Added `skip_tts` field to `TextFrame`. This lets a text frame bypass TTS while
  still being included in the LLM context. Useful for cases like structured text
  that isn’t meant to be spoken but should still contribute to context.

- Added a `cancel_timeout_secs` argument to `PipelineTask` which defines how
  long the pipeline has to complete cancellation. When `PipelineTask.cancel()`
  is called, a `CancelFrame` is pushed through the pipeline and must reach the
  end. If it does not reach the end within the specified time, a warning is
  shown and the wait is aborted.

- Added a new "universal" (LLM-agnostic) `LLMContext` and accompanying
  `LLMContextAggregatorPair`, which will eventually replace `OpenAILLMContext`
  (and the other under-the-hood contexts) and the other context aggregators.
  The new universal `LLMContext` machinery allows a single context to be shared
  between different LLMs, enabling runtime LLM switching and scenarios like
  failover.

  From the developer's point of view, switching to using the new universal
  context machinery will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  To start, the universal `LLMContext` is supported with the following LLM
  services:

  - `OpenAILLMService`
  - `GoogleLLMService`

- Added a new `LLMSwitcher` class to enable runtime LLM switching, built atop a
  new generic `ServiceSwitcher`.

  Switchers take a switching strategy. The first available strategy is
  `ServiceSwitcherStrategyManual`.

  To switch LLMs at runtime, the LLMs must be sharing one instance of the new
  universal `LLMContext` (see above bullet).

  ```python
  # Instantiate your LLM services
  llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
  llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))

  # Instantiate a switcher
  # (ServiceSwitcherStrategyManual defaults to OpenAI, as it's first in the list)
  llm_switcher = LLMSwitcher(
      llms=[llm_openai, llm_google], strategy_type=ServiceSwitcherStrategyManual
  )

  # Create your pipeline
  pipeline = Pipeline(
      [
          transport.input(),
          stt,
          context_aggregator.user(),
          llm_switcher,
          tts,
          transport.output(),
          context_aggregator.assistant(),
      ]
  )
  task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

  # ...
  # Whenever appropriate, switch LLMs!
  await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])
  ```

- Added an `LLMService.run_inference()` method to LLM services to enable
  direct, out-of-band (i.e. out-of-pipeline) inference.

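A small, framework-free model shows why LLMs must share one `LLMContext` to be switched at runtime. `SharedContext`, `EchoLLM` and `ManualSwitcher` are hypothetical classes, not Pipecat's `LLMContext`, LLM services or `ServiceSwitcherStrategyManual`. In this model, the switcher forwards each request to the active service and defaults to the first one in the list. Because both services read from and append to the same context object, the second service sees the history produced by the first.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    messages: list = field(default_factory=list)


class EchoLLM:
    """Stand-in for an LLM service that appends its reply to the context."""

    def __init__(self, name: str) -> None:
        self.name = name

    def respond(self, context: SharedContext) -> str:
        reply = f"{self.name} saw {len(context.messages)} message(s)"
        context.messages.append({"role": "assistant", "content": reply})
        return reply


class ManualSwitcher:
    """Routes requests to one active service; defaults to the first in the list."""

    def __init__(self, services: list) -> None:
        if not services:
            raise ValueError("at least one service is required")
        self.services = list(services)
        self.active = self.services[0]

    def switch_to(self, service) -> None:
        if service not in self.services:
            raise ValueError("unknown service")
        self.active = service

    def respond(self, context: SharedContext) -> str:
        return self.active.respond(context)


context = SharedContext([{"role": "user", "content": "Hi!"}])
openai_llm, google_llm = EchoLLM("openai"), EchoLLM("google")
switcher = ManualSwitcher([openai_llm, google_llm])
print(switcher.respond(context))  # openai saw 1 message(s)
switcher.switch_to(google_llm)
print(switcher.respond(context))  # google saw 2 message(s)
```

If each service kept its own private message list instead, the switched-to service would start from an empty history.
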
### Changed

- Updated `daily-python` to 0.19.8.

- `PipelineTask` now waits for `StartFrame` to reach the end of the pipeline
  before pushing any other frames.

- Updated `CartesiaTTSService` and `CartesiaHttpTTSService` to align with
  Cartesia's changes for the `speed` parameter. It now takes only an enum of
  `slow`, `normal`, or `fast`.

- Added support to `AWSBedrockLLMService` for setting authentication
  credentials through environment variables.

- Updated `SarvamTTSService` to use WebSocket streaming for real-time audio
  generation with multiple Indian languages, with HTTP support still available
  via `SarvamHttpTTSService`.

### Fixed

- Fixed an RTVI issue that was causing frames to be pushed before the pipeline
  was properly initialized.

- Fixed some `get_messages_for_logging()` implementations that were returning
  a JSON string instead of a list.

- Fixed a `DailyTransport` issue that prevented DTMF tones from being sent.

- Fixed a missing import in `SentryMetrics`.

- Fixed `AWSPollyTTSService` to support the AWS credential provider chain (IAM
  roles, IRSA, instance profiles) instead of requiring explicit environment
  variables.

- Fixed a `CartesiaTTSService` issue that was causing the application to hang
  after Cartesia's 5-minute timeout.

- Fixed an issue preventing `SpeechmaticsSTTService` from transcribing audio.

## [0.0.81] - 2025-08-25

### Added

- Added `pipecat.extensions.voicemail`, a module for detecting voicemail vs.
  live conversation, primarily intended for use in outbound calling scenarios.
  The voicemail module is optimized for text LLMs only.

- Added new frames to the `idle_timeout_frames` arg: `TranscriptionFrame`,
  `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, and
  `UserStoppedSpeakingFrame`. These additions serve as indicators of user
  activity in the pipeline idle detection logic.

- Added support for passing custom pipeline sink and source processors to a
  `Pipeline`. Pipeline source and sink processors are used to know and control
  what's coming in and out of a `Pipeline` processor.

- Added `FrameProcessor.pause_processing_system_frames()` and
  `FrameProcessor.resume_processing_system_frames()`. These allow pausing and
  resuming the processing of system frames.

- Added new `on_process_frame()` observer method which makes it possible to know
  when a frame is being processed.

- Added new `FrameProcessor.entry_processor()` method. This allows you to access
  the first non-compound processor in a pipeline.

- Added `FrameProcessor` properties `processors`, `next` and `previous`.

- `ElevenLabsTTSService` now supports additional runtime changes to the `model`,
  `language`, and `voice_settings` parameters.

- Added `apply_text_normalization` support to `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService`.

- Added `MistralLLMService`, using Mistral's chat completion API.

- Added the ability to retry executing a chat completion after a timeout period
  for `OpenAILLMService` and its subclasses, `AnthropicLLMService`, and
  `AWSBedrockLLMService`. The LLM services accept new args:
  `retry_timeout_secs` and `retry_on_timeout`. This feature is disabled by
  default.

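The `retry_timeout_secs`/`retry_on_timeout` behavior can be approximated with `asyncio.wait_for()`. This is a hedged sketch, not the services' actual code: `complete_with_retry()` and `SlowThenFastAPI` are hypothetical. The policy of retrying exactly once, with the same timeout, is an assumption made for illustration.

```python
import asyncio


async def complete_with_retry(make_request, retry_timeout_secs: float, retry_on_timeout: bool):
    """Run a request; on timeout, optionally retry once (assumed policy)."""
    try:
        return await asyncio.wait_for(make_request(), timeout=retry_timeout_secs)
    except asyncio.TimeoutError:
        if not retry_on_timeout:
            raise
        # Assumption: a single retry with the same timeout.
        return await asyncio.wait_for(make_request(), timeout=retry_timeout_secs)


class SlowThenFastAPI:
    """Fake completion API: the first call hangs, later calls answer quickly."""

    def __init__(self) -> None:
        self.calls = 0

    async def complete(self) -> str:
        self.calls += 1
        await asyncio.sleep(1.0 if self.calls == 1 else 0.01)
        return f"response from attempt {self.calls}"


api = SlowThenFastAPI()
print(asyncio.run(complete_with_retry(api.complete, retry_timeout_secs=0.1, retry_on_timeout=True)))
# response from attempt 2
```

With `retry_on_timeout=False`, the first `asyncio.TimeoutError` is re-raised to the caller unchanged, which matches the feature being disabled by default.
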
### Changed

- Updated `daily-python` to 0.19.7.

### Deprecated

- `FrameProcessor.wait_for_task()` is deprecated. Use `await task` or
  `await asyncio.wait_for(task, timeout)` instead.

### Removed

- Watchdog timers have been removed. They were introduced in 0.0.72 to help
  diagnose pipeline freezes. Unfortunately, they proved ineffective since they
  required developers to use Pipecat-specific queues, iterators, and events to
  correctly reset the timer, which limited their usefulness and added friction.

- Removed unused `FrameProcessor.set_parent()` and
  `FrameProcessor.get_parent()`.

### Fixed

- Fixed an issue that would cause `PipelineRunner` and `PipelineTask` to not
  handle external asyncio task cancellation properly.

- Added `SpeechmaticsSTTService` exception handling on connection and sending.

- Replaced `asyncio.wait_for()` with `wait_for2.wait_for()` for Python <
  3.12 because of issues regarding task cancellation (i.e. cancellation is
  never propagated).
  See https://bugs.python.org/issue42130

- Fixed an `AudioBufferProcessor` issue that would cause audio overlap when
  setting a max buffer size.

- Fixed an issue where `AsyncAITTSService` had very high latency in responding
  by adding `force=true` when sending the flush command.

### Performance

- Improved `PipelineTask` performance by using direct mode processors and by
  removing unnecessary tasks.

- Improved `ParallelPipeline` performance by using direct mode, by not
  creating a task for each frame and every sub-pipeline and also by removing
  other unnecessary tasks.

- Improved `Pipeline` performance by using direct mode.

### Other

- Added `14w-function-calling-mistal.py` using `MistralLLMService`.

- Added `13j-azure-transcription.py` using `AzureSTTService`.

## [0.0.80] - 2025-08-13

### Added

- Added `GeminiTTSService` which uses Google Gemini to generate TTS output. The
  Gemini model can be prompted to insert styled speech to control the TTS
  output.

- Added Exotel support to Pipecat's development runner. You can now connect
  using the runner with `uv run bot.py -t exotel` and an ngrok connection to
  HTTP port 7860.

- Added `enable_direct_mode` argument to `FrameProcessor`. The direct mode is
  for processors which require very little I/O or compute resources, that is,
  processors that can perform their task almost immediately. These types of
  processors don't need any of the internal tasks and queues usually created by
  frame processors, which means overall application performance might be
  slightly increased. Use with care.

- Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.

- Added `endpoint_id` parameter to `AzureSTTService`. ([Custom EndpointId](https://docs.azure.cn/en-us/ai-services/speech-service/how-to-recognize-speech?pivots=programming-language-python#use-a-custom-endpoint))

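Two toy processors can illustrate the trade-off behind `enable_direct_mode`. Neither is Pipecat's `FrameProcessor`: `QueuedProcessor` and `DirectProcessor` are hypothetical, and frames are plain strings here. The queued variant hands each frame to an internal queue that its own worker task drains. The direct variant forwards the frame inline, with no queue and no task. Both deliver the same frames in the same order.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Downstream = Callable[[str], Awaitable[None]]


class QueuedProcessor:
    """Default-style processor: frames go through a queue and a worker task."""

    def __init__(self, downstream: Downstream) -> None:
        self.downstream = downstream
        self.queue: asyncio.Queue = asyncio.Queue()
        self.worker: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self.worker = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            frame = await self.queue.get()
            await self.downstream(frame)
            self.queue.task_done()

    async def process(self, frame: str) -> None:
        await self.queue.put(frame)

    async def stop(self) -> None:
        await self.queue.join()  # let queued frames drain first
        self.worker.cancel()
        try:
            await self.worker
        except asyncio.CancelledError:
            pass


class DirectProcessor:
    """Direct-mode-style processor: forwards frames immediately, no queue or task."""

    def __init__(self, downstream: Downstream) -> None:
        self.downstream = downstream

    async def process(self, frame: str) -> None:
        await self.downstream(frame)


async def demo() -> tuple[list, list]:
    queued_out: list = []
    direct_out: list = []

    async def collect_queued(frame: str) -> None:
        queued_out.append(frame)

    async def collect_direct(frame: str) -> None:
        direct_out.append(frame)

    queued = QueuedProcessor(collect_queued)
    direct = DirectProcessor(collect_direct)
    await queued.start()
    for frame in ("start", "audio", "end"):
        await queued.process(frame)
        await direct.process(frame)
    await queued.stop()
    return queued_out, direct_out


print(asyncio.run(demo()))  # (['start', 'audio', 'end'], ['start', 'audio', 'end'])
```

The direct variant skips a task switch per frame, which is why it suits pass-through processors such as pipeline sources and sinks. It also offers no buffering, so a slow downstream blocks the caller.
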
### Changed

- `WatchdogPriorityQueue` now requires the items to be inserted to always be
  tuples and the size of the tuple needs to be specified in the constructor when
  creating the queue with the `tuple_size` argument.

- Updated Moondream to revision `2025-01-09`.

- Updated `PlayHTHttpTTSService` to no longer use the `pyht` client to remove
  compatibility issues with other packages. Now you can use the PlayHT HTTP
  service with other services, like GoogleLLMService.

- Updated `pyproject.toml` to once again pin `numba` to `==0.61.2` in order to
  resolve package versioning issues.

- Updated the `STTMuteFilter` to include `VADUserStartedSpeakingFrame` and
  `VADUserStoppedSpeakingFrame` in the list of frames to filter when the
  filtering is on.

### Performance

- Improved the latency of the `HeyGenVideoService`.

- Improved the performance of some frame processors by using the new frame
  processor direct mode. In direct mode a frame processor will process frames
  right away, avoiding the need for internal queues and tasks. This is useful
  for some simple processors. For example, in processors that wrap other
  processors (e.g. `Pipeline`, `ParallelPipeline`), we add one processor before
  and one after the wrapped processors (internally, you will see them as sources
  and sinks). These sources and sinks don't do any special processing and they
  basically forward frames. So, for these simple processors we now enable the
  new direct mode which avoids creating any internal tasks (and queues) and
  therefore improves performance.

### Fixed

- Fixed an issue with the `BaseWhisperSTTService` where the language was
  specified as an enum and not a string.

- Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.

- Fixed an issue in `OpenAIRealtimeBetaLLMService` where specifying `text` in
  `modalities` didn't result in text being output from the model.

- Added SSML reserved character escaping to `AzureBaseTTSService` to properly
  handle special characters in text sent to Azure TTS. This fixes an issue
  where characters like `&`, `<`, `>`, `"`, and `'` in LLM-generated text would
  cause TTS failures.

- Fixed a `WatchdogPriorityQueue` issue that could cause an exception when
  comparing watchdog cancel sentinel items with other items in the queue.

- Fixed an issue that would cause system frames to not be processed with higher
  priority than other frames. This could cause slower interruption times.

- Fixed an issue where retrying a websocket connection error would result in an
  error.

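Python's standard `xml.sax.saxutils.escape()` can escape the five SSML/XML reserved characters. It handles `&`, `<` and `>` by default and accepts extra entity mappings for the two quote characters. This illustrates the technique only and is not `AzureBaseTTSService`'s actual code. `escape_ssml()`, `to_ssml()` and the bare `<speak>` wrapper are hypothetical, and a real Azure request carries more attributes and elements (for example version, language and voice).

```python
from xml.sax.saxutils import escape

# Quotes are not escaped by default, so map them explicitly.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_ssml(text: str) -> str:
    """Escape &, <, >, " and ' so LLM text can be embedded in SSML."""
    return escape(text, _QUOTE_ENTITIES)


def to_ssml(text: str) -> str:
    # Simplified wrapper; real requests need additional attributes and elements.
    return f"<speak>{escape_ssml(text)}</speak>"


print(to_ssml('Tom & Jerry say "5 < 6" isn\'t news'))
# <speak>Tom &amp; Jerry say &quot;5 &lt; 6&quot; isn&apos;t news</speak>
```

`escape()` replaces `&` before applying the other substitutions, so the ampersands inside the generated entities are not escaped a second time.
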
### Other

- Added foundational example `19b-openai-realtime-beta-text.py`, showing how to
  use `OpenAIRealtimeBetaLLMService` to output text to a TTS service.

- Added vision support to release evals so we can run the 12-series
  foundational examples.

- Added foundational example `15a-switch-languages.py` to release evals. It
  detects whether the language was switched properly.

- Updated foundational examples to show how to enclose complex logic
  (e.g. `ParallelPipeline`) into a single processor so the main pipeline becomes
  simpler.

- Added `07n-interruptible-gemini.py`, demonstrating how to use
  `GeminiTTSService`.

## [0.0.79] - 2025-08-07
|
||
|
||
### Changed
|
||
|
||
- Changed `pipecat-ai`'s `openai` dependency to `>=1.74.0,<=1.99.1` due to a
|
||
breaking change in `openai` 1.99.2 ([commit](https://github.com/openai/openai-python/commit/657f551dbe583ffb259d987dafae12c6211fba06))
|
||
|
||
### Deprecated
|
||
|
||
- `TTSService.say()` is deprecated, push a `TTSSpeakFrame` instead. Calling
|
||
functions directly is a discouraged pattern in Pipecat because, for example,
|
||
it might cause issues with frame ordering.
|
||
|
||
- `LLMMessagesFrame` is deprecated, in favor of either:
|
||
|
||
- `LLMMessagesUpdateFrame` with `run_llm=True`
|
||
- `OpenAILLMContextFrame` with desired messages in a new context
|
||
|
||
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are
|
||
deprecated, as they depended on the now-deprecated `LLMMessagesFrame`. Use
|
||
`LLMUserContextAggregator` and `LLMAssistantResponseAggregator` (or
|
||
LLM-specific subclasses thereof) instead.
|
||
|
||
## [0.0.78] - 2025-08-07

### Added

- Added `SonioxSTTService` using Soniox's STT websocket API.

- Added `enable_emulated_vad_interruptions` to `LLMUserAggregatorParams`.
  When user speech is emulated (e.g. when a transcription is received but
  VAD doesn't detect speech), this parameter controls whether the emulated
  speech can interrupt the bot. Default is `False` (emulated speech is ignored
  while the bot is speaking).

- Added new `handle_sigint` and `handle_sigterm` fields to `RunnerArguments`.
  This allows applications to know what settings they should use for the
  environment they are running on. Also added `pipeline_idle_timeout_secs` to
  control the `PipelineTask` idle timeout.

- Added a `processor` field to `ErrorFrame` to indicate the `FrameProcessor`
  that generated the error.

- Added new language support for `AWSTranscribeSTTService`. All languages
  supporting streaming data input are now supported:
  https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html

- Added support for Simli Trinity Avatars. A new `is_trinity_avatar` parameter
  has been introduced to specify whether the provided `faceId` corresponds to a
  Trinity avatar, which is required for optimal Trinity avatar performance.

- The development runner now handles custom `body` data for `DailyTransport`.
  The `body` data is passed to the Pipecat client. You can POST to the `/start`
  endpoint with a request body of:

  ```
  {
    "createDailyRoom": true,
    "dailyRoomProperties": { "start_video_off": true },
    "body": { "custom_data": "value" }
  }
  ```

  The `body` information is parsed and used in the application. The
  `dailyRoomProperties` are currently not handled.

- Added detailed latency logging to `UserBotLatencyLogObserver`, capturing
  average response time between user stop and bot start, as well as minimum and
  maximum response latency.

- Added Chinese, Japanese, and Korean word timestamp support to
  `CartesiaTTSService`.

- Added a `region` parameter to `GladiaSTTService`. Accepted values: `eu-west`
  (default), `us-west`.

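The development runner entry above describes a `/start` request body. To make it concrete, here is a hypothetical parser that separates the three top-level keys. `parse_start_request` and its defaults are assumptions for illustration, and the runner's real handling may differ.

```python
import json
from typing import Any, Dict, Tuple


def parse_start_request(raw_body: str) -> Tuple[bool, Dict[str, Any]]:
    """Split a /start payload into (create_room, custom body). Hypothetical."""
    payload: Dict[str, Any] = json.loads(raw_body) if raw_body else {}
    create_room = bool(payload.get("createDailyRoom", False))
    # dailyRoomProperties is intentionally ignored, mirroring the entry's note
    # that room properties are not handled yet.
    custom_body = payload.get("body") or {}
    return create_room, custom_body


request = """
{
  "createDailyRoom": true,
  "dailyRoomProperties": { "start_video_off": true },
  "body": { "custom_data": "value" }
}
"""
print(parse_start_request(request))  # (True, {'custom_data': 'value'})
```

An empty request body falls back to not creating a room and to an empty custom body. This is one reasonable default, not a documented one.
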
### Changed

- System frames are now queued. Before, system frames could be generated from
  any task and their order was not guaranteed, which was causing undesired
  behavior. Also, it was possible to get into some rare recursion issues because
  of the way system frames were executed (they were executed in place, meaning
  calling `push_frame()` would finish only after the system frame traversed the
  whole pipeline). This makes system frames more deterministic.

- Changed the default model for both `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService` to `eleven_turbo_v2_5`. The rationale for this
  change is that the Turbo v2.5 model exhibits the most stable voice quality
  along with very low latency TTFB; latencies are on par with the Flash v2.5
  model. Also, the Turbo v2.5 model outputs word/timestamp alignment data with
  correct spacing.

- The development runner's `/connect` and `/start` endpoints now both return
  `dailyRoom` and `dailyToken` in place of the previous `room_url` and `token`.

- Updated the `pipecat.runner.daily` utility to only take the `DAILY_API_URL`
  and `DAILY_SAMPLE_ROOM_URL` environment variables instead of argparsing `-u`
  and `-k`, respectively.

- Updated `daily-python` to 0.19.6.

- Changed `TavusVideoService` to send audio or video frames only after the
  transport is ready, preventing warning messages at startup.

- The development runner now strips any provided protocol (e.g. `https://`) from
  the proxy address and issues a warning. It also strips any trailing `/`.

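The last entry in the `Changed` list above describes how the runner cleans up the proxy address. Here is a sketch of that clean-up as a standalone helper. `normalize_proxy` is a hypothetical name, and the regular expression used to detect the scheme is an assumption, not the runner's actual code.

```python
import logging
import re

logger = logging.getLogger("runner-sketch")

# Matches a URL scheme such as "https://" or "wss://" at the start of the string.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize_proxy(proxy: str) -> str:
    """Strip a leading scheme (warning if one was present) and trailing slashes."""
    host = _SCHEME.sub("", proxy)
    if host != proxy:
        logger.warning("Removed protocol from proxy address: %s", proxy)
    return host.rstrip("/")


print(normalize_proxy("https://my-bot.ngrok.io/"))  # my-bot.ngrok.io
```
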
### Deprecated

- In `pipecat.runner.daily`, the `configure_with_args()` function is
  deprecated. Use the `configure()` function instead.

- The development runner's `/connect` endpoint is deprecated and will be
  removed in a future version. Use the `/start` endpoint in its place. In the
  meantime, both endpoints work and deliver equivalent functionality.

### Fixed

- Fixed a `DailyTransport` issue that would result in an unhandled
  `concurrent.futures.CancelledError` when a future is cancelled.

- Fixed a `RivaSTTService` issue that would result in an unhandled
  `concurrent.futures.CancelledError` when a future is cancelled while reading
  audio chunks from the incoming audio stream.

- Fixed an issue in the `BaseOutputTransport`, mainly reproducible with
  `FastAPIWebsocketOutputTransport` when the audio mixer was enabled, where the
  loop could consume 100% CPU by continuously returning without delay, preventing
  other asyncio tasks (such as cancellation or shutdown signals) from being
  processed.

- Fixed an issue where `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame`
  were not emitted when using `TavusVideoService` or `HeyGenVideoService`.

- Fixed an issue in `LiveKitTransport` where empty `AudioRawFrame`s were pushed
  down the pipeline. This resulted in warnings from the STT processor.

- Fixed `PiperTTSService` to send text as a JSON object in the request body,
  resolving compatibility with Piper's HTTP API.

- Fixed an issue with the `TavusVideoService` where an error was thrown due to
  missing transcription callbacks.

- Fixed an issue in `SpeechmaticsSTTService` where the `user_id` was set to
  `None` when diarization is not enabled.

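The CPU issue fixed in `BaseOutputTransport` above is a general `asyncio` pitfall. A loop that never awaits anything that actually suspends monopolizes the event loop. The following is a generic, hypothetical illustration of an output loop that yields while idle. `output_loop` is not Pipecat code.

```python
import asyncio
from collections import deque
from typing import Deque, List


async def output_loop(chunks: Deque[bytes], sent: List[bytes], stop: asyncio.Event) -> None:
    """Hypothetical audio output loop that never starves the event loop."""
    while not stop.is_set():
        if not chunks:
            # Nothing to send: sleep briefly instead of looping straight back.
            # Without a suspending await here, the loop would spin at 100% CPU
            # and no other task (e.g. the one setting `stop`) could ever run.
            await asyncio.sleep(0.01)
            continue
        sent.append(chunks.popleft())
        await asyncio.sleep(0)  # also yield between chunks


async def main() -> List[bytes]:
    chunks: Deque[bytes] = deque([b"a", b"b"])
    sent: List[bytes] = []
    stop = asyncio.Event()
    loop_task = asyncio.create_task(output_loop(chunks, sent, stop))
    await asyncio.sleep(0.05)  # this only gets to run because output_loop yields
    stop.set()
    await loop_task
    return sent


print(asyncio.run(main()))  # [b'a', b'b']
```
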
### Performance

- Fixed an issue in `TaskObserver` (a proxy to all observers) that was degrading
  global performance.

### Other

- Added `07aa-interruptible-soniox.py`, `07ab-interruptible-inworld-http.py`,
  `07ac-interruptible-asyncai.py` and `07ac-interruptible-asyncai-http.py`
  release evals.

## [0.0.77] - 2025-07-31

### Added

- Added `InputTextRawFrame` frame type to handle user text input with Gemini
  Multimodal Live.

- Added `HeyGenVideoService`, an integration for HeyGen Interactive Avatar. It
  is a video service that handles audio streaming and requests HeyGen to
  generate avatar video responses (see https://www.heygen.com/).

- Added voice switching support to `RimeTTSService`.

- Added a unified development runner for building voice AI bots across multiple
  transports:

  - `pipecat.runner.run` – FastAPI-based development server with automatic bot
    discovery
  - `pipecat.runner.types` – Runner session argument types
    (`DailyRunnerArguments`, `SmallWebRTCRunnerArguments`,
    `WebSocketRunnerArguments`)
  - `pipecat.runner.utils.create_transport()` – Factory function for creating
    transports from session arguments
  - `pipecat.runner.daily` and `pipecat.runner.livekit` – Configuration
    utilities for Daily and LiveKit setups
  - Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
  - Automatic telephony provider detection and serializer configuration
  - ESP32 WebRTC compatibility with SDP munging
  - Environment detection (`ENV=local`) for conditional features

- Added Async.ai TTS integration (https://async.ai/):

  - `AsyncAITTSService` – WebSocket-based streaming TTS with interruption
    support
  - `AsyncAIHttpTTSService` – HTTP-based streaming TTS service
  - Example scripts:
    - `examples/foundational/07ac-interruptible-asyncai.py` (WebSocket demo)
    - `examples/foundational/07ac-interruptible-asyncai-http.py` (HTTP demo)

- Added `transcription_bucket` params support to the `DailyRESTHelper`.

- Added a new TTS service, `InworldTTSService`. This service provides
  low-latency, high-quality speech generation using Inworld's streaming API.

- Added a new field `handle_sigterm` to `PipelineRunner`. It defaults to
  `False`. This field handles SIGTERM signals. The `handle_sigint` field still
  defaults to `True`, but now it handles only SIGINT signals.

- Added foundational example `14u-function-calling-ollama.py` for Ollama
  function calling.

- Added `LocalSmartTurnAnalyzerV2`, which supports local on-device inference
  with the new `smart-turn-v2` turn detection model.

- Added `set_log_level` to `DailyTransport`, allowing you to set the logging
  level for Daily's internal logging system.

- Added `on_transcription_stopped` and `on_transcription_error` to Daily
  callbacks.

### Changed

- Changed the default `url` for `NeuphonicTTSService` to
  `wss://api.neuphonic.com` as it provides better global performance. You can
  set the URL to other URLs, such as the previous default:
  `wss://eu-west-1.api.neuphonic.com`.

- Updated `daily-python` to 0.19.5.

- `STTMuteFilter` now pushes the `STTMuteFrame` upstream and downstream, to
  allow for more flexible `STTMuteFilter` placement.

- `ElevenLabsTTSService` now plays delayed messages if they still belong to the
  current context.

- Dependency compatibility improvements: Relaxed version constraints for core
  dependencies to support broader version ranges while maintaining stability:

  - `aiohttp`, `Markdown`, `nltk`, `numpy`, `Pillow`, `pydantic`, `openai`,
    `numba`: Now support up to the next major version (e.g. `numpy>=1.26.4,<3`)
  - `pyht`: Relaxed to `>=0.1.6` to resolve `grpcio` conflicts with
    `nvidia-riva-client`
  - `fastapi`: Updated to support versions `>=0.115.6,<0.117.0`
  - `torch`/`torchaudio`: Changed from exact pinning (`==2.5.0`) to compatible
    range (`~=2.5.0`)
  - `aws_sdk_bedrock_runtime`: Added Python 3.12+ constraint via environment
    marker
  - `numba`: Reduced minimum version to `0.60.0` for better compatibility

- Changed `NeuphonicHttpTTSService` to use a POST-based request instead of the
  `pyneuphonic` package. This removes a package requirement, allowing Neuphonic
  to work with more services.

- Updated `ElevenLabsTTSService` to handle the case where
  `allow_interruptions=False`. Now, when interruptions are disabled, the same
  context ID will be used throughout the conversation.

- Updated the `deepgram` optional dependency to 4.7.0, which downgrades the
  `tasks cancelled error` to a debug log. This keeps the log from appearing
  in Pipecat logs when leaving.

- Upgraded the `websockets` implementation to the new asyncio implementation.
  Along with this change, the supported versions are now `>=13.1.0` and
  `<15.0.0`. All services have been updated to use the asyncio implementation.

- Updated `MiniMaxHttpTTSService` with a `base_url` arg where you can specify
  the Global endpoint (default) or Mainland China.

- Replaced regex-based sentence detection in `match_endofsentence` with NLTK's
  `punkt_tab` tokenizer for more reliable sentence boundary detection.

- Changed the `livekit` optional dependency for `tenacity` to
  `tenacity>=8.2.3,<10.0.0` in order to support the `google-genai` package.

- For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's
  recommended model.

- Updated `SpeechmaticsSTTService`:
  - Added support for additional diarization options.
  - Added foundational example `07a-interruptible-speechmatics-vad.py`, which
    uses VAD detection provided by `SpeechmaticsSTTService`.

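The `allow_interruptions=False` behavior described above for `ElevenLabsTTSService` can be sketched as a small context-ID policy. `ContextIdPolicy` is hypothetical and does not reflect the service's internals. It reuses one ID when interruptions are disabled and mints a fresh ID for each turn otherwise.

```python
import uuid
from typing import Optional


class ContextIdPolicy:
    """Hypothetical per-turn context IDs for a streaming TTS service."""

    def __init__(self, allow_interruptions: bool) -> None:
        self.allow_interruptions = allow_interruptions
        self._current: Optional[str] = None

    def context_for_new_turn(self) -> str:
        if self.allow_interruptions or self._current is None:
            # Interruptible (or first turn): mint a fresh context ID, so that
            # late audio can be matched against the turn it belongs to.
            self._current = str(uuid.uuid4())
        # Not interruptible: every later turn reuses the first context ID.
        return self._current


fixed = ContextIdPolicy(allow_interruptions=False)
print(fixed.context_for_new_turn() == fixed.context_for_new_turn())  # True

fresh = ContextIdPolicy(allow_interruptions=True)
print(fresh.context_for_new_turn() == fresh.context_for_new_turn())  # False
```
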
### Fixed

- Fixed an `LLMUserResponseAggregator` issue where interruptions were not being
  handled properly.

- Fixed `PiperTTSService` to work with newer Piper GPL.

- Fixed a race condition in `FastAPIWebsocketClient` that occurred when
  attempting to send a message while the client was disconnecting.

- Fixed an issue in `GoogleLLMService` where interruptions did not work when an
  interruption strategy was used.

- Fixed an issue in the `TranscriptProcessor` where newline characters could
  cause the transcript output to be corrupted (e.g. missing all spaces).

- Fixed an issue in `AudioBufferProcessor` when using `SmallWebRTCTransport`
  where, if the microphone was muted, track timing was not respected.

- Fixed an error that occurs when pushing an `LLMMessagesFrame`. Only some LLM
  services, like Grok, are impacted by this issue. The fix is to remove the
  optional `name` property that was being added to the message.

- Fixed an issue in `AudioBufferProcessor` that caused garbled audio when
  `enable_turn_audio` was enabled and audio resampling was required.

- Fixed a dependency issue for uv users where an `llvmlite` version required
  Python 3.9.

- Fixed an issue in `MiniMaxHttpTTSService` where the `pitch` param had the
  incorrect type.

- Fixed an issue with OpenTelemetry tracing where the `enable_tracing` flag did
  not disable the internal tracing decorator functions.

- Fixed an issue in `OLLamaLLMService` where kwargs were not passed correctly
  to the parent class.

- Fixed an issue in `ElevenLabsTTSService` where word boundaries for the
  word/timestamp pairs were calculated incorrectly.

- Fixed an issue where, in some edge cases, the
  `EmulateUserStartedSpeakingFrame` could be created even if we didn't have a
  transcription.

- Fixed an issue in `GoogleLLMContext` where it would inject the
  `system_message` as a "user" message in cases where it was not meant to;
  it was only meant to do that when there were no "regular" (non-function-call)
  messages in the context, to ensure that inference would run properly.

- Fixed an issue in `LiveKitTransport` where the `on_audio_track_subscribed`
  event was never emitted.

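Word boundaries for word/timestamp pairs are typically derived from character-level alignment. The sketch below assumes the TTS service returns one start time per character, and it groups characters into words on whitespace. `words_with_start_times` is a hypothetical helper, not the `ElevenLabsTTSService` fix itself.

```python
from typing import List, Sequence, Tuple


def words_with_start_times(
    chars: Sequence[str], start_times: Sequence[float]
) -> List[Tuple[str, float]]:
    """Group character-level alignment into (word, start_time) pairs."""
    words: List[Tuple[str, float]] = []
    current = ""
    current_start = 0.0
    for char, start in zip(chars, start_times):
        if char.isspace():
            # Whitespace closes the current word, if there is one.
            if current:
                words.append((current, current_start))
                current = ""
            continue
        if not current:
            current_start = start  # the first character marks the word's start
        current += char
    if current:
        words.append((current, current_start))
    return words


text = "Hi  there"
times = [0.0, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7]
print(words_with_start_times(list(text), times))  # [('Hi', 0.0), ('there', 0.3)]
```

Repeated whitespace never produces empty words, which is one common source of off-by-one word boundaries.
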
### Other

- Added new quickstart demos:

  - `examples/quickstart`: voice AI bot quickstart
  - `examples/client-server-web`: client/server starter example
  - `examples/phone-bot-twilio`: Twilio starter example

- Removed most of the examples from the pipecat repo. Examples can now be
  found at https://github.com/pipecat-ai/pipecat-examples.

## [0.0.76] - 2025-07-11

### Added

- Added `SpeechControlParamsFrame`, a new `SystemFrame` that notifies
  downstream processors of the VAD and Turn analyzer params. This frame is
  pushed by the `BaseInputTransport` at start and any time a
  `VADParamsUpdateFrame` is received.

### Changed

- Two package dependencies have been updated:
  - `numpy` now supports 1.26.0 and newer
  - `transformers` now supports 4.48.0 and newer

### Fixed

- Fixed an issue with RTVI's handling of `append-to-context`.

- Fixed an issue where using audio input with a sample rate requiring resampling
  could result in empty audio being passed to STT services, causing errors.

- Fixed the VAD analyzer to process the full audio buffer as long as it contains
  more than the minimum required bytes per iteration, instead of only analyzing
  the first chunk.

- Fixed an issue in `ParallelPipeline` that caused errors when attempting to
  drain the queues.

- Fixed an issue with emulated VAD timeout inconsistency in
  `LLMUserContextAggregator`. Previously, emulated VAD scenarios (where
  transcription is received without VAD detection) used a hardcoded
  `aggregation_timeout` (default 0.5s) instead of matching the VAD's
  `stop_secs` parameter (default 0.8s). This created different user experiences
  between real VAD and emulated VAD scenarios. Now, emulated VAD timeouts
  automatically synchronize with the VAD's `stop_secs` parameter.

- Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the
  user started speaking early, while the bot was still working through
  `trigger_assistant_response()`.

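The VAD analyzer fix above amounts to consuming every complete window in the buffered audio, not only the first one. Here is a hypothetical sketch. It assumes 16-bit mono audio at 16 kHz and a 512-sample (1024-byte) analysis window. These values are illustrative, not the analyzer's confirmed settings.

```python
from typing import List, Tuple


def split_vad_chunks(buffer: bytes, chunk_size: int) -> Tuple[List[bytes], bytes]:
    """Return every complete chunk in `buffer`, plus the leftover bytes."""
    chunks: List[bytes] = []
    offset = 0
    # Keep consuming while a full chunk is available, instead of stopping
    # after the first one.
    while len(buffer) - offset >= chunk_size:
        chunks.append(buffer[offset:offset + chunk_size])
        offset += chunk_size
    return chunks, buffer[offset:]


# 16 kHz, 16-bit mono: 512 samples * 2 bytes = 1024 bytes per window.
chunks, rest = split_vad_chunks(bytes(2500), 1024)
print(len(chunks), len(rest))  # 2 452
```

The leftover bytes would be kept and prepended to the next incoming audio, so no samples are dropped between iterations.
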
## [0.0.75] - 2025-07-08 [YANKED]

**This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting `ParallelPipelines` functionality.**

**Please upgrade to version 0.0.76 or later.**

### Added

- Added an `aggregate_sentences` arg in `CartesiaTTSService`,
  `ElevenLabsTTSService`, `NeuphonicTTSService` and `RimeTTSService`, where the
  default value is `True`. When `aggregate_sentences` is `True`, the
  `TTSService` aggregates the LLM streamed tokens into sentences by default.
  Note: setting the value to `False` requires a custom processor before the
  `TTSService` to aggregate LLM tokens.

- Added `kwargs` to the `OLLamaLLMService` to allow for configuration args to
  be passed to Ollama.

- Added call hang-up error handling in `TwilioFrameSerializer`, which handles
  the case where the user has hung up before the `TwilioFrameSerializer` hangs
  up the call.

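With `aggregate_sentences=False`, something before the `TTSService` has to buffer the tokens. Here is a minimal sketch of that buffering. It assumes a naive end-of-sentence check on `.`, `!` and `?`, so abbreviations such as "Dr." would split early. It is not Pipecat's sentence matcher, and wrapping it in a custom frame processor is left out.

```python
from typing import Iterable, Iterator

SENTENCE_ENDINGS = (".", "!", "?")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield one sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Naive boundary check: punctuation at the end of the buffer.
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


stream = ["Hello", " world.", " How", " are", " you?", " Bye"]
print(list(aggregate_sentences(stream)))  # ['Hello world.', 'How are you?', 'Bye']
```
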
### Changed

- Updated `RTVIObserver` and `RTVIProcessor` to match the new RTVI 1.0.0
  protocol. This includes:

  - Deprecating support for all messages related to service configuration and
    actions.
  - Adding support for obtaining and logging data about the client, including
    its RTVI version and optionally included system information
    (OS/browser/etc.)
  - Adding support for handling the new `client-message` RTVI message through
    either an `on_client_message` event handler or listening for a new
    `RTVIClientMessageFrame`
  - Adding support for responding to a `client-message` with a `server-response`
    via either a direct call on the `RTVIProcessor` or via pushing a new
    `RTVIServerResponseFrame`
  - Adding built-in support for handling the new `append-to-context` RTVI
    message, which allows a client to add to the user or assistant LLM context.
    No extra code is required for supporting this behavior.
  - Updating all JavaScript and React client RTVI examples to use version 1.0.0
    of the clients.

  Get started migrating to RTVI protocol 1.0.0 by following the migration guide:
  https://docs.pipecat.ai/client/migration-guide

- Refactored `AWSBedrockLLMService` and `AWSPollyTTSService` to work
  asynchronously using `aioboto3` instead of the `boto3` library.

- The `UserIdleProcessor` now handles the scenario where function calls take
  longer than the idle timeout duration. This allows you to use the
  `UserIdleProcessor` in conjunction with function calls that take a while to
  return a result.

### Fixed

- Updated the `NeuphonicTTSService` to work with the updated websocket API.

- Fixed an issue with `RivaSTTService` where the watchdog feature was causing
  an error on initialization.

### Performance

- Removed an unnecessary push task in each `FrameProcessor`.

## [0.0.74] - 2025-07-03 [YANKED]

**This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting `ParallelPipelines` functionality.**

**Please upgrade to version 0.0.76 or later.**

### Added

- Added a new STT service, `SpeechmaticsSTTService`. This service provides
  real-time speech-to-text transcription using the Speechmatics API. It supports
  partial and final transcriptions, multiple languages, various audio formats,
  and speaker diarization.

- Added `normalize` and `model_id` to `FishAudioTTSService`.

- Added `http_options` argument to `GoogleLLMService`.

- Added `run_llm` field to `LLMMessagesAppendFrame` and `LLMMessagesUpdateFrame`
  frames. If true, a context frame will be pushed, triggering the LLM to
  respond.

- Added a new `SOXRStreamAudioResampler` for processing audio in chunks or
  streams. If you write your own processor and need to use an audio resampler,
  use the new `create_stream_resampler()`.

- Added new `DailyParams.audio_in_user_tracks` to allow receiving one track per
  user (default) or a single track from the room (all participants mixed).

- Added support for providing "direct" functions, which don't need an
  accompanying `FunctionSchema` or function definition dict. Instead, metadata
  (i.e. `name`, `description`, `properties`, and `required`) is automatically
  extracted from a combination of the function signature and docstring.

  Usage:

  ```python
  from pipecat.adapters.schemas.tools_schema import ToolsSchema
  from pipecat.services.llm_service import FunctionCallParams


  # "Direct" function
  # `params` must be the first parameter
  async def do_something(params: FunctionCallParams, foo: int, bar: str = ""):
      """
      Do something interesting.

      Args:
          foo (int): The foo to do something interesting with.
          bar (str): The bar to do something interesting with.
      """

      # `process` stands in for your own application logic.
      result = await process(foo, bar)
      await params.result_callback({"result": result})

  # ...

  llm.register_direct_function(do_something)

  # ...

  tools = ToolsSchema(standard_tools=[do_something])
  ```

- `user_id` is now populated in the `TranscriptionFrame` and
  `InterimTranscriptionFrame` when using a transport that provides a `user_id`,
  like `DailyTransport` or `LiveKitTransport`.

- Added `watchdog_coroutine()`, a watchdog helper for coroutines. If you have a
  coroutine that is waiting for a result that takes a long time, wrap it with
  `watchdog_coroutine()` so the watchdog timers are reset regularly.

- Added `session_token` parameter to `AWSNovaSonicLLMService`.

- Added Gemini Multimodal Live File API for uploading, fetching, listing, and
  deleting files. See `26f-gemini-live-files-api.py` for example usage.

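The metadata extraction for direct functions described above can be approximated with `inspect` and a docstring regex. This is a hypothetical reimplementation for illustration, not Pipecat's adapter code. The following are assumptions: `describe_function`, the annotation-to-JSON-type map, and the Google-style `name (type): text` docstring format.

```python
import inspect
import re
from typing import Any, Callable, Dict, List

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}
# Matches Google-style docstring lines such as "foo (int): Some text."
_ARG_LINE = re.compile(r"^[ \t]*(\w+) \([^)]*\): (.+)$", re.MULTILINE)


def describe_function(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build name/description/properties/required from signature + docstring."""
    doc = inspect.getdoc(func) or ""
    description = doc.split("\n\n", 1)[0].strip()
    arg_docs = dict(_ARG_LINE.findall(doc))
    # Skip the first parameter, which receives the call params object.
    parameters = list(inspect.signature(func).parameters.values())[1:]
    properties: Dict[str, Dict[str, str]] = {}
    required: List[str] = []
    for p in parameters:
        prop = {"type": _JSON_TYPES.get(p.annotation, "string")}
        if p.name in arg_docs:
            prop["description"] = arg_docs[p.name]
        properties[p.name] = prop
        if p.default is inspect.Parameter.empty:
            required.append(p.name)  # no default value means required
    return {"name": func.__name__, "description": description,
            "properties": properties, "required": required}


async def do_something(params, foo: int, bar: str = ""):
    """
    Do something interesting.

    Args:
        foo (int): The foo to do something interesting with.
        bar (str): The bar to do something interesting with.
    """


print(describe_function(do_something)["required"])  # ['foo']
```
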
### Changed

- Updated all the services to use the new `SOXRStreamAudioResampler`, ensuring
  smooth transitions and eliminating clicks.

- Upgraded `daily-python` to 0.19.4.

- Updated `google` optional dependency to use `google-genai` version `1.24.0`.

### Fixed

- Fixed an issue where audio would get stuck in the queue when an interruption
  occurs during Azure TTS synthesis.

- Fixed a race condition that occurs in Python 3.10+ where the task could miss
  the `CancelledError` and continue running indefinitely, freezing the pipeline.

- Fixed an `AWSNovaSonicLLMService` issue introduced in 0.0.72.

### Deprecated

- In `FishAudioTTSService`, deprecated `model` and replaced it with
  `reference_id`. This change is to better align with Fish Audio's variable
  naming and to reduce confusion about what functionality the variable
  controls.

## [0.0.73] - 2025-06-26

### Fixed

- Fixed an issue introduced in 0.0.72 that would cause `ElevenLabsTTSService`,
  `GladiaSTTService`, `NeuphonicTTSService` and `OpenAIRealtimeBetaLLMService`
  to throw an error.

## [0.0.72] - 2025-06-26

### Added

- Added logging and improved error handling to help diagnose and prevent
  potential pipeline freezes.

- Added `WatchdogQueue`, `WatchdogPriorityQueue`, `WatchdogEvent` and
  `WatchdogAsyncIterator`. These helper utilities reset watchdog timers
  appropriately before they expire. When watchdog timers are disabled, the
  utilities behave as their standard counterparts without side effects.

- Introduced task watchdog timers. Watchdog timers are used to detect if a
  Pipecat task is taking longer than expected (by default 5 seconds). Watchdog
  timers are disabled by default and can be enabled globally by passing the
  `enable_watchdog_timers` argument to the `PipelineTask` constructor. It is
  possible to change the default watchdog timer timeout by using the
  `watchdog_timeout` argument. You can also log how long it takes to reset the
  watchdog timers by setting `enable_watchdog_logging`. You can control all
  these settings per frame processor or even per task. That is, you can set
  `enable_watchdog_timers`, `enable_watchdog_logging` and `watchdog_timeout`
  when creating any frame processor through its constructor arguments or when
  you create a task with `FrameProcessor.create_task()`. Note that watchdog
  timers only work with Pipecat tasks and will not work if you use
  `asyncio.create_task()` or similar.

- Added `lexicon_names` parameter to `AWSPollyTTSService.InputParams`.

- Added reconnection logic and audio buffer management to `GladiaSTTService`.

- The `TurnTrackingObserver` now ends a turn upon observing an `EndFrame` or
  `CancelFrame`.

- Added Polish support to `AWSTranscribeSTTService`.

- Added new frames `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`,
  which allow pausing and resuming frame processing for a given frame
  processor. These are control frames, so they are ordered. Pausing a frame
  processor keeps old frames in the internal queues until resume takes place.
  Frames pushed while a frame processor is paused are added to the queues. When
  frame processing is resumed, all queued frames are processed in order. Also
  added `FrameProcessorPauseUrgentFrame` and `FrameProcessorResumeUrgentFrame`,
  which are system frames and therefore have high priority.

- Added a property called `has_function_calls_in_progress` in
  `LLMAssistantContextAggregator` that exposes whether a function call is in
  progress.

- Added `SambaNovaLLMService`, which provides LLM API integration with an
  OpenAI-compatible interface.

- Added `SambaNovaSTTService`, which provides speech-to-text functionality using
  SambaNova's (Whisper) API.

- Added foundational examples for function calling and transcription:
  `14s-function-calling-sambanova.py` and `13g-sambanova-transcription.py`.

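The pause/resume semantics of `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame` can be sketched with a buffer. The buffer fills while the processor is paused and drains in order on resume. `PausableProcessor` is hypothetical and synchronous. It ignores the distinction between control frames and system frames.

```python
from collections import deque
from typing import Callable, Deque, List


class PausableProcessor:
    """Hypothetical processor that buffers frames while paused."""

    def __init__(self, downstream: Callable[[str], None]) -> None:
        self._downstream = downstream
        self._paused = False
        self._pending: Deque[str] = deque()

    def pause(self) -> None:
        self._paused = True

    def resume(self) -> None:
        self._paused = False
        # Flush everything that arrived while paused, in arrival order.
        while self._pending and not self._paused:
            self._downstream(self._pending.popleft())

    def process(self, frame: str) -> None:
        # Queue while paused (or while older frames are still pending) so
        # ordering is preserved.
        if self._paused or self._pending:
            self._pending.append(frame)
        else:
            self._downstream(frame)


out: List[str] = []
processor = PausableProcessor(out.append)
processor.process("a")
processor.pause()
processor.process("b")
processor.process("c")
print(out)  # ['a']
processor.resume()
print(out)  # ['a', 'b', 'c']
```
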
### Changed

- `HeartbeatFrame`s are now control frames. This will make it easier to detect
  pipeline freezes. Previously, heartbeat frames were system frames, which meant
  they were not queued with other frames, making it difficult to detect
  pipeline stalls.

- Updated `OpenAIRealtimeBetaLLMService` to accept `language` in the
  `InputAudioTranscription` class for all models.

- Updated the default model for `OpenAIRealtimeBetaLLMService` to
  `gpt-4o-realtime-preview-2025-06-03`.

- The `PipelineParams` arg `allow_interruptions` now defaults to `True`.

- `TavusTransport` and `TavusVideoService` now send audio to Tavus using WebRTC
  audio tracks instead of `app-messages` over WebSocket. This should improve the
  overall audio quality.

- Upgraded `daily-python` to 0.19.3.

### Fixed

- Fixed an issue that would cause heartbeat frames to be sent before processors
  were started.

- Fixed an event loop blocking issue when using `SentryMetrics`.

- Fixed an issue in `FastAPIWebsocketClient` to ensure proper disconnection
  when the websocket is already closed.

- Fixed an issue where the `UserStoppedSpeakingFrame` was not received if the
  transport was not receiving new audio frames.

- Fixed an edge case where, if the user interrupted the bot but no new
  aggregation was received, the bot would not resume speaking.

- Fixed an issue with `TelnyxFrameSerializer` where it would throw an exception
  when the user hung up the call.

- Fixed an issue with `ElevenLabsTTSService` where the context was not being
  closed.

- Fixed function calling in `AWSNovaSonicLLMService`.

- Fixed an issue that would cause multiple `PipelineTask.on_idle_timeout`
  events to be triggered repeatedly.

- Fixed an issue that was causing user and bot speech to not be synchronized
  during recordings.

- Fixed an issue where voice settings weren't applied to `ElevenLabsTTSService`.

- Fixed an issue with `GroqTTSService` where it was not properly parsing the
  WAV file header.

- Fixed an issue with `GoogleSTTService` where it was constantly reconnecting
  before starting to receive audio from the user.

- Fixed an issue where `GoogleLLMService`'s TTFB value was incorrect.

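The `GroqTTSService` header fix above relates to a common pitfall. WAV files may carry extra chunks, so slicing off a fixed 44-byte header is fragile. The sketch below uses Python's `wave` module to parse the header instead. `make_wav`, `wav_to_pcm` and the 24 kHz sample rate are illustrative assumptions, not the service's code.

```python
import io
import wave
from typing import Tuple


def make_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Build a mono 16-bit WAV file in memory (stand-in for an API response)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def wav_to_pcm(data: bytes) -> Tuple[int, int, bytes]:
    """Parse the WAV header and return (sample_rate, channels, raw PCM bytes)."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getframerate(), w.getnchannels(), w.readframes(w.getnframes())


pcm = b"\x01\x00\x02\x00\x03\x00"  # three 16-bit samples
rate, channels, frames = wav_to_pcm(make_wav(pcm))
print(rate, channels, frames == pcm)  # 24000 1 True
```
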
### Deprecated

- `AudioBufferProcessor` parameter `user_continuos_stream` is deprecated.

### Other

- Renamed `14e-function-calling-gemini.py` to `14e-function-calling-google.py`.

## [0.0.71] - 2025-06-10

### Added

- Added a parameter called `additional_span_attributes` to `PipelineTask` that
  lets you add any additional attributes you'd like to the conversation span.

### Fixed

- Fixed an issue with `CartesiaSTTService` initialization.

## [0.0.70] - 2025-06-10

### Added

- Added `ExotelFrameSerializer` to handle telephony calls via Exotel.

- Added the option `informal` to `TranslationConfig` in the Gladia config.
  This allows forcing informal language forms when available.

- Added `CartesiaSTTService`, a websocket-based implementation to transcribe
  audio. Added a foundational example in `13f-cartesia-transcription.py`.

- Added a `websocket` example, showing how to use the new Pipecat client
  `WebsocketTransport` to connect with Pipecat `FastAPIWebsocketTransport` or
  `WebsocketServerTransport`.

- Added language support to `RimeHttpTTSService`. Extended languages to include
  German and French for both `RimeTTSService` and `RimeHttpTTSService`.

### Changed

- Upgraded `daily-python` to 0.19.2.

- Made `PipelineTask.add_observer()` synchronous. This allows callers to call it
  before doing the work of running the `PipelineTask` (i.e. without invoking
  `PipelineTask.set_event_loop()` first).

- Pipecat 0.0.69 forced the `uvloop` event loop on Linux and macOS.
  Unfortunately, this caused issues on some systems, so `uvloop` is no longer
  enabled by default. If you want to use `uvloop`, you can set the `asyncio`
  event loop policy before starting your agent with:

  ```python
  asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
  ```

### Fixed

- Fixed an issue with various TTS services that would cause audio glitches at
  the start of every bot turn.

- Fixed an `ElevenLabsTTSService` issue where a context warning was printed
  when pushing a `TTSSpeakFrame`.

- Fixed an `AssemblyAISTTService` issue that could cause unexpected behavior
  when yielding empty `Frame()`s.

- Fixed an issue where `OutputAudioRawFrame.transport_destination` was being
  reset to `None` instead of retaining its intended value before sending the
  audio frame to `write_audio_frame`.

- Fixed a typo in the LiveKit transport that prevented initialization.

## [0.0.69] - 2025-06-02 "AI Engineer World's Fair release" ✨

### Added

- Added a new frame `FunctionCallsStartedFrame`. This frame is pushed both
  upstream and downstream from the LLM service to indicate that one or more
  function calls are going to be executed.

- Added the `on_function_calls_started` event to LLM services. This event is
  triggered when the LLM service receives function calls from the model and is
  about to start executing them.

- Function calls can now be executed sequentially (in the order received in the
  completion) by passing `run_in_parallel=False` when creating your LLM
  service. By default, if the LLM completion returns 2 or more function calls
  they run concurrently. In both cases, concurrently and sequentially, a new LLM
  completion will run when the last function call finishes.

- Added OpenTelemetry tracing for `GeminiMultimodalLiveLLMService` and
  `OpenAIRealtimeBetaLLMService`.

- Added initial support for interruption strategies, which determine whether the
  user should interrupt the bot while the bot is speaking. Interruption
  strategies can be based on factors such as audio volume or the number of words
  spoken by the user. These can be specified via the new
  `interruption_strategies` field in `PipelineParams`. A new
  `MinWordsInterruptionStrategy` strategy has been introduced which triggers an
  interruption if the user has spoken a minimum number of words. If no
  interruption strategies are specified, the normal interruption behavior
  applies. If multiple strategies are provided, the first one that evaluates to
  true will trigger the interruption.

- `BaseInputTransport` now handles `StopFrame`. When a `StopFrame` is received
  the transport will pause sending frames downstream until a new `StartFrame` is
  received. This allows the transport to be reused (keeping the same connection)
  in a different pipeline.

- Updated AssemblyAI STT service to support their latest streaming
  speech-to-text model with improved transcription latency and endpointing.

- You can now access STT service results through the new
  `TranscriptionFrame.result` and `InterimTranscriptionFrame.result` fields.
  This is useful if you use specific STT settings and want to access the STT
  results.

- The examples runner is now public from the `pipecat.examples` package. This
  allows everyone to build their own examples and run them easily.

- It is now possible to push `OutputDTMFFrame` or `OutputDTMFUrgentFrame` with
  `DailyTransport`. These will be sent properly if a Daily dial-out connection
  has been established.

- Added `OutputDTMFUrgentFrame` to send a DTMF keypress quickly. The previous
  `OutputDTMFFrame` queues the keypress with the rest of data frames.

- Added `DTMFAggregator`, which aggregates keypad presses into
  `TranscriptionFrame`s. Aggregation occurs after a timeout, termination key
  press, or user interruption. You can specify the prefix of the
  `TranscriptionFrame`.

- Added new functions `DailyTransport.start_transcription()` and
  `DailyTransport.stop_transcription()` to be able to start and stop Daily
  transcription dynamically (for example, with different settings).

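The interruption-strategy rules above (no strategies means the normal behavior; otherwise the first strategy that evaluates to true triggers the interruption) can be illustrated with a small plain-Python sketch. `StrategySketch`, `MinWordsSketch` and `should_interrupt()` are hypothetical names, not Pipecat's `MinWordsInterruptionStrategy` API, and the sketch assumes that "normal behavior" means any user speech interrupts the bot.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class StrategySketch(Protocol):
    """Hypothetical interface: decides whether user speech may interrupt the bot."""

    def should_interrupt(self, transcript: str) -> bool: ...


@dataclass
class MinWordsSketch:
    """Hypothetical minimum-words strategy (not Pipecat's actual class)."""

    min_words: int

    def should_interrupt(self, transcript: str) -> bool:
        # Count whitespace-separated words the user has spoken so far.
        return len(transcript.split()) >= self.min_words


def should_interrupt(strategies: Sequence[StrategySketch], transcript: str) -> bool:
    # Assumption: with no strategies configured, the normal behavior applies
    # and any user speech interrupts the bot.
    if not strategies:
        return True
    # any() stops at the first strategy that evaluates to true.
    return any(strategy.should_interrupt(transcript) for strategy in strategies)


if __name__ == "__main__":
    strategies = [MinWordsSketch(min_words=3)]
    print(should_interrupt(strategies, "okay"))               # False
    print(should_interrupt(strategies, "wait, stop please"))  # True
```
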
### Changed

- Reverted the default model for `GeminiMultimodalLiveLLMService` back to
  `models/gemini-2.0-flash-live-001`.
  `gemini-2.5-flash-preview-native-audio-dialog` has inconsistent performance.
  You can opt in to using this model by setting the `model` arg.

- Function calls are now cancelled by default if there's an interruption. To
  disable this behavior you can set `cancel_on_interruption=False` when
  registering the function call. Since function calls are executed as tasks, you
  can tell if a function call has been cancelled by catching the
  `asyncio.CancelledError` exception (and don't forget to raise it again!).

- Updated OpenTelemetry tracing attribute `metrics.ttfb_ms` to `metrics.ttfb`.
  The attribute reports TTFB in seconds.

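The cancellation advice above (catch `asyncio.CancelledError`, clean up, then re-raise) follows the standard asyncio pattern. Below is a stdlib-only sketch; `fetch_weather()` is a hypothetical handler, and the explicit `task.cancel()` merely stands in for whatever Pipecat does internally on an interruption.

```python
import asyncio
from typing import List, Tuple


async def fetch_weather(events: List[str]) -> str:
    """Hypothetical function-call handler that may be cancelled by an interruption."""
    try:
        events.append("started")
        await asyncio.sleep(10)  # Simulates a slow external API call.
        return "sunny"
    except asyncio.CancelledError:
        # Clean up here (close connections, log, etc.), then re-raise so the
        # task is actually marked as cancelled.
        events.append("cancelled")
        raise


async def main() -> Tuple[List[str], bool]:
    events: List[str] = []
    task = asyncio.create_task(fetch_weather(events))
    await asyncio.sleep(0.01)  # Let the handler start running.
    task.cancel()  # Stand-in for the cancellation triggered by an interruption.
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))  # (['started', 'cancelled'], True)
```

Swallowing the exception instead of re-raising it would make the task look as if it had finished normally, which is why the entry above insists on raising it again.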
### Deprecated

- `DailyTransport.send_dtmf()` is deprecated, push an `OutputDTMFFrame` or an
  `OutputDTMFUrgentFrame` instead.

### Fixed

- Fixed an issue with `ElevenLabsTTSService` where long responses would
  continue generating output even after an interruption.

- Fixed an issue with the `OpenAILLMContext` where non-Roman characters were
  being incorrectly encoded as Unicode escape sequences. This was a logging
  issue and did not impact the actual conversation.

- In `AWSBedrockLLMService`, worked around a possible bug in AWS Bedrock where
  a `toolConfig` is required if there has been previous tool use in the
  messages array. The workaround uses a no-op factory function call to satisfy
  the requirement.

- Fixed `WebsocketClientTransport` to use `FrameProcessorSetup.task_manager`
  instead of `StartFrame.task_manager`.

### Performance

- Use `uvloop` as the new event loop on Linux and macOS systems.

## [0.0.68] - 2025-05-28

### Added

- Added `GoogleHttpTTSService` which uses Google's HTTP TTS API.

- Added `TavusTransport`, a new transport implementation compatible with any
  Pipecat pipeline. When using the `TavusTransport`, the Pipecat bot will
  connect in the same room as the Tavus Avatar and the user.

- Added `PlivoFrameSerializer` to support Plivo calls. A full running example
  has also been added to `examples/plivo-chatbot`.

- Added `UserBotLatencyLogObserver`. This is an observer that logs the latency
  between when the user stops speaking and when the bot starts speaking. This
  gives you an initial idea of how quickly the AI services respond.

- Added `SarvamTTSService`, which implements Sarvam AI's TTS API:
  https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert.

- Added `PipelineTask.add_observer()` and `PipelineTask.remove_observer()` to
  allow managing observers at runtime. This is useful for cases where the task
  is passed around to other code components that might want to observe the
  pipeline dynamically.

- Added `user_id` field to `TranscriptionMessage`. This allows identifying the
  user in a multi-user scenario. Note that this requires that
  `TranscriptionFrame` has the `user_id` properly set.

- Added new `PipelineTask` event handlers `on_pipeline_started`,
  `on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled`, which
  correspond to the `StartFrame`, `StopFrame`, `EndFrame` and `CancelFrame`
  respectively.

- Added additional languages to `LmntTTSService`. Languages include: `hi`,
  `id`, `it`, `ja`, `nl`, `pl`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`.

- Added a `model` parameter to the `LmntTTSService` constructor, allowing
  switching between LMNT models.

- Added `MiniMaxHttpTTSService`, which implements MiniMax's T2A API for TTS.
  Learn more: https://www.minimax.io/platform_overview

- A new function `FrameProcessor.setup()` has been added to allow setting up
  frame processors before receiving a `StartFrame`. This is what happens
  internally: `FrameProcessor.setup()` is called, `StartFrame` is pushed from
  the beginning of the pipeline, your regular pipeline operations run, `EndFrame`
  or `CancelFrame` is pushed from the beginning of the pipeline and finally
  `FrameProcessor.cleanup()` is called.

- Added support for OpenTelemetry tracing in Pipecat. This initial
  implementation includes:

  - A `setup_tracing` method where you can specify your OpenTelemetry exporter
  - Service decorators for STT (`@traced_stt`), LLM (`@traced_llm`), and TTS
    (`@traced_tts`) which trace the execution and collect properties and
    metrics (TTFB, token usage, character counts, etc.)
  - Class decorators that provide execution tracking; these are generic and can
    be used for service tracking as needed
  - Spans that help track traces on a per-conversation and per-turn basis:

    ```
    conversation-uuid
    ├── turn-1
    │   ├── stt_deepgramsttservice
    │   ├── llm_openaillmservice
    │   └── tts_cartesiattsservice
    ...
    └── turn-n
        └── ...
    ```

  By default, Pipecat has implemented service decorators to trace execution of
  STT, LLM, and TTS services. You can enable tracing by setting
  `enable_tracing` to `True` in the `PipelineTask`.

- Added `TurnTrackingObserver`, which tracks the start and end of a user/bot
  turn pair and emits events `on_turn_started` and `on_turn_stopped`
  corresponding to the start and end of a turn, respectively.

- Allow passing observers to `run_test()` while running unit tests.

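As a rough illustration of what `UserBotLatencyLogObserver` measures (the time between the user stopping and the bot starting to speak), here is a hypothetical stdlib sketch. `UserBotLatencySketch` and its methods are illustrative names, not Pipecat's observer API, and timestamps are passed in explicitly instead of being read from frames.

```python
from typing import List, Optional


class UserBotLatencySketch:
    """Hypothetical sketch: records user-stopped to bot-started latencies."""

    def __init__(self) -> None:
        self._user_stopped_at: Optional[float] = None
        self.latencies: List[float] = []

    def on_user_stopped_speaking(self, timestamp: float) -> None:
        self._user_stopped_at = timestamp

    def on_bot_started_speaking(self, timestamp: float) -> None:
        # Only the first bot response after each user turn is measured.
        if self._user_stopped_at is not None:
            self.latencies.append(timestamp - self._user_stopped_at)
            self._user_stopped_at = None


if __name__ == "__main__":
    observer = UserBotLatencySketch()
    observer.on_user_stopped_speaking(2.0)
    observer.on_bot_started_speaking(2.75)
    observer.on_user_stopped_speaking(5.0)
    observer.on_bot_started_speaking(5.5)
    print(observer.latencies)  # [0.75, 0.5]
```
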
### Changed

- Upgraded `daily-python` to 0.19.1.

- ⚠️ Updated `SmallWebRTCTransport` to align with how other transports handle
  `on_client_disconnected`. Now, when the connection is closed and no
  reconnection is attempted, `on_client_disconnected` is called instead of
  `on_client_close`. The `on_client_close` callback is no longer used; use
  `on_client_disconnected` instead.

- Check if `PipelineTask` has already been cancelled.

- Don't raise an exception if an event handler is not registered.

- Upgraded `deepgram-sdk` to 4.1.0.

- Updated `GoogleTTSService` to use Google's streaming TTS API. The default
  voice has also been updated to `en-US-Chirp3-HD-Charon`.

- ⚠️ Refactored the `TavusVideoService` so that it acts as a proxy, sending
  audio to Tavus and receiving both audio and video. This makes
  `TavusVideoService` usable with any Pipecat pipeline and with any transport.
  This is a **breaking change**; check
  `examples/foundational/21a-tavus-layer-small-webrtc.py` to see how to use it.

- `DailyTransport` now uses custom microphone audio tracks instead of virtual
  microphones. Now, multiple Daily transports can be used in the same process.

- `DailyTransport` now captures audio from individual participants instead of
  the whole room. This allows identifying audio frames per participant.

- Updated the default model for `AnthropicLLMService` to
  `claude-sonnet-4-20250514`.

- Updated the default model for `GeminiMultimodalLiveLLMService` to
  `models/gemini-2.5-flash-preview-native-audio-dialog`.

- `BaseTextFilter` methods `filter()`, `update_settings()`,
  `handle_interruption()` and `reset_interruption()` are now async.

- `BaseTextAggregator` methods `aggregate()`, `handle_interruption()` and
  `reset()` are now async.

- The API version for `CartesiaTTSService` and `CartesiaHttpTTSService` has
  been updated. Also, the `cartesia` dependency has been updated to 2.x.

- `CartesiaTTSService` and `CartesiaHttpTTSService` now support Cartesia's new
  `speed` parameter which accepts values of `slow`, `normal`, and `fast`.

- `GeminiMultimodalLiveLLMService` now uses the user transcription and usage
  metrics provided by Gemini Live.

- `GoogleLLMService` has been updated to use `google-genai` instead of the
  deprecated `google-generativeai`.

### Deprecated

- In `CartesiaTTSService` and `CartesiaHttpTTSService`, `emotion` has been
  deprecated by Cartesia. Pipecat is following suit and deprecating `emotion`
  as well.

### Removed

- Since `GeminiMultimodalLiveLLMService` now transcribes its own audio, the
  `transcribe_user_audio` arg has been removed. Audio is now transcribed
  automatically.

- Removed the `SileroVAD` frame processor, just use `SileroVADAnalyzer`
  instead. Also removed the `07a-interruptible-vad.py` example.

### Fixed

- Fixed a `DailyTransport` issue that did not allow capturing video frames if
  the framerate was greater than zero.

- Fixed a `DeepgramSTTService` connection issue when the user provided their
  own `LiveOptions`.

- Fixed a `DailyTransport` issue that would cause images needing resize to block
  the event loop.

- Fixed an issue with `ElevenLabsTTSService` where changing the model or voice
  while the service is running wasn't working.

- Fixed an issue that would cause multiple instances of the same class to behave
  incorrectly if any of the given constructor arguments defaulted to a mutable
  value (e.g. lists, dictionaries, objects).

- Fixed an issue with `CartesiaTTSService` where `TTSTextFrame` messages weren't
  being emitted when the model was set to `sonic`. This resulted in the
  assistant context not being updated with assistant messages.

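The mutable-default fix above concerns a well-known Python pitfall: a default value such as `[]` is evaluated once, when the function is defined, so every instance that relies on it shares the same object. A minimal sketch with hypothetical class names (these are not Pipecat classes):

```python
from typing import List, Optional


class BuggyBuffer:
    def __init__(self, items: List[str] = []):  # Bug: one list shared by all instances.
        self.items = items


class FixedBuffer:
    def __init__(self, items: Optional[List[str]] = None):
        # A fresh list is created per instance when the caller passes nothing.
        self.items = items if items is not None else []


if __name__ == "__main__":
    a, b = BuggyBuffer(), BuggyBuffer()
    a.items.append("hello")
    print(b.items)  # ['hello']  (leaked from `a`)

    c, d = FixedBuffer(), FixedBuffer()
    c.items.append("hello")
    print(d.items)  # []
```
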
### Performance

- `DailyTransport`: process audio, video and events in separate tasks.

- Don't create event handler tasks if no user event handlers have been
  registered.

### Other

- It is now possible to run all (or most) foundational examples with multiple
  transports. By default, they run with P2P (Peer-To-Peer) WebRTC so you can try
  everything locally. You can also run them with Daily or even with a Twilio
  phone number.

- Added foundational examples `07y-interruptible-minimax.py` and
  `07z-interruptible-sarvam.py` to show how to use the `MiniMaxHttpTTSService`
  and `SarvamTTSService`, respectively.

- Added an `open-telemetry-tracing` example, showing how to set up tracing. The
  example also includes Jaeger as an open source OpenTelemetry client to review
  traces from the example runs.

- Added foundational example `29-turn-tracking-observer.py` to show how to use
  the `TurnTrackingObserver`.

## [0.0.67] - 2025-05-07

### Added

- Added `DebugLogObserver` for detailed frame logging with configurable
  filtering by frame type and endpoint. This observer automatically extracts
  and formats all frame data fields for debug logging.

- `UserImageRequestFrame.video_source` field has been added to request an image
  from the desired video source.

- Added support for the AWS Nova Sonic speech-to-speech model with the new
  `AWSNovaSonicLLMService`.
  See https://docs.aws.amazon.com/nova/latest/userguide/speech.html.
  Note that it requires Python >= 3.12 and `pip install pipecat-ai[aws-nova-sonic]`.

- Added new AWS services `AWSBedrockLLMService` and `AWSTranscribeSTTService`.

- Added `on_active_speaker_changed` event handler to the `DailyTransport` class.

- Added `enable_ssml_parsing` and `enable_logging` to `InputParams` in
  `ElevenLabsTTSService`.

- Added support to `RimeHttpTTSService` for the `arcana` model.

### Changed

- Updated `ElevenLabsTTSService` to use the beta websocket API
  (multi-stream-input). This new API supports context_ids and cancelling those
  contexts, which greatly improves interruption handling.

- Observers `on_push_frame()` now take a single argument `FramePushed` instead
  of multiple arguments.

- Updated the default voice for `DeepgramTTSService` to `aura-2-helena-en`.

### Deprecated

- `PollyTTSService` is now deprecated, use `AWSPollyTTSService` instead.

- Observer `on_push_frame(src, dst, frame, direction, timestamp)` is now
  deprecated, use `on_push_frame(data: FramePushed)` instead.

### Fixed

- Fixed a `DailyTransport` issue that was causing problems when multiple audio
  or video sources were being captured.

- Fixed an `UltravoxSTTService` issue that would cause the service to generate
  all tokens as one word.

- Fixed a `PipelineTask` issue that would cause tasks to not be cancelled if
  the task was cancelled from outside of Pipecat.

- Fixed a `TaskManager` issue that was causing dangling tasks to be reported.

- Fixed an issue that could cause data to be sent to the transports when they
  were not yet ready.

- Removed custom audio tracks from `DailyTransport` before leaving.

### Removed

- Removed `CanonicalMetricsService` as it's no longer maintained.

## [0.0.66] - 2025-05-02

### Added

- Added two new input parameters to `RimeTTSService`: `pause_between_brackets`
  and `phonemize_between_brackets`.

- Added support for cross-platform local smart turn detection. You can use
  `LocalSmartTurnAnalyzer` for on-device inference using Torch.

- `BaseOutputTransport` now allows multiple destinations if the transport
  implementation supports it (e.g. Daily's custom tracks). With multiple
  destinations it is possible to send different audio or video tracks with a
  single transport simultaneously. To do that, you need to set the new
  `Frame.transport_destination` field with your desired transport destination
  (e.g. custom track name), tell the transport you want a new destination with
  `TransportParams.audio_out_destinations` or
  `TransportParams.video_out_destinations` and the transport should take care of
  the rest.

- Similar to the new `Frame.transport_destination`, there's a new
  `Frame.transport_source` field which is set by the `BaseInputTransport` if the
  incoming data comes from a non-default source (e.g. custom tracks).

- `TTSService` has a new `transport_destination` constructor parameter. This
  parameter will be used to update the `Frame.transport_destination` field for
  each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to
  multiple destinations in the same pipeline.

- Added `DailyTransportParams.camera_out_enabled` and
  `DailyTransportParams.microphone_out_enabled` which allow you to
  enable/disable the main output camera or microphone tracks. This is useful if
  you only want to use custom tracks and not send the main tracks. Note that you
  still need `audio_out_enabled=True` or `video_out_enabled=True`.

- Added `DailyTransport.capture_participant_audio()` which allows you to capture
  an audio source (e.g. "microphone", "screenAudio" or a custom track name) from
  a remote participant.

- Added `DailyTransport.update_publishing()` which allows you to update the call
  video and audio publishing settings (e.g. audio and video quality).

- Added `RTVIObserverParams` which allows you to configure what RTVI messages
  are sent to the clients.

- Added a `context_window_compression` InputParam to
  `GeminiMultimodalLiveLLMService` which allows you to enable a sliding context
  window for the session as well as set the token limit of the sliding window.

- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.

- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`,
  indicating when the VAD detected that the user started and stopped speaking.
  These events are helpful when using smart turn detection, as the user's stop
  time can differ from when their turn ends (signified by
  `UserStoppedSpeakingFrame`).

- Added `TranslationFrame`, a new frame type that contains a translated
  transcription.

- Added `TransportParams.audio_in_passthrough`. If set (the default), incoming
  audio will be pushed downstream.

- Added `MCPClient`, a way to connect to MCP servers and use the MCP servers'
  tools.

- Added Mem0 OSS support: in addition to Mem0 cloud, the open-source version is
  now also available.

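The multiple-destination mechanism described above (frames tagged with `transport_destination`, and a transport told up front which extra destinations exist) can be pictured with a plain-Python routing sketch. `AudioFrameSketch` and `OutputRouterSketch` are hypothetical, and rejecting unknown destinations is an assumption for illustration, not documented Pipecat behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Iterable, List, Optional


@dataclass
class AudioFrameSketch:
    audio: bytes
    # None means the default output (e.g. the main microphone track).
    transport_destination: Optional[str] = None


class OutputRouterSketch:
    """Hypothetical output that groups frames by their destination."""

    def __init__(self, destinations: Iterable[str]) -> None:
        self._destinations = set(destinations)
        self.sent: DefaultDict[Optional[str], List[bytes]] = defaultdict(list)

    def write(self, frame: AudioFrameSketch) -> None:
        destination = frame.transport_destination
        # Assumption: only destinations declared up front (or the default) are valid.
        if destination is not None and destination not in self._destinations:
            raise ValueError(f"unknown destination: {destination}")
        self.sent[destination].append(frame.audio)


if __name__ == "__main__":
    router = OutputRouterSketch(destinations=["track-1"])
    router.write(AudioFrameSketch(b"ab"))
    router.write(AudioFrameSketch(b"cd", transport_destination="track-1"))
    print(dict(router.sent))  # {None: [b'ab'], 'track-1': [b'cd']}
```
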
### Changed

- `TransportParams.audio_out_mixer` now accepts either a single mixer or a
  dictionary that provides a mixer per destination. For example:

  ```python
  audio_out_mixer={
      "track-1": SoundfileMixer(...),
      "track-2": SoundfileMixer(...),
      "track-N": SoundfileMixer(...),
  },
  ```

- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and
  `TranscriptionFrame` which allows the `STTMuteFilter` to be used in
  conjunction with transports that generate transcripts, e.g. `DailyTransport`.

- Function calls now receive a single parameter `FunctionCallParams` instead of
  `(function_name, tool_call_id, args, llm, context, result_callback)` which is
  now deprecated.

- Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s
  (`LLMUserAggregatorParams.aggregation_timeout`). Sometimes, the STT services
  might give us more than one transcription which could come after the user
  stopped speaking. We still want to include these additional transcriptions
  with the first one because they are part of the user turn. This timeout
  exists to handle exactly that.

- Short utterances not detected by VAD while the bot is speaking are now
  ignored. This significantly reduces the number of bot interruptions,
  providing a more natural conversation experience.

- Updated `GladiaSTTService` to output a `TranslationFrame` when specifying a
  `translation` and `translation_config`.

- STT services now pass audio frames through by default. This allows you to add
  audio recording without worrying about what's wrong in your pipeline when it
  doesn't work the first time.

- Input transports now always push audio downstream unless disabled with
  `TransportParams.audio_in_passthrough`. After many Pipecat releases, we
  realized this is the common use case. There are use cases where the input
  transport already provides STT and you also don't want recordings, in which
  case there's no need to push audio to the rest of the pipeline, but this is
  not a very common case.

- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such
  as "canary-1b-asr", to be used in Pipecat.

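The late-transcription timeout described above can be sketched with `asyncio.wait_for`. This is a hypothetical illustration, not Pipecat's user aggregator: `collect_user_turn()` is an invented helper, and it assumes the wait restarts after each late transcription, which may differ from the real implementation.

```python
import asyncio
from typing import List


async def collect_user_turn(queue: asyncio.Queue, aggregation_timeout: float = 0.5) -> str:
    """Hypothetical: called once the user stops speaking.

    Keeps collecting transcriptions until none arrives within
    `aggregation_timeout` seconds, then returns the aggregated text.
    """
    parts: List[str] = []
    while True:
        try:
            text = await asyncio.wait_for(queue.get(), timeout=aggregation_timeout)
        except asyncio.TimeoutError:
            break  # Nothing else arrived in time: the user turn is complete.
        parts.append(text)
    return " ".join(parts)


async def demo() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put("I'd like to book")  # Arrived before the user stopped speaking.

    async def late_transcription() -> None:
        await asyncio.sleep(0.2)  # Late, but within the 0.5 s window.
        await queue.put("a table for two")

    late = asyncio.create_task(late_transcription())
    text = await collect_user_turn(queue, aggregation_timeout=0.5)
    await late
    return text


if __name__ == "__main__":
    print(asyncio.run(demo()))  # I'd like to book a table for two
```

A shorter timeout makes the bot respond sooner after the user stops speaking, at the cost of occasionally splitting a slow STT result into a separate turn.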
### Deprecated

- Function calls with parameters
  `(function_name, tool_call_id, args, llm, context, result_callback)` are
  deprecated, use a single `FunctionCallParams` parameter instead.

- `TransportParams.camera_*` parameters are now deprecated, use
  `TransportParams.video_*` instead.

- `TransportParams.vad_enabled` parameter is now deprecated, use
  `TransportParams.audio_in_enabled` and `TransportParams.vad_analyzer` instead.

- `TransportParams.vad_audio_passthrough` parameter is now deprecated, use
  `TransportParams.audio_in_passthrough` instead.

- `ParakeetSTTService` is now deprecated, use `RivaSTTService` instead, which
  uses the model "parakeet-ctc-1.1b-asr" by default.

- `FastPitchTTSService` is now deprecated, use `RivaTTSService` instead, which
  uses the model "magpie-tts-multilingual" by default.

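To illustrate the migration described above, from six positional arguments to a single parameter object, here is a self-contained sketch. `FunctionCallParamsSketch` is a hypothetical stand-in whose field names are copied from the deprecated signature; check Pipecat's `FunctionCallParams` for its actual fields.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class FunctionCallParamsSketch:
    """Hypothetical stand-in bundling the six arguments of the old signature."""

    function_name: str
    tool_call_id: str
    args: Dict[str, Any]
    llm: Any
    context: Any
    result_callback: Callable[[Any], Awaitable[None]]


# Old (deprecated) style:
# async def get_weather(function_name, tool_call_id, args, llm, context, result_callback): ...


# New style: one parameter object.
async def get_weather(params: FunctionCallParamsSketch) -> None:
    city = params.args["city"]
    # Hard-coded fake result; a real handler would call a weather API here.
    await params.result_callback({"city": city, "conditions": "sunny"})


async def main() -> List[Any]:
    results: List[Any] = []

    async def on_result(result: Any) -> None:
        results.append(result)

    await get_weather(
        FunctionCallParamsSketch(
            function_name="get_weather",
            tool_call_id="call_1",
            args={"city": "Barcelona"},
            llm=None,
            context=None,
            result_callback=on_result,
        )
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))  # [{'city': 'Barcelona', 'conditions': 'sunny'}]
```
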
### Fixed

- Fixed an issue with `SimliVideoService` where the bot was continuously
  outputting audio, which prevented the `BotStoppedSpeakingFrame` from being
  emitted.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant
  messages to the context.

- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context
  contained tokens instead of words.

- Fixed an issue with HTTP Smart Turn handling, where the service returns a 500
  error. Previously, this would cause an unhandled exception. Now, a 500 error
  is treated as an incomplete response.

- Fixed a TTS services issue that could cause assistant output not to be
  aggregated to the context when also using `TTSSpeakFrame`s.

- Fixed an issue where the `SmartTurnMetricsData` was reporting 0ms for
  inference and processing time when using the `FalSmartTurnAnalyzer`.

### Other

- Added `examples/daily-custom-tracks` to show how to send and receive Daily
  custom tracks.

- Added `examples/daily-multi-translation` to showcase how to send multiple
  simultaneous translations with the same transport.

- Added 04 foundational examples for client/server transports. Also, renamed
  `29-livekit-audio-chat.py` to `04b-transports-livekit.py`.

- Added foundational example `13c-gladia-translation.py` showing how to use
  `TranscriptionFrame` and `TranslationFrame`.

## [0.0.65] - 2025-04-23 "Sant Jordi's release" 🌹📕

https://en.wikipedia.org/wiki/Saint_George%27s_Day_in_Catalonia

### Added

- Added automatic hangup logic to the Telnyx serializer. This feature hangs up
  the Telnyx call when an `EndFrame` or `CancelFrame` is received. It is
  enabled by default and is configurable via the `auto_hang_up` `InputParam`.

- Added a keepalive task to `GladiaSTTService` to prevent the websocket from
  disconnecting after 30 seconds of no audio input.

### Changed

- The `InputParams` for `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`
  no longer require that `stability` and `similarity_boost` be set. You can
  individually set each param.

- In `TwilioFrameSerializer`, `call_sid` is Optional so as to avoid a breaking
  change. `call_sid` is required to automatically hang up.

### Fixed

- Fixed an issue where `TwilioFrameSerializer` would send two hang up commands:
  one for the `EndFrame` and one for the `CancelFrame`.

## [0.0.64] - 2025-04-22

### Added

- Added automatic hangup logic to the Twilio serializer. This feature hangs up
  the Twilio call when an `EndFrame` or `CancelFrame` is received. It is
  enabled by default and is configurable via the `auto_hang_up` `InputParam`.

- Added `SmartTurnMetricsData`, which contains end-of-turn prediction metrics,
  to the `MetricsFrame`. Using `MetricsFrame`, you can now retrieve prediction
  confidence scores and processing time metrics from the smart turn analyzers.

- Added support for Application Default Credentials in Google services,
  `GoogleSTTService`, `GoogleTTSService`, and `GoogleVertexLLMService`.

- Added support for Smart Turn Detection via the `turn_analyzer` transport
  parameter. You can now choose between `HttpSmartTurnAnalyzer()` or
  `FalSmartTurnAnalyzer()` for remote inference or
  `LocalCoreMLSmartTurnAnalyzer()` for on-device inference using Core ML.

- `DeepgramTTSService` accepts `base_url` argument again, allowing you to
  connect to an on-prem service.

- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` which allow
  you to control aggregator settings. You can now pass these arguments when
  creating aggregator pairs with `create_context_aggregator()`.

- Added `previous_text` context support to `ElevenLabsHttpTTSService`, improving
  speech consistency across sentences within an LLM response.

- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.

- It is now possible to disable `SoundfileMixer` when created. You can then use
  `MixerEnableFrame` to dynamically enable it when necessary.

- Added `on_client_connected` and `on_client_disconnected` event handlers to
  the `DailyTransport` class. These handlers map to the same underlying Daily
  events as `on_participant_joined` and `on_participant_left`, respectively.
  This makes it easier to write a single bot pipeline that can also use other
  transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.

### Changed

- `GrokLLMService` now uses `grok-3-beta` as its default model.

- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects
  the user when their token expires. This new parameter defaults to False.
  Also, the default value for `enable_prejoin_ui` changed to False and
  `eject_at_room_exp` changed to False.

- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their
  default model.

- `SoundfileMixer` constructor arguments need to be keywords.

### Deprecated

- `DeepgramSTTService` parameter `url` is now deprecated, use `base_url`
  instead.

### Removed

- Parameters `user_kwargs` and `assistant_kwargs` when creating a context
  aggregator pair using `create_context_aggregator()` have been removed. Use
  `user_params` and `assistant_params` instead.

### Fixed

- Fixed an issue that would cause TTS websocket-based services to not clean up
  resources properly when disconnecting.

- Fixed a `TavusVideoService` issue that was causing audio choppiness.

- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the
  client did not create a video transceiver.

- Fixed an issue where LLM input parameters were not being applied correctly in
  `GoogleVertexLLMService`, causing unexpected behavior during inference.

### Other

- Updated the `twilio-chatbot` example to use the auto-hangup feature.

## [0.0.63] - 2025-04-11
|
||
|
||
### Added
|
||
|
||
- Added media resolution control to `GeminiMultimodalLiveLLMService` with
|
||
`GeminiMediaResolution` enum, allowing configuration of token usage for
|
||
image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing
|
||
with 256 tokens).
|
||
|
||
- Added Gemini's Voice Activity Detection (VAD) configuration to
|
||
`GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine
|
||
control over speech detection sensitivity and timing, including:
|
||
|
||
- Start sensitivity (how quickly speech is detected)
|
||
- End sensitivity (how quickly turns end after pauses)
|
||
- Prefix padding (milliseconds of audio to keep before speech is detected)
|
||
- Silence duration (milliseconds of silence required to end a turn)
|
||
|
||
- Added comprehensive language support to `GeminiMultimodalLiveLLMService`,
|
||
supporting over 30 languages via the `language` parameter, with proper
|
||
mapping between Pipecat's `Language` enum and Gemini's language codes.
|
||
|
||
- Added support in `SmallWebRTCTransport` to detect when remote tracks are
|
||
muted.
|
||
|
||
- Added support for image capture from a video stream to the
|
||
`SmallWebRTCTransport`.
|
||
|
||
- Added a new iOS client option to the `SmallWebRTCTransport`
|
||
**video-transform** example.
|
||
|
||
- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The
|
||
producer processor processes frames from the pipeline and decides whether the
|
||
consumers should consume it or not. If so, the same frame that is received by
|
||
the producer is sent to the consumer. There can be multiple consumers per
|
||
producer. These processors can be useful to push frames from one part of a
|
||
pipeline to a different one (e.g. when using `ParallelPipeline`).
|
||
|
||
- Improvements for the `SmallWebRTCTransport`:
|
||
- Wait until the pipeline is ready before triggering the `connected` event.
|
||
- Queue messages if the data channel is not ready.
|
||
- Update the aiortc dependency to fix an issue where the 'video/rtx' MIME
|
||
type was incorrectly handled as a codec retransmission.
|
||
- Avoid initial video delays.
|
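
The following minimal sketch in plain `asyncio` illustrates the producer/consumer relationship described by the `ProducerProcessor` and `ConsumerProcessor` entry. It is not Pipecat's implementation. The `Producer` class, its `add_consumer()` and `process_frame()` methods, and the use of strings as frames are hypothetical stand-ins for the real processors and `Frame` objects.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical stand-in for ProducerProcessor/ConsumerProcessor. "Frames" are
# plain strings here; Pipecat's processors work with Frame objects.
FrameFilter = Callable[[str], Awaitable[bool]]


class Producer:
    """Hands frames that pass `frame_filter` to every registered consumer."""

    def __init__(self, frame_filter: FrameFilter):
        self._frame_filter = frame_filter
        self._queues: List[asyncio.Queue] = []

    def add_consumer(self) -> asyncio.Queue:
        # Each consumer reads from its own queue.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    async def process_frame(self, frame: str) -> str:
        # The same frame object is given to every consumer when the filter matches.
        if await self._frame_filter(frame):
            for queue in self._queues:
                await queue.put(frame)
        # The frame also keeps flowing through the producer's own branch.
        return frame


async def main() -> List[List[str]]:
    async def is_transcription(frame: str) -> bool:
        return frame.startswith("transcription:")

    producer = Producer(is_transcription)
    consumers = [producer.add_consumer(), producer.add_consumer()]
    for frame in ["audio:1", "transcription:hello", "audio:2"]:
        await producer.process_frame(frame)
    # Drain each consumer's queue to see what it received.
    return [[queue.get_nowait() for _ in range(queue.qsize())] for queue in consumers]


print(asyncio.run(main()))  # [['transcription:hello'], ['transcription:hello']]
```

Because each consumer has its own unbounded queue, every consumer receives the same frame, and a slow consumer does not block the producer or the other consumers.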

### Changed

- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio`
  parameter in favor of Gemini Live's native output transcription support. Now
  text transcriptions are produced directly by the model. No configuration is
  required.

- Updated `GeminiMultimodalLiveLLMService`’s default `model` to
  `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket
  URL.

### Fixed

- Updated `daily-python` to 0.17.0 to fix an issue that prevented it from
  running on older platforms.

- Fixed an issue where `CartesiaTTSService`'s spell feature would result in
  the spelled word in the context appearing as "F,O,O,B,A,R" instead of
  "FOOBAR".

- Fixed an issue in the Azure TTS services where the language was being set
  incorrectly.

- Fixed `SmallWebRTCTransport` to support dynamic values for
  `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms
  chunks.

- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant
  context messages had no space between words.

- Fixed an issue where `LLMAssistantContextAggregator` would prevent a
  `BotStoppedSpeakingFrame` from moving through the pipeline.

## [0.0.62] - 2025-04-01 "An April Fools' release"

### Added

- Added `TransportParams.audio_out_10ms_chunks` parameter to allow controlling
  the amount of audio being sent by the output transport. It defaults to 4, so
  40ms audio chunks are sent.

- Added `QwenLLMService` for Qwen integration with an OpenAI-compatible
  interface. Added foundational example `14q-function-calling-qwen.py`.

- Added `Mem0MemoryService`. Mem0 is a self-improving memory layer for LLM
  applications. Learn more at: https://mem0.ai/.

- Added `WhisperSTTServiceMLX` for Whisper transcription on Apple Silicon.
  See example in `examples/foundational/13e-whisper-mlx.py`. Latency of
  completed transcription using Whisper large-v3-turbo on an M4 MacBook is
  ~500ms.

- Added `SmallWebRTCTransport`, a new P2P WebRTC transport.

- Created two examples in `p2p-webrtc`:
  - **video-transform**: Demonstrates sending and receiving audio/video with
    `SmallWebRTCTransport` using `TypeScript`. Includes video frame
    processing with OpenCV.
  - **voice-agent**: A minimal example of creating a voice agent with
    `SmallWebRTCTransport`.

- `GladiaSTTService` now has comprehensive support for the latest API config
  options, including model, language detection, preprocessing, custom
  vocabulary, custom spelling, translation, and message filtering options.

- Added support to `ProtobufFrameSerializer` to send the messages from
  `TransportMessageFrame` and `TransportMessageUrgentFrame`.

- Added support for a new TTS service, `PiperTTSService`.
  (see https://github.com/rhasspy/piper/)

- It is now possible to tell whether `UserStartedSpeakingFrame` or
  `UserStoppedSpeakingFrame` have been generated because of emulation frames.

### Changed

- `FunctionCallResultFrame`s are now system frames. This prevents function
  call results from being discarded during interruptions.

- Pipecat services have been reorganized into packages. Each package can have
  one or more of the following modules (in the future new module names might be
  needed) depending on the services implemented:

  - image: for image generation services
  - llm: for LLM services
  - memory: for memory services
  - stt: for Speech-To-Text services
  - tts: for Text-To-Speech services
  - video: for video generation services
  - vision: for video recognition services

- Base classes for AI services have been reorganized into modules. They can now
  be found in
  `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.

- `GladiaSTTService` now uses the `solaria-1` model by default. Other params
  use Gladia's default values. Added support for more language codes.

### Deprecated

- All Pipecat service imports have been deprecated, and a warning will be shown
  when using the old import. The new import should be
  `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For
  example, `from pipecat.services.openai.llm import OpenAILLMService`.

- Importing AI service base classes from `pipecat.services.ai_services` is now
  deprecated; use one of
  `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.

- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in
  favor of `language_config`, which better aligns with Gladia's API.

- Deprecated using `GladiaSTTService.InputParams` directly. Use the new
  `GladiaInputParams` class instead.

### Fixed

- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that
  would cause the transport to be closed prematurely, preventing the internally
  queued audio from being sent. The same issue could also cause an infinite loop
  while using an output mixer and when sending an `EndFrame`, preventing the bot
  from finishing.

- Fixed an issue that could cause a `TranscriptionUpdateFrame` pushed because
  of an interruption to be discarded.

- Fixed an issue that would cause `SegmentedSTTService` based services
  (e.g. `OpenAISTTService`) to try to transcribe non-spoken audio, causing
  invalid transcriptions.

- Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrame`s.

### Performance

- Output transports now send 40ms audio chunks instead of 20ms. This should
  improve performance.

- `BotSpeakingFrame`s are now sent every 200ms. If the output transport's audio
  chunks are longer than 200ms, they will be sent with every audio chunk.
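
The 40ms chunk size and 200ms `BotSpeakingFrame` cadence described in this release reduce to simple arithmetic. The sketch below assumes 16-bit mono PCM output. It models the cadence as "emit on the first chunk, then whenever at least 200ms of audio has gone out since the last emission". The function names are hypothetical and this is not Pipecat's code.

```python
from typing import List

# Assumption: output audio is 16-bit (2-byte) PCM.
BYTES_PER_SAMPLE = 2


def chunk_size_bytes(
    sample_rate: int, audio_out_10ms_chunks: int = 4, channels: int = 1
) -> int:
    """Bytes in one output chunk made of `audio_out_10ms_chunks` 10ms slices."""
    samples_per_10ms = sample_rate // 100
    return samples_per_10ms * audio_out_10ms_chunks * channels * BYTES_PER_SAMPLE


def bot_speaking_chunks(num_chunks: int, chunk_ms: int, period_ms: int = 200) -> List[int]:
    """Indices of the chunks at which a BotSpeakingFrame would be emitted."""
    emitted: List[int] = []
    last_emit_ms = None
    for index in range(num_chunks):
        now_ms = index * chunk_ms
        # Emit on the first chunk, then whenever at least `period_ms` has elapsed.
        if last_emit_ms is None or now_ms - last_emit_ms >= period_ms:
            emitted.append(index)
            last_emit_ms = now_ms
    return emitted


print(chunk_size_bytes(24000))      # 1920 bytes per 40ms chunk at 24kHz mono
print(bot_speaking_chunks(20, 40))  # [0, 5, 10, 15]: every 5th 40ms chunk
print(bot_speaking_chunks(4, 250))  # [0, 1, 2, 3]: chunks over 200ms emit every time
```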

### Other

- Added foundational example `37-mem0.py` demonstrating how to use the
  `Mem0MemoryService`.

- Added foundational example `13e-whisper-mlx.py` demonstrating how to use the
  `WhisperSTTServiceMLX`.

## [0.0.61] - 2025-03-26

### Added

- Added a new frame, `LLMSetToolChoiceFrame`, which provides a mechanism
  for modifying the `tool_choice` in the context.

- Added `GroqTTSService` which provides text-to-speech functionality using
  Groq's API.

- Added support in `DailyTransport` for updating remote participants'
  `canReceive` permission via the `update_remote_participants()` method, by
  bumping the daily-python dependency to >= 0.16.0.

- ElevenLabs TTS services now support a sample rate of 8000.

- Added support for `instructions` in `OpenAITTSService`.

- Added support for `base_url` in `OpenAIImageGenService` and
  `OpenAITTSService`.

### Fixed

- Fixed an issue in `RTVIObserver` that prevented handling of Google LLM
  context messages. The observer now processes both OpenAI-style and
  Google-style contexts.

- Fixed an issue in Daily involving switching virtual devices, by bumping the
  daily-python dependency to >= 0.16.1.

- Fixed a `GoogleAssistantContextAggregator` issue where function call
  placeholders were not being updated when the function call result was
  something other than a string.

- Fixed an issue that would cause `LLMAssistantContextAggregator` to block
  processing more frames while processing a function call result.

- Fixed an issue where the `RTVIObserver` would report two bot started and
  stopped speaking events for each bot turn.

- Fixed an issue in `UltravoxSTTService` that caused improper audio processing
  and incorrect LLM frame output.

### Other

- Added `examples/foundational/07x-interruptible-local.py` to show how a local
  transport can be used.

## [0.0.60] - 2025-03-20

### Added

- Added `default_headers` parameter to `BaseOpenAILLMService` constructor.

### Changed

- Rolled back to `deepgram-sdk` 3.8.0 since 3.10.1 was causing connection
  issues.

- Changed the default `InputAudioTranscription` model to `gpt-4o-transcribe`
  for `OpenAIRealtimeBetaLLMService`.

### Other

- Updated the `19-openai-realtime-beta.py` and `19a-azure-realtime-beta.py`
  examples to use the `FunctionSchema` format.

## [0.0.59] - 2025-03-20

### Added

- When registering a function call it is now possible to indicate if you want
  the function call to be cancelled if there's a user interruption via
  `cancel_on_interruption` (defaults to False). This is now possible because
  function calls are executed concurrently.

- Added support for detecting idle pipelines. By default, if no activity has
  been detected for 5 minutes, the `PipelineTask` will be automatically
  cancelled. It is possible to override this behavior by passing
  `cancel_on_idle_timeout=False`. It is also possible to change the default
  timeout with `idle_timeout_secs` or the frames that prevent the pipeline from
  being idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event
  handler will be triggered if the idle timeout is reached (whether the pipeline
  task is cancelled or not).

- Added `FalSTTService`, which provides STT for Fal's Wizper API.

- Added a `reconnect_on_error` parameter to websocket-based TTS services as well
  as an `on_connection_error` event handler. The `reconnect_on_error` parameter
  indicates whether the TTS service should reconnect on error. The
  `on_connection_error` handler will always get called if there's any error,
  regardless of the value of `reconnect_on_error`. This allows, for example,
  falling back to a different TTS provider if something goes wrong with the
  current one.

- Added new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate
  text and skip end-of-sentence matching if the aggregated text is between
  start/end tags.

- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to
  identify content between matching pattern pairs in streamed text. This allows
  for detection and processing of structured content like XML-style tags that
  may span across multiple text chunks or sentence boundaries.

- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service
  to aggregate LLM tokens and decide when the aggregated text should be pushed
  to the TTS service. They also allow for the text to be manipulated while it's
  being aggregated. A text aggregator can be passed via `text_aggregator` to the
  TTS service.

- Added new `sample_rate` constructor parameter to `TavusVideoService` to allow
  changing the output sample rate.

- Added new `NeuphonicTTSService`.
  (see https://neuphonic.com)

- Added new `UltravoxSTTService`.
  (see https://github.com/fixie-ai/ultravox)

- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event
  handlers to `PipelineTask`. Those events will be called when a frame reaches
  the beginning or end of the pipeline respectively. Note that by default, the
  event handlers will not be called unless a filter is set with
  `PipelineTask.set_reached_upstream_filter()` or
  `PipelineTask.set_reached_downstream_filter()`.

- Added support for Chirp voices in `GoogleTTSService`.

- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.

- Added a `set_language` convenience method for `GoogleSTTService`, allowing
  you to set a single language. This is in addition to the `set_languages`
  method which allows you to set a list of languages.

- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to
  `AudioBufferProcessor`. This gives the ability to grab the audio of only that
  turn for both the user and the bot.

- Added new base class `BaseObject` which is now the base class of
  `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The
  new `BaseObject` adds support for event handlers.

- Added support for a unified format for specifying function calling across all
  LLM services.

  ```python
  from pipecat.adapters.schemas.function_schema import FunctionSchema
  from pipecat.adapters.schemas.tools_schema import ToolsSchema

  weather_function = FunctionSchema(
      name="get_current_weather",
      description="Get the current weather",
      properties={
          "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA",
          },
          "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user's location.",
          },
      },
      required=["location"],
  )
  tools = ToolsSchema(standard_tools=[weather_function])
  ```

- Added `speech_threshold` parameter to `GladiaSTTService`.

- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context
  aggregator parameters when using `create_context_aggregator()`. The values are
  passed as a mapping that will then be converted to arguments.

- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService`.

- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At
  every completion the `on_completion` event handler is triggered.

- Added a new frame, `RTVIServerMessageFrame`, and RTVI message
  `RTVIServerMessage` which provides a generic mechanism for sending custom
  messages from server to client. The `RTVIServerMessageFrame` is processed by
  the `RTVIObserver` and will be delivered to the client's `onServerMessage`
  callback or `ServerMessage` event.

- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an
  OpenAI-compatible interface. Added foundational example
  `14o-function-calling-gemini-openai-format.py`.

- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API.
  Added foundational example `19a-azure-realtime-beta.py`.

- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI
  Gemini models. Added foundational example
  `14p-function-calling-gemini-vertex-ai.py`.

- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:

  - The `'gpt-4o-transcribe'` input audio transcription model, along
    with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property.

    ```python
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        ),
        # ...
    )
    ```

  - The `'semantic_vad'` `turn_detection` session property value, a more
    sophisticated model for detecting when the user has stopped speaking.
  - `on_conversation_item_created` and `on_conversation_item_updated`
    events to `OpenAIRealtimeBetaLLMService`.

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        ...


    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        ...
    ```

  - The `retrieve_conversation_item(item_id)` method for introspecting a
    conversation item on the server.

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```
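
The `PatternPairAggregator` entry describes three steps: buffer text across streamed chunks, extract complete `start...end` pairs, and hold back anything that might be the beginning of a pair. The sketch below walks through those steps. `PatternPairSketch` and its `feed()` method are hypothetical; the real aggregator has its own interface and behavior.

```python
import re
from typing import List, Tuple


class PatternPairSketch:
    """Buffers streamed text and pulls out content between a start/end pattern pair.

    Hypothetical illustration only; PatternPairAggregator has its own API.
    """

    def __init__(self, start: str, end: str):
        self._start = start
        self._pair = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
        self._buffer = ""

    def feed(self, chunk: str) -> Tuple[str, List[str]]:
        """Add a chunk; return (text safe to pass on, contents of completed pairs)."""
        self._buffer += chunk
        found: List[str] = []
        # Remove every complete start...end pair, keeping its inner content.
        match = self._pair.search(self._buffer)
        while match:
            found.append(match.group(1))
            self._buffer = self._buffer[: match.start()] + self._buffer[match.end():]
            match = self._pair.search(self._buffer)
        # Hold back an unclosed pair, or a partial start pattern at the very end.
        cut = self._buffer.find(self._start)
        if cut == -1:
            cut = len(self._buffer)
            for size in range(len(self._start) - 1, 0, -1):
                if self._buffer.endswith(self._start[:size]):
                    cut = len(self._buffer) - size
                    break
        ready, self._buffer = self._buffer[:cut], self._buffer[cut:]
        return ready, found


aggregator = PatternPairSketch("<voice>", "</voice>")
for chunk in ["Hello <voi", "ce>narrator</vo", "ice> there."]:
    print(aggregator.feed(chunk))
# ('Hello ', [])
# ('', [])
# (' there.', ['narrator'])
```

The extracted contents (here `narrator`) are what an application would act on, for example to switch TTS voices as in the `35-voice-switching.py` example.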

### Changed

- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default
  transcription model.

- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.

- Function calls are now executed in tasks. This means that the pipeline will
  not be blocked while the function call is being executed.

- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is
  happening in the pipeline. There are a few settings to configure this
  behavior; see the `PipelineTask` documentation for more details.

- All event handlers are now executed in separate tasks in order to prevent
  blocking the pipeline. Event handlers can take some time to execute, and
  previously the pipeline would be blocked waiting for them to complete.

- Updated `TranscriptProcessor` to support text output from
  `OpenAIRealtimeBetaLLMService`.

- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push
  a `TTSTextFrame`.

- Updated the default model for `CartesiaTTSService` and
  `CartesiaHttpTTSService` to `sonic-2`.
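
This release adds idle-pipeline detection, configured through `idle_timeout_secs`, `idle_timeout_frames`, `cancel_on_idle_timeout` and `on_idle_timeout` under Added. That behavior can be modeled with a small `asyncio` watchdog. This is a hypothetical sketch, not `PipelineTask`'s implementation. `IdleWatchdog`, `frame_seen()` and `run()` are made-up names that reuse the changelog's parameter names only for readability, and frame types are plain strings.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


class IdleWatchdog:
    """Cancels `task` when no activity frame is seen for `idle_timeout_secs`.

    Hypothetical sketch of the idle-timeout rule; not PipelineTask's code.
    """

    def __init__(
        self,
        idle_timeout_secs: float,
        idle_timeout_frames: Iterable[str],
        on_idle_timeout: Callable[[], Awaitable[None]],
        cancel_on_idle_timeout: bool = True,
    ):
        self._timeout = idle_timeout_secs
        self._activity_frames = set(idle_timeout_frames)
        self._on_idle_timeout = on_idle_timeout
        self._cancel = cancel_on_idle_timeout
        self._activity = asyncio.Event()

    def frame_seen(self, frame_type: str) -> None:
        # Only the configured frame types reset the idle timer.
        if frame_type in self._activity_frames:
            self._activity.set()

    async def run(self, task: asyncio.Task) -> None:
        while not task.done():
            try:
                await asyncio.wait_for(self._activity.wait(), self._timeout)
                self._activity.clear()
            except asyncio.TimeoutError:
                # The handler runs whether or not the task is cancelled.
                await self._on_idle_timeout()
                if self._cancel:
                    task.cancel()
                return


async def demo() -> List[str]:
    events: List[str] = []

    async def on_idle() -> None:
        events.append("idle")

    task = asyncio.create_task(asyncio.sleep(10))  # a "pipeline" that never ends
    watchdog = IdleWatchdog(0.2, {"BotSpeakingFrame"}, on_idle)
    watcher = asyncio.create_task(watchdog.run(task))
    for _ in range(3):
        await asyncio.sleep(0.05)
        watchdog.frame_seen("BotSpeakingFrame")  # activity keeps the task alive
        watchdog.frame_seen("AudioRawFrame")  # ignored: not an activity frame
    await watcher
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return events


print(asyncio.run(demo()))  # ['idle', 'cancelled']
```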

### Deprecated

- Passing a `start_callback` to `LLMService.register_function()` is now
  deprecated; simply move the code from the start callback to the function
  call.

- The `TTSService` parameter `text_filter` is now deprecated; use `text_filters`
  instead, which is a list. This allows passing multiple filters that will be
  executed in order.
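
Applying `text_filters` in order means each filter sees the previous filter's output, so the order can change the result. The minimal sketch below assumes filters are plain `str -> str` functions. Pipecat's text filters are objects with their own interface, and `apply_text_filters()` and the two example filters are hypothetical.

```python
from typing import Callable, List

# Hypothetical: filters are modeled as plain str -> str functions here.
TextFilter = Callable[[str], str]


def apply_text_filters(text: str, text_filters: List[TextFilter]) -> str:
    # Each filter receives the output of the previous one, in list order.
    for text_filter in text_filters:
        text = text_filter(text)
    return text


def shout(text: str) -> str:
    return text.upper()


def soften_greeting(text: str) -> str:
    return text.replace("HELLO", "hi")


print(apply_text_filters("hello there", [shout, soften_greeting]))  # hi THERE
print(apply_text_filters("hello there", [soften_greeting, shout]))  # HELLO THERE
```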

### Removed

- Removed deprecated `audio.resample_audio()`, use `create_default_resampler()`
  instead.

- Removed deprecated `stt_service` parameter from `STTMuteFilter`.

- Removed deprecated RTVI processors, use an `RTVIObserver` instead.

- Removed deprecated `AWSTTSService`, use `PollyTTSService` instead.

- Removed deprecated field `tier` from `DailyTranscriptionSettings`, use `model`
  instead.

- Removed deprecated `pipecat.vad` package, use `pipecat.audio.vad` instead.

### Fixed

- Fixed an assistant aggregator issue that could cause assistant text to be
  split into multiple chunks during function calls.

- Fixed an assistant aggregator issue that was causing assistant text to not be
  added to the context during function calls. This could lead to duplications.

- Fixed a `SegmentedSTTService` issue that was causing audio to be sent
  prematurely to the STT service. Instead of analyzing the volume in this
  service, we now rely on VAD events, which take both VAD and volume into
  account.

- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be
  duplicated in the context when pushing `LLMMessagesAppendFrame` frames.

- Fixed an issue with `SegmentedSTTService` based services
  (e.g. `GroqSTTService`) that was not allowing audio to pass through
  downstream.

- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would detect an
  end of sentence inside text between spell-out tags.

- Fixed a `match_endofsentence` issue that would cause floating point numbers
  to be considered an end of sentence.

- Fixed a `match_endofsentence` issue that would cause emails to be considered
  an end of sentence.

- Fixed an issue where the RTVI message `disconnect-bot` was pushing an
  `EndFrame`, resulting in the pipeline not shutting down. It now pushes an
  `EndTaskFrame` upstream to shut down the pipeline.

- Fixed an issue with the `GoogleSTTService` where stream timeouts during
  periods of inactivity were causing connection failures. The service now
  properly detects timeout errors and handles reconnection gracefully,
  ensuring continuous operation even after periods of silence or when using an
  `STTMuteFilter`.

- Fixed an issue in `RimeTTSService` where the last line of text sent didn't
  result in an audio output being generated.

- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
  - The `conversation.item.input_audio_transcription.delta` server message,
    which was added server-side at some point and not handled client-side.
  - Errors reported by the `response.done` server message.
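
The end-of-sentence fixes in this and earlier releases (floating point numbers, emails, ellipses, and colons) all depend on which punctuation counts as a sentence boundary. The hypothetical matcher below shows one way to handle those cases. It is not Pipecat's `match_endofsentence`, whose rules are more involved, and the name `match_end_of_sentence` is made up for this sketch.

```python
import re
from typing import Optional

# A sentence ends at ".", "!", "?" or an ellipsis ("..." or "…") only when
# followed by whitespace, so "3.14" and "jane.doe@example.com" never match and
# a streamed "3." is held until we see what comes next. Colons never match.
_SENTENCE_END = re.compile(r"(?:\.\.\.|…|[.!?])(?=\s)")


def match_end_of_sentence(text: str) -> Optional[int]:
    """Return the index just past the first sentence end in `text`, or None."""
    match = _SENTENCE_END.search(text)
    return match.end() if match else None


for sample in [
    "Pi is 3.14 exactly. Next",
    "Write to jane.doe@example.com soon",
    "Note: this has no ending",
    "Well... okay then",
]:
    end = match_end_of_sentence(sample)
    print(repr(sample[:end]) if end is not None else None)
# 'Pi is 3.14 exactly.'
# None
# None
# 'Well...'
```

Requiring whitespace after the punctuation is a deliberate trade-off in this sketch. The last sentence of a response has no trailing whitespace, so a caller would need to flush any remaining text when the LLM response ends.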

### Other

- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.

- Added a new Ultravox example
  `examples/foundational/07u-interruptible-ultravox.py`.

- Added new Neuphonic examples
  `examples/foundational/07v-interruptible-neuphonic.py` and
  `examples/foundational/07v-interruptible-neuphonic-http.py`.

- Added a new example `examples/foundational/36-user-email-gathering.py` to show
  how to gather user emails. The example uses Cartesia's `<spell></spell>`
  tags and Rime's `spell()` function to spell out the emails for confirmation.

- Updated the `34-audio-recording.py` example to include an STT processor.

- Added foundational example `35-voice-switching.py` showing how to use the new
  `PatternPairAggregator`. This example shows how to encode information for the
  LLM to instruct TTS voice changes, but this can be used to encode any
  information into the LLM response that you want to parse and use in other
  parts of your application.

- Added a Pipecat Cloud deployment example to the `examples` directory.

- Removed foundational examples 28b and 28c as the `TranscriptProcessor` no
  longer has an LLM dependency. Renamed foundational example 28a to
  `28-transcript-processor.py`.

## [0.0.58] - 2025-02-26

### Added

- Added track-specific audio event `on_track_audio_data` to
  `AudioBufferProcessor` for accessing separate input and output audio tracks.

- Pipecat version will now be logged on every application startup. This will
  help us identify what version we are running in case of any issues.

- Added a new `StopFrame` which can be used to stop a pipeline task while
  keeping the frame processors running. The frame processors could then be used
  in a different pipeline. The difference between a `StopFrame` and a
  `StopTaskFrame` is that, as with `EndFrame` and `EndTaskFrame`, the
  `StopFrame` is pushed from the task and the `StopTaskFrame` is pushed upstream
  inside the pipeline by any processor.

- Added a new `PipelineTask` parameter `observers` that replaces the previous
  `PipelineParams.observers`.

- Added a new `PipelineTask` parameter `check_dangling_tasks` to enable or
  disable checking for frame processors' dangling tasks when the Pipeline
  finishes running.

- Added new `on_completion_timeout` event for LLM services (all OpenAI-based
  services, Anthropic and Google). Note that this event will only get triggered
  if LLM timeouts are set up and the timeout was reached. It can be useful to
  retrigger another completion and see if the timeout was just a blip.

- Added new log observers `LLMLogObserver` and `TranscriptionLogObserver` that
  can be useful for debugging your pipelines.

- Added `room_url` property to `DailyTransport`.

- Added `addons` argument to `DeepgramSTTService`.

- Added `exponential_backoff_time()` to `utils.network` module.
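
The sketch below shows generic exponential backoff for retrying a connection. The parameters of Pipecat's `exponential_backoff_time()` are not described in this entry, so `backoff_seconds()` and `connect_with_retries()` are hypothetical helpers, not Pipecat APIs. They illustrate the general technique: the delay doubles with each attempt, capped at a maximum.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


def backoff_seconds(attempt: int, base: float = 1.0, max_wait: float = 30.0) -> float:
    """Delay after failed attempt number `attempt` (1-based), capped at `max_wait`."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(base * 2 ** (attempt - 1), max_wait)


async def connect_with_retries(
    connect: Callable[[], Awaitable[T]], max_attempts: int = 5, base: float = 1.0
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        try:
            return await connect()
        except ConnectionError:
            if attempt >= max_attempts:
                raise  # give up after the last attempt
            await asyncio.sleep(backoff_seconds(attempt, base=base))
            attempt += 1


print([backoff_seconds(n) for n in range(1, 7)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]


async def demo() -> Tuple[str, int]:
    calls = 0

    async def flaky_connect() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("server unavailable")
        return "connected"

    result = await connect_with_retries(flaky_connect, base=0.01)
    return result, calls


print(asyncio.run(demo()))  # ('connected', 3)
```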

### Changed

- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one for
  the pipeline).

- Updated `PlayHTHttpTTSService` to take a `voice_engine` and `protocol` input
  in the constructor. The previous method of providing a `voice_engine` input
  that contains the engine and protocol is deprecated by PlayHT.

- The base `TTSService` class now strips leading newlines before sending text
  to the TTS provider. This change is to solve issues where some TTS providers,
  like Azure, would not output text due to newlines.

- `GrokLLMService` now uses `grok-2` as the default model.

- `AnthropicLLMService` now uses `claude-3-7-sonnet-20250219` as the default
  model.

- `RimeHttpTTSService` needs an `aiohttp.ClientSession` to be passed to the
  constructor, like all the other HTTP-based services.

- `RimeHttpTTSService` doesn't use a default voice anymore.

- `DeepgramSTTService` now uses the new `nova-3` model by default. If you want
  to use the previous model you can pass `LiveOptions(model="nova-2-general")`.
  (see https://deepgram.com/learn/introducing-nova-3-speech-to-text-api)

  ```python
  stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
  ```

### Deprecated

- `PipelineParams.observers` is now deprecated; use the new `PipelineTask`
  parameter `observers`.

### Removed

- Removed `TransportParams.audio_out_is_live` since it was not being used at all.

### Fixed

- Fixed an issue that would cause undesired interruptions via
  `EmulateUserStartedSpeakingFrame`.

- Fixed a `GoogleLLMService` issue that was causing an exception when sending
  inline audio in some cases.

- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to
  disconnect from the TTS service before audio from all the contexts was
  received. This affected services like Cartesia and Rime.

- Fixed an issue that prevented passing an `OpenAILLMContext` to create
  `GoogleLLMService`'s context aggregators.

- Fixed an `ElevenLabsTTSService`, `FishAudioTTSService`, `LMNTTTSService` and
  `PlayHTTTSService` issue that was resulting in audio requested before an
  interruption being played after an interruption.

- Fixed `match_endofsentence` support for ellipses.

- Fixed an issue where `EndTaskFrame` was not triggering
  `on_client_disconnected` or closing the WebSocket in FastAPI.

- Fixed an issue in `DeepgramSTTService` where the `sample_rate` passed to the
  `LiveOptions` was not being used, causing the service to use the pipeline's
  default sample rate.

- Fixed a context aggregator issue that would not append the LLM text response
  to the context if a function call happened in the same LLM turn.

- Fixed an issue that was causing HTTP TTS services to push `TTSStoppedFrame`
  more than once.

- Fixed a `FishAudioTTSService` issue where `TTSStoppedFrame` was not being
  pushed.

- Fixed an issue where `start_callback` was not invoked for some LLM services.

- Fixed an issue that would cause `DeepgramSTTService` to stop working after an
  error occurred (e.g. sudden network loss). If the network recovered, we would
  not reconnect.

- Fixed an `STTMuteFilter` issue that would not mute user audio frames, causing
  transcriptions to be generated by the STT service.

### Other

- Added Gemini support to `examples/phone-chatbot`.

- Added foundational example `34-audio-recording.py` showing how to use the
  `AudioBufferProcessor` callbacks to save merged and track recordings.

## [0.0.57] - 2025-02-14

### Added

- Added new `AudioContextWordTTSService`. This is a base class for TTS
  services that handle multiple separate audio requests.

- Added new frames `EmulateUserStartedSpeakingFrame` and
  `EmulateUserStoppedSpeakingFrame` which can be used to emulate VAD behavior
  when VAD is not present or is not being triggered.

- Added a new `audio_in_stream_on_start` field to `TransportParams`.

- Added a new method `start_audio_in_streaming` in the `BaseInputTransport`.

  - This method should be used to start receiving the input audio in case the
    field `audio_in_stream_on_start` is set to `False`.

- Added support for the `RTVIProcessor` to handle buffered audio in `base64`
  format, converting it into `InputAudioRawFrame` for transport.

- Added support for the `RTVIProcessor` to trigger `start_audio_in_streaming`
  only after the `client-ready` message.

- Added new `MUTE_UNTIL_FIRST_BOT_COMPLETE` strategy to `STTMuteStrategy`. This
  strategy starts muted and remains muted until the first bot speech completes,
  ensuring the bot's first response cannot be interrupted. This complements the
  existing `FIRST_SPEECH` strategy which only mutes during the first detected
  bot speech.

- Added support for Google Cloud Speech-to-Text V2 through `GoogleSTTService`.

- Added `RimeTTSService`, a new `WordTTSService`. Updated the foundational
  example `07q-interruptible-rime.py` to use `RimeTTSService`.

- Added support for Groq's Whisper API through the new `GroqSTTService` and
  OpenAI's Whisper API through the new `OpenAISTTService`. Introduced a new
  base class `BaseWhisperSTTService` to handle common Whisper API
  functionality.

- Added `PerplexityLLMService` for Perplexity NIM API integration, with an
  OpenAI-compatible interface. Also, added foundational example
  `14n-function-calling-perplexity.py`.

- Added `DailyTransport.update_remote_participants()`. This allows you to update
  remote participants' settings, like their permissions or which of their
  devices are enabled. Requires that the local participant have participant
  admin permission.

### Changed

- A colon `:` is no longer considered an end of sentence.

- Updated `DailyTransport` to respect the `audio_in_stream_on_start` field,
  ensuring it only starts receiving the audio input if it is enabled.

- Updated `FastAPIWebsocketOutputTransport` to send `TransportMessageFrame` and
  `TransportMessageUrgentFrame` to the serializer.

- Updated `WebsocketServerOutputTransport` to send `TransportMessageFrame` and
  `TransportMessageUrgentFrame` to the serializer.

- Enhanced `STTMuteConfig` to validate strategy combinations, preventing
  `MUTE_UNTIL_FIRST_BOT_COMPLETE` and `FIRST_SPEECH` from being used together
  as they handle first bot speech differently.

- Updated foundational example `07n-interruptible-google.py` to use all Google
  services.
||
|
||
- `RimeHttpTTSService` now uses the `mistv2` model by default.
|
||
|
||
- Improved error handling in `AzureTTSService` to properly detect and log
|
||
synthesis cancellation errors.
|
||
|
||
- Enhanced `WhisperSTTService` with full language support and improved model
|
||
documentation.
|
||
|
||
- Updated foundation example `14f-function-calling-groq.py` to use
|
||
`GroqSTTService` for transcription.
|
||
|
||
- Updated `GroqLLMService` to use `llama-3.3-70b-versatile` as the default
|
||
model.
|
||
|
||
- `RTVIObserver` doesn't handle `LLMSearchResponseFrame` frames anymore. For
|
||
now, to handle those frames you need to create a `GoogleRTVIObserver`
|
||
instead.
|
||
|
||
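The `STTMuteConfig` validation described above can be illustrated with a standalone sketch that rejects the `MUTE_UNTIL_FIRST_BOT_COMPLETE` + `FIRST_SPEECH` combination. It does not use Pipecat: `MuteStrategy`, `CONFLICTING_PAIRS` and `validate_strategies()` are hypothetical names, and the real `STTMuteConfig` may perform the check differently.

```python
from enum import Enum, auto
from typing import FrozenSet, Set


class MuteStrategy(Enum):
    """Hypothetical stand-in for the STT mute strategies named in this changelog."""

    FIRST_SPEECH = auto()
    MUTE_UNTIL_FIRST_BOT_COMPLETE = auto()
    ALWAYS = auto()
    FUNCTION_CALL = auto()


# Pairs that both control muting around the bot's first speech, so they conflict.
CONFLICTING_PAIRS: Set[FrozenSet[MuteStrategy]] = {
    frozenset({MuteStrategy.FIRST_SPEECH, MuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE}),
}


def validate_strategies(strategies: Set[MuteStrategy]) -> Set[MuteStrategy]:
    """Return `strategies` unchanged, or raise ValueError if a conflicting pair is present."""
    for pair in CONFLICTING_PAIRS:
        if pair <= strategies:
            names = " and ".join(sorted(s.name for s in pair))
            raise ValueError(f"{names} cannot be used together")
    return strategies


# Accepted: these strategies don't overlap.
validate_strategies({MuteStrategy.ALWAYS, MuteStrategy.FUNCTION_CALL})
```

Keeping the conflicts as data means a new incompatible pair can be added without touching the check itself.
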
### Deprecated

- `STTMuteFilter` constructor's `stt_service` parameter is now deprecated and
  will be removed in a future version. The filter now manages mute state
  internally instead of querying the STT service.

- `RTVI.observer()` is now deprecated; instantiate an `RTVIObserver` directly
  instead.

- All RTVI frame processors (e.g. `RTVISpeakingProcessor`,
  `RTVIBotLLMProcessor`) are now deprecated; instantiate an `RTVIObserver`
  instead.

### Fixed

- Fixed a `FalImageGenService` issue that was causing the event loop to be
  blocked while loading the downloaded image.

- Fixed a `CartesiaTTSService` issue that would cause audio overlapping
  in some cases.

- Fixed a websocket-based service issue (e.g. `CartesiaTTSService`) that was
  preventing a reconnection after the server disconnected cleanly, which was
  causing an infinite loop instead.

- Fixed a `BaseOutputTransport` issue that was causing upstream frames not to
  be pushed upstream.

- Fixed multiple issues where user transcriptions were not being handled
  properly. It was possible for short utterances to not trigger VAD, which would
  cause user transcriptions to be ignored. It was also possible for one or more
  transcriptions to be generated after VAD, in which case they would also be
  ignored.

- Fixed an issue that was causing `BotStoppedSpeakingFrame` to be generated too
  late. This could then cause `STTMuteFilter` to be unblocked later than
  desired.

- Fixed an issue that was causing `AudioBufferProcessor` to not record
  synchronized audio.

- Fixed an `RTVI` issue that was causing `bot-tts-text` messages to be sent
  before being processed by the output transport.

- Fixed an issue (#1192) in ElevenLabs where the websocket connection was being
  reconnected or disconnected even when it was already closed.

- Fixed an issue where the `has_regular_messages` condition was always true in
  `GoogleLLMContext` due to `Part` having `function_call` and
  `function_response` with `None` values.

### Other

- Added new `instant-voice` example. This example showcases how to enable
  instant voice communication as soon as a user connects.

- Added new `local-input-select-stt` example. This example lets you experiment
  with local audio inputs by selecting them through a text interface.

## [0.0.56] - 2025-02-06

### Changed

- Changed the default model for `GoogleLLMService` to `gemini-2.0-flash-001`.

- Improved foundational examples 22b, 22c, and 22d to support function calling.
  With these base examples, `FunctionCallInProgressFrame` and
  `FunctionCallResultFrame` will no longer be blocked by the gates.

### Fixed

- Fixed `TkLocalTransport` and `LocalAudioTransport` issues that were causing
  errors on cleanup.

- Fixed an issue that was causing `tests.utils` import to fail because of
  logging setup.

- Fixed a `SentryMetrics` issue that was preventing any metrics from being sent
  to Sentry and was also preventing metrics frames from being pushed to the
  pipeline.

- Fixed an issue in `BaseOutputTransport` where incoming audio would not be
  resampled to the desired output sample rate.

- Fixed an issue with the `TwilioFrameSerializer` and `TelnyxFrameSerializer`
  where `twilio_sample_rate` and `telnyx_sample_rate` were incorrectly
  initialized to `audio_in_sample_rate`. Those values currently default to 8000
  and should be set manually from the serializer constructor if a different
  value is needed.

### Other

- Added a new `sentry-metrics` example.

## [0.0.55] - 2025-02-05

### Added

- Added a new `start_metadata` field to `PipelineParams`. The provided metadata
  will be set to the initial `StartFrame` being pushed from the `PipelineTask`.

- Added new fields to `PipelineParams` to control audio input and output sample
  rates for the whole pipeline. This allows controlling sample rates from a
  single place instead of having to specify sample rates in each
  service. Setting a sample rate to a service is still possible and will
  override the value from `PipelineParams`.

- Introduced audio resamplers (`BaseAudioResampler`). This is just a base class
  to implement audio resamplers. Currently, two implementations are provided:
  `SOXRAudioResampler` and `ResampyResampler`. A new
  `create_default_resampler()` has been added (replacing the now deprecated
  `resample_audio()`).

- It is now possible to specify the asyncio event loop that a `PipelineTask` and
  all the processors should run on by passing it as a new argument to the
  `PipelineRunner`. This could allow running pipelines in multiple threads, each
  one with its own event loop.

- Added a new `utils.TaskManager`. Instead of a global task manager we now have
  a task manager per `PipelineTask`. In the previous version the task manager
  was global, so running multiple simultaneous `PipelineTask`s could result in
  spurious dangling task warnings. In order for all the processors to know
  about the task manager, we pass it through the `StartFrame`. This means that
  processors should create tasks when they receive a `StartFrame` but not
  before (because they don't have a task manager yet).

- Added `TelnyxFrameSerializer` to support Telnyx calls. A full running example
  has also been added to `examples/telnyx-chatbot`.

- Allowed pushing silence audio frames before `TTSStoppedFrame`. This might be
  useful for testing purposes, for example, passing bot audio to an STT service,
  which usually needs additional audio data to detect that the utterance
  stopped.

- `TwilioSerializer` now supports transport message frames. With this we can
  create Twilio emulators.

- Added a new transport: `WebsocketClientTransport`.

- Added a `metadata` field to `Frame` which makes it possible to pass custom
  data to all frames.

- Added `test/utils.py` inside of pipecat package.

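For a sense of scale, the silence pushed before `TTSStoppedFrame` is just zero-valued samples. The sketch below assumes 16-bit signed PCM (2 bytes per sample), which this entry does not state; `silence_bytes()` is a hypothetical helper, not a Pipecat function.

```python
def silence_bytes(duration_secs: float, sample_rate: int, num_channels: int = 1) -> bytes:
    """Return `duration_secs` of silence as 16-bit PCM (all zero bytes)."""
    bytes_per_sample = 2  # assumption: 16-bit PCM
    num_frames = int(duration_secs * sample_rate)
    return b"\x00" * (num_frames * num_channels * bytes_per_sample)


# Half a second of mono silence at 24000 Hz: 12000 samples * 2 bytes = 24000 bytes.
padding = silence_bytes(0.5, 24000)
print(len(padding))  # 24000
```
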
### Changed

- `GatedOpenAILLMContextAggregator` now requires keyword arguments. Also, a new
  `start_open` argument has been added to set the initial state of the gate.

- Added `organization` and `project` level authentication to
  `OpenAILLMService`.

- Improved the language checking logic in `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService` to properly handle language codes based on model
  compatibility, with appropriate warnings when language codes cannot be
  applied.

- Updated `GoogleLLMContext` to support pushing `LLMMessagesUpdateFrame`s that
  contain a combination of function calls, function call responses, system
  messages, or just messages.

- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new
  `OutputDTMFFrame` frame.

### Deprecated

- `resample_audio()` is now deprecated; use `create_default_resampler()`
  instead.

### Removed

- `AudioBufferProcessor.reset_audio_buffers()` has been removed; use
  `AudioBufferProcessor.start_recording()` and
  `AudioBufferProcessor.stop_recording()` instead.

### Fixed

- Fixed an `AudioBufferProcessor` issue that would cause crackling in some
  recordings.

- Fixed an issue in `AudioBufferProcessor` where the user callback would not be
  called on task cancellation.

- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence
  padding in some cases.

- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009
  websocket error by increasing the max message size limit to 16MB.

- Fixed a `DailyTransport` issue that would cause events to be triggered before
  join finished.

- Fixed a `PipelineTask` issue that was preventing processors from being
  cleaned up after cancelling the task.

- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not
  cause the task to finish. However, using `PipelineTask.cancel()` is still the
  recommended way to cancel a task.

### Other

- Improved unit test `run_test()` to use `PipelineTask` and
  `PipelineRunner`. There's now also some control around `StartFrame` and
  `EndFrame`. The `EndTaskFrame` has been removed since it doesn't seem
  necessary with this new approach.

- Updated `twilio-chatbot` with a few new features: use an 8000 Hz sample rate
  and avoid resampling, and a new client useful for stress testing and testing
  locally without the need to make phone calls. Also added audio recording on
  both the client and the server to make sure the audio sounds good.

- Updated examples to use `task.cancel()` to immediately exit the example when a
  participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing
  an `EndFrame` causes the bot to run through everything that is internally
  queued (which could take some seconds). Note that using `task.cancel()` might
  not always be the best option and pushing an `EndFrame` could still be
  desirable to make sure the whole pipeline is flushed.

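The trade-off between `task.cancel()` and pushing an `EndFrame` can be seen in a plain `asyncio` sketch. It does not use Pipecat: the hypothetical `END` sentinel stands in for an `EndFrame`, and the worker stands in for a pipeline working through queued frames.

```python
import asyncio

END = object()  # hypothetical sentinel playing the role of an EndFrame


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Process queued items in order until the END sentinel arrives."""
    while True:
        item = await queue.get()
        if item is END:
            return
        await asyncio.sleep(0.01)  # simulate per-frame work (e.g. playing audio)
        processed.append(item)


async def run(graceful: bool) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    for i in range(5):
        queue.put_nowait(i)
    task = asyncio.create_task(worker(queue, processed))
    if graceful:
        # Like pushing an EndFrame: everything already queued is handled first.
        queue.put_nowait(END)
        await task
    else:
        # Like task.cancel(): stop right away, discarding whatever is still queued.
        await asyncio.sleep(0)
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
    return processed


print(asyncio.run(run(graceful=True)))   # [0, 1, 2, 3, 4]
print(asyncio.run(run(graceful=False)))  # []
```

The graceful path costs time proportional to what is queued, which is why the examples prefer cancelling when a participant has already left.
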
## [0.0.54] - 2025-01-27

### Added

- In order to create tasks in Pipecat frame processors it is now recommended to
  use `FrameProcessor.create_task()` (which uses the new
  `utils.asyncio.create_task()`). It takes care of uncaught exceptions, task
  cancellation handling and task management. To cancel or wait for a task there
  is `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All of
  Pipecat's processors have been updated accordingly. Also, when a pipeline
  runner finishes, a warning about dangling tasks might appear, which indicates
  that some of the created tasks were never cancelled or awaited (using these
  new functions).

- It is now possible to specify the period of the `PipelineTask` heartbeat
  frames with `heartbeats_period_secs`.

- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic models
  for meeting token creation in `get_token` method of `DailyRESTHelper`.

- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.

- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings
  to a custom AWS bucket.

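The pattern behind `FrameProcessor.create_task()` (tracking every task, surfacing uncaught exceptions and reporting tasks left pending) can be sketched with plain `asyncio`. `TaskTracker` and its methods are hypothetical names for illustration; this is not Pipecat's implementation.

```python
import asyncio
import logging
from typing import Coroutine, List, Optional, Set

logger = logging.getLogger(__name__)


class TaskTracker:
    """Hypothetical helper illustrating the idea behind `FrameProcessor.create_task()`."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()

    def create_task(self, coro: Coroutine, name: Optional[str] = None) -> asyncio.Task:
        task = asyncio.get_running_loop().create_task(coro, name=name)
        self._tasks.add(task)
        task.add_done_callback(self._on_done)
        return task

    def _on_done(self, task: asyncio.Task) -> None:
        self._tasks.discard(task)
        # Log uncaught exceptions instead of letting them go unnoticed.
        if not task.cancelled() and task.exception() is not None:
            logger.error("Task %s failed: %r", task.get_name(), task.exception())

    async def cancel_task(self, task: asyncio.Task) -> None:
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

    async def wait_for_task(self, task: asyncio.Task) -> None:
        await task

    def dangling_tasks(self) -> List[str]:
        """Names of tracked tasks that are still pending (never awaited or cancelled)."""
        return sorted(t.get_name() for t in self._tasks if not t.done())


async def main() -> List[str]:
    tracker = TaskTracker()
    quick = tracker.create_task(asyncio.sleep(0), name="quick")
    slow = tracker.create_task(asyncio.sleep(10), name="slow")
    forgotten = tracker.create_task(asyncio.sleep(10), name="forgotten")
    await tracker.wait_for_task(quick)
    await tracker.cancel_task(slow)
    leftovers = tracker.dangling_tasks()  # only "forgotten" is still pending
    await tracker.cancel_task(forgotten)  # tidy up before the loop closes
    return leftovers


print(asyncio.run(main()))  # ['forgotten']
```
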
### Changed

- Enhanced `UserIdleProcessor` with retry functionality and control over idle
  monitoring via a new callback signature `(processor, retry_count) -> bool`.
  Updated `17-detect-user-idle.py` to show how to use the `retry_count`.

- Added defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio
  truncation. Audio truncation errors during interruptions now log a warning
  and allow the session to continue instead of throwing an exception.

- Modified `TranscriptProcessor` to use TTS text frames for more accurate assistant
  transcripts. Assistant messages are now aggregated based on bot speaking boundaries
  rather than LLM context, providing better handling of interruptions and partial
  utterances.

- Updated foundational examples `28a-transcription-processor-openai.py`,
  `28b-transcript-processor-anthropic.py`, and
  `28c-transcription-processor-gemini.py` to use the updated
  `TranscriptProcessor`.

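The `(processor, retry_count) -> bool` contract can be illustrated with a small simulation. `simulate_idle_timeouts()` is a hypothetical stand-in for the idle monitor, counting retries from 1 is an assumption of this sketch, and the escalation policy in `handle_user_idle()` is only an example; see `17-detect-user-idle.py` for the real usage.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical type for the `(processor, retry_count) -> bool` callback contract.
IdleCallback = Callable[[object, int], Awaitable[bool]]


async def simulate_idle_timeouts(
    processor: object, callback: IdleCallback, max_timeouts: int = 10
) -> List[int]:
    """Invoke `callback` once per simulated idle timeout with an increasing retry count.

    Monitoring stops as soon as the callback returns False. Returns the retry
    counts the callback was called with.
    """
    seen: List[int] = []
    for retry_count in range(1, max_timeouts + 1):
        seen.append(retry_count)
        if not await callback(processor, retry_count):
            break
    return seen


async def handle_user_idle(processor: object, retry_count: int) -> bool:
    if retry_count == 1:
        print("Prompting the user: are you still there?")
        return True  # keep monitoring
    if retry_count == 2:
        print("Prompting the user a second time")
        return True
    print("No response; ending the conversation")
    return False  # stop monitoring


print(asyncio.run(simulate_idle_timeouts(None, handle_user_idle)))  # [1, 2, 3]
```
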
### Fixed

- Fixed a `GeminiMultimodalLiveLLMService` issue that was preventing the user
  from pushing initial LLM assistant messages (using `LLMMessagesAppendFrame`).

- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`,
  `ParallelPipeline` and `UserIdleProcessor`.

- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted
  in an error.

- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not
  being processed in cases where the `_user_audio_buffer` was smaller than the
  buffer size.

### Performance

- Replaced audio resampling library `resampy` with `soxr`. Resampling a 2:21s
  audio file from 24 kHz to 16 kHz took 1.41s with `resampy` and 0.031s with
  `soxr`, with similar audio quality.

### Other

- Added initial unit test infrastructure.

## [0.0.53] - 2025-01-18

### Added

- Added `ElevenLabsHttpTTSService` which uses ElevenLabs' HTTP API instead of the
  websocket one.

- Introduced pipeline frame observers. Observers can view all the frames that go
  through the pipeline without the need to inject processors in the
  pipeline. This can be useful, for example, to implement frame loggers or
  debuggers among other things. The example
  `examples/foundational/30-observer.py` shows how to add an observer to a
  pipeline for debugging.

- Introduced heartbeat frames. The pipeline task can now push periodic
  heartbeats down the pipeline when `enable_heartbeats=True`. Heartbeats are
  system frames that are supposed to make it all the way to the end of the
  pipeline. When a heartbeat frame is received, the traversing time (i.e. the
  time it took to go through the whole pipeline) will be displayed (with TRACE
  logging); otherwise a warning will be shown. The example
  `examples/foundational/31-heartbeats.py` shows how to enable heartbeats and
  how to force warnings to be displayed.

- Added `LLMTextFrame` and `TTSTextFrame` which should be pushed by LLM and TTS
  services respectively instead of `TextFrame`s.

- Added `OpenRouter` for OpenRouter integration with an OpenAI-compatible
  interface. Added foundational example `14m-function-calling-openrouter.py`.

- Added a new `WebsocketService` base class for TTS services, containing
  base functions and retry logic.

- Added `DeepSeekLLMService` for DeepSeek integration with an OpenAI-compatible
  interface. Added foundational example `14l-function-calling-deepseek.py`.

- Added `FunctionCallResultProperties` dataclass to provide a structured way to
  control function call behavior, including:

  - `run_llm`: Controls whether to trigger LLM completion
  - `on_context_updated`: Optional callback triggered after context update

- Added a new foundational example `07e-interruptible-playht-http.py` for easy
  testing of `PlayHTHttpTTSService`.

- Added support for Google TTS Journey voices in `GoogleTTSService`.

- Added `29-livekit-audio-chat.py` as a new foundational example for
  `LiveKitTransportLayer`.

- Added `enable_prejoin_ui`, `max_participants` and `start_video_off` params
  to `DailyRoomProperties`.

- Added `session_timeout` to `FastAPIWebsocketTransport` and
  `WebsocketServerTransport` for configuring session timeouts (in
  seconds). Triggers `on_session_timeout` for custom timeout handling.
  See [examples/websocket-server/bot.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/websocket-server/bot.py).

- Added the new modalities option and helper function to set Gemini output
  modalities.

- Added `examples/foundational/26d-gemini-live-text.py`, which uses Gemini
  with the TEXT modality and another provider for TTS.

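Conceptually, an observer is notified on every hop between processors without being part of the processing chain. The toy classes below (`Processor`, `FrameLogger`, `build_pipeline()`) are hypothetical and only sketch the idea; they are not Pipecat's observer API.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Frame:
    name: str


class Observer(Protocol):
    def on_push_frame(self, src: str, dst: str, frame: Frame) -> None: ...


class Processor:
    """Toy processor that forwards every frame to the next processor in the chain."""

    def __init__(self, name: str, observers: List[Observer]) -> None:
        self.name = name
        self.next: Optional["Processor"] = None
        self._observers = observers

    def push_frame(self, frame: Frame) -> None:
        if self.next is None:
            return  # end of the pipeline
        # Observers see each hop without being inserted into the chain themselves.
        for observer in self._observers:
            observer.on_push_frame(self.name, self.next.name, frame)
        self.next.push_frame(frame)


class FrameLogger:
    """An observer that records every hop, e.g. for debugging."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def on_push_frame(self, src: str, dst: str, frame: Frame) -> None:
        self.log.append(f"{src} -> {dst}: {frame.name}")


def build_pipeline(names: List[str], observers: List[Observer]) -> List[Processor]:
    processors = [Processor(name, observers) for name in names]
    for current, following in zip(processors, processors[1:]):
        current.next = following
    return processors


frame_logger = FrameLogger()
pipeline = build_pipeline(["stt", "llm", "tts"], [frame_logger])
pipeline[0].push_frame(Frame("TextFrame"))
print(frame_logger.log)  # ['stt -> llm: TextFrame', 'llm -> tts: TextFrame']
```
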
### Changed

- Modified `UserIdleProcessor` to start monitoring only after first
  conversation activity (`UserStartedSpeakingFrame` or
  `BotStartedSpeakingFrame`) instead of immediately.

- Modified `OpenAIAssistantContextAggregator` to support controlled completions
  and to emit context update callbacks via `FunctionCallResultProperties`.

- Added `aws_session_token` to the `PollyTTSService`.

- Changed the default model for `PlayHTHttpTTSService` to `Play3.0-mini-http`.

- `api_key`, `aws_access_key_id` and `region` are no longer required parameters
  for the `PollyTTSService` (`AWSTTSService`).

- Added `session_timeout` example in `examples/websocket-server/bot.py` to
  handle session timeout event.

- Changed `InputParams` in
  `src/pipecat/services/gemini_multimodal_live/gemini.py` to support different
  modalities.

- Changed `DeepgramSTTService` to send a `finalize` event whenever VAD detects
  `UserStoppedSpeakingFrame`. This helps in faster transcriptions and clearing
  the `Deepgram` audio buffer.

### Fixed

- Fixed an issue where `DeepgramSTTService` was not generating metrics using
  pipeline's VAD.

- Fixed `UserIdleProcessor` not properly propagating `EndFrame`s through the
  pipeline.

- Fixed an issue where websocket based TTS services could incorrectly terminate
  their connection due to a retry counter not resetting.

- Fixed a `PipelineTask` issue that would cause a dangling task after stopping
  the pipeline with an `EndFrame`.

- Fixed an import issue for `PlayHTHttpTTSService`.

- Fixed an issue where languages couldn't be used with the `PlayHTHttpTTSService`.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` audio chunks were hitting
  an error when truncating audio content.

- Fixed an issue where setting the voice and model for `RimeHttpTTSService`
  wasn't working.

- Fixed an issue where `IdleFrameProcessor` and `UserIdleProcessor` were getting
  initialized before the start of the pipeline.

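The retry-counter bug fixed above can be reproduced with a small sketch: if the counter is not reset after a successful connection, earlier failures eat into the budget for later, unrelated errors. `ReconnectingClient` and `scripted_connect()` are hypothetical; this is not the websocket services' actual code.

```python
from typing import Callable, Iterable


class ReconnectingClient:
    """Hypothetical client showing why a retry counter must be reset on success."""

    def __init__(
        self, connect: Callable[[], None], max_retries: int = 3, reset_on_success: bool = True
    ) -> None:
        self._connect = connect  # raises ConnectionError when an attempt fails
        self._max_retries = max_retries
        self._reset_on_success = reset_on_success
        self._retry_count = 0

    def ensure_connected(self) -> bool:
        while self._retry_count < self._max_retries:
            try:
                self._connect()
            except ConnectionError:
                self._retry_count += 1
                continue
            if self._reset_on_success:
                # Forget earlier failures so later, unrelated errors get the full budget.
                self._retry_count = 0
            return True
        return False  # retry budget exhausted: give up on the connection


def scripted_connect(outcomes: Iterable[bool]) -> Callable[[], None]:
    """Build a fake connect() that succeeds or fails according to `outcomes`."""
    it = iter(outcomes)

    def connect() -> None:
        if not next(it):
            raise ConnectionError("connection attempt failed")

    return connect


# Two failures then a success, twice in a row.
outcomes = [False, False, True, False, False, True]
fixed = ReconnectingClient(scripted_connect(outcomes))
print(fixed.ensure_connected(), fixed.ensure_connected())  # True True

buggy = ReconnectingClient(scripted_connect(outcomes), reset_on_success=False)
print(buggy.ensure_connected(), buggy.ensure_connected())  # True False
```
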
## [0.0.52] - 2024-12-24

### Added

- Added constructor arguments to `GoogleLLMService` to directly set `tools` and
  `tool_config`.

- Added a smart turn detection example (`22d-natural-conversation-gemini-audio.py`)
  that leverages Gemini 2.0 capabilities
  (see https://x.com/kwindla/status/1870974144831275410).

- Added `DailyTransport.send_dtmf()` to send dial-out DTMF tones.

- Added `DailyTransport.sip_call_transfer()` to forward SIP and PSTN calls to
  another address or number. For example, transfer a SIP call to a different
  SIP address or transfer a PSTN phone number to a different PSTN phone number.

- Added `DailyTransport.sip_refer()` to transfer incoming SIP/PSTN calls from
  outside Daily to another SIP/PSTN address.

- Added an `auto_mode` input parameter to `ElevenLabsTTSService`. `auto_mode`
  is set to `True` by default. Enabling this setting disables the chunk
  schedule and all buffers, which reduces latency.

- Added `KoalaFilter`, which implements on-device noise reduction using Koala
  Noise Suppression (see https://picovoice.ai/platform/koala/).

- Added `CerebrasLLMService` for Cerebras integration with an OpenAI-compatible
  interface. Added foundational example `14k-function-calling-cerebras.py`.

- Pipecat now supports Python 3.13. We had a dependency on the `audioop` package,
  which was deprecated and has now been removed in Python 3.13. We are now using
  `audioop-lts` (https://github.com/AbstractUmbra/audioop) to provide the same
  functionality.

- Added timestamped conversation transcript support:

  - New `TranscriptProcessor` factory provides access to user and assistant
    transcript processors.
  - `UserTranscriptProcessor` processes user speech with timestamps from
    transcription.
  - `AssistantTranscriptProcessor` processes assistant responses with LLM
    context timestamps.
  - Messages are emitted with ISO 8601 timestamps indicating when they were
    spoken.
  - Supports all LLM formats (OpenAI, Anthropic, Google) via standard message
    format.
  - New examples: `28a-transcription-processor-openai.py`,
    `28b-transcription-processor-anthropic.py`, and
    `28c-transcription-processor-gemini.py`.

- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino,
  Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian,
  Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).

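A sketch of a transcript message carrying an ISO 8601 timestamp, using only the standard library. The `TranscriptMessage` fields (`role`, `content`, `timestamp`) and `make_message()` are assumptions for illustration rather than the transcript processors' exact output.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TranscriptMessage:
    role: str  # "user" or "assistant"
    content: str
    timestamp: str  # ISO 8601 string


def make_message(
    role: str, content: str, spoken_at: Optional[datetime] = None
) -> TranscriptMessage:
    """Build a transcript message stamped with when it was spoken (defaults to now, UTC)."""
    spoken_at = spoken_at or datetime.now(timezone.utc)
    return TranscriptMessage(role=role, content=content, timestamp=spoken_at.isoformat())


msg = make_message("user", "Hello there", datetime(2024, 12, 24, 10, 15, 30, tzinfo=timezone.utc))
print(asdict(msg))
# {'role': 'user', 'content': 'Hello there', 'timestamp': '2024-12-24T10:15:30+00:00'}
```

Timezone-aware ISO 8601 strings parse back losslessly with `datetime.fromisoformat()`, so consumers can sort or diff messages without guessing the clock they came from.
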
### Changed

- `PlayHTTTSService` uses the new v4 websocket API, which also fixes an issue
  where text sent to the TTS service didn't return audio.

- The default model for `ElevenLabsTTSService` is now `eleven_flash_v2_5`.

- `OpenAIRealtimeBetaLLMService` now takes a `model` parameter in the
  constructor.

- Updated the default model for the `OpenAIRealtimeBetaLLMService`.

- Room expiration (`exp`) in `DailyRoomProperties` is now optional (`None`) by
  default instead of automatically setting a 5-minute expiration time. You must
  explicitly set the expiration time if desired.

### Deprecated

- `AWSTTSService` is now deprecated; use `PollyTTSService` instead.

### Fixed

- Fixed token counting in `GoogleLLMService`. Tokens were summed incorrectly
  (double-counted in many cases).

- Fixed an issue that could cause the bot to stop talking if there was a user
  interruption before getting any audio from the TTS service.

- Fixed an issue that would cause `ParallelPipeline` to handle `EndFrame`
  incorrectly, causing the main pipeline to not terminate or to terminate too
  early.

- Fixed an audio stuttering issue in `FastPitchTTSService`.

- Fixed a `BaseOutputTransport` issue that was causing non-audio frames to be
  processed before the previous audio frames were played. This will allow, for
  example, sending a frame `A` after a `TTSSpeakFrame` and the frame `A` will
  only be pushed downstream after the audio generated from `TTSSpeakFrame` has
  been spoken.

- Fixed a `DeepgramSTTService` issue that was causing language to be passed as
  an object instead of a string, resulting in the connection failing.

## [0.0.51] - 2024-12-16

### Fixed

- Fixed an issue in websocket-based TTS services that was causing infinite
  reconnections (Cartesia, ElevenLabs, PlayHT and LMNT).

## [0.0.50] - 2024-12-11

### Added

- Added `GeminiMultimodalLiveLLMService`. This is an integration for Google's
  Gemini Multimodal Live API, supporting:

  - Real-time audio and video input processing
  - Streaming text responses with TTS
  - Audio transcription for both user and bot speech
  - Function calling
  - System instructions and context management
  - Dynamic parameter updates (temperature, top_p, etc.)

- Added `AudioTranscriber` utility class for handling audio transcription with
  Gemini models.

- Added new context classes for Gemini:

  - `GeminiMultimodalLiveContext`
  - `GeminiMultimodalLiveUserContextAggregator`
  - `GeminiMultimodalLiveAssistantContextAggregator`
  - `GeminiMultimodalLiveContextAggregatorPair`

- Added new foundational examples for `GeminiMultimodalLiveLLMService`:

  - `26-gemini-multimodal-live.py`
  - `26a-gemini-live-transcription.py`
  - `26b-gemini-live-video.py`
  - `26c-gemini-live-video.py`

- Added `SimliVideoService`. This is an integration for Simli AI avatars.
  (see https://www.simli.com)

- Added NVIDIA Riva's `FastPitchTTSService` and `ParakeetSTTService`.
  (see https://www.nvidia.com/en-us/ai-data-science/products/riva/)

- Added `IdentityFilter`. This is the simplest frame filter that lets through
  all incoming frames.

- New `STTMuteStrategy` called `FUNCTION_CALL` which mutes the STT service
  during LLM function calls.

- `DeepgramSTTService` now exposes two event handlers `on_speech_started` and
  `on_utterance_end` that could be used to implement interruptions. See new
  example `examples/foundational/07c-interruptible-deepgram-vad.py`.

- Added `GroqLLMService`, `GrokLLMService`, and `NimLLMService` for Groq, Grok,
  and NVIDIA NIM API integration, with an OpenAI-compatible interface.

- New examples demonstrating function calling with Groq, Grok, Azure OpenAI,
  Fireworks, and NVIDIA NIM: `14f-function-calling-groq.py`,
  `14g-function-calling-grok.py`, `14h-function-calling-azure.py`,
  `14i-function-calling-fireworks.py`, and `14j-function-calling-nvidia.py`.

- In order to obtain the audio stored by the `AudioBufferProcessor` you can now
  also register an `on_audio_data` event handler. The `on_audio_data` handler
  will be called every time `buffer_size` (a new constructor argument) is
  reached. If `buffer_size` is 0 (default), you need to get the audio manually,
  as before, using `AudioBufferProcessor.merge_audio_buffers()`.

  ```python
  # `audiobuffer` is your AudioBufferProcessor instance and `save_audio` is
  # your own coroutine that persists the audio (e.g. writes a WAV file).
  @audiobuffer.event_handler("on_audio_data")
  async def on_audio_data(processor, audio, sample_rate, num_channels):
      await save_audio(audio, sample_rate, num_channels)
  ```

- Added a new RTVI message called `disconnect-bot`, which when handled pushes
  an `EndFrame` to trigger the pipeline to stop.

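A minimal sketch of the `buffer_size` behaviour, assuming the handler receives fixed-size chunks and any remainder stays buffered. `ChunkedAudioBuffer` is hypothetical and, unlike the real `on_audio_data` handler, its callback receives only the audio bytes (no processor, sample rate or channel count).

```python
from typing import Callable, List


class ChunkedAudioBuffer:
    """Hypothetical buffer that emits a chunk each time `buffer_size` bytes accumulate."""

    def __init__(self, buffer_size: int, on_audio_data: Callable[[bytes], None]) -> None:
        self.buffer_size = buffer_size
        self.on_audio_data = on_audio_data
        self._buffer = bytearray()

    def add_audio(self, audio: bytes) -> None:
        self._buffer.extend(audio)
        if self.buffer_size <= 0:
            return  # 0 means never flush automatically; the caller reads the buffer itself
        while len(self._buffer) >= self.buffer_size:
            chunk = bytes(self._buffer[: self.buffer_size])
            del self._buffer[: self.buffer_size]
            self.on_audio_data(chunk)

    def get_buffer(self) -> bytes:
        return bytes(self._buffer)


sizes: List[int] = []
buf = ChunkedAudioBuffer(buffer_size=4, on_audio_data=lambda chunk: sizes.append(len(chunk)))
buf.add_audio(b"\x01\x02\x03")              # 3 bytes buffered: below the threshold
buf.add_audio(b"\x04\x05\x06\x07\x08\x09")  # 9 bytes total: two 4-byte chunks emitted
print(sizes, len(buf.get_buffer()))         # [4, 4] 1
```
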
### Changed

- `STTMuteFilter` now supports multiple simultaneous muting strategies.

- `XTTSService` language now defaults to `Language.EN`.

- `SoundfileMixer` doesn't resample input files anymore to avoid startup
  delays. The sample rate of the provided sound files now needs to match the
  sample rate of the output transport.

- Input frames (audio, image and transport messages) are now system frames. This
  means they are processed immediately by all processors instead of being queued
  internally.

- Expanded the `transcriptions.language` module to support a superset of
  languages.

- Updated STT and TTS services with language options that match the supported
  languages for each service.

- Updated the `AzureLLMService` to use the `OpenAILLMService`. Updated the
  `api_version` to `2024-09-01-preview`.

- Updated the `FireworksLLMService` to use the `OpenAILLMService`. Updated the
  default model to `accounts/fireworks/models/firefunction-v2`.

- Updated the `simple-chatbot` example to include a JavaScript and React client
  example, using RTVI JS and React.

### Removed

- Removed `AppFrame`. This was used as a special user custom frame, but there's
  actually no use case for that.

### Fixed

- Fixed a `ParallelPipeline` issue that would cause system frames to be queued.

- Fixed `FastAPIWebsocketTransport` so it can work with binary data (e.g. using
  the protobuf serializer).

- Fixed an issue in `CartesiaTTSService` that could cause previous audio to be
  received after an interruption.

- Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket
  reconnection. Previously, no reconnection happened when an error occurred.

- Fixed a `BaseOutputTransport` issue that was causing audio to be discarded
  after an `EndFrame` was received.

- Fixed an issue in `WebsocketServerTransport` and `FastAPIWebsocketTransport`
  that would cause a busy loop when using an audio mixer.

- Fixed a `DailyTransport` and `LiveKitTransport` issue where connections were
  being closed in the input transport prematurely. This was causing frames
  queued inside the pipeline to be discarded.

- Fixed an issue in `DailyTransport` that would cause some internal callbacks to
  not be executed.

- Fixed an issue where other frames were being processed while a `CancelFrame`
  was being pushed down the pipeline.

- `AudioBufferProcessor` now handles interruptions properly.

- Fixed a `WebsocketServerTransport` issue that would prevent interruptions with
  `TwilioSerializer` from working.

- `DailyTransport.capture_participant_video` now allows capturing a user's
  screen share by simply passing `video_source="screenVideo"`.

- Fixed Google Gemini message handling to properly convert appended messages to
  Gemini's required format.

- Fixed an issue with `FireworksLLMService` where chat completions were
  failing. The `stream_options` are now removed from the chat completion
  options.

## [0.0.49] - 2024-11-17

### Added

- Added RTVI `on_bot_started` event, which is useful in a single-turn
  interaction.

- Added `DailyTransport` events `dialin-connected`, `dialin-stopped`,
  `dialin-error` and `dialin-warning`. Needs daily-python >= 0.13.0.

- Added `RimeHttpTTSService` and the `07q-interruptible-rime.py` foundational
  example.

- Added `STTMuteFilter`, a general-purpose processor that combines STT
  muting and interruption control. When active, it prevents both transcription
  and interruptions during bot speech. The processor supports multiple
  strategies: `FIRST_SPEECH` (mute only during the bot's first
  speech), `ALWAYS` (mute during all bot speech), or `CUSTOM` (using a provided
  callback).

- Added `STTMuteFrame`, a control frame that enables/disables speech
  transcription in STT services.

## [0.0.48] - 2024-11-10 "Antonio release"

### Added

- There's now an input queue in each frame processor. When you call
  `FrameProcessor.push_frame()` this will internally call
  `FrameProcessor.queue_frame()` on the next processor (upstream or downstream)
  and the frame will be internally queued (except system frames). The queued
  frames are then processed in order. This input queue also lets a frame
  processor stop processing further frames by calling
  `FrameProcessor.pause_processing_frames()`. Processing resumes when
  `FrameProcessor.resume_processing_frames()` is called.

- Added audio filter `NoisereduceFilter`.

- Introduced input transport audio filters (`BaseAudioFilter`). Audio filters
  can be used to remove background noise before audio is sent to VAD.

- Introduced output transport audio mixers (`BaseAudioMixer`). Output transport
  audio mixers can be used, for example, to add background sounds or any other
  audio mixing functionality before the output audio is actually written to the
  transport.

- Added `GatedOpenAILLMContextAggregator`. This aggregator keeps the last
  received OpenAI LLM context frame and doesn't let it through until the
  notifier is notified.

- Added `WakeNotifierFilter`. This processor expects a list of frame types and
  executes a given callback predicate when a frame of any of those types is
  being processed. If the callback returns true, the notifier is notified.

- Added `NullFilter`. A null filter doesn't push any frames upstream or
  downstream. This is usually used to disable one of the pipelines in
  `ParallelPipeline`.

- Added `EventNotifier`. This can be used as a very simple synchronization
  feature between processors.

- Added `TavusVideoService`. This is an integration for Tavus digital twins.
  (see https://www.tavus.io/)

- Added `DailyTransport.update_subscriptions()`. This allows you to have
  fine-grained control of what media subscriptions you want for each
  participant in a room.

- Added audio filter `KrispFilter`.

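The input-queue behavior described in the first entry above can be sketched with plain `asyncio`. This is a simplified model, not Pipecat's implementation: `Frame`, `SystemFrame`, `QueuedProcessor` and its methods are hypothetical stand-ins, with `pause_processing_frames()`/`resume_processing_frames()` modeled as an `asyncio.Event` gate.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    name: str


@dataclass
class SystemFrame(Frame):
    """System frames are out-of-band: they skip the input queue."""


class QueuedProcessor:
    """Hypothetical processor with an input queue and a pause/resume gate."""

    def __init__(self) -> None:
        self.processed: list[str] = []
        self._queue: asyncio.Queue = asyncio.Queue()
        self._running = asyncio.Event()
        self._running.set()  # start in the "processing" state
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._process_queue())

    async def queue_frame(self, frame: Frame) -> None:
        if isinstance(frame, SystemFrame):
            self.processed.append(frame.name)  # handled immediately
        else:
            await self._queue.put(frame)       # handled in order by the task

    def pause_processing_frames(self) -> None:
        self._running.clear()

    def resume_processing_frames(self) -> None:
        self._running.set()

    async def drain(self) -> None:
        """Wait until every queued frame has been processed."""
        await self._queue.join()

    async def _process_queue(self) -> None:
        while True:
            frame = await self._queue.get()
            await self._running.wait()  # blocks here while paused
            self.processed.append(frame.name)
            self._queue.task_done()

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


async def demo() -> list[str]:
    proc = QueuedProcessor()
    proc.start()
    proc.pause_processing_frames()
    await proc.queue_frame(Frame("text-1"))
    await proc.queue_frame(SystemFrame("interruption"))
    await asyncio.sleep(0.01)  # "text-1" stays blocked while paused
    proc.resume_processing_frames()
    await proc.drain()
    await proc.stop()
    return proc.processed


print(asyncio.run(demo()))  # ['interruption', 'text-1']
```

The system frame is recorded first even though it arrived after `text-1`, which mirrors the "except system frames" rule in the entry.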
### Changed

- The following `DailyTransport` functions are now `async`, which means they
  need to be awaited: `start_dialout`, `stop_dialout`, `start_recording`,
  `stop_recording`, `capture_participant_transcription` and
  `capture_participant_video`.

- Changed the default output sample rate to 24000 Hz. All TTS services now
  output 24000 Hz audio by default, and the default output transport sample
  rate changed accordingly. This improves audio quality at the cost of some
  extra bandwidth.

- `AzureTTSService` now uses Azure websockets instead of HTTP requests.

- The previous `AzureTTSService` HTTP implementation is now
  `AzureHttpTTSService`.

### Fixed

- Websocket transports (FastAPI and Websocket) now synchronize with the clock
  before sending data. This allows interruptions to work out of the box.

- Improved bot speaking detection for all TTS services by using actual bot
  audio.

- Fixed an issue that was generating constant bot started/stopped speaking
  frames for HTTP TTS services.

- Fixed an issue that was causing stuttering with the AWS TTS service.

- Fixed an issue with `PlayHTTTSService`, where the TTFB metrics were reporting
  very small time values.

- Fixed an issue where `AzureTTSService` wasn't initializing the specified
  language.

### Other

- Added `23-bot-background-sound.py` foundational example.

- Added a new foundational example `22-natural-conversation.py`. This example
  shows how to achieve a more natural conversation by detecting when the user
  ends a statement.

## [0.0.47] - 2024-10-22

### Added

- Added `AssemblyAISTTService` and corresponding foundational examples
  `07o-interruptible-assemblyai.py` and `13d-assemblyai-transcription.py`.

- Added a foundational example for Gladia transcription:
  `13c-gladia-transcription.py`.

### Changed

- Updated `GladiaSTTService` to use the V2 API.

- Changed `DailyTransport` transcription model to `nova-2-general`.

### Fixed

- Fixed an issue that would cause an import error when importing
  `SileroVADAnalyzer` from the old package `pipecat.vad.silero`.

- Fixed `enable_usage_metrics` to control LLM/TTS usage metrics separately
  from `enable_metrics`.

## [0.0.46] - 2024-10-19

### Added

- Added `audio_passthrough` parameter to `STTService`. If enabled, it allows
  audio frames to be pushed downstream in case other processors need them.

- Added input parameter options for `PlayHTTTSService` and
  `PlayHTHttpTTSService`.

### Changed

- Changed `DeepgramSTTService` model to `nova-2-general`.

- Moved `SileroVAD` audio processor to `processors.audio.vad`.

- Module `utils.audio` is now `audio.utils`. A new `resample_audio` function has
  been added.

- `PlayHTTTSService` now uses PlayHT websockets instead of HTTP requests.

- The previous `PlayHTTTSService` HTTP implementation is now
  `PlayHTHttpTTSService`.

- `PlayHTTTSService` and `PlayHTHttpTTSService` now use a `voice_engine` of
  `PlayHT3.0-mini`, which allows for multi-lingual support.

- Renamed `OpenAILLMServiceRealtimeBeta` to `OpenAIRealtimeBetaLLMService` to
  match other services.

### Deprecated

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are
  mostly deprecated; use `OpenAILLMContext` instead.

- The `vad` package is now deprecated and `audio.vad` should be used
  instead. The `vad` package will be removed in a future release.

### Fixed

- Fixed an issue that would cause an error if no VAD analyzer was passed to
  `LiveKitTransport` params.

- Fixed `SileroVAD` processor to support interruptions properly.

### Other

- Added `examples/foundational/07-interruptible-vad.py`. This is the same as
  `07-interruptible.py` but using the `SileroVAD` processor instead of passing
  the `VADAnalyzer` in the transport.

## [0.0.45] - 2024-10-16

### Changed

- Metrics messages have moved out from the transport's base output into RTVI.

## [0.0.44] - 2024-10-15

### Added

- Added support for OpenAI Realtime API with the new
  `OpenAILLMServiceRealtimeBeta` processor.
  (see https://platform.openai.com/docs/guides/realtime/overview)

- Added `RTVIBotTranscriptionProcessor` which will send the RTVI
  `bot-transcription` protocol message. These messages carry TTS text
  aggregated into sentences.

- Added new input params to the `MarkdownTextFilter` utility. You can set
  `filter_code` to filter code from text and `filter_tables` to filter tables
  from text.

- Added `CanonicalMetricsService`. This processor uses the new
  `AudioBufferProcessor` to capture conversation audio and later send it to
  Canonical AI.
  (see https://canonical.chat/)

- Added `AudioBufferProcessor`. This processor can be used to buffer mixed user
  and bot audio. This can later be saved into an audio file or processed by some
  audio analyzer.

- Added `on_first_participant_joined` event to `LiveKitTransport`.

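Buffering "mixed" user and bot audio means combining two mono PCM streams into one. A minimal sketch of that mixing step, assuming both streams are 16-bit signed mono PCM at the same sample rate (an assumption, not something these entries specify), sums the samples and clips to the int16 range. `mix_pcm16()` is a hypothetical helper, not part of `AudioBufferProcessor`'s API.

```python
import array


def mix_pcm16(user: bytes, bot: bytes) -> bytes:
    """Mix two 16-bit mono PCM buffers by summing samples with clipping.

    The shorter buffer is padded with silence so no audio is dropped.
    Uses the host's native byte order (little-endian on most machines).
    """
    a = array.array("h", user)
    b = array.array("h", bot)
    length = max(len(a), len(b))
    a.extend([0] * (length - len(a)))
    b.extend([0] * (length - len(b)))
    mixed = array.array("h", (max(-32768, min(32767, x + y)) for x, y in zip(a, b)))
    return mixed.tobytes()
```

Summing with clipping is the simplest mixing strategy; a mixer could instead scale each stream down before summing to avoid clipping altogether.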
### Changed

- LLM text responses are now logged properly as unicode characters.

- `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`,
  `BotStartedSpeakingFrame`, `BotStoppedSpeakingFrame`, `BotSpeakingFrame` and
  `UserImageRequestFrame` are now based on `SystemFrame`.

### Fixed

- Merged `RTVIBotLLMProcessor`/`RTVIBotLLMTextProcessor` and
  `RTVIBotTTSProcessor`/`RTVIBotTTSTextProcessor` to avoid out-of-order issues.

- Fixed an issue in RTVI protocol that could cause a `bot-llm-stopped` or
  `bot-tts-stopped` message to be sent before a `bot-llm-text` or `bot-tts-text`
  message.

- Fixed `DeepgramSTTService` constructor settings not being merged with default
  ones.

- Fixed an issue in Daily transport that would cause tasks to hang if urgent
  transport messages were being sent from a transport event handler.

- Fixed an issue in `BaseOutputTransport` that would cause `EndFrame` to be
  pushed down too early and call `FrameProcessor.cleanup()` before letting the
  transport stop properly.

## [0.0.43] - 2024-10-10

### Added

- Added a new util called `MarkdownTextFilter` which is a subclass of a new
  base class called `BaseTextFilter`. This is a configurable utility which
  is intended to filter text received by TTS services.

- Added new `RTVIUserLLMTextProcessor`. This processor will send an RTVI
  `user-llm-text` message with the user content that was sent to the LLM.

### Changed

- `TransportMessageFrame` doesn't have an `urgent` field anymore; instead
  there's now a `TransportMessageUrgentFrame`, which is a `SystemFrame` and
  therefore skips all internal queuing.

- TTS services now convert input languages to match each service's language
  format.

### Fixed

- Fixed an issue where changing a language with the Deepgram STT service
  wouldn't apply the change. This was fixed by disconnecting and reconnecting
  when the language changes.

## [0.0.42] - 2024-10-02

### Added

- `SentryMetrics` has been added to report frame processor metrics to
  Sentry. This is now possible because `FrameProcessorMetrics` can now be passed
  to `FrameProcessor`.

- Added Google TTS service and corresponding foundational example
  `07n-interruptible-google.py`.

- Added AWS Polly TTS support and `07m-interruptible-aws.py` as an example.

- Added `InputParams` to Azure TTS service.

- Added `LivekitTransport` (audio-only for now).

- RTVI 0.2.0 is now supported.

- All `FrameProcessors` can now register event handlers.

  ```python
  tts = SomeTTSService(...)  # placeholder for any TTS service

  @tts.event_handler("on_connected")
  async def on_connected(processor):
      ...
  ```

- Added `AsyncGeneratorProcessor`. This processor can be used together with a
  `FrameSerializer` as an async generator. It provides a `generator()` function
  that returns an `AsyncGenerator` that yields serialized frames.

- Added `EndTaskFrame` and `CancelTaskFrame`. These are new frames that are
  meant to be pushed upstream to tell the pipeline task to stop gracefully or
  immediately, respectively.

- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed)
  for OpenAI, Anthropic, and Together AI services along with corresponding
  setter functions.

- Added `sample_rate` as a constructor parameter for TTS services.

- Pipecat has a pipeline-based architecture. The pipeline consists of frame
  processors linked to each other. The elements traveling across the pipeline
  are called frames.

  To have deterministic behavior, the frames traveling through the pipeline
  should always be ordered, except system frames, which are out-of-band
  frames. To achieve that, each frame processor should only output frames from a
  single task.

  In this version all the frame processors have their own task to push
  frames. That is, when `push_frame()` is called the given frame will be put
  into an internal queue (with the exception of system frames) and a frame
  processor task will push it out.

- Added pipeline clocks. A pipeline clock is used by the output transport to
  know when a frame needs to be presented. For that, all frames now have an
  optional `pts` field (presentation timestamp). There's currently just one
  clock implementation, `SystemClock`, and the `pts` field is currently only
  used for `TextFrame`s (audio and image frames will be next).

- A clock can now be specified to `PipelineTask` (defaults to
  `SystemClock`). This clock will be passed to each frame processor via the
  `StartFrame`.

- Added `CartesiaHttpTTSService`.

- `DailyTransport` now supports setting the audio bitrate to improve audio
  quality through the `DailyParams.audio_out_bitrate` parameter. The new
  default is 96kbps.

- `DailyTransport` now uses the number of audio output channels (1 or 2) to set
  mono or stereo audio when needed.

- Interruptions support has been added to `TwilioFrameSerializer` when using
  `FastAPIWebsocketTransport`.

- Added new `LmntTTSService` text-to-speech service.
  (see https://www.lmnt.com/)

- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`,
  and `STTLanguageUpdateFrame` frames to allow you to switch models, language
  and voices in TTS and STT services.

- Added new `transcriptions.Language` enum.

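The presentation-timestamp idea behind pipeline clocks can be illustrated with a small sketch: an output stage holds each frame until the clock reaches its `pts`. This is not Pipecat's `SystemClock` or output transport; `MonotonicClock`, `TimedFrame` and `present_in_order()` are hypothetical names, and `pts` is assumed here to be in nanoseconds relative to the clock's start.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional


class MonotonicClock:
    """Hypothetical clock reporting nanoseconds elapsed since start()."""

    def __init__(self) -> None:
        self._start_ns = 0

    def start(self) -> None:
        self._start_ns = time.monotonic_ns()

    def get_time(self) -> int:
        return time.monotonic_ns() - self._start_ns


@dataclass
class TimedFrame:
    text: str
    pts: Optional[int] = None  # presentation time in ns; None means "send now"


async def present_in_order(frames: list, clock: MonotonicClock) -> list:
    """Emit frames in order, never earlier than their pts.

    Returns (text, clock time in ns when emitted) pairs.
    """
    sent = []
    for frame in frames:
        if frame.pts is not None:
            # Loop in case the event loop wakes us slightly early.
            while (remaining := frame.pts - clock.get_time()) > 0:
                await asyncio.sleep(remaining / 1e9)
        sent.append((frame.text, clock.get_time()))
    return sent


async def demo() -> list:
    clock = MonotonicClock()
    clock.start()
    frames = [TimedFrame("Hello", pts=0), TimedFrame("world", pts=50_000_000)]
    return await present_in_order(frames, clock)


for text, at_ns in asyncio.run(demo()):
    print(f"{text!r} presented at {at_ns / 1e6:.1f} ms")
```

Because the clock is shared, a producer such as a TTS service can stamp word timings as `pts` values and the output stage only has to compare them against `get_time()`.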
### Changed

- Context frames are now pushed downstream from assistant context aggregators.

- Removed Silero VAD torch dependency.

- Consolidated the individual update-settings frame classes into a single
  `ServiceUpdateSettingsFrame` class.

- We now distinguish between input and output audio and image frames. We
  introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame`
  and `OutputImageRawFrame` (and other subclasses of those). The input frames
  usually come from an input transport and are meant to be processed inside the
  pipeline to generate new frames. However, the input frames will not be sent
  through an output transport. The output frames can also be processed by any
  frame processor in the pipeline and they are allowed to be sent by the output
  transport.

- `ParallelTask` has been renamed to `SyncParallelPipeline`. A
  `SyncParallelPipeline` is a frame processor that contains a list of different
  pipelines to be executed concurrently. The difference between a
  `SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame,
  the `SyncParallelPipeline` will wait for all the internal pipelines to
  complete. This is achieved by making sure the last processor in each of the
  pipelines is synchronous (e.g. an HTTP-based service that waits for the
  response).

- `StartFrame` is back to being a system frame to make sure it's processed
  immediately by all processors. `EndFrame` stays a control frame since it needs
  to be ordered, allowing the frames already in the pipeline to be processed
  first.

- Updated `MoondreamService` revision to `2024-08-26`.

- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation
  timestamps to their text output. This allows the output transport to push the
  text frames downstream at almost the same time the words are spoken. We say
  "almost" because currently the audio frames don't have presentation timestamps
  but they should be played at roughly the same time.

- `DailyTransport.on_joined` event now returns the full session data instead of
  just the participant.

- `CartesiaTTSService` is now a subclass of `TTSService`.

- `DeepgramSTTService` is now a subclass of `STTService`.

- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A
  `SegmentedSTTService` is an `STTService` where the provided audio is given in
  a big chunk (i.e. from when the user starts speaking until the user stops
  speaking) instead of a continuous stream.

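The "wait for all internal pipelines" behavior of `SyncParallelPipeline` can be approximated with `asyncio.gather`: the input is fanned out to every branch, and nothing is returned until all branches finish. This is a conceptual sketch only; branches are modeled as plain coroutines, and `sync_parallel()`, `slow_tts()` and `fast_image()` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Branch = Callable[[str], Awaitable[str]]


async def sync_parallel(frame: str, branches: list[Branch]) -> list[str]:
    """Run every branch concurrently on the same input and wait for all.

    Results come back in branch order, regardless of which finished first.
    """
    return await asyncio.gather(*(branch(frame) for branch in branches))


async def slow_tts(text: str) -> str:
    await asyncio.sleep(0.05)  # e.g. an HTTP request awaiting its response
    return f"audio({text})"


async def fast_image(text: str) -> str:
    await asyncio.sleep(0.01)
    return f"image({text})"


print(asyncio.run(sync_parallel("hello", [slow_tts, fast_image])))
# → ['audio(hello)', 'image(hello)']
```

Without that synchronization point, outputs could surface in completion order, so `image(...)` would arrive before `audio(...)`.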
### Fixed

- Fixed OpenAI multiple function calls.

- Fixed a Cartesia TTS issue that would cause audio to be truncated in some
  cases.

- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering
  tasks (after receiving an `EndFrame`) before the internal queue was emptied,
  causing the pipeline to finish prematurely.

- `StartFrame` should be the first frame every processor receives to avoid
  situations where things are not initialized (because initialization happens on
  `StartFrame`) and other frames come in resulting in undesired behavior.

### Performance

- `obj_id()` and `obj_count()` now use `itertools.count`, avoiding the need for
  `threading.Lock`.

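Why `itertools.count` can replace a lock: in CPython with the GIL, `next()` on a `count` object runs entirely in C without releasing the GIL, so two threads never receive the same value. A minimal sketch, where `obj_id()` is a hypothetical helper modeled on the entry above (Pipecat's actual function may differ) and `collect_ids()` is an illustrative stress test:

```python
import itertools
import threading

# One shared counter. In CPython, next() on itertools.count executes in C
# while holding the GIL, so concurrent callers get distinct values.
_id_counter = itertools.count()


def obj_id() -> int:
    """Return a process-wide unique, increasing integer (hypothetical helper)."""
    return next(_id_counter)


def collect_ids(n_threads: int = 8, per_thread: int = 1000) -> list[int]:
    """Call obj_id() from several threads at once and gather the results."""
    results: list[list[int]] = [[] for _ in range(n_threads)]

    def worker(slot: int) -> None:
        results[slot].extend(obj_id() for _ in range(per_thread))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [i for chunk in results for i in chunk]


ids = collect_ids()
print(len(ids), len(set(ids)))  # equal counts: no duplicates without a lock
```

The equivalent lock-based version would wrap an `int += 1` in `with lock:`; the counter object makes that bookkeeping unnecessary.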
### Other

- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).

## [0.0.41] - 2024-08-22

### Added

- Added `LivekitFrameSerializer` audio frame serializer.

### Fixed

- Fixed `FastAPIWebsocketOutputTransport` variable name clash with subclass.

- Fixed an `AnthropicLLMService` issue with empty arguments in function calling.

### Other

- Fixed `studypal` example errors.

## [0.0.40] - 2024-08-20

### Added

- VAD parameters can now be dynamically updated using the
  `VADParamsUpdateFrame`.

- `ErrorFrame` now has a `fatal` field to indicate the bot should exit if a
  fatal error is pushed upstream (false by default). A new `FatalErrorFrame`
  that sets this flag to true has been added.

- `AnthropicLLMService` now supports function calling and has initial support
  for prompt caching.
  (see https://www.anthropic.com/news/prompt-caching)

- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as
  `output_format`.

- `TwilioFrameSerializer` can now specify Twilio's and Pipecat's desired sample
  rates to use.

- Added new `on_participant_updated` event to `DailyTransport`.

- Added `DailyRESTHelper.delete_room_by_name()` and
  `DailyRESTHelper.delete_room_by_url()`.

- Added LLM and TTS usage metrics. Those are enabled when
  `PipelineParams.enable_usage_metrics` is True.

- `AudioRawFrame`s are now pushed downstream from the base output
  transport. This allows capturing the exact words the bot says by adding an STT
  service at the end of the pipeline.

- Added new `GStreamerPipelineSource`. This processor can generate image or
  audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP
  stream or anything supported by GStreamer).

- Added `TransportParams.audio_out_is_live`. This flag is False by default and
  it is useful to indicate we should not synchronize audio with sporadic images.

- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control
  frames. These frames are pushed upstream and they should wrap
  `BotSpeakingFrame`.

- Transports now allow you to register event handlers without decorators.

### Changed

- Added support for RTVI message protocol 0.1. This includes new messages,
  support for message responses, support for actions, configuration, webhooks
  and a bunch of new cool stuff.
  (see https://docs.rtvi.ai/)

- `SileroVAD` dependency is now imported via pip's `silero-vad` package.

- `ElevenLabsTTSService` now uses `eleven_turbo_v2_5` model by default.

- `BotSpeakingFrame` is now a control frame.

- `StartFrame` is now a control frame similar to `EndFrame`.

- `DeepgramTTSService` is now more customizable. You can adjust the encoding and
  sample rate.

### Fixed

- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS really starts and
  stops. This allows for knowing when the bot starts and stops speaking even
  with asynchronous services (like Cartesia).

- Fixed `AzureSTTService` transcription frame timestamps.

- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would
  cause this function to stop working after the initial expiration elapsed.

- Improved `EndFrame` and `CancelFrame` handling. `EndFrame` should end things
  gracefully while a `CancelFrame` should cancel all running tasks as soon as
  possible.

- Fixed an issue in `AIService` that would cause a yielded `None` value to be
  processed.

- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and
  a first participant joins.

- Fixed a `BaseInputTransport` issue that was causing incoming system frames to
  be queued instead of being pushed immediately.

- Fixed a `BaseInputTransport` issue that was causing incoming start/stop
  interruption frames to not cancel tasks and not be processed properly.

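The difference between ending gracefully and cancelling can be shown with a toy worker: an end request is queued behind pending work, so everything already queued gets processed first, while a cancel request stops the worker immediately and discards pending work. This models the semantics described above with plain `asyncio`; `END` and `run_worker()` are hypothetical, not Pipecat's frame classes.

```python
import asyncio

END = object()  # sentinel standing in for an EndFrame


async def run_worker(queue: asyncio.Queue, done: list) -> None:
    """Process items in order until the END sentinel is reached."""
    while True:
        item = await queue.get()
        if item is END:
            return  # graceful: everything queued before END was processed
        await asyncio.sleep(0.01)  # simulate work
        done.append(item)


async def end_gracefully() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    done: list = []
    for item in ("a", "b", "c"):
        queue.put_nowait(item)
    queue.put_nowait(END)  # ordered after pending work, like a control frame
    await run_worker(queue, done)
    return done


async def cancel_immediately() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    done: list = []
    for item in ("a", "b", "c"):
        queue.put_nowait(item)
    task = asyncio.create_task(run_worker(queue, done))
    await asyncio.sleep(0)  # let the worker start on "a"
    task.cancel()           # out-of-band: does not wait for queued items
    try:
        await task
    except asyncio.CancelledError:
        pass
    return done


print(asyncio.run(end_gracefully()))      # ['a', 'b', 'c']
print(asyncio.run(cancel_immediately()))  # []
```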
### Other

- Added `studypal` example (from the Cartesia folks!).

- Most examples now use Cartesia.

- Added examples `foundational/19a-tools-anthropic.py`,
  `foundational/19b-tools-video-anthropic.py` and
  `foundational/19a-tools-togetherai.py`.

- Added examples `foundational/18-gstreamer-filesrc.py` and
  `foundational/18a-gstreamer-videotestsrc.py` that show how to use
  `GStreamerPipelineSource`.

- Removed `requests` library usage.

- Cleaned up examples to use `DailyRESTHelper`.

## [0.0.39] - 2024-07-23

### Fixed

- Fixed a regression introduced in 0.0.38 that would cause Daily transcription
  to stop the Pipeline.

## [0.0.38] - 2024-07-23

### Added

- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and
  `SileroVADAnalyzer`. This allows caching and various GitHub repo validations.

- Added `send_initial_empty_metrics` flag to `PipelineParams` to request
  initial empty metrics (zero values). True by default.

### Fixed

- Fixed initial metrics format. It was using the wrong keys name/time instead of
  processor/value.

- STT services now use ISO 8601 time format for transcription frames.

- Fixed an issue that would cause Daily transport to show a stop transcription
  error when actually none occurred.

## [0.0.37] - 2024-07-22

### Added

- Added `RTVIProcessor` which implements the RTVI-AI standard.
  See https://github.com/rtvi-ai

- Added `BotInterruptionFrame` which allows interrupting the bot while talking.

- Added `LLMMessagesAppendFrame` which allows appending messages to the current
  LLM context.

- Added `LLMMessagesUpdateFrame` which allows replacing the LLM context with the
  one provided in this new frame.

- Added `LLMModelUpdateFrame` which allows updating the LLM model.

- Added `TTSSpeakFrame` which causes the bot to say some text. This text will
  not be part of the LLM context.

- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.

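The two message frames above differ in append versus replace semantics. A minimal sketch over a plain list of OpenAI-style message dicts, where `apply_append()` and `apply_update()` are hypothetical helpers standing in for what a context aggregator does when it receives each frame:

```python
from copy import deepcopy

Message = dict[str, str]


def apply_append(context: list[Message], messages: list[Message]) -> list[Message]:
    """LLMMessagesAppendFrame-style: keep the history, add the new messages."""
    return context + deepcopy(messages)


def apply_update(context: list[Message], messages: list[Message]) -> list[Message]:
    """LLMMessagesUpdateFrame-style: discard the history, use the new messages."""
    return deepcopy(messages)  # `context` is intentionally ignored


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
extra = [{"role": "user", "content": "What's the weather?"}]

print(len(apply_append(history, extra)))  # 3
print(len(apply_update(history, extra)))  # 1
```

Both helpers return new lists rather than mutating `history`, which keeps the sketch easy to reason about; a real aggregator may well update its context in place.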
### Removed

- Removed the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These
  were added in the past to properly handle interruptions for the
  `LLMAssistantContextAggregator`. But the `LLMContextAggregator` is now based
  on `LLMResponseAggregator`, which handles interruptions properly by just
  processing the `StartInterruptionFrame`, so there's no need for these extra
  frames any more.

### Fixed

- Fixed an issue with `StatelessTextTransformer` where it was pushing a string
  instead of a `TextFrame`.

- `TTSService` end of sentence detection has been improved. It now works with
  acronyms, numbers, hours and others.

- Fixed an issue in `TTSService` that would not properly flush the current
  aggregated sentence if an `LLMFullResponseEndFrame` was found.

### Performance

- `CartesiaTTSService` now uses websockets, which improves speed. It also
  leverages the new Cartesia contexts, which maintain generated audio prosody
  when multiple inputs are sent, therefore improving audio quality a lot.

## [0.0.36] - 2024-07-02

### Added

- Added `GladiaSTTService`.
  See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition

- Added `XTTSService`. This is a local Text-To-Speech service.
  See https://github.com/coqui-ai/TTS

- Added `UserIdleProcessor`. This processor can be used to wait for any
  interaction with the user. If the user doesn't say anything within a given
  timeout, a provided callback is called.

- Added `IdleFrameProcessor`. This processor can be used to wait for frames
  within a given timeout. If no frame is received within the timeout, a provided
  callback is called.

- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
  upstream while the bot is talking.

- It is now possible to specify a Silero VAD version when using
  `SileroVADAnalyzer` or `SileroVAD`.

- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services like
  `DeepgramSTTService` need to process things asynchronously. For example, audio
  is sent to Deepgram but transcriptions are not returned immediately. In these
  cases we still require all frames (except system frames) to be pushed
  downstream from a single task. That's what `AsyncFrameProcessor` is for. It
  creates a task and all frames should be pushed from that task. So, whenever a
  new Deepgram transcription is ready, that transcription will also be pushed
  from this internal task.

- The `MetricsFrame` now includes processing metrics if metrics are enabled. The
  processing metrics indicate the time a processor needs to generate all its
  output. Note that not all processors generate this kind of metrics.

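The idle-timeout mechanism behind `UserIdleProcessor` and `IdleFrameProcessor` boils down to "reset a timer on every relevant event; fire a callback if it expires". A minimal sketch using `asyncio.wait_for` on an `asyncio.Event`; the `IdleWatcher` class, its `activity()` method and the callback signature are hypothetical, not the processors' real API.

```python
import asyncio
from typing import Awaitable, Callable


class IdleWatcher:
    """Call on_idle() whenever no activity is seen for `timeout` seconds."""

    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_idle = on_idle
        self._activity = asyncio.Event()

    def activity(self) -> None:
        """Signal activity (e.g. the user started speaking)."""
        self._activity.set()

    async def run(self) -> None:
        while True:
            try:
                await asyncio.wait_for(self._activity.wait(), self._timeout)
                self._activity.clear()  # activity seen: restart the timer
            except asyncio.TimeoutError:
                await self._on_idle()   # timer expired without activity


async def demo() -> int:
    idle_calls = 0

    async def on_idle() -> None:
        nonlocal idle_calls
        idle_calls += 1

    watcher = IdleWatcher(timeout=0.05, on_idle=on_idle)
    task = asyncio.create_task(watcher.run())
    for _ in range(3):         # keep "talking" faster than the timeout
        await asyncio.sleep(0.02)
        watcher.activity()
    await asyncio.sleep(0.08)  # then stay silent longer than the timeout
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return idle_calls


print("idle callbacks:", asyncio.run(demo()))
```

The callback fires only after the silent stretch; the exact count can vary slightly with scheduling jitter.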
### Changed

- `WhisperSTTService` model can now also be a string.

- Added missing \* keyword separators in services.

### Fixed

- `WebsocketServerTransport` doesn't try to send frames anymore if the
  serializer returns `None`.

- Fixed an issue where exceptions that occurred inside frame processors were
  being swallowed and not displayed.

- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
  data to the websocket after being closed.

### Other

- Added Fly.io deployment example in `examples/deployment/flyio-example`.

- Added new `17-detect-user-idle.py` example that shows how to use the new
  `UserIdleProcessor`.

## [0.0.35] - 2024-06-28

### Changed

- `FastAPIWebsocketParams` now requires a serializer.

- `TwilioFrameSerializer` now requires a `streamSid`.

### Fixed

- Fixed the Silero VAD number of frames (samples per analysis window), which
  needs to be 512 for a 16000 Hz sample rate or 256 for 8000 Hz.

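Both window sizes in the Silero VAD fix correspond to the same amount of audio. A quick check of the arithmetic, where `window_ms()` is a hypothetical helper (samples divided by sample rate, expressed in milliseconds):

```python
def window_ms(num_samples: int, sample_rate: int) -> float:
    """Duration of a window of num_samples at sample_rate, in milliseconds."""
    return num_samples * 1000 / sample_rate


print(window_ms(512, 16000))  # 32.0
print(window_ms(256, 8000))   # 32.0
```

So at either sample rate the VAD analyzes 32 ms of audio per window.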
## [0.0.34] - 2024-06-25

### Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
  cause interruptions to ignore transcriptions.

- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate
  shorter output.

## [0.0.33] - 2024-06-25

### Changed

- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now
  expects a voice ID instead of a voice name (you can get the voice ID from
  Cartesia's playground). You can also specify the audio `sample_rate` and
  `encoding` instead of the previous `output_format`.

### Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
  cause static audio issues and interruptions to not work properly when dealing
  with multiple LLM sentences.

- Fixed an issue that could mix new LLM responses with previous ones when
  handling interruptions.

- Fixed a Daily transport blocking situation that occurred while reading audio
  frames after a participant left the room. Needs daily-python >= 0.10.1.

## [0.0.32] - 2024-06-22

### Added

- Allow specifying a `DeepgramSTTService` url, which allows using on-prem
  Deepgram.

- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that
  can be integrated with FastAPI websockets.

- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to
  serialize and deserialize audio frames from Twilio.

- Added Daily transport event: `on_dialout_answered`. See
  https://reference-python.daily.co/api_reference.html#daily.EventHandler

- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.

### Performance

- Converted `BaseInputTransport` and `BaseOutputTransport` to fully use asyncio
  and removed the use of threads.

### Other

- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio
  phone numbers with a Pipecat bot.

- Updated `07f-interruptible-azure.py` to use `AzureLLMService`,
  `AzureSTTService` and `AzureTTSService`.

## [0.0.31] - 2024-06-13

### Performance

- Break long audio frames into 20ms chunks instead of 10ms.

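To make the 20ms chunk size concrete: for 16-bit PCM, a 20 ms chunk holds `sample_rate * 20 / 1000` samples per channel at 2 bytes each. A minimal sketch of splitting a long buffer this way; `chunk_audio()` is a hypothetical helper and 16-bit samples are an assumption.

```python
def chunk_audio(
    audio: bytes, sample_rate: int, num_channels: int = 1, chunk_ms: int = 20
) -> list[bytes]:
    """Split 16-bit PCM audio into chunk_ms-sized pieces (last may be shorter)."""
    bytes_per_sample = 2  # 16-bit PCM
    chunk_size = sample_rate * chunk_ms // 1000 * num_channels * bytes_per_sample
    return [audio[i : i + chunk_size] for i in range(0, len(audio), chunk_size)]


one_second = bytes(16000 * 2)  # 1 s of silence, 16 kHz mono
chunks = chunk_audio(one_second, sample_rate=16000)
print(len(chunks), len(chunks[0]))  # 50 640
```

Smaller chunks generally let an output stage stop sooner when interrupted, at the cost of more write calls per second.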
## [0.0.30] - 2024-06-13

### Added

- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so
  only the initial TTFB metrics after the user stops talking are reported.

- Added `OpenPipeLLMService`. This service will let you run OpenAI through
  OpenPipe's SDK.

- Allow specifying frame processors' name through a new `name` constructor
  argument.

- Added `DeepgramSTTService`. This service has an ongoing websocket
  connection. To handle this, it subclasses `AIService` instead of
  `STTService`. The output of this service will be pushed from the same task,
  except system frames like `StartFrame`, `CancelFrame` or
  `StartInterruptionFrame`.

### Changed

- `FrameSerializer.deserialize()` can now return `None` in case it is not
  possible to deserialize the given data.

- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.

### Fixed

- Fixed an issue where `DailyRoomProperties.exp` always had the same old
  timestamp unless set by the user.

- Fixed a couple of issues with `WebsocketServerTransport`. It needed to use
  `push_audio_frame()` and also VAD was not working properly.

- Fixed an issue that would cause LLM aggregator to fail with small
  `VADParams.stop_secs` values.

- Fixed an issue where `BaseOutputTransport` would send longer audio frames,
  preventing interruptions.

### Other

- Added new `07h-interruptible-openpipe.py` example. This example shows how to
  use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.

- Added new `dialin-chatbot` example. This example shows how to call the bot
  using a phone number.

## [0.0.29] - 2024-06-07

### Added

- Added a new `FunctionFilter`. This filter will let you filter frames based on
  a given function, except system messages which should never be filtered.

- Added `FrameProcessor.can_generate_metrics()` method to indicate if a
  processor can generate metrics. In the future this might get an extra argument
  to ask for a specific type of metric.

- Added `BasePipeline`. All pipeline classes should be based on this class. All
  subclasses should implement a `processors_with_metrics()` method that returns
  a list of all `FrameProcessor`s in the pipeline that can generate metrics.

- Added `enable_metrics` to `PipelineParams`.

- Added `MetricsFrame`. The `MetricsFrame` will report different metrics in the
  system. Right now, it can report TTFB (Time To First Byte) values for
  different services, that is, the time between a `Frame` arriving at the
  processor/service and the first `DataFrame` being pushed downstream. If
  metrics are enabled, an initial `MetricsFrame` with all the services in the
  pipeline will be sent.

- Added TTFB metrics and debug logging for TTS services.

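TTFB as defined in the `MetricsFrame` entry is the delay between an input arriving at a service and that service producing its first output. A minimal sketch of measuring it around an async generator of outputs; `measure_ttfb()` and `fake_tts()` are hypothetical, and `time.perf_counter()` is used as the timer.

```python
import asyncio
import time
from typing import AsyncIterator, Optional


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    """Stand-in for a service: some latency, then several output chunks."""
    await asyncio.sleep(0.05)  # time until the first byte is ready
    for _ in range(3):
        yield b"\x00" * 640
        await asyncio.sleep(0.01)


async def measure_ttfb(outputs: AsyncIterator[bytes]) -> tuple[float, int]:
    """Return (seconds until the first output, total number of outputs)."""
    start = time.perf_counter()  # the input "arrives" here
    ttfb: Optional[float] = None
    count = 0
    async for _chunk in outputs:
        if ttfb is None:
            ttfb = time.perf_counter() - start  # first output produced
        count += 1
    return (ttfb if ttfb is not None else 0.0), count


ttfb, n = asyncio.run(measure_ttfb(fake_tts("Hello")))
print(f"TTFB: {ttfb * 1000:.0f} ms over {n} chunks")
```

Only the first output is timed; later chunks affect the total processing time, which the 0.0.36 processing metrics report separately.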
### Changed

- Moved `ParallelTask` to `pipecat.pipeline.parallel_task`.

### Fixed

- Fixed PlayHT TTS service to work properly asynchronously.

## [0.0.28] - 2024-06-05

### Fixed

- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep
  growing indefinitely.

## [0.0.27] - 2024-06-05
|
||
|
||
### Added
|
||
|
||
- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.
|
||
|
||
## [0.0.26] - 2024-06-05
|
||
|
||
### Added
|
||
|
||
- Added `OpenAITTSService`.
|
||
|
||
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change
|
||
audio sample format and the model to use.
|
||
|
||
- Added `DailyRESTHelper` which helps you create Daily rooms and tokens in an
|
||
easy way.
|
||
|
||
- `PipelineTask` now has a `has_finished()` method to indicate if the task has
|
||
completed. If a task is never ran `has_finished()` will return False.
|
||
|
||
- `PipelineRunner` now supports SIGTERM. If received, the runner will be
|
||
cancelled.
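
Here is a minimal sketch of the SIGTERM behaviour that uses only the standard library. It assumes a Unix event loop, because `loop.add_signal_handler()` is not available on Windows. `pipeline_work()` and `run_until_sigterm()` are hypothetical stand-ins for a pipeline task and its runner. The sketch shows the general pattern, not how `PipelineRunner` is implemented.

```python
import asyncio
import os
import signal


async def pipeline_work() -> None:
    # Hypothetical stand-in for a long-running pipeline task.
    await asyncio.sleep(3600)


async def run_until_sigterm() -> bool:
    """Run the work; cancel it if SIGTERM arrives. Returns True if cancelled."""
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(pipeline_work())
    # Unix only: translate SIGTERM into a cancellation of the running task.
    loop.add_signal_handler(signal.SIGTERM, task.cancel)
    try:
        await asyncio.wait({task})
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
    return task.cancelled()


async def main() -> bool:
    loop = asyncio.get_running_loop()
    # Simulate `kill -TERM <pid>` arriving shortly after startup.
    loop.call_later(0.1, os.kill, os.getpid(), signal.SIGTERM)
    return await run_until_sigterm()


result = asyncio.run(main())
print("cancelled by SIGTERM:", result)
```

Waiting with `asyncio.wait()` and then checking `task.cancelled()` avoids swallowing a cancellation aimed at the runner itself.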

### Fixed

- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` were
  stopping push tasks before pushing `EndFrame` frames, which could cause the
  bots to get stuck.

- Fixed an error closing local audio transports.

- Fixed an issue with Deepgram TTS that was introduced in the previous release.

- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a
  `user` message could be appended after the previous `user` message. Anthropic
  does not allow that because it requires alternating `user` and `assistant`
  messages.
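
Anthropic requires alternating `user` and `assistant` messages. One way to keep a context valid after an interruption is to merge adjacent messages that share a role. The helper below is a hypothetical sketch of that idea, using plain dicts with string `content`. It is not necessarily how `AnthropicLLMService` resolves the problem.

```python
from typing import Dict, List

Message = Dict[str, str]


def merge_consecutive_roles(messages: List[Message]) -> List[Message]:
    """Merge adjacent messages with the same role so roles alternate."""
    merged: List[Message] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Same role twice in a row (e.g. after an interruption):
            # fold the text into the previous message instead.
            merged[-1] = {
                "role": msg["role"],
                "content": merged[-1]["content"] + " " + msg["content"],
            }
        else:
            merged.append(dict(msg))  # copy so the input is not mutated
    return merged


history = [
    {"role": "user", "content": "What's the weather"},
    {"role": "user", "content": "in Los Angeles?"},  # appended after an interruption
    {"role": "assistant", "content": "Sunny and 75F."},
]
print(merge_consecutive_roles(history))
```

Joining with a single space is a simple choice. A fuller implementation might keep structured content blocks instead.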

### Performance

- The `BaseInputTransport` does not pull audio frames from sub-classes
  anymore. Instead, sub-classes now push audio frames into a queue in the base
  class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead
  of 10ms.

- Removed redundant camera input thread from `DailyInputTransport`. This should
  improve performance a little bit when processing participant videos.

- Load Cartesia voice on startup.
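
Moving from 10ms to 20ms frames halves the number of audio frames per second and doubles the size of each one. The arithmetic below assumes 16 kHz, mono, 16-bit PCM for illustration. Those parameters are not taken from `DailyInputTransport`.

```python
def pcm_frame_bytes(sample_rate: int, frame_ms: int,
                    channels: int = 1, sample_width: int = 2) -> int:
    """Size in bytes of one PCM audio frame lasting `frame_ms` milliseconds."""
    samples_per_channel = sample_rate * frame_ms // 1000
    return samples_per_channel * channels * sample_width


for ms in (10, 20):
    # Assumed format: 16 kHz, mono, 16-bit samples.
    size = pcm_frame_bytes(16000, ms)
    print(f"{ms} ms -> {size} bytes per frame, {1000 // ms} frames per second")
```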

## [0.0.25] - 2024-05-31

### Added

- Added `WebsocketServerTransport`. This will create a websocket server and will
  read messages coming from a client. The messages are serialized/deserialized
  with protobufs. See `examples/websocket-server` for a detailed example.

- Added function calling (`LLMService.register_function()`). This will allow the
  LLM to call functions you have registered when needed. For example, if you
  register a function to get the weather in Los Angeles and ask the LLM about
  the weather in Los Angeles, the LLM will call your function.
  See https://platform.openai.com/docs/guides/function-calling

- Added new `LangchainProcessor`.

- Added Cartesia TTS support (https://cartesia.ai/)
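
Function calling boils down to two steps: map a function name the LLM emits to a local callback, then invoke that callback with the LLM-provided arguments. The registry below is a hypothetical, framework-free sketch of that dispatch step. It does not use the signature of `LLMService.register_function()`. It also assumes the model returns arguments as a JSON string, as OpenAI's function-calling responses do.

```python
import json
from typing import Any, Callable, Dict


class ToolRegistry:
    """Hypothetical map from LLM-callable function names to Python callables."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._functions[name] = fn

    def dispatch(self, name: str, arguments_json: str) -> Any:
        # The LLM supplies the arguments as a JSON string: decode and call.
        if name not in self._functions:
            raise KeyError(f"LLM requested unknown function: {name}")
        return self._functions[name](**json.loads(arguments_json))


def get_weather(location: str) -> dict:
    # Hypothetical stub; a real handler would query a weather API.
    return {"location": location, "conditions": "sunny"}


registry = ToolRegistry()
registry.register("get_weather", get_weather)
print(registry.dispatch("get_weather", '{"location": "Los Angeles"}'))
```

The return value would then be sent back to the LLM so it can phrase the final answer.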

### Fixed

- Fixed SileroVAD frame processor.

- Fixed an issue where `camera_out_enabled` would cause high CPU usage if
  no image was provided.

### Performance

- Removed unnecessary audio input tasks.

## [0.0.24] - 2024-05-29

### Added

- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This
  notifies when the Daily room SIP endpoints are ready. This allows integrating
  with third-party services like Twilio.

- Exposed Daily transport `on_app_message` event.

- Added Daily transport `on_call_state_updated` event.

- Added Daily transport `start_recording()`, `stop_recording()` and
  `stop_dialout()`.

### Changed

- Added `PipelineParams`. This replaces the `allow_interruptions` argument in
  `PipelineTask` and will allow adding more parameters in the future.

- Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.

- GoogleLLMService `api_key` argument is now mandatory.

### Fixed

- Daily transport `dialin-ready` no longer blocks and it now handles
  timeouts.

- Fixed AzureLLMService.

## [0.0.23] - 2024-05-23

### Fixed

- Fixed an issue handling Daily transport `dialin-ready` event.

## [0.0.22] - 2024-05-23

### Added

- Added Daily transport `start_dialout()` to be able to make phone or SIP calls.
  See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout

- Added Daily transport support for dial-in use cases.

- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`,
  `on_dialout_error` and `on_dialout_warning`. See
  https://reference-python.daily.co/api_reference.html#daily.EventHandler

## [0.0.21] - 2024-05-22

### Added

- Added vision support to Anthropic service.

- Added `WakeCheckFilter`, which allows you to pass information downstream only
  if you say a certain phrase/word.
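
Here is a simplified, hypothetical model of the wake-phrase idea. Text is dropped until an utterance contains one of the configured phrases, and everything after that passes through. `WakePhraseGate` is an illustrative name. The real `WakeCheckFilter` works on frames and may behave differently, for example in how it handles timeouts or multiple participants. None of that is modelled here.

```python
import re
from typing import Iterable, Optional


class WakePhraseGate:
    """Drops text until a wake phrase is heard, then lets text through."""

    def __init__(self, wake_phrases: Iterable[str]) -> None:
        # Whole-word match, case-insensitive, tolerant of extra spaces.
        self._patterns = [
            re.compile(r"\b" + r"\s+".join(map(re.escape, p.split())) + r"\b",
                       re.IGNORECASE)
            for p in wake_phrases
        ]
        self.awake = False

    def process(self, text: str) -> Optional[str]:
        """Return the text to pass downstream, or None to drop it."""
        if self.awake:
            return text
        for pattern in self._patterns:
            match = pattern.search(text)
            if match:
                self.awake = True
                # Forward only what follows the wake phrase, if anything.
                return text[match.end():].strip(" ,.!?") or None
        return None


gate = WakePhraseGate(["hey robot"])
for utterance in ["what time is it", "Hey robot, what time is it?", "and the weather"]:
    print(repr(gate.process(utterance)))
```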

### Changed

- `FrameSerializer.serialize()` and `FrameSerializer.deserialize()` are now
  `async`.

- `Filter` has been renamed to `FrameFilter` and it's now under
  `processors/filters`.

### Fixed

- Fixed Anthropic service to use new frame types.

- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator`
  that would cause frames after a brief pause to not be pushed to the LLM.

- Clear the audio output buffer if we are interrupted.

- Re-add exponential smoothing after volume calculation. This makes sure the
  volume value being used doesn't fluctuate so much.

## [0.0.20] - 2024-05-22

### Added

- In order to improve interruptions we now compute a loudness level using
  [pyloudnorm](https://github.com/csteinmetz1/pyloudnorm). The audio coming from
  WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC) algorithm
  applied to the signal; however, we don't do that on our local PyAudio
  signals. This means that incoming audio from PyAudio is currently somewhat
  broken. We will fix it in future releases.

### Fixed

- Fixed an issue where `StartInterruptionFrame` would cause
  `LLMUserResponseAggregator` to push the accumulated text, causing the LLM to
  respond in the wrong task. The `StartInterruptionFrame` should not trigger any
  new LLM response because that would be spoken in a different task.

- Fixed an issue where tasks and threads could be paused because the executor
  didn't have more tasks available. This was causing issues when cancelling and
  recreating tasks during interruptions.

## [0.0.19] - 2024-05-20

### Changed

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal
  messages are now exposed through the `messages` property.

### Fixed

- Fixed an issue where `LLMAssistantResponseAggregator` was not accumulating the
  full response but short sentences instead. If there's an interruption, we only
  accumulate what the bot has spoken until now in a long response as well.

## [0.0.18] - 2024-05-20

### Fixed

- Fixed an issue in `DailyOutputTransport` where transport messages were not
  being sent.

## [0.0.17] - 2024-05-19

### Added

- Added `google.generativeai` model support, including vision. This new `google`
  service defaults to using `gemini-1.5-flash-latest`. Example in
  `examples/foundational/12a-describe-video-gemini-flash.py`.

- Added vision support to `openai` service. Example in
  `examples/foundational/12a-describe-video-gemini-flash.py`.

- Added initial interruptions support. The assistant contexts (or aggregators)
  should now be placed after the output transport. This way, only the completed
  spoken context is added to the assistant context.

- Added `VADParams` so you can control voice confidence level and other
  parameters.

- `VADAnalyzer` now uses an exponentially smoothed volume to improve speech
  detection. This is useful when voice confidence is high (because there's
  someone talking near you) but volume is low.
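
Exponential smoothing keeps a running estimate `s` and blends each new volume reading `x` into it as `s = factor * x + (1 - factor) * s`. As a result, one unusually loud or quiet chunk only nudges the value. The sketch below assumes a smoothing factor of 0.2. This changelog does not state the factor `VADAnalyzer` actually uses.

```python
from typing import Iterable, List, Optional


def exp_smooth(values: Iterable[float], factor: float = 0.2) -> List[float]:
    """Return the exponentially smoothed sequence of `values`."""
    smoothed: List[float] = []
    prev: Optional[float] = None
    for x in values:
        # The first reading seeds the estimate; later ones are blended in.
        prev = x if prev is None else factor * x + (1 - factor) * prev
        smoothed.append(prev)
    return smoothed


volumes = [0.1, 0.9, 0.1, 0.1]  # a single loud spike
print([round(v, 3) for v in exp_smooth(volumes)])
```

The spike to 0.9 only lifts the smoothed value to about 0.26, and the value then decays gradually instead of dropping straight back.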

### Fixed

- Fixed an issue where TTSService was not pushing TextFrames downstream.

- Fixed issues with Ctrl-C program termination.

- Fixed an issue that was causing `StopTaskFrame` to not actually exit the
  `PipelineTask`.

## [0.0.16] - 2024-05-16

### Fixed

- `DailyTransport`: don't publish camera and audio tracks if not enabled.

- Fixed an issue in `BaseInputTransport` that was causing frames pushed
  downstream to not be pushed in the right order.

## [0.0.15] - 2024-05-15

### Fixed

- Quick hot fix for receiving `DailyTransportMessage`.

## [0.0.14] - 2024-05-15

### Added

- Added `DailyTransport` event `on_participant_left`.

- Added support for receiving `DailyTransportMessage`.

### Fixed

- Images are now resized to the size of the output camera. This was causing
  images to not be displayed.

- Fixed an issue in `DailyTransport` that would not allow the input processor to
  shut down if no participant ever joined the room.

- Fixed base transports start and stop. In some situations processors would halt
  or not shut down properly.

## [0.0.13] - 2024-05-14

### Changed

- `MoondreamService` argument `model_id` is now `model`.

- `VADAnalyzer` arguments have been renamed for more clarity.

### Fixed

- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that
  could cause some threads to not start properly.

- Fixed `STTService`. Add `max_silence_secs` and `max_buffer_secs` to better
  handle what's being passed to the STT service. Also add exponential smoothing
  to the RMS.

- Fixed `WhisperSTTService`. Add `no_speech_prob` to avoid garbage output text.
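
One way to read the `STTService` fix: buffer audio while someone is speaking, then hand it to the STT service once the trailing silence reaches `max_silence_secs` or the buffer reaches `max_buffer_secs`. `SpeechBuffer`, the RMS threshold and the default durations are assumptions for illustration, not `STTService` internals. Exponential smoothing of the RMS is left out for brevity. Audio is assumed to be 16-bit little-endian mono PCM, and durations are tracked in bytes to avoid float drift.

```python
import math
import struct
from typing import Optional


def rms(chunk: bytes) -> float:
    """RMS of 16-bit little-endian mono PCM, normalised to 0..1."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count) / 32768


class SpeechBuffer:
    """Hypothetical policy: flush on enough trailing silence or a full buffer."""

    def __init__(self, sample_rate: int = 16000, max_silence_secs: float = 0.3,
                 max_buffer_secs: float = 1.5, threshold: float = 0.01) -> None:
        bytes_per_sec = sample_rate * 2  # 16-bit mono
        self._max_silence = round(max_silence_secs * bytes_per_sec)
        self._max_buffer = round(max_buffer_secs * bytes_per_sec)
        self._threshold = threshold
        self._buffer = bytearray()
        self._silence = 0  # bytes of consecutive silence at the end
        self._has_speech = False

    def add(self, chunk: bytes) -> Optional[bytes]:
        """Add a chunk; return the buffered audio once it should be transcribed."""
        loud = rms(chunk) >= self._threshold
        if not loud and not self._has_speech:
            return None  # drop leading silence: nothing to transcribe yet
        self._has_speech = True
        self._buffer.extend(chunk)
        self._silence = 0 if loud else self._silence + len(chunk)
        if self._silence >= self._max_silence or len(self._buffer) >= self._max_buffer:
            out = bytes(self._buffer)
            self._buffer.clear()
            self._silence = 0
            self._has_speech = False
            return out
        return None


speech = struct.pack("<320h", *([8000] * 320))  # 20 ms of constant "speech"
silence = bytes(640)                            # 20 ms of digital silence
buf = SpeechBuffer()
chunks = [silence] * 5 + [speech] * 10 + [silence] * 20
flushed = [out for out in (buf.add(c) for c in chunks) if out is not None]
print(len(flushed), len(flushed[0]))
```

In this run, the ten speech chunks plus the first 300 ms of trailing silence are flushed together. The leading and leftover silence is dropped.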

## [0.0.12] - 2024-05-14

### Added

- Added `DailyTranscriptionSettings` to be able to specify transcription
  settings more easily (e.g. language).

### Other

- Updated `simple-chatbot` with Spanish.

- Added missing dependencies in some of the examples.

## [0.0.11] - 2024-05-13

### Added

- Allow stopping pipeline tasks with new `StopTaskFrame`.

### Changed

- TTS, STT and image generation services now use `AsyncGenerator`.

### Fixed

- `DailyTransport`: allow registering for participant transcriptions even if
  input transport is not initialized yet.

### Other

- Updated `storytelling-chatbot`.

## [0.0.10] - 2024-05-13

### Added

- Added Intel GPU support to `MoondreamService`.

- Added support for sending transport messages (e.g. to communicate with an app
  at the other end of the transport).

- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.

### Fixed

- Fixed Azure services (TTS and image generation).

### Other

- Updated `simple-chatbot`, `moondream-chatbot` and `translation-chatbot`
  examples.

## [0.0.9] - 2024-05-12

### Changed

Many things have changed in this version. Many of the main ideas, such as frames,
processors, services and transports, are still there, but some things have
changed a bit.

- `Frame`s describe the basic units for processing: for example, text, image or
  audio frames, or control frames to indicate a user has started or stopped
  speaking.

- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an
  `ImageRawFrame`) and push new frames downstream or upstream to their linked
  peers.

- `FrameProcessor`s can be linked together. The easiest way is to use the
  `Pipeline`, which is a container for processors. Linking processors allows
  frames to travel upstream or downstream easily.

- `Transport`s are a way to send or receive frames. There can be local
  transports (e.g. local audio or native apps), network transports
  (e.g. websocket) or service transports (e.g. https://daily.co).

- `Pipeline`s are just a processor container for other processors.

- A `PipelineTask` knows how to run a pipeline.

- A `PipelineRunner` can run one or more tasks and it is also used, for example,
  to capture Ctrl-C from the user.
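
The concepts above fit in a toy, synchronous model. Frames flow downstream through linked processors, and a pipeline is itself a processor that links its children in order. The class names mirror the changelog's vocabulary, but the code is hypothetical and far simpler than Pipecat's asynchronous implementation. It has no upstream direction, task or runner.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Base unit of data flowing through the pipeline."""


@dataclass
class TextFrame(Frame):
    text: str


class FrameProcessor:
    def __init__(self) -> None:
        self._next: Optional["FrameProcessor"] = None

    def link(self, other: "FrameProcessor") -> None:
        self._next = other

    def push_frame(self, frame: Frame) -> None:
        # Send a frame downstream to the linked peer, if any.
        if self._next is not None:
            self._next.process_frame(frame)

    def process_frame(self, frame: Frame) -> None:
        self.push_frame(frame)  # default behaviour: pass frames through


class UpperCaser(FrameProcessor):
    def process_frame(self, frame: Frame) -> None:
        if isinstance(frame, TextFrame):
            frame = TextFrame(frame.text.upper())
        self.push_frame(frame)


class Collector(FrameProcessor):
    def __init__(self) -> None:
        super().__init__()
        self.frames: List[Frame] = []

    def process_frame(self, frame: Frame) -> None:
        self.frames.append(frame)


class Pipeline(FrameProcessor):
    """A processor that links the processors it contains, in order."""

    def __init__(self, processors: List[FrameProcessor]) -> None:
        super().__init__()
        for a, b in zip(processors, processors[1:]):
            a.link(b)
        self._first = processors[0]

    def process_frame(self, frame: Frame) -> None:
        self._first.process_frame(frame)


sink = Collector()
pipeline = Pipeline([UpperCaser(), sink])
pipeline.process_frame(TextFrame("hello"))
print(sink.frames)
```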

## [0.0.8] - 2024-04-11

### Added

- Added `FireworksLLMService`.

- Added `InterimTranscriptionFrame` and enabled interim results in
  `DailyTransport` transcriptions.

### Changed

- `FalImageGenService` now uses the new `fal_client` package.

### Fixed

- `FalImageGenService`: use `asyncio.to_thread` to not block the main loop when
  generating images.

- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed
  and received after `UserStoppedSpeakingFrame`).
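
The `FalImageGenService` fix relies on a standard-library pattern. `asyncio.to_thread()` (Python 3.9+) runs a blocking call in a worker thread, so the event loop keeps serving other coroutines. Here is a sketch with a hypothetical blocking `generate_image()` stand-in:

```python
import asyncio
import time
from typing import List, Tuple


def generate_image(prompt: str) -> str:
    # Hypothetical blocking call standing in for a synchronous SDK request.
    time.sleep(0.3)
    return f"image for {prompt!r}"


async def generate(events: List[str]) -> str:
    # Run the blocking function in a worker thread; the event loop stays free.
    image = await asyncio.to_thread(generate_image, "a cat")
    events.append("image done")
    return image


async def heartbeat(events: List[str]) -> None:
    # Other work that keeps running while the image is being generated.
    for _ in range(3):
        events.append("tick")
        await asyncio.sleep(0.05)


async def main() -> Tuple[str, List[str]]:
    events: List[str] = []
    image, _ = await asyncio.gather(generate(events), heartbeat(events))
    return image, events


image, events = asyncio.run(main())
print(image, events)
```

All heartbeat ticks land before the image is ready. If `generate_image()` were called directly on the event loop, it would block the loop and the ticks would only start after it returned.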

## [0.0.7] - 2024-04-10

### Added

- Added `use_cpu` argument to `MoondreamService`.

## [0.0.6] - 2024-04-10

### Added

- Added `FalImageGenService.InputParams`.

- Added `URLImageFrame` and `UserImageFrame`.

- Added `UserImageRequestFrame` to allow requesting an image from a participant.

- Added base `VisionService` and `MoondreamService`.

### Changed

- Don't pass `image_size` to `ImageGenService`; images should have their own size.

- `ImageFrame` now receives a tuple `(width,height)` to specify the size.

- `on_first_other_participant_joined` now gets a participant argument.

### Fixed

- Check if camera, speaker and microphone are enabled before writing to them.

### Performance

- `DailyTransport` only subscribes to the desired participant video track.

## [0.0.5] - 2024-04-06

### Changed

- Use `camera_bitrate` and `camera_framerate`.

- Increase `camera_framerate` to 30 by default.

### Fixed

- Fixed `LocalTransport.read_audio_frames`.

## [0.0.4] - 2024-04-04

### Added

- Added project optional dependencies `[silero,openai,...]`.

### Changed

- Moved transports to their own directory.

- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.

### Fixed

- Don't write to microphone/speaker if not enabled.

### Other

- Added live translation example.

- Fixed foundational examples.

## [0.0.3] - 2024-03-13

### Other

- Added `storybot` and `chatbot` examples.

## [0.0.2] - 2024-03-12

Initial public release.