# Changelog

All notable changes to **Pipecat** will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

<!-- towncrier release notes start -->

## [1.2.1] - 2026-05-15

### Changed

- Changed the default WebSocket endpoints for `GradiumSTTService` and
  `GradiumTTSService` to the region-neutral
  `wss://api.gradium.ai/api/speech/asr` and
  `wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
  traffic to the nearest endpoint. Override the URL to pin to a specific
  region.
  (PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))

### Fixed

- Fixed bot hangs when `filter_incomplete_user_turns` was enabled and the LLM
  responded by calling a tool. The user turn never finalized, so the assistant
  aggregator gated the tool-result context push and the LLM continuation never
  ran. Tool calls now finalize the turn the moment they start, before the
  function dispatches.
  (PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))

## [1.2.0] - 2026-05-14

### Added

- Added a `session_id` field to `RunnerArguments` so bots can log or trace a
  per-session identifier in local development the same way they can in Pipecat
  Cloud. The development runner now mints a UUID at every construction site,
  and paths that already returned a `sessionId` to the caller (Daily `/start`,
  dial-in webhook) share that same UUID with the runner args instead of
  generating two. The SmallWebRTC `/api/offer` endpoint also accepts an
  optional `session_id` query parameter so the `/sessions/{session_id}/...`
  proxy can thread it through.
  (PR [#4385](https://github.com/pipecat-ai/pipecat/pull/4385))
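
  A rough sketch of the shared-identifier idea, not the runner's actual code:
  the UUID is minted once and the same value is handed to both the runner
  arguments and the response returned to the caller. `RunnerArgumentsSketch`
  and `start_session` are hypothetical stand-ins.

    ```python
    import uuid
    from dataclasses import dataclass


    @dataclass
    class RunnerArgumentsSketch:
        # Hypothetical stand-in for RunnerArguments; only the new field is modeled.
        session_id: str


    def start_session():
        # Mint the identifier once, then hand the same value to both consumers.
        session_id = str(uuid.uuid4())
        runner_args = RunnerArgumentsSketch(session_id=session_id)
        response = {"sessionId": session_id}
        return runner_args, response


    runner_args, response = start_session()
    ```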

- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService`
  for controlling Cartesia's server-side text buffering. When unset, Pipecat
  picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE`
  mode (custom buffering — avoids stacking client-side aggregation on top of
  Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode
  (Cartesia's managed buffering applies). Pass an explicit value (0–5000ms) to
  override.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
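
  The default selection described above can be sketched as follows. This is
  an illustration only: `TextAggregationModeSketch` and
  `resolve_max_buffer_delay_ms` are hypothetical names, and raising
  `ValueError` for out-of-range values is an assumption, not confirmed
  service behavior.

    ```python
    from enum import Enum
    from typing import Optional


    class TextAggregationModeSketch(Enum):
        # Stand-in for the two aggregation modes named in the entry.
        SENTENCE = "sentence"
        TOKEN = "token"


    def resolve_max_buffer_delay_ms(
        mode: TextAggregationModeSketch, explicit: Optional[int] = None
    ) -> Optional[int]:
        if explicit is not None:
            # Assumption: values outside the documented 0-5000 ms range are rejected.
            if not 0 <= explicit <= 5000:
                raise ValueError("max_buffer_delay_ms must be between 0 and 5000")
            return explicit
        # SENTENCE mode disables server buffering; TOKEN mode leaves the field unset.
        return 0 if mode is TextAggregationModeSketch.SENTENCE else None


    sentence_default = resolve_max_buffer_delay_ms(TextAggregationModeSketch.SENTENCE)
    token_default = resolve_max_buffer_delay_ms(TextAggregationModeSketch.TOKEN)
    overridden = resolve_max_buffer_delay_ms(TextAggregationModeSketch.TOKEN, 250)
    ```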

- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and
  `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
  Improvement Program. When set, the value is forwarded to Deepgram as a query
  parameter on the speak request. Defaults to `None`, which preserves the
  existing behavior. See https://dpgr.am/deepgram-mip for pricing implications
  before enabling.
  (PR [#4400](https://github.com/pipecat-ai/pipecat/pull/4400))

- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set
  via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that
  appends a developer-role message to the context whenever `LLMSetToolsFrame`
  changes the set of advertised standard tools. Helps the LLM stay coherent
  across mid-conversation tool changes, mitigating several flavors of
  tool-call-related hallucination: calling tools that have been removed,
  avoiding tools that have been re-added, and hallucinating output (made-up
  answers or tool-call-shaped non-tool-calls) when tools are unavailable.
  (PR [#4404](https://github.com/pipecat-ai/pipecat/pull/4404))
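
  To illustrate the idea, the sketch below diffs two sets of tool names and
  builds a developer-role message when they differ. The helper name and the
  message wording are assumptions; the text Pipecat actually appends may
  differ.

    ```python
    from typing import Dict, Iterable, Optional


    def tool_change_message(
        previous: Iterable[str], current: Iterable[str]
    ) -> Optional[Dict[str, str]]:
        before, after = set(previous), set(current)
        added = sorted(after - before)
        removed = sorted(before - after)
        if not added and not removed:
            return None  # nothing changed, so nothing is appended
        notes = []
        if added:
            notes.append("Tools now available: " + ", ".join(added) + ".")
        if removed:
            notes.append("Tools no longer available: " + ", ".join(removed) + ".")
        # Hypothetical wording; only the developer role comes from the entry.
        return {"role": "developer", "content": " ".join(notes)}


    message = tool_change_message(["get_weather", "book_table"], ["get_weather"])
    ```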

- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in
  `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the
  inference-triggered event and suppresses `on_user_turn_stopped`, leaving
  finalization to another strategy in the chain such as
  `LLMTurnCompletionUserTurnStopStrategy`.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
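
  A simplified model of the wrapping behavior, assuming events are plain
  callback lists. `StopStrategySketch` and `DeferredSketch` are hypothetical
  classes, not the Pipecat API; they only show that the wrapper forwards the
  inference-triggered event and drops the turn-stopped event.

    ```python
    from typing import Callable, List


    class StopStrategySketch:
        """Hypothetical stop strategy exposing the two events as callback lists."""

        def __init__(self) -> None:
            self.inference_triggered_handlers: List[Callable[[], None]] = []
            self.turn_stopped_handlers: List[Callable[[], None]] = []

        def end_of_turn_detected(self) -> None:
            # By default both events fire together.
            for handler in self.inference_triggered_handlers:
                handler()
            for handler in self.turn_stopped_handlers:
                handler()


    class DeferredSketch(StopStrategySketch):
        """Re-emits only the inference-triggered event of the wrapped strategy."""

        def __init__(self, inner: StopStrategySketch) -> None:
            super().__init__()
            self.inner = inner
            inner.inference_triggered_handlers.append(self._forward_inference)
            # The inner strategy's turn-stopped event is deliberately not forwarded.

        def _forward_inference(self) -> None:
            for handler in self.inference_triggered_handlers:
                handler()


    events: List[str] = []
    detector = StopStrategySketch()
    wrapped = DeferredSketch(detector)
    wrapped.inference_triggered_handlers.append(lambda: events.append("inference_triggered"))
    wrapped.turn_stopped_handlers.append(lambda: events.append("turn_stopped"))
    detector.end_of_turn_detected()
    ```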

- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop` —
  a generic stop strategy that finalizes the user turn whenever a
  `UserTurnInferenceCompletedFrame` arrives, regardless of which component
  produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base;
  future producers (Flux, custom end-of-turn classifiers, etc.) can use the
  base directly or subclass it to add producer-specific setup.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `on_user_turn_inference_triggered`, a new event on the user turn
  controller, processor, aggregator and stop strategies that fires when a
  strategy has enough signal to start LLM inference. By default it fires
  together with `on_user_turn_stopped`; a gating strategy can fire only the
  inference-triggered event and defer finalization to a peer.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `FilterIncompleteUserTurnStrategies` in
  `pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization
  that wraps the detector chain with `deferred(...)` and appends
  `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case:
  `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass
  `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`.
  When installed, the strategy gates `on_user_turn_stopped` on a
  `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by
  any component that can judge turn completeness — e.g. the
  `UserTurnCompletionLLMServiceMixin` on `✓`). A `finalization_timeout`
  provides a safety net if no completion frame ever arrives.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
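
  The safety-net behavior can be pictured as waiting on a completion signal
  with a timeout. The sketch below uses an `asyncio.Event` in place of the
  completion frame; `finalize_turn` is a hypothetical helper, not the
  strategy's implementation.

    ```python
    import asyncio


    async def finalize_turn(completed: asyncio.Event, finalization_timeout: float) -> str:
        try:
            # Normal path: a completion signal arrives before the timeout.
            await asyncio.wait_for(completed.wait(), timeout=finalization_timeout)
            return "completed"
        except asyncio.TimeoutError:
            # Safety net: finalize anyway so the turn cannot dangle forever.
            return "timeout"


    async def demo():
        confirmed = asyncio.Event()
        confirmed.set()  # the completion signal has already arrived
        never_confirmed = asyncio.Event()  # no completion signal will ever arrive
        return (
            await finalize_turn(confirmed, 0.05),
            await finalize_turn(never_confirmed, 0.05),
        )


    results = asyncio.run(demo())
    ```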

- Added first-class RTVI support for the UI Agent Protocol:
    - Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server
      messages, plus `ui-command` and `ui-task` server-to-client messages,
      with paired `*Data` / `*Message` Pydantic models.
    - Adds built-in command payload models for `Toast`, `Navigate`,
      `ScrollTo`, `Highlight`, `Focus`, `Click`, `SetInputValue`, and
      `SelectText`; matching default handlers live in
      `@pipecat-ai/client-react`.
    - Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`,
      `ui-snapshot`, and `ui-cancel-task` messages.
    - Adds five UI pipeline frames, mirroring the `client-message`
      frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` /
      `RTVIUITaskFrame` for the observer to wrap into outbound
      `UICommandMessage` / `UITaskMessage` envelopes, while the processor
      pushes inbound `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and
      `RTVIUICancelTaskFrame` alongside `on_ui_message`.
    - Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.

  (PR [#4407](https://github.com/pipecat-ai/pipecat/pull/4407))

- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore
  processor now resolve credentials via the standard boto3 provider chain (EC2
  instance profiles, EKS pod roles / IRSA, ECS task roles, SSO,
  `~/.aws/credentials`) when explicit credentials and `AWS_*` environment
  variables are absent. Services running with IAM roles no longer need to
  export static credentials.
  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))

- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can
  bias transcription for both file-based and realtime transcription.
  (PR [#4426](https://github.com/pipecat-ai/pipecat/pull/4426))

- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and
  `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum
  silence duration before the watchdog sends a silence packet to prevent
  dangling turns. The actual threshold is `max(chunk_duration * 2,
  watchdog_min_timeout)`, so it also adapts automatically to the audio chunk
  size in use.
  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))
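
  As a worked example of the threshold formula above (the function name is
  hypothetical; durations are in seconds):

    ```python
    def watchdog_threshold(chunk_duration: float, watchdog_min_timeout: float = 0.5) -> float:
        """Silence to wait before the watchdog sends a silence packet."""
        return max(chunk_duration * 2, watchdog_min_timeout)


    # 20 ms chunks: twice the chunk duration is 0.04 s, so the 0.5 s floor applies.
    small_chunks = watchdog_threshold(0.02)
    # 400 ms chunks: twice the chunk duration exceeds the floor.
    large_chunks = watchdog_threshold(0.4)
    ```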

- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on
  models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini
  2.x); the conversation now continues while the tool runs. On models that
  don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time
  warning explaining the limitation. (Note: a 1008 error can occasionally
  occur on Gemini 2.5 during long-running tool calls; the service reconnects
  automatically.)
  (PR [#4448](https://github.com/pipecat-ai/pipecat/pull/4448))

- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition
  using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint.
  Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is
  VAD-aware, and automatically reconnects on error.
  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))

- Added NVIDIA Magpie TTS services via AWS SageMaker:
  `NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM
  back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream
  with full interruption support via `InterruptibleTTSService`).
  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))

- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`,
  for use with reasoning-capable Realtime models such as `gpt-realtime-2`.
  (PR [#4470](https://github.com/pipecat-ai/pipecat/pull/4470))

- Inworld TTS updates:
    - Added a `delivery_mode` setting (`STABLE`/`BALANCED`/`CREATIVE`) to
      `InworldTTSService` and `InworldHttpTTSService`, enabling the
      stability-vs-creativity tradeoff in `inworld-tts-2`.
    - Added language support to `InworldTTSService` and
      `InworldHttpTTSService`. The `language` setting is now forwarded to the
      API, and a new `language_to_inworld_language()` helper normalizes
      Pipecat `Language` enums to Inworld's BCP-47 locale tags.

  (PR [#4473](https://github.com/pipecat-ai/pipecat/pull/4473))

### Changed

- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the
  generally available `tts-rt-v1`.
  (PR [#4386](https://github.com/pipecat-ai/pipecat/pull/4386))

- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16`
  to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the
  `use_normalized_timestamps` and `max_buffer_delay_ms` fields.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead
  of the deprecated `use_original_timestamps` field. Word timestamps now
  reflect what was actually spoken (post text-normalization and
  pronunciation-dictionary substitution), matching the convention Pipecat uses
  for ElevenLabs. This is a behavior change for `sonic-3` users, who were
  previously receiving timestamps tied to the input transcript.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Broadened `tool_resources` to `app_resources` for easy access not just in
  tool handlers but in other places like custom `FrameProcessor`s. Three
  changes: a rename (`tool_resources` → `app_resources`), a new `app_resources`
  property on `PipelineTask`, and a new `pipeline_task` property on
  `FrameProcessor`. Tool handlers now read `params.app_resources`; custom
  processors read `self.pipeline_task.app_resources`. The previous
  `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and
  `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit
  `DeprecationWarning`s.
  (PR [#4395](https://github.com/pipecat-ai/pipecat/pull/4395))
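
  The deprecated-alias pattern described here typically looks like the sketch
  below. `TaskSketch` is a hypothetical class used only to show the pattern;
  it is not `PipelineTask`.

    ```python
    import warnings


    class TaskSketch:
        def __init__(self, app_resources=None):
            self.app_resources = app_resources

        @property
        def tool_resources(self):
            # Old name keeps working but points callers at the new one.
            warnings.warn(
                "`tool_resources` is deprecated; use `app_resources` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return self.app_resources


    task = TaskSketch(app_resources={"db": "connection"})
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy_value = task.tool_resources
    ```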

- Lowered the per-message log in
  `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App
  messages can be high-frequency and were noisy at debug level; set the loguru
  level to `TRACE` to see them again.
  (PR [#4397](https://github.com/pipecat-ai/pipecat/pull/4397))

- Changed the default model for `GrokRealtimeLLMService` to
  `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The
  previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is
  being removed.
  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))

- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to
  `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`,
  `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing
  users can pin the prior model explicitly via the `model`/`tts_model`
  argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid
  model IDs.
  (PR [#4422](https://github.com/pipecat-ai/pipecat/pull/4422))

- Changed the default model for `GrokLLMService` from `grok-3` to
  `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
  (PR [#4429](https://github.com/pipecat-ai/pipecat/pull/4429))

- `DeepgramFluxSTT` watchdog silence threshold is now dynamic:
  `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms.
  This prevents false silence injections when large audio chunks are sent at
  lower frequency.
  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))

- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the
  turn is complete (on `on_turn_context_completed`) rather than waiting until
  all audio has finished playing back. The `isFinal` message from ElevenLabs is
  now used to signal `TTSStoppedFrame` and clean up the audio context,
  improving turn transition timing.
  (PR [#4433](https://github.com/pipecat-ai/pipecat/pull/4433))

- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio
  encoding by default, which returns audio bytes without headers.
  (PR [#4446](https://github.com/pipecat-ai/pipecat/pull/4446))

- Moved `create_task`, `cancel_task`, the `task_manager` property, and
  `setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom
  `BaseObject` subclasses (turn strategies, controllers, etc.) now inherit
  these methods directly instead of reimplementing the task manager wiring.
  Owners propagate the task manager to their child `BaseObject`s via `await
  child.setup(task_manager)`.
  (PR [#4449](https://github.com/pipecat-ai/pipecat/pull/4449))

- Changed the default OpenAI Realtime input audio transcription model from
  `gpt-4o-transcribe` to `gpt-realtime-whisper` for both
  `OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does
  not accept the `prompt` parameter; if a prompt is supplied alongside
  `gpt-realtime-whisper`, it is dropped automatically and a warning is logged.
  To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or
  `"gpt-4o-mini-transcribe"`).
  (PR [#4450](https://github.com/pipecat-ai/pipecat/pull/4450))
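
  A minimal sketch of the prompt handling, assuming the transcription
  settings are assembled as a plain dict. The helper name, the logger name,
  and the dict shape are illustrative assumptions rather than the services'
  internals.

    ```python
    import logging
    from typing import Dict, Optional

    logger = logging.getLogger("transcription-sketch")

    # Models that do not accept the `prompt` parameter, per the entry above.
    PROMPT_UNSUPPORTED_MODELS = {"gpt-realtime-whisper"}


    def build_transcription_config(
        model: str = "gpt-realtime-whisper", prompt: Optional[str] = None
    ) -> Dict[str, str]:
        config = {"model": model}
        if prompt is not None:
            if model in PROMPT_UNSUPPORTED_MODELS:
                # Drop the prompt instead of sending a field the model rejects.
                logger.warning("Model %s does not accept a prompt; dropping it.", model)
            else:
                config["prompt"] = prompt
        return config


    default_config = build_transcription_config(prompt="Names: Pipecat, Daily")
    pinned_config = build_transcription_config("gpt-4o-transcribe", "Names: Pipecat, Daily")
    ```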

- Updated the default model for `CartesiaTTSService` and
  `CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.
  (PR [#4462](https://github.com/pipecat-ai/pipecat/pull/4462))

- Changed the default model for `OpenAIRealtimeLLMService` from
  `gpt-realtime-1.5` to `gpt-realtime-2`.
  (PR [#4472](https://github.com/pipecat-ai/pipecat/pull/4472))

### Deprecated

- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use
  `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add
  `LLMTurnCompletionUserTurnStopStrategy` to a custom
  `user_turn_strategies.stop`) instead. Setting the legacy flag still works for
  one release: the aggregator emits a `DeprecationWarning` and rewires the
  strategies as if you had passed `FilterIncompleteUserTurnStrategies`
  directly.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the
  `create_file_resampler()` / `create_stream_resampler()` factories).
  Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class
  will be removed in Pipecat 2.0 along with the default `resampy` and `numba`
  dependencies.
  (PR [#4428](https://github.com/pipecat-ai/pipecat/pull/4428))

### Fixed

- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as
  `ErrorFrame`s. The latest API emits a `flush_done` per transcript when
  server-side buffering is disabled; Pipecat now consumes them silently since
  each turn already has its own `context_id`.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`,
  `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance
  (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both
  the class and an instance.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200
  response — one with the API's error text and a second, less informative
  "Unknown error" frame from the outer exception handler. It now pushes a
  single frame that includes the HTTP status code and returns cleanly.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally
  for user turn stop strategies. It is now only imported when
  `default_user_turn_stop_strategies()` is called. This improves startup time
  and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning
  when the default stop strategies are not used.
  (PR [#4393](https://github.com/pipecat-ai/pipecat/pull/4393))

- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was
  stored in `Settings` but never sent to xAI, so every session silently fell
  back to xAI's server-side default. The model is now passed via the `?model=`
  query parameter on the WebSocket URL as xAI's Voice Agent API requires.
  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))

- Fixed `on_user_turn_stopped` firing prematurely when
  `filter_incomplete_user_turns` was enabled. The event now fires only after
  the LLM confirms the user turn is complete (`✓`); previously the smart-turn
  detector's tentative stop was bubbling up before the LLM had a chance to veto
  it, causing observers, transcript appenders and UI indicators to receive an
  early — and sometimes duplicated — signal.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting
  across two assistant messages in the LLM context and not surfacing in
  `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted
  at the end of a TTS context now carries a PTS just past the last word so it
  can't overtake clock-queued `TTSTextFrame`s in the transport's output, and
  `LLMAssistantAggregator` now triggers
  `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the
  frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting
  transcripts).
  (PR [#4414](https://github.com/pipecat-ai/pipecat/pull/4414))

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged
  words (e.g. `bookLook`) when using Flash models. Flash often splits sentences
  mid-stream into alignment chunks that begin with a real inter-word space, but
  the previous fix unconditionally stripped that space from every chunk.
  Leading spaces are now stripped only on the first alignment chunk of an
  utterance, so subsequent chunks correctly flush partial words across
  boundaries.
  (PR [#4415](https://github.com/pipecat-ai/pipecat/pull/4415))
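
  The difference between the two behaviors can be shown with plain strings.
  The chunk contents and the `join_alignment_chunks` helper are made up for
  illustration; real alignment data is per-character and carries timestamps.

    ```python
    from typing import List


    def join_alignment_chunks(chunks: List[str]) -> str:
        parts = []
        for index, chunk in enumerate(chunks):
            # Only the first chunk of an utterance has its leading spaces removed;
            # later chunks begin with a real inter-word space that must survive.
            parts.append(chunk.lstrip(" ") if index == 0 else chunk)
        return "".join(parts)


    chunks = [" Read the book.", " Look here."]
    fixed = join_alignment_chunks(chunks)
    # Previous behavior: stripping every chunk merges words across the boundary.
    merged = "".join(chunk.lstrip(" ") for chunk in chunks)
    ```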

- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor
  erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
  was set in the environment. The half-populated kwargs are no longer forwarded
  to aioboto3; partial env-var configurations now fall through to the boto3
  credential chain like fully-unset configurations do.
  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))
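
  One way to picture the fix, as a sketch under assumptions: explicit
  credential kwargs are built only when both halves of the key pair are
  present, and an empty dict leaves resolution to the boto3 provider chain.
  `explicit_credential_kwargs` is a hypothetical helper; the kwarg names
  match the standard boto3 session arguments.

    ```python
    from typing import Dict, Mapping


    def explicit_credential_kwargs(env: Mapping[str, str]) -> Dict[str, str]:
        access_key = env.get("AWS_ACCESS_KEY_ID")
        secret_key = env.get("AWS_SECRET_ACCESS_KEY")
        if not (access_key and secret_key):
            # Half-populated or unset: pass nothing and let the chain resolve.
            return {}
        kwargs = {
            "aws_access_key_id": access_key,
            "aws_secret_access_key": secret_key,
        }
        session_token = env.get("AWS_SESSION_TOKEN")
        if session_token:
            kwargs["aws_session_token"] = session_token
        return kwargs


    partial = explicit_credential_kwargs({"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE"})
    complete = explicit_credential_kwargs(
        {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"}
    )
    ```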

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing
  romanized/normalized text to the LLM context. With non-Latin input (e.g.,
  Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao
  !` instead of `你好！`), which then degraded subsequent LLM turns. The services
  now consume `alignment` by default and only switch to `normalizedAlignment` /
  `normalized_alignment` when `pronunciation_dictionary_locators` is configured
  (where `alignment` has overlapping restarts that produce duplicated/garbled
  words, per #4316). Both fields are read with preferred-with-fallback
  semantics since each is nullable per the API schema.
  (PR [#4424](https://github.com/pipecat-ai/pipecat/pull/4424))

- Fixed a deadlock in `TTSService` that could permanently stall pipeline
  processing when all three conditions occurred together:
  `pause_frame_processing=True`, an interruption arrived before any TTS audio
  was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`,
  `FunctionCallResultFrame`) was in the processing queue at that moment. The
  process task would block on `__process_event.wait()` indefinitely because
  `BotStoppedSpeakingFrame` never arrives (no audio was played) and the
  interruption handler did not resume processing. Affects services using
  `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and
  ResembleAI.
  (PR [#4431](https://github.com/pipecat-ai/pipecat/pull/4431))

- Fixed interruptions being delayed when a slow interruptible frame was being
  processed and an uninterruptible frame was waiting in the queue. The bot
  would stall until the slow frame finished instead of cancelling it
  immediately on interruption.
  (PR [#4434](https://github.com/pipecat-ai/pipecat/pull/4434))

- Fixed `TTSService` dropping uninterruptible frames (e.g.
  `FunctionCallResultFrame`) from its internal serialization queue when an
  interruption occurs. Previously, the queue was recreated on every
  interruption, silently discarding any queued frames. The queue is now reset
  instead of recreated, preserving uninterruptible frames so they are always
  delivered downstream.
  (PR [#4435](https://github.com/pipecat-ai/pipecat/pull/4435))
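
  The reset-versus-recreate distinction, sketched with a `deque` standing in
  for the service's internal queue (the entry does not specify the queue
  type, and `FrameSketch` and `reset_queue` are hypothetical):

    ```python
    from collections import deque
    from dataclasses import dataclass


    @dataclass
    class FrameSketch:
        name: str
        uninterruptible: bool = False


    def reset_queue(queue: deque) -> None:
        """Drop interruptible frames in place, keeping uninterruptible ones in order."""
        kept = [frame for frame in queue if frame.uninterruptible]
        queue.clear()
        queue.extend(kept)


    queue = deque(
        [
            FrameSketch("TTSTextFrame"),
            FrameSketch("FunctionCallResultFrame", uninterruptible=True),
            FrameSketch("TTSAudioRawFrame"),
        ]
    )
    reset_queue(queue)  # called on interruption instead of creating a new queue
    remaining = [frame.name for frame in queue]
    ```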

- Fixed a race condition in the Daily transport that caused `AttributeError:
  'NoneType' object has no attribute 'send_app_message'` when tearing down a
  pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the
  same `DailyTransportClient` and both call `cleanup()`, which was releasing
  the underlying `CallClient` on the first call — leaving the second caller
  with a `None` client.
  (PR [#4440](https://github.com/pipecat-ai/pipecat/pull/4440))

- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService`
  and `OpenAIRealtimeLLMService`. These services previously honored the flag by
  simply not cancelling in-flight function calls on interruption; the
  introduction of the new async-tool mechanism (which threads
  started/intermediate/final messages through the LLM context) broke that path
  because the realtime services didn't know how to interpret those messages.
  Note that new-style streamed intermediate results
  (`FunctionCallResultProperties(is_final=False)`) are not supported on these
  realtime services. Similar fixes for other impacted realtime services are
  forthcoming.
  (PR [#4441](https://github.com/pipecat-ai/pipecat/pull/4441))

- Fixed two misspelled Gemini TTS voice names in
  `GeminiTTSService.AVAILABLE_VOICES`.
  (PR [#4443](https://github.com/pipecat-ai/pipecat/pull/4443))

- Extended the `cancel_on_interruption=False` regression fix to
  `GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and
  `UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in
  #4441 (each service detects async-tool messages in the LLM context and routes
  the final result to its formal tool-result channel; Azure inherits
  transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different
  approach because its API freezes the conversation between
  `client_tool_invocation` and the matching `client_tool_result` — for
  async-registered functions it now ships a placeholder `client_tool_result`
  immediately when the function is invoked (to unfreeze the conversation), then
  injects the real result as user-side text once the tool finishes. Streamed
  intermediate results (`FunctionCallResultProperties(is_final=False)`) are
  still not supported on any of these realtime services. `GeminiLiveLLMService`
  and `InworldRealtimeLLMService` are excluded for now: Gemini Live's
  async-tool path needs deeper investigation, and Inworld tool calling needs to
  be sorted out first.
  (PR [#4447](https://github.com/pipecat-ai/pipecat/pull/4447))

- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses
  (observed with `gpt-realtime-2`). A single response can now contain more than
  one audio item, and the first item's `audio.done` may arrive after the second
  item's deltas have started. Deltas still arrive strictly in playback order,
  so we continue to forward them as received (matching OpenAI's reference
  implementation). The fix removes spurious warnings, ensures truncation always
  targets the latest audio item, and emits a single bracketing
  `TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the
  `TTSStoppedFrame` is now pushed on `response.done`).
  (PR [#4465](https://github.com/pipecat-ai/pipecat/pull/4465))

- Fixed missing `output` attribute on LLM OpenTelemetry spans when the LLM call
  is interrupted mid-stream.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))

- Fixed incorrect `metrics.ttfb` on STT OpenTelemetry spans, and parented them
  to the current turn span.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))

- Fixed incorrect `metrics.ttfb` on TTS OpenTelemetry spans for streaming
  services.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))

- Extended the `cancel_on_interruption=False` regression fix to
  `InworldRealtimeLLMService`. Uses the same approach as in #4441 (the service
  detects async-tool messages in the LLM context and routes the final result to
  its formal tool-result channel). Note: as of this writing, Inworld Realtime
  doesn't appear to handle the resulting delayed tool result reliably — the
  routing is best-effort and the service surfaces a one-time warning when
  async-tool messages are seen. Streamed intermediate results
  (`FunctionCallResultProperties(is_final=False)`) are still not supported on
  this realtime service. (Inworld was excluded from #4447 pending resolution of
  an unrelated tool-calling issue, which turned out to be an account-level
  matter.)
  (PR [#4474](https://github.com/pipecat-ai/pipecat/pull/4474))

- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules,
  preserving word boundaries and per-word timestamp alignment during downstream
  aggregation.
  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))

- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve
  provider text spacing, avoiding artificial spaces when timestamp groups are
  reassembled downstream.
  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))

- Fixed `SonioxSTTService` final transcription frames missing detected language
  metadata when Soniox returns token-level language annotations.
  (PR [#4482](https://github.com/pipecat-ai/pipecat/pull/4482))

- Fixed Soniox final transcription language detection to use the most common
  recognized token language, avoiding mislabeling an utterance when the last
  token is tagged with a different language.
  (PR [#4495](https://github.com/pipecat-ai/pipecat/pull/4495))
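
  Picking the most common token language can be sketched with
  `collections.Counter`. The list-of-language-codes input is an assumed
  simplification of Soniox's token annotations, and `dominant_language` is a
  hypothetical helper.

    ```python
    from collections import Counter
    from typing import Optional, Sequence


    def dominant_language(token_languages: Sequence[Optional[str]]) -> Optional[str]:
        # Ignore tokens without a language annotation.
        counts = Counter(language for language in token_languages if language)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


    # The last token is tagged "es", but the utterance is mostly English.
    detected = dominant_language(["en", "en", "en", "es"])
    ```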

- Fixed dropped audio in streaming TTS services whose wire protocol doesn't
  echo `context_id` back on incoming audio (Sarvam, Smallest, Soniox, Inworld,
  and others). Previously, audio that arrived between contexts or at the very
  start of a turn was tagged with `context_id=None` and silently dropped with
  an "unable to append audio to context: no context ID provided" debug log.
  `TTSService.get_active_audio_context_id()` now falls back to the
  synthesis-side `_turn_context_id` when the playback cursor isn't set yet.
  (PR [#4497](https://github.com/pipecat-ai/pipecat/pull/4497))
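
  The fallback amounts to preferring the playback-side identifier and using
  the synthesis-side one when the former is not set yet. The standalone
  function below is a hypothetical simplification of that lookup, not the
  method itself.

    ```python
    from typing import Optional


    def active_audio_context_id(
        playback_context_id: Optional[str], turn_context_id: Optional[str]
    ) -> Optional[str]:
        # Prefer the playback cursor; fall back to the current turn's context.
        if playback_context_id is not None:
            return playback_context_id
        return turn_context_id


    at_turn_start = active_audio_context_id(None, "turn-1")
    during_playback = active_audio_context_id("ctx-0", "turn-1")
    ```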

### Security

- Fixed a path traversal issue in the development runner's
  `/files/{filename:path}` download endpoint. Previously, when the runner was
  started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could
  escape the configured folder because `%2F`-encoded separators bypassed
  Starlette's path normalization. The endpoint now resolves the joined path and
  rejects any filename that escapes the allowed base with a 403, and also
  returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
  (PR [#4417](https://github.com/pipecat-ai/pipecat/pull/4417))
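
  The resolve-then-check approach can be sketched with `pathlib` (Python 3.9+
  for `Path.is_relative_to`). `download_status` is a hypothetical helper that
  returns the status code only; answering 404 for a missing file is an
  assumption.

    ```python
    import tempfile
    from pathlib import Path
    from typing import Optional


    def download_status(folder: Optional[str], filename: str) -> int:
        if folder is None:
            return 404  # no --folder configured
        base = Path(folder).resolve()
        # Resolve after joining so ".." segments are collapsed before the check.
        target = (base / filename).resolve()
        if not target.is_relative_to(base):
            return 403  # the request escapes the allowed base
        return 200 if target.is_file() else 404


    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "clip.wav").write_bytes(b"\x00")
        inside = download_status(folder, "clip.wav")
        # "..%2F..%2Fetc%2Fpasswd" after URL decoding:
        escaped = download_status(folder, "../../etc/passwd")
    folder_unset = download_status(None, "clip.wav")
    ```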

## [1.1.0] - 2026-04-27

### Added

- Added `MistralSTTService` for real-time speech-to-text using Mistral's
  Voxtral Realtime API (`voxtral-mini-transcribe-realtime-2602`). Supports
  streaming transcription with interim results, automatic language detection,
  and VAD-driven utterance lifecycle.
  (PR [#4253](https://github.com/pipecat-ai/pipecat/pull/4253))

- Added `buttons` field to `OutputDTMFFrame` and `OutputDTMFUrgentFrame` for
  sending multi-key DTMF sequences as a `list[KeypadEntry]`. Use
  `OutputDTMFFrame.from_string("123#")` (or the equivalent on
  `OutputDTMFUrgentFrame`) to build one from a dial string, and `to_string()`
  to convert back.
  (PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))
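
  A self-contained sketch of the round trip between a dial string and a list
  of keypad entries. `KeypadEntrySketch` and `DTMFSequenceSketch` are
  stand-ins written for this example; they mirror the described behavior but
  are not the Pipecat classes.

    ```python
    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List


    class KeypadEntrySketch(Enum):
        ZERO = "0"
        ONE = "1"
        TWO = "2"
        THREE = "3"
        FOUR = "4"
        FIVE = "5"
        SIX = "6"
        SEVEN = "7"
        EIGHT = "8"
        NINE = "9"
        POUND = "#"
        STAR = "*"


    @dataclass
    class DTMFSequenceSketch:
        buttons: List[KeypadEntrySketch] = field(default_factory=list)

        @classmethod
        def from_string(cls, dial_string: str) -> "DTMFSequenceSketch":
            # Enum lookup by value raises ValueError for characters off the keypad.
            return cls(buttons=[KeypadEntrySketch(char) for char in dial_string])

        def to_string(self) -> str:
            return "".join(button.value for button in self.buttons)


    sequence = DTMFSequenceSketch.from_string("123#")
    round_trip = sequence.to_string()
    ```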

- Added `DailyTransport.send_dtmf()` to expose the Daily call client's DTMF
  sending capability, enabling applications to send tones during a call (e.g.
  IVR navigation).
  (PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))

- Added `DailyOutputDTMFFrame` and `DailyOutputDTMFUrgentFrame` frames. In
  addition to the inherited `buttons`, they accept `session_id`,
  `digit_duration_ms` and `method`, which are forwarded to Daily's `send_dtmf`
  as `sessionId`, `digitDurationMs` and `method`.
  (PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))

- Added incremental `pyright` type checking. A `pyrightconfig.json` at the repo
  root uses `typeCheckingMode: "basic"` with an explicit `include` list of
  modules that pass cleanly (`clocks`, `metrics`, `transcriptions`, `frames`,
  `observers`, `extensions`, `turns`, `pipeline`, `runner`). Remaining modules
  will be added in subsequent PRs. CI enforces the checked set via `uv run
  pyright` in the format workflow.
  (PR [#4324](https://github.com/pipecat-ai/pipecat/pull/4324))

- Added multilingual support to `DeepgramFluxSTTService` via a new
  `language_hints: list[Language]` setting. Works with Deepgram's new
  `flux-general-multi` model to bias transcription across English, Spanish,
  French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.
  Omit the hints to use auto-detection, or pass a subset to bias toward
  expected languages. Hints can be updated mid-stream via
  `STTUpdateSettingsFrame` (sent as a Deepgram `Configure` control message, no
  reconnect) to support detect-then-lock flows.
  (PR [#4326](https://github.com/pipecat-ai/pipecat/pull/4326))

- Added fine-grained server-side VAD tuning options to
  `SarvamSTTService.Settings` for the `saaras:v3` model, including speech
  thresholds, frame-count controls, pre-speech padding, interruption
  sensitivity, and initial-frame skipping.
  (PR [#4334](https://github.com/pipecat-ai/pipecat/pull/4334))

- Added `XAISTTService` for real-time speech-to-text using xAI's voice STT
  WebSocket API (`wss://api.x.ai/v1/stt`). Streams raw audio (PCM, µ-law, or
  A-law) and emits interim and final transcription frames driven by the
  server's `is_final` / `speech_final` flags. Settings expose
  `interim_results`, `endpointing`, `language`, `multichannel`, `channels`, and
  `diarize`. Requires the `xai` optional extra (`pip install
  "pipecat-ai[xai]"`).
  (PR [#4340](https://github.com/pipecat-ai/pipecat/pull/4340))

- Added `XAITTSService` for streaming text-to-speech using xAI's WebSocket TTS
  endpoint (`wss://api.x.ai/v1/tts`). Streams `text.delta` chunks up and base64
  `audio.delta` chunks down on the same connection so audio begins flowing
  before the full utterance finishes synthesizing; complements the batch-HTTP
  `XAIHttpTTSService`. Defaults to raw PCM output so `TTSAudioRawFrame` needs
  no decoding. The `xai` optional extra now pulls in
  `pipecat-ai[websockets-base]`.
  (PR [#4341](https://github.com/pipecat-ai/pipecat/pull/4341))

- Added `SonioxTTSService`, a real-time WebSocket TTS service that streams text
  in and audio out over a persistent connection. Install with `pip install
  "pipecat-ai[soniox]"`.
  (PR [#4360](https://github.com/pipecat-ai/pipecat/pull/4360))

- Added support for Daily's built-in `screenVideo` destination in
  `DailyTransport`. When `"screenVideo"` is included in the
  `video_out_destinations` transport parameter, a dedicated screen video track
  is created at join time and frames with `transport_destination="screenVideo"`
  are routed to it.

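    ```python
    # Request a dedicated screen video track when joining the room.
    params = DailyParams(
        video_out_enabled=True,
        video_out_is_live=True,
        video_out_width=1280,
        video_out_height=720,
        video_out_destinations=["screenVideo"],
    )

    ...

    # Route this frame to the screen video track instead of the camera track.
    frame = OutputImageRawFrame(...)
    frame.transport_destination = "screenVideo"
    ```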
  (PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))

- Added `camera_out_send_settings` to `DailyParams`. This dict is passed
  verbatim to the Daily client's camera publishing settings, allowing
  applications to fully control encoding, codec, bitrate, and framerate.

    ```python
    params = DailyParams(
        camera_out_send_settings={
            "maxQuality": "high",
            "encodings": {
                "high": {"maxBitrate": 2_000_000, "maxFramerate": 30}
            },
        },
    )
    ```
  (PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))

- Added `tool_resources` to `PipelineTask` and `FunctionCallParams`. Pass an
  application-defined object (DB handles, clients, state, etc.) to
  `PipelineTask(..., tool_resources=...)` and access it from any tool handler
  via `params.tool_resources`. The object is passed by reference, so the caller
  retains its handle and can read mutations after the task finishes. Resolves
  #4256.
  (PR [#4371](https://github.com/pipecat-ai/pipecat/pull/4371))
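
  A minimal sketch of the by-reference behavior, using only the standard
  library. `AppResources` and the `SimpleNamespace` stand-in for
  `FunctionCallParams` are hypothetical; in a real bot the object would be
  supplied through `PipelineTask(..., tool_resources=...)`.

    ```python
    import asyncio
    from dataclasses import dataclass, field
    from types import SimpleNamespace


    @dataclass
    class AppResources:
        """Application-defined container (hypothetical shape)."""

        orders: dict[str, str] = field(default_factory=dict)
        lookups: int = 0


    async def get_order_status(params):
        # The handler reads the shared object from params.tool_resources.
        resources = params.tool_resources
        resources.lookups += 1
        status = resources.orders.get(params.arguments["order_id"], "unknown")
        await params.result_callback({"status": status})


    async def demo():
        resources = AppResources(orders={"A1": "shipped"})
        results = []

        async def result_callback(result):
            results.append(result)

        # Stand-in for the params object the framework would build.
        params = SimpleNamespace(
            tool_resources=resources,
            arguments={"order_id": "A1"},
            result_callback=result_callback,
        )
        await get_order_status(params)
        # The caller still holds `resources` and sees the handler's mutation.
        return resources, results
    ```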

### Changed

- Updated NVIDIA STT services to align with Nemotron Speech defaults and
  configuration: `api_key` is now optional for local deployments, additional
  recognition settings are available (including alternatives, word offsets, and
  diarization), and streaming/segmented docs now reflect Nemotron Speech APIs.
  - NVIDIA streaming STT now sets `TranscriptionFrame.finalized=True` when the
    provider marks a result as final, and preserves `language` on both
    `TranscriptionFrame` and `InterimTranscriptionFrame`.
  (PR [#4269](https://github.com/pipecat-ai/pipecat/pull/4269))

- Updated `NvidiaLLMService` to emit model reasoning as `LLMThought*Frame`s
  (from both `reasoning_content` and `<think>...</think>` output), avoid mixing
  reasoning text into normal assistant content, and allow keyless local NIM
  endpoints while warning when the cloud endpoint is used without an API key.
  (PR [#4270](https://github.com/pipecat-ai/pipecat/pull/4270))

- STT services now reconnect safely when settings change: reconnection is
  deferred until the current user turn ends (i.e., until
  `UserStoppedSpeakingFrame` is received) rather than interrupting an active
  speech session. Audio frames received while the reconnect is in progress are
  buffered and replayed once the new connection is ready. `CartesiaSTTService`
  and `DeepgramSTTService` both use this new behavior.
  (PR [#4311](https://github.com/pipecat-ai/pipecat/pull/4311))
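
  The deferral and buffering described above can be sketched as a small state
  machine. This is a toy model under stated assumptions, not the services'
  actual code: `DeferredReconnectSTT` and all of its methods are hypothetical
  names.

    ```python
    from typing import Optional


    class DeferredReconnectSTT:
        """Toy model: defer reconnects until the user turn ends."""

        def __init__(self) -> None:
            self.settings = {"language": "en"}
            self.user_speaking = False
            self.reconnecting = False
            self._pending: Optional[dict] = None
            self._buffer: list[bytes] = []
            self.sent: list[tuple[str, bytes]] = []  # (language, audio)

        def user_started_speaking(self) -> None:
            self.user_speaking = True

        def user_stopped_speaking(self) -> None:
            self.user_speaking = False
            if self._pending is not None:
                pending, self._pending = self._pending, None
                self._begin_reconnect(pending)

        def update_settings(self, new: dict) -> None:
            if self.user_speaking:
                self._pending = new  # do not interrupt the active turn
            else:
                self._begin_reconnect(new)

        def _begin_reconnect(self, new: dict) -> None:
            self.settings = {**self.settings, **new}
            self.reconnecting = True

        def connection_ready(self) -> None:
            self.reconnecting = False
            buffered, self._buffer = self._buffer, []
            for chunk in buffered:  # replay audio held during the reconnect
                self._send(chunk)

        def push_audio(self, chunk: bytes) -> None:
            if self.reconnecting:
                self._buffer.append(chunk)
            else:
                self._send(chunk)

        def _send(self, chunk: bytes) -> None:
            self.sent.append((self.settings["language"], chunk))
    ```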

- Reduced debug log noise for LLM services. The system instruction is now
  logged once when composed (e.g. when turn completion is enabled) instead of
  on every LLM call. Per-call logs now show only the conversation messages,
  consistent across Google, Anthropic, AWS, and OpenAI services.
  (PR [#4314](https://github.com/pipecat-ai/pipecat/pull/4314))

- `LiveKitRunnerArguments.token` is now a required `str` (previously `str |
  None` with a default of `None`). LiveKit requires a token to join a room, so
  the type now reflects reality. This only affects custom runners that
  construct `LiveKitRunnerArguments` directly; code consuming the argument from
  the standard runner is unaffected.
  (PR [#4324](https://github.com/pipecat-ai/pipecat/pull/4324))

- `TranscriptionFrame.language` and `InterimTranscriptionFrame.language`
  emitted by `DeepgramFluxSTTService` now reflect the language Deepgram
  detected for each turn (read from the `languages` field on Flux's `TurnInfo`
  event). On `flux-general-multi` this gives per-turn accuracy for downstream
  consumers (e.g. TTS voice selection). `flux-general-en` continues to emit
  `Language.EN`.
  (PR [#4326](https://github.com/pipecat-ai/pipecat/pull/4326))

- Added `includes_inter_frame_spaces` parameter to
  `TTSService.add_word_timestamps` and `_add_word_timestamps` (default `None`).
  When `True`, downstream consumers will not inject additional spaces between
  tokens; `None` leaves each frame's own default unchanged.
  - `InworldTTSService` now passes `includes_inter_frame_spaces=True` when
    reporting word timestamps, since Inworld tokens already include inter-word
    spacing.
  (PR [#4330](https://github.com/pipecat-ai/pipecat/pull/4330))

- `SarvamSTTService` now uses `saaras:v3` as its default model instead of
  `saarika:v2.5`. Applications that relied on the previous default should set
  `settings=SarvamSTTService.Settings(model="saarika:v2.5")` explicitly.
  (PR [#4334](https://github.com/pipecat-ai/pipecat/pull/4334))

- `SpeechTimeoutUserTurnStopStrategy` now waits only `user_speech_timeout` when
  a transcript arrives without a VAD stop event, rather than
  `max(ttfs_p99_latency, user_speech_timeout)`. If you had `ttfs_p99_latency >
  user_speech_timeout`, turn detection in that path is slightly faster than
  before.
  (PR [#4337](https://github.com/pipecat-ai/pipecat/pull/4337))
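
  The change in wait time for this path is simple arithmetic. The helper below
  is a hypothetical illustration (the function name, the `legacy` flag, and
  the sample values are assumptions), not the strategy's implementation.

    ```python
    def wait_before_turn_end(
        user_speech_timeout: float,
        ttfs_p99_latency: float,
        *,
        legacy: bool = False,
    ) -> float:
        """Seconds to wait when a transcript arrives without a VAD stop."""
        if legacy:
            # Previous behavior: the larger of the two values.
            return max(ttfs_p99_latency, user_speech_timeout)
        # New behavior: only the user speech timeout.
        return user_speech_timeout


    # Sample values, chosen only for illustration.
    old_wait = wait_before_turn_end(0.6, 1.0, legacy=True)
    new_wait = wait_before_turn_end(0.6, 1.0)
    ```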

- If you use an STT service that emits finalized transcripts (Speechmatics,
  Soniox, Deepgram Flux, AssemblyAI) with `SpeechTimeoutUserTurnStopStrategy`,
  user turns now end as soon as `user_speech_timeout` elapses after VAD stop.
  Previously the strategy also waited for the STT P99 latency
  (`ttfs_p99_latency`) even when the transcript was already marked final.
  `user_speech_timeout` is still honored as a floor — STT finalization never
  shortens it.
  (PR [#4337](https://github.com/pipecat-ai/pipecat/pull/4337))

- ⚠️ `PlivoFrameSerializer` and `TelnyxFrameSerializer` now raise `ValueError`
  at construction when `auto_hang_up=True` (the default) but required
  credentials are missing, matching `TwilioFrameSerializer`. Previously they
  constructed successfully and the hangup failed silently at call-end, leaving
  phantom billable sessions on the provider. If you relied on the old silent
  behavior, pass `auto_hang_up=False` explicitly or provide the credentials.
  The specific fields checked are `call_id`/`auth_id`/`auth_token` for Plivo
  and `call_control_id`/`api_key` for Telnyx.
  (PR [#4349](https://github.com/pipecat-ai/pipecat/pull/4349))

- `ToolsSchema(standard_tools=...)` now accepts any `Sequence[FunctionSchema |
  DirectFunction]` rather than requiring an exact `list` of the union. Callers
  can pass a narrower `list[FunctionSchema]` (or any other `Sequence`) without
  the type checker complaining about list invariance.
  (PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))
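
  To illustrate the invariance issue, here is a standalone sketch. The
  `FunctionSchema` and `DirectFunction` definitions are simplified stand-ins,
  and `count_tools_list` / `count_tools_seq` are hypothetical helpers; only
  the typing behavior is the point.

    ```python
    from dataclasses import dataclass
    from typing import Callable, Sequence, Union


    @dataclass
    class FunctionSchema:  # simplified stand-in
        name: str


    DirectFunction = Callable[..., object]  # simplified stand-in
    Tool = Union[FunctionSchema, DirectFunction]


    def count_tools_list(tools: list[Tool]) -> int:
        return len(tools)


    def count_tools_seq(tools: Sequence[Tool]) -> int:
        return len(tools)


    schemas: list[FunctionSchema] = [FunctionSchema("get_weather")]

    # Accepted by type checkers: Sequence is covariant in its element type.
    count_tools_seq(schemas)

    # Rejected by type checkers (list is invariant), although it runs fine:
    # count_tools_list(schemas)
    ```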

- Updated `aic-sdk` dependency to `~=2.2.0`. The `AIC_LICENSE_KEY` environment
  variable replaces the previous `AICOUSTICS_LICENSE_KEY`.
  (PR [#4362](https://github.com/pipecat-ai/pipecat/pull/4362))

- Loosened the `protobuf` dependency to `>=5.29.6,<7`, so projects pinned to
  protobuf 5.x can install `pipecat-ai` again. The previous `>=6.31.1,<7` pin
  (introduced in 1.0.8 alongside the `nvidia-riva-client 2.25.1` upgrade)
  silently blocked any environment whose dependency graph already constrained
  protobuf to the 5.x line. The bundled `frames_pb2.py` is now compiled with
  protoc 5.x so it imports cleanly on both 5.x and 6.x runtimes.

  Installing the `nvidia` extra still pulls protobuf 6.x: `nvidia-riva-client
  2.25.1` ships gencode that requires a 6.x runtime, so `pipecat-ai[nvidia]`
  now declares `protobuf>=6.31.1,<7` explicitly to cover an upstream packaging
  gap (https://github.com/nvidia-riva/python-clients/issues/172).
  (PR [#4372](https://github.com/pipecat-ai/pipecat/pull/4372))

- Daily rooms created by the development runner (`pipecat.runner.run`) now
  expire after 4 hours with `eject_at_room_exp=True`, mirroring Pipecat Cloud's
  max session limit. Previously, runner-created rooms inherited a 2-hour
  expiration on the default code paths and had no expiration at all when
  callers posted partial `dailyRoomProperties` (e.g. `{"start_video_off":
  true}`) to `/start`, causing rooms to accumulate indefinitely. Explicit `exp`
  and `eject_at_room_exp` values in `dailyRoomProperties` are still respected.
  (PR [#4374](https://github.com/pipecat-ai/pipecat/pull/4374))

- Updated `daily-python` dependency to `~=0.28.0`.
  (PR [#4379](https://github.com/pipecat-ai/pipecat/pull/4379))

### Deprecated

- Deprecated `TransportParams.video_out_bitrate` for the Daily transport. Use
  `DailyParams.camera_out_send_settings` instead to configure camera publishing
  encodings (bitrate, framerate, codec, etc.).
  (PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))

### Fixed

- Fixed missing tool handlers so unregistered tool calls fail with a normal
  final tool result instead of leaving tool-call state hanging.
  (PR [#4301](https://github.com/pipecat-ai/pipecat/pull/4301))

- Fixed `pipecat-ai[tavus]` not installing the required `daily-python`
  dependency. Installing the `tavus` extra now correctly pulls in
  `pipecat-ai[daily]`.
  (PR [#4304](https://github.com/pipecat-ai/pipecat/pull/4304))

- Fixed audio loss and potential errors when STT settings were updated
  mid-speech. Previously, `CartesiaSTTService` and `DeepgramSTTService` would
  immediately disconnect and reconnect when settings changed, dropping any
  in-flight audio. Reconnection is now deferred until the user stops speaking,
  and audio arriving during the reconnect window is buffered and replayed.
  (PR [#4311](https://github.com/pipecat-ai/pipecat/pull/4311))

- Fixed `SmallestTTSService` WebSocket endpoint URL to match Smallest AI v4.0.0
  API (`wss://waves-api.smallest.ai` → `wss://api.smallest.ai`) and restored
  keepalive using a silent space message instead of the unsupported flush
  command.
  (PR [#4320](https://github.com/pipecat-ai/pipecat/pull/4320))

- Fixed whitespace handling in TTS token streaming mode. Inter-token whitespace
  (e.g., spaces between words) is now preserved for correct prosody, while
  leading whitespace before the first non-whitespace token is still stripped to
  avoid issues with TTS models that are sensitive to leading spaces.
  (PR [#4323](https://github.com/pipecat-ai/pipecat/pull/4323))
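
  One way to express the rule above, as a sketch: strip whitespace only until
  the first non-whitespace token has been seen. `prepare_tokens` is a
  hypothetical helper, not the function Pipecat uses.

    ```python
    from typing import Iterable, Iterator


    def prepare_tokens(tokens: Iterable[str]) -> Iterator[str]:
        """Drop leading whitespace, keep inter-token whitespace."""
        seen_text = False
        for token in tokens:
            if not seen_text:
                token = token.lstrip()
                if not token:
                    continue  # whitespace-only token before any text
                seen_text = True
            yield token


    spoken = "".join(prepare_tokens([" ", " Hello", " world", "!"]))
    ```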

- Fixed `SentryMetrics` silently dropping `MetricsFrame`s from
  `stop_ttfb_metrics` and `stop_processing_metrics`. `SentryMetrics` called the
  base `FrameProcessorMetrics` implementation but discarded its return value,
  so `FrameProcessor` never pushed the `MetricsFrame` downstream. This
  prevented observers (e.g. `UserBotLatencyObserver`, `MetricsLogObserver`)
  from seeing TTFB and processing metrics for any service using
  `metrics=SentryMetrics()`. The metrics were still calculated and Sentry
  transactions still completed — only the downstream frame push was affected.
  (PR [#4325](https://github.com/pipecat-ai/pipecat/pull/4325))
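
  The bug pattern is easy to reproduce in isolation. In this sketch every
  class is a hypothetical stand-in (including the assumption that the methods
  are coroutines); it only shows how discarding the base method's return value
  makes the override return `None`.

    ```python
    import asyncio
    from typing import Optional


    class MetricsFrame:  # stand-in for the real frame type
        def __init__(self, name: str) -> None:
            self.name = name


    class BaseMetrics:
        async def stop_ttfb_metrics(self) -> Optional[MetricsFrame]:
            return MetricsFrame("ttfb")


    class BuggyMetrics(BaseMetrics):
        async def stop_ttfb_metrics(self) -> Optional[MetricsFrame]:
            await super().stop_ttfb_metrics()  # result discarded
            # vendor-specific bookkeeping would go here
            # falls through and implicitly returns None


    class FixedMetrics(BaseMetrics):
        async def stop_ttfb_metrics(self) -> Optional[MetricsFrame]:
            frame = await super().stop_ttfb_metrics()
            # vendor-specific bookkeeping would go here
            return frame  # the caller can now push the frame downstream
    ```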

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` emitting word
  timestamps and `TTSTextFrame` content that matched the input text instead of
  the spoken audio when a pronunciation dictionary
  (`pronunciation_dictionary_locators`) or text normalization rewrote the
  input. Both services now consume ElevenLabs' normalized alignment, so
  downstream consumers (captions, transcripts, context aggregation) reflect
  what the listener actually hears.
  (PR [#4344](https://github.com/pipecat-ai/pipecat/pull/4344))

- Fixed a crash in `DeepgramSTTService` when an `STTUpdateSettingsFrame`
  arrived before the WebSocket handshake completed (for example, when pushing
  an update upstream on `StartFrame`). The settings-triggered reconnect
  cancelled the in-flight connection task before its keepalive task was
  created, causing an `UnboundLocalError: cannot access local variable
  'keepalive_task'` in the handler's `finally` block.
  (PR [#4347](https://github.com/pipecat-ai/pipecat/pull/4347))
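
  The failure mode can be reproduced with plain `asyncio`. The sketch below
  uses hypothetical handler names and assumes the fix is to bind the variable
  before the `try` block; the entry does not state the exact change.

    ```python
    import asyncio


    async def connect() -> None:
        await asyncio.sleep(10)  # stands in for the WebSocket handshake


    async def buggy_handler() -> None:
        try:
            await connect()  # cancelled here, before the assignment below
            keepalive_task = asyncio.create_task(asyncio.sleep(3600))
            await keepalive_task
        finally:
            keepalive_task.cancel()  # unbound if cancelled during connect()


    async def fixed_handler() -> None:
        keepalive_task = None  # bound before anything can be cancelled
        try:
            await connect()
            keepalive_task = asyncio.create_task(asyncio.sleep(3600))
            await keepalive_task
        finally:
            if keepalive_task is not None:
                keepalive_task.cancel()


    async def cancel_during_handshake(handler) -> str:
        task = asyncio.create_task(handler())
        await asyncio.sleep(0)  # let the handler reach its first await
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            return "cancelled cleanly"
        except UnboundLocalError:
            return "UnboundLocalError"
        return "finished"
    ```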

- Fixed direct-function registration crashing for functions without a
  docstring. `DirectFunctionWrapper` passed `inspect.getdoc()`'s result to
  `docstring_parser.parse()`, which raises when the docstring is `None`.
  Functions now register cleanly whether or not they have a docstring; an empty
  docstring produces empty description and parameter metadata as expected.
  (PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))
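
  A sketch of the guard, assuming the fix coalesces a missing docstring to an
  empty string before parsing (the entry does not spell out the exact change).
  `describe` is a hypothetical helper that uses only the standard library.

    ```python
    import inspect


    def describe(fn) -> str:
        """Return the first docstring line, or "" when there is none."""
        # inspect.getdoc() returns None for a function without a docstring.
        doc = inspect.getdoc(fn) or ""
        return doc.splitlines()[0] if doc else ""


    def documented():
        """Look up the current weather.

        Longer description.
        """


    def undocumented():
        pass
    ```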

- Fixed `AssemblyAISTTService`, `CartesiaSTTService`, `GradiumSTTService`, and
  `SonioxSTTService` crashing the pipeline on transient WebSocket send
  failures. Each `run_stt` sent audio directly without catching errors, so a
  single network hiccup mid-stream raised an uncaught exception through
  `process_frame`. The guards now log a warning and let the connection-state
  check on the next call handle recovery, matching the pattern used by
  Deepgram, xAI, Azure, and other push-based STTs.
  (PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))
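
  The guard pattern looks roughly like this. `FlakySocket` and this `run_stt`
  are hypothetical stand-ins written for the sketch; the real services differ
  in how they check connection state and recover.

    ```python
    import asyncio
    import logging

    logger = logging.getLogger("stt_sketch")


    class FlakySocket:
        """Fake socket that fails on selected send calls."""

        def __init__(self, fail_on: set[int]) -> None:
            self._fail_on = fail_on
            self._calls = 0
            self.delivered: list[bytes] = []

        async def send(self, data: bytes) -> None:
            self._calls += 1
            if self._calls in self._fail_on:
                raise ConnectionError("transient send failure")
            self.delivered.append(data)


    async def run_stt(socket: FlakySocket, audio: bytes) -> bool:
        try:
            await socket.send(audio)
            return True
        except Exception as exc:
            # Log and continue instead of raising through the pipeline.
            logger.warning("audio send failed: %s", exc)
            return False


    async def demo():
        sock = FlakySocket(fail_on={2})
        results = [await run_stt(sock, chunk) for chunk in (b"a", b"b", b"c")]
        return results, sock.delivered
    ```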

- Fixed Gemini Live losing conversation history in the (rare) case of a
  WebSocket reconnect before any session resumption handle is received. When
  the session reconnects (e.g. on system instruction change), conversation
  history is now re-seeded into the new session before it is marked ready for
  input.
  (PR [#4355](https://github.com/pipecat-ai/pipecat/pull/4355))

- Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte
  MTU (IPv6, Tailscale overlays, many consumer VPNs). aiortc's default SCTP
  chunk size of 1200 bytes produces ~1305-byte UDP datagrams after headers,
  which the kernel rejects with EMSGSIZE; aiortc has no path-MTU discovery so
  it retransmits forever at the same oversized size. The chunk size is now
  clamped to 1100 bytes (~1205-byte datagrams, ~75 bytes of slack). Override
  with `PIPECAT_SCTP_MAX_CHUNK_SIZE` if your path MTU requires a different
  value.
  (PR [#4358](https://github.com/pipecat-ai/pipecat/pull/4358))
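
  The sizes quoted above work out as follows. The 105-byte overhead is
  inferred from the entry's own approximate figures (1305 - 1200), and the
  helper names are hypothetical.

    ```python
    PATH_MTU = 1280
    # Approximate per-datagram overhead implied by the entry: 1305 - 1200.
    DATAGRAM_OVERHEAD = 105


    def datagram_size(chunk_size: int) -> int:
        return chunk_size + DATAGRAM_OVERHEAD


    def fits(chunk_size: int, mtu: int = PATH_MTU) -> bool:
        return datagram_size(chunk_size) <= mtu


    old_fits = fits(1200)  # aiortc default chunk size
    new_fits = fits(1100)  # clamped chunk size
    slack = PATH_MTU - datagram_size(1100)
    ```

  On a path with an even smaller MTU, the same arithmetic gives a starting
  point for the value to set in `PIPECAT_SCTP_MAX_CHUNK_SIZE`.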

## [1.0.0] - 2026-04-14

Migration guide: https://docs.pipecat.ai/pipecat/migration/migration-1.0

### Added

- Updated LemonSlice transport:
    - Added `on_avatar_connected` and `on_avatar_disconnected` events triggered
      when the avatar joins and leaves the room.
    - Added `api_url` parameter to `LemonSliceNewSessionRequest` to allow
      overriding the LemonSlice API endpoint.
    - Added support for passing arbitrary named parameters to the LemonSlice
      API endpoint.
  (PR [#3995](https://github.com/pipecat-ai/pipecat/pull/3995))

- Added Inworld Realtime LLM service with WebSocket-based cascade STT/LLM/TTS,
  semantic VAD, function calling, and Router support.
  (PR [#4140](https://github.com/pipecat-ai/pipecat/pull/4140))

- ⚠️ Added WebSocket-based `OpenAIResponsesLLMService` as the new default for
  the OpenAI Responses API. It maintains a persistent connection to
  `wss://api.openai.com/v1/responses` and automatically uses
  `previous_response_id` to send only incremental context, falling back to full
  context on reconnection or cache miss. The previous HTTP-based implementation
  is now available as `OpenAIResponsesHttpLLMService`.
  (PR [#4141](https://github.com/pipecat-ai/pipecat/pull/4141))

- Added `group_parallel_tools` parameter to `LLMService` (default `True`). When
  `True`, all function calls from the same LLM response batch share a group ID
  and the LLM is triggered exactly once after the last call completes. Set to
  `False` to trigger inference independently for each function call result as
  it arrives.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))
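
  A toy model of the two trigger modes, using only `asyncio`. `run_batch` is a
  hypothetical function that counts how often inference would be triggered; it
  does not use or mirror Pipecat's internals.

    ```python
    import asyncio


    async def run_batch(
        delays: dict[str, float], group_parallel_tools: bool = True
    ) -> list[list[str]]:
        """Return the results visible at each simulated LLM trigger."""
        results: list[str] = []
        triggers: list[list[str]] = []

        async def call(name: str, delay: float) -> None:
            await asyncio.sleep(delay)  # simulated tool work
            results.append(name)
            if not group_parallel_tools:
                triggers.append(list(results))  # trigger per result

        await asyncio.gather(*(call(n, d) for n, d in delays.items()))
        if group_parallel_tools:
            triggers.append(list(results))  # single trigger after the last call
        return triggers
    ```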

- Added async function call support to `register_function()` and
  `register_direct_function()` via `cancel_on_interruption=False`. When set to
  `False`, the LLM continues the conversation immediately without waiting for
  the function result. The result is injected back into the context as a
  `developer` message once available, triggering a new LLM inference at that
  point.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- Added `enable_prompt_caching` setting to `AWSBedrockLLMService` for Bedrock
  ConverseStream prompt caching.
  (PR [#4219](https://github.com/pipecat-ai/pipecat/pull/4219))

- Added support for streaming intermediate results from async function calls.
  Call `result_callback` multiple times with
  `properties=FunctionCallResultProperties(is_final=False)` to push incremental
  updates, then call it once more (with `is_final=True`, the default) to
  deliver the final result. Only valid for functions registered with
  `cancel_on_interruption=False`.
  (PR [#4230](https://github.com/pipecat-ai/pipecat/pull/4230))

- Added `LLMMessagesTransformFrame` to facilitate programmatically editing
  context in a frame-based way.

  The previous approach required the caller to directly grab a reference to
  the context object, grab a "snapshot" of its messages _at that point in
  time_, transform the messages, and then push an `LLMMessagesUpdateFrame` with
  the transformed messages. This approach can lead to problems: what if there
  had already been a change to the context queued in the pipeline? The
  transformed messages would simply overwrite it without consideration.
  (PR [#4231](https://github.com/pipecat-ai/pipecat/pull/4231))
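
  The overwrite problem can be shown with plain lists. This sketch assumes the
  new frame carries a transform that is applied when the frame is processed,
  in pipeline order; the entry does not describe the frame's API, and all
  names here are hypothetical.

    ```python
    from typing import Callable

    Message = dict[str, str]
    Transform = Callable[[list[Message]], list[Message]]


    def redact_pin(messages: list[Message]) -> list[Message]:
        return [
            {**m, "content": m["content"].replace("1234", "****")}
            for m in messages
        ]


    def append_reply(messages: list[Message]) -> list[Message]:
        return messages + [{"role": "assistant", "content": "Got it."}]


    def snapshot_then_overwrite() -> list[Message]:
        context = [{"role": "user", "content": "My PIN is 1234"}]
        transformed = redact_pin(context)  # snapshot transformed now
        context = append_reply(context)  # an already-queued change lands
        context = transformed  # the later update overwrites that change
        return context


    def transform_in_order() -> list[Message]:
        context = [{"role": "user", "content": "My PIN is 1234"}]
        queued: list[Transform] = [append_reply, redact_pin]
        for op in queued:  # each transform sees the context at its turn
            context = op(context)
        return context
    ```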

- The development runner now exports a module-level `app` FastAPI instance
  (`from pipecat.runner.run import app`) so you can register custom routes
  before calling `main()`.
  (PR [#4234](https://github.com/pipecat-ai/pipecat/pull/4234))

- `ToolsSchema` now accepts `custom_tools` for OpenAI LLM services
  (`OpenAILLMService`, `OpenAIResponsesLLMService`,
  `OpenAIResponsesHttpLLMService`, and `OpenAIRealtimeLLMService`), letting you
  pass provider-specific tools like `tool_search` alongside standard function
  tools.
  (PR [#4248](https://github.com/pipecat-ai/pipecat/pull/4248))

- Added enhancements to `NvidiaTTSService`:

  - Cross-sentence stitching: multiple sentences within an LLM turn are fed
    into a single `SynthesizeOnline` gRPC stream for seamless audio across
    sentence boundaries (requires Magpie TTS model v1.7.0+).
  - `custom_dictionary` and `encoding` parameters for IPA-based custom
    pronunciation and output audio encoding.
  - Metrics generation (`can_generate_metrics` returns `True`) and
    `stop_all_metrics()` when an audio context is interrupted.
  - gRPC error handling around synthesis config retrieval
    (`GetRivaSynthesisConfig`).
  (PR [#4249](https://github.com/pipecat-ai/pipecat/pull/4249))

- Added `MistralTTSService` for streaming text-to-speech using Mistral's
  Voxtral TTS API (`voxtral-mini-tts-2603`). Supports SSE-based audio streaming
  with automatic resampling from the API's native 24kHz to any requested sample
  rate. Requires the `mistral` optional extra (`pip install
  "pipecat-ai[mistral]"`).
  (PR [#4251](https://github.com/pipecat-ai/pipecat/pull/4251))

- Added `truncate_large_values` parameter to `LLMContext.get_messages()`. When
  `True`, returns compact deep copies of messages with binary data (base64
  images, audio) replaced by short placeholders and long string values in
  LLM-specific messages recursively truncated. Useful for serialization,
  logging, and debugging tools.
  (PR [#4272](https://github.com/pipecat-ai/pipecat/pull/4272))

- `CartesiaSTTService` now supports runtime settings updates (e.g. changing
  `language` or `model` via `STTUpdateSettingsFrame`). The service
  automatically reconnects with the new parameters. Previously, settings
  updates were silently ignored.
  (PR [#4282](https://github.com/pipecat-ai/pipecat/pull/4282))

- Added `pcm_32000` and `pcm_48000` sample rate support to ElevenLabs TTS
  services.
  (PR [#4293](https://github.com/pipecat-ai/pipecat/pull/4293))

- Added `enable_logging` parameter to `ElevenLabsHttpTTSService`. Set to
  `False` to enable zero retention mode (enterprise only).
  (PR [#4293](https://github.com/pipecat-ai/pipecat/pull/4293))

### Changed

- Updated `onnxruntime` from 1.23.2 to 1.24.3, adding support for Python 3.14.
  (PR [#3984](https://github.com/pipecat-ai/pipecat/pull/3984))

- `MCPClient` now requires `async with MCPClient(...) as mcp:` or explicit
  `start()`/`close()` calls to manage the connection lifecycle.
  (PR [#4034](https://github.com/pipecat-ai/pipecat/pull/4034))

- ⚠️ Updated `langchain` extra to require langchain 1.x (from 0.3.x),
  langchain-community 0.4.x (from 0.3.x), and langchain-openai 1.x (from
  0.3.x). If you pin these packages in your project, update your pins
  accordingly.
  (PR [#4192](https://github.com/pipecat-ai/pipecat/pull/4192))

- `WebsocketService` reconnection errors are now non-fatal. When a websocket
  service exhausts its reconnection attempts (either via exponential backoff or
  quick failure detection), it emits a non-fatal `ErrorFrame` instead of a
  fatal one. This allows application-level failover (e.g. `ServiceSwitcher`) to
  handle the failure instead of killing the entire pipeline.
  (PR [#4201](https://github.com/pipecat-ai/pipecat/pull/4201))

- Changed `GrokLLMService` default model from `grok-3-beta` to `grok-3`, now
  that the model is generally available.
  (PR [#4209](https://github.com/pipecat-ai/pipecat/pull/4209))

- `GoogleImageGenService` now defaults to `imagen-4.0-generate-001` (previously
  `imagen-3.0-generate-002`).
  (PR [#4213](https://github.com/pipecat-ai/pipecat/pull/4213))

- ⚠️ `BaseOpenAILLMService.get_chat_completions()` now accepts an `LLMContext`
  instead of `OpenAILLMInvocationParams`. If you override this method, update
  your signature accordingly.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- When multiple function calls are returned in a single LLM response, by
  default (when `group_parallel_tools=True`) the LLM is now triggered exactly
  once after the last call in the batch completes, rather than waiting for all
  function calls.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- ⚠️ `LLMService.function_call_timeout_secs` now defaults to `None` instead of
  `10.0`. Deferred function calls will run indefinitely unless a timeout is
  explicitly set at the service level or per-call. If you relied on the
  previous 10-second default, pass `function_call_timeout_secs=10.0`
  explicitly.
  (PR [#4224](https://github.com/pipecat-ai/pipecat/pull/4224))
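
  For intuition, `asyncio.wait_for` treats `timeout=None` the same way: it
  waits until the awaitable completes. The sketch below is standalone; how
  Pipecat enforces the timeout internally is not stated here, and the helper
  names are hypothetical.

    ```python
    import asyncio
    from typing import Optional


    async def slow_tool(delay: float) -> str:
        await asyncio.sleep(delay)
        return "done"


    async def call_with_timeout(delay: float, timeout_secs: Optional[float]) -> str:
        try:
            # timeout=None means "no limit": wait until the call finishes.
            return await asyncio.wait_for(slow_tool(delay), timeout=timeout_secs)
        except asyncio.TimeoutError:
            return "timed out"
    ```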

- Updated `NvidiaTTSService`:

  - Made `api_key` optional for local NIM deployments.
  - Voice, language, and quality can be updated without reconnecting the gRPC
    client; new values take effect on the next synthesis turn, not for the
    current turn's in-flight requests.
  - Replaced per-sentence synchronous `synthesize_online` calls with async
    queue-backed gRPC streaming.
  - Streaming now uses asyncio tasks with explicit gRPC cancellation on
    interruption and stale-response filtering when a stream is aborted or
    replaced.
  - Renamed Riva references to Nemotron Speech in docs and messages.
  - Disabled automatic TTS start frames at the service level
    (`push_start_frame=False`) and emit `TTSStartedFrame` when a stitched
    synthesis stream is started for a context.
  (PR [#4249](https://github.com/pipecat-ai/pipecat/pull/4249))

### Removed

- ⚠️ Removed `OpenPipeLLMService` and the `openpipe` extra. OpenPipe was
  acquired by CoreWeave and the package is no longer maintained. If you were
  using `openpipe` as an LLM provider, switch to the underlying provider
  directly (e.g. `openai`). The OpenPipe interface can still be used with
  `OpenAILLMService` by specifying a `base_url`.
  (PR [#4191](https://github.com/pipecat-ai/pipecat/pull/4191))

- ⚠️ Removed `NoisereduceFilter`. Use system-level noise reduction or a
  service-based alternative instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `vad_enabled` and `vad_audio_passthrough` transport
  params.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `camera_in_enabled`, `camera_in_is_live`,
  `camera_in_width`, `camera_in_height`, `camera_out_enabled`,
  `camera_out_is_live`, `camera_out_width`, `camera_out_height`, and
  `camera_out_color` transport params. Use the `video_in_*` and `video_out_*`
  equivalents instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `FrameProcessor.wait_for_task()`. Use `create_task()` and manage
  tasks with the built-in `TaskManager` instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated transport frames: `TransportMessageFrame`,
  `TransportMessageUrgentFrame`, `InputTransportMessageUrgentFrame`,
  `DailyTransportMessageFrame`, and `DailyTransportMessageUrgentFrame`. Use
  `OutputTransportMessageFrame`, `OutputTransportMessageUrgentFrame`,
  `InputTransportMessageFrame`, `DailyOutputTransportMessageFrame`, and
  `DailyOutputTransportMessageUrgentFrame` instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `create_default_resampler()` from `pipecat.audio.utils`.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `DailyRunner.configure_with_args()`. Use `PipelineRunner` with
  `RunnerArguments` instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `on_pipeline_ended`, `on_pipeline_cancelled`, and
  `on_pipeline_stopped` events from `PipelineTask`. Use `on_pipeline_finished`
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed single-argument function call support from `LLMService`. Functions
  must use named parameters instead of a single `arguments` parameter.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer`.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `RTVIObserver.errors_enabled` parameter.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated RTVI models, frames, and processor methods including
  `RTVIConfig`, `RTVIServiceConfig`, `RTVIServiceOptionConfig`, various
  `RTVI*Data` models, `RTVIActionFrame`, and
  `RTVIProcessor.handle_function_call`/`handle_function_call_start`. Use the
  updated RTVI processor API instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `KeypadEntryFrame` alias.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated interruption frames: `StartInterruptionFrame` and
  `BotInterruptionFrame`. Use `InterruptionFrame` and `InterruptionTaskFrame`
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `LLMService.request_image_frame()`. Push a `UserImageRequestFrame`
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `TTSService.say()`. Push a `TTSSpeakFrame` into the pipeline
  instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `KrispFilter`. The `krisp` extra has been removed from
  `pyproject.toml`.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `AudioBufferProcessor.user_continuous_stream` parameter. Use
  `user_audio_passthrough` instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed `LLMService.start_callback` parameter. Register an
  `on_llm_response_start` event handler instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `observers` field from `PipelineParams`. Pass observers
  directly to `PipelineTask` constructor instead.
  (PR [#4204](https://github.com/pipecat-ai/pipecat/pull/4204))

- ⚠️ Removed deprecated `pipecat.services.openai_realtime` package. Use
  `pipecat.services.openai.realtime` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.google.llm_vertex` module. Use
  `pipecat.services.google.vertex.llm` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `GoogleLLMOpenAIBetaService` from
  `pipecat.services.google.openai`. Use `GoogleLLMService` from
  `pipecat.services.google.llm` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `OpenAIRealtimeBetaLLMService` and
  `AzureRealtimeBetaLLMService`. Use `OpenAIRealtimeLLMService` and
  `AzureRealtimeLLMService` from `pipecat.services.openai.realtime` and
  `pipecat.services.azure.realtime` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.ai_services` module. Import from
  `pipecat.services.ai_service`, `pipecat.services.llm_service`,
  `pipecat.services.stt_service`, `pipecat.services.tts_service`, etc. instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.gemini_multimodal_live` package. Use
  `pipecat.services.google.gemini_live` instead. Note that class names no
  longer include "Multimodal" (e.g. `GeminiMultimodalLiveLLMService` →
  `GeminiLiveLLMService`).
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.google.gemini_live.llm_vertex`
  module. Use `pipecat.services.google.gemini_live.vertex.llm` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.nim` package. Use
  `pipecat.services.nvidia.llm` instead (`NimLLMService` → `NvidiaLLMService`).
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.deepgram.stt_sagemaker` and
  `pipecat.services.deepgram.tts_sagemaker` modules. Use
  `pipecat.services.deepgram.sagemaker.stt` and
  `pipecat.services.deepgram.sagemaker.tts` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.aws_nova_sonic` package. Use
  `pipecat.services.aws.nova_sonic` instead.
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated `pipecat.services.riva` package. Use
  `pipecat.services.nvidia.stt` and `pipecat.services.nvidia.tts` instead
  (`RivaSTTService` → `NvidiaSTTService`, `RivaTTSService` →
  `NvidiaTTSService`).
  (PR [#4208](https://github.com/pipecat-ai/pipecat/pull/4208))

- ⚠️ Removed deprecated compatibility modules:
  `pipecat.services.openai_realtime_beta` (use
  `pipecat.services.openai.realtime`),
  `pipecat.services.openai_realtime.context`,
  `pipecat.services.openai_realtime.frames`,
  `pipecat.services.openai.realtime.context`,
  `pipecat.services.openai.realtime.frames`,
  `pipecat.services.gemini_multimodal_live` (use
  `pipecat.services.google.gemini_live`),
  `pipecat.services.aws_nova_sonic.context` (use
  `pipecat.services.aws.nova_sonic`), `pipecat.services.google.openai` and
  `pipecat.services.google.llm_openai` (use `pipecat.services.google.llm`).
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed `VisionImageFrameAggregator` (from
  `pipecat.processors.aggregators.vision_image_frame`). Vision/image handling
  is now built into `LLMContext` (from
  `pipecat.processors.aggregators.llm_context`). See the `12*` examples for the
  recommended replacement pattern.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed `OpenAILLMContext`, `OpenAILLMContextFrame`, and
  `OpenAILLMContext.from_messages()`. Use `LLMContext` (from
  `pipecat.processors.aggregators.llm_context`) and `LLMContextFrame` (from
  `pipecat.frames.frames`) instead. All services now exclusively use the
  universal `LLMContext`.

  From the developer's point of view, migrating will usually be a matter of
  going from this:

    ```python
    context = OpenAILLMContext(messages, tools)
    context_aggregator = llm.create_context_aggregator(context)
    ```

    To this:

    ```python
    from pipecat.processors.aggregators.llm_context import LLMContext
    from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)
    ```
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed deprecated frame types `LLMMessagesFrame` and
  `OpenAILLMContextAssistantTimestampFrame` from `pipecat.frames.frames`.
  Instead of `LLMMessagesFrame`, use `LLMContextFrame` with the new messages,
  or `LLMMessagesUpdateFrame` with `run_llm=True`.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed `GatedOpenAILLMContextAggregator` (from
  `pipecat.processors.aggregators.gated_open_ai_llm_context`). Use
  `GatedLLMContextAggregator` (from
  `pipecat.processors.aggregators.gated_llm_context`) instead.
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed deprecated service-specific context and aggregator machinery,
  which was superseded by the universal `LLMContext` system.

  Service-specific classes removed: `AnthropicLLMContext`,
  `AnthropicContextAggregatorPair`, `AWSBedrockLLMContext`,
  `AWSBedrockContextAggregatorPair`, `OpenAIContextAggregatorPair`, and their
  user/assistant aggregators. Also removed `create_context_aggregator()` from
  `LLMService`, `OpenAILLMService`, `AnthropicLLMService`, and
  `AWSBedrockLLMService`.

  Base aggregator classes removed (from
  `pipecat.processors.aggregators.llm_response`): `BaseLLMResponseAggregator`,
  `LLMContextResponseAggregator`, `LLMUserContextAggregator`,
  `LLMAssistantContextAggregator`, `LLMUserResponseAggregator`,
  `LLMAssistantResponseAggregator`.

  From the developer's point of view, migrating will usually be a matter of
  going from this:

    ```python
    context = OpenAILLMContext(messages, tools)
    context_aggregator = llm.create_context_aggregator(context)
    ```

    To this:

    ```python
    from pipecat.processors.aggregators.llm_context import LLMContext
    from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)
    ```
  (PR [#4215](https://github.com/pipecat-ai/pipecat/pull/4215))

- ⚠️ Removed deprecated service parameters and shims that have been replaced by
  the `settings=Service.Settings(...)` pattern or direct `__init__` parameters:
    - `PollyTTSService` alias (use `AWSTTSService`)
    - `TTSService`: `text_aggregator`, `text_filter` init params
    - `AWSNovaSonicLLMService`: `send_transcription_frames` init param
    - `DeepgramSTTService`: `url` init param (use `base_url`)
    - `FishAudioTTSService`: `model` init param (use `reference_id` or
      `settings`)
    - `GladiaSTTService`: `language` and `confidence` from `GladiaInputParams`,
      `InputParams` class alias
    - `GeminiTTSService`: `api_key` init param
    - `GeminiLiveLLMService`: `base_url` init param (use `http_options`)
    - `GoogleVertexLLMService`: `InputParams` class with
      `location`/`project_id` fields (use direct init params); `project_id` is now
      required, `location` defaults to `"us-east4"`
    - `MiniMaxHttpTTSService`: `english_normalization` from `InputParams` (use
      `text_normalization`)
    - `SimliVideoService`: `simli_config` init param (use `api_key`/`face_id`),
      `use_turn_server` init param; `api_key` and `face_id` are now required
    - `AnthropicLLMService`: `enable_prompt_caching_beta` from `InputParams`
      (use `enable_prompt_caching`)
  (PR [#4220](https://github.com/pipecat-ai/pipecat/pull/4220))

- ⚠️ Removed deprecated `pipecat.transports.services` and
  `pipecat.transports.network` module aliases. Update imports to use
  `pipecat.transports.daily.transport`, `pipecat.transports.livekit.transport`,
  `pipecat.transports.websocket.*`, `pipecat.transports.webrtc.*`, or
  `pipecat.transports.daily.utils` as appropriate.
  (PR [#4225](https://github.com/pipecat-ai/pipecat/pull/4225))

- ⚠️ Removed deprecated `pipecat.sync` package. Use `pipecat.utils.sync`
  instead.
  (PR [#4225](https://github.com/pipecat-ai/pipecat/pull/4225))

- ⚠️ Removed deprecated `TranscriptionMessage`, `ThoughtTranscriptionMessage`,
  and `TranscriptionUpdateFrame` from `pipecat.frames.frames`.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `allow_interruptions` parameter from `PipelineParams`,
  `StartFrame`, and `FrameProcessor`. Interruptions are now always allowed. Use
  `LLMUserAggregator`'s `user_turn_strategies` / `user_mute_strategies`
  parameters to control interruption behavior.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `STTMuteFilter`, `STTMuteConfig`, and `STTMuteStrategy`
  from `pipecat.processors.filters.stt_mute_filter`. Use
  `pipecat.turns.user_mute` strategies with `LLMUserAggregator`'s
  `user_mute_strategies` parameter instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `pipecat.processors.transcript_processor` module
  (`TranscriptProcessor`, `TranscriptProcessorConfig`). Use pipeline observers
  instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `EmulateUserStartedSpeakingFrame` and
  `EmulateUserStoppedSpeakingFrame` frames, and the `emulated` field from
  `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame`.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `interruption_strategies` parameter from
  `PipelineParams`, `StartFrame`, and `FrameProcessor`. Use
  `LLMUserAggregator`'s `user_turn_strategies` parameter instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `pipecat.audio.interruptions` module
  (`BaseInterruptionStrategy`, `MinWordsInterruptionStrategy`). Use
  `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
  `LLMUserAggregator`'s `user_turn_strategies` parameter instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `pipecat.utils.tracing.class_decorators` module. Use
  `pipecat.utils.tracing.service_decorators` instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `add_pattern_pair` method from `PatternPairAggregator`.
  Use `add_pattern` instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed deprecated `UserResponseAggregator` class from
  `pipecat.processors.aggregators.user_response`. Use `LLMUserAggregator`
  instead.
  (PR [#4228](https://github.com/pipecat-ai/pipecat/pull/4228))

- ⚠️ Removed `ExternalUserTurnStrategies` and the automatic fallback to it in
  `LLMUserAggregator` when a `SpeechControlParamsFrame` was received from the
  transport.
  (PR [#4229](https://github.com/pipecat-ai/pipecat/pull/4229))

- ⚠️ Removed `vad_analyzer` and `turn_analyzer` parameters from
  `TransportParams` and all transport input classes, along with all deprecated
  VAD/turn analysis logic in `BaseInputTransport`. VAD and turn detection are
  now handled entirely by `LLMUserAggregator`.
  (PR [#4229](https://github.com/pipecat-ai/pipecat/pull/4229))

- ⚠️ Removed deprecated `TranscriptionUserTurnStopStrategy` alias (deprecated
  in 0.0.102). Use `SpeechTimeoutUserTurnStopStrategy` instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `vad_events` setting and `should_interrupt` parameter
  from `DeepgramSTTService` (deprecated in 0.0.99). Use Silero VAD for voice
  activity detection instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `send_transcription_frames` parameter from
  `OpenAIRealtimeLLMService` (deprecated in 0.0.92). Transcription frames are
  always sent.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `UserIdleProcessor` (deprecated in 0.0.100). Use
  `LLMUserAggregator` with the `user_idle_timeout` parameter instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed deprecated `UserBotLatencyLogObserver` (deprecated in 0.0.102).
  Use `UserBotLatencyObserver` with its `on_latency_measured` event handler
  instead.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- ⚠️ Removed the `riva` install extra. Use `nvidia` instead (`pip install
  "pipecat-ai[nvidia]"`).
  (PR [#4235](https://github.com/pipecat-ai/pipecat/pull/4235))

- Removed the empty `remote-smart-turn` install extra (was already a no-op).
  (PR [#4235](https://github.com/pipecat-ai/pipecat/pull/4235))

- ⚠️ Removed `DeprecatedModuleProxy` and all service `__init__.py` re-export
  shims. Flat imports like `from pipecat.services.openai import
  OpenAILLMService` no longer work. Use the full submodule path instead: `from
  pipecat.services.openai.llm import OpenAILLMService`. This is already the
  established pattern across all examples and internal code.
  (PR [#4239](https://github.com/pipecat-ai/pipecat/pull/4239))

- ⚠️ Removed deprecated `PIPECAT_OBSERVER_FILES` environment variable support.
  Use `PIPECAT_SETUP_FILES` instead.
  (PR [#4267](https://github.com/pipecat-ai/pipecat/pull/4267))
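
When upgrading, it can help to scan a codebase for imports of the module paths
removed above. The following is a minimal sketch, not part of Pipecat:
`REMOVED_MODULES` copies only a handful of the old-to-new mappings from this
list, and `find_removed_imports` is a hypothetical helper name.

```python
import re

# Old module path -> replacement, copied from the removal notes above.
# This table is illustrative and deliberately incomplete.
REMOVED_MODULES = {
    "pipecat.services.gemini_multimodal_live": "pipecat.services.google.gemini_live",
    "pipecat.services.nim": "pipecat.services.nvidia.llm",
    "pipecat.services.aws_nova_sonic": "pipecat.services.aws.nova_sonic",
    "pipecat.services.openai_realtime_beta": "pipecat.services.openai.realtime",
    "pipecat.sync": "pipecat.utils.sync",
    "pipecat.utils.tracing.class_decorators": "pipecat.utils.tracing.service_decorators",
}

# Matches the module named in "import x.y" or "from x.y import z" statements.
_IMPORT_RE = re.compile(r"^[ \t]*(?:from|import)[ \t]+([\w.]+)", re.MULTILINE)


def find_removed_imports(source: str) -> list[tuple[str, str]]:
    """Return (imported_module, suggested_replacement) pairs found in source."""
    hits = []
    for match in _IMPORT_RE.finditer(source):
        module = match.group(1)
        for old, new in REMOVED_MODULES.items():
            # Match the removed package itself or any submodule of it.
            if module == old or module.startswith(old + "."):
                hits.append((module, new))
                break
    return hits
```

The sketch only flags module paths. It does not detect renamed classes (for
example `NimLLMService` → `NvidiaLLMService`) or the flat `__init__.py` imports
that no longer work.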

### Fixed

- Fixed `IdleFrameProcessor` where `asyncio.Event` was unconditionally cleared
  in a `finally` block instead of only on the success path.
  (PR [#3796](https://github.com/pipecat-ai/pipecat/pull/3796))

- Fixed `MCPClient` opening a new connection for every tool call instead of
  reusing the session.
  (PR [#4034](https://github.com/pipecat-ai/pipecat/pull/4034))

- `GoogleLLMService` now applies a low-latency thinking default
  (`thinking_level="minimal"`) for Gemini 3+ Flash models.
  (PR [#4067](https://github.com/pipecat-ai/pipecat/pull/4067))

- Fixed `WebsocketService` entering an infinite reconnection loop when a server
  accepts the WebSocket handshake but immediately closes the connection (e.g.
  invalid API key, close code 1008). The service now detects connections that
  fail repeatedly within seconds of being established and stops retrying after
  3 consecutive quick failures.
  (PR [#4201](https://github.com/pipecat-ai/pipecat/pull/4201))

- Fixed `InworldHttpTTSService` streaming responses crashing with
  `UnicodeDecodeError` when multi-byte UTF-8 characters were split across chunk
  boundaries. This caused TTS audio to cut off mid-sentence intermittently.
  (PR [#4202](https://github.com/pipecat-ai/pipecat/pull/4202))

- Fixed a crash (`JSONDecodeError`) when a user interruption occurs while the
  LLM is streaming function call arguments. Previously, the incomplete JSON
  arguments were passed directly to `json.loads()`, causing an unhandled
  exception. Affected services: OpenAI, Google (OpenAI-compatible), and
  SambaNova.
  (PR [#4203](https://github.com/pipecat-ai/pipecat/pull/4203))

- Fixed `BaseOutputTransport` discarding pending `UninterruptibleFrame` items
  (e.g. function-call context updates) when an interruption arrived. The audio
  task is now kept alive and only interruptible frames are drained when
  uninterruptible frames are present in the queue.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- Fixed spurious LLM inference being triggered when a function call result
  arrived while the user was actively speaking. The context frame is now
  suppressed until the user stops speaking.
  (PR [#4217](https://github.com/pipecat-ai/pipecat/pull/4217))

- Fixed `CartesiaTTSService` failing with "Context has closed" errors when
  switching voice, model, or language via `TTSUpdateSettingsFrame`. The service
  now automatically flushes the current audio context and opens a fresh one
  when these settings change.
  (PR [#4220](https://github.com/pipecat-ai/pipecat/pull/4220))

- Fixed duplicate LLM replies that could occur when multiple async function
  call results arrived while an LLM request was already queued.
  (PR [#4230](https://github.com/pipecat-ai/pipecat/pull/4230))

- Fixed undefined `_warn_deprecated_param` calls in `OpenAIRealtimeLLMService`
  and `GrokRealtimeLLMService` for the deprecated `session_properties` init
  parameter.
  (PR [#4232](https://github.com/pipecat-ai/pipecat/pull/4232))

- Fixed Gemini Live bot hanging after a session resumption reconnect. Audio,
  video, and text input were silently dropped after reconnecting because the
  internal `_ready_for_realtime_input` flag was not being reset.
  (PR [#4242](https://github.com/pipecat-ai/pipecat/pull/4242))

- Fixed `VADController` getting stuck in the `SPEAKING` state when audio frames
  stop arriving mid-speech (e.g. user mutes mic). A new `audio_idle_timeout`
  parameter (default 1s, set to 0 to disable) forces a transition back to
  `QUIET` and emits `on_speech_stopped` when no audio is received while
  speaking.
  (PR [#4244](https://github.com/pipecat-ai/pipecat/pull/4244))

- Fixed `PipelineRunner._gc_collect()` blocking the event loop by running
  `gc.collect()` synchronously. The collection is now offloaded via
  `asyncio.to_thread` to avoid stalling concurrent pipeline tasks.
  (PR [#4255](https://github.com/pipecat-ai/pipecat/pull/4255))

- Fixed `ElevenLabsTTSService` incorrectly enabling `auto_mode` when using
  `TextAggregationMode.TOKEN`. Auto mode disables server-side buffering and is
  designed for complete sentences — enabling it with token streaming degraded
  speech quality. The default is now derived automatically from the aggregation
  strategy: `auto_mode=True` for `SENTENCE`, `auto_mode=False` for `TOKEN`.
  Callers can still override by passing `auto_mode` explicitly.
  (PR [#4265](https://github.com/pipecat-ai/pipecat/pull/4265))

- Fixed `ValueError: write to closed file` during pipeline shutdown when
  observers were active. Observer proxy tasks are now cancelled before observer
  resources are cleaned up.
  (PR [#4267](https://github.com/pipecat-ai/pipecat/pull/4267))

- Fixed delayed turn completion when STT transcripts arrive after the p99
  timeout. Previously, a late transcript (beyond the p99 window) would fall
  through to the 5-second `user_turn_stop_timeout` fallback. Now the turn stop
  triggers immediately when the late transcript arrives.
  (PR [#4283](https://github.com/pipecat-ai/pipecat/pull/4283))

- Fixed `ElevenLabsTTSService` ignoring `enable_logging=False` and
  `enable_ssml_parsing=False`. The truthy check treated `False` the same as
  `None` (both skipped), and Python's `str(False)` produced `"False"` instead
  of the lowercase `"false"` expected by the API.
  (PR [#4293](https://github.com/pipecat-ai/pipecat/pull/4293))

- Fixed `on_assistant_turn_stopped` not resetting internal state when the LLM
  returned no text tokens. Added `interrupted` field to
  `AssistantTurnStoppedMessage` to indicate whether the assistant turn was
  interrupted.
  (PR [#4294](https://github.com/pipecat-ai/pipecat/pull/4294))

- Fixed `LLMContextSummarizer` failing with "No messages to summarize" when
  using `system_instruction` instead of a system-role message at the start of
  the context. The summarizer previously scanned the entire context for the
  first system message, which could match a mid-conversation injection (e.g.
  idle notifications) instead of the initial prompt, causing the summarization
  range to be empty.
  (PR [#4295](https://github.com/pipecat-ai/pipecat/pull/4295))
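
The `UnicodeDecodeError` described in the `InworldHttpTTSService` fix above is
a general hazard when decoding streamed bytes chunk by chunk. One common way to
avoid it is an incremental decoder from the standard library, sketched below.
This is not necessarily how the fix was implemented, and `decode_stream` is a
hypothetical helper name.

```python
import codecs


def decode_stream(chunks):
    """Decode an iterable of byte chunks without breaking multi-byte characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # Incomplete trailing bytes are buffered until the next chunk arrives.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush at end of stream; this raises if the stream ends mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Calling `chunk.decode("utf-8")` on each chunk independently fails as soon as a
chunk ends in the middle of a character, whereas the incremental decoder
carries the partial bytes over to the next call.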

## [0.0.108] - 2026-03-27

### Added

- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`,
  `sarvam-105b`, and `sarvam-105b-32k`.
  (PR [#3978](https://github.com/pipecat-ai/pipecat/pull/3978))

- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override
  this to perform provider-specific setup (e.g. eagerly opening a server-side
  context) before text starts flowing. Called each time a new turn context ID
  is created.
  (PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))

- Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API.
  (PR [#4031](https://github.com/pipecat-ai/pipecat/pull/4031))

- Added support for "developer" role messages in conversation context across
  all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock),
  "developer" messages are converted to "user" messages (use
  `system_instruction` to set the system instruction). For OpenAI services,
  "developer" messages pass through in conversation history. For the Responses
  API, they are kept as "developer" role (matching the existing "system" →
  "developer" conversion).
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Added `SmallestTTSService`, a WebSocket-based TTS service integration with
  Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with
  configurable voice, language, speed, consistency, similarity, and enhancement
  settings.
  (PR [#4092](https://github.com/pipecat-ai/pipecat/pull/4092))

- Added warnings in turn stop strategies when `VADParams.stop_secs` differs
  from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`,
  which collapses the STT wait timeout to 0s and may cause delayed turn
  detection. The warnings guide developers to re-run the
  [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD
  settings.
  (PR [#4115](https://github.com/pipecat-ai/pipecat/pull/4115))

- Added `domain` parameter to `AssemblyAISTTSettings` for specialized
  recognition modes such as Medical Mode (`domain="medical-v1"`).
  (PR [#4117](https://github.com/pipecat-ai/pipecat/pull/4117))

- Added `NovitaLLMService` for using Novita AI's LLM models via their
  OpenAI-compatible API.
  (PR [#4119](https://github.com/pipecat-ai/pipecat/pull/4119))

- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer
  resources are properly released when no longer needed. Custom `VADAnalyzer`
  subclasses can override `cleanup()` to free any held resources.
  (PR [#4120](https://github.com/pipecat-ai/pipecat/pull/4120))

- Added `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires
  after the final transcript is pushed, providing a reliable hook for
  end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both
  Pipecat and AssemblyAI turn detection modes.
  (PR [#4128](https://github.com/pipecat-ai/pipecat/pull/4128))

- Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux
  speech-to-text on AWS SageMaker endpoints. Use with
  `ExternalUserTurnStrategies` to take advantage of Flux's turn detection.
  (PR [#4143](https://github.com/pipecat-ai/pipecat/pull/4143))

- Added `Mem0MemoryService.get_memories()` convenience method for retrieving
  all stored memories outside the pipeline (e.g. to build a personalized
  greeting at connection time). This avoids the need to manually handle client
  type branching, filter construction, and async wrapping.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
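
The `VADParams.stop_secs` warnings above can be illustrated with a little
arithmetic. The sketch assumes the STT wait timeout is the p99 latency minus
`stop_secs`, floored at zero. That formula is an inference from the note, not a
confirmed implementation detail, and the function names and any latency values
passed in are hypothetical.

```python
RECOMMENDED_STOP_SECS = 0.2  # recommended default quoted in the note above


def stt_wait_timeout(stt_p99_latency: float, stop_secs: float) -> float:
    """Assumed model: time left to wait for a transcript once VAD has fired."""
    return max(0.0, stt_p99_latency - stop_secs)


def check_stop_secs(stt_p99_latency: float, stop_secs: float) -> list[str]:
    """Return warning messages mirroring the two conditions described above."""
    messages = []
    if stop_secs != RECOMMENDED_STOP_SECS:
        messages.append("stop_secs differs from the recommended default")
    if stop_secs >= stt_p99_latency:
        messages.append("stop_secs >= STT p99 latency: wait timeout collapses to 0s")
    return messages
```

Under this model, raising `stop_secs` past the measured p99 latency leaves no
time budget for a late transcript, which is why the warning suggests
re-running the benchmark with the VAD settings actually in use.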

### Changed

- Added context prewarming path for `InworldTTSService` to improve first audio
  latency.
  (PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))

- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp
  VIVA SDK (requires `krisp_audio`).
  (PR [#4022](https://github.com/pipecat-ai/pipecat/pull/4022))

- Modified `InworldTTSService` to close the context at the end of a turn
  instead of relying on the idle timeout.
  (PR [#4028](https://github.com/pipecat-ai/pipecat/pull/4028))

- Added Gemini 3 support to the Gemini Live service.
  (PR [#4078](https://github.com/pipecat-ai/pipecat/pull/4078))

- `TTSService`: the default `stop_frame_timeout_s` (idle time before an
  automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has
  changed from `2.0` to `3.0` seconds.
  (PR [#4084](https://github.com/pipecat-ai/pipecat/pull/4084))

- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system
  message, matching all other adapters. Previously it searched for the first
  "system" message anywhere in the conversation history. A "system" message
  appearing later in the list will now be converted to "user" instead of being
  extracted as the system instruction.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Fixed `InworldTTSService` to fall back to full text when TTS timestamps are
  not received.
  (PR [#4113](https://github.com/pipecat-ai/pipecat/pull/4113))

- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova
  Sonic) now prefer `system_instruction` from service settings over an initial
  system message in the LLM context, matching the behavior of non-realtime
  services. Previously, context-provided system instructions took precedence. A
  warning is now logged when both are set.
  (PR [#4130](https://github.com/pipecat-ai/pipecat/pull/4130))

- Bumped `nvidia-riva-client` minimum version to `>=2.25.1`.
  (PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))

- Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`).
  (PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))

- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a
  warning at startup. The log message has been downgraded to debug level since
  these are valid service-specific values that are passed through correctly.
  (PR [#4137](https://github.com/pipecat-ai/pipecat/pull/4137))

- `GrokLLMService` and `GrokRealtimeLLMService` now live in the
  `pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three
  use the same xAI API. Update imports from `pipecat.services.grok.*` to
  `pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import
  GrokLLMService`).
  (PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))

- ⚠️ Bumped `mem0ai` dependency from `~=0.1.94` to `>=1.0.8,<2`. Users of the
  `mem0` extra will need to update their mem0ai package.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
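
The `GeminiLLMAdapter` change above (only `messages[0]` is treated as the
initial system message) can be sketched as follows. This is a simplified
illustration on plain dictionaries with string content, not the adapter's
code, and `split_system_instruction` is a hypothetical name.

```python
def split_system_instruction(messages):
    """Extract a system instruction only from messages[0].

    Any "system" message that appears later is converted to "user".
    """
    system_instruction = None
    rest = list(messages)
    if rest and rest[0].get("role") == "system":
        system_instruction = rest.pop(0)["content"]
    converted = [
        {**m, "role": "user"} if m.get("role") == "system" else m
        for m in rest
    ]
    return system_instruction, converted
```

With the previous behavior, a mid-conversation "system" message would have been
pulled out as the system instruction. Here it stays in the history with the
"user" role.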

### Deprecated

- `pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and
  `pipecat.services.grok.realtime.events` are deprecated. The old import paths
  still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`,
  `pipecat.services.xai.realtime.llm`, and
  `pipecat.services.xai.realtime.events` instead.
  (PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))
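
Because the old import paths only emit a `DeprecationWarning`, they are easy to
miss. A minimal sketch for surfacing such warnings in a test, using only the
standard library, follows. `legacy_import` is a hypothetical stand-in for
importing from a deprecated path, and its warning text is invented for
illustration.

```python
import warnings


def legacy_import():
    """Hypothetical stand-in for importing from a deprecated module path."""
    warnings.warn(
        "pipecat.services.grok.llm is deprecated; use pipecat.services.xai.llm",
        DeprecationWarning,
        stacklevel=2,
    )


def collect_deprecations(func):
    """Run func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide anything
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
```

Running a test suite with `python -W error::DeprecationWarning` is another
option: it turns these warnings into exceptions. Note that module-level
warnings fire only on the first import of a module in a given process.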

### Removed

- ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and
  `"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that
  called `await self.add_word_timestamps([("Reset", 0)])` or `await
  self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`,
  replace them with `await self.append_to_audio_context(ctx_id,
  TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage
  the word-timestamp reset automatically.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Removed `SambaNovaSTTService`. SambaNova no longer offers speech-to-text
  audio models. Use another STT provider instead.
  (PR [#4154](https://github.com/pipecat-ai/pipecat/pull/4154))

### Fixed

- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring
  `settings.system_instruction`. The system instruction was being read from a
  deprecated constructor parameter instead of the settings object, causing it
  to be silently ignored.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when
  the only message in context was a system message. The lone system message is
  now converted to "user" role instead of being extracted, matching the
  existing Anthropic adapter behavior.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was
  deferred while waiting for the bot to finish responding and `turn_complete`
  never arrived. As a possible root-cause fix, `turn_complete` messages are now
  handled even if they lack `usage_metadata`. As a fallback, the deferred
  `EndFrame` now has a 30-second safety timeout.
  (PR [#4125](https://github.com/pipecat-ai/pipecat/pull/4125))

- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous
  contexts exceeded") caused by rapid user interruptions. When interruptions
  arrived before any TTS text was generated, phantom contexts were created on
  the ElevenLabs server that were never closed, eventually exceeding the
  5-context limit.
  (PR [#4126](https://github.com/pipecat-ai/pipecat/pull/4126))

- Fixed the final sentence being dropped from the conversation context when
  using RTVI text input with non-word-timestamp TTS services. The
  `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`,
  causing the `LLMAssistantAggregator` to finalize the context before the final
  sentence arrived.
  (PR [#4127](https://github.com/pipecat-ai/pipecat/pull/4127))

- Fixed audio crackling and popping in recordings when both user and bot are
  speaking. `AudioBufferProcessor` no longer injects silence into a track's
  buffer while that track is actively producing audio, preventing mid-utterance
  interruptions in the recorded output.
  (PR [#4135](https://github.com/pipecat-ai/pipecat/pull/4135))

- Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale
  words or backward PTS values into later turns.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Fixed a race condition in `InterruptibleTTSService` where, if `run_tts` had
  been invoked but `BotStartedSpeakingFrame` had not yet been received, a user
  interruption could allow stale audio to leak through.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Fixed Gemini Live local VAD mode (`GeminiVADParams(disabled=True)` with
  external VAD) not working. The bot now correctly detects user speech and
  signals turn boundaries to the Gemini API.
  (PR [#4146](https://github.com/pipecat-ai/pipecat/pull/4146))

- Fixed Gemini Live message handling to process all `server_content` fields
  independently. Gemini 3.x can bundle multiple fields (e.g. `model_turn` and
  `output_transcription`) on the same message, but the previous `elif` chain
  only processed the first match, silently dropping the rest.
  (PR [#4147](https://github.com/pipecat-ai/pipecat/pull/4147))

- Fixed `ServiceSwitcher` with `ServiceSwitcherStrategyFailover` incorrectly
  triggering failover when `ErrorFrame`s from other pipeline stages (e.g. TTS)
  propagated upstream through the switcher. Previously, any non-fatal error
  passing through would be misattributed to the active service and trigger an
  unwanted service switch. Now only errors originating from the switcher's own
  managed services trigger failover.
  (PR [#4149](https://github.com/pipecat-ai/pipecat/pull/4149))

- Fixed `LiveKitOutputTransport` not clearing the `rtc.AudioSource` internal
  buffer on interruption, causing the bot to continue speaking for several
  seconds after being interrupted.
  (PR [#4151](https://github.com/pipecat-ai/pipecat/pull/4151))

- Fixed a crash in OpenAI LLM processing when the provider returns
  `chunk.choices[0].delta.audio = None`, which caused `'NoneType' object has no
  attribute 'get'` errors during audio transcript handling.
  (PR [#4152](https://github.com/pipecat-ai/pipecat/pull/4152))

- Fixed error floods in `DeepgramSTTService` when the WebSocket connection
  drops. With Deepgram SDK 6.x, `send_media()` raises exceptions on a dead
  connection instead of silently failing, causing every queued audio frame to
  log an error. Now `send_media()` failures are caught gracefully — a single
  warning is logged and audio frames are skipped until the existing
  reconnection logic restores the connection.
  (PR [#4153](https://github.com/pipecat-ai/pipecat/pull/4153))

- `Mem0MemoryService` no longer blocks the event loop during memory storage and
  retrieval. All Mem0 API calls now run in a background thread, and message
  storage is fire-and-forget so it doesn't delay downstream processing.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

- Fixed `Mem0MemoryService` failing to store messages when the context
  contained system or developer role messages. The Mem0 API only accepts user
  and assistant roles, so other roles are now filtered out before storing.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

- Added the missing `on_dtmf_event` callback to the `DailyCallbacks`
  construction in `LemonSliceTransportClient.setup()`, fixing a
  `ValidationError` at pipeline setup time.
  (PR [#4161](https://github.com/pipecat-ai/pipecat/pull/4161))

- Fixed an issue in `InworldTTSService` where the service kept receiving audio
  from the previous context after a fast interruption.
  (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))

- Fixed a word timestamp interleaving issue in `InworldTTSService` when
  processing multiple sentences.
  (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))

- Fixed duplicate `TTSStoppedFrame` being pushed in TTS services using
  `push_stop_frames=True`. When the stop-frame timeout fired, a second
  `TTSStoppedFrame` could be pushed after the normal one at context completion.
  (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))

- ⚠️ Fixed `DeepgramSTTService` compatibility with deepgram-sdk 6.1.0. The SDK
  now requires explicit message objects for `send_keep_alive()`,
  `send_close_stream()`, and `send_finalize()`. The minimum deepgram-sdk
  version is now 6.1.0.
  (PR [#4174](https://github.com/pipecat-ai/pipecat/pull/4174))

- Fixed RTVI events not being delivered to clients when using WebSocket
  transports. `ProtobufFrameSerializer` now sets `ignore_rtvi_messages=False`
  by default.
  (PR [#4176](https://github.com/pipecat-ai/pipecat/pull/4176))

- Fixed a timing issue where turn detection timer tasks (idle controller,
  speech timeout, turn analyzer, and turn completion) could miss their first
  tick because the newly created asyncio task was not yet scheduled when the
  caller continued.
  (PR [#4183](https://github.com/pipecat-ai/pipecat/pull/4183))

- Fixed `FastAPIWebsocketTransport` intermittently hanging on shutdown when the
  remote side (e.g. Twilio) disconnects while audio is being sent. A race
  condition between the send and receive paths could cause the
  `on_client_disconnected` callback to be skipped, leaving the pipeline waiting
  for a disconnect signal that never came.
  (PR [#4186](https://github.com/pipecat-ai/pipecat/pull/4186))
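
The Gemini Live `server_content` fix above comes down to the difference between
an `elif` chain and independent `if` checks. The sketch below uses plain
dictionaries as stand-ins for the real message objects, and both function names
are hypothetical.

```python
FIELDS = ("model_turn", "output_transcription")


def handle_with_elif(server_content: dict) -> list[str]:
    """Old shape: stops at the first field that is present."""
    handled = []
    if "model_turn" in server_content:
        handled.append("model_turn")
    elif "output_transcription" in server_content:
        handled.append("output_transcription")
    return handled


def handle_independently(server_content: dict) -> list[str]:
    """New shape: every field that is present gets processed."""
    handled = []
    for field in FIELDS:
        if field in server_content:
            handled.append(field)
    return handled
```

For a message that bundles both fields, `handle_with_elif` reports only
`model_turn`, while `handle_independently` reports both.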

### Performance

- `RimeTTSService` now handles Rime's `done` WebSocket message to complete
  audio contexts immediately, eliminating the 3-second idle timeout that
  previously added latency at the end of each utterance.
  (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))

## [0.0.107] - 2026-03-23

### Added

- Added `frame_order` parameter to `SyncParallelPipeline`. Set
  `frame_order=FrameOrder.PIPELINE` to push synchronized output frames in
  pipeline definition order (all frames from the first pipeline, then the
  second, etc.) instead of the default arrival order.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

- Added `sync_with_audio` field to `OutputImageRawFrame`. When set to `True`,
  the output transport queues image frames with audio so they are displayed
  only after all preceding audio has been sent, enabling synchronized
  audio/image playback.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

- Added `OpenAIResponsesLLMService`, a new LLM service that uses the OpenAI
  Responses API. Supports streaming text, function calling, usage metrics, and
  out-of-band inference. Works with the universal `LLMContext` and
  `LLMContextAggregatorPair`. See
  `examples/foundational/07-interruptible-openai-responses.py` and
  `14-function-calling-openai-responses.py`.
  (PR [#4074](https://github.com/pipecat-ai/pipecat/pull/4074))

- Added `audio_out_auto_silence` parameter to `TransportParams` (defaults to
  `True`). When set to `False`, the transport waits for audio data instead of
  inserting silence when the output queue is empty, which is useful for
  scenarios that require uninterrupted audio playback without artificial gaps.
  (PR [#4104](https://github.com/pipecat-ai/pipecat/pull/4104))

### Changed

- Renamed tracing span attributes to align with OpenTelemetry GenAI semantic
  conventions: `gen_ai.system` to `gen_ai.provider.name`, `system` to
  `gen_ai.system_instructions`, `gen_ai.usage.cache_read_input_tokens` to
  `gen_ai.usage.cache_read.input_tokens`, and
  `gen_ai.usage.cache_creation_input_tokens` to
  `gen_ai.usage.cache_creation.input_tokens`.
  (PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))

- `DeepgramSageMakerTTSService` now correctly routes audio through the base
  `TTSService` audio context queue. Audio frames are delivered via
  `append_to_audio_context()` instead of being pushed directly, enabling proper
  ordering, interruption handling, and start/stop frame lifecycle management.
  Interruptions now trigger a `Clear` message to Deepgram (flushing its text
  buffer) at the right time via `on_audio_context_interrupted`.
  (PR [#4083](https://github.com/pipecat-ai/pipecat/pull/4083))

- `GradiumTTSService` now sends a per-context `setup` message with
  `client_req_id` before the first text message for each TTS context, following
  Gradium's multiplexing protocol. Previously, a single setup message was sent
  at connection time without a `client_req_id`, which prevented Gradium from
  associating requests with their sessions when using `close_ws_on_eos=False`.
  (PR [#4091](https://github.com/pipecat-ai/pipecat/pull/4091))

### Fixed

- Fixed stale `system_instruction` in LLM tracing spans by reading from
  `_settings.system_instruction` instead of the removed `_system_instruction`
  attribute.
  (PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))

- Fixed `SyncParallelPipeline` breaking the Whisker debugger.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

- Fixed `SyncParallelPipeline` race condition where concurrent SystemFrame
  processing (e.g. from RTVI) could corrupt sink queues and cause deadlocks.
  SystemFrames now take a fast path that passes them through without draining
  queued output.
  (PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))

- Fixed TTS frame ordering so that non-system frames always arrive in correct
  order relative to the `TTSStartedFrame`/`TTSAudioRawFrame`/`TTSStoppedFrame`
  sequence. Previously these frames could race ahead of or behind audio context
  frames, producing out-of-order output downstream.
  (PR [#4075](https://github.com/pipecat-ai/pipecat/pull/4075))

- Fixed `SarvamTTSService` audio and error frames to route through
  `append_to_audio_context()` instead of `push_frame()`, ensuring correct
  behavior with audio contexts and interruptions.
  (PR [#4082](https://github.com/pipecat-ai/pipecat/pull/4082))

- Fixed audio frame ordering and interruption handling in Fish Audio, LMNT,
  Neuphonic, and Rime NonJson TTS services. These services were bypassing the
  base `TTSService` audio context serialization queue by pushing audio frames
  directly, which could cause out-of-order frames and broken interruptions
  during speech.
  (PR [#4090](https://github.com/pipecat-ai/pipecat/pull/4090))

- Fixed Genesys AudioHook serializer to always include the `parameters` field in
  protocol messages. The AudioHook protocol requires every message to carry a
  `parameters` object (even if empty), but `_create_message` omitted it when no
  parameters were provided. This caused clients that validate message structure
  (including the Genesys reference implementation) to reject `pong` and
  parameter-less `closed` responses, breaking server sequence tracking and
  preventing `outputVariables` from reaching the Architect flow.
  (PR [#4093](https://github.com/pipecat-ai/pipecat/pull/4093))
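
  The rule described above (always emit a `parameters` object, even an empty
  one) might be sketched as follows. The `create_message` function here is a
  hypothetical stand-in, not the serializer's `_create_message`, and real
  AudioHook messages carry more fields than the two shown.

    ```python
    import json
    from typing import Any, Optional


    def create_message(msg_type: str, parameters: Optional[dict[str, Any]] = None) -> str:
        """Hypothetical builder that never omits the `parameters` key."""
        message = {
            "type": msg_type,
            # Always emit an object, even when the caller passes nothing.
            "parameters": parameters if parameters is not None else {},
        }
        return json.dumps(message)


    pong = json.loads(create_message("pong"))
    closed = json.loads(
        create_message("closed", {"outputVariables": {"intent": "billing"}})
    )
    print(pong)
    print(closed)
    ```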

## [0.0.106] - 2026-03-18

### Added

- Added optional `service` field to `ServiceUpdateSettingsFrame` (and its
  subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`,
  `STTUpdateSettingsFrame`) to target a specific service instance. When
  `service` is set, only the matching service applies the settings; others
  forward the frame unchanged. This enables updating a single service when
  multiple services of the same type exist in the pipeline.
  (PR [#4004](https://github.com/pipecat-ai/pipecat/pull/4004))

- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily
  runner. These convenience parameters let callers specify a SIP provider name
  and geographic region directly without manually constructing
  `DailyRoomProperties` and `DailyRoomSipParams`.
  (PR [#4005](https://github.com/pipecat-ai/pipecat/pull/4005))

- Added `PerplexityLLMAdapter` that automatically transforms conversation
  messages to satisfy Perplexity's stricter API constraints (strict role
  alternation, no non-initial system messages, last message must be user/tool).
  Previously, certain conversation histories could cause Perplexity API errors
  that didn't occur with OpenAI (`PerplexityLLMService` subclasses
  `OpenAILLMService` since Perplexity uses an OpenAI-compatible API).
  (PR [#4009](https://github.com/pipecat-ai/pipecat/pull/4009))

- Added DTMF input event support to the Daily transport. Incoming DTMF tones
  are now received via Daily's `on_dtmf_event` callback and pushed into the
  pipeline as `InputDTMFFrame`, enabling bots to react to keypad presses from
  phone callers.
  (PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))

- Added `WakePhraseUserTurnStartStrategy` for triggering user turns based on
  wake phrases, with support for `single_activation` mode. Deprecates
  `WakeCheckFilter`.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

- Added `default_user_turn_start_strategies()` and
  `default_user_turn_stop_strategies()` helper functions for composing custom
  strategy lists.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

### Changed

- Changed tool result JSON serialization to use `ensure_ascii=False`,
  preserving UTF-8 characters instead of escaping them. This reduces context
  size and token usage for non-English languages.
  (PR [#3457](https://github.com/pipecat-ai/pipecat/pull/3457))
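
  The effect of `ensure_ascii=False` can be seen with the standard `json`
  module. The sample payload below is made up for illustration and does not
  come from Pipecat.

    ```python
    import json

    # Hypothetical tool result containing non-ASCII text.
    tool_result = {"city": "München", "greeting": "こんにちは"}

    # Default behavior: every non-ASCII character becomes a \uXXXX escape.
    escaped = json.dumps(tool_result)

    # With ensure_ascii=False the characters are kept as-is.
    preserved = json.dumps(tool_result, ensure_ascii=False)

    print(escaped)
    print(preserved)
    print(len(escaped), len(preserved))
    ```

  Both strings decode to the same object; only the serialized length differs.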

- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of
  `OpenAIRealtimeSTTSettings`, making it runtime-updatable via
  `STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is
  deprecated as of 0.0.106.
  (PR [#3991](https://github.com/pipecat-ai/pipecat/pull/3991))

- Updated `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable
  release).
  (PR [#3997](https://github.com/pipecat-ai/pipecat/pull/3997))

- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`,
  aligning it with the HeyGen and Tavus video services. It supports
  `SimliVideoService.Settings(...)` for configuration and uses
  `start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage
  (`api_key`, `face_id`, etc.) remains unchanged.
  (PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))

- Updated `pipecat-ai-small-webrtc-prebuilt` to `2.4.0`.
  (PR [#4023](https://github.com/pipecat-ai/pipecat/pull/4023))

- Nova Sonic assistant text transcripts are now delivered in real-time using
  speculative text events instead of delayed final text events. Previously,
  assistant text only arrived after all audio had finished playing, causing
  laggy transcripts in client UIs. Speculative text arrives before each audio
  chunk, providing text synchronized with what the bot is saying. This also
  simplifies the internal text handling by removing the interruption re-push
  hack and assistant text buffer.
  (PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))

- Updated `daily-python` dependency to 0.25.0.
  (PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))

- Added `enable_dialout` parameter to `configure()` in `pipecat.runner.daily`
  to support dial-out rooms. Also narrowed misleading `Optional` type hints and
  deduplicated token expiry calculation.
  (PR [#4048](https://github.com/pipecat-ai/pipecat/pull/4048))

- Extended `ProcessFrameResult` to stop strategies, allowing a stop strategy to
  short-circuit evaluation of subsequent strategies by returning `STOP`.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
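
  The short-circuit behavior might look like the sketch below. It is not
  Pipecat's implementation: the enum is a hypothetical stand-in in which only
  `STOP` is named by the entry above, while `CONTINUE` and `run_strategies`
  are assumptions made for illustration.

    ```python
    from enum import Enum, auto
    from typing import Callable


    class ProcessFrameResult(Enum):
        """Hypothetical stand-in; CONTINUE is an assumed member name."""

        CONTINUE = auto()
        STOP = auto()


    Strategy = Callable[[str], ProcessFrameResult]


    def run_strategies(strategies: list[Strategy], frame: str) -> list[int]:
        """Evaluate strategies in order and return the indexes that ran."""
        evaluated: list[int] = []
        for index, strategy in enumerate(strategies):
            evaluated.append(index)
            if strategy(frame) is ProcessFrameResult.STOP:
                break  # Later strategies never see this frame.
        return evaluated


    ran = run_strategies(
        [
            lambda frame: ProcessFrameResult.CONTINUE,
            lambda frame: ProcessFrameResult.STOP,
            lambda frame: ProcessFrameResult.CONTINUE,
        ],
        "transcription",
    )
    print(ran)
    ```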

- `GradiumSTTService` now takes `encoding` and `sample_rate` constructor
  arguments, which are assembled in the class to form the `input_format`. PCM
  accepts `8000`, `16000`, and `24000` Hz sample rates.
  (PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))

- Improved `GradiumSTTService` transcription accuracy by reworking how text
  fragments are accumulated and finalized. Previously, trailing words could be
  dropped when the server's `flushed` response arrived before all text tokens
  were delivered. The service now uses a short aggregation delay after flush to
  capture trailing tokens, producing complete utterances.
  (PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))

### Deprecated

- `SimliVideoService.InputParams` is deprecated. Use the direct constructor
  parameters `max_session_length`, `max_idle_time`, and `enable_logging`
  instead.
  (PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))

- Deprecated `LocalSmartTurnAnalyzerV2` and `LocalCoreMLSmartTurnAnalyzer`. Use
  `LocalSmartTurnAnalyzerV3` instead. Instantiating these analyzers will now
  emit a `DeprecationWarning`.
  (PR [#4012](https://github.com/pipecat-ai/pipecat/pull/4012))

- Deprecated `WakeCheckFilter` in favor of `WakePhraseUserTurnStartStrategy`.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

### Fixed

- Fixed an issue where the default model for `OpenAILLMService` and
  `AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now
  restored to `gpt-4.1`.
  (PR [#4000](https://github.com/pipecat-ai/pipecat/pull/4000))

- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut
  down before in-flight frames (e.g. LLM function call responses) finished
  processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline
  as `ControlFrame`s, ensuring all pending work is flushed before shutdown
  begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate
  (`SystemFrame`).
  (PR [#4006](https://github.com/pipecat-ai/pipecat/pull/4006))

- Fixed `ParallelPipeline` dropping or misordering frames during lifecycle
  synchronization. Buffered frames are now flushed in the correct order
  relative to synchronization frames (`StartFrame` goes first,
  `EndFrame`/`CancelFrame` go after), and frames added to the buffer during
  flush are also drained.
  (PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))

- Fixed `TTSService` potentially canceling in-flight audio during shutdown. The
  stop sequence now waits for all queued audio contexts to finish processing
  before canceling the stop frame task.
  (PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))

- Fixed `Language` enum values (e.g. `Language.ES`) not being converted to
  service-specific codes when passed via
  `settings=Service.Settings(language=Language.ES)` at init time. This caused
  API errors (e.g. 400 from Rime) because the raw enum was sent instead of the
  expected language code (e.g. `"spa"`). Runtime updates via
  `UpdateSettingsFrame` were unaffected. The fix centralizes conversion in the
  base `TTSService` and `STTService` classes so all services handle this
  consistently.
  (PR [#4024](https://github.com/pipecat-ai/pipecat/pull/4024))

- Fixed `DeepgramSTTService` ignoring the `base_url` scheme when using `ws://`
  or `http://`. Previously these were silently overwritten with `wss://` /
  `https://`, breaking air-gapped or private deployments that don't use TLS.
  All scheme choices (`wss://`, `https://`, `ws://`, `http://`, or bare
  hostname) are now respected.
  (PR [#4026](https://github.com/pipecat-ai/pipecat/pull/4026))

- Fixed `LLMSwitcher.register_function()` and `register_direct_function()` not
  accepting or forwarding the `timeout_secs` parameter.
  (PR [#4037](https://github.com/pipecat-ai/pipecat/pull/4037))

- Fixed empty user transcriptions in Nova Sonic causing spurious interruptions.
  Previously, an empty transcription could trigger an interruption of the
  assistant's response even though the user hadn't actually spoken.
  (PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))

- Fixed `SonioxSTTService` and `OpenAIRealtimeSTTService` crashing when
  language parameters contain plain strings instead of `Language` enum values.
  (PR [#4046](https://github.com/pipecat-ai/pipecat/pull/4046))

- Fixed premature user turn stops caused by late transcriptions arriving
  between turns. A stale transcript from the previous turn could persist into
  the next turn and trigger a stop before the current turn's real transcript
  arrived. Stop strategies are now reset at both turn start and turn stop to
  prevent state from leaking across turn boundaries.
  (PR [#4057](https://github.com/pipecat-ai/pipecat/pull/4057))

- Fixed raw language strings like `"de-DE"` silently failing when passed to
  TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go
  through the same `Language` enum resolution as enum values, so regional codes
  like `"de-DE"` are properly converted to service-expected formats like
  `"de"`. Unrecognized strings log a warning instead of failing silently.
  (PR [#4058](https://github.com/pipecat-ai/pipecat/pull/4058))

- Fixed Deepgram STT list-type settings (`keyterm`, `keywords`, `search`,
  `redact`, `replace`) being stringified instead of passed as lists to the SDK,
  which caused them to be sent as literal strings (e.g. `"['pipecat']"`) in the
  WebSocket query params.
  (PR [#4063](https://github.com/pipecat-ai/pipecat/pull/4063))
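
  The same failure mode can be reproduced with the standard library only. The
  Deepgram SDK is not used here, and `urlencode` merely stands in for whatever
  builds the query string; the setting values are sample data.

    ```python
    from urllib.parse import urlencode

    settings = {"keyterm": ["pipecat", "daily"]}

    # Buggy pattern: str() turns the list into its Python repr, which is then
    # sent as one literal value.
    stringified = urlencode({key: str(value) for key, value in settings.items()})

    # Intended pattern: doseq=True expands each element into its own parameter.
    expanded = urlencode(settings, doseq=True)

    print(stringified)
    print(expanded)
    ```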

- Fixed `MinWordsUserTurnStartStrategy` including text below the word threshold
  in the output by resetting aggregation when the minimum word count is not
  met.
  (PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))

- Fixed audio overlap and potential dropped TTS content when multiple assistant
  turns occur in quick succession. `TTSService` now flushes remaining text
  before pausing frame processing on `LLMFullResponseEndFrame`/`EndFrame`,
  instead of pausing first.
  (PR [#4071](https://github.com/pipecat-ai/pipecat/pull/4071))

### Security

- Bumped PyJWT minimum version from 2.10.1 to 2.12.0 in the `livekit` extra to
  address CVE-2026-32597 (GHSA-752w-5fwx-jx9f), where PyJWT <= 2.11.0 accepted
  unknown `crit` header extensions.
  (PR [#4035](https://github.com/pipecat-ai/pipecat/pull/4035))

## [0.0.105] - 2026-03-10

### Added

- Added concurrent audio context support: `CartesiaTTSService` can now
  synthesize the next sentence while the previous one is still playing, by
  setting `pause_frame_processing=False` and routing each sentence through its
  own audio context queue.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

- Added custom video track support to Daily transport. Use
  `video_out_destinations` in `DailyParams` to publish multiple video tracks
  simultaneously, mirroring the existing `audio_out_destinations` feature.
  (PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))

- Added `ServiceSwitcherStrategyFailover` that automatically switches to the
  next service when the active service reports a non-fatal error. Recovery
  policies can be implemented via the `on_service_switched` event handler.
  (PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))

- Added optional `timeout_secs` parameter to `register_function()` and
  `register_direct_function()` for per-tool function call timeout control,
  overriding the global `function_call_timeout_secs` default.
  (PR [#3915](https://github.com/pipecat-ai/pipecat/pull/3915))

- Added `cloud-audio-only` recording option to Daily transport's
  `enable_recording` property.
  (PR [#3916](https://github.com/pipecat-ai/pipecat/pull/3916))

- Wired up `system_instruction` in `BaseOpenAILLMService`,
  `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default
  system prompt, matching the behavior of the Google services. This enables
  sharing a single `LLMContext` across multiple LLM services, where each
  service provides its own system instruction independently.

    ```python
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        system_instruction="You are a helpful assistant.",
    )

    context = LLMContext()

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        context.add_message({"role": "user", "content": "Please introduce yourself."})
        await task.queue_frames([LLMRunFrame()])
    ```
  (PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))

- Added `vad_threshold` parameter to `AssemblyAIConnectionParams` for
  configuring voice activity detection sensitivity in U3 Pro. Aligning this
  with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone"
  where AssemblyAI transcribes speech that VAD hasn't detected yet.
  (PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))

- Added `push_empty_transcripts` parameter to `BaseWhisperSTTService` and
  `OpenAISTTService` to allow empty transcripts to be pushed downstream as
  `TranscriptionFrame` instead of discarding them (the default behavior). This
  is intended for situations where VAD fires even though the user did not
  speak. In these cases, it is useful to know that nothing was transcribed so
  that the agent can resume speaking, instead of waiting longer for a
  transcription.
  (PR [#3930](https://github.com/pipecat-ai/pipecat/pull/3930))

- LLM services (`BaseOpenAILLMService`, `AnthropicLLMService`,
  `AWSBedrockLLMService`) now log a warning when both `system_instruction` and
  a system message in the context are set. The constructor's
  `system_instruction` takes precedence.
  (PR [#3932](https://github.com/pipecat-ai/pipecat/pull/3932))

- Runtime settings updates (via `STTUpdateSettingsFrame`) now work for AWS
  Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and
  Soniox STT services. Previously, changing settings at runtime only stored the
  new values without reconnecting.
  (PR [#3946](https://github.com/pipecat-ai/pipecat/pull/3946))

- Exposed `on_summary_applied` event on `LLMAssistantAggregator`, allowing
  users to listen for context summarization events without accessing private
  members.
  (PR [#3947](https://github.com/pipecat-ai/pipecat/pull/3947))

- Deepgram Flux STT settings (`keyterm`, `eot_threshold`,
  `eager_eot_threshold`, `eot_timeout_ms`) can now be updated mid-stream via
  `STTUpdateSettingsFrame` without triggering a reconnect. The new values are
  sent to Deepgram as a Configure WebSocket message on the existing connection.
  (PR [#3953](https://github.com/pipecat-ai/pipecat/pull/3953))

- Added `system_instruction` parameter to `run_inference` across all LLM
  services, allowing callers to override the system prompt for one-shot
  inference calls. Used by `_generate_summary` to pass the summarization prompt
  cleanly.
  (PR [#3968](https://github.com/pipecat-ai/pipecat/pull/3968))

### Changed

- Audio context management (previously in `AudioContextTTSService`) is now
  built into `TTSService`. All WebSocket providers (`cartesia`, `elevenlabs`,
  `asyncai`, `inworld`, `rime`, `gradium`, `resembleai`) now inherit from
  `WebsocketTTSService` directly. Word-timestamp baseline is set automatically
  on the first audio chunk of each context instead of requiring each provider
  to call `start_word_timestamps()` in their receive loop.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

- Daily transport now uses `CustomVideoSource`/`CustomVideoTrack` instead of
  `VirtualCameraDevice` for the default camera output, mirroring how audio
  already works with `CustomAudioSource`/`CustomAudioTrack`.
  (PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))

- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions`
  class was removed from the SDK and is now provided by pipecat directly;
  import it from `pipecat.services.deepgram.stt` instead of `deepgram`.
  (PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))

- `ServiceSwitcherStrategy` base class now provides a `handle_error()` hook for
  subclasses to implement error-based switching. `ServiceSwitcher` defaults to
  `ServiceSwitcherStrategyManual` and `strategy_type` is now optional.
  (PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))

- Added support for Voice Focus 2.0 models.
    - Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
    - Cleaned unused `ParameterFixedError` exception handling in `AICFilter`
      parameter setup.
  (PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))

- `max_context_tokens` and `max_unsummarized_messages` in
  `LLMAutoContextSummarizationConfig` (and deprecated
  `LLMContextSummarizationConfig`) can now be set to `None` independently to
  disable that summarization threshold. At least one must remain set.
  (PR [#3914](https://github.com/pipecat-ai/pipecat/pull/3914))

- ⚠️ Removed `formatted_finals` and `word_finalization_max_wait_time` from
  `AssemblyAIConnectionParams` as these were v2 API parameters not supported in
  v3. Clarified that `format_turns` only applies to Universal-Streaming models;
  U3 Pro has automatic formatting built-in.
  (PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))

- Changed `DeepgramTTSService` to send a Clear message on interruption instead
  of disconnecting and reconnecting the WebSocket, allowing the connection to
  persist throughout the session.
  (PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))

- Re-added `enhancement_level` support to `AICFilter` with runtime
  `FilterEnableFrame` control, applying `ProcessorParameter.Bypass` and
  `ProcessorParameter.EnhancementLevel` together.
  (PR [#3961](https://github.com/pipecat-ai/pipecat/pull/3961))

- Updated `daily-python` dependency from `~=0.23.0` to `~=0.24.0`.
  (PR [#3970](https://github.com/pipecat-ai/pipecat/pull/3970))

- Updated `FishAudioTTSService` default model from `s1` to `s2-pro`, matching
  Fish Audio's latest recommended model for improved quality and speed.
  (PR [#3973](https://github.com/pipecat-ai/pipecat/pull/3973))

- `AzureSTTService` `region` parameter is now optional when `private_endpoint`
  is provided. A `ValueError` is raised if neither is given, and a warning is
  logged if both are provided (`private_endpoint` takes priority).
  (PR [#3974](https://github.com/pipecat-ai/pipecat/pull/3974))
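
  The validation rules above can be sketched as a standalone function. This is
  an illustration under stated assumptions, not the service's code:
  `resolve_target` and the returned dictionary keys are hypothetical.

    ```python
    import logging
    from typing import Optional

    logger = logging.getLogger("azure_stt_sketch")


    def resolve_target(
        region: Optional[str] = None, private_endpoint: Optional[str] = None
    ) -> dict[str, str]:
        """Hypothetical helper that picks exactly one connection target."""
        if private_endpoint:
            if region:
                # Both given: warn and let the private endpoint win.
                logger.warning("Both region and private_endpoint set; using private_endpoint.")
            return {"endpoint": private_endpoint}
        if region:
            return {"region": region}
        raise ValueError("Either region or private_endpoint must be provided.")


    print(resolve_target(region="eastus"))
    print(resolve_target(region="eastus", private_endpoint="wss://stt.example.internal"))
    ```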

### Deprecated

- Deprecated `AudioContextTTSService` and `AudioContextWordTTSService`.
  Subclass `WebsocketTTSService` directly instead; audio context management is
  now part of the base `TTSService`.
  - Deprecated `WordTTSService`, `WebsocketWordTTSService`, and
    `InterruptibleWordTTSService`. Word timestamp logic is now always active in
    `TTSService` and no longer needs to be opted into via a subclass.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

- Deprecated `pipecat.services.google.llm_vertex`,
  `pipecat.services.google.llm_openai`, and
  `pipecat.services.google.gemini_live.llm_vertex` modules. Use
  `pipecat.services.google.vertex.llm`, `pipecat.services.google.openai.llm`,
  and `pipecat.services.google.gemini_live.vertex.llm` instead. The old import
  paths still work but will emit a `DeprecationWarning`.
  (PR [#3980](https://github.com/pipecat-ai/pipecat/pull/3980))

### Removed

- ⚠️ Removed `supports_word_timestamps` parameter from `TTSService.__init__()`.
  Word timestamp logic is now always active. Remove this argument from any
  custom subclass `super().__init__()` calls.
  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))

### Fixed

- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The
  deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit
  `KeepAlive` messages every 5 seconds, within the recommended 3–5 second
  interval before Deepgram's 10-second inactivity timeout.
  (PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))

- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in
  `AICFilter` caused by holding a `memoryview` on the mutable audio buffer
  across async yield points.
  (PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))
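
  The error itself is standard CPython behavior and can be reproduced without
  `AICFilter`. The buffer contents below are placeholder bytes; releasing the
  view before resizing is one way to avoid the error and is not necessarily
  how the fix was implemented.

    ```python
    audio_buffer = bytearray(b"\x00" * 8)

    # While a memoryview is exported, the bytearray cannot change size.
    view = memoryview(audio_buffer)
    try:
        audio_buffer.extend(b"\x01\x02")
        resize_error = None
    except BufferError as exc:
        resize_error = str(exc)

    # Releasing the view removes the export, so resizing works again.
    view.release()
    audio_buffer.extend(b"\x01\x02")

    print(resize_error)
    print(len(audio_buffer))
    ```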

- Fixed TTS context not being appended to the assistant message history when
  using `TTSSpeakFrame` with `append_to_context=True` with some TTS providers.
  (PR [#3936](https://github.com/pipecat-ai/pipecat/pull/3936))

- Fixed context summarization leaving orphaned tool responses in the kept
  context when tool calls were moved to the summarized portion.
  (PR [#3937](https://github.com/pipecat-ai/pipecat/pull/3937))

- Fixed turn completion state not resetting at end of LLM responses.
  `LLMFullResponseEndFrame` is pushed (not received) by the LLM service, so the
  mixin now handles it in `push_frame` instead of `process_frame`.
  (PR [#3956](https://github.com/pipecat-ai/pipecat/pull/3956))

- Fixed turn completion instructions being injected as a context system message
  instead of using `system_instruction`. This caused warning spam when
  `system_instruction` was also set, and the injected instructions did not
  persist across full context updates.
  (PR [#3957](https://github.com/pipecat-ai/pipecat/pull/3957))

- Fixed `TTSService` audio context queue getting blocked when
  `append_to_audio_context()` was called with a `None` context ID, which
  prevented subsequent audio from being delivered.
  (PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))

- Fixed `on_call_state_updated` event handler in LiveKit transport receiving an
  incorrect number of arguments due to a redundant `self` being passed to
  `_call_event_handler`.
  (PR [#3959](https://github.com/pipecat-ai/pipecat/pull/3959))

- Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services
  treating `conversation_already_has_active_response` as a fatal error. These
  services now log it as a non-fatal debug event when a response is already in
  progress.
  (PR [#3960](https://github.com/pipecat-ai/pipecat/pull/3960))

- Fixed `SmallWebRTCConnection` silently discarding messages sent before the
  data channel is open by queuing them and flushing once the channel is ready.
  A bounded queue (`MAX_MESSAGE_QUEUE_SIZE = 50`) prevents unbounded memory
  growth, and a 10-second timeout after connection clears the queue and falls
  back to discard mode if the data channel never opens.
  (PR [#3962](https://github.com/pipecat-ai/pipecat/pull/3962))
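
  The queue-then-flush pattern might be sketched as below. Only the bound of
  50 comes from the entry above; `PendingMessages` and its methods are
  hypothetical, the timer is replaced by an explicit `on_timeout()` call, and
  dropping new messages once the queue is full is an assumption.

    ```python
    from collections import deque
    from typing import Callable

    MAX_MESSAGE_QUEUE_SIZE = 50  # Value taken from the changelog entry.


    class PendingMessages:
        """Hypothetical helper that buffers messages until a channel opens."""

        def __init__(self, send: Callable[[str], None]) -> None:
            self._send = send
            self._queue: deque[str] = deque()
            self._open = False
            self._discard = False

        def submit(self, message: str) -> bool:
            """Send, queue, or drop a message. Returns False if dropped."""
            if self._open:
                self._send(message)
                return True
            # Assumption: messages beyond the bound are dropped.
            if self._discard or len(self._queue) >= MAX_MESSAGE_QUEUE_SIZE:
                return False
            self._queue.append(message)
            return True

        def on_open(self) -> None:
            """Flush queued messages in order once the channel is ready."""
            self._open = True
            while self._queue:
                self._send(self._queue.popleft())

        def on_timeout(self) -> None:
            """Clear the queue and discard from now on if never opened."""
            if not self._open:
                self._queue.clear()
                self._discard = True


    sent: list[str] = []
    pending = PendingMessages(sent.append)
    accepted = [pending.submit(f"msg-{i}") for i in range(60)]
    pending.on_open()
    print(accepted.count(True), len(sent))
    ```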

- Fixed `AzureSTTService` failing to initialize when `private_endpoint` is
  provided. The Azure Speech SDK's `SpeechConfig` does not accept both `region`
  and `endpoint` simultaneously, so they are now passed conditionally.
  (PR [#3967](https://github.com/pipecat-ai/pipecat/pull/3967))

- Fixed `GoogleLLMService` ignoring the `system_instruction` set via
  constructor or `GoogleLLMSettings` when a system message was also present in
  the context. The settings value now correctly takes priority, and a warning
  is logged when both are set.
  (PR [#3976](https://github.com/pipecat-ai/pipecat/pull/3976))

### Other

- Updated foundational examples to use `system_instruction` on LLM services
  instead of adding system messages to `LLMContext`.
  (PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))

- Updated AssemblyAI turn detection example to use `keyterms_prompt` list
  format instead of `prompt` string for improved clarity.
  (PR [#3929](https://github.com/pipecat-ai/pipecat/pull/3929))

- Updated foundational examples and eval scripts to use `"user"` role instead
  of `"system"` when adding messages to `LLMContext`, since system prompts
  should be set via `system_instruction` on the LLM service.
  (PR [#3931](https://github.com/pipecat-ai/pipecat/pull/3931))

## [0.0.104] - 2026-03-02

### Added

- Added `TextAggregationMetricsData` metric measuring the time from the first
  LLM token to the first complete sentence, representing the latency cost of
  sentence aggregation in the TTS pipeline.
  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))

- Added support for using strongly-typed objects instead of dicts for updating
  service settings at runtime.

    Instead of, say:

    ```python
    await task.queue_frame(
        STTUpdateSettingsFrame(settings={"language": Language.ES})
    )
    ```

    you'd do:

    ```python
    await task.queue_frame(
        STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
    )
    ```

  Each service now vends strongly-typed classes like `DeepgramSTTSettings`
  representing the service's runtime-updatable settings.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Added support for specifying private endpoints for Azure Speech-to-Text,
  enabling use in private networks behind firewalls.
  (PR [#3764](https://github.com/pipecat-ai/pipecat/pull/3764))

- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time
  LemonSlice Avatars to any Daily room.
  (PR [#3791](https://github.com/pipecat-ai/pipecat/pull/3791))

- Added `output_medium` parameter to `AgentInputParams` and
  `OneShotInputParams` in Ultravox service to control initial output medium
  (text or voice) at call creation time.
  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))

- Added `TurnMetricsData` as a generic metrics class for turn detection, with
  e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData`
  with `e2e_processing_time_ms` tracking the interval from VAD
  speech-to-silence transition to turn completion.
  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))

- Added `on_audio_context_interrupted()` and `on_audio_context_completed()`
  callbacks to `AudioContextTTSService`. Subclasses can override these to
  perform provider-specific cleanup instead of overriding
  `_handle_interruption()`.
  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))

- Added `on_summary_applied` event to `LLMContextSummarizer` for observability,
  providing message counts before and after context summarization.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added `summary_message_template` to `LLMContextSummarizationConfig` for
  customizing how summaries are formatted when injected into context (e.g.,
  wrapping in XML tags).
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default
  120s) to prevent hung LLM calls from permanently blocking future
  summarizations.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added optional `llm` field to `LLMContextSummarizationConfig` for routing
  summarization to a dedicated LLM service (e.g., a cheaper/faster model)
  instead of the pipeline's primary model.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Added AssemblyAI `u3-rt-pro` model support with built-in turn detection mode.
  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))

- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization
  from anywhere in the pipeline (e.g. a function call tool). Accepts an
  optional `config: LLMContextSummaryConfig` to override summary generation
  settings per request.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- Added `LLMContextSummaryConfig` (summary generation params:
  `target_context_tokens`, `min_messages_after_summary`,
  `summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger
  thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested
  `summary_config`). These replace the monolithic
  `LLMContextSummarizationConfig`.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- Added support for the `speed_alpha` parameter to the `arcana` model in
  `RimeTTSService`.
  (PR [#3873](https://github.com/pipecat-ai/pipecat/pull/3873))

- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports
  (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen,
  Tavus) when a client connects. Enables observers to track transport readiness
  timing.
  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))

- Added `StartupTimingObserver` for measuring how long each processor's
  `start()` method takes during pipeline startup. Also measures transport
  readiness — the time from `StartFrame` to first client connection — via the
  `on_transport_timing_report` event.
  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))

- Added `BotConnectedFrame` for SFU transports and `on_transport_timing_report`
  event to `StartupTimingObserver` with bot and client connection timing.
  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))

- Added optional `direction` parameter to `PipelineTask.queue_frame()` and
  `PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the
  end of the pipeline.
  (PR [#3883](https://github.com/pipecat-ai/pipecat/pull/3883))

- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing
  per-service TTFB, text aggregation, user turn duration, and function call
  latency metrics for each user-to-bot response cycle.
  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))

- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver`
  measuring the time from client connection to first bot speech. An
  `on_latency_breakdown` is also emitted for this first speech event.
  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))

- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an
  `InterruptionFrame` both upstream and downstream directly from the calling
  processor, avoiding the round-trip through the pipeline task that
  `push_interruption_task_frame_and_wait()` required.
  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
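
To illustrate the `summarization_timeout` entry above: a timeout turns a hung LLM call into a recoverable failure, so later summarizations can still run. The sketch below shows the general pattern with `asyncio.wait_for`. The names `summarize_with_timeout`, `hung_llm_call`, and `quick_llm_call` are hypothetical and are not Pipecat APIs; Pipecat's own implementation may differ.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def summarize_with_timeout(
    summarize: Callable[[], Awaitable[str]], timeout_secs: float
) -> Optional[str]:
    """Run a summarization call, giving up after timeout_secs.

    Returns the summary, or None if the call timed out. wait_for cancels
    the underlying call on timeout, so it cannot block later attempts.
    """
    try:
        return await asyncio.wait_for(summarize(), timeout=timeout_secs)
    except asyncio.TimeoutError:
        return None


async def hung_llm_call() -> str:
    # Hypothetical stand-in for an LLM request that never returns in time.
    await asyncio.sleep(3600)
    return "unreachable"


async def quick_llm_call() -> str:
    # Hypothetical stand-in for a healthy LLM request.
    await asyncio.sleep(0)
    return "summary text"


async def demo():
    hung = await summarize_with_timeout(hung_llm_call, timeout_secs=0.05)
    ok = await summarize_with_timeout(quick_llm_call, timeout_secs=0.05)
    return hung, ok


hung, ok = asyncio.run(demo())
```

Here the hung call yields `None` while the next call still completes, which is the behavior the timeout is meant to guarantee.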

### Changed

- Added `text_aggregation_mode` parameter to `TTSService` and all TTS
  subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All
  text now flows through text aggregators regardless of mode, enabling pattern
  detection and tag handling in TOKEN mode.
  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))

- ⚠️ Refactored runtime-updatable service settings to use strongly-typed
  classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific
  subclasses) instead of plain dicts. Each service's `_settings` now holds
  these strongly-typed objects. For service maintainers, see changes in
  COMMUNITY_INTEGRATIONS.md.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Word timestamp support has been moved from `WordTTSService` into `TTSService`
  via a new `supports_word_timestamps` parameter. Services that previously
  extended `WordTTSService`, `AudioContextWordTTSService`, or
  `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their
  parent `__init__` instead.
  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))

- Improved Ultravox TTFB measurement accuracy by using VAD speech end time
  instead of `UserStoppedSpeakingFrame` timing.
  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))

- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini
  realtime services: added `InterruptionFrame` handling with metrics cleanup,
  processing metrics at response boundaries, and improved agent transcript
  handling for both voice and text output modalities.
  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))

- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
  (PR [#3807](https://github.com/pipecat-ai/pipecat/pull/3807))

- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and
  `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to
  `KRISP_VIVA_API_KEY` environment variable.
  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))

- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security
  vulnerability.
  (PR [#3811](https://github.com/pipecat-ai/pipecat/pull/3811))

- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally
  speaking, you don't want a user interruption to prevent a service setting
  change from going into effect. Note that you usually don't use
  `ServiceSettingsUpdateFrame` directly; instead, you use one of its subclasses:
    - `LLMUpdateSettingsFrame`
    - `TTSUpdateSettingsFrame`
    - `STTUpdateSettingsFrame`
  (PR [#3819](https://github.com/pipecat-ai/pipecat/pull/3819))

- Updated context summarization to use `user` role instead of `assistant` for
  summary messages.
  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))

- Renamed the `AssemblyAISTTService` parameter
  `min_end_of_turn_silence_when_confident` to `min_turn_silence`. The old name
  is still supported but emits a deprecation warning.
  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))

- ⚠️ Renamed `LLMAssistantAggregatorParams` fields:
  `enable_context_summarization` → `enable_auto_context_summarization` and
  `context_summarization_config` → `auto_context_summarization_config` (now
  accepts `LLMAutoContextSummarizationConfig`). The old names still work with a
  `DeprecationWarning` for one release cycle.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to
  `True` when using `CommitStrategy.MANUAL`.
  (PR [#3865](https://github.com/pipecat-ai/pipecat/pull/3865))

- Updated the `numba` version pin from `==` to `>=0.61.2`.
  (PR [#3868](https://github.com/pipecat-ai/pipecat/pull/3868))

- Updated tracing code to use `ServiceSettings` dataclass API
  (`given_fields()`, attribute access) instead of dict-style access
  (`.items()`, `in`, subscript).
  (PR [#3879](https://github.com/pipecat-ai/pipecat/pull/3879))

- ⚠️ Removed `event` field and `complete()` method from `InterruptionFrame`.
  Removed `event` field from `InterruptionTaskFrame`. These are no longer
  needed since `broadcast_interruption()` does not require a round-trip
  completion signal.
  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))

- Moved `pipecat.services.deepgram.stt_sagemaker` and
  `pipecat.services.deepgram.tts_sagemaker` to
  `pipecat.services.deepgram.sagemaker.stt` and
  `pipecat.services.deepgram.sagemaker.tts`. The old import paths still work
  but emit a `DeprecationWarning`.
  (PR [#3902](https://github.com/pipecat-ai/pipecat/pull/3902))

### Deprecated

- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS
  subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or
  `text_aggregation_mode=TextAggregationMode.TOKEN` instead.
  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))

- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services
  in favor of runtime updates via `TTSUpdateSettingsFrame`,
  `STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.

  ⚠️ Note a subtle behavior change in these deprecated methods: previously
  only `set_language()` caused the service to actually react to the update
  (e.g. by reconnecting to a remote service so it can pick up the change); now
  all of these methods do. This change was made as part of a refactor that
  makes them all work the same way under the hood.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Deprecated dict-based `*UpdateSettingsFrame(settings={...})` in favor of
  passing typed settings delta objects with `*UpdateSettingsFrame(delta=...)`,
  e.g. `delta=DeepgramSTTSettings(language=Language.ES)`.
  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))

- Deprecated `WordTTSService`, `WebsocketWordTTSService`,
  `AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their
  non-word counterparts with `supports_word_timestamps=True` instead:
    - `WordTTSService` → `TTSService(supports_word_timestamps=True)`
    - `WebsocketWordTTSService` →
      `WebsocketTTSService(supports_word_timestamps=True)`
    - `AudioContextWordTTSService` →
      `AudioContextTTSService(supports_word_timestamps=True)`
    - `InterruptibleWordTTSService` →
      `InterruptibleTTSService(supports_word_timestamps=True)`
  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))

- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`.
  `BaseSmartTurn` now emits `TurnMetricsData` directly.
  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))

- Deprecated `LLMContextSummarizationConfig`. Use
  `LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig`
  instead. The old class emits a `DeprecationWarning`.
  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))

- Deprecated `push_interruption_task_frame_and_wait()` in `FrameProcessor`. Use
  `broadcast_interruption()` instead. The old method now delegates to
  `broadcast_interruption()` and logs a deprecation warning.
  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
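
Several entries in this release keep an old name working while emitting a `DeprecationWarning`. Python ignores `DeprecationWarning` by default unless it is triggered by code running in `__main__`, so it can help to surface these warnings explicitly in tests. The sketch below uses only the standard `warnings` module; `old_api`, `new_api`, and `collect_deprecations` are hypothetical stand-ins, not Pipecat functions.

```python
import warnings


def new_api(value: int) -> int:
    return value * 2


def old_api(value: int) -> int:
    # Hypothetical deprecated wrapper that delegates to its replacement.
    warnings.warn(
        "old_api() is deprecated, use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api(value)


def collect_deprecations(func, *args):
    """Call func and return (result, list of deprecation messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones the default filters would hide.
        warnings.simplefilter("always")
        result = func(*args)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages
```

To fail a test run on any deprecated usage instead of collecting messages, `warnings.simplefilter("error", DeprecationWarning)` or `python -W error::DeprecationWarning` turns these warnings into exceptions.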

### Removed

- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The
  `transformers` and `onnxruntime` packages are now always installed as core
  dependencies since they are required by the default turn stop strategy,
  `TurnAnalyzerUserTurnStopStrategy`, which uses `LocalSmartTurnAnalyzerV3`.
  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))

- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been
  shut down and is no longer available.
  (PR [#3838](https://github.com/pipecat-ai/pipecat/pull/3838))

### Fixed

- Added `LLMSpecificMessage` handling in `LLMContextSummarizationUtil` to skip
  provider-specific messages during context summarization.
  (PR [#3794](https://github.com/pipecat-ai/pipecat/pull/3794))

- Treated `response_cancel_not_active` as a non-fatal error in realtime
  services (`OpenAIRealtimeLLMService`, `GrokRealtimeLLMService`,
  `OpenAIRealtimeBetaLLMService`) to prevent WebSocket disconnection when
  cancelling an inactive response.
  (PR [#3795](https://github.com/pipecat-ai/pipecat/pull/3795))

- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies
  (`transformers`, `onnxruntime`) into core dependencies instead of using a
  self-referential extra.
  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))

- Fixed `SentryMetrics` method signatures to match updated
  `FrameProcessorMetrics` base class, resolving `TypeError` when using
  `start_time`/`end_time` keyword arguments.
  (PR [#3808](https://github.com/pipecat-ai/pipecat/pull/3808))

- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and
  `AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
  (PR [#3813](https://github.com/pipecat-ai/pipecat/pull/3813))

- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI,
  ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio
  contexts after normal speech completion, only on interruption.
  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))

- Fixed STT TTFB metrics measuring timeout expiry time instead of actual
  transcript arrival time.
  (PR [#3822](https://github.com/pipecat-ai/pipecat/pull/3822))

- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being
  unintentionally pushed downstream in `LLMUserAggregator`. They are now
  consumed like `TranscriptionFrame`.
  (PR [#3825](https://github.com/pipecat-ai/pipecat/pull/3825))

- Fixed misleading "Empty audio frame received for STT service" warnings when
  using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`)
  that buffer audio internally.
  (PR [#3828](https://github.com/pipecat-ai/pipecat/pull/3828))

- Fixed an issue in `RimeNonJsonTTSService` where trailing punctuation was
  sometimes vocalized.
  (PR [#3837](https://github.com/pipecat-ai/pipecat/pull/3837))

- Fixed `TTSSpeakFrame` not committing spoken text to the conversation context
  when used outside of an LLM response (e.g., bot greetings or injected
  speech).
  (PR [#3845](https://github.com/pipecat-ai/pipecat/pull/3845))

- Removed verbose per-chunk audio logging from `GenesysAudioHookSerializer`
  that flooded production logs.
  (PR [#3850](https://github.com/pipecat-ai/pipecat/pull/3850))

- Added a beta feature warning when using custom prompts with AssemblyAI.
  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))

- Fixed `LocalSmartTurnAnalyzerV3` producing incorrect end-of-turn predictions
  at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic
  resampling to 16kHz before Whisper feature extraction.
  (PR [#3857](https://github.com/pipecat-ai/pipecat/pull/3857))

- Fixed `PipelineTask` double-inserting `RTVIProcessor` into the frame chain
  when the user provides both an `RTVIProcessor` in the pipeline and a custom
  `RTVIObserver` subclass in observers.
  (PR [#3867](https://github.com/pipecat-ai/pipecat/pull/3867))

- Fixed turn completion instructions being lost when `LLMMessagesUpdateFrame`
  replaces the LLM context. When `filter_incomplete_user_turns` is enabled, the
  turn completion system message is now re-injected after context replacement.
  (PR [#3888](https://github.com/pipecat-ai/pipecat/pull/3888))

- Fixed Azure TTS and STT services silently swallowing cancellation errors
  (invalid API key, network failures, rate limiting) instead of propagating
  them as `ErrorFrame`s to the pipeline.
  (PR [#3893](https://github.com/pipecat-ai/pipecat/pull/3893))

### Performance

- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to
  `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on
  every interruption by using `client_req_id`-based multiplexing.
  (PR [#3759](https://github.com/pipecat-ai/pipecat/pull/3759))

### Other

- Standardized Sarvam STT/TTS User-Agent header handling to consistently send
  Pipecat SDK identity in websocket requests.
  (PR [#3886](https://github.com/pipecat-ai/pipecat/pull/3886))

## [0.0.103] - 2026-02-20

### Added

- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This
  allows timestamp info to trail the arrival of audio chunks, which reduces
  first-audio-chunk latency.
  (PR [#3625](https://github.com/pipecat-ai/pipecat/pull/3625))

- Added model-specific `InputParams` to `RimeTTSService`: arcana params
  (`repetition_penalty`, `temperature`, `top_p`) and mistv2 params
  (`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param
  changes now trigger WebSocket reconnection.
  (PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))

- Added `write_transport_frame()` hook to `BaseOutputTransport` allowing
  transport subclasses to handle custom frame types that flow through the audio
  queue.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily
  transport. These frames queue SIP transfer and SIP REFER operations with
  audio, so the operation executes only after the bot finishes its current
  utterance.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Added keepalive support to `SarvamSTTService` to prevent idle connection
  timeouts (e.g. when used behind a `ServiceSwitcher`).
  (PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))

- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection
  at runtime by updating the timeout dynamically.
  (PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))

- Added `broadcast_sibling_id` field to the base `Frame` class. This field is
  automatically set by `broadcast_frame()` and `broadcast_frame_instance()` to
  the ID of the paired frame pushed in the opposite direction, allowing
  receivers to identify broadcast pairs.
  (PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))

- Added `ignored_sources` parameter to `RTVIObserverParams` and
  `add_ignored_source()`/`remove_ignored_source()` methods to `RTVIObserver` to
  suppress RTVI messages from specific pipeline processors (e.g. a silent
  evaluation LLM).
  (PR [#3779](https://github.com/pipecat-ai/pipecat/pull/3779))

- Added `DeepgramSageMakerTTSService` for running Deepgram TTS models deployed
  on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
  Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling,
  and per-turn TTFB metrics.
  (PR [#3785](https://github.com/pipecat-ai/pipecat/pull/3785))
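
The `broadcast_sibling_id` entry above can be pictured as follows: a broadcast creates two frames, one per direction, and each records the other's ID. A receiver that sees both directions can then drop one copy of a pair while still handling frames that only travel upstream. The classes and functions below are simplified, hypothetical stand-ins written for this sketch; they are not Pipecat's `Frame` or `broadcast_frame()` implementations.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple

_ids = itertools.count(1)


class Direction(Enum):
    DOWNSTREAM = 1
    UPSTREAM = 2


@dataclass
class Frame:
    # Simplified stand-in: a unique ID plus an optional link to its pair.
    name: str
    id: int = field(default_factory=lambda: next(_ids))
    broadcast_sibling_id: Optional[int] = None


def broadcast(name: str) -> List[Tuple[Frame, Direction]]:
    """Create one frame per direction and cross-link their IDs."""
    down, up = Frame(name), Frame(name)
    down.broadcast_sibling_id = up.id
    up.broadcast_sibling_id = down.id
    return [(down, Direction.DOWNSTREAM), (up, Direction.UPSTREAM)]


def should_process(frame: Frame, direction: Direction) -> bool:
    """Skip only the upstream copy of a broadcast pair."""
    is_broadcast_copy = frame.broadcast_sibling_id is not None
    return not (direction is Direction.UPSTREAM and is_broadcast_copy)
```

With this rule, a frame pushed only upstream (no sibling) is still processed, while the upstream half of a broadcast pair is treated as a duplicate.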

### Changed

- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the
  `wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from
  mistv2-specific values to `None` — only explicitly-set params are sent as
  query params.
  (PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))

- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager`
  in `aic_filter.py`.
    - Multiple filters using the same model path or `(model_id,
      model_download_dir)` share one loaded model, with reference counting and
      concurrent load deduplication.
    - Model file I/O runs off the event loop so the filter does not block.
  (PR [#3684](https://github.com/pipecat-ai/pipecat/pull/3684))

- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for
  better traceability.
  (PR [#3706](https://github.com/pipecat-ai/pipecat/pull/3706))

- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now
  queued with audio like other transport frames.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Bumped Pillow dependency upper bound from `<12` to `<13` to allow Pillow
  12.x.
  (PR [#3728](https://github.com/pipecat-ai/pipecat/pull/3728))

- Moved STT keepalive mechanism from `WebsocketSTTService` to the `STTService`
  base class, allowing any STT service (not just websocket-based ones) to use
  idle-connection keepalive via the `keepalive_timeout` and
  `keepalive_interval` parameters.
  (PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))

- Improved audio context management in `AudioContextTTSService` by moving
  context ID tracking to the base class and adding
  `reuse_context_id_within_turn` parameter to control concurrent TTS request
  handling.
    - Added helper methods: `has_active_audio_context()`,
      `get_active_audio_context_id()`, `remove_active_audio_context()`,
      `reset_active_audio_context()`
    - Simplified Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS
      implementations by removing duplicate context management code
  (PR [#3732](https://github.com/pipecat-ai/pipecat/pull/3732))

- `UserIdleController` is now always created with a default timeout of 0
  (disabled). The `user_idle_timeout` parameter changed from `Optional[float] =
  None` to `float = 0` in `UserTurnProcessor`, `LLMUserAggregatorParams`, and
  `UserIdleController`.
  (PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))

- Changed the version specifier from `>=0.2.8` to `~=0.2.8` for the
  `speechmatics-voice` package to ensure compatibility with future patch
  versions.
  (PR [#3761](https://github.com/pipecat-ai/pipecat/pull/3761))

- Updated `InworldTTSService` and `InworldHttpTTSService` to use the `ASYNC`
  timestamp transport strategy by default.
  (PR [#3765](https://github.com/pipecat-ai/pipecat/pull/3765))

- Added `start_time` and `end_time` parameters to `start_ttfb_metrics()`,
  `stop_ttfb_metrics()`, `start_processing_metrics()`, and
  `stop_processing_metrics()` in `FrameProcessor` and `FrameProcessorMetrics`,
  allowing custom timestamps for metrics measurement. `STTService` now uses
  these instead of custom TTFB tracking.
  (PR [#3776](https://github.com/pipecat-ai/pipecat/pull/3776))

- Updated default Anthropic model from `claude-sonnet-4-5-20250929` to
  `claude-sonnet-4-6`.
  (PR [#3792](https://github.com/pipecat-ai/pipecat/pull/3792))
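
The `AICModelManager` entry above combines three ideas: sharing one loaded model per key, reference counting, and deduplicating concurrent loads while keeping file I/O off the event loop. The sketch below shows one way these can fit together with `asyncio`. `SharedModelManager`, `acquire()`, `release()`, and the fake loader are hypothetical names for illustration and do not mirror the real `AICModelManager` API.

```python
import asyncio
import time
from typing import Callable, Dict


class SharedModelManager:
    """Share one loaded model per key, with reference counting."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader  # blocking function, e.g. reads a model file
        self._loads: Dict[str, asyncio.Future] = {}
        self._refs: Dict[str, int] = {}

    async def acquire(self, key: str) -> object:
        self._refs[key] = self._refs.get(key, 0) + 1
        if key not in self._loads:
            # The first caller starts the load in a worker thread. Concurrent
            # callers find the same future and await it instead of reloading.
            loop = asyncio.get_running_loop()
            self._loads[key] = loop.run_in_executor(None, self._loader, key)
        return await self._loads[key]

    def release(self, key: str) -> None:
        self._refs[key] -= 1
        if self._refs[key] == 0:
            # Last user is gone: drop the cached model so it can be freed.
            del self._refs[key]
            del self._loads[key]


async def demo():
    load_calls = []

    def fake_loader(path: str) -> object:
        # Hypothetical blocking loader; sleeps to simulate file I/O.
        load_calls.append(path)
        time.sleep(0.01)
        return {"path": path}

    manager = SharedModelManager(fake_loader)
    a, b = await asyncio.gather(
        manager.acquire("model.bin"), manager.acquire("model.bin")
    )
    return a is b, len(load_calls)


same_object, loads = asyncio.run(demo())
```

This sketch leaves out error handling: a failed load stays cached until released, which a production implementation would likely want to handle.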

### Deprecated

- Deprecated unused `Traceable`, `@traceable`, `@traced`, and
  `AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module
  will be removed in a future release.
  (PR [#3733](https://github.com/pipecat-ai/pipecat/pull/3733))

### Fixed

- Fixed race condition where `RTVIObserver` could send messages before
  `DailyTransport` join completed. Outbound messages are now queued and delivered
  after the transport is ready.
  (PR [#3615](https://github.com/pipecat-ai/pipecat/pull/3615))

- Fixed async generator cleanup in OpenAI LLM streaming to prevent
  `AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699).
  (PR [#3698](https://github.com/pipecat-ai/pipecat/pull/3698))

- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all
  sample rates, including 8kHz audio.
  (PR [#3713](https://github.com/pipecat-ai/pipecat/pull/3713))

- Fixed a race condition in `RTVIObserver` where bot output messages could be
  sent before the bot-started-speaking event.
  (PR [#3718](https://github.com/pipecat-ai/pipecat/pull/3718))

- Fixed Grok Realtime `session.updated` event parsing failure caused by the API
  returning prefixed voice names (e.g. `"human_Ara"` instead of `"Ara"`).
  (PR [#3720](https://github.com/pipecat-ai/pipecat/pull/3720))

- Fixed context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`,
  `RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and
  `PlayHTTTSService`. Services now properly reuse the same context ID across
  multiple `run_tts()` invocations within a single LLM turn, preventing context
  tracking issues and incorrect lifecycle signaling.
  (PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))

- Fixed word timestamp interleaving issue in `ElevenLabsTTSService` when
  processing multiple sentences within a single LLM turn.
  (PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))

- Fixed tracing service decorators executing the wrapped function twice when
  the function itself raised an exception (e.g., LLM rate limit, TTS timeout).
  (PR [#3735](https://github.com/pipecat-ai/pipecat/pull/3735))

- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame`
  reaches downstream processors.
  (PR [#3737](https://github.com/pipecat-ai/pipecat/pull/3737))

- Fixed `UserIdleController` false idle triggers caused by gaps between user
  and bot activity frames. The idle timer now starts only after
  `BotStoppedSpeakingFrame` and is suppressed during active user turns and
  function calls.
  (PR [#3744](https://github.com/pipecat-ai/pipecat/pull/3744))

- Fixed incorrect `sample_rate` assignment in
  `TavusInputTransport._on_participant_audio_data` (was using
  `audio.audio_frames` instead of `audio.sample_rate`).
  (PR [#3768](https://github.com/pipecat-ai/pipecat/pull/3768))

- Fixed `RTVIObserver` not processing upstream-only frames. Previously, all
  upstream frames were filtered out to avoid duplicate messages from
  broadcasted frames. Now only upstream copies of broadcasted frames are
  skipped.
  (PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))

- Fixed mutable default arguments in `LLMContextAggregatorPair.__init__()` that
  could cause shared state across instances.
  (PR [#3782](https://github.com/pipecat-ai/pipecat/pull/3782))

- Fixed `DeepgramSageMakerSTTService` to properly track finalize lifecycle
  using `request_finalize()` / `confirm_finalize()` and use `is_final` (instead
  of `is_final and speech_final`) for final transcription detection, matching
  `DeepgramSTTService` behavior.
  (PR [#3784](https://github.com/pipecat-ai/pipecat/pull/3784))

- Fixed a race condition in `AudioContextTTSService` where the audio context
  could time out between consecutive TTS requests within the same turn, causing
  audio to be discarded.
  (PR [#3787](https://github.com/pipecat-ai/pipecat/pull/3787))

- Fixed `push_interruption_task_frame_and_wait()` hanging indefinitely when the
  `InterruptionFrame` does not reach the pipeline sink within the timeout.
  Added a `timeout` keyword argument to customize the wait duration.
  (PR [#3789](https://github.com/pipecat-ai/pipecat/pull/3789))
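
The mutable-default-argument fix above addresses a general Python pitfall: a default such as `[]` is created once, when the function is defined, and is then shared by every call that omits the argument. A minimal sketch with hypothetical classes (not the actual `LLMContextAggregatorPair` code):

```python
from typing import List, Optional


class SharedDefault:
    # Buggy pattern: the same list object is reused by every instance.
    def __init__(self, items: List[str] = []):
        self.items = items


class FreshDefault:
    # Fixed pattern: use None as a sentinel and build a new list per instance.
    def __init__(self, items: Optional[List[str]] = None):
        self.items = items if items is not None else []


a, b = SharedDefault(), SharedDefault()
a.items.append("leaked")  # also visible through b.items

c, d = FreshDefault(), FreshDefault()
c.items.append("private")  # d.items is unaffected
```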

## [0.0.102] - 2026-02-10

### Added

- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming
  WebSocket API with word-level timestamps and jitter buffering for smooth
  audio playback.
  (PR [#3134](https://github.com/pipecat-ai/pipecat/pull/3134))

- Added `UserBotLatencyObserver` for tracking user-to-bot response latency.
  When tracing is enabled, latency measurements are automatically recorded as
  `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.
  (PR [#3355](https://github.com/pipecat-ai/pipecat/pull/3355))

- Added `append_to_context` parameter to `TTSSpeakFrame` for conditional LLM
  context addition.
    - Allows fine-grained control over whether text should be added to
      conversation context
    - Defaults to `True` to maintain backward compatibility
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Added TTS context tracking system with `context_id` field to trace audio
  generation through the pipeline.
    - `TTSAudioRawFrame`, `TTSStartedFrame`, `TTSStoppedFrame` now include
      `context_id`
    - `AggregatedTextFrame` and `TTSTextFrame` now include `context_id`
    - Enables tracking which TTS request generated specific audio chunks
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Added support for Inworld TTS WebSocket Auto Mode for improved latency.
  (PR [#3593](https://github.com/pipecat-ai/pipecat/pull/3593))

- Added new frames for context summarization: `LLMContextSummaryRequestFrame`
  and `LLMContextSummaryResultFrame`.
  (PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))

- Added context summarization feature to automatically compress conversation
  history when conversation length limits (by token or message count) are
  reached, enabling efficient long-running conversations.
    - Configure via `enable_context_summarization=True` in
      `LLMAssistantAggregatorParams`
    - Customize behavior with `LLMContextSummarizationConfig` (max tokens,
      thresholds, etc.)
    - Automatically preserves incomplete function call sequences during
      summarization
    - See new examples:
      `examples/foundational/54-context-summarization-openai.py` and
      `examples/foundational/54a-context-summarization-google.py`
  (PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))

- Added RTVI function call lifecycle events (`llm-function-call-started`,
  `llm-function-call-in-progress`, `llm-function-call-stopped`) with
  configurable security levels via
  `RTVIObserverParams.function_call_report_level`. Supports per-function
  control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or
  `FULL`).
  (PR [#3630](https://github.com/pipecat-ai/pipecat/pull/3630))

- Added `RequestMetadataFrame` and metadata handling for `ServiceSwitcher` to
  ensure STT services correctly emit `STTMetadataFrame` when switching between
  services. Only the active service's metadata is propagated downstream,
  switching services triggers the newly active service to re-emit its metadata,
  and proper frame ordering is maintained at startup.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Added `STTMetadataFrame` to broadcast STT service latency information at
  pipeline start.
    - STT services broadcast P99 time-to-final-segment (`ttfs_p99_latency`) to
      downstream processors
    - Turn stop strategies automatically configure their STT timeout from this
      metadata
    - Developers can override `ttfs_p99_latency` via constructor argument for
      custom deployments
    - Added measured P99 values for STT providers.
    - See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) to
      measure latency for your configuration
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Added support for `is_sandbox` parameter in `LiveAvatarNewSessionRequest` to
  enable sandbox mode for HeyGen LiveAvatar sessions.
  (PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))

- Added support for `video_settings` parameter in `LiveAvatarNewSessionRequest`
  to configure video encoding (H264/VP8) and quality levels.
  (PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))

- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using
  OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD
  and server-side VAD modes, noise reduction, and automatic reconnection.
  (PR [#3656](https://github.com/pipecat-ai/pipecat/pull/3656))

- Added `bulbul:v3-beta` TTS model support for Sarvam AI with temperature
  control and 25 new speaker voices.
  (PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))

- Added `saaras:v3` STT model support for Sarvam AI with new `mode` parameter
  (transcribe, translate, verbatim, translit, codemix) and prompt support.
  (PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))

- Added new OpenAI TTS voice options `marin` and `cedar`.
  (PR [#3682](https://github.com/pipecat-ai/pipecat/pull/3682))

- Added `UserMuteStartedFrame` and `UserMuteStoppedFrame` system frames, and
  corresponding `user-mute-started` / `user-mute-stopped` RTVI messages, so
  clients can observe when mute strategies activate or deactivate.
  (PR [#3687](https://github.com/pipecat-ai/pipecat/pull/3687))
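
The context summarization entry above triggers on either of two limits, token count or message count. The sketch below illustrates that "whichever comes first" rule. The function names and the rough 4-characters-per-token estimate are assumptions made for this example and are not taken from Pipecat's implementation.

```python
from typing import Dict, List


def estimate_tokens(messages: List[Dict[str, str]]) -> int:
    # Rough heuristic assumed for this sketch: about 4 characters per token.
    return sum(len(m.get("content", "")) for m in messages) // 4


def should_summarize(
    messages: List[Dict[str, str]], max_tokens: int, max_messages: int
) -> bool:
    """Trigger when either the token limit or the message limit is reached."""
    return len(messages) >= max_messages or estimate_tokens(messages) >= max_tokens
```

A few long messages can therefore trigger summarization before the message limit is hit, and many short messages can trigger it before the token limit is hit.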

### Changed

- Updated all 30+ TTS service implementations to support context tracking with
  `context_id`.
    - Services now generate and propagate context IDs through TTS frames
    - Enables end-to-end tracing of TTS requests through the pipeline
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- ⚠️ `TTSService.run_tts()` now requires a `context_id` parameter for context
  tracking.
    - Custom TTS service implementations must update their `run_tts()`
      signature
    - Before: `async def run_tts(self, text: str) -> AsyncGenerator[Frame,
      None]:`
    - After: `async def run_tts(self, text: str, context_id: str) ->
      AsyncGenerator[Frame, None]:`
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Simplified context aggregators to use `frame.append_to_context` flag instead
  of tracking internal state.
    - Cleaner logic in `LLMResponseAggregator` and
      `LLMResponseUniversalAggregator`
    - More consistent behavior across aggregator implementations
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Updated timestamps to be cumulative within an agent turn, using the
  `flushCompleted` message as the indication that timestamps from the server
  have been reset to 0.
  (PR [#3593](https://github.com/pipecat-ai/pipecat/pull/3593))

- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the
  underlying TTS engine.
  (PR [#3612](https://github.com/pipecat-ai/pipecat/pull/3612))

- Improved user turn stop timing in `TranscriptionUserTurnStopStrategy` and
  `TurnAnalyzerUserTurnStopStrategy`.
    - Timeout now starts on `VADUserStoppedSpeakingFrame` for tighter, more
      predictable timing
    - Added support for finalized transcripts
      (`TranscriptionFrame.finalized=True`) to trigger earlier
    - Added fallback timeout for edge cases where transcripts arrive without
      VAD events
    - Removed `InterimTranscriptionFrame` handling (no longer affects timing)
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Improved the accuracy of the `UserBotLatencyObserver` and
  `UserBotLatencyLogObserver` by measuring from the time when the user actually
  starts speaking.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- ⚠️ Renamed `timeout` parameter to `user_speech_timeout` in
  `TranscriptionUserTurnStopStrategy`.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- Updated the `VADUserStartedSpeakingFrame` to include `start_secs` and
  `timestamp` and `VADUserStoppedSpeakingFrame` to include `stop_secs` and
  `timestamp`, removing the need to separately handle the
  `SpeechControlParamsFrame` for VADParams values.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- ⚠️ Renamed `TranscriptionUserTurnStopStrategy` to
  `SpeechTimeoutUserTurnStopStrategy`. The old name is deprecated and will be
  removed in a future release.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

- `AssemblyAISTTService` now automatically configures optimal settings for
  manual turn detection when `vad_force_turn_endpoint=True`. This sets
  `end_of_turn_confidence_threshold=1.0` and `max_turn_silence=2000` by
  default, which disables model-based turn detection and reduces latency by
  relying on external VAD for turn endpoints. Warnings are logged if
  conflicting settings are detected.
  (PR [#3644](https://github.com/pipecat-ai/pipecat/pull/3644))

- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.
  (PR [#3652](https://github.com/pipecat-ai/pipecat/pull/3652))

- Changed default session mode from "CUSTOM" to "LITE" in HeyGen LiveAvatar
  integration, with VP8 as the default video encoding.
  (PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))

- ⚠️ The `VADParams` `stop_secs` default is changing from `0.8` seconds
  to `0.2` seconds. This change both simplifies the developer experience and
  improves the performance of STT services. With a shorter `stop_secs` value,
  STT services using a local VAD can finalize sooner, resulting in faster
  transcription.
    - `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for
      additional user speech using `user_speech_timeout` (default: 0.6 sec).
    - `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically
      adjusts the user wait time based on the audio input.
  (PR [#3659](https://github.com/pipecat-ai/pipecat/pull/3659))

- Moved interruption wait event from per-processor instance state to
  `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal
  when the interruption has fully traversed the pipeline. Custom processors
  that block or consume an `InterruptionFrame` before it reaches the pipeline
  sink must call `frame.complete()` to avoid stalling
  `push_interruption_task_frame_and_wait()`. A warning is logged if completion
  does not happen within 2 seconds.
  (PR [#3660](https://github.com/pipecat-ai/pipecat/pull/3660))
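
  The pattern can be sketched with a plain `asyncio.Event`. This is an
  illustration, not Pipecat's implementation: `SketchInterruptionFrame` and
  `consuming_processor` are hypothetical stand-ins, and only the `complete()`
  method name and the 2-second wait come from the entry above.

  ```python
  import asyncio


  class SketchInterruptionFrame:
      """Hypothetical stand-in: the frame itself owns the completion event."""

      def __init__(self) -> None:
          self._done = asyncio.Event()

      def complete(self) -> None:
          self._done.set()

      async def wait(self, timeout: float) -> bool:
          # Returns False instead of raising when completion never arrives.
          try:
              await asyncio.wait_for(self._done.wait(), timeout)
              return True
          except asyncio.TimeoutError:
              return False


  async def consuming_processor(frame: SketchInterruptionFrame) -> None:
      # This processor swallows the frame instead of forwarding it, so it
      # must signal completion itself or the waiter below would stall.
      await asyncio.sleep(0.01)
      frame.complete()


  async def demo() -> bool:
      frame = SketchInterruptionFrame()
      task = asyncio.create_task(consuming_processor(frame))
      completed = await frame.wait(timeout=2.0)
      await task
      return completed


  completed = asyncio.run(demo())
  ```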

- Updated the default model to `scribe_v2` for `ElevenLabsSTTService`.
  (PR [#3664](https://github.com/pipecat-ai/pipecat/pull/3664))

- Changed the `DeepgramSTTService` default setting for `smart_format` to
  `False`, as agents don't need smart formatting. Disabling this setting
  provides a small performance improvement, as well.
  (PR [#3666](https://github.com/pipecat-ai/pipecat/pull/3666))

- Changed `FunctionCallCancelFrame` to broadcast in both directions for
  consistency with other function call frames.
  (PR [#3672](https://github.com/pipecat-ai/pipecat/pull/3672))

- Changed default user turn stop strategy from
  `TranscriptionUserTurnStopStrategy` to `TurnAnalyzerUserTurnStopStrategy`
  with `LocalSmartTurnAnalyzerV3`.
  (PR [#3689](https://github.com/pipecat-ai/pipecat/pull/3689))

- Renamed `RequestMetadataFrame` to `ServiceSwitcherRequestMetadataFrame` and
  added a `service` field to target a specific service. The frame is now pushed
  downstream by services after handling instead of being silently consumed.
  (PR [#3692](https://github.com/pipecat-ai/pipecat/pull/3692))

- Updated `SonioxSTTService` to set `vad_force_turn_endpoint` to `True` by
  default. This setting disables the turn detection logic available natively
  in Soniox; instead, Soniox relies on a local VAD to finalize the transcript.
  This configuration meaningfully reduces the time to final segment: with it
  enabled, Soniox outputs a transcript in ~250ms (median). Pipecat enables
  smart-turn detection by default using the `LocalSmartTurnAnalyzerV3`. To use
  the native turn detection logic in Soniox, set `vad_force_turn_endpoint` to
  `False`.
  (PR [#3697](https://github.com/pipecat-ai/pipecat/pull/3697))

- Updated `SonioxSTTService` default model to `stt-rt-v4`.
  (PR [#3697](https://github.com/pipecat-ai/pipecat/pull/3697))

- Updated the default model to `async_flash_v1.0` and base URL to
  `https://api.async.com` for `AsyncAITTSService`.
  (PR [#3701](https://github.com/pipecat-ai/pipecat/pull/3701))

### Deprecated

- Deprecated `UserBotLatencyLogObserver`. Use `UserBotLatencyObserver` directly
  with its `on_latency_measured` event handler instead.
  (PR [#3355](https://github.com/pipecat-ai/pipecat/pull/3355))

- Deprecated `RTVILLMFunctionCallMessage`, `RTVILLMFunctionCallMessageData`,
  and `RTVIProcessor.handle_function_call()`. Use the new
  `llm-function-call-in-progress` event sent automatically by `RTVIObserver`
  instead.
  (PR [#3630](https://github.com/pipecat-ai/pipecat/pull/3630))

### Removed

- ⚠️ Removed `timeout` parameter from `TurnAnalyzerUserTurnStopStrategy`. The
  timeout is now managed internally based on STT latency.
  (PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))

### Fixed

- Fixed pipeline freeze when `InterruptionFrame` discards `EndFrame` or
  `StopFrame` by making terminal frames uninterruptible.
  (PR [#3542](https://github.com/pipecat-ai/pipecat/pull/3542))

- Fixed OpenAI LLM stream not being closed on cancellation/exception, which
  could leak sockets.
  (PR [#3589](https://github.com/pipecat-ai/pipecat/pull/3589))

- Fixed `PipelineTask` adding duplicate `RTVIProcessor` and `RTVIObserver` when
  they were already provided in the pipeline or observers list. They are now
  detected and skipped, with appropriate warnings and errors logged for
  mismatched configurations.
  (PR [#3610](https://github.com/pipecat-ai/pipecat/pull/3610))

- Fixed function call timeout task not being cancelled when the handler
  completes without calling `result_callback` or is cancelled externally, which
  caused `RuntimeWarning: coroutine was never awaited`.
  (PR [#3616](https://github.com/pipecat-ai/pipecat/pull/3616))

- Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin
  languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK
  languages, causing text to accumulate until flush instead of being split at
  sentence boundaries. Added fallback detection for unambiguous non-Latin
  sentence-ending punctuation (e.g., `。`, `？`, `！`).
  (PR [#3617](https://github.com/pipecat-ai/pipecat/pull/3617))
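
  The fallback idea can be sketched as a scan for sentence-ending punctuation.
  This is a simplified illustration, not the code from the PR: the function
  name is hypothetical and the punctuation set holds only the three examples
  listed above.

  ```python
  # Assumption: a small, illustrative set of unambiguous sentence enders.
  SENTENCE_ENDERS = "。？！"


  def split_sentences(text: str) -> tuple:
      """Return (complete sentences, remaining unfinished text)."""
      sentences = []
      start = 0
      for index, char in enumerate(text):
          if char in SENTENCE_ENDERS:
              sentences.append(text[start : index + 1])
              start = index + 1
      return sentences, text[start:]


  sentences, remainder = split_sentences("こんにちは。元気ですか？はい")
  ```

  Here the two finished sentences can be sent to TTS right away, while the
  trailing `はい` stays buffered until more text or a flush arrives.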

- Fixed `PipelineTask` to also call `set_bot_ready()` when an external
  `RTVIProcessor` is provided.
  (PR [#3623](https://github.com/pipecat-ai/pipecat/pull/3623))

- Fixed `VADController` not broadcasting `SpeechControlParamsFrame` on startup,
  which prevented STT services from receiving VAD params needed for TTFB
  measurement.
  (PR [#3628](https://github.com/pipecat-ai/pipecat/pull/3628))

- Fixed `StopAsyncIteration` exceptions in `parse_telephony_websocket()` when
  WebSocket connections close before sending expected messages.
  (PR [#3629](https://github.com/pipecat-ai/pipecat/pull/3629))

- Fixed WebSocket transport error when broadcasting
  `InputTransportMessageFrame` by correctly instantiating the frame with its
  message parameter.
  (PR [#3635](https://github.com/pipecat-ai/pipecat/pull/3635))

- Fixed orphan OpenTelemetry spans during flow initialization and transitions
  in tracing.
  (PR [#3649](https://github.com/pipecat-ai/pipecat/pull/3649))

- Fixed `SambaNovaLLMService` and `GoogleLLMOpenAIBetaService` streams not
  being closed on cancellation/exception, which could leak sockets.
  (PR [#3663](https://github.com/pipecat-ai/pipecat/pull/3663))

- Fixed an issue in `InworldTTSService` where punctuation was pronounced. Now,
  the `InworldTTSService` ensures proper spacing between sentences, resolving
  pronunciation issues.
  (PR [#3667](https://github.com/pipecat-ai/pipecat/pull/3667))

- Fixed `ParallelPipeline` allowing frames pushed by internal processors to
  escape during lifecycle frame (`StartFrame`/`EndFrame`/`CancelFrame`)
  synchronization. These frames are now buffered and flushed after all branches
  complete.
  (PR [#3668](https://github.com/pipecat-ai/pipecat/pull/3668))

- Fixed issues in Sarvam STT and TTS services: missing event handler
  registration for VAD signals, `Optional[bool]` type annotations, WebSocket
  state cleanup on API errors, and TTS disconnect/reconnection state
  management.
  (PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))

- Fixed `RTVIObserver` sending duplicate client messages for frames that are
  broadcast in both directions (e.g. `UserStartedSpeakingFrame`,
  `FunctionCallResultFrame`).
  (PR [#3672](https://github.com/pipecat-ai/pipecat/pull/3672))

- Fixed WebSocket STT services (ElevenLabs, Cartesia, Gladia, Soniox)
  disconnecting due to idle timeout when no audio is being sent (e.g. when
  inactive behind a `ServiceSwitcher`). `WebsocketSTTService` now provides
  opt-in silence-based keepalive via `keepalive_timeout` and
  `keepalive_interval` parameters.
  (PR [#3675](https://github.com/pipecat-ai/pipecat/pull/3675))
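
  One way to picture silence-based keepalive, as a sketch rather than the
  service's actual logic: once no audio has been sent for `keepalive_timeout`
  seconds, a short chunk of silence is sent. The helpers below are
  hypothetical, and 16-bit PCM audio is an assumption.

  ```python
  def silence_chunk(sample_rate: int, duration_ms: int, channels: int = 1) -> bytes:
      """Build a chunk of 16-bit PCM silence (all-zero samples)."""
      num_samples = sample_rate * duration_ms // 1000
      return b"\x00\x00" * num_samples * channels


  def should_send_keepalive(
      last_audio_at: float, now: float, keepalive_timeout: float
  ) -> bool:
      """True once the connection has been idle for at least the timeout."""
      return now - last_audio_at >= keepalive_timeout


  chunk = silence_chunk(sample_rate=16000, duration_ms=20)
  ```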

## [0.0.101] - 2026-01-30

### Added

- Additions for `AICFilter` and `AICVADAnalyzer`:
    - Added model downloading support to `AICFilter` with `model_id` and
      `model_download_dir` parameters.
    - Added `model_path` parameter to `AICFilter` for loading local `.aicmodel`
      files.
    - Added unit tests for `AICFilter` and `AICVADAnalyzer`.
  (PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))

- Added handling for `server_content.interrupted` signal in the Gemini Live
  service for faster interruption response in the case where there isn't
  already turn tracking in the pipeline, e.g. local VAD + context aggregators.
  When there is already turn tracking in the pipeline, the additional
  interruption does no harm.
  (PR [#3429](https://github.com/pipecat-ai/pipecat/pull/3429))

- Added new `GenesysFrameSerializer` for the Genesys AudioHook WebSocket
  protocol, enabling bidirectional audio streaming between Pipecat pipelines
  and Genesys Cloud contact center.
  (PR [#3500](https://github.com/pipecat-ai/pipecat/pull/3500))

- Added `reached_upstream_types` and `reached_downstream_types` read-only
  properties to `PipelineTask` for inspecting current frame filters.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()`
  methods to `PipelineTask` for appending frame types.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Added `UserTurnCompletionLLMServiceMixin` for LLM services to detect and
  filter incomplete user turns. When enabled via `filter_incomplete_user_turns`
  in `LLMUserAggregatorParams`, the LLM outputs a turn completion marker at the
  start of each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete
  long). Incomplete turns are suppressed, and configurable timeouts
  automatically re-prompt the user.
  (PR [#3518](https://github.com/pipecat-ai/pipecat/pull/3518))
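
  As an illustration of the marker convention (not the mixin's real code), a
  response could be split into its leading marker and the remaining text as
  follows. The helper name and the state labels are hypothetical.

  ```python
  from typing import Optional

  # Marker meanings as listed in the entry above.
  MARKERS = {"✓": "complete", "○": "incomplete_short", "◐": "incomplete_long"}


  def split_turn_marker(response: str) -> tuple:
      """Return (turn state or None, response text without the marker)."""
      stripped = response.lstrip()
      if stripped and stripped[0] in MARKERS:
          return MARKERS[stripped[0]], stripped[1:].lstrip()
      return None, response


  state, text = split_turn_marker("✓ Sure, I can help with that.")
  short_state: Optional[str] = split_turn_marker("○")[0]
  ```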

- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a
  frame instance by extracting its fields and creating new instances for each
  direction.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- `PipelineTask` now automatically adds `RTVIProcessor` and registers
  `RTVIObserver` when `enable_rtvi=True` (default), simplifying pipeline setup.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI
  observers.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Added `video_out_codec` parameter to `TransportParams` allowing configuration
  of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video
  output in `DailyTransport`.
  (PR [#3520](https://github.com/pipecat-ai/pipecat/pull/3520))

- Added `location` parameter to Google TTS services (`GoogleHttpTTSService`,
  `GoogleTTSService`, `GeminiTTSService`) for regional endpoint support.
  (PR [#3523](https://github.com/pipecat-ai/pipecat/pull/3523))

- Added new `PIPECAT_SMART_TURN_LOG_DATA` environment variable, which causes
  Smart Turn input data to be saved to disk.
  (PR [#3525](https://github.com/pipecat-ai/pipecat/pull/3525))

- Added `result_callback` parameter to `UserImageRequestFrame` to support
  deferred function call results.
  (PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))

- Added `function_call_timeout_secs` parameter to `LLMService` to configure
  timeout for deferred function calls (defaults to 10.0 seconds).
  (PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))

- Added `vad_analyzer` parameter to `LLMUserAggregatorParams`. VAD analysis is
  now handled inside the `LLMUserAggregator` rather than in the transport,
  keeping voice activity detection closer to where it is consumed. The
  `vad_analyzer` on `BaseInputTransport` is now deprecated.

    ```python
    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )
    ```
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

- Added `VADProcessor` for detecting speech in audio streams within a pipeline.
  Pushes `VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`, and
  `UserSpeakingFrame` downstream based on VAD state changes.
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

- Added `VADController` for managing voice activity detection state and
  emitting speech events independently of transport or pipeline processors.
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

- Added local `PiperTTSService` for offline text-to-speech using Piper voice
  models. The existing HTTP-based service has been renamed to
  `PiperHttpTTSService`.
  (PR [#3585](https://github.com/pipecat-ai/pipecat/pull/3585))

- `main()` in `pipecat.runner.run` now accepts an optional
  `argparse.ArgumentParser`, allowing bots to define custom CLI arguments
  accessible via `runner_args.cli_args`.
  (PR [#3590](https://github.com/pipecat-ai/pipecat/pull/3590))
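
  A sketch of defining a custom flag. `--greeting` is a hypothetical flag, and
  passing the parser as the first argument to `main()` is an assumption, since
  this entry does not state the parameter name.

  ```python
  import argparse

  parser = argparse.ArgumentParser(description="Example bot")
  # Hypothetical custom flag for illustration.
  parser.add_argument("--greeting", default="Hello", help="First thing the bot says")

  # In a bot file the parser would be handed to the runner, roughly:
  #     from pipecat.runner.run import main
  #     main(parser)
  # and the parsed values read inside the bot from `runner_args.cli_args`.
  # Here the parser is exercised directly so the snippet runs on its own.
  cli_args = parser.parse_args(["--greeting", "Hi there"])
  print(cli_args.greeting)
  ```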

- Added `KokoroTTSService` for local text-to-speech synthesis using the
  Kokoro-82M model.
  (PR [#3595](https://github.com/pipecat-ai/pipecat/pull/3595))

### Changed

- Updated `AICFilter` and `AICVADAnalyzer` to use aic-sdk ~= 2.0.1.
  (PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))

- Improved the STT TTFB (Time To First Byte) measurement, reporting the delay
  between when the user stops speaking and when the final transcription is
  received. Note: Unlike traditional TTFB which measures from a discrete
  request, STT services receive continuous audio input—so we measure from
  speech end to final transcript, which captures the latency that matters for
  voice AI applications. In support of this change, added `finalized` field to
  `TranscriptionFrame` to indicate when a transcript is the final result for an
  utterance.
  (PR [#3495](https://github.com/pipecat-ai/pipecat/pull/3495))

- `SarvamSTTService` now defaults `vad_signals` and `high_vad_sensitivity` to
  `None` (omitted from connection parameters), improving latency by ~300ms
  compared to the previous defaults.
  (PR [#3495](https://github.com/pipecat-ai/pipecat/pull/3495))

- Changed frame filter storage from tuples to sets in `PipelineTask`.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Changed default Inworld TTS model from `inworld-tts-1` to
  `inworld-tts-1.5-max`.
  (PR [#3531](https://github.com/pipecat-ai/pipecat/pull/3531))

- `FrameSerializer` now subclasses from `BaseObject` to enable event support.
  (PR [#3560](https://github.com/pipecat-ai/pipecat/pull/3560))

- Added support for TTFS in `SpeechmaticsSTTService` and set the default mode
  to `EXTERNAL` to support Pipecat-controlled VAD.
  - Changed dependency to `speechmatics-voice[smart]>=0.2.8`
  (PR [#3562](https://github.com/pipecat-ai/pipecat/pull/3562))

- ⚠️ Changed function call handling to use timeout-based completion instead of
  immediate callback execution.
    - Function calls that defer their results (e.g., `UserImageRequestFrame`)
      now use a timeout mechanism
    - The `result_callback` is invoked automatically when the deferred
      operation completes or after timeout
    - This change affects examples using `UserImageRequestFrame` - the
      `result_callback` should now be passed to the frame instead of being called
      immediately
  (PR [#3571](https://github.com/pipecat-ai/pipecat/pull/3571))

- Pipecat runner now uses `DAILY_ROOM_URL` instead of `DAILY_SAMPLE_ROOM_URL`.
  (PR [#3582](https://github.com/pipecat-ai/pipecat/pull/3582))

- Updates to `GradiumSTTService`:
    - Now flushes pending transcriptions when VAD detects the user stopped
      speaking, improving response latency.
    - `GradiumSTTService` now supports `InputParams` for configuring `language`
      and `delay_in_frames` settings.
  (PR [#3587](https://github.com/pipecat-ai/pipecat/pull/3587))

### Deprecated

- ⚠️ Deprecated `vad_analyzer` parameter on `BaseInputTransport`. Pass
  `vad_analyzer` to `LLMUserAggregatorParams` instead or use `VADProcessor` in
  the pipeline.
  (PR [#3583](https://github.com/pipecat-ai/pipecat/pull/3583))

### Removed

- Removed deprecated `AICFilter` parameters: `enhancement_level`, `voice_gain`,
  `noise_gate_enable`.
  (PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))

### Fixed

- Fixed an issue where if you were using `OpenRouterLLMService` with a Gemini
  model, it wouldn't handle multiple `"system"` messages as expected (and as we
  do in `GoogleLLMService`), which is to convert subsequent ones into `"user"`
  messages. Instead, the latest `"system"` message would overwrite the previous
  ones.
  (PR [#3406](https://github.com/pipecat-ai/pipecat/pull/3406))
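
  The expected conversion can be sketched as below. This is a simplified
  illustration of the behavior described, not the service's actual code, and
  the function name is hypothetical.

  ```python
  def demote_extra_system_messages(messages: list) -> list:
      """Keep the first "system" message; turn later ones into "user" messages."""
      converted = []
      seen_system = False
      for message in messages:
          if message["role"] == "system":
              if seen_system:
                  # Copy rather than mutate the caller's message dict.
                  message = {**message, "role": "user"}
              seen_system = True
          converted.append(message)
      return converted


  messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hi"},
      {"role": "system", "content": "Keep answers short."},
  ]
  roles = [m["role"] for m in demote_extra_system_messages(messages)]
  ```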

- Transports now properly broadcast `InputTransportMessageFrame` frames both
  upstream and downstream instead of only pushing downstream.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing
  shared mutable references between the downstream and upstream frame
  instances.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))
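
  Why the deep copy matters, shown with a hypothetical stand-in frame class
  (the `broadcast` helper is a sketch, not Pipecat's method): without it, both
  instances would hold the same list object.

  ```python
  import copy
  from dataclasses import dataclass, field


  @dataclass
  class SketchFrame:
      """Hypothetical stand-in for a frame with a mutable field."""

      tags: list = field(default_factory=list)


  def broadcast(frame_cls, **kwargs):
      """Create independent downstream and upstream instances."""
      downstream = frame_cls(**copy.deepcopy(kwargs))
      upstream = frame_cls(**copy.deepcopy(kwargs))
      return downstream, upstream


  down, up = broadcast(SketchFrame, tags=["a"])
  down.tags.append("b")  # Must not leak into the upstream instance.
  ```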

- Fixed OpenAI LLM services to emit `ErrorFrame` on completion timeout,
  enabling proper error handling and LLMSwitcher failover.
  (PR [#3529](https://github.com/pipecat-ai/pipecat/pull/3529))

- Fixed a logging issue where non-ASCII characters (e.g., Japanese or Chinese)
  were unnecessarily escaped to Unicode sequences when a function call
  occurred.
  (PR [#3536](https://github.com/pipecat-ai/pipecat/pull/3536))

- Fixed how audio tracks are synchronized inside the `AudioBufferProcessor` to
  fix timing issues where silence and audio were misaligned between user and
  bot buffers.
  (PR [#3541](https://github.com/pipecat-ai/pipecat/pull/3541))

- Fixed race condition in `OpenAIRealtimeBetaLLMService` that could cause an
  error when truncating the conversation.
  (PR [#3567](https://github.com/pipecat-ai/pipecat/pull/3567))

- Fixed an infinite loop in `WebsocketService` that blocked the event loop when
  a remote server closed the connection gracefully.
  (PR [#3574](https://github.com/pipecat-ai/pipecat/pull/3574))

- Fixed `LLMUserAggregator` and `LLMAssistantAggregator` not emitting pending
  transcripts via `on_user_turn_stopped` and `on_assistant_turn_stopped` events
  when the conversation ends (`EndFrame`) or is cancelled (`CancelFrame`).
  (PR [#3575](https://github.com/pipecat-ai/pipecat/pull/3575))

- Added missing `LiveKitRunnerArguments` and `LiveKitTransport` support in
  runner utilities to enable LiveKit transport configuration.
  (PR [#3580](https://github.com/pipecat-ai/pipecat/pull/3580))

- Fixed race condition in `OpenAIRealtimeLLMService` that could cause an error
  when truncating the conversation.
  (PR [#3581](https://github.com/pipecat-ai/pipecat/pull/3581))

- Fixed `PiperHttpTTSService` (formerly `PiperTTSService`) to resample audio
  output based on the model's sample rate parsed from the WAV header.
  (PR [#3585](https://github.com/pipecat-ai/pipecat/pull/3585))

- Fixed `UserTurnController` to reset user turn timeout when interim
  transcriptions are received.
  (PR [#3594](https://github.com/pipecat-ai/pipecat/pull/3594))

- Fixed an issue in the `IVRNavigator` where the `TextFrame`s pushed had
  incorrect spacing. Now, the internal `IVRProcessor` pushes
  `AggregatedTextFrame`s when in conversation mode. This allows for controlling
  spacing of the outputted, aggregated text.
  (PR [#3604](https://github.com/pipecat-ai/pipecat/pull/3604))

- Fixed `GeminiLiveLLMService` transcription timeout handler not being
  scheduled by yielding to the event loop after task creation.
  (PR [#3605](https://github.com/pipecat-ai/pipecat/pull/3605))

## [0.0.100] - 2026-01-20

### Added

- Added Hathora service to support Hathora-hosted TTS and STT models
  (non-streaming only).
  (PR [#3169](https://github.com/pipecat-ai/pipecat/pull/3169))

- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models
  (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech
  synthesis.
  (PR [#3349](https://github.com/pipecat-ai/pipecat/pull/3349))

- Added the `additional_headers` param to `WebsocketClientParams`, allowing
  `WebsocketClientTransport` to send custom headers on connect, for cases such
  as authentication.
  (PR [#3461](https://github.com/pipecat-ai/pipecat/pull/3461))

- Added `UserIdleController` for detecting user idle state, integrated into
  `LLMUserAggregator` and `UserTurnProcessor` via optional `user_idle_timeout`
  parameter. Emits `on_user_turn_idle` event for application-level handling.
  Deprecated `UserIdleProcessor` in favor of the new compositional approach.
  (PR [#3482](https://github.com/pipecat-ai/pipecat/pull/3482))

- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to
  `LLMUserAggregator` for tracking user mute state changes.
  (PR [#3490](https://github.com/pipecat-ai/pipecat/pull/3490))

### Changed

- Enhanced interruption handling in `AsyncAITTSService` by supporting
  multi-context WebSocket sessions for more robust context management.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Throttle `UserSpeakingFrame` to broadcast at most every 200ms instead of on
  every audio chunk, reducing frame processing overhead during user speech.
  (PR [#3483](https://github.com/pipecat-ai/pipecat/pull/3483))
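
  A generic throttle of this kind can be sketched as follows. The `Throttle`
  class is hypothetical and not taken from Pipecat; times are integer
  milliseconds and the clock is injectable so the behavior is deterministic.

  ```python
  import time
  from typing import Callable, Optional


  def _monotonic_ms() -> int:
      return time.monotonic_ns() // 1_000_000


  class Throttle:
      """Allow an action at most once per `min_interval_ms`."""

      def __init__(self, min_interval_ms: int, clock: Callable[[], int] = _monotonic_ms):
          self._min_interval_ms = min_interval_ms
          self._clock = clock
          self._last_emit: Optional[int] = None

      def should_emit(self) -> bool:
          now = self._clock()
          if self._last_emit is None or now - self._last_emit >= self._min_interval_ms:
              self._last_emit = now
              return True
          return False


  # Simulate one second of audio chunks arriving every 20 ms.
  now_ms = [0]
  throttle = Throttle(200, clock=lambda: now_ms[0])
  emitted_at = []
  for chunk in range(50):
      now_ms[0] = chunk * 20
      if throttle.should_emit():
          emitted_at.append(now_ms[0])
  ```

  In this simulation 50 chunks produce only 5 emissions.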

### Deprecated

- Deprecated `pipecat.turns.mute` (introduced in Pipecat 0.0.99) in favor of
  `pipecat.turns.user_mute`, for consistency with other package names.
  (PR [#3479](https://github.com/pipecat-ai/pipecat/pull/3479))

### Fixed

- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Fixed an issue where the "bot-llm-text" RTVI event would not fire for
  realtime (speech-to-speech) services:

    - `AWSNovaSonicLLMService`
    - `GeminiLiveLLMService`
    - `OpenAIRealtimeLLMService`
    - `GrokRealtimeLLMService`

  The issue was that these services weren't pushing `LLMTextFrame`s. Now
  they do.
  (PR [#3446](https://github.com/pipecat-ai/pipecat/pull/3446))

- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is
  talking when using `ExternalUserTurnStrategies`.
  (PR [#3454](https://github.com/pipecat-ai/pipecat/pull/3454))

- Fixed an issue where user turn start strategies were not being reset after a
  user turn started, causing incorrect strategy behavior.
  (PR [#3455](https://github.com/pipecat-ai/pipecat/pull/3455))

- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions,
  preventing incorrect turn starts when words are spoken with pauses between
  them.
  (PR [#3462](https://github.com/pipecat-ai/pipecat/pull/3462))

- Fixed an issue where Grok Realtime would error out when running with
  SmallWebRTC transport.
  (PR [#3480](https://github.com/pipecat-ai/pipecat/pull/3480))

- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was
  causing an error. See
  https://docs.mem0.ai/platform/features/async-mode-default-change.
  (PR [#3484](https://github.com/pipecat-ai/pipecat/pull/3484))

- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously
  error out. Now it successfully reconnects and "rehydrates" from the context
  object.
  (PR [#3486](https://github.com/pipecat-ai/pipecat/pull/3486))

- Fixed `AzureTTSService` transcript formatting issues:
    - Punctuation now appears without extra spaces (e.g., "Hello!" instead of
      "Hello !")
    - CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces
      between characters
  (PR [#3489](https://github.com/pipecat-ai/pipecat/pull/3489))

- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in
  some cases.
  (PR [#3494](https://github.com/pipecat-ai/pipecat/pull/3494))

- Fixed memory leak in `LiveKitTransport` when `video_in_enabled` is `False`.
  (PR [#3499](https://github.com/pipecat-ai/pipecat/pull/3499))

- Fixed an issue in `AIService` where unhandled exceptions in `start()`,
  `stop()`, or `cancel()` implementations would prevent `process_frame()` from
  continuing, so `StartFrame`, `EndFrame`, or `CancelFrame` was never pushed
  downstream and the pipeline did not start or stop properly.
  (PR [#3503](https://github.com/pipecat-ai/pipecat/pull/3503))

- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from
  constructor to `start()` for better error handling.
  (PR [#3504](https://github.com/pipecat-ai/pipecat/pull/3504))

- Optimized `NVIDIATTSService` to process incoming audio frames immediately.
  (PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))

- Optimized `NVIDIASTTService` by removing unnecessary queue and task.
  (PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))

- Fixed a `CambTTSService` issue where the client was initialized in the
  constructor, which prevented proper pipeline error handling.
  (PR [#3511](https://github.com/pipecat-ai/pipecat/pull/3511))

## [0.0.99] - 2026-01-13

### Added

- Introducing user turn strategies. User turn strategies indicate when the user
  turn starts or stops. In conversational agents, these are often referred to
  as start/stop speaking or turn-taking plans or policies.

  User turn start strategies indicate when the user starts speaking (e.g.
  using VAD events or when a user says one or more words).

  User turn stop strategies indicate when the user stops speaking (e.g. using
  an end-of-turn detection model or by observing incoming transcriptions).

  A list of strategies can be specified for both start and stop; strategies
  are evaluated in order until one evaluates to true.

  Available user turn start strategies:

  - VADUserTurnStartStrategy
  - TranscriptionUserTurnStartStrategy
  - MinWordsUserTurnStartStrategy
  - ExternalUserTurnStartStrategy

  Available user turn stop strategies:

  - TranscriptionUserTurnStopStrategy
  - TurnAnalyzerUserTurnStopStrategy
  - ExternalUserTurnStopStrategy

  The default strategies are:

  - start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
  - stop: [TranscriptionUserTurnStopStrategy]

  Turn strategies are configured when setting up `LLMContextAggregatorPair`.
  For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              stop=[
                  TurnAnalyzerUserTurnStopStrategy(
                      turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                  )
              ],
          )
      ),
  )
  ```

  In order to use the user turn strategies you must update to the new
  universal `LLMContext` and `LLMContextAggregatorPair`.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural
  network via pyrnnoise library.
  (PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))

- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
  voice conversations:

  - Support for real-time audio streaming with WebSocket connection
  - Built-in server-side VAD (Voice Activity Detection)
  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
  - Built-in tools support: web_search, x_search, file_search
  - Custom function calling with standard Pipecat tools schema
  - Configurable audio formats (PCM at 8kHz-48kHz)
    (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))

- Added an approximation of TTFB for Ultravox.
  (PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))

- Added a new `AudioContextTTSService` to the TTS service base classes. The
  `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and
  `WebsocketWordTTSService`.
  (PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))

- `LLMUserAggregator` now exposes the following events:

  - `on_user_turn_started`: triggered when a user turn starts
  - `on_user_turn_stopped`: triggered when a user turn ends
  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop
    and times out
    (PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))

- Introducing user mute strategies. User mute strategies indicate when user
  input should be muted based on the current system state.

  In conversational agents, user mute strategies are used to prevent user
  input from interrupting bot speech, tool execution, or other critical system
  operations.

  A list of strategies can be specified; all strategies are evaluated for
  every frame so that each strategy can maintain its internal state. A user
  frame is muted if any of the configured strategies indicates it should be
  muted.

  Available user mute strategies:

  - `FirstSpeechUserMuteStrategy`
  - `MuteUntilFirstBotCompleteUserMuteStrategy`
  - `AlwaysUserMuteStrategy`
  - `FunctionCallUserMuteStrategy`

  User mute strategies replace the legacy `STTMuteFilter` and provide a more
  flexible and composable approach to muting user input.

  User mute strategies are configured when setting up the
  `LLMContextAggregatorPair`. For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_mute_strategies=[
              FirstSpeechUserMuteStrategy(),
          ]
      ),
  )
  ```

  In order to use user mute strategies you should update to the new universal
  `LLMContext` and `LLMContextAggregatorPair`.
  (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
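
  The evaluation rule described above (every strategy sees every frame, and
  the frame is muted if any strategy asks for it) can be sketched in plain
  Python. This is only an illustration under stated assumptions: `Frame`,
  `MuteWhileBotSpeaks`, `MuteDuringFunctionCalls`, and `should_mute` are
  hypothetical stand-ins, not Pipecat classes or the actual implementation.

  ```python
  from dataclasses import dataclass


  @dataclass
  class Frame:
      """Hypothetical stand-in for a pipeline frame."""

      kind: str  # e.g. "bot_started", "function_started", "user_audio"


  class MuteWhileBotSpeaks:
      """Hypothetical strategy: mute while the bot is speaking."""

      def __init__(self):
          self._bot_speaking = False

      def process(self, frame: Frame) -> bool:
          if frame.kind == "bot_started":
              self._bot_speaking = True
          elif frame.kind == "bot_stopped":
              self._bot_speaking = False
          return self._bot_speaking


  class MuteDuringFunctionCalls:
      """Hypothetical strategy: mute while any function call is in flight."""

      def __init__(self):
          self._in_flight = 0

      def process(self, frame: Frame) -> bool:
          if frame.kind == "function_started":
              self._in_flight += 1
          elif frame.kind == "function_finished":
              self._in_flight = max(0, self._in_flight - 1)
          return self._in_flight > 0


  def should_mute(strategies, frame: Frame) -> bool:
      # Build the full list first so every strategy sees the frame and can
      # update its state; any() over a generator would stop at the first True.
      results = [strategy.process(frame) for strategy in strategies]
      return any(results)
  ```

  Evaluating every strategy before combining the results is what lets a
  strategy keep counting function calls even while another strategy is
  already muting the user.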

- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService`
  and `NvidiaTTSService`.
  (PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300))

- Added `enable_interruptions` constructor argument to all user turn
  strategies. This tells the `LLMUserAggregator` to push or not push an
  `InterruptionFrame`.
  (PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))

- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control
  sentence splitting behavior for finals on sentence boundaries.
  (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))

- Added word-level timestamp support to `AzureTTSService` for accurate
  text-to-audio synchronization.
  (PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334))

- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams`
  and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation
  dictionary feature for custom pronunciations.
  (PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346))

- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport`
  (see https://www.liveavatar.com/).
  (PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))

- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:

  - New `start_video_paused` parameter to control initial video input state
  - New `video_frame_detail` parameter to set image processing quality
    ("auto", "low", or "high"). This corresponds to OpenAI Realtime's
    `image_detail` parameter.
  - `set_video_input_paused()` method to pause/resume video input at runtime
  - `set_video_frame_detail()` method to adjust video frame quality
    dynamically
  - Automatic rate limiting (1 frame per second) to prevent API overload
    (PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
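
  As a rough illustration of limiting input to one frame per second, the
  sketch below drops frames that arrive too soon after the last accepted one.
  `FrameRateLimiter` and its `clock` argument are hypothetical names used
  here for illustration; the service's real mechanism may differ.

  ```python
  import time


  class FrameRateLimiter:
      """Hypothetical helper: accept at most one frame per interval."""

      def __init__(self, min_interval_secs: float = 1.0, clock=time.monotonic):
          self._min_interval = min_interval_secs
          self._clock = clock  # injectable so the limiter can be tested
          self._last_accepted = None

      def allow(self) -> bool:
          """Return True if a frame arriving now should be forwarded."""
          now = self._clock()
          if (
              self._last_accepted is not None
              and now - self._last_accepted < self._min_interval
          ):
              return False  # too soon after the last accepted frame: drop it
          self._last_accepted = now
          return True
  ```

  A caller would check `limiter.allow()` for each incoming video frame and
  forward the frame only when it returns `True`.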

- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
  that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
  and interruptions based on the controller's user turn strategies.
  (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))

- Added `UserTurnController` to manage user turns. It emits
  `on_user_turn_started`, `on_user_turn_stopped`, and
  `on_user_turn_stop_timeout` events, and can be integrated into processors to
  detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are
  implemented using this controller.
  (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))

- Added `should_interrupt` property to `DeepgramFluxSTTService`,
  `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the
  bot should be interrupted when the external service detects user speech.
  (PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))

- `LLMAssistantAggregator` now exposes the following events:

  - `on_assistant_turn_started`: triggered when the assistant turn starts
  - `on_assistant_turn_stopped`: triggered when the assistant turn ends
  - `on_assistant_thought`: triggered when there's an assistant thought
    available
    (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))

- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
  SDK (requires `krisp_audio`).
  (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))

- Added support for setting up a pipeline task from external files. You can now
  register custom pipeline task setup files by setting the
  `PIPECAT_SETUP_FILES` environment variable. This variable should contain a
  colon-separated list of Python files (e.g.
  `export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must
  define a function with the following signature:

  ```python
  async def setup_pipeline_task(task: PipelineTask):
      ...
  ```

  (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
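
  To make the mechanism concrete, here is a minimal sketch of how a
  colon-separated list of setup files could be loaded and applied. It is not
  Pipecat's loader: `load_setup_functions` and `run_setup_files` are
  hypothetical helpers, and `task` can be any object the setup functions
  accept.

  ```python
  import importlib.util


  def load_setup_functions(env_value: str):
      """Import each listed file and collect its `setup_pipeline_task`."""
      functions = []
      paths = [path for path in env_value.split(":") if path]
      for index, path in enumerate(paths):
          # Give each file a unique module name so files do not collide.
          spec = importlib.util.spec_from_file_location(f"_setup_file_{index}", path)
          module = importlib.util.module_from_spec(spec)
          spec.loader.exec_module(module)
          functions.append(module.setup_pipeline_task)
      return functions


  async def run_setup_files(task, env_value: str):
      """Await every setup function in the order the files were listed."""
      for setup in load_setup_functions(env_value):
          await setup(task)
  ```

  In this sketch the caller would pass `os.environ.get("PIPECAT_SETUP_FILES", "")`
  as `env_value`; empty entries are skipped, so an unset variable is a no-op.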

- Added a keepalive task for `InworldTTSService` to keep the service connected
  in the event of no generations for longer periods of time.
  (PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403))

- Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When
  enabled, `GladiaSTTService` acts as the turn controller, emitting
  `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally
  `InterruptionFrame`.
  (PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))

- Added `should_interrupt` property to `GladiaSTTService` to configure whether
  the bot should be interrupted when the external service detects user speech.
  (PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))

- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector
  WebSocket protocol.
  (PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))

- Added `append_trailing_space` parameter to `TTSService` to automatically
  append a trailing space to text before sending to TTS, helping prevent some
  services from vocalizing trailing punctuation.
  (PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))

### Changed

- Updated `ElevenLabsRealtimeSTTService` to accept the
  `include_language_detection` parameter to detect language.

  ```python
  stt = ElevenLabsRealtimeSTTService(
      api_key=os.getenv("ELEVENLABS_API_KEY"),
      include_language_detection=True,
  )
  ```

  (PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))

- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which
  adds improved VAD and Smart Turn capabilities and dramatically improves
  latency without any impact on accuracy. Use the `turn_detection_mode`
  parameter to control the endpointing of speech, with
  `TurnDetectionMode.EXTERNAL` (default), `TurnDetectionMode.ADAPTIVE`, or
  `TurnDetectionMode.SMART_TURN`.

  ```python
  stt = SpeechmaticsSTTService(
      api_key=os.getenv("SPEECHMATICS_API_KEY"),
      params=SpeechmaticsSTTService.InputParams(
          language=Language.EN,
          turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
          speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
      ),
  )
  ```

  (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))

- `daily-python` updated to 0.23.0.
  (PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))

- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by
  `DailyTransport` now include the transport source (i.e., the originating
  audio track).
  (PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))

- Updates to Inworld TTS services:

  - Improved `InworldTTSService`'s websocket implementation to flush and
    close contexts more reliably, which improves handling of long inputs.
  - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
    (PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))

- Improved the error handling and reconnection logic for `WebsocketServer` by
  distinguishing between errors when disconnecting and websocket communication
  errors.
  (PR [#3392](https://github.com/pipecat-ai/pipecat/pull/3392))

- Updated `DeepgramSTTService` to push user started/stopped speaking and
  interruption frames when `vad_enabled` is set to true. This centralizes the
  frames into the service, removing the need to have your application code
  handle Deepgram's events and push these frames.
  (PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314))

- Added encoding validation to `DeepgramTTSService` to prevent unsupported
  encodings from reaching the API. The service now raises `ValueError` at
  initialization with a clear error message.
  (PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329))

- Updated `read_audio_frame` & `read_video_frame` methods in
  `SmallWebRTCClient` to check if the track is enabled before logging a
  warning.
  (PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336))

- Updated `CartesiaTTSService` to support setting `language=None`, resulting in
  Cartesia auto-detecting the language of the conversation.
  (PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366))

- The bundled Smart Turn weights are now updated to v3.2, which has better
  handling of short utterances, and is more robust against background noise.
  (PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367))

- Updated `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`.
  (PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371))

- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
  meaning that the start of the turn audio is not cut off. This improves
  accuracy for short utterances.

- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
  (PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))

- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
  to share a single SDK instance within the same process.
  (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))

- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english`
  and voice ID to `autumn`.
  (PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399))

- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio
  packetization via the `fixed_audio_packet_size` parameter to support media
  endpoints requiring strict framing and real-time pacing.
  (PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
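
  The packetization idea can be sketched as a buffer that only releases
  fixed-size chunks and carries any remainder over to the next write. This is
  an illustration only: `packetize` is a hypothetical helper, not the
  transport's code, and the real-time pacing of the packets is not shown.

  ```python
  def packetize(buffer: bytearray, packet_size: int) -> list[bytes]:
      """Remove and return as many full packets as the buffer holds."""
      packets = []
      while len(buffer) >= packet_size:
          packets.append(bytes(buffer[:packet_size]))
          del buffer[:packet_size]  # leftover bytes wait for more audio
      return packets
  ```

  For example, 20 ms of 8 kHz, 16-bit mono PCM is 320 bytes, so a 500-byte
  write would yield one 320-byte packet and leave 180 bytes buffered.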

- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to
  `True` to prevent punctuation (e.g., “dot”) from being pronounced.
  (PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))

- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`,
  `LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns
  thought content.
  (PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431))

### Deprecated

- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use
  `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
  `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- `FrameProcessor.interruption_strategies` is deprecated, use
  `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in
  `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new
  universal `LLMContext` and `LLMContextAggregatorPair` instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and
  `UserStoppedSpeakingFrame` frames.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame`
  frames are deprecated.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in
  unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies`
  parameter instead.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is
  deprecated. Use the new `turn_detection_mode` parameter instead, with
  `TurnDetectionMode.EXTERNAL`, `TurnDetectionMode.ADAPTIVE`, or
  `TurnDetectionMode.SMART_TURN`. The `enable_vad` parameter is also
  deprecated and is inferred from the `turn_detection_mode`.
  (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))

- `OpenAILLMContext` and its associated classes (context aggregators, etc.)
  are now deprecated in favor of the universal `LLMContext` and its
  associated classes.

  From the developer's point of view, switching to using `LLMContext`
  machinery will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))

- `STTMuteFilter` is deprecated and will be removed in a future version. Use
  `LLMUserAggregator`'s new `user_mute_strategies` instead.
  (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))

- `FrameProcessor.interruptions_allowed` is now deprecated, use
  `LLMUserAggregator`'s new parameter `user_mute_strategies` instead.
  (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))

- `PipelineParams.allow_interruptions` is now deprecated, use
  `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
  example, to disable interruptions but still get user turns you can do:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
          ),
      ),
  )
  ```

  (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))

- `TranscriptProcessor` and related data classes and frames
  (`TranscriptionMessage`, `ThoughtTranscriptionMessage`,
  `TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and
  `LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and
  `on_assistant_turn_stopped`) instead.
  (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))

- Deprecated support for the `vad_events` `LiveOptions` in
  `DeepgramSTTService`. Instead, use a local Silero VAD for VAD events.
  Additionally, deprecated `should_interrupt` which will be removed along with
  `vad_events` support in a future release.
  (PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386))

- Loading external observers from files is deprecated, use the new pipeline
  task setup files and `PIPECAT_SETUP_FILES` environment variable instead.
  (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))

### Fixed

- Improved error handling in `ElevenLabsRealtimeSTTService`.
  (PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))

- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
  that blocks the process if the websocket disconnects due to an error.
  (PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))

- Fixed a bug in `STTMuteFilter` where the user was not always muted during
  function calls, especially when there were multiple simultaneous calls.
  (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))

- Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate
  memory" error when processing silence audio frames.
  (PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))

- Updated `SpeechmaticsSTTService` for version `0.0.99+`:

  - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
    in order to finalize transcription.
  - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
    detection.
  - Only emit VAD + interruption frames if VAD is enabled within the plugin
    (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
    (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))

- Fixed an issue with function calling where a handler failing to invoke its
  result callback could leave the context stuck in IN_PROGRESS, causing LLM
  inference for subsequent function call results to block while waiting on the
  unresolved call.
  (PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343))

- Fixed an issue with `DeepgramTTSService` where the model would output
  "Dot" instead of a period in some circumstances.
  (PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345))

- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as
  `unknown`.
  (PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351))

- Fixed an issue in `GeminiLiveLLMService` where `TranscriptionFrame`s were
  occasionally not pushed.
  (PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356))

- Fixed potential memory leaks and initialization issues in `KrispVivaFilter`
  by improving SDK lifecycle management.
  (PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))

- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was
  set after awaiting, allowing the event loop to re-enter the method before the
  guard was set.
  (PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400))
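
  The general pattern behind this kind of bug can be reproduced with plain
  `asyncio`. The sketch below is not the transport's code: `Speaker` and
  `count_announcements` are hypothetical names, and `asyncio.sleep(0)` stands
  in for whatever the real method awaits.

  ```python
  import asyncio


  class Speaker:
      """Hypothetical component that should announce bot speech only once."""

      def __init__(self, set_flag_before_await: bool):
          self._set_flag_before_await = set_flag_before_await
          self._speaking = False
          self.announcements = 0

      async def _announce(self):
          await asyncio.sleep(0)  # yields control back to the event loop
          self.announcements += 1

      async def bot_started_speaking(self):
          if self._speaking:
              return
          if self._set_flag_before_await:
              self._speaking = True  # guard is set before yielding
              await self._announce()
          else:
              await self._announce()
              self._speaking = True  # too late: others already passed the check


  async def count_announcements(set_flag_before_await: bool) -> int:
      speaker = Speaker(set_flag_before_await)
      # Two concurrent callers race on the same guard.
      await asyncio.gather(
          speaker.bot_started_speaking(),
          speaker.bot_started_speaking(),
      )
      return speaker.announcements
  ```

  With the flag set after the `await`, both callers pass the check and the
  announcement happens twice; setting it first lets the second caller return
  early.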

- Fixed parallel function calling when using Gemini thinking.
  (PR [#3420](https://github.com/pipecat-ai/pipecat/pull/3420))

- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as
  `unknown`.
  (PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422))

- Fixed an issue in `traced_tts`, `traced_gemini_live`, and
  `traced_openai_realtime` where `model_name` in OpenTelemetry appears as
  `unknown`.
  (PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428))

- Fixed `request_image_frame` (for backwards compatibility) and restored
  function-call–related fields in `UserImageRequestFrame` and
  `UserImageRawFrame`, preventing a case where adding a non-LLM message to the
  context could trigger duplicate LLM inferences (on image arrival and on
  function-call result), potentially causing an infinite inference loop.
  (PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430))

- Fixed `LLMContext.create_audio_message()` by correcting an internal helper
  that was incorrectly declared async while being run in `asyncio.to_thread()`.
  (PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435))
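
  The underlying Python behavior is easy to demonstrate in isolation. In this
  sketch, `encode_sync`, `encode_async`, and `demo` are hypothetical names and
  the hex encoding is only a placeholder for real work; it shows why an
  `async def` function should not be handed to `asyncio.to_thread()`.

  ```python
  import asyncio
  import inspect


  def encode_sync(data: bytes) -> str:
      """Plain function: safe to run in a worker thread."""
      return data.hex()


  async def encode_async(data: bytes) -> str:
      """Same body, but declared async by mistake."""
      return data.hex()


  async def demo():
      good = await asyncio.to_thread(encode_sync, b"\x01\x02")
      # Calling an async function in the thread only creates a coroutine
      # object; its body never runs, so no encoded string comes back.
      bad = await asyncio.to_thread(encode_async, b"\x01\x02")
      is_coroutine = inspect.iscoroutine(bad)
      bad.close()  # avoid a "coroutine was never awaited" warning
      return good, is_coroutine
  ```

  Running `asyncio.run(demo())` returns the encoded string for the plain
  function and `True` for the coroutine check on the async one.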

### Other

- Added `52-live-transcription.py` foundational example demonstrating live
  transcription and translation from English to Spanish. In this example, the
  bot is not interruptible: as the user continues speaking, English
  transcriptions are queued, and the bot continuously translates and speaks
  each queued sentence in Spanish without being interrupted by new user speech.
  (PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))

- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows
  how to use `UserTurnProcessor`.
  (PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))

- Added a new foundational example `28-user-assistant-turns.py` that shows how
  to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to
  gather a conversation transcript.
  (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))

## [0.0.98] - 2025-12-17

### Added

- Added `RimeNonJsonTTSService` which supports non-JSON streaming mode. This
  new class supports websocket streaming for the Arcana model.
  (PR [#3085](https://github.com/pipecat-ai/pipecat/pull/3085))

- Added additional functionality related to "thinking", for Google and
  Anthropic LLMs.

  1. New typed parameters for Google and Anthropic LLMs that control the
     models' thinking behavior (like how much thinking to do, and whether to
     output thoughts or thought summaries):
     - `AnthropicLLMService.ThinkingConfig`
     - `GoogleLLMService.ThinkingConfig`
  2. New frames for representing thoughts output by LLMs:
     - `LLMThoughtStartFrame`
     - `LLMThoughtTextFrame`
     - `LLMThoughtEndFrame`
  3. A generic mechanism for recording LLM thoughts to context, used
     specifically to support Anthropic, whose thought signatures are expected
     to appear alongside the text of the thoughts within assistant context
     messages. See:
     - `LLMThoughtEndFrame.signature`
     - `LLMAssistantAggregator` handling of the above field
     - `AnthropicLLMAdapter` handling of `"thought"` context messages
  4. Google-specific logic for inserting thought signatures into the context,
     to help maintain thinking continuity in a chain of LLM calls. See:
     - `GoogleLLMService` sending `LLMMessagesAppendFrame`s to add
       LLM-specific `"thought_signature"` messages to context
     - `GeminiLLMAdapter` handling of `"thought_signature"` messages
  5. An expansion of `TranscriptProcessor` to process LLM thoughts in
     addition to user and assistant utterances. See:
     - `TranscriptProcessor(process_thoughts=True)` (defaults to `False`)
     - `ThoughtTranscriptionMessage`, which is now also emitted with the
       `"on_transcript_update"` event
       (PR [#3175](https://github.com/pipecat-ai/pipecat/pull/3175))

- Data and control frames can now be marked as non-interruptible by using the
  `UninterruptibleFrame` mixin. Frames marked as `UninterruptibleFrame` will
  not be interrupted during processing, and any queued frames of this type will
  be retained in the internal queues. This is useful when you need ordered
  frames (data or control) that should not be discarded or cancelled due to
  interruptions.
  (PR [#3189](https://github.com/pipecat-ai/pipecat/pull/3189))
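
  The retention behavior can be pictured with a small queue filter. This is a
  sketch under assumptions, not Pipecat's queue handling: `Uninterruptible`
  plays the role of the `UninterruptibleFrame` mixin, and `TextChunk`,
  `ToolResult`, and `drop_interruptible` are hypothetical names.

  ```python
  from collections import deque
  from dataclasses import dataclass


  class Uninterruptible:
      """Hypothetical marker mixin for frames that survive interruptions."""


  @dataclass
  class TextChunk:
      text: str


  @dataclass
  class ToolResult(Uninterruptible):
      result: str


  def drop_interruptible(queue: deque) -> deque:
      """On interruption, keep only marked frames, preserving their order."""
      return deque(frame for frame in queue if isinstance(frame, Uninterruptible))
  ```

  Because the marker is a mixin, any data or control frame type can opt in
  without changing its place in the rest of the class hierarchy.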

- Added `on_conversation_detected` event to `VoicemailDetector`.
  (PR [#3207](https://github.com/pipecat-ai/pipecat/pull/3207))

- Added `x-goog-api-client` header with Pipecat's version to all Google
  services' requests.
  (PR [#3208](https://github.com/pipecat-ai/pipecat/pull/3208))

- Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/).
  (PR [#3210](https://github.com/pipecat-ai/pipecat/pull/3210))

- Added to `AWSNovaSonicLLMService` functionality related to the new (and now
  default) Nova 2 Sonic model (`"amazon.nova-2-sonic-v1:0"`):

  - Added the `endpointing_sensitivity` parameter to control how quickly the
    model decides the user has stopped speaking.
  - Made the assistant-response-trigger hack a no-op. It's only needed for
    the older Nova Sonic model.
    (PR [#3212](https://github.com/pipecat-ai/pipecat/pull/3212))

- [Ultravox Realtime](https://docs.ultravox.ai) is now a supported
  speech-to-speech service.

  - Added `UltravoxRealtimeLLMService` for the integration.
  - Added `49-ultravox-realtime.py` example (with tool calling).
    (PR [#3227](https://github.com/pipecat-ai/pipecat/pull/3227))

- Added Daily PSTN dial-in support to the development runner with `--dialin`
  flag. This includes:

  - `/daily-dialin-webhook` endpoint that handles incoming Daily PSTN webhooks
  - Automatic Daily room creation with SIP configuration
  - `DialinSettings` and `DailyDialinRequest` types in `pipecat.runner.types`
    for type-safe dial-in data
  - The runner now mimics Pipecat Cloud's dial-in webhook handling for local
    development
    (PR [#3235](https://github.com/pipecat-ai/pipecat/pull/3235))

- Added the Gladia session ID to logs for `GladiaSTTService`.
  (PR [#3236](https://github.com/pipecat-ai/pipecat/pull/3236))

- Added `InworldHttpTTSService` which uses Inworld's HTTP based TTS service in
  either streaming or non-streaming mode. Note: This class was previously named
  `InworldTTSService`.
  (PR [#3239](https://github.com/pipecat-ai/pipecat/pull/3239))

- Added `language_hints_strict` parameter to `SonioxSTTService` to strictly
  enforce language hints. This ensures that transcription occurs in the
  specified language.
  (PR [#3245](https://github.com/pipecat-ai/pipecat/pull/3245))

- Added Pipecat library version info to the `about` field in the `bot-ready`
  RTVI message.
  (PR [#3248](https://github.com/pipecat-ai/pipecat/pull/3248))

- Added `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and
  `VisionTextFrame`. These are used by vision services, similar to the
  corresponding frames for LLM services.
  (PR [#3252](https://github.com/pipecat-ai/pipecat/pull/3252))

### Changed

- `FunctionCallInProgressFrame` and `FunctionCallResultFrame` have changed from
  system frames to a control frame and a data frame, respectively, and are
  now both marked as `UninterruptibleFrame`.
  (PR [#3189](https://github.com/pipecat-ai/pipecat/pull/3189))

- `UserBotLatencyLogObserver` now uses `VADUserStartedSpeakingFrame` and
  `VADUserStoppedSpeakingFrame` to determine latency from user stopped speaking
  to bot started speaking.
  (PR [#3206](https://github.com/pipecat-ai/pipecat/pull/3206))

- Updated `HeyGenVideoService` and `HeyGenTransport` to support both HeyGen
  APIs (Interactive Avatar and Live Avatar).
  Using them is as simple as specifying the `service_type` when creating the
  `HeyGenVideoService` and the `HeyGenTransport`:

  ```python
  heyGen = HeyGenVideoService(
      api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
      service_type=ServiceType.LIVE_AVATAR,
      session=session,
  )
  ```

  (PR [#3210](https://github.com/pipecat-ai/pipecat/pull/3210))

- Made `"amazon.nova-2-sonic-v1:0"` the new default model for
  `AWSNovaSonicLLMService`.
  (PR [#3212](https://github.com/pipecat-ai/pipecat/pull/3212))

- Updated the `run_inference` methods in the LLM service classes
  (`AnthropicLLMService`, `AWSBedrockLLMService`, `GoogleLLMService`, and
  `OpenAILLMService` and its base classes) to use the provided LLM
  configuration parameters.
  (PR [#3214](https://github.com/pipecat-ai/pipecat/pull/3214))

- Updated default models for:

  - `GeminiLiveLLMService` to `gemini-2.5-flash-native-audio-preview-12-2025`.
  - `GeminiLiveVertexLLMService` to `gemini-live-2.5-flash-native-audio`.
    (PR [#3228](https://github.com/pipecat-ai/pipecat/pull/3228))

- Changed the `reason` field in `EndFrame`, `CancelFrame`, `EndTaskFrame`, and
  `CancelTaskFrame` from `str` to `Any` to indicate that it can hold values
  other than strings.
  (PR [#3231](https://github.com/pipecat-ai/pipecat/pull/3231))

- Updated websocket STT services to use the `WebsocketSTTService` base class.
  This base class manages the websocket connection and handles reconnects.
  Updated services:

  - `AssemblyAISTTService`
  - `AWSTranscribeSTTService`
  - `GladiaSTTService`
  - `SonioxSTTService`
    (PR [#3236](https://github.com/pipecat-ai/pipecat/pull/3236))

- Changed Inworld's TTS service implementations:

  - Previously, the HTTP implementation was named `InworldTTSService`. That
    has been moved to `InworldHttpTTSService`. This service now supports
    word-timestamp alignment data in both streaming and non-streaming modes.
  - Updated the `InworldTTSService` class to use Inworld's Websocket API.
    This class now has support for word-timestamp alignment data and tracks
    contexts for each user turn.
    (PR [#3239](https://github.com/pipecat-ai/pipecat/pull/3239))

- ⚠️ Breaking change: `WordTTSService.start_word_timestamps()` and
  `WordTTSService.reset_word_timestamps()` are now async.
  (PR [#3240](https://github.com/pipecat-ai/pipecat/pull/3240))

- Updated the current RTVI version to 1.1.0 to reflect recent additions and
  deprecations.

  - New RTVI Messages: `send-text` and `bot-output`
  - Deprecated Messages: `append-to-context` and `bot-transcription`
    (PR [#3248](https://github.com/pipecat-ai/pipecat/pull/3248))

- `MoondreamService` now pushes `VisionFullResponseStartFrame`,
  `VisionFullResponseEndFrame` and `VisionTextFrame`.
  (PR [#3252](https://github.com/pipecat-ai/pipecat/pull/3252))

### Deprecated

- `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer` are deprecated and will
  be removed in a future version. Use `LocalSmartTurnAnalyzerV3` instead.
  (PR [#3219](https://github.com/pipecat-ai/pipecat/pull/3219))

### Removed

- Removed the deprecated VLLM-based open source Ultravox STT service.
  (PR [#3227](https://github.com/pipecat-ai/pipecat/pull/3227))

### Fixed

- Fixed a bug in `AWSNovaSonicLLMService` where we would mishandle cancelled
  tool calls in the context, resulting in errors.
  (PR [#3212](https://github.com/pipecat-ai/pipecat/pull/3212))

- Improved support for conversation history with Gemini 2.5 Flash Image (model
  "gemini-2.5-flash-image"). Prior to this fix, the model had no memory of
  previous images it had generated, so it wouldn't be able to iterate on
  them.
  (PR [#3224](https://github.com/pipecat-ai/pipecat/pull/3224))

- Support conversations with Gemini 3 Pro Image (model
  "gemini-3-pro-image-preview"). Prior to this fix, after the model generated
  an image the conversation would not be able to progress.
  (PR [#3224](https://github.com/pipecat-ai/pipecat/pull/3224))

- Fixed an issue where `ElevenLabsHttpTTSService` was not updating
  voice settings when receiving a `TTSUpdateSettingsFrame`.
  (PR [#3226](https://github.com/pipecat-ai/pipecat/pull/3226))

- Fixed the return type for `SmallWebRTCRequestHandler.handle_web_request()`
  function.
  (PR [#3230](https://github.com/pipecat-ai/pipecat/pull/3230))

- Fixed a bug in LLM context audio content handling.
  (PR [#3234](https://github.com/pipecat-ai/pipecat/pull/3234))

- In `GladiaSTTService`, reset the `_bytes_sent` counter on connecting the
  websocket. This avoids unnecessary audio buffer trimming.
  (PR [#3236](https://github.com/pipecat-ai/pipecat/pull/3236))

- Fixed a TTS service word-timestamp issue that could cause generated
  `TTSTextFrame` instances to have an incorrect pts (`pts = -1`).
  (PR [#3240](https://github.com/pipecat-ai/pipecat/pull/3240))

- Fixed an issue in `SimpleTextAggregator` where spaces were not being stripped
  before returning the aggregation. This resulted in an extra space for TTS
  services that don't support word-timestamp alignment data.
  (PR [#3247](https://github.com/pipecat-ai/pipecat/pull/3247))

## [0.0.97] - 2025-12-05

### Added

- Added new Gradium services, `GradiumSTTService` and `GradiumTTSService`, for
  speech-to-text and text-to-speech functionality using Gradium's API.

- Additions for `AsyncAITTSService` and `AsyncAIHttpTTSService`:

  - Added new `languages`: `pt`, `nl`, `ar`, `ru`, `ro`, `ja`, `he`, `hy`,
    `tr`, `hi`, `zh`.
  - Updated the default model to `asyncflow_multilingual_v1.0` for improved
    accuracy and broader language coverage.

- Added optional tool and tool output filters for MCP services.

### Changed

- Updated Deepgram logging to include Deepgram request IDs for improved
  debugging.

- Text Aggregation Improvements:

  - **Breaking Change**: `BaseTextAggregator.aggregate()` now returns
    `AsyncIterator[Aggregation]` instead of `Optional[Aggregation]`. This
    enables the aggregator to return multiple results based on the provided
    text.
  - Refactored text aggregators to use inheritance: `SkipTagsAggregator` and
    `PatternPairAggregator` now inherit from `SimpleTextAggregator`, reusing
    the base class's sentence detection logic.
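
  To show what returning an `AsyncIterator` means for callers, here is a
  minimal sketch of an aggregator that can yield several results from one
  call. It is not Pipecat's `BaseTextAggregator`: `SentenceAggregator` and
  `collect` are hypothetical, plain strings stand in for `Aggregation`
  objects, and the boundary rule is deliberately naive (it would mis-split
  abbreviations such as "Mr. Smith").

  ```python
  import asyncio
  import re
  from typing import AsyncIterator


  class SentenceAggregator:
      """Hypothetical aggregator that yields every complete sentence."""

      def __init__(self):
          self._buffer = ""

      async def aggregate(self, text: str) -> AsyncIterator[str]:
          self._buffer += text
          # Naive boundary: sentence punctuation followed by whitespace.
          while match := re.search(r"[.!?]\s+", self._buffer):
              sentence = self._buffer[: match.end()].strip()
              self._buffer = self._buffer[match.end() :]
              yield sentence


  async def collect(aggregator: SentenceAggregator, text: str) -> list:
      # Callers iterate with `async for` instead of checking for None.
      return [sentence async for sentence in aggregator.aggregate(text)]
  ```

  A single call with `"Hi there. How are you? I am"` yields two sentences and
  keeps `"I am"` buffered until more text arrives.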

- Improved interruption handling to prevent bots from repeating themselves.
  Responses from LLM services that return multiple sentences at once (e.g.,
  `GoogleLLMService`) are now split into individual sentences before being sent
  to TTS. This ensures interruptions occur at sentence boundaries, preventing
  the bot from repeating content after being interrupted during long responses.

- Updated `AICFilter` to use Quail STT as the default model
  (`AICModelType.QUAIL_STT`). Quail STT is optimized for human-to-machine
  interaction (e.g., voice agents, speech-to-text) and operates at a native
  sample rate of 16 kHz with fixed enhancement parameters.

- If an unexpected exception is caught, or if `FrameProcessor.push_error()` is
  called with an exception, the file name and line number where the exception
  occurred are now logged.

- Updated Smart Turn model weights to v3.1.

- Smart Turn analyzer now uses the full context of the turn rather than just
  the audio since VAD last triggered.

- Updated `CartesiaSTTService` to return the full transcription `result` in the
  `TranscriptionFrame` and `InterimTranscriptionFrame`. This provides access to
  word timestamp data.

- `HumeTTSService` changes:

  - Added tracking headers (`X-Hume-Client-Name` and `X-Hume-Client-Version`)
    to all requests made by `HumeTTSService` to the Hume API for better usage
    tracking and analytics.
  - Added `stop()` and `cancel()` cleanup methods to `HumeTTSService` to
    properly close the HTTP client and prevent resource leaks.
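
As a rough illustration of the `BaseTextAggregator.aggregate()` change listed above, the sketch below shows how calling code moves from checking an optional return value to iterating over an async iterator. `ToySentenceAggregator` and this `Aggregation` dataclass are hypothetical stand-ins written for the example, not Pipecat's classes, and the sentence detection is deliberately naive.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Aggregation:
    """Stand-in for an aggregation result: text plus a type label."""

    text: str
    type: str


class ToySentenceAggregator:
    """Hypothetical aggregator used only to show the calling pattern."""

    def __init__(self) -> None:
        self._buffer = ""

    async def aggregate(self, text: str) -> AsyncIterator[Aggregation]:
        # An async generator can yield zero, one, or several aggregations
        # for a single chunk of input text.
        self._buffer += text
        while True:
            match = re.search(r"[.!?](\s+|$)", self._buffer)
            if match is None:
                break
            sentence = self._buffer[: match.end()].strip()
            self._buffer = self._buffer[match.end():]
            yield Aggregation(text=sentence, type="sentence")


async def collect(chunks: List[str]) -> List[str]:
    aggregator = ToySentenceAggregator()
    sentences: List[str] = []
    for chunk in chunks:
        # Callers iterate instead of checking a single result for None.
        async for aggregation in aggregator.aggregate(chunk):
            sentences.append(aggregation.text)
    return sentences


result = asyncio.run(collect(["Hello there. How are", " you? Fine"]))
```

The trailing fragment `Fine` stays buffered because no sentence boundary has been seen yet.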

### Deprecated

- NVIDIA Services name changes (all functionality is unchanged):

  - `NimLLMService` is now deprecated, use `NvidiaLLMService` instead.
  - `RivaSTTService` is now deprecated, use `NvidiaSTTService` instead.
  - `RivaTTSService` is now deprecated, use `NvidiaTTSService` instead.
  - Use `uv pip install pipecat-ai[nvidia]` instead of
    `uv pip install pipecat-ai[riva]`.

- The `noise_gate_enable` parameter in `AICFilter` is deprecated and no longer
  has any effect. Noise gating is now handled automatically by the AIC VAD
  system. Use `AICFilter.create_vad_analyzer()` for VAD functionality instead.

- Package `pipecat.sync` is deprecated, use `pipecat.utils.sync` instead.

### Fixed

- Fixed bug in `PatternPairAggregator` where pattern handlers could be called
  multiple times for `KEEP` or `AGGREGATE` patterns.

- Fixed sentence aggregation to correctly handle ambiguous punctuation in
  streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").

- Fixed an issue in `AWSTranscribeSTTService` where the `region` arg was always
  set to `us-east-1` when providing an `AWS_REGION` env var.

- Fixed an issue in `SarvamTTSService` where the last sentence was not being
  spoken. Now, audio is flushed when the TTS service receives an
  `LLMFullResponseEndFrame` or `EndFrame`.

- Fixed an issue in `DeepgramTTSService` where a `TTSStoppedFrame` was
  incorrectly pushed after a function call. This caused an issue with the
  voice-ui-kit's conversational panel rendering of the LLM output after a
  function call.

- Fixed an issue where `LLMTextFrame.skip_tts` was being overwritten by LLM
  services.

- Fixed an issue that caused `WebsocketService` instances to attempt
  reconnection during shutdown.

- Fixed an issue in `ElevenLabsTTSService` where character usage metrics were
  only reported on the first TTS generation per turn.

## [0.0.96] - 2025-11-26 🦃 "Happy Thanksgiving!" 🦃

### Added

- Added `AWSBedrockAgentCoreProcessor` to support invoking an AgentCore-hosted
  agent in a Pipecat pipeline.

- Enhanced error handling across the framework:

  - Added `on_error` callback to `FrameProcessor` for centralized error
    handling.

  - Renamed `push_error(error: ErrorFrame)` to `push_error_frame(error: ErrorFrame)`
    for clarity.

  - Added new `push_error` method for simplified error reporting:

    ```python
    async def push_error(error_msg: str,
                         exception: Optional[Exception] = None,
                         fatal: bool = False)
    ```

  - Standardized error logging by replacing `logger.exception` calls with
    `logger.error` throughout the codebase.

- Added `cache_read_input_tokens`, `cache_creation_input_tokens` and
  `reasoning_tokens` to OTel spans for LLM calls.

- Added `LiveKitRESTHelper` utility class for managing LiveKit rooms via REST API.

- Added `DeepgramSageMakerSTTService` which connects to a SageMaker hosted
  Deepgram STT model. Added `07c-interruptible-deepgram-sagemaker.py`
  foundational example.

- Added `SageMakerBidiClient` to connect to SageMaker hosted BiDi compatible
  services.

- Added support for `include_timestamps` and `enable_logging` in
  `ElevenLabsRealtimeSTTService`. When `include_timestamps` is enabled,
  timestamp data is included in the `TranscriptionFrame`'s `result`
  parameter.

- Added optional speaking rate control to `InworldTTSService`.

- Introduced a new `AggregatedTextFrame` type to support passing text along with
  an `aggregated_by` field to describe the type of text
  included. `TTSTextFrame`s now inherit from `AggregatedTextFrame`. With this
  inheritance, an observer can watch for `AggregatedTextFrame`s to accumulate the
  perceived output and determine whether or not the text was spoken based on
  whether that frame is also a `TTSTextFrame`.

  With this frame, the LLM token stream can be transformed into custom
  composable chunks, allowing for aggregation outside the TTS service. This
  makes it possible to listen for or handle those aggregations, and sets the
  stage for composing a best-effort, more digestible view of the perceived LLM
  output, whether or not it is processed by a TTS service and even if no TTS
  service exists.

- Introduced `LLMTextProcessor`, a new processor that lets you customize how
  `LLMTextFrame`s are aggregated. Its purpose is to turn `LLMTextFrame`s into
  `AggregatedTextFrame`s. By default, a `TTSService` will still aggregate
  `LLMTextFrame`s by sentence for the service to consume. However, if you wish
  to override how the LLM text is aggregated, you should no longer override the
  TTS's internal `text_aggregator`; instead, insert this processor between your
  LLM and TTS in the pipeline.

- New `bot-output` RTVI message to represent what the bot actually "says".

  - The `RTVIObserver` now emits `bot-output` messages based on the new
    `AggregatedTextFrame`s (`bot-tts-text` and `bot-llm-text` are still
    supported and generated, but `bot-transcription` is now deprecated in favor
    of this new, more thorough message).

  - The new `RTVIBotOutputMessage` includes the fields:

    - `spoken`: A boolean indicating whether the text was spoken by TTS

    - `aggregated_by`: A string representing how the text was aggregated
      ("sentence", "word", "my custom aggregation")

  - Introduced new fields to `RTVIObserver` to support the new `bot-output`
    messaging:

    - `bot_output_enabled`: Defaults to `True`. Set to `False` to disable
      `bot-output` messages.

    - `skip_aggregator_types`: Defaults to `None`. Set to a list of strings that
      match aggregation types that should not be included in bot-output
      messages. (Ex. `credit_card`)

  - Introduced new methods, `add_text_transformer()` and
    `remove_text_transformer()`, to `RTVIObserver` to support providing (and
    subsequently removing) callbacks for various types of aggregations (or all
    aggregations with `*`) that can modify the text before it is sent as a
    `bot-output` or `bot-tts-text` message. (For example, obscuring a credit
    card number or inserting extra detail the client might want that the
    context doesn't need.)

- In `MiniMaxHttpTTSService`:

  - Added support for speech-2.6-hd and speech-2.6-turbo models

  - Added languages: Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino,
    Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian,
    Swedish, and Tamil

  - Added new emotions: calm and fluent

- Added `enable_logging` to `SimliVideoService` input parameters. It's disabled
  by default.
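
To make the `bot-output` fields described above more concrete, here is a minimal sketch of how a `spoken` flag can be derived from the frame type and how `skip_aggregator_types` can filter messages. The two frame classes are simplified stand-ins that only mirror the inheritance described in this changelog, and `to_bot_output()` and the `text` key are hypothetical names chosen for the example rather than the `RTVIObserver` implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Union


@dataclass
class AggregatedTextFrame:
    """Stand-in frame: text plus a label describing how it was aggregated."""

    text: str
    aggregated_by: str


@dataclass
class TTSTextFrame(AggregatedTextFrame):
    """Stand-in subclass: text that was actually spoken by TTS."""


def to_bot_output(
    frame: AggregatedTextFrame,
    skip_aggregator_types: Optional[Iterable[str]] = None,
) -> Optional[Dict[str, Union[str, bool]]]:
    """Build a bot-output style payload, or None if the type is skipped."""
    if skip_aggregator_types and frame.aggregated_by in skip_aggregator_types:
        return None
    return {
        "text": frame.text,
        # Spoken text arrives as the TTSTextFrame subclass.
        "spoken": isinstance(frame, TTSTextFrame),
        "aggregated_by": frame.aggregated_by,
    }


frames: List[AggregatedTextFrame] = [
    TTSTextFrame(text="Your total is ready.", aggregated_by="sentence"),
    AggregatedTextFrame(text="4111 1111 1111 1111", aggregated_by="credit_card"),
    AggregatedTextFrame(text="print('hi')", aggregated_by="code"),
]

outputs = [to_bot_output(f, skip_aggregator_types=["credit_card"]) for f in frames]
```

Here the `credit_card` aggregation produces no message at all, while the `code` aggregation is reported with `spoken` set to `False`.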

### Changed

- Updated `FishAudioTTSService` default model to `s1`.

- Updated `DeepgramTTSService` to use Deepgram's TTS websocket API. ⚠️ This is
  a potential breaking change, which only affects you if you're self-hosting
  `DeepgramTTSService`. The new service uses Websockets and improves TTFB
  latency.

- Updated `daily-python` to 0.22.0.

- `BaseTextAggregator` changes:

  Modified the BaseTextAggregator type so that when text gets aggregated,
  metadata can be associated with it. Currently, that just means a `type`, so
  that the aggregation can be classified or described. Changes made to support
  this:

  - ⚠️ IMPORTANT: Aggregators are now expected to strip leading/trailing
    whitespace characters before returning their aggregation from `aggregate()`
    or `.text`. This gives all aggregators a consistent contract, so downstream
    consumers know how to stitch aggregations back together.

  - Introduced a new `Aggregation` dataclass to represent both the aggregated
    `text` and a string identifying the `type` of aggregation (ex. "sentence",
    "word", "my custom aggregation")

  - ⚠️ Breaking change: `BaseTextAggregator.text` now returns an `Aggregation`
    (instead of `str`).

    Before:

    ```python
    aggregated_text = myAggregator.text
    ```

    Now:

    ```python
    aggregated_text = myAggregator.text.text
    ```

  - ⚠️ Breaking change: `BaseTextAggregator.aggregate()` now returns
    `Optional[Aggregation]` (instead of `Optional[str]`).

    Before:

    ```python
    aggregation = myAggregator.aggregate(text)
    print(f"successfully aggregated text: {aggregation}")
    ```

    Now:

    ```python
    aggregation = myAggregator.aggregate(text)
    if aggregation:
        print(f"successfully aggregated text: {aggregation.text}")
    ```

  - `SimpleTextAggregator`, `SkipTagsAggregator`, `PatternPairAggregator`
    updated to produce/consume `Aggregation` objects.

  - All uses of the above Aggregators have been updated accordingly.

- Augmented the `PatternPairAggregator` so that matched patterns can be treated
  as their own aggregation, taking advantage of the new `Aggregation` type. To
  that end:

  - Introduced a new, preferred `add_pattern` method that supports treating a
    match as a separate aggregation returned from `aggregate()`. It replaces
    the now-deprecated `add_pattern_pair` method, and takes a `MatchAction` in
    place of the `remove_match` field.

    - `MatchAction` enum: `REMOVE`, `KEEP`, `AGGREGATE`, allowing customization
      for how a match should be handled.

      - `REMOVE`: The text along with its delimiters will be removed from the
        streaming text. Sentence aggregation will continue on as if this text
        did not exist.

      - `KEEP`: The delimiters will be removed, but the content between them
        will be kept. Sentence aggregation will continue on with the internal
        text included.

      - `AGGREGATE`: The delimiters will be removed and the content between will
        be treated as a separate aggregation. Any text before the start of the
        pattern will be returned early, whether or not a complete sentence was
        found. Then the pattern will be returned. Then the aggregation will
        continue on sentence matching after the closing delimiter is found. The
        content between the delimiters is not aggregated by sentence. It is
        aggregated as one single block of text.

    - `PatternMatch` now extends `Aggregation` and provides richer info to
      handlers.

  - ⚠️ Breaking change: The `PatternMatch` type returned to handlers registered
    via `on_pattern_match` has been updated to subclass from the new
    `Aggregation` type, which means that `content` has been replaced with
    `text` and `pattern_id` has been replaced with `type`:

    ```python
    async def on_match_tag(match: PatternMatch):
        pattern = match.type  # instead of match.pattern_id
        text = match.text  # instead of match.content
    ```

- `TextFrame` now includes the field `append_to_context` to support setting
  whether or not the encompassing text should be added to the LLM context (by
  the LLM assistant aggregator). It defaults to `True`.

- `TTSService` base class updates:

  - `TTSService`s now accept a new `skip_aggregator_types` argument to avoid
    speaking certain aggregation types (now determined/returned by the
    aggregator).

  - Introduced the ability to do a just-in-time transform of text before it gets
    sent to the TTS service, using callbacks registered via a new init field,
    `text_transforms`, or a new method, `add_text_transformer()`. This makes it
    possible to do things like introduce TTS-specific tags for spelling or
    emotion, or change the pronunciation of something on the fly.
    `remove_text_transformer()` has also been added to support removing a
    registered transform callback.

  - TTS services push `AggregatedTextFrame` in addition to `TTSTextFrame`s when
    either an aggregation occurs that should not be spoken or when the TTS
    service supports word-by-word timestamping. In the latter case, the
    `TTSService` preliminarily generates an `AggregatedTextFrame`, aggregated by
    sentence to generate the full sentence content as early as possible.

- Updated `CartesiaTTSService`:

  - Modified use of the custom default `text_aggregator` to avoid deprecation
    warnings and push users towards use of transformers or the
    `LLMTextProcessor`.

  - Added convenience methods for taking advantage of Cartesia's SSML tags:
    spell, emotion, pauses, volume, and speed.

- Updated `RimeTTSService`:

  - Modified use of the custom default `text_aggregator` to avoid deprecation
    warnings and push users towards use of transformers or the
    `LLMTextProcessor`.

  - Added convenience methods for taking advantage of Rime's customization
    options: spell, pauses, pronunciations, and inline speed control.
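
The three `MatchAction` behaviors described above for `PatternPairAggregator` can be sketched on a complete string. This is a simplified, non-streaming illustration under stated assumptions: `apply_pattern()` and `split_sentences()` are hypothetical helpers, the `MatchAction` enum here is a local stand-in, and results are shown as `(type, text)` tuples rather than `Aggregation` objects.

```python
import re
from enum import Enum
from typing import List, Tuple


class MatchAction(Enum):
    """Local stand-in mirroring the three actions named in the changelog."""

    REMOVE = "remove"
    KEEP = "keep"
    AGGREGATE = "aggregate"


def split_sentences(text: str) -> List[Tuple[str, str]]:
    """Split on sentence-ending punctuation; a trailing fragment is kept."""
    normalized = " ".join(text.split())
    parts = re.split(r"(?<=[.!?])\s+", normalized)
    return [("sentence", part) for part in parts if part]


def apply_pattern(
    text: str, start: str, end: str, pattern_type: str, action: MatchAction
) -> List[Tuple[str, str]]:
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    if action is MatchAction.REMOVE:
        # Drop delimiters and content, then aggregate as if they never existed.
        return split_sentences(pattern.sub("", text))
    if action is MatchAction.KEEP:
        # Drop only the delimiters; the content stays inside its sentence.
        return split_sentences(pattern.sub(lambda m: m.group(1), text))
    # AGGREGATE: flush text before the match, emit the match as its own block.
    results: List[Tuple[str, str]] = []
    position = 0
    for match in pattern.finditer(text):
        results.extend(split_sentences(text[position:match.start()]))
        results.append((pattern_type, match.group(1).strip()))
        position = match.end()
    results.extend(split_sentences(text[position:]))
    return results


TEXT = "Your card is <card>4111 1111</card> on file. Thanks!"
removed = apply_pattern(TEXT, "<card>", "</card>", "card", MatchAction.REMOVE)
kept = apply_pattern(TEXT, "<card>", "</card>", "card", MatchAction.KEEP)
aggregated = apply_pattern(TEXT, "<card>", "</card>", "card", MatchAction.AGGREGATE)
```

With `AGGREGATE`, the fragment `Your card is` is returned before the match even though it is not a complete sentence, which matches the early-return behavior described above.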

### Deprecated

- The TTS constructor field, `text_aggregator`, is deprecated in favor of the
  new `LLMTextProcessor`. `TTSService`s still have an internal aggregator to
  support the default behavior, but if you want to override the aggregation
  behavior, you should use the new processor.

- The RTVI `bot-transcription` event is deprecated in favor of the new
  `bot-output` message, which is the canonical representation of bot output
  (spoken or not). The code still emits a transcription message for backwards
  compatibility during the transition.

- Deprecated `add_pattern_pair` in the `PatternPairAggregator`, which takes a
  `pattern_id` and a `remove_match` field, in favor of the new `add_pattern`
  method, which takes a `type` and an `action`.

- The `english_normalization` input parameter for `MiniMaxHttpTTSService` is
  deprecated, use `text_normalization` instead.

### Fixed

- Fixed an issue in `AWSBedrockLLMService` where the `aws_region` arg was
  always set to `us-east-1` when providing an `AWS_REGION` env var.

- Fixed an issue with `DeepgramFluxSTTService` where it sometimes failed to reconnect.

- Fixed an issue in `ElevenLabsRealtimeSTTService` where dynamic language
  updates were not working.

- Fixed an issue in `ElevenLabsRealtimeSTTService` where setting the sample
  rate would result in transcripts failing.

- Fixed `InworldTTSService` audio config payload to use camelCase keys expected
  by the Inworld API.

## [0.0.95] - 2025-11-18

### Added

- Added ai-coustics integrated VAD (`AICVADAnalyzer`) with `AICFilter` factory and
  example wiring; leverages the enhancement model for robust detection with no
  ONNX dependency or added processing complexity.

- Added a watchdog to `DeepgramFluxSTTService` to prevent dangling tasks when
  audio stops arriving while the user is speaking.

- Introduced a minimum confidence parameter in `DeepgramFluxSTTService` to avoid
  generating transcriptions below a defined threshold.

- Added `ElevenLabsRealtimeSTTService` which implements the Realtime STT
  service from ElevenLabs.

- Added word-level timestamp support to the Hume TTS service.

### Changed

- ⚠️ Breaking change: `LLMContext.create_image_message()`,
  `LLMContext.create_audio_message()`, `LLMContext.add_image_frame_message()`
  and `LLMContext.add_audio_frames_message()` are now async methods. This fixes
  an issue where the asyncio event loop would be blocked while encoding audio or
  images.

- `ConsumerProcessor` now queues frames from the producer internally instead of
  pushing them directly. This allows us to subclass consumer processors and
  manipulate frames before they are pushed.

- `BaseTextFilter` now only requires subclasses to implement the `filter()`
  method.

- Extracted the connection retry logic into a new `send_with_retry` method in
  `WebsocketService`.

- Refactored `DeepgramFluxSTTService` to automatically reconnect if sending a
  message fails.

- Updated all STT and TTS services to use a consistent error handling pattern
  with the `push_error()` method for better pipeline error event integration.

- Added support for `maybe_capture_participant_camera()` and
  `maybe_capture_participant_screen()` for `SmallWebRTCTransport` in the runner
  utils.

- Added Hindi support for Rime TTS services.

- Updated `GeminiTTSService` to use the Google Cloud Text-to-Speech streaming
  API instead of the deprecated Gemini API. It now uses `credentials` /
  `credentials_path` for authentication. The `api_key` parameter is deprecated.
  Also added support for a `prompt` parameter for style instructions and
  expressive markup tags. Streaming synthesis significantly improves latency.

- Updated language mappings for the Google and Gemini TTS services to match
  official documentation.
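
The breaking change above that makes the `LLMContext` image and audio helpers async can be pictured with a small sketch. It assumes the blocking work is the encoding step and offloads it with `asyncio.to_thread`; `create_image_message_sketch()`, `encode_image()` and the message shape are hypothetical and are not taken from Pipecat's implementation.

```python
import asyncio
import base64
from typing import Any, Dict


def encode_image(image: bytes) -> str:
    """CPU-bound step: base64-encode raw image bytes."""
    return base64.b64encode(image).decode("ascii")


async def create_image_message_sketch(image: bytes, text: str) -> Dict[str, Any]:
    # Run the blocking encode in a worker thread so the event loop stays free.
    encoded = await asyncio.to_thread(encode_image, image)
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
            },
        ],
    }


async def main() -> Dict[str, Any]:
    # Callers now need `await`; without it they get a coroutine, not a message.
    return await create_image_message_sketch(b"abc", "What is in this image?")


message = asyncio.run(main())
```

For existing code, the practical migration is to add `await` at each call site of the four methods listed in that entry.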

### Deprecated

- The `api_key` parameter in `GeminiTTSService` is deprecated. Use
  `credentials` or `credentials_path` instead for Google Cloud authentication.

### Fixed

- Fixed a `SimliVideoService` connection issue.

- Fixed an issue in the `Runner` where, when using `SmallWebRTCTransport`, the
  `request_data` was not being passed to the `SmallWebRTCRunnerArguments` body.

- Fixed a subtle issue where assistant context messages ended up with double
  spaces between words or sentences.

- Fixed an issue where `NeuphonicTTSService` wasn't pushing `TTSTextFrame`s,
  meaning assistant messages weren't being written to context.

- Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying
  LLM completions and tools when using the universal `LLMContext`.

- Fixed an issue where `DeepgramFluxSTTService` failed to connect when passed a
  `keyterm` or `tag` containing a space.

- Prevented `HeyGenVideoService` from automatically disconnecting after 5 minutes.

## [0.0.94] - 2025-11-10

### Changed

- Added support for retrying `SpeechmaticsTTSService` when it returns a 503
  error. The retry settings have default values in `InputParams`.

### Deprecated

- The `KrispFilter` is deprecated and will be removed in a future version. Use
  the `KrispVivaFilter` instead.

### Removed

- `LivekitFrameSerializer` has been removed. Use `LiveKitTransport` instead.

### Fixed

- Fixed a bug related to `LLMAssistantAggregator` where spaces were sometimes
  missing from assistant messages in context.

## [0.0.93] - 2025-11-07

### Added

- Added support for Sarvam Speech-to-Text service (`SarvamSTTService`) with
  streaming WebSocket support for `saarika` (STT) and `saaras` (STT-translate)
  models.

- Added support for passing in a `ToolsSchema` in lieu of a list of provider-
  specific dicts when initializing `OpenAIRealtimeLLMService` or when updating
  it using `LLMUpdateSettingsFrame`.

- Added `TransportParams.audio_out_silence_secs`, which specifies how many
  seconds of silence to output when an `EndFrame` reaches the output
  transport. This can help ensure that all audio data is fully delivered to
  clients.

- Added new `FrameProcessor.broadcast_frame()` method. This will push two
  instances of a given frame class, one upstream and the other downstream.

  ```python
  await self.broadcast_frame(UserSpeakingFrame)
  ```

- Added `MetricsLogObserver` for logging performance metrics from `MetricsFrame`
  instances. Supports filtering via `include_metrics` parameter to control which
  metrics types are logged (TTFB, processing time, LLM token usage, TTS usage,
  smart turn metrics).

- Added `pronunciation_dictionary_locators` to `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService`.

- Added support for loading external observers. You can now register custom
  pipeline observers by setting the `PIPECAT_OBSERVER_FILES` environment
  variable. This variable should contain a colon-separated list of Python files
  (e.g. `export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."`). Each
  file must define a function with the following signature:

  ```python
  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
      ...
  ```

- Added support for new sonic-3 languages in `CartesiaTTSService` and
  `CartesiaHttpTTSService`.

- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why
  the pipeline is being ended.

- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to
  indicate why the pipeline is being canceled. This can also be specified when
  you cancel a task with `PipelineTask.cancel(reason="cancellation reason")`.

- Added `include_prob_metrics` parameter to Whisper STT services to enable access
  to probability metrics from transcription results.

- Added utility functions `extract_whisper_probability()`,
  `extract_openai_gpt4o_probability()`, and `extract_deepgram_probability()` to
  extract probability metrics from `TranscriptionFrame` objects for Whisper-based,
  OpenAI GPT-4o-transcribe, and Deepgram STT services respectively.

- Added `LLMSwitcher.register_direct_function()`. It works much like
  `LLMSwitcher.register_function()` in that it's a shorthand for registering
  a function on all LLMs in the switcher, except this new method takes a direct
  function (a `FunctionSchema`-less function).

- Added `MCPClient.get_tools_schema()` and `MCPClient.register_tools_schema()`
  as a two-step alternative to `MCPClient.register_tools()`, to allow users to
  pass MCP tools to, say, `GeminiLiveLLMService` (as well as other
  speech-to-speech services) in the constructor.

- Added support for passing in an `LLMSwitcher` to `MCPClient.register_tools()`
  (as well as the new `MCPClient.register_tools_schema()`).

- Added `cpu_count` parameter to `LocalSmartTurnAnalyzerV3`. This is set to `1`
  by default for more predictable performance on low-CPU systems.
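
The `PIPECAT_OBSERVER_FILES` mechanism described above can be approximated as follows. This is a sketch of one way such loading could work, not the runner's actual code: `observer_files_from_env()` and `load_observers()` are hypothetical helpers, and the only details taken from the changelog are the colon-separated variable and the `create_observers(task)` signature.

```python
import asyncio
import importlib.util
import os
from pathlib import Path
from typing import Any, List, Mapping


def observer_files_from_env(env: Mapping[str, str] = os.environ) -> List[Path]:
    """Parse the colon-separated file list, ignoring empty entries."""
    raw = env.get("PIPECAT_OBSERVER_FILES", "")
    return [Path(entry) for entry in raw.split(":") if entry]


async def load_observers(task: Any, files: List[Path]) -> List[Any]:
    """Import each file and collect what its create_observers() returns."""
    observers: List[Any] = []
    for index, path in enumerate(files):
        spec = importlib.util.spec_from_file_location(f"external_observer_{index}", path)
        if spec is None or spec.loader is None:
            continue  # Not an importable Python file; skip it in this sketch.
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Each file is expected to define: async def create_observers(task).
        observers.extend(await module.create_observers(task))
    return observers


files = observer_files_from_env({"PIPECAT_OBSERVER_FILES": "observer1.py:observer2.py"})
```

In a real run the parsed paths would be passed to `load_observers()` together with the `PipelineTask`.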

### Changed

- Updated `simli-ai` to 0.1.25.

- `STTMuteFilter` no longer sends `STTMuteFrame` to the STT service. The filter
  now blocks frames locally without instructing the STT service to stop
  processing audio. This prevents inactivity-related errors (such as 409 errors
  from Google STT) while maintaining the same muting behavior at the application
  level. Important: the `STTMuteFilter` should be placed _after_ the STT service
  itself.

- Improved `GoogleSTTService` error handling to properly catch gRPC `Aborted`
  exceptions (corresponding to 409 errors) caused by stream inactivity. These
  exceptions are now logged at DEBUG level instead of ERROR level, since they
  indicate expected behavior when no audio is sent for 10+ seconds (e.g., during
  long silences or when audio input is blocked). The service automatically
  reconnects when this occurs.

- Bumped the `fastapi` dependency's upper bound to `<0.122.0`.

- Updated the default model for `GoogleVertexLLMService` to `gemini-2.5-flash`.

- Updated the `GoogleVertexLLMService` to use the `GoogleLLMService` as a base
  class instead of the `OpenAILLMService`.

- Updated STT and TTS services to pass through unverified language codes with a
  warning instead of returning None. This allows developers to use newly
  supported languages before Pipecat's service classes are updated, while still
  providing guidance on verified languages.
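
The language-code passthrough described above amounts to a lookup with a warning fallback. The sketch below assumes a per-service mapping of verified codes; `VERIFIED_LANGUAGES` and `resolve_language()` are hypothetical names for illustration only.

```python
import logging
from typing import Dict

logger = logging.getLogger("language_sketch")

# Hypothetical table of codes a service class has verified.
VERIFIED_LANGUAGES: Dict[str, str] = {"en": "en-US", "es": "es-ES"}


def resolve_language(code: str, verified: Dict[str, str] = VERIFIED_LANGUAGES) -> str:
    """Return the verified service code, or pass the raw code through."""
    mapped = verified.get(code)
    if mapped is not None:
        return mapped
    # Previously this case would have produced None; now the code is forwarded.
    logger.warning("Language %r is not verified for this service; passing it through", code)
    return code


known = resolve_language("en")
unknown = resolve_language("xx")
```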

### Removed

- Removed `needs_mcp_alternate_schema()` from `LLMService`. The mechanism that
  relied on it went away.

### Fixed

- Restored backwards compatibility for vision/image features (broken in 0.0.92)
  when using non-universal context and assistant aggregators.

- Fixed `DeepgramSTTService._disconnect()` to properly await `is_connected()`
  method call, which is an async coroutine in the Deepgram SDK.

- Fixed an issue where the `SmallWebRTCRequest` dataclass in runner would scrub
  arbitrary request data from client due to camelCase typing. This fixes data
  passthrough for JS clients where `APIRequest` is used.

- Fixed a bug in `GeminiLiveLLMService` where in some circumstances it wouldn't
  respond after a tool call.

- Fixed `GeminiLiveLLMService` session resumption after a connection timeout.

- `GeminiLiveLLMService` now properly supports context-provided system
  instruction and tools.

- Fixed `GoogleLLMService` token counting to avoid double-counting tokens when
  Gemini sends usage metadata across multiple streaming chunks.
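
The token double-counting fix above can be illustrated with a small sketch. It assumes, for the sake of the example, that each streaming chunk reports cumulative usage so far, in which case summing per-chunk values overcounts and keeping the latest value is correct. The `Usage` class and `final_usage()` are hypothetical and do not reflect `GoogleLLMService` internals.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Usage:
    """Stand-in for per-chunk usage metadata (assumed cumulative)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def final_usage(chunk_usages: Iterable[Optional[Usage]]) -> Usage:
    """Keep the most recent usage report instead of summing all of them."""
    latest = Usage()
    for usage in chunk_usages:
        if usage is not None:
            latest = usage  # Overwrite, do not add.
    return latest


chunks: List[Optional[Usage]] = [None, Usage(12, 3), Usage(12, 9), Usage(12, 15)]
usage = final_usage(chunks)
naive_total = sum(u.total_tokens for u in chunks if u is not None)
```

Under that assumption the naive sum reports more than twice the real total for this three-chunk response.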

## [0.0.92] - 2025-10-31 🎃 "The Haunted Edition" 👻

### Added

- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction
  in latency when compared to the `DeepgramTTSService`.

- Added support for the `speaking_rate` input parameter in
  `GoogleHttpTTSService`.

- Added `enable_speaker_diarization` and `enable_language_identification` to
  `SonioxSTTService`.

- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated
  examples 07a\* to use the new TTS service.

- Added support for including images or audio in LLM context messages using
  `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()`
  (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For
  example, when creating an `LLMMessagesAppendFrame`:

  ```python
  message = LLMContext.create_image_message(image=..., size=...)
  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  ```

- New event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`,
  `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.

- Added `generation_config` parameter support to `CartesiaTTSService` and
  `CartesiaHttpTTSService` for Cartesia Sonic-3 models. Includes a new
  `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5),
  and `emotion` (60+ options) parameters for fine-grained speech generation
  control.

- Expanded support for universal `LLMContext` to `OpenAIRealtimeLLMService`.
  As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `OpenAIRealtimeLLMService` now supports the universal
  `LLMContext`, it is not meant to be swapped out for another LLM service at
  runtime with `LLMSwitcher`.)

  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream
  from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`,
  say, you'll want to adjust accordingly:

  ```python
  pipeline = Pipeline(
    [
      transport.input(),
      context_aggregator.user(),

      # BEFORE
      llm,
      transcript.user(),

      # AFTER
      transcript.user(),
      llm,

      transport.output(),
      transcript.assistant(),
      context_aggregator.assistant(),
    ]
  )
  ```

  Also worth noting: whether or not you use the new context-setup pattern with
  `OpenAIRealtimeLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: OpenAIContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: OpenAIRealtimeLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `RealtimeMessagesUpdateFrame` and
  `RealtimeFunctionCallResultFrame` have been deprecated, since they're no
  longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more
  like other LLM services in Pipecat, relying on updates to its context, pushed
  by context aggregators, to update its internal state. Listen for
  `LLMContextFrame`s for context updates.

  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService`
  when it's configured with `output_modalities=['audio']`. If you need
  to process its output, listen for `TTSTextFrame`s instead.

- Expanded support for universal `LLMContext` to `GeminiLiveLLMService`.
  As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `GeminiLiveLLMService` now supports the universal
  `LLMContext`, it is not meant to be swapped out for another LLM service at
  runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with
  `GeminiLiveLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: GeminiLiveContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: GeminiLiveLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService`
  when it's configured with `modalities=GeminiModalities.AUDIO`. If you need
  to process its output, listen for `TTSTextFrame`s instead.

### Changed

- The development runner's `/start` endpoint now supports passing
  `dailyRoomProperties` and `dailyMeetingTokenProperties` in the request body
  when `createDailyRoom` is true. Properties are validated against the
  `DailyRoomProperties` and `DailyMeetingTokenProperties` types respectively
  and passed to Daily's room and token creation APIs.

- `UserImageRawFrame` has new fields `append_to_context` and `text`. The
  `append_to_context` field indicates whether this image and text should be
  added to the LLM context (by the LLM assistant aggregator). The `text` field,
  if set, might also guide the LLM or the vision service on how to analyze the
  image.

- `UserImageRequestFrame` has new fields `append_to_context` and `text`. Both
  fields are used to set the same fields on the captured `UserImageRawFrame`.

- `UserImageRequestFrame` no longer requires a function call name and ID.

- Updated `MoondreamService` to process `UserImageRawFrame`.

- `VisionService` expects `UserImageRawFrame` in order to analyze images.

- `DailyTransport` triggers `on_error` event if transcription can't be started
  or stopped.

- `DailyTransport` updates: `start_dialout()` now returns two values:
  `session_id` and `error`. `start_recording()` now returns two values:
  `stream_id` and `error`.

- Updated `daily-python` to 0.21.0.

- `SimliVideoService` now accepts `api_key` and `face_id` parameters directly,
  with optional `params` for `max_session_length` and `max_idle_time`
  configuration, aligning with other Pipecat service patterns.

- Updated the default model to `sonic-3` for `CartesiaTTSService` and
  `CartesiaHttpTTSService`.

- `FunctionFilter` now has a `filter_system_frames` arg, which controls whether
  or not `SystemFrame`s are filtered.

- Upgraded `aws_sdk_bedrock_runtime` to v0.1.1 to resolve potential CPU issues
  when running `AWSNovaSonicLLMService`.
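
For the `DailyTransport` change above, where `start_dialout()` and `start_recording()` now return two values, calling code needs to unpack the result. The sketch below uses a hypothetical `FakeDailyTransport` that only mimics the `(value, error)` return shape; the settings key, the identifier value, and the error type are assumptions made for the example.

```python
import asyncio
from typing import Optional, Tuple


class FakeDailyTransport:
    """Hypothetical stand-in; only mimics the (value, error) return shape."""

    async def start_dialout(self, settings: dict) -> Tuple[Optional[str], Optional[str]]:
        if not settings.get("phoneNumber"):
            return None, "missing phoneNumber"
        return "session-123", None


async def dial(transport: FakeDailyTransport, settings: dict) -> str:
    # Unpack both values and check the error before using the identifier.
    session_id, error = await transport.start_dialout(settings)
    if error:
        return f"dial-out failed: {error}"
    return f"dial-out started: {session_id}"


ok = asyncio.run(dial(FakeDailyTransport(), {"phoneNumber": "+15555550100"}))
failed = asyncio.run(dial(FakeDailyTransport(), {}))
```

The same unpacking pattern would apply to `start_recording()`, with `stream_id` in place of `session_id`.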

### Deprecated

- The `expect_stripped_words` parameter of `LLMAssistantAggregatorParams` is
  ignored when used with the newer `LLMAssistantAggregator`, which now handles
  word spacing automatically.

- `LLMService.request_image_frame()` is deprecated, push a
  `UserImageRequestFrame` instead.

- `UserResponseAggregator` is deprecated and will be removed in a future version.

- The `send_transcription_frames` argument to `OpenAIRealtimeLLMService` is
  deprecated. Transcription frames are now always sent. They go upstream, to be
  handled by the user context aggregator. See "Added" section for details.

- Types in `pipecat.services.openai.realtime.context` and
  `pipecat.services.openai.realtime.frames` are deprecated, as they're no
  longer used by `OpenAIRealtimeLLMService`. See "Added" section for details.

- `SimliVideoService` `simli_config` parameter is deprecated. Use `api_key` and
  `face_id` parameters instead.

### Removed

- Removed `enable_non_final_tokens` and `max_non_final_tokens_duration_ms` from
  `SonioxSTTService`.

- Removed the `aiohttp_session` arg from `SarvamTTSService` as it's no longer
  used.

### Fixed

- Fixed a `PipelineTask` issue that was causing an idle timeout for frames that
  were being generated but not reaching the end of the pipeline. Since the exact
  point when frames are discarded is unknown, we now monitor pipeline frames
  using an observer. If the observer detects frames are being generated, it will
  prevent the pipeline from being considered idle.
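
  The idea can be sketched roughly as follows. This is a toy illustration
  and not Pipecat's observer API: the `IdleMonitor` class, its method names
  and the injectable `clock` are all assumptions made for the example.

  ```python
  import time


  class IdleMonitor:
      """Hypothetical monitor: tracks the last time any frame was observed."""

      def __init__(self, idle_timeout_secs: float, clock=time.monotonic) -> None:
          self._timeout = idle_timeout_secs
          self._clock = clock
          self._last_activity = clock()

      def on_frame(self) -> None:
          # Called for every frame seen anywhere in the pipeline,
          # even if the frame never reaches the end.
          self._last_activity = self._clock()

      def is_idle(self) -> bool:
          return self._clock() - self._last_activity >= self._timeout
  ```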

- Fixed an issue in `HumeTTSService` that was only using Octave 2, which does
  not support the `description` field. Now, if a description is provided, it
  switches to Octave 1.

- Fixed an issue where `DailyTransport` would time out prematurely on join and
  on leave.

- Fixed an issue in the runner where starting a DailyTransport room via
  `/start` didn't support using the `DAILY_SAMPLE_ROOM_URL` env var.

- Fixed an issue in `ServiceSwitcher` where using it with `STTService`s would
  result in all STT services producing `TranscriptionFrame`s.

### Other

- Updated all vision 12-series foundational examples to load images from a file.

- Added 14-series video examples for different services. These new examples
  request an image from the user camera through a function call.

## [0.0.91] - 2025-10-21

### Added

- It is now possible to start a bot from the `/start` endpoint when using the
  runner's Daily transport. This follows the Pipecat Cloud format with
  `createDailyRoom` and `body` fields in the POST request body.

- Added an ellipsis character (`…`) to the end-of-sentence detection in the
  string utils.
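
  As a rough illustration of the effect (not Pipecat's actual string
  utilities; the `ends_sentence` helper and its character set are assumptions),
  a check that treats `…` as sentence-ending might look like:

  ```python
  # Hypothetical helper, not part of Pipecat's string utils.
  SENTENCE_ENDINGS = (".", "!", "?", "…")


  def ends_sentence(text: str) -> bool:
      """Return True if the text, ignoring trailing whitespace, ends a sentence."""
      return text.rstrip().endswith(SENTENCE_ENDINGS)
  ```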

- Expanded support for universal `LLMContext` to `AWSNovaSonicLLMService`.
  As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `AWSNovaSonicLLMService` now supports the universal
  `LLMContext`, it is not meant to be swapped out for another LLM service at
  runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with
  `AWSNovaSonicLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: AWSNovaSonicContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: AWSNovaSonicLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

- Added support for `bulbul:v3` model in `SarvamTTSService` and
  `SarvamHttpTTSService`.

- Added `keyterms_prompt` parameter to `AssemblyAIConnectionParams`.

- Added `speech_model` parameter to `AssemblyAIConnectionParams` to access the
  multilingual model.

- Added support for trickle ICE to the `SmallWebRTCTransport`.

- Added support for updating `OpenAITTSService` settings (`instructions` and
  `speed`) at runtime via `TTSUpdateSettingsFrame`.

- Added `--whatsapp` flag to runner to better surface WhatsApp transport logs.

- Added `on_connected` and `on_disconnected` events to TTS and STT
  websocket-based services.

- Added an `aggregate_sentences` arg in `ElevenLabsHttpTTSService`, where the
  default value is True.

- Added a `room_properties` arg to the Daily runner's `configure()` method,
  allowing `DailyRoomProperties` to be provided.

- The runner `--folder` argument now supports downloading files from
  subdirectories.

### Changed

- `RunnerArguments` now includes the `body` field, so there's no need to add it
  to subclasses. Also, all `RunnerArguments` fields are now keyword-only.
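
  A minimal sketch of what keyword-only fields mean in practice, assuming
  Python 3.10+ dataclasses. The `ExampleRunnerArguments` and
  `ExampleRoomArguments` classes and the `room_url` field are stand-ins for
  illustration, not Pipecat's definitions:

  ```python
  from dataclasses import dataclass
  from typing import Any, Optional


  # Hypothetical stand-in for the base arguments class.
  @dataclass(kw_only=True)
  class ExampleRunnerArguments:
      body: Optional[Any] = None


  # Hypothetical subclass: with keyword-only fields, a required field may
  # follow a defaulted field inherited from the base class.
  @dataclass(kw_only=True)
  class ExampleRoomArguments(ExampleRunnerArguments):
      room_url: str


  args = ExampleRoomArguments(room_url="https://example.invalid/room")
  ```

  Passing values positionally, as in `ExampleRoomArguments("...")`, raises a
  `TypeError`.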

- `CartesiaSTTService` now inherits from `WebsocketSTTService`.

- Package upgrades:

  - `daily-python` upgraded to 0.20.0.
  - `openai` upgraded to support up to 2.x.x.
  - `openpipe` upgraded to support up to 5.x.x.

- `SpeechmaticsSTTService` updated dependencies for `speechmatics-rt>=0.5.0`.

### Deprecated

- The `send_transcription_frames` argument to `AWSNovaSonicLLMService` is
  deprecated. Transcription frames are now always sent. They go upstream, to be
  handled by the user context aggregator. See "Added" section for details.

- Types in `pipecat.services.aws.nova_sonic.context` are deprecated, as they're
  no longer used by `AWSNovaSonicLLMService`. See "Added" section for
  details.

### Fixed

- Fixed an issue where the `RTVIProcessor` was sending duplicate
  `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` messages.

- Fixed an issue in `AWSBedrockLLMService` where both `temperature` and `top_p`
  were always sent together, causing conflicts with models like Claude Sonnet 4.5
  that don't allow both parameters simultaneously. The service now only includes
  inference parameters that are explicitly set, and `InputParams` defaults have
  been changed to `None` to rely on AWS Bedrock's built-in model defaults.
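
  The "only send what was explicitly set" approach can be sketched like this.
  The `build_inference_config` helper is hypothetical and is not the service's
  code; the key names follow the Bedrock Converse API's inference
  configuration:

  ```python
  from typing import Any, Dict, Optional


  def build_inference_config(
      max_tokens: Optional[int] = None,
      temperature: Optional[float] = None,
      top_p: Optional[float] = None,
  ) -> Dict[str, Any]:
      """Include only the parameters the caller set explicitly."""
      candidates = {"maxTokens": max_tokens, "temperature": temperature, "topP": top_p}
      # Parameters left as None are omitted so the model's own defaults apply.
      return {key: value for key, value in candidates.items() if value is not None}
  ```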

- Fixed an issue in `RivaSegmentedSTTService` where a runtime error occurred due
  to a mismatch in the `_handle_transcription` method's signature.

- Fixed multiple pipeline task cancellation issues. `asyncio.CancelledError` is
  now handled properly in `PipelineTask`, making it possible to cleanly cancel
  an asyncio task that is executing a `PipelineRunner`. Also,
  `PipelineTask.cancel()` no longer blocks waiting for the `CancelFrame` to
  reach the end of the pipeline (going back to the behavior in < 0.0.83).

- Fixed an issue in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` where
  the Flash models would split words, resulting in a space being inserted
  between words.

- Fixed an issue where audio filters' `stop()` would not be called when using
  `CancelFrame`.

- Fixed an issue in `ElevenLabsHttpTTSService`, where
  `apply_text_normalization` was incorrectly set as a query parameter. It's now
  being added as a request parameter.

- Fixed an issue where `RimeHttpTTSService` and `PiperTTSService` could generate
  audio frames that were not correctly 16-bit aligned, potentially leading to
  internal errors or static audio.
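
  One common way to keep streamed 16-bit PCM aligned is to hold back a
  trailing odd byte until the next chunk arrives. The sketch below illustrates
  that technique only; the `PcmAligner` class is hypothetical and is not the
  code used by these services:

  ```python
  class PcmAligner:
      """Buffers a trailing odd byte so each emitted chunk holds whole 16-bit samples."""

      def __init__(self) -> None:
          self._remainder = b""

      def feed(self, chunk: bytes) -> bytes:
          data = self._remainder + chunk
          # Keep an even number of bytes; carry the leftover byte forward.
          usable = len(data) - (len(data) % 2)
          self._remainder = data[usable:]
          return data[:usable]
  ```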

- Fixed an issue in `SpeechmaticsSTTService` where `AdditionalVocabEntry` items
  needed to have `sounds_like` for the session to start.

### Other

- Added foundational example `47-sentry-metrics.py`, demonstrating how to use the
  `SentryMetrics` processor.

- Added foundational example `14x-function-calling-openpipe.py`.

## [0.0.90] - 2025-10-10

### Added

- Added audio filter `KrispVivaFilter` using the Krisp VIVA SDK.

- Added `--folder` argument to the runner, allowing files saved in that folder
  to be downloaded from `http://HOST:PORT/file/FILE`.

- Added `GeminiLiveVertexLLMService`, for accessing Gemini Live via Google
  Vertex AI.

- Added some new configuration options to `GeminiLiveLLMService`:

  - `thinking`
  - `enable_affective_dialog`
  - `proactivity`

  Note that these new configuration options require using a newer model than
  the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last
  two require specifying `http_options=HttpOptions(api_version="v1alpha")`.

- Added `on_pipeline_error` event to `PipelineTask`. This event will get fired
  when an `ErrorFrame` is pushed (use `FrameProcessor.push_error()`).

  ```python
  @task.event_handler("on_pipeline_error")
  async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
      ...
  ```

- Added a `service_tier` `InputParam` to the `BaseOpenAILLMService`. This
  parameter can influence the latency of the response. For example `"priority"`
  will result in faster completions, but in exchange for a higher price.

### Changed

- Updated `GeminiLiveLLMService` to use the `google-genai` library rather than
  use WebSockets directly.

### Deprecated

- `LivekitFrameSerializer` is now deprecated. Use `LiveKitTransport` instead.

- `pipecat.services.openai_realtime` is now deprecated, use
  `pipecat.services.openai.realtime` instead or
  `pipecat.services.azure.realtime` for Azure Realtime.

- `pipecat.services.aws_nova_sonic` is now deprecated, use
  `pipecat.services.aws.nova_sonic` instead.

- `GeminiMultimodalLiveLLMService` is now deprecated, use
  `GeminiLiveLLMService`.

### Fixed

- Fixed a `GoogleVertexLLMService` issue that would generate an error if no
  token information was returned.

- `GeminiLiveLLMService` will now end gracefully (i.e. after the bot has
  finished) upon receiving an `EndFrame`.

- `GeminiLiveLLMService` will try to seamlessly reconnect when it loses its
  connection.

## [0.0.89] - 2025-10-07

### Fixed

- Reverted a change introduced in 0.0.88 that was causing pipelines to be frozen
  when using interruption strategies and processors that block interruption
  frames (e.g. `STTMuteFilter`).

## [0.0.88] - 2025-10-07

### Added

- Added support for Nano Banana models to `GoogleLLMService`. For example, you
  can now use the `gemini-2.5-flash-image` model to generate images.

- Added `HumeTTSService` for text-to-speech synthesis using Hume AI's expressive
  voice models. Provides high-quality, emotionally expressive speech synthesis
  with support for various voice models. Includes example in
  `examples/foundational/07ad-interruptible-hume.py`. Use with:
  `uv pip install pipecat-ai[hume]`.

### Changed

- Updated default `GoogleLLMService` model to `gemini-2.5-flash`.

### Deprecated

- PlayHT is shutting down their API on December 31st, 2025. As a result,
  `PlayHTTTSService` and `PlayHTHttpTTSService` are deprecated and will be
  removed in a future version.

### Fixed

- Fixed an issue with `AWSNovaSonicLLMService` where the client wouldn't
  connect due to a breaking change in the AWS dependency chain.

- `PermissionError` is now caught if NLTK's `punkt_tab` can't be downloaded.

- Fixed an issue that would cause wrong user/assistant context ordering when
  using interruption strategies.

- Fixed RTVI incoming message handling, broken in 0.0.87.

## [0.0.87] - 2025-10-02

### Added

- Added `WebsocketSTTService` base class for websocket-based STT services.
  Combines STT functionality with websocket connectivity, providing automatic
  error handling and reconnection capabilities with exponential backoff.
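
  Reconnection with exponential backoff generally follows the pattern below.
  This is a generic sketch, not the base class's implementation; the
  `connect_with_backoff` function, its parameters and its default values are
  assumptions:

  ```python
  import asyncio
  from typing import Awaitable, Callable


  async def connect_with_backoff(
      connect: Callable[[], Awaitable[None]],
      max_attempts: int = 5,
      base_delay: float = 0.5,
      max_delay: float = 8.0,
  ) -> int:
      """Retry `connect` with exponentially growing delays; return attempts used."""
      for attempt in range(1, max_attempts + 1):
          try:
              await connect()
              return attempt
          except ConnectionError:
              if attempt == max_attempts:
                  raise
              # Delay doubles each attempt (base, 2*base, 4*base, ...) up to max_delay.
              await asyncio.sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
      raise ValueError("max_attempts must be at least 1")
  ```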

- Added `DeepgramFluxSTTService` for real-time speech recognition using
  Deepgram's Flux WebSocket API. Flux understands conversational flow and
  automatically handles turn-taking.

- Added RTVI messages for user/bot audio levels and system logs.

- Cached tokens from OpenAI-based LLM services are now included in
  `MetricsFrame`.

### Changed

- Updated the default model for `AnthropicLLMService` to
  `claude-sonnet-4-5-20250929`.

### Deprecated

- `DailyTransportMessageFrame` and `DailyTransportMessageUrgentFrame` are
  deprecated, use `DailyOutputTransportMessageFrame` and
  `DailyOutputTransportMessageUrgentFrame` respectively instead.

- `LiveKitTransportMessageFrame` and `LiveKitTransportMessageUrgentFrame` are
  deprecated, use `LiveKitOutputTransportMessageFrame` and
  `LiveKitOutputTransportMessageUrgentFrame` respectively instead.

- `TransportMessageFrame` and `TransportMessageUrgentFrame` are deprecated, use
  `OutputTransportMessageFrame` and `OutputTransportMessageUrgentFrame`
  respectively instead.

- `InputTransportMessageUrgentFrame` is deprecated, use
  `InputTransportMessageFrame` instead.

- `DailyUpdateRemoteParticipantsFrame` is deprecated and will be removed in a
  future version. Instead, create your own custom frame and handle it in the
  `@transport.output().event_handler("on_after_push_frame")` event handler or a
  custom processor.

### Fixed

- Fixed an issue in `AWSBedrockLLMService` where timeout exceptions weren't
  being detected.

- Fixed a `PipelineTask` issue that could prevent the application from exiting
  if `task.cancel()` was called when the task was already finished.

- Fixed an issue where local SmartTurn was not being run in a separate thread.

## [0.0.86] - 2025-09-24

### Added

- Added `HeyGenTransport`, an integration for HeyGen Interactive Avatar: a
  video service that handles audio streaming and requests HeyGen to generate
  avatar video responses (see https://www.heygen.com/). When used, the Pipecat
  bot joins the same virtual room as the HeyGen Avatar and the user.

- Added support to `TwilioFrameSerializer` for `region` and `edge` settings.

- Added support for using universal `LLMContext` with:

  - `LLMLogObserver`
  - `GatedLLMContextAggregator` (formerly `GatedOpenAILLMContextAggregator`)
  - `LangchainProcessor`
  - `Mem0MemoryService`

- Added `StrandsAgentProcessor` which allows you to use the Strands Agents
  framework to build your voice agents.
  See https://strandsagents.com

- Added `ElevenLabsSTTService` for speech-to-text transcription.

- Added a peer connection monitor to the `SmallWebRTCConnection` that
  automatically disconnects if the connection fails to establish within
  the timeout (1 minute by default).

- Added memory cleanup improvements to reduce memory peaks.

- Added `on_before_process_frame`, `on_after_process_frame`,
  `on_before_push_frame` and `on_after_push_frame`. These are synchronous events
  that get called before and after a frame is processed or pushed. Because
  these events are synchronous, their handlers should ideally perform only
  lightweight tasks so they don't block the pipeline. See
  `examples/foundational/45-before-and-after-events.py`.

- Added `on_before_leave` synchronous event to `DailyTransport`.

- Added `on_before_disconnect` synchronous event to `LiveKitTransport`.

- It is now possible to register synchronous event handlers. By default, all
  event handlers are executed in a separate task. However, in some cases we want
  to guarantee order of execution, for example, executing something before
  disconnecting a transport.

  ```python
  self._register_event_handler("on_event_name", sync=True)
  ```
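
  The difference between the two modes can be illustrated with a toy emitter.
  This is a sketch of the general idea, not Pipecat's event system; the
  `EventEmitter` class and its methods are hypothetical:

  ```python
  import asyncio
  from typing import Awaitable, Callable, Dict, List, Set

  Handler = Callable[..., Awaitable[None]]


  class EventEmitter:
      """Toy emitter: sync events are awaited in order, others run as tasks."""

      def __init__(self) -> None:
          self._handlers: Dict[str, List[Handler]] = {}
          self._sync: Dict[str, bool] = {}
          self._tasks: Set[asyncio.Task] = set()

      def register(self, name: str, sync: bool = False) -> None:
          self._handlers[name] = []
          self._sync[name] = sync

      def add_handler(self, name: str, handler: Handler) -> None:
          self._handlers[name].append(handler)

      async def emit(self, name: str, *args) -> None:
          for handler in self._handlers[name]:
              if self._sync[name]:
                  # Finish the handler before the caller continues.
                  await handler(*args)
              else:
                  # Run in the background; keep a reference until it completes.
                  task = asyncio.create_task(handler(*args))
                  self._tasks.add(task)
                  task.add_done_callback(self._tasks.discard)
  ```

  With `sync=True`, code placed after `emit()` is guaranteed to run only once
  every handler has completed, which is the ordering guarantee described above.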

- Added support for global location in `GoogleVertexLLMService`. The service now
  supports both regional locations (e.g., "us-east4") and the "global" location
  for Vertex AI endpoints. When using "global" location, the service will use
  `aiplatform.googleapis.com` as the API host instead of the regional format.
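
  The host selection described here amounts to the following. The
  `vertex_api_host` helper is a hypothetical name used for illustration and
  is not a function exposed by the service:

  ```python
  def vertex_api_host(location: str) -> str:
      """Return the Vertex AI API host for a regional or "global" location."""
      if location == "global":
          return "aiplatform.googleapis.com"
      # Regional endpoints prefix the host with the location name.
      return f"{location}-aiplatform.googleapis.com"
  ```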

- Added `on_pipeline_finished` event to `PipelineTask`. This event will get
  fired when the pipeline is done running. This can be the result of a
  `StopFrame`, `CancelFrame` or `EndFrame`.

  ```python
  @task.event_handler("on_pipeline_finished")
  async def on_pipeline_finished(task: PipelineTask, frame: Frame):
      ...
  ```

- Added support for new RTVI `send-text` event, along with the ability to toggle
  the audio response off (skip tts) while handling the new context.

### Changed

- Updated `aiortc` to 1.13.0.

- Updated `sentry` to 2.38.0.

- `BaseOutputTransport` methods `write_audio_frame` and `write_video_frame` now
  return a boolean to indicate if the transport implementation was able to write
  the given frame or not.

- Updated Silero VAD model to v6.

- Updated `livekit` to 1.0.13.

- `torch` and `torchaudio` are no longer required for running Smart Turn
  locally. This avoids gigabytes of dependencies being installed.

- Updated `websockets` dependency to support version 15.0. Removed deprecated
  usage of `ConnectionClosed.code` and `ConnectionClosed.reason` attributes in
  `AWSTranscribeSTTService` for compatibility.

- Refactored `pyproject.toml` to reduce websockets dependency repetition using
  self-referencing extras. All websockets-dependent services now reference a
  shared `websockets-base` extra.

### Deprecated

- `GladiaSTTService`'s `confidence` arg is deprecated. `confidence` is no
  longer needed to determine which transcription or translation frames to
  emit.

- `PipelineTask` events `on_pipeline_stopped`, `on_pipeline_ended` and
  `on_pipeline_cancelled` are now deprecated. Use `on_pipeline_finished`
  instead.

- Support for the RTVI `append-to-context` event is deprecated in favor of the
  new `send-text` event, making way for future events like `send-image`.

### Fixed

- Fixed an issue where the pipeline could freeze if a task cancellation never
  completed because a third-party library swallowed asyncio.CancelledError. We
  now apply a timeout to task cancellations to prevent these freezes. If the
  timeout is reached, the system logs warnings and leaves dangling tasks behind,
  which can help diagnose where cancellation is being blocked.
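
  A minimal sketch of cancelling with a timeout using plain `asyncio`. The
  `cancel_with_timeout` helper is hypothetical and only illustrates the
  technique; it is not Pipecat's task manager:

  ```python
  import asyncio


  async def cancel_with_timeout(task: asyncio.Task, timeout: float) -> bool:
      """Cancel `task`; return True if it finished within `timeout` seconds."""
      task.cancel()
      # asyncio.wait() does not raise on timeout and does not cancel the task
      # again; it simply reports which tasks are still pending.
      _, pending = await asyncio.wait({task}, timeout=timeout)
      if pending:
          print(f"warning: {task.get_name()} ignored cancellation, leaving it dangling")
          return False
      return True
  ```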

- Fixed an `AudioBufferProcessor` issue that was causing user audio to be
  missing in stereo recordings, causing bot and user overlaps.

- Fixed a `BaseOutputTransport` issue that could produce large saved
  `AudioBufferProcessor` files when using an audio mixer.

- Fixed a `PipelineRunner` issue on Windows where setting up SIGINT and SIGTERM
  was raising an exception.

- Fixed an issue where multiple handlers for an event would not run in parallel.

- Fixed `DailyTransport.sip_call_transfer()` to automatically use the session
  ID from the `on_dialin_connected` event, when not explicitly provided. Now
  supports cold transfers (from incoming dial-in calls) by automatically
  tracking session IDs from connection events.

- Fixed a memory leak in `SmallWebRTCTransport`. In `aiortc`, when you receive
  a `MediaStreamTrack` (audio or video), frames are produced asynchronously. If
  the code never consumes these frames, they are queued in memory, causing a
  memory leak.

- Fixed an issue in `AsyncAITTSService`, where `TTSTextFrame`s were not being
  pushed.

- Fixed an issue that would cause `push_interruption_task_frame_and_wait()` to
  not wait if a previous interruption had already happened.

- Fixed a couple of bugs in `ServiceSwitcher`:

  - Using multiple `ServiceSwitcher`s in a pipeline would result in an error.
  - `ServiceSwitcherFrame`s (such as `ManuallySwitchServiceFrame`s) were having
    an effect too early, essentially "jumping the queue" in terms of pipeline
    frame ordering.

- Fixed a self-cancellation deadlock in `UserIdleProcessor` when returning
  `False` from an idle callback. The task now terminates naturally instead of
  attempting to cancel itself.
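
  The shape of the fix can be sketched as a loop that simply returns when the
  callback asks to stop. The `idle_loop` function and its callback signature
  are assumptions for illustration, not the processor's actual code:

  ```python
  import asyncio
  from typing import Awaitable, Callable


  async def idle_loop(callback: Callable[[int], Awaitable[bool]], timeout: float) -> int:
      """Call `callback` after each idle timeout until it returns False."""
      retries = 0
      while True:
          await asyncio.sleep(timeout)
          retries += 1
          if not await callback(retries):
              # Return instead of cancelling (and awaiting) the task we run in.
              return retries
  ```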

- Fixed an issue in `AudioBufferProcessor` where a recording is not created
  when a bot speaks and user input is blocked.

- Fixed a `FastAPIWebsocketTransport` and `SmallWebRTCTransport` issue where
  `on_client_disconnected` would be triggered when the bot ends the
  conversation. That is, `on_client_disconnected` should only be triggered when
  the remote client actually disconnects.

- Fixed an issue in `HeyGenVideoService` where the `BotStartedSpeakingFrame`
  was blocked from moving through the Pipeline.

## [0.0.85] - 2025-09-12

### Added

- `AzureSTTService` now pushes interim transcriptions.

- Added `voice_cloning_key` to `GoogleTTSService` to support custom cloned
  voices.

- Added `speaking_rate` to `GoogleTTSService.InputParams` to control the
  speaking rate.

- Added a `speed` arg to `OpenAITTSService` to control the speed of the voice
  response.

- Added `FrameProcessor.push_interruption_task_frame_and_wait()`. Use this
  method to programmatically interrupt the bot from any part of the
  pipeline. This guarantees that all the processors in the pipeline are
  interrupted in order (from upstream to downstream). Internally, this works by
  first pushing an `InterruptionTaskFrame` upstream until it reaches the
  pipeline task. The pipeline task then generates an `InterruptionFrame`, which
  flows downstream through all processors. Once the `InterruptionFrame`
  reaches the processor waiting for the interruption, the function returns and
  execution continues after the call. Think of it as sending an upstream request
  for interruption and waiting until the acknowledgment flows back downstream.
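
  The request/acknowledgment flow can be modeled with a queue and an event.
  This is a toy model only: `ToyProcessor`, `toy_pipeline_task` and the string
  message are hypothetical stand-ins, not Pipecat's frames or processors:

  ```python
  import asyncio


  class ToyProcessor:
      """Toy processor that requests an interruption and waits for it."""

      def __init__(self, task_queue: asyncio.Queue) -> None:
          self._task_queue = task_queue
          self._interrupted = asyncio.Event()

      async def request_interruption_and_wait(self) -> None:
          self._interrupted.clear()
          await self._task_queue.put("interruption-request")  # travels upstream
          await self._interrupted.wait()  # resumes once the interruption arrives

      def on_interruption(self) -> None:
          self._interrupted.set()  # the interruption flowed back downstream


  async def toy_pipeline_task(queue: asyncio.Queue, processors: list) -> None:
      """Toy pipeline task: turn one upstream request into a downstream interruption."""
      await queue.get()
      for processor in processors:  # upstream-to-downstream order
          processor.on_interruption()
  ```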

- Added new base `TaskFrame` (which is a system frame). This is the base class
  for all task frames (`EndTaskFrame`, `CancelTaskFrame`, etc.) that are meant
  to be pushed upstream to reach the pipeline task.

- Expanded support for universal `LLMContext` to the AWS Bedrock LLM service.
  Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is
  a pre-requisite for using `LLMSwitcher` to switch between LLMs at runtime.

- Added new fields to the development runner's `parse_telephony_websocket`
  method in support of providing dynamic data to a bot.

  - Twilio: Added a new `body` parameter, which parses the websocket message
    for `customParameters`. Provide data via the `Parameter` nouns in your
    TwiML to use this feature.
  - Telnyx & Exotel: Both providers make the `to` and `from` phone numbers
    available in the websocket messages. You can now access these numbers as
    `call_data["to"]` and `call_data["from"]`.

  Note: Each telephony provider offers different features. Refer to the
  corresponding example in `pipecat-examples` to see how to pass custom data
  to your bot.

- Added `body` to the `WebsocketRunnerArguments` as an optional parameter.
  Custom `body` information can be passed from the server into the bot file via
  the `bot()` method using this new parameter.

- Added video streaming support to `LiveKitTransport`.

- Added `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService` which provide
  access to OpenAI Realtime.

### Changed

- `pipecat.tests.utils.run_test()` now allows passing `PipelineParams` instead
  of individual parameters.

### Removed

- Removed `VisionImageRawFrame` in favor of context frames (`LLMContextFrame` or
  `OpenAILLMContextFrame`).

### Deprecated

- `BotInterruptionFrame` is now deprecated, use `InterruptionTaskFrame` instead.

- `StartInterruptionFrame` is now deprecated, use `InterruptionFrame` instead.

- Deprecated `VisionImageFrameAggregator` because `VisionImageRawFrame` has been
  removed. See the `12*` examples for the new recommended replacement pattern.

- `NoisereduceFilter` is now deprecated and will be removed in a future
  version. Use other audio filters like `KrispFilter` or `AICFilter`.

- Deprecated `OpenAIRealtimeBetaLLMService` and `AzureRealtimeBetaLLMService`.
  Use `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService`, respectively.
  Each service will be removed in an upcoming version, 1.0.0.

### Fixed

- Fixed a `BaseOutputTransport` issue that caused incorrect detection of when
  the bot stopped talking while using an audio mixer.

- Fixed a `LiveKitTransport` issue where RTVI messages were not properly
  encoded.

- Added additional fixups to Mistral context messages to ensure they meet
  Mistral-specific requirements, avoiding Mistral "invalid request" errors.

- Fixed `DailyTransport` transcription handling to gracefully handle missing
  `rawResponse` field in transcription messages, preventing KeyError crashes.

## [0.0.84] - 2025-09-05

### Added

- Added the ability to send DTMF to `LiveKitTransport`.

- Expanded support for universal `LLMContext` to the Anthropic LLM service.
  Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is
  a pre-requisite for using `LLMSwitcher` to switch between LLMs at runtime.

### Changed

- Updated `daily-python` to 0.19.9.

- Restored `DailyTransport`'s native DTMF support using Daily's `send_dtmf()`
  method instead of generated audio tones.

### Fixed

- Fixed an `AWSBedrockLLMService` crash caused by an extra `await`.

- Fixed an `OpenAIImageGenService` issue where it was not creating
  `URLImageRawFrame` correctly.

## [0.0.83] - 2025-09-03

### Added

- Added multilingual support for AsyncAI in `AsyncAITTSService` and `AsyncAIHttpTTSService`.

  - New `languages`: `es`, `fr`, `de`, `it`.

- Added new frames `InputTransportMessageUrgentFrame` and
  `DailyInputTransportMessageUrgentFrame` for transport messages received from
  external sources.

- Added `UserSpeakingFrame`. This will be sent upstream and downstream while VAD
  detects the user is speaking.

- Expanded support for universal `LLMContext` to more LLM services. Using the
  universal `LLMContext` and associated `LLMContextAggregatorPair` is a
  pre-requisite for using `LLMSwitcher` to switch between LLMs at runtime.
  Here are the newly-supported services:

  - Azure
  - Cerebras
  - Deepseek
  - Fireworks AI
  - Google Vertex AI
  - Grok
  - Groq
  - Mistral
  - NVIDIA NIM
  - Ollama
  - OpenPipe
  - OpenRouter
  - Perplexity
  - Qwen
  - SambaNova
  - Together.ai

- Added support for WhatsApp User-initiated Calls.

- Added new audio filter `AICFilter`, speech enhancement for improving VAD/STT
  performance, no ONNX dependency.
  See https://ai-coustics.com/sdk/

- Added a timeout around cancelling input tasks to prevent indefinite hangs
  when cancellation is swallowed by third-party code.

- Added `pipecat.extensions.ivr` for automated IVR system navigation with
  configurable goals and conversation handling. Supports DTMF input, verbal
  responses, and intelligent menu traversal.

  Basic usage:

  ```python
  from pipecat.extensions.ivr.ivr_navigator import IVRNavigator, IVRStatus

  # Create IVR navigator with your goal
  ivr_navigator = IVRNavigator(
      llm=llm_service,
      ivr_prompt="Navigate to billing department to dispute a charge"
  )

  # Handle different outcomes
  @ivr_navigator.event_handler("on_conversation_detected")
  async def on_conversation(processor, conversation_history):
      # Switch to normal conversation mode
      pass

  @ivr_navigator.event_handler("on_ivr_status_changed")
  async def on_ivr_status(processor, status):
      if status == IVRStatus.COMPLETED:
          # End pipeline, transfer call, or start bot conversation
          pass
      elif status == IVRStatus.STUCK:
          # Handle navigation failure
          pass
  ```

- `BaseOutputTransport` now implements `write_dtmf()` by loading DTMF audio and
  sending it through the transport. This makes sending DTMF generic across all
  output transports.

- Added new config parameters to `GladiaSTTService`.
  - PreProcessingConfig > `audio_enhancer` to enhance audio quality.
  - CustomVocabularyItem > `pronunciations` and `language` to specify special
    pronunciations and in which language it will be pronounced.

### Changed

- `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` are also pushed
  upstream.

- `ParallelPipeline` now waits for `CancelFrame` to finish in all branches
  before pushing it downstream.

- Added `sip_codecs` to the `DailyRoomSipParams`.

- Updated the `configure()` function in `pipecat.runner.daily` to include new
  args to create SIP-enabled rooms. Additionally, added new args to control the
  room and token expiration durations.

- `pipecat.frames.frames.KeypadEntry` is deprecated and has been moved to
  `pipecat.audio.dtmf.types.KeypadEntry`.

- Updated `RimeTTSService`'s flush_audio message to conform with Rime's official
  API.

- Updated the default model for `CerebrasLLMService` to GPT-OSS-120B.

### Removed

- Removed `StopInterruptionFrame`. This was a legacy frame that was hardly used
  anywhere and carried no useful meaning. It was only pushed after
  `UserStoppedSpeakingFrame`, so developers can just use
  `UserStoppedSpeakingFrame`.

- `DailyTransport.write_dtmf()` has been removed in favor of the generic
  `BaseOutputTransport.write_dtmf()`.

- Removed deprecated `DailyTransport.send_dtmf()`.

### Deprecated

- Transports have been re-organized.

  ```
  pipecat.transports.network.small_webrtc        -> pipecat.transports.smallwebrtc.transport
  pipecat.transports.network.webrtc_connection   -> pipecat.transports.smallwebrtc.connection
  pipecat.transports.network.websocket_client    -> pipecat.transports.websocket.client
  pipecat.transports.network.websocket_server    -> pipecat.transports.websocket.server
  pipecat.transports.network.fastapi_websocket   -> pipecat.transports.websocket.fastapi
  pipecat.transports.services.daily              -> pipecat.transports.daily.transport
  pipecat.transports.services.helpers.daily_rest -> pipecat.transports.daily.utils
  pipecat.transports.services.livekit            -> pipecat.transports.livekit.transport
  pipecat.transports.services.tavus              -> pipecat.transports.tavus.transport
  ```

- `pipecat.frames.frames.KeypadEntry` is deprecated; use
  `pipecat.audio.dtmf.types.KeypadEntry` instead.

### Fixed

- Fixed an issue where messages received from the transport were always being resent.

- Fixed `SmallWebRTCTransport` to not use `mid` to decide if the transceiver should
  be `sendrecv` or not.

- Fixed an issue where Deepgram swallowed `asyncio.CancelledError` during
  disconnect, preventing tasks from being cancelled.

- Fixed an issue where `PipelineTask` was not cleaning up the observers.

### Performance

- Reduced latency and improved memory performance in `Mem0MemoryService`.

## [0.0.82] - 2025-08-28

### Added

- Added a new `LLMRunFrame` to trigger an LLM response:

  ```python
  await task.queue_frames([LLMRunFrame()])
  ```

  This replaces `OpenAILLMContextFrame`, which you’d previously typically use
  like this:

  ```python
  await task.queue_frames([context_aggregator.user().get_context_frame()])
  ```

  Use this way of kicking off your conversation when you’ve already initialized
  your context and are simply instructing the bot when to go:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)

  # ...

  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMRunFrame()])
  ```

  Note that if you want to add new messages when kicking off the conversation,
  you could use `LLMMessagesAppendFrame` with `run_llm=True` instead:

  ```python
  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMMessagesAppendFrame(new_messages, run_llm=True)])
  ```

  In the rare case you don’t have a context aggregator in your pipeline, then
  you may continue using a context frame.

- Added support for switching between audio+text and text-only modes within the
  same pipeline. This is done by pushing
  `LLMConfigureOutputFrame(skip_tts=True)` to enter text-only mode, and
  disabling it to return to audio+text. The LLM will still generate tokens and
  add them to the context, but they will not be sent to TTS.

- Added `skip_tts` field to `TextFrame`. This lets a text frame bypass TTS while
  still being included in the LLM context. Useful for cases like structured text
  that isn’t meant to be spoken but should still contribute to context.

- Added a `cancel_timeout_secs` argument to `PipelineTask` which defines how
  long the pipeline has to complete cancellation. When `PipelineTask.cancel()`
  is called, a `CancelFrame` is pushed through the pipeline and must reach the
  end. If it does not reach the end within the specified time, a warning is
  shown and the wait is aborted.

- Added a new "universal" (LLM-agnostic) `LLMContext` and accompanying
  `LLMContextAggregatorPair`, which will eventually replace `OpenAILLMContext`
  (and the other under-the-hood contexts) and the other context aggregators.
  The new universal `LLMContext` machinery allows a single context to be shared
  between different LLMs, enabling runtime LLM switching and scenarios like
  failover.

  From the developer's point of view, switching to using the new universal
  context machinery will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  To start, the universal `LLMContext` is supported with the following LLM
  services:

  - `OpenAILLMService`
  - `GoogleLLMService`

- Added a new `LLMSwitcher` class to enable runtime LLM switching, built atop a
  new generic `ServiceSwitcher`.

  Switchers take a switching strategy. The first available strategy is
  `ServiceSwitcherStrategyManual`.

  To switch LLMs at runtime, the LLMs must be sharing one instance of the new
  universal `LLMContext` (see above bullet).

  ```python
  # Instantiate your LLM services
  llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
  llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))

  # Instantiate a switcher
  # (ServiceSwitcherStrategyManual defaults to OpenAI, as it's first in the list)
  llm_switcher = LLMSwitcher(
      llms=[llm_openai, llm_google], strategy_type=ServiceSwitcherStrategyManual
  )

  # Create your pipeline
  pipeline = Pipeline(
      [
          transport.input(),
          stt,
          context_aggregator.user(),
          llm_switcher,
          tts,
          transport.output(),
          context_aggregator.assistant(),
      ]
  )
  task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

  # ...
  # Whenever is appropriate, switch LLMs!
  await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])
  ```

- Added an `LLMService.run_inference()` method to LLM services to enable
  direct, out-of-band (i.e. out-of-pipeline) inference.
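
To make the `cancel_timeout_secs` entry in this list more concrete, here is a minimal sketch of a bounded wait for cancellation. It uses only `asyncio`; `wait_for_cancel` and the `reached_end` event are hypothetical stand-ins for illustration and are not Pipecat's internals.

```python
import asyncio
import logging

logger = logging.getLogger("cancel-sketch")


async def wait_for_cancel(reached_end: asyncio.Event, cancel_timeout_secs: float) -> bool:
    """Return True if the cancel marker reached the end in time, else warn and return False."""
    try:
        await asyncio.wait_for(reached_end.wait(), timeout=cancel_timeout_secs)
        return True
    except asyncio.TimeoutError:
        logger.warning("Cancel did not reach the end within %.2fs", cancel_timeout_secs)
        return False


async def demo() -> tuple:
    # Case 1: the marker arrives quickly, so the wait completes normally.
    fast = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, fast.set)
    completed = await wait_for_cancel(fast, cancel_timeout_secs=1.0)

    # Case 2: the marker never arrives, so the wait is abandoned with a warning.
    stuck = asyncio.Event()
    abandoned = await wait_for_cancel(stuck, cancel_timeout_secs=0.05)
    return completed, abandoned


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The second case mirrors the behavior described in the entry: a warning is logged and the caller stops waiting instead of hanging.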

### Changed

- Updated `daily-python` to 0.19.8.

- `PipelineTask` now waits for `StartFrame` to reach the end of the pipeline
  before pushing any other frames.

- Updated `CartesiaTTSService` and `CartesiaHttpTTSService` to align with
  Cartesia's changes for the `speed` parameter. It now takes only an enum of
  `slow`, `normal`, or `fast`.

- Added support to `AWSBedrockLLMService` for setting authentication
  credentials through environment variables.

- Updated `SarvamTTSService` to use WebSocket streaming for real-time audio
  generation with multiple Indian languages, with HTTP support still available
  via `SarvamHttpTTSService`.

### Fixed

- Fixed an RTVI issue that was causing frames to be pushed before the pipeline
  was properly initialized.

- Fixed some `get_messages_for_logging()` implementations that were returning a
  JSON string instead of a list.

- Fixed a `DailyTransport` issue that prevented DTMF tones from being sent.

- Fixed a missing import in `SentryMetrics`.

- Fixed `AWSPollyTTSService` to support AWS credential provider chain (IAM
  roles, IRSA, instance profiles) instead of requiring explicit environment
  variables.

- Fixed a `CartesiaTTSService` issue that was causing the application to hang
  after Cartesia's 5-minute timeout.

- Fixed an issue preventing `SpeechmaticsSTTService` from transcribing audio.

## [0.0.81] - 2025-08-25

### Added

- Added `pipecat.extensions.voicemail`, a module for detecting voicemail vs.
  live conversation, primarily intended for use in outbound calling scenarios.
  The voicemail module is optimized for text LLMs only.

- Added new frames to the `idle_timeout_frames` arg: `TranscriptionFrame`,
  `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, and
  `UserStoppedSpeakingFrame`. These additions serve as indicators of user
  activity in the pipeline idle detection logic.

- Added support for passing custom pipeline sink and source processors to a
  `Pipeline`. Pipeline source and sink processors are used to know and control
  what's coming in and out of a `Pipeline` processor.

- Added `FrameProcessor.pause_processing_system_frames()` and
  `FrameProcessor.resume_processing_system_frames()`. These allow pausing and
  resuming the processing of system frames.

- Added new `on_process_frame()` observer method which makes it possible to know
  when a frame is being processed.

- Added new `FrameProcessor.entry_processor()` method. This allows you to access
  the first non-compound processor in a pipeline.

- Added `FrameProcessor` properties `processors`, `next` and `previous`.

- `ElevenLabsTTSService` now supports additional runtime changes to the `model`,
  `language`, and `voice_settings` parameters.

- Added `apply_text_normalization` support to `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService`.

- Added `MistralLLMService`, using Mistral's chat completion API.

- Added the ability to retry executing a chat completion after a timeout period
  for `OpenAILLMService` and its subclasses, `AnthropicLLMService`, and
  `AWSBedrockLLMService`. The LLM services accept new args:
  `retry_timeout_secs` and `retry_on_timeout`. This feature is disabled by
  default.
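
The retry-on-timeout entry above can be pictured with the following sketch. It assumes the first attempt is bounded by `retry_timeout_secs` and that a single retry then runs without a deadline; that policy, `complete_with_retry`, and `flaky_request` are illustrative assumptions rather than a description of the services' actual implementation.

```python
import asyncio


async def complete_with_retry(make_request, retry_timeout_secs: float, retry_on_timeout: bool):
    """Await make_request(); if it times out, optionally retry once."""
    try:
        return await asyncio.wait_for(make_request(), timeout=retry_timeout_secs)
    except asyncio.TimeoutError:
        if not retry_on_timeout:
            raise
        # Assumed policy for this sketch: the retry runs without a deadline.
        return await make_request()


async def demo() -> str:
    calls = 0

    async def flaky_request() -> str:
        nonlocal calls
        calls += 1
        # The first call is slow enough to time out; later calls return at once.
        await asyncio.sleep(0.2 if calls == 1 else 0.0)
        return f"response from attempt {calls}"

    return await complete_with_retry(
        flaky_request, retry_timeout_secs=0.05, retry_on_timeout=True
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With `retry_on_timeout=False`, the same call would simply re-raise the timeout, which matches the feature being disabled by default.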

### Changed

- Updated `daily-python` to 0.19.7.

### Deprecated

- `FrameProcessor.wait_for_task()` is deprecated. Use `await task` or
  `await asyncio.wait_for(task, timeout)` instead.
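
The two replacements named in the deprecation above behave as follows in plain `asyncio`. The `work` coroutine is a hypothetical placeholder for whatever the task does.

```python
import asyncio


async def work(delay: float) -> str:
    await asyncio.sleep(delay)
    return "done"


async def demo() -> tuple:
    # Plain `await task` waits for as long as the task needs.
    task = asyncio.create_task(work(0.01))
    first = await task

    # `asyncio.wait_for` adds a deadline. On timeout it cancels the task
    # and raises asyncio.TimeoutError.
    slow = asyncio.create_task(work(10))
    try:
        await asyncio.wait_for(slow, timeout=0.05)
        second = "finished"
    except asyncio.TimeoutError:
        second = "timed out"
    return first, second, slow.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```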

### Removed

- Watchdog timers have been removed. They were introduced in 0.0.72 to help
  diagnose pipeline freezes. Unfortunately, they proved ineffective since they
  required developers to use Pipecat-specific queues, iterators, and events to
  correctly reset the timer, which limited their usefulness and added friction.

- Removed unused `FrameProcessor.set_parent()` and
  `FrameProcessor.get_parent()`.

### Fixed

- Fixed an issue that would cause `PipelineRunner` and `PipelineTask` to not
  handle external asyncio task cancellation properly.

- Added `SpeechmaticsSTTService` exception handling on connection and sending.

- Replaced `asyncio.wait_for()` with `wait_for2.wait_for()` for Python <
  3.12 because of issues regarding task cancellation (i.e. cancellation is
  never propagated).
  See https://bugs.python.org/issue42130

- Fixed an `AudioBufferProcessor` issue that would cause audio overlap when
  setting a max buffer size.

- Fixed an issue where `AsyncAITTSService` had very high latency in responding
  by adding `force=true` when sending the flush command.

### Performance

- Improved `PipelineTask` performance by using direct mode processors and by
  removing unnecessary tasks.

- Improved `ParallelPipeline` performance by using direct mode, by not
  creating a task for each frame and every sub-pipeline and also by removing
  other unnecessary tasks.

- Improved `Pipeline` performance by using direct mode.

### Other

- Added `14w-function-calling-mistral.py` using `MistralLLMService`.

- Added `13j-azure-transcription.py` using `AzureSTTService`.

## [0.0.80] - 2025-08-13

### Added

- Added `GeminiTTSService` which uses Google Gemini to generate TTS output. The
  Gemini model can be prompted to insert styled speech to control the TTS
  output.

- Added Exotel support to Pipecat's development runner. You can now connect
  using the runner with `uv run bot.py -t exotel` and an ngrok connection to
  HTTP port 7860.

- Added `enable_direct_mode` argument to `FrameProcessor`. The direct mode is
  for processors which require very little I/O or compute resources, that is
  processors that can perform their task almost immediately. These types of
  processors don't need any of the internal tasks and queues usually created by
  frame processors which means overall application performance might be slightly
  increased. Use with care.

- Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.

- Added `endpoint_id` parameter to `AzureSTTService`. ([Custom EndpointId](https://docs.azure.cn/en-us/ai-services/speech-service/how-to-recognize-speech?pivots=programming-language-python#use-a-custom-endpoint))

### Changed

- `WatchdogPriorityQueue` now requires the items to be inserted to always be
  tuples and the size of the tuple needs to be specified in the constructor when
  creating the queue with the `tuple_size` argument.

- Updated Moondream to revision `2025-01-09`.

- Updated `PlayHTHttpTTSService` to no longer use the `pyht` client to remove
  compatibility issues with other packages. Now you can use the PlayHT HTTP
  service with other services, like `GoogleLLMService`.

- Updated `pyproject.toml` to once again pin `numba` to `==0.61.2` in order to
  resolve package versioning issues.

- Updated the `STTMuteFilter` to include `VADUserStartedSpeakingFrame` and
  `VADUserStoppedSpeakingFrame` in the list of frames to filter when the
  filtering is on.

### Performance

- Improved the latency of the `HeyGenVideoService`.

- Improved the performance of some frame processors by using the new frame
  processor direct mode. In direct mode, a frame processor processes frames
  right away,
  avoiding the need for internal queues and tasks. This is useful for some
  simple processors. For example, in processors that wrap other processors
  (e.g. `Pipeline`, `ParallelPipeline`), we add one processor before and one
  after the wrapped processors (internally, you will see them as sources and
  sinks). These sources and sinks don't do any special processing and they
  basically forward frames. So, for these simple processors we now enable the
  new direct mode which avoids creating any internal tasks (and queues) and
  therefore improves performance.
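
The difference between a queued processor and a direct-mode processor, as described in the entry above, can be sketched with two toy forwarders. `QueuedForwarder` and `DirectForwarder` are hypothetical classes written for this illustration; they do not reproduce `FrameProcessor`.

```python
import asyncio


class QueuedForwarder:
    """Toy processor: frames pass through an internal queue and a worker task."""

    def __init__(self, sink: list):
        self._sink = sink
        self._queue = asyncio.Queue()
        self._task = None

    async def start(self):
        self._task = asyncio.create_task(self._worker())

    async def _worker(self):
        while True:
            frame = await self._queue.get()
            if frame is None:  # Sentinel used to stop the worker.
                break
            self._sink.append(frame)

    async def process(self, frame):
        await self._queue.put(frame)

    async def stop(self):
        await self._queue.put(None)
        await self._task


class DirectForwarder:
    """Toy processor in direct mode: no queue and no task, frames are handled inline."""

    def __init__(self, sink: list):
        self._sink = sink

    async def process(self, frame):
        self._sink.append(frame)


async def demo() -> tuple:
    queued_out, direct_out = [], []
    queued = QueuedForwarder(queued_out)
    direct = DirectForwarder(direct_out)
    await queued.start()
    for frame in ["a", "b", "c"]:
        await queued.process(frame)
        await direct.process(frame)
    await queued.stop()
    return queued_out, direct_out


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both forwarders deliver the same frames in the same order. The direct version just skips the queue hop and the extra task, which is the saving the entry refers to for simple source and sink processors.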

### Fixed

- Fixed an issue with the `BaseWhisperSTTService` where the language was
  specified as an enum and not a string.

- Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.

- Fixed an issue in `OpenAIRealtimeBetaLLMService` where specifying `text` in
  `modalities` didn't result in text being output by the model.

- Added SSML reserved character escaping to `AzureBaseTTSService` to properly
  handle special characters in text sent to Azure TTS. This fixes an issue
  where characters like `&`, `<`, `>`, `"`, and `'` in LLM-generated text would
  cause TTS failures.

- Fixed a `WatchdogPriorityQueue` issue that could cause an exception when
  comparing watchdog cancel sentinel items with other items in the queue.

- Fixed an issue that would cause system frames to not be processed with higher
  priority than other frames. This could cause slower interruption times.

- Fixed an issue where retrying a websocket connection error would result in an
  error.
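
For the SSML escaping fix listed above, one way to escape the five reserved characters is the standard library's `xml.sax.saxutils.escape`. The helper name `escape_for_ssml` is hypothetical, and this is a sketch of the technique rather than the code used in `AzureBaseTTSService`.

```python
from xml.sax.saxutils import escape

# `escape` handles &, < and > by default; quotes are passed as extra entities.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_for_ssml(text: str) -> str:
    """Escape XML reserved characters so the text is safe inside an SSML document."""
    return escape(text, _QUOTE_ENTITIES)


if __name__ == "__main__":
    print(escape_for_ssml('Tom & Jerry <3 "cheese"'))
```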

### Other

- Added foundational example `19b-openai-realtime-beta-text.py`, showing how to use
  `OpenAIRealtimeBetaLLMService` to output text to a TTS service.

- Added vision support to release evals so we can run the foundational examples 12
  series.

- Added foundational example `15a-switch-languages.py` to release evals. It is
  able to detect if we switched the language properly.

- Updated foundational examples to show how to enclose complex logic
  (e.g. `ParallelPipeline`) into a single processor so the main pipeline becomes
  simpler.

- Added `07n-interruptible-gemini.py`, demonstrating how to use
  `GeminiTTSService`.

## [0.0.79] - 2025-08-07

### Changed

- Changed `pipecat-ai`'s `openai` dependency to `>=1.74.0,<=1.99.1` due to a
  breaking change in `openai` 1.99.2 ([commit](https://github.com/openai/openai-python/commit/657f551dbe583ffb259d987dafae12c6211fba06)).

### Deprecated

- `TTSService.say()` is deprecated, push a `TTSSpeakFrame` instead. Calling
  functions directly is a discouraged pattern in Pipecat because, for example,
  it might cause issues with frame ordering.

- `LLMMessagesFrame` is deprecated, in favor of either:

  - `LLMMessagesUpdateFrame` with `run_llm=True`
  - `OpenAILLMContextFrame` with desired messages in a new context

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are
  deprecated, as they depended on the now-deprecated `LLMMessagesFrame`. Use
  `LLMUserContextAggregator` and `LLMAssistantContextAggregator` (or
  LLM-specific subclasses thereof) instead.

## [0.0.78] - 2025-08-07

### Added

- Added `SonioxSTTService` using Soniox's STT websocket API.

- Added `enable_emulated_vad_interruptions` to `LLMUserAggregatorParams`.
  When user speech is emulated (e.g. when a transcription is received but
  VAD doesn't detect speech), this parameter controls whether the emulated
  speech can interrupt the bot. Default is False (emulated speech is ignored
  while the bot is speaking).

- Added new `handle_sigint` and `handle_sigterm` to `RunnerArguments`. This
  allows applications to know what settings they should use for the environment
  they are running on. Also, added `pipeline_idle_timeout_secs` to be able to
  control the `PipelineTask` idle timeout.

- Added `processor` field to `ErrorFrame` to indicate the `FrameProcessor` that
  generated the error.

- Added new language support for `AWSTranscribeSTTService`. All languages
  supporting streaming data input are now supported:
  https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html

- Added support for Simli Trinity Avatars. A new `is_trinity_avatar` parameter
  has been introduced to specify whether the provided `faceId` corresponds to a
  Trinity avatar, which is required for optimal Trinity avatar performance.

- The development runner now handles custom `body` data for `DailyTransport`.
  The `body` data is passed to the Pipecat client. You can POST to the `/start`
  endpoint with a request body of:

  ```json
  {
      "createDailyRoom": true,
      "dailyRoomProperties": { "start_video_off": true },
      "body": { "custom_data": "value" }
  }
  ```

  The `body` information is parsed and used in the application. The
  `dailyRoomProperties` are currently not handled.

- Added detailed latency logging to `UserBotLatencyLogObserver`, capturing
  average response time between user stop and bot start, as well as minimum and
  maximum response latency.

- Added Chinese, Japanese, Korean word timestamp support to
  `CartesiaTTSService`.

- Added `region` parameter to `GladiaSTTService`. Accepted values: `eu-west`
  (default) and `us-west`.
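
The average, minimum, and maximum figures mentioned in the `UserBotLatencyLogObserver` entry above amount to simple statistics over per-turn latencies. The sketch below assumes each turn is recorded as a pair of timestamps in seconds; `summarize_latencies` and that data layout are hypothetical and not taken from the observer itself.

```python
from statistics import mean


def summarize_latencies(events: list) -> dict:
    """Summarize (user_stopped_at, bot_started_at) timestamp pairs, in seconds."""
    latencies = [bot_started - user_stopped for user_stopped, bot_started in events]
    return {"avg": mean(latencies), "min": min(latencies), "max": max(latencies)}


if __name__ == "__main__":
    turns = [(0.0, 0.5), (10.0, 11.0), (20.0, 21.5)]
    print(summarize_latencies(turns))
```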

### Changed

- System frames are now queued. Before, system frames could be generated from
  any task and would not guarantee any order which was causing undesired
  behavior. Also, it was possible to get into some rare recursion issues because
  of the way system frames were executed (they were executed in-place, meaning
  calling `push_frame()` would finish after the system frame traversed all the
  pipeline). This makes system frames more deterministic.

- Changed the default model for both `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService` to `eleven_turbo_v2_5`. The rationale for this
  change is that the Turbo v2.5 model exhibits the most stable voice quality
  along with very low latency TTFB; latencies are on par with the Flash v2.5
  model. Also, the Turbo v2.5 model outputs word/timestamp alignment data with
  correct spacing.

- The development runner's `/connect` and `/start` endpoints now both return
  `dailyRoom` and `dailyToken` in place of the previous `room_url` and `token`.

- Updated the `pipecat.runner.daily` utility to only take `DAILY_API_URL` and
  `DAILY_SAMPLE_ROOM_URL` environment variables instead of argparsing `-u` and
  `-k`, respectively.

- Updated `daily-python` to 0.19.6.

- Changed `TavusVideoService` to send audio or video frames only after the
  transport is ready, preventing warning messages at startup.

- The development runner now strips any provided protocol (e.g. https://) from
  the proxy address and issues a warning. It also strips trailing `/`.
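
The proxy address cleanup described in the last entry above could look roughly like this. `normalize_proxy` is a hypothetical helper written for illustration; the runner's real function name and warning text are not shown in this changelog.

```python
def normalize_proxy(address: str) -> tuple:
    """Return (host, had_protocol) with any scheme and trailing slashes removed."""
    had_protocol = "://" in address
    if had_protocol:
        # Drop everything up to and including the scheme separator.
        address = address.split("://", 1)[1]
    return address.rstrip("/"), had_protocol


if __name__ == "__main__":
    host, had_protocol = normalize_proxy("https://example.ngrok.io/")
    if had_protocol:
        print(f"warning: protocol stripped from proxy address, using {host}")
```

Returning the flag separately lets the caller decide how to word the warning.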

### Deprecated

- In `pipecat.runner.daily`, the `configure_with_args()` function is
  deprecated. Use the `configure()` function instead.

- The development runner's `/connect` endpoint is deprecated and will be
  removed in a future version. Use the `/start` endpoint in its place. In the
  meantime, both endpoints work and deliver equivalent functionality.

### Fixed

- Fixed a `DailyTransport` issue that would result in an unhandled
  `concurrent.futures.CancelledError` when a future is cancelled.

- Fixed a `RivaSTTService` issue that would result in an unhandled
  `concurrent.futures.CancelledError` when a future is cancelled while reading
  audio chunks from the incoming audio stream.

- Fixed an issue in the `BaseOutputTransport`, mainly reproducible with
  `FastAPIWebsocketOutputTransport` when the audio mixer was enabled, where the
  loop could consume 100% CPU by continuously returning without delay, preventing
  other asyncio tasks (such as cancellation or shutdown signals) from being
  processed.

- Fixed an issue where `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame`
  were not emitted when using `TavusVideoService` or `HeyGenVideoService`.

- Fixed an issue in `LiveKitTransport` where empty `AudioRawFrame`s were pushed
  down the pipeline. This resulted in warnings by the STT processor.

- Fixed `PiperTTSService` to send text as a JSON object in the request body,
  resolving compatibility with Piper's HTTP API.

- Fixed an issue with the `TavusVideoService` where an error was thrown due to
  missing transcription callbacks.

- Fixed an issue in `SpeechmaticsSTTService` where the `user_id` was set to
  `None` when diarization is not enabled.

### Performance

- Fixed an issue in `TaskObserver` (a proxy to all observers) that was degrading
  global performance.

### Other

- Added `07aa-interruptible-soniox.py`, `07ab-interruptible-inworld-http.py`,
  `07ac-interruptible-asyncai.py` and `07ac-interruptible-asyncai-http.py`
  release evals.

## [0.0.77] - 2025-07-31

### Added

- Added `InputTextRawFrame` frame type to handle user text input with Gemini
  Multimodal Live.

- Added `HeyGenVideoService`. This is an integration for HeyGen Interactive
  Avatar. A video service that handles audio streaming and requests HeyGen to
  generate avatar video responses. (see https://www.heygen.com/)

- Added the ability to switch voices to `RimeTTSService`.

- Added unified development runner for building voice AI bots across multiple
  transports:

  - `pipecat.runner.run` – FastAPI-based development server with automatic bot
    discovery
  - `pipecat.runner.types` – Runner session argument types
    (`DailyRunnerArguments`, `SmallWebRTCRunnerArguments`,
    `WebSocketRunnerArguments`)
  - `pipecat.runner.utils.create_transport()` – Factory function for creating
    transports from session arguments
  - `pipecat.runner.daily` and `pipecat.runner.livekit` – Configuration
    utilities for Daily and LiveKit setups
  - Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
  - Automatic telephony provider detection and serializer configuration
  - ESP32 WebRTC compatibility with SDP munging
  - Environment detection (`ENV=local`) for conditional features

- Added Async.ai TTS integration (https://async.ai/)

  - `AsyncAITTSService` – WebSocket-based streaming TTS with interruption
    support
  - `AsyncAIHttpTTSService` – HTTP-based streaming TTS service
  - Example scripts:
    - `examples/foundational/07ac-interruptible-asyncai.py` (WebSocket demo)
    - `examples/foundational/07ac-interruptible-asyncai-http.py` (HTTP demo)

- Added `transcription_bucket` params support to the `DailyRESTHelper`.

- Added a new TTS service, `InworldTTSService`. This service provides
  low-latency, high-quality speech generation using Inworld's streaming API.

- Added a new field `handle_sigterm` to `PipelineRunner`. It defaults to
  `False`. This field handles SIGTERM signals. The `handle_sigint` field still
  defaults to `True`, but now it handles only SIGINT signals.

- Added foundational example `14u-function-calling-ollama.py` for Ollama
  function calling.

- Added `LocalSmartTurnAnalyzerV2`, which supports local on-device inference
  with the new `smart-turn-v2` turn detection model.

- Added `set_log_level` to `DailyTransport`, allowing setting the logging level
  for Daily's internal logging system.

- Added `on_transcription_stopped` and `on_transcription_error` to Daily
  callbacks.

### Changed

- Changed the default `url` for `NeuphonicTTSService` to
  `wss://api.neuphonic.com` as it provides better global performance. You can
  set the URL to other URLs, such as the previous default:
  `wss://eu-west-1.api.neuphonic.com`.

- Updated `daily-python` to 0.19.5.

- `STTMuteFilter` now pushes the `STTMuteFrame` upstream and downstream, to
  allow for more flexible `STTMuteFilter` placement.

- `ElevenLabsTTSService` now plays delayed messages if they still belong to the
  current context.

- Dependency compatibility improvements: Relaxed version constraints for core
  dependencies to support broader version ranges while maintaining stability:

  - `aiohttp`, `Markdown`, `nltk`, `numpy`, `Pillow`, `pydantic`, `openai`,
    `numba`: Now support up to the next major version (e.g. `numpy>=1.26.4,<3`)
  - `pyht`: Relaxed to `>=0.1.6` to resolve `grpcio` conflicts with
    `nvidia-riva-client`
  - `fastapi`: Updated to support versions `>=0.115.6,<0.117.0`
  - `torch`/`torchaudio`: Changed from exact pinning (`==2.5.0`) to compatible
    range (`~=2.5.0`)
  - `aws_sdk_bedrock_runtime`: Added Python 3.12+ constraint via environment
    marker
  - `numba`: Reduced minimum version to `0.60.0` for better compatibility

- Changed `NeuphonicHttpTTSService` to use a POST based request instead of the
  `pyneuphonic` package. This removes a package requirement, allowing Neuphonic
  to work with more services.

- Updated `ElevenLabsTTSService` to handle the case where
  `allow_interruptions=False`. Now, when interruptions are disabled, the same
  context ID will be used throughout the conversation.

- Updated the `deepgram` optional dependency to 4.7.0, which downgrades the
  `tasks cancelled error` to a debug log. This removes the log from appearing
  in Pipecat logs upon leaving.

- Upgraded the `websockets` implementation to the new asyncio implementation.
  Along with this change, we're updating support for versions >=13.1.0 and
  <15.0.0. All services have been updated to use the asyncio implementation.

- Updated `MiniMaxHttpTTSService` with a `base_url` arg where you can specify
  the Global endpoint (default) or Mainland China.

- Replaced regex-based sentence detection in `match_endofsentence` with NLTK's
  punkt_tab tokenizer for more reliable sentence boundary detection.

- Changed the `livekit` optional dependency for `tenacity` to
  `tenacity>=8.2.3,<10.0.0` in order to support the `google-genai` package.

- For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's
  recommended model.

- Updated `SpeechmaticsSTTService`:
  - Added support for additional diarization options.
  - Added foundational example `07a-interruptible-speechmatics-vad.py`, which
    uses VAD detection provided by `SpeechmaticsSTTService`.

### Fixed

- Fixed a `LLMUserResponseAggregator` issue where interruptions were not being
  handled properly.

- Fixed `PiperTTSService` to work with newer Piper GPL.

- Fixed a race condition in `FastAPIWebsocketClient` that occurred when
  attempting to send a message while the client was disconnecting.

- Fixed an issue in `GoogleLLMService` where interruptions did not work when an
  interruption strategy was used.

- Fixed an issue in the `TranscriptProcessor` where newline characters could
  cause the transcript output to be corrupted (e.g. missing all spaces).

- Fixed an issue in `AudioBufferProcessor` when using `SmallWebRTCTransport`
  where, if the microphone was muted, track timing was not respected.

- Fixed an error that occurs when pushing an `LLMMessagesFrame`. Only some LLM
  services, like Grok, are impacted by this issue. The fix is to remove the
  optional `name` property that was being added to the message.

- Fixed an issue in `AudioBufferProcessor` that caused garbled audio when
  `enable_turn_audio` was enabled and audio resampling was required.

- Fixed a dependency issue for uv users where an `llvmlite` version required
  Python 3.9.

- Fixed an issue in `MiniMaxHttpTTSService` where the `pitch` param was the
  incorrect type.

- Fixed an issue with OpenTelemetry tracing where the `enable_tracing` flag did
  not disable the internal tracing decorator functions.

- Fixed an issue in `OLLamaLLMService` where kwargs were not passed correctly
  to the parent class.

- Fixed an issue in `ElevenLabsTTSService` where the word/timestamp pairs were
  calculating word boundaries incorrectly.

- Fixed an issue where, in some edge cases, the
  `EmulateUserStartedSpeakingFrame` could be created even if we didn't have a
  transcription.

- Fixed an issue in `GoogleLLMContext` where it would inject the
  `system_message` as a "user" message into cases where it was not meant to;
  it was only meant to do that when there were no "regular" (non-function-call)
  messages in the context, to ensure that inference would run properly.

- Fixed an issue in `LiveKitTransport` where the `on_audio_track_subscribed` was
  never emitted.

### Other

- Added new quickstart demos:

  - examples/quickstart: voice AI bot quickstart
  - examples/client-server-web: client/server starter example
  - examples/phone-bot-twilio: twilio starter example

- Removed most of the examples from the pipecat repo. Examples can now be
  found in: https://github.com/pipecat-ai/pipecat-examples.

## [0.0.76] - 2025-07-11

### Added

- Added `SpeechControlParamsFrame`, a new `SystemFrame` that notifies
  downstream processors of the VAD and Turn analyzer params. This frame is
  pushed by the `BaseInputTransport` at Start and any time a
  `VADParamsUpdateFrame` is received.

### Changed

- Two package dependencies have been updated:
  - `numpy` now supports 1.26.0 and newer
  - `transformers` now supports 4.48.0 and newer

### Fixed

- Fixed an issue with RTVI's handling of `append-to-context`.

- Fixed an issue where using audio input with a sample rate requiring resampling
  could result in empty audio being passed to STT services, causing errors.

- Fixed the VAD analyzer to process the full audio buffer as long as it contains
  more than the minimum required bytes per iteration, instead of only analyzing
  the first chunk.

- Fixed an issue in `ParallelPipeline` that caused errors when attempting to drain
  the queues.

- Fixed an issue with emulated VAD timeout inconsistency in
  `LLMUserContextAggregator`. Previously, emulated VAD scenarios (where
  transcription is received without VAD detection) used a hardcoded
  `aggregation_timeout` (default 0.5s) instead of matching the VAD's
  `stop_secs` parameter (default 0.8s). This created different user experiences
  between real VAD and emulated VAD scenarios. Now, emulated VAD timeouts
  automatically synchronize with the VAD's `stop_secs` parameter.

- Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the
  user started early, while the bot was still working through
  `trigger_assistant_response()`.
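
The VAD analyzer fix in this list (processing the full buffer rather than only the first chunk) boils down to looping over every complete chunk. The sketch below shows that loop on raw bytes; `split_full_buffer` is a hypothetical helper, and the real analyzer's buffer handling may differ.

```python
def split_full_buffer(buffer: bytes, bytes_per_chunk: int) -> tuple:
    """Return every complete chunk in the buffer plus the leftover bytes."""
    chunks = []
    offset = 0
    # Keep consuming while at least one full chunk remains.
    while len(buffer) - offset >= bytes_per_chunk:
        chunks.append(buffer[offset : offset + bytes_per_chunk])
        offset += bytes_per_chunk
    return chunks, buffer[offset:]


if __name__ == "__main__":
    print(split_full_buffer(b"abcdefgh", 3))
```

The leftover bytes would be kept and prepended to the next incoming audio so no samples are dropped.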

## [0.0.75] - 2025-07-08 [YANKED]

**This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting `ParallelPipelines` functionality.**

**Please upgrade to version 0.0.76 or later.**

### Added

- Added an `aggregate_sentences` arg in `CartesiaTTSService`,
  `ElevenLabsTTSService`, `NeuphonicTTSService` and `RimeTTSService`, where the
  default value is True. When `aggregate_sentences` is True, the `TTSService`
  aggregates the LLM streamed tokens into sentences by default. Note: setting
  the value to False requires a custom processor before the `TTSService` to
  aggregate LLM tokens.

- Added `kwargs` to the `OLLamaLLMService` to allow for configuration args to
  be passed to Ollama.

- Added call hang-up error handling in `TwilioFrameSerializer`, which handles
  the case where the user has hung up before the `TwilioFrameSerializer` hangs
  up the call.

### Changed

- Updated `RTVIObserver` and `RTVIProcessor` to match the new RTVI 1.0.0 protocol.
  This includes:

  - Deprecating support for all messages related to service configuration and
    actions.
  - Adding support for obtaining and logging data about client, including its
    RTVI version and optionally included system information (OS/browser/etc.)
  - Adding support for handling the new `client-message` RTVI message through
    either a `on_client_message` event handler or listening for a new
    `RTVIClientMessageFrame`
  - Adding support for responding to a `client-message` with a `server-response`
    via either a direct call on the `RTVIProcessor` or via pushing a new
    `RTVIServerResponseFrame`
  - Adding built-in support for handling the new `append-to-context` RTVI message
    which allows a client to add to the user or assistant LLM context. No extra
    code is required for supporting this behavior.
  - Updating all JavaScript and React client RTVI examples to use versions 1.0.0
    of the clients.

  Get started migrating to RTVI protocol 1.0.0 by following the migration guide:
  https://docs.pipecat.ai/client/migration-guide

- Refactored `AWSBedrockLLMService` and `AWSPollyTTSService` to work
  asynchronously using `aioboto3` instead of the `boto3` library.

- The `UserIdleProcessor` now handles the scenario where function calls take
  longer than the idle timeout duration. This allows you to use the
  `UserIdleProcessor` in conjunction with function calls that take a while to
  return a result.

### Fixed

- Updated the `NeuphonicTTSService` to work with the updated websocket API.

- Fixed an issue with `RivaSTTService` where the watchdog feature was causing
  an error on initialization.

### Performance

- Removed unnecessary push task in each `FrameProcessor`.

## [0.0.74] - 2025-07-03 [YANKED]

**This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting `ParallelPipelines` functionality.**

**Please upgrade to version 0.0.76 or later.**

### Added

- Added a new STT service, `SpeechmaticsSTTService`. This service provides
  real-time speech-to-text transcription using the Speechmatics API. It supports
  partial and final transcriptions, multiple languages, various audio formats,
  and speaker diarization.

- Added `normalize` and `model_id` to `FishAudioTTSService`.

- Added `http_options` argument to `GoogleLLMService`.

- Added `run_llm` field to `LLMMessagesAppendFrame` and `LLMMessagesUpdateFrame`
  frames. If true, a context frame will be pushed triggering the LLM to respond.

- Added a new `SOXRStreamAudioResampler` for processing audio in chunks or
  streams. If you write your own processor and need to use an audio resampler,
  use the new `create_stream_resampler()`.

- Added new `DailyParams.audio_in_user_tracks` to allow receiving one track per
  user (default) or a single track from the room (all participants mixed).

- Added support for providing "direct" functions, which don't need an
  accompanying `FunctionSchema` or function definition dict. Instead, metadata
  (i.e. `name`, `description`, `properties`, and `required`) are automatically
  extracted from a combination of the function signature and docstring.

  Usage:

  ```python
  # "Direct" function
  # `params` must be the first parameter
  async def do_something(params: FunctionCallParams, foo: int, bar: str = ""):
      """
      Do something interesting.

      Args:
          foo (int): The foo to do something interesting with.
          bar (string): The bar to do something interesting with.
      """

      result = await process(foo, bar)
      await params.result_callback({"result": result})

  # ...

  llm.register_direct_function(do_something)

  # ...

  tools = ToolsSchema(standard_tools=[do_something])
  ```

- `user_id` is now populated in the `TranscriptionFrame` and
  `InterimTranscriptionFrame` when using a transport that provides a `user_id`,
  like `DailyTransport` or `LiveKitTransport`.

- Added `watchdog_coroutine()`. This is a watchdog helper for coroutines. So,
  if you have a coroutine that is waiting for a result and that takes a long
  time, you will need to wrap it with `watchdog_coroutine()` so the watchdog
  timers are reset regularly.

- Added `session_token` parameter to `AWSNovaSonicLLMService`.

- Added Gemini Multimodal Live File API for uploading, fetching, listing, and
  deleting files. See `26f-gemini-live-files-api.py` for example usage.
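
To illustrate how metadata for a "direct" function might be derived from its signature and docstring, as described in the direct functions entry above, here is a minimal sketch using `inspect`. `describe_function` and its type mapping are hypothetical; Pipecat's actual extraction logic is likely more thorough (for example, parsing per-argument descriptions).

```python
import inspect

# Assumed mapping from Python annotations to JSON-schema-style type names.
_TYPE_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_function(fn) -> dict:
    """Build a minimal schema-like dict from a function's signature and docstring."""
    # Skip the first parameter, which is reserved for the call params object.
    params = list(inspect.signature(fn).parameters.values())[1:]
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.split("\n\n")[0].strip(),
        "properties": {p.name: {"type": _TYPE_NAMES.get(p.annotation, "string")} for p in params},
        "required": [p.name for p in params if p.default is inspect.Parameter.empty],
    }


async def do_something(params, foo: int, bar: str = ""):
    """Do something interesting.

    Args:
        foo (int): The foo to do something interesting with.
        bar (str): The bar to do something interesting with.
    """


if __name__ == "__main__":
    print(describe_function(do_something))
```

Parameters without a default value end up in `required`, which is why only `foo` appears there in this example.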

### Changed

- Updated all the services to use the new `SOXRStreamAudioResampler`, ensuring smooth
  transitions and eliminating clicks.

- Upgraded `daily-python` to 0.19.4.

- Updated `google` optional dependency to use `google-genai` version `1.24.0`.

### Fixed

- Fixed an issue where audio would get stuck in the queue when an interrupt occurs
  during Azure TTS synthesis.

- Fixed a race condition that occurs in Python 3.10+ where the task could miss
  the `CancelledError` and continue running indefinitely, freezing the pipeline.

- Fixed an `AWSNovaSonicLLMService` issue introduced in 0.0.72.

### Deprecated

- In `FishAudioTTSService`, deprecated `model` and replaced with
  `reference_id`. This change is to better align with Fish Audio's variable
  naming and to reduce confusion about what functionality the variable
  controls.

## [0.0.73] - 2025-06-26

### Fixed

- Fixed an issue introduced in 0.0.72 that would cause `ElevenLabsTTSService`,
  `GladiaSTTService`, `NeuphonicTTSService` and `OpenAIRealtimeBetaLLMService`
  to throw an error.

## [0.0.72] - 2025-06-26

### Added

- Added logging and improved error handling to help diagnose and prevent potential
  Pipeline freezes.

- Added `WatchdogQueue`, `WatchdogPriorityQueue`, `WatchdogEvent` and
  `WatchdogAsyncIterator`. These helper utilities reset watchdog timers
  appropriately before they expire. When watchdog timers are disabled, the
  utilities behave as standard counterparts without side effects.

- Added task watchdog timers. Watchdog timers detect when a Pipecat task is
  taking longer than expected (by default, 5 seconds). They are disabled by
  default and can be enabled globally by passing the `enable_watchdog_timers`
  argument to the `PipelineTask` constructor. Use the `watchdog_timeout`
  argument to change the default timeout, and `enable_watchdog_logging` to log
  how long it takes to reset the watchdog timers. All of these settings can also
  be controlled per frame processor or even per task: set
  `enable_watchdog_timers`, `enable_watchdog_logging` and `watchdog_timeout`
  through the constructor arguments of any frame processor, or when you create
  a task with `FrameProcessor.create_task()`. Note that watchdog timers only
  work with Pipecat tasks and will not work if you use `asyncio.create_task()`
  or similar.
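
  The idea behind a watchdog timer can be sketched with plain `asyncio`. This
  is only an illustration of the concept, not Pipecat's implementation: the
  `Watchdog` class and its `reset()` method are hypothetical names. A monitor
  counts every period in which the watched code failed to reset the timer:

```python
import asyncio


class Watchdog:
    """Hypothetical watchdog, for illustration only (not Pipecat's class)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.expired_count = 0
        self._reset_event = asyncio.Event()

    def reset(self) -> None:
        # Called by the watched code to signal that it is still making progress.
        self._reset_event.set()

    async def run(self) -> None:
        while True:
            try:
                await asyncio.wait_for(self._reset_event.wait(), self.timeout)
                self._reset_event.clear()
            except asyncio.TimeoutError:
                # No reset arrived in time: the watched code looks stalled.
                self.expired_count += 1


async def demo():
    watchdog = Watchdog(timeout=0.2)
    monitor = asyncio.create_task(watchdog.run())

    # Healthy phase: the timer is reset well within the timeout.
    for _ in range(5):
        await asyncio.sleep(0.01)
        watchdog.reset()
    healthy = watchdog.expired_count

    # Stalled phase: no resets for several timeout periods.
    await asyncio.sleep(0.7)
    stalled = watchdog.expired_count

    monitor.cancel()
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return healthy, stalled
```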

- Added `lexicon_names` parameter to `AWSPollyTTSService.InputParams`.

- Added reconnection logic and audio buffer management to `GladiaSTTService`.

- The `TurnTrackingObserver` now ends a turn upon observing an `EndFrame` or
  `CancelFrame`.

- Added Polish support to `AWSTranscribeSTTService`.

- Added new frames `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`
  which allow pausing and resuming frame processing for a given frame
  processor. These are control frames, so they are ordered. Pausing a frame
  processor keeps existing frames in its internal queues until it is resumed.
  Frames pushed while a frame processor is paused are added to the queues. When
  frame processing is resumed, all queued frames are processed in order. Also
  added `FrameProcessorPauseUrgentFrame` and `FrameProcessorResumeUrgentFrame`,
  which are system frames and therefore have high priority.
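
  The queueing behavior described above can be sketched as follows. This is a
  simplified, synchronous illustration under our own assumptions: the
  `PausableProcessor` class and its methods are hypothetical and do not mirror
  Pipecat's frame processor API:

```python
from collections import deque


class PausableProcessor:
    """Hypothetical processor that queues frames while paused."""

    def __init__(self):
        self._paused = False
        self._queue = deque()
        self.processed = []

    def pause(self) -> None:
        self._paused = True

    def resume(self) -> None:
        self._paused = False
        self._drain()

    def push(self, frame) -> None:
        # Frames are always queued first so that ordering is preserved.
        self._queue.append(frame)
        if not self._paused:
            self._drain()

    def _drain(self) -> None:
        # Process queued frames in arrival order.
        while self._queue:
            self.processed.append(self._queue.popleft())


processor = PausableProcessor()
processor.push("a")
processor.pause()
processor.push("b")
processor.push("c")
processed_while_paused = list(processor.processed)
processor.resume()
```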

- Added a property called `has_function_calls_in_progress` in
  `LLMAssistantContextAggregator` that exposes whether a function call is in
  progress.

- Added `SambaNovaLLMService`, which provides LLM API integration with an
  OpenAI-compatible interface.

- Added `SambaNovaSTTService`, which provides speech-to-text functionality using
  SambaNova's (Whisper) API.

- Added foundational examples for function calling and transcription:
  `14s-function-calling-sambanova.py` and `13g-sambanova-transcription.py`.

### Changed

- `HeartbeatFrame`s are now control frames. This makes it easier to detect
  pipeline freezes. Previously, heartbeat frames were system frames, which
  meant they were not queued with other frames, making it difficult to detect
  pipeline stalls.

- Updated `OpenAIRealtimeBetaLLMService` to accept `language` in the
  `InputAudioTranscription` class for all models.

- Updated the default model for `OpenAIRealtimeBetaLLMService` to
  `gpt-4o-realtime-preview-2025-06-03`.

- The `PipelineParams` arg `allow_interruptions` now defaults to `True`.

- `TavusTransport` and `TavusVideoService` now send audio to Tavus using WebRTC
  audio tracks instead of `app-messages` over WebSocket. This should improve the
  overall audio quality.

- Upgraded `daily-python` to 0.19.3.

### Fixed

- Fixed an issue that would cause heartbeat frames to be sent before processors
  were started.

- Fixed an event loop blocking issue when using `SentryMetrics`.

- Fixed an issue in `FastAPIWebsocketClient` to ensure proper disconnection
  when the websocket is already closed.

- Fixed an issue where the `UserStoppedSpeakingFrame` was not received if the
  transport was not receiving new audio frames.

- Fixed an edge case where if the user interrupted the bot but no new aggregation
  was received, the bot would not resume speaking.

- Fixed an issue with `TelnyxFrameSerializer` where it would throw an exception
  when the user hung up the call.

- Fixed an issue with `ElevenLabsTTSService` where the context was not being
  closed.

- Fixed function calling in `AWSNovaSonicLLMService`.

- Fixed an issue that would cause multiple `PipelineTask.on_idle_timeout`
  events to be triggered repeatedly.

- Fixed an issue that was causing user and bot speech to not be synchronized
  during recordings.

- Fixed an issue where voice settings weren't applied to `ElevenLabsTTSService`.

- Fixed an issue with `GroqTTSService` where it was not properly parsing the
  WAV file header.

- Fixed an issue with `GoogleSTTService` where it was constantly reconnecting
  before starting to receive audio from the user.

- Fixed an issue where `GoogleLLMService`'s TTFB value was incorrect.

### Deprecated

- `AudioBufferProcessor` parameter `user_continuous_stream` is deprecated.

### Other

- Rename `14e-function-calling-gemini.py` to `14e-function-calling-google.py`.

## [0.0.71] - 2025-06-10

### Added

- Added a parameter called `additional_span_attributes` to `PipelineTask` that
  lets you add any additional attributes you'd like to the conversation span.

### Fixed

- Fixed an issue with `CartesiaSTTService` initialization.

## [0.0.70] - 2025-06-10

### Added

- Added `ExotelFrameSerializer` to handle telephony calls via Exotel.

- Added the `informal` option to `TranslationConfig` in the Gladia config. It
  forces informal language forms when available.

- Added `CartesiaSTTService` which is a websocket based implementation to
  transcribe audio. Added a foundational example in
  `13f-cartesia-transcription.py`

- Added a `websocket` example, showing how to use the new Pipecat client
  `WebsocketTransport` to connect with Pipecat `FastAPIWebsocketTransport` or
  `WebsocketServerTransport`.

- Added language support to `RimeHttpTTSService`. Extended languages to include
  German and French for both `RimeTTSService` and `RimeHttpTTSService`.

### Changed

- Upgraded `daily-python` to 0.19.2.

- Make `PipelineTask.add_observer()` synchronous. This allows callers to call it
  before doing the work of running the `PipelineTask` (i.e. without invoking
  `PipelineTask.set_event_loop()` first).

- Pipecat 0.0.69 forced the `uvloop` event loop on Linux and macOS.
  Unfortunately, this causes issues on some systems, so `uvloop` is no longer
  enabled by default. If you want to use `uvloop`, set the `asyncio` event loop
  policy before starting your agent with:

```python
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
```

### Fixed

- Fixed an issue with various TTS services that would cause audio glitches at
  the start of every bot turn.

- Fixed an `ElevenLabsTTSService` issue where a context warning was printed
  when pushing a `TTSSpeakFrame`.

- Fixed an `AssemblyAISTTService` issue that could cause unexpected behavior
  when yielding empty `Frame()`s.

- Fixed an issue where `OutputAudioRawFrame.transport_destination` was being
  reset to `None` instead of retaining its intended value before sending the
  audio frame to `write_audio_frame`.

- Fixed a typo in the LiveKit transport that prevented initialization.

## [0.0.69] - 2025-06-02 "AI Engineer World's Fair release" ✨

### Added

- Added a new frame `FunctionCallsStartedFrame`. This frame is pushed both
  upstream and downstream from the LLM service to indicate that one or more
  function calls are going to be executed.

- Added LLM services `on_function_calls_started` event. This event will be
  triggered when the LLM service receives function calls from the model and is
  going to start executing them.

- Function calls can now be executed sequentially (in the order received in the
  completion) by passing `run_in_parallel=False` when creating your LLM
  service. By default, if the LLM completion returns 2 or more function calls
  they run concurrently. In both cases, concurrently and sequentially, a new LLM
  completion will run when the last function call finishes.
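
  The difference between the two modes can be illustrated with plain
  `asyncio`. This is a sketch of the scheduling idea only: the `call()` and
  `run_calls()` helpers are hypothetical and are not part of Pipecat, and only
  the `run_in_parallel` name is taken from the entry above:

```python
import asyncio


async def call(name: str, delay: float, log: list) -> None:
    # Stand-in for a function call that takes `delay` seconds to finish.
    await asyncio.sleep(delay)
    log.append(name)


async def run_calls(calls, run_in_parallel: bool = True) -> list:
    log = []
    if run_in_parallel:
        # All calls start immediately and finish in whatever order they complete.
        await asyncio.gather(*(call(name, delay, log) for name, delay in calls))
    else:
        # Each call starts only after the previous one has finished.
        for name, delay in calls:
            await call(name, delay, log)
    return log


calls = [("slow", 0.1), ("fast", 0.01)]
concurrent_order = asyncio.run(run_calls(calls, run_in_parallel=True))
sequential_order = asyncio.run(run_calls(calls, run_in_parallel=False))
```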

- Added OpenTelemetry tracing for `GeminiMultimodalLiveLLMService` and
  `OpenAIRealtimeBetaLLMService`.

- Added initial support for interruption strategies, which determine if the user
  should interrupt the bot while the bot is speaking. Interruption strategies
  can be based on factors such as audio volume or the number of words spoken by
  the user. These can be specified via the new `interruption_strategies` field
  in `PipelineParams`. A new `MinWordsInterruptionStrategy` strategy has been
  introduced which triggers an interruption if the user has spoken a minimum
  number of words. If no interruption strategies are specified, the normal
  interruption behavior applies. If multiple strategies are provided, the first
  one that evaluates to true will trigger the interruption.
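
  The evaluation rule can be sketched as below. The classes and the
  `user_should_interrupt()` helper are hypothetical and do not reflect
  Pipecat's actual strategy interface. We also assume that "normal
  interruption behavior" means any user speech interrupts the bot:

```python
from typing import List


class MinWords:
    """Hypothetical strategy: interrupt after a minimum number of words."""

    def __init__(self, min_words: int):
        self.min_words = min_words

    def should_interrupt(self, text: str, volume: float) -> bool:
        return len(text.split()) >= self.min_words


class MinVolume:
    """Hypothetical strategy: interrupt when the audio is loud enough."""

    def __init__(self, min_volume: float):
        self.min_volume = min_volume

    def should_interrupt(self, text: str, volume: float) -> bool:
        return volume >= self.min_volume


def user_should_interrupt(strategies: List, text: str, volume: float) -> bool:
    if not strategies:
        # Assumption: with no strategies, any user speech interrupts the bot.
        return True
    # any() stops at the first strategy that evaluates to true.
    return any(s.should_interrupt(text, volume) for s in strategies)
```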

- `BaseInputTransport` now handles `StopFrame`. When a `StopFrame` is received
  the transport will pause sending frames downstream until a new `StartFrame` is
  received. This allows the transport to be reused (keeping the same connection)
  in a different pipeline.

- Updated AssemblyAI STT service to support their latest streaming
  speech-to-text model with improved transcription latency and endpointing.

- You can now access STT service results through the new
  `TranscriptionFrame.result` and `InterimTranscriptionFrame.result` field. This
  is useful in case you use some specific settings for the STT and you want to
  access the STT results.

- The examples runner is now public from the `pipecat.examples` package. This
  allows everyone to build their own examples and run them easily.

- It is now possible to push `OutputDTMFFrame` or `OutputDTMFUrgentFrame` with
  `DailyTransport`. This will be sent properly if a Daily dial-out connection
  has been established.

- Added `OutputDTMFUrgentFrame` to send a DTMF keypress quickly. The previous
  `OutputDTMFFrame` queues the keypress with the rest of data frames.

- Added `DTMFAggregator`, which aggregates keypad presses into
  `TranscriptionFrame`s. Aggregation occurs after a timeout, termination key
  press, or user interruption. You can specify the prefix of the
  `TranscriptionFrame`.
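
  A minimal sketch of the aggregation idea, using a hypothetical
  `KeypadAggregator` class rather than the real `DTMFAggregator`. The `#`
  termination key and the `DTMF: ` prefix are assumptions made for this
  example, and the timeout and interruption paths are represented only by the
  `flush()` method:

```python
from typing import List, Optional


class KeypadAggregator:
    """Hypothetical stand-in for the aggregation idea, not Pipecat's class."""

    def __init__(self, termination_key: str = "#", prefix: str = "DTMF: "):
        self.termination_key = termination_key
        self.prefix = prefix
        self._digits: List[str] = []

    def press(self, key: str) -> Optional[str]:
        """Buffer a key press and return the aggregated text on termination."""
        if key == self.termination_key:
            return self.flush()
        self._digits.append(key)
        return None

    def flush(self) -> Optional[str]:
        """Emit buffered digits. A timeout or interruption would call this too."""
        if not self._digits:
            return None
        text = self.prefix + "".join(self._digits)
        self._digits.clear()
        return text
```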

- Added new functions `DailyTransport.start_transcription()` and
  `DailyTransport.stop_transcription()` to be able to start and stop Daily
  transcription dynamically (maybe with different settings).

### Changed

- Reverted the default model for `GeminiMultimodalLiveLLMService` back to
  `models/gemini-2.0-flash-live-001`.
  `gemini-2.5-flash-preview-native-audio-dialog` has inconsistent performance.
  You can opt in to using this model by setting the `model` arg.

- Function calls are now cancelled by default if there's an interruption. To
  disable this behavior you can set `cancel_on_interruption=False` when
  registering the function call. Since function calls are executed as tasks you
  can tell if a function call has been cancelled by catching the
  `asyncio.CancelledError` exception (and don't forget to raise it again!).
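
  The catch-and-re-raise pattern looks like this in plain `asyncio`. The
  `slow_function_call()` coroutine is a hypothetical stand-in for a registered
  function call, and the cancellation is triggered manually here rather than
  by an interruption:

```python
import asyncio


async def slow_function_call(events: list) -> None:
    try:
        await asyncio.sleep(10)  # Stand-in for long-running work.
        events.append("finished")
    except asyncio.CancelledError:
        # Clean up here, then re-raise so the task is marked as cancelled.
        events.append("cancelled")
        raise


async def demo():
    events = []
    task = asyncio.create_task(slow_function_call(events))
    await asyncio.sleep(0)  # Let the task start running.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events, task.cancelled()
```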

- Updated OpenTelemetry tracing attribute `metrics.ttfb_ms` to `metrics.ttfb`.
  The attribute reports TTFB in seconds.

### Deprecated

- `DailyTransport.send_dtmf()` is deprecated, push an `OutputDTMFFrame` or an
  `OutputDTMFUrgentFrame` instead.

### Fixed

- Fixed an issue with `ElevenLabsTTSService` where long responses would
  continue generating output even after an interruption.

- Fixed an issue with the `OpenAILLMContext` where non-Roman characters were
  being incorrectly encoded as Unicode escape sequences. This was a logging
  issue and did not impact the actual conversation.

- In `AWSBedrockLLMService`, worked around a possible bug in AWS Bedrock where
  a `toolConfig` is required if there has been previous tool use in the
  messages array. The workaround uses a `no_op` factory function call to
  satisfy the requirement.

- Fixed `WebsocketClientTransport` to use `FrameProcessorSetup.task_manager`
  instead of `StartFrame.task_manager`.

### Performance

- Use `uvloop` as the new event loop on Linux and macOS systems.

## [0.0.68] - 2025-05-28

### Added

- Added `GoogleHttpTTSService` which uses Google's HTTP TTS API.

- Added `TavusTransport`, a new transport implementation compatible with any
  Pipecat pipeline. When using the `TavusTransport`, the Pipecat bot will
  connect in the same room as the Tavus Avatar and the user.

- Added `PlivoFrameSerializer` to support Plivo calls. A full running example
  has also been added to `examples/plivo-chatbot`.

- Added `UserBotLatencyLogObserver`. This is an observer that logs the latency
  between when the user stops speaking and when the bot starts speaking. This
  gives you an initial idea on how quickly the AI services respond.

- Added `SarvamTTSService`, which implements Sarvam AI's TTS API:
  https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert.

- Added `PipelineTask.add_observer()` and `PipelineTask.remove_observer()` to
  allow managing observers at runtime. This is useful for cases where the task
  is passed around to other code components that might want to observe the
  pipeline dynamically.

- Added `user_id` field to `TranscriptionMessage`. This allows identifying the
  user in a multi-user scenario. Note that this requires that
  `TranscriptionFrame` has the `user_id` properly set.

- Added new `PipelineTask` event handlers `on_pipeline_started`,
  `on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled`, which
  correspond to the `StartFrame`, `StopFrame`, `EndFrame` and `CancelFrame`
  respectively.

- Added additional languages to `LmntTTSService`. Languages include: `hi`,
  `id`, `it`, `ja`, `nl`, `pl`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`.

- Added a `model` parameter to the `LmntTTSService` constructor, allowing
  switching between LMNT models.

- Added `MiniMaxHttpTTSService`, which implements MiniMax's T2A API for TTS.
  Learn more: https://www.minimax.io/platform_overview

- A new function `FrameProcessor.setup()` has been added to allow setting up
  frame processors before receiving a `StartFrame`. This is what's happening
  internally: `FrameProcessor.setup()` is called, `StartFrame` is pushed from
  the beginning of the pipeline, your regular pipeline operations, `EndFrame`
  or `CancelFrame` are pushed from the beginning of the pipeline and finally
  `FrameProcessor.cleanup()` is called.

- Added support for OpenTelemetry tracing in Pipecat. This initial
  implementation includes:

  - A `setup_tracing` method where you can specify your OpenTelemetry exporter
  - Service decorators for STT (`@traced_stt`), LLM (`@traced_llm`), and TTS
    (`@traced_tts`) which trace the execution and collect properties and
    metrics (TTFB, token usage, character counts, etc.)
  - Class decorators that provide execution tracking; these are generic and can
    be used for service tracking as needed
  - Spans that help track traces on a per-conversation and per-turn basis:

  ```
  conversation-uuid
  ├── turn-1
  │   ├── stt_deepgramsttservice
  │   ├── llm_openaillmservice
  │   └── tts_cartesiattsservice
  ...
  └── turn-n
      └── ...
  ```

  By default, Pipecat implements service decorators to trace execution of
  STT, LLM, and TTS services. You can enable tracing by setting
  `enable_tracing` to `True` in the `PipelineTask`.

- Added `TurnTrackingObserver`, which tracks the start and end of a user/bot
  turn pair and emits events `on_turn_started` and `on_turn_stopped`
  corresponding to the start and end of a turn, respectively.

- Allow passing observers to `run_test()` while running unit tests.

### Changed

- Upgraded `daily-python` to 0.19.1.

- ⚠️ Updated `SmallWebRTCTransport` to align with how other transports handle
  `on_client_disconnected`. Now, when the connection is closed and no reconnection
  is attempted, `on_client_disconnected` is called instead of `on_client_close`. The
  `on_client_close` callback is no longer used, use `on_client_disconnected` instead.

- Check if `PipelineTask` has already been cancelled.

- Don't raise an exception if event handler is not registered.

- Upgraded `deepgram-sdk` to 4.1.0.

- Updated `GoogleTTSService` to use Google's streaming TTS API. The default
  voice also updated to `en-US-Chirp3-HD-Charon`.

- ⚠️ Refactored the `TavusVideoService`, so it acts like a proxy, sending audio
  to Tavus and receiving both audio and video. This will make
  `TavusVideoService` usable with any Pipecat pipeline and with any transport.
  This is a **breaking change**, check the
  `examples/foundational/21a-tavus-layer-small-webrtc.py` to see how to use it.

- `DailyTransport` now uses custom microphone audio tracks instead of virtual
  microphones. Now, multiple Daily transports can be used in the same process.

- `DailyTransport` now captures audio from individual participants instead of
  the whole room. This allows identifying audio frames per participant.

- Updated the default model for `AnthropicLLMService` to
  `claude-sonnet-4-20250514`.

- Updated the default model for `GeminiMultimodalLiveLLMService` to
  `models/gemini-2.5-flash-preview-native-audio-dialog`.

- `BaseTextFilter` methods `filter()`, `update_settings()`,
  `handle_interruption()` and `reset_interruption()` are now async.

- `BaseTextAggregator` methods `aggregate()`, `handle_interruption()` and
  `reset()` are now async.

- The API version for `CartesiaTTSService` and `CartesiaHttpTTSService` has
  been updated. Also, the `cartesia` dependency has been updated to 2.x.

- `CartesiaTTSService` and `CartesiaHttpTTSService` now support Cartesia's new
  `speed` parameter which accepts values of `slow`, `normal`, and `fast`.

- `GeminiMultimodalLiveLLMService` now uses the user transcription and usage
  metrics provided by Gemini Live.

- `GoogleLLMService` has been updated to use `google-genai` instead of the
  deprecated `google-generativeai`.

### Deprecated

- In `CartesiaTTSService` and `CartesiaHttpTTSService`, `emotion` has been
  deprecated by Cartesia. Pipecat is following suit and deprecating `emotion`
  as well.

### Removed

- Since `GeminiMultimodalLiveLLMService` now transcribes its own audio, the
  `transcribe_user_audio` arg has been removed. Audio is now transcribed
  automatically.

- Removed the `SileroVAD` frame processor; use `SileroVADAnalyzer` instead.
  Also removed the `07a-interruptible-vad.py` example.

### Fixed

- Fixed a `DailyTransport` issue that prevented capturing video frames if the
  framerate was greater than zero.

- Fixed a `DeepgramSTTService` connection issue when the user provided their own
  `LiveOptions`.

- Fixed a `DailyTransport` issue that would cause images needing resize to block
  the event loop.

- Fixed an issue with `ElevenLabsTTSService` where changing the model or voice
  while the service is running wasn't working.

- Fixed an issue that would cause multiple instances of the same class to behave
  incorrectly if any of the given constructor arguments defaulted to a mutable
  value (e.g. lists, dictionaries, objects).
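
  The underlying Python pitfall is general and can be reproduced in a few
  lines. The `BadService` and `GoodService` classes are hypothetical. The
  `None` sentinel shown here is the common idiom for avoiding the problem and
  is not necessarily how the fix was implemented in Pipecat:

```python
class BadService:
    def __init__(self, voices=[]):
        # The default list is created once and shared by every instance.
        self.voices = voices


class GoodService:
    def __init__(self, voices=None):
        # A fresh list is created for each instance when none is given.
        self.voices = [] if voices is None else voices


bad_a, bad_b = BadService(), BadService()
bad_a.voices.append("en-US")

good_a, good_b = GoodService(), GoodService()
good_a.voices.append("en-US")
```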

- Fixed an issue with `CartesiaTTSService` where `TTSTextFrame` messages weren't
  being emitted when the model was set to `sonic`. This resulted in the
  assistant context not being updated with assistant messages.

### Performance

- `DailyTransport`: process audio, video and events in separate tasks.

- Don't create event handler tasks if no user event handlers have been
  registered.

### Other

- It is now possible to run all (or most) foundational examples with multiple
  transports. By default, they run with P2P (Peer-To-Peer) WebRTC so you can try
  everything locally. You can also run them with Daily or even with a Twilio
  phone number.

- Added foundational examples `07y-interruptible-minimax.py` and
  `07z-interruptible-sarvam.py` to show how to use the `MiniMaxHttpTTSService`
  and `SarvamTTSService`, respectively.

- Added an `open-telemetry-tracing` example, showing how to set up tracing. The
  example also includes Jaeger as an open source OpenTelemetry client to review
  traces from the example runs.

- Added foundational example `29-turn-tracking-observer.py` to show how to use
  the `TurnTrackingObserver`.

## [0.0.67] - 2025-05-07

### Added

- Added `DebugLogObserver` for detailed frame logging with configurable
  filtering by frame type and endpoint. This observer automatically extracts
  and formats all frame data fields for debug logging.

- `UserImageRequestFrame.video_source` field has been added to request an image
  from the desired video source.

- Added support for the AWS Nova Sonic speech-to-speech model with the new
  `AWSNovaSonicLLMService`.
  See https://docs.aws.amazon.com/nova/latest/userguide/speech.html.
  Note that it requires Python >= 3.12 and `pip install pipecat-ai[aws-nova-sonic]`.

- Added new AWS services `AWSBedrockLLMService` and `AWSTranscribeSTTService`.

- Added `on_active_speaker_changed` event handler to the `DailyTransport` class.

- Added `enable_ssml_parsing` and `enable_logging` to `InputParams` in
  `ElevenLabsTTSService`.

- Added support to `RimeHttpTTSService` for the `arcana` model.

### Changed

- Updated `ElevenLabsTTSService` to use the beta websocket API
  (multi-stream-input). This new API supports context_ids and cancelling those
  contexts, which greatly improves interruption handling.

- Observers `on_push_frame()` now take a single argument `FramePushed` instead
  of multiple arguments.

- Updated the default voice for `DeepgramTTSService` to `aura-2-helena-en`.

### Deprecated

- `PollyTTSService` is now deprecated, use `AWSPollyTTSService` instead.

- Observer `on_push_frame(src, dst, frame, direction, timestamp)` is now
  deprecated, use `on_push_frame(data: FramePushed)` instead.

### Fixed

- Fixed a `DailyTransport` issue that was causing problems when multiple audio
  or video sources were being captured.

- Fixed an `UltravoxSTTService` issue that would cause the service to generate
  all tokens as one word.

- Fixed a `PipelineTask` issue that would cause tasks to not be cancelled if
  task was cancelled from outside of Pipecat.

- Fixed a `TaskManager` issue that was causing dangling tasks to be reported.

- Fixed an issue that could cause data to be sent to the transports when they
  were still not ready.

- Remove custom audio tracks from `DailyTransport` before leaving.

### Removed

- Removed `CanonicalMetricsService` as it's no longer maintained.

## [0.0.66] - 2025-05-02

### Added

- Added two new input parameters to `RimeTTSService`: `pause_between_brackets`
  and `phonemize_between_brackets`.

- Added support for cross-platform local smart turn detection. You can use
  `LocalSmartTurnAnalyzer` for on-device inference using Torch.

- `BaseOutputTransport` now allows multiple destinations if the transport
  implementation supports it (e.g. Daily's custom tracks). With multiple
  destinations it is possible to send different audio or video tracks with a
  single transport simultaneously. To do that, you need to set the new
  `Frame.transport_destination` field with your desired transport destination
  (e.g. custom track name), tell the transport you want a new destination with
  `TransportParams.audio_out_destinations` or
  `TransportParams.video_out_destinations` and the transport should take care of
  the rest.

- Similar to the new `Frame.transport_destination`, there's a new
  `Frame.transport_source` field which is set by the `BaseInputTransport` if the
  incoming data comes from a non-default source (e.g. custom tracks).

- `TTSService` has a new `transport_destination` constructor parameter. This
  parameter will be used to update the `Frame.transport_destination` field for
  each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to
  multiple destinations in the same pipeline.

- Added `DailyTransportParams.camera_out_enabled` and
  `DailyTransportParams.microphone_out_enabled` which allows you to
  enable/disable the main output camera or microphone tracks. This is useful if
  you only want to use custom tracks and not send the main tracks. Note that you
  still need `audio_out_enabled=True` or `video_out_enabled`.

- Added `DailyTransport.capture_participant_audio()` which allows you to capture
  an audio source (e.g. "microphone", "screenAudio" or a custom track name) from
  a remote participant.

- Added `DailyTransport.update_publishing()` which allows you to update the call
  video and audio publishing settings (e.g. audio and video quality).

- Added `RTVIObserverParams` which allows you to configure what RTVI messages
  are sent to the clients.

- Added a `context_window_compression` InputParam to
  `GeminiMultimodalLiveLLMService` which allows you to enable a sliding context
  window for the session as well as set the token limit of the sliding window.

- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.

- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`,
  indicating when the VAD detected the user to start and stop speaking. These
  events are helpful when using smart turn detection, as the user's stop time
  can differ from when their turn ends (signified by `UserStoppedSpeakingFrame`).

- Added `TranslationFrame`, a new frame type that contains a translated
  transcription.

- Added `TransportParams.audio_in_passthrough`. If set (the default), incoming
  audio will be pushed downstream.

- Added `MCPClient`; a way to connect to MCP servers and use the MCP servers'
  tools.

- Added support for `Mem0 OSS`. In addition to Mem0 cloud, the OSS version is
  now also available.

### Changed

- `TransportParams.audio_out_mixer` now supports a string and also a dictionary to
  provide a mixer per destination. For example:

```python
  audio_out_mixer={
      "track-1": SoundfileMixer(...),
      "track-2": SoundfileMixer(...),
      "track-N": SoundfileMixer(...),
  },
```

- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and
  `TranscriptionFrame` which allows the `STTMuteFilter` to be used in
  conjunction with transports that generate transcripts, e.g. `DailyTransport`.

- Function calls now receive a single parameter `FunctionCallParams` instead of
  `(function_name, tool_call_id, args, llm, context, result_callback)` which is
  now deprecated.

- Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s
  (`LLMUserAggregatorParams.aggregation_timeout`). Sometimes, the STT services
  might give us more than one transcription which could come after the user
  stopped speaking. We still want to include these additional transcriptions
  with the first one because it's part of the user turn. This is what this
  timeout is helpful with.

- Short utterances not detected by VAD while the bot is speaking are now
  ignored. This reduces the amount of bot interruptions significantly providing
  a more natural conversation experience.

- Updated `GladiaSTTService` to output a `TranslationFrame` when specifying a
  `translation` and `translation_config`.

- STT services now pass audio frames through by default. This allows you to add
  audio recording without having to debug your pipeline when recording doesn't
  work the first time.

- Input transports now always push audio downstream unless disabled with
  `TransportParams.audio_in_passthrough`. After many Pipecat releases, we
  realized this is the common use case. There are use cases where the input
  transport already provides STT and you also don't want recordings, in which
  case there's no need to push audio to the rest of the pipeline, but this is
  not a very common case.

- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such
  as "canary-1b-asr", to be used in Pipecat.

### Deprecated

- Function calls with parameters
  `(function_name, tool_call_id, args, llm, context, result_callback)` are
  deprecated, use a single `FunctionCallParams` parameter instead.

- `TransportParams.camera_*` parameters are now deprecated, use
  `TransportParams.video_*` instead.

- `TransportParams.vad_enabled` parameter is now deprecated, use
  `TransportParams.audio_in_enabled` and `TransportParams.vad_analyzer` instead.

- `TransportParams.vad_audio_passthrough` parameter is now deprecated, use
  `TransportParams.audio_in_passthrough` instead.

- `ParakeetSTTService` is now deprecated, use `RivaSTTService` instead, which uses
  the model "parakeet-ctc-1.1b-asr" by default.

- `FastPitchTTSService` is now deprecated, use `RivaTTSService` instead, which uses
  the model "magpie-tts-multilingual" by default.

### Fixed

- Fixed an issue with `SimliVideoService` where the bot was continuously outputting
  audio, which prevents the `BotStoppedSpeakingFrame` from being emitted.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant
  messages to the context.

- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context
  contained tokens instead of words.

- Fixed an issue with HTTP Smart Turn handling, where the service returns a 500
  error. Previously, this would cause an unhandled exception. Now, a 500 error
  is treated as an incomplete response.

- Fixed a TTS services issue that could cause assistant output not to be
  aggregated to the context when also using `TTSSpeakFrame`s.

- Fixed an issue where the `SmartTurnMetricsData` was reporting 0ms for
  inference and processing time when using the `FalSmartTurnAnalyzer`.

### Other

- Added `examples/daily-custom-tracks` to show how to send and receive Daily
  custom tracks.

- Added `examples/daily-multi-translation` to showcase how to send multiple
  simultaneous translations with the same transport.

- Added 04 foundational examples for client/server transports. Also, renamed
  `29-livekit-audio-chat.py` to `04b-transports-livekit.py`.

- Added foundational example `13c-gladia-translation.py` showing how to use
  `TranscriptionFrame` and `TranslationFrame`.

## [0.0.65] - 2025-04-23 "Sant Jordi's release" 🌹📕

https://en.wikipedia.org/wiki/Saint_George%27s_Day_in_Catalonia

### Added

- Added automatic hangup logic to the Telnyx serializer. This feature hangs up
  the Telnyx call when an `EndFrame` or `CancelFrame` is received. It is
  enabled by default and is configurable via the `auto_hang_up` `InputParam`.

- Added a keepalive task to `GladiaSTTService` to prevent the websocket from
  disconnecting after 30 seconds of no audio input.

### Changed

- The `InputParams` for `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`
  no longer require that `stability` and `similarity_boost` be set. You can
  individually set each param.

- In `TwilioFrameSerializer`, `call_sid` is optional so as to avoid a breaking
  change. `call_sid` is required to automatically hang up.

### Fixed

- Fixed an issue where `TwilioFrameSerializer` would send two hang up commands:
  one for the `EndFrame` and one for the `CancelFrame`.

## [0.0.64] - 2025-04-22

### Added

- Added automatic hangup logic to the Twilio serializer. This feature hangs up
  the Twilio call when an `EndFrame` or `CancelFrame` is received. It is
  enabled by default and is configurable via the `auto_hang_up` `InputParam`.

- Added `SmartTurnMetricsData`, which contains end-of-turn prediction metrics,
  to the `MetricsFrame`. Using `MetricsFrame`, you can now retrieve prediction
  confidence scores and processing time metrics from the smart turn analyzers.

- Added support for Application Default Credentials in Google services,
  `GoogleSTTService`, `GoogleTTSService`, and `GoogleVertexLLMService`.

- Added support for Smart Turn Detection via the `turn_analyzer` transport
  parameter. You can now choose between `HttpSmartTurnAnalyzer()` or
  `FalSmartTurnAnalyzer()` for remote inference or
  `LocalCoreMLSmartTurnAnalyzer()` for on-device inference using Core ML.

- `DeepgramTTSService` accepts `base_url` argument again, allowing you to
  connect to an on-prem service.

- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` which allow
  you to control aggregator settings. You can now pass these arguments when
  creating aggregator pairs with `create_context_aggregator()`.

- Added `previous_text` context support to `ElevenLabsHttpTTSService`,
  improving speech consistency across sentences within an LLM response.

- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.

- It is now possible to disable `SoundfileMixer` when created. You can then use
  `MixerEnableFrame` to dynamically enable it when necessary.

- Added `on_client_connected` and `on_client_disconnected` event handlers to
  the `DailyTransport` class. These handlers map to the same underlying Daily
  events as `on_participant_joined` and `on_participant_left`, respectively.
  This makes it easier to write a single bot pipeline that can also use other
  transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.

### Changed

- `GrokLLMService` now uses `grok-3-beta` as its default model.

- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects
  the user when their token expires. This new parameter defaults to False.
  Also, the default value for `enable_prejoin_ui` changed to False and
  `eject_at_room_exp` changed to False.

- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their
  default model.

- `SoundfileMixer` constructor arguments must now be passed as keyword
  arguments.

### Deprecated

- `DeepgramSTTService` parameter `url` is now deprecated, use `base_url`
  instead.

### Removed

- Parameters `user_kwargs` and `assistant_kwargs` when creating a context
  aggregator pair using `create_context_aggregator()` have been removed. Use
  `user_params` and `assistant_params` instead.

### Fixed

- Fixed an issue that would cause TTS websocket-based services to not cleanup
  resources properly when disconnecting.

- Fixed a `TavusVideoService` issue that was causing audio choppiness.

- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the
  client did not create a video transceiver.

- Fixed an issue where LLM input parameters were not being applied correctly
  in `GoogleVertexLLMService`, causing unexpected behavior during inference.

### Other

- Updated the `twilio-chatbot` example to use the auto-hangup feature.

## [0.0.63] - 2025-04-11

### Added

- Added media resolution control to `GeminiMultimodalLiveLLMService` with
  `GeminiMediaResolution` enum, allowing configuration of token usage for
  image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing
  with 256 tokens).

- Added Gemini's Voice Activity Detection (VAD) configuration to
  `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine
  control over speech detection sensitivity and timing, including:

  - Start sensitivity (how quickly speech is detected)
  - End sensitivity (how quickly turns end after pauses)
  - Prefix padding (milliseconds of audio to keep before speech is detected)
  - Silence duration (milliseconds of silence required to end a turn)

- Added comprehensive language support to `GeminiMultimodalLiveLLMService`,
  supporting over 30 languages via the `language` parameter, with proper
  mapping between Pipecat's `Language` enum and Gemini's language codes.

- Added support in `SmallWebRTCTransport` to detect when remote tracks are
  muted.

- Added support for image capture from a video stream to the
  `SmallWebRTCTransport`.

- Added a new iOS client option to the `SmallWebRTCTransport`
  **video-transform** example.

- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The
  producer processor processes frames from the pipeline and decides whether the
  consumers should consume it or not. If so, the same frame that is received by
  the producer is sent to the consumer. There can be multiple consumers per
  producer. These processors can be useful to push frames from one part of a
  pipeline to a different one (e.g. when using `ParallelPipeline`).

- Improvements for the `SmallWebRTCTransport`:
  - Wait until the pipeline is ready before triggering the `connected` event.
  - Queue messages if the data channel is not ready.
  - Update the aiortc dependency to fix an issue where the 'video/rtx' MIME
    type was incorrectly handled as a codec retransmission.
  - Avoid initial video delays.
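
To illustrate the `ProducerProcessor` / `ConsumerProcessor` entry above, here is a minimal sketch of the fan-out idea in plain `asyncio`. It does not use Pipecat's classes: `Producer`, `add_consumer()` and `should_produce` are hypothetical names, and returning the item so it keeps flowing downstream is an assumption of the sketch.

```python
import asyncio
from typing import Any, Callable, List


class Producer:
    """Hypothetical stand-in for a producer that fans selected items out to consumers."""

    def __init__(self, should_produce: Callable[[Any], bool]):
        self._should_produce = should_produce
        self._queues: List[asyncio.Queue] = []

    def add_consumer(self) -> asyncio.Queue:
        # Each consumer gets its own queue, so several consumers can share one producer.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    async def process(self, item: Any) -> Any:
        # When the filter accepts the item, the very same object goes to every consumer.
        if self._should_produce(item):
            for queue in self._queues:
                await queue.put(item)
        # Assumption of this sketch: the item also continues on the producer's branch.
        return item


async def demo() -> list:
    producer = Producer(should_produce=lambda item: isinstance(item, str))
    first, second = producer.add_consumer(), producer.add_consumer()
    for item in ["hello", 42, "world"]:
        await producer.process(item)
    drained = []
    for queue in (first, second):
        items = []
        while not queue.empty():
            items.append(queue.get_nowait())
        drained.append(items)
    return drained
```

Running `asyncio.run(demo())` shows both consumers receiving only the string items, in order, while the integer is skipped by the filter.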

### Changed

- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio`
  parameter in favor of Gemini Live's native output transcription support. Now
  text transcriptions are produced directly by the model. No configuration is
  required.

- Updated `GeminiMultimodalLiveLLMService`’s default `model` to
  `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket
  URL.

### Fixed

- Updated `daily-python` to 0.17.0 to fix an issue that prevented it from
  running on older platforms.

- Fixed an issue where `CartesiaTTSService`'s spell feature would result in
  the spelled word in the context appearing as "F,O,O,B,A,R" instead of
  "FOOBAR".

- Fixed an issue in the Azure TTS services where the language was being set
  incorrectly.

- Fixed `SmallWebRTCTransport` to support dynamic values for
  `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms
  chunks.

- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant
  context messages had no space between words.

- Fixed an issue where `LLMAssistantContextAggregator` would prevent a
  `BotStoppedSpeakingFrame` from moving through the pipeline.

## [0.0.62] - 2025-04-01 "An April Fools' release"

### Added

- Added `TransportParams.audio_out_10ms_chunks` parameter to allow controlling
  the amount of audio being sent by the output transport. It defaults to 4, so
  40ms audio chunks are sent.

- Added `QwenLLMService` for Qwen integration with an OpenAI-compatible
  interface. Added foundational example `14q-function-calling-qwen.py`.

- Added `Mem0MemoryService`. Mem0 is a self-improving memory layer for LLM
  applications. Learn more at: https://mem0.ai/.

- Added `WhisperSTTServiceMLX` for Whisper transcription on Apple Silicon.
  See example in `examples/foundational/13e-whisper-mlx.py`. Latency of
  completed transcription using Whisper large-v3-turbo on an M4 MacBook is
  ~500ms.

- Added `SmallWebRTCTransport`, a new P2P WebRTC transport.

  - Created two examples in `p2p-webrtc`:
    - **video-transform**: Demonstrates sending and receiving audio/video with
      `SmallWebRTCTransport` using `TypeScript`. Includes video frame
      processing with OpenCV.
    - **voice-agent**: A minimal example of creating a voice agent with
      `SmallWebRTCTransport`.

- `GladiaSTTService` now has comprehensive support for the latest API config
  options, including model, language detection, preprocessing, custom
  vocabulary, custom spelling, translation, and message filtering options.

- Added support to `ProtobufFrameSerializer` to send the messages from
  `TransportMessageFrame` and `TransportMessageUrgentFrame`.

- Added support for a new TTS service, `PiperTTSService`.
  (see https://github.com/rhasspy/piper/)

- It is now possible to tell whether `UserStartedSpeakingFrame` or
  `UserStoppedSpeakingFrame` have been generated because of emulation frames.

### Changed

- `FunctionCallResultFrame`s are now system frames. This prevents function
  call results from being discarded during interruptions.

- Pipecat services have been reorganized into packages. Each package can have
  one or more of the following modules (in the future new module names might be
  needed) depending on the services implemented:

  - image: for image generation services
  - llm: for LLM services
  - memory: for memory services
  - stt: for Speech-To-Text services
  - tts: for Text-To-Speech services
  - video: for video generation services
  - vision: for video recognition services

- Base classes for AI services have been reorganized into modules. They can now
  be found in
  `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.

- `GladiaSTTService` now uses the `solaria-1` model by default. Other params
  use Gladia's default values. Added support for more language codes.

### Deprecated

- All Pipecat services imports have been deprecated and a warning will be shown
  when using the old import. The new import should be
  `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For
  example, `from pipecat.services.openai.llm import OpenAILLMService`.

- Import for AI services base classes from `pipecat.services.ai_services` is now
  deprecated, use one of
  `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.

- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in
  favor of `language_config`, which better aligns with Gladia's API.

- Deprecated using `GladiaSTTService.InputParams` directly. Use the new
  `GladiaInputParams` class instead.

### Fixed

- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that
  would cause the transport to be closed prematurely, preventing the internally
  queued audio from being sent. The same issue could also cause an infinite
  loop when using an output mixer and sending an `EndFrame`, preventing the bot
  from finishing.

- Fixed an issue that could cause a `TranscriptionUpdateFrame` pushed because
  of an interruption to be discarded.

- Fixed an issue that would cause `SegmentedSTTService` based services
  (e.g. `OpenAISTTService`) to try to transcribe non-spoken audio, causing
  invalid transcriptions.

- Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrame`s.

### Performance

- Output transports now send 40ms audio chunks instead of 20ms. This should
  improve performance.

- `BotSpeakingFrame`s are now sent every 200ms. If the output transport audio
  chunks are longer than 200ms, a frame is sent with every audio chunk instead.
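
The arithmetic behind these two entries can be sketched as follows. This is an illustration only, not Pipecat's implementation: the function names are hypothetical, samples are assumed to be 16-bit PCM, and the notification cadence assumes a frame is emitted once at least 200ms of audio has been sent.

```python
import math


def chunk_size_bytes(sample_rate: int, num_channels: int = 1, chunks_10ms: int = 4) -> int:
    """Bytes in one output chunk, assuming 16-bit (2-byte) PCM samples."""
    samples_per_10ms = sample_rate // 100
    return samples_per_10ms * chunks_10ms * num_channels * 2


def chunks_per_notification(chunk_ms: int, period_ms: int = 200) -> int:
    """Audio chunks sent between two speaking notifications (at least one)."""
    return max(1, math.ceil(period_ms / chunk_ms))


# 16 kHz mono audio: 160 samples per 10ms, 4 x 10ms per chunk, 2 bytes per sample.
default_chunk = chunk_size_bytes(16000)
# 40ms chunks: one notification every 5 chunks; 300ms chunks: one per chunk.
cadence_40ms = chunks_per_notification(40)
cadence_300ms = chunks_per_notification(300)
```

With the default of four 10ms chunks, a 16 kHz mono stream sends 1280 bytes per chunk, half as many writes as with 20ms chunks.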

### Other

- Added foundational example `37-mem0.py` demonstrating how to use the
  `Mem0MemoryService`.

- Added foundational example `13e-whisper-mlx.py` demonstrating how to use the
  `WhisperSTTServiceMLX`.

## [0.0.61] - 2025-03-26

### Added

- Added a new frame, `LLMSetToolChoiceFrame`, which provides a mechanism
  for modifying the `tool_choice` in the context.

- Added `GroqTTSService` which provides text-to-speech functionality using
  Groq's API.

- Added support in `DailyTransport` for updating remote participants'
  `canReceive` permission via the `update_remote_participants()` method, by
  bumping the daily-python dependency to >= 0.16.0.

- ElevenLabs TTS services now support a sample rate of 8000.

- Added support for `instructions` in `OpenAITTSService`.

- Added support for `base_url` in `OpenAIImageGenService` and
  `OpenAITTSService`.

### Fixed

- Fixed an issue in `RTVIObserver` that prevented handling of Google LLM
  context messages. The observer now processes both OpenAI-style and
  Google-style contexts.

- Fixed an issue in Daily involving switching virtual devices, by bumping the
  daily-python dependency to >= 0.16.1.

- Fixed a `GoogleAssistantContextAggregator` issue where function call
  placeholders were not being updated when the function call result was not a
  string.

- Fixed an issue that would cause `LLMAssistantContextAggregator` to block
  processing more frames while processing a function call result.

- Fixed an issue where the `RTVIObserver` would report two bot started and
  stopped speaking events for each bot turn.

- Fixed an issue in `UltravoxSTTService` that caused improper audio processing
  and incorrect LLM frame output.

### Other

- Added `examples/foundational/07x-interruptible-local.py` to show how a local
  transport can be used.

## [0.0.60] - 2025-03-20

### Added

- Added `default_headers` parameter to `BaseOpenAILLMService` constructor.

### Changed

- Rolled back to `deepgram-sdk` 3.8.0 since 3.10.1 was causing connection
  issues.

- Changed the default `InputAudioTranscription` model to `gpt-4o-transcribe`
  for `OpenAIRealtimeBetaLLMService`.

### Other

- Updated the `19-openai-realtime-beta.py` and `19a-azure-realtime-beta.py`
  examples to use the FunctionSchema format.

## [0.0.59] - 2025-03-20

### Added

- When registering a function call it is now possible to indicate if you want
  the function call to be cancelled if there's a user interruption via
  `cancel_on_interruption` (defaults to False). This is now possible because
  function calls are executed concurrently.

- Added support for detecting idle pipelines. By default, if no activity has
  been detected for 5 minutes, the `PipelineTask` will be automatically
  cancelled. It is possible to override this behavior by passing
  `cancel_on_idle_timeout=False`. It is also possible to change the default
  timeout with `idle_timeout_secs` or the frames that prevent the pipeline from
  being idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event
  handler will be triggered if the idle timeout is reached (whether the pipeline
  task is cancelled or not).

- Added `FalSTTService`, which provides STT for Fal's Wizper API.

- Added a `reconnect_on_error` parameter to websocket-based TTS services as well
  as an `on_connection_error` event handler. `reconnect_on_error` indicates
  whether the TTS service should reconnect on error. The `on_connection_error`
  handler is always called when an error occurs, regardless of the value of
  `reconnect_on_error`. This allows you, for example, to fall back to a
  different TTS provider if something goes wrong with the current one.

- Added new `SkipTagsAggregator` that extends `BaseTextAggregator`. It
  aggregates text and skips end-of-sentence matching while the aggregated text
  is between start/end tags.

- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to
  identify content between matching pattern pairs in streamed text. This allows
  for detection and processing of structured content like XML-style tags that
  may span across multiple text chunks or sentence boundaries.

- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service
  to aggregate LLM tokens and decide when the aggregated text should be pushed
  to the TTS service. They also allow for the text to be manipulated while it's
  being aggregated. A text aggregator can be passed via `text_aggregator` to the
  TTS service.

- Added new `sample_rate` constructor parameter to `TavusVideoService` to allow
  changing the output sample rate.

- Added new `NeuphonicTTSService`.
  (see https://neuphonic.com)

- Added new `UltravoxSTTService`.
  (see https://github.com/fixie-ai/ultravox)

- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event
  handlers to `PipelineTask`. Those events will be called when a frame reaches
  the beginning or end of the pipeline respectively. Note that by default, the
  event handlers will not be called unless a filter is set with
  `PipelineTask.set_reached_upstream_filter()` or
  `PipelineTask.set_reached_downstream_filter()`.

- Added support for Chirp voices in `GoogleTTSService`.

- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.

- Added a `set_language` convenience method for `GoogleSTTService`, allowing
  you to set a single language. This is in addition to the `set_languages`
  method which allows you to set a list of languages.

- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to
  `AudioBufferProcessor`. This gives the ability to grab the audio of only that
  turn for both the user and the bot.

- Added new base class `BaseObject` which is now the base class of
  `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The
  new `BaseObject` adds support for event handlers.

- Added support for a unified format for specifying function calling across all
  LLM services.

  ```python
  weather_function = FunctionSchema(
      name="get_current_weather",
      description="Get the current weather",
      properties={
          "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA",
          },
          "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user's location.",
          },
      },
      required=["location"],
  )
  tools = ToolsSchema(standard_tools=[weather_function])
  ```

- Added `speech_threshold` parameter to `GladiaSTTService`.

- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context
  aggregator parameters when using `create_context_aggregator()`. The values are
  passed as a mapping that will then be converted to arguments.

- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService`.

- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At
  every completion the `on_completion` event handler is triggered.

- Added a new frame, `RTVIServerMessageFrame`, and RTVI message
  `RTVIServerMessage` which provides a generic mechanism for sending custom
  messages from server to client. The `RTVIServerMessageFrame` is processed by
  the `RTVIObserver` and will be delivered to the client's `onServerMessage`
  callback or `ServerMessage` event.

- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an
  OpenAI-compatible interface. Added foundational example
  `14o-function-calling-gemini-openai-format.py`.

- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API.
  Added foundational example `19a-azure-realtime-beta.py`.

- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI
  Gemini models. Added foundational example
  `14p-function-calling-gemini-vertex-ai.py`.

- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:

  - The `'gpt-4o-transcribe'` input audio transcription model, along
    with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property.

    ```python
    session_properties = SessionProperties(
      # ...
      input_audio_noise_reduction=InputAudioNoiseReduction(
        type="near_field" # also supported: "far_field"
      )
      # ...
    )
    ```

  - The `'semantic_vad'` `turn_detection` session property value, a more
    sophisticated model for detecting when the user has stopped speaking.
  - `on_conversation_item_created` and `on_conversation_item_updated`
    events to `OpenAIRealtimeBetaLLMService`.

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
      ...  # handle the newly created item

    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
      # `item` may not always be available here
      ...
    ```

  - The `retrieve_conversation_item(item_id)` method for introspecting a
    conversation item on the server.

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```
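
The `PatternPairAggregator` entry above describes matching content between pattern pairs even when the delimiters are split across streamed chunks. The following is a minimal sketch of that idea, not the aggregator's actual code: `PairScanner`, `feed()` and the `<voice>` tags are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


class PairScanner:
    """Hypothetical helper: buffers streamed text and extracts delimited spans."""

    def __init__(self, start: str, end: str):
        self._start = start
        self._end = end
        self._buffer = ""

    def feed(self, chunk: str) -> Tuple[str, List[str]]:
        """Add a chunk; return (plain text safe to release, completed matches)."""
        self._buffer += chunk
        released = ""
        matches: List[str] = []
        while True:
            start_idx = self._buffer.find(self._start)
            if start_idx == -1:
                # No opening delimiter: release all but a tail that may begin one.
                keep = self._partial_start_len()
                cut = len(self._buffer) - keep
                released += self._buffer[:cut]
                self._buffer = self._buffer[cut:]
                break
            end_idx = self._buffer.find(self._end, start_idx + len(self._start))
            if end_idx == -1:
                # Opened but not closed yet: release the text before it and wait.
                released += self._buffer[:start_idx]
                self._buffer = self._buffer[start_idx:]
                break
            released += self._buffer[:start_idx]
            matches.append(self._buffer[start_idx + len(self._start):end_idx])
            self._buffer = self._buffer[end_idx + len(self._end):]
        return released, matches

    def _partial_start_len(self) -> int:
        # Longest buffer suffix that is a proper prefix of the start delimiter.
        for size in range(min(len(self._start) - 1, len(self._buffer)), 0, -1):
            if self._buffer.endswith(self._start[:size]):
                return size
        return 0


scanner = PairScanner("<voice>", "</voice>")
step1 = scanner.feed("Hello <vo")
step2 = scanner.feed("ice>narr")
step3 = scanner.feed("ator</voice> there")
```

The scanner holds back the partial `<vo` until it can tell whether a tag is starting, so the tag content `narrator` is only reported once the closing tag has arrived.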

### Changed

- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default
  transcription model.

- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.

- Function calls are now executed in tasks. This means that the pipeline will
  not be blocked while the function call is being executed.

- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is
  happening in the pipeline. There are a few settings to configure this
  behavior, see `PipelineTask` documentation for more details.

- All event handlers are now executed in separate tasks in order to prevent
  blocking the pipeline. Previously, an event handler that took some time to
  execute would block the pipeline until it completed.

- Updated `TranscriptProcessor` to support text output from
  `OpenAIRealtimeBetaLLMService`.

- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push
  a `TTSTextFrame`.

- Updated the default model for `CartesiaTTSService` and
  `CartesiaHttpTTSService` to `sonic-2`.
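
The entries above about function calls and event handlers running in tasks can be pictured with plain `asyncio`. This is a sketch of the general technique only: `slow_function_call`, `run_pipeline` and `run_with_interruption` are hypothetical names, and cancelling the task stands in for what an interruption might do when cancellation has been requested.

```python
import asyncio


async def slow_function_call() -> str:
    # Stand-in for a tool invocation that takes a while (e.g. a network request).
    await asyncio.sleep(0.05)
    return "result"


async def run_pipeline() -> list:
    events = []
    # Schedule the call as a task instead of awaiting it inline.
    call = asyncio.create_task(slow_function_call())
    # Frames keep being handled while the call is still pending.
    for frame in ("audio-1", "audio-2", "audio-3"):
        await asyncio.sleep(0)
        events.append(frame)
    # Collect the result once it is ready.
    events.append(await call)
    return events


async def run_with_interruption() -> bool:
    call = asyncio.create_task(slow_function_call())
    await asyncio.sleep(0)  # let the task start running
    call.cancel()  # hypothetical reaction to a user interruption
    try:
        await call
    except asyncio.CancelledError:
        return True  # the call was cancelled before producing a result
    return False
```

In `run_pipeline()` all three frames are recorded before the function result arrives, which is the non-blocking behavior the entries describe.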

### Deprecated

- Passing a `start_callback` to `LLMService.register_function()` is now
  deprecated. Move the code from the start callback into the function call
  itself.

- `TTSService` parameter `text_filter` is now deprecated, use `text_filters`
  instead which is now a list. This allows passing multiple filters that will be
  executed in order.
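
Because `text_filters` is a list whose filters run in order, the order can change the result. A minimal sketch of that chaining, assuming filters are plain callables (Pipecat's filters are classes, and `apply_filters`, `strip_asterisks` and `collapse_spaces` are hypothetical names used only here):

```python
from typing import Callable, List

TextFilter = Callable[[str], str]


def apply_filters(text: str, filters: List[TextFilter]) -> str:
    """Run each filter on the output of the previous one, in list order."""
    for text_filter in filters:
        text = text_filter(text)
    return text


def strip_asterisks(text: str) -> str:
    # Remove Markdown emphasis markers that a TTS engine should not read aloud.
    return text.replace("*", "")


def collapse_spaces(text: str) -> str:
    # Replace every run of whitespace with a single space.
    return " ".join(text.split())


cleaned = apply_filters("a  *  b", [strip_asterisks, collapse_spaces])
reordered = apply_filters("a  *  b", [collapse_spaces, strip_asterisks])
```

Stripping first and collapsing second yields `a b`, while the reverse order leaves a double space behind where the asterisk was removed.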

### Removed

- Removed deprecated `audio.resample_audio()`, use `create_default_resampler()`
  instead.

- Removed deprecated `stt_service` parameter from `STTMuteFilter`.

- Removed deprecated RTVI processors, use an `RTVIObserver` instead.

- Removed deprecated `AWSTTSService`, use `PollyTTSService` instead.

- Removed deprecated field `tier` from `DailyTranscriptionSettings`, use `model`
  instead.

- Removed deprecated `pipecat.vad` package, use `pipecat.audio.vad` instead.

### Fixed

- Fixed an assistant aggregator issue that could cause assistant text to be
  split into multiple chunks during function calls.

- Fixed an assistant aggregator issue that was causing assistant text to not be
  added to the context during function calls. This could lead to duplications.

- Fixed a `SegmentedSTTService` issue that was causing audio to be sent
  prematurely to the STT service. Instead of analyzing the volume in this
  service we rely on VAD events which use both VAD and volume.

- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be
  duplicated in the context when pushing `LLMMessagesAppendFrame` frames.

- Fixed an issue with `SegmentedSTTService` based services
  (e.g. `GroqSTTService`) that was not allowing audio to pass through
  downstream.

- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would treat
  text between spelling-out tags as an end of sentence.

- Fixed a `match_endofsentence` issue that would result in floating point
  numbers being considered an end of sentence.

- Fixed a `match_endofsentence` issue that would result in emails being
  considered an end of sentence.

- Fixed an issue where the RTVI message `disconnect-bot` was pushing an
  `EndFrame`, resulting in the pipeline not shutting down. It now pushes an
  `EndTaskFrame` upstream to shut down the pipeline.

- Fixed an issue with the `GoogleSTTService` where stream timeouts during
  periods of inactivity were causing connection failures. The service now
  properly detects timeout errors and handles reconnection gracefully,
  ensuring continuous operation even after periods of silence or when using an
  `STTMuteFilter`.

- Fixed an issue in `RimeTTSService` where the last line of text sent didn't
  result in an audio output being generated.

- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
  - The `conversation.item.input_audio_transcription.delta` server message,
    which was added server-side at some point and not handled client-side.
  - Errors reported by the `response.done` server message.
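
The two `match_endofsentence` fixes above concern periods that do not end a sentence. One way to picture the idea is a regular expression that only accepts a terminator followed by whitespace or the end of the text. This is an illustrative sketch, not the library's actual pattern, and `find_end_of_sentence` is a hypothetical name.

```python
import re

# A terminator counts only when followed by whitespace or the end of the text,
# so the dots inside "3.14" or "a@b.com" are skipped.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def find_end_of_sentence(text: str) -> int:
    """Return the index just past the first sentence terminator, or 0 if none."""
    match = _SENTENCE_END.search(text)
    return match.end() if match else 0


no_float_split = find_end_of_sentence("Pi is 3.14 roughly")
no_email_split = find_end_of_sentence("Write to a@b.com today")
after_price = find_end_of_sentence("It costs 3.50. Next")
```

A streaming caller would still need to wait for more text when the buffer ends in something like `3.`, since the next chunk may continue the number.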

### Other

- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.

- Added a new Ultravox example
  `examples/foundational/07u-interruptible-ultravox.py`.

- Added new Neuphonic examples
  `examples/foundational/07v-interruptible-neuphonic.py` and
  `examples/foundational/07v-interruptible-neuphonic-http.py`.

- Added a new example `examples/foundational/36-user-email-gathering.py` to show
  how to gather user emails. The example uses Cartesia's `<spell></spell>`
  tags and Rime's `spell()` function to spell out the emails for confirmation.

- Updated the `34-audio-recording.py` example to include an STT processor.

- Added foundational example `35-voice-switching.py` showing how to use the new
  `PatternPairAggregator`. This example shows how to encode information for the
  LLM to instruct TTS voice changes, but this can be used to encode any
  information into the LLM response, which you want to parse and use in other
  parts of your application.

- Added a Pipecat Cloud deployment example to the `examples` directory.

- Removed foundational examples 28b and 28c as the `TranscriptProcessor` no
  longer has an LLM dependency. Renamed foundational example 28a to
  `28-transcript-processor.py`.

## [0.0.58] - 2025-02-26

### Added

- Added track-specific audio event `on_track_audio_data` to
  `AudioBufferProcessor` for accessing separate input and output audio tracks.

- Pipecat version will now be logged on every application startup. This will
  help us identify what version we are running in case of any issues.

- Added a new `StopFrame` which can be used to stop a pipeline task while
  keeping the frame processors running. The frame processors could then be used
  in a different pipeline. The difference between a `StopFrame` and a
  `StopTaskFrame` is that, as with `EndFrame` and `EndTaskFrame`, the
  `StopFrame` is pushed from the task and the `StopTaskFrame` is pushed upstream
  inside the pipeline by any processor.

- Added a new `PipelineTask` parameter `observers` that replaces the previous
  `PipelineParams.observers`.

- Added a new `PipelineTask` parameter `check_dangling_tasks` to enable or
  disable checking for frame processors' dangling tasks when the Pipeline
  finishes running.

- Added new `on_completion_timeout` event for LLM services (all OpenAI-based
  services, Anthropic and Google). Note that this event will only get triggered
  if LLM timeouts are set up and the timeout was reached. It can be useful to
  retrigger another completion and see if the timeout was just a blip.

- Added new log observers `LLMLogObserver` and `TranscriptionLogObserver` that
  can be useful for debugging your pipelines.

- Added `room_url` property to `DailyTransport`.

- Added `addons` argument to `DeepgramSTTService`.

- Added `exponential_backoff_time()` to `utils.network` module.
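
The changelog does not show the signature of `exponential_backoff_time()`, so the following is only a sketch of the general technique under assumed names: `backoff_delay`, `min_wait`, `max_wait` and `multiplier` are hypothetical and may differ from the helper in `utils.network`.

```python
def backoff_delay(
    attempt: int,
    min_wait: float = 1.0,
    max_wait: float = 30.0,
    multiplier: float = 1.0,
) -> float:
    """Seconds to wait before retry number `attempt` (1-based), doubling each time."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Cap the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt - 1, 30)
    delay = multiplier * (2 ** exponent)
    # Clamp the result to the [min_wait, max_wait] range.
    return max(min_wait, min(delay, max_wait))


delays = [backoff_delay(attempt) for attempt in range(1, 7)]
```

With the defaults, the delay doubles from one second until it reaches the 30 second ceiling, which keeps reconnection attempts from hammering a service that is down.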

### Changed

- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one for
  the pipeline).

- Updated `PlayHTHttpTTSService` to take a `voice_engine` and `protocol` input
  in the constructor. The previous method of providing a `voice_engine` input
  that contains the engine and protocol is deprecated by PlayHT.

- The base `TTSService` class now strips leading newlines before sending text
  to the TTS provider. This change is to solve issues where some TTS providers,
  like Azure, would not output text due to newlines.

- `GrokLLMService` now uses `grok-2` as the default model.

- `AnthropicLLMService` now uses `claude-3-7-sonnet-20250219` as the default
  model.

- `RimeHttpTTSService` now requires an `aiohttp.ClientSession` to be passed to
  the constructor, like all the other HTTP-based services.

- `RimeHttpTTSService` doesn't use a default voice anymore.

- `DeepgramSTTService` now uses the new `nova-3` model by default. If you want
  to use the previous model you can pass `LiveOptions(model="nova-2-general")`.
  (see https://deepgram.com/learn/introducing-nova-3-speech-to-text-api)

  ```python
  stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
  ```

### Deprecated

- `PipelineParams.observers` is now deprecated, use the new `PipelineTask`
  parameter `observers` instead.

### Removed

- Removed `TransportParams.audio_out_is_live` since it was not being used at
  all.

### Fixed

- Fixed an issue that would cause undesired interruptions via
  `EmulateUserStartedSpeakingFrame`.

- Fixed a `GoogleLLMService` issue that was causing an exception when sending
  inline audio in some cases.

- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to
  disconnect from the TTS service before audio from all the contexts was
  received. This affected services like Cartesia and Rime.

- Fixed an issue that prevented passing an `OpenAILLMContext` when creating
  `GoogleLLMService`'s context aggregators.

- Fixed an `ElevenLabsTTSService`, `FishAudioTTSService`, `LMNTTTSService`
  and `PlayHTTTSService` issue that was resulting in audio requested before an
  interruption being played after the interruption.

- Fixed `match_endofsentence` support for ellipses.

- Fixed an issue where `EndTaskFrame` was not triggering
  `on_client_disconnected` or closing the WebSocket in FastAPI.

- Fixed an issue in `DeepgramSTTService` where the `sample_rate` passed to the
  `LiveOptions` was not being used, causing the service to use the default
  sample rate of the pipeline.

- Fixed a context aggregator issue that would not append the LLM text response
  to the context if a function call happened in the same LLM turn.

- Fixed an issue that was causing HTTP TTS services to push `TTSStoppedFrame`
  more than once.

- Fixed a `FishAudioTTSService` issue where `TTSStoppedFrame` was not being
  pushed.

- Fixed an issue where `start_callback` was not invoked for some LLM services.

- Fixed an issue that would cause `DeepgramSTTService` to stop working after an
  error occurred (e.g. sudden network loss). If the network recovered we would
  not reconnect.

- Fixed an `STTMuteFilter` issue that would not mute user audio frames,
  causing transcriptions to be generated by the STT service.

### Other

- Added Gemini support to `examples/phone-chatbot`.

- Added foundational example `34-audio-recording.py` showing how to use the
  AudioBufferProcessor callbacks to save merged and track recordings.

## [0.0.57] - 2025-02-14

### Added

- Added new `AudioContextWordTTSService`. This is a TTS base class for TTS
  services that handle multiple separate audio requests.

- Added new frames `EmulateUserStartedSpeakingFrame` and
  `EmulateUserStoppedSpeakingFrame` which can be used to emulate VAD behavior
  when VAD is not present or is not being triggered.

- Added a new `audio_in_stream_on_start` field to `TransportParams`.

- Added a new method `start_audio_in_streaming` in the `BaseInputTransport`.

  - This method should be used to start receiving the input audio in case the
    field `audio_in_stream_on_start` is set to `false`.

- Added support for the `RTVIProcessor` to handle buffered audio in `base64`
  format, converting it into `InputAudioRawFrame` for transport.

- Added support for the `RTVIProcessor` to trigger `start_audio_in_streaming`
  only after the `client-ready` message.

- Added new `MUTE_UNTIL_FIRST_BOT_COMPLETE` strategy to `STTMuteStrategy`. This
  strategy starts muted and remains muted until the first bot speech completes,
  ensuring the bot's first response cannot be interrupted. This complements the
  existing `FIRST_SPEECH` strategy which only mutes during the first detected
  bot speech.

- Added support for Google Cloud Speech-to-Text V2 through `GoogleSTTService`.

- Added `RimeTTSService`, a new `WordTTSService`. Updated the foundational
  example `07q-interruptible-rime.py` to use `RimeTTSService`.

- Added support for Groq's Whisper API through the new `GroqSTTService` and
  OpenAI's Whisper API through the new `OpenAISTTService`. Introduced a new
  base class `BaseWhisperSTTService` to handle common Whisper API
  functionality.

- Added `PerplexityLLMService` for Perplexity API integration, with an
  OpenAI-compatible interface. Also added foundational example
  `14n-function-calling-perplexity.py`.

- Added `DailyTransport.update_remote_participants()`. This allows you to update
  remote participants' settings, like their permissions or which of their
  devices are enabled. Requires that the local participant has participant
  admin permission.

### Changed

- A colon `:` is no longer considered an end of sentence.

- Updated `DailyTransport` to respect the `audio_in_stream_on_start` field,
  ensuring it only starts receiving the audio input if it is enabled.

- Updated `FastAPIWebsocketOutputTransport` to send `TransportMessageFrame` and
  `TransportMessageUrgentFrame` to the serializer.

- Updated `WebsocketServerOutputTransport` to send `TransportMessageFrame` and
  `TransportMessageUrgentFrame` to the serializer.

- Enhanced `STTMuteConfig` to validate strategy combinations, preventing
  `MUTE_UNTIL_FIRST_BOT_COMPLETE` and `FIRST_SPEECH` from being used together
  as they handle first bot speech differently.

- Updated foundational example `07n-interruptible-google.py` to use all Google
  services.

- `RimeHttpTTSService` now uses the `mistv2` model by default.

- Improved error handling in `AzureTTSService` to properly detect and log
  synthesis cancellation errors.

- Enhanced `WhisperSTTService` with full language support and improved model
  documentation.

- Updated foundational example `14f-function-calling-groq.py` to use
  `GroqSTTService` for transcription.

- Updated `GroqLLMService` to use `llama-3.3-70b-versatile` as the default
  model.

- `RTVIObserver` doesn't handle `LLMSearchResponseFrame` frames anymore. For
  now, to handle those frames you need to create a `GoogleRTVIObserver`
  instead.
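
The `STTMuteConfig` validation listed above can be pictured with a small
standalone sketch. It does not import Pipecat: `MuteStrategy` and `MuteConfig`
are hypothetical stand-ins, and only the two strategy names and the rule that
they cannot be combined come from the entry itself.

```python
from dataclasses import dataclass
from enum import Enum


class MuteStrategy(Enum):
    # Hypothetical stand-in for `STTMuteStrategy`; only two members are modeled.
    FIRST_SPEECH = "first_speech"
    MUTE_UNTIL_FIRST_BOT_COMPLETE = "mute_until_first_bot_complete"


@dataclass
class MuteConfig:
    # Hypothetical stand-in for `STTMuteConfig`.
    strategies: set

    def __post_init__(self):
        conflicting = {
            MuteStrategy.FIRST_SPEECH,
            MuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE,
        }
        # Both strategies govern the first bot speech, so reject the pair.
        if conflicting <= self.strategies:
            raise ValueError(
                "MUTE_UNTIL_FIRST_BOT_COMPLETE and FIRST_SPEECH cannot be combined"
            )
```

Failing at construction time surfaces the conflict when the configuration is
built rather than in the middle of a call.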

### Deprecated

- `STTMuteFilter` constructor's `stt_service` parameter is now deprecated and
  will be removed in a future version. The filter now manages mute state
  internally instead of querying the STT service.

- `RTVI.observer()` is now deprecated, instantiate an `RTVIObserver` directly
  instead.

- All RTVI frame processors (e.g. `RTVISpeakingProcessor`,
  `RTVIBotLLMProcessor`) are now deprecated, instantiate an `RTVIObserver`
  instead.

### Fixed

- Fixed a `FalImageGenService` issue that was causing the event loop to be
  blocked while loading the downloaded image.

- Fixed a `CartesiaTTSService` service issue that would cause audio overlapping
  in some cases.

- Fixed a websocket-based service issue (e.g. `CartesiaTTSService`) that was
  preventing a reconnection after the server disconnected cleanly, which was
  causing an infinite loop instead.

- Fixed a `BaseOutputTransport` issue that was causing upstream frames to not
  be pushed upstream.

- Fixed multiple issues where user transcriptions were not being handled
  properly. It was possible for short utterances to not trigger VAD which would
  cause user transcriptions to be ignored. It was also possible for one or more
  transcriptions to be generated after VAD in which case they would also be
  ignored.

- Fixed an issue that was causing `BotStoppedSpeakingFrame` to be generated too
  late. This could then cause issues unblocking `STTMuteFilter` later than
  desired.

- Fixed an issue that was causing `AudioBufferProcessor` to not record
  synchronized audio.

- Fixed an `RTVI` issue that was causing `bot-tts-text` messages to be sent
  before being processed by the output transport.

- Fixed an issue (#1192) in ElevenLabs where we were trying to
  reconnect/disconnect the websocket connection even when the connection was
  already closed.

- Fixed an issue where `has_regular_messages` condition was always true in
  `GoogleLLMContext` due to `Part` having `function_call` & `function_response`
  with `None` values.

### Other

- Added new `instant-voice` example. This example showcases how to enable
  instant voice communication as soon as a user connects.

- Added new `local-input-select-stt` example. This example allows you to play
  with local audio inputs by selecting them through a nice text interface.

## [0.0.56] - 2025-02-06

### Changed

- Use `gemini-2.0-flash-001` as the default model for `GoogleLLMService`.

- Improved foundational examples 22b, 22c, and 22d to support function calling.
  With these base examples, `FunctionCallInProgressFrame` and
  `FunctionCallResultFrame` will no longer be blocked by the gates.

### Fixed

- Fixed `TkLocalTransport` and `LocalAudioTransport` issues that were causing
  errors on cleanup.

- Fixed an issue that was causing `tests.utils` import to fail because of
  logging setup.

- Fixed a `SentryMetrics` issue that was preventing any metrics from being sent
  to Sentry and was also preventing metrics frames from being pushed to the
  pipeline.

- Fixed an issue in `BaseOutputTransport` where incoming audio would not be
  resampled to the desired output sample rate.

- Fixed an issue with the `TwilioFrameSerializer` and `TelnyxFrameSerializer`
  where `twilio_sample_rate` and `telnyx_sample_rate` were incorrectly
  initialized to `audio_in_sample_rate`. Those values currently default to 8000
  and should be set manually from the serializer constructor if a different
  value is needed.

### Other

- Added a new `sentry-metrics` example.

## [0.0.55] - 2025-02-05

### Added

- Added a new `start_metadata` field to `PipelineParams`. The provided metadata
  will be set to the initial `StartFrame` being pushed from the `PipelineTask`.

- Added new fields to `PipelineParams` to control audio input and output sample
  rates for the whole pipeline. This allows controlling sample rates from a
  single place instead of having to specify sample rates in each
  service. Setting a sample rate to a service is still possible and will
  override the value from `PipelineParams`.

- Introduced audio resamplers (`BaseAudioResampler`). This is just a base class
  to implement audio resamplers. Currently, two implementations are provided:
  `SOXRAudioResampler` and `ResampyResampler`. A new
  `create_default_resampler()` has been added (replacing the now deprecated
  `resample_audio()`).

- It is now possible to specify the asyncio event loop that a `PipelineTask` and
  all the processors should run on by passing it as a new argument to the
  `PipelineRunner`. This could allow running pipelines in multiple threads each
  one with its own event loop.

- Added a new `utils.TaskManager`. Instead of a global task manager we now have
  a task manager per `PipelineTask`. In the previous version the task manager
  was global, so running multiple simultaneous `PipelineTask`s could result in
  spurious dangling task warnings. In order for all the processors to know
  about the task manager, we pass it through the `StartFrame`. This means that
  processors should create tasks when they receive a `StartFrame` but not
  before (because they don't have a task manager yet).

- Added `TelnyxFrameSerializer` to support Telnyx calls. A full running example
  has also been added to `examples/telnyx-chatbot`.

- Allow pushing silence audio frames before `TTSStoppedFrame`. This might be
  useful for testing purposes, for example, passing bot audio to an STT service
  which usually needs additional audio data to detect the utterance stopped.

- `TwilioSerializer` now supports transport message frames. With this we can
  create Twilio emulators.

- Added a new transport: `WebsocketClientTransport`.

- Added a `metadata` field to `Frame` which makes it possible to pass custom
  data to all frames.

- Added `test/utils.py` inside of pipecat package.
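
The precedence rule for the sample-rate fields added to `PipelineParams` in
this release (a rate set on a service overrides the pipeline-wide value) can be
sketched as follows. `effective_sample_rate` is a hypothetical helper written
for illustration, not a Pipecat function.

```python
from typing import Optional


def effective_sample_rate(service_rate: Optional[int], pipeline_rate: int) -> int:
    """Return the service's own rate when set, else the pipeline-wide rate."""
    # `None` models "not specified on the service" (an assumption of this sketch).
    return service_rate if service_rate is not None else pipeline_rate


# A service without its own rate inherits the pipeline value.
inherited = effective_sample_rate(None, 16000)
# A service with an explicit rate keeps it.
overridden = effective_sample_rate(24000, 16000)
```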

### Changed

- `GatedOpenAILLMContextAggregator` now requires keyword arguments. Also, a new
  `start_open` argument has been added to set the initial state of the gate.

- Added `organization` and `project` level authentication to
  `OpenAILLMService`.

- Improved the language checking logic in `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService` to properly handle language codes based on model
  compatibility, with appropriate warnings when language codes cannot be
  applied.

- Updated `GoogleLLMContext` to support pushing `LLMMessagesUpdateFrame`s that
  contain a combination of function calls, function call responses, system
  messages, or just messages.

- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new
  `OutputDTMFFrame` frame.

### Deprecated

- `resample_audio()` is now deprecated, use `create_default_resampler()`
  instead.

### Removed

- `AudioBufferProcessor.reset_audio_buffers()` has been removed, use
  `AudioBufferProcessor.start_recording()` and
  `AudioBufferProcessor.stop_recording()` instead.

### Fixed

- Fixed an `AudioBufferProcessor` issue that would cause crackling in some
  recordings.

- Fixed an issue in `AudioBufferProcessor` where user callback would not be
  called on task cancellation.

- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence
  padding in some cases.

- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009
  websocket error by increasing the max message size limit to 16MB.

- Fixed a `DailyTransport` issue that would cause events to be triggered before
  join finished.

- Fixed a `PipelineTask` issue that was preventing processors from being
  cleaned up after cancelling the task.

- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not
  cause the task to finish. However, using `PipelineTask.cancel()` is still the
  recommended way to cancel a task.

### Other

- Improved Unit Test `run_test()` to use `PipelineTask` and
  `PipelineRunner`. There's now also some control around `StartFrame` and
  `EndFrame`. The `EndTaskFrame` has been removed since it doesn't seem
  necessary with this new approach.

- Updated `twilio-chatbot` with a few new features: use 8000 sample rate and
  avoid resampling, a new client useful for stress testing and testing locally
  without the need to make phone calls. Also, added audio recording on both the
  client and the server to make sure the audio sounds good.

- Updated examples to use `task.cancel()` to immediately exit the example when a
  participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing
  an `EndFrame` causes the bot to run through everything that is internally
  queued (which could take some seconds). Note that using `task.cancel()` might
  not always be the best option and pushing an `EndFrame` could still be
  desirable to make sure all the pipeline is flushed.
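
The trade-off between `task.cancel()` and pushing an `EndFrame` described in
the last entry can be illustrated with plain `asyncio`. This is a sketch under
stated assumptions, not Pipecat code: `END` is a hypothetical sentinel playing
the role of an `EndFrame`, and `worker` stands in for a pipeline draining its
queue.

```python
import asyncio

END = object()  # hypothetical sentinel playing the role of an `EndFrame`


async def worker(queue, handled):
    # Handle queued items in order until the sentinel arrives.
    while True:
        item = await queue.get()
        if item is END:
            return
        await asyncio.sleep(0.01)  # simulate work per queued item
        handled.append(item)


async def stop_gracefully():
    queue, handled = asyncio.Queue(), []
    for item in ("a", "b", "c", END):
        queue.put_nowait(item)
    await asyncio.create_task(worker(queue, handled))
    return handled  # everything queued before END was handled


async def stop_by_cancel():
    queue, handled = asyncio.Queue(), []
    for item in ("a", "b", "c"):
        queue.put_nowait(item)
    task = asyncio.create_task(worker(queue, handled))
    await asyncio.sleep(0)  # let the worker start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return handled  # items still queued or in progress are dropped
```

The graceful path takes as long as the queued work does, while cancellation
returns right away and discards whatever was still pending.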

## [0.0.54] - 2025-01-27

### Added

- In order to create tasks in Pipecat frame processors it is now recommended to
  use `FrameProcessor.create_task()` (which uses the new
  `utils.asyncio.create_task()`). It takes care of uncaught exceptions, task
  cancellation handling and task management. To cancel or wait for a task there
  is `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All of
  Pipecat processors have been updated accordingly. Also, when a pipeline runner
  finishes, a warning about dangling tasks might appear, which indicates if any
  of the created tasks was never cancelled or awaited for (using these new
  functions).

- It is now possible to specify the period of the `PipelineTask` heartbeat
  frames with `heartbeats_period_secs`.

- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic models
  for meeting token creation in `get_token` method of `DailyRESTHelper`.

- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.

- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings
  to a custom AWS bucket.
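
The task bookkeeping behind `FrameProcessor.create_task()`,
`FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()` can be
approximated with plain `asyncio`. The `TrackedTasks` class below is a
hypothetical helper, not Pipecat's implementation; it only shows how a task
that is never cancelled or awaited ends up reported as dangling.

```python
import asyncio


class TrackedTasks:
    """Hypothetical helper, not Pipecat's implementation."""

    def __init__(self):
        self._tasks = set()

    def create_task(self, coro, name):
        task = asyncio.get_running_loop().create_task(coro, name=name)
        self._tasks.add(task)
        return task

    async def cancel_task(self, task):
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
        self._tasks.discard(task)

    async def wait_for_task(self, task):
        try:
            await task
        finally:
            self._tasks.discard(task)

    def dangling(self):
        # Names of tasks that were created but never cancelled or awaited.
        return sorted(t.get_name() for t in self._tasks)


async def demo():
    tasks = TrackedTasks()
    short = tasks.create_task(asyncio.sleep(0), "short")
    forgotten = tasks.create_task(asyncio.sleep(3600), "forgotten")
    await tasks.wait_for_task(short)
    before_cleanup = tasks.dangling()
    await tasks.cancel_task(forgotten)
    return before_cleanup, tasks.dangling()
```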

### Changed

- Enhanced `UserIdleProcessor` with retry functionality and control over idle
  monitoring via new callback signature `(processor, retry_count) -> bool`.
  Updated the `17-detect-user-idle.py` to show how to use the `retry_count`.

- Added defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio
  truncation. Audio truncation errors during interruptions now log a warning
  and allow the session to continue instead of throwing an exception.

- Modified `TranscriptProcessor` to use TTS text frames for more accurate assistant
  transcripts. Assistant messages are now aggregated based on bot speaking boundaries
  rather than LLM context, providing better handling of interruptions and partial
  utterances.

- Updated foundational examples `28a-transcription-processor-openai.py`,
  `28b-transcript-processor-anthropic.py`, and
  `28c-transcription-processor-gemini.py` to use the updated
  `TranscriptProcessor`.
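
A callback with the new `UserIdleProcessor` signature
`(processor, retry_count) -> bool` might look like the sketch below. Only the
signature comes from the entry above. The meaning of the return value (`True`
keeps monitoring, `False` stops), the count starting at 1, and the
`FakeIdleProcessor` driver are assumptions made so the example runs on its own.

```python
import asyncio


async def handle_user_idle(processor, retry_count: int) -> bool:
    """Callback with the `(processor, retry_count) -> bool` shape.

    Assumption: returning True keeps idle monitoring active and
    returning False stops it.
    """
    if retry_count <= 2:
        processor.prompts.append(f"Are you still there? (attempt {retry_count})")
        return True
    processor.prompts.append("Ending the call.")
    return False


class FakeIdleProcessor:
    """Hypothetical stand-in that fires the callback on each idle timeout."""

    def __init__(self, callback):
        self.callback = callback
        self.prompts = []

    async def simulate_idle_timeouts(self, max_timeouts: int) -> int:
        retry_count = 0
        for _ in range(max_timeouts):
            retry_count += 1  # assumption: the first idle event reports 1
            if not await self.callback(self, retry_count):
                break
        return retry_count
```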

### Fixed

- Fixed a `GeminiMultimodalLiveLLMService` issue that was preventing the user
  from pushing initial LLM assistant messages (using `LLMMessagesAppendFrame`).

- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`,
  `ParallelPipeline` and `UserIdleProcessor`.

- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted
  in an error.

- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not
  being processed, in cases where the `_user_audio_buffer` was smaller than the
  buffer size.

### Performance

- Replaced audio resampling library `resampy` with `soxr`. Resampling a 2:21s
  audio file from 24KHz to 16KHz took 1.41s with `resampy` and 0.031s with
  `soxr` with similar audio quality.
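
For a sense of scale, the timings quoted above work out as follows. The two
timings are taken from the entry; reading "2:21s" as 2 minutes 21 seconds is an
assumption.

```python
resampy_secs = 1.41
soxr_secs = 0.031
audio_secs = 2 * 60 + 21  # assumes "2:21" means 2 min 21 s

# How many times faster `soxr` was than `resampy` on this file.
speedup = resampy_secs / soxr_secs

# How many seconds of audio each library resampled per second of compute.
resampy_realtime_factor = audio_secs / resampy_secs
soxr_realtime_factor = audio_secs / soxr_secs

print(f"speedup: {speedup:.1f}x")
```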

### Other

- Added initial unit test infrastructure.

## [0.0.53] - 2025-01-18

### Added

- Added `ElevenLabsHttpTTSService` which uses ElevenLabs' HTTP API instead of
  the websocket one.

- Introduced pipeline frame observers. Observers can view all the frames that go
  through the pipeline without the need to inject processors in the
  pipeline. This can be useful, for example, to implement frame loggers or
  debuggers among other things. The example
  `examples/foundational/30-observer.py` shows how to add an observer to a
  pipeline for debugging.

- Introduced heartbeat frames. The pipeline task can now push periodic
  heartbeats down the pipeline when `enable_heartbeats=True`. Heartbeats are
  system frames that are supposed to make it all the way to the end of the
  pipeline. When a heartbeat frame is received the traversing time (i.e. the
  time it took to go through the whole pipeline) will be displayed (with TRACE
  logging) otherwise a warning will be shown. The example
  `examples/foundational/31-heartbeats.py` shows how to enable heartbeats and
  forces warnings to be displayed.

- Added `LLMTextFrame` and `TTSTextFrame` which should be pushed by LLM and TTS
  services respectively instead of `TextFrame`s.

- Added `OpenRouter` for OpenRouter integration with an OpenAI-compatible
  interface. Added foundational example `14m-function-calling-openrouter.py`.

- Added a new `WebsocketService` based class for TTS services, containing
  base functions and retry logic.

- Added `DeepSeekLLMService` for DeepSeek integration with an OpenAI-compatible
  interface. Added foundational example `14l-function-calling-deepseek.py`.

- Added `FunctionCallResultProperties` dataclass to provide a structured way to
  control function call behavior, including:

  - `run_llm`: Controls whether to trigger LLM completion
  - `on_context_updated`: Optional callback triggered after context update

- Added a new foundational example `07e-interruptible-playht-http.py` for easy
  testing of `PlayHTHttpTTSService`.

- Added support for Google TTS Journey voices in `GoogleTTSService`.

- Added `29-livekit-audio-chat.py` as a new foundational example for
  `LiveKitTransportLayer`.

- Added `enable_prejoin_ui`, `max_participants` and `start_video_off` params
  to `DailyRoomProperties`.

- Added `session_timeout` to `FastAPIWebsocketTransport` and
  `WebsocketServerTransport` for configuring session timeouts (in
  seconds). Triggers `on_session_timeout` for custom timeout handling.
  See [examples/websocket-server/bot.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/websocket-server/bot.py).

- Added the new modalities option and helper function to set Gemini output
  modalities.

- Added `examples/foundational/26d-gemini-live-text.py`, which uses Gemini with
  the TEXT modality and a separate TTS provider for speech output.
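
The two fields listed for `FunctionCallResultProperties` in this release can be
pictured with a standalone sketch. `ResultProperties` and `apply_result` are
hypothetical and do not reproduce Pipecat's aggregator. The field names come
from the entry above, while their types, their defaults, and the fallback to
running the LLM when `run_llm` is unset are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ResultProperties:
    """Hypothetical mirror of the two listed fields."""

    run_llm: Optional[bool] = None
    on_context_updated: Optional[Callable[[], Awaitable[None]]] = None


async def apply_result(context: list, result: str, props: ResultProperties) -> bool:
    """Append a tool result, fire the callback, and report whether to run the LLM."""
    context.append({"role": "tool", "content": result})
    if props.on_context_updated is not None:
        await props.on_context_updated()
    # Assumption: an unset `run_llm` falls back to running the LLM.
    return True if props.run_llm is None else props.run_llm
```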

### Changed

- Modified `UserIdleProcessor` to start monitoring only after first
  conversation activity (`UserStartedSpeakingFrame` or
  `BotStartedSpeakingFrame`) instead of immediately.

- Modified `OpenAIAssistantContextAggregator` to support controlled completions
  and to emit context update callbacks via `FunctionCallResultProperties`.

- Added `aws_session_token` to the `PollyTTSService`.

- Changed the default model for `PlayHTHttpTTSService` to `Play3.0-mini-http`.

- `api_key`, `aws_access_key_id` and `region` are no longer required parameters
  for the `PollyTTSService` (`AWSTTSService`).

- Added `session_timeout` example in `examples/websocket-server/bot.py` to
  handle session timeout event.

- Changed `InputParams` in
  `src/pipecat/services/gemini_multimodal_live/gemini.py` to support different
  modalities.

- Changed `DeepgramSTTService` to send `finalize` event whenever VAD detects
  `UserStoppedSpeakingFrame`. This helps in faster transcriptions and clearing
  the `Deepgram` audio buffer.

### Fixed

- Fixed an issue where `DeepgramSTTService` was not generating metrics using
  pipeline's VAD.

- Fixed `UserIdleProcessor` not properly propagating `EndFrame`s through the
  pipeline.

- Fixed an issue where websocket based TTS services could incorrectly terminate
  their connection due to a retry counter not resetting.

- Fixed a `PipelineTask` issue that would cause a dangling task after stopping
  the pipeline with an `EndFrame`.

- Fixed an import issue for `PlayHTHttpTTSService`.

- Fixed an issue where languages couldn't be used with the `PlayHTHttpTTSService`.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` audio chunks were hitting
  an error when truncating audio content.

- Fixed an issue where setting the voice and model for `RimeHttpTTSService`
  wasn't working.

- Fixed an issue where `IdleFrameProcessor` and `UserIdleProcessor` were getting
  initialized before the start of the pipeline.

## [0.0.52] - 2024-12-24

### Added

- Added constructor arguments for `GoogleLLMService` to directly set `tools`
  and `tool_config`.

- Added a smart turn detection example
  (`22d-natural-conversation-gemini-audio.py`) that leverages Gemini 2.0
  capabilities.
  (see https://x.com/kwindla/status/1870974144831275410)

- Added `DailyTransport.send_dtmf()` to send dial-out DTMF tones.

- Added `DailyTransport.sip_call_transfer()` to forward SIP and PSTN calls to
  another address or number. For example, transfer a SIP call to a different
  SIP address or transfer a PSTN phone number to a different PSTN phone number.

- Added `DailyTransport.sip_refer()` to transfer incoming SIP/PSTN calls from
  outside Daily to another SIP/PSTN address.

- Added an `auto_mode` input parameter to `ElevenLabsTTSService`. `auto_mode`
  is set to `True` by default. Enabling this setting disables the chunk
  schedule and all buffers, which reduces latency.

- Added `KoalaFilter` which implements on-device noise reduction using Koala
  Noise Suppression.
  (see https://picovoice.ai/platform/koala/)

- Added `CerebrasLLMService` for Cerebras integration with an OpenAI-compatible
  interface. Added foundational example `14k-function-calling-cerebras.py`.

- Pipecat now supports Python 3.13. We had a dependency on the `audioop` package
  which was deprecated and has been removed in Python 3.13. We are now using
  `audioop-lts` (https://github.com/AbstractUmbra/audioop) to provide the same
  functionality.

- Added timestamped conversation transcript support:

  - New `TranscriptProcessor` factory provides access to user and assistant
    transcript processors.
  - `UserTranscriptProcessor` processes user speech with timestamps from
    transcription.
  - `AssistantTranscriptProcessor` processes assistant responses with LLM
    context timestamps.
  - Messages emitted with ISO 8601 timestamps indicating when they were spoken.
  - Supports all LLM formats (OpenAI, Anthropic, Google) via standard message
    format.
  - New examples: `28a-transcription-processor-openai.py`,
    `28b-transcription-processor-anthropic.py`, and
    `28c-transcription-processor-gemini.py`.

- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino,
  Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian,
  Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).

### Changed

- `PlayHTTTSService` uses the new v4 websocket API, which also fixes an issue
  where text inputted to the TTS didn't return audio.

- The default model for `ElevenLabsTTSService` is now `eleven_flash_v2_5`.

- `OpenAIRealtimeBetaLLMService` now takes a `model` parameter in the
  constructor.

- Updated the default model for the `OpenAIRealtimeBetaLLMService`.

- Room expiration (`exp`) in `DailyRoomProperties` is now optional (`None`) by
  default instead of automatically setting a 5-minute expiration time. You must
  explicitly set expiration time if desired.
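
Since `exp` is no longer filled in automatically, code that relied on the old
five-minute lifetime has to compute the value itself. A minimal sketch, assuming
`exp` is a Unix timestamp in seconds; `room_expiry` is a hypothetical helper and
not part of Pipecat.

```python
import time
from typing import Optional


def room_expiry(minutes: float, now: Optional[float] = None) -> float:
    """Return a Unix timestamp `minutes` from `now` (defaults to the current time)."""
    base = time.time() if now is None else now
    return base + minutes * 60


# Reproduce the previous five-minute default explicitly.
exp = room_expiry(5)
```

The resulting value would then be supplied as `exp` when building
`DailyRoomProperties`.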

### Deprecated

- `AWSTTSService` is now deprecated, use `PollyTTSService` instead.

### Fixed

- Fixed token counting in `GoogleLLMService`. Tokens were summed incorrectly
  (double-counted in many cases).

- Fixed an issue that could cause the bot to stop talking if there was a user
  interruption before getting any audio from the TTS service.

- Fixed an issue that would cause `ParallelPipeline` to handle `EndFrame`
  incorrectly causing the main pipeline to not terminate or terminate too early.

- Fixed an audio stuttering issue in `FastPitchTTSService`.

- Fixed a `BaseOutputTransport` issue that was causing non-audio frames to be
  processed before the previous audio frames were played. This allows, for
  example, sending a frame `A` after a `TTSSpeakFrame`; frame `A` will only be
  pushed downstream after the audio generated from `TTSSpeakFrame` has been
  spoken.

- Fixed a `DeepgramSTTService` issue that was causing language to be passed as
  an object instead of a string, causing the connection to fail.

## [0.0.51] - 2024-12-16

### Fixed

- Fixed an issue in websocket-based TTS services that was causing infinite
  reconnections (Cartesia, ElevenLabs, PlayHT and LMNT).

## [0.0.50] - 2024-12-11

### Added

- Added `GeminiMultimodalLiveLLMService`. This is an integration for Google's
  Gemini Multimodal Live API, supporting:

  - Real-time audio and video input processing
  - Streaming text responses with TTS
  - Audio transcription for both user and bot speech
  - Function calling
  - System instructions and context management
  - Dynamic parameter updates (temperature, top_p, etc.)

- Added `AudioTranscriber` utility class for handling audio transcription with
  Gemini models.

- Added new context classes for Gemini:

  - `GeminiMultimodalLiveContext`
  - `GeminiMultimodalLiveUserContextAggregator`
  - `GeminiMultimodalLiveAssistantContextAggregator`
  - `GeminiMultimodalLiveContextAggregatorPair`

- Added new foundational examples for `GeminiMultimodalLiveLLMService`:

  - `26-gemini-multimodal-live.py`
  - `26a-gemini-live-transcription.py`
  - `26b-gemini-live-video.py`
  - `26c-gemini-live-video.py`

- Added `SimliVideoService`. This is an integration for Simli AI avatars.
  (see https://www.simli.com)

- Added NVIDIA Riva's `FastPitchTTSService` and `ParakeetSTTService`.
  (see https://www.nvidia.com/en-us/ai-data-science/products/riva/)

- Added `IdentityFilter`. This is the simplest frame filter that lets through
  all incoming frames.

- New `STTMuteStrategy` called `FUNCTION_CALL` which mutes the STT service
  during LLM function calls.

- `DeepgramSTTService` now exposes two event handlers `on_speech_started` and
  `on_utterance_end` that could be used to implement interruptions. See new
  example `examples/foundational/07c-interruptible-deepgram-vad.py`.

- Added `GroqLLMService`, `GrokLLMService`, and `NimLLMService` for Groq, Grok,
  and NVIDIA NIM API integration, with an OpenAI-compatible interface.

- New examples demonstrating function calling with Groq, Grok, Azure OpenAI,
  Fireworks, and NVIDIA NIM: `14f-function-calling-groq.py`,
  `14g-function-calling-grok.py`, `14h-function-calling-azure.py`,
  `14i-function-calling-fireworks.py`, and `14j-function-calling-nvidia.py`.

- In order to obtain the audio stored by the `AudioBufferProcessor` you can now
  also register an `on_audio_data` event handler. The `on_audio_data` handler
  will be called every time `buffer_size` (a new constructor argument) is
  reached. If `buffer_size` is 0 (default) you need to manually get the audio as
  before using `AudioBufferProcessor.merge_audio_buffers()`.

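```python
# `audiobuffer` is an `AudioBufferProcessor` instance and `save_audio` is a
# user-provided coroutine that stores the received audio.
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(processor, audio, sample_rate, num_channels):
    await save_audio(audio, sample_rate, num_channels)
```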

- Added a new RTVI message called `disconnect-bot`, which when handled pushes
  an `EndFrame` to trigger the pipeline to stop.

### Changed

- `STTMuteFilter` now supports multiple simultaneous muting strategies.

- `XTTSService` language now defaults to `Language.EN`.

- `SoundfileMixer` doesn't resample input files anymore to avoid startup
  delays. The sample rate of the provided sound files now needs to match the
  sample rate of the output transport.

- Input frames (audio, image and transport messages) are now system frames. This
  means they are processed immediately by all processors instead of being queued
  internally.

- Expanded the `transcriptions.language` module to support a superset of
  languages.

- Updated STT and TTS services with language options that match the supported
  languages for each service.

- Updated the `AzureLLMService` to use the `OpenAILLMService`. Updated the
  `api_version` to `2024-09-01-preview`.

- Updated the `FireworksLLMService` to use the `OpenAILLMService`. Updated the
  default model to `accounts/fireworks/models/firefunction-v2`.

- Updated the `simple-chatbot` example to include a JavaScript and React client
  example, using RTVI JS and React.

### Removed

- Removed `AppFrame`. This was used as a special user custom frame, but there's
  actually no use case for that.

### Fixed

- Fixed a `ParallelPipeline` issue that would cause system frames to be queued.

- Fixed `FastAPIWebsocketTransport` so it can work with binary data (e.g. using
  the protobuf serializer).

- Fixed an issue in `CartesiaTTSService` that could cause previous audio to be
  received after an interruption.

- Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket
  reconnection. Before, if an error occurred no reconnection was happening.

- Fixed a `BaseOutputTransport` issue that was causing audio to be discarded
  after an `EndFrame` was received.

- Fixed an issue in `WebsocketServerTransport` and `FastAPIWebsocketTransport`
  that would cause a busy loop when using audio mixer.

- Fixed a `DailyTransport` and `LiveKitTransport` issue where connections were
  being closed in the input transport prematurely. This was causing frames
  queued inside the pipeline to be discarded.

- Fixed an issue in `DailyTransport` that would cause some internal callbacks to
  not be executed.

- Fixed an issue where other frames were being processed while a `CancelFrame`
  was being pushed down the pipeline.

- `AudioBufferProcessor` now handles interruptions properly.

- Fixed a `WebsocketServerTransport` issue that would prevent interruptions with
  `TwilioSerializer` from working.

- `DailyTransport.capture_participant_video` now allows capturing user's screen
  share by simply passing `video_source="screenVideo"`.

- Fixed Google Gemini message handling to properly convert appended messages to
  Gemini's required format.

- Fixed an issue with `FireworksLLMService` where chat completions were failing
  by removing the `stream_options` from the chat completion options.

## [0.0.49] - 2024-11-17

### Added

- Added RTVI `on_bot_started` event which is useful in a single turn
  interaction.

- Added `DailyTransport` events `dialin-connected`, `dialin-stopped`,
  `dialin-error` and `dialin-warning`. Needs daily-python >= 0.13.0.

- Added `RimeHttpTTSService` and the `07q-interruptible-rime.py` foundational
  example.

- Added `STTMuteFilter`, a general-purpose processor that combines STT
  muting and interruption control. When active, it prevents both transcription
  and interruptions during bot speech. The processor supports multiple
  strategies: `FIRST_SPEECH` (mute only during bot's first
  speech), `ALWAYS` (mute during all bot speech), or `CUSTOM` (using provided
  callback).

- Added `STTMuteFrame`, a control frame that enables/disables speech
  transcription in STT services.

## [0.0.48] - 2024-11-10 "Antonio release"

### Added

- There's now an input queue in each frame processor. When you call
  `FrameProcessor.push_frame()` this will internally call
  `FrameProcessor.queue_frame()` on the next processor (upstream or downstream)
  and the frame will be internally queued (except system frames). Then, the
  queued frames will get processed. With this input queue it is also possible
  for FrameProcessors to block processing more frames by calling
  `FrameProcessor.pause_processing_frames()`. The way to resume processing
  frames is by calling `FrameProcessor.resume_processing_frames()`.

- Added audio filter `NoisereduceFilter`.

- Introduce input transport audio filters (`BaseAudioFilter`). Audio filters can
  be used to remove background noises before audio is sent to VAD.

- Introduce output transport audio mixers (`BaseAudioMixer`). Output transport
  audio mixers can be used, for example, to add background sounds or any other
  audio mixing functionality before the output audio is actually written to the
  transport.

- Added `GatedOpenAILLMContextAggregator`. This aggregator keeps the last
  received OpenAI LLM context frame and it doesn't let it through until the
  notifier is notified.

- Added `WakeNotifierFilter`. This processor expects a list of frame types and
  will execute a given callback predicate when a frame of any of those types is
  being processed. If the callback returns true the notifier will be notified.

- Added `NullFilter`. A null filter doesn't push any frames upstream or
  downstream. This is usually used to disable one of the pipelines in
  `ParallelPipeline`.

- Added `EventNotifier`. This can be used as a very simple synchronization
  feature between processors.

- Added `TavusVideoService`. This is an integration for Tavus digital twins.
  (see https://www.tavus.io/)

- Added `DailyTransport.update_subscriptions()`. This allows you to have fine
  grained control of what media subscriptions you want for each participant in a
  room.

- Added audio filter `KrispFilter`.
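
The input queue introduced in this release, together with
`pause_processing_frames()` and `resume_processing_frames()`, can be modeled
with an `asyncio.Queue` and an `asyncio.Event`. `QueuedProcessor` below is a
hypothetical stand-in, not Pipecat's `FrameProcessor`. It only reuses the method
names from the entry above and treats frames as plain strings.

```python
import asyncio


class QueuedProcessor:
    """Hypothetical stand-in for a frame processor with an input queue."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self._resume = asyncio.Event()
        self._resume.set()  # start unpaused
        self.processed = []

    async def queue_frame(self, frame, system=False):
        if system:
            # System frames skip the queue and are handled right away.
            self.processed.append(frame)
        else:
            await self._queue.put(frame)

    def pause_processing_frames(self):
        self._resume.clear()

    def resume_processing_frames(self):
        self._resume.set()

    async def run(self):
        while True:
            frame = await self._queue.get()
            await self._resume.wait()  # hold the frame while paused
            self.processed.append(frame)
            self._queue.task_done()


async def demo():
    proc = QueuedProcessor()
    worker = asyncio.create_task(proc.run())
    proc.pause_processing_frames()
    await proc.queue_frame("text-1")
    await proc.queue_frame("start", system=True)
    await asyncio.sleep(0.01)
    while_paused = list(proc.processed)
    proc.resume_processing_frames()
    await proc._queue.join()
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return while_paused, proc.processed
```

While paused, only the system frame is handled; the queued frame is processed
once `resume_processing_frames()` is called.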

### Changed

- The following `DailyTransport` functions are now `async` which means they need
  to be awaited: `start_dialout`, `stop_dialout`, `start_recording`,
  `stop_recording`, `capture_participant_transcription` and
  `capture_participant_video`.

- Changed default output sample rate to 24000. This changes all TTS services to
  output at 24000 and also the default output transport sample rate. This
  improves audio quality at the cost of some extra bandwidth.

- `AzureTTSService` now uses Azure websockets instead of HTTP requests.

- The previous `AzureTTSService` HTTP implementation is now
  `AzureHttpTTSService`.

### Fixed

- Websocket transports (FastAPI and Websocket) now synchronize with time before
  sending data. This allows for interruptions to just work out of the box.

- Improved bot speaking detection for all TTS services by using actual bot
  audio.

- Fixed an issue that was generating constant bot started/stopped speaking
  frames for HTTP TTS services.

- Fixed an issue that was causing stuttering with AWS TTS service.

- Fixed an issue with `PlayHTTTSService`, where the TTFB metrics were reporting
  very small time values.

- Fixed an issue where `AzureTTSService` wasn't initializing the specified
  language.

### Other

- Add `23-bot-background-sound.py` foundational example.

- Added a new foundational example `22-natural-conversation.py`. This example
  shows how to achieve a more natural conversation by detecting when the user
  ends a statement.

## [0.0.47] - 2024-10-22

### Added

- Added `AssemblyAISTTService` and corresponding foundational examples
  `07o-interruptible-assemblyai.py` and `13d-assemblyai-transcription.py`.

- Added a foundational example for Gladia transcription:
  `13c-gladia-transcription.py`

### Changed

- Updated `GladiaSTTService` to use the V2 API.

- Changed `DailyTransport` transcription model to `nova-2-general`.

### Fixed

- Fixed an issue that would cause an import error when importing
  `SileroVADAnalyzer` from the old package `pipecat.vad.silero`.

- Fixed `enable_usage_metrics` to control LLM/TTS usage metrics separately
  from `enable_metrics`.

## [0.0.46] - 2024-10-19

### Added

- Added `audio_passthrough` parameter to `STTService`. If enabled it allows
  audio frames to be pushed downstream in case other processors need them.

- Added input parameter options for `PlayHTTTSService` and
  `PlayHTHttpTTSService`.

### Changed

- Changed `DeepgramSTTService` model to `nova-2-general`.

- Moved `SileroVAD` audio processor to `processors.audio.vad`.

- Module `utils.audio` is now `audio.utils`. A new `resample_audio` function has
  been added.

- `PlayHTTTSService` now uses PlayHT websockets instead of HTTP requests.

- The previous `PlayHTTTSService` HTTP implementation is now
  `PlayHTHttpTTSService`.

- `PlayHTTTSService` and `PlayHTHttpTTSService` now use a `voice_engine` of
  `PlayHT3.0-mini`, which allows for multi-lingual support.

- Renamed `OpenAILLMServiceRealtimeBeta` to `OpenAIRealtimeBetaLLMService` to
  match other services.

### Deprecated

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are
  mostly deprecated, use `OpenAILLMContext` instead.

- The `vad` package is now deprecated and `audio.vad` should be used
  instead. The `vad` package will get removed in a future release.

### Fixed

- Fixed an issue that would cause an error if no VAD analyzer was passed to
  `LiveKitTransport` params.

- Fixed `SileroVAD` processor to support interruptions properly.

### Other

- Added `examples/foundational/07-interruptible-vad.py`. This is the same as
  `07-interruptible.py` but using the `SileroVAD` processor instead of passing
  the `VADAnalyzer` in the transport.

## [0.0.45] - 2024-10-16

### Changed

- Metrics messages have moved out from the transport's base output into RTVI.

## [0.0.44] - 2024-10-15

### Added

- Added support for OpenAI Realtime API with the new
  `OpenAILLMServiceRealtimeBeta` processor.
  (see https://platform.openai.com/docs/guides/realtime/overview)

- Added `RTVIBotTranscriptionProcessor` which will send the RTVI
  `bot-transcription` protocol message. These are TTS text aggregated (into
  sentences) messages.

- Added new input params to the `MarkdownTextFilter` utility. You can set
  `filter_code` to filter code from text and `filter_tables` to filter tables
  from text.

- Added `CanonicalMetricsService`. This processor uses the new
  `AudioBufferProcessor` to capture conversation audio and later send it to
  Canonical AI.
  (see https://canonical.chat/)

- Added `AudioBufferProcessor`. This processor can be used to buffer mixed user and
  bot audio. This can later be saved into an audio file or processed by some
  audio analyzer.

- Added `on_first_participant_joined` event to `LiveKitTransport`.

### Changed

- LLM text responses are now logged properly as unicode characters.

- `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`,
  `BotStartedSpeakingFrame`, `BotStoppedSpeakingFrame`, `BotSpeakingFrame` and
  `UserImageRequestFrame` are now based on `SystemFrame`.

### Fixed

- Merge `RTVIBotLLMProcessor`/`RTVIBotLLMTextProcessor` and
  `RTVIBotTTSProcessor`/`RTVIBotTTSTextProcessor` to avoid out of order issues.

- Fixed an issue in RTVI protocol that could cause a `bot-llm-stopped` or
  `bot-tts-stopped` message to be sent before a `bot-llm-text` or `bot-tts-text`
  message.

- Fixed `DeepgramSTTService` constructor settings not being merged with default
  ones.

- Fixed an issue in Daily transport that would cause tasks to be hanging if
  urgent transport messages were being sent from a transport event handler.

- Fixed an issue in `BaseOutputTransport` that would cause `EndFrame` to be
  pushed downstream too early and call `FrameProcessor.cleanup()` before letting
  the transport stop properly.

## [0.0.43] - 2024-10-10

### Added

- Added a new util called `MarkdownTextFilter` which is a subclass of a new
  base class called `BaseTextFilter`. This is a configurable utility which
  is intended to filter text received by TTS services.

- Added new `RTVIUserLLMTextProcessor`. This processor will send an RTVI
  `user-llm-text` message with the user content that was sent to the LLM.

### Changed

- `TransportMessageFrame` doesn't have an `urgent` field anymore, instead
  there's now a `TransportMessageUrgentFrame` which is a `SystemFrame` and
  therefore skips all internal queuing.

- For TTS services, input languages are now converted to match each service's
  language format.

### Fixed

- Fixed an issue where changing a language with the Deepgram STT service
  wouldn't apply the change. This was fixed by disconnecting and reconnecting
  when the language changes.

## [0.0.42] - 2024-10-02

### Added

- `SentryMetrics` has been added to report frame processor metrics to
  Sentry. This is now possible because `FrameProcessorMetrics` can now be passed
  to `FrameProcessor`.

- Added Google TTS service and corresponding foundational example
  `07n-interruptible-google.py`

- Added AWS Polly TTS support and `07m-interruptible-aws.py` as an example.

- Added InputParams to Azure TTS service.

- Added `LivekitTransport` (audio-only for now).

- RTVI 0.2.0 is now supported.

- All `FrameProcessors` can now register event handlers.

```python
tts = SomeTTSService(...)

# Register a handler for the processor's "on_connected" event.
@tts.event_handler("on_connected")
async def on_connected(processor):
    ...
```

- Added `AsyncGeneratorProcessor`. This processor can be used together with a
  `FrameSerializer` as an async generator. It provides a `generator()` function
  that returns an `AsyncGenerator` and that yields serialized frames.

- Added `EndTaskFrame` and `CancelTaskFrame`. These are new frames that are
  meant to be pushed upstream to tell the pipeline task to stop nicely or
  immediately respectively.

- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed)
  for OpenAI, Anthropic, and Together AI services along with corresponding
  setter functions.

- Added `sample_rate` as a constructor parameter for TTS services.

- Pipecat has a pipeline-based architecture. The pipeline consists of frame
  processors linked to each other. The elements traveling across the pipeline
  are called frames.

  To have a deterministic behavior the frames traveling through the pipeline
  should always be ordered, except system frames which are out-of-band
  frames. To achieve that, each frame processor should only output frames from a
  single task.

  In this version all the frame processors have their own task to push
  frames. That is, when `push_frame()` is called the given frame will be put
  into an internal queue (with the exception of system frames) and a frame
  processor task will push it out.

- Added pipeline clocks. A pipeline clock is used by the output transport to
  know when a frame needs to be presented. For that, all frames now have an
  optional `pts` field (presentation timestamp). There's currently just one
  clock implementation `SystemClock` and the `pts` field is currently only used
  for `TextFrame`s (audio and image frames will be next).

- A clock can now be specified to `PipelineTask` (defaults to
  `SystemClock`). This clock will be passed to each frame processor via the
  `StartFrame`.

- Added `CartesiaHttpTTSService`.

- `DailyTransport` now supports setting the audio bitrate to improve audio
  quality through the `DailyParams.audio_out_bitrate` parameter. The new
  default is 96kbps.

- `DailyTransport` now uses the number of audio output channels (1 or 2) to set
  mono or stereo audio when needed.

- Interruptions support has been added to `TwilioFrameSerializer` when using
  `FastAPIWebsocketTransport`.

- Added new `LmntTTSService` text-to-speech service.
  (see https://www.lmnt.com/)

- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`,
  and `STTLanguageUpdateFrame` frames to allow you to switch models, language
  and voices in TTS and STT services.

- Added new `transcriptions.Language` enum.
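
The per-processor push task introduced in this release (non-system frames are
queued and pushed out from a single task, system frames bypass the queue) can
be pictured with a small `asyncio` sketch. This is not Pipecat's
implementation: `SketchProcessor`, its `sent` list and the string "frames" are
hypothetical names used only for illustration.

```python
import asyncio


class SketchProcessor:
    """Hypothetical processor: ordered frames leave from one task only."""

    def __init__(self):
        self.sent = []  # stands in for the next processor in the pipeline
        self._queue = asyncio.Queue()
        self._task = None

    async def start(self):
        # One task per processor drains the queue, so output order is fixed.
        self._task = asyncio.create_task(self._push_loop())

    async def push_frame(self, frame, system=False):
        if system:
            # System frames are out-of-band: they skip the queue.
            self.sent.append(frame)
        else:
            await self._queue.put(frame)

    async def _push_loop(self):
        while True:
            frame = await self._queue.get()
            self.sent.append(frame)
            self._queue.task_done()

    async def stop(self):
        await self._queue.join()  # wait until every queued frame is pushed
        self._task.cancel()


async def demo():
    processor = SketchProcessor()
    await processor.start()
    await processor.push_frame("text-1")
    await processor.push_frame("text-2")
    await processor.push_frame("interruption", system=True)
    await processor.stop()
    return processor.sent
```

Running `asyncio.run(demo())` returns all three frames; the two queued frames
always keep their relative order, while the system frame may overtake them.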

### Changed

- Context frames are now pushed downstream from assistant context aggregators.

- Removed Silero VAD torch dependency.

- Updated individual update settings frame classes into a single
  `ServiceUpdateSettingsFrame` class.

- We now distinguish between input and output audio and image frames. We
  introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame`
  and `OutputImageRawFrame` (and other subclasses of those). The input frames
  usually come from an input transport and are meant to be processed inside the
  pipeline to generate new frames. However, the input frames will not be sent
  through an output transport. The output frames can also be processed by any
  frame processor in the pipeline and they are allowed to be sent by the output
  transport.

- `ParallelTask` has been renamed to `SyncParallelPipeline`. A
  `SyncParallelPipeline` is a frame processor that contains a list of different
  pipelines to be executed concurrently. The difference between a
  `SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame,
  the `SyncParallelPipeline` will wait for all the internal pipelines to
  complete. This is achieved by making sure the last processor in each of the
  pipelines is synchronous (e.g. an HTTP-based service that waits for the
  response).

- `StartFrame` is a system frame again to make sure it's processed immediately
  by all processors. `EndFrame` stays a control frame since it needs to be
  ordered, allowing the frames in the pipeline to be processed.

- Updated `MoondreamService` revision to `2024-08-26`.

- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation
  timestamps to their text output. This allows the output transport to push the
  text frames downstream at almost the same time the words are spoken. We say
  "almost" because currently the audio frames don't have presentation timestamp
  but they should be played at roughly the same time.

- `DailyTransport.on_joined` event now returns the full session data instead of
  just the participant.

- `CartesiaTTSService` is now a subclass of `TTSService`.

- `DeepgramSTTService` is now a subclass of `STTService`.

- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A
  `SegmentedSTTService` is a `STTService` where the provided audio is given in a
  big chunk (i.e. from when the user starts speaking until the user stops
  speaking) instead of a continuous stream.
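
The waiting behavior of `SyncParallelPipeline` described in this section
resembles `asyncio.gather`: one input is handed to every branch and nothing is
emitted until all branches have finished. A minimal sketch under that
assumption, with hypothetical branch coroutines standing in for the internal
pipelines (none of these names are Pipecat APIs):

```python
import asyncio


async def slow_branch(frame):
    # Stand-in for a pipeline ending in a synchronous (e.g. HTTP-based) service.
    await asyncio.sleep(0.02)
    return f"audio({frame})"


async def fast_branch(frame):
    await asyncio.sleep(0.01)
    return f"image({frame})"


async def sync_parallel(frame, branches):
    # Run every branch concurrently and wait for all of them to complete.
    return await asyncio.gather(*(branch(frame) for branch in branches))


async def demo():
    return await sync_parallel("hello", [slow_branch, fast_branch])
```

Even though `fast_branch` finishes first, the combined result is only
available once `slow_branch` has completed as well.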

### Fixed

- Fixed OpenAI multiple function calls.

- Fixed a Cartesia TTS issue that would cause audio to be truncated in some
  cases.

- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering
  tasks (after receiving an `EndFrame`) before the internal queue was emptied,
  causing the pipeline to finish prematurely.

- `StartFrame` should be the first frame every processor receives to avoid
  situations where things are not initialized (because initialization happens on
  `StartFrame`) and other frames come in resulting in undesired behavior.

### Performance

- `obj_id()` and `obj_count()` now use `itertools.count` avoiding the need of
  `threading.Lock`.
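
A minimal sketch of the idea, assuming module-level counters; only the names
`obj_id()` and `obj_count()` come from the entry above, the bodies are
illustrative. In CPython, `next()` on an `itertools.count` runs as a single C
call under the global interpreter lock, which is the usual reason an explicit
lock is considered unnecessary for this pattern.

```python
import itertools
from collections import defaultdict

# One global counter for ids, and one counter per class name for instance counts.
_ID_COUNTER = itertools.count()
_TYPE_COUNTERS = defaultdict(itertools.count)


def obj_id() -> int:
    """Return a process-wide unique, increasing integer."""
    return next(_ID_COUNTER)


def obj_count(obj) -> int:
    """Return how many objects of this class have been counted so far."""
    return next(_TYPE_COUNTERS[obj.__class__.__name__])
```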

### Other

- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).

## [0.0.41] - 2024-08-22

### Added

- Added `LivekitFrameSerializer` audio frame serializer.

### Fixed

- Fix `FastAPIWebsocketOutputTransport` variable name clash with subclass.

- Fix an `AnthropicLLMService` issue with empty arguments in function calling.

### Other

- Fixed `studypal` example errors.

## [0.0.40] - 2024-08-20

### Added

- VAD parameters can now be dynamically updated using the
  `VADParamsUpdateFrame`.

- `ErrorFrame` now has a `fatal` field to indicate the bot should exit if a
  fatal error is pushed upstream (false by default). A new `FatalErrorFrame`
  that sets this flag to true has been added.

- `AnthropicLLMService` now supports function calling and initial support for
  prompt caching.
  (see https://www.anthropic.com/news/prompt-caching)

- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as
  `output_format`.

- `TwilioFrameSerializer` can now specify Twilio's and Pipecat's desired sample
  rates to use.

- Added new `on_participant_updated` event to `DailyTransport`.

- Added `DailyRESTHelper.delete_room_by_name()` and
  `DailyRESTHelper.delete_room_by_url()`.

- Added LLM and TTS usage metrics. Those are enabled when
  `PipelineParams.enable_usage_metrics` is True.

- `AudioRawFrame`s are now pushed downstream from the base output
  transport. This allows capturing the exact words the bot says by adding an STT
  service at the end of the pipeline.

- Added new `GStreamerPipelineSource`. This processor can generate image or
  audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP
  stream or anything supported by GStreamer).

- Added `TransportParams.audio_out_is_live`. This flag is False by default and
  it is useful to indicate we should not synchronize audio with sporadic images.

- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control
  frames. These frames are pushed upstream and they should wrap
  `BotSpeakingFrame`.

- Transports now allow you to register event handlers without decorators.

### Changed

- Support RTVI message protocol 0.1. This includes new messages, support for
  message responses, support for actions, configuration, webhooks and a bunch
  of new cool stuff.
  (see https://docs.rtvi.ai/)

- `SileroVAD` dependency is now imported via pip's `silero-vad` package.

- `ElevenLabsTTSService` now uses `eleven_turbo_v2_5` model by default.

- `BotSpeakingFrame` is now a control frame.

- `StartFrame` is now a control frame similar to `EndFrame`.

- `DeepgramTTSService` now is more customizable. You can adjust the encoding and
  sample rate.

### Fixed

- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS really starts and
  stops. This allows for knowing when the bot starts and stops speaking even
  with asynchronous services (like Cartesia).

- Fixed `AzureSTTService` transcription frame timestamps.

- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would
  cause this function to stop working after the initial expiration elapsed.

- Improved `EndFrame` and `CancelFrame` handling. `EndFrame` should end things
  gracefully while a `CancelFrame` should cancel all running tasks as soon as
  possible.

- Fixed an issue in `AIService` that would cause a yielded `None` value to be
  processed.

- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and
  a first participant joins.

- Fixed a `BaseInputTransport` issue that was causing incoming system frames to
  be queued instead of being pushed immediately.

- Fixed a `BaseInputTransport` issue that was causing incoming start/stop
  interruption frames to not cancel tasks and not be processed properly.

### Other

- Added `studypal` example (from the Cartesia folks!).

- Most examples now use Cartesia.

- Added examples `foundational/19a-tools-anthropic.py`,
  `foundational/19b-tools-video-anthropic.py` and
  `foundational/19a-tools-togetherai.py`.

- Added examples `foundational/18-gstreamer-filesrc.py` and
  `foundational/18a-gstreamer-videotestsrc.py` that show how to use
  `GStreamerPipelineSource`

- Remove `requests` library usage.

- Cleanup examples and use `DailyRESTHelper`.

## [0.0.39] - 2024-07-23

### Fixed

- Fixed a regression introduced in 0.0.38 that would cause Daily transcription
  to stop the Pipeline.

## [0.0.38] - 2024-07-23

### Added

- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and
  `SileroVADAnalyzer`. This allows caching and various GitHub repo validations.

- Added `send_initial_empty_metrics` flag to `PipelineParams` to request for
  initial empty metrics (zero values). True by default.

### Fixed

- Fixed initial metrics format. It was using the wrong keys name/time instead of
  processor/value.

- STT services should be using ISO 8601 time format for transcription frames.

- Fixed an issue that would cause Daily transport to show a stop transcription
  error when actually none occurred.

## [0.0.37] - 2024-07-22

### Added

- Added `RTVIProcessor` which implements the RTVI-AI standard.
  See https://github.com/rtvi-ai

- Added `BotInterruptionFrame` which allows interrupting the bot while talking.

- Added `LLMMessagesAppendFrame` which allows appending messages to the current
  LLM context.

- Added `LLMMessagesUpdateFrame` which allows changing the LLM context for the
  one provided in this new frame.

- Added `LLMModelUpdateFrame` which allows updating the LLM model.

- Added `TTSSpeakFrame` which causes the bot to say some text. This text will
  not be part of the LLM context.

- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.

### Removed

- We remove the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These
  were added in the past to properly handle interruptions for the
  `LLMAssistantContextAggregator`. But the `LLMContextAggregator` is now based
  on `LLMResponseAggregator` which handles interruptions properly by just
  processing the `StartInterruptionFrame`, so there's no need for these extra
  frames any more.

### Fixed

- Fixed an issue with `StatelessTextTransformer` where it was pushing a string
  instead of a `TextFrame`.

- `TTSService` end of sentence detection has been improved. It now works with
  acronyms, numbers, hours and others.

- Fixed an issue in `TTSService` that would not properly flush the current
  aggregated sentence if an `LLMFullResponseEndFrame` was found.

### Performance

- `CartesiaTTSService` now uses websockets which improves speed. It also
  leverages the new Cartesia contexts, which maintain generated audio prosody
  when multiple inputs are sent, therefore improving audio quality a lot.

## [0.0.36] - 2024-07-02

### Added

- Added `GladiaSTTService`.
  See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition

- Added `XTTSService`. This is a local Text-To-Speech service.
  See https://github.com/coqui-ai/TTS

- Added `UserIdleProcessor`. This processor can be used to wait for any
  interaction with the user. If the user doesn't say anything within a given
  timeout a provided callback is called.

- Added `IdleFrameProcessor`. This processor can be used to wait for frames
  within a given timeout. If no frame is received within the timeout a provided
  callback is called.

- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
  upstream while the bot is talking.

- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer`
  or `SileroVAD`.

- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services like
  `DeepgramSTTService` need to process things asynchronously. For example, audio
  is sent to Deepgram but transcriptions are not returned immediately. In these
  cases we still require all frames (except system frames) to be pushed
  downstream from a single task. That's what `AsyncFrameProcessor` is for. It
  creates a task and all frames should be pushed from that task. So, whenever a
  new Deepgram transcription is ready that transcription will also be pushed
  from this internal task.

- The `MetricsFrame` now includes processing metrics if metrics are enabled. The
  processing metrics indicate the time a processor needs to generate all its
  output. Note that not all processors generate this kind of metric.
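
The timeout-and-callback pattern behind `UserIdleProcessor` and
`IdleFrameProcessor` from this release can be sketched with
`asyncio.wait_for`. This is an illustration only: `watch_idle`, the queue of
items and the `None` sentinel are hypothetical and not part of Pipecat.

```python
import asyncio


async def watch_idle(queue, timeout, on_idle):
    """Call `on_idle` whenever no item arrives within `timeout` seconds."""
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=timeout)
        except asyncio.TimeoutError:
            await on_idle()
            continue
        if item is None:  # sentinel: stop watching
            return


async def demo():
    idle_calls = []

    async def on_idle():
        idle_calls.append("idle")

    queue = asyncio.Queue()
    watcher = asyncio.create_task(watch_idle(queue, 0.01, on_idle))
    await asyncio.sleep(0.05)  # stay silent long enough to trigger the callback
    await queue.put(None)
    await watcher
    return idle_calls
```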

### Changed

- `WhisperSTTService` model can now also be a string.

- Added missing \* keyword separators in services.

### Fixed

- `WebsocketServerTransport` doesn't try to send frames anymore if the
  serializer returns `None`.

- Fixed an issue where exceptions that occurred inside frame processors were
  being swallowed and not displayed.

- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
  data to the websocket after being closed.

### Other

- Added Fly.io deployment example in `examples/deployment/flyio-example`.

- Added new `17-detect-user-idle.py` example that shows how to use the new
  `UserIdleProcessor`.

## [0.0.35] - 2024-06-28

### Changed

- `FastAPIWebsocketParams` now requires a serializer.

- `TwilioFrameSerializer` now requires a `streamSid`.

### Fixed

- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for
  8000 sample rate.
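
Both values correspond to the same 32 ms window (512 / 16000 = 256 / 8000 =
0.032 s). A small lookup sketch of the rule stated above; the function name is
hypothetical and not a Pipecat API.

```python
def silero_num_frames(sample_rate: int) -> int:
    """Return the number of frames quoted in the entry above."""
    sizes = {16000: 512, 8000: 256}
    if sample_rate not in sizes:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sizes[sample_rate]
```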

## [0.0.34] - 2024-06-25

### Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
  cause interruptions to ignore transcriptions.

- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate
  shorter output.

## [0.0.33] - 2024-06-25

### Changed

- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now
  expects a voice ID instead of a voice name (you can get the voice ID from
  Cartesia's playground). You can also specify the audio `sample_rate` and
  `encoding` instead of the previous `output_format`.

### Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
  cause static audio issues and interruptions to not work properly when dealing
  with multiple LLM sentences.

- Fixed an issue that could mix new LLM responses with previous ones when
  handling interruptions.

- Fixed a Daily transport blocking situation that occurred while reading audio
  frames after a participant left the room. Needs daily-python >= 0.10.1.

## [0.0.32] - 2024-06-22

### Added

- Allow specifying a `DeepgramSTTService` url which allows using on-prem
  Deepgram.

- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that
  can be integrated with FastAPI websockets.

- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to
  serialize and deserialize audio frames from Twilio.

- Added Daily transport event: `on_dialout_answered`. See
  https://reference-python.daily.co/api_reference.html#daily.EventHandler

- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.

### Performance

- Convert `BaseInputTransport` and `BaseOutputTransport` to fully use asyncio
  and remove the use of threads.

### Other

- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio
  phone numbers with a Pipecat bot.

- Updated `07f-interruptible-azure.py` to use `AzureLLMService`,
  `AzureSTTService` and `AzureTTSService`.

## [0.0.31] - 2024-06-13

### Performance

- Break long audio frames into 20ms chunks instead of 10ms.
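
A sketch of the chunking arithmetic, assuming 16-bit PCM audio (2 bytes per
sample); the function name and parameters are illustrative, not Pipecat's. At
16000 Hz mono, a 20 ms chunk is 320 samples, i.e. 640 bytes.

```python
def chunk_audio(audio: bytes, sample_rate: int, num_channels: int = 1,
                chunk_ms: int = 20) -> list:
    """Split raw 16-bit PCM audio into chunks of `chunk_ms` milliseconds."""
    bytes_per_sample = 2  # assumption: 16-bit samples
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * num_channels * bytes_per_sample
    # The last chunk may be shorter if the audio length is not a multiple.
    return [audio[i:i + chunk_bytes] for i in range(0, len(audio), chunk_bytes)]
```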

## [0.0.30] - 2024-06-13

### Added

- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so
  only the initial TTFB metrics after the user stops talking are reported.

- Added `OpenPipeLLMService`. This service will let you run OpenAI through
  OpenPipe's SDK.

- Allow specifying frame processors' name through a new `name` constructor
  argument.

- Added `DeepgramSTTService`. This service has an ongoing websocket
  connection. To handle this, it subclasses `AIService` instead of
  `STTService`. The output of this service will be pushed from the same task,
  except system frames like `StartFrame`, `CancelFrame` or
  `StartInterruptionFrame`.

### Changed

- `FrameSerializer.deserialize()` can now return `None` in case it is not
  possible to deserialize the given data.

- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.

### Fixed

- Fixed an issue where `DailyRoomProperties.exp` always had the same old
  timestamp unless set by the user.

- Fixed a couple of issues with `WebsocketServerTransport`. It needed to use
  `push_audio_frame()` and also VAD was not working properly.

- Fixed an issue that would cause LLM aggregator to fail with small
  `VADParams.stop_secs` values.

- Fixed an issue where `BaseOutputTransport` would send longer audio frames
  preventing interruptions.

### Other

- Added new `07h-interruptible-openpipe.py` example. This example shows how to
  use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.

- Added new `dialin-chatbot` example. This example shows how to call the bot
  using a phone number.

## [0.0.29] - 2024-06-07

### Added

- Added a new `FunctionFilter`. This filter will let you filter frames based on
  a given function, except system messages which should never be filtered.

- Added `FrameProcessor.can_generate_metrics()` method to indicate if a
  processor can generate metrics. In the future this might get an extra argument
  to ask for a specific type of metric.

- Added `BasePipeline`. All pipeline classes should be based on this class. All
  subclasses should implement a `processors_with_metrics()` method that returns
  a list of all `FrameProcessor`s in the pipeline that can generate metrics.

- Added `enable_metrics` to `PipelineParams`.

- Added `MetricsFrame`. The `MetricsFrame` will report different metrics in the
  system. Right now, it can report TTFB (Time To First Byte) values for
  different services, that is the time spent between the arrival of a `Frame` to
  the processor/service until the first `DataFrame` is pushed downstream. If
  metrics are enabled an initial `MetricsFrame` with all the services in the
  pipeline will be sent.

- Added TTFB metrics and debug logging for TTS services.
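
The TTFB definition used in this release (time between a frame arriving at a
service and the first output being pushed downstream) can be sketched as a
small timer. `TTFBTimer` and its method names are hypothetical; the injectable
`clock` exists only to make the sketch easy to test.

```python
import time


class TTFBTimer:
    """Hypothetical helper measuring time to first byte for one request."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = None
        self.ttfb = None

    def frame_arrived(self):
        # Start timing when the input frame reaches the service.
        self._start = self._clock()
        self.ttfb = None

    def output_pushed(self):
        # Only the first output after an arrival is recorded.
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start
```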

### Changed

- Moved `ParallelTask` to `pipecat.pipeline.parallel_task`.

### Fixed

- Fixed PlayHT TTS service to work properly async.

## [0.0.28] - 2024-06-05

### Fixed

- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep
  growing indefinitely.

## [0.0.27] - 2024-06-05

### Added

- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.

## [0.0.26] - 2024-06-05

### Added

- Added `OpenAITTSService`.

- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change
  audio sample format and the model to use.

- Added `DailyRESTHelper` which helps you create Daily rooms and tokens in an
  easy way.

- `PipelineTask` now has a `has_finished()` method to indicate if the task has
  completed. If a task is never run `has_finished()` will return False.

- `PipelineRunner` now supports SIGTERM. If received, the runner will be
  cancelled.

### Fixed

- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` were
  stopping push tasks before pushing `EndFrame` frames, which could cause bots
  to get stuck.

- Fixed an error closing local audio transports.

- Fixed an issue with Deepgram TTS that was introduced in the previous release.

- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a
  `user` message could be appended after the previous `user` message. Anthropic
  does not allow that because it requires alternate `user` and `assistant`
  messages.
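
One possible way to keep `user` and `assistant` messages alternating is to
merge adjacent messages that share a role. The changelog does not say how the
fix was implemented, so the sketch below is an assumption; it also assumes
each message is a dict with string `role` and `content` values.

```python
def merge_consecutive_roles(messages):
    """Merge adjacent messages that share a role so roles alternate."""
    merged = []
    for message in messages:
        if merged and merged[-1]["role"] == message["role"]:
            # Same role twice in a row: append to the previous message.
            merged[-1]["content"] += " " + message["content"]
        else:
            merged.append(dict(message))  # copy so the input is not mutated
    return merged
```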

### Performance

- The `BaseInputTransport` does not pull audio frames from sub-classes any
  more. Instead, sub-classes now push audio frames into a queue in the base
  class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead
  of 10ms.

- Remove redundant camera input thread from `DailyInputTransport`. This should
  improve performance a little bit when processing participant videos.

- Load Cartesia voice on startup.

## [0.0.25] - 2024-05-31

### Added

- Added `WebsocketServerTransport`. This will create a websocket server and will
  read messages coming from a client. The messages are serialized/deserialized
  with protobufs. See `examples/websocket-server` for a detailed example.

- Added function calling (`LLMService.register_function()`). This will allow the
  LLM to call functions you have registered when needed. For example, if you
  register a function to get the weather in Los Angeles and ask the LLM about
  the weather in Los Angeles, the LLM will call your function.
  See https://platform.openai.com/docs/guides/function-calling

- Added new `LangchainProcessor`.

- Added Cartesia TTS support (https://cartesia.ai/)

### Fixed

- Fixed SileroVAD frame processor.

- Fixed an issue where `camera_out_enabled` would cause high CPU usage if
  no image was provided.

### Performance

- Removed unnecessary audio input tasks.

## [0.0.24] - 2024-05-29

### Added

- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This
  notifies when the Daily room SIP endpoints are ready. This allows integrating
  with third-party services like Twilio.

- Exposed Daily transport `on_app_message` event.

- Added Daily transport `on_call_state_updated` event.

- Added Daily transport `start_recording()`, `stop_recording()` and
  `stop_dialout()`.

### Changed

- Added `PipelineParams`. This replaces the `allow_interruptions` argument in
  `PipelineTask` and will allow adding more parameters in the future.

- Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.

- GoogleLLMService `api_key` argument is now mandatory.

### Fixed

- Daily transport `dialin-ready` doesn't block anymore and it now handles
  timeouts.

- Fixed AzureLLMService.

## [0.0.23] - 2024-05-23

### Fixed

- Fixed an issue handling Daily transport `dialin-ready` event.

## [0.0.22] - 2024-05-23

### Added

- Added Daily transport `start_dialout()` to be able to make phone or SIP calls.
  See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout

- Added Daily transport support for dial-in use cases.

- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`,
  `on_dialout_error` and `on_dialout_warning`. See
  https://reference-python.daily.co/api_reference.html#daily.EventHandler

## [0.0.21] - 2024-05-22

### Added

- Added vision support to Anthropic service.

- Added `WakeCheckFilter` which allows you to pass information downstream only
  if you say a certain phrase/word.

### Changed

- `FrameSerializer.serialize()` and `FrameSerializer.deserialize()` are now
  `async`.

- `Filter` has been renamed to `FrameFilter` and it's now under
  `processors/filters`.

### Fixed

- Fixed Anthropic service to use new frame types.

- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator`
  that would cause frames after a brief pause to not be pushed to the LLM.

- Clear the audio output buffer if we are interrupted.

- Re-add exponential smoothing after volume calculation. This makes sure the
  volume value being used doesn't fluctuate so much.
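
Exponential smoothing blends each new value with the previous smoothed value,
which damps sudden jumps. A minimal sketch; the function names and the `0.5`
factor in the usage note are illustrative assumptions, not the values Pipecat
uses.

```python
def exp_smoothing(value: float, prev_value: float, factor: float) -> float:
    """Move `prev_value` toward `value` by `factor` (0 = ignore, 1 = replace)."""
    return prev_value + factor * (value - prev_value)


def smooth_series(values, factor: float):
    """Apply exponential smoothing to a sequence, seeded with its first value."""
    smoothed = []
    prev = None
    for value in values:
        prev = value if prev is None else exp_smoothing(value, prev, factor)
        smoothed.append(prev)
    return smoothed
```

For example, `smooth_series([0.0, 1.0, 1.0], 0.5)` rises gradually toward 1.0
instead of jumping there on the second sample.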

## [0.0.20] - 2024-05-22

### Added

- In order to improve interruptions we now compute a loudness level using
  [pyloudnorm](https://github.com/csteinmetz1/pyloudnorm). The audio coming
  from WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC)
  algorithm applied to the signal, however we don't do that on our local
  PyAudio signals. This means that currently incoming audio from PyAudio is
  kind of broken. We will fix it in future releases.

### Fixed

- Fixed an issue where `StartInterruptionFrame` would cause
  `LLMUserResponseAggregator` to push the accumulated text, causing the LLM to
  respond in the wrong task. The `StartInterruptionFrame` should not trigger any
  new LLM response because that would be spoken in a different task.

- Fixed an issue where tasks and threads could be paused because the executor
  didn't have more tasks available. This was causing issues when cancelling and
  recreating tasks during interruptions.

## [0.0.19] - 2024-05-20

### Changed

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal
  messages are now exposed through the `messages` property.

### Fixed

- Fixed an issue where `LLMAssistantResponseAggregator` was accumulating short
  sentences instead of the full response. If there's an interruption, we now
  accumulate only what the bot has spoken so far, even in a long response.

## [0.0.18] - 2024-05-20

### Fixed

- Fixed an issue in `DailyOutputTransport` where transport messages were not
  being sent.

## [0.0.17] - 2024-05-19

### Added

- Added `google.generativeai` model support, including vision. This new `google`
  service defaults to using `gemini-1.5-flash-latest`. Example in
  `examples/foundational/12a-describe-video-gemini-flash.py`.

- Added vision support to `openai` service. Example in
  `examples/foundational/12a-describe-video-gemini-flash.py`.

- Added initial interruptions support. The assistant contexts (or aggregators)
  should now be placed after the output transport. This way, only the completed
  spoken context is added to the assistant context.

- Added `VADParams` so you can control voice confidence level and others.

- `VADAnalyzer` now uses an exponentially smoothed volume to improve speech
  detection. This is useful when voice confidence is high (because there's
  someone talking near you) but volume is low.

### Fixed

- Fixed an issue where `TTSService` was not pushing `TextFrame`s downstream.

- Fixed issues with Ctrl-C program termination.

- Fixed an issue that was causing `StopTaskFrame` to not actually exit the
  `PipelineTask`.

## [0.0.16] - 2024-05-16

### Fixed

- `DailyTransport`: don't publish camera and audio tracks if not enabled.

- Fixed an issue in `BaseInputTransport` that was causing frames to be pushed
  downstream in the wrong order.

## [0.0.15] - 2024-05-15

### Fixed

- Quick hot fix for receiving `DailyTransportMessage`.

## [0.0.14] - 2024-05-15

### Added

- Added `DailyTransport` event `on_participant_left`.

- Added support for receiving `DailyTransportMessage`.

### Fixed

- Images are now resized to the size of the output camera. Previously, a size
  mismatch caused images not to be displayed.

- Fixed an issue in `DailyTransport` that would not allow the input processor to
  shut down if no participant ever joined the room.

- Fixed base transports start and stop. In some situations processors would halt
  or not shut down properly.

## [0.0.13] - 2024-05-14

### Changed

- `MoondreamService` argument `model_id` is now `model`.

- `VADAnalyzer` arguments have been renamed for more clarity.

### Fixed

- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that
  could cause some threads to not start properly.

- Fixed `STTService`. Added `max_silence_secs` and `max_buffer_secs` to better
  control what is passed to the STT service. Also added exponential smoothing
  to the RMS.

- Fixed `WhisperSTTService`. Added `no_speech_prob` to avoid garbage output text.

## [0.0.12] - 2024-05-14

### Added

- Added `DailyTranscriptionSettings` to make it easier to specify transcription
  settings (e.g. language).

### Other

- Updated `simple-chatbot` with Spanish.

- Added missing dependencies in some of the examples.

## [0.0.11] - 2024-05-13

### Added

- Allow stopping pipeline tasks with new `StopTaskFrame`.

### Changed

- TTS, STT and image generation services now use `AsyncGenerator`.
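
To illustrate the general pattern, here is a minimal sketch of a service-like
function written as an async generator, so callers can consume results as they
are produced. `fake_tts`, `collect` and `chunk_size` are hypothetical names for
this example and do not reflect the actual Pipecat service signatures.

```python
import asyncio
from typing import AsyncGenerator, List


async def fake_tts(text: str, chunk_size: int = 4) -> AsyncGenerator[bytes, None]:
    """Hypothetical stand-in for a TTS call: yields the payload in chunks."""
    data = text.encode("utf-8")
    for start in range(0, len(data), chunk_size):
        # Give control back to the event loop, as a real network read would.
        await asyncio.sleep(0)
        yield data[start : start + chunk_size]


async def collect(text: str) -> List[bytes]:
    """Consume the generator; a real consumer could act on each chunk at once."""
    return [chunk async for chunk in fake_tts(text)]
```

The benefit of this shape is that the consumer does not have to wait for the
whole result before handling the first chunk.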

### Fixed

- `DailyTransport`: allow registering for participant transcriptions even if
  input transport is not initialized yet.

### Other

- Updated `storytelling-chatbot`.

## [0.0.10] - 2024-05-13

### Added

- Added Intel GPU support to `MoondreamService`.

- Added support for sending transport messages (e.g. to communicate with an app
  at the other end of the transport).

- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.

### Fixed

- Fixed Azure services (TTS and image generation).

### Other

- Updated `simple-chatbot`, `moondream-chatbot` and `translation-chatbot`
  examples.

## [0.0.9] - 2024-05-12

### Changed

Many things have changed in this version. The main ideas, such as frames,
processors, services and transports, are still there, but some details have
changed.

- `Frame`s describe the basic units for processing. For example, text, image or
  audio frames. Or control frames to indicate a user has started or stopped
  speaking.

- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an
  `ImageRawFrame`) and push new frames downstream or upstream to their linked
  peers.

- `FrameProcessor`s can be linked together. The easiest way is to use the
  `Pipeline`, which is a container for processors. Linking processors allows
  frames to travel upstream or downstream easily.

- `Transport`s are a way to send or receive frames. There can be local
  transports (e.g. local audio or native apps), network transports
  (e.g. websocket) or service transports (e.g. https://daily.co).

- `Pipeline`s are processors that act as containers for other processors.

- A `PipelineTask` knows how to run a pipeline.

- A `PipelineRunner` can run one or more tasks and it is also used, for example,
  to capture Ctrl-C from the user.
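
The relationship between frames, processors and pipelines described above can
be sketched with a toy model. This is not Pipecat's API: every class here
(`ToyTextFrame`, `ToyProcessor`, `ToyPipeline`, `Uppercase`, `Collector`) is
hypothetical, the code is synchronous, and only the downstream direction is
modelled for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTextFrame:
    """A basic unit of data travelling between processors."""
    text: str


class ToyProcessor:
    """Handles a frame, then pushes a frame to the next linked processor."""

    def __init__(self) -> None:
        self._next: Optional["ToyProcessor"] = None

    def link(self, processor: "ToyProcessor") -> None:
        self._next = processor

    def process_frame(self, frame: ToyTextFrame) -> None:
        self.push_frame(frame)  # default behaviour: pass the frame through

    def push_frame(self, frame: ToyTextFrame) -> None:
        if self._next is not None:
            self._next.process_frame(frame)


class Uppercase(ToyProcessor):
    """Converts each incoming frame into a new, upper-cased frame."""

    def process_frame(self, frame: ToyTextFrame) -> None:
        self.push_frame(ToyTextFrame(frame.text.upper()))


class Collector(ToyProcessor):
    """Records every frame it sees, then passes it along."""

    def __init__(self) -> None:
        super().__init__()
        self.frames: List[ToyTextFrame] = []

    def process_frame(self, frame: ToyTextFrame) -> None:
        self.frames.append(frame)
        self.push_frame(frame)


class ToyPipeline(ToyProcessor):
    """A processor that contains other processors and links them in order."""

    def __init__(self, processors: List[ToyProcessor]) -> None:
        super().__init__()
        self._processors = processors
        for current, following in zip(processors, processors[1:]):
            current.link(following)

    def process_frame(self, frame: ToyTextFrame) -> None:
        # Hand the frame to the first contained processor.
        self._processors[0].process_frame(frame)
```

For example, building `ToyPipeline([Uppercase(), sink])` with a `Collector`
named `sink` and feeding it `ToyTextFrame("hello")` leaves a single frame with
the text `HELLO` in `sink.frames`.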

## [0.0.8] - 2024-04-11

### Added

- Added `FireworksLLMService`.

- Added `InterimTranscriptionFrame` and enable interim results in
  `DailyTransport` transcriptions.

### Changed

- `FalImageGenService` now uses new `fal_client` package.

### Fixed

- `FalImageGenService`: use `asyncio.to_thread` to not block main loop when
  generating images.

- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed
  and received after `UserStoppedSpeakingFrame`).
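
As a sketch of the `asyncio.to_thread` technique mentioned above (Python 3.9+),
the following runs a blocking call in a worker thread while the event loop
keeps servicing other tasks. `generate_image_blocking` is a hypothetical
stand-in for a blocking client call, not the `fal_client` API.

```python
import asyncio
import time
from typing import Tuple


def generate_image_blocking(prompt: str) -> str:
    """Hypothetical blocking call, simulated here with time.sleep()."""
    time.sleep(0.05)
    return f"image for: {prompt}"


async def generate_image(prompt: str) -> str:
    # Run the blocking function in a worker thread so the loop stays free.
    return await asyncio.to_thread(generate_image_blocking, prompt)


async def main() -> Tuple[str, int]:
    ticks = 0

    async def ticker() -> None:
        # Counts how often the event loop gets to run during image generation.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    result = await generate_image("a cat")
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return result, ticks
```

If `generate_image_blocking` were called directly inside the coroutine, the
ticker could not advance until the call returned.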

## [0.0.7] - 2024-04-10

### Added

- Added `use_cpu` argument to `MoondreamService`.

## [0.0.6] - 2024-04-10

### Added

- Added `FalImageGenService.InputParams`.

- Added `URLImageFrame` and `UserImageFrame`.

- Added `UserImageRequestFrame` and allow requesting an image from a participant.

- Added base `VisionService` and `MoondreamService`.

### Changed

- Don't pass `image_size` to `ImageGenService`; images should have their own size.

- `ImageFrame` now receives a tuple `(width, height)` to specify the size.

- `on_first_other_participant_joined` now gets a participant argument.

### Fixed

- Check if camera, speaker and microphone are enabled before writing to them.

### Performance

- `DailyTransport` now only subscribes to the desired participant's video track.

## [0.0.5] - 2024-04-06

### Changed

- Use `camera_bitrate` and `camera_framerate`.

- Increase `camera_framerate` to 30 by default.

### Fixed

- Fixed `LocalTransport.read_audio_frames`.

## [0.0.4] - 2024-04-04

### Added

- Added project optional dependencies `[silero,openai,...]`.

### Changed

- Moved transports to their own directory.

- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.

### Fixed

- Don't write to microphone/speaker if not enabled.

### Other

- Added live translation example.

- Fixed foundational examples.

## [0.0.3] - 2024-03-13

### Other

- Added `storybot` and `chatbot` examples.

## [0.0.2] - 2024-03-12

Initial public release.