Merge pull request #3906 from pipecat-ai/changelog-0.0.104
Release 0.0.104 - Changelog Update
383
CHANGELOG.md
@@ -7,6 +7,389 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

<!-- towncrier release notes start -->

## [0.0.104] - 2026-03-02

### Added
|
||||
|
||||
- Added `TextAggregationMetricsData` metric measuring the time from the first
|
||||
LLM token to the first complete sentence, representing the latency cost of
|
||||
sentence aggregation in the TTS pipeline.
|
||||
(PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
|
||||
|
||||
- Added support for using strongly-typed objects instead of dicts for updating
service settings at runtime.

Instead of, say:

```python
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": Language.ES})
)
```

you'd do:

```python
await task.queue_frame(
    STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
)
```

Each service now vends strongly-typed classes like `DeepgramSTTSettings`
representing the service's runtime-updatable settings.
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
|
||||
|
||||
- Added support for specifying private endpoints for Azure Speech-to-Text,
|
||||
enabling use in private networks behind firewalls.
|
||||
(PR [#3764](https://github.com/pipecat-ai/pipecat/pull/3764))
|
||||
|
||||
- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time
|
||||
LemonSlice Avatars to any Daily room.
|
||||
(PR [#3791](https://github.com/pipecat-ai/pipecat/pull/3791))
|
||||
|
||||
- Added `output_medium` parameter to `AgentInputParams` and
|
||||
`OneShotInputParams` in Ultravox service to control initial output medium
|
||||
(text or voice) at call creation time.
|
||||
(PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
|
||||
|
||||
- Added `TurnMetricsData` as a generic metrics class for turn detection, with
|
||||
e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData`
|
||||
with `e2e_processing_time_ms` tracking the interval from VAD
|
||||
speech-to-silence transition to turn completion.
|
||||
(PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
|
||||
|
||||
- Added `on_audio_context_interrupted()` and `on_audio_context_completed()`
|
||||
callbacks to `AudioContextTTSService`. Subclasses can override these to
|
||||
perform provider-specific cleanup instead of overriding
|
||||
`_handle_interruption()`.
|
||||
(PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
|
||||
|
||||
- Added `on_summary_applied` event to `LLMContextSummarizer` for observability,
|
||||
providing message counts before and after context summarization.
|
||||
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
|
||||
|
||||
- Added `summary_message_template` to `LLMContextSummarizationConfig` for
|
||||
customizing how summaries are formatted when injected into context (e.g.,
|
||||
wrapping in XML tags).
|
||||
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
|
||||
|
||||
- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default
|
||||
120s) to prevent hung LLM calls from permanently blocking future
|
||||
summarizations.
|
||||
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
|
||||
|
||||
- Added optional `llm` field to `LLMContextSummarizationConfig` for routing
|
||||
summarization to a dedicated LLM service (e.g., a cheaper/faster model)
|
||||
instead of the pipeline's primary model.
|
||||
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
|
||||
|
||||
- Added AssemblyAI u3-rt-pro model support with built-in turn detection mode.
|
||||
(PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
|
||||
|
||||
- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization
|
||||
from anywhere in the pipeline (e.g. a function call tool). Accepts an
|
||||
optional `config: LLMContextSummaryConfig` to override summary generation
|
||||
settings per request.
|
||||
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
|
||||
|
||||
- Added `LLMContextSummaryConfig` (summary generation params:
|
||||
`target_context_tokens`, `min_messages_after_summary`,
|
||||
`summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger
|
||||
thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested
|
||||
`summary_config`). These replace the monolithic
|
||||
`LLMContextSummarizationConfig`.
|
||||
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
|
||||
|
||||
- Added support for the `speed_alpha` parameter to the `arcana` model in
|
||||
`RimeTTSService`.
|
||||
(PR [#3873](https://github.com/pipecat-ai/pipecat/pull/3873))
|
||||
|
||||
- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports
|
||||
(Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen,
|
||||
Tavus) when a client connects. Enables observers to track transport readiness
|
||||
timing.
|
||||
(PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
|
||||
|
||||
- Added `StartupTimingObserver` for measuring how long each processor's
|
||||
`start()` method takes during pipeline startup. Also measures transport
|
||||
readiness — the time from `StartFrame` to first client connection — via the
|
||||
`on_transport_timing_report` event.
|
||||
(PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
|
||||
|
||||
- Added `BotConnectedFrame` for SFU transports and `on_transport_timing_report`
|
||||
event to `StartupTimingObserver` with bot and client connection timing.
|
||||
(PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
|
||||
|
||||
- Added optional `direction` parameter to `PipelineTask.queue_frame()` and
|
||||
`PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the
|
||||
end of the pipeline.
|
||||
(PR [#3883](https://github.com/pipecat-ai/pipecat/pull/3883))
|
||||
|
||||
- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing
|
||||
per-service TTFB, text aggregation, user turn duration, and function call
|
||||
latency metrics for each user-to-bot response cycle.
|
||||
(PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
|
||||
|
||||
- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver`
|
||||
measuring the time from client connection to first bot speech. An
|
||||
`on_latency_breakdown` is also emitted for this first speech event.
|
||||
(PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
|
||||
|
||||
- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an
|
||||
`InterruptionFrame` both upstream and downstream directly from the calling
|
||||
processor, avoiding the round-trip through the pipeline task that
|
||||
`push_interruption_task_frame_and_wait()` required.
|
||||
(PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
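
To illustrate the `summarization_timeout` entry above, here is a minimal sketch of how a timeout can keep one hung call from blocking every later summarization. It is not Pipecat's implementation: `ToySummarizer`, `hung_llm_call`, and `quick_llm_call` are hypothetical names invented for this example, and only `asyncio.wait_for` is a real standard-library API.

```python
import asyncio
from typing import Optional


class ToySummarizer:
    """Hypothetical stand-in for a summarizer; not a Pipecat class."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.in_progress = False

    async def summarize(self, llm_call) -> Optional[str]:
        # Skip the request if a summarization is already running.
        if self.in_progress:
            return None
        self.in_progress = True
        try:
            # Bound the call so a hung request cannot hold the flag forever.
            return await asyncio.wait_for(llm_call(), timeout=self.timeout_s)
        except asyncio.TimeoutError:
            return None
        finally:
            # Always clear the flag so future summarizations can run.
            self.in_progress = False


async def hung_llm_call() -> str:
    await asyncio.sleep(3600)  # simulates a call that never returns in time
    return "unreachable"


async def quick_llm_call() -> str:
    return "summary"


async def demo() -> tuple:
    summarizer = ToySummarizer(timeout_s=0.05)
    first = await summarizer.summarize(hung_llm_call)    # times out
    second = await summarizer.summarize(quick_llm_call)  # still allowed to run
    return first, second, summarizer.in_progress


result = asyncio.run(demo())
```

Without the timeout, the first call would leave `in_progress` set indefinitely and the second call would be skipped.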
|
||||
|
||||
### Changed
|
||||
|
||||
- Added `text_aggregation_mode` parameter to `TTSService` and all TTS
|
||||
subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All
|
||||
text now flows through text aggregators regardless of mode, enabling pattern
|
||||
detection and tag handling in TOKEN mode.
|
||||
(PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
|
||||
|
||||
- ⚠️ Refactored runtime-updatable service settings to use strongly-typed
|
||||
classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific
|
||||
subclasses) instead of plain dicts. Each service's `_settings` now holds
|
||||
these strongly-typed objects. For service maintainers, see changes in
|
||||
COMMUNITY_INTEGRATIONS.md.
|
||||
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
|
||||
|
||||
- Word timestamp support has been moved from `WordTTSService` into `TTSService`
|
||||
via a new `supports_word_timestamps` parameter. Services that previously
|
||||
extended `WordTTSService`, `AudioContextWordTTSService`, or
|
||||
`WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their
|
||||
parent `__init__` instead.
|
||||
(PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
|
||||
|
||||
- Improved Ultravox TTFB measurement accuracy by using VAD speech end time
|
||||
instead of `UserStoppedSpeakingFrame` timing.
|
||||
(PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
|
||||
|
||||
- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini
|
||||
realtime services: added `InterruptionFrame` handling with metrics cleanup,
|
||||
processing metrics at response boundaries, and improved agent transcript
|
||||
handling for both voice and text output modalities.
|
||||
(PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
|
||||
|
||||
- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
|
||||
(PR [#3807](https://github.com/pipecat-ai/pipecat/pull/3807))
|
||||
|
||||
- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and
|
||||
`KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to
|
||||
`KRISP_VIVA_API_KEY` environment variable.
|
||||
(PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
|
||||
|
||||
- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security
|
||||
vulnerability.
|
||||
(PR [#3811](https://github.com/pipecat-ai/pipecat/pull/3811))
|
||||
|
||||
- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally
|
||||
speaking, you don't want a user interruption to prevent a service setting
|
||||
change from going into effect. Note that you usually don't use
`ServiceSettingsUpdateFrame` directly; instead, you use one of its subclasses:
|
||||
- `LLMUpdateSettingsFrame`
|
||||
- `TTSUpdateSettingsFrame`
|
||||
- `STTUpdateSettingsFrame`
|
||||
(PR [#3819](https://github.com/pipecat-ai/pipecat/pull/3819))
|
||||
|
||||
- Updated context summarization to use `user` role instead of `assistant` for
|
||||
summary messages.
|
||||
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
|
||||
|
||||
- Renamed the `AssemblyAISTTService` parameter
`min_end_of_turn_silence_when_confident` to `min_turn_silence`. The old name is
still supported and emits a deprecation warning.
|
||||
(PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
|
||||
|
||||
- ⚠️ Renamed `LLMAssistantAggregatorParams` fields:
|
||||
`enable_context_summarization` → `enable_auto_context_summarization` and
|
||||
`context_summarization_config` → `auto_context_summarization_config` (now
|
||||
accepts `LLMAutoContextSummarizationConfig`). The old names still work with a
|
||||
`DeprecationWarning` for one release cycle.
|
||||
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
|
||||
|
||||
- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to
|
||||
`True` when using `CommitStrategy.MANUAL`.
|
||||
(PR [#3865](https://github.com/pipecat-ai/pipecat/pull/3865))
|
||||
|
||||
- Changed the `numba` version pin operator from `==` to `>=` (now `>=0.61.2`).
|
||||
(PR [#3868](https://github.com/pipecat-ai/pipecat/pull/3868))
|
||||
|
||||
- Updated tracing code to use `ServiceSettings` dataclass API
|
||||
(`given_fields()`, attribute access) instead of dict-style access
|
||||
(`.items()`, `in`, subscript).
|
||||
(PR [#3879](https://github.com/pipecat-ai/pipecat/pull/3879))
|
||||
|
||||
- ⚠️ Removed `event` field and `complete()` method from `InterruptionFrame`.
|
||||
Removed `event` field from `InterruptionTaskFrame`. These are no longer
|
||||
needed since `broadcast_interruption()` does not require a round-trip
|
||||
completion signal.
|
||||
(PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
|
||||
|
||||
- Moved `pipecat.services.deepgram.stt_sagemaker` and
|
||||
`pipecat.services.deepgram.tts_sagemaker` to
|
||||
`pipecat.services.deepgram.sagemaker.stt` and
|
||||
`pipecat.services.deepgram.sagemaker.tts`. The old import paths still work
|
||||
but emit a `DeprecationWarning`.
|
||||
(PR [#3902](https://github.com/pipecat-ai/pipecat/pull/3902))
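
Several entries above keep an old name working while emitting a `DeprecationWarning`. The sketch below shows one common way to do that for a renamed keyword argument. It is an illustration only: `ToyAggregatorParams` is a hypothetical class, not Pipecat's `LLMAssistantAggregatorParams`, and the real implementation may handle the alias differently.

```python
import warnings


class ToyAggregatorParams:
    """Hypothetical parameter class used only to illustrate a renamed field."""

    def __init__(self, enable_auto_context_summarization=False, **deprecated):
        self.enable_auto_context_summarization = enable_auto_context_summarization
        # Accept the old keyword, warn, and map it onto the new field.
        if "enable_context_summarization" in deprecated:
            warnings.warn(
                "enable_context_summarization is deprecated; "
                "use enable_auto_context_summarization",
                DeprecationWarning,
                stacklevel=2,
            )
            self.enable_auto_context_summarization = deprecated.pop(
                "enable_context_summarization"
            )
        # Anything else is a genuine mistake, so fail loudly.
        if deprecated:
            raise TypeError(f"unexpected arguments: {sorted(deprecated)}")


# Record warnings so the deprecation can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    params = ToyAggregatorParams(enable_context_summarization=True)
```

Callers using the old name keep working, and the recorded warning tells them which name to migrate to.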
|
||||
|
||||
### Deprecated
|
||||
|
||||
- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS
|
||||
subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or
|
||||
`text_aggregation_mode=TextAggregationMode.TOKEN` instead.
|
||||
(PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
|
||||
|
||||
- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services
|
||||
in favor of runtime updates via `TTSUpdateSettingsFrame`,
|
||||
`STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.
|
||||
|
||||
⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas
|
||||
previously only `set_language()` caused the service to actually react to the
|
||||
update (e.g. by reconnecting to a remote service so it can pick up the
|
||||
change), now all these methods do. This change was made as part of a refactor
|
||||
making them all work the same way under the hood.
|
||||
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
|
||||
|
||||
- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of
|
||||
passing typed settings delta objects with
|
||||
`*UpdateSettingsFrame(delta=...)`.
|
||||
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
|
||||
|
||||
- Deprecated `WordTTSService`, `WebsocketWordTTSService`,
|
||||
`AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their
|
||||
non-word counterparts with `supports_word_timestamps=True` instead:
|
||||
- `WordTTSService` → `TTSService(supports_word_timestamps=True)`
|
||||
- `WebsocketWordTTSService` →
|
||||
`WebsocketTTSService(supports_word_timestamps=True)`
|
||||
- `AudioContextWordTTSService` →
|
||||
`AudioContextTTSService(supports_word_timestamps=True)`
|
||||
- `InterruptibleWordTTSService` →
|
||||
`InterruptibleTTSService(supports_word_timestamps=True)`
|
||||
(PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
|
||||
|
||||
- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`.
|
||||
`BaseSmartTurn` now emits `TurnMetricsData` directly.
|
||||
(PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
|
||||
|
||||
- Deprecated `LLMContextSummarizationConfig`. Use
|
||||
`LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig`
|
||||
instead. The old class emits a `DeprecationWarning`.
|
||||
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
|
||||
|
||||
- Deprecated `push_interruption_task_frame_and_wait()` in `FrameProcessor`. Use
|
||||
`broadcast_interruption()` instead. The old method now delegates to
|
||||
`broadcast_interruption()` and logs a deprecation warning.
|
||||
(PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
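
The typed `delta` objects that replace dict-based settings updates can be pictured as dataclasses in which unset fields mean "leave unchanged". The following is a minimal sketch under that assumption. `ToySTTSettings`, `_NOT_GIVEN`, and `apply` are hypothetical names, and the `given_fields()` shown here only mimics the idea behind the method of the same name mentioned in this changelog; its real signature and behavior may differ.

```python
from dataclasses import dataclass, fields
from typing import Any

_NOT_GIVEN = object()  # sentinel meaning "leave this setting unchanged"


@dataclass
class ToySTTSettings:
    """Hypothetical settings class; not Pipecat's STTSettings."""

    language: Any = _NOT_GIVEN
    model: Any = _NOT_GIVEN

    def given_fields(self) -> dict:
        # Return only the fields the caller explicitly set.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not _NOT_GIVEN
        }

    def apply(self, delta: "ToySTTSettings") -> dict:
        # Copy the given fields from the delta and report what actually changed.
        changed = {}
        for name, value in delta.given_fields().items():
            if getattr(self, name) != value:
                changed[name] = value
                setattr(self, name, value)
        return changed


current = ToySTTSettings(language="en", model="base")
changed = current.apply(ToySTTSettings(language="es"))
```

Because the delta carries only `language`, the `model` setting is untouched, and a service could use the returned `changed` mapping to decide whether it needs to react, for example by reconnecting.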
|
||||
|
||||
### Removed
|
||||
|
||||
- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The
|
||||
`transformers` and `onnxruntime` packages are now always installed as core
|
||||
dependencies since they are required by the default turn stop strategy,
|
||||
`TurnAnalyzerUserTurnStopStrategy`, which uses `LocalSmartTurnAnalyzerV3`.
|
||||
(PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
|
||||
|
||||
- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been
|
||||
shut down and is no longer available.
|
||||
(PR [#3838](https://github.com/pipecat-ai/pipecat/pull/3838))
|
||||
|
||||
### Fixed
|
||||
|
||||
- Added `LLMSpecificMessage` handling in `LLMContextSummarizationUtil` to skip
|
||||
provider-specific messages during context summarization.
|
||||
(PR [#3794](https://github.com/pipecat-ai/pipecat/pull/3794))
|
||||
|
||||
- Treated `response_cancel_not_active` as a non-fatal error in realtime
|
||||
services (`OpenAIRealtimeLLMService`, `GrokRealtimeLLMService`,
|
||||
`OpenAIRealtimeBetaLLMService`) to prevent WebSocket disconnection when
|
||||
cancelling an inactive response.
|
||||
(PR [#3795](https://github.com/pipecat-ai/pipecat/pull/3795))
|
||||
|
||||
- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies
|
||||
(`transformers`, `onnxruntime`) into core dependencies instead of using a
|
||||
self-referential extra.
|
||||
(PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
|
||||
|
||||
- Fixed `SentryMetrics` method signatures to match updated
|
||||
`FrameProcessorMetrics` base class, resolving `TypeError` when using
|
||||
`start_time`/`end_time` keyword arguments.
|
||||
(PR [#3808](https://github.com/pipecat-ai/pipecat/pull/3808))
|
||||
|
||||
- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and
|
||||
`AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
|
||||
(PR [#3813](https://github.com/pipecat-ai/pipecat/pull/3813))
|
||||
|
||||
- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI,
|
||||
ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio
|
||||
contexts after normal speech completion, only on interruption.
|
||||
(PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
|
||||
|
||||
- Fixed STT TTFB metrics measuring timeout expiry time instead of actual
|
||||
transcript arrival time.
|
||||
(PR [#3822](https://github.com/pipecat-ai/pipecat/pull/3822))
|
||||
|
||||
- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being
|
||||
unintentionally pushed downstream in `LLMUserAggregator`. They are now
|
||||
consumed like `TranscriptionFrame`.
|
||||
(PR [#3825](https://github.com/pipecat-ai/pipecat/pull/3825))
|
||||
|
||||
- Fixed misleading "Empty audio frame received for STT service" warnings when
|
||||
using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`)
|
||||
that buffer audio internally.
|
||||
(PR [#3828](https://github.com/pipecat-ai/pipecat/pull/3828))
|
||||
|
||||
- Fixed an issue in `RimeNonJsonTTSService` where trailing punctuation was
sometimes vocalized.
|
||||
(PR [#3837](https://github.com/pipecat-ai/pipecat/pull/3837))
|
||||
|
||||
- Fixed `TTSSpeakFrame` not committing spoken text to the conversation context
|
||||
when used outside of an LLM response (e.g., bot greetings or injected
|
||||
speech).
|
||||
(PR [#3845](https://github.com/pipecat-ai/pipecat/pull/3845))
|
||||
|
||||
- Removed verbose per-chunk audio logging from `GenesysAudioHookSerializer`
|
||||
that flooded production logs.
|
||||
(PR [#3850](https://github.com/pipecat-ai/pipecat/pull/3850))
|
||||
|
||||
- Added a beta-feature warning when using custom prompts with AssemblyAI.
|
||||
(PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
|
||||
|
||||
- Fixed `LocalSmartTurnAnalyzerV3` producing incorrect end-of-turn predictions
|
||||
at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic
|
||||
resampling to 16kHz before Whisper feature extraction.
|
||||
(PR [#3857](https://github.com/pipecat-ai/pipecat/pull/3857))
|
||||
|
||||
- Fixed `PipelineTask` double-inserting `RTVIProcessor` into the frame chain
|
||||
when the user provides both an `RTVIProcessor` in the pipeline and a custom
|
||||
`RTVIObserver` subclass in observers.
|
||||
(PR [#3867](https://github.com/pipecat-ai/pipecat/pull/3867))
|
||||
|
||||
- Fixed turn completion instructions being lost when `LLMMessagesUpdateFrame`
|
||||
replaces the LLM context. When `filter_incomplete_user_turns` is enabled, the
|
||||
turn completion system message is now re-injected after context replacement.
|
||||
(PR [#3888](https://github.com/pipecat-ai/pipecat/pull/3888))
|
||||
|
||||
- Fixed Azure TTS and STT services silently swallowing cancellation errors
|
||||
(invalid API key, network failures, rate limiting) instead of propagating
|
||||
them as `ErrorFrame`s to the pipeline.
|
||||
(PR [#3893](https://github.com/pipecat-ai/pipecat/pull/3893))
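
The `LocalSmartTurnAnalyzerV3` fix above resamples incoming audio to 16 kHz before feature extraction. The changelog does not say which resampler is used, so the sketch below substitutes plain linear interpolation purely to show the idea of converting 8 kHz telephony audio to 16 kHz. `resample_linear` is a hypothetical helper, and a production resampler would also apply proper filtering.

```python
def resample_linear(samples: list, src_rate: int, dst_rate: int) -> list:
    """Resample by linear interpolation (illustrative only, no anti-alias filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        # Clamp at the last sample so the tail repeats instead of overrunning.
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


telephony_8k = [0.0, 1.0, 0.0, -1.0]
model_input_16k = resample_linear(telephony_8k, 8000, 16000)
```

Doubling the rate doubles the number of samples, with each new sample placed halfway between its neighbors.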
|
||||
|
||||
### Performance
|
||||
|
||||
- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to
|
||||
`AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on
|
||||
every interruption by using `client_req_id`-based multiplexing.
|
||||
(PR [#3759](https://github.com/pipecat-ai/pipecat/pull/3759))
|
||||
|
||||
### Other
|
||||
|
||||
- Standardized Sarvam STT/TTS User-Agent header handling to consistently send
|
||||
Pipecat SDK identity in websocket requests.
|
||||
(PR [#3886](https://github.com/pipecat-ai/pipecat/pull/3886))
|
||||
|
||||
## [0.0.103] - 2026-02-20
|
||||
|
||||
### Added
|
||||
|
||||
@@ -1 +0,0 @@
|
||||
- Added `TextAggregationMetricsData` metric measuring the time from the first LLM token to the first complete sentence, representing the latency cost of sentence aggregation in the TTS pipeline.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `text_aggregation_mode` parameter to `TTSService` and all TTS subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All text now flows through text aggregators regardless of mode, enabling pattern detection and tag handling in TOKEN mode.
|
||||
@@ -1 +0,0 @@
|
||||
- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or `text_aggregation_mode=TextAggregationMode.TOKEN` instead.
|
||||
@@ -1,19 +0,0 @@
|
||||
- Added support for using strongly-typed objects instead of dicts for updating service settings at runtime.

Instead of, say:

```python
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": Language.ES})
)
```

you'd do:

```python
await task.queue_frame(
    STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
)
```

Each service now vends strongly-typed classes like `DeepgramSTTSettings` representing the service's runtime-updatable settings.
|
||||
@@ -1 +0,0 @@
|
||||
- ⚠️ Refactored runtime-updatable service settings to use strongly-typed classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific subclasses) instead of plain dicts. Each service's `_settings` now holds these strongly-typed objects. For service maintainers, see changes in COMMUNITY_INTEGRATIONS.md.
|
||||
@@ -1 +0,0 @@
|
||||
- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of passing typed settings delta objects with `*UpdateSettingsFrame(delta={...})`.
|
||||
@@ -1,3 +0,0 @@
|
||||
- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services in favor of runtime updates via `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.
|
||||
|
||||
⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas previously only `set_language()` caused the service to actually react to the update (e.g. by reconnecting to a remote service so it can pick up the change), now all these methods do. This change was made as part of a refactor making them all work the same way under the hood.
|
||||
@@ -1 +0,0 @@
|
||||
- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on every interruption by using `client_req_id`-based multiplexing.
|
||||
@@ -1 +0,0 @@
|
||||
- Added support for specifying private endpoints for Azure Speech-to-Text, enabling use in private networks behind firewalls.
|
||||
@@ -1 +0,0 @@
|
||||
- Word timestamp support has been moved from `WordTTSService` into `TTSService` via a new `supports_word_timestamps` parameter. Services that previously extended `WordTTSService`, `AudioContextWordTTSService`, or `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their parent `__init__` instead.
|
||||
@@ -1,5 +0,0 @@
|
||||
- Deprecated `WordTTSService`, `WebsocketWordTTSService`, `AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their non-word counterparts with `supports_word_timestamps=True` instead:
|
||||
- `WordTTSService` → `TTSService(supports_word_timestamps=True)`
|
||||
- `WebsocketWordTTSService` → `WebsocketTTSService(supports_word_timestamps=True)`
|
||||
- `AudioContextWordTTSService` → `AudioContextTTSService(supports_word_timestamps=True)`
|
||||
- `InterruptibleWordTTSService` → `InterruptibleTTSService(supports_word_timestamps=True)`
|
||||
@@ -1 +0,0 @@
|
||||
- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time LemonSlice Avatars to any Daily room.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `LLMSpecificMessage` handling in `LLMContextSummarizationUtil` to skip provider-specific messages during context summarization.
|
||||
@@ -1 +0,0 @@
|
||||
- Treated `response_cancel_not_active` as a non-fatal error in realtime services (`OpenAIRealtimeLLMService`, `GrokRealtimeLLMService`, `OpenAIRealtimeBetaLLMService`) to prevent WebSocket disconnection when cancelling an inactive response.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies (`transformers`, `onnxruntime`) into core dependencies instead of using a self-referential extra.
|
||||
@@ -1 +0,0 @@
|
||||
- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The `transformers` and `onnxruntime` packages are now always installed as core dependencies since they are required by the default turn stop strategy, `TurnAnalyzerUserTurnStopStrategy` which uses `LocalSmartTurnAnalyzerV3`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `output_medium` parameter to `AgentInputParams` and `OneShotInputParams` in Ultravox service to control initial output medium (text or voice) at call creation time.
|
||||
@@ -1 +0,0 @@
|
||||
- Improved Ultravox TTFB measurement accuracy by using VAD speech end time instead of `UserStoppedSpeakingFrame` timing.
|
||||
@@ -1 +0,0 @@
|
||||
- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini realtime services: added `InterruptionFrame` handling with metrics cleanup, processing metrics at response boundaries, and improved agent transcript handling for both voice and text output modalities.
|
||||
@@ -1 +0,0 @@
|
||||
- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `SentryMetrics` method signatures to match updated `FrameProcessorMetrics` base class, resolving `TypeError` when using `start_time`/`end_time` keyword arguments.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `TurnMetricsData` as a generic metrics class for turn detection, with e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData` with `e2e_processing_time_ms` tracking the interval from VAD speech-to-silence transition to turn completion.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to `KRISP_VIVA_API_KEY` environment variable.
|
||||
@@ -1 +0,0 @@
|
||||
- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`. `BaseSmartTurn` now emits `TurnMetricsData` directly.
|
||||
@@ -1 +0,0 @@
|
||||
- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security vulnerability.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and `AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `on_audio_context_interrupted()` and `on_audio_context_completed()` callbacks to `AudioContextTTSService`. Subclasses can override these to perform provider-specific cleanup instead of overriding `_handle_interruption()`.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI, ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio contexts after normal speech completion, only on interruption.
|
||||
@@ -1,4 +0,0 @@
|
||||
- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally speaking, you don't want a user interruption to prevent a service setting change from going into effect. Note that you usually don't use `ServiceSettingsUpdateFrame` directly, you use one of its subclasses:
|
||||
- `LLMUpdateSettingsFrame`
|
||||
- `TTSUpdateSettingsFrame`
|
||||
- `STTUpdateSettingsFrame`
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed STT TTFB metrics measuring timeout expiry time instead of actual transcript arrival time.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being unintentionally pushed downstream in `LLMUserAggregator`. They are now consumed like `TranscriptionFrame`.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed misleading "Empty audio frame received for STT service" warnings when using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`) that buffer audio internally.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed issues with `RimeNonJsonTTSService` where trailing punctuation is sometimes vocalized
|
||||
@@ -1 +0,0 @@
|
||||
- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been shut down and is no longer available.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `TTSSpeakFrame` not committing spoken text to the conversation context when used outside of an LLM response (e.g., bot greetings or injected speech).
|
||||
@@ -1 +0,0 @@
|
||||
- Removed verbose per-chunk audio logging from `GenesysAudioHookSerializer` that flooded production logs.
|
||||
@@ -1 +0,0 @@
|
||||
- Added optional `llm` field to `LLMContextSummarizationConfig` for routing summarization to a dedicated LLM service (e.g., a cheaper/faster model) instead of the pipeline's primary model.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default 120s) to prevent hung LLM calls from permanently blocking future summarizations.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `on_summary_applied` event to `LLMContextSummarizer` for observability, providing message counts before and after context summarization.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `summary_message_template` to `LLMContextSummarizationConfig` for customizing how summaries are formatted when injected into context (e.g., wrapping in XML tags).
|
||||
@@ -1 +0,0 @@
|
||||
- Updated context summarization to use `user` role instead of `assistant` for summary messages.
|
||||
@@ -1 +0,0 @@
|
||||
- Add AssemblyAI u3-rt-pro model support with built-in turn detection mode
|
||||
@@ -1 +0,0 @@
|
||||
- Renamed the `AssemblyAISTTService` parameter `min_end_of_turn_silence_when_confident` to `min_turn_silence`. The old name is still supported and emits a deprecation warning.
|
||||
@@ -1 +0,0 @@
|
||||
- Add beta feature warning when using custom prompts with AssemblyAI
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `LocalSmartTurnAnalyzerV3` producing incorrect end-of-turn predictions at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic resampling to 16kHz before Whisper feature extraction.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `LLMContextSummaryConfig` (summary generation params: `target_context_tokens`, `min_messages_after_summary`, `summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested `summary_config`). These replace the monolithic `LLMContextSummarizationConfig`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization from anywhere in the pipeline (e.g. a function call tool). Accepts an optional `config: LLMContextSummaryConfig` to override summary generation settings per request.
|
||||
@@ -1 +0,0 @@
|
||||
- ⚠️ Renamed `LLMAssistantAggregatorParams` fields: `enable_context_summarization` → `enable_auto_context_summarization` and `context_summarization_config` → `auto_context_summarization_config` (now accepts `LLMAutoContextSummarizationConfig`). The old names still work with a `DeprecationWarning` for one release cycle.
|
||||
@@ -1 +0,0 @@
|
||||
- Deprecated `LLMContextSummarizationConfig`. Use `LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig` instead. The old class emits a `DeprecationWarning`.
|
||||
@@ -1 +0,0 @@
|
||||
- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to `True` when using `CommitStrategy.MANUAL`.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `PipelineTask` double-inserting `RTVIProcessor` into the frame chain when the user provides both an `RTVIProcessor` in the pipeline and a custom `RTVIObserver` subclass in observers.
|
||||
@@ -1 +0,0 @@
|
||||
- Updated the `numba` version pin from `==` to `>=0.61.2`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added support for the `speed_alpha` parameter to the `arcana` model in `RimeTTSService`.
|
||||
@@ -1 +0,0 @@
|
||||
- Updated tracing code to use `ServiceSettings` dataclass API (`given_fields()`, attribute access) instead of dict-style access (`.items()`, `in`, subscript).
|
||||
@@ -1 +0,0 @@
|
||||
- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen, Tavus) when a client connects. Enables observers to track transport readiness timing.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `BotConnectedFrame` for SFU transports and an `on_transport_timing_report` event on `StartupTimingObserver` that reports bot and client connection timing.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `StartupTimingObserver` for measuring how long each processor's `start()` method takes during pipeline startup. Also measures transport readiness — the time from `StartFrame` to first client connection — via the `on_transport_timing_report` event.
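The per-processor measurement can be pictured as timing each `start()` call, as in the sketch below. `FakeProcessor` and `time_startup` are hypothetical stand-ins. The real observer presumably watches the pipeline rather than calling `start()` itself, so this only shows the measurement idea.

```python
import asyncio
import time
from typing import Dict, List


class FakeProcessor:
    """Stand-in for a pipeline processor; not a Pipecat class."""

    def __init__(self, name: str, startup_delay: float):
        self.name = name
        self._startup_delay = startup_delay

    async def start(self) -> None:
        await asyncio.sleep(self._startup_delay)  # simulate setup work


async def time_startup(processors: List[FakeProcessor]) -> Dict[str, float]:
    """Return seconds spent in each processor's start(), keyed by name."""
    durations: Dict[str, float] = {}
    for processor in processors:
        began = time.perf_counter()
        await processor.start()
        durations[processor.name] = time.perf_counter() - began
    return durations


report = asyncio.run(
    time_startup([FakeProcessor("stt", 0.01), FakeProcessor("tts", 0.02)])
)
```

A report keyed by processor name makes it straightforward to spot which service dominates startup time.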
|
||||
@@ -1 +0,0 @@
|
||||
- Added optional `direction` parameter to `PipelineTask.queue_frame()` and `PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the end of the pipeline.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver` measuring the time from client connection to first bot speech. An `on_latency_breakdown` event is also emitted for this first speech event.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing per-service TTFB, text aggregation, user turn duration, and function call latency metrics for each user-to-bot response cycle.
|
||||
@@ -1 +0,0 @@
|
||||
- Standardized Sarvam STT/TTS User-Agent header handling to consistently send the Pipecat SDK identity in WebSocket requests.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed turn completion instructions being lost when `LLMMessagesUpdateFrame` replaces the LLM context. When `filter_incomplete_user_turns` is enabled, the turn completion system message is now re-injected after context replacement.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed Azure TTS and STT services silently swallowing cancellation errors (invalid API key, network failures, rate limiting) instead of propagating them as `ErrorFrame`s to the pipeline.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an `InterruptionFrame` both upstream and downstream directly from the calling processor, avoiding the round-trip through the pipeline task that `push_interruption_task_frame_and_wait()` required.
|
||||
@@ -1 +0,0 @@
|
||||
- ⚠️ Removed `event` field and `complete()` method from `InterruptionFrame`. Removed `event` field from `InterruptionTaskFrame`. These are no longer needed since `broadcast_interruption()` does not require a round-trip completion signal.
|
||||
@@ -1 +0,0 @@
|
||||
- Deprecated `push_interruption_task_frame_and_wait()` in `FrameProcessor`. Use `broadcast_interruption()` instead. The old method now delegates to `broadcast_interruption()` and logs a deprecation warning.
|
||||
@@ -1 +0,0 @@
|
||||
- Moved `pipecat.services.deepgram.stt_sagemaker` and `pipecat.services.deepgram.tts_sagemaker` to `pipecat.services.deepgram.sagemaker.stt` and `pipecat.services.deepgram.sagemaker.tts`. The old import paths still work but emit a `DeprecationWarning`.
|
||||