Merge pull request #3201 from pipecat-ai/changelog-0.0.97

Release 0.0.97 - Changelog Update
2025-12-05 18:49:15 -05:00
parent 9ef139d020 4df0a9bf73
commit 4cefe1357c
24 changed files with 108 additions and 48 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,114 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 <!-- towncrier release notes start -->

+## [0.0.97] - 2025-12-05
+
+### Added
+
+- Added new Gradium services, `GradiumSTTService` and `GradiumTTSService`, for
+  speech-to-text and text-to-speech functionality using Gradium's API.
+
+- Additions for `AsyncAITTSService` and `AsyncAIHttpTTSService`:
+
+  - Added new `languages`: `pt`, `nl`, `ar`, `ru`, `ro`, `ja`, `he`, `hy`,
+    `tr`, `hi`, `zh`.
+  - Updated the default model to `asyncflow_multilingual_v1.0` for improved
+    accuracy and broader language coverage.
+
+- Added optional tool and tool output filters for MCP services.
+
+### Changed
+
+- Updated Deepgram logging to include Deepgram request IDs for improved
+  debugging.
+
+- Text Aggregation Improvements:
+
+  - **Breaking Change**: `BaseTextAggregator.aggregate()` now returns
+    `AsyncIterator[Aggregation]` instead of `Optional[Aggregation]`. This
+    enables the aggregator to return multiple results based on the provided
+    text.
+  - Refactored text aggregators to use inheritance: `SkipTagsAggregator` and
+    `PatternPairAggregator` now inherit from `SimpleTextAggregator`, reusing
+    the base class's sentence detection logic.
+
+- Improved interruption handling to prevent bots from repeating themselves. LLM
+  services that return multiple sentences in a single response (e.g.,
+  `GoogleLLMService`) are now split into individual sentences before being sent
+  to TTS. This ensures interruptions occur at sentence boundaries, preventing
+  the bot from repeating content after being interrupted during long responses.
+
+- Updated `AICFilter` to use Quail STT as the default model
+  (`AICModelType.QUAIL_STT`). Quail STT is optimized for human-to-machine
+  interaction (e.g., voice agents, speech-to-text) and operates at a native
+  sample rate of 16 kHz with fixed enhancement parameters.
+
+- If an unexpected exception is caught, or if `FrameProcessor.push_error()` is
+  called with an exception, the file name and line number where the exception
+  occurred are now logged.
+
+- Updated Smart Turn model weights to v3.1.
+
+- Smart Turn analyzer now uses the full context of the turn rather than just
+  the audio since VAD last triggered.
+
+- Updated `CartesiaSTTService` to return the full transcription `result` in the
+  `TranscriptionFrame` and `InterimTranscriptionFrame`. This provides access to
+  word timestamp data.
+
+- `HumeTTSService` changes:
+
+  - Added tracking headers (`X-Hume-Client-Name` and `X-Hume-Client-Version`)
+    to all requests made by `HumeTTSService` to the Hume API for better usage
+    tracking and analytics.
+  - Added `stop()` and `cancel()` cleanup methods to `HumeTTSService` to
+    properly close the HTTP client and prevent resource leaks.
+
+### Deprecated
+
+- NVIDIA Services name changes (all functionality is unchanged):
+
+  - `NimLLMService` is now deprecated, use `NvidiaLLMService` instead.
+  - `RivaSTTService` is now deprecated, use `NvidiaSTTService` instead.
+  - `RivaTTSService` is now deprecated, use `NvidiaTTSService` instead.
+  - Use `uv pip install pipecat-ai[nvidia]` instead of
+    `uv pip install pipecat-ai[riva]`
+
+- The `noise_gate_enable` parameter in `AICFilter` is deprecated and no longer
+  has any effect. Noise gating is now handled automatically by the AIC VAD
+  system. Use `AICFilter.create_vad_analyzer()` for VAD functionality instead.
+
+- Package `pipecat.sync` is deprecated, use `pipecat.utils.sync` instead.
+
+### Fixed
+
+- Fixed bug in `PatternPairAggregator` where pattern handlers could be called
+  multiple times for `KEEP` or `AGGREGATE` patterns.
+
+- Fixed sentence aggregation to correctly handle ambiguous punctuation in
+  streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").
+
+- Fixed an issue in `AWSTranscribeSTTService` where the `region` arg was always
+  set to `us-east-1` when providing an AWS_REGION env var.
+
+- Fixed an issue in `SarvamTTSService` where the last sentence was not being
+  spoken. Now, audio is flushed when the TTS service receives the
+  `LLMFullResponseEndFrame` or `EndFrame`.
+
+- Fixed an issue in `DeepgramTTSService` where a `TTSStoppedFrame` was
+  incorrectly pushed after a function call. This caused an issue with the
+  voice-ui-kit's conversational panel rendering of the LLM output after a
+  function call.
+
+- Fixed an issue where `LLMTextFrame.skip_tts` was being overwritten by LLM
+  services.
+
+- Fixed an issue that caused `WebsocketService` instances to attempt
+  reconnection during shutdown.
+
+- Fixed an issue in `ElevenLabsTTSService` where character usage metrics were
+  only reported on the first TTS generation per turn.
+
 ## [0.0.96] - 2025-11-26 🦃 "Happy Thanksgiving!" 🦃

 ### Added
--- a/changelog/3072.changed.md
+++ b/changelog/3072.changed.md
@@ -1 +0,0 @@
- Updated Deepgram logging to include Deepgram request IDs for improved debugging.
--- a/changelog/3130.deprecated.md
+++ b/changelog/3130.deprecated.md
@@ -1,7 +0,0 @@
- NVIDIA Services name changes (all functionality is unchanged):
-
-  - `NimLLMService` is now deprecated, use `NvidiaLLMService` instead.
-  - `RivaSTTService` is now deprecated, use `NvidiaSTTService` instead.
-  - `RivaTTSService` is now deprecated, use `NvidiaTTSService` instead.
-  - Use `uv pip install pipecat-ai[nvidia]` instead of
-    `uv pip install pipecat-ai[riva]`
--- a/changelog/3132.changed.2.md
+++ b/changelog/3132.changed.2.md
@@ -1,9 +0,0 @@
- Text Aggregation Improvements:
-
-  - **Breaking Change**: `BaseTextAggregator.aggregate()` now returns
-    `AsyncIterator[Aggregation]` instead of `Optional[Aggregation]`. This
-    enables the aggregator to return multiple results based on the provided
-    text.
-  - Refactored text aggregators to use inheritance: `SkipTagsAggregator` and
-    `PatternPairAggregator` now inherit from `SimpleTextAggregator`, reusing
-    the base class's sentence detection logic.
--- a/changelog/3132.changed.md
+++ b/changelog/3132.changed.md
@@ -1 +0,0 @@
- Improved interruption handling to prevent bots from repeating themselves. LLM services that return multiple sentences in a single response (e.g., `GoogleLLMService`) are now split into individual sentences before being sent to TTS. This ensures interruptions occur at sentence boundaries, preventing the bot from repeating content after being interrupted during long responses.
--- a/changelog/3132.fixed.2.md
+++ b/changelog/3132.fixed.2.md
@@ -1 +0,0 @@
- Fixed bug in `PatternPairAggregator` where pattern handlers could be called multiple times for `KEEP` or `AGGREGATE` patterns.
--- a/changelog/3132.fixed.md
+++ b/changelog/3132.fixed.md
@@ -1 +0,0 @@
- Fixed sentence aggregation to correctly handle ambiguous punctuation in streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").
--- a/changelog/3153.fixed.md
+++ b/changelog/3153.fixed.md
@@ -1 +0,0 @@
- Fixed an issue in `AWSTranscribeSTTService` where the `region` arg was always set to `us-east-1` when providing an AWS_REGION env var.
--- a/changelog/3155.fixed.md
+++ b/changelog/3155.fixed.md
@@ -1 +0,0 @@
- Fixed an issue in `SarvamTTSService` where the last sentence was not being spoken. Now, audio is flushed when the TTS service receives the `LLMFullResponseEndFrame` or `EndFrame`.
--- a/changelog/3156.fixed.md
+++ b/changelog/3156.fixed.md
@@ -1 +0,0 @@
- Fixed an issue in `DeepgramTTSService` where a `TTSStoppedFrame` was incorrectly pushed after a function call. This caused an issue with the voice-ui-kit's conversational panel rendering of the LLM output after a function call.
--- a/changelog/3162.changed.md
+++ b/changelog/3162.changed.md
@@ -1 +0,0 @@
- Updated `AICFilter` to use Quail STT as the default model (`AICModelType.QUAIL_STT`). Quail STT is optimized for human-to-machine interaction (e.g., voice agents, speech-to-text) and operates at a native sample rate of 16 kHz with fixed enhancement parameters.
--- a/changelog/3162.deprecated.md
+++ b/changelog/3162.deprecated.md
@@ -1 +0,0 @@
- The `noise_gate_enable` parameter in `AICFilter` is deprecated and no longer has any effect. Noise gating is now handled automatically by the AIC VAD system. Use `AICFilter.create_vad_analyzer()` for VAD functionality instead.
--- a/changelog/3168.fixed.md
+++ b/changelog/3168.fixed.md
@@ -1 +0,0 @@
- Fixed an issue where `LLMTextFrame.skip_tts` was being overwritten by LLM services.
--- a/changelog/3176.added.md
+++ b/changelog/3176.added.md
@@ -1 +0,0 @@
- Added new Gradium services, `GradiumSTTService` and `GradiumTTSService`, for speech-to-text and text-to-speech functionality using Gradium's API.
--- a/changelog/3176.changed.md
+++ b/changelog/3176.changed.md
@@ -1 +0,0 @@
- If an unexpected exception is caught, or if `FrameProcessor.push_error()` is called with an exception, the file name and line number where the exception occurred are now logged.
--- a/changelog/3177.changed.md
+++ b/changelog/3177.changed.md
@@ -1 +0,0 @@
- Updated Smart Turn model weights to v3.1.
--- a/changelog/3181.deprecated.md
+++ b/changelog/3181.deprecated.md
@@ -1 +0,0 @@
- Package `pipecat.sync` is deprecated, use `pipecat.utils.sync` instead.
--- a/changelog/3183.changed.md
+++ b/changelog/3183.changed.md
@@ -1 +0,0 @@
- Smart Turn analyzer now uses the full context of the turn rather than just the audio since VAD last triggered.
--- a/changelog/3184.added.md
+++ b/changelog/3184.added.md
@@ -1,6 +0,0 @@
- Additions for `AsyncAITTSService` and `AsyncAIHttpTTSService`:
-
-  - Added new `languages`: `pt`, `nl`, `ar`, `ru`, `ro`, `ja`, `he`, `hy`,
-    `tr`, `hi`, `zh`.
-  - Updated the default model to `asyncflow_multilingual_v1.0` for improved
-    accuracy and broader language coverage.
--- a/changelog/3185.fixed.md
+++ b/changelog/3185.fixed.md
@@ -1 +0,0 @@
- Fixed an issue that caused `WebsocketService` instances to attempt reconnection during shutdown.
--- a/changelog/3186.fixed.md
+++ b/changelog/3186.fixed.md
@@ -1 +0,0 @@
- Fixed an issue in `ElevenLabsTTSService` where character usage metrics were only reported on the first TTS generation per turn.
--- a/changelog/3187.added.md
+++ b/changelog/3187.added.md
@@ -1 +0,0 @@
- Added optional tool and tool output filters for MCP services.
--- a/changelog/3192.changed.md
+++ b/changelog/3192.changed.md
@@ -1 +0,0 @@
- Updated `CartesiaSTTService` to return the full transcription `result` in the `TranscriptionFrame` and `InterimTranscriptionFrame`. This provides access to word timestamp data.
--- a/changelog/3195.changed.md
+++ b/changelog/3195.changed.md
@@ -1,7 +0,0 @@
- `HumeTTSService` changes:
-
-  - Added tracking headers (`X-Hume-Client-Name` and `X-Hume-Client-Version`)
-    to all requests made by `HumeTTSService` to the Hume API for better usage
-    tracking and analytics.
-  - Added `stop()` and `cancel()` cleanup methods to `HumeTTSService` to
-    properly close the HTTP client and prevent resource leaks.
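
Note on the breaking change above: `BaseTextAggregator.aggregate()` now returns `AsyncIterator[Aggregation]`, so call sites that previously handled a single optional result presumably need to iterate instead. The sketch below illustrates only that shape change. It does not import pipecat: the `Aggregation` dataclass, `ToySentenceAggregator`, and `collect()` are hypothetical stand-ins, and the naive `". "` split is not the library's sentence detection.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Aggregation:
    """Hypothetical stand-in; the real pipecat Aggregation may have other fields."""

    text: str


class ToySentenceAggregator:
    """Toy aggregator (not pipecat code) that mimics the new return shape."""

    def __init__(self) -> None:
        self._buffer = ""

    async def aggregate(self, text: str) -> AsyncIterator[Aggregation]:
        # Buffer the incoming text, then yield every complete sentence found.
        # One call can now produce zero, one, or several results.
        self._buffer += text
        while (end := self._buffer.find(". ")) != -1:
            sentence = self._buffer[: end + 1]
            self._buffer = self._buffer[end + 2 :]
            yield Aggregation(sentence)


async def collect(aggregator: ToySentenceAggregator, text: str) -> List[str]:
    # Assumed old call-site shape: a single optional result per call.
    # New call-site shape: iterate over every result with `async for`.
    return [result.text async for result in aggregator.aggregate(text)]


if __name__ == "__main__":
    # The trailing fragment "Thr" stays buffered until more text arrives.
    print(asyncio.run(collect(ToySentenceAggregator(), "One. Two. Thr")))
```

A custom aggregator that overrides `aggregate()` would likely need the same change in the other direction: `yield` each result rather than `return` one.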