Commit Graph

7921 Commits

Author SHA1 Message Date
zkleb-aai
5c2ca0ce64 Update changelog/3856.changed.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:54 -05:00
zkleb-aai
6729f4366a Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:42 -05:00
zkleb-aai
7648b62e6e Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:17 -05:00
zack
cb7e612738 Remove test files and testing documentation from PR 2026-03-01 11:51:51 -05:00
zack
36b9c05730 Fix changelog entries to use proper markdown bullet format 2026-03-01 11:45:24 -05:00
zack
6968d83ccb Add changelog entries for PR #3856 2026-03-01 11:44:51 -05:00
zack
42f91a9056 Apply ruff formatting fixes 2026-03-01 11:44:37 -05:00
zack
5de495cc98 Use logger.warning instead of warnings.warn for deprecation message
- Makes deprecation warning visible in logs without needing Python warning flags
- Users will see the warning during normal operation
2026-03-01 11:39:00 -05:00
zack
d1cbc81108 Fix 07o example to use new min_turn_silence parameter name in docs and comments 2026-03-01 11:36:46 -05:00
zack
66fca7e382 Add backward compatibility for min_end_of_turn_silence_when_confident parameter
- Keep old parameter name for backward compatibility
- Add deprecation warning when old parameter is used
- Automatically migrate old parameter value to new min_turn_silence parameter
- Exclude deprecated parameter from WebSocket URL to avoid sending it to API
- New parameter takes precedence if both are set
2026-03-01 11:33:22 -05:00
zack
07ae4b8d38 Update AssemblyAI examples to use u3-rt-pro and improve 55d example
- Update 13d-assemblyai-transcription.py to explicitly use u3-rt-pro model
- Update 55d-update-settings-assemblyai-stt.py to demonstrate keyterms updates instead of language updates
- Add helpful logging to show before/after keyterms boosting effect
- Use difficult names (Xiomara, Saoirse, Krzystof) to demonstrate boosting effectiveness
2026-03-01 11:27:31 -05:00
zack
21a409e447 Update prompt warning and rename min_end_of_turn_silence_when_confident to min_turn_silence
- Add "beta feature" note to custom prompt warning
- Rename min_end_of_turn_silence_when_confident parameter to min_turn_silence across all AssemblyAI code
- Update documentation, examples, and test files to use new parameter name
2026-03-01 11:17:39 -05:00
zack
d7ce1eedd9 Add foundational examples for AssemblyAI u3-rt-pro
- 07o-interruptible-assemblyai.py: Basic example using Pipecat VAD mode
- 07o-interruptible-assemblyai-stt.py: Advanced example using STT-controlled
  turn detection with comprehensive documentation on u3-rt-pro features
  (turn detection tuning, prompt-based enhancement, speaker diarization)
2026-02-27 17:58:18 -05:00
zack
ef00f27d53 Fix incorrect await on synchronous request_finalize() method
The request_finalize() method in STTService is synchronous (sets a flag),
but was being called with await in the VAD turn endpoint handling code.
This caused "object NoneType can't be used in 'await' expression" errors.

Also includes automatic formatting improvements from ruff.
2026-02-27 17:58:05 -05:00
zack
45532a9478 Remove info logs and unused import per PR feedback
- Remove unused Mapping import
- Remove info logs at initialization (connection params)
- Remove info logs in _handle_transcription (transcript details, text sent to LLM)
- Remove info logs in _build_ws_url (WebSocket URL and params)
- Keep debug logs (less verbose, appropriate for development)
2026-02-27 16:15:49 -05:00
zack
6ba9f780b0 Remove unnecessary SpeechStarted fallback in STT mode
u3-rt-pro guarantees SpeechStarted is always sent before transcripts,
so the fallback UserStartedSpeakingFrame broadcast is never needed.

This ensures clean pairing of UserStarted/StoppedSpeakingFrame:
- Start: Always from _handle_speech_started
- Stop: Always from _handle_transcription on final turn
2026-02-27 15:00:38 -05:00
zack
aa7e9a17d5 Fix finalization pattern: Use request/confirm in Pipecat mode, finalized flag in STT mode
- Add request_finalize() before sending ForceEndpoint in Pipecat mode
- Keep confirm_finalize() when receiving formatted finals in Pipecat mode
- Remove confirm_finalize() from STT mode (use finalized=True instead)

This follows Pipecat's two-step finalization pattern where request_finalize()
is called when sending a finalize request to the STT service, and
confirm_finalize() is called when receiving confirmation back.
2026-02-27 14:55:22 -05:00
zack
cd07937c5d Fix missing imports: Add UserStartedSpeakingFrame and UserStoppedSpeakingFrame 2026-02-26 22:18:02 -05:00
zack
72934bd8ae Add u3-rt-pro support and improvements to AssemblyAI STT service
- Fix speaker diarization: Add field alias for speaker_label → speaker
  mapping in TurnMessage model
- Add warning for non-optimal min_end_of_turn_silence_when_confident
  values (recommends 100ms for best latency)
- Improve max_turn_silence override warning message clarity
- Update custom prompt warning (remove 88% accuracy claim)
- Add comprehensive logging for debugging:
  - Log final connection params after modifications
  - Log WebSocket URL and parsed parameters
  - Log speaker field in transcripts
  - Log text sent to LLM with speaker formatting
- Support dynamic configuration updates via STTUpdateSettingsFrame:
  - keyterms_prompt (when AssemblyAI API supports it)
  - prompt
  - max_turn_silence
  - min_end_of_turn_silence_when_confident
2026-02-26 22:04:21 -05:00
Mark Backman
2a6a993869 Merge pull request #3850 from rupesh-svg/fix/genesys-remove-audio-chunk-logging
Remove verbose audio chunk logging from GenesysAudioHookSerializer
2026-02-26 21:52:54 -05:00
Rupesh
bbaa79fef0 Add changelog for PR #3850 2026-02-26 14:00:34 -08:00
Rupesh
fff9db0d8f Remove verbose audio chunk logging from GenesysAudioHookSerializer
Fixes #3777
2026-02-26 13:51:05 -08:00
kompfner
7fe458fe59 Merge pull request #3817 from pipecat-ai/pk/service-settings-fix-back-compat-for-nested-external-sdk-types
Flatten `LiveOptions` into individual fields on `DeepgramSTTSettings`…
2026-02-26 11:08:27 -05:00
Paul Kompfner
faed775d90 Extract _DeepgramSTTSettingsBase with shared _merge_live_options_delta to deduplicate LiveOptions merge logic between __init__ and apply_update, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum) 2026-02-26 11:02:44 -05:00
Mark Backman
b63ca524f5 Merge pull request #3806 from pipecat-ai/mb/ultravox-updates
Align Ultravox Realtime service with OpenAI/Gemini patterns
2026-02-26 10:49:21 -05:00
Mark Backman
907ff58d41 Align Ultravox Realtime service with OpenAI/Gemini patterns
- Add InterruptionFrame handling with stop_all_metrics()
- Add processing metrics (start/stop) at response boundaries
- Fix agent transcript handling for voice and text modalities:
  - Voice mode: push LLMTextFrame (append_to_context=False) and
    TTSTextFrame for deltas, skip duplicated final text
  - Text mode: push LLMTextFrame with proper response lifecycle,
    no TTSTextFrame (downstream TTS handles audio)
- Add output_medium parameter to AgentInputParams and OneShotInputParams
- Improve TTFB measurement using VAD speech end time
- Update example with user turn strategies and transcript events
- Add text-only output example (50a-ultravox-realtime-text.py)
2026-02-26 10:44:36 -05:00
Mark Backman
97b93ebe57 Merge pull request #3696 from pipecat-ai/mb/streaming-tts-input
Improve streaming TTS input support, add TextAggregationMetricsData
2026-02-26 10:26:53 -05:00
Mark Backman
3ae173520e Code review feedback 2026-02-26 10:23:35 -05:00
Paul Kompfner
c184ac09b8 Inline _build_live_options into _connect in DeepgramSTTService and DeepgramSageMakerSTTService since it's trivial and only called from one place 2026-02-26 09:42:15 -05:00
Paul Kompfner
3c20eda8bf Keep model/language in LiveOptions at construction time so apply_update's bidirectional sync is sufficient; simplify _build_live_options to only add sample_rate 2026-02-26 09:32:52 -05:00
Mark Backman
d69a337def Add text_aggregation_mode parameter to TTSService
Move the sentence vs token aggregation concern into text aggregators
so all text flows through them regardless of mode. This enables
pattern detection and tag handling to work in TOKEN mode.

- Add TextAggregationMode enum (SENTENCE, TOKEN) as the user-facing
  TTS setting, separate from the internal AggregationType
- Add TOKEN mode support to Simple, SkipTags, and PatternPair aggregators
- Add text_aggregation_mode parameter to TTSService and all TTS subclasses
- Deprecate aggregate_sentences in favor of text_aggregation_mode
- Merge TTSService._process_text_frame() into a single codepath
2026-02-26 08:55:41 -05:00
Mark Backman
f7434cdde1 Add text aggregation time metric for TTS sentence aggregation
Add TextAggregationMetricsData measuring the time from the first LLM
token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
2026-02-26 08:48:47 -05:00
Paul Kompfner
e21e8585f0 Add deepgram and sagemaker extras to CI test dependencies so Deepgram and Deepgram Sagemaker settings tests can run 2026-02-25 18:59:59 -05:00
Paul Kompfner
8b6aa4b912 Unflatten LiveOptions back into a single live_options field on DeepgramSTTSettings and DeepgramSageMakerSTTSettings; add apply_update override with delta-merge semantics and from_mapping override for backward-compatible dict-style updates 2026-02-25 18:25:11 -05:00
Paul Kompfner
a4b6db6fb4 Flatten LiveOptions into individual fields on DeepgramSTTSettings and DeepgramSageMakerSTTSettings for backward-compatible dict-style updates via STTUpdateSettingsFrame; during the big service settings refactor, we accidentally got rid of the ability to update individual LiveOptions fields with a sparse update 2026-02-25 17:39:31 -05:00
Mark Backman
edc79d374a Merge pull request #3836 from pipecat-ai/mb/small-webrtc-prebuilt-2.3.0
Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0
2026-02-25 17:18:32 -05:00
Mark Backman
e521aef5df Merge pull request #3842 from pipecat-ai/mb/claude-plugin-docs
Add /update-docs skill to claude-plugin
2026-02-25 16:38:16 -05:00
kompfner
3cfff51205 Merge pull request #3827 from pipecat-ai/pk/gemini-tts-service-remove-model-ivar
Remove unnecessary `_model` ivar from `GeminiTTSService`, using `_set…
2026-02-25 16:14:38 -05:00
Paul Kompfner
3d8e3a4043 Remove unnecessary _model ivar from ElevenLabs STT services, using _settings.model instead. 2026-02-25 16:07:33 -05:00
Paul Kompfner
7ee0400c4c Remove unnecessary _model ivar from Hathora TTS and STT services, using _settings.model instead. 2026-02-25 16:07:26 -05:00
Paul Kompfner
781d191509 Remove unnecessary _model ivar from GeminiTTSService, using _settings.model instead 2026-02-25 15:59:38 -05:00
kompfner
a8cb2a26d1 Merge pull request #3841 from pipecat-ai/pk/groq-tweaks
A few Groq-related tweaks:
2026-02-25 15:54:33 -05:00
kompfner
b1df1ba5d4 Merge pull request #3834 from pipecat-ai/pk/make-ai-service-exclusive-syncer-of-model-name-to-metrics
Make it so that `AIService` is the exclusive "syncer" of model name t…
2026-02-25 15:53:59 -05:00
Mark Backman
eee2ef7e85 Add /update-docs skill to claude-plugin 2026-02-25 15:45:16 -05:00
Paul Kompfner
ff0f3dce32 A few Groq-related tweaks:
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
2026-02-25 15:10:48 -05:00
Paul Kompfner
bca42f7d68 Fix Hathora 55 series examples, and fix Hathora missing settings field warning 2026-02-25 14:48:40 -05:00
Paul Kompfner
27940d83a2 Make it so that AIService is the exclusive "syncer" of model name to metrics.
The only (rare) exception—where a service directly still needs to directly call `self._sync_model_name_to_metrics()`—is when the model name need to be "pulled" from another field (or nested field) in settings up to settings.model on a settings update. This only occurs in Deepgram services, where we use the voice as the model name.

This change has the side-effect of bringing model name to metrics for a number of services that were accidentally omitting it before.
2026-02-25 14:48:24 -05:00
Mark Backman
937c691f2a Merge pull request #3838 from pipecat-ai/mb/remove-playht
Remove PlayHT TTS services
2026-02-25 14:34:15 -05:00
Mark Backman
6803d38d3f Merge pull request #3833 from pipecat-ai/mb/add-performance-changelog-fragment
Add Performance as a changelog fragment option
2026-02-25 14:33:52 -05:00
Mark Backman
44993fe9e3 Remove PlayHT TTS services 2026-02-25 14:12:39 -05:00