Commit Graph

7904 Commits

Author SHA1 Message Date
zack
cd07937c5d Fix missing imports: Add UserStartedSpeakingFrame and UserStoppedSpeakingFrame 2026-02-26 22:18:02 -05:00
zack
72934bd8ae Add u3-rt-pro support and improvements to AssemblyAI STT service
- Fix speaker diarization: Add field alias for speaker_label → speaker
  mapping in TurnMessage model
- Add warning for non-optimal min_end_of_turn_silence_when_confident
  values (recommends 100ms for best latency)
- Improve max_turn_silence override warning message clarity
- Update custom prompt warning (remove 88% accuracy claim)
- Add comprehensive logging for debugging:
  - Log final connection params after modifications
  - Log WebSocket URL and parsed parameters
  - Log speaker field in transcripts
  - Log text sent to LLM with speaker formatting
- Support dynamic configuration updates via STTUpdateSettingsFrame:
  - keyterms_prompt (when AssemblyAI API supports it)
  - prompt
  - max_turn_silence
  - min_end_of_turn_silence_when_confident
2026-02-26 22:04:21 -05:00
Mark Backman
2a6a993869 Merge pull request #3850 from rupesh-svg/fix/genesys-remove-audio-chunk-logging
Remove verbose audio chunk logging from GenesysAudioHookSerializer
2026-02-26 21:52:54 -05:00
Rupesh
bbaa79fef0 Add changelog for PR #3850 2026-02-26 14:00:34 -08:00
Rupesh
fff9db0d8f Remove verbose audio chunk logging from GenesysAudioHookSerializer
Fixes #3777
2026-02-26 13:51:05 -08:00
kompfner
7fe458fe59 Merge pull request #3817 from pipecat-ai/pk/service-settings-fix-back-compat-for-nested-external-sdk-types
Flatten `LiveOptions` into individual fields on `DeepgramSTTSettings`…
2026-02-26 11:08:27 -05:00
Paul Kompfner
faed775d90 Extract _DeepgramSTTSettingsBase with shared _merge_live_options_delta to deduplicate LiveOptions merge logic between __init__ and apply_update, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum) 2026-02-26 11:02:44 -05:00
Mark Backman
b63ca524f5 Merge pull request #3806 from pipecat-ai/mb/ultravox-updates
Align Ultravox Realtime service with OpenAI/Gemini patterns
2026-02-26 10:49:21 -05:00
Mark Backman
907ff58d41 Align Ultravox Realtime service with OpenAI/Gemini patterns
- Add InterruptionFrame handling with stop_all_metrics()
- Add processing metrics (start/stop) at response boundaries
- Fix agent transcript handling for voice and text modalities:
  - Voice mode: push LLMTextFrame (append_to_context=False) and
    TTSTextFrame for deltas, skip duplicated final text
  - Text mode: push LLMTextFrame with proper response lifecycle,
    no TTSTextFrame (downstream TTS handles audio)
- Add output_medium parameter to AgentInputParams and OneShotInputParams
- Improve TTFB measurement using VAD speech end time
- Update example with user turn strategies and transcript events
- Add text-only output example (50a-ultravox-realtime-text.py)
2026-02-26 10:44:36 -05:00
Mark Backman
97b93ebe57 Merge pull request #3696 from pipecat-ai/mb/streaming-tts-input
Improve streaming TTS input support, add TextAggregationMetricsData
2026-02-26 10:26:53 -05:00
Mark Backman
3ae173520e Code review feedback 2026-02-26 10:23:35 -05:00
Paul Kompfner
c184ac09b8 Inline _build_live_options into _connect in DeepgramSTTService and DeepgramSageMakerSTTService since it's trivial and only called from one place 2026-02-26 09:42:15 -05:00
Paul Kompfner
3c20eda8bf Keep model/language in LiveOptions at construction time so apply_update's bidirectional sync is sufficient; simplify _build_live_options to only add sample_rate 2026-02-26 09:32:52 -05:00
Mark Backman
d69a337def Add text_aggregation_mode parameter to TTSService
Move the sentence vs token aggregation concern into text aggregators
so all text flows through them regardless of mode. This enables
pattern detection and tag handling to work in TOKEN mode.

- Add TextAggregationMode enum (SENTENCE, TOKEN) as the user-facing
  TTS setting, separate from the internal AggregationType
- Add TOKEN mode support to Simple, SkipTags, and PatternPair aggregators
- Add text_aggregation_mode parameter to TTSService and all TTS subclasses
- Deprecate aggregate_sentences in favor of text_aggregation_mode
- Merge TTSService._process_text_frame() into a single codepath
2026-02-26 08:55:41 -05:00
Mark Backman
f7434cdde1 Add text aggregation time metric for TTS sentence aggregation
Add TextAggregationMetricsData measuring the time from the first LLM
token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
2026-02-26 08:48:47 -05:00
Paul Kompfner
e21e8585f0 Add deepgram and sagemaker extras to CI test dependencies so Deepgram and Deepgram Sagemaker settings tests can run 2026-02-25 18:59:59 -05:00
Paul Kompfner
8b6aa4b912 Unflatten LiveOptions back into a single live_options field on DeepgramSTTSettings and DeepgramSageMakerSTTSettings; add apply_update override with delta-merge semantics and from_mapping override for backward-compatible dict-style updates 2026-02-25 18:25:11 -05:00
Paul Kompfner
a4b6db6fb4 Flatten LiveOptions into individual fields on DeepgramSTTSettings and DeepgramSageMakerSTTSettings for backward-compatible dict-style updates via STTUpdateSettingsFrame; during the big service settings refactor, we accidentally got rid of the ability to update individual LiveOptions fields with a sparse update 2026-02-25 17:39:31 -05:00
Mark Backman
edc79d374a Merge pull request #3836 from pipecat-ai/mb/small-webrtc-prebuilt-2.3.0
Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0
2026-02-25 17:18:32 -05:00
Mark Backman
e521aef5df Merge pull request #3842 from pipecat-ai/mb/claude-plugin-docs
Add /update-docs skill to claude-plugin
2026-02-25 16:38:16 -05:00
kompfner
3cfff51205 Merge pull request #3827 from pipecat-ai/pk/gemini-tts-service-remove-model-ivar
Remove unnecessary `_model` ivar from `GeminiTTSService`, using `_set…
2026-02-25 16:14:38 -05:00
Paul Kompfner
3d8e3a4043 Remove unnecessary _model ivar from ElevenLabs STT services, using _settings.model instead. 2026-02-25 16:07:33 -05:00
Paul Kompfner
7ee0400c4c Remove unnecessary _model ivar from Hathora TTS and STT services, using _settings.model instead. 2026-02-25 16:07:26 -05:00
Paul Kompfner
781d191509 Remove unnecessary _model ivar from GeminiTTSService, using _settings.model instead 2026-02-25 15:59:38 -05:00
kompfner
a8cb2a26d1 Merge pull request #3841 from pipecat-ai/pk/groq-tweaks
A few Groq-related tweaks:
2026-02-25 15:54:33 -05:00
kompfner
b1df1ba5d4 Merge pull request #3834 from pipecat-ai/pk/make-ai-service-exclusive-syncer-of-model-name-to-metrics
Make it so that `AIService` is the exclusive "syncer" of model name t…
2026-02-25 15:53:59 -05:00
Mark Backman
eee2ef7e85 Add /update-docs skill to claude-plugin 2026-02-25 15:45:16 -05:00
Paul Kompfner
ff0f3dce32 A few Groq-related tweaks:
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
2026-02-25 15:10:48 -05:00
Paul Kompfner
bca42f7d68 Fix Hathora 55 series examples, and fix Hathora missing settings field warning 2026-02-25 14:48:40 -05:00
Paul Kompfner
27940d83a2 Make it so that AIService is the exclusive "syncer" of model name to metrics.
The only (rare) exception—where a service directly still needs to directly call `self._sync_model_name_to_metrics()`—is when the model name need to be "pulled" from another field (or nested field) in settings up to settings.model on a settings update. This only occurs in Deepgram services, where we use the voice as the model name.

This change has the side-effect of bringing model name to metrics for a number of services that were accidentally omitting it before.
2026-02-25 14:48:24 -05:00
Mark Backman
937c691f2a Merge pull request #3838 from pipecat-ai/mb/remove-playht
Remove PlayHT TTS services
2026-02-25 14:34:15 -05:00
Mark Backman
6803d38d3f Merge pull request #3833 from pipecat-ai/mb/add-performance-changelog-fragment
Add Performance as a changelog fragment option
2026-02-25 14:33:52 -05:00
Mark Backman
44993fe9e3 Remove PlayHT TTS services 2026-02-25 14:12:39 -05:00
Mark Backman
0fe4c732b7 Merge pull request #3837 from alts/alts/append-trailing-space
Add `append_trailing_space` to all Rime websocket services
2026-02-25 13:35:07 -05:00
Stephen Altamirano
ceead60ef2 Add append_trailing_space to all Rime websocket services
This was added in 31daa889e8, but only
to `RimeTTSService`, not to `RimeNonJsonTTSService. Bringing these
to parity means that users switching between the two, with the same
inputs, have more consistent vocalization behaviors.
2026-02-25 10:02:38 -08:00
Mark Backman
e028194dbe Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0 2026-02-25 12:23:13 -05:00
Mark Backman
81f4672535 Add Performance as a changelog fragment option 2026-02-25 09:47:42 -05:00
Mark Backman
9273b158ea Merge pull request #3825 from pipecat-ai/mb/llm-user-aggregator-interim-transcription
Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
2026-02-25 09:06:34 -05:00
Mark Backman
353a28842c Merge pull request #3807 from pipecat-ai/mb/update-openai-realtime-1.5
Update OpenAI Realtime default model to gpt-realtime-1.5
2026-02-25 09:06:19 -05:00
Mark Backman
3e6c59c736 Merge pull request #3809 from pipecat-ai/mb/krisp-viva-result
Add Krisp API key support and debug logging
2026-02-25 09:05:12 -05:00
Mark Backman
0ca8c850fb Add TurnMetricsData and e2e processing time for KrispVivaTurn
Introduce a generic TurnMetricsData class for turn detection metrics,
replacing the service-specific SmartTurnMetricsData (now deprecated).
Add end-to-end processing time measurement to KrispVivaTurn, tracking
the interval from VAD speech-to-silence transition to model threshold
crossing. Consume metrics in the strategy _handle_input_audio path
so they are pushed immediately when fresh.
2026-02-25 09:01:21 -05:00
Mark Backman
73ee4da7d4 Add Krisp API key support for new SDK licensing requirement
The Krisp VIVA SDK v1.8.0 requires a license key in globalInit(). Add
api_key parameter to KrispVivaSDKManager, KrispVivaTurn, and
KrispVivaFilter with fallback to KRISP_API_KEY env var. Maintain
backwards compatibility with older SDK versions by catching TypeError
and falling back to the old 3-arg signature.
2026-02-25 09:01:00 -05:00
Filipi da Silva Fuchter
2f60074da3 Merge pull request #3814 from pipecat-ai/filipi/fix_close_context
Fixed an issue where the TTS providers did not close the context after the audio context finished processing all audio.
2026-02-25 08:21:04 -05:00
filipi87
751b1b8100 Adding the changelog entries for the tts fixes. 2026-02-25 10:18:25 -03:00
filipi87
d899f0af11 Refactored all AudioContextTTSService based providers to override the new callbacks instead of _handle_interruption(), making provider-specific cleanup cleaner and more explicit 2026-02-25 10:18:16 -03:00
filipi87
c09ae6ba6d Added two new lifecycle callbacks to AudioContextTTSService: on_audio_context_interrupted() and on_audio_context_completed() 2026-02-25 10:17:54 -03:00
Mark Backman
a187a4b3b2 Merge pull request #3830 from pipecat-ai/aleix/restore-dev-skills 2026-02-25 06:33:16 -05:00
Aleix Conchillo Flaqué
68e19a730b Restore dev skills and add marketplace for maintainer workflows
Brings back the 6 development workflow skills (changelog, cleanup,
code-review, docstring, pr-description, pr-submit) that were moved
to pipecat-ai/skills, and adds a .claude-plugin/marketplace.json so
other pipecat-ai repos can install them. Updates README contributing
section with installation instructions.
2026-02-24 23:47:06 -08:00
Mark Backman
67cb7d575f Merge pull request #3828 from pipecat-ai/mb/skip-empty-audio-filter-frames
Skip empty audio frames after filter buffering
2026-02-24 23:27:22 -05:00
Mark Backman
a84930dc3e Skip empty audio frames after filter buffering
Audio filters like RNNoise, KrispViva, and AIC return empty bytes while
buffering audio to accumulate their required frame size. These empty
frames were flowing downstream, causing misleading "Empty audio frame
received for STT service" warnings.

Skip the frame in BaseInputTransport when audio is empty, preventing
unnecessary processing in VAD and downstream processors.

Fixes #3517
2026-02-24 23:21:52 -05:00