Commit Graph

1663 Commits

Author SHA1 Message Date
kompfner
a8cb2a26d1 Merge pull request #3841 from pipecat-ai/pk/groq-tweaks
A few Groq-related tweaks:
2026-02-25 15:54:33 -05:00
Paul Kompfner
ff0f3dce32 A few Groq-related tweaks:
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
2026-02-25 15:10:48 -05:00
Paul Kompfner
bca42f7d68 Fix Hathora 55 series examples, and fix Hathora missing settings field warning 2026-02-25 14:48:40 -05:00
Mark Backman
44993fe9e3 Remove PlayHT TTS services 2026-02-25 14:12:39 -05:00
Mark Backman
3e6c59c736 Merge pull request #3809 from pipecat-ai/mb/krisp-viva-result
Add Krisp API key support and debug logging
2026-02-25 09:05:12 -05:00
Mark Backman
0ca8c850fb Add TurnMetricsData and e2e processing time for KrispVivaTurn
Introduce a generic TurnMetricsData class for turn detection metrics,
replacing the service-specific SmartTurnMetricsData (now deprecated).
Add end-to-end processing time measurement to KrispVivaTurn, tracking
the interval from VAD speech-to-silence transition to model threshold
crossing. Consume metrics in the strategy _handle_input_audio path
so they are pushed immediately when fresh.
2026-02-25 09:01:21 -05:00
Mark Backman
73ee4da7d4 Add Krisp API key support for new SDK licensing requirement
The Krisp VIVA SDK v1.8.0 requires a license key in globalInit(). Add
api_key parameter to KrispVivaSDKManager, KrispVivaTurn, and
KrispVivaFilter with fallback to KRISP_API_KEY env var. Maintain
backwards compatibility with older SDK versions by catching TypeError
and falling back to the old 3-arg signature.
2026-02-25 09:01:00 -05:00
Paul Kompfner
bcc2b4def4 Make clearer the distinction between "storage-mode" and "delta-mode" usage of *Settings objects
- Storage mode: for use in `self._settings`. All fields should be specified, i.e. should not be `NOT_GIVEN`.
- Delta mode: for use in `*UpdateSettingsFrame`.

In service of this, this commit:
- Adds a runtime check that all fields are specified in storage mode
- Updates all services to specify all fields in stored settings
- Updates all services to no longer check for `is_given` in stored settings (not necessary anymore)
- Updates relevant docstrings
- Renames `update` to `delta` in `*UpdateSettingsFrame`
- Updates community integrations guide
2026-02-24 14:01:28 -05:00
Mark Backman
65f563ad34 Add debug logging to KrispVivaTurn analyze_end_of_turn and update example
Move speech detection tracking outside the per-frame loop in append_audio
since is_speech applies to the whole buffer. Add debug log in
analyze_end_of_turn to show state and probability at decision time. Update
the Krisp VIVA example to use Cartesia TTS and turn analyzer strategy.
2026-02-23 21:35:35 -05:00
Paul Kompfner
ff174dd1c2 Fix STT/TTS Deepgram Sagemaker 55-series examples (examples updating settings at runtime) 2026-02-23 16:02:00 -05:00
Paul Kompfner
029f3dbefb Updating 55o ElevenLabsTTSService example to also exercise switching voices, which requires reconnect 2026-02-23 12:08:13 -05:00
kompfner
03cb0054f9 Merge branch 'main' into pk/service-settings-refactor 2026-02-23 11:46:03 -05:00
Aleix Conchillo Flaqué
abb20f34ba Update default Anthropic model to claude-sonnet-4-6
Update the default model in AnthropicLLMService and remove the
now-unnecessary explicit model from the function calling example.
2026-02-20 16:17:51 -08:00
Aleix Conchillo Flaqué
af4ef95dc6 Fix missing await on add_audio_frames_message in Google audio examples
The method is async but was being called without await, silently
discarding the coroutine.
2026-02-20 14:24:22 -08:00
Filipi da Silva Fuchter
c9615c8db6 Merge pull request #3779 from pipecat-ai/filipi/filter_observer
Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer.
2026-02-20 12:42:02 -05:00
Mark Backman
82ce3ea8de Update 07c example to use DeepgramSageMakerTTSService 2026-02-20 08:10:41 -07:00
Mark Backman
273692421f Add DeepgramSageMakerTTSService for Deepgram TTS on AWS SageMaker
Adds a TTS service that connects to Deepgram models deployed on AWS
SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
Deepgram TTS protocol (Speak, Flush, Clear, Close) over the BiDi
client, with interruption handling and per-turn TTFB metrics.

Updates the example and env.example with separate STT/TTS endpoint names.
2026-02-20 08:08:00 -07:00
Paul Kompfner
fb27642190 Add self._settings to 6 remaining services
- AWSNovaSonicLLMService: new `AWSNovaSonicLLMSettings` with `voice_id` and `endpointing_sensitivity`; remove `self._params` entirely, storing audio I/O config as plain instance variables
- NeuphonicHttpTTSService: reuse `NeuphonicTTSSettings`; use inherited `language` field instead of bespoke `lang_code`
- NvidiaTTSService: new `NvidiaTTSSettings` with `quality`
- PiperTTSService / PiperHttpTTSService: new `PiperTTSSettings` / `PiperHttpTTSSettings` (no extra fields)
- SpeechmaticsTTSService: new `SpeechmaticsTTSSettings` with `max_retries`

Also remove redundant `lang_code` from `NeuphonicTTSSettings` (both WS and HTTP services now use the inherited `TTSSettings.language` field, with automatic enum conversion via the base class).

HTTP services (Neuphonic HTTP, Piper HTTP, Speechmatics) don't override `_update_settings` since the base class applies changes to `self._settings` and subsequent requests read from it automatically.
2026-02-19 18:35:59 -05:00
Paul Kompfner
463ea3725b Update Deepgram Flux with the new service settings pattern 2026-02-19 17:12:24 -05:00
Paul Kompfner
6c609031ee Add more 55-series examples
Also:
- remove unnecessary pass-through `_update_settings` implementation in `FalSTTService`
- warn that `AsyncAITTSService` doesn't currently support runtime settings updates
- update how `GradiumTTSService._update_settings` checks for voice changes
- remove a couple of unnecessary args (because they specified defaults) in other examples
2026-02-19 16:46:14 -05:00
filipi87
18630c9478 Adding changelog entry for RTVI observer ignored_sources feature. 2026-02-19 18:41:05 -03:00
filipi87
3a8d3cc841 Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer. 2026-02-19 18:36:12 -03:00
Paul Kompfner
cc54ff4708 Add more 55-series examples 2026-02-19 14:55:21 -05:00
Paul Kompfner
a7edd8e441 Fix 55zp example 2026-02-18 17:15:22 -05:00
Paul Kompfner
2a07138abf Fix Grok Realtime dynamic session properties updating, and update corresponding 55zo example 2026-02-18 17:12:36 -05:00
Paul Kompfner
ad942f6e4c Update 55zn example (UIltravox dynamic settings updates) to exercise changing modality, which is a setting that supports dynamic updates 2026-02-18 16:33:05 -05:00
Paul Kompfner
97d34ef9e1 Update OpenAI Realtime to warn when you try to update settings that can't be updated dynamically.
Update corresponding example to demonstrate updating output modality.
2026-02-18 16:16:06 -05:00
Paul Kompfner
c054780477 Fix 55zh example 2026-02-18 15:59:34 -05:00
Paul Kompfner
88a2dbdb82 Update 55zf example to update a setting that is supported by the default Camb TTS model 2026-02-18 15:48:50 -05:00
Paul Kompfner
d386a0efda Update Sarvam TTS to apply all changes to settings, not just voic 2026-02-18 15:31:08 -05:00
Paul Kompfner
b718a23c17 Tweak 55zd example 2026-02-18 15:25:50 -05:00
Paul Kompfner
e38f7d9451 Fix 55zc example 2026-02-18 15:23:23 -05:00
Paul Kompfner
b00d454842 Fix Inworld TTS settings updating 2026-02-18 15:19:57 -05:00
Paul Kompfner
0fa51811ea Fix 55z example 2026-02-18 15:11:04 -05:00
Paul Kompfner
323ee00b83 Fix 55w example 2026-02-18 14:51:48 -05:00
Paul Kompfner
0c73b77327 Update Lmnt TTS to support updating settings dynamically 2026-02-18 14:47:38 -05:00
Paul Kompfner
416e1cf877 Update Rime TTS services to store voice in the standard settings.voice field, as opposed to the nonstandard speaker field 2026-02-18 14:46:47 -05:00
Paul Kompfner
b4c5cb258b Tweak 55r example to make the settings update more pronounced 2026-02-18 14:15:14 -05:00
Paul Kompfner
728a97ade3 Update Deepgram TTS to support updating settings dynamically 2026-02-18 14:11:51 -05:00
Paul Kompfner
28677ec829 Tweak 55p example to make the settings update more pronounced 2026-02-18 13:49:32 -05:00
Paul Kompfner
17886d14e8 Fix ElevenLabsTTSService settings update code 2026-02-18 13:47:02 -05:00
Paul Kompfner
caf5dacbe8 Update 55j example to avoid console warning 2026-02-18 12:37:50 -05:00
Paul Kompfner
b8b531b66a In Cartesia TTS service, we don't need to override _update_settings. Parent class handling is enough, as new settings are picked up on the next run_tts (no need to reconnect). 2026-02-18 12:37:34 -05:00
Paul Kompfner
a14690e3a0 Fix the 55i example 2026-02-18 11:55:14 -05:00
Paul Kompfner
d913d954db Fix SpeechmaticsSTTService settings update code, and augment test file to better exercise it 2026-02-18 11:34:52 -05:00
Paul Kompfner
e98bb1df66 Simplify 55* examples: inline the settings update directly in the on_client_connected handler instead of wrapping it in a separate async task 2026-02-18 11:06:33 -05:00
Paul Kompfner
d7d94a29f0 Add foundational examples (55) for runtime settings updates via *UpdateSettingsFrame
42 examples covering STT (13), TTS (21), LLM (4), and realtime (4) services. Each demonstrates updating service settings 10 seconds after client connects, verifying the typed settings machinery end-to-end for every provider.
2026-02-18 09:46:23 -05:00
Mark Backman
507765625f Make UserIdleController always-on with dynamic timeout updates
Always create UserIdleController (timeout=0 means disabled), removing
all Optional guards. Add UserIdleTimeoutUpdateFrame to allow changing
the idle timeout at runtime.
2026-02-14 09:54:30 -05:00
Mark Backman
012ef41ff4 Redesign UserIdleController to use BotStoppedSpeakingFrame
Replace the continuous heartbeat-based timer (UserSpeakingFrame/BotSpeakingFrame
+ asyncio.Event loop) with a simple one-shot timer that starts when
BotStoppedSpeakingFrame is received and cancels on UserStartedSpeakingFrame or
BotStartedSpeakingFrame. This eliminates false idle triggers caused by gaps
between the user finishing speaking and the bot starting to speak (LLM/TTS
latency).

Guard the timer start with two conditions to prevent false triggers:
- User turn in progress: during interruptions, BotStoppedSpeaking arrives
  while the user is still speaking mid-turn.
- Function calls in progress: FunctionCallsStarted arrives before
  BotStoppedSpeaking because the bot speaks concurrently with the function
  call starting, so the timer must wait for the result and subsequent bot
  response.
2026-02-14 08:55:56 -05:00
Paul Kompfner
8a4ab611be Broad service settings refactor, with the primary aim of making service settings discoverable and strongly-typed. Service settings can be updated at runtime with *UpdateSettingsFrames.
Does not (yet) touch `InputParams`, to avoid scope creep and touching something currently part of the public API. But there is a lot of overlap between `*Settings` object fields and `InputParams` fields.

Other than discoverability/typing, these are some other improvements brought by this refactor:
- There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed), improving maintainability and guaranteeing one and only one reconnection no matter which settings changed
- `set_language`/`set_model`/`set_voice`—which we're assuming are usable as public methods, though *not* recommended over `*UpdateSettingsFrame`—all use the same code path as settings updates. They're also now all consistent in that, if a service needs to respond to a change (by, say, reconnecting if needed), any of these methods will kick off that process. Note that this is technically a behavior change.
- Several services now properly react to changed settings by reconnecting:
  - `AWSTranscribeSTTService`
  - `AzureSTTService`
  - `SonioxSTTService`
  - `GladiaSTTService`
  - `SpeechmaticsSTTService`
  - `AssemblyAISTTService`
  - `CartesiaSTTService`
  - `FishAudioTTSService` (would previously only reconnect when `model` changed)
  - `GoogleSTTService`
  - `SpeechmaticsSTTService` (which previously only handled *some* settings updates through a nonstandard public `update_params` method)
  - `GradiumSTTService`
  - `NvidiaSegmentedSTTService` (which previously only handled changes to language)
- Bookkeeping across various services has been reduced, mostly by deduping ivars; the `self._settings` ivar is treated as the source of truth

NOTE: I pretty much guarantee that there are services missed in this PR in terms of bringing to consistency with how updates are handled (like whether changes in certain fields trigger reconnects when they need to). We can squash remaining inconsistencies as we stumble onto them, service by service. The goal here is to get things *mostly* in order, and establish the infrastructure and patterns we'll need going forward.
2026-02-13 15:12:26 -05:00