Compare commits

359 Commits

Author SHA1 Message Date
Mark Backman
4b4e8b839c Add changelog for PR #3851 2026-02-26 18:27:50 -05:00
Mark Backman
86c2dd5cfc Remove processing metrics (ProcessingMetricsData)
Processing metrics were an early addition that predated a clear
understanding of what timing measurements matter in real-time pipelines.
They were inconsistently implemented across services, often broken, and
overlapped with the better-defined TTFB metric.

- Remove ProcessingMetricsData class and all start/stop_processing_metrics
  methods from FrameProcessorMetrics, FrameProcessor, and SentryMetrics
- Remove all processing metrics calls from 31 service files (LLM, TTS,
  STT, image, vision, realtime)
- Clean up empty _start_metrics() stubs left in STT services
- Remove processing metrics handling from RTVI, metrics log observer,
  pipeline task initial metrics, and strands agents framework
- Update tests and examples

Remaining metrics (TTFB, LLM token usage, TTS character usage, text
aggregation time) are well-defined and consistently implemented.
2026-02-26 18:20:49 -05:00
kompfner
7fe458fe59 Merge pull request #3817 from pipecat-ai/pk/service-settings-fix-back-compat-for-nested-external-sdk-types
Flatten `LiveOptions` into individual fields on `DeepgramSTTSettings`…
2026-02-26 11:08:27 -05:00
Paul Kompfner
faed775d90 Extract _DeepgramSTTSettingsBase with shared _merge_live_options_delta to deduplicate LiveOptions merge logic between __init__ and apply_update, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum) 2026-02-26 11:02:44 -05:00
Mark Backman
b63ca524f5 Merge pull request #3806 from pipecat-ai/mb/ultravox-updates
Align Ultravox Realtime service with OpenAI/Gemini patterns
2026-02-26 10:49:21 -05:00
Mark Backman
907ff58d41 Align Ultravox Realtime service with OpenAI/Gemini patterns
- Add InterruptionFrame handling with stop_all_metrics()
- Add processing metrics (start/stop) at response boundaries
- Fix agent transcript handling for voice and text modalities:
  - Voice mode: push LLMTextFrame (append_to_context=False) and
    TTSTextFrame for deltas, skip duplicated final text
  - Text mode: push LLMTextFrame with proper response lifecycle,
    no TTSTextFrame (downstream TTS handles audio)
- Add output_medium parameter to AgentInputParams and OneShotInputParams
- Improve TTFB measurement using VAD speech end time
- Update example with user turn strategies and transcript events
- Add text-only output example (50a-ultravox-realtime-text.py)
2026-02-26 10:44:36 -05:00
Mark Backman
97b93ebe57 Merge pull request #3696 from pipecat-ai/mb/streaming-tts-input
Improve streaming TTS input support, add TextAggregationMetricsData
2026-02-26 10:26:53 -05:00
Mark Backman
3ae173520e Code review feedback 2026-02-26 10:23:35 -05:00
Paul Kompfner
c184ac09b8 Inline _build_live_options into _connect in DeepgramSTTService and DeepgramSageMakerSTTService since it's trivial and only called from one place 2026-02-26 09:42:15 -05:00
Paul Kompfner
3c20eda8bf Keep model/language in LiveOptions at construction time so apply_update's bidirectional sync is sufficient; simplify _build_live_options to only add sample_rate 2026-02-26 09:32:52 -05:00
Mark Backman
d69a337def Add text_aggregation_mode parameter to TTSService
Move the sentence vs token aggregation concern into text aggregators
so all text flows through them regardless of mode. This enables
pattern detection and tag handling to work in TOKEN mode.

- Add TextAggregationMode enum (SENTENCE, TOKEN) as the user-facing
  TTS setting, separate from the internal AggregationType
- Add TOKEN mode support to Simple, SkipTags, and PatternPair aggregators
- Add text_aggregation_mode parameter to TTSService and all TTS subclasses
- Deprecate aggregate_sentences in favor of text_aggregation_mode
- Merge TTSService._process_text_frame() into a single codepath
2026-02-26 08:55:41 -05:00
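The sentence versus token distinction described in this commit can be pictured with a minimal sketch. The names `DemoAggregationMode` and `aggregate_text` are hypothetical stand-ins, not the library's classes, and the sentence detection here is a naive punctuation check assumed only for illustration:

```python
from enum import Enum
from typing import Iterable, Iterator


class DemoAggregationMode(Enum):
    # Hypothetical stand-in for a user-facing aggregation setting.
    SENTENCE = "sentence"
    TOKEN = "token"


def aggregate_text(tokens: Iterable[str], mode: DemoAggregationMode) -> Iterator[str]:
    """Route every token through one aggregator, whatever the mode."""
    buffer = ""
    for token in tokens:
        if mode is DemoAggregationMode.TOKEN:
            # TOKEN mode: pass each token through as soon as it arrives.
            yield token
            continue
        # SENTENCE mode: buffer until naive end-of-sentence punctuation.
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever is left when the token stream ends.
        yield buffer.strip()
```

Because both modes go through the same function, any extra handling (such as tag or pattern detection) could be added in one place and apply to both.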
Mark Backman
f7434cdde1 Add text aggregation time metric for TTS sentence aggregation
Add TextAggregationMetricsData measuring the time from the first LLM
token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
2026-02-26 08:48:47 -05:00
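As a rough illustration of the measurement described in this commit (first LLM token to first complete sentence), here is a small sketch. `DemoAggregationTimer` is a hypothetical class, timestamps are passed in explicitly, and the sentence check is a simple punctuation test assumed for the example:

```python
from typing import Optional


class DemoAggregationTimer:
    """Hypothetical timer: first token to first complete sentence."""

    def __init__(self) -> None:
        self._first_token_at: Optional[float] = None
        self._buffer = ""
        self.elapsed: Optional[float] = None

    def add_token(self, token: str, now: float) -> None:
        if self.elapsed is not None:
            # Only the first sentence is measured in this sketch.
            return
        if self._first_token_at is None:
            self._first_token_at = now
        self._buffer += token
        if self._buffer.rstrip().endswith((".", "!", "?")):
            # First complete sentence: record the aggregation latency.
            self.elapsed = now - self._first_token_at
```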
Paul Kompfner
e21e8585f0 Add deepgram and sagemaker extras to CI test dependencies so Deepgram and Deepgram Sagemaker settings tests can run 2026-02-25 18:59:59 -05:00
Paul Kompfner
8b6aa4b912 Unflatten LiveOptions back into a single live_options field on DeepgramSTTSettings and DeepgramSageMakerSTTSettings; add apply_update override with delta-merge semantics and from_mapping override for backward-compatible dict-style updates 2026-02-25 18:25:11 -05:00
Paul Kompfner
a4b6db6fb4 Flatten LiveOptions into individual fields on DeepgramSTTSettings and DeepgramSageMakerSTTSettings for backward-compatible dict-style updates via STTUpdateSettingsFrame; during the big service settings refactor, we accidentally got rid of the ability to update individual LiveOptions fields with a sparse update 2026-02-25 17:39:31 -05:00
Mark Backman
edc79d374a Merge pull request #3836 from pipecat-ai/mb/small-webrtc-prebuilt-2.3.0
Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0
2026-02-25 17:18:32 -05:00
Mark Backman
e521aef5df Merge pull request #3842 from pipecat-ai/mb/claude-plugin-docs
Add /update-docs skill to claude-plugin
2026-02-25 16:38:16 -05:00
kompfner
3cfff51205 Merge pull request #3827 from pipecat-ai/pk/gemini-tts-service-remove-model-ivar
Remove unnecessary `_model` ivar from `GeminiTTSService`, using `_set…
2026-02-25 16:14:38 -05:00
Paul Kompfner
3d8e3a4043 Remove unnecessary _model ivar from ElevenLabs STT services, using _settings.model instead. 2026-02-25 16:07:33 -05:00
Paul Kompfner
7ee0400c4c Remove unnecessary _model ivar from Hathora TTS and STT services, using _settings.model instead. 2026-02-25 16:07:26 -05:00
Paul Kompfner
781d191509 Remove unnecessary _model ivar from GeminiTTSService, using _settings.model instead 2026-02-25 15:59:38 -05:00
kompfner
a8cb2a26d1 Merge pull request #3841 from pipecat-ai/pk/groq-tweaks
A few Groq-related tweaks:
2026-02-25 15:54:33 -05:00
kompfner
b1df1ba5d4 Merge pull request #3834 from pipecat-ai/pk/make-ai-service-exclusive-syncer-of-model-name-to-metrics
Make it so that `AIService` is the exclusive "syncer" of model name t…
2026-02-25 15:53:59 -05:00
Mark Backman
eee2ef7e85 Add /update-docs skill to claude-plugin 2026-02-25 15:45:16 -05:00
Paul Kompfner
ff0f3dce32 A few Groq-related tweaks:
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
2026-02-25 15:10:48 -05:00
Paul Kompfner
bca42f7d68 Fix Hathora 55 series examples, and fix Hathora missing settings field warning 2026-02-25 14:48:40 -05:00
Paul Kompfner
27940d83a2 Make it so that AIService is the exclusive "syncer" of model name to metrics.
The only (rare) exception, where a service still needs to call `self._sync_model_name_to_metrics()` directly, is when the model name needs to be "pulled" from another field (or nested field) in settings up to settings.model on a settings update. This only occurs in Deepgram services, where we use the voice as the model name.

This change has the side-effect of bringing model name to metrics for a number of services that were accidentally omitting it before.
2026-02-25 14:48:24 -05:00
Mark Backman
937c691f2a Merge pull request #3838 from pipecat-ai/mb/remove-playht
Remove PlayHT TTS services
2026-02-25 14:34:15 -05:00
Mark Backman
6803d38d3f Merge pull request #3833 from pipecat-ai/mb/add-performance-changelog-fragment
Add Performance as a changelog fragment option
2026-02-25 14:33:52 -05:00
Mark Backman
44993fe9e3 Remove PlayHT TTS services 2026-02-25 14:12:39 -05:00
Mark Backman
0fe4c732b7 Merge pull request #3837 from alts/alts/append-trailing-space
Add `append_trailing_space` to all Rime websocket services
2026-02-25 13:35:07 -05:00
Stephen Altamirano
ceead60ef2 Add append_trailing_space to all Rime websocket services
This was added in 31daa889e8, but only
to `RimeTTSService`, not to `RimeNonJsonTTSService`. Bringing these
to parity means that users switching between the two, with the same
inputs, have more consistent vocalization behaviors.
2026-02-25 10:02:38 -08:00
Mark Backman
e028194dbe Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0 2026-02-25 12:23:13 -05:00
Mark Backman
81f4672535 Add Performance as a changelog fragment option 2026-02-25 09:47:42 -05:00
Mark Backman
9273b158ea Merge pull request #3825 from pipecat-ai/mb/llm-user-aggregator-interim-transcription
Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
2026-02-25 09:06:34 -05:00
Mark Backman
353a28842c Merge pull request #3807 from pipecat-ai/mb/update-openai-realtime-1.5
Update OpenAI Realtime default model to gpt-realtime-1.5
2026-02-25 09:06:19 -05:00
Mark Backman
3e6c59c736 Merge pull request #3809 from pipecat-ai/mb/krisp-viva-result
Add Krisp API key support and debug logging
2026-02-25 09:05:12 -05:00
Mark Backman
0ca8c850fb Add TurnMetricsData and e2e processing time for KrispVivaTurn
Introduce a generic TurnMetricsData class for turn detection metrics,
replacing the service-specific SmartTurnMetricsData (now deprecated).
Add end-to-end processing time measurement to KrispVivaTurn, tracking
the interval from VAD speech-to-silence transition to model threshold
crossing. Consume metrics in the strategy _handle_input_audio path
so they are pushed immediately when fresh.
2026-02-25 09:01:21 -05:00
Mark Backman
73ee4da7d4 Add Krisp API key support for new SDK licensing requirement
The Krisp VIVA SDK v1.8.0 requires a license key in globalInit(). Add
api_key parameter to KrispVivaSDKManager, KrispVivaTurn, and
KrispVivaFilter with fallback to KRISP_API_KEY env var. Maintain
backwards compatibility with older SDK versions by catching TypeError
and falling back to the old 3-arg signature.
2026-02-25 09:01:00 -05:00
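The backwards-compatibility approach described here (try the new call, catch `TypeError`, fall back to the 3-argument form) might look roughly like the sketch below. The function `init_with_optional_key` and its parameter names are hypothetical, and it assumes the newer signature simply takes the key as a fourth argument; the real SDK call may differ:

```python
import os
from typing import Any, Callable, Optional


def init_with_optional_key(
    global_init: Callable[..., Any],
    arg1: Any,
    arg2: Any,
    arg3: Any,
    api_key: Optional[str] = None,
) -> str:
    """Call a hypothetical init function, preferring the newer 4-arg form."""
    # Fall back to the environment variable when no key is passed explicitly.
    key = api_key or os.environ.get("KRISP_API_KEY", "")
    try:
        global_init(arg1, arg2, arg3, key)
        return "with_key"
    except TypeError:
        # Older signature (assumed): only the three original arguments.
        global_init(arg1, arg2, arg3)
        return "legacy"
```

One trade-off of this pattern is that a `TypeError` raised inside the init function itself would also trigger the fallback, so it is best kept around a single, narrow call.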
Filipi da Silva Fuchter
2f60074da3 Merge pull request #3814 from pipecat-ai/filipi/fix_close_context
Fixed an issue where the TTS providers did not close the context after the audio context finished processing all audio.
2026-02-25 08:21:04 -05:00
filipi87
751b1b8100 Adding the changelog entries for the tts fixes. 2026-02-25 10:18:25 -03:00
filipi87
d899f0af11 Refactored all AudioContextTTSService based providers to override the new callbacks instead of _handle_interruption(), making provider-specific cleanup cleaner and more explicit 2026-02-25 10:18:16 -03:00
filipi87
c09ae6ba6d Added two new lifecycle callbacks to AudioContextTTSService: on_audio_context_interrupted() and on_audio_context_completed() 2026-02-25 10:17:54 -03:00
Mark Backman
a187a4b3b2 Merge pull request #3830 from pipecat-ai/aleix/restore-dev-skills 2026-02-25 06:33:16 -05:00
Aleix Conchillo Flaqué
68e19a730b Restore dev skills and add marketplace for maintainer workflows
Brings back the 6 development workflow skills (changelog, cleanup,
code-review, docstring, pr-description, pr-submit) that were moved
to pipecat-ai/skills, and adds a .claude-plugin/marketplace.json so
other pipecat-ai repos can install them. Updates README contributing
section with installation instructions.
2026-02-24 23:47:06 -08:00
Mark Backman
67cb7d575f Merge pull request #3828 from pipecat-ai/mb/skip-empty-audio-filter-frames
Skip empty audio frames after filter buffering
2026-02-24 23:27:22 -05:00
Mark Backman
a84930dc3e Skip empty audio frames after filter buffering
Audio filters like RNNoise, KrispViva, and AIC return empty bytes while
buffering audio to accumulate their required frame size. These empty
frames were flowing downstream, causing misleading "Empty audio frame
received for STT service" warnings.

Skip the frame in BaseInputTransport when audio is empty, preventing
unnecessary processing in VAD and downstream processors.

Fixes #3517
2026-02-24 23:21:52 -05:00
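The idea of skipping empty buffers before they reach downstream processors can be sketched as a simple generator. `drop_empty_audio` is a hypothetical helper written for illustration, not the transport's actual code:

```python
from typing import Iterable, Iterator


def drop_empty_audio(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield only non-empty audio chunks."""
    for chunk in chunks:
        if not chunk:
            # A filter that is still buffering returned b""; nothing to process.
            continue
        yield chunk
```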
kompfner
54fd73c460 Merge pull request #3821 from pipecat-ai/pk/fix-missing-field-warning-rime-tts
Fix missing field warning in `RimeTTSService`
2026-02-24 21:58:17 -05:00
kompfner
c954e1c898 Merge pull request #3820 from pipecat-ai/pk/fix-breakage-when-sending-generic-settings-update
Fix breakage when using a generic settings update (e.g. a `TTSSetting…
2026-02-24 21:58:05 -05:00
kompfner
db76cd052a Merge pull request #3819 from pipecat-ai/pk/make-update-settings-frames-uninterruptible
Make `ServiceUpdateSettingsFrame` uninterruptible—settings updates ar…
2026-02-24 21:57:41 -05:00
Mark Backman
167e68672b Add changelog for InterimTranscriptionFrame/TranslationFrame fix 2026-02-24 20:52:16 -05:00
Mark Backman
69d916ca51 Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
These frames were falling through to the else branch and being pushed
downstream, unlike TranscriptionFrame which is explicitly consumed.
This aligns with how the assistant aggregator already filters them.
2026-02-24 20:51:41 -05:00
Mark Backman
9890b93d08 Merge pull request #3822 from pipecat-ai/fix/stt-ttfb-timeout-timestamp
Fix STT TTFB timeout measuring to use transcript arrival time
2026-02-24 19:45:15 -05:00
Mark Backman
f928206b3a Add changelog for STT TTFB timeout fix 2026-02-24 19:02:40 -05:00
Mark Backman
f421ad9cf6 Fix STT TTFB timeout handler to measure to transcript arrival time instead of timeout expiry
PR #3776 replaced manual timestamp tracking with stop_ttfb_metrics() in
the timeout handler, but without an end_time it uses time.time() at
timeout expiry—producing TTFB = timeout + stop_secs (~2.2s) instead of
the actual transcript latency.

Restore _last_transcript_time tracking so the timeout handler measures
to when the transcript arrived, and skip reporting if none arrived.
2026-02-24 18:57:38 -05:00
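A minimal sketch of the corrected measurement, assuming timestamps in seconds: the hypothetical function `ttfb_on_timeout` measures to the time the transcript arrived rather than to "now" at timeout expiry, and reports nothing if no transcript arrived. The parameter names are illustrative only:

```python
from typing import Optional


def ttfb_on_timeout(
    start_time: float, last_transcript_time: Optional[float]
) -> Optional[float]:
    """Return the TTFB to report when the timeout fires, or None to skip."""
    if last_transcript_time is None:
        # No transcript arrived before the timeout: nothing meaningful to report.
        return None
    # Measure to when the transcript arrived, not to the timeout expiry.
    return last_transcript_time - start_time
```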
Paul Kompfner
d918a20b75 Fix missing field warning in RimeTTSService 2026-02-24 18:14:16 -05:00
Paul Kompfner
d91c230b85 Fix breakage when using a generic settings update (e.g. a TTSSettings) instead of a more specific one (e.g. a RimeTTSSettings). Both should work, assuming you're only changing fields present in the generic settings. 2026-02-24 18:05:27 -05:00
Paul Kompfner
b6f21ab15d Make ServiceUpdateSettingsFrame uninterruptible—settings updates are generally independent of specific utterances.
Before this change, settings updates were often not applied. For example, a `TTSUpdateSettingsFrame` queued while the bot was speaking would only have an effect at the end of the bot's reply, and any interruption before the end of the reply would "cancel" the update.
2026-02-24 17:47:53 -05:00
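To illustrate why marking a frame uninterruptible matters, here is a toy model of a queue being flushed on interruption. `DemoFrame` and `flush_on_interruption` are hypothetical names, and the real pipeline's queueing is more involved than this:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DemoFrame:
    # Hypothetical frame with a flag that protects it from interruptions.
    name: str
    uninterruptible: bool = False


def flush_on_interruption(queue: List[DemoFrame]) -> List[DemoFrame]:
    """Drop queued frames on interruption, keeping uninterruptible ones."""
    return [frame for frame in queue if frame.uninterruptible]
```

In this model, a settings update queued behind speech frames survives the interruption instead of being discarded with them.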
kompfner
6f0061ab96 Merge pull request #3812 from pipecat-ai/pk/service-settings-storage-v-delta-mode
Make clearer the distinction between "storage-mode" and "delta-mode" …
2026-02-24 15:37:49 -05:00
Aleix Conchillo Flaqué
761397d1f9 Merge pull request #3816 from pipecat-ai/aleix/use-skills-repo
Move skills to pipecat-ai/skills repo, add README instructions
2026-02-24 12:11:20 -08:00
Aleix Conchillo Flaqué
d9bb4d07c6 Merge pull request #3815 from pipecat-ai/fix/sentry-metrics-signatures
Fix SentryMetrics method signatures to match base class
2026-02-24 12:11:07 -08:00
Aleix Conchillo Flaqué
ee46cbce4c Move skills to pipecat-ai/skills repo, add README instructions
Remove bundled Claude Code skills (changelog, cleanup, code-review,
docstring, pr-description, pr-submit) that now live in
https://github.com/pipecat-ai/skills. Add a section to the README
with installation instructions. The update-docs skill remains as
it is specific to this repository.
2026-02-24 11:41:19 -08:00
Aleix Conchillo Flaqué
b4b9976b9c Fix SentryMetrics method signatures to match base class
Update start_ttfb_metrics, stop_ttfb_metrics, start_processing_metrics,
and stop_processing_metrics to accept start_time/end_time keyword
arguments matching the updated FrameProcessorMetrics signatures.

Closes #3808
2026-02-24 11:26:34 -08:00
Paul Kompfner
b78a293ffb Flatten input_params into individual fields on SonioxSTTSettings and GladiaSTTSettings
This makes each service-specific field individually visible to the delta/update mechanism (`apply_update`, `given_fields`) and removes the need for complex sync logic between `input_params` and top-level fields like `model`.

- Soniox: replace `input_params: SonioxInputParams` with 8 individual fields, simplify `_update_settings` by removing model sync logic, remove unused `is_given` import
- Gladia: replace `input_params: GladiaInputParams` with 11 individual fields, resolve deprecated `language` → `language_config` at init time rather than at `_prepare_settings` time
2026-02-24 14:01:43 -05:00
Paul Kompfner
0a89d24f70 Update some more services to ensure that there are no un-initialized fields in self._settings 2026-02-24 14:01:43 -05:00
Paul Kompfner
8c9ccf8f82 Bump various deprecation messages from mentioning version 0.0.103 to 0.0.104 2026-02-24 14:01:43 -05:00
Paul Kompfner
bcc2b4def4 Make clearer the distinction between "storage-mode" and "delta-mode" usage of *Settings objects
- Storage mode: for use in `self._settings`. All fields should be specified, i.e. should not be `NOT_GIVEN`.
- Delta mode: for use in `*UpdateSettingsFrame`.

In service of this, this commit:
- Adds a runtime check that all fields are specified in storage mode
- Updates all services to specify all fields in stored settings
- Updates all services to no longer check for `is_given` in stored settings (not necessary anymore)
- Updates relevant docstrings
- Renames `update` to `delta` in `*UpdateSettingsFrame`
- Updates community integrations guide
2026-02-24 14:01:28 -05:00
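The storage-mode versus delta-mode distinction above can be sketched with a small dataclass. Everything here is an assumption made for illustration: the `NOT_GIVEN` sentinel is redefined locally, and `DemoSettings`, `validate_storage`, and `apply_delta` are hypothetical names rather than the library's API:

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGiven:
    def __repr__(self) -> str:
        return "NOT_GIVEN"


# Local sentinel for this sketch only.
NOT_GIVEN = _NotGiven()


@dataclass
class DemoSettings:
    model: Any = NOT_GIVEN
    voice: Any = NOT_GIVEN


def validate_storage(settings: DemoSettings) -> None:
    """Storage mode: every field must be specified."""
    missing = [f.name for f in fields(settings) if getattr(settings, f.name) is NOT_GIVEN]
    if missing:
        raise ValueError(f"Unspecified fields in stored settings: {missing}")


def apply_delta(stored: DemoSettings, delta: DemoSettings) -> DemoSettings:
    """Delta mode: only the fields that were given overwrite stored values."""
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not NOT_GIVEN
    }
    return replace(stored, **changes)
```

With the storage-mode check in place, code reading stored settings no longer needs to test each field for the sentinel.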
Filipi da Silva Fuchter
57d25c564c Merge pull request #3786 from pipecat-ai/filipi/refactor_word_tts_service
Refactoring the services using the WordTTSService
2026-02-24 13:53:58 -05:00
filipi87
6cda2ff941 Changelog entry for word timestamp refactor and deprecation notes. 2026-02-24 15:49:02 -03:00
filipi87
323477bfa4 Refactoring the services using the WordTTSService. 2026-02-24 15:48:46 -03:00
Mark Backman
fa692ec989 Merge pull request #3813 from pipecat-ai/mb/fix-stt-ttfb
Fix STT TTFB metrics for Soniox and AWS Transcribe
2026-02-24 13:12:32 -05:00
Mark Backman
23ad181515 Fix Soniox processing metrics to measure token-to-transcript time
Move start_processing_metrics from run_stt (called per audio chunk,
producing noisy 0ms logs) to _receive_messages when the first final
token arrives for a new utterance. The existing stop_processing_metrics
in send_endpoint_transcript completes the pair, giving a meaningful
measurement of time from first recognition to finalized transcript.
2026-02-24 13:09:29 -05:00
Mark Backman
6f7664846c Add can_generate_metrics to Soniox and AWS Transcribe STT services
Commit 859cd7c9 refactored STT TTFB measurement to use the base class
start_ttfb_metrics/stop_ttfb_metrics, which are gated behind
can_generate_metrics(). Soniox and AWS Transcribe never overrode this
method (default returns False), so TTFB was silently never reported.
2026-02-24 12:59:44 -05:00
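The failure mode described here, where metrics are silently dropped because a gate method defaults to `False`, can be shown with a toy class hierarchy. `DemoProcessor`, `DemoOptedInService`, and `report_ttfb` are hypothetical names; only `can_generate_metrics` comes from the commit message:

```python
from typing import List


class DemoProcessor:
    """Hypothetical base class whose metrics are gated behind an opt-in."""

    def __init__(self) -> None:
        self.reported: List[float] = []

    def can_generate_metrics(self) -> bool:
        # Default: metrics are off unless a subclass opts in.
        return False

    def report_ttfb(self, seconds: float) -> None:
        if not self.can_generate_metrics():
            # Silently skipped, which is why the missing override went unnoticed.
            return
        self.reported.append(seconds)


class DemoOptedInService(DemoProcessor):
    def can_generate_metrics(self) -> bool:
        return True
```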
Mark Backman
081aaa50dc Merge pull request #3811 from pipecat-ai/mb/nltk-upgrade
Bump nltk minimum version from 3.9.1 to 3.9.3
2026-02-24 10:28:32 -05:00
Mark Backman
aff8ab8a40 Update OpenAI Realtime default model to gpt-realtime-1.5 2026-02-24 09:07:31 -05:00
Mark Backman
0f7e6e14ab Bump nltk minimum version from 3.9.1 to 3.9.3
Resolves a security vulnerability flagged by Dependabot (#164).
2026-02-24 08:56:00 -05:00
Mark Backman
65f563ad34 Add debug logging to KrispVivaTurn analyze_end_of_turn and update example
Move speech detection tracking outside the per-frame loop in append_audio
since is_speech applies to the whole buffer. Add debug log in
analyze_end_of_turn to show state and probability at decision time. Update
the Krisp VIVA example to use Cartesia TTS and turn analyzer strategy.
2026-02-23 21:35:35 -05:00
Mark Backman
9c2ac661a3 Merge pull request #3805 from pipecat-ai/mb/dataclass-basemodel
Add dataclass vs Pydantic BaseModel convention to CLAUDE.md
2026-02-23 19:32:31 -05:00
kompfner
cdd65b6c0a Merge pull request #3714 from pipecat-ai/pk/service-settings-refactor
Broad refactor of service settings and how they’re updated at runtime
2026-02-23 17:15:15 -05:00
Paul Kompfner
71fc078c24 Refine ServiceSettings docstring: clarify NOT_GIVEN semantics and fix frame reference
Use wildcard `*UpdateSettingsFrame` to cover all frame types. Clarify that NOT_GIVEN only appears in update deltas, not in the service's current settings state.
2026-02-23 16:55:11 -05:00
Paul Kompfner
7556427862 Revise changelog entries for service settings refactor
Split the single "changed" entry into separate "added", "changed", and "deprecated" entries for clarity. Add a note about the subtle behavior change in the deprecated set_model/set_voice/set_language methods.
2026-02-23 16:52:11 -05:00
Paul Kompfner
bcf11ecbd4 Looks like the Deepgram Sagemaker TTS services aren't yet able to successfully disconnect/reconnect to apply runtime settings updates. For now, marking them as not yet supporting runtime settings updates. 2026-02-23 16:02:00 -05:00
Paul Kompfner
ff174dd1c2 Fix STT/TTS Deepgram Sagemaker 55-series examples (examples updating settings at runtime) 2026-02-23 16:02:00 -05:00
Paul Kompfner
e804060e17 Update COMMUNITY_INTEGRATIONS.md _update_settings examples
Simplify the reconnect example to show a common pattern (reconnect on any change) and improve the _warn_unhandled_updated_settings example to show selective handling of specific fields.
2026-02-23 15:45:00 -05:00
Paul Kompfner
30db5fea7c Clarify that ServiceSettings and subclasses represent runtime-updatable settings
Update docstrings for ServiceSettings, LLMSettings, TTSSettings, and STTSettings to make clear these capture only the subset of service configuration that can be changed while the pipeline is running via UpdateSettingsFrame, not all constructor parameters.
2026-02-23 15:38:57 -05:00
Mark Backman
c527e1f30f Add dataclass vs Pydantic BaseModel rule to CLAUDE.md
Document the existing convention: use @dataclass for frames and
internal pipeline data, use Pydantic BaseModel for configuration,
parameters, metrics, and external API data.
2026-02-23 14:26:16 -05:00
Paul Kompfner
029f3dbefb Updating 55o ElevenLabsTTSService example to also exercise switching voices, which requires reconnect 2026-02-23 12:08:13 -05:00
kompfner
03cb0054f9 Merge branch 'main' into pk/service-settings-refactor 2026-02-23 11:46:03 -05:00
Mark Backman
6a7e9358c6 Merge pull request #3803 from pipecat-ai/mb/inline-smart-turn-v3-deps
Inline local-smart-turn-v3 deps for Poetry compatibility
2026-02-23 09:29:52 -05:00
Mark Backman
6a3718d33d Inline local-smart-turn-v3 deps for Poetry compatibility
Replace self-referential `pipecat-ai[local-smart-turn-v3]` extra in core
dependencies with the actual packages (`transformers`, `onnxruntime`).
Self-referential extras are not supported by Poetry and cause dependency
resolution failures. Since these are required by the default turn stop
strategy (LocalSmartTurnAnalyzerV3), they belong in core dependencies.

- Remove `local-smart-turn-v3` optional extra from pyproject.toml
- Remove try/except ModuleNotFoundError guard (now always installed)
- Remove `--extra local-smart-turn-v3` from CI workflows
2026-02-23 09:00:36 -05:00
Aleix Conchillo Flaqué
b67af19d47 Merge pull request #3793 from pipecat-ai/changelog-0.0.103
Release 0.0.103 - Changelog Update
2026-02-20 16:42:46 -08:00
aconchillo
6d9c07b945 Update changelog for version 0.0.103 2026-02-20 16:39:36 -08:00
Aleix Conchillo Flaqué
18429f80f1 github(changelog): allow performance type 2026-02-20 16:32:40 -08:00
Aleix Conchillo Flaqué
0a54dc9721 Merge pull request #3792 from pipecat-ai/aleix/update-anthropic-default-model
Update default Anthropic model to claude-sonnet-4-6
2026-02-20 16:28:08 -08:00
Aleix Conchillo Flaqué
521f669051 Add changelog entries for PR #3792 2026-02-20 16:18:21 -08:00
Aleix Conchillo Flaqué
abb20f34ba Update default Anthropic model to claude-sonnet-4-6
Update the default model in AnthropicLLMService and remove the
now-unnecessary explicit model from the function calling example.
2026-02-20 16:17:51 -08:00
Aleix Conchillo Flaqué
b1e72ad4b7 Merge pull request #3789 from pipecat-ai/aleix/fix-missing-await-and-interruption-hang
Fix missing await and interruption timeout hang
2026-02-20 14:59:11 -08:00
Aleix Conchillo Flaqué
f610fb95f9 Add changelog entries for PR #3789 2026-02-20 14:56:46 -08:00
Aleix Conchillo Flaqué
827032fefb Unblock push_interruption_task_frame_and_wait after timeout
When the InterruptionFrame does not complete within the timeout the
caller was stuck in an infinite loop logging warnings. Now the event
is set after the first timeout so the processor can continue.

Also adds a keyword timeout parameter so callers can customize the
wait duration.
2026-02-20 14:56:42 -08:00
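A minimal sketch of the "give up after one timeout" behavior, assuming an `asyncio.Event` signals completion. The function `wait_for_interruption` and its default timeout are hypothetical and chosen only for the example:

```python
import asyncio


async def wait_for_interruption(done: asyncio.Event, timeout: float = 0.05) -> bool:
    """Return True if `done` was set in time, False if the wait timed out."""
    try:
        await asyncio.wait_for(done.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Set the event ourselves so nothing keeps waiting, then continue
        # instead of looping and logging warnings forever.
        done.set()
        return False
```

Exposing `timeout` as a keyword argument lets callers tune how long they are willing to wait.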
Aleix Conchillo Flaqué
af4ef95dc6 Fix missing await on add_audio_frames_message in Google audio examples
The method is async but was being called without await, silently
discarding the coroutine.
2026-02-20 14:24:22 -08:00
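The bug class fixed here is easy to reproduce in isolation. In this sketch, `add_message` and `recorded` are hypothetical names; the point is only that calling an async function without `await` creates a coroutine object and never runs its body:

```python
import asyncio
from typing import List

recorded: List[str] = []


async def add_message(text: str) -> None:
    recorded.append(text)


async def main() -> None:
    # Bug pattern: this only creates a coroutine object; the body never runs.
    coro = add_message("dropped")
    # Closed explicitly here so the demo does not emit a "never awaited" warning.
    coro.close()
    # Correct pattern: awaiting actually executes the coroutine.
    await add_message("kept")
```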
Aleix Conchillo Flaqué
0370bb15e4 update uv.lock 2026-02-20 13:47:04 -08:00
Aleix Conchillo Flaqué
2b3595485f Merge pull request #3788 from dhruvladia-sarvam/v3-fix-final
initial
2026-02-20 13:46:18 -08:00
Paul Kompfner
af4226adbf Add changelog entries for service settings refactor PR #3714 2026-02-20 15:26:17 -05:00
Paul Kompfner
29e2a861dc Update AIService.set_model_name to AIService._sync_model_name_to_metrics to:
- indicate clearly that it's not meant for public use
- make it clear that `self._settings` is the single source of truth for model information
- set the stage for an upcoming change where `AIService` subclasses won't have to ever worry about explicitly calling an `AIService` method to sync model name to metrics

Across all services, switch from accessing `self._model_name` or `self.model_name` to accessing `self._settings.model`.
2026-02-20 15:02:50 -05:00
Filipi da Silva Fuchter
63c664becb Merge pull request #3787 from pipecat-ai/filipi/refresh_active_audio_context
Fix race condition where context times out after sending second transcript
2026-02-20 14:50:38 -05:00
dhruvladia-sarvam
fecf462139 initial 2026-02-21 01:02:37 +05:30
Daksh Dua
023063759a Changelog entry for TTS race condition fix. 2026-02-20 16:00:34 -03:00
Daksh Dua
c49eda98e7 Fix race condition where context times out after sending second transcript 2026-02-20 15:37:14 -03:00
Filipi da Silva Fuchter
5d07326e36 Merge pull request #3732 from pipecat-ai/filipi/tts_updates
Refactored audio context management in TTS services
2026-02-20 13:02:42 -05:00
filipi87
fa659311b6 Changelog entry 2026-02-20 14:57:59 -03:00
filipi87
125c423356 Refactored audio context management in TTS services to improve encapsulation and reduce code duplication 2026-02-20 14:57:44 -03:00
Filipi da Silva Fuchter
c9615c8db6 Merge pull request #3779 from pipecat-ai/filipi/filter_observer
Allow defining the list of frame processors whose frames should be silently ignored by the RTVI observer.
2026-02-20 12:42:02 -05:00
Aleix Conchillo Flaqué
28c542f6ed Merge pull request #3785 from pipecat-ai/mb/deepgram-sagemaker-tts
Add DeepgramSageMakerTTSService
2026-02-20 09:01:32 -08:00
Paul Kompfner
f5b86d9cdc Actually, revert the change making it so that STTService takes model and language args at init time. It'll be up to the subclasses to append those to _settings (or better yet, provide their own service-specific _settings). This avoids rocking the boat too much. 2026-02-20 11:26:28 -05:00
Aleix Conchillo Flaqué
5708c81b93 Merge pull request #3782 from pipecat-ai/aleix/fix-mutable-default-args-aggregator-pair
Fix mutable default arguments in LLMContextAggregatorPair
2026-02-20 08:02:18 -08:00
Paul Kompfner
f4e9825c03 Remove self._voice_id from TTS Service implementations in favor of self._settings.voice 2026-02-20 10:52:57 -05:00
Mark Backman
82ce3ea8de Update 07c example to use DeepgramSageMakerTTSService 2026-02-20 08:10:41 -07:00
Mark Backman
62ada92188 Add changelog for PR #3785 2026-02-20 08:09:57 -07:00
Mark Backman
273692421f Add DeepgramSageMakerTTSService for Deepgram TTS on AWS SageMaker
Adds a TTS service that connects to Deepgram models deployed on AWS
SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
Deepgram TTS protocol (Speak, Flush, Clear, Close) over the BiDi
client, with interruption handling and per-turn TTFB metrics.

Updates the example and env.example with separate STT/TTS endpoint names.
2026-02-20 08:08:00 -07:00
Paul Kompfner
5d8a5bf750 Add initialization of self._settings to service superclasses (STTService, TTSService, LLMService), using "generic" settings for those services (STTSettings, TTSSettings, LLMSettings) 2026-02-20 09:31:22 -05:00
Mark Backman
0a3e212f93 Merge pull request #3784 from pipecat-ai/mb/stt-sagemaker-finalize
Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService
2026-02-20 09:26:23 -05:00
Mark Backman
43d686c622 Add changelog entry for PR #3784 2026-02-20 07:17:36 -07:00
Mark Backman
4d136e1e28 Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService 2026-02-20 07:15:38 -07:00
Aleix Conchillo Flaqué
2024285c75 Add changelog entries for PR #3782 2026-02-19 20:52:31 -08:00
Aleix Conchillo Flaqué
bc830c16f1 Fix mutable default arguments in LLMContextAggregatorPair
Replace mutable default parameter values with None and instantiate
inside the method body to avoid shared state across calls.
2026-02-19 20:52:00 -08:00
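For readers unfamiliar with this pitfall, the general Python pattern looks like the sketch below. `append_buggy` and `append_fixed` are hypothetical functions, not code from the repository:

```python
from typing import List, Optional


def append_buggy(item: str, bucket: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, so every call
    # that omits `bucket` shares (and mutates) the same object.
    bucket.append(item)
    return bucket


def append_fixed(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    # Use None as the default and build a fresh list inside the body.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```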
Paul Kompfner
fb27642190 Add self._settings to 6 remaining services
- AWSNovaSonicLLMService: new `AWSNovaSonicLLMSettings` with `voice_id` and `endpointing_sensitivity`; remove `self._params` entirely, storing audio I/O config as plain instance variables
- NeuphonicHttpTTSService: reuse `NeuphonicTTSSettings`; use inherited `language` field instead of bespoke `lang_code`
- NvidiaTTSService: new `NvidiaTTSSettings` with `quality`
- PiperTTSService / PiperHttpTTSService: new `PiperTTSSettings` / `PiperHttpTTSSettings` (no extra fields)
- SpeechmaticsTTSService: new `SpeechmaticsTTSSettings` with `max_retries`

Also remove redundant `lang_code` from `NeuphonicTTSSettings` (both WS and HTTP services now use the inherited `TTSSettings.language` field, with automatic enum conversion via the base class).

HTTP services (Neuphonic HTTP, Piper HTTP, Speechmatics) don't override `_update_settings` since the base class applies changes to `self._settings` and subsequent requests read from it automatically.
2026-02-19 18:35:59 -05:00
Paul Kompfner
463ea3725b Update Deepgram Flux with the new service settings pattern 2026-02-19 17:12:24 -05:00
Paul Kompfner
6c609031ee Add more 55-series examples
Also:
- remove unnecessary pass-through `_update_settings` implementation in `FalSTTService`
- warn that `AsyncAITTSService` doesn't currently support runtime settings updates
- update how `GradiumTTSService._update_settings` checks for voice changes
- remove a couple of unnecessary args (because they specified defaults) in other examples
2026-02-19 16:46:14 -05:00
filipi87
18630c9478 Adding changelog entry for RTVI observer ignored_sources feature. 2026-02-19 18:41:05 -03:00
filipi87
3a8d3cc841 Allow defining the list of frame processors whose frames should be silently ignored by the RTVI observer. 2026-02-19 18:36:12 -03:00
Filipi da Silva Fuchter
2963c7589d Merge pull request #3774 from pipecat-ai/mb/broadcast-frames-rtvi-observer
Fix RTVIObserver missing upstream-only frames
2026-02-19 15:32:48 -05:00
filipi87
63caa403cb Improving RTVI doc description. 2026-02-19 17:31:25 -03:00
Paul Kompfner
ebb42a3c6d Fix forward reference crash in Google and Anthropic LLM ThinkingConfig
ThinkingConfig was defined as an inner class on the service but referenced in the Settings dataclass declared before the service class, causing a crash at import time. Move ThinkingConfig to a standalone class defined before Settings, and keep a class attribute alias for backward compatibility.
2026-02-19 15:06:48 -05:00
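The ordering fix described in this commit can be sketched as follows. This is a minimal illustration, not the actual Pipecat code: `ExampleLLMSettings` and `ExampleLLMService` are hypothetical names, and the fields on `ThinkingConfig` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThinkingConfig:
    """Standalone class, defined before anything that references it."""

    # Field names here are illustrative assumptions.
    enabled: bool = False
    budget_tokens: Optional[int] = None


@dataclass
class ExampleLLMSettings:
    # ThinkingConfig already exists at this point, so evaluating this
    # annotation and default at import time cannot raise a NameError.
    thinking: ThinkingConfig = field(default_factory=ThinkingConfig)


class ExampleLLMService:
    # Class attribute alias: code that still refers to
    # ExampleLLMService.ThinkingConfig keeps working.
    ThinkingConfig = ThinkingConfig
```

With this layout, `ExampleLLMService.ThinkingConfig` and the module-level `ThinkingConfig` are the same object, so older call sites do not need to change.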
Paul Kompfner
cc54ff4708 Add more 55-series examples 2026-02-19 14:55:21 -05:00
Aleix Conchillo Flaqué
846cf0794d Merge pull request #3615 from omChauhanDev/fix/daily-transport-message-queue
fix(daily): queue outbound messages until transport joins
2026-02-19 11:55:11 -08:00
Aleix Conchillo Flaqué
498349c17e Merge pull request #3776 from pipecat-ai/aleix/stt-ttfb-metrics-refactor
Refactor STT TTFB metrics to use base class start/stop pattern
2026-02-19 11:46:46 -08:00
Aleix Conchillo Flaqué
474b27305f Merge pull request #3748 from pipecat-ai/mb/user-idle-configurable
Make UserIdleController always-on with dynamic timeout updates
2026-02-19 11:44:51 -08:00
Aleix Conchillo Flaqué
20509e8f96 Merge pull request #3744 from pipecat-ai/mb/user-idle-timeout-frame
Redesign UserIdleController to use BotStoppedSpeakingFrame
2026-02-19 11:34:42 -08:00
filipi87
5b2fa69bdc Renaming from broadcasted_sibling_id to broadcast_sibling_id 2026-02-19 16:24:07 -03:00
Aleix Conchillo Flaqué
4f8cacc769 Merge pull request #3747 from pipecat-ai/mb/update-comment-mute-strategy
Update comment in _maybe_mute_frame
2026-02-19 11:19:44 -08:00
Aleix Conchillo Flaqué
0145fb4ea0 Merge pull request #3763 from lukepayyapilli/fix/asyncgen-cleanup-uvloop-crash
Fix async generator cleanup to prevent uvloop crash on Python 3.12+
2026-02-19 11:14:00 -08:00
Aleix Conchillo Flaqué
8e52df7f03 Add changelog entries for PR #3776 2026-02-19 10:52:45 -08:00
Aleix Conchillo Flaqué
8ee99e37ff Merge pull request #3768 from tanmayc25/fix/tavus-sample-rate
fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport
2026-02-19 10:52:34 -08:00
Aleix Conchillo Flaqué
bae4211369 Update dependency lock file 2026-02-19 10:52:28 -08:00
Aleix Conchillo Flaqué
859cd7c920 Refactor STT TTFB metrics to use base class start/stop pattern
Eliminate custom _emit_stt_ttfb_metric and manual timestamp tracking in
STTService by reusing FrameProcessor's start_ttfb_metrics/stop_ttfb_metrics
with new start_time/end_time parameters. This keeps the chronological
start→stop ordering and removes _speech_end_time and _last_transcription_time
state from STTService.
2026-02-19 10:52:24 -08:00
filipi87
d608c400f9 Preventing duplicate BotStartedSpeakingFrame and BotStoppedSpeakingFrame frames. 2026-02-19 15:49:22 -03:00
Aleix Conchillo Flaqué
94e93bed83 Merge pull request #3719 from pipecat-ai/aleix/sip-transfer-refer-frames
Add SIP transfer and SIP REFER frames to Daily transport
2026-02-19 10:09:13 -08:00
filipi87
b1cee140b9 Refactoring to use broadcasted_sibling_id instead of broadcasted field. 2026-02-19 15:06:50 -03:00
Aleix Conchillo Flaqué
352361bdd2 Update changelog skill to avoid line wrapping 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
baa61468a1 Add changelog entries for PR #3719 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
7501ba2e45 Undeprecate DailyUpdateRemoteParticipantsFrame
Remove the deprecation warning and __post_init__ override. Also fix the
default value for remote_participants to use field(default_factory=dict)
instead of None.
2026-02-19 09:20:33 -08:00
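The default-value change in this commit follows a common dataclass idiom. A minimal sketch, assuming a hypothetical `ExampleUpdateFrame` class and a made-up `muted` key (neither is taken from the Daily transport):

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExampleUpdateFrame:
    # default_factory builds a fresh dict for every instance, so callers
    # never have to handle None and instances never share state.
    remote_participants: Dict[str, Any] = field(default_factory=dict)


a = ExampleUpdateFrame()
b = ExampleUpdateFrame()
a.remote_participants["participant-1"] = {"muted": True}  # hypothetical payload
```

After the mutation, `b.remote_participants` is still empty, which is the property a `None` default cannot give without extra checks at every use site.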
Aleix Conchillo Flaqué
200716e8fe Add SIP transfer and SIP REFER frames to Daily transport
Add write_transport_frame() hook to BaseOutputTransport so subclasses
can handle custom frame types that flow through the audio queue. Add
DailySIPTransferFrame and DailySIPReferFrame as DataFrame subclasses
that queue with audio, ensuring SIP operations execute only after the
bot finishes its current utterance. Override write_transport_frame in
DailyOutputTransport to dispatch these frames to the existing
sip_call_transfer() and sip_refer() client methods.

Also switch DailyOutputTransport.send_message error handling from
logger.error to push_error for consistency.
2026-02-19 09:20:33 -08:00
Paul Kompfner
421696e1c2 Replace Any with specific types and add | _NotGiven to all *Settings field annotations across 49 service files
Every `*Settings` dataclass field whose default is `NOT_GIVEN` now carries `_NotGiven` in its type union so the type system accurately reflects the three-state semantics (real value, `None` where applicable, or not-yet-specified). Fields previously typed as bare `Any`, `str`, `float`, `bool`, `list`, `dict`, or `Optional[X]` are now narrowed to the specific type from the corresponding `InputParams` Pydantic model.
2026-02-19 11:28:29 -05:00
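The three-state semantics mentioned in this commit can be illustrated with a small sketch. The sentinel below is a simplified stand-in written for this example, and `ExampleTTSSettings` and `describe` are hypothetical names; the real `_NotGiven` and `NOT_GIVEN` definitions may differ.

```python
from dataclasses import dataclass
from typing import Union


class _NotGiven:
    """Simplified sentinel type meaning 'the caller did not specify this'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class ExampleTTSSettings:
    # The union spells out every state the field can be in.
    voice: Union[str, _NotGiven] = NOT_GIVEN
    speed: Union[float, None, _NotGiven] = NOT_GIVEN


def describe(value: object) -> str:
    """Report which of the three states a field value is in."""
    if isinstance(value, _NotGiven):
        return "not specified"
    if value is None:
        return "explicitly cleared"
    return f"set to {value!r}"
```

Here `describe(ExampleTTSSettings().speed)` reports "not specified", while `describe(ExampleTTSSettings(speed=None).speed)` reports "explicitly cleared"; keeping `_NotGiven` in the annotation lets a type checker see that distinction.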
Mark Backman
50ef4909e3 Add changelog entries for PR #3774 2026-02-19 07:44:52 -07:00
Mark Backman
63df4642b5 Fix RTVIObserver missing upstream-only frames by adding broadcasted flag
RTVIObserver previously filtered out all upstream frames to avoid
duplicate messages from broadcasted frames. This caused upstream-only
frames to be silently ignored. Instead, add a `broadcasted` field to
the Frame base class that is set by broadcast_frame() and
broadcast_frame_instance(), and only skip upstream copies of
broadcasted frames.
2026-02-19 07:43:20 -07:00
Filipi da Silva Fuchter
43869a499d Merge pull request #3773 from pipecat-ai/mb/fix-ci-apt-get-update
Fix CI: add apt-get update before installing system packages
2026-02-19 09:28:25 -05:00
Mark Backman
d2bf3952ec Merge pull request #3772 from simliai/main
Update SimliClient to latest
2026-02-19 09:13:14 -05:00
Mark Backman
92c380ee77 Add apt-get update before installing system packages in CI
The CI was failing because the runner's package index was stale,
causing a 404 when fetching libasound2-dev (a dependency of
portaudio19-dev). Running apt-get update first refreshes the index.
2026-02-19 07:01:07 -07:00
antonyesk601
a55ba40921 fix: remove misimport 2026-02-19 10:41:17 +00:00
antonyesk601
fb1bfd03dd update SimliClient to latest 2026-02-19 10:35:50 +00:00
Paul Kompfner
a7edd8e441 Fix 55zp example 2026-02-18 17:15:22 -05:00
Paul Kompfner
2a07138abf Fix Grok Realtime dynamic session properties updating, and update corresponding 55zo example 2026-02-18 17:12:36 -05:00
Filipi da Silva Fuchter
a0a7b3101d Merge pull request #3765 from ianbbqzy/ian/inworld-default-async
[inworld] default timestamp transport strategy to ASYNC
2026-02-18 16:59:01 -05:00
Filipi da Silva Fuchter
39dc4ba99c Updated changelog/3765.changed.md 2026-02-18 16:58:27 -05:00
Paul Kompfner
ad942f6e4c Update 55zn example (Ultravox dynamic settings updates) to exercise changing modality, which is a setting that supports dynamic updates 2026-02-18 16:33:05 -05:00
Paul Kompfner
97d34ef9e1 Update OpenAI Realtime to warn when you try to update settings that can't be updated dynamically.
Update corresponding example to demonstrate updating output modality.
2026-02-18 16:16:06 -05:00
Paul Kompfner
c054780477 Fix 55zh example 2026-02-18 15:59:34 -05:00
Paul Kompfner
88a2dbdb82 Update 55zf example to update a setting that is supported by the default Camb TTS model 2026-02-18 15:48:50 -05:00
Paul Kompfner
d386a0efda Update Sarvam TTS to apply all changes to settings, not just voice 2026-02-18 15:31:08 -05:00
Paul Kompfner
b718a23c17 Tweak 55zd example 2026-02-18 15:25:50 -05:00
Paul Kompfner
e38f7d9451 Fix 55zc example 2026-02-18 15:23:23 -05:00
Paul Kompfner
b00d454842 Fix Inworld TTS settings updating 2026-02-18 15:19:57 -05:00
Paul Kompfner
0fa51811ea Fix 55z example 2026-02-18 15:11:04 -05:00
Paul Kompfner
323ee00b83 Fix 55w example 2026-02-18 14:51:48 -05:00
Paul Kompfner
0c73b77327 Update Lmnt TTS to support updating settings dynamically 2026-02-18 14:47:38 -05:00
Paul Kompfner
416e1cf877 Update Rime TTS services to store voice in the standard settings.voice field, as opposed to the nonstandard speaker field 2026-02-18 14:46:47 -05:00
Paul Kompfner
b4c5cb258b Tweak 55r example to make the settings update more pronounced 2026-02-18 14:15:14 -05:00
Paul Kompfner
728a97ade3 Update Deepgram TTS to support updating settings dynamically 2026-02-18 14:11:51 -05:00
Paul Kompfner
28677ec829 Tweak 55p example to make the settings update more pronounced 2026-02-18 13:49:32 -05:00
Paul Kompfner
17886d14e8 Fix ElevenLabsTTSService settings update code 2026-02-18 13:47:02 -05:00
Paul Kompfner
caf5dacbe8 Update 55j example to avoid console warning 2026-02-18 12:37:50 -05:00
Paul Kompfner
b8b531b66a In Cartesia TTS service, we don't need to override _update_settings. Parent class handling is enough, as new settings are picked up on the next run_tts (no need to reconnect). 2026-02-18 12:37:34 -05:00
Paul Kompfner
a14690e3a0 Fix the 55i example 2026-02-18 11:55:14 -05:00
Paul Kompfner
d913d954db Fix SpeechmaticsSTTService settings update code, and augment test file to better exercise it 2026-02-18 11:34:52 -05:00
Paul Kompfner
e98bb1df66 Simplify 55* examples: inline the settings update directly in the on_client_connected handler instead of wrapping it in a separate async task 2026-02-18 11:06:33 -05:00
Paul Kompfner
a7ada79fd9 Fix ElevenLabsRealtimeSTTService:
- Move `CommitStrategy` up in the file so it can be used by `ElevenLabsRealtimeSTTSettings`
- Fix a bug where `run_tts` would erroneously try to reconnect if a reconnection was already in flight (like a reconnection triggered by `_update_settings`)
2026-02-18 10:50:53 -05:00
Filipi da Silva Fuchter
a5b5a8e5cf Merge pull request #3759 from pipecat-ai/mb/gradium-context-update
Switch Gradium TTS to AudioContextWordTTSService for multiplexing
2026-02-18 10:16:57 -05:00
filipi87
1daea78b91 Fix GradiumTTSService to reuse context IDs across multiple run_tts calls and prevent the parent class from pushing text frames. 2026-02-18 12:12:49 -03:00
Paul Kompfner
7910f20e14 Update comment in Azure TTS explaining how we could support dynamic settings updates in the future 2026-02-18 10:07:33 -05:00
Paul Kompfner
d7d94a29f0 Add foundational examples (55) for runtime settings updates via *UpdateSettingsFrame
42 examples covering STT (13), TTS (21), LLM (4), and realtime (4) services. Each demonstrates updating service settings 10 seconds after client connects, verifying the typed settings machinery end-to-end for every provider.
2026-02-18 09:46:23 -05:00
Tanmay Chaudhari
6066eec853 Add changelog for PR #3768 2026-02-18 14:31:16 +05:30
Tanmay Chaudhari
cd379671aa fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport 2026-02-18 14:18:16 +05:30
Ian Lee
8006223911 [inworld] default timestamp transport strategy to ASYNC 2026-02-17 15:13:20 -08:00
Paul Kompfner
ce51df677c Add backward-compat _aliases and from_mapping overrides to TTS settings
The migration from plain-dict `self._settings` to typed dataclasses renamed keys and flattened nested dicts. The deprecated dict-based `TTSUpdateSettingsFrame(settings={...})` code path calls `from_mapping`, which silently dropped old keys into `extra`.

- Add `_aliases` so renamed flat keys (e.g. `sample_rate` → `fish_sample_rate`, camelCase Inworld keys) resolve correctly.
- Override `from_mapping` to destructure nested dicts (`output_format`, `prosody`, `audioConfig`, `voice_setting`, `audio_setting`) into their flat field equivalents.
- Fix AsyncAI constructor bug passing `output_format={...}` dict instead of individual `output_container`/`output_encoding`/`output_sample_rate` fields.
2026-02-17 17:07:14 -05:00
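The two mechanisms in this commit, key aliases and nested-dict destructuring, might look roughly like the sketch below. `ExampleTTSSettings`, its field names, and the `example_sample_rate` alias target are all hypothetical; only the general shape (an `_aliases` map, a `from_mapping` classmethod, and an `extra` bucket for unknown keys) follows the commit message.

```python
from dataclasses import dataclass, field, fields
from typing import Any, ClassVar, Dict, Mapping, Optional


@dataclass
class ExampleTTSSettings:
    example_sample_rate: Optional[int] = None
    output_container: Optional[str] = None
    output_encoding: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)

    # Old flat key -> new field name (illustrative only).
    _aliases: ClassVar[Dict[str, str]] = {"sample_rate": "example_sample_rate"}

    @classmethod
    def from_mapping(cls, settings: Mapping[str, Any]) -> "ExampleTTSSettings":
        flat = dict(settings)
        # Destructure the legacy nested dict into flat field equivalents.
        nested = flat.pop("output_format", None)
        if isinstance(nested, Mapping):
            for key, value in nested.items():
                flat[f"output_{key}"] = value

        known = {f.name for f in fields(cls)} - {"extra"}
        kwargs: Dict[str, Any] = {}
        extra: Dict[str, Any] = {}
        for key, value in flat.items():
            name = cls._aliases.get(key, key)  # resolve renamed keys
            if name in known:
                kwargs[name] = value
            else:
                extra[key] = value  # genuinely unknown keys still land here
        return cls(**kwargs, extra=extra)
```

A legacy payload such as `{"sample_rate": 24000, "output_format": {"container": "wav"}}` then populates `example_sample_rate` and `output_container` instead of falling into `extra`.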
Paul Kompfner
68ebd3d063 Migrate HumeTTSService to standard TTSSettings pattern and remove dead TTSService.update_setting
HumeTTSService now stores its params (description, speed, trailing_silence) in a proper `HumeTTSSettings` dataclass instead of a separate `_params` Pydantic model, making it work with `TTSUpdateSettingsFrame(update=...)`. The old `update_setting(key, value)` method is kept but deprecated.

Also removes the unused no-op `TTSService.update_setting` base method, which was never called by the `TTSUpdateSettingsFrame` pipeline.
2026-02-17 15:44:41 -05:00
Paul Kompfner
94a651cee2 Remove dead ServiceSettings.to_dict method 2026-02-17 15:15:18 -05:00
Paul Kompfner
1cad4210ce Deprecate dict-based *UpdateSettingsFrame(settings={...}) code path in STT, TTS, and LLM services.
The dataclass-based API (`*UpdateSettingsFrame(update=*Settings(...))`) is the preferred path since 0.0.103. The dict path still works but now emits a `DeprecationWarning`.
2026-02-17 15:09:39 -05:00
Paul Kompfner
1cec8d119d Expand language field docstrings to clarify storage invariant.
The union type reflects the input side; after construction and `_update_settings`, the stored value is always a service-specific string.
2026-02-17 14:57:38 -05:00
Paul Kompfner
7dc16b1d92 Type language fields and centralize conversion in STT services.
Change `TTSSettings.language` and `STTSettings.language` from `Any` to `Language | str | _NotGiven`. Add `language_to_service_language` base method and centralized `isinstance`-guarded conversion in `STTService._update_settings` (mirroring TTS). Update the TTS guard from `is not None` to `isinstance(…, Language)` so raw strings pass through unchanged.

Remove now-redundant per-service language conversion from `_update_settings` overrides (ElevenLabs, Azure, Fal, Whisper). Add `language_to_service_language` to Azure STT so the centralized conversion picks it up. Fix AWS and NVIDIA STT `__init__` to convert language at construction time, then simplify their runtime accessors to read `_settings.language` directly.
2026-02-17 14:49:26 -05:00
Luke Payyapilli
247f0bbcd3 Fix async generator cleanup to prevent uvloop crash on Python 3.12+ 2026-02-17 13:10:31 -05:00
Paul Kompfner
d2372c127a Add specific type annotations to ServiceSettings fields, replacing Any with str, float, int unions as appropriate. 2026-02-17 11:56:37 -05:00
Paul Kompfner
3b1ba57452 Change apply_update / _update_settings return type from set[str] to dict[str, Any]. The dict maps each changed field name to its pre-update value, enabling services to do granular diffing of complex settings objects. Existing call-site patterns ("field" in changed, if changed, iteration) work unchanged; set-difference sites use changed.keys() - {...}. 2026-02-17 11:49:15 -05:00
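A minimal sketch of the return-type change described above, assuming a hypothetical `ExampleSettings` dataclass and a module-level `_UNSET` object standing in for the real "not given" sentinel:

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

_UNSET = object()  # hypothetical stand-in for a "not given" sentinel


@dataclass
class ExampleSettings:
    model: Any = _UNSET
    language: Any = _UNSET
    voice: Any = _UNSET

    def apply_update(self, update: "ExampleSettings") -> Dict[str, Any]:
        """Apply given fields and return {field name: pre-update value}."""
        changed: Dict[str, Any] = {}
        for f in fields(self):
            new = getattr(update, f.name)
            old = getattr(self, f.name)
            if new is _UNSET or new == old:
                continue  # not specified, or no actual change
            changed[f.name] = old
            setattr(self, f.name, new)
        return changed


current = ExampleSettings(model="m1", language="en", voice="a")
changed = current.apply_update(ExampleSettings(model="m2", voice="a"))
```

Membership tests (`"model" in changed`), truthiness (`if changed:`), and iteration behave as they did with a set, while `changed["model"]` additionally exposes the old value for diffing. Set-difference call sites use `changed.keys() - {...}` because dict key views support set operations.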
Paul Kompfner
02c2778b8d Document _warn_unhandled_updated_settings pattern in COMMUNITY_INTEGRATIONS.md. 2026-02-17 11:08:26 -05:00
Paul Kompfner
fa6a6dabee Fix DeepgramSageMakerSTTService._update_settings live_options sync to match DeepgramSTTService pattern.
Add missing reverse sync (live_options → top-level model/language) and `set_model_name()` call.
2026-02-17 11:02:13 -05:00
Paul Kompfner
3a77b4c1d8 In services that don't handle runtime settings updates—or don't handle them for *all* available settings—log a warning about which fields specifically aren't handled. Revert new apply-settings-updates logic across various services, to reduce PR testing scope. This logic can be added service by service gradually as future work.
Note that for services that previously handled applying updates (through methods like `set_model` and `set_language`), we're keeping the update-applying logic (some or most of which is already well-tested) and expanding it to cover all relevant settings fields. Services under this bucket are:
- Deepgram STT
- Deepgram Sagemaker STT
- Elevenlabs STT
- Google STT
- Gradium STT
- OpenAI STT
- Speechmatics STT
2026-02-17 10:58:29 -05:00
Mark Backman
3537420d91 Merge pull request #3761 from speechmatics/fix/sdk-version 2026-02-17 08:02:00 -05:00
Sam Sykes
65fb88e61e chore: update version specifier for speechmatics-voice
Change the version specifier from `>=0.2.8` to
`~=0.2.8` for the `speechmatics-voice` package.
This ensures compatibility with future patch
versions while preventing potential breaking
changes from minor updates.
2026-02-17 09:58:17 +00:00
Sam Sykes
b345f48ac1 fix: update dependency specifier for speechmatics-voice
Change the version specifier from >=0.2.8 to ~=0.2.8 for the
speechmatics-voice package to ensure compatibility with future
patch versions.
2026-02-17 09:55:43 +00:00
Mark Backman
f181e12d8f Add changelog for PR #3759 2026-02-16 11:35:45 -07:00
Mark Backman
36de6003d0 Switch Gradium TTS to AudioContextWordTTSService for multiplexing
Use client_req_id-based multiplexing instead of disconnecting and
reconnecting the websocket on every interruption. This follows the
same pattern used by Cartesia, ElevenLabs, and other services via
AudioContextWordTTSService.

Key changes:
- Base class: InterruptibleWordTTSService -> AudioContextWordTTSService
- Add close_ws_on_eos: False to setup message to keep connection alive
- Add client_req_id to text, end_of_stream messages for demultiplexing
- Route audio via append_to_audio_context() instead of push_frame()
- Silently drop messages for cancelled/unknown contexts on interruption
- Add _handle_interruption() that resets context without reconnecting
- Remove no-op push_frame() override
2026-02-16 11:34:16 -07:00
Mark Backman
dba4de77bf Merge pull request #3684 from ai-coustics/goedev/aic-model-caching
AIC model caching
2026-02-16 10:43:14 -05:00
Mark Backman
507765625f Make UserIdleController always-on with dynamic timeout updates
Always create UserIdleController (timeout=0 means disabled), removing
all Optional guards. Add UserIdleTimeoutUpdateFrame to allow changing
the idle timeout at runtime.
2026-02-14 09:54:30 -05:00
Mark Backman
8f5e5e8e7c Update comment in _maybe_mute_frame 2026-02-14 09:41:42 -05:00
Mark Backman
c682a44bb6 Merge pull request #3738 from lukepayyapilli/fix/mute-events-before-start-frame
Fix LLMUserAggregator broadcasting mute events before StartFrame
2026-02-14 09:40:40 -05:00
Mark Backman
cb7023681f Add changelog for PR #3744 2026-02-14 08:57:46 -05:00
Mark Backman
012ef41ff4 Redesign UserIdleController to use BotStoppedSpeakingFrame
Replace the continuous heartbeat-based timer (UserSpeakingFrame/BotSpeakingFrame
+ asyncio.Event loop) with a simple one-shot timer that starts when
BotStoppedSpeakingFrame is received and cancels on UserStartedSpeakingFrame or
BotStartedSpeakingFrame. This eliminates false idle triggers caused by gaps
between the user finishing speaking and the bot starting to speak (LLM/TTS
latency).

Guard the timer start with two conditions to prevent false triggers:
- User turn in progress: during interruptions, BotStoppedSpeaking arrives
  while the user is still speaking mid-turn.
- Function calls in progress: FunctionCallsStarted arrives before
  BotStoppedSpeaking because the bot speaks concurrently with the function
  call starting, so the timer must wait for the result and subsequent bot
  response.
2026-02-14 08:55:56 -05:00
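The one-shot timer design in this commit can be sketched with plain `asyncio`. Everything below is a simplified illustration: `OneShotIdleTimer`, its method names, and the two guard attributes are hypothetical, and frame handling is reduced to direct method calls.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class OneShotIdleTimer:
    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]):
        self._timeout = timeout
        self._on_idle = on_idle
        self._task: Optional[asyncio.Task] = None
        self.user_turn_in_progress = False
        self.function_calls_in_progress = 0

    def bot_stopped_speaking(self) -> None:
        # Guards: do not start while the user is mid-turn or while a
        # function call is still pending.
        if self.user_turn_in_progress or self.function_calls_in_progress > 0:
            return
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        # Called when the user or the bot starts speaking.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._timeout)
        await self._on_idle()


async def demo() -> List[str]:
    events: List[str] = []

    async def on_idle() -> None:
        events.append("idle")

    timer = OneShotIdleTimer(0.01, on_idle)
    timer.bot_stopped_speaking()
    timer.cancel()  # user started speaking: the timer never fires
    await asyncio.sleep(0.05)
    events.append("after cancel")
    timer.bot_stopped_speaking()  # nothing cancels it this time
    await asyncio.sleep(0.05)
    return events
```

Because the timer only exists between "bot stopped speaking" and the next speech event, the gap while the LLM and TTS are producing a response cannot count as idle time.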
Paul Kompfner
66b7b4a5d4 Update COMMUNITY_INTEGRATIONS.md for the new dataclass-based service settings pattern. 2026-02-13 16:04:49 -05:00
Filipi da Silva Fuchter
f6bb5fa124 Merge pull request #3741 from pipecat-ai/filipi/update_prebuilt
Using the latest version of pipecat-ai-small-webrtc-prebuilt.
2026-02-13 15:31:48 -05:00
Paul Kompfner
b08548af9d Remove typed-settings migration scaffolding and rename _update_settings_from_typed to _update_settings.
Now that all services use typed `ServiceSettings` objects, this removes the interim scaffolding that supported both dict-based and typed settings paths in parallel. Specifically: removes old dict-based `_update_settings(settings: Mapping)` methods from base classes, removes `isinstance(self._settings, ServiceSettings)` guards, simplifies `process_frame` branching, and renames `_update_settings_from_typed` to `_update_settings` across all ~30 service implementations. Also renames the no-arg `_update_settings()` helper on realtime services to `_send_session_update()` to avoid collision, adds `from_mapping` overrides on `GoogleLLMSettings` and `AnthropicLLMSettings` for ThinkingConfig dict-to-object conversion, and replaces a broken no-arg `_update_settings()` call in Gemini Live with a TODO.
2026-02-13 15:12:26 -05:00
Paul Kompfner
ab92a0e1d7 Remove/deprecate service-specific set_model and set_voice overrides.
- NvidiaSTTService.set_model: convert to proper DeprecationWarning (model can't change at runtime for Riva streaming STT)
- NvidiaTTSService.set_model: same treatment for Riva TTS
- NvidiaSegmentedSTTService.set_model: remove — base class now routes through _update_settings_from_typed which re-creates the recognition config
- GeminiTTSService.set_voice: remove — move AVAILABLE_VOICES validation into _update_settings_from_typed so it fires on both legacy and new paths
2026-02-13 15:12:26 -05:00
Paul Kompfner
e37f2f99c4 Deprecate set_model, set_voice, and set_language in favor of *UpdateSettingsFrame. 2026-02-13 15:12:26 -05:00
Paul Kompfner
e43351f5f8 Add class-level _settings type annotations to all service classes for better editor support.
Standardize all STT, TTS, and LLM service classes to declare `_settings` with the narrowed Settings type as a class-level annotation. This gives editors and type checkers the specific type when hovering or autocompleting on `self._settings` in each service and its subclasses. Inline `self._settings: Type = ...` assignments are replaced with plain `self._settings = ...`.
2026-02-13 15:12:26 -05:00
Paul Kompfner
444cbb6499 Add turn-completion fields to LLMSettings and handle them in the typed-service-settings path.
`filter_incomplete_user_turns` and `user_turn_completion_config` were only handled in the legacy dict-based `_update_settings` code path. This adds them to `LLMSettings` and introduces `LLMService._update_settings_from_typed` so the typed path handles them too.
2026-02-13 15:12:26 -05:00
Paul Kompfner
8a4ab611be Broad service settings refactor, with the primary aim of making service settings discoverable and strongly-typed. Service settings can be updated at runtime with *UpdateSettingsFrames.
Does not (yet) touch `InputParams`, to avoid scope creep and touching something currently part of the public API. But there is a lot of overlap between `*Settings` object fields and `InputParams` fields.

Other than discoverability/typing, these are some other improvements brought by this refactor:
- There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed), improving maintainability and guaranteeing one and only one reconnection no matter which settings changed
- `set_language`/`set_model`/`set_voice`—which we're assuming are usable as public methods, though *not* recommended over `*UpdateSettingsFrame`—all use the same code path as settings updates. They're also now all consistent in that, if a service needs to respond to a change (by, say, reconnecting if needed), any of these methods will kick off that process. Note that this is technically a behavior change.
- Several services now properly react to changed settings by reconnecting:
  - `AWSTranscribeSTTService`
  - `AzureSTTService`
  - `SonioxSTTService`
  - `GladiaSTTService`
  - `AssemblyAISTTService`
  - `CartesiaSTTService`
  - `FishAudioTTSService` (would previously only reconnect when `model` changed)
  - `GoogleSTTService`
  - `SpeechmaticsSTTService` (which previously only handled *some* settings updates through a nonstandard public `update_params` method)
  - `GradiumSTTService`
  - `NvidiaSegmentedSTTService` (which previously only handled changes to language)
- Bookkeeping across various services has been reduced, mostly by deduping ivars; the `self._settings` ivar is treated as the source of truth

NOTE: I pretty much guarantee that there are services missed in this PR in terms of bringing to consistency with how updates are handled (like whether changes in certain fields trigger reconnects when they need to). We can squash remaining inconsistencies as we stumble onto them, service by service. The goal here is to get things *mostly* in order, and establish the infrastructure and patterns we'll need going forward.
2026-02-13 15:12:26 -05:00
filipi87
2489c76bc6 Using the latest version of pipecat-ai-small-webrtc-prebuilt. 2026-02-13 16:43:25 -03:00
Mark Backman
73cb96bf66 Merge pull request #3739 from pipecat-ai/mb/docs-skill
Add /update-docs Claude Code skill
2026-02-13 13:26:06 -05:00
Mark Backman
79ec61d1d8 Merge pull request #3642 from pipecat-ai/cb/rime-arcana-v3
Update RimeTTSService for arcana and mistv2 model support
2026-02-13 13:25:27 -05:00
Mark Backman
ca440594fe Merge pull request #3720 from pipecat-ai/mb/fix-grok-realtime
Fix Grok Realtime voice type validation for server responses
2026-02-13 13:24:53 -05:00
Mark Backman
6c25dd4aa2 Merge pull request #3736 from pipecat-ai/mb/improve-events-docstrings
Improve events docstrings
2026-02-13 13:24:15 -05:00
Mark Backman
09bb6bb03b Merge pull request #3735 from pipecat-ai/mb/fix-llm-tracing-error-handilng
Fix double execution of service functions on tracing errors
2026-02-13 13:23:55 -05:00
Mark Backman
746fdfbfef Merge pull request #3728 from pipecat-ai/mb/upgrade-pillow
Bump Pillow upper bound from <12 to <13
2026-02-13 13:23:41 -05:00
Mark Backman
f7af9f1efd Broaden /update-docs scope to detect missing doc sections 2026-02-13 13:14:45 -05:00
Mark Backman
a5f95acaf5 Add changelog for PR #3735 2026-02-13 13:08:03 -05:00
Mark Backman
e50b138ab2 Fix double execution of service functions when tracing errors occur
The outer try/except in each service decorator caught both tracing
setup errors and application errors from the wrapped function. If the
function itself raised (e.g. LLM rate limit, TTS timeout), the
exception was caught and the function was called a second time.

Fix by tracking whether the original function was called via a
fn_called flag. If the function was already called, re-raise the
exception instead of falling back to untraced re-execution.
2026-02-13 13:08:03 -05:00
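The `fn_called` flag described in this commit can be sketched as below. The decorator factory `traced` and its `setup_tracing` hook are hypothetical simplifications, not the actual service decorators.

```python
import asyncio
import functools


def traced(setup_tracing):
    """Decorator factory; `setup_tracing` is a hypothetical hook that may raise."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            fn_called = False
            try:
                setup_tracing()  # tracing errors originate here
                fn_called = True
                return await fn(*args, **kwargs)
            except Exception:
                if fn_called:
                    # The wrapped function itself failed: re-raise,
                    # never run it a second time.
                    raise
                # Tracing setup failed before the call: run untraced.
                return await fn(*args, **kwargs)

        return wrapper

    return decorator


calls = {"ok": 0, "boom": 0}


def broken_tracing():
    raise RuntimeError("tracing unavailable")


def working_tracing():
    pass


@traced(broken_tracing)
async def ok():
    calls["ok"] += 1
    return "result"


@traced(working_tracing)
async def boom():
    calls["boom"] += 1
    raise ValueError("rate limited")
```

With broken tracing, `ok()` still runs exactly once via the fallback. With working tracing, the `ValueError` from `boom()` propagates after a single call instead of triggering a second execution.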
Mark Backman
3640c7a2dd Merge pull request #3733 from pipecat-ai/mb/fix-traceable-init
Deprecate unused class_decorators tracing module and fix stale comments
2026-02-13 13:04:34 -05:00
Mark Backman
2454bedf29 Add /update-docs skill for keeping docs in sync with source changes
Adds a Claude Code skill that analyzes the current branch diff against
main, maps changed source files to their doc pages, and makes targeted
updates to Configuration, InputParams, Usage, Notes, and Event Handlers
sections.
2026-02-13 12:52:23 -05:00
Luke Payyapilli
3adb2f50a6 Fix LLMUserAggregator broadcasting mute events before StartFrame 2026-02-13 11:59:56 -05:00
Mark Backman
01b7a93e08 Deprecate unused Traceable/class_decorators module and fix stale comments
The class_decorators.py module (Traceable, @traceable, @traced) is not
used anywhere in the codebase. Mark it deprecated and fix the misleading
comment in service_decorators.py that referenced it as if it were active.
2026-02-13 11:25:40 -05:00
Mark Backman
347eaf582d Merge pull request #3721 from pipecat-ai/fix/pipeline-scoped-tracing-context
Replace singleton context providers with pipeline-scoped TracingContext
2026-02-13 11:24:37 -05:00
Mark Backman
25ca296477 Move tracing fields to AIService and extract _get_turn_context helper
Consolidate _tracing_enabled and _tracing_context from LLMService,
STTService, and TTSService into the shared AIService base class.
Extract _get_turn_context() helper in service_decorators.py to
encapsulate the repeated pattern across all traced decorators.
2026-02-13 11:21:24 -05:00
Mark Backman
3fce88555f Improve events docstrings 2026-02-13 09:39:44 -05:00
Mark Backman
9e6f27c9f1 Merge pull request #3625 from ianbbqzy/ian/inworld-async-timestamp
[inworld] Allow Async delivery of timestamps info
2026-02-12 21:20:22 -05:00
Ian Lee
94f01af545 [inworld] Allow Async delivery of timestamps info
* speed up first audio chunk latency
2026-02-12 17:48:58 -08:00
Filipi da Silva Fuchter
432870cc36 Merge pull request #3729 from pipecat-ai/filipi/elevenlabs_issue
TTS services fixes.
2026-02-12 16:31:46 -05:00
Filipi da Silva Fuchter
e065907745 Merge pull request #3718 from pipecat-ai/filipi/bot_started_speaking
Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking.
2026-02-12 16:31:14 -05:00
Mark Backman
b7a5ca3d1e Merge pull request #3730 from pipecat-ai/mb/stt-keepalive
Move STT keepalive from WebsocketSTTService to STTService base class
2026-02-12 15:37:23 -05:00
filipi87
9569625f03 Changelog entries for the TTS fixes. 2026-02-12 16:11:02 -03:00
Mark Backman
18afe37bd1 Add changelog entries for PR #3642 2026-02-12 14:09:24 -05:00
Mark Backman
2b9777b812 Update RimeTTSService InputParams for arcana and mistv2 model support
Add model-specific params (arcana: repetition_penalty, temperature, top_p;
mistv2: no_text_normalization, save_oovs, segment) with dynamic query param
building via _build_settings(). Model/voice/param changes now trigger
WebSocket reconnection since all settings are URL query params.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 14:01:41 -05:00
filipi87
8866ab1585 Fixing RimeTTSService to reuse the same context when needed. 2026-02-12 15:53:38 -03:00
filipi87
f0995164d9 Fixing PlayHTTTSService to reuse the same context when needed. 2026-02-12 15:50:18 -03:00
filipi87
136732afae Fixing InworldTTSService to reuse the same context when needed. 2026-02-12 15:46:59 -03:00
filipi87
3410eb82b3 Fixing CartesiaTTSService to reuse the same context when needed. 2026-02-12 15:26:49 -03:00
Chad Bailey
794811fbdb Updated WSS endpoint for Rime Arcana v3 support 2026-02-12 13:24:29 -05:00
filipi87
abea22ec57 Fixing AsyncAITTSService to reuse the same context when needed. 2026-02-12 15:17:47 -03:00
Mark Backman
08beb0264a Add changelog entries for PR #3730 2026-02-12 13:14:11 -05:00
Mark Backman
2e15b4842c Move STT keepalive mechanism from WebsocketSTTService to STTService base class
This allows non-websocket STT services (like SarvamSTTService, which uses
the Sarvam Python SDK for connection management) to reuse the same keepalive
pattern. Subclasses override _send_keepalive() and _is_keepalive_ready() for
their specific protocol.
2026-02-12 11:09:39 -05:00
filipi87
6d95a2425c Fixing ElevenLabs TTS word timestamp interleaving across sentences. 2026-02-12 12:54:47 -03:00
Mark Backman
4667a3d66d Add changelog for #3728 2026-02-12 09:42:23 -05:00
Mark Backman
0bf2477d2c Bump Pillow upper bound from <12 to <13 2026-02-12 09:41:18 -05:00
Mark Backman
71a752c971 Add tests for TracingContext and TurnTraceObserver
Cover pipeline-scoped tracing context lifecycle, span hierarchy,
conversation/turn context management, and concurrent pipeline isolation.
2026-02-11 23:27:35 -05:00
Mark Backman
358f237507 Replace singleton context providers with pipeline-scoped TracingContext
ConversationContextProvider and TurnContextProvider were singletons that
stored tracing context as class-level state. When two PipelineTask instances
ran concurrently, they would overwrite each other's context, causing service
spans to attach to the wrong pipeline's turn span.

Replace both singletons with a single TracingContext object owned by each
PipelineTask, threaded to services via StartFrame.
2026-02-11 21:58:10 -05:00
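The difference between the two designs in this commit can be shown with a small sketch. All class names here (`SingletonTurnContext`, `ExampleTracingContext`, `ExampleTask`) are hypothetical, and threading the context through `StartFrame` is reduced to a plain attribute on the task.

```python
from dataclasses import dataclass
from typing import Optional


class SingletonTurnContext:
    """Old shape: class-level state shared by every pipeline in the process."""

    _turn: Optional[str] = None

    @classmethod
    def set_turn(cls, turn: str) -> None:
        cls._turn = turn

    @classmethod
    def get_turn(cls) -> Optional[str]:
        return cls._turn


@dataclass
class ExampleTracingContext:
    """New shape: one instance per task, so state is never shared."""

    turn: Optional[str] = None


class ExampleTask:
    def __init__(self, name: str):
        self.name = name
        self.tracing = ExampleTracingContext()  # owned by this task only

    def start_turn(self, turn_number: int) -> None:
        label = f"{self.name}-turn-{turn_number}"
        SingletonTurnContext.set_turn(label)  # last writer wins across tasks
        self.tracing.turn = label  # private to this task


task_a = ExampleTask("a")
task_b = ExampleTask("b")
task_a.start_turn(1)
task_b.start_turn(1)
```

After both tasks start a turn, the singleton only remembers task b's turn, which is how spans could attach to the wrong pipeline. Each task's own `tracing.turn` stays correct.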
Mark Backman
d99a256715 Merge pull request #3706 from ianbbqzy/ian/inworld-user-agent
[Inworld] add User-Agent and X-Request-Id for better traceability
2026-02-11 19:38:26 -05:00
Ian Lee
dcbcab1542 [Inworld] add User-Agent and X-Request-Id for better traceability 2026-02-11 15:47:20 -08:00
Mark Backman
a966947220 Add changelog for #3720 2026-02-11 18:04:58 -05:00
Mark Backman
16b060d9e9 Fix Grok Realtime voice type validation for server responses
The Grok API now returns prefixed voice names (e.g. "human_Ara") in
session.updated events, causing Pydantic validation errors. Widen the
voice field type from GrokVoice to GrokVoice | str to accept both
user-facing names and server-returned values.
2026-02-11 18:04:20 -05:00
filipi87
ed7fde324e Adding changelog entry for the RTVIObserver fix. 2026-02-11 16:23:42 -03:00
filipi87
beb4e86b5f Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking. 2026-02-11 16:17:28 -03:00
Aleix Conchillo Flaqué
e75ccd9c2f Merge pull request #3717 from pipecat-ai/aleix/update-claude-md-pr-instructions
Add /pr-submit skill and clean up CLAUDE.md
2026-02-11 10:40:20 -08:00
Aleix Conchillo Flaqué
a80919ceff Move PR submission instructions from CLAUDE.md to /pr-submit skill
Extract the procedural PR workflow into an actionable skill that can be
invoked with /pr-submit. CLAUDE.md is better suited for project context
and conventions, not step-by-step procedures.
2026-02-11 09:57:42 -08:00
Aleix Conchillo Flaqué
1fe4538982 Update PR submission instructions in CLAUDE.md
Expand the Pull Requests section with detailed step-by-step instructions
including branch naming, commit guidance, changelog generation, and PR
description updates.
2026-02-11 09:51:10 -08:00
Filipi da Silva Fuchter
9a48d93bd2 Merge pull request #3713 from pipecat-ai/filipi/smallwebrtc_8khz
Fixing smallwebrtc transport input audio resampling logic.
2026-02-11 11:58:32 -05:00
filipi87
0c3e59ed61 Adding changelog entry for the SmallWebRTCTransport fix. 2026-02-11 13:07:52 -03:00
filipi87
ec2b38dc29 Fixing smallwebrtc transport input audio resampling logic. 2026-02-11 13:01:25 -03:00
Gökmen Görgen
2036757b84 add unit tests for AICModelManager and AICFilter error handling, model loading, and processor behavior 2026-02-11 15:22:37 +01:00
Mark Backman
0574167fbd Merge pull request #3709 from pipecat-ai/mb/fix-quickstart-pcc-deploy
Fix quickstart pcc-deploy.toml
2026-02-10 22:19:37 -05:00
Mark Backman
972ad93e18 Fix quickstart pcc-deploy.toml 2026-02-10 22:17:09 -05:00
Mark Backman
ac53594967 Merge pull request #3708 from pipecat-ai/mb/fix-quickstart-pyproject
Fix quickstart pyproject.toml
2026-02-10 22:09:49 -05:00
Mark Backman
b063d9d43b Fix quickstart pyproject.toml 2026-02-10 22:06:38 -05:00
Mark Backman
48e93beadf Merge pull request #3705 from pipecat-ai/mb/quickstart-0.0.102
Update quickstart for 0.0.102
2026-02-10 21:57:33 -05:00
Aleix Conchillo Flaqué
640940a41a Merge pull request #3704 from pipecat-ai/changelog-0.0.102
Release 0.0.102 - Changelog Update
2026-02-10 18:31:30 -08:00
aconchillo
f1e2001a4e Update changelog for version 0.0.102 2026-02-10 18:28:21 -08:00
Aleix Conchillo Flaqué
12dc6c0b9e Merge pull request #3707 from pipecat-ai/aleix/fix-openai-stream-close-compat
fix(openai): use compatible stream closing for non-OpenAI providers
2026-02-10 18:26:18 -08:00
Aleix Conchillo Flaqué
93f4402198 Update stream close test to match new _closing helper 2026-02-10 18:19:57 -08:00
Aleix Conchillo Flaqué
f3eb5b30a0 Add changelog for #3707 2026-02-10 18:01:29 -08:00
Aleix Conchillo Flaqué
18aad05a7c fix(openai): use compatible stream closing for non-OpenAI providers
OpenAI's AsyncStream uses close() while async generators (e.g. from
OpenPipe) use aclose(). Replace direct async-with on the stream with a
helper that handles both protocols.
2026-02-10 17:59:21 -08:00
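A helper of the kind this commit describes might look roughly like the sketch below. It is an assumption-laden illustration, not the `_closing` helper from the PR: the name `closing_stream` is hypothetical, and the only behavior taken from the commit message is that some streams expose `close()` and others `aclose()`.

```python
import inspect
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing_stream(stream):
    """Yield the stream and close it on exit via aclose() or close()."""
    try:
        yield stream
    finally:
        # Async generators expose aclose(); other stream objects may expose close().
        closer = getattr(stream, "aclose", None) or getattr(stream, "close", None)
        if closer is not None:
            result = closer()
            # close() may be sync or async, so await only when needed.
            if inspect.isawaitable(result):
                await result
```

Usage would be `async with closing_stream(stream) as s:` followed by `async for chunk in s:`, regardless of which closing protocol the provider's stream implements.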
Mark Backman
883b24f577 Update quickstart for 0.0.102 2026-02-10 18:14:04 -05:00
Mark Backman
17ab9c425f Merge pull request #3675 from pipecat-ai/mb/elevenlabs-realtime-send-silence
Add silence-based keepalive to WebsocketSTTService
2026-02-10 18:03:38 -05:00
Mark Backman
2f5e61ac55 Add silence-based keepalive to WebsocketSTTService
Adds opt-in keepalive_timeout and keepalive_interval params to
WebsocketSTTService. When enabled, a background task sends silent audio
(or a service-specific protocol message) when the connection has been
idle, preventing server-side timeout disconnects.

Subclasses override _send_keepalive(silence) to wrap the silence in
their wire format. The default sends raw PCM bytes.

Enables keepalive for ElevenLabs (10s), Gladia (20s), and Soniox (1s),
replacing their per-service custom keepalive tasks.
2026-02-10 17:58:47 -05:00
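The idle-triggered keepalive described in this commit can be sketched as follows. This is a simplified illustration under stated assumptions, not the `WebsocketSTTService` code: `SilenceKeepalive`, `make_silence`, the 100 ms silence duration, and the 16-bit mono PCM format are all hypothetical choices; only the `keepalive_timeout` and `keepalive_interval` parameter names come from the commit message.

```python
import asyncio
import time
from typing import Awaitable, Callable


def make_silence(sample_rate: int, duration_ms: int, num_channels: int = 1) -> bytes:
    """Return all-zero 16-bit PCM covering duration_ms of audio."""
    num_samples = sample_rate * duration_ms // 1000
    return b"\x00" * (num_samples * num_channels * 2)  # 2 bytes per sample


class SilenceKeepalive:
    """Hypothetical helper, not the WebsocketSTTService implementation."""

    def __init__(
        self,
        send: Callable[[bytes], Awaitable[None]],
        *,
        keepalive_timeout: float,
        keepalive_interval: float,
        sample_rate: int = 16000,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._send = send
        self._timeout = keepalive_timeout
        self._interval = keepalive_interval
        self._sample_rate = sample_rate
        self._clock = clock
        self._last_activity = clock()

    def mark_activity(self) -> None:
        """Call whenever real audio is sent on the connection."""
        self._last_activity = self._clock()

    def is_idle(self) -> bool:
        """True once no activity has been recorded for keepalive_timeout seconds."""
        return self._clock() - self._last_activity >= self._timeout

    async def run(self) -> None:
        """Check every keepalive_interval seconds; send silence only while idle."""
        while True:
            await asyncio.sleep(self._interval)
            if self.is_idle():
                await self._send(make_silence(self._sample_rate, duration_ms=100))
```

A service that needs a protocol-specific message instead of raw PCM would pass a `send` callable that wraps the silence in its own wire format, which mirrors the role the commit assigns to `_send_keepalive(silence)`.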
Aleix Conchillo Flaqué
1128c5b7fb Merge pull request #3702 from pipecat-ai/aleix/add-missing-local-smartturn-dependency
pyproject: add local smartturn as a default dependency
2026-02-10 14:34:43 -08:00
Aleix Conchillo Flaqué
a9a5edd8ca pyproject: add local smartturn as a default dependency 2026-02-10 14:32:32 -08:00
Filipi da Silva Fuchter
a98c884e31 Merge pull request #3621 from pipecat-ai/filipi/context_compressure
Context summarization feature implementation
2026-02-10 17:04:47 -05:00
filipi87
2475697955 Changelog entries for context summarization 2026-02-10 18:59:12 -03:00
filipi87
ba242d4875 Context summarization example with Google 2026-02-10 18:59:03 -03:00
filipi87
5deb80932b Context summarization example with OpenAI 2026-02-10 18:58:55 -03:00
filipi87
4a00e6829f Automated tests for the context summarizer. 2026-02-10 18:58:44 -03:00
filipi87
9d89afa7d4 Automated tests for the context summarization feature. 2026-02-10 18:58:33 -03:00
filipi87
92b6ecd945 New Claude skill to help refactor and cleanup the code. 2026-02-10 18:58:22 -03:00
filipi87
314d074c61 Context summarization feature implementation. 2026-02-10 18:58:12 -03:00
Filipi da Silva Fuchter
9c627e7292 Merge pull request #3653 from pipecat-ai/filipi/heygen_lite
HeyGen improvements.
2026-02-10 12:12:22 -05:00
Filipi da Silva Fuchter
ad179b0852 Merge pull request #3584 from pipecat-ai/filipi/speak_frame
TTS services improvements.
2026-02-10 12:11:47 -05:00
filipi87
5128089d42 Add changelog entries for PR #3653. 2026-02-10 14:02:32 -03:00
filipi87
87a79df048 Updating the heygen examples to use sandbox by default. 2026-02-10 14:02:20 -03:00
filipi87
24f90715e3 Use LITE as the default mode, and add support for video_settings and is_sandbox in LiveAvatarNewSessionRequest. 2026-02-10 14:02:09 -03:00
filipi87
e00b98343e Changelog entries for TTS context tracking 2026-02-10 11:37:21 -03:00
filipi87
ad1bec4583 Updated openai example to use on_tts_request and append_to_text. 2026-02-10 11:28:35 -03:00
filipi87
a47d7f98ee Refactored all 30+ TTS service implementations to support context tracking 2026-02-10 11:28:08 -03:00
filipi87
19cd242261 Added TTS context tracking system to trace audio generation through the pipeline. 2026-02-10 11:27:58 -03:00
filipi87
9bb712a47b Simplified universal context aggregators, _handle_text() to only check frame.append_to_context instead of also checking self._started 2026-02-10 11:27:30 -03:00
filipi87
1dccbe7c0b Simplified context aggregators, _handle_text() to only check frame.append_to_context instead of also checking self._started 2026-02-10 11:27:13 -03:00
Mark Backman
2dd3e2f1e7 Merge pull request #3697 from pipecat-ai/mb/soniox-rt-4
Update SonioxSTTService default model to stt-rt-v4
2026-02-10 09:24:39 -05:00
filipi87
f206aaa28d - Added context_id field to all TTS-related frames (TTSAudioRawFrame, TTSStartedFrame, TTSStoppedFrame, AggregatedTextFrame, TTSTextFrame)
- Added append_to_context parameter to TTSSpeakFrame for conditional LLM context addition
2026-02-10 11:22:26 -03:00
Mark Backman
60e42f5690 Merge pull request #3701 from pipecat-ai/mb/changelog-3700 2026-02-10 09:19:42 -05:00
Mark Backman
88e981c013 Set vad_force_turn_endpoint to False in SonioxSTTService 2026-02-10 09:16:03 -05:00
Mark Backman
7bd8dfe898 Add changelog for PR 3700 2026-02-10 08:20:03 -05:00
Mark Backman
83039a1a35 Merge pull request #3700 from ashotbagh/chore/async-migration
chore: update Async API URL and default model
2026-02-10 08:17:04 -05:00
Ashot
28e8b61eb4 chore: update Async API URL and default model 2026-02-10 15:23:51 +04:00
Mark Backman
d47d95e1f0 Update SonioxSTTService default model to stt-rt-v4 2026-02-09 23:48:08 -05:00
Mark Backman
79b9d929c5 Merge pull request #3682 from eoinoreilly30/patch-1
Add new voice options 'marin' and 'cedar'
2026-02-09 23:47:39 -05:00
Eoin
dfc0856d54 Added changelog entry 2026-02-10 12:31:26 +09:00
Eoin
f3c1cd4cd6 Lint 2026-02-10 12:31:26 +09:00
Eoin
18d91d6df3 Add new voice options 'marin' and 'cedar' 2026-02-10 12:31:26 +09:00
Mark Backman
688f502488 Merge pull request #3644 from pipecat-ai/mb/update-assembly-ai-default-config
AssemblyAISTTService: Disable turn detection when setting vad_force_t…
2026-02-09 22:27:44 -05:00
Mark Backman
7684a94c33 AssemblyAISTTService: Disable turn detection when setting vad_force_turn_endpoint to True 2026-02-09 22:20:35 -05:00
Aleix Conchillo Flaqué
e27f4bccfb Merge pull request #3695 from pipecat-ai/aleix/more-claude-updates
CLAUDE.md: add pipeline task and pipeline runner
2026-02-09 18:14:30 -08:00
Mark Backman
fa8b0aeda8 Merge pull request #3690 from pipecat-ai/mb/add-claude-settings
Add shared Claude Code settings
2026-02-09 19:22:28 -05:00
Aleix Conchillo Flaqué
946f0f4e77 CLAUDE.md: add pipeline task and pipeline runner 2026-02-09 16:19:11 -08:00
Mark Backman
b9cf3f3225 Merge pull request #3694 from pipecat-ai/mb/claude-updates
Add observers, error handling, task management, and testing to CLAUDE.md
2026-02-09 19:05:49 -05:00
Aleix Conchillo Flaqué
d32c4b2f5f Merge pull request #3693 from pipecat-ai/aleix/update-examples-remove-default-turn-analyzer
remove the now default turn analyzer from examples
2026-02-09 16:04:19 -08:00
Mark Backman
77a5d16a10 Merge pull request #3692 from pipecat-ai/mb/request-metadata-updates
Rename RequestMetadataFrame to ServiceSwitcherRequestMetadataFrame with service targeting
2026-02-09 18:19:29 -05:00
Mark Backman
ca224834b2 Add observers, error handling, task management, and testing to CLAUDE.md 2026-02-09 18:12:24 -05:00
Aleix Conchillo Flaqué
3867bc6302 LLMUserAggregator: update turn analyzer warning 2026-02-09 14:33:38 -08:00
Aleix Conchillo Flaqué
83a8379401 examples: remove the now default turn analyzer user turn stop strategy 2026-02-09 14:33:38 -08:00
mattie ruth backman
f2688deb0d Update args field in RTVILLMFunctionCallInProgressMessageData to match API of existing RTVILLMFunctionCallResultData. 2026-02-09 17:17:01 -05:00
Mark Backman
981253c703 Rename RequestMetadataFrame to ServiceSwitcherRequestMetadataFrame with service targeting
Add a `service` field so the frame targets a specific service, allowing
ServiceSwitcher.push_frame to consume it only when the targeted service
matches the active service. STTService and test mocks now push the frame
downstream after handling instead of silently consuming it.
2026-02-09 16:48:34 -05:00
Mark Backman
aa6c9797ca Merge pull request #3671 from pipecat-ai/mb/sarvam-cleanup
Clean up on Sarvam STT and TTS classes
2026-02-09 15:58:34 -05:00
Mark Backman
6305e04569 Clean up on Sarvam STT and TTS classes 2026-02-09 15:53:05 -05:00
Mark Backman
3ff9b7b5ad Merge pull request #3687 from pipecat-ai/mb/rtvi-mute-events
Emit RTVI events for user mute/unmute
2026-02-09 15:18:28 -05:00
Mark Backman
cc797ba3cf Add shared Claude Code settings to disable commit attribution 2026-02-09 15:15:31 -05:00
Aleix Conchillo Flaqué
91c8122c17 Merge pull request #3689 from pipecat-ai/aleix/default-smart-turn-stop-strategy
Use TurnAnalyzerUserTurnStopStrategy as default stop strategy
2026-02-09 12:07:16 -08:00
Aleix Conchillo Flaqué
944ac92593 Fix test_langchain to use explicit stop strategy
The default stop strategy changed to TurnAnalyzerUserTurnStopStrategy,
which requires actual audio analysis. Use SpeechTimeoutUserTurnStopStrategy
explicitly since this test is not testing turn detection.
2026-02-09 12:00:41 -08:00
Aleix Conchillo Flaqué
ca0d2e68c3 Add changelog for #3689 2026-02-09 11:58:09 -08:00
Aleix Conchillo Flaqué
631463e573 Use TurnAnalyzerUserTurnStopStrategy as default stop strategy
Change the default user turn stop strategy from
TranscriptionUserTurnStopStrategy to TurnAnalyzerUserTurnStopStrategy
with LocalSmartTurnAnalyzerV3. Also reduce AUDIO_INPUT_TIMEOUT_SECS
from 1.0 to 0.5 and remove its debug log.
2026-02-09 11:58:09 -08:00
Mark Backman
6a553367a2 Merge pull request #3676 from pipecat-ai/mb/code-review-skill
Add Claude code-review skill
2026-02-09 14:48:20 -05:00
Mark Backman
00ec6c77ea Emit RTVI events for user mute/unmute state changes
Add UserMuteStartedFrame/UserMuteStoppedFrame and corresponding RTVI
messages so clients can observe when mute strategies activate/deactivate.
2026-02-09 14:44:32 -05:00
Mark Backman
ee6520db30 Merge pull request #3637 from pipecat-ai/mb/improve-user-stop-turn
Improve user turn stop timing by triggering timeout from VAD stop, push STT metadata to user aggregator
2026-02-09 14:43:22 -05:00
Aleix Conchillo Flaqué
2a572aedba Simplify ServiceSwitcher with closure-based filters
- Make ServiceSwitcherStrategy inherit from BaseObject with properties
  for services and active_service, and move initial service selection
  into the base class
- Add on_service_switched event to ServiceSwitcherStrategy
- handle_frame now returns the switched-to service (or None), allowing
  ServiceSwitcher to swallow ManuallySwitchServiceFrame on switch and
  request metadata from the new active service
- Override push_frame to suppress RequestMetadataFrame and
  ServiceMetadataFrame from inactive services
- Remove ServiceSwitcherFilter and ServiceSwitcherFilterFrame in favor
  of plain FunctionFilter instances with closures that check the
  strategy's active service directly
- FunctionFilter: add FilterType alias
- FunctionFilter: when direction is None, frames in both directions
  are filtered instead of just one
- Add docstrings to ServiceSwitcher and its components
2026-02-09 14:12:33 -05:00
Mark Backman
5e66702cf5 Improved the accuracy of the UserBotLatencyObserver and UserBotLatencyLogObserver 2026-02-09 14:12:33 -05:00
Mark Backman
34b068d657 Improve user turn stop timing by triggering timeout from VAD stop
Refactor TranscriptionUserTurnStopStrategy and TurnAnalyzerUserTurnStopStrategy
to use VADUserStoppedSpeakingFrame as the ground truth for when speech ended,
rather than triggering timeouts from transcription frames.
2026-02-09 14:12:33 -05:00
Mark Backman
05e2a013b3 Merge pull request #3672 from pipecat-ai/mb/rtvi-duplicate-events
Filter RTVIObserver to downstream frames only and broadcast FunctionCallCancelFrame
2026-02-09 12:58:28 -05:00
Mark Backman
5f64dae0cf Filter RTVIObserver to downstream frames only and broadcast FunctionCallCancelFrame
RTVIObserver now skips upstream frames to prevent duplicate RTVI messages
when frames are broadcast in both directions. Also changed
FunctionCallCancelFrame to use broadcast_frame for consistency with
other function call frames.
2026-02-09 12:39:25 -05:00
Mark Backman
1bf8b54502 Merge pull request #3683 from dhruvladia-sarvam/sarvam-v3-update 2026-02-09 06:49:59 -05:00
Gökmen Görgen
ed3ec045aa add changelog file. 2026-02-09 12:04:09 +01:00
Gökmen Görgen
67d39a97f7 AIC model caching. 2026-02-09 11:51:28 +01:00
dhruvladia-sarvam
947ff03c9f v3 addition 2026-02-09 13:04:45 +05:30
Om Chauhan
a4e187e138 replace background task with flush-on-join 2026-02-09 06:04:08 +05:30
Om Chauhan
9f380170d7 added changelog 2026-02-09 05:37:43 +05:30
Om Chauhan
12f27f9cda fix(daily): queue outbound messages until transport joins 2026-02-09 05:37:43 +05:30
Mark Backman
3494a94cac Add Claude code-review skill 2026-02-08 11:06:48 -05:00
461 changed files with 27755 additions and 7281 deletions


@@ -0,0 +1,27 @@
{
  "name": "pipecat-dev-skills",
  "owner": {
    "name": "Pipecat"
  },
  "metadata": {
    "description": "Development workflow skills for contributing to the Pipecat project",
    "version": "1.0.0"
  },
  "plugins": [
    {
      "name": "pipecat-dev",
      "description": "Development workflow skills for contributing to the Pipecat project",
      "version": "1.0.0",
      "source": "./",
      "skills": [
        "./.claude/skills/changelog",
        "./.claude/skills/cleanup",
        "./.claude/skills/code-review",
        "./.claude/skills/docstring",
        "./.claude/skills/pr-description",
        "./.claude/skills/pr-submit",
        "./.claude/skills/update-docs"
      ]
    }
  ]
}

.claude/settings.json Normal file

@@ -0,0 +1,5 @@
{
  "attribution": {
    "commit": ""
  }
}


@@ -26,7 +26,7 @@ Create changelog files for the important commits in this PR. The PR number is pr
- `{PR_NUMBER}.performance.md` - for performance improvements
- `{PR_NUMBER}.other.md` - for other changes
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change.
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change. No line wrapping.
5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.


@@ -0,0 +1,307 @@
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.
---
## Skill Overview
This skill analyzes all changes introduced in your branch and performs the following actions:
1. **Analyze Branch Changes**
- Review uncommitted changes and outgoing commits
2. **Refactor for Readability**
- Improve clarity, naming, structure, and modern Python usage
3. **Enhance Performance**
- Identify safe, conservative optimization opportunities
4. **Add Documentation**
- Apply Pipecat-style, Google-format docstrings
5. **Ensure Pattern Consistency**
- Match existing Pipecat services, pipelines, and examples
6. **Validate Examples**
- Ensure examples follow foundational patterns (e.g. `07-interruptible.py`)
---
## Usage
Invoke the skill using any of the following commands:
- "Clean up my branch code"
- "Refactor the changes in my branch"
- "Review and improve my branch code"
- `/cleanup`
---
## What This Skill Does
### 1. Analyze Branch Changes
The skill retrieves all uncommitted changes and outgoing commits to understand:
- New files added
- Modified files
- Code additions and deletions
- Overall scope and intent of changes
---
### 2. Code Refactoring
#### Readability Improvements
- Replace tuples with named classes or dataclasses
- Improve variable, method, and class naming
- Extract complex logic into well-named helper methods
- Add missing type hints
- Simplify nested or complex conditionals
- Replace deprecated methods and features
- Normalize formatting to match Pipecat style
#### Performance Enhancements
- Identify inefficient loops or repeated work
- Suggest appropriate data structures
- Optimize async workflows and I/O
- Remove redundant operations
> Performance changes are conservative and non-breaking.
---
### 3. Documentation
Documentation follows **Google-style docstrings**, consistent with Pipecat conventions.
#### Class Documentation
```python
class ExampleService:
    """Brief one-line description.

    Detailed explanation of the class purpose, responsibilities,
    and important behaviors.

    Supported features:

    - Feature 1
    - Feature 2
    - Feature 3
    """
```
#### Method Documentation
```python
def process_data(self, data: str, options: Optional[dict] = None) -> bool:
    """Process incoming data with optional configuration.

    Args:
        data: The input data to process.
        options: Optional configuration dictionary.

    Returns:
        True if processing succeeded, False otherwise.

    Raises:
        ValueError: If data is empty or invalid.
    """
```
#### Pydantic Model Parameters
```python
class InputParams(BaseModel):
    """Configuration parameters for the service.

    Parameters:
        timeout: Request timeout in seconds.
        retry_count: Number of retry attempts.
        enable_logging: Whether to enable debug logging.
    """

    timeout: Optional[float] = None
    retry_count: int = 3
    enable_logging: bool = False
```
---
### 4. Pattern Consistency Checks
#### Service Classes
- Correct inheritance (`TTSService`, `STTService`, `LLMService`)
- Consistent constructor signatures
- Frame emission patterns
- Metrics support:
- `can_generate_metrics()`
- TTFB metrics
- Usage metrics
- Alignment with similar existing services
#### Examples
Validated against `examples/foundational/07-interruptible.py`:
- Proper `create_transport()` usage
- Correct pipeline structure
- Task setup and observers
- Event handler registration
- Runner and bot entrypoint consistency
---
### 5. Specific Implementation Patterns
#### Service Implementation
```python
class ExampleTTSService(TTSService):
    def __init__(self, *, api_key: Optional[str] = None, **kwargs):
        super().__init__(**kwargs)
        self._api_key = api_key or os.getenv("SERVICE_API_KEY")

    def can_generate_metrics(self) -> bool:
        return True

    async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
        try:
            await self.start_ttfb_metrics()
            yield TTSStartedFrame()
            # ... processing ...
            yield TTSAudioRawFrame(...)
        finally:
            await self.stop_ttfb_metrics()
```
---
#### Example Structure Pattern
```python
transport_params = {
    "daily": lambda: DailyParams(...),
    "twilio": lambda: FastAPIWebsocketParams(...),
    "webrtc": lambda: TransportParams(...),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = DeepgramSTTService(...)
    tts = SomeTTSService(...)
    llm = OpenAILLMService(...)

    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(...)

    pipeline = Pipeline([...])
    task = PipelineTask(pipeline, params=..., observers=[...])

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)
```
---
## Execution Flow
1. Fetch uncommitted and outgoing changes
2. Categorize files (services, examples, tests, utilities)
3. Analyze each file:
- Readability
- Performance
- Documentation
- Pattern consistency
4. Generate actionable recommendations
5. Apply Pipecat standards
---
## Examples
### Before: Tuple Usage
```python
def get_audio_info(self) -> Tuple[int, int]:
    return (48000, 1)
```
### After: Named Class
```python
from dataclasses import dataclass


# @dataclass generates the keyword-argument constructor used below.
@dataclass
class AudioInfo:
    """Audio configuration information.

    Parameters:
        sample_rate: Sample rate in Hz.
        num_channels: Number of audio channels.
    """

    sample_rate: int
    num_channels: int


def get_audio_info(self) -> AudioInfo:
    return AudioInfo(sample_rate=48000, num_channels=1)
```
---
### Before: Missing Documentation
```python
class NewTTSService(TTSService):
    def __init__(self, api_key: str, voice: str):
        self._api_key = api_key
        self._voice = voice
```
### After: Fully Documented
```python
class NewTTSService(TTSService):
    """Text-to-speech service using NewProvider API.

    Streams PCM audio and emits TTSAudioRawFrame frames compatible
    with Pipecat transports.

    Supported features:

    - Text-to-speech synthesis
    - Streaming PCM audio
    - Voice customization
    - TTFB metrics
    """

    def __init__(self, *, api_key: str, voice: str, **kwargs):
        """Initialize the NewTTSService.

        Args:
            api_key: API key for authentication.
            voice: Voice identifier to use.
            **kwargs: Additional arguments passed to the parent service.
        """
        super().__init__(**kwargs)
        self._api_key = api_key
        self.set_voice(voice)
```
---
## Notes
- Non-breaking improvements only
- Backward compatibility preserved
- Conservative performance changes
- Google-style docstrings
- Pattern checks follow recent Pipecat code


@@ -0,0 +1,107 @@
---
name: code-review
description: Automated code review for pull requests using multiple specialized agents
disable-model-invocation: true
allowed-tools: Bash(gh issue view:*), Bash(gh search:*), Bash(gh issue list:*), Bash(gh pr comment:*), Bash(gh pr diff:*), Bash(gh pr view:*), Bash(gh pr list:*)
---
Provide a code review for the given pull request.
**Agent assumptions (applies to all agents and subagents):**
- All tools are functional and will work without error. Do not test tools or make exploratory calls. Make sure this is clear to every subagent that is launched.
- Only call a tool if it is required to complete the task. Every tool call should have a clear purpose.
To do this, follow these steps precisely:
1. Launch a haiku agent to check if any of the following are true:
- The pull request is closed
- The pull request is a draft
- The pull request does not need code review (e.g. automated PR, trivial change that is obviously correct)
- Claude has already commented on this PR (check `gh pr view <PR> --comments` for comments left by claude)
If any condition is true, stop and do not proceed.
Note: Still review Claude-generated PRs.
2. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including:
- The root CLAUDE.md file, if it exists
- Any CLAUDE.md files in directories containing files modified by the pull request
3. Launch a sonnet agent to view the pull request and return a summary of the changes
4. Launch 4 agents in parallel to independently review the changes. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following:
Agents 1 + 2: CLAUDE.md compliance sonnet agents
Audit changes for CLAUDE.md compliance in parallel. Note: When evaluating CLAUDE.md compliance for a file, only consider CLAUDE.md files located in the same directory as that file or in one of its parent directories.
Agent 3: Opus bug agent (parallel subagent with agent 4)
Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff.
Agent 4: Opus bug agent (parallel subagent with agent 3)
Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code.
**CRITICAL: We only want HIGH SIGNAL issues.** Flag issues where:
- The code will fail to compile or parse (syntax errors, type errors, missing imports, unresolved references)
- The code will definitely produce wrong results regardless of inputs (clear logic errors)
- Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken
Do NOT flag:
- Code style or quality concerns
- Potential issues that depend on specific inputs or state
- Subjective suggestions or improvements
If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time.
In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent.
5. For each issue found in the previous step by agents 3 and 4, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations.
6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review.
7. If issues were found, skip to step 8 to post comments.
If NO issues were found, post a summary comment using `gh pr comment` (if `--comment` argument is provided):
"No issues found. Checked for bugs and CLAUDE.md compliance."
8. Create a list of all comments that you plan on leaving. This is only for you to make sure you are comfortable with the comments. Do not post this list anywhere.
9. Post inline comments for each issue using `gh pr review` with inline comments. For each comment:
- Provide a brief description of the issue
- For small, self-contained fixes, include a committable suggestion block
- For larger fixes (6+ lines, structural changes, or changes spanning multiple locations), describe the issue and suggested fix without a suggestion block
- Never post a committable suggestion UNLESS committing the suggestion fixes the issue entirely. If follow up steps are required, do not leave a committable suggestion.
**IMPORTANT: Only post ONE comment per unique issue. Do not post duplicate comments.**
Use this list when evaluating issues in Steps 4 and 5 (these are false positives, do NOT flag):
- Pre-existing issues
- Something that appears to be a bug but is actually correct
- Pedantic nitpicks that a senior engineer would not flag
- Issues that a linter will catch (do not run the linter to verify)
- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md
- Issues mentioned in CLAUDE.md but explicitly silenced in the code (e.g., via a lint ignore comment)
Notes:
- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch.
- Create a todo list before starting.
- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md, include a link to it).
- If no issues are found, post a comment with the following format:
---
## Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
---
- When linking to code in inline comments, use the following format precisely, otherwise the Markdown preview won't render correctly: `https://github.com/OWNER/REPO/blob/FULL_SHA/path/to/file.py#L10-L15`
- Requires full git sha
- You must provide the full sha. Commands like `https://github.com/owner/repo/blob/$(git rev-parse HEAD)/foo/bar` will not work, since your comment will be directly rendered in Markdown.
- Repo name must match the repo you're code reviewing
- # sign after the file name
- Line range format is L[start]-L[end]
- Provide at least 1 line of context before and after, centered on the line you are commenting about (e.g. if you are commenting about lines 5-6, you should link to `L4-L7`)
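
The link rules above can be sketched as a small helper. This is an illustrative sketch only: `code_permalink` is a hypothetical name, and the 40-hex-character check assumes a SHA-1 repository.

```python
import re


def code_permalink(owner, repo, full_sha, path, start, end, context=1):
    """Build a GitHub blob link with `context` extra lines on each side."""
    # Assumption: the repository uses 40-character SHA-1 commit IDs.
    if not re.fullmatch(r"[0-9a-f]{40}", full_sha):
        raise ValueError("a full 40-character commit SHA is required")
    first = max(1, start - context)  # never go below line 1
    last = end + context
    return f"https://github.com/{owner}/{repo}/blob/{full_sha}/{path}#L{first}-L{last}"


# Commenting on lines 5-6 yields a link ending in "#L4-L7".
example_link = code_permalink("OWNER", "REPO", "a" * 40, "path/to/file.py", 5, 6)
```

Rejecting abbreviated SHAs up front reflects the requirement that the link is rendered as-is, so nothing can expand a short hash or a shell substitution later.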


@@ -3,21 +3,20 @@ name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
Document a Python module or class using Google-style docstrings following project conventions. The argument can be a class name or a module path.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
1. Determine what to document based on the argument:
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
**If a module path is provided** (e.g. `src/pipecat/audio/vad/vad_analyzer.py`):
- Use that file directly
3. Once the file is identified, read the module to understand its structure:
**If a class name is provided** (e.g. `VADAnalyzer`):
- Search for `class ClassName` in `src/pipecat/`
- If multiple files contain that class name, list all matches with their file paths, ask the user which one they want to document, and wait for confirmation
2. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component


@@ -0,0 +1,28 @@
---
name: pr-submit
description: Create and submit a GitHub PR from the current branch
---
Submit the current changes as a GitHub pull request.
## Instructions
1. Check the current state of the repository:
- Run `git status` to see staged, unstaged, and untracked changes
- Run `git diff` to see current changes
- Run `git log --oneline -10` to see recent commits
2. If there are uncommitted changes relevant to the PR:
- Ask the user if they want a specific prefix for the branch name (e.g., `alice/`, `fix/`, `feat/`)
- Create a new branch based on the current branch
- Commit the changes using multiple commits if the changes are unrelated
3. Push the branch and create the PR:
- Push with `-u` flag to set upstream tracking
- Create the PR using `gh pr create`
4. After the PR is created:
- Run `/changelog <pr_number>` to generate changelog files, then commit and push them
- Run `/pr-description <pr_number>` to update the PR description
5. Return the PR URL to the user.


@@ -0,0 +1,250 @@
---
name: update-docs
description: Update documentation pages to match source code changes on the current branch
---
Update documentation pages to reflect source code changes on the current branch. Analyzes the diff against main, maps changed source files to their corresponding doc pages, and makes targeted edits.
## Arguments
```
/update-docs [DOCS_PATH]
```
- `DOCS_PATH` (optional): Path to the docs repository root. If not provided, ask the user.
Examples:
- `/update-docs /Users/me/src/docs`
- `/update-docs`
## Instructions
### Step 1: Resolve docs path
If `DOCS_PATH` was provided as an argument, use it. Otherwise, ask the user for the path to their docs repository.
Verify the path exists and contains `server/services/` subdirectory.
### Step 2: Create docs branch
Get the current pipecat branch name:
```bash
git rev-parse --abbrev-ref HEAD
```
In the docs repo, create a new branch off main with a matching name:
```bash
cd DOCS_PATH && git checkout main && git pull && git checkout -b {branch-name}-docs
```
For example, if the pipecat branch is `feat/new-service`, the docs branch becomes `feat/new-service-docs`.
All doc edits in subsequent steps are made on this branch.
### Step 3: Detect changed source files
Run:
```bash
git diff main..HEAD --name-only
```
Filter to files that could affect documentation:
- `src/pipecat/services/**/*.py` (service implementations)
- `src/pipecat/transports/**/*.py` (transport implementations)
- `src/pipecat/serializers/**/*.py` (serializer implementations)
- `src/pipecat/processors/**/*.py` (processor implementations)
- `src/pipecat/audio/**/*.py` (audio utilities)
- `src/pipecat/turns/**/*.py` (turn management)
- `src/pipecat/observers/**/*.py` (observers)
- `src/pipecat/pipeline/**/*.py` (pipeline core)
Ignore `__init__.py`, `__pycache__`, test files, and files that only contain type re-exports.
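The filter above can be sketched in Python as follows. This is a minimal sketch, not part of the skill itself: the helper names and the test-file heuristic (`test_` prefix or a `tests` directory) are assumptions, and detecting files that only contain type re-exports would still require reading the file.

```python
from pathlib import PurePosixPath

# Top-level directories under src/pipecat/ that Step 3 lists as doc-relevant.
DOC_RELEVANT_DIRS = {
    "services", "transports", "serializers", "processors",
    "audio", "turns", "observers", "pipeline",
}


def is_doc_relevant(path: str) -> bool:
    """Return True if a changed file could affect documentation."""
    p = PurePosixPath(path)
    parts = p.parts
    # Must be a Python file under src/pipecat/<relevant dir>/
    if p.suffix != ".py" or len(parts) < 4 or parts[:2] != ("src", "pipecat"):
        return False
    if parts[2] not in DOC_RELEVANT_DIRS:
        return False
    # Skip package markers and caches
    if p.name == "__init__.py" or "__pycache__" in parts:
        return False
    # Assumed heuristic for test files
    if p.name.startswith("test_") or "tests" in parts:
        return False
    return True


def filter_changed_files(diff_output: str) -> list[str]:
    """Filter the output of `git diff main..HEAD --name-only`."""
    return [line for line in diff_output.splitlines() if line and is_doc_relevant(line)]
```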
### Step 4: Map source files to doc pages
For each changed source file, find the corresponding doc page. Read the mapping file at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md` and apply its tiered lookup: tier 1 (known exceptions) → tier 2 (pattern matching) → tier 3 (search fallback). **First match wins.**
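One way to picture the "first match wins" rule is a list of tier functions tried in order. The sketch below is illustrative only: `resolve_doc_page`, `tier1_exceptions`, and `tier2_patterns` are hypothetical names, and the two tiers carry a tiny subset of the mapping file.

```python
from typing import Callable, Optional

# A tier takes a source path (relative to src/pipecat/) and returns a doc path or None.
Tier = Callable[[str], Optional[str]]


def resolve_doc_page(source_path: str, tiers: list[Tier]) -> Optional[str]:
    """Try each tier in order and return the first match."""
    for tier in tiers:
        doc_page = tier(source_path)
        if doc_page is not None:
            return doc_page
    return None  # no tier matched: the file is unmapped


# Stand-in tiers for illustration; the real tables live in SOURCE_DOC_MAPPING.md.
KNOWN_EXCEPTIONS = {"services/google/llm.py": "server/services/llm/gemini.mdx"}


def tier1_exceptions(path: str) -> Optional[str]:
    return KNOWN_EXCEPTIONS.get(path)


def tier2_patterns(path: str) -> Optional[str]:
    parts = path.split("/")
    if len(parts) == 3 and parts[0] == "services" and parts[2].startswith("llm"):
        return f"server/services/llm/{parts[1]}.mdx"
    return None
```

Because tier 1 runs first, `services/google/llm.py` resolves to the Gemini page even though the tier 2 pattern would also match it.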
### Step 5: Analyze each source-doc pair
For each mapped pair:
1. **Read the full source file** to understand current state
2. **Read the diff** for that file: `git diff main..HEAD -- <source_file>`
3. **Read the current doc page** in full
Identify what changed by comparing source to docs:
- **Constructor parameters**: Compare `__init__` signature to the Configuration section's `<ParamField>` entries
- **InputParams fields**: Compare `InputParams(BaseModel)` class fields to the InputParams table
- **Event handlers**: Compare `_register_event_handler` calls and event handler definitions to Event Handlers section
- **Class names / imports**: Check if Usage examples reference correct names
- **Behavioral changes**: Check if Notes section needs updating
### Step 6: Make targeted edits
For each doc page that needs updates, edit **only the sections that need changes**. Preserve all other content exactly as-is.
#### Rules
- **Never remove content** unless the corresponding source code was removed
- **Never rewrite sections** that are already accurate
- **Match existing formatting** — if the page uses `<ParamField>` tags, use them; if it uses tables, use tables
- **Keep descriptions concise** — match the tone and length of surrounding content
- **Preserve CardGroup, links, and examples** unless they reference removed functionality
- **Don't touch frontmatter** unless the class was renamed
#### Section-specific guidance
**Configuration** (constructor params):
- Use `<ParamField path="name" type="type" default="value">` format if the page already uses it
- Add new params in logical order (required first, then optional)
- Remove params that no longer exist in source
- Update types/defaults that changed
**InputParams** (runtime settings):
- Use markdown table format: `| Parameter | Type | Default | Description |`
- Match the field names and types from the `InputParams(BaseModel)` class
- Include the default values from the source
**Usage** (code examples):
- Update import paths, class names, and parameter names
- Only modify examples if they would break or be misleading with the new API
- Don't rewrite working examples just to add new optional params
**Notes**:
- Add notes for new behavioral gotchas or breaking changes
- Remove notes about limitations that were fixed
- Keep existing notes that are still accurate
**Event Handlers**:
- Update the event table and example code
- Add new events, remove deleted ones
- Update handler signatures if they changed
**Overview / Key Features / Prerequisites**:
- Only update if the PR fundamentally changes what the service does (new capability, removed capability, renamed class)
- Most PRs will NOT need changes to these sections
### Step 7: Update guides
Guides at `DOCS_PATH/guides/` reference specific class names, parameters, imports, and code patterns. After completing reference doc edits, check if any guides need updates too.
For each changed source file, collect the class names, renamed parameters, and changed imports from the diff. Search the guides directory:
```bash
grep -rl "ClassName\|old_param_name" DOCS_PATH/guides/
```
For each guide that references changed code:
1. Read the full guide
2. Update class names, parameter names, import paths, and code examples that are now incorrect
3. **Don't rewrite prose** — only fix the specific references that changed
4. Leave guides alone if they reference the service generally but don't use any changed APIs
Guide directories:
- `guides/learn/` — conceptual tutorials (pipeline, LLM, STT, TTS, etc.)
- `guides/fundamentals/` — practical how-tos (metrics, recording, transcripts, etc.)
- `guides/features/` — feature-specific guides (Gemini Live, OpenAI audio, WhatsApp, etc.)
- `guides/telephony/` — telephony integration guides (Twilio, Plivo, Telnyx, etc.)
### Step 8: Identify doc gaps
After processing all mapped pairs, check for two kinds of gaps:
**Missing pages**: Source files that had no doc page mapping (no match in tier 1, 2, or 3) and are not marked as "(skip)". For each, tell the user:
- The source file path
- The main class(es) it defines
- Whether a new doc page should be created
**Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.
If the user wants a new page, create it using this template structure:
```
---
title: "Service Name"
description: "Brief description"
---
## Overview
[Description from class docstring or source analysis]
<CardGroup cols={2}>
[Cards for API reference and examples if available]
</CardGroup>
## Installation
```bash
pip install "pipecat-ai[package-name]"
```
## Prerequisites
[Environment variables and account setup]
## Configuration
[ParamField entries for constructor params]
## InputParams
[Table of InputParams fields, if the service has them]
## Usage
### Basic Setup
```python
[Minimal working example]
```
## Notes
[Important caveats]
## Event Handlers
[Event table and example code]
```
### Step 9: Output summary
After all edits are complete, print a summary:
```
## Documentation Updates
### Updated reference pages
- `server/services/stt/deepgram.mdx` — Updated Configuration (added `new_param`), InputParams (updated `language` default)
- `server/services/tts/elevenlabs.mdx` — Updated Event Handlers (added `on_connected`)
### Updated guides
- `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)
### Unmapped source files
- `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)
### Skipped files
- `src/pipecat/services/ai_service.py` — internal base class
```
## Guidelines
- **Be conservative** — only change what the diff warrants. Don't "improve" docs beyond what changed in source.
- **Read before editing** — always read the full doc page before making changes so you understand the existing structure.
- **Preserve voice** — match the writing style of the existing doc page, don't impose a different tone.
- **One PR at a time** — this skill operates on the current branch's diff against main. Don't look at other branches.
- **Parallel analysis** — when multiple source files map to different doc pages, analyze and edit them in parallel for efficiency.
- **Shared source files** — files like `services/google/google.py` are shared bases. Check which services import from them and update all affected doc pages.
## Checklist
Before finishing, verify:
- [ ] All changed source files were checked against the mapping table
- [ ] Each doc page edit matches the actual source code change (not guessed)
- [ ] No content was removed unless the corresponding source was removed
- [ ] New parameters have accurate types and defaults from source
- [ ] Formatting matches the existing page style
- [ ] Guides referencing changed APIs were checked and updated
- [ ] Unmapped files were reported to the user


@@ -0,0 +1,79 @@
# Source-to-Doc Mapping
Maps pipecat source files to their documentation pages. Source paths are relative to `src/pipecat/`. Doc paths are relative to `DOCS_PATH`.
## Name mismatches
These source paths don't follow the standard `services/{provider}/{type}.py` → `server/services/{type}/{provider}.mdx` pattern.
| Source path | Doc page |
|---|---|
| `services/google/llm.py` | `server/services/llm/gemini.mdx` |
| `services/google/llm_vertex.py` | `server/services/llm/google-vertex.mdx` |
| `services/google/google.py` | (shared base — check which services use it) |
| `services/google/gemini_live/**` | `server/services/s2s/gemini-live.mdx` |
| `services/google/gemini_live/llm_vertex.py` | `server/services/s2s/gemini-live-vertex.mdx` |
| `services/aws_nova_sonic/**` | `server/services/s2s/aws.mdx` |
| `services/ultravox/**` | `server/services/s2s/ultravox.mdx` |
| `services/grok/realtime/**` | `server/services/s2s/grok.mdx` |
| `services/openai/realtime/**` | `server/services/s2s/openai.mdx` |
| `processors/frameworks/rtvi.py` | `server/frameworks/rtvi/rtvi-processor.mdx` and `server/frameworks/rtvi/rtvi-observer.mdx` |
| `processors/transcript_processor.py` | `server/utilities/transcript-processor.mdx` |
| `processors/user_idle_processor.py` | `server/utilities/user-idle-processor.mdx` |
| `processors/idle_frame_processor.py` | `server/pipeline/pipeline-idle-detection.mdx` |
| `pipeline/task.py` | `server/pipeline/pipeline-task.mdx` |
| `pipeline/runner.py` | `server/utilities/runner/guide.mdx` |
| `transports/base_transport.py` | `server/services/transport/transport-params.mdx` |
## Skip list
These files should never trigger doc updates.
| Pattern | Reason |
|---|---|
| `services/ai_service.py` | Internal base class |
| `services/stt_service.py` | Internal base class |
| `services/tts_service.py` | Internal base class |
| `services/llm_service.py` | Internal base class |
| `services/websocket_service.py` | Internal base class |
| `services/openai_realtime_beta/**` | Deprecated |
| `services/openai_realtime/**` | Deprecated |
| `services/gemini_multimodal_live/**` | Deprecated |
| `services/aws/agent_core.py` | Internal |
| `services/aws/sagemaker/**` | No doc page |
| `transports/base_input.py` | Internal base class |
| `transports/base_output.py` | Internal base class |
| `transports/websocket/client.py` | No doc page |
| `serializers/base_serializer.py` | Internal base class |
| `serializers/protobuf.py` | Internal |
| `processors/audio/**` | Internal |
| `pipeline/pipeline.py` | Core architecture, not a service doc |
## Pattern matching
For files not in the tables above, apply these patterns. Convert underscores to hyphens in provider names for doc filenames.
| Source pattern | Doc pattern |
|---|---|
| `services/{provider}/stt*.py` | `server/services/stt/{provider}.mdx` |
| `services/{provider}/tts*.py` | `server/services/tts/{provider}.mdx` |
| `services/{provider}/llm*.py` | `server/services/llm/{provider}.mdx` |
| `services/{provider}/image*.py` | `server/services/image-generation/{provider}.mdx` |
| `services/{provider}/video*.py` | `server/services/video/{provider}.mdx` |
| `services/{provider}/realtime/**` | `server/services/s2s/{provider}.mdx` |
| `transports/{name}/**` | `server/services/transport/{name}.mdx` |
| `serializers/{name}.py` | `server/services/serializers/{name}.mdx` |
| `observers/**` | `server/utilities/observers/` (match by class name) |
| `audio/vad/**` | `server/utilities/audio/` (match by class name) |
| `audio/filters/**` | `server/utilities/audio/` (match by class name) |
| `audio/mixers/**` | `server/utilities/audio/` (match by class name) |
| `processors/filters/**` | `server/utilities/filters/` (match by class name) |
If the doc file doesn't exist at the resolved path, the file is **unmapped**.
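The service rows of the pattern table could be applied along these lines. This is a sketch, not the skill's implementation: `match_pattern` and `resolve_existing` are hypothetical helpers, only a subset of the rows is reproduced, and source paths are assumed to be relative to `src/pipecat/`.

```python
import re
from pathlib import Path
from typing import Optional

# (regex over the source path, doc path template); a subset of the table above.
SERVICE_PATTERNS = [
    (re.compile(r"^services/(?P<provider>[^/]+)/stt[^/]*\.py$"), "server/services/stt/{provider}.mdx"),
    (re.compile(r"^services/(?P<provider>[^/]+)/tts[^/]*\.py$"), "server/services/tts/{provider}.mdx"),
    (re.compile(r"^services/(?P<provider>[^/]+)/llm[^/]*\.py$"), "server/services/llm/{provider}.mdx"),
    (re.compile(r"^services/(?P<provider>[^/]+)/image[^/]*\.py$"), "server/services/image-generation/{provider}.mdx"),
    (re.compile(r"^services/(?P<provider>[^/]+)/realtime/.+$"), "server/services/s2s/{provider}.mdx"),
]


def match_pattern(source_path: str) -> Optional[str]:
    """Return the doc path implied by the pattern table, or None."""
    for regex, template in SERVICE_PATTERNS:
        m = regex.match(source_path)
        if m:
            # Doc filenames use hyphens where provider names use underscores
            provider = m.group("provider").replace("_", "-")
            return template.format(provider=provider)
    return None


def resolve_existing(source_path: str, docs_root: str) -> Optional[str]:
    """Like match_pattern, but treat a missing doc file as unmapped."""
    doc_path = match_pattern(source_path)
    if doc_path is None or not (Path(docs_root) / doc_path).is_file():
        return None
    return doc_path
```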
## Search fallback
For files that don't match any table or pattern above:
1. Extract the main class name(s) from the source file
2. Search the docs directory for that class name: `grep -r "ClassName" DOCS_PATH/server/`
3. If found in a doc page, use that as the mapping
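The fallback steps can be sketched with the standard library alone. The helper names are hypothetical, and restricting the search to `.mdx` files is an assumption (the `grep -r` above searches every file).

```python
import ast
from pathlib import Path


def top_level_classes(source: str) -> list[str]:
    """Return the names of classes defined at module level."""
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.ClassDef)]


def find_doc_pages(class_name: str, docs_server_dir: Path) -> list[Path]:
    """Return doc pages under docs_server_dir whose text mentions class_name."""
    hits = []
    for page in sorted(docs_server_dir.rglob("*.mdx")):
        if class_name in page.read_text(encoding="utf-8", errors="ignore"):
            hits.append(page)
    return hits
```

If several pages mention the class, the match is ambiguous and is probably worth confirming with the user rather than picking one silently.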


@@ -29,6 +29,7 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
@@ -36,10 +37,13 @@ jobs:
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket
- name: Run tests with coverage


@@ -86,7 +86,7 @@ jobs:
fi
# Validate fragment types
VALID_TYPES="added changed deprecated removed fixed security other"
VALID_TYPES="added changed deprecated removed fixed performance security other"
INVALID_FRAGMENTS=""
for file in changelog/*.md; do


@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ['3.10.18', '3.11.13', '3.12.11', '3.13.5']
python-version: ['3.10.19', '3.11.14', '3.12.12', '3.13.12']
name: Python ${{ matrix.python-version }}
steps:
@@ -40,20 +40,10 @@ jobs:
uv python install ${{ matrix.python-version }}
uv python pin ${{ matrix.python-version }}
- name: Test uv sync with all extras (Python < 3.13)
if: "!startsWith(matrix.python-version, '3.13.')"
- name: Test uv sync with all extras
run: |
uv sync --group dev --all-extras --no-extra krisp
- name: Test uv sync without PyTorch extras (Python 3.13+)
if: startsWith(matrix.python-version, '3.13.')
run: |
uv sync --group dev --all-extras \
--no-extra krisp \
--no-extra local-smart-turn \
--no-extra moondream \
--no-extra mlx-whisper
- name: Verify installation
run: |
uv run python --version


@@ -33,6 +33,7 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
@@ -40,10 +41,13 @@ jobs:
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket
- name: Test with pytest


@@ -7,6 +7,562 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.103] - 2026-02-20
### Added
- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This
allows timestamp info to trail the arrival of audio chunks, resulting in much
better first-audio-chunk latency.
(PR [#3625](https://github.com/pipecat-ai/pipecat/pull/3625))
- Added model-specific `InputParams` to `RimeTTSService`: arcana params
(`repetition_penalty`, `temperature`, `top_p`) and mistv2 params
(`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param
changes now trigger WebSocket reconnection.
(PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))
- Added `write_transport_frame()` hook to `BaseOutputTransport` allowing
transport subclasses to handle custom frame types that flow through the audio
queue.
(PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))
- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily
transport. These frames queue SIP transfer and SIP REFER operations with
audio, so the operation executes only after the bot finishes its current
utterance.
(PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))
- Added keepalive support to `SarvamSTTService` to prevent idle connection
timeouts (e.g. when used behind a `ServiceSwitcher`).
(PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))
- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection
at runtime by updating the timeout dynamically.
(PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))
- Added `broadcast_sibling_id` field to the base `Frame` class. This field is
automatically set by `broadcast_frame()` and `broadcast_frame_instance()` to
the ID of the paired frame pushed in the opposite direction, allowing
receivers to identify broadcast pairs.
(PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))
- Added `ignored_sources` parameter to `RTVIObserverParams` and
`add_ignored_source()`/`remove_ignored_source()` methods to `RTVIObserver` to
suppress RTVI messages from specific pipeline processors (e.g. a silent
evaluation LLM).
(PR [#3779](https://github.com/pipecat-ai/pipecat/pull/3779))
- Added `DeepgramSageMakerTTSService` for running Deepgram TTS models deployed
on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling,
and per-turn TTFB metrics.
(PR [#3785](https://github.com/pipecat-ai/pipecat/pull/3785))
### Changed
- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the
`wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from
mistv2-specific values to `None` — only explicitly-set params are sent as
query params.
(PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))
- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager`
in `aic_filter.py`.
- Multiple filters using the same model path or `(model_id,
model_download_dir)` share one loaded model, with reference counting and
concurrent load deduplication.
- Model file I/O runs off the event loop so the filter does not block.
(PR [#3684](https://github.com/pipecat-ai/pipecat/pull/3684))
- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for
better traceability.
(PR [#3706](https://github.com/pipecat-ai/pipecat/pull/3706))
- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now
queued with audio like other transport frames.
(PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))
- Bumped Pillow dependency upper bound from `<12` to `<13` to allow Pillow
12.x.
(PR [#3728](https://github.com/pipecat-ai/pipecat/pull/3728))
- Moved STT keepalive mechanism from `WebsocketSTTService` to the `STTService`
base class, allowing any STT service (not just websocket-based ones) to use
idle-connection keepalive via the `keepalive_timeout` and
`keepalive_interval` parameters.
(PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))
- Improved audio context management in `AudioContextTTSService` by moving
context ID tracking to the base class and adding
`reuse_context_id_within_turn` parameter to control concurrent TTS request
handling.
- Added helper methods: `has_active_audio_context()`,
`get_active_audio_context_id()`, `remove_active_audio_context()`,
`reset_active_audio_context()`
- Simplified Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS
implementations by removing duplicate context management code
(PR [#3732](https://github.com/pipecat-ai/pipecat/pull/3732))
- `UserIdleController` is now always created with a default timeout of 0
(disabled). The `user_idle_timeout` parameter changed from `Optional[float] =
None` to `float = 0` in `UserTurnProcessor`, `LLMUserAggregatorParams`, and
`UserIdleController`.
(PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))
- Changed the version specifier from `>=0.2.8` to `~=0.2.8` for the
`speechmatics-voice` package to ensure compatibility with future patch
versions.
(PR [#3761](https://github.com/pipecat-ai/pipecat/pull/3761))
- Updated `InworldTTSService` and `InworldHttpTTSService` to use `ASYNC`
timestamp transport strategy by default.
(PR [#3765](https://github.com/pipecat-ai/pipecat/pull/3765))
- Added `start_time` and `end_time` parameters to `start_ttfb_metrics()`,
`stop_ttfb_metrics()`, `start_processing_metrics()`, and
`stop_processing_metrics()` in `FrameProcessor` and `FrameProcessorMetrics`,
allowing custom timestamps for metrics measurement. `STTService` now uses
these instead of custom TTFB tracking.
(PR [#3776](https://github.com/pipecat-ai/pipecat/pull/3776))
- Updated default Anthropic model from `claude-sonnet-4-5-20250929` to
`claude-sonnet-4-6`.
(PR [#3792](https://github.com/pipecat-ai/pipecat/pull/3792))
### Deprecated
- Deprecated unused `Traceable`, `@traceable`, `@traced`, and
`AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module
will be removed in a future release.
(PR [#3733](https://github.com/pipecat-ai/pipecat/pull/3733))
### Fixed
- Fixed race condition where `RTVIObserver` could send messages before
`DailyTransport` join completed. Outbound messages are now queued and delivered
after the transport is ready.
(PR [#3615](https://github.com/pipecat-ai/pipecat/pull/3615))
- Fixed async generator cleanup in OpenAI LLM streaming to prevent
`AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699).
(PR [#3698](https://github.com/pipecat-ai/pipecat/pull/3698))
- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all
sample rates, including 8kHz audio.
(PR [#3713](https://github.com/pipecat-ai/pipecat/pull/3713))
- Fixed a race condition in `RTVIObserver` where bot output messages could be
sent before the bot-started-speaking event.
(PR [#3718](https://github.com/pipecat-ai/pipecat/pull/3718))
- Fixed Grok Realtime `session.updated` event parsing failure caused by the API
returning prefixed voice names (e.g. `"human_Ara"` instead of `"Ara"`).
(PR [#3720](https://github.com/pipecat-ai/pipecat/pull/3720))
- Fixed context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`,
`RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and
`PlayHTTTSService`. Services now properly reuse the same context ID across
multiple `run_tts()` invocations within a single LLM turn, preventing context
tracking issues and incorrect lifecycle signaling.
(PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))
- Fixed word timestamp interleaving issue in `ElevenLabsTTSService` when
processing multiple sentences within a single LLM turn.
(PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))
- Fixed tracing service decorators executing the wrapped function twice when
the function itself raised an exception (e.g., LLM rate limit, TTS timeout).
(PR [#3735](https://github.com/pipecat-ai/pipecat/pull/3735))
- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame`
reaches downstream processors.
(PR [#3737](https://github.com/pipecat-ai/pipecat/pull/3737))
- Fixed `UserIdleController` false idle triggers caused by gaps between user
and bot activity frames. The idle timer now starts only after
`BotStoppedSpeakingFrame` and is suppressed during active user turns and
function calls.
(PR [#3744](https://github.com/pipecat-ai/pipecat/pull/3744))
- Fixed incorrect `sample_rate` assignment in
`TavusInputTransport._on_participant_audio_data` (was using
`audio.audio_frames` instead of `audio.sample_rate`).
(PR [#3768](https://github.com/pipecat-ai/pipecat/pull/3768))
- Fixed `RTVIObserver` not processing upstream-only frames. Previously, all
upstream frames were filtered out to avoid duplicate messages from
broadcasted frames. Now only upstream copies of broadcasted frames are
skipped.
(PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))
- Fixed mutable default arguments in `LLMContextAggregatorPair.__init__()` that
could cause shared state across instances.
(PR [#3782](https://github.com/pipecat-ai/pipecat/pull/3782))
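
As a general illustration of this class of bug (not the actual `LLMContextAggregatorPair` code; both class names below are hypothetical), a mutable default is created once and shared by every instance that relies on it:

```python
from typing import Optional


class SharedDefault:
    def __init__(self, items: list = []):  # one list object reused by every call
        self.items = items


class SafeDefault:
    def __init__(self, items: Optional[list] = None):
        # Build a fresh list per instance when none is supplied
        self.items = items if items is not None else []


a, b = SharedDefault(), SharedDefault()
a.items.append("x")  # also visible through b.items

c, d = SafeDefault(), SafeDefault()
c.items.append("x")  # d.items is unaffected
```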
- Fixed `DeepgramSageMakerSTTService` to properly track finalize lifecycle
using `request_finalize()` / `confirm_finalize()` and use `is_final` (instead
of `is_final and speech_final`) for final transcription detection, matching
`DeepgramSTTService` behavior.
(PR [#3784](https://github.com/pipecat-ai/pipecat/pull/3784))
- Fixed a race condition in `AudioContextTTSService` where the audio context
could time out between consecutive TTS requests within the same turn, causing
audio to be discarded.
(PR [#3787](https://github.com/pipecat-ai/pipecat/pull/3787))
- Fixed `push_interruption_task_frame_and_wait()` hanging indefinitely when the
`InterruptionFrame` does not reach the pipeline sink within the timeout.
Added a `timeout` keyword argument to customize the wait duration.
(PR [#3789](https://github.com/pipecat-ai/pipecat/pull/3789))
## [0.0.102] - 2026-02-10
### Added
- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming
WebSocket API with word-level timestamps and jitter buffering for smooth
audio playback.
(PR [#3134](https://github.com/pipecat-ai/pipecat/pull/3134))
- Added `UserBotLatencyObserver` for tracking user-to-bot response latency.
When tracing is enabled, latency measurements are automatically recorded as
`turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.
(PR [#3355](https://github.com/pipecat-ai/pipecat/pull/3355))
- Added `append_to_context` parameter to `TTSSpeakFrame` for conditional LLM
context addition.
- Allows fine-grained control over whether text should be added to
conversation context
- Defaults to `True` to maintain backward compatibility
(PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))
- Added TTS context tracking system with `context_id` field to trace audio
generation through the pipeline.
- `TTSAudioRawFrame`, `TTSStartedFrame`, `TTSStoppedFrame` now include
`context_id`
- `AggregatedTextFrame` and `TTSTextFrame` now include `context_id`
- Enables tracking which TTS request generated specific audio chunks
(PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))
- Added support for Inworld TTS Websocket Auto Mode for improved latency.
(PR [#3593](https://github.com/pipecat-ai/pipecat/pull/3593))
- Added new frames for context summarization: `LLMContextSummaryRequestFrame`
and `LLMContextSummaryResultFrame`.
(PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))
- Added context summarization feature to automatically compress conversation
history when conversation length limits (by token or message count) are
reached, enabling efficient long-running conversations.
- Configure via `enable_context_summarization=True` in
`LLMAssistantAggregatorParams`
- Customize behavior with `LLMContextSummarizationConfig` (max tokens,
thresholds, etc.)
- Automatically preserves incomplete function call sequences during
summarization
- See new examples:
`examples/foundational/54-context-summarization-openai.py` and
`examples/foundational/54a-context-summarization-google.py`
(PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))
- Added RTVI function call lifecycle events (`llm-function-call-started`,
`llm-function-call-in-progress`, `llm-function-call-stopped`) with
configurable security levels via
`RTVIObserverParams.function_call_report_level`. Supports per-function
control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or
`FULL`).
(PR [#3630](https://github.com/pipecat-ai/pipecat/pull/3630))
- Added `RequestMetadataFrame` and metadata handling for `ServiceSwitcher` to
ensure STT services correctly emit `STTMetadataFrame` when switching between
services. Only the active service's metadata is propagated downstream,
switching services triggers the newly active service to re-emit its metadata,
and proper frame ordering is maintained at startup.
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
- Added `STTMetadataFrame` to broadcast STT service latency information at
pipeline start.
- STT services broadcast P99 time-to-final-segment (`ttfs_p99_latency`) to
downstream processors
- Turn stop strategies automatically configure their STT timeout from this
metadata
- Developers can override `ttfs_p99_latency` via constructor argument for
custom deployments
- Added measured P99 values for STT providers.
- See [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) to
measure latency for your configuration
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
- Added support for `is_sandbox` parameter in `LiveAvatarNewSessionRequest` to
enable sandbox mode for HeyGen LiveAvatar sessions.
(PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))
- Added support for `video_settings` parameter in `LiveAvatarNewSessionRequest`
to configure video encoding (H264/VP8) and quality levels.
(PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))
- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using
OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD
and server-side VAD modes, noise reduction, and automatic reconnection.
(PR [#3656](https://github.com/pipecat-ai/pipecat/pull/3656))
- Added `bulbul:v3-beta` TTS model support for Sarvam AI with temperature
control and 25 new speaker voices.
(PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))
- Added `saaras:v3` STT model support for Sarvam AI with new `mode` parameter
(transcribe, translate, verbatim, translit, codemix) and prompt support.
(PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))
- Added new OpenAI TTS voice options `marin` and `cedar`.
(PR [#3682](https://github.com/pipecat-ai/pipecat/pull/3682))
- Added `UserMuteStartedFrame` and `UserMuteStoppedFrame` system frames, and
corresponding `user-mute-started` / `user-mute-stopped` RTVI messages, so
clients can observe when mute strategies activate or deactivate.
(PR [#3687](https://github.com/pipecat-ai/pipecat/pull/3687))
### Changed
- Updated all 30+ TTS service implementations to support context tracking with
`context_id`.
- Services now generate and propagate context IDs through TTS frames
- Enables end-to-end tracing of TTS requests through the pipeline
(PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))
- ⚠️ `TTSService.run_tts()` now requires a `context_id` parameter for context
tracking.
- Custom TTS service implementations must update their `run_tts()`
signature
- Before: `async def run_tts(self, text: str) -> AsyncGenerator[Frame,
None]:`
- After: `async def run_tts(self, text: str, context_id: str) ->
AsyncGenerator[Frame, None]:`
(PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))
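
A minimal, self-contained sketch of the new signature is shown below. `Frame` here is a stand-in dataclass, and `MyTTSService` and `collect` are hypothetical; a real implementation would subclass `TTSService` and yield pipecat frames.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class Frame:  # stand-in for pipecat's Frame, for illustration only
    text: str
    context_id: str


class MyTTSService:  # hypothetical; a real service would subclass TTSService
    async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:
        # Tag each yielded frame with the context_id it was generated for
        yield Frame(text=text, context_id=context_id)


async def collect(service: MyTTSService, text: str, context_id: str) -> list[Frame]:
    """Drain run_tts() into a list so the frames can be inspected."""
    return [frame async for frame in service.run_tts(text, context_id)]


frames = asyncio.run(collect(MyTTSService(), "hello", "ctx-1"))
```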
- Simplified context aggregators to use `frame.append_to_context` flag instead
of tracking internal state.
- Cleaner logic in `LLMResponseAggregator` and
`LLMResponseUniversalAggregator`
- More consistent behavior across aggregator implementations
(PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))
- Updated timestamps to be cumulative within an agent turn, using the
`flushCompleted` message as an indication of when timestamps from the server
are reset to 0.
(PR [#3593](https://github.com/pipecat-ai/pipecat/pull/3593))
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the
underlying TTS engine.
(PR [#3612](https://github.com/pipecat-ai/pipecat/pull/3612))
- Improved user turn stop timing in `TranscriptionUserTurnStopStrategy` and
`TurnAnalyzerUserTurnStopStrategy`.
- Timeout now starts on `VADUserStoppedSpeakingFrame` for tighter, more
predictable timing
- Added support for finalized transcripts
(`TranscriptionFrame.finalized=True`) to trigger earlier
- Added fallback timeout for edge cases where transcripts arrive without
VAD events
- Removed `InterimTranscriptionFrame` handling (no longer affects timing)
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
- Improved the accuracy of the `UserBotLatencyObserver` and
`UserBotLatencyLogObserver` by measuring from the time when the user actually
starts speaking.
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
- ⚠️ Renamed `timeout` parameter to `user_speech_timeout` in
`TranscriptionUserTurnStopStrategy`.
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
- Updated the `VADUserStartedSpeakingFrame` to include `start_secs` and
`timestamp` and `VADUserStoppedSpeakingFrame` to include `stop_secs` and
`timestamp`, removing the need to separately handle the
`SpeechControlParamsFrame` for VADParams values.
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
- ⚠️ Renamed `TranscriptionUserTurnStopStrategy` to
`SpeechTimeoutUserTurnStopStrategy`. The old name is deprecated and will be
removed in a future release.
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
- `AssemblyAISTTService` now automatically configures optimal settings for
manual turn detection when `vad_force_turn_endpoint=True`. This sets
`end_of_turn_confidence_threshold=1.0` and `max_turn_silence=2000` by
default, which disables model-based turn detection and reduces latency by
relying on external VAD for turn endpoints. Warnings are logged if
conflicting settings are detected.
(PR [#3644](https://github.com/pipecat-ai/pipecat/pull/3644))
- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.
(PR [#3652](https://github.com/pipecat-ai/pipecat/pull/3652))
- Changed default session mode from "CUSTOM" to "LITE" in HeyGen LiveAvatar
integration, with VP8 as the default video encoding.
(PR [#3653](https://github.com/pipecat-ai/pipecat/pull/3653))
- ⚠️ The `VADParams` `stop_secs` default is changing from `0.8` seconds
to `0.2` seconds. This change both simplifies the developer experience and
improves the performance of STT services. With a shorter `stop_secs` value,
STT services using a local VAD can finalize sooner, resulting in faster
transcription.
- `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for
additional user speech using `user_speech_timeout` (default: 0.6 sec).
- `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically
adjusts the user wait time based on the audio input.
(PR [#3659](https://github.com/pipecat-ai/pipecat/pull/3659))
- Moved interruption wait event from per-processor instance state to
`InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal
when the interruption has fully traversed the pipeline. Custom processors
that block or consume an `InterruptionFrame` before it reaches the pipeline
sink must call `frame.complete()` to avoid stalling
`push_interruption_task_frame_and_wait()`. A warning is logged if completion
does not happen within 2 seconds.
(PR [#3660](https://github.com/pipecat-ai/pipecat/pull/3660))
- Updated the default model to `scribe_v2` for `ElevenLabsSTTService`.
(PR [#3664](https://github.com/pipecat-ai/pipecat/pull/3664))
- Changed the `DeepgramSTTService` default setting for `smart_format` to
`False`, as agents don't need smart formatting. Disabling this setting
provides a small performance improvement, as well.
(PR [#3666](https://github.com/pipecat-ai/pipecat/pull/3666))
- Changed `FunctionCallCancelFrame` to broadcast in both directions for
consistency with other function call frames.
(PR [#3672](https://github.com/pipecat-ai/pipecat/pull/3672))
- Changed default user turn stop strategy from
`TranscriptionUserTurnStopStrategy` to `TurnAnalyzerUserTurnStopStrategy`
with `LocalSmartTurnAnalyzerV3`.
(PR [#3689](https://github.com/pipecat-ai/pipecat/pull/3689))
- Renamed `RequestMetadataFrame` to `ServiceSwitcherRequestMetadataFrame` and
added a `service` field to target a specific service. The frame is now pushed
downstream by services after handling instead of being silently consumed.
(PR [#3692](https://github.com/pipecat-ai/pipecat/pull/3692))
- Updated `SonioxSTTService` to set `vad_force_turn_endpoint` to `True`. This
setting disables the turn detection logic available natively in Soniox.
Instead, Soniox relies on a local VAD to finalize the transcript. This
configuration meaningfully reduces the time to final segment for Soniox. With
this setting enabled, Soniox outputs a transcript in ~250ms (median). Pipecat
enables smart-turn detection by default using the `LocalSmartTurnAnalyzerV3`.
To use the native turn detection logic in Soniox, set
`vad_force_turn_endpoint` to `False`.
(PR [#3697](https://github.com/pipecat-ai/pipecat/pull/3697))
- Updated the `SonioxSTTService` default model to `stt-rt-v4`.
(PR [#3697](https://github.com/pipecat-ai/pipecat/pull/3697))
- Updated the default model to `async_flash_v1.0` and base URL to
`https://api.async.com` for `AsyncAITTSService`.
(PR [#3701](https://github.com/pipecat-ai/pipecat/pull/3701))
### Deprecated
- Deprecated `UserBotLatencyLogObserver`. Use `UserBotLatencyObserver` directly
with its `on_latency_measured` event handler instead.
(PR [#3355](https://github.com/pipecat-ai/pipecat/pull/3355))
- Deprecated `RTVILLMFunctionCallMessage`, `RTVILLMFunctionCallMessageData`,
and `RTVIProcessor.handle_function_call()`. Use the new
`llm-function-call-in-progress` event sent automatically by `RTVIObserver`
instead.
(PR [#3630](https://github.com/pipecat-ai/pipecat/pull/3630))
### Removed
- ⚠️ Removed `timeout` parameter from `TurnAnalyzerUserTurnStopStrategy`. The
timeout is now managed internally based on STT latency.
(PR [#3637](https://github.com/pipecat-ai/pipecat/pull/3637))
### Fixed
- Fixed pipeline freeze when `InterruptionFrame` discards `EndFrame` or
`StopFrame` by making terminal frames uninterruptible.
(PR [#3542](https://github.com/pipecat-ai/pipecat/pull/3542))
- Fixed OpenAI LLM stream not being closed on cancellation/exception, which
could leak sockets.
(PR [#3589](https://github.com/pipecat-ai/pipecat/pull/3589))
- Fixed `PipelineTask` adding duplicate `RTVIProcessor` and `RTVIObserver` when
they were already provided in the pipeline or observers list. They are now
detected and skipped, with appropriate warnings and errors logged for
mismatched configurations.
(PR [#3610](https://github.com/pipecat-ai/pipecat/pull/3610))
- Fixed function call timeout task not being cancelled when the handler
completes without calling `result_callback` or is cancelled externally, which
caused `RuntimeWarning: coroutine was never awaited`.
(PR [#3616](https://github.com/pipecat-ai/pipecat/pull/3616))
- Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin
languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK
languages, causing text to accumulate until flush instead of being split at
sentence boundaries. Added fallback detection for unambiguous non-Latin
sentence-ending punctuation (e.g., `。`, `？`, `！`).
(PR [#3617](https://github.com/pipecat-ai/pipecat/pull/3617))
- Fixed `PipelineTask` to also call `set_bot_ready()` when an external
`RTVIProcessor` is provided.
(PR [#3623](https://github.com/pipecat-ai/pipecat/pull/3623))
- Fixed `VADController` not broadcasting `SpeechControlParamsFrame` on startup,
which prevented STT services from receiving VAD params needed for TTFB
measurement.
(PR [#3628](https://github.com/pipecat-ai/pipecat/pull/3628))
- Fixed `StopAsyncIteration` exceptions in `parse_telephony_websocket()` when
WebSocket connections close before sending expected messages.
(PR [#3629](https://github.com/pipecat-ai/pipecat/pull/3629))
- Fixed WebSocket transport error when broadcasting
`InputTransportMessageFrame` by correctly instantiating the frame with its
message parameter.
(PR [#3635](https://github.com/pipecat-ai/pipecat/pull/3635))
- Fixed orphan OpenTelemetry spans during flow initialization and transitions
in tracing.
(PR [#3649](https://github.com/pipecat-ai/pipecat/pull/3649))
- Fixed `SambaNovaLLMService` and `GoogleLLMOpenAIBetaService` streams not
being closed on cancellation/exception, which could leak sockets.
(PR [#3663](https://github.com/pipecat-ai/pipecat/pull/3663))
- Fixed an issue in `InworldTTSService` where punctuation was pronounced. Now,
the `InworldTTSService` ensures proper spacing between sentences, resolving
pronunciation issues.
(PR [#3667](https://github.com/pipecat-ai/pipecat/pull/3667))
- Fixed `ParallelPipeline` allowing frames pushed by internal processors to
escape during lifecycle frame (`StartFrame`/`EndFrame`/`CancelFrame`)
synchronization. These frames are now buffered and flushed after all branches
complete.
(PR [#3668](https://github.com/pipecat-ai/pipecat/pull/3668))
- Fixed issues in Sarvam STT and TTS services: missing event handler
registration for VAD signals, `Optional[bool]` type annotations, WebSocket
state cleanup on API errors, and TTS disconnect/reconnection state
management.
(PR [#3671](https://github.com/pipecat-ai/pipecat/pull/3671))
- Fixed `RTVIObserver` sending duplicate client messages for frames that are
broadcast in both directions (e.g. `UserStartedSpeakingFrame`,
`FunctionCallResultFrame`).
(PR [#3672](https://github.com/pipecat-ai/pipecat/pull/3672))
- Fixed WebSocket STT services (ElevenLabs, Cartesia, Gladia, Soniox)
disconnecting due to idle timeout when no audio is being sent (e.g. when
inactive behind a `ServiceSwitcher`). `WebsocketSTTService` now provides
opt-in silence-based keepalive via `keepalive_timeout` and
`keepalive_interval` parameters.
(PR [#3675](https://github.com/pipecat-ai/pipecat/pull/3675))
## [0.0.101] - 2026-01-30
### Added


@@ -25,7 +25,7 @@ uv run pytest tests/test_name.py
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
towncrier build --draft --version Unreleased
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
@@ -42,7 +42,7 @@ uv lock && uv sync
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
Transport Input → Pipeline Source → [Processor1] → [Processor2] → ... → Pipeline Sink → Transport Output
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
@@ -55,7 +55,11 @@ Transport Input → Pipeline Source → [Processor1] → [Processor2] → ...
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): External I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors that form the external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
@@ -63,12 +67,14 @@ Transport Input → Pipeline Source → [Processor1] → [Processor2] → ...
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
@@ -76,26 +82,34 @@ Transport Input → Pipeline Source → [Processor1] → [Processor2] → ...
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to exectue fast.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
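The reason for routing task creation through one tracked entry point, as the Async Task Management item recommends, can be illustrated with a small stand-in. This is a minimal sketch, not Pipecat's `TaskManager`: the `TinyTaskTracker` class, its `cleanup()` method, and the `demo()` coroutine are hypothetical names used only for illustration, and only standard `asyncio` calls are used.

```python
import asyncio


class TinyTaskTracker:
    """Hypothetical stand-in for a task manager; not Pipecat's implementation."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def create_task(self, coro, name: str) -> asyncio.Task:
        """Create a named task and remember it until it finishes."""
        task = asyncio.create_task(coro, name=name)
        self._tasks.add(task)
        # Forget the task automatically once it is done or cancelled.
        task.add_done_callback(self._tasks.discard)
        return task

    async def cancel_task(self, task: asyncio.Task, timeout: float = 1.0) -> None:
        """Request cancellation and wait up to `timeout` seconds for it to finish."""
        task.cancel()
        # asyncio.wait() does not raise on cancellation or on timeout.
        await asyncio.wait({task}, timeout=timeout)

    async def cleanup(self) -> None:
        """Cancel every task that is still tracked (e.g. on shutdown)."""
        for task in list(self._tasks):
            await self.cancel_task(task)


async def demo() -> tuple[bool, int]:
    tracker = TinyTaskTracker()
    task = tracker.create_task(asyncio.sleep(60), name="demo-sleeper")
    await asyncio.sleep(0)  # give the task a chance to start
    await tracker.cleanup()
    return task.cancelled(), len(tracker._tasks)


result = asyncio.run(demo())
```

Because every task passes through `create_task()`, shutdown code does not need to know which tasks exist. A task started with a raw `asyncio.create_task()` would be invisible to `cleanup()` and could outlive its owner.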
### Key Directories
| Directory | Purpose |
|---------------------------|----------------------------------------------------|
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
@@ -138,6 +152,6 @@ When adding a new service:
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Pull Requests
## Testing
After creating a PR, use `/changelog <pr_number>` to generate the changelog file and `/pr-description <pr_number>` to update the PR description.
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.


@@ -25,7 +25,6 @@ Your repository must contain these components:
- **Source code** - Complete implementation following Pipecat patterns
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
- **README.md** - Must include:
- Introduction and explanation of your integration
- Installation instructions
- Usage instructions with Pipecat Pipeline
@@ -110,7 +109,6 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Key requirements:
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
@@ -235,22 +233,79 @@ def can_generate_metrics(self) -> bool:
### Dynamic Settings Updates
STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
STT, LLM, and TTS services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
Each service declares a settings dataclass that extends the appropriate base (`STTSettings`, `TTSSettings`, `LLMSettings`). Fields default to `NOT_GIVEN` so that update objects can represent sparse deltas:
```python
async def set_language(self, language: Language):
"""Set the recognition language and reconnect.
from dataclasses import dataclass, field
Args:
language: The language to use for speech recognition.
from pipecat.services.settings import STTSettings, NOT_GIVEN
@dataclass
class MySTTSettings(STTSettings):
"""Settings for my STT service.
Parameters:
region: Cloud region for the service.
"""
logger.info(f"Switching STT language to: [{language}]")
self._settings["language"] = language
await self._disconnect()
await self._connect()
region: str = field(default_factory=lambda: NOT_GIVEN)
```
Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
The service stores its current settings in `self._settings` and declares the type with a class-level annotation for editor support:
```python
class MySTTService(STTService):
_settings: MySTTSettings
def __init__(self, *, model: str, language: str, region: str, **kwargs):
# An initial value should be provided for every settings field.
# This will be validated at service start.
# (If you track sample_rate, it can be a placeholder value like 0; see
# "Sample Rate Handling").
super().__init__(
settings=MySTTSettings(model=model, language=language, region=region), **kwargs
)
```
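The "sparse delta" idea behind `NOT_GIVEN` defaults can be sketched without Pipecat. The example below is an illustration under stated assumptions: `_NotGiven`, the local `NOT_GIVEN` object, `DemoSTTSettings`, and `given_fields()` are hypothetical stand-ins defined here, not the library's own classes or helpers.

```python
from dataclasses import dataclass, fields
from typing import Any


class _NotGiven:
    """Stand-in sentinel meaning "this field was not set in the update"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()  # hypothetical stand-in, not pipecat's NOT_GIVEN


@dataclass
class DemoSTTSettings:
    """Hypothetical settings object; every field defaults to NOT_GIVEN."""

    model: Any = NOT_GIVEN
    language: Any = NOT_GIVEN
    region: Any = NOT_GIVEN


def given_fields(update: DemoSTTSettings) -> dict[str, Any]:
    """Return only the fields the caller actually set (the sparse delta)."""
    return {
        f.name: getattr(update, f.name)
        for f in fields(update)
        if getattr(update, f.name) is not NOT_GIVEN
    }


# An update that only touches the language leaves model and region alone.
delta = given_fields(DemoSTTSettings(language="es"))
```

A sentinel is used instead of `None` so that `None` remains available as a legitimate setting value; the identity check (`is not NOT_GIVEN`) distinguishes "not provided" from any real value.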
To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
```python
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
"""Apply a settings update, reconfiguring the recognizer if needed."""
changed = await super()._update_settings(update)
if not changed:
return changed
await self._disconnect()
await self._connect()
return changed
```
The dict keys work like a set for membership tests (`"language" in changed`) and truthiness (`if changed`). Use `changed.keys() - {"language"}` for set difference, or `changed["language"]` to inspect the previous value of a field.
Note that, in this example, the service requires a reconnect to apply the new language. Consider, for each setting, whether your service requires reconnection or can apply changes in-place.
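The membership, truthiness, and set-difference behavior of the returned dict can be checked in plain Python. This is a sketch under assumptions: `apply_delta()` is a hypothetical helper that mimics the described contract (apply the delta, return each changed field mapped to its pre-update value) using ordinary dicts rather than Pipecat's settings classes.

```python
from typing import Any


def apply_delta(current: dict[str, Any], delta: dict[str, Any]) -> dict[str, Any]:
    """Apply `delta` to `current` in place.

    Returns a dict mapping each field that actually changed to the value it
    had before the update.
    """
    changed: dict[str, Any] = {}
    for name, new_value in delta.items():
        if current.get(name) != new_value:
            changed[name] = current.get(name)
            current[name] = new_value
    return changed


settings = {"model": "base", "language": "en", "region": "us"}
# "region" is included in the update but keeps its value, so it is not reported.
changed = apply_delta(settings, {"language": "es", "region": "us"})

language_changed = "language" in changed   # membership test on the keys
previous_language = changed["language"]    # value before the update
unhandled = changed.keys() - {"language"}  # set difference on the keys view
```

Here `unhandled` is an ordinary `set`, so it can be passed on or tested for emptiness like any other set.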
If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
```python
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
changed = await super()._update_settings(update)
if not changed:
return changed
if "language" in changed:
await self._update_language()
else:
# TODO: this should be temporary - handle changes to other settings soon!
self._warn_unhandled_updated_settings(changed.keys() - {"language"})
return changed
```
### Sample Rate Handling
@@ -260,7 +315,7 @@ Sample rates are set via PipelineParams and passed to each frame processor at in
async def start(self, frame: StartFrame):
"""Start the service."""
await super().start(frame)
self._settings["output_format"]["sample_rate"] = self.sample_rate
self._settings.output_sample_rate = self.sample_rate
await self._connect()
```


@@ -49,12 +49,12 @@ Every pull request that makes a user-facing change should include a changelog en
```
2. Choose the appropriate type:
- `added.md` - New features
- `changed.md` - Changes in existing functionality
- `deprecated.md` - Soon-to-be removed features
- `removed.md` - Removed features
- `fixed.md` - Bug fixes
- `performance.md` - Performance improvements
- `security.md` - Security fixes
- `other.md` - Other changes (documentation, dependencies, etc.)
@@ -80,7 +80,6 @@ Every pull request that makes a user-facing change should include a changelog en
```markdown
- Updated service configuration:
- Changed default timeout to 30 seconds
- Added retry logic for failed connections
```
@@ -105,7 +104,6 @@ changelog/1234.changed.2.md
```markdown
- Updated service configuration:
- Changed default timeout to 30 seconds
- Added retry logic for failed connections
```


@@ -55,6 +55,16 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt
Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.
### 🤖 Claude Code Skills
Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:
```
claude plugin marketplace add pipecat-ai/skills
```
and install any of the available plugins.
### 📺️ Pipecat TV Channel
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -71,19 +81,19 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -163,6 +173,15 @@ You can get started with Pipecat running on your local machine, then move your a
> **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.
### Claude Code Skills
Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):
```
claude plugin marketplace add pipecat-ai/pipecat
claude plugin install pipecat-dev@pipecat-dev-skills
```
### Running tests
To run all tests, from the root directory:

View File

@@ -1 +0,0 @@
- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback.

View File

@@ -1 +0,0 @@
- Added `UserBotLatencyObserver` for tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.

View File

@@ -1 +0,0 @@
- Deprecated `UserBotLatencyLogObserver`. Use `UserBotLatencyObserver` directly with its `on_latency_measured` event handler instead.

View File

@@ -1 +0,0 @@
- Fixed pipeline freeze when `InterruptionFrame` discards `EndFrame` or `StopFrame` by making terminal frames uninterruptible.

View File

@@ -1 +0,0 @@
- Fixed OpenAI LLM stream not being closed on cancellation/exception, which could leak sockets.
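The general shape of this kind of fix can be sketched as below. This is a toy model, not the actual Pipecat or OpenAI client code: `FakeStream` and `consume` are hypothetical names, and the only point illustrated is that a `finally` block closes the stream on normal completion, on an exception, and on task cancellation alike.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for a streaming HTTP response."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.closed = False

    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self._chunks:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # yield control, as a network read would
        return self._chunks.pop(0)

    async def close(self):
        self.closed = True


async def consume(stream, on_chunk):
    """Iterate a stream and close it on success, error, or cancellation."""
    try:
        async for chunk in stream:
            await on_chunk(chunk)
    finally:
        # Runs even when the surrounding task is cancelled mid-iteration.
        await stream.close()
```

Without the `finally`, a cancelled task would leave the iteration abruptly and the underlying connection would only be released whenever the object is garbage collected, if at all.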

View File

@@ -1 +0,0 @@
- Added support for Inworld TTS WebSocket Auto Mode for improved latency.

View File

@@ -1 +0,0 @@
- Updated timestamps to be cumulative within an agent turn, using the `flushCompleted` message as the signal that timestamps from the server have been reset to 0.

View File

@@ -1 +0,0 @@
- Fixed `PipelineTask` adding duplicate `RTVIProcessor` and `RTVIObserver` when they were already provided in the pipeline or observers list. They are now detected and skipped, with appropriate warnings and errors logged for mismatched configurations.

View File

@@ -1 +0,0 @@
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the underlying TTS engine.

View File

@@ -1 +0,0 @@
- Fixed function call timeout task not being cancelled when the handler completes without calling `result_callback` or is cancelled externally, which caused `RuntimeWarning: coroutine was never awaited`.
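A rough sketch of the pattern behind this fix, using plain `asyncio`. The helper name `run_with_timeout_watchdog` and its parameters are assumptions made for illustration; Pipecat's real function call machinery is not shown in this diff. In this sketch the watchdog only notifies on timeout and does not cancel the handler.

```python
import asyncio


async def run_with_timeout_watchdog(handler, timeout_secs, on_timeout):
    """Run `handler` alongside a watchdog that calls `on_timeout` if it is slow."""

    async def watchdog():
        await asyncio.sleep(timeout_secs)
        await on_timeout()

    timeout_task = asyncio.create_task(watchdog())
    try:
        return await handler()
    finally:
        # Always cancel the watchdog, whether the handler returned,
        # raised, or was itself cancelled from outside.
        timeout_task.cancel()
        try:
            await timeout_task
        except asyncio.CancelledError:
            pass
```

Cancelling and then awaiting the watchdog in `finally` ensures it never outlives the handler, regardless of how the handler exits.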

View File

@@ -1,5 +0,0 @@
- Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin
languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK
languages, causing text to accumulate until flush instead of being split at
sentence boundaries. Added fallback detection for unambiguous non-Latin
sentence-ending punctuation (e.g., `。`, `？`, `！`).
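A minimal sketch of what such a fallback might look like. The function names and the exact punctuation set are assumptions, not Pipecat's implementation; the idea is simply to scan for an unambiguous sentence-ending mark when the main tokenizer finds no boundary.

```python
from typing import Optional, Tuple

# Assumed set of unambiguous non-Latin sentence-ending marks (illustrative only).
CJK_SENTENCE_ENDINGS = frozenset("。？！")


def find_cjk_sentence_end(text: str) -> Optional[int]:
    """Return the index just past the first sentence-ending mark, or None."""
    for i, ch in enumerate(text):
        if ch in CJK_SENTENCE_ENDINGS:
            return i + 1
    return None


def split_first_sentence(buffer: str) -> Tuple[Optional[str], str]:
    """Split off the first complete sentence and keep the rest buffered."""
    end = find_cjk_sentence_end(buffer)
    if end is None:
        return None, buffer
    return buffer[:end], buffer[end:]
```

With a check like this, Japanese text such as `こんにちは。元気ですか` yields its first sentence as soon as `。` arrives instead of waiting for a flush.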

View File

@@ -1 +0,0 @@
- Fixed `PipelineTask` to also call `set_bot_ready()` when an external `RTVIProcessor` is provided.

View File

@@ -1 +0,0 @@
- Fixed `VADController` not broadcasting `SpeechControlParamsFrame` on startup, which prevented STT services from receiving VAD params needed for TTFB measurement.

View File

@@ -1 +0,0 @@
- Fixed `StopAsyncIteration` exceptions in `parse_telephony_websocket()` when WebSocket connections close before sending expected messages.
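The failure mode can be illustrated with a small stand-alone sketch. `read_initial_messages` is a hypothetical helper, not the signature of `parse_telephony_websocket()`; it shows how reading a fixed number of messages from an async iterator can tolerate the peer closing early.

```python
import asyncio
from typing import AsyncIterator, List


async def read_initial_messages(
    messages: AsyncIterator[str], expected: int = 2
) -> List[str]:
    """Read up to `expected` messages, tolerating an early close."""
    received: List[str] = []
    iterator = messages.__aiter__()
    for _ in range(expected):
        try:
            received.append(await iterator.__anext__())
        except StopAsyncIteration:
            # The connection closed before sending everything we expected.
            break
    return received
```

The caller can then inspect how many messages actually arrived and decide how to proceed, rather than crashing on an unhandled `StopAsyncIteration`.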

View File

@@ -1 +0,0 @@
- Added RTVI function call lifecycle events (`llm-function-call-started`, `llm-function-call-in-progress`, `llm-function-call-stopped`) with configurable security levels via `RTVIObserverParams.function_call_report_level`. Supports per-function control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or `FULL`).

View File

@@ -1 +0,0 @@
- Deprecated `RTVILLMFunctionCallMessage`, `RTVILLMFunctionCallMessageData`, and `RTVIProcessor.handle_function_call()`. Use the new `llm-function-call-in-progress` event sent automatically by `RTVIObserver` instead.

View File

@@ -1 +0,0 @@
- Fixed WebSocket transport error when broadcasting `InputTransportMessageFrame` by correctly instantiating the frame with its message parameter.

View File

@@ -1 +0,0 @@
- Fixed orphan OpenTelemetry spans during flow initialization and transitions in tracing.

View File

@@ -1 +0,0 @@
- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.

View File

@@ -1 +0,0 @@
- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection.

View File

@@ -1,10 +0,0 @@
- ⚠️ The `VADParams` `stop_secs` default is changing from `0.8` seconds
to `0.2` seconds. This change both simplifies the developer experience and
improves the performance of STT services. With a shorter `stop_secs` value,
STT services using a local VAD can finalize sooner, resulting in faster
transcription.
- `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for
additional user speech using `user_speech_timeout` (default: 0.6 sec).
- `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically adjusts
the user wait time based on the audio input.

View File

@@ -1 +0,0 @@
- Moved interruption wait event from per-processor instance state to `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume an `InterruptionFrame` before it reaches the pipeline sink must call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()`. A warning is logged if completion does not happen within 2 seconds.
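The contract described above can be modeled with a toy frame that carries its own completion event. Everything here (`ToyInterruptionFrame`, `consuming_processor`, the `wait()` helper and its return value) is a hypothetical stand-in rather than Pipecat's API; in particular, this sketch returns `False` on timeout where the real code is described as logging a warning.

```python
import asyncio


class ToyInterruptionFrame:
    """Hypothetical frame that carries its own completion event."""

    def __init__(self) -> None:
        self._done = asyncio.Event()

    def complete(self) -> None:
        """Signal that the interruption has been fully handled."""
        self._done.set()

    async def wait(self, warn_after: float = 2.0) -> bool:
        """Wait for completion; return False if it did not happen in time."""
        try:
            await asyncio.wait_for(self._done.wait(), timeout=warn_after)
            return True
        except asyncio.TimeoutError:
            return False


async def consuming_processor(frame: ToyInterruptionFrame) -> None:
    """A processor that swallows the frame must signal completion itself."""
    # The frame is not forwarded downstream, so nothing else will complete it.
    frame.complete()
```

If `consuming_processor` dropped the frame without calling `complete()`, anything awaiting `wait()` would stall until the timeout.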

View File

@@ -1 +0,0 @@
- Fixed `SambaNovaLLMService` and `GoogleLLMOpenAIBetaService` streams not being closed on cancellation/exception, which could leak sockets.

View File

@@ -1 +0,0 @@
- Updated the default model to `scribe_v2` for `ElevenLabsSTTService`.

View File

@@ -1 +0,0 @@
- Changed the `DeepgramSTTService` default setting for `smart_format` to `False`, as agents don't need smart formatting. Disabling this setting provides a small performance improvement, as well.

View File

@@ -1 +0,0 @@
- Fixed an issue in `InworldTTSService` where punctuation was pronounced. Now, the `InworldTTSService` ensures proper spacing between sentences, resolving pronunciation issues.

View File

@@ -1 +0,0 @@
- Fixed `ParallelPipeline` allowing frames pushed by internal processors to escape during lifecycle frame (`StartFrame`/`EndFrame`/`CancelFrame`) synchronization. These frames are now buffered and flushed after all branches complete.

1
changelog/3696.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `TextAggregationMetricsData` metric measuring the time from the first LLM token to the first complete sentence, representing the latency cost of sentence aggregation in the TTS pipeline.

View File

@@ -0,0 +1 @@
- Added `text_aggregation_mode` parameter to `TTSService` and all TTS subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All text now flows through text aggregators regardless of mode, enabling pattern detection and tag handling in TOKEN mode.
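To make the difference between the two modes concrete, here is a toy aggregator. It is not Pipecat's text aggregator: `ToyAggregationMode` and `aggregate` are illustrative names, and the sentence boundary check is deliberately naive.

```python
from enum import Enum
from typing import Iterable, Iterator


class ToyAggregationMode(Enum):
    SENTENCE = "sentence"
    TOKEN = "token"


def aggregate(tokens: Iterable[str], mode: ToyAggregationMode) -> Iterator[str]:
    """Yield text chunks to hand to TTS, according to the chosen mode."""
    if mode is ToyAggregationMode.TOKEN:
        # Pass each token through as soon as it arrives.
        yield from tokens
        return
    buffer = ""
    for token in tokens:
        buffer += token
        # Naive boundary check; a real aggregator is more careful.
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer
            buffer = ""
    if buffer:
        yield buffer  # flush whatever remains at end of stream
```

In sentence mode the first chunk is only emitted once a full sentence has accumulated, which is the latency that token mode avoids.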

View File

@@ -0,0 +1 @@
- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or `text_aggregation_mode=TextAggregationMode.TOKEN` instead.

19
changelog/3714.added.md Normal file
View File

@@ -0,0 +1,19 @@
- Added support for using strongly-typed objects instead of dicts for updating service settings at runtime.
Instead of, say:
```python
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": Language.ES})
)
```
you'd do:
```python
await task.queue_frame(
    STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
)
```
Each service now vends strongly-typed classes like `DeepgramSTTSettings` representing the service's runtime-updatable settings.
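One way a typed "delta" can be applied is sketched below, assuming unset fields are marked with a sentinel. The `NOT_GIVEN` sentinel, `ToySTTSettings`, and `apply_delta` are assumptions made for illustration; how Pipecat's settings classes actually represent unset fields is not shown in this diff.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGiven:
    """Sentinel marking a field that the delta leaves untouched."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGiven()


@dataclass(frozen=True)
class ToySTTSettings:
    """Hypothetical stand-in for a service settings class."""

    model: Any = NOT_GIVEN
    language: Any = NOT_GIVEN


def apply_delta(current: ToySTTSettings, delta: ToySTTSettings) -> ToySTTSettings:
    """Return a copy of `current` with only the fields set in `delta` changed."""
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not NOT_GIVEN
    }
    return replace(current, **changes)
```

Compared with a plain dict, a typed delta lets a type checker or IDE catch a misspelled field name before the frame is ever queued.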

View File

@@ -0,0 +1 @@
- ⚠️ Refactored runtime-updatable service settings to use strongly-typed classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific subclasses) instead of plain dicts. Each service's `_settings` now holds these strongly-typed objects. For service maintainers, see changes in COMMUNITY_INTEGRATIONS.md.

View File

@@ -0,0 +1 @@
- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of passing typed settings delta objects with `*UpdateSettingsFrame(delta=...)`, e.g. `STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))`.

View File

@@ -0,0 +1,3 @@
- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services in favor of runtime updates via `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.
⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas previously only `set_language()` caused the service to actually react to the update (e.g. by reconnecting to a remote service so it can pick up the change), now all these methods do. This change was made as part of a refactor making them all work the same way under the hood.

View File

@@ -0,0 +1 @@
- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on every interruption by using `client_req_id`-based multiplexing.

View File

@@ -0,0 +1 @@
- Word timestamp support has been moved from `WordTTSService` into `TTSService` via a new `supports_word_timestamps` parameter. Services that previously extended `WordTTSService`, `AudioContextWordTTSService`, or `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their parent `__init__` instead.

View File

@@ -0,0 +1,5 @@
- Deprecated `WordTTSService`, `WebsocketWordTTSService`, `AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their non-word counterparts with `supports_word_timestamps=True` instead:
- `WordTTSService` → `TTSService(supports_word_timestamps=True)`
- `WebsocketWordTTSService` → `WebsocketTTSService(supports_word_timestamps=True)`
- `AudioContextWordTTSService` → `AudioContextTTSService(supports_word_timestamps=True)`
- `InterruptibleWordTTSService` → `InterruptibleTTSService(supports_word_timestamps=True)`

1
changelog/3803.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies (`transformers`, `onnxruntime`) into core dependencies instead of using a self-referential extra.

View File

@@ -0,0 +1 @@
- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The `transformers` and `onnxruntime` packages are now always installed as core dependencies since they are required by the default turn stop strategy, `TurnAnalyzerUserTurnStopStrategy`, which uses `LocalSmartTurnAnalyzerV3`.

1
changelog/3806.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `output_medium` parameter to `AgentInputParams` and `OneShotInputParams` in Ultravox service to control initial output medium (text or voice) at call creation time.

View File

@@ -0,0 +1 @@
- Improved Ultravox TTFB measurement accuracy by using VAD speech end time instead of `UserStoppedSpeakingFrame` timing.

View File

@@ -0,0 +1 @@
- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini realtime services: added `InterruptionFrame` handling with metrics cleanup, processing metrics at response boundaries, and improved agent transcript handling for both voice and text output modalities.

View File

@@ -0,0 +1 @@
- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.

1
changelog/3808.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `SentryMetrics` method signatures to match updated `FrameProcessorMetrics` base class, resolving `TypeError` when using `start_time`/`end_time` keyword arguments.

1
changelog/3809.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `TurnMetricsData` as a generic metrics class for turn detection, with e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData` with `e2e_processing_time_ms` tracking the interval from VAD speech-to-silence transition to turn completion.

View File

@@ -0,0 +1 @@
- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to `KRISP_VIVA_API_KEY` environment variable.

View File

@@ -0,0 +1 @@
- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`. `BaseSmartTurn` now emits `TurnMetricsData` directly.

View File

@@ -0,0 +1 @@
- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security vulnerability.

1
changelog/3813.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and `AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.

1
changelog/3814.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `on_audio_context_interrupted()` and `on_audio_context_completed()` callbacks to `AudioContextTTSService`. Subclasses can override these to perform provider-specific cleanup instead of overriding `_handle_interruption()`.

1
changelog/3814.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI, ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio contexts after normal speech completion, only on interruption.

View File

@@ -0,0 +1,4 @@
- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally speaking, you don't want a user interruption to prevent a service setting change from going into effect. Note that you usually don't use `ServiceSettingsUpdateFrame` directly; instead, you use one of its subclasses:
- `LLMUpdateSettingsFrame`
- `TTSUpdateSettingsFrame`
- `STTUpdateSettingsFrame`
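The effect of marking a frame uninterruptible can be pictured with a toy queue filter. The class and function names below are hypothetical, and the real pipeline logic is certainly more involved; the sketch only shows that an interruption discards ordinary queued frames while frames of the marker type survive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyFrame:
    name: str


@dataclass
class ToyUninterruptibleFrame(ToyFrame):
    """Marker subclass: frames of this type survive an interruption."""


def drop_interruptible(queue: List[ToyFrame]) -> List[ToyFrame]:
    """On interruption, discard queued frames except uninterruptible ones."""
    return [f for f in queue if isinstance(f, ToyUninterruptibleFrame)]
```

Under this model, a settings update queued just before the user interrupts is still delivered, while pending audio and text are dropped.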

1
changelog/3822.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed STT TTFB metrics measuring timeout expiry time instead of actual transcript arrival time.

1
changelog/3825.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being unintentionally pushed downstream in `LLMUserAggregator`. They are now consumed like `TranscriptionFrame`.

1
changelog/3828.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed misleading "Empty audio frame received for STT service" warnings when using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`) that buffer audio internally.

1
changelog/3837.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed an issue with `RimeNonJsonTTSService` where trailing punctuation was sometimes vocalized.

View File

@@ -0,0 +1 @@
- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been shut down and is no longer available.

View File

@@ -0,0 +1 @@
- ⚠️ Removed `ProcessingMetricsData` and all `start_processing_metrics()`/`stop_processing_metrics()` methods from `FrameProcessor` and `FrameProcessorMetrics`. These metrics were inconsistently implemented across services and overlapped with the better-defined TTFB metric. TTFB, LLM token usage, TTS character usage, and text aggregation metrics are unaffected.

View File

@@ -42,7 +42,7 @@ This script:
- Creates a fresh virtual environment
- Installs all dependencies as specified in requirements files
- Handles conflicting dependencies (like grpcio versions for Riva and PlayHT)
- Handles conflicting dependencies (like grpcio versions for Riva)
- Builds the documentation in an isolated environment
- Provides detailed logging of the build process
@@ -74,7 +74,6 @@ start _build/html/index.html
├── index.rst # Main documentation entry point
├── requirements-base.txt # Base documentation dependencies
├── requirements-riva.txt # Riva-specific dependencies
├── requirements-playht.txt # PlayHT-specific dependencies
├── build-docs.sh # Local build script
└── rtd-test.py # ReadTheDocs test build script
```

View File

@@ -47,7 +47,8 @@ DAILY_ROOM_URL=https://...
# Deepgram
DEEPGRAM_API_KEY=...
SAGEMAKER_ENDPOINT_NAME=...
SAGEMAKER_STT_ENDPOINT_NAME=...
SAGEMAKER_TTS_ENDPOINT_NAME=...
# DeepSeek
DEEPSEEK_API_KEY=...
@@ -103,6 +104,7 @@ INWORLD_API_KEY=...
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_API_KEY=...
KRISP_VIVA_FILTER_MODEL_PATH=...
KRISP_VIVA_TURN_MODEL_PATH=...
@@ -145,10 +147,6 @@ KOALA_ACCESS_KEY=...
# Piper
PIPER_BASE_URL=...
# PlayHT
PLAYHT_USER_ID=...
PLAYHT_API_KEY=...
# Plivo
PLIVO_AUTH_ID=...
PLIVO_AUTH_TOKEN=...

View File

@@ -17,7 +17,6 @@ from fastapi.responses import RedirectResponse
from loguru import logger
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -34,8 +33,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.smallwebrtc.connection import IceServer, SmallWebRTCConnection
from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -85,12 +82,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

View File

@@ -12,7 +12,6 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -27,8 +26,6 @@ from pipecat.runner.daily import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.daily.transport import DailyParams, DailyTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -68,14 +65,7 @@ async def main():
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

View File

@@ -12,7 +12,6 @@ import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
InterruptionFrame,
@@ -34,8 +33,6 @@ from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.livekit.transport import LiveKitParams, LiveKitTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -78,12 +75,7 @@ async def main():
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

View File

@@ -9,12 +9,10 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, LLMRunFrame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
ProcessingMetricsData,
TTFBMetricsData,
TTSUsageMetricsData,
)
@@ -35,8 +33,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -49,8 +45,6 @@ class MetricsLogger(FrameProcessor):
for d in frame.data:
if isinstance(d, TTFBMetricsData):
print(f"!!! MetricsFrame: {frame}, ttfb: {d.value}")
elif isinstance(d, ProcessingMetricsData):
print(f"!!! MetricsFrame: {frame}, processing: {d.value}")
elif isinstance(d, LLMUsageMetricsData):
tokens = d.value
print(
@@ -103,12 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

View File

@@ -10,7 +10,6 @@ from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
@@ -35,8 +34,6 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -118,12 +115,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
image_sync_aggregator = ImageSyncAggregator(

View File

@@ -9,7 +9,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -28,8 +27,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -74,12 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

View File

@@ -9,7 +9,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -25,11 +24,10 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -59,6 +57,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
# Alternatively, you can use TextAggregationMode.TOKEN to stream tokens instead of
# sentences for faster response times.
# text_aggregation_mode=TextAggregationMode.TOKEN,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
@@ -73,12 +74,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -31,8 +30,6 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -114,14 +111,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -15,7 +15,6 @@ from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -34,8 +33,6 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -100,12 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -11,7 +11,6 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -30,8 +29,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -78,14 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -25,12 +24,10 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.deepgram.tts_sagemaker import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -61,11 +58,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram model
stt = DeepgramSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_ENDPOINT_NAME"),
endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
# Initialize Deepgram SageMaker TTS Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram TTS model
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
voice="aura-2-andromeda-en",
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
@@ -83,12 +88,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -29,8 +28,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -72,12 +69,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -11,7 +11,6 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -30,8 +29,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -82,14 +79,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -29,8 +28,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -75,12 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -29,8 +28,6 @@ from pipecat.services.azure.tts import AzureHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -81,12 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -29,8 +28,6 @@ from pipecat.services.azure.tts import AzureTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -81,12 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
@@ -30,8 +29,6 @@ from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -76,12 +73,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -30,8 +29,6 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -81,12 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -11,7 +11,6 @@ import time
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -30,8 +29,6 @@ from pipecat.services.openpipe.llm import OpenPipeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -80,12 +77,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -11,7 +11,6 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -30,8 +29,6 @@ from pipecat.services.xtts.tts import XTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -78,14 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -31,8 +30,6 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -84,12 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -29,8 +28,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -71,12 +68,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

Some files were not shown because too many files have changed in this diff.